Email to Data (with an LLM)
Published: November 3, 2024
Problem: I needed to file my taxes.
I had many email receipts (around 330) in .eml format. I needed to convert them to .pdf format before sending them to my accountant as a link to a cloud folder.
Issues:
- My email provider (FastMail) only allows me to download the emails in .eml format.
- The receipts follow a non-standard format (so writing a suitable regular expression may be difficult).
My simple repo does the following:
- Converts all email receipts (.eml files) to PDFs (the PDFs are largely plain text).
- Extracts key financial data from those PDFs into a CSV file (which I can import neatly into Google Sheets).
[Image: The process]
Example Output
The PDF output looks like this (I manually redacted the actual data):
[Image: PDF Example]
The CSV output looks like this:
File Name,Total Amount ($),Currency,Transaction Date,Descriptive Details
Receipt-2566-5568.pdf,47.42,USD,2050-06-01,"Render - Servers, PostgresDB, Redis usage for May 2050"
Receipt-2952-5288.pdf,9.52,EUR,2050-03-03,Twitter International ULC - Twitter Blue subscription
And the imported Google Sheet looks like this:
[Image: Google Sheets Example]
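To show how rows in the CSV format above could be produced, here is a small sketch using Python's `csv` module. It assumes the extracted fields are already available as dictionaries (how the LLM produces them is not covered here), and the names `FIELDS` and `rows_to_csv` are hypothetical rather than taken from the repo.

```python
import csv
import io

# Column names copied from the CSV header shown earlier.
FIELDS = [
    "File Name",
    "Total Amount ($)",
    "Currency",
    "Transaction Date",
    "Descriptive Details",
]


def rows_to_csv(rows: list[dict[str, str]]) -> str:
    """Serialize extracted receipt records to CSV text (illustrative helper)."""
    buf = io.StringIO()
    # The default QUOTE_MINIMAL quoting wraps a field in double quotes
    # only when it contains a comma, quote, or newline.
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Letting the `csv` module handle quoting matters because descriptions often contain commas, as in the Render row above; joining fields by hand with "," would split that description across columns when imported into Google Sheets.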
Use Cases
- Generally converting .eml files to PDFs
- Tax preparation
- Expense tracking
- Accounting reconciliation
- Digital receipt organization
- Audit preparation
Full repo here