Email to Data (with an LLM)
November 3, 2024
Problem: I needed to file my taxes.
I had many email receipts (around 330) in .eml format. I needed to convert them to .pdf format before sending them to my accountant as a link to a cloud folder.
Issues:
- My email provider (FastMail) only allows me to download the emails in .eml format.
- The receipts follow a non-standard format (so writing a regular expression that handles all of them may be difficult).
My simple repo does the following:
- Converts all email receipts (.eml files) to PDFs (the PDFs are largely plain text).
- Extracts key financial data from those PDFs into a CSV file (which I can import neatly into Google Sheets).
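As a rough illustration of the two ends of that pipeline, here is a minimal sketch using only Python's standard library. It is not the repo's actual code: the language, the function names (eml_to_text, write_rows) and the structure are assumptions made for this example. It shows how the text of an .eml file can be pulled out before being rendered to a PDF, and how extracted rows can be written with the CSV header shown in the example output. The PDF rendering and the LLM call that fills in the rows are deliberately left out.

```python
import csv
from email import policy
from email.parser import BytesParser

# Column names taken from the example CSV output in this post.
CSV_HEADER = [
    "File Name",
    "Total Amount ($)",
    "Currency",
    "Transaction Date",
    "Descriptive Details",
]


def eml_to_text(eml_bytes: bytes) -> str:
    """Return the subject, date and body of a raw .eml message as plain text.

    Hypothetical helper: the resulting text is what one would hand to a
    PDF renderer (not shown here).
    """
    msg = BytesParser(policy=policy.default).parsebytes(eml_bytes)
    # Prefer the plain-text part; fall back to HTML if that is all there is.
    body = msg.get_body(preferencelist=("plain", "html"))
    content = body.get_content() if body is not None else ""
    subject = msg.get("Subject", "")
    date = msg.get("Date", "")
    return f"Subject: {subject}\nDate: {date}\n\n{content}"


def write_rows(rows, csv_path):
    """Write already-extracted rows to a CSV file with the header above.

    Hypothetical helper: `rows` is assumed to come from the extraction
    step (an LLM call in this post), which is not shown.
    """
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(CSV_HEADER)
        writer.writerows(rows)
```

The csv module handles quoting, so a description that contains commas ends up wrapped in double quotes and imports into Google Sheets as a single cell.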
 
Example Output
The PDF output looks like this (I manually redacted the actual data):
 
The CSV output looks like this:
File Name,Total Amount ($),Currency,Transaction Date,Descriptive Details
Receipt-2566-5568.pdf,47.42,USD,2050-06-01,"Render - Servers, PostgresDB, Redis usage for May 2050"
Receipt-2952-5288.pdf,9.52,EUR,2050-03-03,Twitter International ULC - Twitter Blue subscription
And the imported Google Sheets looks like this:
 
Use Cases
- General conversion of .eml files to PDFs
- Tax preparation
- Expense tracking
- Accounting reconciliation
- Digital receipt organization
- Audit preparation
Full repo here