Email to Data (with an LLM)

Published: November 3, 2024

Problem: I needed to file my taxes.

I had around 330 email receipts in .eml format. I needed to convert them to PDF before sharing them with my accountant as a link to a cloud folder.

Issues:

  • My email provider (FastMail) only allows me to download the emails in .eml format.
  • The receipts follow a non-standard format (so writing suitable regular expressions would be difficult).

My simple repo does the following:

  1. Converts all email receipts (.eml files) to PDFs (the PDFs are largely plain text).
  2. Extracts key financial data from those PDFs into a CSV file (which I can import neatly into Google Sheets).

The process
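
To make the first step concrete, here is a minimal sketch of reading an .eml file and pulling out its text using only Python's standard library. It is not the code from the repo: the function names eml_to_text and convert_folder are hypothetical, the HTML fallback is a crude tag strip, and turning the resulting text into a PDF would still need a separate PDF library, which is left out here.

```python
import html
import re
from email import message_from_bytes, policy
from pathlib import Path


def eml_to_text(raw: bytes) -> str:
    """Return a plain-text rendering of one .eml message (hypothetical helper)."""
    # policy.default gives the modern EmailMessage API, including get_body().
    msg = message_from_bytes(raw, policy=policy.default)

    # Keep a few headers that are useful on a receipt.
    header = "\n".join(
        f"{name}: {msg[name]}" for name in ("From", "Date", "Subject") if msg[name]
    )

    # Prefer the text/plain part; fall back to text/html if that is all there is.
    part = msg.get_body(preferencelist=("plain", "html"))
    body = part.get_content() if part is not None else ""
    if part is not None and part.get_content_subtype() == "html":
        # Crude assumption: dropping tags is good enough for a mostly-text receipt.
        body = html.unescape(re.sub(r"<[^>]+>", " ", body))

    return f"{header}\n\n{body.strip()}"


def convert_folder(folder: Path) -> dict[str, str]:
    """Map each .eml file name in a folder to its extracted text."""
    return {p.name: eml_to_text(p.read_bytes()) for p in sorted(folder.glob("*.eml"))}
```

Calling convert_folder(Path("receipts")) on a folder of downloaded emails would give one text blob per receipt, ready to be rendered to PDF or passed on to the extraction step.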

Example Output

The PDF output looks like this (I manually redacted the actual data):

PDF Example

The CSV output looks like this:

File Name,Total Amount ($),Currency,Transaction Date,Descriptive Details
Receipt-2566-5568.pdf,47.42,USD,2050-06-01,"Render - Servers, PostgresDB, Redis usage for May 2050"
Receipt-2952-5288.pdf,9.52,EUR,2050-03-03,Twitter International ULC - Twitter Blue subscription
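
As a rough illustration of how rows like these could be produced, the sketch below turns one model reply into a CSV row with the columns shown above. The post does not say how the LLM is prompted or what it returns, so the JSON reply format, its keys (total, currency, date, details), and the helper names are all assumptions; the actual LLM call is left out.

```python
import csv
import io
import json

# Column names copied from the CSV header shown above.
COLUMNS = [
    "File Name",
    "Total Amount ($)",
    "Currency",
    "Transaction Date",
    "Descriptive Details",
]


def reply_to_row(file_name: str, llm_reply: str) -> dict:
    """Turn one JSON reply (assumed format) into a row keyed by column name."""
    data = json.loads(llm_reply)
    return {
        "File Name": file_name,
        "Total Amount ($)": f"{float(data['total']):.2f}",
        "Currency": data["currency"],
        "Transaction Date": data["date"],
        "Descriptive Details": data["details"],
    }


def rows_to_csv(rows: list) -> str:
    """Serialize rows to CSV text; fields containing commas get quoted."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Using the csv module rather than joining strings by hand matters here because descriptions such as "Render - Servers, PostgresDB, Redis usage for May 2050" contain commas and have to be quoted for Google Sheets to import them into a single cell.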

And the imported Google Sheet looks like this:

Google Sheets Example

Use Cases

  • Converting .eml files to PDFs in general
  • Tax preparation
  • Expense tracking
  • Accounting reconciliation
  • Digital receipt organization
  • Audit preparation

Full repo here