What file formats does WeTransform accept?

CSV, Excel, XML, JSON, PDF, images — here's what WeTransform can take as input, and what happens with each format.

Written by Stéphane Jauffret
Updated over 2 weeks ago

WeTransform is built to handle the messy reality of file exchange — suppliers and clients rarely send files in the exact format you need. Here's what you can feed into WeTransform, and what happens with each type.

📄 Structured formats

These formats work out of the box. Upload your file, and WeTransform will read it directly and move on to the Match step.

  • 📊 CSV / TXT: comma-, semicolon-, or tab-separated files

  • 📗 Excel (.xlsx, .xls): standard spreadsheets with a clear header row

  • 🔷 XML

  • 🔶 JSON

For structured files, WeTransform automatically detects your columns and suggests which row contains your headers. You confirm, then move on to Match.
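If you want to check a delimited file yourself before uploading it, the idea of "detect the columns, then suggest a header row" can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not how WeTransform works internally: the function names `guess_delimiter` and `guess_header_row` and the sample data are hypothetical, and the heuristics (most consistent delimiter count, first full-width row with no numeric cells) are deliberately naive.

```python
import csv
import io

# Delimiters mentioned for CSV / TXT files: comma, semicolon, tab.
CANDIDATE_DELIMITERS = [",", ";", "\t"]


def guess_delimiter(text: str) -> str:
    """Return the candidate that appears the same number of times on the most lines."""
    lines = [line for line in text.splitlines() if line.strip()]
    best, best_score = ",", (0, 0)
    for delim in CANDIDATE_DELIMITERS:
        nonzero = [line.count(delim) for line in lines if delim in line]
        if not nonzero:
            continue
        # Most frequent per-line count; sorted() keeps ties deterministic.
        modal = max(sorted(set(nonzero)), key=nonzero.count)
        score = (nonzero.count(modal), modal)
        if score > best_score:
            best, best_score = delim, score
    return best


def _is_number(cell: str) -> bool:
    """Treat both '9.90' and '9,90' as numeric."""
    try:
        float(cell.replace(",", "."))
        return True
    except ValueError:
        return False


def guess_header_row(text: str, delimiter: str) -> int:
    """Return the 0-based index of the first full-width row with no empty or numeric cells."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    width = max((len(row) for row in rows), default=0)
    for index, row in enumerate(rows):
        if len(row) == width and all(c.strip() and not _is_number(c) for c in row):
            return index
    return 0  # Fall back to the first row when nothing looks like a header.


# Hypothetical supplier export with a title line above the real header.
sample = "Supplier export\n\nsku;name;price\nA1;Widget;9,90\nB2;Gadget;12,50\n"
delimiter = guess_delimiter(sample)
header_index = guess_header_row(sample, delimiter)
print(repr(delimiter), header_index)
```

A suggestion like this is only a starting point, which is why the confirmation step matters: a title line or a notes row can easily look like a header to a simple heuristic.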

🧹 Poorly structured files — Excel with merged cells or multiple tables

Got an Excel file where the data doesn't start on row 1, columns are merged, or multiple tables share the same sheet? WeTransform detects this and offers you the Autoclean option at the upload step.

Click "Try AI Autoclean" to let the AI normalize your file before the transformation begins. You can also add specific instructions to guide the extraction, or leave the instructions blank for a default normalization.

📑 PDFs and images

PDFs and images are automatically routed through Autoclean — no button to click. WeTransform's AI reads the document and extracts the data into a clean, structured table.

You can add a prompt to guide the extraction: "Extract only the line items from the invoice" or "Ignore the header section". If the first result isn't quite right, you can adjust your prompt and run again.

🤖 Autoclean works on meaning, not formatting. Even a scanned invoice or a PDF with a multi-column layout can be normalized. Once extracted, your data enters the standard transformation pipeline: you can then apply rules, fix errors, and submit as usual.

📋 Format summary

| Format | Handled how |
| --- | --- |
| CSV / TXT | Direct → Match |
| Excel (clean) | Direct → Match |
| Excel (merged cells, multiple tables) | Autoclean (optional) → Match |
| XML / JSON | Direct → Match |
| PDF | Autoclean (automatic) → Match |
| Images | Autoclean (automatic) → Match |
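For teams that pre-sort incoming files before uploading them, the summary above can be expressed as a small lookup. The sketch below is a hypothetical helper, not part of WeTransform: the `route` function, the `looks_messy` flag, and the list of image extensions are assumptions made for illustration (the article does not say which image types are accepted).

```python
from pathlib import Path

DIRECT = "Direct → Match"
AUTOCLEAN_OPTIONAL = "Autoclean (optional) → Match"
AUTOCLEAN_AUTOMATIC = "Autoclean (automatic) → Match"

STRUCTURED_EXTENSIONS = {".csv", ".txt", ".xml", ".json"}
EXCEL_EXTENSIONS = {".xlsx", ".xls"}
# Assumption: common image types; the accepted list is not documented here.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def route(filename: str, looks_messy: bool = False) -> str:
    """Map a file name to the handling path from the format summary.

    looks_messy stands in for "merged cells or multiple tables", which
    cannot be known from the file name alone.
    """
    ext = Path(filename).suffix.lower()
    if ext in STRUCTURED_EXTENSIONS:
        return DIRECT
    if ext in EXCEL_EXTENSIONS:
        return AUTOCLEAN_OPTIONAL if looks_messy else DIRECT
    if ext == ".pdf" or ext in IMAGE_EXTENSIONS:
        return AUTOCLEAN_AUTOMATIC
    raise ValueError(f"Format not covered by the summary: {ext or filename!r}")


for name, messy in [("orders.csv", False), ("prices.xlsx", True), ("invoice.PDF", False)]:
    print(f"{name}: {route(name, looks_messy=messy)}")
```

Only the Excel case depends on the file's content, so it is the only one that takes the extra flag. Every other format follows directly from its extension.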

👉 What to do next
