My file is messy — how to normalize it with Autoclean

Got a PDF, a scanned document, or an Excel file with merged cells and multiple tables? Autoclean uses AI to normalize your file before transformation begins.

Written by Stéphane Jauffret
Updated over 2 weeks ago

Not all files arrive clean. Some Excel files have merged cells, data scattered across multiple tables, or headers buried on row 5. PDFs and scanned documents typically carry no table structure that a transformation can read. A transformation fed these files directly would fail immediately.

Autoclean solves this. It uses AI to read your file as a human would, extract the meaningful data, and output a clean, structured table — ready for the normal transformation pipeline.

🧹 When does Autoclean kick in?

There are two scenarios:

  • 📑 PDF or image — Autoclean runs automatically. As soon as you upload the file, WeTransform routes it through normalization. No button to click.

  • 📗 Messy Excel — WeTransform detects that your file may be poorly structured and shows you the "Try AI Autoclean" button at the bottom of the upload screen. Click it to trigger normalization.

✏️ Adding your own instructions

Autoclean works well without any instructions — it will extract the data it finds and normalize it by default. But you can guide it with a prompt if the result isn't quite what you need.

Click "Add specific instructions" to open the prompt field. Examples of useful instructions:

  • "Extract only the second table on the page, with header 'Investment summary'"

  • "Pivot the 'Price' row so that each price lines up with its corresponding product"

  • "Extract only the line items — ignore the summary section at the bottom"

  • "This is an invoice — extract supplier name, date, and all line items"

💡 You can add or adjust your prompt after seeing the first result. If the initial extraction doesn't match what you expected, refine your instructions and run Autoclean again — no need to re-upload your file.

➡️ What happens after Autoclean?

Once Autoclean has normalized your file, you land on the standard preview screen. Your data now looks like a clean, structured table. From there, the flow is identical to any other transformation:

  1. Review the extracted data

  2. Move on to the Match step — column matching works exactly the same

  3. Add rules if needed, then review in Finalize

🤖 Autoclean uses AI credits. Each normalization consumes credits from your account balance. You can monitor your remaining credits in your account settings.
