Clean and Standardize Form Submissions with GPT

Why raw form data becomes a mess

If you’ve ever built a form in something simple like Google Forms or Typeform and connected it directly to a Google Sheet, you already know the shock of opening that sheet after a day of responses and seeing how wildly inconsistent the data is. One person types “California”, another writes “CA”, and a third puts “calif.” Even the emails come in half broken sometimes, missing the @ symbol or containing accidental spaces. It’s not that people are bad at forms; it’s that people fill them in as fast as possible, and data entry rules go out the window when you’re just trying to sign up for something free 😛

When I first started cleaning things manually, my method was scrolling row by row: fixing states, capitalizing names, reformatting phone numbers. It felt fine until the volume doubled and I realized I was spending literal hours retyping. The worst was when someone pasted their entire life story into the “First Name” field. So naturally, I decided automation with GPT would fix this. Spoiler: it fixed it… until it didn’t.

Setting up the GPT cleaning step

The core idea is that you send the raw form submission to GPT with specific instructions for how you want it returned. For example: “Convert all state names to their two-letter code, capitalize every word in full names, and return JSON with the keys email, name, state, and phone.” If you’re using Zapier, this can be a Webhooks by Zapier step that sends a prompt and gets structured data back. In Make, an HTTP module does the same job.

What saved me the most pain was forcing GPT to output nothing but the clean data, with zero commentary. If you don’t, it sometimes replies with a friendly explanation before the data, which then breaks the automation. One annoying detail: even with “always return valid JSON” in your instructions, you’ll occasionally get extra quotes or trailing commas that cause parsing errors downstream. When that happens, I now run the text through a JSON repair step before sending it to Airtable.
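That repair step can be tiny. Here’s a sketch of what mine boils down to, covering the two glitches I actually see: trailing commas and curly “smart” quotes. The function name `repair_json` is my own; it’s not from any library, and a real-world repairer would need to handle more cases.

```python
import json
import re

def repair_json(candidate: str) -> dict:
    """Best-effort fix for the two GPT glitches I see most:
    curly quotes and trailing commas. A sketch, not a general repairer."""
    # Smart quotes -> straight quotes so json.loads can cope.
    candidate = candidate.replace("\u201c", '"').replace("\u201d", '"')
    # Remove a trailing comma right before a closing brace or bracket.
    candidate = re.sub(r",\s*([}\]])", r"\1", candidate)
    return json.loads(candidate)
```

If `json.loads` still raises after this, I treat the row as suspect rather than guessing further.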

Normalizing text fields automatically

Let’s say someone enters their company name in all caps. GPT can normalize “ACME INC” to “Acme Inc”. For phone numbers, I ask GPT to strip out all spaces and dashes, then reformat to (###) ### ####. Here’s what my prompt looked like for that:

```
Format phone numbers to (###) ### ####
Capitalize names and company names
Standardize states to two letter abbreviations
Return data in the same order it was given
```

When I tested this, it was almost perfect — except when someone added an extension in their phone number (ext. 45) and GPT just deleted it. So now I say “If phone number contains an extension, preserve it at the end.” You have to talk to it like it’s a slightly distracted assistant who will forget obvious things unless explicitly told.
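If you’d rather have a deterministic safety net behind GPT for this field, the same rules fit in a few lines of Python. This is a hypothetical helper (`format_phone` is my name for it), and the extension regex only covers “ext.” / “x” style suffixes:

```python
import re

def format_phone(raw: str) -> str:
    # Split off an extension like "ext. 45" or "x45" BEFORE stripping digits,
    # so it survives the cleanup instead of being silently deleted.
    m = re.search(r"(?:ext\.?|x)\s*(\d+)\s*$", raw, flags=re.IGNORECASE)
    ext = m.group(1) if m else None
    core = raw[:m.start()] if m else raw
    digits = re.sub(r"\D", "", core)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop a leading US country code
    if len(digits) != 10:
        raise ValueError(f"unexpected digit count in {raw!r}")
    formatted = f"({digits[:3]}) {digits[3:6]} {digits[6:]}"
    return f"{formatted} ext. {ext}" if ext else formatted
```

Anything that raises here gets flagged for review instead of guessed at.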

Dealing with invalid email addresses

This is still hit-or-miss. GPT is decent at catching obviously fake ones like “asdf@asdf” or “noemail”. But subtle typos like “gmal.com” sometimes pass through unless I tell it to guess the intended domain. That’s risky — once it corrected “hotnail.com” to “hotmail.com” but the sender really did use hotnail as a custom domain. So if your automation has to be 100% correct, better to flag questionable emails for review rather than auto-correct them.

My Zap currently sends suspect addresses to a separate Google Sheet tab for manual checking, while the clean ones go straight into the CRM. It’s an annoying extra step, but better than losing real leads.
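The triage decision itself doesn’t need GPT at all. Here’s a sketch of the routing logic; the `SUSPECT_DOMAINS` list and the `triage_email` name are made up for illustration, and the regex is deliberately loose (it only catches obviously broken addresses, never auto-corrects):

```python
import re

# Hypothetical list: domains one typo away from a big provider.
SUSPECT_DOMAINS = {"gmal.com", "gmial.com", "hotnail.com", "yaho.com"}

def triage_email(email: str) -> str:
    """Return 'clean', 'review', or 'reject'. Never rewrites the address."""
    email = email.strip()
    # Loose shape check: something@something.tld, no spaces.
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email):
        return "reject"  # e.g. "asdf@asdf" or a missing @
    domain = email.rsplit("@", 1)[1].lower()
    if domain in SUSPECT_DOMAINS:
        return "review"  # send to the manual-check tab
    return "clean"
```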

When GPT randomly breaks the format

Here’s the part that really drove me up the wall — for weeks GPT was returning perfectly formatted cleaned data. Then out of nowhere, I started seeing text like “Sure, here’s the data reformatted for you:” before the JSON. This wasn’t a prompt change, and it wasn’t even a GPT version update I knew about. That single extra line meant my downstream automation couldn’t parse the data, so everything after that failed.

The quick fix was adding a step to strip any text before the first curly brace { in the GPT response. But the mystery of why it started doing that mid-week still bugs me. I suspect the model re-trained and slightly changed its default response style. Whatever it was, it’s a good example of why you shouldn’t rely on GPT being 100% predictable even with the same prompt.
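That stripping step is almost a one-liner. A sketch, assuming the payload is a single JSON object (so everything between the first `{` and the last `}` is what you want):

```python
def strip_preamble(raw: str) -> str:
    """Drop chatty text like "Sure, here's the data:" around the JSON."""
    start = raw.find("{")
    end = raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in GPT response")
    return raw[start:end + 1]
```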

Logging every cleaned request

One habit that has saved me repeatedly is keeping a running log of raw form data alongside the cleaned GPT version. I dump both into a spreadsheet with a timestamp. That way, when something looks off in the CRM (“Why is this person’s name ‘Test Test’?”), I can see exactly what they entered and how GPT changed it. You’d be surprised at how often “weird” results are actually faithful cleanups of equally weird inputs.

I tried setting up Airtable to store these, but the API insert step was the one most likely to fail if GPT returned invalid JSON, so now they live in Google Sheets where I can still fix data after the fact.
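The logging itself is trivial. Here’s a sketch that appends to a local CSV as a stand-in for the spreadsheet step (swap in whatever sheet-append action your automation tool provides); `log_submission` is a hypothetical helper name:

```python
import csv
from datetime import datetime, timezone

def log_submission(path: str, raw: str, cleaned: str) -> None:
    # Append one audit row: when it arrived, what they typed, what GPT returned.
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow(
            [datetime.now(timezone.utc).isoformat(), raw, cleaned]
        )
```

Raw strings go in untouched, so even unparseable GPT output still gets logged.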

Final checks before data is stored

Even after GPT standardizes your fields, I don’t trust it blindly. I added my own regex filters in the automation to do one last pass on key fields. If state is not exactly two letters, the row is flagged. If phone number doesn’t have 10 digits, it’s flagged. This catches edge cases like someone entering “Worldwide” as their state (yes, that really happened) and GPT deciding to keep it instead of replacing it.
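Those last-pass filters look roughly like this in Python. The `US_STATES` set is truncated for the example and `flag_row` is a hypothetical helper; the phone check allows more than 10 digits so extensions don’t trip it:

```python
import re

US_STATES = {"AL", "AK", "AZ", "CA", "NY", "TX"}  # truncated for the example

def flag_row(row: dict) -> list:
    """Return the names of fields that fail the last-pass sanity checks."""
    flags = []
    state = row.get("state", "")
    # Must be exactly two capital letters AND a real state code,
    # so "Worldwide" (or GPT keeping it) gets caught.
    if not re.fullmatch(r"[A-Z]{2}", state) or state not in US_STATES:
        flags.append("state")
    digits = re.sub(r"\D", "", row.get("phone", ""))
    if len(digits) < 10:  # extensions can push the count past 10
        flags.append("phone")
    return flags
```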

Also, sometimes GPT tries to “help” by adding missing fields. For instance, a blank company name might become “Individual” out of nowhere. I had to specify that missing values should stay blank so I can tell what’s actually missing versus what GPT invented.

Small tweaks that make a big difference

After dozens of failures and small maddening glitches, I’ve learned that the strongest setup is a combination of:
– A strict, detailed GPT prompt
– A post-cleanup step that removes unexpected output before parsing
– A logging spreadsheet of both raw and cleaned data
– Regex filters on key fields

These extra steps mean the automation still works even if GPT decides to change its personality on a Tuesday ¯\_(ツ)_/¯

If you’ve been struggling to clean form submissions without spending weekends in spreadsheet hell, this mix of GPT cleanup plus manual safety nets will probably be the difference between smooth automation and the occasional slow-burning meltdown. And that’s… honestly still an improvement.
