Start by watching your prompt actually fail
This feels obvious, but hear me out because I kept skipping this step myself. If you’re trying to fix GPT outputs, it won’t help to just keep tweaking your prompt in a vacuum. You have to actually run it against a few inputs and *see what broke*. What looked good inside ChatGPT often completely unraveled when the same prompt was used in Make or Zapier, especially when the incoming text was too short or had weird line breaks.
For example: I had a basic summarizing prompt for Google Docs that was working fine for weeks—until one day the output started repeating the intro paragraph twice. I assumed GPT was hallucinating or ignoring my stop words. After literally 40 test runs, I found the issue: the webhook was firing off *mid-document*, so GPT was getting the same partial content twice back to back 🤦♀️. The fix wasn’t even in the prompt. It was in the Data Inbound settings in Make.
So yeah. Don’t just revise the prompt broadly—run it with real data, inspect actual outputs, and **look for failures first**.
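Here’s roughly what my throwaway test loop looks like. This is a minimal sketch assuming you’re calling the OpenAI API from Python with the official `openai` package; the model name and sample inputs are placeholders, so swap in text pulled straight from your Make or Zapier run history.

```python
# Throwaway harness: run one prompt against a few real inputs and eyeball what comes back.
# Assumes OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()

PROMPT = "Summarize the following document in 2-3 sentences:\n\n{doc}"

test_inputs = [
    "Full, well-formed document text goes here...",
    "short",                        # suspiciously short input
    "Line one\n\n\nLine two\r\n",   # weird line breaks
]

for i, doc in enumerate(test_inputs, 1):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": PROMPT.format(doc=doc)}],
    )
    print(f"--- input {i} ---")
    print(response.choices[0].message.content)
```

The point is to feed it the exact text your automation delivered, not a cleaned-up copy you typed by hand.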
Create a simple success test case for sanity
Before making fancy improvements, just write the most basic version of the input that you know should succeed. This is your canary in the coal mine. Like, for a tweet-to-summary Zap, I used this test message:
> Here’s a normal tweet with clean grammar and no hashtags
If your base case fails, *you do not have a prompt problem*. You have a format, encoding, or context window issue. (I learned this the hard way during an 8-hour debug where I literally screamed when I figured it out.)
Sometimes the issue isn’t even GPT’s comprehension. It’s something like:
– Line breaks were being escaped as HTML entities
– The text contained a Unicode character GPT skipped
– The input was being truncated at the router level in Zapier
Weirdly consistent tip: copying and pasting your “clean test input” directly into the ChatGPT web interface often reveals issues Zapier or Make won’t show. If something behaves differently there, something upstream is mangling the data.
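If you suspect the data rather than the prompt, dumping the raw input with `repr()` before it ever reaches GPT makes escaped line breaks, stray Unicode, and truncation obvious. A minimal sketch; the length threshold is a made-up number, tune it to your flow:

```python
# Inspect exactly what your automation is about to send to GPT.
# Escaped line breaks, odd Unicode, and truncation all show up in repr().
def inspect_input(text: str, max_expected_len: int = 4000) -> None:
    print("repr:", repr(text[:200]))   # exposes \r\n, \u2028, &nbsp;-style debris
    print("length:", len(text))
    if len(text) >= max_expected_len:
        print("WARNING: input may be truncated upstream")
    non_ascii = {c for c in text if ord(c) > 127}
    if non_ascii:
        print("non-ASCII characters present:", non_ascii)

inspect_input("Here\u2019s a normal tweet with clean grammar and no hashtags")
```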
Add numbered examples directly into your prompt
This is my favorite trick—and the only thing that made my tab-renaming workflow in Chrome actually work reliably. When prompts go vague or GPT mixes up intent, *explicitly add labeled examples* like this:
> Here are example requests and what should happen:
>
> 1. Input: “Send this to marketing”
> Output: Assign to Marketing folder
>
> 2. Input: “Schedule Thursday call”
> Output: Add calendar event on Thursday at 10am default
Then below that, say:
> Follow the same format. Only give the exact output as shown in examples.
It’s wild how much this improved accuracy. Especially in automations. When I added this to a prompt inside Make, the hallucination rate dropped like a rock. GPT started copying the style more mechanically, even though it sounded more like me pretending to be authoritarian in tone 😛
Tips that helped:
– Do not add explanations or reasoning. Just the before and after.
– Use flat, clear phrasing.
– If casing matters (like Title Case vs lowercase), be explicit in the example.
Also, if the context window gets tight, don’t keep adding more examples further down. Old examples get ignored. Put your most important pattern first.
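If you’re assembling the prompt in code instead of pasting it into a module, it’s worth keeping the numbered examples in one list so reordering them (most important first) stays painless. A sketch using the example pairs from above:

```python
# Build the few-shot block programmatically, most important pattern first.
EXAMPLES = [
    ("Send this to marketing", "Assign to Marketing folder"),
    ("Schedule Thursday call", "Add calendar event on Thursday at 10am default"),
]

def build_prompt(user_input: str) -> str:
    lines = ["Here are example requests and what should happen:", ""]
    for i, (inp, out) in enumerate(EXAMPLES, 1):
        lines.append(f'{i}. Input: "{inp}"')
        lines.append(f"   Output: {out}")
        lines.append("")
    lines.append("Follow the same format. Only give the exact output as shown in examples.")
    lines.append("")
    lines.append(f'Input: "{user_input}"')
    lines.append("Output:")
    return "\n".join(lines)

print(build_prompt("Forward this to the design team"))
```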
Write your bad instructions as bad examples
This one surprised me. I had a prompt that kept failing because it was too polite. Seriously. It said things like:
> Try to reply in a friendly tone if possible
Every time I said “try to,” GPT interpreted it like jazz improv 😂 So instead, I added this to the examples section:
> ❌ Input: Try to summarize the email nicely
> ❌ Output: Sure, here’s a friendly version…
>
> ✅ Input: Summarize the email with no extra text
> ✅ Output: Project launch was delayed to next week
GPT actually responded to this better than to any amount of explaining what not to do. It was like a child finally understanding what you mean when you *model the mistake*.
Same goes for showing what a wrong answer actually looks like. If GPT kept padding its emails with lines like:
> I hope this message finds you well
I literally started feeding it examples that showed:
> ❌ Response: I hope this message finds you well
> ✅ Response: Next steps for Monday are X, Y, Z
If you use this trick, only do 2-3 at a time. Too many examples of wrong answers and GPT will start copying them by mistake ¯\_(ツ)_/¯
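A cheap safety net on top of the negative examples: check the output for the exact phrases you marked with ❌ and regenerate (or flag) if one sneaks back in. A sketch, with a made-up phrase list:

```python
# Catch outputs that regress into the patterns shown as ❌ examples.
BANNED_PHRASES = [
    "i hope this message finds you well",
    "sure, here's a friendly version",
]

def violates_examples(output: str) -> bool:
    lowered = output.lower()
    return any(phrase in lowered for phrase in BANNED_PHRASES)

draft = "I hope this message finds you well. Next steps for Monday are X, Y, Z."
if violates_examples(draft):
    print("Output copied a banned phrase; regenerate or strip it before sending.")
```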
Force the output format early, not late
One of the worst things I did was assuming GPT would “get around” to formatting correctly. It never did. I had a Notion entry creator that broke because GPT kept outputting lists in markdown… even though Notion’s API didn’t like that.
Fixing it meant putting the format rules *before* I ever explained the task. Like this:
> Respond only using comma-separated plain text. No markdown. Lists must look like this: item1, item2, item3.
Then only after that did I explain the task context:
> Now write a list of services the user mentioned in the message
Suddenly GPT stopped adding headers like:
> **Here’s the summary you requested:**
Ever again.
Also helpful: if possible, define the exact output labels your downstream steps expect. So instead of asking “classify this,” say:
> Output one of the following tags exactly: urgent, medium, low
Then give examples of how to pick the right one (see earlier section).
Now, if you’re generating structured data like JSON, definitely include a nested example before the input. GPT will *copy the structure* almost perfectly if it’s shown what to match. Just be careful with trailing commas; GPT still fumbles those :/
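For the JSON case, I’d put the nested example ahead of the task and then run the reply through a strict parser, since `json.loads` rejects the trailing commas GPT sometimes emits. A minimal sketch with a made-up schema:

```python
import json

# Format rules and the nested example come before the task description.
FORMAT_BLOCK = """Respond only with JSON matching this structure exactly. No markdown, no commentary.
{
  "services": ["item1", "item2"],
  "urgency": "low"
}"""

TASK_BLOCK = "Now extract the services the user mentioned in the message below."
prompt = FORMAT_BLOCK + "\n\n" + TASK_BLOCK

def parse_reply(reply: str) -> dict | None:
    try:
        return json.loads(reply)      # rejects trailing commas, single quotes, etc.
    except json.JSONDecodeError as err:
        print("Model returned invalid JSON:", err)
        return None                   # retry or fall back here

print(parse_reply('{"services": ["hosting", "backups"], "urgency": "low"}'))
print(parse_reply('{"services": ["hosting",], "urgency": "low"}'))  # trailing comma -> None
```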
Use markdown tables to lock behavior
I know markdown doesn’t show up in every tool, but GPT *thinks* in markdown. Tables in your prompt won’t always land perfectly in the final integration, but when you’re trying to lock GPT into a decision-making structure, they’re about as close to deterministic as prompts get.
Here’s one I used in a prompt parsing cold emails:
| Phrase Occurs        | Category      |
|----------------------|---------------|
| Please contact us    | Sales Inquiry |
| I forgot my password | Support       |
| Schedule a demo      | Lead          |
Then I just instructed:
> Find a matching phrase from the message input. Use the category shown.
This worked **way** better than asking GPT to just classify the intent. When phrases were close but not exact, it would usually still pick the right category, apparently by matching against the nearest phrase in the table. (It’s still weird to me that this works as well as it does.)
If you’re having issues getting consistent tag formats, a table like this usually beats bullet lists or nested decision trees.
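Since the categories are a closed set anyway, it’s also worth validating the reply against the table on the way out, so a near miss like “sales enquiry” gets caught instead of silently flowing into the next step. A sketch with the table above hard-coded:

```python
# The lookup table from the prompt, duplicated in code so the output can be validated.
PHRASE_TO_CATEGORY = {
    "please contact us": "Sales Inquiry",
    "i forgot my password": "Support",
    "schedule a demo": "Lead",
}
ALLOWED_CATEGORIES = set(PHRASE_TO_CATEGORY.values())

def check_category(model_output: str) -> str | None:
    category = model_output.strip()
    return category if category in ALLOWED_CATEGORIES else None

print(check_category("Support"))        # -> Support
print(check_category("sales enquiry"))  # -> None, route to a fallback step
```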
Handle edge cases with repeatable overrides
So here’s the deal. GPT just won’t always handle edge cases correctly unless you describe *exactly* what to do—*and* you phrase it like an urgent rule. Passive suggestions don’t seem to override pattern-matching behavior.
What worked best for me:
– Start with “If [condition], do not [default behavior]”
– Then immediately follow with “Instead, write [new format]”
Example from my calendar parsing Zap that had to handle subject-only emails:
> If the email contains no body text, do not try to summarize. Instead, return: “No summary available. Body text missing.”
When I skipped this and said “ignore empty emails,” it took me *hours* to notice that GPT was still summarizing *just the subject line*. Real facepalm there.
Also: if you find a recurring failure, just capture that input and make it its own override rule. This doesn’t scale infinitely, but for most flows, a handful of these covers it.
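For edge cases you can detect deterministically, I’d also short-circuit in code before GPT is ever called; the prompt-level override is the backstop, not the first line of defense. A sketch, assuming the email fields arrive as plain strings (`call_gpt_summarizer` is a hypothetical stand-in for your existing GPT step):

```python
# Handle the known edge case in code first; the prompt override catches anything that slips through.
def summarize_email(subject: str, body: str) -> str:
    if not body.strip():
        # Same canned text as the prompt override, so downstream steps see one consistent format.
        return "No summary available. Body text missing."
    return call_gpt_summarizer(subject, body)  # hypothetical helper that runs the actual prompt

def call_gpt_summarizer(subject: str, body: str) -> str:
    ...  # your existing GPT call goes here

print(summarize_email("Quick question", ""))
```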
Do a final pass with zero-fluff instructions
Start stripping out soft language in your last prompt revision. You don’t need to be polite to GPT. I mean, you can… but it just leads to it being polite right back 🙃.
Bad:
> Try to create a concise, helpful response
Better:
> Return a plain English sentence. Limit to 1 sentence. No pleasantries.
Bad:
> Please tag this message if possible
Better:
> Tag this message. Use only these formats: urgent, low, none
If you’ve added clean examples, a format restriction, and error overrides, cutting the nonsense is your final step. I usually delete dumb things like “This is for a Zapier flow” or “Act like a smart assistant,” because they generally don’t help.
Only leave in things GPT can act on. If it can’t impact formatting or word choice or logic in any way, kill it.
Or put another way: if the sentence doesn’t help *a dumb robot do something narrower*, it’s probably just in your way 🙂