Start with What You Actually Want to Know
The most common mistake I’ve made (repeatedly) when trying to sift insights from customer feedback is assuming I already know what question I’m answering. I’ll write a ChatGPT prompt like “Summarize all the negative feedback” and then get annoyed when it returns a list of generic complaints like “The app is slow.” Cool. Thanks, Captain Obvious 😛
So now, before I even open the survey spreadsheet (or paste in a CSV), I write down what I actually want to figure out. Real questions like:
– Are people confused about our pricing?
– Do users mention competitors when they churn?
– Is there a pattern in the words people use when they like vs hate the onboarding?
Then I build my questions for GPT around one of those. The specificity totally changes the result.
Example prompt:
“Here are 100 free-text customer responses from a survey. Can you identify quotes that mention pricing confusion, including responses that don’t use the word ‘pricing’ but imply it?”
When I finally tried that, it returned some incredible out-of-left-field comments that I hadn’t noticed — like a person who said “I didn’t expect an invoice after syncing. I thought it was free.” That would’ve never shown up in a keyword-only analysis 🙃
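If you’d rather script this than paste rows into the chat window, here’s a minimal sketch using the openai Python package. The file name, the “response” column, and the model name are my assumptions; swap in whatever your export actually looks like.

import csv
from openai import OpenAI  # pip install openai

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

# Hypothetical file and column name; adjust to match your survey export.
with open("survey_responses.csv", newline="", encoding="utf-8") as f:
    responses = [row["response"] for row in csv.DictReader(f) if row["response"].strip()]

prompt = (
    "Here are free-text customer responses from a survey. Identify quotes that "
    "mention pricing confusion, including responses that don't use the word "
    "'pricing' but imply it.\n\n" + "\n".join("- " + r for r in responses[:100])
)

reply = client.chat.completions.create(
    model="gpt-4o",  # assumption; use whichever model you have access to
    messages=[{"role": "user", "content": prompt}],
)
print(reply.choices[0].message.content)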
Use Categories But Let GPT Invent the Labels
At one point I tried to pre-label feedback into categories like “bugs,” “features,” and “support issues,” then feed those into GPT for analysis. Nope. That instantly flattened anything interesting. It basically turned into sentiment analysis with extra steps.
What works way better is asking GPT to invent the labels based on patterns it finds, and then to allow more than one label per comment.
Prompt I used that worked shockingly well:
“Read the following 50 customer comments and organize them into approximately 5 to 7 themes. For each theme, give it a name, a short description of what the comments are about, and list 3 example quotes. A comment might belong to more than one theme.”
Instead of just getting “positive” vs “negative,” this approach surfaced a whole theme I didn’t expect — people getting stuck between our mobile app and desktop interface. I wouldn’t have caught that manually because they weren’t using technical terms, just stuff like “I clicked it at home and nothing happened at work.” ¯\_(ツ)_/¯
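If you want the themes back in a form you can count or sort later, you can ask for JSON and parse it. This is a rough sketch under the same assumptions as before (openai package, model of your choosing); the field names are mine, not anything the API dictates.

import json
from openai import OpenAI

client = OpenAI()

def discover_themes(comments):
    # comments: a list of ~50 free-text survey responses
    prompt = (
        "Read the following customer comments and organize them into roughly 5 to 7 "
        "themes. A comment may belong to more than one theme. Respond in JSON as an "
        'object with a "themes" list, where each theme has "name", "description", '
        'and "example_quotes" (3 quotes).\n\n' + "\n".join("- " + c for c in comments)
    )
    reply = client.chat.completions.create(
        model="gpt-4o",  # assumption; use your model
        response_format={"type": "json_object"},  # ask for valid JSON back
        messages=[{"role": "user", "content": prompt}],
    )
    return json.loads(reply.choices[0].message.content)["themes"]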
Break Long Surveys Into Batches for Accuracy
If you drop 900 rows of feedback into ChatGPT all at once, it’ll smile politely and give you three bullet points like “Users want improvements.” Thanks, I guess.
It turns out that breaking these into batches of 50 to 100 comments per prompt keeps the analysis tighter. When I run them this way, I get more detailed groupings, plus the model doesn’t skip over weird but important one-offs.
Here’s my workflow (there’s a rough script for steps 3 to 5 after the list):
1. Export responses from your survey tool as CSV.
2. Clean it manually — delete any duplicates or partial entries. (Or worse, the ones that say “Lorem ipsum qwerty.” Yes, I’ve seen those in real data 😐)
3. Split into chunks of, say, 75 rows each.
4. Feed each batch into GPT with a consistent prompt asking for themes.
5. After all batches are done, ask GPT to consolidate the themes from all batches.
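Here’s roughly how I script steps 3 to 5, assuming the cleaned responses are already sitting in a Python list and you’re using the openai package. The batch size, model name, and prompt wording are the knobs to adjust.

from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # assumption; use your model

def ask(prompt):
    reply = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content

def analyze_in_batches(responses, batch_size=75):
    batch_summaries = []
    # Steps 3 and 4: split into chunks and run the same theme prompt on each
    for start in range(0, len(responses), batch_size):
        batch = responses[start:start + batch_size]
        prompt = (
            "Read the following customer comments and organize them into themes. "
            "Name each theme, describe it briefly, and list 3 example quotes.\n\n"
            + "\n".join("- " + c for c in batch)
        )
        batch_summaries.append(ask(prompt))
    # Step 5: consolidate the per-batch themes into one list
    consolidation_prompt = (
        "Below are theme summaries produced from separate batches of the same survey. "
        "Merge overlapping themes, keep genuinely distinct ones, and note which themes "
        "show up in more than one batch.\n\n" + "\n\n---\n\n".join(batch_summaries)
    )
    return ask(consolidation_prompt)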
When I did this across four batches from a post-purchase survey, I finally saw that shipping expectations were a problem — “overnight” meant different things to customers in different states. Not something you catch from one big unsorted dump.
Use Contrast Prompts To Reveal What Changed
One of the coolest things I’ve started doing lately (usually when something feels off but I can’t say why) is running before-and-after comparisons through ChatGPT.
For example:
“Here’s a set of customer survey responses from last quarter, and another from this week. Please identify differences in tone, topic frequency, and sentiment between these groups. Include at least one quote to support each difference.”
This is especially helpful when we launch a new feature, or when something breaks and I need to know if users have *really* started complaining more — or if it just *feels* that way because two people yelled at support on the same day.
Last time I used this, it showed a spike in people asking if “access to Team Boards was removed.” We hadn’t changed permissions (or so I thought), but digging into it revealed a draft update that accidentally shipped with a visibility toggle off. It was fixed in 3 minutes after that 🙂
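Mechanically it’s just one prompt with two labelled blocks of responses. Here’s a sketch, assuming both sets are already loaded as Python lists (the variable names are made up for illustration):

def build_contrast_prompt(last_quarter, this_week):
    # Both arguments are lists of free-text survey responses.
    return (
        "Here's a set of customer survey responses from last quarter, and another "
        "from this week. Please identify differences in tone, topic frequency, and "
        "sentiment between these groups. Include at least one quote to support each "
        "difference.\n\n"
        "LAST QUARTER:\n" + "\n".join("- " + r for r in last_quarter) + "\n\n"
        "THIS WEEK:\n" + "\n".join("- " + r for r in this_week)
    )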
Watch for Identical Complaints in Opposite Sentiments
This one took me way too long to understand. Customers often describe the *same behavior* as positive or negative depending on their expectations. So if your prompt filters only by tone — like “show only dissatisfied users” — you’ll miss critical overlapping issues.
My fix: I ask GPT to group by described experience or behavior *before* tone.
Prompt:
“From the following survey comments, group together responses that describe a similar user experience, regardless of tone. For each group, indicate whether the feedback is mostly positive, negative, or mixed.”
I tried this after switching our welcome email format. Some users absolutely loved the simplicity. Others said it looked “scammy.” Same email! Without mixing tone buckets, that wouldn’t have surfaced.
Now I treat those comments as design friction, not just mood swings.
Ask for Examples Instead of Just Trends
One mistake I made a lot early on — and see a ton of other teams do — is asking GPT to “summarize trends.” You’ll get vague stuff like “users want better performance” (…wow… shocking).
Instead, every prompt I use for feedback analysis now ends with something like:
“Include 3 verbatim user quotes per group that illustrate the pattern you’re describing.”
Sometimes that little change completely flips the usefulness of the result.
For instance, a massive theme came up in our Net Promoter Score survey around account deletion clarity. The theme label alone was vague, but the actual comments were really helpful:
– “I tried deleting my profile but all it did was log me out.”
– “Not sure if my data is gone or still there.”
Seeing those word-for-word gave the team precise language to use in the fix — not just “clarify deletion.” Because you know the person writing that ticket isn’t reading a whole spreadsheet from support.
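That requirement is now a canned suffix I tack onto every analysis prompt so I can’t forget it. A trivial sketch:

EVIDENCE_SUFFIX = (
    "\n\nInclude 3 verbatim user quotes per group that illustrate "
    "the pattern you're describing."
)

def with_evidence(prompt):
    # Append the quote requirement to any feedback-analysis prompt.
    return prompt + EVIDENCE_SUFFIX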
Use Reflection Prompts To Uncover Assumptions
This one feels a little mushy, but it’s actually where GPT shines. When I feel stuck — like, I’ve read the feedback ten times, but nothing’s clicking — I’ll ask GPT to explain *why* users might be feeling a certain way.
Prompt:
“Based on this customer feedback, what might users be assuming about how the product works that isn’t true? List potential false assumptions and the comments that support them.”
This exact trick helped me realize our mobile download button looked like it was saving to the phone — but it actually triggered a cloud sync. Users were saying “But where did it go?” and I was over here thinking “You pressed the button? It worked!” 🤦‍♂️
A single reflection-style answer from ChatGPT reminded me that visual location = expectation. We added a progress bar that says “syncing to cloud” and the tickets dropped almost instantly.
Group Written Surveys By Job Or Persona First
This is something I totally ignored until we launched a multi-user dashboard, and suddenly survey responses went in two completely different directions — one group super excited, the other deeply confused.
When I re-ran the prompts but prefaced the data with a split by persona (e.g., ‘admin user’ vs ‘guest viewer’), the insights finally made sense.
Prompt:
“This is a collection of customer feedback from two types of users: admins and invited users. Please analyze each group separately, identifying concerns and confusion points that are unique to each.”
Doing it this way helped me catch that invited users didn’t even *know* they had limited access — they just thought features weren’t working. Admins weren’t confused at all, so the earlier trends totally hid those misunderstandings.
Now I automatically tag survey data on export with known fields like user type before feeding it into GPT. Not always perfect, but way more useful than flattening everyone into one big average.
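The tagging-and-splitting part is easy to script. Here’s a sketch assuming a pandas DataFrame with hypothetical “user_type” and “response” columns; the per-group prompts then go to the model the same way as in the earlier sketches.

import pandas as pd

df = pd.read_csv("survey_export.csv")  # hypothetical filename

# Group the free-text answers by persona before any GPT analysis.
by_persona = {
    user_type: group["response"].dropna().tolist()
    for user_type, group in df.groupby("user_type")
}

persona_prompts = {
    user_type: (
        f"These are survey responses from {user_type} users only. Identify concerns "
        "and confusion points specific to this group.\n\n"
        + "\n".join("- " + r for r in responses)
    )
    for user_type, responses in by_persona.items()
}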