For product marketers and sales enablement teams, the ability to systematically mine sales call transcripts for go-to-market intelligence is a compelling opportunity. Before spinning up a full-scale project with an AI engineer, I wanted to test whether the data could be mined successfully at a limited scale. This post walks through my experience using ChatGPT to analyze sales transcripts and the key lessons from that small-scale test.
My conclusion: no surprise here – there is real value in using AI to mine calls. However, given all the competing AI projects available to marketing teams today, I would not prioritize building a continuous automated system as a top-five engineering project. For one-off analysis — perhaps run quarterly to answer specific questions — bulk AI analysis of call transcripts makes a lot of sense. But investing AI engineering time in a continuously running system probably doesn’t have the ROI today, at least until the models get slightly better and your sales team is agentic-ready.
The Opportunity
Platforms like Gong delivered an order-of-magnitude improvement in an organization’s ability to understand what was happening in discovery calls when they were first introduced. Many marketers and sales leaders have spent their commute time listening to call recordings, trying to understand how to better support their sales teams or surface insights for marketing. Now, with accurate transcripts and accessible AI tools, the question becomes: can this process run at scale? Can we go from 1:1 to 1:many for sales call analysis?
The Experiment
The goal was straightforward. Using a sample of several hundred transcripts, what could be learned across the dataset to assist the overall product marketing effort? I used the pro version of ChatGPT for this exercise. Some colleagues recommended Claude, but for simplicity, I limited the test to a single model. The company used sales reps and BDRs for first-pass discovery calls, covering prospects who reached out for support, demonstrations, or as part of a trial. All calls were anonymized, with identifying information removed from the transcripts.
Lessons Learned
#1 – Focus on a Single Stage of the Sales Cycle
Looking across the full collection of transcripts, the variability was immediately apparent — the calls spanned multiple stages of the sales cycle with wildly different contexts. By focusing on a single stage, the instructions to the AI could be much sharper. For this experiment, I focused on discovery calls and asked one core question: “What caused a customer to consider us?” This narrowed the working dataset down to 50 transcripts.
#2 – Label Your Sales Call Types
Understanding the purpose of a call is critical for segmenting transcripts for analysis. Labeling calls in Gong’s metadata is the first step. Options might include: Discovery Call, Troubleshooting, Demonstration, and so on. Don’t underestimate how hard it is to get a sales team to do this consistently.
#3 – Understand How Your Company’s Stage Impacts the Analysis
Most of my work is with early-market, sub-$300M ARR companies — companies that are scaling and bringing new technology to market. They are not commodity businesses. The language customers use isn’t standardized. Think about buying a car: a tire is a tire, brakes are brakes, MPG is MPG. Mining a car sales transcript is generally straightforward. Now consider an emerging product company where consistent language doesn’t yet exist to describe the product, its features, or its benefits. The analysis gets significantly more complex. Set your expectations accordingly.
#4 – Broad Prompts Produce Impressive — but Inaccurate — Results
Focused on discovery calls, I began asking broad questions like “Tell me why the prospect is talking to us.” The AI’s response was striking in what it surfaced. There was just one problem: after auditing the results against the actual audio recordings, the information was off. It was directionally plausible, but not accurate enough to base analysis on. I spent a lot of time updating the instructions, providing more context, and even supplying a list of options to choose from. Perhaps my AI game isn’t where it needs to be, but I could not get consistent results from broad queries or even from more targeted prompts. This may be explained by lesson #5.
#5 – AI Doesn’t Understand Discovery Call Nuance (and Neither Do Some Reps)
If a prospect says “One of the reasons I’m talking to you is your pricing,” and the sales rep accepts that at face value, ChatGPT will correctly conclude that pricing was the driver. The rep walks away with the same answer. But the follow-up question was never asked: “You mentioned pricing as one reason — what are the others?” The first reason a prospect offers is often the easiest, least controversial one to give. The real reason usually comes out with a follow-up probe. ChatGPT cannot surface nuance the rep didn’t capture. Queries like “Find examples where reps didn’t ask appropriate follow-up questions” were largely unsuccessful.
#6 – Answers Get Shaped by External Content
Whether due to my instructions or model behavior, ChatGPT repeatedly imported external context — from the company website and other web sources — into its analysis of the transcripts. It would tell me a prospect requested a call due to a specific reason listed on the company’s website, but when I searched the transcript for any mention of it, it wasn’t there. The reason offered was directionally correct, but the language used was the company’s, not the prospect’s. Which leads directly to lesson #7.
#7 – Always Ask the Model to Cite Verbatim Transcript Excerpts
If the AI concluded that “pricing was the issue,” I needed to see the exact transcript passage to verify the model wasn’t hallucinating or misreading context. A rep asking “Are you interested in our pricing?” with the prospect replying “yes” is very different from a prospect who opens with pricing unprompted. ChatGPT frequently could not distinguish between the two.
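One way to enforce this rule is to verify the citations programmatically before trusting them. The sketch below is my own minimal example (the function names are not from any vendor tooling): it normalizes casing, whitespace, and punctuation, then checks that an AI-cited excerpt actually appears verbatim in the source transcript.

```python
import re

def normalize(text: str) -> str:
    """Lowercase, collapse whitespace, and strip punctuation so minor
    formatting differences don't cause false mismatches."""
    collapsed = re.sub(r"\s+", " ", text.lower())
    return re.sub(r"[^a-z0-9 ]+", "", collapsed).strip()

def excerpt_in_transcript(excerpt: str, transcript: str) -> bool:
    """Return True only if the cited excerpt appears verbatim
    (after normalization) in the transcript."""
    return normalize(excerpt) in normalize(transcript)
```

Any excerpt that fails this check goes back for manual review against the recording.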
#8 – Sales Call Mechanics Reporting Was Spot On
Questions about the mechanics of sales calls were accurate and extremely valuable. Prompts like “How many open-ended questions were asked by the sales team?” or “Categorize the open-ended questions asked” were highly reliable and, in many cases, surprised the sales team with what they revealed about their own effectiveness.
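If you want a sanity check on the AI’s counts, the same mechanics question can be approximated outside the model. This is a rough heuristic of my own, not what Gong or ChatGPT actually does: flag rep utterances that begin with common open-ended starter phrases.

```python
# Starter phrases that usually signal an open-ended question.
OPEN_STARTERS = ("what", "how", "why", "tell me", "walk me through", "describe")

def count_open_ended(rep_lines):
    """Count rep utterances that look like open-ended questions,
    using a simple starter-phrase heuristic."""
    count = 0
    for line in rep_lines:
        if line.strip().lower().startswith(OPEN_STARTERS):
            count += 1
    return count
```

It will miss rephrasings, but comparing its totals against the AI’s answer is a cheap cross-check.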
#9 – Capturing the Language of Customers Was Top-Notch
Companies bringing new products to market almost always struggle with the right language to use when describing the product, the problems it solves, and the benefits it delivers. Using AI to find instances of customers using their own language to describe those things is invaluable market research for product marketing.
#10 – Competitive Mentions, Titles, Anything Discrete Was Easy and Effective to Analyze
Searching for and categorizing discrete mentions of companies, titles, and roles across the transcript set was straightforward and produced clean, reliable results. Not a surprising finding, but a consistently useful one. It is less useful, of course, if the reps don’t consistently ask the questions that surface those mentions.
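Because these are discrete strings, the tally itself doesn’t even require an LLM. A minimal sketch, assuming a hypothetical competitor list, that counts how many transcripts mention each term:

```python
from collections import Counter

# Hypothetical competitor list -- substitute your own terms.
COMPETITORS = ["gong", "chorus", "clari"]

def tally_mentions(transcripts, terms):
    """Count how many transcripts mention each term (one count per
    transcript, not per occurrence)."""
    counts = Counter()
    for transcript in transcripts:
        text = transcript.lower()
        for term in terms:
            if term in text:
                counts[term] += 1
    return counts
```

Counting transcripts rather than raw hits keeps one chatty call from skewing the totals.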
#11 – Use Last-Mile Manual Analysis for High-Stakes Questions
Despite repeated attempts, I could not get the model to reliably answer the central question: “Why did this customer want to talk to us?” The inaccuracies and assumptions were too frequent. What I was able to do was have the AI generate a list of citations where it believed the customer was stating their reason for the call. I then manually reviewed and categorized those citations, which produced high-quality analysis. The AI did the legwork; a human did the judgment call. I was able to stand behind the results.
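That division of labor (AI extracts, human judges) is easy to operationalize. A sketch, with a file layout and column names of my own invention, that dumps AI-extracted citations to a CSV with an empty category column for a human to fill in:

```python
import csv

def export_citations_for_review(citations, path):
    """Write AI-extracted citations to a CSV for manual categorization.
    `citations` is a list of (call_id, excerpt) pairs; a reviewer
    fills in the 'category' column afterward."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["call_id", "excerpt", "category"])
        for call_id, excerpt in citations:
            writer.writerow([call_id, excerpt, ""])
```

A spreadsheet with fifty quoted excerpts takes an afternoon to categorize by hand, and the result is analysis you can defend.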
What You Should Do Now
Based on these lessons, here is a practical action plan:
#1 – Make Your Discovery Calls Agentic-Ready
You will have far greater success mining calls for insights if the right questions are asked using consistent phrasing. If every rep always asks, “Tell me why you called us today,” it becomes significantly easier to locate and analyze the answers at scale. Come up with your Agentic 5 – the five questions you want answered on every call, asked the same way each time. These are likely the questions the team should be asking anyway.
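Once the Agentic 5 exist, you can audit adoption mechanically. A rough sketch using fuzzy matching from the standard library; the question text and similarity threshold here are illustrative assumptions, not recommendations:

```python
from difflib import SequenceMatcher

# One of the hypothetical Agentic 5, phrased the standard way.
AGENTIC_QUESTION = "tell me why you called us today"

def asked_standard_question(rep_lines, question=AGENTIC_QUESTION, threshold=0.8):
    """Return True if any rep utterance closely matches the standard
    question, tolerating small wording and punctuation differences."""
    return any(
        SequenceMatcher(None, line.lower().strip(), question).ratio() >= threshold
        for line in rep_lines
    )
```

Running this across a week of discovery calls tells you which reps are actually using the standard phrasing.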
#2 – Sales Discovery Tactics Must Be Solid
If your sales team is weak at discovery — rarely asking probing, open-ended questions — you will never capture the data needed for analysis in the first place. You cannot mine what was never collected.
#3 – Label Your Sales Calls Consistently
Label every call by type so you can segment and focus on the calls relevant to your analysis. This is foundational and harder to enforce than it sounds.
#4 – Run a Pilot Before Scaling
Once steps 1 through 3 are in place, run a focused pilot on a limited set of calls. If the pilot surfaces meaningful insights, then pursue a broader rollout based on your other project priorities. Don’t scale before you have validated the approach.
#5 – Always Spot-Check AI Results Against the Original Recording
Go back to the source audio regularly. AI results can look authoritative while being wrong. Spot-checking against the actual recording is the only reliable quality control.
Conclusion
AI-powered analysis of sales transcripts is a genuine opportunity, but it comes with real constraints — particularly for companies with non-standardized products and sales teams that haven’t yet mastered discovery call discipline. The best near-term use case is targeted, periodic analysis with a human in the loop for the final judgment. The longer-term prize goes to organizations that invest now in making their discovery calls agentic-ready.
Photo by Dominik Vanyi on Unsplash