How to evaluate an AI agent vendor: the questions to ask.
Buying an AI agent system is hard because the demos all look impressive and the hard parts are invisible. I sell agent systems for a living, and I'd still tell you to interrogate every vendor — including us. Here are the questions that separate a vendor who will still be working at 3am from one whose demo was the best part.

Okan Özalan
Co-founder, GOGOGO LLC

I'm Okan — I run the business side of GOGOGO LLC, which means I'm often the vendor in the room. So take this as it's meant: a vendor telling you how to interrogate vendors, including mine. I'm comfortable with that, because the questions below reward whoever is actually building agent systems properly, and a buyer who asks them gets a better project regardless of who they pick.
Buying an AI agent system is genuinely hard for one reason: every demo looks impressive. A demo runs the happy path once. The thing you're actually buying has to run the unhappy path, unattended, for months. The questions that matter are the ones that probe the gap between those two.
Question 1 — 'Show me what happens when it fails.'
This is the most important question, so ask it first. Any system built on AI will sometimes be wrong — that's not a flaw, it's the nature of the technology. A vendor who implies their agent doesn't fail is either inexperienced or not being straight with you. The honest answer describes the failure design: does it fail loudly and stop, or guess and continue? What's the worst thing a failure can reach? A vendor who has a crisp, practiced answer here has run real systems in production. A vendor who's surprised by the question has not.
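To make "fail loudly and stop" concrete, here is a minimal sketch in Python. It is an illustration, not any vendor's actual implementation: the names `StepResult`, `AgentStepError`, and `run_fail_loud`, and the idea of a per-step confidence score with a 0.8 threshold, are all assumptions made up for this example.

```python
from dataclasses import dataclass


class AgentStepError(Exception):
    """Raised when a step cannot produce a result we are willing to act on."""


@dataclass
class StepResult:
    value: str
    confidence: float  # hypothetical 0..1 score reported by the step


def run_fail_loud(steps, min_confidence=0.8):
    """Run (name, callable) steps in order and stop at the first shaky result.

    Later steps never execute after a failure, which limits what a bad
    guess can reach.
    """
    completed = []
    for name, step in steps:
        result = step()
        if result.confidence < min_confidence:
            raise AgentStepError(
                f"step {name!r} below threshold: {result.confidence:.2f}"
            )
        completed.append((name, result.value))
    return completed
```

The "guess and continue" alternative would simply drop the threshold check and keep going. The question for the vendor is which of these two shapes their system has, and which steps sit downstream of a possible wrong guess.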
Question 2 — 'How do you know it's getting better, not worse?'
An AI system's output is non-deterministic: the same input can produce different outputs from one run to the next, so you cannot tell whether a change improved it by looking at a handful of results. Ask the vendor how they measure quality. You're listening for the word evaluation: a real eval harness, a scored test set, before-and-after numbers on every change. If the answer is 'we test it' or 'our team reviews outputs,' that's vibes, and vibes stop scaling at about ten customers. You want a vendor who can show you a number.
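For a sense of what "a scored test set with before-and-after numbers" means in practice, here is a deliberately tiny sketch. Everything in it is assumed for illustration: the function names `score` and `compare`, the exact-match scoring rule, and the toy test cases. A real harness would likely use a larger test set, repeated runs, and a richer scoring rule.

```python
def score(agent, test_set):
    """Fraction of test cases where the agent's output equals the expected one."""
    passed = sum(1 for case in test_set if agent(case["input"]) == case["expected"])
    return passed / len(test_set)


def compare(baseline, candidate, test_set):
    """Score two versions of an agent on the same test set."""
    before = score(baseline, test_set)
    after = score(candidate, test_set)
    return {"before": before, "after": after, "improved": after > before}


# Toy test set: classify a support message into an intent (hypothetical data).
TEST_SET = [
    {"input": "please refund order 12", "expected": "refund"},
    {"input": "cancel my subscription", "expected": "cancel"},
    {"input": "I want a refund", "expected": "refund"},
    {"input": "what are your hours?", "expected": "other"},
]


def baseline_agent(text):
    # Stand-in for the current version of the agent.
    return "refund" if "refund" in text else "other"


def candidate_agent(text):
    # Stand-in for the changed version under review.
    if "refund" in text:
        return "refund"
    if "cancel" in text:
        return "cancel"
    return "other"


report = compare(baseline_agent, candidate_agent, TEST_SET)
```

The point is the shape of the answer: the same test set, run before and after every change, producing a number you can compare.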
Question 3 — 'Can you show me exactly what it did last Tuesday?'
This tests observability. When the agent does something you didn't expect — and it will — can the vendor pull up that specific run and show you every step, every input, every decision? Or do they shrug? A system you cannot inspect is a system you cannot trust and cannot improve. If the vendor can't show you a trace of a single past run on demand, they can't debug your problem when it's urgent either.
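As a rough picture of what "pull up that specific run" requires, here is a minimal sketch of a per-run trace log. The `TraceLog` class and its `new_run`, `record`, and `replay` methods are hypothetical names invented for this example, and it keeps events in memory, where a production system would write them to durable storage.

```python
import time
import uuid


class TraceLog:
    """Stores every step of every run so that one run can be pulled up later."""

    def __init__(self):
        self._events = []

    def new_run(self):
        """Return a unique ID that ties all steps of one run together."""
        return uuid.uuid4().hex

    def record(self, run_id, step, inputs, output):
        """Append one step: what went in, what came out, and when."""
        self._events.append(
            {
                "run_id": run_id,
                "ts": time.time(),
                "step": step,
                "inputs": inputs,
                "output": output,
            }
        )

    def replay(self, run_id):
        """Return the steps of a single run, in the order they happened."""
        return [e for e in self._events if e["run_id"] == run_id]
```

If every step is recorded against a run ID like this, "show me last Tuesday" becomes a lookup. If nothing like it exists, the only honest answer is a shrug.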
Question 4 — 'What does it cost me when I have ten times the volume?'
Every agent run has a real, countable inference cost. Ask the vendor to explain how cost scales with your usage — not the monthly license, the underlying cost. A vendor who knows their unit economics can answer this in concrete numbers. A vendor who waves it away either hasn't done the math or doesn't want you to. Either way, you'll meet that number eventually; better to meet it in the sales conversation.
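The arithmetic behind that question is simple enough to sketch. The numbers below are placeholders, not real prices from any model provider, and the function names `cost_per_run` and `monthly_cost` are made up for this example. It assumes inference is billed per token, with separate input and output prices per million tokens, and that one agent run makes several model calls.

```python
def cost_per_run(input_tokens, output_tokens, price_in_per_million,
                 price_out_per_million, calls_per_run=1):
    """Inference cost of one agent run, in the same currency as the prices."""
    per_call = (
        input_tokens * price_in_per_million
        + output_tokens * price_out_per_million
    ) / 1_000_000
    return per_call * calls_per_run


def monthly_cost(runs_per_month, **run_params):
    """Total inference cost for a month at a given run volume."""
    return runs_per_month * cost_per_run(**run_params)


# Placeholder figures for illustration only.
PARAMS = dict(
    input_tokens=2000,
    output_tokens=500,
    price_in_per_million=3.0,
    price_out_per_million=15.0,
    calls_per_run=4,
)

today = monthly_cost(10_000, **PARAMS)
at_ten_times = monthly_cost(100_000, **PARAMS)
```

In this simple model cost grows linearly with volume. A vendor who knows their unit economics can tell you where their real curve departs from it, for example through caching, batching, or volume pricing.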
Question 5 — 'What happens to this if I stop working with you?'
The exit question. Who owns the data, the configuration, the workflow logic? How locked in are you? You're not asking because you plan to leave — you're asking because a vendor confident in their work answers it calmly, and a vendor whose value is mostly lock-in gets uncomfortable. The calm answer is the good sign.
“Don't buy the demo — the demo is the easy part and every vendor's demo works. Buy the answers to what happens when it fails, how they measure it, whether they can show you what it did, what it costs at scale, and how you leave. Those five answers are the actual product.”
One question to ask yourself
Before any vendor call, answer this on your own: which specific workflow are we trying to hand over, and how would we know it worked? A buyer who can name the workflow and the success measure runs a good project with almost any competent vendor. A buyer who can't will be disappointed by the best vendor on earth, because 'add AI' is not a goal a project can hit. We wrote a sector-by-sector readiness map to help you pick that first workflow. And if you want to point these five questions straight at us — please do: [email protected].