Long arc · May 18, 2026 · 9 min read

From AIML in 2015 to multi-agent in 2026: the 10-year arc of conversational AI.

I wrote XML rule trees for Turkcell BiP Messenger in 2015. I now build typed multi-agent runtimes for GOGOGO LLC in 2026. The path between the two isn't a straight line — it's a series of forced compromises with the model of the day, and each compromise produced a pattern that survived into the next era.

Atakan Özalan

Co-founder & engineering lead, GOGOGO LLC

In 2015 I wrote AIML — the XML dialect for chatbot rules — for Turkcell BiP Messenger. Hundreds of <category><pattern> blocks. No transformer was going to write you a sentence then. The state of the art was a tree of regex matchers with a fallback paragraph. I shipped two years before the world started using the word 'chatbot' in earnest.

In 2026 I build the multi-agent runtime at GOGOGO LLC. Typed orchestrator, specialised agents, tool calls as the runtime, traces you can replay, eval harnesses gating every deploy. The thing in front of you is unrecognisable to my 2015 self — except in the parts that actually mattered then and still matter now.

2015 — AIML, or: rules with very good taste

AIML's job was to read the user's message, decide which <pattern> it matched, and emit a <template> response. You spent your days writing variants:

xml

<category>
  <pattern>WHAT IS MY BALANCE</pattern>
  <template>Your current balance is <get name="balance"/>.</template>
</category>
<category>
  <pattern>HOW MUCH * MONEY *</pattern>
  <template><srai>WHAT IS MY BALANCE</srai></template>
</category>

It was a state machine wearing a costume. You couldn't generate the right response; you could only recognise whether the user's intent matched something you'd already authored. The model had no understanding — but the system, as a whole, often felt smart, because the people writing the rules were thinking very carefully about how Turkish-speaking customers actually phrased questions.
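
The matching loop described above can be sketched in a few lines of TypeScript. This is a simplified illustration, not how any real AIML interpreter is implemented: `Category`, `normalise`, `toRegex`, and `respond` are hypothetical names, `*` is assumed to match one or more words, and the `redirect` field stands in for `<srai>`.

```typescript
// Hypothetical, simplified model of AIML categories (not a real interpreter).
type Vars = { balance?: string };

type Category = {
  pattern: string;                    // e.g. "HOW MUCH * MONEY *"
  template?: (vars: Vars) => string;  // stands in for <template>
  redirect?: string;                  // stands in for <srai>
};

const categories: Category[] = [
  {
    pattern: "WHAT IS MY BALANCE",
    template: (vars) => `Your current balance is ${vars.balance ?? "unknown"}.`,
  },
  { pattern: "HOW MUCH * MONEY *", redirect: "WHAT IS MY BALANCE" },
];

const FALLBACK = "Sorry, I didn't understand that.";

// Upper-case the input, strip basic punctuation, collapse whitespace.
function normalise(text: string): string {
  return text.toUpperCase().replace(/[.,!?;:'"]/g, " ").replace(/\s+/g, " ").trim();
}

// Turn a pattern into an anchored regex; "*" matches one or more words (assumption).
function toRegex(pattern: string): RegExp {
  const body = pattern
    .split(" ")
    .map((tok) => (tok === "*" ? "\\S+(?: \\S+)*" : tok.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")))
    .join(" ");
  return new RegExp(`^${body}$`);
}

// First matching category wins; redirects re-enter the matcher with a new input.
function respond(input: string, vars: Vars, depth = 0): string {
  if (depth > 5) return FALLBACK; // guard against redirect loops
  const text = normalise(input);
  for (const cat of categories) {
    if (!toRegex(cat.pattern).test(text)) continue;
    if (cat.redirect) return respond(cat.redirect, vars, depth + 1);
    if (cat.template) return cat.template(vars);
  }
  return FALLBACK;
}
```

Nothing here generates language: every sentence the system can say already exists in a template, and the only work at runtime is recognition.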

What 2015 forced you to learn (that you still need in 2026)

Slot a structured payload before generating prose. AIML forced you to extract balance / phone / customer-id BEFORE composing the answer. That pattern is exactly typed function-calling in 2026.
Authoring is engineering. You wrote rules; you ran a test corpus; you measured intent-match coverage. We did A/B testing on <srai> redirects. This is what's now called eval-driven prompting.
Fallbacks are first-class. Every AIML system had an 'I didn't understand' path with a graceful hand-off to a human. 2026's word for this is 'human-in-the-loop'.
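
Two of these habits, slotting a payload before composing prose and treating the fallback as a first-class result, can be shown together in a short sketch. The field names, the `Reply` type, and the three functions are illustrative assumptions, not code from the BiP or GOGOGO systems.

```typescript
// Hypothetical slot payload; field names are illustrative only.
type RawSlots = { customerId?: string; balance?: number; currency?: string };
type BalanceSlots = { customerId: string; balance: number; currency: string };

type Reply =
  | { kind: "answer"; text: string; slots: BalanceSlots }
  | { kind: "handoff_to_human"; reason: string };

// Step 1: build the structured payload. Returns null when any slot is missing.
function extractSlots(raw: RawSlots): BalanceSlots | null {
  if (!raw.customerId || typeof raw.balance !== "number" || !raw.currency) return null;
  return { customerId: raw.customerId, balance: raw.balance, currency: raw.currency };
}

// Step 2: compose prose only from a complete payload.
function compose(slots: BalanceSlots): string {
  return `Your current balance is ${slots.balance.toFixed(2)} ${slots.currency}.`;
}

// Step 3: the fallback is an ordinary return value, not an exception.
function reply(raw: RawSlots): Reply {
  const slots = extractSlots(raw);
  if (slots === null) return { kind: "handoff_to_human", reason: "missing slot" };
  return { kind: "answer", text: compose(slots), slots };
}
```

Because `compose` accepts only a complete `BalanceSlots`, the compiler rules out writing a sentence around a value that was never extracted.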

2017–2020 — intent classifiers and Rasa: the supervised-learning era

The next phase wasn't a leap, it was a substitution. The <pattern> matcher became a small neural classifier (BiLSTM, then DistilBERT). You labelled 200 utterances per intent, trained a model to pick the intent, and kept the rule engine for slot filling and response composition. Rasa, Dialogflow, Watson: same architecture, different vendors.

It felt revolutionary at the time. Looking back: it was the same AIML architecture with a different classifier. The model was learning to read; it still wasn't writing.
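
The substitution is easier to see when both classifiers sit behind one contract. In the sketch below, `overlapClassifier` is a toy token-overlap scorer standing in for a trained model. It is not the Rasa, Dialogflow, or Watson API, and every name in it is hypothetical.

```typescript
// Both eras expose the same contract: text in, intent label (or null) out.
type IntentClassifier = (text: string) => string | null;

// 2015-style: exact lookup of a normalised pattern.
function patternClassifier(patterns: Map<string, string>): IntentClassifier {
  return (text) => patterns.get(text.trim().toUpperCase()) ?? null;
}

// Stand-in for a trained model: score each labelled utterance by token overlap.
function overlapClassifier(examples: Record<string, string[]>, threshold = 0.5): IntentClassifier {
  return (text) => {
    const tokens = new Set(text.toLowerCase().split(/\s+/).filter(Boolean));
    let best: string | null = null;
    let bestScore = 0;
    for (const intent of Object.keys(examples)) {
      for (const utterance of examples[intent] ?? []) {
        const words = utterance.toLowerCase().split(/\s+/).filter(Boolean);
        const hits = words.filter((w) => tokens.has(w)).length;
        const score = words.length === 0 ? 0 : hits / words.length;
        if (score > bestScore) {
          best = intent;
          bestScore = score;
        }
      }
    }
    return bestScore >= threshold ? best : null;
  };
}

// The rule-shaped remainder of the pipeline does not care which classifier it gets.
function handle(text: string, classify: IntentClassifier, balance: string): string {
  const intent = classify(text);
  if (intent === "check_balance") return `Your current balance is ${balance}.`;
  return "Sorry, I didn't understand that.";
}
```

Swapping `patternClassifier` for `overlapClassifier` widens what the system can read while `handle`, the part that writes, stays the same.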

2022 — instruction-tuned LLMs eat the middle layer

Then GPT-3.5 happened, and ChatGPT made everyone realise the model could now write — not just classify. The intent classifier collapsed into a prompt. The response template collapsed into a generation. Half the chatbot industry threw away their Rasa stacks and replaced them with a single system prompt.

Some of that was correct. A lot of it was overcorrection. We replaced typed, testable, debuggable middle layers with an opaque autocomplete. The first version of the GOGOGO orchestrator was the same mistake — one model, one long prompt, tool calls. It worked in demos and collapsed in production. (We wrote about the three rewrites that followed.)

2024–2026 — multi-agent: getting structure back

Multi-agent isn't a new idea. It's the AIML factoring with one more dimension: instead of one model branching through rules, you have several models, each with a narrow job, talking to each other through typed hand-offs. The orchestrator is small and rule-shaped. The specialists are generative. The runtime — tools, traces, evals — is what holds the whole thing together.

// 2015 — AIML
<category><pattern>WHAT IS MY BALANCE</pattern>
  <template>Your balance is <get name="balance"/>.</template>
</category>

// 2026 — typed multi-agent hand-off
type HandOff =
  | { to: "billing.balance"; input: { customerId: string } }
  | { to: "billing.invoice"; input: { invoiceId: string } };

const next = orchestrator.decide(state);
const result = await runtime.run(next);

The two snippets are doing the same job. They're written eleven years apart. The AIML rule named a function and bound a variable; the typed hand-off names a function and binds a variable. The fact that one runs through a regex tree and the other through a transformer's tool-calling head is — surprisingly — the least interesting difference between them.
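
To make the hand-off side concrete, here is one way the `decide` and `run` calls could be filled in. It is a sketch under stated assumptions: the `State` shape, the intent labels, and the stub specialists are hypothetical, and `run` is synchronous here only to keep the example short (a real runtime would await models and tools).

```typescript
type HandOff =
  | { to: "billing.balance"; input: { customerId: string } }
  | { to: "billing.invoice"; input: { invoiceId: string } };

// Hypothetical orchestrator state: a recognised intent plus whatever slots were extracted.
type State = { intent: string; customerId?: string; invoiceId?: string };

// Small, rule-shaped orchestrator: no generation, only a typed decision.
function decide(state: State): HandOff | null {
  if (state.intent === "check_balance" && state.customerId) {
    return { to: "billing.balance", input: { customerId: state.customerId } };
  }
  if (state.intent === "show_invoice" && state.invoiceId) {
    return { to: "billing.invoice", input: { invoiceId: state.invoiceId } };
  }
  return null; // nothing routable: the caller takes the fallback path
}

// Stub specialists. Real ones would call models and tools.
function run(handOff: HandOff): string {
  switch (handOff.to) {
    case "billing.balance":
      return `balance lookup for ${handOff.input.customerId}`;
    case "billing.invoice":
      return `invoice lookup for ${handOff.input.invoiceId}`;
    default: {
      // Exhaustiveness check: a new HandOff variant without a case fails to compile.
      const unreachable: never = handOff;
      return unreachable;
    }
  }
}
```

The discriminated union does the job the `<pattern>` name did in 2015: it names the function, and the `input` field binds the variable.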

What the 10-year arc actually teaches you

The structure of the problem hasn't changed. Recognise intent → extract slots → call a tool → compose response. AIML factored it this way in 1995. Multi-agent factors it this way in 2026.
Generation should be the LAST step. Every time we let the model generate first and structure later, reliability collapsed. The model should structure first, then generate against the structure.
Tools are the runtime. This is the deepest lesson. In AIML, the 'tools' were <set> and <get> and out-of-band hooks. In 2026 they're typed function calls. The model is decorative; the tools are the work.
Replay is non-negotiable. Every system that survived three years of model upheaval had a trace you could replay. The ones without it got rewritten every time the model changed.
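
The replay point can be sketched as a recorder that wraps a tool table, plus a function that re-runs the recorded inputs and reports differences. The trace format and the names `record` and `replay` are assumptions for illustration, and the comparison assumes tool outputs are deterministic and JSON-serialisable.

```typescript
// Hypothetical trace format: one entry per tool call, stored as plain data.
type TraceEntry = { tool: string; input: unknown; output: unknown };
type Tool = (input: unknown) => unknown;
type Tools = Map<string, Tool>;

// Wrap a tool table so that every call is appended to the trace.
function record(tools: Tools, trace: TraceEntry[]): Tools {
  const wrapped: Tools = new Map();
  tools.forEach((tool, name) => {
    wrapped.set(name, (input) => {
      const output = tool(input);
      trace.push({ tool: name, input, output });
      return output;
    });
  });
  return wrapped;
}

// Re-run recorded inputs against a (possibly changed) tool table and list mismatches.
function replay(trace: TraceEntry[], tools: Tools): string[] {
  const diffs: string[] = [];
  for (const entry of trace) {
    const tool = tools.get(entry.tool);
    if (tool === undefined) {
      diffs.push(`${entry.tool}: tool missing`);
      continue;
    }
    if (JSON.stringify(tool(entry.input)) !== JSON.stringify(entry.output)) {
      diffs.push(`${entry.tool}: output changed`);
    }
  }
  return diffs;
}
```

An empty result from `replay` means the recorded calls still produce the recorded outputs. A non-empty one names what drifted, so a model or tool change leads to a targeted fix instead of a rewrite.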

“Eleven years between writing AIML and writing typed agent hand-offs. The frameworks change every two years. The factoring doesn't. Bet on the factoring.”

What I'm betting on for 2026–2036

Multi-agent will keep eating the middle layer the way LLMs ate it in 2022 — but the layer underneath (typed contracts, traces, evals) is the part you should own outright. Frameworks come and go. The orchestrator I write today won't be the orchestrator I write in 2030. But the rule that hand-offs are typed values, not prose, will outlive both.
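
One way to enforce "typed values, not prose" is to validate whatever a model emits before it reaches the runtime. The sketch below assumes the model returns a JSON string and reuses the two hand-off variants from the earlier snippet. `parseHandOff` and `isRecord` are hypothetical helpers, not part of any framework.

```typescript
type HandOff =
  | { to: "billing.balance"; input: { customerId: string } }
  | { to: "billing.invoice"; input: { invoiceId: string } };

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Parse untrusted model output into a HandOff, or reject it. Prose never passes through.
function parseHandOff(raw: string): HandOff | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null; // not JSON at all, for example a free-form sentence
  }
  if (!isRecord(value)) return null;
  const to = value.to;
  const input = value.input;
  if (!isRecord(input)) return null;
  const { customerId, invoiceId } = input;
  if (to === "billing.balance" && typeof customerId === "string") {
    return { to: "billing.balance", input: { customerId } };
  }
  if (to === "billing.invoice" && typeof invoiceId === "string") {
    return { to: "billing.invoice", input: { invoiceId } };
  }
  return null; // unknown target or wrong input shape
}
```

The function rebuilds the value field by field instead of casting, so extra keys the model invents are dropped at the boundary.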

If you want the underlying agent runtime we ship across Goddo, GoPeople, GoVista, and GoTrack, or to compare notes on agent design over coffee, I'm at atakanozalan.com, or under the handle ezagor, which I've used since the days I was writing those first XML rules.