Types of AI Systems

Here is a thought experiment that cuts through a lot of noise: imagine you need to sort a million emails into "spam" or "not spam." Now imagine you need to write a sonnet about longing. Now imagine you need to teach a robot arm to catch a ball. These are three completely different problems. Yet in popular conversation, the same word — AI — gets applied to all three solutions, as though they're variations on a single thing. They're not. Understanding the landscape means understanding that AI isn't a technology. It's a family of technologies, each suited to different shapes of problem.

Start with how the learning happens. Supervised learning is the workhorse: you feed a model thousands of labelled examples (this email is spam, this X-ray shows pneumonia, this credit application defaulted) and it learns to generalize to cases it hasn't seen. The label is the signal. Remove the label and you're in unsupervised territory: the model has to find structure on its own. That is how a recommendation system can discover that people who buy one thing also tend to buy another, without anyone telling it that's the pattern to look for. Then there's reinforcement learning, which works like training a dog, except the dog is a software agent and the treats are numerical rewards. AlphaGo Zero learned to play Go by playing millions of games against itself: losing, adjusting, winning slightly more, iterating. No labelled examples of correct Go moves, just the reward signal of winning. (Its predecessor, the original AlphaGo, was first trained on records of human expert games before improving through self-play.)
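The three learning setups can be contrasted in a few lines of plain Python. This is a toy sketch under loud assumptions, not how production systems are built: a nearest-centroid classifier stands in for supervised learning, pair counting over shopping baskets stands in for unsupervised pattern discovery, and an epsilon-greedy multi-armed bandit (the simplest reinforcement-learning setting, nowhere near AlphaGo's scale) stands in for learning from rewards. All function names, features, and data are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


# Supervised: learn from (features, label) pairs.
def train_nearest_centroid(examples):
    """Average the feature vectors of each label into one centroid per label."""
    sums, counts = {}, Counter()
    for x, label in examples:
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] += 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, x):
    """Return the label whose centroid is nearest (squared Euclidean distance)."""
    return min(centroids,
               key=lambda label: sum((c - v) ** 2 for c, v in zip(centroids[label], x)))


# Unsupervised: no labels; just count which items appear together.
def co_purchase_counts(baskets):
    pairs = Counter()
    for basket in baskets:
        for pair in combinations(sorted(set(basket)), 2):
            pairs[pair] += 1
    return pairs


# Reinforcement: try actions, receive numerical rewards, update estimates.
def epsilon_greedy(reward_probs, steps=2000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    n = len(reward_probs)
    value, pulls = [0.0] * n, [0] * n
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n)                       # explore at random
        else:
            arm = max(range(n), key=lambda i: value[i])  # exploit best estimate so far
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        pulls[arm] += 1
        value[arm] += (reward - value[arm]) / pulls[arm]  # running mean of rewards
    return value, pulls


# Hypothetical features: [count of "free!!!", count of words from known contacts]
emails = [([9, 1], "spam"), ([8, 0], "spam"), ([1, 7], "ham"), ([0, 9], "ham")]
centroids = train_nearest_centroid(emails)
print(predict(centroids, [7, 2]))  # spam

baskets = [["bread", "butter"], ["bread", "butter", "jam"], ["milk", "cereal"], ["bread", "butter"]]
print(co_purchase_counts(baskets).most_common(1))  # [(('bread', 'butter'), 3)]

value, pulls = epsilon_greedy([0.2, 0.8])  # hypothetical payoff rates; arm 1 pays more often
print(pulls)  # arm 1 should end up pulled far more often than arm 0
```

Only the supervised part ever sees a label; the unsupervised part sees raw baskets, and the bandit sees nothing but rewards.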

The architecture question is separate from the learning question, though the two are often conflated. Traditional machine learning models (decision trees, support vector machines, gradient-boosted trees) still do serious work in production systems that handle structured, tabular data. Your bank's fraud detection is probably a gradient-boosted model trained on transaction records, not a large language model. Deep neural networks became dominant for unstructured data: images, audio, raw text. The convolutional neural network built in an inductive bias that made image recognition tractable: instead of learning a separate weight for every pixel position, it slides the same small filters across the whole image, so a feature learned in one place is detected wherever it appears. AlexNet's win in the 2012 ImageNet competition wasn't just a benchmark result; it forced the field to take deep learning seriously after years of skepticism.
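Weight sharing, the inductive bias described above, is easiest to see in one dimension. The sketch below slides a single two-weight filter (a hypothetical "step-up" detector) along two signals that contain the same pattern at different positions. Like most deep-learning libraries, it actually computes cross-correlation, which those libraries call convolution; the signals and filter are invented for illustration.

```python
def conv1d(signal, kernel):
    """'Valid' 1-D convolution (cross-correlation): the same kernel at every position."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


step_up = [-1, 1]                  # 2 shared weights, reused at every position
left = [0, 0, 1, 1, 0, 0, 0, 0]    # pattern starts at index 2
right = [0, 0, 0, 0, 0, 1, 1, 0]   # same pattern, shifted 3 places right

out_left = conv1d(left, step_up)    # [0, 1, 0, -1, 0, 0, 0]
out_right = conv1d(right, step_up)  # [0, 0, 0, 0, 1, 0, -1]

# The response moves with the pattern (translation equivariance) ...
print(out_left.index(max(out_left)), out_right.index(max(out_right)))  # 1 4
# ... and its peak strength is the same wherever the pattern sits.
print(max(out_left) == max(out_right))  # True

# A fully connected layer from 8 inputs to 7 outputs would need 8 * 7 = 56 weights.
print(len(step_up), len(left) * len(out_left))  # 2 56
```

The same filter finds the pattern in both places with two weights instead of fifty-six; in two dimensions, over real images, that saving is what made the problem tractable.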

Then came the transformer, and it changed the shape of the conversation. The 2017 paper "Attention Is All You Need" proposed a mechanism in which every position in an input attends to every other position at once, weighting relevance dynamically. There is no recurrence: any two tokens are a single step apart rather than many, which largely sidesteps the vanishing gradients that plagued recurrent networks on long sequences, and training parallelizes across GPU cores far more readily. Within five years, transformers dominated language and had spread into vision, audio, and protein structure prediction. GPT, Claude, Gemini, Llama: all transformers. An architecture designed for machine translation ended up being the substrate for the current moment.
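The core operation from that paper, scaled dot-product attention, computes softmax(Q K^T / sqrt(d_k)) V, and a single head fits in a few lines of plain Python. This sketch leaves out multiple heads, masking, positional encoding, and training; the identity projection matrices and the three "token" vectors are hypothetical values chosen only to make the shapes visible.

```python
import math


def matmul(a, b):
    """Multiply an (n x m) matrix by an (m x p) matrix, both as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(xs):
    peak = max(xs)                          # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a sequence of vectors x."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    scores = matmul(q, transpose(k))        # every position scored against every position
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights      # each output is a weighted mix of all values


# Hypothetical 3-token sequence with 2-dimensional embeddings.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]        # stand-in projections; real ones are learned
out, weights = self_attention(x, identity, identity, identity)
for row in weights:
    print([round(w, 3) for w in row])      # each row of weights sums to 1
```

Because every row of scores comes out of one matrix product, all positions are handled together rather than one after another, which is where the parallelism comes from.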

The most important distinction isn't how a model was trained; it's what shape of problem it was designed to solve. Choosing an architecture that doesn't fit the problem is expensive in both time and compute.

What gets called "capability" is worth dwelling on. Classification (is this spam, is this tumour malignant, does this sentence express positive sentiment) is mature technology. It works. Generation (write a paragraph, synthesize a voice, produce an image) is recent and improving rapidly, but still unreliable enough that you wouldn't deploy it without human review in a high-stakes context. Reasoning (multi-step logic, causal inference, planning under uncertainty) is genuinely emerging. The so-called reasoning models spend extra inference-time compute: they still predict one token at a time, but they generate a long chain of intermediate steps before committing to a final answer instead of answering immediately. Whether this constitutes reasoning in any meaningful sense, or is pattern-matching that looks like reasoning from the outside, is one of the legitimately open questions in the field.

The field is converging on transformers for many things, but "converging" is not "arrived." Specialized small language models trained on narrow domains can outperform general large models on specific professional tasks: a 7-billion-parameter model fine-tuned on medical literature can outperform a 70-billion-parameter generalist on some clinical reasoning benchmarks. The efficiency era is real: you don't always need the biggest hammer.
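Some rough arithmetic shows what "the biggest hammer" costs just to hold in memory. This sketch counts model weights only, assuming 2 bytes per parameter for 16-bit weights and half a byte for 4-bit quantized weights; it deliberately ignores activations, attention caches, and training-time memory, so real requirements are higher. The function name is hypothetical.

```python
def weight_memory_gb(n_params, bytes_per_param=2.0):
    """Gigabytes (10**9 bytes) needed just to hold a model's weights.

    Assumes 2 bytes per parameter by default (16-bit floats). Ignores activations,
    attention caches, and optimizer state, so real usage is higher.
    """
    return n_params * bytes_per_param / 1e9


for name, n_params in [("7B", 7e9), ("70B", 70e9)]:
    fp16 = weight_memory_gb(n_params)
    int4 = weight_memory_gb(n_params, bytes_per_param=0.5)
    print(f"{name}: {fp16:.1f} GB at 16-bit, {int4:.1f} GB at 4-bit")
# 7B: 14.0 GB at 16-bit, 3.5 GB at 4-bit
# 70B: 140.0 GB at 16-bit, 35.0 GB at 4-bit
```

A tenfold gap in parameters is a tenfold gap in memory for the weights, before any difference in compute per token is counted.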

Try this. You're building a tool to flag potentially fraudulent loan applications. Your team is debating between a gradient-boosted model trained on historical fraud cases and a large language model that can read the application text. Both would be labelled "AI." Think through which you'd actually choose, and why.

What we'd notice

A careful reader might notice that the structured numerical data in a loan application (income, debt ratio, credit score, account age) is exactly where traditional ML has decades of robust tooling and interpretability. A gradient-boosted model also produces feature importance scores, which helps when regulators expect lenders to explain how decisions are made. The LLM might catch something interesting in free-text fields, but you'd be introducing a much heavier, more expensive, less auditable system for a problem that a simpler tool already handles well. The right choice isn't the most sophisticated technology; it's the one that matches the structure of your problem.
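To make "feature importance scores" concrete, here is a minimal sketch of gradient boosting with one-split trees (stumps) on hypothetical loan features. It assumes squared-error loss for simplicity; production fraud models more typically use log-loss and deeper trees through libraries such as XGBoost or LightGBM. Importance here is gain-based (each split's reduction in squared error is credited to the feature it split on), which is one of several common definitions. The six applications, the feature names, and all function names are invented; the data is rigged so that only `debt_ratio` separates the classes.

```python
FEATURES = ["debt_ratio", "credit_score", "account_age_years"]

# Hypothetical toy data: (feature values, 1 if fraudulent else 0).
DATA = [
    ([0.2, 710, 6], 0),
    ([0.3, 640, 2], 0),
    ([0.4, 690, 9], 0),
    ([0.7, 700, 5], 1),
    ([0.8, 650, 8], 1),
    ([0.9, 720, 3], 1),
]


def fit_stump(X, residuals):
    """Find the single split (feature, threshold) that best fits the residuals."""
    best = None  # (sse, feature_index, threshold, left_value, right_value)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X})[:-1]:  # skip max so both sides are non-empty
            left = [r for row, r in zip(X, residuals) if row[f] <= t]
            right = [r for row, r in zip(X, residuals) if row[f] > t]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, lv, rv)
    return best


def boost(X, y, rounds=20, lr=0.3):
    base = sum(y) / len(y)                 # start from the overall fraud rate
    pred = [base] * len(y)
    stumps, gain = [], [0.0] * len(X[0])
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        sse, f, t, lv, rv = fit_stump(X, residuals)
        gain[f] += sum(r * r for r in residuals) - sse  # error reduction credited to feature f
        stumps.append((f, t, lv, rv))
        pred = [p + lr * (lv if row[f] <= t else rv) for p, row in zip(pred, X)]
    total = sum(gain)
    importance = {name: g / total for name, g in zip(FEATURES, gain)}
    return {"base": base, "lr": lr, "stumps": stumps, "importance": importance}


def score(model, row):
    """Fraud score: base rate plus the shrunken contribution of every stump."""
    return model["base"] + sum(model["lr"] * (lv if row[f] <= t else rv)
                               for f, t, lv, rv in model["stumps"])


X = [features for features, _ in DATA]
y = [label for _, label in DATA]
model = boost(X, y)
print(model["importance"])  # all of the importance lands on debt_ratio in this toy data
print(round(score(model, [0.85, 700, 4]), 3), round(score(model, [0.25, 700, 4]), 3))
```

Shrinking each stump by `lr` (0.3 here) is what makes this boosting rather than a single tree: each round corrects only part of the remaining error, and the importance dictionary is a readable record of which inputs drove the model.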

What this means practically: before deploying any AI system, the most useful question isn't "which model is best?" It's "what shape is my problem, and which family of approaches was designed for problems of that shape?" The taxonomy isn't pedantry. It's the difference between a solution that works and a solution that burns your budget and fails in production.

The Deeper Question

If supervised learning requires labelled examples, and labelling is expensive and human, then in what sense are we building "artificial" intelligence versus encoding and scaling human judgment?

Reinforcement learning systems like AlphaGo discovered strategies that surprised the humans who trained them. Does that surprise tell us anything meaningful about the nature of the intelligence involved, or only about the limits of human intuition about the problem space?

The transformer architecture was designed for machine translation and ended up reshaping how we think about intelligence broadly. What does it mean for a field when a single architectural innovation displaces dozens of prior approaches — is that convergence a sign of maturity or a warning sign of premature closure?

As reasoning models use more inference-time compute to deliberate before answering, the line between fast pattern recognition and slow deliberate reasoning begins to blur. Is that distinction still useful for understanding what these systems are actually doing?

If a small model fine-tuned on domain-specific data outperforms a large general model on that domain, does scale still mean what we thought it meant? What are we actually measuring when we measure capability?


Check your understanding