This Week's Term: Jagged Intelligence - frontier AI models that are simultaneously brilliant and weirdly broken, performing unpredictably across tasks that look similar on the surface.
Ask a frontier model which is bigger, 9.11 or 9.9. Many of them still get it wrong. Then ask the same model to solve an Olympiad-level math problem. It will. Ask it to count the letter "r" in "barrier," and you might get two instead of three. Then watch it pass the bar exam on the next tab.
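If you want to run these probes yourself, here is a minimal sketch using the official OpenAI Python client. The model name is a placeholder, the prompts mirror the examples above, and the ground-truth lines are plain Python so you can see what the right answers actually are.

```python
# Minimal probe harness for the examples above. Assumes the official
# `openai` package (pip install openai) and an OPENAI_API_KEY in the
# environment; the model name is a placeholder, not a recommendation.
from openai import OpenAI

client = OpenAI()

probes = [
    "Which is bigger, 9.11 or 9.9? Reply with just the number.",
    'How many times does the letter "r" appear in "barrier"? Reply with just the count.',
]

# Ground truth, computed the boring deterministic way:
# max(9.11, 9.9) is 9.9, and "barrier".count("r") is 3.
print("ground truth:", max(9.11, 9.9), "barrier".count("r"))

for prompt in probes:
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder: swap in the model you actually deploy
        messages=[{"role": "user", "content": prompt}],
    )
    print(prompt, "->", response.choices[0].message.content.strip())
```

Run it a few times. The point is not any single answer but that correctness swings on tasks a two-line script gets right every time.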
This is jagged intelligence, a term Andrej Karpathy coined in August 2024. Sundar Pichai picked it up last summer on the Lex Fridman podcast as AJI, artificial jagged intelligence. Karpathy's framing was that state-of-the-art models are simultaneously "a genius polymath and a confused grade schooler." Human intelligence tends to correlate across domains, so someone who is good at reasoning is usually good at arithmetic too. AI is not like that. The same model that passes the bar exam may fail at counting letters in a word.
The academic foundation predates the name. In 2023, Fabrizio Dell'Acqua, Ethan Mollick, and colleagues at Harvard and BCG ran a field experiment with 758 management consultants. On 18 realistic tasks inside what they called the jagged frontier, consultants using AI produced work 40% higher in quality, finished 25% faster, and completed 12% more tasks. On a single task deliberately placed just outside the frontier, consultants without AI got it right 84% of the time; consultants with AI got it right only 60 to 70 percent of the time. The model looked equally confident in both cases. There was no signal from the tool itself that the second task had crossed the line.
Why it matters for business
MIT's NANDA initiative found that 95% of enterprise AI pilots produce no measurable P&L impact. McKinsey reports that over 80% of companies see no tangible EBIT movement. For anyone who has watched pilots quietly die in production, these numbers are grim without being surprising. The usual explanation is "the models are not good enough." The jagged intelligence lens says something different: the models are plenty good inside their frontier. Deployments fail when teams cannot tell which of their own tasks land inside the frontier and which land outside. Average benchmarks hide the jaggedness. Your specific workflows expose it.
This also reshapes what it means to "recognize exceptional talent" in AI agents, as I discussed in this issue's leadership section. Each model has its own jagged profile. Recognizing exceptional talent in an agent means knowing which tasks sit inside its frontier, which sit outside, and where the edges are soft versus sharp. That is a skill leaders are starting to build, and it is different from picking "the best model" on a leaderboard.
For deeper learning
For a focused explanation of this term, watch Helen Toner's talk "AI's Jagged Frontier" from The Curve (November 2025). Toner runs Georgetown's Center for Security and Emerging Technology and brings a sharp policy lens to why jaggedness may not smooth out over time.
Your action step
Pick one AI deployment your team relies on today (a support copilot, a sales assistant, a research agent, anything in real use). This week, stress-test its jaggedness. Collect 20 real tasks pulled from actual work. Score each one: did the AI do it well, poorly, or wrong-but-confidently? The last bucket is the one that matters. Map those cases and share the map with the team. That map is your frontier. It will tell you more about where to trust AI in your workflows than any benchmark will.
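A spreadsheet is enough for this exercise, but if your team lives in code, here is a minimal sketch of the scoring tally. The task names and scores are illustrative placeholders, not real audit data.

```python
# Minimal frontier-map tally for the audit above. The tasks and scores
# here are illustrative placeholders; replace them with the 20 real
# tasks your team scored.
from collections import Counter

scores = {
    "summarize escalated support ticket": "well",
    "quote refund policy for an EU customer": "wrong-but-confident",
    "draft outreach email from CRM notes": "well",
    "compute prorated invoice adjustment": "poorly",
    # ...the rest of your 20 real tasks
}

tally = Counter(scores.values())
print("frontier map:", dict(tally))

# The bucket that matters: wrong answers delivered with full confidence.
outside_frontier = sorted(
    task for task, score in scores.items() if score == "wrong-but-confident"
)
print("outside the frontier:", outside_frontier)
```

Whatever lands in that last list is where human review stays mandatory, no matter what the leaderboards say.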
If you'd like to map the jagged frontier across your AI tools with your team, or want me to speak to your leadership team on why average benchmarks mislead enterprise AI strategy, I'd love to help.