
"Which AI Should I Use?" Is the Wrong Question

Before selecting AI tools, define the problem being solved. Not all hard problems are the same kind of hard - and matching problem types to model strengths is the emerging leadership skill.

Business Value · AI Strategy · Model Selection · Gemini · Claude · Castifai

Most organizations I work with start their AI journey with the same question: "Which AI should we use?" It feels like a reasonable question. There are dozens of models, new releases every week, benchmarks that shift monthly. Surely picking the right one matters.

It does matter. But it's the wrong starting point.

The right starting point is: what problem are you solving? And more specifically - what kind of hard is that problem?

This mirrors something I learned at Motorola running innovation workshops. The methodology always depended on the specific problem. You wouldn't use the same approach for a supply chain bottleneck and a customer experience redesign. The same principle applies to AI.

Hard Is Not One Thing

I've been thinking about this through the lens of Nate Jones's analysis of Google Gemini 3.1 Pro, which does an excellent job of decomposing what "hard" actually means in the context of AI tasks.

Reasoning problems require sustained logical deduction across multiple variables. Tax optimization, financial modeling, regulatory analysis - problems where you need to hold many constraints simultaneously and trace through their implications. Google's Gemini 3.1 Pro, optimized for pure reasoning, excels here.

Effort problems involve massive scope but relatively straightforward steps. Contract auditing across thousands of documents. Codebase migration from one framework to another. Pattern analysis across large datasets. The challenge isn't figuring out what to do - it's doing it at scale without losing consistency. This is where agentic models like Claude shine - tools that can work autonomously for hours, coordinating across files and systems.

Coordination problems are about aligning teams, routing work, and managing information flow across dependencies. These are often the problems organizations think they're solving with AI, but they're actually organizational problems that AI can assist with - not solve.

Ambiguity problems are about determining the actual question, not computing the answer. What should our pricing strategy be? Which market should we enter next? Where should we place our AI bets? No model solves this. This is product sense, strategic intuition, leadership judgment. AI can inform these decisions with data and analysis, but the decision itself remains fundamentally human.

Emotional intelligence problems involve delivering difficult feedback, reading rooms, navigating negotiations, building trust. AI cannot do this reliably today. And leaders who outsource these moments to AI will pay a trust cost they can't easily recover.

How I Build AI Products

This framework isn't theoretical for me. With Castifai, I started with problem definition: turn video and text content into visual infographics where the text is readable and professional.

That's primarily an effort problem with a reasoning component. The effort: processing many pieces of content at scale. The reasoning: layout decisions, text placement, visual hierarchy.

So I tested models, evaluated quality against specific requirements, considered the unit economics ($0.15 per image matters when you're processing thousands), and built fallback chains for reliability.
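To make the "fallback chain" idea concrete, here is a minimal sketch of the pattern: try providers in order of preference and fall through on failure. The provider names, costs, and stub functions are hypothetical illustrations, not Castifai's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Provider:
    name: str
    cost_per_image: float          # illustrative unit economics, not real pricing
    generate: Callable[[str], str]  # prompt -> image reference (stubbed below)

def generate_with_fallback(prompt: str, providers: list[Provider]) -> tuple[str, str]:
    """Try each provider in order; return (provider name, result) from the first success."""
    errors = []
    for p in providers:
        try:
            return p.name, p.generate(prompt)
        except Exception as exc:
            errors.append(f"{p.name}: {exc}")
    raise RuntimeError("All providers failed: " + "; ".join(errors))

# Stubbed usage: the primary provider fails, so the chain falls through.
def flaky(prompt: str) -> str:
    raise TimeoutError("rate limited")

def steady(prompt: str) -> str:
    return f"image-for:{prompt}"

chain = [
    Provider("primary-model", 0.15, flaky),
    Provider("backup-model", 0.08, steady),
]
name, result = generate_with_fallback("quarterly infographic", chain)
```

The design choice worth noting: the chain encodes preference order explicitly, so quality and cost trade-offs stay visible in one place rather than scattered through the codebase.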

When a new model launches, the question isn't "is it better?" It's "is it better for my specific problem?" A model that scores higher on a general benchmark might score lower on my specific image generation requirements. The problem anchors the evaluation.

The Routing Skill

Here's where this gets practical. The gap between "I use ChatGPT for everything" and "I route financial modeling to Gemini on high thinking, coding tasks to Claude Code, quick research to Gemini Flash, and deep document analysis to Opus" - that gap is real and it is growing every month.

Six months ago, using one model for everything was a reasonable default. The models were similar enough that switching costs outweighed the marginal benefits. That's no longer true. Google has optimized Gemini for reasoning depth. Anthropic has built Claude for sustained agentic work. The specialization is real and accelerating.

Teams that develop routing intuition - knowing which problems match which models - will systematically outperform teams that stick with a single-model approach. Not because any one model is dramatically better, but because the right model for the right problem compounds across hundreds of decisions per week.
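Routing intuition lives in people's heads, but the mapping itself can be written down. A minimal sketch, assuming a hypothetical routing table whose model labels simply echo the examples above:

```python
# Illustrative problem-type -> model routing table. The model labels are
# placeholders drawn from the article's examples, not exact product names,
# and the lookup stands in for human routing intuition.
ROUTES = {
    "reasoning": "gemini-high-thinking",   # sustained logical deduction
    "effort": "claude-agentic",            # large-scope, long-running work
    "quick_research": "gemini-flash",      # fast, lightweight lookups
    "deep_documents": "claude-opus",       # long-form document analysis
}

def route(problem_type: str) -> str:
    """Return the model for a classified problem type, with a safe default."""
    return ROUTES.get(problem_type, "general-purpose-model")

choice = route("reasoning")
```

Even a table this small forces the useful discipline: you cannot fill it in without first classifying the problem.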

Your Action Step

Take three time-consuming work tasks from your past week and classify each one across these dimensions:

  1. Reasoning component - How much sustained logical deduction does this require?
  2. Effort component - How much is about scope versus complexity?
  3. Coordination component - How much is about aligning people and information?
  4. Ambiguity component - How clear is the actual question being asked?
  5. Judgment/emotional intelligence component - What parts require human-only capabilities?

Write it on one page. That clarity is worth more than any benchmark score.

Originally published in Amir Elion's Think Big Newsletter #20.
