Cognitive architectures: the science of systematic reasoning

Even with perfect context, AI fails at complex tasks like multi-step math, strategic planning, and nuanced critique. The reason: it tries to calculate the final answer in one pass. Cognitive architectures fix this by structuring how the model thinks before it answers.

Written by Daniel Okafor, AI for Business

Lesson 2/5 | AI | ~45 min read

Why this matters

In Lesson 1, we established that large language models are statistical prediction engines. We learned that context and a persona narrow the "probability space" to get better results.

But even with perfect context, models often fail at complex tasks. Multi-step math. Strategic planning. Nuanced critique.

Why? Because the model tries to calculate the final answer in a single pass.

"95% of all practical problems folks encounter can be solved by turning on extended thinking." Ethan Mollick, the Wharton School, University of Pennsylvania

This lesson is about "cognitive architectures": the art of structuring the model's internal dialogue to expand its computational space. We will move from linear thought to branching logic, and finally to adversarial systems where AI critiques itself.

The token trap: why models fail at logic

To understand why we need cognitive architectures, look at the "token trap."

The linear prediction problem:

An LLM generates text one "token" (a word or a piece of a word) at a time. When you ask: "What is the 17th prime number?", the model tries to predict the answer immediately.

If it has not memorized that specific fact as a single statistical association, it guesses. And because it is a helpful assistant, it predicts a number that looks like a prime number, even if it is wrong.

The human parallel:

Imagine someone asks you to multiply 457 × 892 in your head in exactly one second. You will fail or give a vague estimate.

But with a pen, a piece of paper, and one minute, you can chain your thoughts together: small, easy calculations that build toward the complex final answer.

Cognitive architecture is the pen and paper for the AI.
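The pen-and-paper idea can be made concrete with a small sketch. It splits 457 × 892 into easy partial products and keeps a running total, which is the kind of intermediate work a model skips when it answers in one pass. The function name `multiply_in_steps` is made up for this illustration, and the sketch assumes a non-negative second factor.

```python
def multiply_in_steps(a: int, b: int) -> tuple[list[str], int]:
    """Multiply a by b by splitting b into place-value parts
    and summing the partial products one step at a time."""
    steps = []
    total = 0
    place = 1
    # Walk the digits of b from least to most significant.
    for digit_char in reversed(str(b)):
        part = int(digit_char) * place
        if part:  # skip zero digits, they contribute nothing
            partial = a * part
            total += partial
            steps.append(f"{a} x {part} = {partial} (running total: {total})")
        place *= 10
    return steps, total


steps, answer = multiply_in_steps(457, 892)
for line in steps:
    print(line)
print("Final answer:", answer)  # Final answer: 407644
```

Each printed line is one "rung": trivial on its own, but the final answer is only reachable by writing the earlier ones down.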

Chain of thought: creating compute space

The most fundamental cognitive architecture is "chain of thought" (CoT). You instruct the model to think step-by-step before providing the final answer.

Why CoT works (the technical reason):

When a model "thinks" out loud, it generates intermediate tokens. Each new token becomes part of its own context.

In a direct answer, the model has almost no thinking space: it must commit to the answer within its first few tokens
In a CoT prompt, the model might generate 500 tokens of reasoning

As it writes each of those 500 tokens, the model can apply "attention" (the core mechanism of LLMs) to its own previous reasoning. It builds a ladder. Each rung is a simplified logical step that makes the next rung easier to reach.
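The following toy loop sketches why intermediate tokens help. It is not how a real LLM is implemented: `running_sum_model` is a hypothetical stand-in that adds numbers, and `generate` is an illustrative name. The point it shows is only that every generated token is appended to the context, so each step can build on the previous one.

```python
from typing import Callable, List


def generate(prompt_tokens: List[str],
             next_token: Callable[[List[str]], str],
             max_new_tokens: int = 50,
             stop_token: str = "<end>") -> List[str]:
    """Toy autoregressive loop: each new token joins the context
    that the next prediction is conditioned on."""
    context = list(prompt_tokens)
    for _ in range(max_new_tokens):
        token = next_token(context)
        if token == stop_token:
            break
        context.append(token)  # the model's own output becomes its input
    return context[len(prompt_tokens):]


def running_sum_model(context: List[str]) -> str:
    """Hypothetical 'model': emits one running total per step.
    It only ever adds one number to the total it wrote last."""
    arrow = context.index("=>")
    numbers = [int(tok) for tok in context[1:arrow]]
    produced = context[arrow + 1:]
    if len(produced) == len(numbers):
        return "<end>"
    if not produced:
        return str(numbers[0])
    return str(int(produced[-1]) + numbers[len(produced)])


reasoning = generate(["add", "3", "4", "5", "=>"], running_sum_model)
print(reasoning)  # ['3', '7', '12']
```

No single step here is harder than one addition, because the previous total is already sitting in the context.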

The "extended thinking" revolution:

Modern models now have "extended thinking" or "thinking blocks" built in.

Manual CoT: You write "think step-by-step"
Systemic CoT: The model is trained to automatically open a scratchpad where it verifies its logic before showing you the result

The difference between manual and systemic CoT: you used to ask the model to think. Now it is trained to think on its own.

Feature	Direct prompting	Chain of thought (CoT)
Logic processing	Calculated in one pass	Broken into sequential sub-tasks
Accuracy	Lower for multi-step problems	Significantly higher for logic and math
Transparency	Black box (you only see the answer)	Glass box (you see the "show your work" steps)
Computational cost	Low (fewer tokens)	Higher (more tokens generated)

Beyond the line: tree of thoughts

If "chain of thought" is a linear ladder, "tree of thoughts" (ToT) is a search through a maze.

The problem with linear logic:

Sometimes the first logical path a model takes is a dead end. In a standard CoT prompt, if the model makes a mistake in step 2, every later token is conditioned on that mistake, so it tends to carry the error through to step 10. It becomes a victim of its own previous tokens.

The ToT solution: branching and evaluation:

"Tree of thoughts" allows the model to explore multiple branches of reasoning simultaneously. This mimics how a human expert or chess player thinks: "If I do A, then B happens. But what if I do C?"

The 3-step ToT framework:

Brainstorming (the branches): Ask the model to generate three distinct strategies for a complex problem
Evaluation (the pruning): Instruct the model to act as a critic, identifying the flaws and risks in each of the three strategies
Synthesis (the golden path): The model takes the surviving logic from the best branches and combines them into the final solution

Tree of thoughts introduces self-correction. The model identifies dead ends before it commits to the final answer.
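The three steps above can be sketched as a short orchestration function. This is a minimal sketch, not a reference implementation: `call_model` is a hypothetical callable that sends a prompt to whatever model you use and returns its text, and the prompt wording is illustrative. A stub model is used so the example runs without any API.

```python
from typing import Callable, List

# Hypothetical signature: takes a prompt string, returns the model's reply.
ModelFn = Callable[[str], str]


def tree_of_thoughts(problem: str, call_model: ModelFn, branches: int = 3) -> str:
    """Run brainstorm -> evaluate -> synthesize as three model calls."""
    # Step 1, brainstorming: ask for several distinct strategies.
    strategies = call_model(
        f"Problem: {problem}\n"
        f"Generate {branches} distinct strategies for solving it."
    )
    # Step 2, evaluation: the model critiques its own strategies.
    critique = call_model(
        "Act as a critic. Identify the flaws and risks in each strategy below.\n"
        f"Problem: {problem}\nStrategies:\n{strategies}"
    )
    # Step 3, synthesis: combine the logic that survived the critique.
    return call_model(
        "Combine the strongest surviving ideas into one final solution.\n"
        f"Problem: {problem}\nStrategies:\n{strategies}\nCritique:\n{critique}"
    )


# Stub model: records every prompt and returns a numbered placeholder.
prompts_seen: List[str] = []


def stub_model(prompt: str) -> str:
    prompts_seen.append(prompt)
    return f"[reply {len(prompts_seen)}]"


final_answer = tree_of_thoughts("Enter a new regional market", stub_model)
print(final_answer)
```

Passing the strategies and the critique forward explicitly is the design choice that matters: the synthesis call sees both the branches and the reasons some of them were pruned.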

Adversarial validation: the battle of the bots

Here is a profound insight: AI is often better at critiquing than at original writing.

In Lesson 1, we talked about the model as a prediction engine. Now we use that engine to "roast" its own output. This is called "adversarial validation," or the playoff method.

The psychology of the roast:

When a model writes a draft, it tends to be polite and agreeable. But assign it the persona of a brutal critic or an angry customer, and it draws on different patterns from its training: ones focused on finding flaws, errors, and inconsistencies.

Implementing the "battle of the bots" workflow:

To create a high-stakes deliverable (like a legal contract or a marketing strategy), use this multi-persona architecture:

The creator: Write the initial draft using a senior persona
The adversary: "You are a skeptical auditor who wants to find every reason why this plan will fail. Be brutal. Identify 5 logic gaps."
The synthesizer: "Read the draft and the critique. Rewrite the draft to address every single concern raised by the auditor."

By forcing the model to occupy two conflicting perspectives, you ensure the final output has been stress-tested.
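One way to wire the three personas together is sketched below. It assumes a hypothetical `call_model(system, user)` callable that takes a persona and a message and returns text; the creator's persona wording is an assumption, while the adversary and synthesizer instructions are the ones quoted above. A stub stands in for the real model so the example runs as written.

```python
from typing import Callable, Dict

# Hypothetical signature: (persona/system text, user message) -> reply text.
PersonaModelFn = Callable[[str, str], str]

ADVERSARY = ("You are a skeptical auditor who wants to find every reason why "
             "this plan will fail. Be brutal. Identify 5 logic gaps.")
SYNTHESIZER = ("Read the draft and the critique. Rewrite the draft to address "
               "every single concern raised by the auditor.")


def battle_of_the_bots(brief: str, call_model: PersonaModelFn) -> Dict[str, str]:
    """Creator writes, adversary attacks, synthesizer rewrites."""
    # The creator persona below is illustrative wording, not from the lesson.
    draft = call_model("You are a senior strategist.", brief)
    critique = call_model(ADVERSARY, draft)
    final = call_model(SYNTHESIZER, f"Draft:\n{draft}\n\nCritique:\n{critique}")
    return {"draft": draft, "critique": critique, "final": final}


def stub_persona_model(system: str, user: str) -> str:
    """Stub: tags the reply with the first words of the persona."""
    role = " ".join(system.split()[:4])
    return f"<{role}> responding to {len(user)} chars"


result = battle_of_the_bots("Draft a launch plan for a new product.", stub_persona_model)
for stage, text in result.items():
    print(stage, "->", text)
```

Keeping the draft and critique in the returned dictionary lets you inspect what the auditor objected to, not only the rewritten version.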

Structuring the internal dialogue: using tags

To manage these complex architectures, use XML-style tags (e.g., `<thinking>`, `<draft>`, `<critique>`).

The power of tags:

Models like Claude are specifically trained to understand structure within tags. Tags act as compartments for different cognitive tasks. They help keep the model's internal thought from leaking into the final answer.

The professional prompt structure:

Tag	Purpose
`<task>`	What you need done
`<instructions>`	Rules for the model
`<reasoning>`	Scratchpad for thinking
`<critique>`	Space to challenge its own logic
`<output>`	The final result the user sees

The tags keep each cognitive task separate. Without them, reasoning bleeds into the final answer.
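A short sketch of how the tags can be used in practice: build the prompt with `<task>` and `<instructions>`, then pull only the `<output>` section out of the reply. The helper names and the sample response are invented for illustration. The regular expression is a simple approach that assumes tags are not nested inside themselves.

```python
import re
from typing import Optional


def build_tagged_prompt(task: str, instructions: str) -> str:
    """Wrap the request in tags and tell the model where to put each part."""
    return (
        f"<task>\n{task}\n</task>\n"
        f"<instructions>\n{instructions}\n"
        "Think inside <reasoning> tags, challenge your logic inside "
        "<critique> tags, and put only the final result inside <output> tags.\n"
        "</instructions>"
    )


def extract_tag(text: str, tag: str) -> Optional[str]:
    """Return the content of the first <tag>...</tag> pair, or None."""
    pattern = rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>"
    match = re.search(pattern, text, flags=re.DOTALL)
    return match.group(1).strip() if match else None


prompt = build_tagged_prompt("Pick a launch quarter.", "Be concise.")

# Hypothetical model reply, hard-coded so the sketch runs offline.
sample_response = (
    "<reasoning>Q1 is too early; the product is not ready.</reasoning>\n"
    "<critique>Q2 assumes no supplier delays.</critique>\n"
    "<output>Launch in Q2.</output>"
)

print(extract_tag(sample_response, "output"))  # Launch in Q2.
```

Showing the user only the extracted `<output>` is what keeps the scratchpad and critique out of the final answer.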

The reasoning shift: from standard LLMs to reasoning models

We are transitioning from standard LLMs to "reasoning models" (like the OpenAI o1 series or Claude's extended thinking).

What changed:

In the past, we manually prompted for CoT. Now, these models are trained with "reinforcement learning" (RL) to think on their own: they learn to produce reasoning tokens before output tokens.

Does this mean manual prompting is dead?

No. Even with a reasoning model, you still need to define the parameters of the reasoning. If you do not tell a reasoning model to look for market risks, it might spend its thinking time on grammar.

Your job as a prompt engineer is now to direct the focus of the reasoning, not just trigger it.

Systematic evaluation: the Stanford approach

As your prompts become architectures, you cannot simply look at the result and say "looks good." You need a system.

LLM as a judge:

Stanford researchers suggest using a "judge model" to evaluate your architectures.

The workflow:

Run your complex prompt 10 times
Give those 10 outputs to a different model (the judge)
Provide a rubric: "Score these on a scale of 1-5 for logic, clarity, and safety"
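The workflow above could be sketched like this. Both `generate` and `judge` are hypothetical callables standing in for two different models, and the assumption that the judge returns a dictionary of 1-5 scores per criterion is a choice made for this example. Stubs are used so the averaging logic can be run and checked.

```python
from statistics import mean
from typing import Callable, Dict

RUBRIC = "Score these on a scale of 1-5 for logic, clarity, and safety"
CRITERIA = ("logic", "clarity", "safety")


def evaluate_prompt(prompt: str,
                    generate: Callable[[str], str],
                    judge: Callable[[str, str], Dict[str, int]],
                    runs: int = 10) -> Dict[str, float]:
    """Run the prompt several times and average the judge's scores."""
    outputs = [generate(prompt) for _ in range(runs)]
    scorecards = [judge(output, RUBRIC) for output in outputs]
    for card in scorecards:
        for criterion in CRITERIA:
            if not 1 <= card[criterion] <= 5:
                raise ValueError(f"{criterion} score out of range: {card}")
    return {c: mean(card[c] for card in scorecards) for c in CRITERIA}


# Stubs: every second run "shows its work", and the judge rewards that.
run_counter = {"n": 0}


def stub_generate(prompt: str) -> str:
    run_counter["n"] += 1
    if run_counter["n"] % 2 == 0:
        return "Answer with step-by-step reasoning"
    return "Answer only"


def stub_judge(output: str, rubric: str) -> Dict[str, int]:
    return {"logic": 5 if "step" in output else 3, "clarity": 4, "safety": 5}


averages = evaluate_prompt("Plan the budget.", stub_generate, stub_judge)
print(averages)
```

Averaging over many runs matters because a single output can look good by chance; the spread across runs shows how reliable the architecture is.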

The reflection loop:

An "agentic" system does not just judge; it reflects. If the judge model gives a score of 2/5, the system automatically feeds that score back into the original prompt and asks for a rewrite.

This is the bridge to Lesson 4 (Agentic AI), where we build systems that run in loops until they reach a certain quality threshold.
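A reflection loop of this kind might look like the sketch below. The threshold, the round limit, and the wording of the rewrite request are assumptions chosen for illustration, and `generate` and `judge` are hypothetical callables. The stubs simulate a model that improves once it sees its score.

```python
from typing import Callable, List, Tuple


def reflect_until_good(prompt: str,
                       generate: Callable[[str], str],
                       judge: Callable[[str], int],
                       threshold: int = 4,
                       max_rounds: int = 3) -> Tuple[str, List[Tuple[int, int]]]:
    """Regenerate until the judge's score meets the threshold
    or the round limit is reached."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    current_prompt = prompt
    history = []  # (round number, score) pairs
    for round_no in range(1, max_rounds + 1):
        output = generate(current_prompt)
        score = judge(output)
        history.append((round_no, score))
        if score >= threshold:
            break
        # Feed the score and the weak attempt back into the prompt.
        current_prompt = (
            f"{prompt}\n\nYour previous attempt scored {score}/5:\n"
            f"{output}\n\nRewrite it to score higher."
        )
    return output, history


def stub_generate(prompt: str) -> str:
    """Stub: produces a better version once it has seen a score."""
    return "v2" if "scored" in prompt else "v1"


def stub_judge(output: str) -> int:
    return 2 if output == "v1" else 5


final_output, history = reflect_until_good("Write the plan.", stub_generate, stub_judge)
print(final_output, history)  # v2 [(1, 2), (2, 5)]
```

The round limit is the safety valve: without it, a prompt that can never satisfy the judge would loop forever.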

Summary: four levels of cognitive complexity

To master the thinking of the machine, remember these four levels:

Level 1: "Zero-shot": immediate answer, high risk
Level 2: "Chain of thought": linear logic, think step-by-step
Level 3: "Tree of thoughts": branching logic, exploring multiple paths
Level 4: "Adversarial loops": creator vs. critic, self-correction

Goal	Recommended architecture	Key instruction
Simple extraction	Standard T-C-R-E-I	"Extract X from this text."
Logic or math problem	Chain of thought (CoT)	"Think through this step-by-step."
Strategic decision	Tree of thoughts (ToT)	"Explore 3 paths and pick the best one."
High-stakes writing	Adversarial validation	"Roast this draft then rewrite it."
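If you want to reuse the table above in a script, it can be held as a simple lookup. This is only a convenience sketch; the `recommend` function and the lower-cased keys are assumptions for the example, and the rows are copied from the table.

```python
from typing import Dict, Tuple

# goal -> (recommended architecture, key instruction), copied from the table.
ARCHITECTURES: Dict[str, Tuple[str, str]] = {
    "simple extraction": ("Standard T-C-R-E-I", "Extract X from this text."),
    "logic or math problem": ("Chain of thought (CoT)",
                              "Think through this step-by-step."),
    "strategic decision": ("Tree of thoughts (ToT)",
                           "Explore 3 paths and pick the best one."),
    "high-stakes writing": ("Adversarial validation",
                            "Roast this draft then rewrite it."),
}


def recommend(goal: str) -> Tuple[str, str]:
    """Look up the architecture for a goal; raises KeyError if unknown."""
    return ARCHITECTURES[goal.strip().lower()]


architecture, instruction = recommend("Strategic decision")
print(architecture, "|", instruction)
```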

The logic ladder: putting it all together

Here is the difference between a novice prompt and an expert cognitive architecture.

The task: Design a 12-month marketing plan for a new eco-friendly water bottle.

The direct prompt (novice):

"Write a 12-month marketing plan for my water bottle."

Result: a generic list of "January: Social Media, February: Email..."

The cognitive architecture prompt (expert):

Phase 1 (thinking): Use `<market_analysis>` tags to identify 3 target demographics
Phase 2 (branching): Generate 2 strategies: "Aggressive Influencer Growth" and "Organic Community Building"
Phase 3 (critique): Act as a cynical CFO and find 3 reasons why these strategies will waste money
Phase 4 (output): Combine the best parts into a final, bulleted 12-month timeline
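The four phases can be assembled into a single prompt string, as in this sketch. The function name and the exact layout are illustrative choices; the phase instructions are taken from the list above.

```python
def build_marketing_prompt(product: str) -> str:
    """Assemble the four-phase 'expert' prompt as one string."""
    phases = [
        ("thinking",
         "Inside <market_analysis> tags, identify 3 target demographics."),
        ("branching",
         'Generate 2 strategies: "Aggressive Influencer Growth" and '
         '"Organic Community Building".'),
        ("critique",
         "Act as a cynical CFO and find 3 reasons why these strategies "
         "will waste money."),
        ("output",
         "Combine the best parts into a final, bulleted 12-month timeline."),
    ]
    lines = [f"Task: Design a 12-month marketing plan for {product}.", ""]
    for number, (name, instruction) in enumerate(phases, start=1):
        lines.append(f"Phase {number} ({name}): {instruction}")
    return "\n".join(lines)


expert_prompt = build_marketing_prompt("a new eco-friendly water bottle")
print(expert_prompt)
```

Keeping the phases in a list makes it easy to swap the critic persona or add a branch without rewriting the whole prompt.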

The AI is not a mind; it is a mirror. If your instructions are shallow, the logic will be shallow. Build a deep architecture, and the AI will provide deep results.

Think

What would you do in these scenarios?

The budget that doesn't add up

A CFO asks AI to split a $500K budget across 6 departments. The output is a clean table, but the rows sum to $570K, and Q3 contradicts Q1. Why are the numbers inconsistent?

Practice

Test yourself and review key terms

Knowledge check

Why do AI models often fail at complex tasks like multi-step math or strategic planning?

Concepts

Question: What are the three roles used in a battle of the bots workflow?

Answer: The creator, the adversary, and the synthesizer.

Apply

Your action steps for today

01. The CoT test: Take a task where AI gave you a wrong or shallow answer. Re-run it with "think step-by-step before answering." Compare the two outputs side by side.
02. The roast prompt: Write a draft with AI, then paste it back with the instruction: "You are a skeptical auditor. Find 5 flaws in this plan." Rewrite the draft using the critique.
03. The tag experiment: Structure your next complex prompt using `<task>`, `<instructions>`, and `<output>` tags. Check whether the final answer stays cleaner than an unstructured prompt.

Note

Some examples and details may be simplified to better convey the core idea. Every business is different — adapt these ideas to your specific context and situation.

CORE MBA
