
Cognitive architectures: the science of systematic reasoning

Even with perfect context, AI often fails at complex tasks such as multi-step math, strategic planning, and nuanced critique. The reason: it tries to calculate the final answer in a single pass. Cognitive architectures fix this by structuring how the model thinks before it answers.

Lesson 2/5 · AI · ~45 min read


Why this matters

In Lesson 1, we established that large language models are statistical prediction engines. We learned that context and a persona narrow the "probability space" to get better results.

But even with perfect context, models often fail at complex tasks. Multi-step math. Strategic planning. Nuanced critique.

Why? Because the model tries to calculate the final answer in a single pass.

"95% of all practical problems folks encounter can be solved by turning on extended thinking." (Ethan Mollick, the Wharton School)

This lesson is about "cognitive architectures": the art of structuring the model's internal dialogue to expand its computational space. We will move from linear thought to branching logic, and finally to adversarial systems where AI critiques itself.


The token trap: why models fail at logic

To understand why we need cognitive architectures, look at the "token trap."

The linear prediction problem:

An LLM generates text one "token" (a word or part of a word) at a time. When you ask "What is the 17th prime number?", the model tries to predict the answer immediately.

If it has not memorized that specific fact as a single statistical association, it guesses. And because it is trained to be a helpful assistant, it confidently predicts a number that looks like a prime, even if it is the wrong one.
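To make the contrast concrete, here is a minimal Python sketch of the step-by-step route. Instead of recalling "the 17th prime" as a single fact, it checks candidates one at a time, and each small result feeds the next. The function names are illustrative and do not come from any library.

```python
def is_prime(n: int) -> bool:
    """Trial division: one small, checkable step."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def nth_prime(k: int) -> int:
    """Walk upward from 2, counting primes until the k-th one is reached."""
    count, candidate = 0, 1
    while count < k:
        candidate += 1
        if is_prime(candidate):
            count += 1
    return candidate


print(nth_prime(17))  # 59
```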

The human parallel:

Imagine someone asks you to multiply 457 × 892 in your head in exactly one second. You will fail or give a vague estimate.

But give yourself a pen, a piece of paper, and one minute, and you can chain your thoughts together: small, easy calculations that build toward the complex final answer.

Cognitive architecture is the pen and paper for the AI.
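The pen-and-paper method can be written out the same way. The sketch below breaks 457 × 892 into one partial product per digit of 892, records each step on a scratchpad, and only then adds the pieces. The function name `scratchpad_multiply` is hypothetical.

```python
def scratchpad_multiply(a: int, b: int) -> tuple[int, list[str]]:
    """Long multiplication: one easy partial product per digit of b."""
    steps = []
    total = 0
    for place, digit in enumerate(reversed(str(b))):
        multiplier = int(digit) * 10**place
        partial = a * multiplier
        steps.append(f"{a} x {multiplier} = {partial}")
        total += partial
    steps.append(f"sum of partials = {total}")
    return total, steps


result, steps = scratchpad_multiply(457, 892)
for line in steps:
    print(line)
# 457 x 2 = 914
# 457 x 90 = 41130
# 457 x 800 = 365600
# sum of partials = 407644
```

None of the steps is hard. The difficulty of the original question disappears once it is spread across intermediate results.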

Chain of thought: creating compute space

The most fundamental cognitive architecture is "chain of thought" (CoT). You instruct the model to think step-by-step before providing the final answer.
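A minimal sketch of manual CoT in Python, assuming the model's reply comes back as plain text. The instruction wording and the `Final answer:` marker are illustrative conventions, not a required format. The point is that the reasoning and the answer are requested separately, so you can read the steps or discard them.

```python
import re
from typing import Optional, Tuple

COT_SUFFIX = (
    "Think through this step-by-step, one step per line. "
    "Then give the result on a final line that starts with 'Final answer:'."
)


def build_cot_prompt(question: str) -> str:
    """Append a step-by-step instruction to any question."""
    return f"{question}\n\n{COT_SUFFIX}"


def split_reasoning(reply: str) -> Tuple[str, Optional[str]]:
    """Separate the worked steps from the 'Final answer:' line, if present."""
    match = re.search(r"^Final answer:\s*(.+)$", reply, flags=re.MULTILINE)
    if match is None:
        return reply.strip(), None
    return reply[: match.start()].strip(), match.group(1).strip()


print(build_cot_prompt("What is the 17th prime number?"))

# A hand-written reply stands in for the model here; no API is called.
reply = (
    "List primes in order: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29\n"
    "Continue: 31, 37, 41, 43, 47, 53, 59\n"
    "The 17th entry is 59.\n"
    "Final answer: 59"
)
steps, answer = split_reasoning(reply)
print(answer)  # 59
```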

Why CoT works (the technical reason):

When a model "thinks" out loud, it generates intermediate tokens. Each new token becomes part of its own context.

  • In a direct answer, the model has 1 token of thinking space
  • In a CoT prompt, the model might generate 500 tokens of reasoning

Each of those 500 tokens is produced by another pass through the model, and that pass can apply "attention" (the core mechanism of transformer-based LLMs) to all the reasoning written so far. It builds a ladder. Each rung is a simplified logical step that makes the next rung easier to reach.
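The loop below is a toy illustration of that feedback, not a description of how a real model computes. `toy_next_token` is a hypothetical stand-in for the model. It adds up a list of numbers, and each running total it emits is appended to the context and read back on the next step.

```python
from typing import Callable, List

END = "<end>"


def toy_next_token(context: List[str]) -> str:
    """Hypothetical 'model': emits the next running total of the prompt's numbers.

    It can only succeed because it reads its own previous output (the last
    running total) back from the context.
    """
    sep = context.index("=>")
    numbers = [int(t) for t in context[1:sep]]
    emitted = context[sep + 1:]
    if len(emitted) == len(numbers):
        return END
    previous = int(emitted[-1]) if emitted else 0
    return str(previous + numbers[len(emitted)])


def generate(prompt: List[str], next_token: Callable[[List[str]], str],
             max_new: int = 50) -> List[str]:
    """Autoregressive loop: every generated token becomes part of the context."""
    context = list(prompt)
    for _ in range(max_new):
        token = next_token(context)
        if token == END:
            break
        context.append(token)
    return context


print(generate(["sum", "3", "5", "7", "=>"], toy_next_token))
# ['sum', '3', '5', '7', '=>', '3', '8', '15']
```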

The "extended thinking" revolution:

Many modern models now have built-in "extended thinking" or "thinking blocks."

  • Manual CoT: You write "think step-by-step"
  • Systemic CoT: The model is trained to automatically open a scratchpad where it verifies its logic before showing you the result
The difference between manual and systemic CoT: you used to ask the model to think. Now it is trained to think on its own.

| Feature | Direct prompting | Chain of thought (CoT) |
|---|---|---|
| Logic processing | Calculated in one pass | Broken into sequential sub-tasks |
| Accuracy | Lower for multi-step problems | Significantly higher for logic and math |
| Transparency | Black box: you only see the answer | Glass box: you see the "show your work" steps |
| Computational cost | Low (fewer tokens) | Higher (more tokens generated) |

Beyond the line: tree of thoughts

If "chain of thought" is a linear ladder, "tree of thoughts" (ToT) is a search through a maze.

The problem with linear logic:

Sometimes the first logical path a model takes is a dead end. In a standard CoT prompt, if the model makes a mistake in step 2, it tends to carry that mistake through step 10, because every later token is conditioned on the earlier ones. It becomes a victim of its own previous tokens.

The ToT solution — branching and evaluation:

"Tree of thoughts" allows the model to explore multiple branches of reasoning simultaneously. This mimics how a human expert or chess player thinks: "If I do A, then B happens. But what if I do C?"

The 3-step ToT framework:

  1. Brainstorming (the branches): Ask the model to generate three distinct strategies for a complex problem
  2. Evaluation (the pruning): Instruct the model to act as a critic, identifying the flaws and risks in each of the three strategies
  3. Synthesis (the golden path): The model takes the surviving logic from the best branches and combines them into the final solution
Tree of thoughts introduces self-correction. The model identifies dead ends before it commits to the final answer.
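Here is a toy analogue of the branch, evaluate and prune cycle, sketched in Python. The "problem" is deliberately tiny: reach a target number using +2, ×2 and −1. A simple distance score stands in for the model acting as critic. In a real ToT setup, both the branches and the scores would come from model calls. This is breadth-first search with a fixed beam, one common way to run a tree of thoughts but not the only one.

```python
from typing import Callable, Dict, List, Optional, Tuple

Move = Callable[[int], int]


def tree_search(start: int, target: int, moves: Dict[str, Move],
                breadth: int = 3, depth: int = 6) -> Optional[List[str]]:
    """Branch, score, prune; repeat until a branch reaches the target."""
    frontier: List[Tuple[int, List[str]]] = [(start, [])]
    for _ in range(depth):
        # Branch: expand every surviving path with every possible move.
        candidates = [(fn(value), path + [name])
                      for value, path in frontier
                      for name, fn in moves.items()]
        for value, path in candidates:
            if value == target:
                return path
        # Evaluate: a heuristic critic scores each branch (closer is better).
        candidates.sort(key=lambda c: abs(target - c[0]))
        # Prune: only the most promising branches survive to the next round.
        frontier = candidates[:breadth]
    return None  # every branch hit a dead end within the depth limit


moves = {"+2": lambda x: x + 2, "*2": lambda x: x * 2, "-1": lambda x: x - 1}
print(tree_search(3, 20, moves))  # ['+2', '*2', '*2']
```

The `breadth` parameter is the trade-off. A wider beam explores more alternatives but costs more evaluations, just as asking for more strategies costs more tokens.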

Adversarial validation: the battle of the bots

Here is a profound insight: AI is often better at critiquing than at original writing.

In Lesson 1, we talked about the model as a prediction engine. Now we use that engine to "roast" its own output. This is called "adversarial validation" — or the playoff method.

The psychology of the roast:

When a model writes a draft, it tends to be polite and agreeable. But assign it the persona of a brutal critic or an angry customer, and it draws on different patterns from its training data: ones focused on finding flaws, errors, and inconsistencies.

Implementing the "battle of the bots" workflow:

To create a high-stakes deliverable (like a legal contract or a marketing strategy), use this multi-persona architecture:

  1. The creator: Write the initial draft using a senior persona
  2. The adversary: "You are a skeptical auditor who wants to find every reason why this plan will fail. Be brutal. Identify 5 logic gaps."
  3. The synthesizer: "Read the draft and the critique. Rewrite the draft to address every single concern raised by the auditor."
By forcing the model to occupy two conflicting perspectives, you ensure the final output has been stress-tested.
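A minimal orchestration sketch of the three-persona workflow in Python. `call_model` is a hypothetical placeholder for whatever client you use (one prompt in, one text reply out). The adversary and synthesizer prompts are taken from the steps above, and the creator wording is an assumption. The fake model in the usage example only reports which persona was called, so the snippet runs without an API key.

```python
from typing import Callable, Dict

CallModel = Callable[[str], str]  # hypothetical: prompt in, reply text out

ADVERSARY = ("You are a skeptical auditor who wants to find every reason why this "
             "plan will fail. Be brutal. Identify 5 logic gaps.")
SYNTHESIZER = ("Read the draft and the critique. Rewrite the draft to address every "
               "single concern raised by the auditor.")


def battle_of_the_bots(task: str, call_model: CallModel) -> Dict[str, str]:
    """Creator -> adversary -> synthesizer, each as a separate model call."""
    draft = call_model(f"You are a senior expert. {task}")  # assumed creator persona
    critique = call_model(f"{ADVERSARY}\n\n<draft>\n{draft}\n</draft>")
    final = call_model(
        f"{SYNTHESIZER}\n\n<draft>\n{draft}\n</draft>\n\n<critique>\n{critique}\n</critique>"
    )
    return {"draft": draft, "critique": critique, "final": final}


def fake_model(prompt: str) -> str:
    """Stand-in model for offline testing: replies with the persona it detects."""
    if prompt.startswith("You are a skeptical auditor"):
        return "CRITIQUE"
    if prompt.startswith("Read the draft"):
        return "REVISED"
    return "DRAFT"


print(battle_of_the_bots("Write a launch plan.", fake_model))
# {'draft': 'DRAFT', 'critique': 'CRITIQUE', 'final': 'REVISED'}
```

Running the critic as a separate call, rather than asking one prompt to "draft and self-critique," keeps the adversary from seeing the creator's reasoning and going easy on it.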

Structuring the internal dialogue: using tags

To manage these complex architectures, use XML-style tags (e.g., `<thinking>`, `<draft>`, `<critique>`).

The power of tags:

Models like Claude are specifically trained to recognize structure within tags. Tags act as compartments for different cognitive tasks, and they help keep the model's internal reasoning from leaking into the final answer.

The professional prompt structure:

| Tag | Purpose |
|---|---|
| `<task>` | What you need done |
| `<instructions>` | Rules for the model |
| `<reasoning>` | Scratchpad for thinking |
| `<critique>` | Space to challenge its own logic |
| `<output>` | The final result the user sees |

The tags keep each cognitive task separate. Without them, reasoning bleeds into the final answer.
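Tags also make the reply easy to post-process. Here is a minimal Python sketch, assuming the model follows the tag layout from the table. It builds the tagged prompt, then shows only what sits inside `<output>`, so `<reasoning>` and `<critique>` stay out of the user's view. The helper names are illustrative.

```python
import re
from typing import Optional


def tagged_prompt(task: str, instructions: str) -> str:
    """Wrap the request in the tag layout from the table above."""
    return (
        f"<task>\n{task}\n</task>\n"
        f"<instructions>\n{instructions}\n"
        "Think inside <reasoning> tags, challenge your logic inside <critique> tags, "
        "and put only the final result inside <output> tags.\n</instructions>"
    )


def extract_tag(reply: str, tag: str) -> Optional[str]:
    """Return the text inside the first <tag>...</tag> pair, or None."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", reply, flags=re.DOTALL)
    return match.group(1).strip() if match else None


# A hand-written reply stands in for the model here.
reply = (
    "<reasoning>Revenue grew 10%, costs grew 12%.</reasoning>\n"
    "<critique>Check whether the cost figure includes one-offs.</critique>\n"
    "<output>Margins are shrinking; review costs.</output>"
)
print(extract_tag(reply, "output"))  # Margins are shrinking; review costs.
```

If `extract_tag` returns `None`, the model ignored the format. That is a signal to retry rather than to show the raw reply.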


The reasoning shift: from standard LLMs to reasoning models

We are transitioning from standard LLMs to "reasoning models" (like the OpenAI o1 series or Claude's extended thinking).

What changed:

In the past, we manually prompted for CoT. Now, these models are trained with "reinforcement learning" (RL) to learn how to think. They are trained to produce reasoning tokens before output tokens.

Does this mean manual prompting is dead?

No. Even with a reasoning model, you still need to define the parameters of the reasoning. If you do not tell a reasoning model to look for market risks, it might spend its thinking time on grammar.

Your job as a prompt engineer is now to direct the focus of the reasoning, not just trigger it.

Systematic evaluation: the Stanford approach

As your prompts become architectures, you cannot simply look at the result and say "looks good." You need a system.

LLM as a judge:

Stanford researchers suggest using a "judge model" to evaluate your architectures.

The workflow:

  1. Run your complex prompt 10 times
  2. Give those 10 outputs to a different model (the judge)
  3. Provide a rubric: "Score these on a scale of 1-5 for logic, clarity, and safety"
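The scoring step is easy to script. The sketch below assumes, hypothetically, that you ask the judge to reply with one `criterion: score` line per rubric item. It parses those lines for every run and averages each criterion, so you compare architectures on numbers rather than impressions. The three judge replies are made-up sample inputs, not real results. In practice you would pass all 10.

```python
import re
from statistics import mean
from typing import Dict, List

RUBRIC = ("logic", "clarity", "safety")


def parse_scores(judge_reply: str) -> Dict[str, int]:
    """Read lines like 'logic: 4' and keep only rubric criteria scored 1-5."""
    scores = {}
    for name, value in re.findall(r"(\w+)\s*:\s*([1-5])\b", judge_reply):
        if name.lower() in RUBRIC:
            scores[name.lower()] = int(value)
    return scores


def summarize(judge_replies: List[str]) -> Dict[str, float]:
    """Average each criterion across all runs, skipping runs that omitted it."""
    parsed = [parse_scores(r) for r in judge_replies]
    summary = {}
    for criterion in RUBRIC:
        values = [p[criterion] for p in parsed if criterion in p]
        if values:
            summary[criterion] = mean(values)
    return summary


judge_replies = [  # illustrative sample inputs only
    "logic: 4\nclarity: 5\nsafety: 3",
    "logic: 2\nclarity: 4\nsafety: 5",
    "logic: 3\nclarity: 3\nsafety: 4",
]
print(summarize(judge_replies))
```

Using a different model as the judge, as the workflow suggests, reduces the chance that the judge simply prefers its own style.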

The reflection loop:

An "agentic" system does not just judge — it reflects. If the judge model gives a score of 2/5, the system automatically feeds that score back into the original prompt and asks for a rewrite.

This is the bridge to Lesson 4 (Agentic AI), where we build systems that run in loops until they reach a certain quality threshold.
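A minimal sketch of the reflection loop, assuming two hypothetical callables. `generate(feedback)` returns a new draft, and `judge(draft)` returns a 1-5 score plus a short critique. The loop stops at a quality threshold or after a fixed number of rounds, so a stubborn task cannot loop forever.

```python
from typing import Callable, Optional, Tuple

Generate = Callable[[Optional[str]], str]  # hypothetical: feedback in, draft out
Judge = Callable[[str], Tuple[int, str]]   # hypothetical: draft in, (score, notes) out


def reflection_loop(generate: Generate, judge: Judge,
                    threshold: int = 4, max_rounds: int = 5) -> Tuple[str, int, int]:
    """Regenerate with the judge's feedback until the score meets the threshold."""
    feedback: Optional[str] = None
    best_draft, best_score, round_no = "", 0, 0
    for round_no in range(1, max_rounds + 1):
        draft = generate(feedback)
        score, critique = judge(draft)
        if score > best_score:
            best_draft, best_score = draft, score
        if score >= threshold:
            break
        # Feed the score and the judge's notes into the next attempt.
        feedback = f"Previous score: {score}/5. Judge's notes: {critique}"
    return best_draft, best_score, round_no


attempts = []


def fake_generate(feedback: Optional[str]) -> str:
    """Stand-in generator: each call produces a new numbered version."""
    attempts.append(feedback)
    return f"draft v{len(attempts)}"


def fake_judge(draft: str) -> Tuple[int, str]:
    """Stand-in judge: later versions score higher, capped at 5."""
    version = int(draft.rsplit("v", 1)[1])
    return min(version + 1, 5), "tighten the logic"


print(reflection_loop(fake_generate, fake_judge))  # ('draft v3', 4, 3)
```

Keeping the best draft seen so far means the loop still returns something usable if it hits `max_rounds` without reaching the threshold.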


Summary: four levels of cognitive complexity

To master the thinking of the machine, remember these four levels:

  • Level 1: "Zero-shot" — immediate answer, high risk
  • Level 2: "Chain of thought" — linear logic, think step-by-step
  • Level 3: "Tree of thoughts" — branching logic, exploring multiple paths
  • Level 4: "Adversarial loops" — creator vs. critic, self-correction

| Goal | Recommended architecture | Key instruction |
|---|---|---|
| Simple extraction | Standard T-C-R-E-I | "Extract X from this text." |
| Logic or math problem | Chain of thought (CoT) | "Think through this step-by-step." |
| Strategic decision | Tree of thoughts (ToT) | "Explore 3 paths and pick the best one." |
| High-stakes writing | Adversarial validation | "Roast this draft then rewrite it." |

The logic ladder: putting it all together

Here is the difference between a novice prompt and an expert cognitive architecture.

The task: Design a 12-month marketing plan for a new eco-friendly water bottle.

The direct prompt (novice):

"Write a 12-month marketing plan for my water bottle."

Result: a generic list of "January: Social Media, February: Email..."

The cognitive architecture prompt (expert):

  1. Phase 1 (thinking): Use `<market_analysis>` tags to identify 3 target demographics
  2. Phase 2 (branching): Generate 2 strategies — "Aggressive Influencer Growth" and "Organic Community Building"
  3. Phase 3 (critique): Act as a cynical CFO and find 3 reasons why these strategies will waste money
  4. Phase 4 (output): Combine the best parts into a final, bulleted 12-month timeline
The AI is not a mind — it is a mirror. If your instructions are shallow, the logic will be shallow. Build a deep architecture, and the AI will provide deep results.
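As a sketch, here is one way to assemble the expert prompt in code. Each phase becomes its own tagged section, so the analysis, branches and critique stay separate from the final timeline. Only `<market_analysis>` comes from the example above. The other tag names and the function name are assumptions chosen for illustration.

```python
PHASES = [  # (tag, instruction); tags other than market_analysis are assumed
    ("market_analysis", "Identify 3 target demographics for the product."),
    ("strategies", 'Generate 2 strategies: "Aggressive Influencer Growth" and '
                   '"Organic Community Building".'),
    ("critique", "Act as a cynical CFO and find 3 reasons why these strategies "
                 "will waste money."),
    ("output", "Combine the best parts into a final, bulleted 12-month timeline."),
]


def build_architecture_prompt(task: str) -> str:
    """Turn the four phases into one prompt with a tagged section per phase."""
    lines = [f"<task>{task}</task>", "Work through these phases in order:"]
    for number, (tag, instruction) in enumerate(PHASES, start=1):
        lines.append(f"Phase {number}: {instruction} Write this phase inside <{tag}> tags.")
    return "\n".join(lines)


print(build_architecture_prompt(
    "Design a 12-month marketing plan for a new eco-friendly water bottle."))
```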


Think

What would you do in these scenarios?


The budget that doesn't add up

A CFO asks AI to split a $500K budget across 6 departments. The output is a clean table — but the rows sum to $570K, and Q3 contradicts Q1. Why are the numbers inconsistent?


Practice

Test yourself and review key terms

Knowledge check


Why do AI models often fail at complex tasks like multi-step math or strategic planning?

Concepts

Question

What is adversarial validation?


Answer

A multi-persona architecture where one AI persona creates a draft and another persona critiques it to find flaws.


Apply

Your action steps for today

  1. The CoT test: Take a task where AI gave you a wrong or shallow answer. Re-run it with "think step-by-step before answering." Compare the two outputs side by side.
  2. The roast prompt: Write a draft with AI, then paste it back with the instruction: "You are a skeptical auditor. Find 5 flaws in this plan." Rewrite the draft using the critique.
  3. The tag experiment: Structure your next complex prompt using `<task>`, `<instructions>`, and `<output>` tags. Check whether the final answer comes out cleaner than it does with an unstructured prompt.


Note

Some examples and details may be simplified to better convey the core idea. Every business is different — adapt these ideas to your specific context and situation.