LLM Fundamentals for Agentic Systems
Core concepts of large language models that underpin agentic AI: how they work, what they can do, and their limitations.
Learning Objectives
- Understand how LLMs process and generate text
- Explain the role of context windows and token limits
- Describe function calling and tool use as the bridge to agentic behavior
- Recognize LLM limitations that agentic patterns address
From Introduction to LLM Fundamentals
In the introduction, you learned what makes AI "agentic" — the ability to observe, think, and act autonomously. But what powers that intelligence? The answer is Large Language Models (LLMs).
Before we can build agents, we need to understand the engine that drives them. This section breaks down how LLMs work, what they can do, and where their limits lie — all without requiring a PhD.
What Is a Large Language Model?
A Large Language Model (LLM) is a type of artificial intelligence built on a neural network that has been trained on enormous amounts of text. By learning patterns across billions of sentences, an LLM can generate coherent, contextually appropriate language — answering questions, summarizing documents, writing code, and much more.
Examples You May Already Know
- ChatGPT is powered by OpenAI's GPT-4 family of models
- Claude is built by Anthropic (and is the model powering many of the examples in this training)
- Gemini is Google DeepMind's large language model
What LLMs Are NOT
It is just as important to understand what LLMs are not:
- Not a database — they do not store or retrieve facts from a structured source; they generate text based on learned patterns
- Not a search engine — they do not browse the web in real time (unless given tools to do so)
- Not sentient — they have no awareness, feelings, or understanding; they predict the next most likely word in a sequence
Understanding these boundaries is essential. The agentic patterns in this training exist precisely to compensate for what LLMs cannot do on their own — like accessing live data, taking actions, and verifying facts.
How LLMs Work (Without the PhD)
Large Language Models are at the core of every agentic AI system. Here's what you need to know about how they work.
The Core Idea
An LLM is a neural network trained on vast amounts of text. Given a sequence of words, it predicts what comes next — but it does this so well that it can generate coherent reasoning, follow instructions, and even write code.
Key insight: LLMs don't "understand" in the way humans do. They've learned incredibly sophisticated patterns from training data that allow them to produce useful, contextually appropriate responses.
What Happens When You Send a Message
1. Tokenization: Your text is broken into tokens (roughly word-pieces). "Agentic AI is fascinating" becomes ~4-5 tokens (see the sketch after this list).
2. Processing: Each token passes through many layers of the neural network, building up a representation of the full context.
3. Generation: The model produces one token at a time, each choice influenced by everything that came before.
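You can see tokenization for yourself. The sketch below uses OpenAI's open-source tiktoken library (an assumption: it must be installed, e.g. `pip install tiktoken`); exact token counts vary between model families, but the word-piece idea is the same everywhere.

```python
# A quick look at tokenization, assuming the tiktoken package is
# installed. cl100k_base is one common encoding; other models use
# different ones, so counts will vary.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("Agentic AI is fascinating")
print(len(tokens))                        # a small handful of tokens
print([enc.decode([t]) for t in tokens])  # the word-pieces themselves
```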
Why This Matters for Agentic AI
Understanding this foundation helps explain both the capabilities and limitations of agentic systems:
- Capabilities: LLMs can follow complex instructions, reason through problems, and generate structured outputs (like tool calls)
- Limitations: They process text sequentially, have fixed context windows, and can "hallucinate" — generating plausible but incorrect information
The agentic patterns you'll learn in this training are designed to harness the capabilities while mitigating the limitations.
Context Windows: The LLM's Working Memory
The context window is the total amount of text an LLM can "see" at once — including both your input and its response. Think of it as the model's working memory.
Current Context Window Sizes
Context windows have grown dramatically:
| Model | Context Window | Equivalent |
|---|---|---|
| Claude Opus 4.6 | 200K tokens | ~150K words (~300 pages) |
| Gemini 2.5 Pro | 1M tokens | ~750K words (~1,500 pages) |
| GPT-4o | 128K tokens | ~96K words (~190 pages) |
Why Context Windows Matter for Agents
For agentic systems, the context window is critical because it determines:
- How much information an agent can consider when making decisions
- How many tool results can be accumulated during a multi-step task
- How long a conversation an agent can maintain coherently
The Challenge: Context Isn't Free
Larger context windows come with tradeoffs:
- Cost: Most APIs charge per token — more context means higher costs
- Speed: Processing more context takes more time
- Attention degradation: Models can struggle to find relevant information in very long contexts (the "lost in the middle" problem)
How Agentic Patterns Help
Agentic systems use several strategies to manage context effectively:
- RAG (Retrieval-Augmented Generation): Only fetch relevant information when needed
- Summarization: Compress long histories into concise summaries (see the sketch after this list)
- Tool delegation: Let specialized tools handle data-heavy operations
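To make the summarization strategy concrete, here is a minimal sketch of a history-trimming step. Everything here is illustrative: `count_tokens` is a crude heuristic (a real system would use the provider's tokenizer), and `summarize` is a stub standing in for an LLM call that compresses the old turns.

```python
def count_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def summarize(messages: list[str]) -> str:
    # Stub: a real implementation would ask an LLM to compress
    # these messages into a short summary.
    return f"[summary of {len(messages)} earlier messages]"

def trim_history(messages: list[str], budget: int = 8_000) -> list[str]:
    # If the history fits the budget, leave it alone.
    if sum(count_tokens(m) for m in messages) <= budget:
        return messages
    # Otherwise keep the most recent turns verbatim and compress
    # everything older into a single summary message.
    recent = messages[-4:]
    return [summarize(messages[:-4])] + recent
```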
The Economics of LLM Usage
Understanding token pricing isn't just a developer concern — it's essential for anyone building or managing agent systems. Costs can surprise you if you don't understand how they accumulate.
Token Pricing Basics
LLM providers charge per token, with a critical asymmetry: output tokens cost more than input tokens — typically 3–5x more. This matters because agents generate substantial output (reasoning traces, tool calls, responses) on every loop iteration.
| Component | Typical Cost (Frontier Model) |
|---|---|
| Input tokens | $10–15 per million tokens |
| Output tokens | $30–75 per million tokens |
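To make that asymmetry concrete, here is the arithmetic for a single call, using illustrative mid-range prices from the table above (not a quote from any particular provider):

```python
# Illustrative prices only; real prices vary by provider and model.
INPUT_PRICE = 12 / 1_000_000   # dollars per input token
OUTPUT_PRICE = 36 / 1_000_000  # dollars per output token (3x input)

def call_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# 3,000 input + 500 output tokens: ~$0.054. Note that the 500
# output tokens cost as much as 1,500 input tokens would.
print(f"${call_cost(3_000, 500):.3f}")
```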
How Context Window Usage Affects Cost
Every API call sends the full conversation context. As the conversation grows, you're paying for all previous messages again on each call. A 10-turn agent loop with growing context doesn't cost 10x — it can cost significantly more because each turn includes all prior context.
The Critical Insight for Agents: Loops Multiply Costs
This is the most important cost concept for agentic systems:
Each agent loop iteration is a separate API call. An agent that takes 10 steps to complete a task makes 10 API calls, each carrying the full (and growing) context.
A simple calculation: if a single LLM call costs $0.05, a 10-step agent loop doesn't cost $0.50 — it costs more like $0.80–$1.50 as context accumulates with each step.
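You can reproduce that range with a small simulation. The numbers below are assumptions chosen for illustration: the same prices as above, a 3,000-token starting prompt, 500 output tokens per step, and roughly 1,000 tokens of tool results added to the context each turn.

```python
# Illustrative prices and token counts; adjust for your provider.
INPUT_PRICE = 12 / 1_000_000   # dollars per input token
OUTPUT_PRICE = 36 / 1_000_000  # dollars per output token

total = 0.0
context = 3_000                 # tokens in the initial prompt
for step in range(10):
    output = 500                # tokens generated this step
    total += context * INPUT_PRICE + output * OUTPUT_PRICE
    # The next call re-sends everything so far, plus this step's
    # output and an assumed ~1,000-token tool result.
    context += output + 1_000

print(f"${total:.2f}")  # ~$1.35, vs ~$0.54 for 10 independent calls
```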
Cost Management Tips
- Set token budgets per request — Cap how many tokens an agent can consume before stopping or escalating
- Use cheaper models for simple steps — Route classification and simple decisions to smaller models, reserve frontier models for complex reasoning
- Minimize context accumulation — Summarize earlier steps rather than carrying full history
- Monitor and alert — Track cost-per-task, not just total spend. A runaway agent loop can burn through budget surprisingly fast
- Cache when possible — If the same tool call produces the same result, cache it rather than re-calling the API
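As a sketch of that last tip, the snippet below memoizes tool results in memory, keyed on the tool name and its arguments. All names here are hypothetical, and this is only safe for deterministic, read-only tools; anything time-sensitive or side-effecting should not be cached this way.

```python
import json

_tool_cache: dict = {}  # in-memory; a real agent might use Redis, etc.

def cached_tool_call(name, args, call_fn):
    # Key on the tool name plus its arguments, serialized stably.
    key = json.dumps({"tool": name, "args": args}, sort_keys=True)
    if key not in _tool_cache:
        _tool_cache[key] = call_fn(name, args)  # only call on a miss
    return _tool_cache[key]

# The second call with identical arguments hits the cache.
fake_weather = lambda name, args: f"Sunny in {args['city']}"
print(cached_tool_call("get_weather", {"city": "Oslo"}, fake_weather))
print(cached_tool_call("get_weather", {"city": "Oslo"}, fake_weather))
```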
LLM Limitations That Agentic Patterns Solve
Understanding LLM limitations is essential for designing effective agentic systems. Each limitation has a corresponding agentic pattern that addresses it.
1. No Persistent Memory
Limitation: LLMs start fresh with each conversation. They don't remember previous interactions.
Agentic solution: External memory systems — vector databases, conversation stores, and knowledge graphs that agents can read from and write to.
2. Knowledge Cutoff
Limitation: Training data has a cutoff date. The model doesn't know about recent events.
Agentic solution: Web search tools and RAG systems that let agents retrieve current information on demand.
3. Hallucination
Limitation: LLMs can generate confident but incorrect information.
Agentic solution: Tool-grounded responses where agents verify claims against real data sources, and self-checking patterns where agents review their own outputs.
4. Single-Turn Reasoning
Limitation: Complex problems often can't be solved in a single response.
Agentic solution: Multi-step planning and execution loops where agents break problems into subtasks and tackle them iteratively.
5. No Direct Action
Limitation: LLMs can only generate text — they can't interact with the world.
Agentic solution: Tool use / function calling that lets agents read files, query APIs, run code, and take real-world actions.
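To make that last limitation-and-solution pair concrete, here is a minimal sketch of the dispatch step in function calling. The JSON shape is purely illustrative (each provider defines its own tool-call format), and the tool itself is a toy:

```python
import json
from datetime import datetime, timezone

def get_current_time(fmt: str = "%Y-%m-%d %H:%M") -> str:
    return datetime.now(timezone.utc).strftime(fmt)

# Registry mapping tool names to real functions the agent can run.
TOOLS = {"get_current_time": get_current_time}

# Pretend the model emitted this tool call in its response:
model_output = '{"tool": "get_current_time", "args": {"fmt": "%Y-%m-%d"}}'

call = json.loads(model_output)
result = TOOLS[call["tool"]](**call["args"])
print(result)  # in a real loop, fed back to the model as a tool result
```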
The Pattern
Every major agentic AI pattern exists to solve a specific LLM limitation. As you progress through this training, you'll see how each pattern maps to the limitations listed here.
Section Recap
Before you move on, here's what to remember from this section:
- Tokens are the fundamental unit — LLMs read, think, and generate one token at a time
- Context windows define the LLM's working memory — everything must fit within this limit
- Temperature controls randomness — lower for reliability, higher for creativity
- Function calling is the bridge from chat to action — the LLM decides when and how to use tools
- LLMs have real limitations — no persistent memory, no real-time data, and potential for hallucination
- Model selection matters — match model capability and cost to your task's actual requirements