LLM Fundamentals for Agentic Systems
Core concepts of large language models that underpin agentic AI: how they work, what they can do, and their limitations.
Learning Objectives
- Understand how LLMs process and generate text
- Explain the role of context windows and token limits
- Describe function calling and tool use as the bridge to agentic behavior
- Recognize LLM limitations that agentic patterns address
From Introduction to LLM Fundamentals
In the introduction, you learned what makes AI "agentic" — the ability to observe, think, and act autonomously. But what powers that intelligence? The answer is Large Language Models (LLMs).
Before we can build agents, we need to understand the engine that drives them. This section breaks down how LLMs work, what they can do, and where their limits lie — all without requiring a PhD.
What Is a Large Language Model?
A Large Language Model (LLM) is a type of artificial intelligence built on a neural network that has been trained on enormous amounts of text. By learning patterns across billions of sentences, an LLM can generate coherent, contextually appropriate language — answering questions, summarizing documents, writing code, and much more.
Examples You May Already Know
- ChatGPT is powered by OpenAI's GPT-4 family of models
- Claude is built by Anthropic (and is the model powering many of the examples in this training)
- Gemini is Google DeepMind's large language model
What LLMs Are NOT
It is just as important to understand what LLMs are not:
- Not a database — they do not store or retrieve facts from a structured source; they generate text based on learned patterns
- Not a search engine — they do not browse the web in real time (unless given tools to do so)
- Not sentient — they have no awareness, feelings, or understanding; they predict the next most likely word in a sequence
Understanding these boundaries is essential. The agentic patterns in this training exist precisely to compensate for what LLMs cannot do on their own — like accessing live data, taking actions, and verifying facts.
How LLMs Work (Without the PhD)
Large Language Models are at the core of every agentic AI system. Here's what you need to know about how they work.
The Core Idea
An LLM is a neural network trained on vast amounts of text. Given a sequence of words, it predicts what comes next — but it does this so well that it can generate coherent reasoning, follow instructions, and even write code.
Key insight: LLMs don't "understand" in the way humans do. They've learned incredibly sophisticated patterns from training data that allow them to produce useful, contextually appropriate responses.
What Happens When You Send a Message
1. Tokenization: Your text is broken into tokens (roughly word-pieces). "Agentic AI is fascinating" becomes ~4-5 tokens (see the sketch after this list).
2. Processing: Each token passes through many layers of the neural network, building up a representation of the full context.
3. Generation: The model produces one token at a time, each choice influenced by everything that came before.
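You can see tokenization for yourself. The sketch below uses OpenAI's open-source tiktoken library (an assumption: it must be installed, e.g. `pip install tiktoken`); exact token counts vary between model families, but the word-piece idea is the same everywhere.

```python
# A quick look at tokenization, assuming the tiktoken package is
# installed. cl100k_base is one common encoding; other models use
# different ones, so counts will vary.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("Agentic AI is fascinating")
print(len(tokens))                        # a small handful of tokens
print([enc.decode([t]) for t in tokens])  # the word-pieces themselves
```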
Why This Matters for Agentic AI
Understanding this foundation helps explain both the capabilities and limitations of agentic systems:
- Capabilities: LLMs can follow complex instructions, reason through problems, and generate structured outputs (like tool calls)
- Limitations: They process text sequentially, have fixed context windows, and can "hallucinate" — generating plausible but incorrect information
The agentic patterns you'll learn in this training are designed to harness the capabilities while mitigating the limitations.
Context Windows: The LLM's Working Memory
The context window is the total amount of text an LLM can "see" at once — including both your input and its response. Think of it as the model's working memory.
Current Context Window Sizes
Context windows have grown dramatically:
| Model | Context Window | Equivalent |
|---|---|---|
| Claude Opus 4.6 | 200K tokens | ~150K words (~300 pages) |
| Gemini 2.5 Pro | 1M tokens | ~750K words (~1,500 pages) |
| GPT-4o | 128K tokens | ~96K words (~190 pages) |
Why Context Windows Matter for Agents
For agentic systems, the context window is critical because it determines:
- How much information an agent can consider when making decisions
- How many tool results can be accumulated during a multi-step task
- How long a conversation an agent can maintain coherently
The Challenge: Context Isn't Free
Larger context windows come with tradeoffs:
- Cost: Most APIs charge per token — more context means higher costs
- Speed: Processing more context takes more time
- Attention degradation: Models can struggle to find relevant information in very long contexts (the "lost in the middle" problem)
How Agentic Patterns Help
Agentic systems use several strategies to manage context effectively:
- RAG (Retrieval-Augmented Generation): Only fetch relevant information when needed
- Summarization: Compress long histories into concise summaries (see the sketch after this list)
- Tool delegation: Let specialized tools handle data-heavy operations
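To make the summarization strategy concrete, here is a minimal sketch of a history-trimming step. Everything here is illustrative: `count_tokens` is a crude heuristic (a real system would use the provider's tokenizer), and `summarize` is a stub standing in for an LLM call that compresses the old turns.

```python
def count_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def summarize(messages: list[str]) -> str:
    # Stub: a real implementation would ask an LLM to compress
    # these messages into a short summary.
    return f"[summary of {len(messages)} earlier messages]"

def trim_history(messages: list[str], budget: int = 8_000) -> list[str]:
    # If the history fits the budget, leave it alone.
    if sum(count_tokens(m) for m in messages) <= budget:
        return messages
    # Otherwise keep the most recent turns verbatim and compress
    # everything older into a single summary message.
    recent = messages[-4:]
    return [summarize(messages[:-4])] + recent
```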
The Economics of LLM Usage
Understanding token pricing isn't just a developer concern — it's essential for anyone building or managing agent systems. Costs can surprise you if you don't understand how they accumulate.
Token Pricing Basics
LLM providers charge per token, with a critical asymmetry: output tokens cost more than input tokens — typically 3–5x more. This matters because agents generate substantial output (reasoning traces, tool calls, responses) on every loop iteration.
| Component | Typical Cost (Frontier Model) |
|---|---|
| Input tokens | $10–15 per million tokens |
| Output tokens | $30–75 per million tokens |
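To make that asymmetry concrete, here is the arithmetic for a single call, using illustrative mid-range prices from the table above (not a quote from any particular provider):

```python
# Illustrative prices only; real prices vary by provider and model.
INPUT_PRICE = 12 / 1_000_000   # dollars per input token
OUTPUT_PRICE = 36 / 1_000_000  # dollars per output token (3x input)

def call_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# 3,000 input + 500 output tokens: ~$0.054. Note that the 500
# output tokens cost as much as 1,500 input tokens would.
print(f"${call_cost(3_000, 500):.3f}")
```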
How Context Window Usage Affects Cost
Every API call sends the full conversation context. As the conversation grows, you're paying for all previous messages again on each call. A 10-turn agent loop with growing context doesn't cost 10x — it can cost significantly more because each turn includes all prior context.
The Critical Insight for Agents: Loops Multiply Costs
This is the most important cost concept for agentic systems:
Each agent loop iteration is a separate API call. An agent that takes 10 steps to complete a task makes 10 API calls, each carrying the full (and growing) context.
A simple calculation: if a single LLM call costs $0.05, a 10-step agent loop doesn't cost $0.50 — it costs more like $0.80–$1.50 as context accumulates with each step.
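You can reproduce that range with a small simulation. The numbers below are assumptions chosen for illustration: the same prices as above, a 3,000-token starting prompt, 500 output tokens per step, and roughly 1,000 tokens of tool results added to the context each turn.

```python
# Illustrative prices and token counts; adjust for your provider.
INPUT_PRICE = 12 / 1_000_000   # dollars per input token
OUTPUT_PRICE = 36 / 1_000_000  # dollars per output token

total = 0.0
context = 3_000                 # tokens in the initial prompt
for step in range(10):
    output = 500                # tokens generated this step
    total += context * INPUT_PRICE + output * OUTPUT_PRICE
    # The next call re-sends everything so far, plus this step's
    # output and an assumed ~1,000-token tool result.
    context += output + 1_000

print(f"${total:.2f}")  # ~$1.35, vs ~$0.54 for 10 independent calls
```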
Cost Management Tips
- Set token budgets per request — Cap how many tokens an agent can consume before stopping or escalating
- Use cheaper models for simple steps — Route classification and simple decisions to smaller models, reserve frontier models for complex reasoning
- Minimize context accumulation — Summarize earlier steps rather than carrying full history
- Monitor and alert — Track cost-per-task, not just total spend. A runaway agent loop can burn through budget surprisingly fast
- Cache when possible — If the same tool call produces the same result, cache it rather than re-calling the API
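As a sketch of that last tip, the snippet below memoizes tool results in memory, keyed on the tool name and its arguments. All names here are hypothetical, and this is only safe for deterministic, read-only tools; anything time-sensitive or side-effecting should not be cached this way.

```python
import json

_tool_cache: dict = {}  # in-memory; a real agent might use Redis, etc.

def cached_tool_call(name, args, call_fn):
    # Key on the tool name plus its arguments, serialized stably.
    key = json.dumps({"tool": name, "args": args}, sort_keys=True)
    if key not in _tool_cache:
        _tool_cache[key] = call_fn(name, args)  # only call on a miss
    return _tool_cache[key]

# The second call with identical arguments hits the cache.
fake_weather = lambda name, args: f"Sunny in {args['city']}"
print(cached_tool_call("get_weather", {"city": "Oslo"}, fake_weather))
print(cached_tool_call("get_weather", {"city": "Oslo"}, fake_weather))
```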
LLM Limitations That Agentic Patterns Solve
Understanding LLM limitations is essential for designing effective agentic systems. Each limitation has a corresponding agentic pattern that addresses it.
1. No Persistent Memory
Limitation: LLMs start fresh with each conversation. They don't remember previous interactions.
Agentic solution: External memory systems — vector databases, conversation stores, and knowledge graphs that agents can read from and write to.
2. Knowledge Cutoff
Limitation: Training data has a cutoff date. The model doesn't know about recent events.
Agentic solution: Web search tools and RAG systems that let agents retrieve current information on demand.
3. Hallucination
Limitation: LLMs can generate confident but incorrect information.
Agentic solution: Tool-grounded responses where agents verify claims against real data sources, and self-checking patterns where agents review their own outputs.
4. Single-Turn Reasoning
Limitation: Complex problems often can't be solved in a single response.
Agentic solution: Multi-step planning and execution loops where agents break problems into subtasks and tackle them iteratively.
5. No Direct Action
Limitation: LLMs can only generate text — they can't interact with the world.
Agentic solution: Tool use / function calling that lets agents read files, query APIs, run code, and take real-world actions.
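To make that last limitation-and-solution pair concrete, here is a minimal sketch of the dispatch step in function calling. The JSON shape is purely illustrative (each provider defines its own tool-call format), and the tool itself is a toy:

```python
import json
from datetime import datetime, timezone

def get_current_time(fmt: str = "%Y-%m-%d %H:%M") -> str:
    return datetime.now(timezone.utc).strftime(fmt)

# Registry mapping tool names to real functions the agent can run.
TOOLS = {"get_current_time": get_current_time}

# Pretend the model emitted this tool call in its response:
model_output = '{"tool": "get_current_time", "args": {"fmt": "%Y-%m-%d"}}'

call = json.loads(model_output)
result = TOOLS[call["tool"]](**call["args"])
print(result)  # in a real loop, fed back to the model as a tool result
```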
The Pattern
Every major agentic AI pattern exists to solve a specific LLM limitation. As you progress through this training, you'll see how each pattern maps to the limitations listed here.
Section Recap
Before you move on, here's what to remember from this section:
- Tokens are the fundamental unit — LLMs read, think, and generate one token at a time
- Context windows define the LLM's working memory — everything must fit within this limit
- Temperature controls randomness — lower for reliability, higher for creativity
- Function calling is the bridge from chat to action — the LLM decides when and how to use tools
- LLMs have real limitations — no persistent memory, no real-time data, and potential for hallucination
- Model selection matters — match model capability and cost to your task's actual requirements