Lesson 1 — The Agent Loop
Building Agentic Systems · 1 of 7
Welcome. Across seven lessons we are going to build a small, honest agent framework called agentkit — provider-agnostic, dependency-free, and runnable end-to-end on your laptop with no API key. Today we pour the foundation: the loop everything else extends.
Course map: 1 foundation · 2 tools · 3–6 memory, planning, sub-agents, guardrails (4 introduces real budgets) · 7 real provider.
What is an agent, really?
Strip away the marketing and an LLM agent is one Python loop:
1. Send the conversation so far to a model.
2. The model replies with either plain text (we're done) or a tool call (do this thing).
3. If it asked for a tool, run the tool, append the result, and go back to step 1.
That's it. Memory, planning, sub-agents, guardrails — every "advanced" agent feature in the next six lessons is a knob bolted onto this loop. So we are going to write it from scratch, in about two dozen lines including the signature, and we are going to drive it with a fake model so the whole thing runs deterministically and for free.
Design rule: never hardcode a vendor
The agent loop must not know whether it is talking to OpenAI, Anthropic, an Ollama-hosted Llama 3, or our MockLLM. So before we touch the loop, we define a protocol — one method — and have every model implementation satisfy it. Same loop, swappable brain. We'll cash this in during lesson 7 when we plug in a real provider without changing a single line of loop.py.
Package layout
We build everything inside agentkit/. After this lesson the tree looks like:
agentkit/
├── __init__.py        # re-exports the public surface
├── types.py           # Message, ToolCall, Role
├── llm.py             # the LLM Protocol
├── tools.py           # Tool + ToolRegistry
├── loop.py            # run_agent: the loop
└── providers/
    ├── __init__.py
    └── mock.py        # MockLLM: scripted, deterministic, zero-cost
examples/
└── lesson1_hello.py   # the runnable demo
Step 1 — The shared types (agentkit/types.py)
The model, the loop, and every provider need to agree on what a "message" looks like. We use plain dataclasses — no Pydantic, no validators, just data:
from dataclasses import dataclass, field
from typing import Any, Literal

Role = Literal["system", "user", "assistant", "tool"]


@dataclass
class ToolCall:
    id: str  # provider-supplied; we echo it back
    name: str
    arguments: dict[str, Any]


@dataclass
class Message:
    role: Role
    content: str = ""
    tool_calls: list[ToolCall] = field(default_factory=list)
    tool_call_id: str | None = None  # set on role="tool" messages
Four roles, two structural rules:
- An assistant message carries content, tool_calls, or both.
- A tool message carries the result of one call and points back at it with tool_call_id.
That tool_call_id matters: when the model issues several calls in one turn (lesson 2 will hit this), the loop runs them all and the model needs to know which result belongs to which request.
Step 2 — The provider contract (agentkit/llm.py)
One method. That's the whole interface every model in this course will implement.
from typing import Any, Protocol, runtime_checkable

from .types import Message


@runtime_checkable
class LLM(Protocol):
    def complete(
        self,
        messages: list[Message],
        tools: list[dict[str, Any]],
    ) -> Message: ...
Each dict in tools is a tool spec with the keys name, description, and parameters (a JSON Schema object); pass [] when no tools are registered.
A Python Protocol is structural — anything with a matching complete method satisfies it; no inheritance required. @runtime_checkable lets us write isinstance(x, LLM) to verify it (we'll do that in a moment).
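A small sketch of what that structural check does and does not verify. The protocol is repeated here in simplified form (Any in place of Message) so the snippet runs alone; EchoLLM, WrongArity, and NotAnLLM are hypothetical classes for illustration only.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class LLM(Protocol):
    def complete(self, messages: list[Any], tools: list[dict[str, Any]]) -> Any: ...


class EchoLLM:
    """Hypothetical provider: no base class, just a matching method."""

    def complete(self, messages, tools):
        return messages[-1]


class WrongArity:
    """Has a method named complete, but with the wrong parameters."""

    def complete(self):
        return None


class NotAnLLM:
    pass


print(isinstance(EchoLLM(), LLM))     # True
print(isinstance(WrongArity(), LLM))  # True: only the method's presence is checked
print(isinstance(NotAnLLM(), LLM))    # False
```

The second result is worth remembering: a runtime isinstance check confirms that a complete attribute exists, not that its signature matches. Signature mismatches are a job for a static type checker.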
The return contract is the loop's contract too:
- If the returned Message has empty tool_calls, the loop stops and returns content as the final answer.
- If it has any tool_calls, the loop runs them, appends results, and asks the model again.
Step 3 — Tools (agentkit/tools.py)
A tool is just a Python function plus enough metadata to describe it to a model. The registry handles dispatch and — crucially — never lets a tool exception kill the loop:
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Tool:
    name: str
    description: str
    parameters: dict[str, Any]  # JSON-schema-style
    fn: Callable[..., Any]

    def spec(self) -> dict[str, Any]:
        return {
            "name": self.name,
            "description": self.description,
            "parameters": self.parameters,
        }


class ToolRegistry:
    def __init__(self, tools: list[Tool]):
        self._tools: dict[str, Tool] = {t.name: t for t in tools}

    def specs(self) -> list[dict[str, Any]]:
        return [t.spec() for t in self._tools.values()]

    def invoke(self, name: str, arguments: dict[str, Any]) -> str:
        if name not in self._tools:
            return f"error: unknown tool {name!r}"
        try:
            result = self._tools[name].fn(**arguments)
        except Exception as e:
            return f"error: {type(e).__name__}: {e}"
        return str(result)
Errors come back to the model as a string. The model gets to see what broke and decide whether to recover — exactly the behavior we want from a tool-using agent.
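Here is a rough sketch of the three outcomes a caller can see from dispatch. To stay self-contained it uses a standalone invoke function over a plain dict of callables, which is a trimmed stand-in for ToolRegistry.invoke rather than the class itself; the divide tool is a made-up example.

```python
from typing import Any, Callable


def invoke(tools: dict[str, Callable[..., Any]], name: str, arguments: dict[str, Any]) -> str:
    """Trimmed stand-in for ToolRegistry.invoke over a plain dict of functions."""
    if name not in tools:
        return f"error: unknown tool {name!r}"
    try:
        result = tools[name](**arguments)
    except Exception as e:
        return f"error: {type(e).__name__}: {e}"
    return str(result)


def divide(a: float, b: float) -> float:
    return a / b


tools = {"divide": divide}

ok = invoke(tools, "divide", {"a": 6, "b": 3})         # normal result, stringified
unknown = invoke(tools, "multiply", {"a": 6, "b": 3})  # name not registered
failed = invoke(tools, "divide", {"a": 1, "b": 0})     # exception inside the tool

for outcome in (ok, unknown, failed):
    print(outcome)
```

In all three cases the caller gets a string back and nothing propagates, so the loop can append it as a tool message and keep going.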
Note: we don't validate arguments against the parameters schema here — a malformed call from the model will raise inside fn(**arguments) and surface as a generic error string. Schema-driven validation is deliberately deferred (treat it as an exercise for now).
Step 4 — The loop (agentkit/loop.py)
This is the heart of lesson 1. Read it carefully — almost everything in the next six lessons will be expressed as something layered onto these few lines.
from dataclasses import dataclass
from typing import Callable, Optional

from .llm import LLM
from .tools import ToolRegistry
from .types import Message


@dataclass
class RunResult:
    answer: str
    messages: list[Message]
    turns: int


def run_agent(
    goal: str,
    llm: LLM,
    tools: ToolRegistry,
    system: str | None = None,
    max_turns: int = 10,
    on_event: Optional[Callable[[str, object], None]] = None,
) -> RunResult:
    emit = on_event or (lambda *_: None)
    messages: list[Message] = []
    if system:
        messages.append(Message(role="system", content=system))
    messages.append(Message(role="user", content=goal))
    emit("user", goal)
    specs = tools.specs()

    for turn in range(1, max_turns + 1):
        reply = llm.complete(messages, specs)
        messages.append(reply)
        emit("assistant", reply)

        if not reply.tool_calls:  # (A) done
            return RunResult(answer=reply.content, messages=messages, turns=turn)

        for call in reply.tool_calls:  # (B) keep going
            result = tools.invoke(call.name, call.arguments)
            messages.append(Message(role="tool",
                                    tool_call_id=call.id,
                                    content=result))
            emit("tool", {"call": call, "result": result})

    raise RuntimeError(f"agent exceeded max_turns={max_turns} without final answer")
Two branches inside the loop, of which only (A) exits normally:
- (A) the model spoke without asking for tools — return its text as the answer.
- (B) every tool call gets executed, every result gets appended, and we loop.
max_turns is a safety belt. Real agents need a budget — tokens, dollars, wall-clock. We'll replace this naive count with a real budget in lesson 4. on_event is a tracing hook so we can watch the loop think; if you don't pass one, emit is a no-op.
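To see the safety belt and the tracing hook in action, here is a sketch with a model that never stops asking for tools. The run function is a condensed copy of run_agent's control flow with tool execution stubbed out, and Reply and StubbornLLM are hypothetical stand-ins, so the snippet runs without the rest of agentkit.

```python
from dataclasses import dataclass, field


@dataclass
class Reply:
    """Trimmed stand-in for Message: just what the loop inspects."""
    content: str = ""
    tool_calls: list[str] = field(default_factory=list)


class StubbornLLM:
    """Hypothetical model that asks for a tool on every turn and never answers."""

    def __init__(self) -> None:
        self.calls = 0

    def complete(self, messages, tools):
        self.calls += 1
        return Reply(tool_calls=["noop"])


def run(llm, max_turns: int = 10, on_event=None) -> str:
    """Condensed copy of run_agent's control flow (tool execution stubbed out)."""
    emit = on_event or (lambda *_: None)
    messages: list[object] = []
    for turn in range(1, max_turns + 1):
        reply = llm.complete(messages, [])
        messages.append(reply)
        emit("assistant", reply)
        if not reply.tool_calls:
            return reply.content
        for call in reply.tool_calls:
            messages.append(f"result of {call}")
            emit("tool", call)
    raise RuntimeError(f"agent exceeded max_turns={max_turns} without final answer")


events: list[str] = []
llm = StubbornLLM()
try:
    run(llm, max_turns=3, on_event=lambda kind, payload: events.append(kind))
    outcome = "answered"
except RuntimeError as e:
    outcome = str(e)

print(outcome)
print(llm.calls, events)
```

The model is called exactly max_turns times, every turn is visible through the hook, and the run ends with an exception instead of spinning forever.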
Step 5 — A model we can actually run (agentkit/providers/mock.py)
Here is the trick that makes this course tractable: we ship a MockLLM that takes a script — a hand-written list of assistant replies — and returns them in order.
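from typing import Any

from ..types import Message


class MockLLM:
    def __init__(self, script: list[Message]):
        self._script = script
        self._i = 0  # index of the next scripted reply

    def complete(self, messages: list[Message], tools: list[dict[str, Any]]) -> Message:
        # Inputs are ignored; replies come back strictly in script order.
        if self._i >= len(self._script):
            raise RuntimeError("MockLLM script exhausted")
        reply = self._script[self._i]
        self._i += 1
        return reply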
Determinism by construction. Every example in every lesson runs the same way on your machine, in CI, or in a coffee shop with no Wi-Fi. Zero cost. Lesson 7 swaps MockLLM for an AnthropicLLM (or OpenAILLM) and the loop above doesn't change one character.
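A quick sketch of the mock's two behaviors worth knowing before you script it: replies come back in order regardless of input, and running past the end of the script raises. The class is repeated without type hints so the snippet runs alone, and plain strings stand in for Message objects.

```python
class MockLLM:
    """Same behavior as the MockLLM above, repeated so this snippet runs alone."""

    def __init__(self, script):
        self._script = script
        self._i = 0

    def complete(self, messages, tools):
        if self._i >= len(self._script):
            raise RuntimeError("MockLLM script exhausted")
        reply = self._script[self._i]
        self._i += 1
        return reply


llm = MockLLM(script=["first", "second"])  # plain strings stand in for Messages

# The mock ignores its inputs entirely; only the call count matters.
replies = [llm.complete([], []), llm.complete(["anything"], [{"name": "add"}])]
print(replies)

try:
    llm.complete([], [])
    exhausted = False
except RuntimeError as e:
    exhausted = str(e) == "MockLLM script exhausted"
print(exhausted)
```

The exhaustion error is useful in practice: if the loop asks for more turns than you scripted, the test fails loudly instead of returning a stale reply.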
Step 6 — Drive it (examples/lesson1_hello.py)
Goal: "What is 12 + 30?". One tool: add(a, b). Script the mock to (1) call the tool, then (2) answer.
from agentkit import Message, Tool, ToolCall, ToolRegistry, run_agent
from agentkit.providers.mock import MockLLM


def add(a: int, b: int) -> int:
    return a + b


def trace(kind: str, payload: object) -> None:
    if kind == "user":
        print(f"[user] {payload}")
    elif kind == "assistant":
        if payload.tool_calls:
            for c in payload.tool_calls:
                print(f"[assistant] -> tool_call {c.name}({c.arguments}) id={c.id}")
        else:
            print(f"[assistant] {payload.content}")
    elif kind == "tool":
        print(f"[tool] {payload['call'].name} -> {payload['result']!r}")


tools = ToolRegistry([
    Tool(name="add",
         description="Add two integers and return the sum.",
         parameters={"type": "object",
                     "properties": {"a": {"type": "integer"},
                                    "b": {"type": "integer"}},
                     "required": ["a", "b"]},
         fn=add),
])

llm = MockLLM(script=[
    Message(role="assistant",
            tool_calls=[ToolCall(id="call_1", name="add",
                                 arguments={"a": 12, "b": 30})]),
    Message(role="assistant", content="12 + 30 = 42."),
])

result = run_agent(
    goal="What is 12 + 30?",
    llm=llm,
    tools=tools,
    system="You are a careful calculator. Use the tools provided.",
    on_event=trace,
)

# Summary lines shown at the end of the run output.
print(f"final answer: {result.answer!r}")
print(f"turns: {result.turns}")
print(f"messages: {len(result.messages)} in transcript")
One honest caveat: the mock is scripted, so the second assistant reply is hardcoded — it would say "12 + 30 = 42." even if add returned 0. A real model would actually read the tool message and ground its answer on it; that's the behavior we're standing in for here.
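For contrast, here is a sketch of a stand-in that does read the tool message. ReactiveMock is a hypothetical variant, not something agentkit ships: each script step is a function of the transcript instead of a fixed reply. Msg is a trimmed substitute for Message so the snippet runs alone.

```python
from dataclasses import dataclass


@dataclass
class Msg:
    """Trimmed stand-in for Message."""
    role: str
    content: str = ""


class ReactiveMock:
    """Hypothetical variant: each script step is a function of the transcript."""

    def __init__(self, steps):
        self._steps = list(steps)
        self._i = 0

    def complete(self, messages, tools):
        if self._i >= len(self._steps):
            raise RuntimeError("ReactiveMock script exhausted")
        step = self._steps[self._i]
        self._i += 1
        return step(messages)


def answer_from_last_tool(messages):
    # Read the most recent tool result instead of hardcoding the sum.
    last_tool = next(m for m in reversed(messages) if m.role == "tool")
    return Msg(role="assistant", content=f"12 + 30 = {last_tool.content}.")


llm = ReactiveMock(steps=[answer_from_last_tool])
transcript = [Msg("user", "What is 12 + 30?"), Msg("tool", "0")]  # a broken add
reply = llm.complete(transcript, [])
print(reply.content)
```

With a broken add that returns 0, this stand-in repeats the wrong result, which is exactly the grounding the fixed script cannot show. It stays deterministic because each step is still a pure function of the transcript.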
Step 7 — Run it
$ python3 examples/lesson1_hello.py
[user] What is 12 + 30?
[assistant] -> tool_call add({'a': 12, 'b': 30}) id=call_1
[tool] add -> '42'
[assistant] 12 + 30 = 42.
final answer: '12 + 30 = 42.'
turns: 2
messages: 5 in transcript
That is the loop, working end-to-end. Read the transcript top to bottom and you can see every move it made:
- Turn 1: the model asked for add(12, 30). The loop executed it, appended '42' as a tool message, and looped.
- Turn 2: the model produced plain content, so the loop returned.
Five messages in the transcript: system, user, assistant(tool_call), tool, assistant(answer). That ordering is the canonical shape every provider expects.
One more check — the protocol holds
The whole architectural claim of this lesson is that the loop talks to any LLM. Let's prove the mock satisfies the protocol:
$ python3 -c "
from agentkit import LLM
from agentkit.providers.mock import MockLLM
print('isinstance(MockLLM(...), LLM):', isinstance(MockLLM(script=[]), LLM))
"
isinstance(MockLLM(...), LLM): True
That True is the seam every future provider will plug into.
What we built
- Message/ToolCall: the wire format every layer agrees on.
- LLM: the one-method protocol that keeps the loop vendor-neutral.
- Tool/ToolRegistry: typed callables with safe dispatch.
- MockLLM: deterministic replies so the whole course runs for free.
- run_agent: the small loop the rest of the framework hangs off.
What's next — lesson 2: Tools That Do Real Work
We're going to give the agent a tool with side effects (a tiny in-memory filesystem), let the model issue multiple calls per turn, and watch the loop fan them out correctly using those tool_call_ids we set up today. Same loop. New behavior.
See you there.
— The Resident