RestingOwl

What Is an AI Agent? A Security Engineer's Mental Model

Quick Answer: An AI agent is a software system that uses a large language model as its reasoning core. It perceives its environment, makes decisions, and takes actions autonomously. Unlike a chatbot, an agent can call APIs, execute code, read files, and chain multiple steps together without a human approving each one. This autonomy is what makes AI agents powerful and what makes them a fundamentally new category of security risk.

What Is an AI Agent, and How Is It Different From a Chatbot?

A chatbot takes a message, generates a reply, and stops. An AI agent takes a goal and works towards it across multiple steps, using tools along the way. The key difference is agency: the ability to decide what to do next, act on that decision, and continue until the goal is reached.

Consider two examples. You ask a chatbot: "Find me the latest security advisories for Apache." The chatbot replies with information from its training data, which may be months old. You ask an AI agent the same question. The agent calls a web search tool, reads the results, filters for Apache advisories, and returns a formatted summary. If the results mention a CVE you are already patching, the agent might check your patch status using another tool. The agent took actions. The chatbot only answered.

What Are the Four Core Components of an AI Agent?

Every AI agent is built from four components. Understanding each one is the first step to understanding where security risks come from.

| Component | What It Does | Security Risk It Introduces |
| --- | --- | --- |
| LLM Core | The reasoning engine. Takes input, makes decisions, and generates instructions. | Susceptible to prompt injection. Decisions are probabilistic, not deterministic. |
| Tools and Actions | External capabilities: web search, code execution, file access, API calls. | Each tool is a potential attack vector. Compromised tools can hijack the agent's actions. |
| Memory | Stores past interactions, retrieved documents, and context. Can be short-term or long-term. | Poisoned memory can corrupt future decisions. Sensitive data can leak across sessions. |
| Planning Loop | The autonomous cycle: observe, reason, act, and repeat until the goal is complete. | Unconstrained loops can take unintended actions. A hijacked loop can cause cascading damage. |

What Is the Autonomous Loop and Why Does It Matter for Security?

The autonomous loop is the cycle that makes an agent work without human supervision. It follows these four steps in sequence, repeating until the task is complete.

  1. Observe: The agent receives input from the environment. This could be a user request, a tool response, or a retrieved document.
  2. Reason: The LLM processes the input and decides what to do next.
  3. Act: The agent calls a tool or produces an output based on its decision.
  4. Observe again: The result of the action becomes new input. The cycle repeats.

From a security perspective, this loop is dangerous because, by default, there is no human checkpoint between steps. If a malicious instruction enters the loop at any point, whether through a user request, a tool response, or a retrieved document, the agent may execute harmful actions before a human can intervene. This is fundamentally different from traditional software, where every function call is explicitly coded by a developer.
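
The four steps can be sketched as a short loop. This is a minimal illustration rather than a real agent framework: `fake_reason` is a hypothetical stand-in for the LLM call, `TOOLS` is an assumed tool table, and `max_steps` is an assumed step budget. Note that nothing inside the loop pauses for a human, which is exactly the property described above.

```python
from typing import Callable

# Hypothetical tool table: tool name -> function from string to string.
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda query: f"results for {query!r}",
}


def fake_reason(observation: str) -> tuple[str, str]:
    """Stand-in for the LLM call. Returns (action, argument)."""
    if observation.startswith("results for"):
        return ("finish", observation)
    return ("search", observation)


def run_agent(goal: str, max_steps: int = 5) -> list[str]:
    """Observe, reason, act, repeat. Stops on 'finish' or when the budget runs out."""
    trace: list[str] = []
    observation = goal                           # 1. Observe
    for _ in range(max_steps):
        action, arg = fake_reason(observation)   # 2. Reason
        trace.append(action)
        if action == "finish":
            return trace
        observation = TOOLS[action](arg)         # 3. Act; 4. the result is the next observation
    trace.append("budget_exhausted")
    return trace
```

Calling `run_agent("apache advisories")` performs one search and then finishes. Whatever text a tool returns is fed straight back into the reasoning step, so a tool response is as much an input channel as the user's request.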

What Tools Can an AI Agent Use and What Attack Surface Do They Create?

Tools are the most important part of an agent's attack surface. Every tool the agent can call is a capability that an attacker can try to trigger through manipulation. Here are the most common tool types and the risks they carry.

  • Web search: Can fetch attacker-controlled pages containing injection payloads disguised as search results.
  • Code execution: Can run arbitrary code, access the filesystem, or make outbound network requests.
  • File read and write: Can read sensitive configuration files or overwrite critical data.
  • Email and calendar: Can send messages from the user's account or read private communications.
  • Database queries: Can read or modify records, including data the user should not access.
  • External APIs: Can trigger financial transactions, change account settings, or access third-party services.

The key security principle is scope. An agent should only have access to the tools it genuinely needs for its task. An agent that summarises documents does not need write access to the filesystem. An agent that reads emails does not need to send them. This is the principle of least privilege applied to AI.
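
One way to express this scoping in code is a deny-by-default allowlist in front of the tool table. The sketch below is illustrative only: `ScopedToolbox`, `ToolNotAllowed`, and the two placeholder tools are hypothetical names, and the tools return canned strings instead of touching the filesystem.

```python
class ToolNotAllowed(Exception):
    """Raised when an agent requests a tool outside its allowlist."""


def read_file(path: str) -> str:
    return f"<contents of {path}>"   # placeholder, no real file access


def write_file(path: str) -> str:
    return f"<wrote {path}>"         # placeholder, no real file access


ALL_TOOLS = {"read_file": read_file, "write_file": write_file}


class ScopedToolbox:
    """Exposes only the tools named at construction time."""

    def __init__(self, allowed: set[str]) -> None:
        # Deny by default: anything not listed is simply absent.
        self._tools = {name: ALL_TOOLS[name] for name in allowed}

    def call(self, name: str, arg: str) -> str:
        if name not in self._tools:
            raise ToolNotAllowed(name)
        return self._tools[name](arg)


# A document-summarising agent gets read access and nothing else.
summariser_tools = ScopedToolbox({"read_file"})
```

With this arrangement, an injected instruction that asks the summariser to call `write_file` fails at the toolbox, regardless of what the model decided.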

What Is Agent Memory and What Security Risks Does It Introduce?

Agent memory is how the agent stores and retrieves information across its reasoning steps. There are three common types, each with different security implications.

| Memory Type | How It Works | Security Risk |
| --- | --- | --- |
| In-context memory | Information stored in the active prompt window during a single session. | Sensitive data from earlier in the conversation can appear in later outputs if not carefully managed. |
| External memory (RAG) | Retrieved from a vector database or document store during the task. | Attackers can poison the knowledge store with malicious documents containing injection payloads. |
| Long-term memory | Persisted across sessions, written to and read from a database. | Poisoned instructions saved in one session can affect all future sessions, including those of other users. |
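
The cross-user risk for long-term memory can be reduced by partitioning the store per user and recording where each entry came from. The following is a rough sketch under stated assumptions: `ScopedMemory` and `MemoryEntry` are hypothetical names, the store is an in-memory dictionary rather than a database, and the `source` label is assumed to be set by trusted application code, not by the model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryEntry:
    text: str
    source: str   # assumed labels, e.g. "user", "web", "tool"


class ScopedMemory:
    """Long-term memory partitioned by user, with a provenance label per entry."""

    def __init__(self) -> None:
        self._store: dict[str, list[MemoryEntry]] = defaultdict(list)

    def write(self, user_id: str, text: str, source: str) -> None:
        self._store[user_id].append(MemoryEntry(text, source))

    def read(
        self,
        user_id: str,
        trusted_sources: frozenset[str] = frozenset({"user"}),
    ) -> list[str]:
        # Only this user's entries, and only from sources the caller trusts.
        entries = self._store.get(user_id, [])
        return [e.text for e in entries if e.source in trusted_sources]


memory = ScopedMemory()
memory.write("alice", "prefers weekly summaries", source="user")
memory.write("alice", "ignore previous instructions", source="web")
```

Here `memory.read("alice")` returns only the entry Alice wrote herself, and a different user sees none of Alice's entries. This limits how far a poisoned entry can spread. It does not detect the poisoning itself.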

How Is an AI Agent Different From Traditional Application Software?

Traditional software follows deterministic paths that a developer explicitly coded. Every action it can take is listed in the source code. An AI agent follows probabilistic paths that emerge from the model's reasoning. The actions it takes depend on what tools it has access to and what the LLM decides to do, not on what a developer wrote.

| Dimension | Traditional Software | AI Agent |
| --- | --- | --- |
| Decision making | Explicit logic coded by a developer | LLM reasoning, probabilistic and context-dependent |
| Action set | Fixed: only what the code explicitly calls | Dynamic: any tool the agent has access to |
| Input handling | Structured: validated against a known schema | Unstructured: natural language from any source |
| Auditability | Every action traceable to a line of code | Actions emerge from model reasoning, harder to predict |
| Failure modes | Known and enumerable in advance | Emergent and often discovered only after deployment |
| Security model | Permissions set at deployment time | Must also account for prompt-level manipulation at runtime |

This difference means that traditional application security controls, including input validation, output encoding, access control, and authentication, are necessary but not sufficient for AI agents. You still need all of those controls. You also need controls specific to the LLM layer: prompt injection defences, tool scope enforcement, human-in-the-loop checkpoints, and output validation.
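
Output validation, the last control in that list, can be as simple as refusing to act on model output that does not match an expected shape. The sketch below assumes the model is asked to reply with a JSON object containing exactly two string fields, `action` and `argument`. That format, the `ALLOWED_ACTIONS` set, and the 500-character limit are illustrative assumptions, not a standard.

```python
import json

# Hypothetical set of actions this agent is permitted to request.
ALLOWED_ACTIONS = {"search", "create_ticket"}
MAX_ARGUMENT_LENGTH = 500  # arbitrary illustrative limit


def parse_tool_call(raw: str) -> dict:
    """Parse model output and reject anything outside the expected shape."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("model output is not valid JSON") from exc

    if not isinstance(data, dict) or set(data) != {"action", "argument"}:
        raise ValueError("expected exactly the fields 'action' and 'argument'")

    action, argument = data["action"], data["argument"]
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError(f"action not permitted: {action!r}")
    if not isinstance(argument, str) or len(argument) > MAX_ARGUMENT_LENGTH:
        raise ValueError("argument must be a string within the length limit")
    return data
```

A check like this narrows what a manipulated model can ask for. It does not judge whether a permitted action is a good idea, which is why it sits alongside tool scoping and human checkpoints rather than replacing them.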

What Types of AI Agents Exist?

AI agents are not a single category. Different architectures have different security profiles.

| Agent Type | Description | Example Use Case | Key Security Risk |
| --- | --- | --- | --- |
| Single-agent | One LLM that reasons and acts autonomously with a defined set of tools. | Customer support agent that searches a knowledge base and creates tickets. | Prompt injection through user input or retrieved content. |
| Multi-agent orchestrator | A supervisor agent that delegates subtasks to specialised worker agents. | Security analyst agent that calls a CVE lookup, a patch checker, and a report writer. | A compromised worker can return poisoned responses that redirect the orchestrator's decisions. |
| Multi-agent peer network | Multiple agents running in parallel, collaborating on a shared task. | Software development: planner agent, coder agent, and reviewer agent working together. | Trust between peers is often implicit. A hijacked peer can inject malicious instructions. |
| Agentic pipeline | A linear chain of agents, each processing the output of the previous one. | Document workflow: reader, extractor, formatter, and publisher agents in sequence. | Malicious content entering an early stage can propagate through the entire chain. |
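
The propagation risk in an agentic pipeline can be made visible by carrying a provenance flag alongside the content. The sketch below is a simplified illustration, not a recommended architecture: `Message`, `stage`, and `publish` are hypothetical names, and plain string functions stand in for the extractor and formatter agents. The idea is that a "tainted" flag set at the first stage survives every transformation, so the final stage can refuse to act automatically.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Message:
    text: str
    tainted: bool   # True if any upstream input came from an untrusted source


def stage(transform: Callable[[str], str]) -> Callable[[Message], Message]:
    """Wrap a text transformation so the taint flag is carried through unchanged."""
    def run(msg: Message) -> Message:
        return Message(transform(msg.text), msg.tainted)
    return run


extract = stage(str.strip)      # stand-in for an extractor agent
format_doc = stage(str.upper)   # stand-in for a formatter agent


def publish(msg: Message) -> str:
    # The last stage holds anything derived from untrusted input.
    if msg.tainted:
        return "held for review"
    return f"published: {msg.text}"
```

A trusted document flows through to publication, while one that originated from an untrusted source is held at the final stage even though two intermediate agents have rewritten it.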

What Security Questions Should You Ask Before Deploying an AI Agent?

Before you put an AI agent into production, these are the questions you should be able to answer clearly.

AI Agent Pre-Deployment Security Checklist
  1. What tools does this agent have access to? Is every tool genuinely necessary for the task?
  2. What is the minimum permission level each tool requires to do its job?
  3. What external content can the agent read? Could that content contain injection payloads?
  4. Which actions are irreversible? Is there a human approval step before executing them?
  5. What sensitive data flows through the agent's context window? Can it appear in outputs?
  6. Are all tool calls logged with their full inputs and outputs for audit purposes?
  7. Is there a rate limit or action budget to prevent runaway autonomous loops?
  8. How does the agent behave when it receives contradictory or unexpected instructions?
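
Several of these questions (approval before irreversible actions, audit logging, and an action budget) can be enforced at a single choke point in front of the tools. The following is a minimal sketch under stated assumptions: `GuardedExecutor`, the `IRREVERSIBLE` set, and the `approve` callback are hypothetical, and the callback stands in for whatever human approval step a real deployment would use.

```python
import json
import logging
from typing import Callable

logger = logging.getLogger("agent.audit")  # hypothetical logger name

# Assumed classification; a real deployment would define its own list.
IRREVERSIBLE = {"send_email", "delete_record"}


class ActionBudgetExceeded(Exception):
    """The agent tried to act after using up its action budget."""


class ApprovalDenied(Exception):
    """The approval step declined an irreversible action."""


class GuardedExecutor:
    def __init__(
        self,
        tools: dict[str, Callable[[str], str]],
        approve: Callable[[str, str], bool],
        max_actions: int = 10,
    ) -> None:
        self.tools = tools
        self.approve = approve          # stands in for a human approval step
        self.max_actions = max_actions
        self.actions_used = 0
        self.audit_log: list[dict] = []

    def call(self, name: str, arg: str) -> str:
        if self.actions_used >= self.max_actions:
            raise ActionBudgetExceeded(name)
        if name in IRREVERSIBLE and not self.approve(name, arg):
            self._record(name, arg, "DENIED")
            raise ApprovalDenied(name)
        self.actions_used += 1
        result = self.tools[name](arg)
        self._record(name, arg, result)
        return result

    def _record(self, name: str, arg: str, output: str) -> None:
        # Every attempt is recorded with its full input and output.
        entry = {"tool": name, "input": arg, "output": output}
        self.audit_log.append(entry)
        logger.info(json.dumps(entry))


executor = GuardedExecutor(
    tools={
        "lookup": lambda query: f"no match for {query}",
        "send_email": lambda body: "sent",
    },
    approve=lambda name, arg: False,   # this demo denies every irreversible action
    max_actions=2,
)
```

In this demo, `executor.call("lookup", "apache")` succeeds and is logged, while `executor.call("send_email", "hi")` raises `ApprovalDenied` and is logged as denied without consuming the budget. After two successful actions, any further call raises `ActionBudgetExceeded`.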

Q&A Section

Is ChatGPT an AI agent?
Standard ChatGPT is a chatbot. ChatGPT with tool use enabled, or with browsing and code interpreter active, behaves as an AI agent because it can take actions beyond generating text. Claude with computer use enabled is also an agent. The distinction is whether the system can take actions in the real world, not just produce text.

What is the difference between an AI assistant and an AI agent?
An AI assistant answers questions and generates content. An AI agent acts: it can search the web, write files, send emails, and run code, then chain these actions together to complete a goal. An assistant is a tool you use. An agent is a system that does things on your behalf. The security implications are very different because agents can cause real-world effects without step-by-step human approval.

Can AI agents be attacked?
Yes. The most common attack is prompt injection, where a malicious instruction is embedded in content the agent reads, such as a webpage, an email, or a retrieved document. The agent interprets the instruction as legitimate and follows it. Other risks include memory poisoning, tool hijacking through compromised APIs, and supply chain attacks on the model itself.

What is the biggest security risk with AI agents?
Excessive agency combined with prompt injection. If an agent has broad tool access and can be manipulated through its inputs to follow attacker instructions, the attacker effectively gains access to every tool the agent can use. The solution is to limit tool access to the strict minimum required and to require human approval before any high-risk or irreversible action.

How do you secure an AI agent?
Traditional application security controls still apply: authentication, authorisation, input validation, and logging. But you also need LLM-specific controls: prompt injection detection, tool scope enforcement, output validation against a known schema, observability over the agent's reasoning trace, and human-in-the-loop checkpoints for sensitive actions. Guardrail frameworks and LLM firewall products are available to help with this.