The pitch for AI agents is intoxicating: instead of asking a chatbot for an answer and copying it somewhere, you hand a goal to an agent and it does the work — reads your inbox, queries the database, files the ticket, sends the email. This week the whole industry leaned into it. Google shipped Managed Agents in its API, and Microsoft and Nvidia announced new Windows "containment" primitives specifically so agents can act on your PC without running wild (Nvidia).
That last phrase is the whole story. The moment an AI stops talking and starts acting, it stops being a clever chatbot and becomes a new kind of user on your network — one that can be tricked, over-permissioned, or simply wrong. If you're new to the concept, our explainer on what AI agents actually are is a good starting point. This piece is about the part the demos skip: the security conversation to have before you let an agent touch anything that matters.
What actually changed: from "answer" to "action"
A traditional chatbot has a small blast radius. The worst a bad answer does is mislead you, and you're still the one who clicks the button. An agent is different by design: it has tools (it can call APIs, run code, browse the web), memory (it remembers across steps), and autonomy (it decides what to do next without asking every time).
Those three properties are exactly what make agents useful — and exactly what make them risky. The OWASP GenAI Security Project, which publishes the closest thing the industry has to a consensus risk list, released a 2026 Top 10 for agentic applications precisely because these systems introduce failure modes that go beyond simple "the model said something wrong" — things like tool misuse, excessive agency, memory poisoning, and prompt injection (OWASP).
Let's take the two that matter most in practice.
The headline risk: prompt injection
Prompt injection is the security problem that won't go away in 2026, and it's worth understanding because it breaks a comfortable assumption. With a normal app, you trust the code and distrust the user input. With an agent, the "instructions" and the "data" arrive in the same text stream — so any content the agent reads (a web page, an email, a code comment, a PDF) can contain instructions, and the model may follow them.
This isn't theoretical. A real, patched example: security researcher Johann Rehberger demonstrated CVE-2025-53773, where hidden instructions placed in files a developer opened could make GitHub Copilot in Visual Studio Code silently edit the project's settings.json to switch on "YOLO mode" (chat.tools.autoApprove: true) — disabling the human-approval prompts and letting the agent run shell commands. Microsoft rated it 7.8 (High) and shipped a fix in its August update (Embrace The Red, NVD).
The uncomfortable part: there is no known complete fix for prompt injection — only defense-in-depth. So the right mental model isn't "can I block the attack?" It's "when the agent gets bad instructions, what's the worst it can do?" — which leads straight to the second risk.
The quiet risk: excessive agency
"Excessive agency" is OWASP's term for the most common own-goal: giving an agent more power than its job requires. An agent with read/write access to your production database, permission to send email as you, and a corporate credit card on file is a problem whether it's hijacked by an attacker or just makes a bad call on its own.
The fix is the oldest principle in security, applied to a new kind of actor: least privilege. OWASP frames the agentic version as least agency — grant the minimum autonomy and access needed for the defined task, nothing more. An agent that summarizes support tickets does not need write access to billing. An agent that drafts replies does not need permission to send them.
Where agent data actually leaks
If you strip away the hype, most real agent data-exposure incidents trace back to a short list of patterns. Here's how they map to defenses:
| What goes wrong | Why it happens | The practical guardrail |
|---|---|---|
| Agent reads data outside its job | Access is scoped at the app, not the data layer | Enforce permissions on the data itself; give the agent its own scoped identity, not a human's broad one |
| Stolen agent credentials | Long-lived API keys/tokens sit in configs | Short-lived, scoped credentials; rotate; never hard-code keys |
| Injection → exfiltration | Agent is tricked into leaking data it can read | Limit what it can read; restrict outbound calls; human approval for sensitive actions |
| "It just did something" | Too much autonomy, no audit trail | Human-in-the-loop for high-impact steps; log every tool call |
None of this is exotic. It's the same identity, access, and logging discipline that protects human staff — applied to a non-human worker that acts faster and never gets tired. If your organization hasn't nailed the basics yet, start with our cybersecurity basics for small businesses guide first; agents amplify whatever security posture you already have.
A pre-deployment checklist
Before an agent goes live with access to real systems, you should be able to answer "yes" to these:
- Scoped identity? The agent has its own credentials with the narrowest permissions for its task — not a borrowed admin login.
- Least agency? It can only call the specific tools it needs; everything else is denied by default.
- Human-in-the-loop for the dangerous stuff? Sending money, deleting data, emailing customers, changing config → require a human approval.
- Untrusted input treated as untrusted? Anything the agent reads from the web, email, or user uploads is assumed to possibly contain injected instructions.
- Sandboxed? Code execution and file/system access run in a contained environment (the exact thing Microsoft and Nvidia are now building into Windows).
- Logged and reversible? Every tool call is recorded, and high-impact actions can be reviewed or undone.
If you can't answer yes, the agent isn't ready for that level of access — give it a smaller job.
The honest business case
Security caution isn't anti-AI; it's what keeps these projects alive. Gartner has predicted that over 40% of agentic AI projects will be canceled by the end of 2027, citing escalating costs, unclear business value, and inadequate risk controls — and it warned of "agent washing," estimating that only a small fraction of self-described agentic vendors are the real thing (Gartner).
The flip side of Gartner's own forecast is that the surviving projects are real: the firm still expects a meaningful share of enterprise software to embed agentic AI by 2028. The winners won't be whoever deploys agents fastest — it'll be whoever deploys them with guardrails that let them keep running after the first incident.
Bottom line
AI agents are genuinely useful, and 2026 is the year they move from demo to deployment — see today's Tech Pulse for how fast the platforms are racing here. But "it can take actions for you" and "it can take actions for an attacker" are the same sentence. Treat an agent like a new employee you've never met: give it the smallest possible access, watch what it does, and make sure a human signs off on anything you'd regret. Do that, and agents become a powerful teammate instead of your most over-permissioned liability.
We link primary sources (OWASP, NVD, Gartner) so you can verify, and we flag company claims and forecasts as exactly that. Security guidance here is general — adapt it to your own systems and risk tolerance.



