Not all agents carry the same risk. An internal research assistant with read-only access is fundamentally different from a customer-facing agent that can send emails and modify databases. Understanding this spectrum—and where your agent falls on it—is the first step to building something that’s both useful and safe.
[Figure: Risk vs. Utility Tradeoff, showing that broader scope and more tool access increase both utility and risk]

Security starts at design time

During the agent design process, think through:
  • What capabilities does this agent need?
  • What data does it need access to?
  • Who will it communicate with?
  • What could go wrong, and how bad would it be?
These questions shape everything: the agent’s instructions, its capabilities, the guardrails you configure, and how much human oversight is needed.

The intern metaphor

Agent design can be compared to hiring a new intern. Does the intern need access to the company bank account? Probably not. Most jobs don’t require it, and removing that access entirely eliminates a whole category of risk. What if they need access to customer data AND email? Now you have a potential leak vector: the intern could accidentally share sensitive information externally, or be manipulated into doing so. You need to think about:
  1. Do they really need both? Can you separate the tasks so one role has data access and another handles email?
  2. If yes, what safeguards? Maybe require approval for external emails. Maybe restrict email to specific domains. Maybe monitor closely at first.
The same logic applies to agents. The platform gives you tools to manage these risks—guardrails, approval requirements, monitoring—but you need to decide which ones are appropriate for each agent.
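As a concrete illustration, here is a minimal Python sketch of the email safeguards described above: restrict recipients to specific domains, and fall back to human approval for anything external. The names (ALLOWED_DOMAINS, require_approval, send_email) are made up for this example; a real deployment would use the platform's own guardrail and approval features.

```python
# Illustrative only: ALLOWED_DOMAINS, require_approval, and send_email are
# hypothetical names, not part of any platform API.

ALLOWED_DOMAINS = {"example.com"}  # domains the agent may email without approval

def require_approval(to: str, subject: str) -> bool:
    """Stand-in for a human approval step (ticket queue, chat prompt, dashboard)."""
    print(f"[approval needed] external email to {to!r}: {subject!r}")
    return False  # blocked until a human explicitly approves

def send_email(to: str, subject: str, body: str) -> None:
    domain = to.rsplit("@", 1)[-1].lower()
    if domain not in ALLOWED_DOMAINS and not require_approval(to, subject):
        raise PermissionError(f"external email to {to} requires human approval")
    print(f"sent to {to}: {subject}")  # stand-in for the real email call

send_email("teammate@example.com", "Weekly status", "All on track.")  # allowed
# send_email("someone@other.org", "Hi", "...")  # would stop and ask for approval
```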

Risk vs Reward

| Factor | Lower Risk | Higher Risk |
| --- | --- | --- |
| Scope | Specific, narrow task | Broad responsibilities |
| Tools | Read-only access | Write, send, call, delete |
| Communication | Internal team only | Customer-facing, public |
| Data access | Public information | Confidential, PII, financial |
| Reversibility | Easy to undo | Permanent or hard to reverse |
An agent that scores “lower risk” across all factors needs minimal guardrails. An agent that scores “higher risk” on several factors may be more capable and valuable, but needs serious attention to security configuration.
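If it helps to make that call explicit, you can treat the table as a checklist. The sketch below is purely illustrative (the factor names and thresholds are not a platform feature): count how many higher-risk factors apply and map the count to a rough oversight tier.

```python
# Hypothetical checklist based on the table above; not a platform feature.

HIGHER_RISK_FACTORS = {
    "broad_scope",      # broad responsibilities rather than one narrow task
    "write_tools",      # tools that write, send, call, or delete
    "external_comms",   # customer-facing or public communication
    "sensitive_data",   # confidential, PII, or financial data
    "hard_to_reverse",  # actions that are permanent or hard to undo
}

def oversight_level(flags: set) -> str:
    """Map the number of higher-risk factors to a rough oversight tier."""
    score = len(flags & HIGHER_RISK_FACTORS)
    if score == 0:
        return "minimal guardrails"
    if score <= 2:
        return "guardrails plus monitoring"
    return "guardrails, human approval, and close monitoring"

print(oversight_level(set()))  # e.g. the ticket priority agent below
print(oversight_level({"write_tools", "external_comms", "sensitive_data"}))
```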

Examples across the spectrum

Low-risk: Ticket priority agent

Imagine an agent whose only job is to set the priority field on incoming support tickets. That’s it—no email, no customer data, no external communication. Just read the ticket and set a priority. What’s the worst that can happen? A ticket gets the wrong priority. Not the end of the world. This agent is safe essentially out of the box, just by nature of its limited scope and tools. Minimal guardrails needed.
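To show how small that surface is, here is a hypothetical sketch of this agent's entire tool set: one write action with a closed set of values and nothing that reaches outside the ticket system. The names are made up for illustration.

```python
# Hypothetical tool definition; names and the ticket API are made up.

from enum import Enum

class Priority(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    URGENT = "urgent"

def set_ticket_priority(ticket_id: str, priority: Priority) -> None:
    """The agent's only write action; the worst outcome is a wrong priority."""
    print(f"ticket {ticket_id} set to priority {priority.value}")

set_ticket_priority("TCK-1042", Priority.HIGH)
```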

High-risk: Customer inquiry handler

Now imagine an agent that handles incoming customer inquiries. It has access to a public-facing email address and a database with customer information, and it can reach out to people and make decisions with real impact. This could be a VERY valuable agent, but it takes more work to make it safe:
  • Better models — Use the best available (usually also the most expensive)
  • Better instructions — Spend more time crafting and iterating on clear guidelines
  • More testing — Test edge cases and adversarial scenarios thoroughly
  • Guardrails — Code-enforced constraints like whitelists that can’t be bypassed (see the sketch below)
  • Human-in-the-loop — Require approval for important decisions
  • More monitoring — Watch closely, especially early on
  • Gradual autonomy — Start with limited scope (like a trainee), expand as it proves itself
Higher reward, but more work to manage the risk.
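As a sketch of what the guardrail and human-in-the-loop bullets could look like in code: every tool call passes through a whitelist check, high-stakes tools require approval, and every call is logged for monitoring. The tool names, the approval hook, and the logging setup are illustrative assumptions, not the platform's actual API.

```python
# Illustrative guardrail layer; tool names, the approval hook, and the
# logging setup are made up for this sketch.

import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent.guardrails")

ALLOWED_TOOLS = {"lookup_customer", "draft_reply", "issue_refund"}  # whitelist
NEEDS_APPROVAL = {"issue_refund"}                                   # human-in-the-loop

def human_approved(tool: str, args: dict) -> bool:
    """Stand-in for an approval queue; deny by default until a human says yes."""
    log.info("approval requested for %s(%s)", tool, args)
    return False

def guarded_call(tool: str, args: dict, tools: dict):
    if tool not in ALLOWED_TOOLS:
        raise PermissionError(f"tool {tool!r} is not on the whitelist")
    if tool in NEEDS_APPROVAL and not human_approved(tool, args):
        raise PermissionError(f"tool {tool!r} requires human approval")
    log.info("calling %s(%s)", tool, args)  # monitoring trail
    return tools[tool](**args)

tools = {"lookup_customer": lambda customer_id: {"id": customer_id, "tier": "gold"}}
print(guarded_call("lookup_customer", {"customer_id": "C-17"}, tools))
```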

The reality: Most agents are in between

Most agents fall somewhere between these extremes. The most crucial decisions are usually best left to humans, while the agent handles the grunt work—the boring, repetitive tasks—and provides data and insights the human can use.

Who decides?

A crucial part of agent design is deciding who makes decisions—the agent or the human. This is a spectrum:
[Figure: Decision spectrum from agent in control to human in control, with four modes: agent decides; agent decides and informs human; agent suggests and waits for approval; agent asks human to decide]
| Mode | When to use | Example |
| --- | --- | --- |
| Agent decides | Low-stakes, reversible actions where speed matters | Setting ticket priority |
| Agent decides, informs human | Medium-stakes actions where visibility is important | Sending internal status updates |
| Agent suggests, waits for approval | Higher-stakes decisions where human judgment adds value | Issuing a customer refund |
| Agent asks human to decide | Critical decisions, edge cases, ambiguous situations | Escalating a complaint to leadership |
You can adjust this as you gain confidence in the agent. Start with more human oversight, then gradually give the agent more autonomy as it proves itself.
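One illustrative way to picture this spectrum in code is to attach a decision mode to each type of action and default to the most cautious mode for anything unlisted. The mode names and the policy table below are assumptions made for this sketch, not how the platform exposes the feature.

```python
# Illustrative only; mode names and the policy table are made up.

from enum import Enum, auto

class Mode(Enum):
    AGENT_DECIDES = auto()        # act immediately
    DECIDE_AND_INFORM = auto()    # act, then notify a human
    SUGGEST_AND_WAIT = auto()     # propose, act only after approval
    ASK_HUMAN = auto()            # hand the decision to a human

POLICY = {
    "set_ticket_priority": Mode.AGENT_DECIDES,
    "post_status_update":  Mode.DECIDE_AND_INFORM,
    "issue_refund":        Mode.SUGGEST_AND_WAIT,
    "escalate_complaint":  Mode.ASK_HUMAN,
}

def mode_for(action: str) -> Mode:
    """Unknown actions fall back to the most cautious mode."""
    return POLICY.get(action, Mode.ASK_HUMAN)

for action in ("set_ticket_priority", "issue_refund", "delete_account"):
    print(action, "->", mode_for(action).name)
```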

Guiding principles

  • Principle of Least Privilege — Give agents only the capabilities and data they need for their job—no more. An agent that doesn’t have email access can’t accidentally send a bad email.
  • Principle of Earned Trust — Start narrow and expand as the agent proves itself. Begin with approval requirements, then relax them once you’re confident in the agent’s behavior (see the sketch below).
Advanced agents with broad scope CAN be safe—it just requires more careful design and ongoing attention. Don’t be afraid of powerful agents; be thoughtful about how you build them.
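To make earned trust concrete, here is a hypothetical sketch that treats autonomy as staged configuration: the agent starts with a least-privilege tool set and approval on everything, and each promotion explicitly grants more tools and removes gates. The stage contents are illustrative, not a platform feature.

```python
# Hypothetical staged-autonomy configuration; stage contents are illustrative.

STAGES = [
    {   # stage 0: least privilege, everything gated
        "tools": {"lookup_customer"},
        "approval_required": {"lookup_customer"},
    },
    {   # stage 1: may draft and send replies, sending still needs approval
        "tools": {"lookup_customer", "draft_reply", "send_reply"},
        "approval_required": {"send_reply"},
    },
    {   # stage 2: routine replies are autonomous, refunds remain gated
        "tools": {"lookup_customer", "draft_reply", "send_reply", "issue_refund"},
        "approval_required": {"issue_refund"},
    },
]

def config_for(stage: int) -> dict:
    """Clamp to the last defined stage so every promotion is an explicit choice."""
    return STAGES[min(stage, len(STAGES) - 1)]

print(config_for(0))  # start narrow, with heavy oversight
print(config_for(2))  # expanded after the agent has proven itself
```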
