Risk vs. Utility Tradeoff

Risk management is a tradeoff between utility and safety. Four factors set where an agent lands on that tradeoff:
- Job scope — How broad is the agent’s responsibility?
- Tool access — What capabilities does the agent have?
- Guardrails — What constraints are enforced?
- Oversight — How much monitoring and approval is required?
Guiding Principles
Principle of Least Privilege
Give agents only the tools and data they need for their job.
Principle of Earned Trust
Start with narrow scope and expand as the agent proves itself.
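
As a rough illustration of how the two principles could combine, the sketch below starts an agent with only the tools its job needs and unlocks one more tool after a streak of clean runs. The job name, tool names, and the threshold of 20 runs are all hypothetical; none of them come from the platform.

```python
from dataclasses import dataclass, field

# Hypothetical job, tool names, and threshold, chosen only for illustration.
JOB_TOOLS = {"support-triage": {"read_ticket", "tag_ticket"}}
EXPANSION_TOOLS = {"support-triage": {"reply_to_customer"}}
REQUIRED_CLEAN_RUNS = 20  # assumed value; tune per deployment


@dataclass
class AgentProfile:
    job: str
    clean_runs: int = 0
    tools: set = field(init=False, default_factory=set)

    def __post_init__(self):
        # Least privilege: start with only the tools the job needs.
        self.tools = set(JOB_TOOLS[self.job])

    def record_run(self, ok: bool) -> None:
        # Earned trust: count consecutive clean runs; a failure resets the streak.
        self.clean_runs = self.clean_runs + 1 if ok else 0
        if self.clean_runs >= REQUIRED_CLEAN_RUNS:
            self.tools |= EXPANSION_TOOLS.get(self.job, set())

    def can_use(self, tool: str) -> bool:
        return tool in self.tools
```

In this sketch a failed run resets the streak but does not revoke tools that were already granted; whether to revoke on failure is a separate policy decision.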
Risk Factors
| Factor | Lower Risk | Higher Risk |
|---|---|---|
| Scope | Specific task | Broad responsibilities |
| Tools | Read-only access | Write/send/call capabilities |
| External contact | Internal only | Customer-facing |
| Data sensitivity | Public data | Confidential information |
| Reversibility | Easy to undo | Permanent actions |
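
One way to put the table to work is to record, for each agent, which factors fall in the Higher Risk column and to scale oversight accordingly. The sketch below is only an assumption about how that might look: the field names, the unweighted count, and the thresholds are illustrative and not part of the platform.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AgentRiskProfile:
    # One flag per row of the table; True means the Higher Risk column applies.
    broad_scope: bool
    can_write: bool
    customer_facing: bool
    confidential_data: bool
    irreversible_actions: bool


def higher_risk_factors(profile: AgentRiskProfile) -> list:
    """Return the names of the factors flagged as higher risk, in table order."""
    return [name for name, flagged in asdict(profile).items() if flagged]


def oversight_level(profile: AgentRiskProfile) -> str:
    """Map the number of higher-risk factors to an oversight level.

    The thresholds are arbitrary placeholders, not a recommendation.
    """
    count = len(higher_risk_factors(profile))
    if count == 0:
        return "spot-check"
    if count <= 2:
        return "monitor closely"
    return "require approval"
```

A simple count treats every factor as equally important. In practice a single factor, such as irreversible actions, may justify stricter oversight on its own.
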
Mitigation Strategies
Better models
Use the most capable models for high-stakes tasks. The default model selection is optimized for agentic behavior.
Better instructions
Spend more time on clear, detailed instructions with explicit boundaries.
More testing
Test extensively before deployment, especially on edge cases.
Guardrails
Configure technical constraints (whitelists, limits) that are enforced by code.
Human approval
Require manual approval for sensitive actions.
Monitoring
Watch what agents do, especially early on.
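
The guardrail, approval, and monitoring strategies above can be combined in a single check that runs before every action. The following is a minimal sketch under stated assumptions: the action names, the allowed domain, the email limit, and the `ActionGate` class are all hypothetical and do not describe the platform's actual configuration.

```python
from dataclasses import dataclass, field

# Hypothetical values for illustration only.
ALLOWED_ACTIONS = {"send_email", "issue_refund"}
ALLOWED_RECIPIENT_DOMAINS = {"example.com"}
MAX_EMAILS_PER_RUN = 5
NEEDS_APPROVAL = {"issue_refund"}


@dataclass
class ActionGate:
    emails_sent: int = 0
    log: list = field(default_factory=list)

    def check(self, action: str, args: dict, approved: bool = False) -> str:
        decision = self._decide(action, args, approved)
        # Monitoring: record every decision, including blocked ones.
        self.log.append((action, decision))
        return decision

    def _decide(self, action: str, args: dict, approved: bool) -> str:
        # Guardrail: deny anything that is not explicitly allowed.
        if action not in ALLOWED_ACTIONS:
            return "blocked: action not on allowlist"
        if action == "send_email":
            domain = args["to"].rsplit("@", 1)[-1].lower()
            if domain not in ALLOWED_RECIPIENT_DOMAINS:
                return "blocked: recipient not on allowlist"
            if self.emails_sent >= MAX_EMAILS_PER_RUN:
                return "blocked: email limit reached"
            self.emails_sent += 1
        # Human approval: sensitive actions wait until a person signs off.
        if action in NEEDS_APPROVAL and not approved:
            return "pending: human approval required"
        return "allowed"
```

Because the checks live in code rather than in the agent's instructions, they hold even if the model misreads or ignores its prompt. For example, `ActionGate().check("issue_refund", {"amount": 10})` returns the pending decision until it is called again with `approved=True`.
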
The Security Agent
The platform includes a security agent that works in the background:
- Reviews incoming events and agent plans
- Operates independently of the agent’s context
- Can veto actions that seem unsafe
- Prevents prompt injection attacks

Figure: The security agent's assessment in the activity log.
Practical Recommendations
1. Start conservative: narrow scope, limited tools, close monitoring.
2. Test thoroughly: verify behavior before expanding capabilities.
3. Expand gradually: add tools and scope incrementally.
4. Monitor continuously: watch for unexpected patterns even after deployment.
Our experience shows that advanced agents with broad scope can be safe; they just require more careful setup and ongoing attention.

