The Security Agent
The platform includes a security agent that works in the background and reviews every plan before execution:
- Agent receives an event (email, webhook, etc.)
- Agent creates a plan for how to respond
- Security agent reviews the plan independently
- Security agent can approve, modify, or veto the plan
- Only approved plans are executed
The security agent operates independently of the main agent’s context, preventing prompt injection attacks from influencing security decisions.
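The review flow above can be sketched roughly as follows. This is a minimal illustration, not the platform's implementation: `Plan`, `Review`, `Verdict`, `review_plan`, and the `forbidden` action set are hypothetical names chosen for the example. The point it shows is that the reviewer receives only the plan, never the main agent's context.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Set


class Verdict(Enum):
    APPROVE = "approve"
    MODIFY = "modify"
    VETO = "veto"


@dataclass
class Plan:
    actions: List[str]


@dataclass
class Review:
    verdict: Verdict
    plan: Optional[Plan] = None  # the plan to run; None when vetoed


def review_plan(plan: Plan, forbidden: Set[str]) -> Review:
    """Hypothetical reviewer: it is given the plan only, not the event text."""
    allowed = [a for a in plan.actions if a not in forbidden]
    if not allowed:
        return Review(Verdict.VETO)  # nothing safe remains
    if len(allowed) < len(plan.actions):
        return Review(Verdict.MODIFY, Plan(allowed))  # strip unsafe steps
    return Review(Verdict.APPROVE, plan)


def execute_if_approved(plan: Plan, forbidden: Set[str]) -> List[str]:
    """Return the actions that would run; a vetoed plan runs nothing."""
    review = review_plan(plan, forbidden)
    if review.plan is None:
        return []
    return list(review.plan.actions)
```

In this sketch a modified plan is executed in its reduced form, which is one possible reading of "approve, modify, or veto".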
Capability Guardrails
Sensitive capabilities can be configured with constraints, such as email whitelists that restrict external communication.
Email Whitelist
Restrict which domains the agent can email:

| Setting | Behavior |
|---|---|
| Whitelist only | Can only email approved domains |
| Whitelist + approval | Other domains require human approval |
| No restriction | Can email any domain |
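The three settings in the table could map onto a check like the one below. This is a sketch under assumptions: `EmailMode`, `check_email`, and the returned strings are hypothetical, and `allowed_domains` is assumed to hold lowercase domain names.

```python
from enum import Enum
from typing import Set


class EmailMode(Enum):
    WHITELIST_ONLY = "whitelist_only"
    WHITELIST_PLUS_APPROVAL = "whitelist_plus_approval"
    NO_RESTRICTION = "no_restriction"


def check_email(recipient: str, mode: EmailMode, allowed_domains: Set[str]) -> str:
    """Return 'allow', 'needs_approval', or 'block' for one recipient."""
    # Take everything after the last '@' as the domain and normalize case.
    domain = recipient.rsplit("@", 1)[-1].lower()
    if mode is EmailMode.NO_RESTRICTION or domain in allowed_domains:
        return "allow"
    if mode is EmailMode.WHITELIST_PLUS_APPROVAL:
        return "needs_approval"  # a human decides for unlisted domains
    return "block"  # WHITELIST_ONLY and the domain is not listed
```

For example, with `allowed_domains={"example.com"}`, a message to `bob@other.example` is blocked under "Whitelist only" and routed to a human under "Whitelist + approval".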
SMS Limits
- Maximum messages per day
- Approved recipient lists
- Require approval for new numbers
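One way the three SMS constraints might combine is sketched below. The class `SmsGuardrail`, its fields, and the order in which the checks run are assumptions for illustration; in particular, blocking unknown numbers when approval is turned off is a guess at sensible behavior, not documented platform behavior.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class SmsGuardrail:
    max_per_day: int
    approved_recipients: Set[str] = field(default_factory=set)
    require_approval_for_new: bool = True
    sent_today: int = 0  # in this sketch the caller resets this each day

    def check(self, number: str) -> str:
        """Return 'allow', 'needs_approval', or 'block' for one message."""
        if self.sent_today >= self.max_per_day:
            return "block"  # daily cap reached
        if number not in self.approved_recipients:
            # Assumption: unknown numbers are either escalated or refused.
            return "needs_approval" if self.require_approval_for_new else "block"
        return "allow"

    def record_sent(self) -> None:
        """Count one sent message against the daily cap."""
        self.sent_today += 1
```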
Call Approval
- Require human approval before making calls
- Restrict to specific numbers
- Set calling hours
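The call constraints could be evaluated along these lines. `check_call` and its parameters are hypothetical, and the sketch assumes the calling window falls within a single day (it does not handle windows that cross midnight).

```python
from datetime import time
from typing import Set


def check_call(
    number: str,
    now: time,
    allowed_numbers: Set[str],
    start: time,
    end: time,
    require_approval: bool = True,
) -> str:
    """Return 'allow', 'needs_approval', or 'block' for one outbound call."""
    if number not in allowed_numbers:
        return "block"  # not on the list of permitted numbers
    if not (start <= now < end):
        return "block"  # outside calling hours
    return "needs_approval" if require_approval else "allow"
```

For example, with calling hours of 09:00 to 17:00, a call at 18:30 to a permitted number is blocked before any approval request is raised.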
How Guardrails Work
Guardrails are enforced by code in the platform.
Team-Level Guardrails
(Planned feature) Team admins can set guardrails that apply across all agents:
- Whitelist of allowed agent capabilities
- Team-level email domain restrictions
- Global approval requirements
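Because this feature is only planned, its behavior is not specified here. Purely as an illustration, one plausible design is for team settings to narrow, never widen, what each agent may do. The `Guardrails` type and `apply_team_guardrails` function below are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Guardrails:
    capabilities: FrozenSet[str]
    email_domains: FrozenSet[str]
    require_approval: bool


def apply_team_guardrails(agent: Guardrails, team: Guardrails) -> Guardrails:
    """Assumed merge rule: the stricter of the two settings always wins."""
    return Guardrails(
        # Only capabilities allowed at both levels remain.
        capabilities=agent.capabilities & team.capabilities,
        # Only domains allowed at both levels remain.
        email_domains=agent.email_domains & team.email_domains,
        # Approval is required if either level requires it.
        require_approval=agent.require_approval or team.require_approval,
    )
```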
Configuring Guardrails
1. Go to agent settings. Navigate to the capabilities section.
2. Select capability. Choose the capability to configure (e.g., Email).
3. Set constraints. Configure whitelists, limits, or approval requirements.
4. Test behavior. Verify the guardrails work as expected.
Guardrails are now active. Test by attempting a blocked action to confirm.
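If the check is scripted rather than done by hand, it might look like the sketch below. `GuardrailBlocked`, `guarded_send`, and the injected `send` callable are stand-ins invented for this example; the platform's real interface may differ. The test confirms two things: the blocked action raises, and nothing was actually sent.

```python
from typing import Callable, Set


class GuardrailBlocked(Exception):
    """Raised in this sketch when a guardrail refuses an action."""


def guarded_send(recipient: str, allowed_domains: Set[str], send: Callable[[str], None]) -> None:
    """Send only if the recipient's domain is whitelisted."""
    domain = recipient.rsplit("@", 1)[-1].lower()
    if domain not in allowed_domains:
        raise GuardrailBlocked(f"domain not allowed: {domain}")
    send(recipient)


def test_blocked_domain_is_not_sent() -> None:
    sent = []  # records every recipient that was actually emailed
    try:
        guarded_send("someone@blocked.example", {"example.com"}, sent.append)
    except GuardrailBlocked:
        pass  # expected: the guardrail refused the action
    else:
        raise AssertionError("blocked action was allowed")
    assert sent == []  # and no message left the system
```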
Best Practices
Start restrictive
Begin with tight guardrails and loosen as needed.
Require approval for external communication
Any message to external parties should have oversight.
Set sensible limits
Prevent runaway behavior with rate limits.
Review blocked actions
Check what’s being blocked to identify false positives.
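A quick way to spot likely false positives is to count which targets are blocked most often. The sketch below assumes blocked actions are available as a list of dictionaries with `decision` and `target` keys; that record shape and the function name `most_blocked_targets` are assumptions, not a documented log format.

```python
from collections import Counter
from typing import Dict, List, Tuple


def most_blocked_targets(events: List[Dict[str, str]], top: int = 5) -> List[Tuple[str, int]]:
    """Return the most frequently blocked targets, highest count first."""
    counts = Counter(e["target"] for e in events if e["decision"] == "block")
    return counts.most_common(top)
```

A target that appears near the top of this list and belongs to a legitimate partner is a candidate for adding to the whitelist.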

