
How it works
The security agent uses LLM-based reasoning (not fixed rules) to evaluate whether a planned action is safe. For security-sensitive triggers, here’s the flow:- Trigger arrives — The agent receives an event that requires security review (e.g., an incoming email)
- Planning phase — The agent creates a plan for how to respond.
- Security review — The security agent independently reviews the plan against the agent’s instructions.
- Decision — The security agent returns a verdict: safe or unsafe, with reasoning.
- Execution or block — Safe plans execute. Unsafe plans are blocked and logged to the diary and activity log.
The security agent operates in a separate context from the main agent. This is critical: if someone tries to manipulate the main agent through a malicious email, the security agent reviews the resulting plan from a clean context and can block suspicious actions.
What it checks for
The security agent evaluates plans for:| Check | What it looks for |
|---|---|
| Data exposure | Actions that might expose or leak data the agent has access to |
| Instruction tampering | Attempts to modify the agent’s instructions or configuration |
| Unauthorized communication | External messages that seem outside the agent’s normal scope |
| Infinite loops | Patterns like two agents stuck emailing each other back and forth |
When a plan is blocked
When the security agent blocks a plan:- The action does not execute
- The block is logged to the activity log with the security agent’s reasoning
- The agent is told the action was blocked for security reasons
- You can review the diary entry to understand what happened and why
Example in action
The Activity Monitoring page shows a complete example of the security agent in action, including:- The incoming trigger (an email)
- The agent’s interpretation and plan
- The security agent’s assessment
- The execution (or blocking) of the plan
FAQ
Does the security agent check every action?
Does the security agent check every action?
The security agent reviews plans for security-sensitive triggers, particularly incoming emails. Chat conversations with authenticated team members rely on other safeguards like guardrails and approval requirements.
Can the security agent be bypassed by prompt injection?
Can the security agent be bypassed by prompt injection?
This is exactly what it’s designed to prevent. The security agent operates in a separate context from the main agent—it doesn’t see the malicious content that might have been in the triggering email. It only sees the plan that resulted and evaluates whether that plan is consistent with the agent’s instructions.
What if the security agent blocks a legitimate action?
What if the security agent blocks a legitimate action?
Check the diary entry to understand why it was blocked. If the action is legitimate, you may need to update the agent’s instructions to make the intended behavior clearer, or adjust the scope of what the agent should do.

