Skip to main content
The platform includes a security agent that works in the background to review agent plans before execution. This provides an independent layer of protection against unsafe or unauthorized actions—even if the main agent has been manipulated through prompt injection.
The security agent reviews plans before execution, showing its assessment in the activity log

How it works

The security agent uses LLM-based reasoning (not fixed rules) to evaluate whether a planned action is safe. For security-sensitive triggers, here’s the flow:
  1. Trigger arrives — The agent receives an event that requires security review (e.g., an incoming email)
  2. Planning phase — The agent creates a plan for how to respond.
  3. Security review — The security agent independently reviews the plan against the agent’s instructions.
  4. Decision — The security agent returns a verdict: safe or unsafe, with reasoning.
  5. Execution or block — Safe plans execute. Unsafe plans are blocked and logged to the diary and activity log.
The security agent operates in a separate context from the main agent. This is critical: if someone tries to manipulate the main agent through a malicious email, the security agent reviews the resulting plan from a clean context and can block suspicious actions.

What it checks for

The security agent evaluates plans for:
CheckWhat it looks for
Data exposureActions that might expose or leak data the agent has access to
Instruction tamperingAttempts to modify the agent’s instructions or configuration
Unauthorized communicationExternal messages that seem outside the agent’s normal scope
Infinite loopsPatterns like two agents stuck emailing each other back and forth
The evaluation compares the proposed plan against the agent’s instructions. If the plan seems inconsistent with the agent’s stated purpose, it gets flagged.

When a plan is blocked

When the security agent blocks a plan:
  1. The action does not execute
  2. The block is logged to the activity log with the security agent’s reasoning
  3. The agent is told the action was blocked for security reasons
  4. You can review the diary entry to understand what happened and why
If the blocked action was legitimate, you may need to update the agent’s instructions to make the intended behavior clearer.

Example in action

The Activity Monitoring page shows a complete example of the security agent in action, including:
  • The incoming trigger (an email)
  • The agent’s interpretation and plan
  • The security agent’s assessment
  • The execution (or blocking) of the plan

FAQ

The security agent reviews plans for security-sensitive triggers, particularly incoming emails. Chat conversations with authenticated team members rely on other safeguards like guardrails and approval requirements.
This is exactly what it’s designed to prevent. The security agent operates in a separate context from the main agent—it doesn’t see the malicious content that might have been in the triggering email. It only sees the plan that resulted and evaluates whether that plan is consistent with the agent’s instructions.
Check the diary entry to understand why it was blocked. If the action is legitimate, you may need to update the agent’s instructions to make the intended behavior clearer, or adjust the scope of what the agent should do.

Learn more