The platform includes automated attack detection that screens incoming trigger content before the agent acts on it. This provides protection against manipulation attempts, especially prompt injection through channels like email and SMS where anyone can send content to the agent.
How it works
For security-sensitive triggers, the platform adds an attack detection step before execution:

- Trigger arrives: The agent receives an event from an untrusted channel (e.g., an incoming email or SMS)
- Attack detection: The platform analyses the trigger content for manipulation attempts
- Decision: If an attack is detected, the trigger is blocked and admins are alerted. Otherwise, the agent proceeds to act on the event.
Attack detection runs in a separate context from the main agent. This is critical: the detection evaluates the raw trigger content before the agent has a chance to process it, catching manipulation attempts before they can influence the agent’s behavior.
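
The flow above can be pictured with a small sketch. This is not the platform's code: `Trigger`, `Verdict`, `handle_trigger`, and the keyword-based `toy_detector` are hypothetical names, used only to illustrate that screening runs on the raw content in a step that is separate from the agent.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Trigger:
    channel: str  # e.g. "email" or "sms" (hypothetical labels)
    content: str  # raw, untrusted payload


@dataclass(frozen=True)
class Verdict:
    is_attack: bool
    reasoning: str


def toy_detector(content: str) -> Verdict:
    """Stand-in for the detection step; a keyword check keeps the sketch runnable."""
    if "ignore previous instructions" in content.lower():
        return Verdict(True, "content tries to override the agent's instructions")
    return Verdict(False, "no manipulation attempt found")


def handle_trigger(
    trigger: Trigger,
    detect: Callable[[str], Verdict],
    run_agent: Callable[[Trigger], str],
) -> str:
    # The detector sees only the raw content; the agent has not been invoked yet.
    verdict = detect(trigger.content)
    if verdict.is_attack:
        # Blocked triggers never reach the agent.
        return f"blocked: {verdict.reasoning}"
    return run_agent(trigger)
```

The point of passing `detect` and `run_agent` as separate callables is that the detector's decision cannot be influenced by anything the agent does with the content, because the agent only runs after the verdict is in.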
What it checks for
The attack detection evaluates trigger content for:

| Check | What it looks for |
|---|---|
| Prompt injection | Attempts to override the agent’s instructions through crafted input |
| Data exfiltration | Instructions designed to trick the agent into leaking data |
| Jailbreak | Attempts to bypass the agent’s safety boundaries |
| Reconnaissance | Probing to discover what the agent has access to |
When an attack is blocked
When the platform detects an attack:

- The agent does not act on the trigger
- The block is logged to the activity log with the detection reasoning
- Alert emails are sent to configured recipients
- A diary entry is written documenting the blocked attack
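
As a rough illustration of the four effects listed above, the sketch below records a block against an in-memory workspace. Everything here is an assumption made for the example: `Workspace`, `record_blocked_attack`, the field names, and the log entry shape are hypothetical and do not describe the platform's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Workspace:
    alert_recipients: list[str]
    activity_log: list[dict] = field(default_factory=list)
    outbox: list[tuple[str, str]] = field(default_factory=list)  # (recipient, subject)
    diary: list[str] = field(default_factory=list)


def record_blocked_attack(ws: Workspace, channel: str, reasoning: str) -> None:
    # The agent is not called for a blocked trigger, so there is no agent step here.
    # Log the block together with the detection reasoning.
    ws.activity_log.append(
        {"event": "attack_blocked", "channel": channel, "reasoning": reasoning}
    )
    # Queue one alert email per configured recipient.
    for recipient in ws.alert_recipients:
        ws.outbox.append((recipient, f"Attack blocked on {channel}"))
    # Write a diary entry documenting the blocked attack.
    ws.diary.append(f"Blocked a suspected attack via {channel}: {reasoning}")
```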
When does it run?
Attack detection runs automatically for:

- Incoming emails: Anyone can send an email to your agent
- Incoming SMS: Anyone can text your agent’s number
- Script escalations: When a script escalates to the full agent pipeline (unless the script explicitly opts out)
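
The rule above amounts to a small decision function. The sketch below is illustrative only: the source labels (`"email"`, `"sms"`, `"script_escalation"`) and the `script_opted_out` flag are hypothetical names, not identifiers from the platform.

```python
# Channels where anyone can send content to the agent (hypothetical labels).
UNTRUSTED_CHANNELS = {"email", "sms"}


def needs_attack_detection(source: str, script_opted_out: bool = False) -> bool:
    """Return True if a trigger from `source` should be screened before the agent runs."""
    if source in UNTRUSTED_CHANNELS:
        return True
    if source == "script_escalation":
        # Escalations are screened unless the script explicitly opts out.
        return not script_opted_out
    # Other sources, such as scheduled tasks, are treated as controlled in this sketch.
    return False
```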
Configuring alerts
Workspace administrators can configure attack detection alert behavior in Workspace → Settings.

| Setting | What it does |
|---|---|
| Alert recipients | Choose whether alerts go to all workspace admins, admins plus additional emails, or a custom email list |
| Include blocked content | Optionally include the blocked trigger payload in alert emails for debugging |
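
To make the two settings concrete, here is a minimal sketch of how they could combine when an alert is sent. These settings are configured in the workspace UI; the `AlertSettings` type, the `RecipientMode` values, the defaults, and both helper functions are assumptions made for this example and are not a platform API.

```python
from dataclasses import dataclass, field
from enum import Enum


class RecipientMode(Enum):
    ALL_ADMINS = "all_admins"
    ADMINS_PLUS_ADDITIONAL = "admins_plus_additional"
    CUSTOM_LIST = "custom_list"


@dataclass
class AlertSettings:
    mode: RecipientMode = RecipientMode.ALL_ADMINS  # default chosen for the sketch
    extra_emails: list[str] = field(default_factory=list)
    include_blocked_content: bool = False  # default chosen for the sketch


def resolve_recipients(settings: AlertSettings, admin_emails: list[str]) -> list[str]:
    if settings.mode is RecipientMode.ALL_ADMINS:
        recipients = list(admin_emails)
    elif settings.mode is RecipientMode.ADMINS_PLUS_ADDITIONAL:
        recipients = list(admin_emails) + settings.extra_emails
    else:  # CUSTOM_LIST: only the explicitly listed addresses
        recipients = list(settings.extra_emails)
    # Drop duplicates while keeping the original order.
    return list(dict.fromkeys(recipients))


def build_alert_body(settings: AlertSettings, reasoning: str, payload: str) -> str:
    body = f"Attack blocked. Reasoning: {reasoning}"
    if settings.include_blocked_content:
        # Only attach the blocked payload when the setting is enabled.
        body += f"\n\nBlocked content:\n{payload}"
    return body
```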
FAQ
Does attack detection check every trigger?
No. It only screens untrusted trigger content—primarily incoming emails and SMS. Triggers from authenticated or controlled sources (like scheduled tasks or authenticated webhooks) skip this step.
Can attack detection be bypassed by clever prompting?
It is designed to resist this. Attack detection runs before the agent processes the content, and in a separate context. The detection model evaluates the raw input independently, which makes it resistant to the manipulation techniques it is designed to catch.
What if attack detection blocks a legitimate message?
Check the activity log to understand why it was blocked. If the message was legitimate, review the content for patterns that might look suspicious (e.g., instructions that resemble prompt injection). You can also adjust the agent’s instructions to make its expected interactions clearer.
Learn more
- Guardrails: Code-enforced constraints like whitelists and approval requirements
- User Approval: Human-in-the-loop for sensitive actions
- Activity Monitoring: See attack detection results in the activity log

