KI-Agenten Sicherheit: Vollständiger Leitfaden 2026
LLM-basierte KI-Agenten sind die am schnellsten wachsende Angriffsfläche in der modernen Infrastruktur. Dieser Leitfaden gibt dir das vollständige Abwehr-Stack — von Prompt Injection bis Container-Isolation — mit direkten Links zu jedem Themen-Runbook.
OWASP LLM Top 10 — Threat Coverage Map
Each risk maps to a dedicated ClawGuru defense guide. Click the guide link to jump straight to the runbook.
| ID | Risk | Severity | Defense Guide |
|---|---|---|---|
| LLM01 | Prompt Injection | CRITICAL | prompt injection defense → |
| LLM02 | Insecure Output Handling | HIGH | ai agent sandboxing → |
| LLM03 | Training Data Poisoning | CRITICAL | model poisoning protection → |
| LLM04 | Model Denial of Service | HIGH | llm gateway hardening → |
| LLM05 | Supply Chain Vulnerabilities | HIGH | model poisoning protection → |
| LLM06 | Sensitive Info Disclosure | HIGH | ai agent sandboxing → |
| LLM07 | Insecure Plugin Design | MEDIUM | secure agent communication → |
| LLM08 | Excessive Agency | HIGH | ai agent sandboxing → |
| LLM09 | Overreliance | MEDIUM | ai agent hardening guide → |
| LLM10 | Model Theft | HIGH | llm gateway hardening → |
Defense Deep-Dives
Five dedicated guides — each a complete playbook with code examples, checklists, and JSON-LD schemas.
5-Layer Defense Architecture
30-Minute Quick-Start Checklist
System prompt in separate, immutable channel (not interpolated with user input)
Injection pattern scanner active on all LLM inputs
Agent container runs as UID 65534 (nobody), read-only rootfs
LLM gateway bound to 127.0.0.1 — zero public exposure
Rate limiting: max 10 LLM calls/min per API key
All agent inputs and outputs logged with correlation ID
Model SHA-256 checksum verified before each deployment
Behavioral test suite runs in CI — deployment blocked on failure
Capability tokens used for agent-to-agent auth (not raw API keys)
Agent execution timeout: 30 seconds hard limit
Compliance: EU AI Act + GDPR
EU AI Act (High-Risk)
High-risk AI systems (healthcare, infrastructure, HR) require: human oversight mechanisms, risk management system, technical documentation, conformity assessment, and post-market monitoring.
GDPR / DSGVO
AI processing personal data: data minimisation (agents only receive what they need), logging with PII masking, purpose limitation, retention limits, and right-to-erasure support in agent memory.
SOC 2 Type II
Audit logging of all agent actions (1-year retention), access controls with least privilege, incident response procedures, and regular security testing of agent systems.
NIS2 (EU)
AI systems in critical infrastructure: risk management obligations, incident reporting within 24h, supply chain security including AI model provenance, and business continuity measures.
Frequently Asked Questions
What is the #1 security risk for AI agents in 2026?
Prompt injection (OWASP LLM01) is the top risk. Attackers embed malicious instructions in user input or external data to hijack agent behavior. Defense requires input validation, structural prompt separation, output parsing, and sandbox isolation.
How do I secure a self-hosted LLM gateway?
Bind Ollama/LocalAI to 127.0.0.1 only, place a reverse proxy (nginx/Caddy) in front with API key auth or mTLS, add rate limiting (max 10 req/min per key), enable audit logging of all prompts, and restrict network access with iptables.
What Docker flags are required for a secure AI agent container?
Use: --read-only, --network=none, --cap-drop=ALL, --no-new-privileges, --user=65534, --memory=512m, --pids-limit=100, and wrap execution in timeout 30. This provides 6 isolation layers with minimal blast radius.
How can I tell if my AI model has been poisoned?
Run a behavioral test suite on every model version: test known refusal scenarios, check for anomalous outputs on synthetic inputs (including known trigger phrases), compare output distributions between model versions, and use SHA-256 checksums of model weights to detect unauthorized modifications.
What is the principle of least privilege for AI agents?
Each agent receives only the minimum permissions for its specific task. A summarization agent needs no filesystem or network access. A code agent reads repos but writes only to feature branches. Use scoped, time-limited capability tokens — never raw API keys or broad database credentials.