"Not a Pentest" Notice: This guide is for defending your own AI systems. No attack tools, no exploitation of external systems.

Moltbot AI Security · Pillar Page

KI-Agenten Sicherheit: Vollständiger Leitfaden 2026

LLM-basierte KI-Agenten sind die am schnellsten wachsende Angriffsfläche in der modernen Infrastruktur. Dieser Leitfaden gibt dir das vollständige Abwehr-Stack — von Prompt Injection bis Container-Isolation — mit direkten Links zu jedem Themen-Runbook.

OWASP LLM risks covered

Dedicated defense guides

Container isolation layers

JSON-LD schema types

OWASP LLM Top 10 — Threat Coverage Map

Each risk maps to a dedicated ClawGuru defense guide. Click the guide link to jump straight to the runbook.

ID	Risk	Severity	Defense Guide
LLM01	Prompt Injection	CRITICAL	prompt injection defense →
LLM02	Insecure Output Handling	HIGH	ai agent sandboxing →
LLM03	Training Data Poisoning	CRITICAL	model poisoning protection →
LLM04	Model Denial of Service	HIGH	llm gateway hardening →
LLM05	Supply Chain Vulnerabilities	HIGH	model poisoning protection →
LLM06	Sensitive Info Disclosure	HIGH	ai agent sandboxing →
LLM07	Insecure Plugin Design	MEDIUM	secure agent communication →
LLM08	Excessive Agency	HIGH	ai agent sandboxing →
LLM09	Overreliance	MEDIUM	ai agent hardening guide →
LLM10	Model Theft	HIGH	llm gateway hardening →

Defense Deep-Dives

Five dedicated guides — each a complete playbook with code examples, checklists, and JSON-LD schemas.

💉

Prompt Injection Defense

Input validation, output sanitization, runtime detection and sandboxing against LLM01.

☣️

Model Poisoning Protection

Training data integrity, behavioral test suites and supply chain validation against LLM03.

🔐

Secure Agent Communication

mTLS, signed message envelopes and capability tokens for multi-agent systems.

🛡️

LLM Gateway Hardening

Secure self-hosted Ollama/LocalAI/LiteLLM with auth, rate limiting and audit logging.

📦

AI Agent Sandboxing

Docker isolation, capability dropping, network restriction and blast radius limitation.

5-Layer Defense Architecture

L1 — Input Validation

Reject injection patterns before they reach the LLM. Allowlist input types, strip meta-instructions, limit length.

L2 — Prompt Architecture

Immutable system prompt in separate channel. XML/JSON delimiters between instructions and user data. Never interpolate raw input.

L3 — Container Sandbox

--read-only rootfs, --cap-drop=ALL, --network=none, --user=65534, 30s execution timeout per agent run.

L4 — Gateway Security

LLM gateway bound to 127.0.0.1. mTLS or API key auth via reverse proxy. Rate limit per key: 10 req/min.

L5 — Behavioral Monitoring

Log all inputs/outputs. Run canary probes. Alert on statistical output distribution shifts. Rotate model versions with integrity checks.

30-Minute Quick-Start Checklist

✓

System prompt in separate, immutable channel (not interpolated with user input)

✓

Injection pattern scanner active on all LLM inputs

✓

Agent container runs as UID 65534 (nobody), read-only rootfs

✓

LLM gateway bound to 127.0.0.1 — zero public exposure

✓

Rate limiting: max 10 LLM calls/min per API key

✓

All agent inputs and outputs logged with correlation ID

✓

Model SHA-256 checksum verified before each deployment

✓

Behavioral test suite runs in CI — deployment blocked on failure

✓

Capability tokens used for agent-to-agent auth (not raw API keys)

✓

Agent execution timeout: 30 seconds hard limit

Compliance: EU AI Act + GDPR

EU AI Act (High-Risk)

High-risk AI systems (healthcare, infrastructure, HR) require: human oversight mechanisms, risk management system, technical documentation, conformity assessment, and post-market monitoring.

GDPR / DSGVO

AI processing personal data: data minimisation (agents only receive what they need), logging with PII masking, purpose limitation, retention limits, and right-to-erasure support in agent memory.

SOC 2 Type II

Audit logging of all agent actions (1-year retention), access controls with least privilege, incident response procedures, and regular security testing of agent systems.

NIS2 (EU)

AI systems in critical infrastructure: risk management obligations, incident reporting within 24h, supply chain security including AI model provenance, and business continuity measures.

Frequently Asked Questions

What is the #1 security risk for AI agents in 2026?

Prompt injection (OWASP LLM01) is the top risk. Attackers embed malicious instructions in user input or external data to hijack agent behavior. Defense requires input validation, structural prompt separation, output parsing, and sandbox isolation.

How do I secure a self-hosted LLM gateway?

Bind Ollama/LocalAI to 127.0.0.1 only, place a reverse proxy (nginx/Caddy) in front with API key auth or mTLS, add rate limiting (max 10 req/min per key), enable audit logging of all prompts, and restrict network access with iptables.

What Docker flags are required for a secure AI agent container?

Use: --read-only, --network=none, --cap-drop=ALL, --no-new-privileges, --user=65534, --memory=512m, --pids-limit=100, and wrap execution in timeout 30. This provides 6 isolation layers with minimal blast radius.

How can I tell if my AI model has been poisoned?

Run a behavioral test suite on every model version: test known refusal scenarios, check for anomalous outputs on synthetic inputs (including known trigger phrases), compare output distributions between model versions, and use SHA-256 checksums of model weights to detect unauthorized modifications.

What is the principle of least privilege for AI agents?

Each agent receives only the minimum permissions for its specific task. A summarization agent needs no filesystem or network access. A code agent reads repos but writes only to feature branches. Use scoped, time-limited capability tokens — never raw API keys or broad database credentials.