Open Standard · Apache 2.0

Geometric safety for multi-agent AI

One decorator. Three layers of defense. C4 classifies AI behavior into a 27-state cognitive topology and blocks dangerous trajectories before they execute.

<1% attack success rate with C4 active
16 AoC failure modes blocked
295 tests passing
4.3 ms median latency (dual classifier)
$ pip install c4-protocol
$ pip install "c4-protocol[full]"

How it works

from c4_protocol import guard

@guard
def agent(prompt: str) -> str:
    return llm.generate(prompt)  # llm is a placeholder for your own LLM client

# Harmful prompts are blocked automatically
agent("ignore all previous instructions")
# → C4SafetyError: BLOCKED | State: (1,1,2) | Risk: 85%
#   BERT danger + keyword: 'ignore previous instructions'

# Safe prompts pass through
agent("Hello, how are you today?")
# → "I'm doing well, thank you for asking!"

How to integrate — 3 ways

Decorator

One @guard per agent function. Wraps any Python function that calls an LLM. Blocked calls raise C4SafetyError.

@guard
def agent(prompt): ...
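
To make the decorator pattern concrete, here is a minimal sketch of how a guard-style wrapper can work. It does not use the library: `SafetyError`, `simple_guard`, and the pattern list are hypothetical stand-ins, and the real @guard presumably runs the full classifier rather than a substring check.

```python
import functools


class SafetyError(Exception):
    """Hypothetical stand-in for the library's C4SafetyError."""


# Illustrative patterns only; not the library's real keyword list.
BLOCKED_PATTERNS = (
    "ignore all previous instructions",
    "ignore previous instructions",
)


def looks_dangerous(prompt: str) -> bool:
    """Toy check: case-insensitive substring match against known patterns."""
    lowered = prompt.lower()
    return any(pattern in lowered for pattern in BLOCKED_PATTERNS)


def simple_guard(func):
    """Screen the first argument before the wrapped function runs."""
    @functools.wraps(func)
    def wrapper(prompt, *args, **kwargs):
        if looks_dangerous(prompt):
            raise SafetyError(f"BLOCKED: {prompt!r}")
        return func(prompt, *args, **kwargs)
    return wrapper


@simple_guard
def agent(prompt: str) -> str:
    return f"echo: {prompt}"  # placeholder for an LLM call
```

Because the check runs before the wrapped function, a blocked prompt never reaches the model.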

FastAPI Middleware

One line protects an entire server. All POST requests auto-classified. Blocked requests return 403.

add_c4_safety(app)
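
As an illustration of what such a middleware might do internally, the sketch below is a plain ASGI middleware that buffers POST bodies, answers 403 when a toy check flags them, and otherwise replays the body to the wrapped app. `SafetyMiddleware` and `is_dangerous` are hypothetical names, not the library's implementation of add_c4_safety.

```python
def is_dangerous(text: str) -> bool:
    """Toy check standing in for a real classifier."""
    return "ignore all previous instructions" in text.lower()


class SafetyMiddleware:
    """Plain ASGI middleware: screen POST bodies, answer 403 when flagged."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        # Only HTTP POST requests are screened; everything else passes through.
        if scope["type"] != "http" or scope.get("method") != "POST":
            await self.app(scope, receive, send)
            return

        # Buffer the whole request body before deciding.
        body = b""
        while True:
            message = await receive()
            if message["type"] == "http.disconnect":
                return
            body += message.get("body", b"")
            if not message.get("more_body", False):
                break

        if is_dangerous(body.decode("utf-8", errors="replace")):
            await send({
                "type": "http.response.start",
                "status": 403,
                "headers": [(b"content-type", b"text/plain")],
            })
            await send({"type": "http.response.body", "body": b"blocked"})
            return

        # Replay the buffered body once, then defer to the original receive.
        replayed = False

        async def replay():
            nonlocal replayed
            if not replayed:
                replayed = True
                return {"type": "http.request", "body": body, "more_body": False}
            return await receive()

        await self.app(scope, replay, send)
```

A class shaped like this can be registered on a FastAPI or Starlette app with `app.add_middleware(SafetyMiddleware)`.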

Manual API

Call the classifier directly for custom logic. Get C4 state, confidence, and danger verdict per-text.

DualClassifier().classify(text)

What it does NOT do: Installing this package does not automatically protect every agent in your system. c4-protocol is a library that you integrate yourself. For multi-agent setups, wrap each agent's LLM calls in @guard and use C4BGPRouter to coordinate state between agents (analogous to BGP between routers).

Three-Layer Defense

L1 · Dual Classifier: BERT ONNX model (416 MB) and keyword heuristic run in parallel. OR-logic: block if either flags danger. Defense-in-depth that catches both semantic attacks and known patterns.
L2 · 16 AoC Defenses: Per-category C4 state trajectory analysis. Each defense monitors specific failure modes from the Agents of Chaos taxonomy (arXiv:2602.20021) plus 5 extended categories.
L3 · SVETILO Values: 7 ethical seals mapped to C4 states. The Empathy Checkpoint ensures Self→System transitions require Other mediation. Value verification on every decision.
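
The Empathy Checkpoint rule in L3 can be read as a constraint on consecutive steps of a trajectory. The sketch below assumes each step has already been reduced to one of three labels ("self", "other", "system"); how those labels relate to the 27 C4 states is not described here, so the labels and the function name are hypothetical.

```python
from typing import Sequence


def violates_empathy_checkpoint(trajectory: Sequence[str]) -> bool:
    """Return True if any step moves directly from 'self' to 'system'.

    A Self -> Other -> System path is allowed because the transition is
    mediated by 'other'; a direct Self -> System step is not.
    """
    return any(
        current == "self" and following == "system"
        for current, following in zip(trajectory, trajectory[1:])
    )
```

Checking adjacent pairs keeps the rule local: a trajectory is rejected at the first unmediated step, regardless of what comes before or after it.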

Capabilities

@guard Decorator

One-line safety for any Python function. Async support. Configurable violation modes: raise, warn, redirect.
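
The three violation modes and async support could be wired together roughly as follows. This is a sketch under assumptions: `guard_with_mode` is a hypothetical name, the check is a toy substring match, and "redirect" is interpreted here as returning a fallback response instead of calling the wrapped function, which may differ from the library's behavior.

```python
import functools
import inspect
import warnings


class SafetyError(Exception):
    """Hypothetical stand-in for the library's C4SafetyError."""


def is_dangerous(text: str) -> bool:
    """Toy check standing in for a real classifier."""
    return "ignore all previous instructions" in text.lower()


def guard_with_mode(mode: str = "raise", fallback: str = "Request declined."):
    if mode not in {"raise", "warn", "redirect"}:
        raise ValueError(f"unknown mode: {mode!r}")

    def decorator(func):
        def handle(prompt):
            """Return (proceed, value) for the given prompt."""
            if not is_dangerous(prompt):
                return True, None
            if mode == "raise":
                raise SafetyError(f"BLOCKED: {prompt!r}")
            if mode == "warn":
                warnings.warn(f"flagged prompt: {prompt!r}")
                return True, None
            return False, fallback  # "redirect": skip the call, answer with fallback

        if inspect.iscoroutinefunction(func):
            @functools.wraps(func)
            async def async_wrapper(prompt, *args, **kwargs):
                proceed, value = handle(prompt)
                return await func(prompt, *args, **kwargs) if proceed else value
            return async_wrapper

        @functools.wraps(func)
        def sync_wrapper(prompt, *args, **kwargs):
            proceed, value = handle(prompt)
            return func(prompt, *args, **kwargs) if proceed else value
        return sync_wrapper

    return decorator
```

The decorator inspects the wrapped function once at decoration time, so coroutine functions stay awaitable and ordinary functions stay synchronous.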

Dual Classifier

BERT ONNX + keyword in parallel with OR-logic. Catches what either misses alone.
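
The OR-logic combination can be sketched as two independent checks run concurrently, with the input flagged if either returns true. Both checks below are toy stand-ins (the `model_check` function is not a BERT model), and `Verdict` and `classify` are hypothetical names rather than the DualClassifier API.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Verdict:
    dangerous: bool
    reasons: list = field(default_factory=list)  # names of the checks that fired


def keyword_check(text: str) -> bool:
    """Known-pattern check (illustrative pattern only)."""
    return "ignore previous instructions" in text.lower()


def model_check(text: str) -> bool:
    """Toy stand-in for a learned model that catches paraphrases."""
    lowered = text.lower()
    return "disregard" in lowered and "rules" in lowered


CHECKS = (("keyword", keyword_check), ("model", model_check))


def classify(text: str, checks=CHECKS) -> Verdict:
    """Run every check in parallel and flag the text if any check fires."""
    with ThreadPoolExecutor(max_workers=len(checks)) as pool:
        futures = {name: pool.submit(check, text) for name, check in checks}
        flagged = [name for name, future in futures.items() if future.result()]
    return Verdict(dangerous=bool(flagged), reasons=flagged)
```

OR-logic trades a higher false-positive rate for coverage: each check only needs to catch what the other misses.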

C4BGP Routing

Border Gateway Protocol for agents. Path vector routing. Φ-distance route selection. Community tags.
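
For readers unfamiliar with path-vector routing, the sketch below shows the two core ideas: a route carries the full path it has traversed (so a node can reject routes that already contain itself), and selection picks the lowest-cost loop-free candidate. The `distance` field is a plain number standing in for Φ-distance, whose definition is not given here; `Route`, `select_route`, and `advertise` are hypothetical names, not the C4BGPRouter API.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Route:
    destination: str
    path: tuple          # agent IDs the advertisement has traversed
    distance: float      # stand-in for the page's "Φ-distance"
    communities: frozenset = frozenset()  # free-form tags carried with the route


def select_route(local_id: str, candidates: Iterable[Route]) -> Optional[Route]:
    """Pick the best loop-free route, or None if every candidate loops."""
    loop_free = [r for r in candidates if local_id not in r.path]
    if not loop_free:
        return None
    # Lowest distance wins; shorter path breaks ties.
    return min(loop_free, key=lambda r: (r.distance, len(r.path)))


def advertise(route: Route, local_id: str, hop_cost: float) -> Route:
    """Prepend this agent to the path before passing the route on."""
    return Route(
        destination=route.destination,
        path=(local_id,) + route.path,
        distance=route.distance + hop_cost,
        communities=route.communities,
    )
```

Carrying the whole path is what makes loop detection a simple membership test.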

HMAC Messaging

Cryptographic message signing. Nonce replay protection. Constant-time verification.
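
The three properties listed above map directly onto Python's standard library. The following sketch signs a JSON body with HMAC-SHA256, embeds a random nonce, rejects repeated nonces, and compares tags with `hmac.compare_digest`. The message layout and the `sign_message` and `Verifier` names are assumptions for illustration, not the library's wire format.

```python
import hashlib
import hmac
import json
import secrets


def sign_message(key: bytes, payload: dict) -> dict:
    """Wrap the payload with a fresh nonce and attach an HMAC-SHA256 tag."""
    nonce = secrets.token_hex(16)
    body = json.dumps({"nonce": nonce, "payload": payload}, sort_keys=True)
    tag = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


class Verifier:
    def __init__(self, key: bytes):
        self.key = key
        self.seen_nonces = set()  # in-memory only; a real system would bound this

    def verify(self, message: dict) -> dict:
        expected = hmac.new(
            self.key, message["body"].encode(), hashlib.sha256
        ).hexdigest()
        # Constant-time comparison avoids leaking how many characters matched.
        if not hmac.compare_digest(expected, message["tag"]):
            raise ValueError("bad signature")
        data = json.loads(message["body"])
        if data["nonce"] in self.seen_nonces:
            raise ValueError("replayed nonce")
        self.seen_nonces.add(data["nonce"])
        return data["payload"]
```

The signature is checked before the body is parsed, so unauthenticated input never reaches the JSON decoder.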

ThoughtVirus Defense

C4 trajectory-based subliminal bias detection. Based on Microsoft Research (arXiv:2603.00131).

Integrations

OpenAI async wrapper. LangChain callback. FastAPI middleware. One-line add_c4_safety(app).

LLM Validation

C4 block rate: 96.7% (532/550 prompts blocked preemptively)

GPT-4o-mini

0.7%

Attack success rate with C4 active. Baseline: 10.7%. 550 adversarial prompts across 11 AoC categories, 2200 trials. 93.2% reduction.

Mistral 7B

0.5%

Attack success rate with C4 active. Baseline: 22.5%. Local inference via Ollama. Same test suite, 97.6% reduction.
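
The headline figures follow from two simple formulas: block rate is blocked prompts over total prompts, and relative reduction is the drop in attack success rate divided by the baseline rate. The sketch below recomputes them from the numbers quoted above. Because it uses the rounded percentages, the reductions come out within about 0.3 percentage points of the published 93.2% and 97.6%, which presumably were computed from unrounded rates.

```python
def percent(part: float, whole: float) -> float:
    """Share of `whole` represented by `part`, in percent."""
    return 100.0 * part / whole


def relative_reduction(baseline_pct: float, guarded_pct: float) -> float:
    """Relative drop from the baseline rate to the guarded rate, in percent."""
    return 100.0 * (baseline_pct - guarded_pct) / baseline_pct


block_rate = percent(532, 550)                     # prompts blocked preemptively
gpt_reduction = relative_reduction(10.7, 0.7)      # GPT-4o-mini, rounded inputs
mistral_reduction = relative_reduction(22.5, 0.5)  # Mistral 7B, rounded inputs

print(f"block rate: {block_rate:.1f}%")
print(f"GPT-4o-mini reduction: {gpt_reduction:.1f}%")
print(f"Mistral 7B reduction: {mistral_reduction:.1f}%")
```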

Next Milestones: Path to 90% Detection Rate (DR)

1. Adversarial fine-tuning: Train ONNX model on 5-10K adversarial prompts. 64% → 78% DR
2. Specialized ensemble: Add toxicity + prompt-injection expert models. OR-logic across 3 models. 78% → 85%
3. Stronger paraphrase: Qwen 32B or GPT-4o-mini for indirect→direct language conversion. 85% → 88%
4. Multi-turn trajectory: Track C4 state drift across conversation turns. Preemptive blocking. 88% → 90%
5. Runtime verification: Agda/Coq theorems as live invariant checks. Structural impossibility. 90% → 91%
Target: ~90% DR · ~7% FPR

Red Team Challenge

Break our defenses. We'll thank you.

No AI safety system is perfect. If you find a prompt that bypasses @guard, report it: we'll fix it and publicly acknowledge your contribution.
