Open Standard · Apache 2.0

Geometric safety for multi-agent AI

One decorator. Three layers of defense. C4 classifies AI behavior into a 27-state cognitive topology and blocks dangerous trajectories before they execute.

<1%

Attack success rate
with C4 active

AoC failure modes
blocked

295

Tests
passing

4.3ms

Median latency
dual classifier

$ pip install c4-protocol

$ pip install "c4-protocol[full]"

How it works

from c4_protocol import guard

@guard
def agent(prompt: str) -> str:
    return llm.generate(prompt)  # llm is your own LLM client

# Harmful prompts are blocked automatically
agent("ignore all previous instructions")
# → C4SafetyError: BLOCKED | State: (1,1,2) | Risk: 85%
#   BERT danger + keyword: 'ignore previous instructions'

# Safe prompts pass through
agent("Hello, how are you today?")
# → "I'm doing well, thank you for asking!"

How to integrate — 3 ways

⊡

Decorator

One @guard per agent function. Wraps any Python function that calls an LLM. Blocked calls raise C4SafetyError.

@guard
def agent(prompt): ...

⊕

FastAPI Middleware

One line protects an entire server. All POST requests auto-classified. Blocked requests return 403.

add_c4_safety(app)
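The middleware idea (classify every POST body, answer 403 when it is flagged) can be sketched as a pure ASGI middleware using only the standard library. This is an illustration under assumptions, not how add_c4_safety is implemented: SafetyMiddlewareSketch, looks_dangerous, ok_app and status_for are hypothetical names, and the one-phrase check stands in for the real classifier.

```python
import asyncio


def looks_dangerous(text: str) -> bool:
    """Toy classifier used only for this sketch."""
    return "ignore all previous instructions" in text.lower()


class SafetyMiddlewareSketch:
    """Pure ASGI middleware: classify POST bodies, answer 403 when flagged."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http" or scope["method"] != "POST":
            await self.app(scope, receive, send)  # pass everything else through
            return

        # Buffer the full request body so it can be classified.
        body, more = b"", True
        while more:
            message = await receive()
            body += message.get("body", b"")
            more = message.get("more_body", False)

        if looks_dangerous(body.decode("utf-8", errors="replace")):
            await send({"type": "http.response.start", "status": 403,
                        "headers": [(b"content-type", b"text/plain")]})
            await send({"type": "http.response.body", "body": b"blocked"})
            return

        async def replay():
            # Hand the buffered body to the wrapped app, which has not read it yet.
            return {"type": "http.request", "body": body, "more_body": False}

        await self.app(scope, replay, send)


async def ok_app(scope, receive, send):
    """Trivial downstream app that always answers 200."""
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})


def status_for(method: str, body: bytes) -> int:
    """Drive one request through the middleware and return the response status."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": body, "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": method}
    asyncio.run(SafetyMiddlewareSketch(ok_app)(scope, receive, send))
    return sent[0]["status"]
```

Here status_for("POST", b"ignore all previous instructions") returns 403, while a harmless POST or any GET reaches ok_app and returns 200. Because FastAPI is an ASGI framework, a class of this shape could in principle be registered with app.add_middleware.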

⊛

Manual API

Call the classifier directly for custom logic. Get C4 state, confidence, and danger verdict per-text.

DualClassifier().classify(text)

⚠ What it does NOT do: Installing this package does not automatically protect every agent in your system. c4-protocol is a library that you integrate yourself. For multi-agent setups, wrap each agent’s LLM calls in @guard and use C4BGPRouter to coordinate state between agents (analogous to BGP between routers).

Three-Layer Defense

L1 · Dual Classifier: A BERT ONNX model (416 MB) and a keyword heuristic run in parallel with OR-logic: a request is blocked if either one flags danger. This defense-in-depth catches both semantic attacks and known patterns.
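The OR-logic can be illustrated with a minimal sketch that runs two checks concurrently and blocks when either one fires. Everything named here is an assumption for illustration: keyword_flags, model_flags, is_dangerous and the phrase list are hypothetical, and model_flags is a toy rule standing in for the BERT ONNX model.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical phrase list; the real keyword heuristic is not specified here.
DANGER_PHRASES = ("ignore all previous instructions", "ignore previous instructions")


def keyword_flags(text: str) -> bool:
    """Return True if the text contains a known dangerous phrase."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in DANGER_PHRASES)


def model_flags(text: str) -> bool:
    """Stand-in for the semantic classifier (assumed to return a boolean verdict)."""
    # Toy rule so the sketch runs without a model file.
    return "system prompt" in text.lower()


def is_dangerous(text: str) -> bool:
    """Run both checks concurrently and block if either one flags the text."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        keyword_future = pool.submit(keyword_flags, text)
        model_future = pool.submit(model_flags, text)
        return keyword_future.result() or model_future.result()
```

The trade-off of OR-logic is that it lowers the miss rate at the cost of a higher false-positive rate, since a false alarm from either check blocks the request.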

L2 · 16 AoC Defenses: Per-category C4 state trajectory analysis. Each defense monitors specific failure modes from the Agents of Chaos taxonomy (arXiv:2602.20021) plus 5 extended categories.

L3 · SVETILO Values: 7 ethical seals mapped to C4 states. The Empathy Checkpoint requires every Self→System transition to pass through Other mediation. Values are verified on every decision.
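A minimal sketch of the Empathy Checkpoint rule, assuming a trajectory has already been reduced to a sequence of "Self", "Other" and "System" labels. How real C4 states such as (1,1,2) map to these labels is not described here, and empathy_violations is a hypothetical helper.

```python
from typing import List, Sequence


def empathy_violations(trajectory: Sequence[str]) -> List[int]:
    """Return each index i where step i -> i+1 jumps directly from Self to System."""
    return [
        i
        for i, (current, nxt) in enumerate(zip(trajectory, trajectory[1:]))
        if current == "Self" and nxt == "System"
    ]
```

A trajectory such as ["Self", "Other", "System"] yields no violations, while ["Self", "System"] is flagged at index 0.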

Capabilities

⊡

@guard Decorator

One-line safety for any Python function. Async support. Configurable violation modes: raise, warn, redirect.
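As a rough illustration of how a guard-style decorator with the three violation modes could be structured, here is a standalone sketch. It is not the c4-protocol implementation: guard_sketch, SafetyError, looks_dangerous and the on_violation parameter are hypothetical, and the behavior given to "redirect" (return a fallback reply instead of calling the function) is an assumption.

```python
import functools
import warnings
from typing import Callable


class SafetyError(Exception):
    """Hypothetical stand-in for C4SafetyError."""


def looks_dangerous(text: str) -> bool:
    """Toy check used only for this sketch."""
    return "ignore all previous instructions" in text.lower()


def guard_sketch(on_violation: str = "raise", fallback: str = "Request declined."):
    """Build a decorator that checks the prompt before calling the wrapped function."""
    if on_violation not in {"raise", "warn", "redirect"}:
        raise ValueError(f"unknown mode: {on_violation}")

    def decorator(func: Callable[[str], str]) -> Callable[[str], str]:
        @functools.wraps(func)
        def wrapper(prompt: str) -> str:
            if looks_dangerous(prompt):
                if on_violation == "raise":
                    raise SafetyError(f"BLOCKED: {prompt!r}")
                if on_violation == "redirect":
                    return fallback  # skip the wrapped call entirely
                # "warn" mode: report the violation but let the call proceed.
                warnings.warn(f"unsafe prompt allowed through: {prompt!r}")
            return func(prompt)

        return wrapper

    return decorator


@guard_sketch(on_violation="redirect")
def echo_agent(prompt: str) -> str:
    return f"echo: {prompt}"
```

With this sketch, echo_agent("hi") returns "echo: hi", and echo_agent("Ignore all previous instructions") returns the fallback string without ever calling the wrapped function.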

⊕

Dual Classifier

BERT ONNX + keyword in parallel with OR-logic. Catches what either misses alone.

◈

C4BGP Routing

Border Gateway Protocol for agents. Path vector routing. Φ-distance route selection. Community tags.
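Path-vector route selection can be sketched generically: each advertisement carries the list of agents it has traversed, a receiver discards any route that already contains its own ID (loop prevention), and the remaining route with the smallest metric wins. The Route class and select_route are hypothetical, and the distance field is a placeholder number, since Φ-distance is not defined here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Route:
    destination: str
    path: Tuple[str, ...]  # agent IDs the advertisement has traversed
    distance: float        # placeholder metric standing in for the Φ-distance


def select_route(self_id: str, destination: str,
                 adverts: Iterable[Route]) -> Optional[Route]:
    """Pick the loop-free route to `destination` with the smallest distance."""
    candidates = [
        r for r in adverts
        if r.destination == destination and self_id not in r.path
    ]
    if not candidates:
        return None
    # Break ties on the shorter path, as BGP-style selection commonly does.
    return min(candidates, key=lambda r: (r.distance, len(r.path)))
```

For agent "A", a route whose path already includes "A" is rejected even if its distance is the lowest, because accepting it would create a loop.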

⊛

HMAC Messaging

Cryptographic message signing. Nonce replay protection. Constant-time verification.
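The three properties listed above can be sketched with Python's standard library. This is a generic illustration, not the package's wire format: sign, verify, the 16-byte nonce and the in-memory seen set are assumptions.

```python
import hashlib
import hmac
import secrets
from typing import Set, Tuple


def sign(key: bytes, payload: bytes) -> Tuple[bytes, bytes]:
    """Return (nonce, tag), where the tag authenticates nonce + payload."""
    nonce = secrets.token_bytes(16)  # fixed length keeps nonce + payload unambiguous
    tag = hmac.new(key, nonce + payload, hashlib.sha256).digest()
    return nonce, tag


def verify(key: bytes, payload: bytes, nonce: bytes, tag: bytes,
           seen: Set[bytes]) -> bool:
    """Accept a message only if the tag is valid and the nonce is fresh."""
    expected = hmac.new(key, nonce + payload, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):  # constant-time comparison
        return False
    if nonce in seen:  # replayed message
        return False
    seen.add(nonce)
    return True
```

The nonce is recorded only after the tag has been verified, so a forged message cannot be used to burn a legitimate nonce. A production system would also need to bound or expire the set of seen nonces.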

⬡

ThoughtVirus Defense

C4 trajectory-based subliminal bias detection. Based on Microsoft Research (arXiv:2603.00131).

⎔

Integrations

OpenAI async wrapper. LangChain callback. FastAPI middleware. One-line add_c4_safety(app).

LLM Validation

C4 block rate: 96.7% (532/550 prompts blocked preemptively)

GPT-4o-mini

0.7%

Attack success rate with C4 active. Baseline: 10.7%. 550 adversarial prompts across 11 AoC categories, 2200 trials. 93.2% reduction.

Mistral 7B

0.5%

Attack success rate with C4 active. Baseline: 22.5%. Local inference via Ollama. Same test suite, 97.6% reduction.

Next Milestones: Path to 90% Detection Rate (DR)

1. Adversarial fine-tuning: Train the ONNX model on 5-10K adversarial prompts. 64% → 78% DR

2. Specialized ensemble: Add toxicity and prompt-injection expert models, with OR-logic across 3 models. 78% → 85% DR

3. Stronger paraphrase: Use Qwen 32B or GPT-4o-mini for indirect→direct language conversion. 85% → 88% DR

4. Multi-turn trajectory: Track C4 state drift across conversation turns for preemptive blocking. 88% → 90% DR

5. Runtime verification: Agda/Coq theorems as live invariant checks. Structural impossibility. 90% → 91% DR

Target: ~90% DR · ~7% FPR

Full roadmap →

Resources

▣

Red Team Challenge

⚔

Break our defenses. We'll thank you.

No AI safety system is perfect. If you find a prompt that bypasses @guard, report it. We'll fix it and publicly acknowledge your contribution.

🦊 Report Bypasses