# Caudal
Caudal is an attention engine for AI agents. It tells an agent what matters right now.
Modern agents have tools, vector search, and long-term storage — but they still behave strangely:
- They bring up old topics
- They lose track of current user intent
- They repeat irrelevant actions
- They need custom heuristics for recency and prioritization
The missing piece is not more knowledge. It is attention.
Caudal provides a continuously evolving “attention signal” derived from real interactions. Agents send events. Caudal learns which entities and relationships are currently important, and naturally forgets what is no longer relevant.
No prompts. No fine-tuning. No heuristics. Just behavior → relevance.
## What Caudal is
Caudal is not a database, a vector store, or a conversation history. It is a new layer:
| System | Answers |
|---|---|
| SQL / Graph DB | What is true? |
| Vector DB | What is similar? |
| LLM | What can I reason about? |
| Caudal | What should I focus on now? |
Caudal stores interaction traces and continuously computes a ranked relevance field with built-in forgetting.
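The combination of reinforcement and forgetting can be pictured as an exponentially decaying score that every new event tops up. The sketch below is only an illustration of the idea, not Caudal's actual algorithm — the function name and the half-life parameter are invented here:

```python
def relevance(event_times, now, half_life=3600.0):
    """Sum of exponentially decayed contributions, one per event.

    Each event contributes 1.0 at the moment it happens and half as
    much every `half_life` seconds afterwards, so untouched entities
    fade toward zero while recently active ones stay near the top.
    """
    return sum(0.5 ** ((now - t) / half_life) for t in event_times)

# An entity touched at t=0s and t=1800s, scored one hour in:
score = relevance([0.0, 1800.0], now=3600.0)
```

Reinforcement falls out for free: entities with more recent events simply accumulate more un-decayed mass.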
## Quickstart (Docker)
```bash
# Start dependencies (Postgres) + Caudal
cd docker
docker compose up -d

# Check health
curl http://localhost:8080/actuator/health
```
In the default dev mode, requests are accepted without authentication when `AUTH_DISABLED=true`.
## How an agent uses Caudal
### 1. Write: emit events while working
Whenever the agent observes meaningful activity, it sends events:
- User mentions a topic
- A tool is used
- A document is retrieved
- A task succeeds or fails
```bash
curl -X POST http://localhost:8080/api/v1/events \
  -H "Content-Type: application/json" \
  -d '{
    "space": "user:123",
    "events": [
      {
        "src": "user:123",
        "dst": "topic:car-buying",
        "type": "chat",
        "intensity": 2.0
      },
      {
        "src": "agent:planner",
        "dst": "tool:car-comparison",
        "type": "tool_use",
        "intensity": 1.0
      }
    ]
  }'
```
Caudal interprets these as signals of attention, not facts.
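From agent code, the same call is a plain HTTP POST. A minimal Python sketch using only the standard library — the endpoint and payload shape are taken from the curl example above; the helper names are my own:

```python
import json
import urllib.request

def build_batch(space, events):
    """Shape the request body the way /api/v1/events expects it."""
    return {"space": space, "events": events}

def send_events(base_url, space, events):
    """POST a batch of attention events to Caudal (network sketch)."""
    body = json.dumps(build_batch(space, events)).encode()
    req = urllib.request.Request(
        f"{base_url}/api/v1/events",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

events = [
    {"src": "user:123", "dst": "topic:car-buying", "type": "chat", "intensity": 2.0},
    {"src": "agent:planner", "dst": "tool:car-comparison", "type": "tool_use", "intensity": 1.0},
]
# send_events("http://localhost:8080", "user:123", events)
```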
### 2. Read: ask what matters now
Before deciding what to do next, the agent queries Caudal:
```bash
curl "http://localhost:8080/api/v1/focus?space=user:123&k=5"
```

```json
{
  "asOf": "2026-02-28T10:06:00Z",
  "items": [
    { "id": "topic:car-buying", "score": 0.83 },
    { "id": "topic:stroller", "score": 0.61 },
    { "id": "topic:cycling", "score": 0.22 }
  ]
}
```
The agent now knows what to prioritize in conversation, planning, tool selection, and retrieval.
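An agent typically consumes the focus list by thresholding the scores before building its next prompt or plan. A sketch against the response shape above — the 0.5 cutoff is an arbitrary choice, not a Caudal default:

```python
def current_focus(focus_response, min_score=0.5):
    """Return entity ids from a /api/v1/focus response, strongest
    first, dropping anything below min_score."""
    items = sorted(focus_response["items"], key=lambda i: i["score"], reverse=True)
    return [i["id"] for i in items if i["score"] >= min_score]

response = {
    "asOf": "2026-02-28T10:06:00Z",
    "items": [
        {"id": "topic:car-buying", "score": 0.83},
        {"id": "topic:stroller", "score": 0.61},
        {"id": "topic:cycling", "score": 0.22},
    ],
}
# current_focus(response) -> ["topic:car-buying", "topic:stroller"]
```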
### 3. Follow associations
Agents can explore what is likely to come next:
```bash
curl "http://localhost:8080/api/v1/next?space=user:123&src=topic:car-buying&k=5"
```

```json
{
  "asOf": "2026-02-28T10:06:00Z",
  "items": [
    { "id": "user:123", "score": 0.83 },
    { "id": "tool:car-comparison", "score": 0.45 },
    { "id": "topic:stroller", "score": 0.12 }
  ]
}
```
This enables recommendations, coherent planning, and contextual tool use.
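One concrete pattern is tool selection: take the highest-scoring association whose id marks it as a tool. A sketch against the response shape above (the `tool:` prefix convention comes from the examples in this README):

```python
def best_next(next_response, prefix="tool:"):
    """Pick the strongest association whose id starts with prefix,
    or None when nothing matches."""
    candidates = [i for i in next_response["items"] if i["id"].startswith(prefix)]
    return max(candidates, key=lambda i: i["score"])["id"] if candidates else None

response = {
    "items": [
        {"id": "user:123", "score": 0.83},
        {"id": "tool:car-comparison", "score": 0.45},
        {"id": "topic:stroller", "score": 0.12},
    ]
}
# best_next(response) -> "tool:car-comparison"
```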
### 4. Explore multi-hop pathways
For deeper associations across several steps, agents can sample pathways starting from an entity. This helps with planning, recommendations, and explaining why something is relevant.
```bash
curl -X POST http://localhost:8080/api/v1/pathways \
  -H "Content-Type: application/json" \
  -d '{
    "space": "user:123",
    "start": "user:123",
    "k": 5,
    "mode": "deep"
  }'
```
| Mode | Description |
|---|---|
| `"fast"` | Fewer samples, shorter walks — low latency |
| `"balanced"` | Good default (used when `mode` is omitted) |
| `"deep"` | More samples, longer walks — thorough |
```json
{
  "asOf": "2026-02-28T10:06:00Z",
  "topEntities": [
    { "id": "topic:car-buying", "score": 0.72 },
    { "id": "brand:toyota", "score": 0.51 },
    { "id": "model:yaris_cross", "score": 0.44 }
  ],
  "paths": [
    { "nodes": ["user:123","topic:car-buying","brand:toyota","model:yaris_cross"], "score": 0.34 },
    { "nodes": ["user:123","topic:car-buying","doc:comparison_guide"], "score": 0.28 }
  ]
}
```
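Because each path is an explicit node sequence, pathway results double as explanations. A sketch that renders the strongest path as a human-readable "why" chain — the arrow formatting is my own convention, not part of the API:

```python
def explain_strongest_path(pathways_response):
    """Render the highest-scoring path as an arrow-joined chain."""
    best = max(pathways_response["paths"], key=lambda p: p["score"])
    return " -> ".join(best["nodes"])

response = {
    "paths": [
        {"nodes": ["user:123", "topic:car-buying", "brand:toyota", "model:yaris_cross"], "score": 0.34},
        {"nodes": ["user:123", "topic:car-buying", "doc:comparison_guide"], "score": 0.28},
    ]
}
# explain_strongest_path(response)
# -> "user:123 -> topic:car-buying -> brand:toyota -> model:yaris_cross"
```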
### 5. Modulate attention
Sometimes you need to suppress or amplify attention without erasing memory.
```bash
curl -X POST http://localhost:8080/api/v1/modulate \
  -H "Content-Type: application/json" \
  -d '{
    "space": "user:123",
    "modulations": [
      { "entity": "topic:bikes", "attention": 0.1, "decay": 50 }
    ]
  }'
```
| Attention value | Effect |
|---|---|
| `0.0` | Fully suppress — entity disappears from results |
| `0.1` | Strongly suppress — 10% of normal |
| `1.0` | Normal (resets any modulation) |
| `3.0` | Triple the score |
The `decay` field controls how many events it takes for the modulation to fade to half strength. Omit it to make the modulation persistent.
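One plausible reading of the half-strength semantics is that after `n` events, the modulation's deviation from the neutral value 1.0 is scaled by `0.5^(n/decay)`. This is an assumption about the decay curve for illustration, not Caudal's documented formula:

```python
def modulated_factor(attention, decay, events_since):
    """Hypothetical fade curve: the deviation from neutral (1.0)
    halves every `decay` events; decay=None means it never fades."""
    if decay is None:
        return attention
    return 1.0 + (attention - 1.0) * 0.5 ** (events_since / decay)

# With attention=0.1 and decay=50, after 50 events the factor
# has moved halfway back toward neutral: 0.55.
```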
## Why this improves agents
Without Caudal, agent developers hand-roll recency windows, TTL-based memory, custom ranking formulas, and other ad-hoc heuristics.
With Caudal:
- Recent behavior reinforces relevance
- Old context fades naturally
- Focus emerges automatically
Caudal gives agents working memory. Vector databases give agents recall. LLMs give agents reasoning. All three together produce far more stable behavior.
## Architecture
- Core engine (pure Java): decay, reinforcement, ranking, pathways
- Server (Spring Boot): REST, auth, persistence, metrics
- PostgreSQL: event log (WAL) + periodic snapshots
Time is handled internally via discrete buckets for deterministic recovery and testing.
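The bucket idea: every event timestamp maps to a discrete index, so replaying the event log always reproduces the same state regardless of wall-clock jitter. A sketch (the 60-second bucket width is invented here):

```python
def bucket(timestamp_s, width_s=60):
    """Map a wall-clock timestamp (seconds) to a discrete bucket index."""
    return int(timestamp_s // width_s)

# Two events 10s apart land in the same bucket, so decay applied
# per bucket is deterministic under replay.
assert bucket(120.0) == bucket(130.0)
assert bucket(120.0) != bucket(185.0)
```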
## Docker image
Caudal uses Spring Boot Buildpacks to produce an OCI image — no Dockerfile needed.
```bash
# Build the image locally
mvn spring-boot:build-image -pl server -DskipTests

# Run the full stack
cd docker && docker compose --profile full up -d
```
The image is published to GHCR as `ghcr.io/caudal-labs/caudal-server`.
## Claude Code Skill
Caudal ships a Claude Code skill (`caudal-attention`) that integrates temporal attention directly into Claude's workflow.
### Install
```
/plugin install caudal-attention@caudal-skills
```
### Configure
```bash
export CAUDAL_URL=http://localhost:8080
export CAUDAL_API_KEY=your-api-key
```
Add to your project’s CLAUDE.md:
```
At the start of every conversation, invoke the `caudal-attention` skill before doing anything else.
```