Compaction

When the conversation grows too long for the model's context window, OpenVesper summarizes older messages into a single system note. This saves tokens and keeps the conversation alive.

Triggers

  • Manual: user types /compact or hits POST /sessions/:key/compact
  • Auto: shouldAutoCompact() exposes a flag when estimated tokens cross 80% of the budget; clients can call /compact when they see it set
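
The auto trigger amounts to a threshold comparison. The sketch below is illustrative only: the function name, the contextBudget parameter, and the choice of >= at the boundary are assumptions, and the real shouldAutoCompact() in compaction.ts may take different arguments.

```typescript
// Hypothetical sketch of the auto-compaction flag; not the actual gateway code.
const AUTO_COMPACT_RATIO = 0.8; // 80% of the budget, as described above

function shouldAutoCompactSketch(
  estimatedTokens: number,
  contextBudget: number, // assumed: the model's context budget in tokens
): boolean {
  if (contextBudget <= 0) return false; // no usable budget, never raise the flag
  return estimatedTokens >= contextBudget * AUTO_COMPACT_RATIO;
}
```

With an assumed budget of 32,000 tokens the flag would be raised at roughly 25,600 estimated tokens. Note that the flag only signals; a client still has to call /compact itself.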

How it works

  1. Keep the most recent N messages verbatim (default: 10)
  2. Concatenate older messages into a transcript
  3. (If LLM provided) ask the model to summarize the transcript
  4. (Fallback) structural summary: "[Compacted N earlier messages, first user message: '...']"
  5. Replace old messages with a single system-role summary entry
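
The five steps above can be sketched as follows. This is a minimal illustration, not the code in compaction.ts: the Message shape, the Summarizer callback, and the function names are all hypothetical, and the transcript format is an assumption.

```typescript
// Hypothetical types; the gateway's real message shape may differ.
type Role = "system" | "user" | "assistant";
interface Message { role: Role; content: string; }
type Summarizer = (transcript: string) => Promise<string>;

// Step 4: structural fallback used when no LLM summarizer is available.
function structuralSummary(older: Message[]): string {
  const firstUser = older.find((m) => m.role === "user")?.content ?? "";
  return `[Compacted ${older.length} earlier messages, first user message: '${firstUser}']`;
}

async function compactSketch(
  messages: Message[],
  keepRecent = 10,
  summarize?: Summarizer,
): Promise<Message[]> {
  const keep = Math.max(0, keepRecent);
  if (messages.length <= keep) return messages; // nothing old enough to compact

  // Step 1: keep the most recent N messages verbatim.
  const splitAt = messages.length - keep;
  const older = messages.slice(0, splitAt);
  const recent = messages.slice(splitAt);

  // Step 2: concatenate older messages into a transcript.
  const transcript = older.map((m) => `${m.role}: ${m.content}`).join("\n");

  // Steps 3 and 4: LLM summary if a summarizer was provided, else the fallback.
  const summary = summarize ? await summarize(transcript) : structuralSummary(older);

  // Step 5: replace the old messages with one system-role summary entry.
  return [{ role: "system", content: summary }, ...recent];
}
```

Passing the summarizer as an optional callback keeps the fallback path free of any LLM dependency, which mirrors the "(If LLM provided)" wording in step 3.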

Manual compaction

# Inside a chat
/compact

# Or with a hint
/compact Focus on the trading decisions we discussed

# Or via API
curl -X POST http://127.0.0.1:18789/sessions/user-123/compact \
  -H 'Content-Type: application/json' \
  -d '{"keepRecent": 15, "instructions": "Preserve API key locations"}'

Checking before you compact

curl http://127.0.0.1:18789/sessions/user-123/tokens
# {
#   "sessionKey": "user-123",
#   "messageCount": 87,
#   "estimatedTokens": 24530,
#   "shouldAutoCompact": false
# }

Token estimate is approximate: 1 token ≈ 4 characters. Real provider counts vary.
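
A characters-divided-by-four estimate could look like the sketch below. The function name, the rounding up, and counting only message content are assumptions; the actual estimator may also count roles or other metadata.

```typescript
// Hypothetical estimator based on the 1 token ≈ 4 characters heuristic.
const CHARS_PER_TOKEN = 4;

function estimateTokensSketch(messages: { content: string }[]): number {
  // String.length counts UTF-16 code units, so non-ASCII text is only roughly measured.
  const chars = messages.reduce((sum, m) => sum + m.content.length, 0);
  return Math.ceil(chars / CHARS_PER_TOKEN); // round up so short messages still count
}
```

Because the heuristic ignores the provider's real tokenizer, treat the result as a rough signal for when to compact rather than as an exact count.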

Memory flush (advanced)

Before compaction destroys detail, you can run a "memory flush": a silent LLM turn that asks the agent to write durable notes to MEMORY.md. That way, key facts persist even after the original messages are summarized.

See Memory Engine for the active memory system.

Source

Implementation: apps/gateway/src/compaction.ts, apps/gateway/src/memory-flush.ts.
