Question
Summary
MiniMax's Mini-Agent documentation describes an automatic context compression strategy: when a conversation approaches the token limit, older turns are summarized rather than truncated. This preserves reasoning continuity while staying within context bounds. Overlord agents currently have no mechanism for managing long conversations — they will eventually hit the 204,800 token context ceiling and fail, or lose critical earlier context.
Requirements
- [ ] Implement a token counting mechanism for conversation history (estimate or precise)
- [ ] Define a compression threshold (e.g., 80% of context window = trigger compression)
- [ ] When threshold is reached, summarize older conversation turns into a condensed context block
- [ ] Preserve the most recent N turns in full detail (sliding window)
- [ ] Preserve all tool call results from the current task (never compress active work)
- [ ] Store the full uncompressed history in storage for audit/replay
- [ ] The compressed summary should retain: key decisions, discovered facts, tool results, and current goals
- [ ] Support different compression strategies per room type:
  - `code-lab`: preserve code snippets and file paths in full
  - `discovery/architecture`: preserve requirements and design decisions
  - `war-room`: preserve timeline and incident details
- [ ] Add a `context_compression` event to the agent activity log
- [ ] Ensure thinking blocks from MiniMax are included in token counting (they consume context)
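The first two requirements — token counting and a compression trigger — can be sketched as below. This is a minimal illustration, not the project's implementation: the names `estimateTokens` and `shouldCompress` are hypothetical, and the chars/4 heuristic is a coarse approximation (a real implementation could call the provider's tokenizer instead).

```typescript
// Assumed context limits from the issue text.
const CONTEXT_WINDOW = 204_800;
const COMPRESSION_THRESHOLD = 0.8; // trigger compression at 80% of the window

interface Turn {
  role: "user" | "assistant" | "tool";
  content: string;
  thinking?: string; // thinking blocks also consume context, so count them
}

// Hypothetical estimator: ~4 characters per token is a common rough heuristic.
function estimateTokens(turn: Turn): number {
  const text = turn.content + (turn.thinking ?? "");
  return Math.ceil(text.length / 4);
}

// Returns true once the estimated history size crosses the threshold.
function shouldCompress(history: Turn[]): boolean {
  const total = history.reduce((sum, t) => sum + estimateTokens(t), 0);
  return total >= CONTEXT_WINDOW * COMPRESSION_THRESHOLD;
}
```

Note that `thinking` is folded into the estimate deliberately, matching the checklist item about thinking blocks counting toward context.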
Architecture Impact
- Layer: Agents (`src/agents/conversation-loop.ts`), AI (`src/ai/`)
- Risk: Medium — touches the core conversation management logic
- Dependencies: Token counting utility, summarization prompt, storage for full history
- Pattern: Mirrors MiniMax Mini-Agent's "context compression" strategy
Mini-Agent Reference
The Mini-Agent pattern uses this approach:
- Monitor total token usage per conversation turn
- When approaching limit, extract key information from older turns
- Replace older turns with a structured summary
- Keep recent turns and all active tool contexts intact
- Log the compression event for debugging
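The sliding-window step of this approach can be sketched as follows. Everything here is illustrative: `compressHistory`, `KEEP_RECENT`, and the `summarize` callback are assumed names, and in practice `summarize` would be a model call driven by a summarization prompt rather than a plain function.

```typescript
interface Turn {
  role: string;
  content: string;
}

const KEEP_RECENT = 6; // sliding window: number of recent turns kept in full

// Folds older turns into one structured summary turn; the most recent
// KEEP_RECENT turns survive intact.
function compressHistory(
  history: Turn[],
  summarize: (turns: Turn[]) => string,
): Turn[] {
  if (history.length <= KEEP_RECENT) return history;
  const older = history.slice(0, history.length - KEEP_RECENT);
  const recent = history.slice(history.length - KEEP_RECENT);
  const summaryTurn: Turn = {
    role: "system",
    content: `[Compressed context]\n${summarize(older)}`,
  };
  return [summaryTurn, ...recent];
}
```

A real implementation would additionally skip any turn holding tool results from the active task (never compress active work) and persist the uncompressed `older` turns to storage before replacing them.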
Related Issues
- #362 — Mini-Agent perception-thinking-action-feedback loop (parent pattern)
- #361 — Preserve thinking blocks in conversation history (thinking blocks count toward context)
- #359 — Configurable thinking budget (budget affects available context for conversation)
- #382 — M2.5-highspeed model support (same 204,800 context window)
Documentation
- Mini-Agent Guide: https://www.minimaxi.com/en/document/guides-mini-agent
- Extended Thinking: https://www.minimaxi.com/en/document/guides-extended-thinking
Answer:
MiniMax's Mini-Agent documentation describes an automatic context compression strategy that summarizes older turns in long conversations when approaching the token limit. This strategy preserves reasoning continuity while staying within context bounds. However, Overlord agents currently have no mechanism for managing long conversations: they risk either failing when they hit the 204,800 token context ceiling or losing critical earlier context.
To implement this feature for Overlord agents, follow these requirements:
- Implement a token counting mechanism for conversation history: Count the tokens consumed by the conversation history, either with a rough estimate or a precise tokenizer.
- Define a compression threshold: Set a threshold, such as 80% of the context window, to trigger compression.
- Summarize older conversation turns: When the threshold is reached, summarize older conversation turns into a condensed context block.
- Preserve recent turns and active tool contexts: Keep the most recent N turns in full detail and all tool call results from the current task.
- Store the full uncompressed history: Store the full uncompressed history in storage for audit/replay purposes.
- Retain key information: Ensure that the compressed summary retains key decisions, discovered facts, tool results, and current goals.
- Support different compression strategies: Adapt compression strategies per room type based on the importance of preserving specific information.
- Add a context_compression event: Log the compression event for debugging purposes.
- Include thinking blocks in token counting: Ensure that thinking blocks from MiniMax are included in token counting, as they consume context.
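The per-room strategy requirement could be modeled as a small lookup table. The room names below come from the requirements list; the `preserveInFull` categories and `summaryInstruction` prompt hints are illustrative placeholders, not the project's actual schema.

```typescript
type RoomType = "code-lab" | "discovery/architecture" | "war-room";

interface CompressionStrategy {
  preserveInFull: string[]; // content categories never summarized away
  summaryInstruction: string; // hint passed to the summarization prompt
}

// Hypothetical mapping from room type to its compression strategy.
const strategies: Record<RoomType, CompressionStrategy> = {
  "code-lab": {
    preserveInFull: ["code snippets", "file paths"],
    summaryInstruction: "Keep all code snippets and file paths verbatim.",
  },
  "discovery/architecture": {
    preserveInFull: ["requirements", "design decisions"],
    summaryInstruction: "Keep requirements and design decisions verbatim.",
  },
  "war-room": {
    preserveInFull: ["timeline", "incident details"],
    summaryInstruction: "Keep the incident timeline and details verbatim.",
  },
};

function strategyFor(room: RoomType): CompressionStrategy {
  return strategies[room];
}
```

A `Record<RoomType, …>` keeps the mapping exhaustive: adding a new room type without a strategy becomes a compile-time error.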
Architecture Impact:
- Layer: Agents (`src/agents/conversation-loop.ts`), AI (`src/ai/`)
- Risk: Medium — touches the core conversation management logic
- Dependencies: Token counting utility, summarization prompt, storage for full history
- Pattern: Mirrors MiniMax Mini-Agent's "context compression" strategy
By implementing automatic context compression for long agent conversations, you can enhance Overlord agents' ability to manage lengthy conversations while maintaining critical context and reasoning continuity.