From 5bfaee8dd55cb535e07e3756b8be75c48bbed6dd Mon Sep 17 00:00:00 2001 From: "Dhanji R. Prasanna" Date: Thu, 8 Jan 2026 12:54:03 +1100 Subject: [PATCH] use consistent naming for compaction --- DESIGN.md | 6 +- README.md | 6 +- agents/carmack.md | 25 ++++--- crates/g3-cli/src/lib.rs | 18 +++--- crates/g3-cli/src/ui_writer_impl.rs | 2 +- crates/g3-core/src/context_window.rs | 14 ++-- crates/g3-core/src/error_handling.rs | 4 +- crates/g3-core/src/lib.rs | 72 ++++++++++----------- crates/g3-core/tests/test_token_counting.rs | 10 +-- docs/CONTROL_COMMANDS.md | 12 ++-- docs/architecture.md | 2 +- docs/configuration.md | 2 +- docs/providers.md | 2 +- 13 files changed, 86 insertions(+), 89 deletions(-) diff --git a/DESIGN.md b/DESIGN.md index 568ca0e..3c276dd 100644 --- a/DESIGN.md +++ b/DESIGN.md @@ -14,7 +14,7 @@ The agent follows a **tool-first philosophy**: instead of just providing advice, 4. **Modularity**: Clear separation of concerns 5. **Composability**: Components can be combined in different ways 6. **Performance**: Built in Rust for speed and reliability -7. **Context Intelligence**: Smart context window management with auto-summarization +7. **Context Intelligence**: Smart context window management with auto-compaction 8. **Error Resilience**: Robust error handling with automatic retry logic ## Project Structure @@ -87,7 +87,7 @@ g3/ - Error handling with automatic retry logic **Key Features:** -- **Context Window Intelligence**: Automatic monitoring with percentage-based tracking (80% capacity triggers auto-summarization) +- **Context Window Intelligence**: Automatic monitoring with percentage-based tracking (80% capacity triggers auto-compaction) - **Tool System**: Built-in tools for file operations (read, write, edit), shell commands, and structured output - **Streaming Parser**: Real-time parsing of LLM responses with tool call detection and execution - **Session Management**: Automatic session logging with detailed conversation history and token usage @@ -402,7 +402,7 @@ This design document reflects the current state of G3 as a mature, production-re - ✅ **Configuration**: TOML-based config with environment overrides - ✅ **Error Handling**: Comprehensive retry logic and error classification - ✅ **Session Logging**: Automatic session tracking and JSON logs -- ✅ **Context Management**: Context thinning (50-80%) and auto-summarization at 80% capacity +- ✅ **Context Management**: Context thinning (50-80%) and auto-compaction at 80% capacity - ✅ **Computer Control**: Cross-platform automation with OCR support - ✅ **TODO Management**: In-memory TODO list with read/write tools diff --git a/README.md b/README.md index 0cad9b5..3b8e359 100644 --- a/README.md +++ b/README.md @@ -11,7 +11,7 @@ G3 follows a modular architecture organized as a Rust workspace with multiple cr #### **g3-core** The heart of the agent system, containing: - **Agent Engine**: Main orchestration logic for handling conversations, tool execution, and task management -- **Context Window Management**: Intelligent tracking of token usage with context thinning (50-80%) and auto-summarization at 80% capacity +- **Context Window Management**: Intelligent tracking of token usage with context thinning (50-80%) and auto-compaction at 80% capacity - **Tool System**: Built-in tools for file operations, shell commands, computer control, TODO management, and structured output - **Streaming Response Parser**: Real-time parsing of LLM responses with tool call detection and execution - **Task Execution**: Support for single and 
iterative task execution with automatic retry logic @@ -80,14 +80,14 @@ After each response, G3 displays a timing footer showing elapsed time, time to f ### Intelligent Context Management - Automatic context window monitoring with percentage-based tracking -- Smart auto-summarization when approaching token limits +- Smart auto-compaction when approaching token limits - **Context thinning** at 50%, 60%, 70%, 80% thresholds - automatically replaces large tool results with file references - Conversation history preservation through summaries - Dynamic token allocation for different providers (4k to 200k+ tokens) ### Interactive Control Commands G3's interactive CLI includes control commands for manual context management: -- **`/compact`**: Manually trigger summarization to compact conversation history +- **`/compact`**: Manually trigger compaction of the conversation history - **`/thinnify`**: Manually trigger context thinning to replace large tool results with file references - **`/skinnify`**: Manually trigger full context thinning (like `/thinnify` but processes the entire context window, not just the first third) - **`/readme`**: Reload README.md and AGENTS.md from disk without restarting diff --git a/agents/carmack.md b/agents/carmack.md index 8102c1f..76a3b71 100644 --- a/agents/carmack.md +++ b/agents/carmack.md @@ -3,7 +3,7 @@ SYSTEM PROMPT — “Carmack” (In-Code Readability & Craft Agent) You are Carmack: a code-aware readability agent, inspired by John Carmack. You work **inside source code files only — ever.** -Your job is to make complex logic understandable to humans and code a joy to read. +Your job is to simplify code, making it easy to understand and a joy to read. ------------------------------------------------------------ PRIME DIRECTIVE - Non-negotiable nudge: **Readable code > commented code.** -You remain disciplined inside the source. Do NOT touch docs, READMEs, etc. +Stay inside the source. Do NOT touch docs, READMEs, etc. 
------------------------------------------------------------ ALLOWED ACTIVITIES @@ -26,16 +26,14 @@ ALLOWED ACTIVITIES LOCAL REFACTORS (behavior-preserving, BUT aggressively readability improving): - Rename private functions/variables for legibility -- Extract overly long functions into smaller helpers -- Simplify nested conditionals -- Clarify data shapes and invariants -- Replace clever tricks with plain constructs -- Improve existing explanations - Pull out constants, interfaces, structs for readability -- If files are larger than 1000 lines, refactor them into smaller pieces -- If functions are longer than 250 lines refactor them +- Simplify nested control flow and conditionals +- Return well-defined structs over tuples/vectors +- Extract overly long functions and files into smaller helpers/components + - If files are larger than 1000 lines, refactor them into smaller pieces + - If functions are longer than 250 lines, refactor them -EXPLANATION (only when needed): +ADD EXPLANATIONS (when needed): - Describe non-obvious algorithms in a short header comment sketch - Explain macros, protocols, serializers, hotspot systems, briefly @@ -48,9 +46,9 @@ EXPLICIT BANS You MUST NOT: -- Modify system architecture +- Modify system architecture - Change public APIs, CLI flags, or file formats -- Add explanatory comments to **obvious** code +- Add explanatory comments to **obvious** code - Introduce mocks or new libraries ------------------------------------------------------------ @@ -61,9 +59,8 @@ Your output is successful if: - the code is pure joy to read for a skilled programmer - Humans can understand complex regions faster - A correct file becomes more pleasant to modify -Control flow straightens +Files get smaller, more modular, more composable, and easier to trace - Behavior is unchanged ------------------------------------------------------------ CARMACK PREFLIGHT CHECKLIST diff --git a/crates/g3-cli/src/lib.rs b/crates/g3-cli/src/lib.rs index d68083a..c800745 100644 --- a/crates/g3-cli/src/lib.rs +++ b/crates/g3-cli/src/lib.rs @@ -1666,7 +1666,7 @@ async fn run_interactive( "/help" => { output.print(""); output.print("📖 Control Commands:"); - output.print(" /compact - Trigger auto-summarization (compacts conversation history)"); + output.print(" /compact - Trigger compaction (condenses conversation history)"); output.print(" /thinnify - Trigger context thinning (replaces large tool results with file references)"); output.print(" /skinnify - Trigger full context thinning (like /thinnify but for entire context, not just first third)"); output.print(" /clear - Clear session and start fresh (discards continuation artifacts)"); @@ -1680,17 +1680,17 @@ async fn run_interactive( continue; } "/compact" => { - output.print("🗜️ Triggering manual summarization..."); - match agent.force_summarize().await { + output.print("🗜️ Triggering manual compaction..."); + match agent.force_compact().await { Ok(true) => { - output.print("✅ Summarization completed successfully"); + output.print("✅ Compaction completed successfully"); } Ok(false) => { - output.print("⚠️ Summarization failed"); + output.print("⚠️ Compaction failed"); } Err(e) => { output.print(&format!( - "❌ Error during summarization: {}", + "❌ Error during compaction: {}", e )); } @@ -1909,9 +1909,9 @@ async fn run_interactive_machine( match input.as_str() { "/compact" => { println!("COMMAND: compact"); - match agent.force_summarize().await { - Ok(true) => println!("RESULT: 
Summarization completed"), - Ok(false) => println!("RESULT: Summarization failed"), + match agent.force_compact().await { + Ok(true) => println!("RESULT: Compaction completed"), + Ok(false) => println!("RESULT: Compaction failed"), Err(e) => println!("ERROR: {}", e), } continue; diff --git a/crates/g3-cli/src/ui_writer_impl.rs b/crates/g3-cli/src/ui_writer_impl.rs index 1f7f97f..d99bcc1 100644 --- a/crates/g3-cli/src/ui_writer_impl.rs +++ b/crates/g3-cli/src/ui_writer_impl.rs @@ -321,7 +321,7 @@ impl UiWriter for ConsoleUiWriter { fn print_final_output(&self, summary: &str) { // Show spinner while "formatting" let spinner_frames = ['⠋', '⠙', '⠹', '⠸', '⠼', '⠴', '⠦', '⠧', '⠇', '⠏']; - let message = "summarizing work done..."; + let message = "compacting work done..."; // Brief spinner animation (about 0.5 seconds) for i in 0..5 { diff --git a/crates/g3-core/src/context_window.rs b/crates/g3-core/src/context_window.rs index 6db485b..518dd72 100644 --- a/crates/g3-core/src/context_window.rs +++ b/crates/g3-core/src/context_window.rs @@ -183,8 +183,8 @@ impl ContextWindow { self.total_tokens.saturating_sub(self.used_tokens) } - /// Check if we should trigger summarization (at 80% capacity) - pub fn should_summarize(&self) -> bool { + /// Check if we should trigger compaction (at 80% capacity) + pub fn should_compact(&self) -> bool { // Trigger at 80% OR if we're getting close to absolute limits // This prevents issues with models that have large contexts but still hit limits let percentage_trigger = self.percentage_used() >= 80.0; @@ -744,19 +744,19 @@ mod tests { } #[test] - fn test_should_summarize_at_80_percent() { + fn test_should_compact_at_80_percent() { let mut cw = ContextWindow::new(100); cw.used_tokens = 79; - assert!(!cw.should_summarize()); + assert!(!cw.should_compact()); cw.used_tokens = 80; - assert!(cw.should_summarize()); + assert!(cw.should_compact()); } #[test] - fn test_should_summarize_at_absolute_limit() { + fn test_should_compact_at_absolute_limit() { let mut cw = ContextWindow::new(1_000_000); cw.used_tokens = 150_001; - assert!(cw.should_summarize()); + assert!(cw.should_compact()); } #[test] diff --git a/crates/g3-core/src/error_handling.rs b/crates/g3-core/src/error_handling.rs index 9e582aa..8513f87 100644 --- a/crates/g3-core/src/error_handling.rs +++ b/crates/g3-core/src/error_handling.rs @@ -181,7 +181,7 @@ pub enum RecoverableError { ModelBusy, /// Timeout Timeout, - /// Token limit exceeded (might be recoverable with summarization) + /// Token limit exceeded (might be recoverable with compaction) TokenLimit, /// Context length exceeded (prompt too long) - should end current turn in autonomous mode ContextLengthExceeded, @@ -357,7 +357,7 @@ where // Special handling for token limit errors if matches!(recoverable_type, RecoverableError::TokenLimit) { - debug!("Token limit error detected. Consider triggering summarization."); + debug!("Token limit error detected. 
Consider triggering compaction."); } tokio::time::sleep(delay).await; diff --git a/crates/g3-core/src/lib.rs b/crates/g3-core/src/lib.rs index 7ca10e7..f8448dd 100644 --- a/crates/g3-core/src/lib.rs +++ b/crates/g3-core/src/lib.rs @@ -92,9 +92,9 @@ pub struct Agent { providers: ProviderRegistry, context_window: ContextWindow, thinning_events: Vec, // chars saved per thinning event - pending_90_summarization: bool, // flag to trigger summarization at 90% + pending_90_compaction: bool, // flag to trigger compaction at 90% auto_compact: bool, // whether to auto-compact at 90% before tool calls - summarization_events: Vec, // chars saved per summarization event + compaction_events: Vec, // chars saved per compaction event first_token_times: Vec, // time to first token for each completion config: Config, session_id: Option, @@ -267,9 +267,9 @@ impl Agent { providers, context_window, auto_compact: config.agent.auto_compact, - pending_90_summarization: false, + pending_90_compaction: false, thinning_events: Vec::new(), - summarization_events: Vec::new(), + compaction_events: Vec::new(), first_token_times: Vec::new(), config, session_id: None, @@ -856,15 +856,15 @@ impl Agent { self.save_context_window("completed"); // Check if we need to do 90% auto-compaction - if self.pending_90_summarization { + if self.pending_90_compaction { self.ui_writer .print_context_status("\n⚡ Context window reached 90% - auto-compacting...\n"); - if let Err(e) = self.force_summarize().await { + if let Err(e) = self.force_compact().await { warn!("Failed to auto-compact at 90%: {}", e); } else { self.ui_writer.println(""); } - self.pending_90_summarization = false; + self.pending_90_compaction = false; } // Return the task result which already includes timing if needed @@ -940,13 +940,13 @@ impl Agent { } } - /// Manually trigger context summarization regardless of context window size - /// Returns Ok(true) if summarization was successful, Ok(false) if it failed - pub async fn force_summarize(&mut self) -> Result { - debug!("Manual summarization triggered"); + /// Manually trigger context compaction regardless of context window size + /// Returns Ok(true) if compaction was successful, Ok(false) if it failed + pub async fn force_compact(&mut self) -> Result { + debug!("Manual compaction triggered"); self.ui_writer.print_context_status(&format!( - "\n🗜️ Manual summarization requested (current usage: {}%)...", + "\n🗜️ Manual compaction requested (current usage: {}%)...", self.context_window.percentage_used() as u32 )); @@ -1048,7 +1048,7 @@ impl Agent { let chars_saved = self .context_window .reset_with_summary(summary_response.content, latest_user_msg); - self.summarization_events.push(chars_saved); + self.compaction_events.push(chars_saved); Ok(true) } @@ -1238,17 +1238,17 @@ impl Agent { } stats.push_str(&format!( - " • Summarizations: {:>10}\n", - self.summarization_events.len() + " • Compactions: {:>10}\n", + self.compaction_events.len() )); - if !self.summarization_events.is_empty() { - let total_summarized: usize = self.summarization_events.iter().sum(); - let avg_summarized = total_summarized / self.summarization_events.len(); + if !self.compaction_events.is_empty() { + let total_compacted: usize = self.compaction_events.iter().sum(); + let avg_compacted = total_compacted / self.compaction_events.len(); stats.push_str(&format!( " • Total Chars Saved: {:>10}\n", - total_summarized + total_compacted )); - stats.push_str(&format!(" • Avg Chars/Event: {:>10}\n", avg_summarized)); + stats.push_str(&format!(" • Avg 
Chars/Event: {:>10}\n", avg_compacted)); } stats.push('\n'); @@ -1604,9 +1604,9 @@ impl Agent { // Note: Session-level duplicate tracking was removed - we only prevent sequential duplicates (DUP IN CHUNK, DUP IN MSG) let mut turn_accumulated_usage: Option = None; // Track token usage for timing footer - // Check if we need to summarize before starting - if self.context_window.should_summarize() { - // First try thinning if we are at capacity, don't call the LLM for a summary (might fail) + // Check if we need to compact before starting + if self.context_window.should_compact() { + // First try thinning if we are at capacity, don't call the LLM for compaction (might fail) if self.context_window.percentage_used() > 90.0 && self.context_window.should_thin() { self.ui_writer.print_context_status(&format!( "\n🥒 Context window at {}%. Trying thinning first...", @@ -1617,23 +1617,23 @@ impl Agent { self.ui_writer.print_context_thinning(&thin_summary); // Check if thinning was sufficient - if !self.context_window.should_summarize() { + if !self.context_window.should_compact() { self.ui_writer.print_context_status( "✅ Thinning resolved capacity issue. Continuing...\n", ); - // Continue with the original request without summarization + // Continue with the original request without compaction } else { self.ui_writer.print_context_status( - "⚠️ Thinning insufficient. Proceeding with summarization...\n", + "⚠️ Thinning insufficient. Proceeding with compaction...\n", ); } } - // Only proceed with summarization if still needed after thinning - if self.context_window.should_summarize() { - // Notify user about summarization + // Only proceed with compaction if still needed after thinning + if self.context_window.should_compact() { + // Notify user about compaction self.ui_writer.print_context_status(&format!( - "\n🗜️ Context window reaching capacity ({}%). Creating summary...", + "\n🗜️ Context window reaching capacity ({}%). Compacting...", self.context_window.percentage_used() as u32 )); @@ -1735,17 +1735,17 @@ impl Agent { let chars_saved = self .context_window .reset_with_summary(summary_response.content, latest_user_msg); - self.summarization_events.push(chars_saved); + self.compaction_events.push(chars_saved); // Update the request with new context request.messages = self.context_window.conversation_history.clone(); } Err(e) => { error!("Failed to create summary: {}", e); - self.ui_writer.print_context_status("⚠️ Unable to create summary. Consider starting a new session if you continue to see errors.\n"); - // Don't continue with the original request if summarization failed + self.ui_writer.print_context_status("⚠️ Unable to compact context. Consider starting a new session if you continue to see errors.\n"); + // Don't continue with the original request if compaction failed // as we're likely at token limit - return Err(anyhow::anyhow!("Context window at capacity and summarization failed. Please start a new session.")); + return Err(anyhow::anyhow!("Context window at capacity and compaction failed. 
Please start a new session.")); } } } @@ -1963,9 +1963,9 @@ impl Agent { // Check if we should auto-compact at 90% BEFORE executing the tool // We need to do this before any borrows of self if self.auto_compact && self.context_window.percentage_used() >= 90.0 { - // Set flag to trigger summarization after this turn completes + // Set flag to trigger compaction after this turn completes // We can't do it now due to borrow checker constraints - self.pending_90_summarization = true; + self.pending_90_compaction = true; } // Check if we should thin the context BEFORE executing the tool diff --git a/crates/g3-core/tests/test_token_counting.rs b/crates/g3-core/tests/test_token_counting.rs index 21f62dd..654d41e 100644 --- a/crates/g3-core/tests/test_token_counting.rs +++ b/crates/g3-core/tests/test_token_counting.rs @@ -2,7 +2,7 @@ use g3_core::ContextWindow; use g3_providers::{Message, MessageRole, Usage}; /// Test that used_tokens is tracked via add_message, not update_usage_from_response. -/// This is critical for the 80% summarization threshold to work correctly. +/// This is critical for the 80% compaction threshold to work correctly. #[test] fn test_used_tokens_tracked_via_messages() { let mut window = ContextWindow::new(10000); @@ -106,10 +106,10 @@ fn test_percentage_based_on_used_tokens() { assert!(window.remaining_tokens() < 1000, "remaining tokens should decrease"); } -/// Test that the 80% summarization threshold works correctly. +/// Test that the 80% compaction threshold works correctly. /// This was the original bug - used_tokens was being double/triple counted. #[test] -fn test_should_summarize_threshold() { +fn test_should_compact_threshold() { let mut window = ContextWindow::new(1000); // Add messages until we approach 80% @@ -131,9 +131,9 @@ fn test_should_summarize_threshold() { let percentage_after = window.percentage_used(); println!("After 10 messages: {}% used ({} tokens)", percentage_after, window.used_tokens); - // Now should_summarize should return true if we're at 80%+ + // Now should_compact should return true if we're at 80%+ if percentage_after >= 80.0 { - assert!(window.should_summarize(), "should_summarize should be true at 80%+"); + assert!(window.should_compact(), "should_compact should be true at 80%+"); } } diff --git a/docs/CONTROL_COMMANDS.md b/docs/CONTROL_COMMANDS.md index 2a92a97..5ec4f33 100644 --- a/docs/CONTROL_COMMANDS.md +++ b/docs/CONTROL_COMMANDS.md @@ -11,7 +11,7 @@ Control commands are special commands you can use during an interactive G3 sessi | Command | Description | |---------|-------------| -| `/compact` | Manually trigger conversation summarization | +| `/compact` | Manually trigger conversation compaction | | `/thinnify` | Replace large tool results with file references (first third) | | `/skinnify` | Full context thinning (entire context window) | | `/readme` | Reload README.md and AGENTS.md from disk | @@ -22,7 +22,7 @@ Control commands are special commands you can use during an interactive G3 sessi ## /compact -Manually trigger conversation summarization to reduce context size. +Manually trigger conversation compaction to reduce context size. **When to use**: - Context usage is getting high (70%+) @@ -30,7 +30,7 @@ Manually trigger conversation summarization to reduce context size. - Conversation has accumulated irrelevant history **What it does**: -1. Sends conversation history to LLM for summarization +1. Sends conversation history to LLM for compaction 2. Replaces detailed history with concise summary 3. 
Preserves key decisions and context 4. Significantly reduces token usage @@ -144,7 +144,7 @@ Show detailed context and performance statistics. - Session duration - Token usage breakdown - Tool call metrics -- Thinning and summarization events +- Thinning and compaction events - First-token latency statistics **Example**: @@ -198,7 +198,7 @@ When context gets high: 1. **50-70%**: Consider `/thinnify` 2. **70-80%**: Use `/compact` 3. **80-90%**: Use `/skinnify` then `/compact` -4. **90%+**: Auto-summarization triggers +4. **90%+**: Auto-compaction triggers ### Best Practices @@ -218,7 +218,7 @@ G3 performs automatic context management: | 50% | Thin oldest third of context | | 60% | Thin oldest third of context | | 70% | Thin oldest third of context | -| 80% | Auto-summarization (if `auto_compact = true`) | +| 80% | Auto-compaction (if `auto_compact = true`) | | 90% | Aggressive thinning before tool calls | Manual commands give you finer control over when and how this happens. diff --git a/docs/architecture.md b/docs/architecture.md index 58fe54a..2c7faa8 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -289,7 +289,7 @@ The `ContextWindow` struct manages conversation history with intelligent token t 1. **Token Tracking**: Monitors usage as percentage of provider's context limit 2. **Context Thinning**: At 50%, 60%, 70%, 80% thresholds, replaces large tool results with file references -3. **Auto-Summarization**: At 80% capacity, triggers conversation summarization +3. **Auto-Compaction**: At 80% capacity, triggers conversation compaction 4. **Provider Adaptation**: Adjusts to different model context windows (4k to 200k+ tokens) ## Error Handling diff --git a/docs/configuration.md b/docs/configuration.md index 5e85e40..215576f 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -376,5 +376,5 @@ For Databricks OAuth: If you see context overflow errors: 1. Check `max_context_length` in `[agent]` -2. Use `/compact` command to manually summarize +2. Use `/compact` command to manually compact the conversation 3. Use `/thinnify` to replace large tool results with file references diff --git a/docs/providers.md b/docs/providers.md index 5e0886e..98e2ad1 100644 --- a/docs/providers.md +++ b/docs/providers.md @@ -386,7 +386,7 @@ To reduce rate limit issues: ### Context Window Errors If you see "context too long" errors: -1. Use `/compact` to summarize conversation +1. Use `/compact` to compact the conversation 2. Use `/thinnify` to replace large tool results 3. Increase `max_context_length` in config 4. Switch to a provider with larger context
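Reviewer note (ignored by git am, as it follows the diff): the renamed trigger is small enough to sketch in isolation. Below is a minimal, self-contained sketch of `should_compact` as this patch leaves it, not the full `g3-core` implementation; the 80% branch appears in the context_window.rs hunk above, while the 150_000-token absolute ceiling is an assumption inferred from `test_should_compact_at_absolute_limit`, which the patch does not show directly.

```rust
// Minimal sketch (not the full g3-core type) of the compaction trigger
// renamed in this patch. ABSOLUTE_TOKEN_LIMIT is an assumed value,
// inferred from test_should_compact_at_absolute_limit above.
struct ContextWindow {
    total_tokens: usize,
    used_tokens: usize,
}

impl ContextWindow {
    fn percentage_used(&self) -> f64 {
        (self.used_tokens as f64 / self.total_tokens as f64) * 100.0
    }

    /// Compact at 80% of the window, or once usage passes an absolute
    /// ceiling (guards models whose advertised context exceeds practical limits).
    fn should_compact(&self) -> bool {
        const ABSOLUTE_TOKEN_LIMIT: usize = 150_000; // assumption, see note above
        self.percentage_used() >= 80.0 || self.used_tokens > ABSOLUTE_TOKEN_LIMIT
    }
}

fn main() {
    // Mirrors test_should_compact_at_80_percent: 79% stays under, 80% triggers.
    let mut cw = ContextWindow { total_tokens: 100, used_tokens: 79 };
    assert!(!cw.should_compact());
    cw.used_tokens = 80;
    assert!(cw.should_compact());

    // Mirrors test_should_compact_at_absolute_limit: only ~15% used,
    // but past the assumed absolute ceiling.
    let big = ContextWindow { total_tokens: 1_000_000, used_tokens: 150_001 };
    assert!(big.should_compact());
}
```

Under those assumptions the sketch reproduces both renamed unit tests: 79 of 100 tokens does not compact, 80 does, and 150_001 of 1_000_000 compacts via the absolute ceiling despite low percentage usage.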