use consistent naming for compaction
@@ -14,7 +14,7 @@ The agent follows a **tool-first philosophy**: instead of just providing advice,
 4. **Modularity**: Clear separation of concerns
 5. **Composability**: Components can be combined in different ways
 6. **Performance**: Built in Rust for speed and reliability
-7. **Context Intelligence**: Smart context window management with auto-summarization
+7. **Context Intelligence**: Smart context window management with auto-compaction
 8. **Error Resilience**: Robust error handling with automatic retry logic

 ## Project Structure
@@ -87,7 +87,7 @@ g3/
 - Error handling with automatic retry logic

 **Key Features:**
-- **Context Window Intelligence**: Automatic monitoring with percentage-based tracking (80% capacity triggers auto-summarization)
+- **Context Window Intelligence**: Automatic monitoring with percentage-based tracking (80% capacity triggers auto-compaction)
 - **Tool System**: Built-in tools for file operations (read, write, edit), shell commands, and structured output
 - **Streaming Parser**: Real-time parsing of LLM responses with tool call detection and execution
 - **Session Management**: Automatic session logging with detailed conversation history and token usage
@@ -402,7 +402,7 @@ This design document reflects the current state of G3 as a mature, production-re
 - ✅ **Configuration**: TOML-based config with environment overrides
 - ✅ **Error Handling**: Comprehensive retry logic and error classification
 - ✅ **Session Logging**: Automatic session tracking and JSON logs
-- ✅ **Context Management**: Context thinning (50-80%) and auto-summarization at 80% capacity
+- ✅ **Context Management**: Context thinning (50-80%) and auto-compaction at 80% capacity
 - ✅ **Computer Control**: Cross-platform automation with OCR support
 - ✅ **TODO Management**: In-memory TODO list with read/write tools

@@ -11,7 +11,7 @@ G3 follows a modular architecture organized as a Rust workspace with multiple cr
 #### **g3-core**
 The heart of the agent system, containing:
 - **Agent Engine**: Main orchestration logic for handling conversations, tool execution, and task management
-- **Context Window Management**: Intelligent tracking of token usage with context thinning (50-80%) and auto-summarization at 80% capacity
+- **Context Window Management**: Intelligent tracking of token usage with context thinning (50-80%) and auto-compaction at 80% capacity
 - **Tool System**: Built-in tools for file operations, shell commands, computer control, TODO management, and structured output
 - **Streaming Response Parser**: Real-time parsing of LLM responses with tool call detection and execution
 - **Task Execution**: Support for single and iterative task execution with automatic retry logic
@@ -80,14 +80,14 @@ After each response, G3 displays a timing footer showing elapsed time, time to f

 ### Intelligent Context Management
 - Automatic context window monitoring with percentage-based tracking
-- Smart auto-summarization when approaching token limits
+- Smart auto-compaction when approaching token limits
 - **Context thinning** at 50%, 60%, 70%, 80% thresholds - automatically replaces large tool results with file references
 - Conversation history preservation through summaries
 - Dynamic token allocation for different providers (4k to 200k+ tokens)

 ### Interactive Control Commands
 G3's interactive CLI includes control commands for manual context management:
-- **`/compact`**: Manually trigger summarization to compact conversation history
+- **`/compact`**: Manually trigger compaction to compact conversation history
 - **`/thinnify`**: Manually trigger context thinning to replace large tool results with file references
 - **`/skinnify`**: Manually trigger full context thinning (like `/thinnify` but processes the entire context window, not just the first third)
 - **`/readme`**: Reload README.md and AGENTS.md from disk without restarting
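The threshold ladder described in this section (thinning at 50-80%, compaction at 80%) can be sketched as a small decision function. This is an illustrative sketch, not G3's actual API; the `ContextAction` enum and `next_action` helper are hypothetical names.

```rust
/// Hypothetical sketch of G3's context-pressure ladder: thinning fires at the
/// 50/60/70/80% thresholds, and compaction fires at 80% of capacity.
#[derive(Debug, PartialEq)]
enum ContextAction {
    None,
    Thin,    // replace large tool results with file references
    Compact, // summarize conversation history
}

fn next_action(percentage_used: f64) -> ContextAction {
    if percentage_used >= 80.0 {
        ContextAction::Compact
    } else if percentage_used >= 50.0 {
        ContextAction::Thin
    } else {
        ContextAction::None
    }
}

fn main() {
    assert_eq!(next_action(30.0), ContextAction::None);
    assert_eq!(next_action(55.0), ContextAction::Thin);
    assert_eq!(next_action(85.0), ContextAction::Compact);
    println!("ok");
}
```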
@@ -3,7 +3,7 @@ SYSTEM PROMPT — “Carmack” (In-Code Readability & Craft Agent)
 You are Carmack: a code-aware readability agent, inspired by John Carmack.
 You work **inside source code files only — ever.**

-Your job is to make complex logic understandable to humans and code a joy to read.
+Your job is to simplify, make code easy to understand, and a joy to read.

 ------------------------------------------------------------
 PRIME DIRECTIVE
@@ -18,7 +18,7 @@ PRIME DIRECTIVE
 - Non-negotiable nudge:
 **Readable code > commented code.**

-You remain disciplined inside the source. Do NOT touch docs, READMEs, etc.
+Stay inside the source. Do NOT touch docs, READMEs, etc.

 ------------------------------------------------------------
 ALLOWED ACTIVITIES
@@ -26,16 +26,14 @@ ALLOWED ACTIVITIES
 LOCAL REFACTORS (behavior-preserving, BUT aggressively readability improving):

 - Rename private functions/variables for legibility
-- Extract overly long functions into smaller helpers
-- Simplify nested conditionals
-- Clarify data shapes and invariants
-- Replace clever tricks with plain constructs
-- Improve existing explanations
 - Pull out constants, interfaces, structs for readability
+- Simplify nested control flow and conditionals
+- Return well-defined structs over tuples/vectors
+- Extract overly long functions and files into smaller helpers/components
 - If files are larger than 1000 lines, refactor them into smaller pieces
 - If functions are longer than 250 lines refactor them

-EXPLANATION (only when needed):
+ADD EXPLANATIONS (when needed):

 - Describe non-obvious algorithms in a short header comment sketch
 - Explain macros, protocols, serializers, hotspot systems, briefly
@@ -48,9 +46,9 @@ EXPLICIT BANS

 You MUST NOT:

-- Modify system architecture or layering
+- Modify system architecture
 - Change public APIs, CLI flags, or file formats
-- Add per-line explanatory comments to **obvious** code
+- Add explanatory comments to **obvious** code
 - Introduce mocks or new libraries

 ------------------------------------------------------------
@@ -61,9 +59,8 @@ Your output is successful if:
 - the code is pure joy to read for a skilled programmer
 - Humans can understand complex regions faster
 - A correct file becomes more pleasant to modify
-- Control flow straightens
+- Files get smaller, more modular, composable, easy to trace
 - Behavior is unchanged
-- No architecture or external docs were touched

 ------------------------------------------------------------
 CARMACK PREFLIGHT CHECKLIST
@@ -1666,7 +1666,7 @@ async fn run_interactive<W: UiWriter>(
            "/help" => {
                output.print("");
                output.print("📖 Control Commands:");
-               output.print(" /compact - Trigger auto-summarization (compacts conversation history)");
+               output.print(" /compact - Trigger compaction (compacts conversation history)");
                output.print(" /thinnify - Trigger context thinning (replaces large tool results with file references)");
                output.print(" /skinnify - Trigger full context thinning (like /thinnify but for entire context, not just first third)");
                output.print(" /clear - Clear session and start fresh (discards continuation artifacts)");
@@ -1680,17 +1680,17 @@ async fn run_interactive<W: UiWriter>(
                continue;
            }
            "/compact" => {
-               output.print("🗜️ Triggering manual summarization...");
-               match agent.force_summarize().await {
+               output.print("🗜️ Triggering manual compaction...");
+               match agent.force_compact().await {
                    Ok(true) => {
-                       output.print("✅ Summarization completed successfully");
+                       output.print("✅ Compaction completed successfully");
                    }
                    Ok(false) => {
-                       output.print("⚠️ Summarization failed");
+                       output.print("⚠️ Compaction failed");
                    }
                    Err(e) => {
                        output.print(&format!(
-                           "❌ Error during summarization: {}",
+                           "❌ Error during compaction: {}",
                            e
                        ));
                    }
@@ -1909,9 +1909,9 @@ async fn run_interactive_machine(
        match input.as_str() {
            "/compact" => {
                println!("COMMAND: compact");
-               match agent.force_summarize().await {
-                   Ok(true) => println!("RESULT: Summarization completed"),
-                   Ok(false) => println!("RESULT: Summarization failed"),
+               match agent.force_compact().await {
+                   Ok(true) => println!("RESULT: Compaction completed"),
+                   Ok(false) => println!("RESULT: Compaction failed"),
                    Err(e) => println!("ERROR: {}", e),
                }
                continue;
@@ -321,7 +321,7 @@ impl UiWriter for ConsoleUiWriter {
    fn print_final_output(&self, summary: &str) {
        // Show spinner while "formatting"
        let spinner_frames = ['⠋', '⠙', '⠹', '⠸', '⠼', '⠴', '⠦', '⠧', '⠇', '⠏'];
-       let message = "summarizing work done...";
+       let message = "compacting work done...";

        // Brief spinner animation (about 0.5 seconds)
        for i in 0..5 {
@@ -183,8 +183,8 @@ impl ContextWindow {
        self.total_tokens.saturating_sub(self.used_tokens)
    }

-   /// Check if we should trigger summarization (at 80% capacity)
-   pub fn should_summarize(&self) -> bool {
+   /// Check if we should trigger compaction (at 80% capacity)
+   pub fn should_compact(&self) -> bool {
        // Trigger at 80% OR if we're getting close to absolute limits
        // This prevents issues with models that have large contexts but still hit limits
        let percentage_trigger = self.percentage_used() >= 80.0;
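The dual trigger in `should_compact` can be sketched in isolation: compaction fires at 80% of the window OR past an absolute token ceiling. The `150_000` ceiling below is inferred from the tests elsewhere in this diff (a 1,000,000-token window compacts at 150,001 used tokens); the exact constant and struct layout are assumptions, not the real implementation.

```rust
// Hedged sketch of ContextWindow::should_compact(): a percentage trigger at
// 80% combined with an assumed absolute ceiling of 150_000 tokens.
struct ContextWindow {
    total_tokens: usize,
    used_tokens: usize,
}

impl ContextWindow {
    fn percentage_used(&self) -> f64 {
        self.used_tokens as f64 / self.total_tokens as f64 * 100.0
    }

    fn should_compact(&self) -> bool {
        let percentage_trigger = self.percentage_used() >= 80.0;
        let absolute_trigger = self.used_tokens > 150_000; // assumed ceiling
        percentage_trigger || absolute_trigger
    }
}

fn main() {
    // Mirrors the tests in this diff: 80/100 trips the percentage trigger,
    // 150_001/1_000_000 trips the absolute trigger despite low percentage.
    let near = ContextWindow { total_tokens: 100, used_tokens: 80 };
    assert!(near.should_compact());
    let huge = ContextWindow { total_tokens: 1_000_000, used_tokens: 150_001 };
    assert!(huge.should_compact());
    println!("ok");
}
```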
@@ -744,19 +744,19 @@ mod tests {
    }

    #[test]
-   fn test_should_summarize_at_80_percent() {
+   fn test_should_compact_at_80_percent() {
        let mut cw = ContextWindow::new(100);
        cw.used_tokens = 79;
-       assert!(!cw.should_summarize());
+       assert!(!cw.should_compact());
        cw.used_tokens = 80;
-       assert!(cw.should_summarize());
+       assert!(cw.should_compact());
    }

    #[test]
-   fn test_should_summarize_at_absolute_limit() {
+   fn test_should_compact_at_absolute_limit() {
        let mut cw = ContextWindow::new(1_000_000);
        cw.used_tokens = 150_001;
-       assert!(cw.should_summarize());
+       assert!(cw.should_compact());
    }

    #[test]
@@ -181,7 +181,7 @@ pub enum RecoverableError {
    ModelBusy,
    /// Timeout
    Timeout,
-   /// Token limit exceeded (might be recoverable with summarization)
+   /// Token limit exceeded (might be recoverable with compaction)
    TokenLimit,
    /// Context length exceeded (prompt too long) - should end current turn in autonomous mode
    ContextLengthExceeded,
@@ -357,7 +357,7 @@ where

        // Special handling for token limit errors
        if matches!(recoverable_type, RecoverableError::TokenLimit) {
-           debug!("Token limit error detected. Consider triggering summarization.");
+           debug!("Token limit error detected. Consider triggering compaction.");
        }

        tokio::time::sleep(delay).await;
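The error-classification idea behind the retry loop can be sketched with the `RecoverableError` variants shown above. The `should_retry` helper and its policy are hypothetical: the diff only shows that token-limit errors prompt a compaction hint and that `ContextLengthExceeded` should end the current turn rather than retry.

```rust
// Minimal sketch of retry classification over the RecoverableError variants
// from this diff. The retry policy itself is an assumption for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
enum RecoverableError {
    ModelBusy,
    Timeout,
    TokenLimit,
    ContextLengthExceeded,
}

/// Decide whether another attempt makes sense; token-limit errors are only
/// worth retrying once the context has been compacted.
fn should_retry(err: RecoverableError, attempt: u32, max_attempts: u32) -> bool {
    if attempt >= max_attempts {
        return false;
    }
    match err {
        RecoverableError::ModelBusy | RecoverableError::Timeout => true,
        RecoverableError::TokenLimit => true, // retry after compaction
        RecoverableError::ContextLengthExceeded => false, // end the turn instead
    }
}

fn main() {
    assert!(should_retry(RecoverableError::ModelBusy, 0, 3));
    assert!(!should_retry(RecoverableError::ContextLengthExceeded, 0, 3));
    assert!(!should_retry(RecoverableError::Timeout, 3, 3)); // attempts exhausted
    println!("ok");
}
```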
@@ -92,9 +92,9 @@ pub struct Agent<W: UiWriter> {
    providers: ProviderRegistry,
    context_window: ContextWindow,
    thinning_events: Vec<usize>, // chars saved per thinning event
-   pending_90_summarization: bool, // flag to trigger summarization at 90%
+   pending_90_compaction: bool, // flag to trigger compaction at 90%
    auto_compact: bool, // whether to auto-compact at 90% before tool calls
-   summarization_events: Vec<usize>, // chars saved per summarization event
+   compaction_events: Vec<usize>, // chars saved per compaction event
    first_token_times: Vec<Duration>, // time to first token for each completion
    config: Config,
    session_id: Option<String>,
@@ -267,9 +267,9 @@ impl<W: UiWriter> Agent<W> {
            providers,
            context_window,
            auto_compact: config.agent.auto_compact,
-           pending_90_summarization: false,
+           pending_90_compaction: false,
            thinning_events: Vec::new(),
-           summarization_events: Vec::new(),
+           compaction_events: Vec::new(),
            first_token_times: Vec::new(),
            config,
            session_id: None,
@@ -856,15 +856,15 @@ impl<W: UiWriter> Agent<W> {
        self.save_context_window("completed");

        // Check if we need to do 90% auto-compaction
-       if self.pending_90_summarization {
+       if self.pending_90_compaction {
            self.ui_writer
                .print_context_status("\n⚡ Context window reached 90% - auto-compacting...\n");
-           if let Err(e) = self.force_summarize().await {
+           if let Err(e) = self.force_compact().await {
                warn!("Failed to auto-compact at 90%: {}", e);
            } else {
                self.ui_writer.println("");
            }
-           self.pending_90_summarization = false;
+           self.pending_90_compaction = false;
        }

        // Return the task result which already includes timing if needed
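The `pending_90_compaction` flag above implements a "record now, act after the turn" pattern: the 90% check happens before a tool call while `self` is partially borrowed, so the agent defers the actual compaction to the end of the turn. A hedged, synchronous sketch of the pattern (the `Agent` fields and methods below are simplified stand-ins, not G3's real types):

```rust
// Sketch of the deferred-compaction flag: mutating self mid-turn would
// conflict with active borrows, so the agent records the intent and compacts
// once the turn completes.
struct Agent {
    percentage_used: f64,
    auto_compact: bool,
    pending_90_compaction: bool,
    compactions_run: usize,
}

impl Agent {
    fn before_tool_call(&mut self) {
        if self.auto_compact && self.percentage_used >= 90.0 {
            // Defer: can't run compaction here due to borrow constraints.
            self.pending_90_compaction = true;
        }
    }

    fn after_turn(&mut self) {
        if self.pending_90_compaction {
            self.compactions_run += 1; // stands in for force_compact().await
            self.pending_90_compaction = false;
        }
    }
}

fn main() {
    let mut agent = Agent {
        percentage_used: 92.0,
        auto_compact: true,
        pending_90_compaction: false,
        compactions_run: 0,
    };
    agent.before_tool_call();
    agent.after_turn();
    assert_eq!(agent.compactions_run, 1);
    assert!(!agent.pending_90_compaction);
    println!("ok");
}
```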
@@ -940,13 +940,13 @@ impl<W: UiWriter> Agent<W> {
        }
    }

-   /// Manually trigger context summarization regardless of context window size
-   /// Returns Ok(true) if summarization was successful, Ok(false) if it failed
-   pub async fn force_summarize(&mut self) -> Result<bool> {
-       debug!("Manual summarization triggered");
+   /// Manually trigger context compaction regardless of context window size
+   /// Returns Ok(true) if compaction was successful, Ok(false) if it failed
+   pub async fn force_compact(&mut self) -> Result<bool> {
+       debug!("Manual compaction triggered");

        self.ui_writer.print_context_status(&format!(
-           "\n🗜️ Manual summarization requested (current usage: {}%)...",
+           "\n🗜️ Manual compaction requested (current usage: {}%)...",
            self.context_window.percentage_used() as u32
        ));

@@ -1048,7 +1048,7 @@ impl<W: UiWriter> Agent<W> {
        let chars_saved = self
            .context_window
            .reset_with_summary(summary_response.content, latest_user_msg);
-       self.summarization_events.push(chars_saved);
+       self.compaction_events.push(chars_saved);

        Ok(true)
    }
@@ -1238,17 +1238,17 @@ impl<W: UiWriter> Agent<W> {
        }

        stats.push_str(&format!(
-           " • Summarizations: {:>10}\n",
-           self.summarization_events.len()
+           " • Compactions: {:>10}\n",
+           self.compaction_events.len()
        ));
-       if !self.summarization_events.is_empty() {
-           let total_summarized: usize = self.summarization_events.iter().sum();
-           let avg_summarized = total_summarized / self.summarization_events.len();
+       if !self.compaction_events.is_empty() {
+           let total_compacted: usize = self.compaction_events.iter().sum();
+           let avg_compacted = total_compacted / self.compaction_events.len();
            stats.push_str(&format!(
                " • Total Chars Saved: {:>10}\n",
-               total_summarized
+               total_compacted
            ));
-           stats.push_str(&format!(" • Avg Chars/Event: {:>10}\n", avg_summarized));
+           stats.push_str(&format!(" • Avg Chars/Event: {:>10}\n", avg_compacted));
        }
        stats.push('\n');

@@ -1604,9 +1604,9 @@ impl<W: UiWriter> Agent<W> {
        // Note: Session-level duplicate tracking was removed - we only prevent sequential duplicates (DUP IN CHUNK, DUP IN MSG)
        let mut turn_accumulated_usage: Option<g3_providers::Usage> = None; // Track token usage for timing footer

-       // Check if we need to summarize before starting
-       if self.context_window.should_summarize() {
-           // First try thinning if we are at capacity, don't call the LLM for a summary (might fail)
+       // Check if we need to compact before starting
+       if self.context_window.should_compact() {
+           // First try thinning if we are at capacity, don't call the LLM for compaction (might fail)
            if self.context_window.percentage_used() > 90.0 && self.context_window.should_thin() {
                self.ui_writer.print_context_status(&format!(
                    "\n🥒 Context window at {}%. Trying thinning first...",
@@ -1617,23 +1617,23 @@ impl<W: UiWriter> Agent<W> {
                self.ui_writer.print_context_thinning(&thin_summary);

                // Check if thinning was sufficient
-               if !self.context_window.should_summarize() {
+               if !self.context_window.should_compact() {
                    self.ui_writer.print_context_status(
                        "✅ Thinning resolved capacity issue. Continuing...\n",
                    );
-                   // Continue with the original request without summarization
+                   // Continue with the original request without compaction
                } else {
                    self.ui_writer.print_context_status(
-                       "⚠️ Thinning insufficient. Proceeding with summarization...\n",
+                       "⚠️ Thinning insufficient. Proceeding with compaction...\n",
                    );
                }
            }

-           // Only proceed with summarization if still needed after thinning
-           if self.context_window.should_summarize() {
-               // Notify user about summarization
+           // Only proceed with compaction if still needed after thinning
+           if self.context_window.should_compact() {
+               // Notify user about compaction
                self.ui_writer.print_context_status(&format!(
-                   "\n🗜️ Context window reaching capacity ({}%). Creating summary...",
+                   "\n🗜️ Context window reaching capacity ({}%). Compacting...",
                    self.context_window.percentage_used() as u32
                ));

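The fallback order in this hunk is worth spelling out: above 90% the agent first tries cheap local thinning (no LLM call, so it cannot fail on token limits) and only falls back to LLM-driven compaction if usage is still past the threshold afterwards. A hedged sketch of that control flow, with a hypothetical `recover` helper and a caller-supplied thinning function:

```rust
// Sketch of the thinning-first fallback: thin locally at >90%, then compact
// only if the 80% compaction threshold is still exceeded.
#[derive(Debug, PartialEq)]
enum Recovery {
    ThinningOnly,
    ThinningThenCompaction,
}

/// `thin` reduces usage locally and returns the usage percentage afterwards.
fn recover(mut percentage_used: f64, thin: impl Fn(f64) -> f64) -> Recovery {
    if percentage_used > 90.0 {
        percentage_used = thin(percentage_used);
    }
    if percentage_used >= 80.0 {
        Recovery::ThinningThenCompaction // thinning was insufficient
    } else {
        Recovery::ThinningOnly
    }
}

fn main() {
    // Thinning that halves usage resolves the pressure on its own.
    assert_eq!(recover(92.0, |p| p / 2.0), Recovery::ThinningOnly);
    // Thinning that barely helps still requires compaction.
    assert_eq!(recover(95.0, |p| p - 1.0), Recovery::ThinningThenCompaction);
    println!("ok");
}
```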
@@ -1735,17 +1735,17 @@ impl<W: UiWriter> Agent<W> {
                        let chars_saved = self
                            .context_window
                            .reset_with_summary(summary_response.content, latest_user_msg);
-                       self.summarization_events.push(chars_saved);
+                       self.compaction_events.push(chars_saved);

                        // Update the request with new context
                        request.messages = self.context_window.conversation_history.clone();
                    }
                    Err(e) => {
                        error!("Failed to create summary: {}", e);
-                       self.ui_writer.print_context_status("⚠️ Unable to create summary. Consider starting a new session if you continue to see errors.\n");
-                       // Don't continue with the original request if summarization failed
+                       self.ui_writer.print_context_status("⚠️ Unable to compact context. Consider starting a new session if you continue to see errors.\n");
+                       // Don't continue with the original request if compaction failed
                        // as we're likely at token limit
-                       return Err(anyhow::anyhow!("Context window at capacity and summarization failed. Please start a new session."));
+                       return Err(anyhow::anyhow!("Context window at capacity and compaction failed. Please start a new session."));
                    }
                }
            }
@@ -1963,9 +1963,9 @@ impl<W: UiWriter> Agent<W> {
        // Check if we should auto-compact at 90% BEFORE executing the tool
        // We need to do this before any borrows of self
        if self.auto_compact && self.context_window.percentage_used() >= 90.0 {
-           // Set flag to trigger summarization after this turn completes
+           // Set flag to trigger compaction after this turn completes
            // We can't do it now due to borrow checker constraints
-           self.pending_90_summarization = true;
+           self.pending_90_compaction = true;
        }

        // Check if we should thin the context BEFORE executing the tool
@@ -2,7 +2,7 @@ use g3_core::ContextWindow;
 use g3_providers::{Message, MessageRole, Usage};

 /// Test that used_tokens is tracked via add_message, not update_usage_from_response.
-/// This is critical for the 80% summarization threshold to work correctly.
+/// This is critical for the 80% compaction threshold to work correctly.
 #[test]
 fn test_used_tokens_tracked_via_messages() {
     let mut window = ContextWindow::new(10000);
@@ -106,10 +106,10 @@ fn test_percentage_based_on_used_tokens() {
     assert!(window.remaining_tokens() < 1000, "remaining tokens should decrease");
 }

-/// Test that the 80% summarization threshold works correctly.
+/// Test that the 80% compaction threshold works correctly.
 /// This was the original bug - used_tokens was being double/triple counted.
 #[test]
-fn test_should_summarize_threshold() {
+fn test_should_compact_threshold() {
     let mut window = ContextWindow::new(1000);

     // Add messages until we approach 80%
@@ -131,9 +131,9 @@ fn test_should_summarize_threshold() {
     let percentage_after = window.percentage_used();
     println!("After 10 messages: {}% used ({} tokens)", percentage_after, window.used_tokens);

-    // Now should_summarize should return true if we're at 80%+
+    // Now should_compact should return true if we're at 80%+
     if percentage_after >= 80.0 {
-        assert!(window.should_summarize(), "should_summarize should be true at 80%+");
+        assert!(window.should_compact(), "should_compact should be true at 80%+");
     }
 }

@@ -11,7 +11,7 @@ Control commands are special commands you can use during an interactive G3 sessi

 | Command | Description |
 |---------|-------------|
-| `/compact` | Manually trigger conversation summarization |
+| `/compact` | Manually trigger conversation compaction |
 | `/thinnify` | Replace large tool results with file references (first third) |
 | `/skinnify` | Full context thinning (entire context window) |
 | `/readme` | Reload README.md and AGENTS.md from disk |
@@ -22,7 +22,7 @@ Control commands are special commands you can use during an interactive G3 sessi

 ## /compact

-Manually trigger conversation summarization to reduce context size.
+Manually trigger conversation compaction to reduce context size.

 **When to use**:
 - Context usage is getting high (70%+)
@@ -30,7 +30,7 @@ Manually trigger conversation summarization to reduce context size.
|
|||||||
- Conversation has accumulated irrelevant history
|
- Conversation has accumulated irrelevant history
|
||||||
|
|
||||||
**What it does**:
|
**What it does**:
|
||||||
1. Sends conversation history to LLM for summarization
|
1. Sends conversation history to LLM for compaction
|
||||||
2. Replaces detailed history with concise summary
|
2. Replaces detailed history with concise summary
|
||||||
3. Preserves key decisions and context
|
3. Preserves key decisions and context
|
||||||
4. Significantly reduces token usage
|
4. Significantly reduces token usage
|
||||||
@@ -144,7 +144,7 @@ Show detailed context and performance statistics.
|
|||||||
- Session duration
|
- Session duration
|
||||||
- Token usage breakdown
|
- Token usage breakdown
|
||||||
- Tool call metrics
|
- Tool call metrics
|
||||||
- Thinning and summarization events
|
- Thinning and compaction events
|
||||||
- First-token latency statistics
|
- First-token latency statistics
|
||||||
|
|
||||||
**Example**:
|
**Example**:
|
||||||
@@ -198,7 +198,7 @@ When context gets high:
|
|||||||
1. **50-70%**: Consider `/thinnify`
|
1. **50-70%**: Consider `/thinnify`
|
||||||
2. **70-80%**: Use `/compact`
|
2. **70-80%**: Use `/compact`
|
||||||
3. **80-90%**: Use `/skinnify` then `/compact`
|
3. **80-90%**: Use `/skinnify` then `/compact`
|
||||||
4. **90%+**: Auto-summarization triggers
|
4. **90%+**: Auto-compaction triggers
|
||||||
|
|
||||||
### Best Practices
|
### Best Practices
|
||||||
|
|
||||||
@@ -218,7 +218,7 @@ G3 performs automatic context management:
|
|||||||
| 50% | Thin oldest third of context |
|
| 50% | Thin oldest third of context |
|
||||||
| 60% | Thin oldest third of context |
|
| 60% | Thin oldest third of context |
|
||||||
| 70% | Thin oldest third of context |
|
| 70% | Thin oldest third of context |
|
||||||
| 80% | Auto-summarization (if `auto_compact = true`) |
|
| 80% | Auto-compaction (if `auto_compact = true`) |
|
||||||
| 90% | Aggressive thinning before tool calls |
|
| 90% | Aggressive thinning before tool calls |
|
||||||
|
|
||||||
Manual commands give you finer control over when and how this happens.
|
Manual commands give you finer control over when and how this happens.
|
||||||
|
|||||||
@@ -289,7 +289,7 @@ The `ContextWindow` struct manages conversation history with intelligent token t
|
|||||||
|
|
||||||
1. **Token Tracking**: Monitors usage as percentage of provider's context limit
|
1. **Token Tracking**: Monitors usage as percentage of provider's context limit
|
||||||
2. **Context Thinning**: At 50%, 60%, 70%, 80% thresholds, replaces large tool results with file references
|
2. **Context Thinning**: At 50%, 60%, 70%, 80% thresholds, replaces large tool results with file references
|
||||||
3. **Auto-Summarization**: At 80% capacity, triggers conversation summarization
|
3. **Auto-Compaction**: At 80% capacity, triggers conversation compaction
|
||||||
4. **Provider Adaptation**: Adjusts to different model context windows (4k to 200k+ tokens)
|
4. **Provider Adaptation**: Adjusts to different model context windows (4k to 200k+ tokens)
|
||||||
|
|
||||||
## Error Handling
|
## Error Handling
|
||||||
|
|||||||
@@ -376,5 +376,5 @@ For Databricks OAuth:
|
|||||||
|
|
||||||
If you see context overflow errors:
|
If you see context overflow errors:
|
||||||
1. Check `max_context_length` in `[agent]`
|
1. Check `max_context_length` in `[agent]`
|
||||||
2. Use `/compact` command to manually summarize
|
2. Use `/compact` command to manually compact
|
||||||
3. Use `/thinnify` to replace large tool results with file references
|
3. Use `/thinnify` to replace large tool results with file references
|
||||||
|
|||||||
@@ -386,7 +386,7 @@ To reduce rate limit issues:
|
|||||||
### Context Window Errors
|
### Context Window Errors
|
||||||
|
|
||||||
If you see "context too long" errors:
|
If you see "context too long" errors:
|
||||||
1. Use `/compact` to summarize conversation
|
1. Use `/compact` to compact conversation
|
||||||
2. Use `/thinnify` to replace large tool results
|
2. Use `/thinnify` to replace large tool results
|
||||||
3. Increase `max_context_length` in config
|
3. Increase `max_context_length` in config
|
||||||
4. Switch to a provider with larger context
|
4. Switch to a provider with larger context
|
||||||
|
|||||||
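The renamed test above exercises an 80% compaction threshold via `percentage_used()` and `should_compact()`. A minimal self-contained sketch of that logic — the struct fields, concrete token limit, and constructor values here are assumptions for illustration, not G3's actual implementation:

```rust
// Hypothetical sketch of the threshold check the test relies on.
// Only `used_tokens`, `percentage_used`, and `should_compact` come
// from the diff; everything else is assumed.
struct ContextWindow {
    used_tokens: usize,
    max_tokens: usize, // assumed field; provider context limit
}

impl ContextWindow {
    fn percentage_used(&self) -> f64 {
        self.used_tokens as f64 / self.max_tokens as f64 * 100.0
    }

    // Auto-compaction is due once usage reaches 80% of capacity.
    fn should_compact(&self) -> bool {
        self.percentage_used() >= 80.0
    }
}

fn main() {
    let mut window = ContextWindow { used_tokens: 70_000, max_tokens: 100_000 };
    assert!(!window.should_compact()); // 70% used: below the threshold

    window.used_tokens = 85_000;
    assert!(window.should_compact()); // 85% used: compaction due
    println!("threshold logic ok");
}
```

With this shape, the renamed assertion in the test hunk (`window.should_compact()` at 80%+) follows directly from `percentage_used()` crossing the fixed threshold.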