Commit Graph

400 Commits

Author SHA1 Message Date
Dhanji R. Prasanna
c347a73cbd Add plan approval gate to block file changes without approved plan
- Add check_plan_approval_gate() in tools/plan.rs that runs after each tool call
- Detects file changes via git status --porcelain when plan exists but not approved
- Reverts changes: git checkout for modified files, rm for new untracked files
- Returns blocking message instructing LLM to create/approve plan first
- Add ApprovalGateResult enum with Allowed/Blocked/NotGitRepo variants
- Add set_session_id() and set_working_dir() methods on Agent for testing
- Add integration test using MockProvider to simulate blocked write_file
2026-02-05 11:34:10 +11:00
Dhanji R. Prasanna
fdb1255f02 Add --resume <session-id> flag for explicit session resumption
- Add --resume CLI flag that conflicts with --new-session
- Add load_continuation_by_id() to load sessions by full or partial ID
- Support loading from latest.json or falling back to session.json
- Handle --resume in both normal and agent modes
- Agent mode validates session belongs to correct agent
2026-02-05 10:23:39 +11:00
Dhanji R. Prasanna
3046f0dd6e feat: Add invariants system for Plan Mode verification
Adds rulespec.yaml and envelope.yaml support for machine-readable
invariant checking during plan completion.

- Add invariants module with Rulespec, ActionEnvelope, and evaluation logic
- Add Invariants section to system prompt with workflow instructions
- Show rulespec/envelope file status in plan verification output
- Rulespec written during planning (captures constraints from task)
- Envelope written after implementation (documents what was built)
2026-02-04 20:49:58 +11:00
Dhanji R. Prasanna
263a838d31 Remove redundant 'No plan exists' message from plan_read output
The UI already shows 'empty' via print_plan_compact, so returning an
empty string avoids duplicate output.
2026-02-02 17:19:01 +11:00
Dhanji R. Prasanna
e332109273 Auto-approve plans in non-interactive (autonomous/one-shot) mode
- Add auto-approval logic in execute_plan_write() when ctx.is_autonomous is true
- Update system prompt to document auto-approval behavior
- Plans still require explicit approval in interactive mode
2026-02-02 17:16:21 +11:00
Dhanji R. Prasanna
0aead8d86d fix: Enable compact UI output for plan_approve tool
Added plan_approve to the compact tool list in format_tool_result_summary()
so it displays in the same format as other tools like read_file and write_file.

The format_plan_approve_summary() function already existed but was never
called because plan_approve was missing from the matches! block.
2026-02-02 17:06:10 +11:00
Dhanji R. Prasanna
571188305a feat: add compact UI output for Plan Mode tools
Plan tools (plan_read, plan_write) now display with elegant tree-style
formatting similar to the old todo_write UI:

- State indicators: □ (todo), ◐ (doing), ■ (done), ⊘ (blocked)
- Tree prefixes (├/└) for items with child details
- Strikethrough for completed items
- Shows touches and all three checks (happy/negative/boundary)
- Displays plan file path link at the end

plan_approve uses compact single-line format like read_file:
- Shows approval status and revision number
- Handles already-approved and error cases

Changes:
- Add print_plan_compact() to UiWriter trait with default impl
- Implement print_plan_compact() in ConsoleUiWriter
- Call print_plan_compact() from execute_plan_read/write
- Add plan_read/plan_write to is_self_handled_tool()
- Add plan_approve to is_compact_tool() with format_plan_approve_summary()
- Add serde_yaml dependency to g3-cli
2026-02-02 15:30:05 +11:00
Dhanji R. Prasanna
d6b7177107 Implement plan_verify() for deterministic evidence validation
Adds a verification system that checks evidence in completed plan items:

- Evidence parsing: supports code locations (file:line, file:line-line, file only)
  and test references (file::test_name)
- Code location verification: checks file exists, validates line numbers in range
- Test reference verification: checks test file exists, searches for fn pattern
- Verification results: Verified, Warning, Error, Skipped statuses
- Loud output formatting with emoji indicators for warnings/errors
- Integration with execute_plan_write(): runs when plan is complete and approved
- 12 new unit tests covering parsing and verification

Warnings are advisory (don't block), errors are loud but also don't block.
Blocked items are skipped during verification.
2026-02-02 15:15:03 +11:00
Dhanji R. Prasanna
a63950d8f5 Add Plan Mode to replace TODO system
Plan Mode is a cognitive forcing system that requires reasoning about:
- Happy path
- Negative case
- Boundary condition

New tools:
- plan_read: Read current plan for session
- plan_write: Create/update plan with YAML content (validates structure)
- plan_approve: Mark current revision as approved

New command:
- /feature <description>: Start Plan Mode for a new feature

Plan schema requires:
- plan_id, revision, approved_revision
- items with id, description, state, touches, checks (happy/negative/boundary)
- evidence and notes required when marking items done

Verification:
- plan_verify() called automatically when all items are done/blocked

Removed:
- todo_read, todo_write tools
- todo.rs module and related tests
2026-02-02 14:38:25 +11:00
Dhanji R. Prasanna
afc5bc8574 Readability improvements across streaming_parser, input_formatter, commands
- streaming_parser.rs: Reduced ~70 lines by removing redundant comments,
  consolidating doc comments, using slice syntax for TOOL_CALL_PATTERNS
- input_formatter.rs: Lazy regex compilation via once_cell (performance),
  cleaner function structure, reduced comment noise
- commands.rs: Extracted format_research_task_summary() and
  format_research_report_header() helpers, reduced ~40 lines of duplication
- pending_research.rs: Fixed 2 unused variable warnings in tests

All changes are behavior-preserving. 446 tests pass.

Agent: carmack
2026-01-30 14:48:08 +11:00
Dhanji R. Prasanna
51f12769d5 Merge sessions/hopper/297c7be9 2026-01-30 14:30:53 +11:00
Dhanji R. Prasanna
58bbfde6f4 test: add integration tests for streaming parser stuttering bug fix
Add characterization tests for the streaming parser stuttering bug fix (fa3c920).
These tests verify that when an LLM "stutters" and emits incomplete tool call
fragments followed by complete tool calls, the parser:

1. Does not get stuck waiting for the incomplete fragment to complete
2. Successfully parses complete tool calls that appear after the fragment

Tests cover:
- The exact pattern from butler session butler_c6ab59af2e4f991c
- Edge cases that should NOT trigger invalidation (nested JSON, patterns in strings)
- Recovery behavior after reset
- Multiple complete tool calls
- Boundary conditions (chunk boundaries, minimal patterns)

Agent: hopper
2026-01-30 14:30:27 +11:00
Dhanji R. Prasanna
3003bdebaa refactor: fix flaky test and remove dead code in recent commits
Fixes issues in the last 11 commits:

1. pending_research.rs: Fix flaky test_generate_id_uniqueness
   - Replaced random u16 suffix with atomic counter for guaranteed uniqueness
   - The timestamp+random approach could collide when generating IDs rapidly
   - Now uses static AtomicU32 counter that increments monotonically

2. embedded/adapters/glm.rs: Remove unused in_code_fence field
   - Field was written but never read (dead code)
   - Removed from struct definition, constructor, and reset()

3. embedded/adapters/glm.rs: Fix orphaned tests
   - Two tests (test_strip_code_fences, test_code_fenced_tool_call) were
     outside the #[cfg(test)] mod tests block
   - Moved closing brace to include them in the test module

All 446 library tests pass.

Agent: fowler
2026-01-30 14:28:43 +11:00
Dhanji R. Prasanna
6bb07ce4f5 Merge sessions/interactive/3c2a09df 2026-01-30 14:20:12 +11:00
Dhanji R. Prasanna
fa3c9203e0 Fix streaming parser bug: detect abandoned tool call fragments
When the LLM 'stutters' and emits incomplete tool call fragments like:
  {"tool": "shell", "args": {...}}
  {"tool":
  {"tool": "shell", "args": {...}}

The parser would get stuck waiting for the incomplete fragment to complete,
causing the entire response to be lost (no tool executed, no text displayed).

This was observed in butler session butler_c6ab59af2e4f991c where the user's
'send!' command produced no response.

Fix: Enhanced is_json_invalidated() to detect when a new tool call pattern
({"tool"}) appears after a newline while parsing an incomplete JSON fragment.
This indicates the previous fragment was abandoned and should be invalidated.

Safety:
- Tool patterns inside JSON strings (e.g., writing example code) are not
  affected because the check only runs outside strings
- Added tests for the stuttering pattern and the file-writing edge case
2026-01-30 14:00:18 +11:00
Dhanji R. Prasanna
f93d05f444 Add real-time research completion notifications
When background research completes, g3 now immediately prints a status
message instead of waiting for the next user interaction:

- Added ResearchCompletionNotification and broadcast channel to
  PendingResearchManager for push-based notifications
- Added spawn_research_notification_handler() in interactive mode that
  listens for completions in a background task
- When idle (at prompt): clears line, prints status, reprints prompt
- When busy (processing): prints status inline (interleaving is fine)
- Added G3Status::research_complete() for consistent formatting
- Added enable_research_notifications() method to Agent

Output format: "g3: 1 research report ... [done]"
2026-01-30 13:35:35 +11:00
Dhanji R. Prasanna
5ab1598e03 feat: async research tool - runs in background, returns immediately
The research tool now spawns the scout agent in a background tokio task
and returns immediately with a research_id placeholder. This allows the
agent to continue working while research runs (30-120 seconds).

Key changes:
- New PendingResearchManager for tracking async research tasks
- research tool returns immediately with placeholder containing research_id
- research_status tool to check progress of pending research
- Auto-injection of completed research at natural break points:
  - Start of each tool iteration (before LLM call)
  - Before prompting user in interactive mode
- /research CLI command to list all research tasks
- Updated system prompt to explain async behavior

The agent can:
- Continue with other work while research runs
- Check status with research_status tool
- Yield turn to user if results are critical before continuing
2026-01-30 13:00:02 +11:00
Dhanji R. Prasanna
570a824780 Rename archivist agent to huffman
Named after David Huffman, inventor of Huffman coding -
compression that preserves information with fewer bits.

Fits the agent's purpose: compact memory, preserve semantics.
2026-01-29 11:22:59 +11:00
Dhanji R. Prasanna
56f558dc1b Fix compiler warnings in test files
Eliminate unused variable and import warnings across test files:
- streaming_parser_test.rs: prefix unused `tools` with underscore
- webdriver_session.rs: remove unused `use super::*` import
- mock_provider_integration_test.rs: prefix unused `result` and `task_result`
- test_preflight_max_tokens.rs: prefix unused `proposed_max`
- todo_staleness_test.rs: add #[allow(dead_code)] for test helper methods
- json_parsing_stress_test.rs: prefix unused `tools`
- read_file_token_limit_test.rs: add #[allow(dead_code)] for unused helper
- background_process_demo_test.rs: remove unused PathBuf import
- test_session_continuation.rs: prefix unused `temp_dir` in 7 tests

All tests pass. No behavior changes.

Agent: fowler
2026-01-29 11:15:10 +11:00
Dhanji R. Prasanna
7bfb9efa19 Remove automatic README loading from context window
README.md is no longer auto-loaded into the LLM context at startup.
This saves ~4,600 tokens per session while AGENTS.md and memory.md
still provide all critical information for code tasks.

Changes:
- Delete read_project_readme() function
- Remove readme_content parameter from combine_project_content()
- Rename extract_readme_heading() -> extract_project_heading()
- Rename Agent constructors: *_with_readme_* -> *_with_project_context_*
- Update context preservation to only check for Agent Configuration
- Remove has_readme field from LoadedContent
- Update all tests to use new markers and function names

The LLM can still read README.md on-demand via read_file when needed.
2026-01-29 11:07:41 +11:00
Dhanji R. Prasanna
735e9c9312 Add Google Gemini provider support
- Add GeminiProvider with streaming and native tool calling
- Support gemini-2.5-pro, gemini-2.0-flash, gemini-1.5-pro/flash models
- Model-specific context window detection (1M-2M tokens)
- Message conversion: assistant -> model role mapping
- System messages extracted to system_instruction field
- Tool schema conversion with functionCall/functionResponse parts
- SSE streaming with JSON array buffer parsing
- 8 unit tests for conversion and parsing logic
- Register provider in g3-core and validate in g3-cli
2026-01-29 10:11:42 +11:00
Dhanji R. Prasanna
fe33568ee0 Fix embedded provider max_tokens default (2048 -> 8192)
The resolve_max_tokens() function was returning 2048 for embedded providers,
which caused responses to be truncated prematurely. Increased to 8192 to
allow the provider's own effective_max_tokens() calculation to work properly.
2026-01-28 13:58:14 +11:00
Dhanji R. Prasanna
58fe74334d Auto-detect context window size from GGUF for embedded providers
- Add context_window_size() method to LLMProvider trait
- Implement for EmbeddedProvider to return the auto-detected context length
- Update Agent to query provider directly instead of using hardcoded defaults
- Removes need for model-specific context length mappings
2026-01-28 11:16:14 +11:00
Dhanji R. Prasanna
55dba121b7 Add GLM-4 to context length defaults (32k)
GLM-4 models support 32k context but were falling back to the
conservative 4096 default, causing context overflow on startup.
2026-01-28 10:46:36 +11:00
Dhanji R. Prasanna
ba6e1f9896 Remove unused code to eliminate build warnings
- Remove unused SYSTEM_PROMPT_FOR_NATIVE_TOOL_USE and SYSTEM_PROMPT_FOR_NON_NATIVE_TOOL_USE constants
- Remove unused gpu_layers field from EmbeddedProvider struct
- Remove unused clean_stop_sequences method from EmbeddedProvider
2026-01-28 10:01:44 +11:00
Dhanji R. Prasanna
a902be1562 Refactor system prompts to eliminate duplication; upgrade embedded provider
- Refactor prompts.rs: extract shared sections (intro, TODO, workspace memory,
  web research, response guidelines) used by both native and non-native prompts
- Fix typo in native prompt: "save them.." -> "save them."
- Fix non-native prompt: add missing closing braces in JSON examples,
  add IMPORTANT steps section, align with native prompt quality
- Add 9 unit tests to verify both prompts contain required sections
- Upgrade llama-cpp-2 dependency and refactor embedded provider
- Update config.example.toml with embedded model examples
- Update workspace memory
2026-01-28 09:56:39 +11:00
Dhanji R. Prasanna
cdb8b0f5eb refactor(g3-core): consolidate Agent construction into single canonical path
Eliminate code-path aliasing in Agent construction methods by introducing
a single `build_agent()` helper that all constructors delegate to.

Before: 3 nearly-identical `Ok(Self { ... })` blocks (~30 lines each)
with subtle differences in auto_compact, is_autonomous, quiet, and
computer_controller fields - prone to drift over time.

After: Single canonical `build_agent()` method that constructs Agent
with all fields. All public constructors delegate to this single path:
- new_for_test() -> new_for_test_with_readme() -> build_agent()
- new_with_mode_and_readme() -> build_agent()

Changes:
- Add `build_agent()` private helper method (single source of truth)
- Simplify `new_for_test()` to delegate to `new_for_test_with_readme()`
- Update `new_for_test_with_readme()` to use `build_agent()`
- Update `new_with_mode_and_readme()` to use `build_agent()`

Net reduction: ~43 lines (-109/+66)
All 190 tests pass.

Agent: fowler
2026-01-27 12:01:12 +11:00
Dhanji R. Prasanna
dfa0e4bfa2 refactor(g3-core): add section markers to lib.rs for better organization
Added clear section comments to organize the 3000-line lib.rs into
logical groupings:

- CONSTRUCTION METHODS (~line 159)
- CONFIGURATION & PROVIDER RESOLUTION (~line 444)
- TASK EXECUTION (~line 782)
- SESSION MANAGEMENT (~line 1069)
- CONTEXT WINDOW OPERATIONS (~line 1148)
- STREAMING & LLM INTERACTION (~line 1563)
- TOOL EXECUTION (~line 2825)

This improves code navigation and provides clear boundaries for
future extraction into separate modules.

No behavioral changes - all 191 tests pass.

Agent: fowler
2026-01-27 11:46:17 +11:00
Dhanji R. Prasanna
5b4079e861 Add prompt cache statistics tracking to /stats command
- Extend Usage struct with cache_creation_tokens and cache_read_tokens fields
- Parse Anthropic cache_creation_input_tokens and cache_read_input_tokens
- Parse OpenAI prompt_tokens_details.cached_tokens for automatic prefix caching
- Add CacheStats struct to Agent for cumulative tracking across API calls
- Add "Prompt Cache Statistics" section to /stats output showing:
  - API call count and cache hit count
  - Hit rate percentage
  - Total input tokens and cache read/creation tokens
  - Cache efficiency (% of input served from cache)
- Update all provider implementations and test files
2026-01-27 11:32:45 +11:00
Dhanji R. Prasanna
2e84f1ece0 test: fix ACD test race condition and add read_image characterization test
- Fix test_rehydrate_success race condition by using UUID for unique session IDs
- Add #[serial] attribute to prevent parallel execution conflicts
- Improve cleanup to remove entire session directory tree
- Add characterization test for resize_image_to_dimensions fallback behavior
  (documents fix from commit af8b849 for media type preservation)

Agent: hopper
2026-01-26 16:19:53 +11:00
Dhanji R. Prasanna
726e2d71f5 test: add integration test for project content surviving compaction
Add test_project_content_survives_compaction() to verify that project
content loaded via /project command persists through context compaction.

This is a CHARACTERIZATION test that validates:
- Project content appended to README message survives compaction
- The README message (containing project content) is preserved as message[1]
- PROJECT INSTRUCTIONS, ACTIVE PROJECT markers, Brief and Status sections
  all survive the compaction process

Agent: hopper
2026-01-26 16:09:17 +11:00
Dhanji R. Prasanna
9de8e8cc76 Fix compaction bug: use User role for summary to maintain alternation
The previous implementation added the summary as a System message, which
caused "Conversation must start with a user message" errors because the
first non-system message after compaction was Assistant (the preserved
last assistant message).

Fix: Change summary from System to User message, creating valid alternation:
[System Prompt] -> [Summary as USER] -> [Last Assistant] -> [Latest User]

This also prevents system message bloat across multiple compactions since
the summary is now part of the conversation flow and gets replaced on
each compaction.

Added test_second_compaction_no_bloat to verify no accumulation.
2026-01-26 15:24:04 +11:00
Dhanji R. Prasanna
5d0d532b47 feat: preserve last assistant message during compaction
When context window compaction occurs, the last assistant message is now
preserved in addition to the system prompt, README, and summary. This
improves continuity after compaction by keeping the LLM's most recent
response, which often contains important context about what was just
done or what comes next.

New message order after compaction:
[System Prompt] -> [README/AGENTS.md] -> [ACD Stub?] -> [Summary] -> [Last Assistant] -> [Latest User?]

Changes:
- Add last_assistant_message field to PreservedMessages struct
- Modify extract_preserved_messages() to find last assistant message
- Modify reset_with_summary_and_stub() to include last assistant message
- Add comprehensive integration tests using MockProvider

Tests cover edge cases:
- No assistant message exists
- Tool-call-only assistant messages (still preserved)
- Multiple assistant messages (only last one preserved)
- No trailing user message
2026-01-23 09:54:03 +05:30
Dhanji R. Prasanna
af8b849311 fix(read_image): use correct media type when resize fails to reduce size
When resize_image_to_dimensions() returns a larger file than the original,
we fall back to using the original bytes. Previously, was_resized was set
to true if the original dimensions exceeded MAX_IMAGE_DIMENSION, which
caused final_media_type to be set to 'image/jpeg' even though we were
using the original PNG bytes.

This caused Anthropic API errors like:
  'Image does not match the provided media type image/jpeg'

Fix: Set was_resized=false when falling back to original bytes, so the
original media type (detected from magic bytes) is preserved.
2026-01-22 07:58:05 +05:30
Dhanji R. Prasanna
9325a43ff3 feat(cli): shorten file paths in tool output display
Add three-level path shortening hierarchy for cleaner CLI output:
1. Project path -> <project_name>/... (when project loaded via /project)
2. Workspace path -> ./... (relative to current working directory)
3. Home path -> ~/... (fallback for paths under home directory)

Changes:
- Add shorten_path() and shorten_paths_in_command() functions in display.rs
- Add project_path/project_name fields to ConsoleUiWriter
- Add set_workspace_path(), set_project_path(), clear_project() to UiWriter trait
- Add ui_writer() getter to Agent struct
- Wire up project path setting in /project and /unproject commands
- Set workspace path when creating agents in all CLI modes

Before: ● read_file | /Users/dhanji/icloud/butler/projects/appa_estate/status.md
After:  ● read_file | appa_estate/status.md (with project loaded)
        ● read_file | ./src/main.rs (workspace-relative)
        ● read_file | ~/Documents/file.txt (home-relative)
2026-01-21 21:27:16 +05:30
Dhanji R. Prasanna
feb7c3e40d Add /project and /unproject commands for project-specific context
- Add Project struct in crates/g3-cli/src/project.rs with file loading logic
- Load brief.md, contacts.yaml, status.md from project path
- Load projects.md from workspace root for cross-project context
- Project content appended to system message (survives compaction/dehydration)
- /project <path> loads project and auto-submits prompt asking about state
- /unproject clears project content and resets context
- Add set_project_content(), clear_project_content(), has_project_content() to Agent
- Add new_for_test_with_readme() for testing with custom README content
- Add 6 unit tests for Project struct
- Add 9 integration tests for project context behavior
2026-01-21 14:53:30 +05:30
Dhanji R. Prasanna
a34a3b08e9 Rename Project Memory to Workspace Memory
Rename all references from "Project Memory" to "Workspace Memory" to avoid
future conflation if a "project" concept is introduced later.

Changes:
- Rename read_project_memory() -> read_workspace_memory()
- Update all prompts, tool descriptions, and comments
- Update header parsing in memory.rs to use "# Workspace Memory"
- Update display detection for "=== Workspace Memory ==="
- Update documentation and analysis/memory.md

11 files changed, ~36 occurrences updated.
2026-01-21 14:08:42 +05:30
Dhanji R. Prasanna
6a5ce11e7b Consolidate redundant assistant message test files
Deleted 4 redundant test files (~956 lines):
- assistant_message_dedup_test.rs (416 lines, 12 tests)
- consecutive_assistant_message_test.rs (248 lines, 6 tests)
- missing_assistant_message_test.rs (100 lines, 4 tests)
- early_return_path_test.rs (192 lines, 5 tests) - whitebox test

Created consolidated assistant_message_test.rs (369 lines, 14 tests):
- Helper function tests for consecutive message detection
- ContextWindow unit tests for normal and tool execution flows
- Bug demonstration tests documenting what bugs looked like
- Invariant tests for user/assistant alternation
- Missing assistant message fallback logic tests

The early_return_path_test was removed because it:
- Referenced specific line numbers in production code (brittle)
- Reimplemented internal logic (whitebox anti-pattern)
- Duplicated coverage from mock_provider_integration_test.rs

All 729 g3-core tests pass.
2026-01-21 10:27:07 +05:30
Dhanji R. Prasanna
c5d549c211 Readability pass: remove verbose comments and clean up tests
- completion.rs: Remove redundant comments, clean up test output (println! -> let _)
- g3_status.rs: Condense doc comments, rename from_str() to parse()
- streaming.rs: Remove obvious doc comments that duplicate function names
- simple_output.rs, ui_writer_impl.rs: Update Status::parse() calls

All changes are behavior-preserving. 132 lines removed, code is more scannable.

Agent: carmack
2026-01-21 07:13:20 +05:30
Dhanji R. Prasanna
38b0019ad4 Fix compile warnings and tweak error message format
Warnings fixed:
- Remove unused 'warn' import from retry.rs
- Prefix unused 'output' param with underscore
- Prefix unused 'rel_start' with underscore
- Add #[allow(dead_code)] to G3Status::info()

Message format tweaked per feedback:
- 'g3: model overloaded [error]' (no attempt info)
- 'g3: retrying in 2.2s (1/3) ... [done]' (attempt info moved here)
- Handle empty error message in Status::Error to show just '[error]'
2026-01-20 22:49:55 +05:30
Dhanji R. Prasanna
60578e310c Clean up error and retry messages for recoverable errors
Before:
   Error: Anthropic API error: AnthropicError { error_type: "overloaded_error", ... }
  ⚠️  Model busy detected (attempt 2/3). Retrying in 2.2s...
  [ERROR logs dumped to terminal]

After:
  g3: model overloaded [error: attempt 1/3]
  g3: retrying in 2.2s ... [done]

Changes:
- Use G3Status formatting for clean, consistent output
- Downgrade ERROR logs to debug for recoverable errors
- Apply same treatment to all recoverable error types:
  rate limited, server error, network error, timeout,
  model overloaded, token limit, context length exceeded
- Update both g3-cli (task_execution.rs) and g3-core (retry.rs)
2026-01-20 22:40:09 +05:30
Dhanji R. Prasanna
d7f22679a9 Remove '📋 Task: ' prefix from ACD stub
The first user message in dehydrated context stubs is now shown
without any prefix, consistent with the removal of 'Task: ' prefix
from user messages.
2026-01-20 21:57:12 +05:30
Dhanji R. Prasanna
07c0bf1e39 Remove 'Task: ' prefix from user messages
The prefix was causing duplication when users typed 'Task: ...' themselves,
resulting in '📋 Task: Task: ...' in context dumps.

User messages are now stored as-is without any prefix.
2026-01-20 21:53:28 +05:30
Dhanji R. Prasanna
9a0a2a2726 Make dehydration stub more compact
Change from multi-line verbose format to single-line compact format:

Before:
   DEHYDRATED CONTEXT (fragment_id: 188c7ac71613)
     • 8 messages (4 user, 4 assistant)
     • 3 tool calls (shell ×3)
     • ~299 tokens saved

     To restore this history, call: rehydrate(fragment_id: "188c7ac71613")

After:
   DEHYDRATED CONTEXT: 3 tool calls (shell x3), 8 total msgs. To restore, call: rehydrate(fragment_id: "188c7ac71613")

- Combine all info into single line
- Remove tokens saved (not essential for rehydration decision)
- Use ASCII 'x' instead of '×' for simplicity
- Add 'no tool calls' case for fragments without tools
- Update related tests
2026-01-20 21:26:42 +05:30
Dhanji R. Prasanna
4321503e89 Refactor streaming_parser.rs and context_window.rs for readability
streaming_parser.rs (879 → 806 lines, -8%):
- Extract CodeFenceTracker struct for cleaner fence state management
- Consolidate pattern matching into module-level functions
- Rename functions for clarity (find_json_object_end, parse_all_json_tool_calls)
- Add clear section headers with // === separators
- Simplify try_parse_json_tool_call state machine

context_window.rs (889 → 843 lines, -5%):
- Eliminate duplication: reset_with_summary now delegates to reset_with_summary_and_stub
- Extract PreservedMessages struct for cleaner message preservation
- Add ThinResult::no_changes() helper to reduce boilerplate
- Simplify should_compact() and should_thin() with early returns
- Add clear section headers for navigation

All 44 tests pass. Behavior unchanged.

Agent: carmack
2026-01-20 16:17:38 +05:30
Dhanji R. Prasanna
168cfff2ed refactor(g3-core): extract tool output formatting to streaming.rs
Centralize tool output formatting logic that was duplicated/scattered in
stream_completion_with_tools(). This eliminates code-path aliasing where
tool type checks were done in multiple places.

Changes:
- Add ToolOutputFormat enum (SelfHandled, Compact, Regular)
- Add format_tool_result_summary() for centralized formatting decisions
- Add is_compact_tool() and is_self_handled_tool() helper functions
- Move parse_diff_stats() from lib.rs to streaming.rs
- Simplify tool execution display logic in lib.rs using new helpers

Net effect: -86 lines in lib.rs, +112 lines in streaming.rs
The streaming.rs additions are reusable, well-named functions.

All 585+ workspace tests pass.

Agent: fowler
2026-01-20 15:45:35 +05:30
Dhanji R. Prasanna
9abb3735d2 refactor(g3-core): use StreamingState and IterationState structs in stream_completion_with_tools
Consolidate scattered state variables in the 834-line stream_completion_with_tools()
function to use the existing StreamingState and IterationState structs from
streaming.rs. This eliminates code-path aliasing where state was tracked in
multiple places and makes the streaming loop easier to reason about.

Changes:
- Add assistant_message_added field to StreamingState
- Add stream_stop_reason field to IterationState
- Replace 8 inline state variables with StreamingState::new()
- Replace 7 iteration-local variables with IterationState::new()
- All 585 workspace tests pass

This is a pure refactor with no behavior changes. The state structs were already
defined in streaming.rs but not used in the main streaming loop.

Agent: fowler
2026-01-20 15:05:23 +05:30
Dhanji R. Prasanna
10bce7f66f Remove ANSI formatting codes from g3-core
Move terminal formatting responsibility to g3-cli layer:

- format_str_replace_summary(): Remove ANSI codes, add colorize_str_replace_summary()
  helper in CLI to apply green/red colors for insertions/deletions
- format_timing_footer(): Remove dimming ANSI codes (now plain text)
- str_replace tool result: Remove ANSI codes from success message

Remaining acceptable ANSI usage in g3-core:
- iTerm2 inline image protocol (terminal-specific escape sequence)
- Image metadata dimming (direct print, would need larger refactor)
- Terminal beep for stale TODO warning (audio, not visual)
- ANSI stripping utility in research.rs (not output)

This continues the separation of concerns: g3-core handles logic,
g3-cli handles all terminal formatting.
2026-01-20 10:00:37 +05:30
Dhanji R. Prasanna
182f5f98fe Centralize g3 status message formatting
Extract a new g3_status module in g3-cli that provides consistent formatting
for all 'g3:' prefixed system status messages.

Key changes:
- Add G3Status struct with methods for progress, done, failed, error, etc.
- Add Status enum with Done, Failed, Error, Resolved, Insufficient, NoChanges
- Add ThinResult struct in g3-core for semantic thinning data
- Update UiWriter trait with print_thin_result() method
- Refactor context thinning to return ThinResult instead of formatted strings
- Update all callers to use the new centralized formatting
- Session resume/decline messages now use G3Status
- Compaction status messages now use G3Status

This maintains clean separation of concerns: g3-core emits semantic data,
g3-cli handles all terminal formatting and colors.
2026-01-20 09:50:55 +05:30
Dhanji R. Prasanna
7bd72a4a51 Add tests for tool-specific timeout durations
Adds 8 unit tests verifying:
- Research tool has 20-minute timeout
- All other tools (shell, read_file, write_file, str_replace, code_search,
  webdriver_*, etc.) have standard 8-minute timeout
- Comprehensive test_only_research_has_extended_timeout covers 19 tools

This ensures future changes don't accidentally affect other tool timeouts.
2026-01-19 21:58:16 +05:30