alex/g3 - g3 - Millerson GIT hosting

Author	SHA1	Message	Date
Dhanji R. Prasanna	af8b849311	fix(read_image): use correct media type when resize fails to reduce size When resize_image_to_dimensions() returns a larger file than the original, we fall back to using the original bytes. Previously, was_resized was set to true if the original dimensions exceeded MAX_IMAGE_DIMENSION, which caused final_media_type to be set to 'image/jpeg' even though we were using the original PNG bytes. This caused Anthropic API errors like: 'Image does not match the provided media type image/jpeg' Fix: Set was_resized=false when falling back to original bytes, so the original media type (detected from magic bytes) is preserved.	2026-01-22 07:58:05 +05:30
Dhanji R. Prasanna	9325a43ff3	feat(cli): shorten file paths in tool output display Add three-level path shortening hierarchy for cleaner CLI output: 1. Project path -> <project_name>/... (when project loaded via /project) 2. Workspace path -> ./... (relative to current working directory) 3. Home path -> ~/... (fallback for paths under home directory) Changes: - Add shorten_path() and shorten_paths_in_command() functions in display.rs - Add project_path/project_name fields to ConsoleUiWriter - Add set_workspace_path(), set_project_path(), clear_project() to UiWriter trait - Add ui_writer() getter to Agent struct - Wire up project path setting in /project and /unproject commands - Set workspace path when creating agents in all CLI modes Before: ● read_file | /Users/dhanji/icloud/butler/projects/appa_estate/status.md After: ● read_file | appa_estate/status.md (with project loaded) ● read_file | ./src/main.rs (workspace-relative) ● read_file | ~/Documents/file.txt (home-relative)	2026-01-21 21:27:16 +05:30
Dhanji R. Prasanna	feb7c3e40d	Add /project and /unproject commands for project-specific context - Add Project struct in crates/g3-cli/src/project.rs with file loading logic - Load brief.md, contacts.yaml, status.md from project path - Load projects.md from workspace root for cross-project context - Project content appended to system message (survives compaction/dehydration) - /project <path> loads project and auto-submits prompt asking about state - /unproject clears project content and resets context - Add set_project_content(), clear_project_content(), has_project_content() to Agent - Add new_for_test_with_readme() for testing with custom README content - Add 6 unit tests for Project struct - Add 9 integration tests for project context behavior	2026-01-21 14:53:30 +05:30
Dhanji R. Prasanna	a34a3b08e9	Rename Project Memory to Workspace Memory Rename all references from "Project Memory" to "Workspace Memory" to avoid future conflation if a "project" concept is introduced later. Changes: - Rename read_project_memory() -> read_workspace_memory() - Update all prompts, tool descriptions, and comments - Update header parsing in memory.rs to use "# Workspace Memory" - Update display detection for "=== Workspace Memory ===" - Update documentation and analysis/memory.md 11 files changed, ~36 occurrences updated.	2026-01-21 14:08:42 +05:30
Dhanji R. Prasanna	6a5ce11e7b	Consolidate redundant assistant message test files Deleted 4 redundant test files (~956 lines): - assistant_message_dedup_test.rs (416 lines, 12 tests) - consecutive_assistant_message_test.rs (248 lines, 6 tests) - missing_assistant_message_test.rs (100 lines, 4 tests) - early_return_path_test.rs (192 lines, 5 tests) - whitebox test Created consolidated assistant_message_test.rs (369 lines, 14 tests): - Helper function tests for consecutive message detection - ContextWindow unit tests for normal and tool execution flows - Bug demonstration tests documenting what bugs looked like - Invariant tests for user/assistant alternation - Missing assistant message fallback logic tests The early_return_path_test was removed because it: - Referenced specific line numbers in production code (brittle) - Reimplemented internal logic (whitebox anti-pattern) - Duplicated coverage from mock_provider_integration_test.rs All 729 g3-core tests pass.	2026-01-21 10:27:07 +05:30
Dhanji R. Prasanna	c5d549c211	Readability pass: remove verbose comments and clean up tests - completion.rs: Remove redundant comments, clean up test output (println! -> let _) - g3_status.rs: Condense doc comments, rename from_str() to parse() - streaming.rs: Remove obvious doc comments that duplicate function names - simple_output.rs, ui_writer_impl.rs: Update Status::parse() calls All changes are behavior-preserving. 132 lines removed, code is more scannable. Agent: carmack	2026-01-21 07:13:20 +05:30
Dhanji R. Prasanna	38b0019ad4	Fix compile warnings and tweak error message format Warnings fixed: - Remove unused 'warn' import from retry.rs - Prefix unused 'output' param with underscore - Prefix unused 'rel_start' with underscore - Add #[allow(dead_code)] to G3Status::info() Message format tweaked per feedback: - 'g3: model overloaded [error]' (no attempt info) - 'g3: retrying in 2.2s (1/3) ... [done]' (attempt info moved here) - Handle empty error message in Status::Error to show just '[error]'	2026-01-20 22:49:55 +05:30
Dhanji R. Prasanna	60578e310c	Clean up error and retry messages for recoverable errors Before: ❌ Error: Anthropic API error: AnthropicError { error_type: "overloaded_error", ... } ⚠️ Model busy detected (attempt 2/3). Retrying in 2.2s... [ERROR logs dumped to terminal] After: g3: model overloaded [error: attempt 1/3] g3: retrying in 2.2s ... [done] Changes: - Use G3Status formatting for clean, consistent output - Downgrade ERROR logs to debug for recoverable errors - Apply same treatment to all recoverable error types: rate limited, server error, network error, timeout, model overloaded, token limit, context length exceeded - Update both g3-cli (task_execution.rs) and g3-core (retry.rs)	2026-01-20 22:40:09 +05:30
Dhanji R. Prasanna	d7f22679a9	Remove '📋 Task: ' prefix from ACD stub The first user message in dehydrated context stubs is now shown without any prefix, consistent with the removal of 'Task: ' prefix from user messages.	2026-01-20 21:57:12 +05:30
Dhanji R. Prasanna	07c0bf1e39	Remove 'Task: ' prefix from user messages The prefix was causing duplication when users typed 'Task: ...' themselves, resulting in '📋 Task: Task: ...' in context dumps. User messages are now stored as-is without any prefix.	2026-01-20 21:53:28 +05:30
Dhanji R. Prasanna	9a0a2a2726	Make dehydration stub more compact Change from multi-line verbose format to single-line compact format: Before: ⚡ DEHYDRATED CONTEXT (fragment_id: 188c7ac71613) • 8 messages (4 user, 4 assistant) • 3 tool calls (shell ×3) • ~299 tokens saved To restore this history, call: rehydrate(fragment_id: "188c7ac71613") After: ⚡ DEHYDRATED CONTEXT: 3 tool calls (shell x3), 8 total msgs. To restore, call: rehydrate(fragment_id: "188c7ac71613") - Combine all info into single line - Remove tokens saved (not essential for rehydration decision) - Use ASCII 'x' instead of '×' for simplicity - Add 'no tool calls' case for fragments without tools - Update related tests	2026-01-20 21:26:42 +05:30
Dhanji R. Prasanna	4321503e89	Refactor streaming_parser.rs and context_window.rs for readability streaming_parser.rs (879 → 806 lines, -8%): - Extract CodeFenceTracker struct for cleaner fence state management - Consolidate pattern matching into module-level functions - Rename functions for clarity (find_json_object_end, parse_all_json_tool_calls) - Add clear section headers with // === separators - Simplify try_parse_json_tool_call state machine context_window.rs (889 → 843 lines, -5%): - Eliminate duplication: reset_with_summary now delegates to reset_with_summary_and_stub - Extract PreservedMessages struct for cleaner message preservation - Add ThinResult::no_changes() helper to reduce boilerplate - Simplify should_compact() and should_thin() with early returns - Add clear section headers for navigation All 44 tests pass. Behavior unchanged. Agent: carmack	2026-01-20 16:17:38 +05:30
Dhanji R. Prasanna	168cfff2ed	refactor(g3-core): extract tool output formatting to streaming.rs Centralize tool output formatting logic that was duplicated/scattered in stream_completion_with_tools(). This eliminates code-path aliasing where tool type checks were done in multiple places. Changes: - Add ToolOutputFormat enum (SelfHandled, Compact, Regular) - Add format_tool_result_summary() for centralized formatting decisions - Add is_compact_tool() and is_self_handled_tool() helper functions - Move parse_diff_stats() from lib.rs to streaming.rs - Simplify tool execution display logic in lib.rs using new helpers Net effect: -86 lines in lib.rs, +112 lines in streaming.rs The streaming.rs additions are reusable, well-named functions. All 585+ workspace tests pass. Agent: fowler	2026-01-20 15:45:35 +05:30
Dhanji R. Prasanna	9abb3735d2	refactor(g3-core): use StreamingState and IterationState structs in stream_completion_with_tools Consolidate scattered state variables in the 834-line stream_completion_with_tools() function to use the existing StreamingState and IterationState structs from streaming.rs. This eliminates code-path aliasing where state was tracked in multiple places and makes the streaming loop easier to reason about. Changes: - Add assistant_message_added field to StreamingState - Add stream_stop_reason field to IterationState - Replace 8 inline state variables with StreamingState::new() - Replace 7 iteration-local variables with IterationState::new() - All 585 workspace tests pass This is a pure refactor with no behavior changes. The state structs were already defined in streaming.rs but not used in the main streaming loop. Agent: fowler	2026-01-20 15:05:23 +05:30
Dhanji R. Prasanna	10bce7f66f	Remove ANSI formatting codes from g3-core Move terminal formatting responsibility to g3-cli layer: - format_str_replace_summary(): Remove ANSI codes, add colorize_str_replace_summary() helper in CLI to apply green/red colors for insertions/deletions - format_timing_footer(): Remove dimming ANSI codes (now plain text) - str_replace tool result: Remove ANSI codes from success message Remaining acceptable ANSI usage in g3-core: - iTerm2 inline image protocol (terminal-specific escape sequence) - Image metadata dimming (direct print, would need larger refactor) - Terminal beep for stale TODO warning (audio, not visual) - ANSI stripping utility in research.rs (not output) This continues the separation of concerns: g3-core handles logic, g3-cli handles all terminal formatting.	2026-01-20 10:00:37 +05:30
Dhanji R. Prasanna	182f5f98fe	Centralize g3 status message formatting Extract a new g3_status module in g3-cli that provides consistent formatting for all 'g3:' prefixed system status messages. Key changes: - Add G3Status struct with methods for progress, done, failed, error, etc. - Add Status enum with Done, Failed, Error, Resolved, Insufficient, NoChanges - Add ThinResult struct in g3-core for semantic thinning data - Update UiWriter trait with print_thin_result() method - Refactor context thinning to return ThinResult instead of formatted strings - Update all callers to use the new centralized formatting - Session resume/decline messages now use G3Status - Compaction status messages now use G3Status This maintains clean separation of concerns: g3-core emits semantic data, g3-cli handles all terminal formatting and colors.	2026-01-20 09:50:55 +05:30
Dhanji R. Prasanna	7bd72a4a51	Add tests for tool-specific timeout durations Adds 8 unit tests verifying: - Research tool has 20-minute timeout - All other tools (shell, read_file, write_file, str_replace, code_search, webdriver_*, etc.) have standard 8-minute timeout - Comprehensive test_only_research_has_extended_timeout covers 19 tools This ensures future changes don't accidentally affect other tool timeouts.	2026-01-19 21:58:16 +05:30
Dhanji R. Prasanna	4b7be3f9ee	Increase research tool timeout to 20 minutes The research tool often runs past 8 minutes due to web browsing and analysis. Increased its timeout to 20 minutes while keeping other tools at 8 minutes. Changes: - Tool timeout is now tool-specific (20 min for research, 8 min for others) - Timeout error message now shows the correct duration for each tool	2026-01-19 21:51:08 +05:30
Dhanji R. Prasanna	f4cce22db3	Add test documenting LLM duplicate text behavior Adds test_llm_repeats_text_before_each_tool_call() which documents the scenario where the LLM re-outputs the same preamble text before each tool call in a multi-tool response. Analysis showed this is LLM behavior, not a g3 bug: - Each assistant message is correctly stored with different tool calls - The duplicate display is the LLM choosing to repeat context - Storage is correct, display accurately reflects LLM output Decision: Accept as LLM behavior (Option B). Future LLM improvements may resolve this naturally without g3 code changes.	2026-01-19 18:44:01 +05:30
Dhanji R. Prasanna	1604ed613a	Add integration tests proving tool results are never parsed as tool calls Adds 3 new tests to json_parsing_stress_test.rs: - test_tool_result_with_json_not_parsed: Full agent integration test proving that JSON in tool results (sent TO the LLM) is never parsed by the streaming parser (which only sees LLM output) - test_parser_only_processes_completion_chunks: Documents that StreamingToolParser only accepts CompletionChunk, not Message objects - test_architectural_separation_documented: Documents the data flow showing tool results flow TO the LLM while the parser only sees FROM the LLM This proves the architectural guarantee: there is no code path where tool result content could be parsed as a tool call, because: 1. Tool results are Message objects added to context_window 2. The streaming parser only processes CompletionChunk from provider.stream_completion() 3. These are completely separate data types flowing in opposite directions Total: 41 JSON parsing stress tests now pass.	2026-01-19 16:21:36 +05:30
Dhanji R. Prasanna	2043a83e7d	Add comprehensive MockProvider integration tests Added 6 new integration tests for stream_completion_with_tools: - test_text_before_tool_call_preserved: text before native tool call is saved - test_native_tool_call_execution: native tool calls execute correctly - test_duplicate_tool_calls_skipped: sequential duplicates are detected - test_json_fallback_tool_calling: JSON tool calls work without native support - test_text_after_tool_execution_preserved: follow-up text is saved - test_multiple_tool_calls_executed: multiple tool calls in sequence work Also added MockResponse helper methods: - text_then_native_tool(): text followed by native tool call - duplicate_native_tool_calls(): same tool call twice (for dedup testing) Fixed text_with_json_tool() to ensure "tool" key comes before "args" (serde_json alphabetizes keys, breaking pattern detection). Total: 18 integration tests covering historical bugs and core behaviors.	2026-01-19 14:44:30 +05:30
Dhanji R. Prasanna	5caa101b84	Fix inline JSON being incorrectly detected as tool call The bug was caused by mark_tool_calls_consumed() being called after displaying each chunk, which advanced last_consumed_position to the end of the current buffer. When the next chunk arrived with JSON, the unchecked_buffer started at position 0 of the slice, causing is_on_own_line() to return true (position 0 is always "on its own line"). Removed the problematic mark_tool_calls_consumed() call from the "no tool executed" branch. The remaining call after actual tool execution is correct and necessary. Added integration test that verifies inline JSON in prose is not detected as a tool call.	2026-01-19 14:35:01 +05:30
Dhanji R. Prasanna	292a3aa48d	Add MockProvider for integration testing Adds a configurable mock LLM provider that can simulate various behaviors: - Text-only responses (single or multi-chunk streaming) - Native tool calls - JSON tool calls in text - Truncated responses (max_tokens) - Multi-turn conversations Features: - Builder pattern for easy test setup - Request tracking for verification - Preset scenarios for common patterns - Full LLMProvider trait implementation Also adds integration tests that use MockProvider to test the stream_completion_with_tools code path, including: - test_butler_bug_scenario: reproduces the exact bug where text-only responses were not saved to context, causing consecutive user messages This enables testing complex streaming behaviors without real API calls.	2026-01-19 13:59:31 +05:30
Dhanji R. Prasanna	349230d0b7	Fix missing assistant messages in context window Bug: When the LLM responded with text-only (no tool calls), the assistant message was sometimes not saved to the context window. This caused consecutive user messages where the LLM would lose track of previous responses. Root causes found and fixed: 1. Early return path (line ~2535): When stream finishes with no tools executed in previous iterations (any_tool_executed=false), the code returned early without saving the assistant message. Fixed by adding save before return. 2. Post-loop path (line ~2657): When raw_clean was empty but current_response had content, no message was saved. Fixed by falling back to current_response. Both paths now properly save the assistant message before returning. The assistant_message_added flag prevents any duplication. Added tests: - missing_assistant_message_test.rs: verifies the fallback logic - assistant_message_dedup_test.rs: verifies no duplicate messages - consecutive_assistant_message_test.rs: verifies alternation invariant	2026-01-19 13:50:28 +05:30
Dhanji R. Prasanna	02655110d6	fix: auto-resize images exceeding 1568px dimension to prevent 413 Payload Too Large The Anthropic API was rejecting requests with multiple high-resolution images (~2000x3000 pixels each) even though individual file sizes were under limits. Root cause: Code only checked per-image file size (3.75MB), not dimensions. Claude recommends images ≤1568px on longest edge and has 32MB total request limit. Changes: - Add MAX_IMAGE_DIMENSION (1568px) and MAX_TOTAL_IMAGE_PAYLOAD (20MB) constants - Trigger resize when dimensions > 1568px (not just file size > 3.75MB) - Add new resize_image_to_dimensions() for dimension-constrained resizing - Track cumulative payload size across multiple images - Warn if total payload exceeds recommended limit Test results with Walking Dead comic images: - WD_0001_0001.jpg: 800KB 1987x3057 → 321KB 1019x1568 - WD_0001_1064.png: 150KB 1988x3057 → 143KB 1020x1568 - WD_0002_0001.jpg: 1023KB 1988x3056 → 292KB 1020x1568 - Total payload: ~2.5MB → ~1MB base64	2026-01-18 10:05:45 +05:30
Dhanji R. Prasanna	3a03ed0585	Fix imgcat aspect ratio by adding preserveAspectRatio=1 Images were being displayed as narrow vertical strips because iTerm2 wasn't preserving aspect ratio when only height was specified.	2026-01-17 18:50:00 +05:30
Dhanji R. Prasanna	d600b600b8	Always keep chromedriver running for faster subsequent startups Removed the persistent_chrome config flag - chromedriver is now always kept running after webdriver_quit. This eliminates startup latency for subsequent WebDriver sessions. Safaridriver is still killed on quit since it doesn't benefit from persistence in the same way. Updated quit message to correctly indicate chromedriver remains running.	2026-01-17 09:48:10 +05:30
Dhanji R. Prasanna	8ed360024f	Add persistent ChromeDriver support for faster WebDriver startup When webdriver_start is called, now checks if chromedriver is already running on the configured port and reuses it instead of spawning a new process. This significantly reduces startup time for subsequent sessions. New config option: [webdriver] persistent_chrome = true # Keep chromedriver running between sessions When enabled, webdriver_quit closes the browser session but leaves chromedriver running for reuse by the next session.	2026-01-17 09:26:25 +05:30
Dhanji R. Prasanna	b8193bf9f9	style: use orange color for [no changes] status in thinning message	2026-01-17 04:53:42 +05:30
Dhanji R. Prasanna	74b1b9bea3	refactor: simplify context thinning status message Change format from verbose emoji-based message to cleaner status line: Before: ✨ 🥒 Context thinned at 70%: 7 tool results, ~33839 chars saved ✨ After: g3: thinning context ... 70% -> 40% ... [done] The new format shows before/after percentages and uses bold green for 'g3:' and '[done]' to match other status messages. Also removes unused emoji() and label() methods from ThinScope.	2026-01-17 04:47:16 +05:30
Dhanji R. Prasanna	c7984fd4c2	fix: account for base64 encoding overhead in image size limit The Anthropic API has a 5MB limit on base64-encoded images, not raw file size. Base64 encoding increases size by ~33% (4/3 ratio), so a 4MB raw image becomes ~5.3MB encoded, exceeding the limit. Changed MAX_IMAGE_SIZE from 5MB to ~3.75MB (5MB * 3/4) to trigger resizing before the base64-encoded result exceeds the API limit. Also updated target resize size to 3.6MB to leave margin.	2026-01-16 21:29:05 +05:30
Dhanji R. Prasanna	1003386f7f	Auto-resize large images (>=5MB) in read_image tool Images >= 5MB are now automatically resized to < 4.9MB using ImageMagick before being sent to the LLM. This prevents API errors from oversized images. - Uses iterative quality/scale reduction to find optimal size - Converts to JPEG for better compression - Shows original and resized size in terminal output (e.g., '6.2 MB → 4.1 MB (resized)') - Falls back to original if ImageMagick fails or isn't available	2026-01-16 21:09:38 +05:30
Dhanji R. Prasanna	fc702168ab	Add streaming completion integration test with mock LLM provider Adds tests to verify that: - All streaming chunks are processed before control returns to caller - Both tool calls in a multi-tool-call stream are executed - The finished signal properly terminates stream processing Also adds Agent::new_for_test() to allow injecting mock providers.	2026-01-16 20:52:32 +05:30
Dhanji R. Prasanna	0e33465342	Add print_g3_progress/print_g3_status methods for consistent status messages	2026-01-16 20:28:24 +05:30
Dhanji R. Prasanna	95f89d3f8e	Simplify compaction status messages	2026-01-16 20:26:35 +05:30
Dhanji R. Prasanna	7c59d1993c	Fix auto-memory JSON leak: tool call printed raw to UI The JSON filter only suppresses tool calls at line boundaries. When "Memory checkpoint: " was printed without a trailing newline, the LLM response `{"tool": "remember", ...}` appeared on the same line and leaked through to the UI. Fix: - Add trailing newline to "Memory checkpoint:" message - Reset JSON filter state before streaming the response Added test: test_tool_call_not_at_line_start_passes_through Documents the filter behavior and references the fix location.	2026-01-16 13:10:18 +05:30
Dhanji R. Prasanna	6bd9c51e8e	feat: shell output pagination and optimized read_file with seek - Shell outputs > 8KB are truncated to first 500 chars - Full output saved to .g3/sessions/<session_id>/tools/shell_stdout_<id>.txt - LLM can use read_file with start/end to paginate through large outputs - read_file now uses seek() for O(1) random access instead of reading entire file - UTF-8 safe: reads extra bytes at boundaries to find valid char positions - Falls back to lossy conversion for binary files (no panics) Files changed: - paths.rs: get_tools_output_dir(), generate_short_id() - shell.rs: truncate_large_output() integration - file_ops.rs: seek-based read_file_range() helper - New test: read_file_utf8_test.rs	2026-01-16 09:16:16 +05:30
Dhanji R. Prasanna	01cb4f6691	fix: use consistent max_tokens defaults across providers - Fix aliasing issue where resolve_max_tokens() used fallback_default_max_tokens (8192) instead of provider-specific defaults - Update fallback_default_max_tokens from 8192 to 32000 - Set provider-specific max_tokens defaults: - Anthropic: 32000 - OpenAI: 32000 (was 16000) - Databricks: 32000 (was 50000, now matches Anthropic as passthru) - Embedded: 2048 - Context window lengths unchanged: - OpenAI: 400,000 - Anthropic: 200,000 - Databricks (Claude): 200,000 This fixes the 'LLM response was cut off due to max_tokens limit' error in agent mode that occurred because 8192 was being used instead of 32000.	2026-01-16 07:05:57 +05:30
Dhanji R. Prasanna	a84fead03b	refactor: improve readability of streaming parser and JSON filter Agent: carmack Changes: - streaming_parser.rs: Unified find_first/last_tool_call_start into single find_tool_call_start with SearchDirection enum, reducing duplication. Simplified is_json_invalidated from 45 to 20 lines with clearer logic. Fixed redundant !escape_next check in find_complete_json_object_end. - filter_json.rs: Simplified check_tool_pattern from 40 to 24 lines. Replaced repetitive prefix checks with loop over ["t", "to", "too", "tool"]. Reduced trailing return statements with direct expression returns. - ui_writer_impl.rs: Added ansi module for duration color constants. Simplified duration_color function by removing redundant comments. - language_prompts.rs: Fixed test assertions to match actual prompt content ("obvious, readable Racket" instead of "RACKET-SPECIFIC GUIDANCE"). All 174+ tests pass. No behavior changes.	2026-01-15 13:49:29 +05:30
Dhanji R. Prasanna	0ae1a13cdb	feat: real-time tool call streaming indicator with blinking UI - Add ToolParsingHint enum (Detected/Active/Complete) for UI feedback - New UiWriter methods: print_tool_streaming_hint(), print_tool_streaming_active() - Refactor ConsoleUiWriter state to use atomics in ParsingHintState - Add tool_call_streaming field to CompletionChunk for provider hints - Anthropic provider sends streaming hints when tool name detected - New streaming helpers: make_tool_streaming_hint(), make_tool_streaming_active() Parser improvements: - Add is_json_invalidated() to detect false positive tool patterns - Fix tool result poisoning when file contents contain partial JSON - Unescaped newlines in strings or prose after JSON invalidates detection User sees ' ● tool_name |' immediately when tool call starts streaming, with blinking indicator while args are received.	2026-01-15 13:49:29 +05:30
Dhanji R. Prasanna	d68f059acf	fix: detect invalidated JSON tool calls to prevent parser poisoning When partial JSON tool call patterns appear in LLM output (e.g., from quoting file content), the parser would incorrectly report them as "incomplete tool calls", triggering auto-continue loops. Fix: Added is_json_invalidated() to detect when partial JSON has been invalidated by subsequent content that cannot be valid JSON: - Unescaped newline inside a string (invalid JSON) - Newline followed by prose text outside a string The check is only applied to incomplete JSON - complete tool calls with trailing text are still correctly detected. Added 6 new tests covering: - Tool results with partial JSON patterns - LLM quoting file content inline vs on own line - Comment prefixes (// # -- etc) with partial patterns - Real incomplete tool calls (should still be detected)	2026-01-15 13:49:29 +05:30
Dhanji R. Prasanna	999ac6fe66	fix: prevent parser poisoning from inline tool-call JSON patterns The streaming parser was incorrectly detecting tool call patterns that appeared inline in prose (e.g., when explaining the format), causing g3 to return control mid-task. Fix: Modified find_first_tool_call_start() and find_last_tool_call_start() to only recognize patterns that appear on their own line (at start of buffer or after newline with only whitespace before the pattern). Changes: - Added is_on_own_line() helper to check line-boundary conditions - Updated detection methods to skip inline patterns - Removed sanitize_inline_tool_patterns() and LBRACE_HOMOGLYPH (no longer needed) - Rewrote tests for new behavior - Added streaming_repro tests that use process_chunk() to verify the exact bug scenario 28 tests covering: streaming repro, line boundaries, Unicode, code contexts, edge cases	2026-01-15 13:49:29 +05:30
Dhanji R. Prasanna	f4562cd4c9	config: default agent settings and provider override	2026-01-14 20:14:33 +05:30
Dhanji R. Prasanna	38828c7757	Clean up tool output formatting - Shell: "✅ Command executed successfully" → "⚡️ ran successfully" - Write file: Remove ✏️ emoji, use plain "wrote N lines | M chars"	2026-01-14 19:42:54 +05:30
Dhanji R. Prasanna	9ef064a041	Add guidance to shell tool description to avoid unnecessary cd prefixes LLMs were prefixing shell commands with `cd <workspace> &&` unnecessarily, wasting tokens and cluttering CLI display. Added clear guidance in the shell tool description that commands already execute in the working directory.	2026-01-14 19:00:53 +05:30
Dhanji R. Prasanna	5104bd53b6	refactor(g3-core): improve stream_completion_with_tools readability Extract and simplify the streaming completion function: - Extract ensure_context_capacity() helper for pre-loop context management (thinning + compaction logic now in dedicated async method) - Simplify compact_summary generation block: flatten nested if/match, remove redundant comments, reorder branches for clarity - Remove dead code: unused _last_error variable and modified_tool_call - Streamline duplicate detection block: reduce verbose logging - Clean up text content display block: remove redundant comments, tighten variable declarations - Remove redundant is_todo_tool redefinition inside block expression Net reduction: 79 lines (-187/+108) Behavior unchanged, all unit tests pass. Agent: carmack	2026-01-14 15:11:53 +05:30
Dhanji R. Prasanna	dea0e6b1ca	Compact tool output improvements - Rename take_screenshot -> screenshot, code_coverage -> coverage (shorter names) - Align | character across all compact tools (pad to 11 chars for str_replace) - Make code_search a compact tool with summary display - Show language and search name in code_search output (e.g., rust:"find structs") - Add format_code_search_summary() to extract match/file counts from JSON response	2026-01-14 08:12:50 +05:30
Dhanji R. Prasanna	7d17b436f9	refactor(g3-core): remove 3 unused Agent constructor variants Remove dead code - constructor variants that had no callers: - new_with_readme() - new_autonomous_with_readme() - new_with_quiet() These were thin wrappers around new_with_mode_and_readme() that were never used externally. All 5 remaining constructors have verified callers. Results: - lib.rs reduced from 2817 to 2797 lines (-20 lines) - Eliminated code-path aliasing: 8 constructors → 5 constructors - All g3-core tests pass - Full workspace compiles cleanly Agent: fowler	2026-01-14 04:26:42 +05:30
Dhanji R. Prasanna	a1dfd9c0b6	Enhanced auto-memory with rich few-shot format - Updated memory reminder prompt with per-symbol char ranges - Added two few-shot examples: Session Continuation (feature) + UTF-8 Safe Slicing (pattern) - Updated system prompt Memory Format section to match - Format: file -> nested symbols with [start..end] ranges and descriptions - Enables direct read_file navigation to specific functions	2026-01-13 21:49:48 +05:30
Dhanji R. Prasanna	3a47ebe668	better racket example support	2026-01-13 21:16:14 +05:30
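
Note on c7984fd4c2 (base64 overhead): the arithmetic behind the ~3.75MB threshold can be checked with a small sketch. This is not code from the repository. The function names are hypothetical, padded base64 is assumed, and "MB" is treated as MiB (1024 * 1024 bytes), which the commit message does not specify.

```rust
/// Bytes in one MiB. Assumption: the commit's "MB" is treated as MiB here.
const MIB: u64 = 1024 * 1024;

/// Length of the padded base64 encoding of `raw_len` bytes.
/// Every group of 3 input bytes becomes 4 output characters,
/// and a partial final group is padded up to a full 4.
fn base64_encoded_len(raw_len: u64) -> u64 {
    (raw_len + 2) / 3 * 4
}

/// Largest raw size whose padded base64 encoding still fits in `encoded_limit`.
fn max_raw_len(encoded_limit: u64) -> u64 {
    encoded_limit / 4 * 3
}

fn main() {
    let limit = 5 * MIB;

    // A 4 MiB raw image encodes to about 5.33 MiB, which is over the limit.
    let encoded = base64_encoded_len(4 * MIB);
    println!("4 MiB raw -> {:.2} MiB encoded", encoded as f64 / MIB as f64);

    // The raw-size cap that keeps the encoded form within 5 MiB is 3.75 MiB.
    let cap = max_raw_len(limit);
    println!("raw cap for a 5 MiB encoded limit: {:.2} MiB", cap as f64 / MIB as f64);
}
```

A resize target slightly below the cap, such as the 3.6MB the commit mentions, leaves margin for the rest of the request body.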

417 Commits
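
Note on 999ac6fe66 (line-boundary detection): the rule described there, that a tool-call pattern counts only at the start of the buffer or after a newline with only whitespace before it, can be sketched as below. This is an illustration under assumptions, not the repository's implementation. The name `is_on_own_line` comes from the commit message, but its signature here, the `TOOL_CALL_PREFIX` constant, and the `find_tool_call_start` body are hypothetical.

```rust
/// Hypothetical literal used to spot a JSON tool call; the real parser may accept more variants.
const TOOL_CALL_PREFIX: &str = "{\"tool\"";

/// True when only whitespace separates `pos` from the previous newline
/// (or from the start of the buffer). `pos` must lie on a char boundary.
fn is_on_own_line(buffer: &str, pos: usize) -> bool {
    let line_start = buffer[..pos].rfind('\n').map_or(0, |i| i + 1);
    buffer[line_start..pos].chars().all(char::is_whitespace)
}

/// First occurrence of the prefix that sits on its own line, skipping inline mentions.
fn find_tool_call_start(buffer: &str) -> Option<usize> {
    buffer
        .match_indices(TOOL_CALL_PREFIX)
        .map(|(pos, _)| pos)
        .find(|&pos| is_on_own_line(buffer, pos))
}

fn main() {
    // Pattern quoted inside prose: ignored.
    let inline = "Call it like {\"tool\": \"shell\"} in prose.";
    println!("inline   -> {:?}", find_tool_call_start(inline));

    // Pattern on its own (indented) line: detected.
    let own_line = "Running it now:\n  {\"tool\": \"shell\", \"args\": {}}";
    println!("own line -> {:?}", find_tool_call_start(own_line));
}
```

In this sketch position 0 always counts as being on its own line, which is the property that 5caa101b84 identifies as the source of a false positive when the checked slice starts mid-line.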