alex/g3 - g3 - Millerson GIT hosting

Author	SHA1	Message	Date
Dhanji R. Prasanna	0aa1287ca6	Remove final_output tool and improve scout report handback final_output removal: - Remove final_output from tool definitions and dispatch - Update system prompts to request summaries as regular text - Remove final_output_called field from StreamingState - Update auto_continue tests to remove final_output_called parameter - Remove final_output test from tool_execution_test.rs - Update planner and flock prompts to not reference final_output - Keep backwards-compat code in feedback_extraction.rs and task_result.rs Scout report handback: - Change from file-based to delimiter-based report extraction - Scout outputs report between ---SCOUT_REPORT_START/END--- markers - Research tool extracts content between markers, strips ANSI codes - Add comprehensive tests for extraction and ANSI stripping 657 tests pass.	2026-01-10 13:43:04 +11:00
Dhanji R. Prasanna	cab2fb187a	Stream scout agent output to CLI during research The research tool now streams the underlying scout agent's output to the CLI in real-time for visual indication of progress. This output is displayed but not added to the conversation context.	2026-01-09 20:39:53 +11:00
Dhanji R. Prasanna	91239ae2ca	modified scout to be more HTML aggressive for content	2026-01-09 20:37:21 +11:00
Dhanji R. Prasanna	c88ffa2431	Remove final_output tool, improve scout agent - Remove final_output tool to allow LLM responses to stream naturally - Update system prompts to request summaries instead of tool calls - Rename final_output_summary to summary in session continuation - Update tool count tests (12→11 core tools, 27→26 total) - Delete obsolete final_output tests Scout agent improvements: - Simplify WebDriver usage instructions - Prefer DuckDuckGo/Brave/Bing over Google - Support passing task directly to agent mode - Suppress completion message for scout (needs clean output for research tool)	2026-01-09 20:30:00 +11:00
Dhanji R. Prasanna	22d1ac8096	Move WebDriver instructions from main prompt to scout agent Simplified the main system prompt's web research section to just direct users to the research tool. Moved the detailed WebDriver usage instructions to scout.md where they belong, since the scout agent is the one that actually uses WebDriver for research. Main prompt now simply says: use the research tool for web research. Scout agent now has the full WebDriver best practices documentation.	2026-01-09 16:01:47 +11:00
Dhanji R. Prasanna	33e5705fc3	Add research tool for web-based research via scout agent New tool that spawns a scout agent to perform web research and return a structured research brief. The scout agent uses webdriver to browse the web and returns a decision-ready report. Changes: - Added 'research' tool definition (12 core tools total) - Added research tool dispatch in tool_dispatch.rs - Created tools/research.rs implementation: - Spawns 'g3 --agent scout <query>' as subprocess - Captures stdout and extracts last line (report file path) - Reads and returns the report file contents - Added exclude_research flag to ToolConfig - Scout agent (agent_name == 'scout') does NOT have access to research tool to prevent infinite recursion - Updated system prompts to describe when to use research tool - Added scout.md agent prompt with research brief output contract The research tool is preferred for complex research tasks (APIs, SDKs, libraries, approaches, bugs). WebDriver can still be used directly for simple lookups or fine-grained control.	2026-01-09 15:59:19 +11:00
Dhanji R. Prasanna	de50726eeb	Prefer ripgrep over grep in system prompts Added guidance to use rg (ripgrep) instead of grep in shell commands. Ripgrep is faster, has better defaults, and respects .gitignore.	2026-01-09 15:28:04 +11:00
Dhanji R. Prasanna	e301075666	Fix panic on multi-byte chars in filter_json buffer truncation The buffer truncation code was slicing at a raw byte offset which could land in the middle of a multi-byte character (like emojis), causing a panic. Fixed by using char_indices() to find valid character boundaries. Also added stop_reason field to CompletionChunk initializers in tests to complete the stop_reason feature addition. - Fix byte boundary panic in filter_json.rs line 327 - Add test for multi-byte character handling - Update test files with missing stop_reason field	2026-01-09 15:20:57 +11:00
Dhanji R. Prasanna	c470964628	Fix: Save LLM text response to context after tool execution When the LLM executes a tool and then outputs text (e.g., analysis after reading images), the text was being displayed during streaming but never saved to the context window. This caused: 1. The response to appear truncated in the session log 2. Loss of context for subsequent turns 3. The LLM losing track of what it had already said The fix saves current_response to the context window before breaking out of the streaming loop for auto-continue after tool execution. Reproduction scenario: - User asks LLM to read images and analyze them - LLM calls read_image tool - Tool executes successfully - LLM outputs analysis text ("Now I can see the results...") - Text was displayed but lost from session log Now the text is properly persisted to the context window.	2026-01-09 15:04:43 +11:00
Dhanji R. Prasanna	777191b3cb	Remove final_output tool - let summaries stream naturally - Remove final_output from tool definitions, dispatch, and misc tools - Update system prompts to request summaries as regular markdown text - Remove print_final_output from UiWriter trait and all implementations - Remove final_output handling from agent core logic - Rename final_output_summary → summary in session continuation - Delete final_output test files - Update tool count tests (12→11, 27→26) This allows LLM summaries to stream through the markdown formatter for a more natural, responsive user experience instead of buffering everything into a tool call.	2026-01-09 14:57:24 +11:00
Dhanji R. Prasanna	bebf04c7bd	Tighten system prompt	2026-01-09 14:11:19 +11:00
Dhanji R. Prasanna	d96d8c1d90	Rewrite JSON tool call filter with clean state machine Fixes bug where JSON tool calls were printed as text due to chunking issues. Changes: - Complete rewrite of filter_json.rs with 3-state machine: - Streaming: normal pass-through, watches for newline + whitespace + { - Buffering: confirms/denies tool pattern with ~20 char buffer - Suppressing: string-aware brace counting until balanced - Character-by-character processing eliminates chunk boundary issues - Proper handling of } inside JSON strings (was causing premature exit) - Detects truncated JSON followed by complete JSON (LLM retry case) - Removed regex dependency, simpler pattern matching - Added 59 stress tests covering malformed JSON, partial patterns, streaming edge cases, adversarial inputs, and real-world patterns All 86 filter_json tests pass.	2026-01-09 14:05:11 +11:00
Dhanji R. Prasanna	49b27b0cbc	fix: truncate long lines in streaming tool output to prevent terminal wrapping When shell commands output very long lines (e.g., JSON content from tail -c 10000), the lines would wrap in the terminal. The cursor-up escape code (\x1b[1A) only moves up one visual line, not the entire wrapped content, causing the display to fill with uncleared text. This fix truncates lines to 120 characters in update_tool_output_line() before displaying them, preventing the wrapping issue.	2026-01-09 13:35:58 +11:00
Dhanji R. Prasanna	67be0f20c7	fix: remove allow_multiple_tool_calls config and simplify tool execution flow This fixes a bug where the agent would stop responding abruptly without calling final_output. The root cause was the allow_multiple_tool_calls config option (default: false) which caused the agent to break out of the streaming loop mid-stream after executing the first tool, losing any subsequent content. Changes: - Remove allow_multiple_tool_calls config option entirely - Always process all tool calls without breaking mid-stream - Simplify system prompt generation (no longer needs boolean param) - Let the stream complete fully before continuing to next iteration - Change find_last_tool_call_start to find_first_tool_call_start - Remove parser.reset() call on duplicate detection Benefits: - Simpler logic with less conditional branching - No lost content after tool calls - Consistent behavior for all users - Reduced config complexity	2026-01-09 13:28:07 +11:00
Dhanji R. Prasanna	a72d5a650a	Fix two markdown formatting bugs Bug 1: Inline code after list bullets not detected - After emitting a list bullet, at_line_start was not set to false - This caused the next backtick to be treated as a potential code fence - Fixed by setting at_line_start = false after emitting bullet Bug 2: Code block closing on indented backticks - Code blocks containing indented ``` (4+ spaces) were closing prematurely - The .trim() check was too permissive - Fixed by only allowing closing fence with <= 3 spaces indent (CommonMark spec) Added tests for both edge cases.	2026-01-08 20:50:26 +11:00
Dhanji R. Prasanna	19a804e0be	Add syntax highlighting for Racket, Elisp, and Scheme Add language alias mapping in highlight_code() to map: - racket, rkt -> lisp - elisp, emacs-lisp -> lisp - scheme -> lisp - common-lisp, cl -> lisp - shell, sh, zsh, dockerfile -> bash Syntect's built-in Lisp syntax handles all Lisp-family languages well. Added test to verify the aliases work correctly.	2026-01-08 20:35:34 +11:00
Dhanji R. Prasanna	df706308ca	Unify final_output rendering with streaming markdown formatter Replace the separate syntax_highlight module with the streaming markdown formatter for final_output rendering. This: - Removes special buffered rendering logic for final_output - Uses the same StreamingMarkdownFormatter used for agent responses - Removes the spinner animation (content renders immediately) - Deletes the now-unused syntax_highlight.rs module - Updates test to use the streaming formatter Benefits: - Consistent rendering across all markdown output - Less code to maintain (removed ~250 lines) - Same syntax highlighting via syntect (already in streaming formatter)	2026-01-08 20:30:44 +11:00
Dhanji R. Prasanna	347513b04c	Add comprehensive stress tests for streaming markdown formatter Add 10 stress tests covering: - Nested formatting (bold in italic, italic in bold) - Empty/minimal content edge cases - Escape sequences and special characters - Lists with complex inline formatting - Links with various content types - Tables with formatting in cells - Code blocks (should not format contents) - Mixed block elements (headers, quotes, rules) - Nested lists (3+ levels, mixed types) - Pathological/adversarial inputs (unbalanced delimiters, unicode, long lines) All 45 tests pass.	2026-01-08 20:27:28 +11:00
Dhanji R. Prasanna	fadfaee040	update .gitignore	2026-01-08 13:50:03 +11:00
Dhanji R. Prasanna	381b852869	refactor(g3-core): Extract streaming utilities into dedicated module Extract reusable utilities from the massive stream_completion_with_tools function into a new streaming.rs module for improved readability: - format_duration, format_timing_footer: timing display helpers - clean_llm_tokens: consolidates 4 duplicate token-cleaning call sites - log_stream_error: extracts 70+ lines of error logging - is_empty_response, is_connection_error: predicate helpers - truncate_for_display, truncate_line: string truncation utilities - StreamingState, IterationState: state structs for future refactoring Results: - lib.rs reduced from 2978 to 2840 lines (138 lines, ~5%) - New streaming.rs: 309 lines with 5 unit tests - All 98+ tests pass Agent: carmack	2026-01-08 13:20:11 +11:00
Dhanji R. Prasanna	267ef00848	refactor: extract session helper in webdriver.rs to reduce boilerplate Agent: carmack Add get_session() helper function that: - Checks if webdriver is enabled - Acquires the session read lock - Returns the cloned session or an error message Refactored 12 webdriver tool functions to use this helper: - execute_webdriver_navigate - execute_webdriver_get_url - execute_webdriver_get_title - execute_webdriver_find_element - execute_webdriver_find_elements - execute_webdriver_click - execute_webdriver_send_keys - execute_webdriver_execute_script - execute_webdriver_get_page_source - execute_webdriver_screenshot - execute_webdriver_back - execute_webdriver_forward - execute_webdriver_refresh Each function previously had ~10 lines of identical boilerplate. Now reduced to 4 lines using the helper. Net reduction: 68 lines (678 -> 610) All tests pass. Behavior unchanged.	2026-01-08 13:05:44 +11:00
Dhanji R. Prasanna	5bfaee8dd5	use consistent naming for compaction	2026-01-08 12:54:03 +11:00
Dhanji R. Prasanna	3776ed847e	refactor: use shared streaming helpers in openai and embedded providers Agent: carmack openai.rs: - Use make_text_chunk() for streaming text content - Use make_final_chunk() for final completion chunk - Simplify tool_calls conversion logic embedded.rs: - Use make_text_chunk() for all 4 streaming text chunks - Use make_final_chunk() for final completion chunk - Remove unused CompletionChunk import Net reduction: 35 lines removed All tests pass. Behavior unchanged.	2026-01-07 13:01:03 +11:00
Dhanji R. Prasanna	2bf475960c	refactor: extract shared streaming utilities module Agent: carmack Create crates/g3-providers/src/streaming.rs with shared helpers: - decode_utf8_streaming(): Handle incomplete UTF-8 sequences in SSE streams - is_incomplete_json_error(): Detect incomplete vs malformed JSON - make_final_chunk(): Create finished completion chunks - make_text_chunk(): Create text content chunks - make_tool_chunk(): Create tool call chunks Refactor anthropic.rs: - Use shared decode_utf8_streaming (removes 15 lines of inline UTF-8 handling) - Use make_final_chunk, make_text_chunk, make_tool_chunk helpers - Reduces verbose CompletionChunk constructions throughout Refactor databricks.rs: - Remove local copies of streaming helpers (now uses shared module) - Reduces duplication between providers Net reduction: 118 lines removed, 16 lines added (including new module) All tests pass. Behavior unchanged.	2026-01-07 12:48:07 +11:00
Dhanji R. Prasanna	bb63050779	refactor: improve readability of streaming and file ops code Agent: carmack databricks.rs: - Extract ToolCallAccumulator struct to replace opaque (String, String, String) tuple - Add decode_utf8_streaming() helper for cleaner UTF-8 handling - Add is_incomplete_json_error() helper for JSON parse error detection - Add make_final_chunk() helper to reduce duplication - Add finalize_tool_calls() to convert accumulators to final format - Refactor parse_streaming_response from ~270 lines to ~100 lines - Reduce nesting depth from 8+ levels to 4 levels - Use early returns and let-else for cleaner control flow file_ops.rs: - Replace repetitive if-let chains with declarative PATH_CONTENT_KEYS table - Use match expression instead of nested if-else - Reduce extract_path_and_content from 44 lines to 20 lines All tests pass. Behavior unchanged.	2026-01-07 12:39:05 +11:00
Dhanji R. Prasanna	532ed132f7	Few shot prompts for carmack	2026-01-07 12:33:11 +11:00
Dhanji R. Prasanna	4e7aca50fa	feat: royal blue tool names in agent mode + fix README heading display - Add set_agent_mode() to UiWriter trait for visual mode differentiation - ConsoleUiWriter uses royal blue (ANSI 256 color 69) for tool names in agent mode - Fix extract_readme_heading() to search only README section of combined content (was incorrectly showing AGENTS.md heading instead of README heading)	2026-01-07 11:37:51 +11:00
Dhanji R. Prasanna	189fdec006	Carmack agent	2026-01-07 11:18:27 +11:00
Dhanji R. Prasanna	1980e62511	Improve code readability in g3-core - streaming_parser.rs: Rename has_message_like_keys to args_contain_prose_fragments with improved documentation explaining the heuristic for detecting malformed tool calls where LLM prose leaked into JSON keys - context_window.rs: Simplify build_thin_result_message using early return pattern and match expression for cleaner control flow Agent: carmack	2026-01-07 11:16:42 +11:00
Dhanji R. Prasanna	2e9535974d	removed testing craft	2026-01-07 10:46:37 +11:00
Dhanji R. Prasanna	775bcd10a5	chore: remove g3-console crate entirely The g3-console crate was not referenced by any other crate in the workspace and appears to be an abandoned web console implementation. Removed: - crates/g3-console/ (entire directory) - Workspace member entry in Cargo.toml Agent: fowler	2026-01-07 10:41:46 +11:00
Dhanji R. Prasanna	1056b4193b	chore(g3-cli): remove orphaned retro_tui and tui modules These files were not referenced anywhere in the codebase and appear to be leftover from a previous TUI implementation that was abandoned. Removed: - crates/g3-cli/src/retro_tui.rs (62KB) - crates/g3-cli/src/tui.rs (6KB) Agent: fowler	2026-01-07 10:39:42 +11:00
Dhanji R. Prasanna	48036d01e3	fix(g3-core): disable auto-continue in interactive mode Auto-continue was incorrectly triggering when the LLM asked questions in interactive/chat mode. Now auto-continue only activates when is_autonomous is true, allowing proper back-and-forth conversation in interactive mode. Agent: fowler	2026-01-07 10:37:30 +11:00
Dhanji R. Prasanna	a553764e93	docs(agents): add git authorship rule to all agent prompts Ensure agents never override git author/email and instead put their identity in the commit message body. Agent: fowler	2026-01-07 10:27:44 +11:00
Dhanji R. Prasanna	b73dfacb7a	refactor(g3-core): extract provider_registration and session modules Extract two focused modules from the monolithic lib.rs (3372 lines): 1. provider_registration.rs (233 lines) - Consolidates duplicated provider registration patterns - Single determine_providers_to_register() function for mode-based selection - Unified register_providers() async function for all provider types - Includes unit tests for registration logic 2. session.rs (394 lines) - Session ID generation (generate_session_id) - Context window persistence (save_context_window, write_context_window_summary) - Error logging (log_error_to_session) - Utility functions (format_token_count, token_indicator) - Session restoration helper (restore_from_session_log) - Includes comprehensive unit tests Also fixes: - Removed redundant tool_executed assignment that triggered unused warning - Removed unused Message import in session.rs Results: - lib.rs reduced from 3372 to 2976 lines (-396 lines, -11.7%) - All tests pass, no warnings - Behavior preserved (pure mechanical extraction) Agent: fowler	2026-01-07 10:20:28 +11:00
Dhanji R. Prasanna	c4ae85de72	Add --new-session flag to skip session resumption in agent mode Adds a new CLI flag that allows users to force a new session when running in agent mode, bypassing the automatic detection and resumption of incomplete sessions. Usage: g3 --agent my-agent --new-session	2026-01-07 09:59:15 +11:00
Dhanji R. Prasanna	f0bd7959b1	chore(analysis): update dependency analysis artifacts Authored by: Structural Analysis Agent (Euler) Updated all dependency analysis artifacts with fresh extraction: - graph.json: Canonical dependency graph with 10 crates, 139 files, 16 crate edges, 72 file edges - graph.summary.md: Overview with fan-in/fan-out rankings and crate inventory - sccs.md: SCC analysis confirming no cycles at crate or file level (clean DAG) - layers.observed.md: 5-layer architecture diagram derived from dependencies - hotspots.md: Coupling hotspots (g3-config highest fan-in, g3-cli highest fan-out) - limitations.md: Documented extraction limitations (conditional compilation, macros, etc.) Key findings: - All 10 workspace crates form a directed acyclic graph - g3-core/src/ui_writer.rs has highest file-level fan-in (10 dependents) - g3-console is standalone with no workspace dependencies - Clean layered architecture with no violations detected	2026-01-07 09:36:52 +11:00
Dhanji R. Prasanna	ff08a622eb	ask all agents to commit their work	2026-01-07 09:31:02 +11:00
Dhanji R. Prasanna	5d20da2609	Add 54 integration tests for CLI, tools, and message serialization New test files: - crates/g3-cli/tests/cli_integration_test.rs (14 tests) Blackbox CLI tests: help/version flags, argument validation, conflicting modes, flock mode requirements - crates/g3-core/tests/tool_execution_test.rs (20 tests) Tool call structure tests and unified diff application: read_file, write_file, str_replace, shell, background_process, todo, final_output, code_search, take_screenshot - crates/g3-providers/tests/message_serialization_test.rs (20 tests) Round-trip serialization tests for Message, MessageRole, CacheControl, and Tool types. Covers Unicode, special chars, and edge cases. All tests follow blackbox/integration-first principles with documentation of what they protect and intentionally do not assert.	2026-01-07 09:23:34 +11:00
Dhanji R. Prasanna	9cb6282719	update lamport	2026-01-07 09:07:29 +11:00
Dhanji R. Prasanna	311b3bd75a	added hopper testing agent and updated fowler to use euler	2026-01-07 09:06:46 +11:00
Dhanji R. Prasanna	e2445a5d22	refactor(g3-core): extract duplicate detection helper and consolidate thinning - Extract check_duplicate_in_previous_message() helper to reduce nesting from 6+ levels to 2 levels in stream_completion_with_tools - Create do_thin_context() and do_thin_context_all() helpers to centralize context thinning with event tracking - Use provider_config::parse_provider_ref() in additional call sites - All 295 tests pass This continues the refactoring to eliminate code-path aliasing and reduce cyclomatic complexity in the Agent implementation.	2026-01-07 08:45:51 +11:00
Dhanji R. Prasanna	a87928661d	Remove overly broad .json from .gitignore The blanket .json ignore is not canonical for Rust projects. JSON files that need ignoring are already covered by: - .g3/ for session logs - logs/ for error logs - .build for Swift build artifacts	2026-01-06 13:54:27 +11:00
Dhanji R. Prasanna	2d8e733820	Add dependency graph JSON data Add exception to .gitignore for analysis/deps/graph.json	2026-01-06 13:24:01 +11:00
Dhanji R. Prasanna	6d6aed563d	Add structural dependency analysis artifacts - graph.json: Canonical dependency graph (10 crates, 16 edges, 76 files) - graph.summary.md: One-page overview with fan-in/fan-out rankings - sccs.md: Strongly Connected Components analysis (no cycles) - layers.observed.md: 5-layer architecture diagram - hotspots.md: Coupling hotspots (g3-config, g3-cli) - limitations.md: Extraction limitations and validity conditions	2026-01-06 13:23:24 +11:00
Dhanji R. Prasanna	764d1bf67e	Add ./tmp/ to .gitignore	2026-01-06 12:50:14 +11:00
Dhanji R. Prasanna	2592fee5d5	Generalize lamport.md examples to be language-agnostic - Changed Rust-specific examples to generic ones: - 'Tool calls must be valid JSON' → 'API responses must be valid JSON' - 'Never block the async runtime' → 'Never block the event loop' - 'Crate/module' → 'Module/package' - 'run cargo test' → 'basic commands'	2026-01-06 12:49:00 +11:00
Dhanji R. Prasanna	e2fffaab94	Slim down AGENTS.md and update lamport.md for machine-specific output AGENTS.md changes: - Removed redundant sections that duplicated README.md: - System Overview (crate table) - File Structure Quick Reference - Testing Strategy - Pointers to Documentation - Architecture Decisions - Kept unique machine-specific sections: - Critical Invariants (merged Performance Constraints) - Recommended Entry Points - Dangerous/Subtle Code Paths - Do's and Don'ts for Automated Changes - Common Incorrect Assumptions - Dependency Analysis Artifacts - Reduced from ~220 lines to ~116 lines lamport.md changes: - Rewrote AGENTS.md section with explicit instructions - Added REQUIRED sections list (5 sections only) - Added DO NOT include list to prevent README duplication - AGENTS.md now points to README for architecture/usage	2026-01-06 12:46:40 +11:00
Dhanji R. Prasanna	6d2cab93f5	Extend euler.md to require AGENTS.md updates The Euler agent must now update AGENTS.md after generating artifacts: - Add/update 'Dependency Analysis Artifacts' section - Table listing each file in analysis/deps/ with one-line descriptions - No findings, metrics, or recommendations in AGENTS.md	2026-01-06 12:35:12 +11:00
Dhanji R. Prasanna	9132c441f1	Remove Key findings section from dependency analysis docs	2026-01-06 12:33:48 +11:00
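
Commit 0aa1287ca6 describes extracting the scout report from between delimiter markers and stripping ANSI codes. The sketch below is one way that could look; it is not the code from tools/research.rs. The function names, the exact marker strings (the log only gives the shorthand `---SCOUT_REPORT_START/END---`), and the choice to handle only CSI-style escape sequences are all assumptions made for illustration.

```rust
// Hypothetical marker strings; the commit message only gives the shorthand form.
const REPORT_START: &str = "---SCOUT_REPORT_START---";
const REPORT_END: &str = "---SCOUT_REPORT_END---";

/// Remove CSI escape sequences (ESC '[' ... final byte in '@'..='~').
/// Other escape families (OSC, etc.) are left untouched in this sketch.
fn strip_ansi(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\x1b' && chars.peek() == Some(&'[') {
            chars.next(); // consume '['
            // Skip parameter and intermediate bytes up to and including the final byte.
            for p in chars.by_ref() {
                if ('\x40'..='\x7e').contains(&p) {
                    break;
                }
            }
        } else {
            out.push(c);
        }
    }
    out
}

/// Return the trimmed text between the first start marker and the next end marker,
/// or None if either marker is missing.
fn extract_report(output: &str) -> Option<String> {
    let clean = strip_ansi(output);
    let start = clean.find(REPORT_START)? + REPORT_START.len();
    let end = start + clean[start..].find(REPORT_END)?;
    Some(clean[start..end].trim().to_string())
}
```

Stripping the escape codes before searching means colour codes emitted inside or around a marker cannot hide it from `find`.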
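
Commit e301075666 fixes a panic caused by slicing a string at a raw byte offset that fell inside a multi-byte character, using `char_indices()` to find a valid boundary. A minimal sketch of that idea follows. The helper name and the keep-the-prefix behaviour are assumptions, and the actual code in filter_json.rs may truncate differently.

```rust
/// Return the longest prefix of `s` that is at most `max_bytes` long
/// and ends on a character boundary. Hypothetical helper for illustration.
fn truncate_at_char_boundary(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    // char_indices() yields the byte offset at which each character starts,
    // so every yielded offset is a safe place to cut.
    let end = s
        .char_indices()
        .map(|(i, _)| i)
        .take_while(|&i| i <= max_bytes)
        .last()
        .unwrap_or(0);
    &s[..end]
}
```

With the input `"ab😀cd"` and a limit of 3 bytes, a naive `&s[..3]` would panic because byte 3 is inside the emoji; the helper backs off to offset 2 and returns `"ab"`.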
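
Commit d96d8c1d90 mentions "string-aware brace counting until balanced" in the Suppressing state of the JSON filter. The sketch below shows only that counting step, under the assumption that the input begins at the opening brace. The function name is hypothetical, and the real filter processes streamed chunks with additional states that are not reproduced here.

```rust
/// Scan `input` and return the byte offset just past the `}` that balances
/// the first `{`, ignoring braces that appear inside JSON string literals.
/// Returns None if the object is not closed or a stray `}` appears first.
fn balanced_json_end(input: &str) -> Option<usize> {
    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;
    for (i, c) in input.char_indices() {
        if in_string {
            if escaped {
                escaped = false; // character after a backslash is taken literally
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
            continue;
        }
        match c {
            '"' => in_string = true,
            '{' => depth += 1,
            '}' => {
                depth = depth.checked_sub(1)?; // stray closing brace
                if depth == 0 {
                    return Some(i + 1);
                }
            }
            _ => {}
        }
    }
    None
}
```

Tracking the escape flag separately is what keeps a `}` inside a string value, or an escaped quote such as `\"`, from ending the scan early.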
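
Commit a72d5a650a restricts a closing code fence to at most three spaces of indentation, as CommonMark requires. A rough sketch of such a check is below. The function name is hypothetical, and the sketch assumes a backtick fence opened with exactly three backticks and space-only indentation.

```rust
/// True if `line` can close a fenced code block that was opened with "```":
/// at most three leading spaces, then only backticks (three or more),
/// optionally followed by trailing whitespace.
fn is_closing_fence(line: &str) -> bool {
    let indent = line.chars().take_while(|&c| c == ' ').count();
    if indent > 3 {
        // Four or more spaces: still content of the code block.
        return false;
    }
    let rest = line[indent..].trim_end();
    rest.len() >= 3 && rest.chars().all(|c| c == '`')
}
```

A plain `line.trim() == "```"` check would accept a fence indented by four or more spaces, which is the premature-close behaviour the commit message describes.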

787 Commits