alex/g3 - g3 - Millerson GIT hosting

alex/g3

Author	SHA1	Message	Date
Dhanji R. Prasanna	25d35529e7	Fix --accept flag being passed through to g3 in studio run When --accept was passed after positional args (e.g., 'studio run --agent carmack task --accept'), clap's trailing_var_arg captured it as part of g3_args instead of parsing it as the studio flag. This caused g3 to error with 'unexpected argument --accept'. - Extract filter_accept_flag() helper to detect and remove --accept from trailing args - Set auto_accept=true if --accept found in either position - Add 5 unit tests for the filtering logic	2026-01-15 21:05:13 +05:30
Dhanji R. Prasanna	a84fead03b	refactor: improve readability of streaming parser and JSON filter Agent: carmack Changes: - streaming_parser.rs: Unified find_first/last_tool_call_start into single find_tool_call_start with SearchDirection enum, reducing duplication. Simplified is_json_invalidated from 45 to 20 lines with clearer logic. Fixed redundant !escape_next check in find_complete_json_object_end. - filter_json.rs: Simplified check_tool_pattern from 40 to 24 lines. Replaced repetitive prefix checks with loop over ["t", "to", "too", "tool"]. Reduced trailing return statements with direct expression returns. - ui_writer_impl.rs: Added ansi module for duration color constants. Simplified duration_color function by removing redundant comments. - language_prompts.rs: Fixed test assertions to match actual prompt content ("obvious, readable Racket" instead of "RACKET-SPECIFIC GUIDANCE"). All 174+ tests pass. No behavior changes.	2026-01-15 13:49:29 +05:30
Dhanji R. Prasanna	0ae1a13cdb	feat: real-time tool call streaming indicator with blinking UI - Add ToolParsingHint enum (Detected/Active/Complete) for UI feedback - New UiWriter methods: print_tool_streaming_hint(), print_tool_streaming_active() - Refactor ConsoleUiWriter state to use atomics in ParsingHintState - Add tool_call_streaming field to CompletionChunk for provider hints - Anthropic provider sends streaming hints when tool name detected - New streaming helpers: make_tool_streaming_hint(), make_tool_streaming_active() Parser improvements: - Add is_json_invalidated() to detect false positive tool patterns - Fix tool result poisoning when file contents contain partial JSON - Unescaped newlines in strings or prose after JSON invalidates detection User sees ' ● tool_name \|' immediately when tool call starts streaming, with blinking indicator while args are received.	2026-01-15 13:49:29 +05:30
Dhanji R. Prasanna	d68f059acf	fix: detect invalidated JSON tool calls to prevent parser poisoning When partial JSON tool call patterns appear in LLM output (e.g., from quoting file content), the parser would incorrectly report them as "incomplete tool calls", triggering auto-continue loops. Fix: Added is_json_invalidated() to detect when partial JSON has been invalidated by subsequent content that cannot be valid JSON: - Unescaped newline inside a string (invalid JSON) - Newline followed by prose text outside a string The check is only applied to incomplete JSON - complete tool calls with trailing text are still correctly detected. Added 6 new tests covering: - Tool results with partial JSON patterns - LLM quoting file content inline vs on own line - Comment prefixes (// # -- etc) with partial patterns - Real incomplete tool calls (should still be detected)	2026-01-15 13:49:29 +05:30
Dhanji R. Prasanna	999ac6fe66	fix: prevent parser poisoning from inline tool-call JSON patterns The streaming parser was incorrectly detecting tool call patterns that appeared inline in prose (e.g., when explaining the format), causing g3 to return control mid-task. Fix: Modified find_first_tool_call_start() and find_last_tool_call_start() to only recognize patterns that appear on their own line (at start of buffer or after newline with only whitespace before the pattern). Changes: - Added is_on_own_line() helper to check line-boundary conditions - Updated detection methods to skip inline patterns - Removed sanitize_inline_tool_patterns() and LBRACE_HOMOGLYPH (no longer needed) - Rewrote tests for new behavior - Added streaming_repro tests that use process_chunk() to verify the exact bug scenario 28 tests covering: streaming repro, line boundaries, Unicode, code contexts, edge cases	2026-01-15 13:49:29 +05:30
Dhanji R. Prasanna	65807eea99	Add carmack.rust.md agent-specific language prompt Rust-specific readability guidance for the carmack agent including: - let...else example for shallow control flow - Async: don't block the runtime (tokio::fs, spawn_blocking, Send) - Visibility: prefer pub(crate), private fields with accessors - Generics: impl Trait over explicit params, avoid complex where clauses - Improved iterator guidance: if you need a comment, use a loop - UTF-8 string slicing warnings - Ownership/lifetime pragmatism - Anti-patterns: no macros/typestate/proc-macros unless already in repo Also adds Rust detection to LANGUAGE_PROMPTS (empty base prompt, agent-specific prompts handle the guidance).	2026-01-15 13:49:29 +05:30
Jochen	6d1aa62ba7	Merge pull request #63 from cjustice/fix/tracing-subscriber-panic Fix tracing subscriber panic in scout agent	2026-01-15 12:54:31 +11:00
Jochen	0bca05a1ba	Merge pull request #62 from cjustice/fix/planning-verbose-flag Fix: Initialize logging before planning mode check	2026-01-15 12:51:11 +11:00
Dhanji R. Prasanna	04e3c69b0a	Add --accept flag to studio run command Automatically accept the session after g3 completes successfully, but only if there are commits on the branch. Changes: - Add --accept flag to Run command (stripped, not passed to g3) - Add has_commits_on_branch() helper using git rev-list --count - Auto-accept triggers merge to main and cleanup when: 1. g3 exits successfully (exit code 0) 2. Branch has commits ahead of main - Show warning if --accept set but no commits exist Usage: studio run --agent carmack --accept	2026-01-15 06:43:35 +05:30
Dhanji R. Prasanna	5d8dbc43f8	Add agent-specific language prompt injection When running in agent mode (e.g., --agent carmack) in a workspace with detected languages, inject agent+language-specific prompts from prompts/langs/<agent>.<lang>.md at the end of the system prompt. Changes: - Add AGENT_LANGUAGE_PROMPTS static array for compile-time embedding - Add get_agent_language_prompt() to look up specific agent+lang combos - Add get_agent_language_prompts_for_workspace_with_langs() that returns both content and matched languages for display - Update agent_mode.rs to inject prompts and show which languages loaded - Display format: '✓ carmack: racket language guidance' - Add tests for new functionality Uses the same detect_languages() mechanism as regular language prompts to avoid code-path aliasing.	2026-01-15 06:43:29 +05:30
Connor Justice	fa29a64e51	Simplify logging initialization comment Removed unnecessary comment about logging initialization.	2026-01-14 17:53:04 -05:00
Connor Justice	505225c0bd	fix: prevent panic when tracing subscriber already initialized Use try_init() instead of init() for tracing subscriber setup to gracefully handle cases where a global subscriber is already set. This fixes a panic in the scout agent subprocess when spawned by the research tool, where a dependency may have already initialized tracing. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-14 15:33:22 -05:00
Connor Justice	6532442d32	fix: initialize logging before planning mode check Move initialize_logging() call to run immediately after CLI parsing, before any mode checks. This ensures the --verbose flag works correctly in planning mode, which previously bypassed logging initialization. Previously, planning mode would return early before initialize_logging() was called, causing verbose output to be silently ignored.	2026-01-14 14:33:44 -05:00
Dhanji R. Prasanna	afec65fd50	Add language-specific prompt injection for toolchain guidance - Add language_prompts module that auto-detects programming languages in workspace - Scan for language files with depth limit (2) to inject relevant toolchain prompts - Add prompts/langs/ directory for language-specific markdown files - Include Racket/raco toolchain guidance as first language prompt - Update combine_project_content() to accept language_content parameter - Integrate language detection into main CLI flow and agent mode - Update project memory with new feature documentation	2026-01-14 21:00:52 +05:30
Dhanji R. Prasanna	f4562cd4c9	config: default agent settings and provider override	2026-01-14 20:14:33 +05:30
Dhanji R. Prasanna	38828c7757	Clean up tool output formatting - Shell: "✅ Command executed successfully" → "⚡️ ran successfully" - Write file: Remove ✏️ emoji, use plain "wrote N lines \| M chars"	2026-01-14 19:42:54 +05:30
Dhanji R. Prasanna	9ef064a041	Add guidance to shell tool description to avoid unnecessary cd prefixes LLMs were prefixing shell commands with `cd <workspace> &&` unnecessarily, wasting tokens and cluttering CLI display. Added clear guidance in the shell tool description that commands already execute in the working directory.	2026-01-14 19:00:53 +05:30
Dhanji R. Prasanna	03143ec7f8	Agent Mode Enhancements • Agent prompts are now embedded within the g3 binary • README.md - Added new "Agent Mode" section documenting: • All 7 built-in agents with their focus areas • Usage examples (--list-agents, --agent <name>) • How to create custom workspace agents Behavior 1. Workspace agents take priority - If agents/<name>.md exists in the workspace, it's used 2. Embedded fallback - If no workspace agent exists, the embedded version is used 3. Portability - g3 binary now works on any repo without needing the agents/ directory 4. Discoverability - g3 --list-agents shows all available agents and their source	2026-01-14 16:27:03 +05:30
Dhanji R. Prasanna	5104bd53b6	refactor(g3-core): improve stream_completion_with_tools readability Extract and simplify the streaming completion function: - Extract ensure_context_capacity() helper for pre-loop context management (thinning + compaction logic now in dedicated async method) - Simplify compact_summary generation block: flatten nested if/match, remove redundant comments, reorder branches for clarity - Remove dead code: unused _last_error variable and modified_tool_call - Streamline duplicate detection block: reduce verbose logging - Clean up text content display block: remove redundant comments, tighten variable declarations - Remove redundant is_todo_tool redefinition inside block expression Net reduction: 79 lines (-187/+108) Behavior unchanged, all unit tests pass. Agent: carmack	2026-01-14 15:11:53 +05:30
Dhanji R. Prasanna	996dc357b4	Skip session resume prompt when --new-session flag is passed When users explicitly pass --new-session, they want a fresh session. Previously g3 would still prompt to resume an existing session. Now the resume check is skipped entirely when the flag is set.	2026-01-14 08:54:35 +05:30
Dhanji R. Prasanna	dea0e6b1ca	Compact tool output improvements - Rename take_screenshot -> screenshot, code_coverage -> coverage (shorter names) - Align \| character across all compact tools (pad to 11 chars for str_replace) - Make code_search a compact tool with summary display - Show language and search name in code_search output (e.g., rust:"find structs") - Add format_code_search_summary() to extract match/file counts from JSON response	2026-01-14 08:12:50 +05:30
Dhanji R. Prasanna	bd25d7dace	Merge sessions/fowler/786b20b5	2026-01-14 04:28:06 +05:30
Dhanji R. Prasanna	7d17b436f9	refactor(g3-core): remove 3 unused Agent constructor variants Remove dead code - constructor variants that had no callers: - new_with_readme() - new_autonomous_with_readme() - new_with_quiet() These were thin wrappers around new_with_mode_and_readme() that were never used externally. All 5 remaining constructors have verified callers. Results: - lib.rs reduced from 2817 to 2797 lines (-20 lines) - Eliminated code-path aliasing: 8 constructors → 5 constructors - All g3-core tests pass - Full workspace compiles cleanly Agent: fowler	2026-01-14 04:26:42 +05:30
Dhanji R. Prasanna	21eb4f2d30	Only show Chrome diagnostics when there are issues Silence the diagnostic report when all checks pass to reduce noise.	2026-01-14 04:25:13 +05:30
Dhanji R. Prasanna	a1dfd9c0b6	Enhanced auto-memory with rich few-shot format - Updated memory reminder prompt with per-symbol char ranges - Added two few-shot examples: Session Continuation (feature) + UTF-8 Safe Slicing (pattern) - Updated system prompt Memory Format section to match - Format: file -> nested symbols with [start..end] ranges and descriptions - Enables direct read_file navigation to specific functions	2026-01-13 21:49:48 +05:30
Dhanji R. Prasanna	3a47ebe668	better racket example support	2026-01-13 21:16:14 +05:30
Dhanji R. Prasanna	c2f96d7048	Make WebDriver and Chrome headless enabled by default - webdriver flag now defaults to true (tools always available) - chrome_headless flag now defaults to true (Chrome is default browser) - Use --safari flag to override and use Safari instead - Updated README documentation to reflect new defaults	2026-01-13 21:14:52 +05:30
Dhanji R. Prasanna	151b8c4658	Add Racket tree-sitter support, remove Kotlin - Add tree-sitter-racket dependency (v0.24) - Initialize Racket parser in code search - Add .rkt, .rktl, .rktd file extensions - Add test_racket_search test - Remove Kotlin from supported languages (was disabled) - Clean up duplicate test files Supported languages: Rust, Python, JavaScript, TypeScript, Go, Java, C, C++, Racket	2026-01-13 18:44:59 +05:30
Dhanji R. Prasanna	5e45e110e2	refactor(g3-core): extract finalize_streaming_turn() to unify return paths Extract a single canonical helper function for completing streaming turns, eliminating 3 nearly-identical return paths in stream_completion_with_tools(). Changes: - Add finalize_streaming_turn() helper that handles: - Finishing streaming markdown - Saving context window - Adding timing footer (when requested) - Dehydrating context (when ACD enabled) - Building TaskResult - Replace 3 duplicated return blocks with calls to the helper - Remove unused mut on full_response variable Results: - Function reduced from 1067 to 999 lines (-68 lines) - Eliminated code-path aliasing: 3 paths → 1 canonical path - All 32 characterization tests pass - Full g3-core test suite passes Agent: fowler	2026-01-13 16:52:48 +05:30
Dhanji R. Prasanna	b89d55a9ff	Add characterization tests for stream_completion_with_tools Add 32 blackbox characterization tests to lock down the behavior of the stream_completion_with_tools function (1067 lines) before refactoring. Tests cover key behaviors through stable boundaries: - StreamingToolParser: tool call detection, incomplete detection, text accumulation - Auto-continue logic: autonomous mode decisions, priority ordering - Duplicate detection: sequential duplicates, cross-message duplicates - Context window: token tracking, compaction threshold, history preservation - Tool execution: read_file, shell, write_file, todo tools through Agent - Streaming utilities: LLM token cleaning, duration formatting, truncation - Parser sanitization: inline tool pattern handling, homoglyph replacement These tests intentionally do NOT assert: - Internal parser state or implementation details - Specific timing values - UI output formatting - Provider-specific behavior Agent: hopper	2026-01-13 16:25:33 +05:30
Dhanji R. Prasanna	47e3a88cf6	refactor(g3-core): extract stats formatting to dedicated module Extract the get_stats() function (158 lines) from lib.rs to a new stats.rs module. Changes: - Create stats.rs with AgentStatsSnapshot struct for capturing agent state - Replace inline formatting logic with delegation to snapshot.format() - Add unit tests for stats formatting (empty and populated states) - Reduce lib.rs from 2961 to 2818 lines (-143 lines) The new module improves: - Testability: Stats formatting can now be unit tested in isolation - Separation of concerns: Formatting logic is decoupled from Agent struct - Readability: lib.rs is more focused on core agent behavior All 271 workspace tests pass. Agent: fowler	2026-01-13 16:11:53 +05:30
Dhanji R. Prasanna	9a3b03a41f	Remove flock mode (superseded by studio) Flock mode has been superseded by the studio multi-agent workspace manager. Changes: - Remove g3-ensembles crate entirely - Remove --project, --flock-workspace, --segments, --flock-max-turns CLI flags - Remove run_flock_mode() from autonomous.rs - Remove flock-related tests from cli_integration_test.rs - Update README.md, docs/architecture.md, analysis/memory.md - Delete docs/FLOCK_MODE.md	2026-01-13 15:01:12 +05:30
Dhanji R. Prasanna	82c0165765	Fix unused variable warning and UTF-8 panic in string slicing - Remove unused total_lines variable in file_ops.rs - Fix UTF-8 boundary panic in utils.rs when generating diff error preview The code was slicing at byte index 200 which could land inside a multi-byte character (e.g., box-drawing chars like ─). Now uses character-based slicing with chars().take() instead.	2026-01-13 14:52:52 +05:30
Dhanji R. Prasanna	c65d082c5d	Make --agent optional in Studio for one-shot mode Studio can now run g3 without specifying an agent: # Agent mode (existing) studio run --agent carmack "fix the bug" # One-shot mode (new) studio run "fix the bug" When no agent is specified, sessions are created under the 'single' directory in .worktrees/sessions/single/<session-id>/ This makes Studio a complete replacement for Flock mode.	2026-01-13 14:42:20 +05:30
Dhanji R. Prasanna	f6b84d864a	Rename G3 -> g3 in docs and comments Standardize project name to lowercase 'g3' throughout documentation, comments, and configuration files. Environment variables (G3_*) are unchanged as they follow the uppercase convention.	2026-01-13 14:36:33 +05:30
Dhanji R. Prasanna	389ed6a554	Compact project info display in interactive mode Before: 🤖 AGENTS.md configuration loaded 📚 detected: G3 - AI Coding Agent 🧠 Project memory loaded workspace: /Users/dhanji/src/g3 After: >> G3 - AI Coding Agent ✓ README \| ✓ AGENTS.md \| ✓ Memory -> ~/src/g3	2026-01-13 14:32:24 +05:30
Dhanji R. Prasanna	af3aa840db	Compress session continuation UI prompt	2026-01-13 14:29:54 +05:30
Dhanji R. Prasanna	118935d2da	Remove unused variable total_lines in file_ops.rs	2026-01-13 14:25:17 +05:30
Dhanji R. Prasanna	a09967eb27	refactor(streaming): Extract deduplication and auto-continue logic into helpers Improve readability of stream_completion_with_tools (~1000 line function): - Add deduplicate_tool_calls() helper with closure for previous-message check - Add should_auto_continue() with AutoContinueReason enum for clearer control flow - Replace inline deduplication loop with helper call (-19 lines) - Replace complex auto-continue conditional with match on reason enum (-13 lines) - Add section comments for major phases (State Init, Pre-loop, Main Loop, Auto-Continue, Post-Loop) - Add comprehensive tests for new helpers Net reduction: 82 deletions, behavior unchanged (172+ tests pass) Agent: carmack	2026-01-13 11:44:06 +05:30
Dhanji R. Prasanna	dc45987e8d	Add characterization tests for UTF-8 truncation and parser sanitization Agent: hopper Adds 32 new integration tests covering recent commits: ## UTF-8 Safe Truncation Tests (14 tests) Covers commit `f30f145` (Fix UTF-8 panics): - Topic extraction with emoji, CJK, and multi-byte characters - Truncation at character boundaries (not byte boundaries) - Edge cases: exactly 50 chars, 51 chars, 2-byte/3-byte/4-byte UTF-8 - Stub generation with multi-byte topics - Combining characters and diacritics ## Parser Sanitization Tests (18 tests) Covers commit `4c36cc0` (Prevent parser poisoning): - Code block contexts (inline code, after fences, prose) - Line boundary edge cases (empty lines, whitespace, indentation) - Unicode handling (emoji, bullets, CJK before patterns) - Multiple patterns on same line - Negative cases (similar but different patterns, partial patterns) - Real-world scenarios from the original bug report All tests are blackbox/characterization style - they test observable outputs through stable public interfaces without encoding internal implementation details.	2026-01-13 11:22:46 +05:30
Dhanji R. Prasanna	8dcb7a3dba	feat: add compact styled output for TODO tools TODO tools (todo_read, todo_write) now display with a cleaner, more compact format: - Styled header: " ● todo_read" or " ● todo_write" - Tree-style prefixes for content lines (│ and └) - Checkbox conversion: "- [ ]" → □, "- [x]" → ■ - Dimmed content for visual distinction - No timing footer (cleaner output) Changes: - Add print_todo_compact() method to UiWriter trait - Implement print_todo_compact() in ConsoleUiWriter - Update todo.rs to call print_todo_compact() instead of line-by-line output - Skip tool header, output header, and timing for TODO tools in agent streaming	2026-01-13 10:58:55 +05:30
Dhanji R. Prasanna	4c36cc058c	fix: prevent parser poisoning from inline tool-call JSON patterns When the streaming parser encountered fragments of JSON that looked like partial tool calls (e.g., {"tool":) embedded in inline text (like code examples or prose), it would incorrectly enter JSON parsing mode and poison the parser state, causing control to be returned to the user mid-task. This fix: - Adds sanitize_inline_tool_patterns() to detect tool-call patterns that are NOT on their own line and replace the opening brace with a Unicode homoglyph (fullwidth left curly bracket U+FF5B) - Integrates sanitization into process_chunk() before text is buffered - Updates system prompts to instruct LLMs to use homoglyphs when showing example tool call JSON in prose - Adds comprehensive tests for the sanitization logic Real tool calls from LLMs always appear on their own line, so those are left untouched. Only inline patterns (with non-whitespace before them) are sanitized.	2026-01-13 10:58:41 +05:30
Dhanji R. Prasanna	a0b9126555	Revert "refactor(g3-core): extract streaming logic to agent_streaming.rs" This reverts commit `a2e51cf075`.	2026-01-13 07:59:18 +05:30
Dhanji R. Prasanna	6907fa36c0	UI: Add newline before auto-memory skip message	2026-01-13 07:03:42 +05:30
Dhanji R. Prasanna	08e6a1dca0	Merge sessions/fowler/f8c3f2e5	2026-01-13 06:58:01 +05:30
Dhanji R. Prasanna	98eea09dc8	UI: Show consecutive read_file calls as continuation lines When the LLM reads the same file multiple times in sequence (scrolling through a large file), instead of showing each as a separate line: ● read_file \| path [0..2000] \| 50 lines \| 100 ◉ 5ms ● read_file \| path [2000..4000] \| 50 lines \| 100 ◉ 5ms ● read_file \| path [4000..6000] \| 50 lines \| 100 ◉ 5ms Now shows a cleaner continuation format: ● read_file \| path [0..2000] \| 50 lines \| 100 ◉ 5ms └─ reading further [2000..4000] \| 50 lines \| 100 ◉ 5ms └─ reading further [4000..6000] \| 50 lines \| 100 ◉ 5ms This makes it visually clear that the agent is scrolling through a single file rather than reading multiple different files. Implementation: - Added last_read_file_path field to ConsoleUiWriter - Detect when consecutive read_file calls target the same file - Print continuation format for subsequent reads - Reset tracking when: - A different tool is executed (shell, write_file, etc.) - A different file is read - Text is output between tool calls	2026-01-13 06:25:28 +05:30
Dhanji R. Prasanna	a2e51cf075	refactor(g3-core): extract streaming logic to agent_streaming.rs Reduce lib.rs complexity by extracting the streaming completion logic: - Extract stream_completion_with_tools (~1080 lines) to agent_streaming.rs - Extract stream_with_retry helper method - Extract parse_diff_stats helper function - Add handle_pre_stream_compaction helper for cleaner pre-stream logic - Add format_tool_output helper for tool output formatting - Remove 3 unused constructor variants: - new_with_readme - new_autonomous_with_readme - new_with_quiet Results: - lib.rs reduced from 2974 to 1791 lines (40% reduction) - Streaming logic cleanly separated into dedicated module - All tests pass, no behavior changes Agent: fowler	2026-01-13 06:14:56 +05:30
Dhanji R. Prasanna	5c9404e292	Refactor: improve readability in CLI modules - project_files.rs: Fix UTF-8 safety in truncate_for_display (use char boundaries instead of byte slicing), add test for multi-byte chars - task_execution.rs: Extract recoverable_error_name() helper, use shared calculate_retry_delay() from error_handling.rs to eliminate duplication - ui_writer_impl.rs: Extract duration_color() helper for timing display, add clear_tool_state() to consolidate repeated mutex clearing patterns Agent: carmack	2026-01-13 05:58:54 +05:30
Dhanji R. Prasanna	f30f145c85	Fix UTF-8 panics and inconsistent retry logic - Fix 7 UTF-8 byte slicing panics that crash on multi-byte characters: - acd.rs: extract_topic_from_text() [..50] slice - streaming.rs: log_stream_error() [..500] slice - tools/acd.rs: rehydrate message truncation [..2000] slice - history.rs: git commit message truncation [..69] slice - planner.rs: commit summary/description truncation [..69] slices - llm.rs: requirements summary line truncation [..117] slice - All now use chars().count() and chars().take(N).collect() for UTF-8 safe truncation - Fix inconsistent retry logic in task_execution.rs: - Previously only retried on Timeout errors - Now retries on ALL recoverable errors (rate limits, network, server errors, model busy, token limits, context length) - Added error-specific base delays (rate limit: 5s, server: 2s, etc.) - Added exponential backoff with ±20% jitter - Consistent with autonomous mode retry behavior	2026-01-13 05:49:45 +05:30
Dhanji R. Prasanna	6f50d01ab6	Add comprehensive end-of-turn behavior tests for g3-core Agent: hopper Adds 56 new integration tests covering the observable end-of-turn behaviors in the streaming module: - Timing footer formatting (5 tests): verifies user-facing timing display with various durations, token counts, and context percentages - Tool call duplicate detection (6 tests): ensures identical sequential tool calls are detected while different tools/args are not - Empty response detection (9 tests): validates detection of empty, whitespace-only, and timing-only responses that trigger auto-continue - Connection error classification (5 tests): verifies EOF, connection, chunk, and body errors are correctly identified for graceful recovery - Tool output summary formatting (17 tests): covers read_file, write_file, str_replace, remember, screenshot, coverage, and rehydrate summaries - Duration formatting (4 tests): milliseconds, seconds, minutes, zero - Text truncation (4 tests): short/long strings, multiline, flag behavior - LLM token cleaning (3 tests): removal of stop tokens like <\|im_end\|> - Edge cases (4 tests): empty inputs, unicode handling, large numbers All tests are blackbox/characterization style - they test observable outputs through stable public interfaces without encoding internal implementation details. Tests remain stable under refactoring that preserves behavior.	2026-01-12 21:17:32 +05:30

1 2 3 4 5 ...

537 Commits