Commit Graph

53 Commits

Author SHA1 Message Date
Dhanji R. Prasanna
a902be1562 Refactor system prompts to eliminate duplication; upgrade embedded provider
- Refactor prompts.rs: extract shared sections (intro, TODO, workspace memory,
  web research, response guidelines) used by both native and non-native prompts
- Fix typo in native prompt: "save them.." -> "save them."
- Fix non-native prompt: add missing closing braces in JSON examples,
  add IMPORTANT steps section, align with native prompt quality
- Add 9 unit tests to verify both prompts contain required sections
- Upgrade llama-cpp-2 dependency and refactor embedded provider
- Update config.example.toml with embedded model examples
- Update workspace memory
2026-01-28 09:56:39 +11:00
Dhanji R. Prasanna
5b4079e861 Add prompt cache statistics tracking to /stats command
- Extend Usage struct with cache_creation_tokens and cache_read_tokens fields
- Parse Anthropic cache_creation_input_tokens and cache_read_input_tokens
- Parse OpenAI prompt_tokens_details.cached_tokens for automatic prefix caching
- Add CacheStats struct to Agent for cumulative tracking across API calls
- Add "Prompt Cache Statistics" section to /stats output showing:
  - API call count and cache hit count
  - Hit rate percentage
  - Total input tokens and cache read/creation tokens
  - Cache efficiency (% of input served from cache)
- Update all provider implementations and test files
2026-01-27 11:32:45 +11:00
Dhanji R. Prasanna
2043a83e7d Add comprehensive MockProvider integration tests
Added 6 new integration tests for stream_completion_with_tools:
- test_text_before_tool_call_preserved: text before native tool call is saved
- test_native_tool_call_execution: native tool calls execute correctly
- test_duplicate_tool_calls_skipped: sequential duplicates are detected
- test_json_fallback_tool_calling: JSON tool calls work without native support
- test_text_after_tool_execution_preserved: follow-up text is saved
- test_multiple_tool_calls_executed: multiple tool calls in sequence work

Also added MockResponse helper methods:
- text_then_native_tool(): text followed by native tool call
- duplicate_native_tool_calls(): same tool call twice (for dedup testing)

Fixed text_with_json_tool() to ensure "tool" key comes before "args"
(serde_json alphabetizes keys, breaking pattern detection).

Total: 18 integration tests covering historical bugs and core behaviors.
2026-01-19 14:44:30 +05:30
Dhanji R. Prasanna
292a3aa48d Add MockProvider for integration testing
Adds a configurable mock LLM provider that can simulate various behaviors:
- Text-only responses (single or multi-chunk streaming)
- Native tool calls
- JSON tool calls in text
- Truncated responses (max_tokens)
- Multi-turn conversations

Features:
- Builder pattern for easy test setup
- Request tracking for verification
- Preset scenarios for common patterns
- Full LLMProvider trait implementation

Also adds integration tests that use MockProvider to test the
stream_completion_with_tools code path, including:
- test_butler_bug_scenario: reproduces the exact bug where text-only
  responses were not saved to context, causing consecutive user messages

This enables testing complex streaming behaviors without real API calls.
2026-01-19 13:59:31 +05:30
Dhanji R. Prasanna
01cb4f6691 fix: use consistent max_tokens defaults across providers
- Fix aliasing issue where resolve_max_tokens() used fallback_default_max_tokens
  (8192) instead of provider-specific defaults
- Update fallback_default_max_tokens from 8192 to 32000
- Set provider-specific max_tokens defaults:
  - Anthropic: 32000
  - OpenAI: 32000 (was 16000)
  - Databricks: 32000 (was 50000, now matches Anthropic as passthru)
  - Embedded: 2048
- Context window lengths unchanged:
  - OpenAI: 400,000
  - Anthropic: 200,000
  - Databricks (Claude): 200,000

This fixes the 'LLM response was cut off due to max_tokens limit' error
in agent mode that occurred because 8192 was being used instead of 32000.
2026-01-16 07:05:57 +05:30
Dhanji R. Prasanna
0ae1a13cdb feat: real-time tool call streaming indicator with blinking UI
- Add ToolParsingHint enum (Detected/Active/Complete) for UI feedback
- New UiWriter methods: print_tool_streaming_hint(), print_tool_streaming_active()
- Refactor ConsoleUiWriter state to use atomics in ParsingHintState
- Add tool_call_streaming field to CompletionChunk for provider hints
- Anthropic provider sends streaming hints when tool name detected
- New streaming helpers: make_tool_streaming_hint(), make_tool_streaming_active()

Parser improvements:
- Add is_json_invalidated() to detect false positive tool patterns
- Fix tool result poisoning when file contents contain partial JSON
- Unescaped newlines in strings or prose after JSON invalidates detection

User sees ' ● tool_name |' immediately when tool call starts streaming,
with blinking indicator while args are received.
2026-01-15 13:49:29 +05:30
Dhanji R. Prasanna
f4562cd4c9 config: default agent settings and provider override 2026-01-14 20:14:33 +05:30
Dhanji R. Prasanna
f415dbb84b Fix ACD turn summary loss and add /dump command
ACD (Aggressive Context Dehydration) fixes:
- Fixed dehydrate_context() to extract turn summary from context window
  instead of using the passed-in final_response (which contained only
  the timing footer, not the actual LLM response)
- Removed final_response parameter from dehydrate_context() since it
  now self-extracts the last assistant message as the summary
- This ensures the actual turn summary is preserved after dehydration,
  not just the timing footer

New /dump command:
- Added /dump command to dump entire context window to tmp/ for debugging
- Shows message index, role, kind, content length, and full content
- Available in both console and machine modes

UTF-8 safety:
- Fixed truncate_to_word_boundary() to use character indices instead of
  byte indices, preventing panics on multi-byte UTF-8 characters
- Added UTF-8 string slicing guidance to AGENTS.md

Agent: g3
2026-01-12 05:13:02 +05:30
Dhanji R. Prasanna
e301075666 Fix panic on multi-byte chars in filter_json buffer truncation
The buffer truncation code was slicing at a raw byte offset which could
land in the middle of a multi-byte character (like emojis), causing a
panic. Fixed by using char_indices() to find valid character boundaries.

Also added stop_reason field to CompletionChunk initializers in tests
to complete the stop_reason feature addition.

- Fix byte boundary panic in filter_json.rs line 327
- Add test for multi-byte character handling
- Update test files with missing stop_reason field
2026-01-09 15:20:57 +11:00
Dhanji R. Prasanna
3776ed847e refactor: use shared streaming helpers in openai and embedded providers
Agent: carmack

openai.rs:
- Use make_text_chunk() for streaming text content
- Use make_final_chunk() for final completion chunk
- Simplify tool_calls conversion logic

embedded.rs:
- Use make_text_chunk() for all 4 streaming text chunks
- Use make_final_chunk() for final completion chunk
- Remove unused CompletionChunk import

Net reduction: 35 lines removed
All tests pass. Behavior unchanged.
2026-01-07 13:01:03 +11:00
Dhanji R. Prasanna
2bf475960c refactor: extract shared streaming utilities module
Agent: carmack

Create crates/g3-providers/src/streaming.rs with shared helpers:
- decode_utf8_streaming(): Handle incomplete UTF-8 sequences in SSE streams
- is_incomplete_json_error(): Detect incomplete vs malformed JSON
- make_final_chunk(): Create finished completion chunks
- make_text_chunk(): Create text content chunks
- make_tool_chunk(): Create tool call chunks

Refactor anthropic.rs:
- Use shared decode_utf8_streaming (removes 15 lines of inline UTF-8 handling)
- Use make_final_chunk, make_text_chunk, make_tool_chunk helpers
- Reduces verbose CompletionChunk constructions throughout

Refactor databricks.rs:
- Remove local copies of streaming helpers (now uses shared module)
- Reduces duplication between providers

Net reduction: 118 lines removed, 16 lines added (including new module)
All tests pass. Behavior unchanged.
2026-01-07 12:48:07 +11:00
Dhanji R. Prasanna
bb63050779 refactor: improve readability of streaming and file ops code
Agent: carmack

databricks.rs:
- Extract ToolCallAccumulator struct to replace opaque (String, String, String) tuple
- Add decode_utf8_streaming() helper for cleaner UTF-8 handling
- Add is_incomplete_json_error() helper for JSON parse error detection
- Add make_final_chunk() helper to reduce duplication
- Add finalize_tool_calls() to convert accumulators to final format
- Refactor parse_streaming_response from ~270 lines to ~100 lines
- Reduce nesting depth from 8+ levels to 4 levels
- Use early returns and let-else for cleaner control flow

file_ops.rs:
- Replace repetitive if-let chains with declarative PATH_CONTENT_KEYS table
- Use match expression instead of nested if-else
- Reduce extract_path_and_content from 44 lines to 20 lines

All tests pass. Behavior unchanged.
2026-01-07 12:39:05 +11:00
Dhanji R. Prasanna
5d20da2609 Add 54 integration tests for CLI, tools, and message serialization
New test files:
- crates/g3-cli/tests/cli_integration_test.rs (14 tests)
  Blackbox CLI tests: help/version flags, argument validation,
  conflicting modes, flock mode requirements

- crates/g3-core/tests/tool_execution_test.rs (20 tests)
  Tool call structure tests and unified diff application:
  read_file, write_file, str_replace, shell, background_process,
  todo, final_output, code_search, take_screenshot

- crates/g3-providers/tests/message_serialization_test.rs (20 tests)
  Round-trip serialization tests for Message, MessageRole,
  CacheControl, and Tool types. Covers Unicode, special chars,
  and edge cases.

All tests follow blackbox/integration-first principles with
documentation of what they protect and intentionally do not assert.
2026-01-07 09:23:34 +11:00
Dhanji R. Prasanna
3601cc0547 Enhance read_image tool with magic byte detection and multi-image support
- Fix media type detection using magic bytes instead of file extension
  - Correctly identifies JPEG files with .png extension (and vice versa)
  - Supports PNG, JPEG, GIF, and WebP formats

- Add multi-image support with file_paths array parameter
  - Load multiple images in a single tool call
  - All images queued for LLM analysis

- Enhanced CLI output:
  - Inline image preview via iTerm2 imgcat protocol (height=5)
  - Dimmed info line showing: path | dimensions | media type | file size
  - Proper │ prefix alignment with tool output boxing
  - Human-readable file sizes (bytes, KB, MB)

- Add image dimension extraction from file headers
  - PNG, JPEG, GIF, WebP dimension parsing

- Add comprehensive tests for magic byte detection and dimensions
2025-12-26 11:19:37 +11:00
Dhanji R. Prasanna
3ece02ff31 fix: resolve compiler warnings across crates
- Remove unused assignment to final_output_called (returns immediately after)
- Mark cache_config field as #[allow(dead_code)] (reserved for future use)
- Mark print_status_line method as #[allow(dead_code)] (reserved for future use)
2025-12-25 18:47:22 +11:00
Dhanji R. Prasanna
923def0ab2 Convert all INFO logs to DEBUG to reduce CLI noise
Converted ~77 info! macro calls to debug! across the codebase to prevent
log messages from interrupting the CLI experience during normal operation.
Users can still see these logs by setting RUST_LOG=debug if needed.

Affected crates:
- g3-cli
- g3-computer-control
- g3-console
- g3-core
- g3-ensembles
- g3-execution
- g3-providers
2025-12-22 16:27:35 +11:00
Dhanji R. Prasanna
b4f6da6bf2 duplicate tool call bugfix 2025-12-19 15:24:03 +11:00
Jochen
ff8b3e7c7b Implement planning mode 2025-12-09 17:03:53 +11:00
Jochen
4aa84e2144 disable thinking if there is no token budget 2025-12-09 16:45:28 +11:00
Jochen
fb2cf6f898 fix for thinking budget and hardcoded max token on summary 2025-12-09 12:41:52 +11:00
Jochen
ae16243f49 Fix temperature param + add thinking for anthropic
The temperature param was not passed to the llm.
Now support anthropic models in 'thinking' mode.
2025-12-02 17:24:55 +11:00
Dhanji R. Prasanna
8928fb92be append instead of replace system msg 2025-11-29 16:13:00 +11:00
Jochen
52f78653b4 add context window monitor
Writes the current context window to logs/current_context_window (uses a symlink to a session ID).

This PR was unfortunately generated by a different LLM and did a ton of superficial reformating, it's actually a fairly small and benign change, but I don't want to roll back everything. Hope that's ok.
2025-11-27 21:00:02 +11:00
Jochen
93dc4acf86 generate internal id (debugging only)
NOT set to provider... Anthropic will reject a message with id
2025-11-27 18:30:42 +11:00
Jochen
ad198a8501 add code exploration fast start
This tries to short-circuit multiple round-trips to llm for reading code.
It's a precursor to trying to context engineer tailored to specific tasks.
In initial experiments, it's only marginally faster than regular mode, and burns more tokens.
2025-11-25 22:51:32 +11:00
Jochen
9bffd8b1bf cache_control removed from databricks 2025-11-19 12:15:49 +11:00
Jochen
bfee8040e9 regression tests added 2025-11-19 11:32:14 +11:00
Jochen
a150ba6a55 adds ttl to cache control 2025-11-18 23:23:49 +11:00
Jochen
296bf5a449 adds cache_control 2025-11-18 22:38:52 +11:00
Michael Neale
81cd956c20 allow openai to be used to name named compatible providers 2025-11-10 16:12:33 +11:00
Dhanji R. Prasanna
cef234d91a more color 2025-11-06 13:51:58 +11:00
Dhanji R. Prasanna
d78732df14 colors 2025-11-06 13:41:06 +11:00
Dhanji R. Prasanna
d007e8f471 improve code_search nudge and increase anthropic tmieout 2025-11-05 15:05:29 +11:00
Dhanji Prasanna
22a0090cdc fix unexpected EOF on streams 2025-11-04 16:28:41 +11:00
Dhanji Prasanna
f89bbfc89a fix final_output bug 2025-10-31 14:48:36 +11:00
Dhanji Prasanna
d0ac222e2e more macax tooling 2025-10-24 10:45:24 +11:00
Dhanji Prasanna
c5d6fbef08 control commands 2025-10-22 22:14:12 +11:00
Jochen
010a43d203 coach/player provider split + add OpenAI
Allows coach and player LLM providers to be separately specified.
Also adds OpenAI provider
2025-10-21 16:59:13 +11:00
Dhanji Prasanna
bb90cc7826 some fixes 2025-10-14 12:44:02 +11:00
Dhanji Prasanna
062e6de63f fix for buffered messages at end, colorized context bars 2025-10-13 13:36:37 +11:00
Dhanji Prasanna
260c949576 token counting fixes 2025-10-09 12:11:21 +11:00
Dhanji Prasanna
bcba99ec6c auto refresh token 2025-10-04 17:32:48 +10:00
Dhanji Prasanna
57b1b51e65 retro mode ui! 2025-10-02 14:47:19 +10:00
Dhanji Prasanna
046b54c49b move embedded provider to a better crate 2025-10-01 15:19:37 +10:00
Dhanji Prasanna
4e64555008 max tokens fix for databricks 2025-09-29 06:45:53 +10:00
Dhanji Prasanna
c490228824 databricks support 2025-09-27 17:28:02 +10:00
Dhanji Prasanna
5ef4a74468 minor 2025-09-26 21:38:01 +10:00
Dhanji Prasanna
dd20e0bb01 some cleanup of converstation mgmt 2025-09-22 20:38:44 +10:00
Dhanji Prasanna
64d2ac53a8 tweaks for streaming toolcalls 2025-09-22 14:25:04 +10:00
Dhanji Prasanna
9a5486f2a8 Fix for tool use 2025-09-20 20:17:50 +10:00