Commit Graph

17 Commits

Author SHA1 Message Date
Dhanji R. Prasanna
af8b849311 fix(read_image): use correct media type when resize fails to reduce size
When resize_image_to_dimensions() returns a larger file than the original,
we fall back to using the original bytes. Previously, was_resized was set
to true if the original dimensions exceeded MAX_IMAGE_DIMENSION, which
caused final_media_type to be set to 'image/jpeg' even though we were
using the original PNG bytes.

This caused Anthropic API errors like:
  'Image does not match the provided media type image/jpeg'

Fix: Set was_resized=false when falling back to original bytes, so the
original media type (detected from magic bytes) is preserved.
2026-01-22 07:58:05 +05:30
Dhanji R. Prasanna
10bce7f66f Remove ANSI formatting codes from g3-core
Move terminal formatting responsibility to g3-cli layer:

- format_str_replace_summary(): Remove ANSI codes, add colorize_str_replace_summary()
  helper in CLI to apply green/red colors for insertions/deletions
- format_timing_footer(): Remove dimming ANSI codes (now plain text)
- str_replace tool result: Remove ANSI codes from success message

Remaining acceptable ANSI usage in g3-core:
- iTerm2 inline image protocol (terminal-specific escape sequence)
- Image metadata dimming (direct print, would need larger refactor)
- Terminal beep for stale TODO warning (audio, not visual)
- ANSI stripping utility in research.rs (not output)

This continues the separation of concerns: g3-core handles logic,
g3-cli handles all terminal formatting.
2026-01-20 10:00:37 +05:30
Dhanji R. Prasanna
02655110d6 fix: auto-resize images exceeding 1568px dimension to prevent 413 Payload Too Large
The Anthropic API was rejecting requests with multiple high-resolution images
(~2000x3000 pixels each) even though individual file sizes were under limits.

Root cause: Code only checked per-image file size (3.75MB), not dimensions.
Claude recommends images ≤1568px on longest edge and has 32MB total request limit.

Changes:
- Add MAX_IMAGE_DIMENSION (1568px) and MAX_TOTAL_IMAGE_PAYLOAD (20MB) constants
- Trigger resize when dimensions > 1568px (not just file size > 3.75MB)
- Add new resize_image_to_dimensions() for dimension-constrained resizing
- Track cumulative payload size across multiple images
- Warn if total payload exceeds recommended limit

Test results with Walking Dead comic images:
- WD_0001_0001.jpg: 800KB 1987x3057 → 321KB 1019x1568
- WD_0001_1064.png: 150KB 1988x3057 → 143KB 1020x1568
- WD_0002_0001.jpg: 1023KB 1988x3056 → 292KB 1020x1568
- Total payload: ~2.5MB → ~1MB base64
2026-01-18 10:05:45 +05:30
Dhanji R. Prasanna
3a03ed0585 Fix imgcat aspect ratio by adding preserveAspectRatio=1
Images were being displayed as narrow vertical strips because
iTerm2 wasn't preserving aspect ratio when only height was specified.
2026-01-17 18:50:00 +05:30
Dhanji R. Prasanna
c7984fd4c2 fix: account for base64 encoding overhead in image size limit
The Anthropic API has a 5MB limit on base64-encoded images, not raw file
size. Base64 encoding increases size by ~33% (4/3 ratio), so a 4MB raw
image becomes ~5.3MB encoded, exceeding the limit.

Changed MAX_IMAGE_SIZE from 5MB to ~3.75MB (5MB * 3/4) to trigger
resizing before the base64-encoded result exceeds the API limit.

Also updated target resize size to 3.6MB to leave margin.
2026-01-16 21:29:05 +05:30
Dhanji R. Prasanna
1003386f7f Auto-resize large images (>=5MB) in read_image tool
Images >= 5MB are now automatically resized to < 4.9MB using ImageMagick
before being sent to the LLM. This prevents API errors from oversized images.

- Uses iterative quality/scale reduction to find optimal size
- Converts to JPEG for better compression
- Shows original and resized size in terminal output (e.g., '6.2 MB → 4.1 MB (resized)')
- Falls back to original if ImageMagick fails or isn't available
2026-01-16 21:09:38 +05:30
Dhanji R. Prasanna
6bd9c51e8e feat: shell output pagination and optimized read_file with seek
- Shell outputs > 8KB are truncated to first 500 chars
- Full output saved to .g3/sessions/<session_id>/tools/shell_stdout_<id>.txt
- LLM can use read_file with start/end to paginate through large outputs
- read_file now uses seek() for O(1) random access instead of reading entire file
- UTF-8 safe: reads extra bytes at boundaries to find valid char positions
- Falls back to lossy conversion for binary files (no panics)

Files changed:
- paths.rs: get_tools_output_dir(), generate_short_id()
- shell.rs: truncate_large_output() integration
- file_ops.rs: seek-based read_file_range() helper
- New test: read_file_utf8_test.rs
2026-01-16 09:16:16 +05:30
Dhanji R. Prasanna
38828c7757 Clean up tool output formatting
- Shell: " Command executed successfully" → "️ ran successfully"
- Write file: Remove ✏️ emoji, use plain "wrote N lines | M chars"
2026-01-14 19:42:54 +05:30
Dhanji R. Prasanna
118935d2da Remove unused variable total_lines in file_ops.rs 2026-01-13 14:25:17 +05:30
Dhanji R. Prasanna
ac17b95b24 fix(read_file): clamp end position instead of erroring when it exceeds file length
When read_file is called with an end position beyond the file length,
instead of returning an error that forces a retry, now clamps to the
actual file length and returns the content with an informative message.

This eliminates wasteful retry cycles where the LLM had to make a
second request with the corrected end position.
2026-01-12 05:11:09 +05:30
Dhanji R. Prasanna
da63e79a13 Move read_file metadata to end of output
Change read_file output format so the "🔍 N lines read" appears as
the last line after the file content, not before it. This keeps the
output cleaner with just one metadata line at the end.
2026-01-11 19:56:23 +05:30
Dhanji R. Prasanna
ed1c31dd70 Improve tool output formatting
1. str_replace: Show insertion/deletion counts with colors
   " +N insertions | -M deletions" (green/red)

2. write_file: Compact format with human-readable sizes
   " wrote N lines | Xk chars"

3. read_file: Cleaner format
   "🔍 N lines read" instead of "📄 File content (N lines)"

4. webdriver_quit: Show correct driver name (safaridriver vs chromedriver)

5. read_file: When start position exceeds file length, read last 100 chars
   with explanation instead of failing

6. shell: Remove redundant "Command failed:" prefix from error messages
2026-01-11 19:52:00 +05:30
Dhanji R. Prasanna
67be0f20c7 fix: remove allow_multiple_tool_calls config and simplify tool execution flow
This fixes a bug where the agent would stop responding abruptly without
calling final_output. The root cause was the allow_multiple_tool_calls
config option (default: false) which caused the agent to break out of
the streaming loop mid-stream after executing the first tool, losing
any subsequent content.

Changes:
- Remove allow_multiple_tool_calls config option entirely
- Always process all tool calls without breaking mid-stream
- Simplify system prompt generation (no longer needs boolean param)
- Let the stream complete fully before continuing to next iteration
- Change find_last_tool_call_start to find_first_tool_call_start
- Remove parser.reset() call on duplicate detection

Benefits:
- Simpler logic with less conditional branching
- No lost content after tool calls
- Consistent behavior for all users
- Reduced config complexity
2026-01-09 13:28:07 +11:00
Dhanji R. Prasanna
bb63050779 refactor: improve readability of streaming and file ops code
Agent: carmack

databricks.rs:
- Extract ToolCallAccumulator struct to replace opaque (String, String, String) tuple
- Add decode_utf8_streaming() helper for cleaner UTF-8 handling
- Add is_incomplete_json_error() helper for JSON parse error detection
- Add make_final_chunk() helper to reduce duplication
- Add finalize_tool_calls() to convert accumulators to final format
- Refactor parse_streaming_response from ~270 lines to ~100 lines
- Reduce nesting depth from 8+ levels to 4 levels
- Use early returns and let-else for cleaner control flow

file_ops.rs:
- Replace repetitive if-let chains with declarative PATH_CONTENT_KEYS table
- Use match expression instead of nested if-else
- Reduce extract_path_and_content from 44 lines to 20 lines

All tests pass. Behavior unchanged.
2026-01-07 12:39:05 +11:00
Dhanji R. Prasanna
386176899e Remove vision tools (except take_screenshot) and macax tools
Vision tools removed:
- extract_text (OCR from image files)
- extract_text_with_boxes (OCR with bounding boxes)
- vision_find_text (find text in app windows)
- vision_click_text (find and click on text)
- vision_click_near_text (click near text labels)

macax tools removed:
- macax_list_apps
- macax_get_frontmost_app
- macax_activate_app
- macax_press_key
- macax_type_text

The LLM can now read images directly via read_image tool.
take_screenshot is retained for capturing application windows.

Files deleted:
- crates/g3-core/src/tools/vision.rs
- crates/g3-core/src/tools/macax.rs
- docs/macax-tools.md

Updated tool counts: 12 core + 15 webdriver = 27 total
2026-01-03 17:38:25 +11:00
Dhanji R. Prasanna
29e263ac49 Fix Unicode space handling in macOS screenshot filenames
macOS uses U+202F (Narrow No-Break Space) in screenshot filenames
between the time and am/pm. When users type or paste these paths,
they use regular spaces, causing file-not-found errors.

Changes:
- Add resolve_path_with_unicode_fallback() to try U+202F variants
- Add resolve_paths_in_shell_command() for shell command paths
- Apply fix to read_file, read_image, and shell tools
- Fix read_image prompt docs: file_path -> file_paths (array)
- Add 6 unit tests for Unicode space normalization
2026-01-03 17:17:08 +11:00
Dhanji R. Prasanna
595ad6ad21 agent mode resumption 2026-01-03 14:50:08 +11:00