When resize_image_to_dimensions() returns a larger file than the original,
we fall back to using the original bytes. Previously, was_resized was set
to true if the original dimensions exceeded MAX_IMAGE_DIMENSION, which
caused final_media_type to be set to 'image/jpeg' even though we were
using the original PNG bytes.
This caused Anthropic API errors like:
'Image does not match the provided media type image/jpeg'
Fix: Set was_resized=false when falling back to original bytes, so the
original media type (detected from magic bytes) is preserved.
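A minimal sketch of the fallback logic described above, not the actual implementation. The names detect_media_type() and choose_image_payload() are hypothetical, and the sketch assumes the resize step always produces JPEG.

```rust
/// Detects a media type from leading magic bytes (a few common formats only).
pub fn detect_media_type(bytes: &[u8]) -> Option<&'static str> {
    if bytes.starts_with(&[0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A]) {
        Some("image/png")
    } else if bytes.starts_with(&[0xFF, 0xD8, 0xFF]) {
        Some("image/jpeg")
    } else if bytes.starts_with(b"GIF87a") || bytes.starts_with(b"GIF89a") {
        Some("image/gif")
    } else {
        None
    }
}

/// Returns (bytes, media_type, was_resized).
/// The resized bytes are used only when they are actually smaller; otherwise
/// the original bytes are kept together with their own detected media type.
pub fn choose_image_payload(
    original: Vec<u8>,
    resized_jpeg: Option<Vec<u8>>,
) -> (Vec<u8>, &'static str, bool) {
    match resized_jpeg {
        Some(resized) if resized.len() < original.len() => (resized, "image/jpeg", true),
        _ => {
            // Assumption: unknown formats get a generic media type.
            let media_type = detect_media_type(&original).unwrap_or("application/octet-stream");
            (original, media_type, false)
        }
    }
}
```

Deriving the media type from the bytes that are actually sent keeps the two from drifting apart, which is the mismatch the API rejected.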
The Anthropic API was rejecting requests with multiple high-resolution images
(~2000x3000 pixels each) even though individual file sizes were under limits.
Root cause: the code only checked per-image file size (3.75MB), not pixel dimensions.
Anthropic recommends images ≤1568px on the longest edge, and the API has a 32MB total request limit.
Changes:
- Add MAX_IMAGE_DIMENSION (1568px) and MAX_TOTAL_IMAGE_PAYLOAD (20MB) constants
- Trigger resize when dimensions > 1568px (not just file size > 3.75MB)
- Add new resize_image_to_dimensions() for dimension-constrained resizing
- Track cumulative payload size across multiple images
- Warn if total payload exceeds recommended limit
Test results with Walking Dead comic images:
- WD_0001_0001.jpg: 800KB 1987x3057 → 321KB 1019x1568
- WD_0001_1064.png: 150KB 1988x3057 → 143KB 1020x1568
- WD_0002_0001.jpg: 1023KB 1988x3056 → 292KB 1020x1568
- Total base64 payload: ~2.5MB → ~1MB
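The dimension check and scaling can be sketched as follows. This is an illustration only: needs_resize(), fit_within() and exceeds_recommended_total() are hypothetical names, the real resize is done by an external tool, and the rounding rule is an assumption that happens to reproduce the dimensions in the test results above.

```rust
pub const MAX_IMAGE_DIMENSION: u32 = 1568;
pub const MAX_IMAGE_SIZE: usize = 3_932_160; // ~3.75MB, assuming 1MB = 1024 * 1024 bytes
pub const MAX_TOTAL_IMAGE_PAYLOAD: usize = 20 * 1024 * 1024;

/// Resize when either the file size or the longest edge is over its limit.
pub fn needs_resize(width: u32, height: u32, file_size: usize) -> bool {
    file_size > MAX_IMAGE_SIZE || width.max(height) > MAX_IMAGE_DIMENSION
}

/// Scales (width, height) so the longest edge equals `max_edge`,
/// preserving aspect ratio and rounding to the nearest pixel.
pub fn fit_within(width: u32, height: u32, max_edge: u32) -> (u32, u32) {
    let longest = width.max(height);
    if longest <= max_edge {
        return (width, height);
    }
    let scale = |v: u32| -> u32 {
        let scaled = (v as u64 * max_edge as u64 + longest as u64 / 2) / longest as u64;
        scaled.max(1) as u32
    };
    (scale(width), scale(height))
}

/// True when the summed payload sizes exceed the recommended total.
/// Assumption: the caller passes sizes as they will be sent (base64-encoded).
pub fn exceeds_recommended_total(payload_sizes: &[usize]) -> bool {
    payload_sizes.iter().sum::<usize>() > MAX_TOTAL_IMAGE_PAYLOAD
}
```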
The Anthropic API has a 5MB limit on base64-encoded images, not raw file
size. Base64 encoding increases size by ~33% (4/3 ratio), so a 4MB raw
image becomes ~5.3MB encoded, exceeding the limit.
Changed MAX_IMAGE_SIZE from 5MB to ~3.75MB (5MB * 3/4) to trigger
resizing before the base64-encoded result exceeds the API limit.
Also updated target resize size to 3.6MB to leave margin.
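The arithmetic behind the new threshold, as a small sketch. It assumes 1MB = 1024 * 1024 bytes and standard padded base64; API_BASE64_LIMIT and base64_encoded_len() are names chosen for illustration.

```rust
/// Assumed API limit on the base64-encoded image, in bytes.
pub const API_BASE64_LIMIT: usize = 5 * 1024 * 1024;

/// Length of padded base64 output: every 3 raw bytes become 4 characters.
pub const fn base64_encoded_len(raw_len: usize) -> usize {
    (raw_len + 2) / 3 * 4
}

/// Largest raw size whose encoding still fits the limit (5MB * 3/4 = 3.75MB).
pub const MAX_IMAGE_SIZE: usize = API_BASE64_LIMIT / 4 * 3;
```

A 4MB raw image encodes to roughly 5.33MB, which is over the limit, while a raw image of exactly MAX_IMAGE_SIZE bytes encodes to exactly 5MB.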
Images >= 5MB are now automatically resized to < 4.9MB using ImageMagick
before being sent to the LLM. This prevents API errors from oversized images.
- Uses iterative quality/scale reduction to find optimal size
- Converts to JPEG for better compression
- Shows original and resized size in terminal output (e.g., '6.2 MB → 4.1 MB (resized)')
- Falls back to original if ImageMagick fails or isn't available
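A rough sketch of the iterative quality/scale search with fallback. The encoder is passed in as a closure so the example stays self-contained; in the real tool that step shells out to ImageMagick. The quality and scale steps below are assumptions, not the values used in the commit.

```rust
/// Tries progressively lower JPEG quality, then smaller scale, until the
/// encoded image is under `target_len`. `encode(quality, scale_percent)`
/// returns None when the encoder fails or is unavailable.
/// Falls back to the original bytes if encoding fails or nothing fits.
pub fn shrink_until_fits<F>(original: &[u8], target_len: usize, mut encode: F) -> Vec<u8>
where
    F: FnMut(u8, u32) -> Option<Vec<u8>>,
{
    const QUALITIES: [u8; 4] = [85, 70, 55, 40]; // hypothetical steps
    const SCALES: [u32; 4] = [100, 75, 50, 25]; // percent, hypothetical steps

    for scale in SCALES {
        for quality in QUALITIES {
            match encode(quality, scale) {
                Some(bytes) if bytes.len() < target_len => return bytes,
                Some(_) => continue,               // still too large, try next step
                None => return original.to_vec(), // encoder failed: keep original
            }
        }
    }
    original.to_vec()
}
```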
- Shell outputs > 8KB are truncated to first 500 chars
- Full output saved to .g3/sessions/<session_id>/tools/shell_stdout_<id>.txt
- LLM can use read_file with start/end to paginate through large outputs
- read_file now uses seek() for O(1) random access instead of reading the entire file
- UTF-8 safe: reads extra bytes at boundaries to find valid char positions
- Falls back to lossy conversion for binary files (no panics)
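The truncation rule for shell output can be sketched like this. The constant names and the wording of the notice are assumptions; only the 8KB threshold and the 500-character head come from the description above.

```rust
pub const MAX_INLINE_OUTPUT_BYTES: usize = 8 * 1024;
pub const HEAD_CHARS: usize = 500;

/// Returns the output unchanged when it is small; otherwise returns the first
/// 500 characters plus a notice pointing at the file holding the full output.
/// `saved_path` is wherever the caller already wrote the full output.
pub fn truncate_large_output(output: &str, saved_path: &str) -> String {
    if output.len() <= MAX_INLINE_OUTPUT_BYTES {
        return output.to_string();
    }
    // Iterating over chars (not bytes) never splits a multi-byte character.
    let head: String = output.chars().take(HEAD_CHARS).collect();
    format!(
        "{}\n... [truncated: {} bytes total, full output saved to {}]",
        head,
        output.len(),
        saved_path
    )
}
```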
Files changed:
- paths.rs: get_tools_output_dir(), generate_short_id()
- shell.rs: truncate_large_output() integration
- file_ops.rs: seek-based read_file_range() helper
- New test: read_file_utf8_test.rs
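A minimal sketch of a seek-based, UTF-8-safe range read, assuming byte offsets. It is not the actual read_file_range() helper: the signature is invented, and the boundary policy (skip a partial character at the start, complete a partial character at the end) is one possible choice.

```rust
use std::io::{self, Read, Seek, SeekFrom};

fn is_continuation(byte: u8) -> bool {
    byte & 0xC0 == 0x80
}

/// Reads bytes [start, end) from `src` without loading the whole source.
pub fn read_range<R: Read + Seek>(src: &mut R, start: u64, end: u64) -> io::Result<String> {
    let len = src.seek(SeekFrom::End(0))?;
    let end = end.min(len);
    let start = start.min(end);
    // Read up to 3 extra bytes so a character cut at `end` can be completed.
    let read_end = (end + 3).min(len);

    src.seek(SeekFrom::Start(start))?;
    let mut buf = vec![0u8; (read_end - start) as usize];
    src.read_exact(&mut buf)?;

    // Skip up to 3 leading continuation bytes of a character begun before `start`.
    let mut first = 0;
    while first < buf.len() && first < 3 && is_continuation(buf[first]) {
        first += 1;
    }
    // Extend the end past continuation bytes of a character begun before `end`.
    let mut cut = (end - start) as usize;
    while cut < buf.len() && is_continuation(buf[cut]) {
        cut += 1;
    }
    let cut = cut.max(first);

    // Lossy conversion so binary content never causes a panic.
    Ok(String::from_utf8_lossy(&buf[first..cut]).into_owned())
}
```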
When read_file is called with an end position beyond the file length,
it now clamps to the actual file length and returns the content with an
informative message, instead of returning an error that forces a retry.
This eliminates wasteful retry cycles where the LLM had to make a
second request with the corrected end position.
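The clamping can be sketched as below. clamp_end() is a hypothetical helper, and the message wording is an assumption.

```rust
/// Clamps `end` to the file length. Returns the effective end position and,
/// when clamping happened, a note to append to the tool output.
pub fn clamp_end(end: usize, file_len: usize) -> (usize, Option<String>) {
    if end > file_len {
        let note = format!(
            "end position {} exceeds file length {}; clamped to {}",
            end, file_len, file_len
        );
        (file_len, Some(note))
    } else {
        (end, None)
    }
}
```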
Change the read_file output format so that the "🔍 N lines read" line
appears last, after the file content, not before it. This keeps the
output cleaner, with a single metadata line at the end.
1. str_replace: Show insertion/deletion counts with colors
   "✅ +N insertions | -M deletions" (green/red)
2. write_file: Compact format with human-readable sizes
   "✅ wrote N lines | Xk chars"
3. read_file: Cleaner format
   "🔍 N lines read" instead of "📄 File content (N lines)"
4. webdriver_quit: Show correct driver name (safaridriver vs chromedriver)
5. read_file: When start position exceeds file length, read last 100 chars
   with explanation instead of failing
6. shell: Remove redundant "Command failed:" prefix from error messages
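Item 5 can be sketched as follows. This assumes positions are character offsets into content already in memory, which may not match the real tool; read_from_start() and the note text are hypothetical.

```rust
pub const TAIL_CHARS: usize = 100;

/// Returns the text from `start` onward. When `start` is past the end of the
/// content, returns the last 100 characters plus an explanatory note.
pub fn read_from_start(content: &str, start: usize) -> (String, Option<String>) {
    let total = content.chars().count();
    if start <= total {
        return (content.chars().skip(start).collect(), None);
    }
    let tail: String = content
        .chars()
        .skip(total.saturating_sub(TAIL_CHARS))
        .collect();
    let note = format!(
        "start position {} exceeds file length {}; showing last {} chars",
        start,
        total,
        TAIL_CHARS.min(total)
    );
    (tail, Some(note))
}
```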
This fixes a bug where the agent would stop responding abruptly without
calling final_output. The root cause was the allow_multiple_tool_calls
config option (default: false), which caused the agent to break out of
the streaming loop mid-stream after executing the first tool, losing
any subsequent content.
Changes:
- Remove allow_multiple_tool_calls config option entirely
- Always process all tool calls without breaking mid-stream
- Simplify system prompt generation (no longer needs boolean param)
- Let the stream complete fully before continuing to next iteration
- Change find_last_tool_call_start to find_first_tool_call_start
- Remove parser.reset() call on duplicate detection
Benefits:
- Simpler logic with less conditional branching
- No lost content after tool calls
- Consistent behavior for all users
- Reduced config complexity
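The control-flow change can be illustrated with a simplified loop. StreamEvent and process_stream() are hypothetical stand-ins for the real streaming types; the point is only that the loop no longer breaks after the first tool call.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum StreamEvent {
    Text(String),
    ToolCall(String), // tool name only, for brevity
}

/// Consumes the whole stream: text is accumulated and every tool call is
/// executed in order. There is deliberately no `break` after a tool call,
/// so content that follows it is not lost.
pub fn process_stream<I>(
    events: I,
    mut execute: impl FnMut(&str) -> String,
) -> (String, Vec<String>)
where
    I: IntoIterator<Item = StreamEvent>,
{
    let mut text = String::new();
    let mut results = Vec::new();
    for event in events {
        match event {
            StreamEvent::Text(chunk) => text.push_str(&chunk),
            StreamEvent::ToolCall(name) => results.push(execute(&name)),
        }
    }
    (text, results)
}
```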
Agent: carmack
databricks.rs:
- Extract ToolCallAccumulator struct to replace opaque (String, String, String) tuple
- Add decode_utf8_streaming() helper for cleaner UTF-8 handling
- Add is_incomplete_json_error() helper for JSON parse error detection
- Add make_final_chunk() helper to reduce duplication
- Add finalize_tool_calls() to convert accumulators to final format
- Refactor parse_streaming_response from ~270 lines to ~100 lines
- Reduce nesting depth from 8+ levels to 4 levels
- Use early returns and let-else for cleaner control flow
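One way a decode_utf8_streaming() helper could work is sketched below; the signature and buffering strategy are assumptions rather than the code in databricks.rs. Bytes of a character split across network chunks are held back until the rest arrives.

```rust
/// Appends `chunk` to `pending` and returns the text decodable so far.
/// An incomplete multi-byte sequence at the end stays in `pending` for the
/// next call. Truly invalid bytes are flushed with lossy replacement.
pub fn decode_utf8_streaming(pending: &mut Vec<u8>, chunk: &[u8]) -> String {
    pending.extend_from_slice(chunk);
    let flush_len = match std::str::from_utf8(pending.as_slice()) {
        Ok(_) => pending.len(),
        // error_len() == None means the input ended in the middle of a character.
        Err(e) if e.error_len().is_none() => e.valid_up_to(),
        // Invalid data: give up on buffering and flush everything lossily.
        Err(_) => pending.len(),
    };
    let text = String::from_utf8_lossy(&pending[..flush_len]).into_owned();
    pending.drain(..flush_len);
    text
}
```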
file_ops.rs:
- Replace repetitive if-let chains with declarative PATH_CONTENT_KEYS table
- Use match expression instead of nested if-else
- Reduce extract_path_and_content from 44 lines to 20 lines
All tests pass. Behavior unchanged.
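The table-driven lookup in file_ops.rs might look roughly like this. The key names in the table and the HashMap argument type are invented for illustration; the real PATH_CONTENT_KEYS entries and argument type are not shown in this log.

```rust
use std::collections::HashMap;

/// Accepted (path key, content key) pairs, tried in order.
/// Hypothetical entries: the actual table is not reproduced here.
const PATH_CONTENT_KEYS: &[(&str, &str)] = &[
    ("file_path", "content"),
    ("path", "content"),
    ("filename", "text"),
];

/// Returns the first (path, content) pair for which both keys are present.
pub fn extract_path_and_content<'a>(
    args: &'a HashMap<String, String>,
) -> Option<(&'a str, &'a str)> {
    PATH_CONTENT_KEYS.iter().find_map(|(path_key, content_key)| {
        let path = args.get(*path_key)?;
        let content = args.get(*content_key)?;
        Some((path.as_str(), content.as_str()))
    })
}
```

Adding another accepted spelling then means adding one table row instead of another if-let branch.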
macOS uses U+202F (Narrow No-Break Space) in screenshot filenames
between the time and am/pm. When users type or paste these paths,
they end up with a regular space instead, causing file-not-found errors.
Changes:
- Add resolve_path_with_unicode_fallback() to try U+202F variants
- Add resolve_paths_in_shell_command() for shell command paths
- Apply fix to read_file, read_image, and shell tools
- Fix read_image prompt docs: file_path -> file_paths (array)
- Add 6 unit tests for Unicode space normalization
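A simplified sketch of the fallback idea. It takes an `exists` predicate so it can run without touching the filesystem, and it only tries the space directly before AM/PM; the real resolve_path_with_unicode_fallback() may try other variants.

```rust
/// Returns the path that exists: the path as given, or a variant in which the
/// regular space before AM/PM is replaced with U+202F. None if neither exists.
pub fn resolve_with_unicode_fallback(
    path: &str,
    exists: impl Fn(&str) -> bool,
) -> Option<String> {
    if exists(path) {
        return Some(path.to_string());
    }
    for marker in [" AM", " PM", " am", " pm"] {
        if let Some(idx) = path.rfind(marker) {
            // `idx` points at the one-byte ASCII space, so idx + 1 is a char boundary.
            let mut candidate = String::with_capacity(path.len() + 2);
            candidate.push_str(&path[..idx]);
            candidate.push('\u{202F}');
            candidate.push_str(&path[idx + 1..]);
            if exists(&candidate) {
                return Some(candidate);
            }
        }
    }
    None
}
```

In real use the predicate would be something like `|p| std::path::Path::new(p).exists()`.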