Commit Graph

801 Commits

Author SHA1 Message Date
Dhanji R. Prasanna
5085f10717 Merge sessions/interactive/07eabd99 2026-02-07 12:29:56 +11:00
Dhanji R. Prasanna
afaee8816c tweak to system prompt 2026-02-06 20:32:19 +11:00
Dhanji R. Prasanna
14112ff92e Remove client-side plan approval interception
Let approval input flow through the LLM instead of being
short-circuited in the REPL. The LLM calls plan_approve
itself, which is cleaner (single input path) and more
flexible (no hardcoded misspelling list).
2026-02-06 20:16:11 +11:00
Dhanji R. Prasanna
799b4ced8e Remove auto-submit status prompt from /project command
The /project command was auto-invoking a status report ("what is the
current state of the project?") as the first user message after loading
project files. This was inconsistent with the --project flag behavior,
which only loads files and displays status without auto-prompting.

Removed the auto-submit lines so /project now behaves identically to
the --project CLI flag: load files, set context, display status, done.
2026-02-06 16:12:33 +11:00
Dhanji R. Prasanna
7032e75fc6 Add write_envelope tool with verify_envelope for explicit envelope creation
- New crates/g3-core/src/tools/envelope.rs with execute_write_envelope()
  and verify_envelope() (moved from shadow_datalog_verify in plan.rs)
- write_envelope accepts YAML facts, writes envelope.yaml to session dir,
  then runs datalog verification against analysis/rulespec.yaml in shadow mode
- plan_verify() now only checks envelope existence (no longer runs datalog)
- Tool count: 13 -> 14
- Updated system prompt to instruct agents to call write_envelope before
  marking last plan item done
- Updated integration tests to use write_envelope tool directly

Workflow: write_envelope -> verify_envelope -> datalog shadow artifacts
          plan_write(done) -> plan_verify -> checks envelope exists
2026-02-06 16:09:07 +11:00
Dhanji R. Prasanna
f7a240a99b refactor: decouple rulespec from plan_write, read from analysis/rulespec.yaml
- Remove rulespec parameter from plan_write tool definition and execution
- Remove rulespec compilation from plan_approve (no longer pre-compiles)
- Remove write_rulespec, get_rulespec_path, format_rulespec_yaml/markdown
  from invariants.rs; read_rulespec() now takes &Path working dir
- Remove save/load_compiled_rulespec, get_compiled_rulespec_path from datalog.rs
- Update shadow_datalog_verify() to compile on-the-fly from
  analysis/rulespec.yaml, writing rulespec.compiled.dl and
  datalog_evaluation.txt to session dir
- Remove rulespec display from plan_read output
- Remove Invariants/Rulespec section from native.md system prompt
- Remove rulespec from prompts.rs plan_write format and examples
- Update existing tests to remove rulespec from plan_write calls
- Add 3 integration tests for on-the-fly rulespec verification
2026-02-06 15:31:23 +11:00
Dhanji R. Prasanna
a93ce932a3 refactor: Clean up Cargo dependencies - remove unused, update outdated
- Remove unused const_format from g3-planner (never imported)
- Remove unused thiserror from workspace and 5 crates (declared but never used)
- Update termimad 0.31 -> 0.34 in studio (consistency with g3-cli)
- Update indicatif 0.17 -> 0.18 in g3-cli
- Update ratatui 0.29 -> 0.30 in g3-cli
- Update walkdir 2.4 -> 2.5 in g3-core
- Update image 0.24 -> 0.25 in g3-computer-control (macOS + Linux)
- Update config 0.14 -> 0.15 in workspace

Blocked: reqwest 0.11 -> 0.12/0.13 requires breaking API changes to
bytes_stream() used in 4 providers - needs separate migration effort.

All tests pass. No behavior changes.

Agent: fowler
2026-02-06 14:22:59 +11:00
Dhanji R. Prasanna
31bdcb651b feat(cli): add multiline input support with Alt+Enter
- Enable custom-bindings feature in rustyline
- Bind Alt+Enter to insert newlines in interactive and accumulative modes
- Update calculate_visual_lines() to handle embedded newlines correctly
- Add tests for multiline visual line calculation

Note: Shift+Enter is not distinguishable in standard terminals, so Alt+Enter
is used as the multiline input trigger.
2026-02-06 14:09:12 +11:00
Dhanji R. Prasanna
abfac197ab Add datalog-based invariant verification system
Implement a new datalog verification layer using datafrog that:

- Compiles rulespec to datalog on plan_approve
- Extracts facts from action envelope using selectors
- Executes datalog rules on plan_verify
- Writes evaluation results to datalog_evaluation.txt (shadow mode)

Key components:
- crates/g3-core/src/tools/datalog.rs: Full datalog module with:
  - compile_rulespec(): Validates and compiles rulespec
  - extract_facts(): Extracts facts from envelope YAML
  - execute_rules(): Runs datafrog iteration
  - 23 comprehensive tests

- crates/g3-core/src/tools/plan.rs:
  - execute_plan_approve(): Now compiles rulespec on approval
  - shadow_datalog_verify(): Runs datalog and writes to eval file

Results are written to .g3/sessions/<id>/datalog_evaluation.txt
for inspection, NOT injected into context window (shadow mode).
2026-02-06 13:50:54 +11:00
Dhanji R. Prasanna
bcd50190c6 Add explicit [plan mode] indicator to interactive prompt
- Change plan mode prompt from ' >> ' to ' [plan mode] >> ' for clarity
- Add magenta syntax highlighting for [plan mode] text in prompt
- Add tests for prompt highlighting behavior
2026-02-06 11:31:07 +11:00
Dhanji R. Prasanna
f35807b728 refactor: move research tools to loadable toolset
Migrate research and research_status tools from core tools to a
dynamically loadable toolset, following the same pattern as webdriver.

Changes:
- Add 'research' toolset to TOOLSET_REGISTRY in toolsets.rs
- Add create_research_tools() function with research and research_status
- Remove research tools from create_core_tools() in tool_definitions.rs
- Remove exclude_research field and with_research_excluded() from ToolConfig
- Update tests: core tools now 13 (was 15), added 3 research toolset tests

The agent must now call load_toolset('research') to use research tools.
This simplifies the default tool set and removes special-case logic for
the scout agent (which simply won't load the research toolset).
2026-02-06 11:17:32 +11:00
Dhanji R. Prasanna
cbced3390c feat: JIT-injectable toolsets with load_toolset tool
Implement dynamic tool loading system that allows tools to be loaded
on-demand rather than included in the default set.

Key changes:
- Add toolsets module with registry of loadable toolsets
- Add load_toolset tool that returns tool definitions for a named toolset
- Add <available_toolsets> section to system prompt
- Track loaded toolsets in Agent, extend tool definitions dynamically
- Move webdriver (15 tools) to JIT-only loading

Benefits:
- Leaner default context (fewer tokens consumed)
- On-demand loading when agent needs specialized tools
- Extensible registry for future toolsets
- Idempotent loading with helpful error messages

Files:
- crates/g3-core/src/toolsets.rs (new)
- crates/g3-core/src/tools/toolsets.rs (new)
- crates/g3-core/src/tool_definitions.rs
- crates/g3-core/src/tool_dispatch.rs
- crates/g3-core/src/prompts.rs
- crates/g3-core/src/lib.rs
- crates/g3-core/src/tools/executor.rs
2026-02-06 09:35:11 +11:00
Dhanji R. Prasanna
ff15db44c0 Restore research as first-class tool, remove research skill
Restores the research tool that was previously externalized as a skill:

- Add pending_research.rs: PendingResearchManager with thread-safe task tracking
- Add tools/research.rs: execute_research (async), execute_research_status
- Add research/research_status tool definitions with exclude_research config
- Integrate PendingResearchManager into Agent and ToolContext
- Inject completed research results in streaming loop

Remove research skill:
- Clear EMBEDDED_SKILLS array in embedded.rs
- Delete skills/research/ directory
- Update all tests expecting embedded research skill
- Update docs and memory to reflect the change

The research tool now:
- Spawns scout agent in background tokio task
- Returns immediately with research_id
- Automatically injects results into conversation when ready
- Supports status checks via research_status tool
2026-02-06 07:38:06 +11:00
Dhanji R. Prasanna
b673827076 Fix embedded skill loading: stop XML-escaping location paths
The <location> field in the skills XML prompt was being XML-escaped,
converting <embedded:research>/SKILL.md to &lt;embedded:research&gt;/SKILL.md.
When the LLM tried to use read_file with this escaped path, it would fail.

Changes:
- Remove escape_xml() call from location field in prompt.rs
- Add fallback handling for escaped paths in try_read_embedded_skill()
- Add tests for both prompt generation and read_file handling

Fixes embedded skill loading for agents like butler running outside the g3 repo.
2026-02-05 23:16:40 +11:00
Dhanji R. Prasanna
65b2ec368f Add Action Envelope section back to native prompt
Restored the Action Envelope instructions with a clear, complete example
showing how to write envelope.yaml for rulespec verification.
2026-02-05 22:27:29 +11:00
Dhanji R. Prasanna
3823f8b5f3 Optimize native system prompt - 48% size reduction
Removed redundant and vague content from prompts/system/native.md:
- Simplified intro from 17 lines to 3 lines
- Reduced Code Search section to one line
- Removed duplicate Plan Mode example (kept one)
- Removed Action Envelope section (rarely used correctly)
- Removed verbose Memory Format details (tool description covers it)
- Removed Response Guidelines (obvious to modern LLMs)

Size: 8,620 chars -> 4,498 chars

Also updated:
- G3_IDENTITY_LINE constant for agent mode compatibility
- Test assertions to check for new prompt markers
- System prompt validation to use new marker string
2026-02-05 22:16:34 +11:00
Dhanji R. Prasanna
d978032044 Remove redundant AGENTS.md heading from startup output
The loaded status line (✓ AGENTS.md ✓ Memory) already indicates that
AGENTS.md was loaded, so the separate '>> AGENTS.md - Machine Instructions'
heading line was redundant.

- Remove print_project_heading() function from display.rs
- Remove extract_project_heading call from interactive.rs
- Clean up unused imports
2026-02-05 21:38:47 +11:00
Dhanji R. Prasanna
c6df75d886 Fix shell tool output line clipping to account for suffix
The shell tool output line was wrapping because update_tool_output_line
clipped the content without reserving space for the suffix that gets
appended later (line count + timing info).

Added suffix_overhead of 30 chars for shell tools to reserve space for:
- " (9999 lines)" = ~13 chars
- " | 99999 ◉ 999ms" = ~17 chars

This ensures the complete line fits within terminal width without wrapping.
2026-02-05 21:23:00 +11:00
Dhanji R. Prasanna
7e2d9bc22c Enforce rulespec creation with plan_write for new plans
Solves the tautology problem where the LLM would write invariants after
implementation, making them match what was done rather than constrain it.

Changes:
- plan_write now accepts 'rulespec' parameter
- New plans REQUIRE rulespec (fails with helpful error if missing)
- Plan updates don't require rulespec (backward compatible)
- Rulespec is parsed, validated, and written atomically with plan
- Updated system prompt with clear examples for new vs update
- Updated tool definition schema
- Updated all affected tests

New flow: task → plan+rulespec → user reviews BOTH → approve → implement
2026-02-05 21:12:02 +11:00
Dhanji R. Prasanna
085688479b Improve terminal width responsiveness for tool output
Clip summary text and other long fields to fit terminal width:

- Clip display_summary in print_tool_compact (e.g., "47 lines (2.0k chars)")
- Account for header_suffix length when compressing paths in print_tool_output_header
- Clip TODO item lines in print_todo_compact
- Clip plan item descriptions, evidence, touches, checks, and paths in print_plan_compact
- Replace hardcoded 70/40 char limits with dynamic terminal-width-based clipping

All clipping uses clip_line() which handles UTF-8 safely and adds ellipsis.
2026-02-05 20:44:12 +11:00
Dhanji R. Prasanna
19162b1fe6 Exit plan mode when plan is completed or blocked
When a plan reaches a terminal state (all items done or blocked) in
interactive mode, automatically exit plan mode and return to normal
prompt.

Changes:
- Add Agent::is_plan_terminal() method to check if plan is complete
- Add check_and_exit_plan_mode_if_terminal() helper in interactive.rs
- Call the helper after each execute_user_input() to detect completion

Fixes issue where plan mode prompt ' >> ' persisted after plan completion.
2026-02-05 20:31:24 +11:00
Dhanji R. Prasanna
30627bce97 feat(cli): make tool output responsive to terminal width
- Add terminal_width module with get_terminal_width(), clip_line(),
  compress_path(), and compress_command() utilities
- Update ConsoleUiWriter to use dynamic terminal width for all tool output
- Tool output lines are clipped to fit without wrapping
- Tool headers use semantic compression (paths preserve filename,
  commands clip from right)
- 4-character right margin for visual clarity
- Minimum 40 columns, default 80 when terminal size unavailable
- All truncation is UTF-8 safe (char counting, not byte slicing)
- Add 13 unit tests for terminal width utilities
2026-02-05 20:18:30 +11:00
Dhanji R. Prasanna
b2fbcf33d0 Fix plan approval gate and add "Create a plan:" prefix for first message
- Fix build warnings: add #[allow(dead_code)] to unused deserialization fields
- Fix plan approval gate bug: block file changes when no plan exists (not just
  when plan exists but is unapproved)
- Add "Create a plan: " prefix to first user message in plan mode
- Add prepare_plan_mode_input() helper function for testability
- Reset is_first_plan_message flag when entering plan mode via /plan command
- Add tests for approval gate (no plan + no changes, no plan + changes)
- Add tests for prepare_plan_mode_input (happy, negative, boundary cases)
2026-02-05 19:43:38 +11:00
Dhanji R. Prasanna
06d75f613c feat(plan): display rulespec.yaml and envelope.yaml in plan_read/plan_write output
- Add format_envelope_markdown() function in invariants.rs for rich markdown
  formatting of ActionEnvelope facts
- Add format_yaml_value_markdown() helper for recursive YAML value display
- Update execute_plan_read() to append rulespec and envelope sections
- Update execute_plan_write() to append envelope section alongside rulespec
- Add 3 tests for format_envelope_markdown (empty, with facts, null values)

When plan_read or plan_write is called, the output now includes:
- Plan YAML (as before)
- Rulespec section (if rulespec.yaml exists) with invariants grouped by source
- Envelope section (if envelope.yaml exists) with facts in readable format

Missing files show placeholder text rather than errors.
2026-02-05 19:08:55 +11:00
Dhanji R. Prasanna
bc5c1bdf61 Fix plan UI formatting to handle Vec<Check> and display elegantly
- Update ChecksCompact to use Vec<CheckCompact> for negative/boundary fields
- Add progress bar visualization showing done/doing/blocked/todo counts
- Show evidence for done items, checks for active items
- Display all negative and boundary checks (not just first)
- Add proper tree structure with └/├ prefixes
- Truncate long descriptions and evidence paths
- Add file path display with 📄 icon
2026-02-05 14:38:18 +11:00
Dhanji R. Prasanna
e34f37fd47 Merge sessions/sdlc/3b6c6c3e into main
Resolved conflicts:
- analysis/memory.md: kept condensed documentation from incoming branch
- crates/g3-core/src/skills/embedded.rs: removed unused HashMap import, kept better doc comment

Additional fix:
- crates/g3-core/src/prompts.rs: updated test to match current prompt file content
2026-02-05 14:38:08 +11:00
Dhanji R. Prasanna
307f04fa25 chore: Compress workspace memory after research externalization
- Remove deleted code: pending_research.rs, tools/research.rs (externalized to skill)
- Merge duplicate Agent Skills entries into unified section
- Update SDLC state path: analysis/sdlc/ → .g3/sdlc/
- Remove G3Status.resuming() (deleted in 6228001)
- Tighten verbose descriptions throughout

Metrics: 444 → 325 lines (-27%), 23.6k → 17.0k chars (-28%)
Concepts preserved: all semantic information retained

Agent: huffman
2026-02-05 14:29:48 +11:00
Dhanji R. Prasanna
74c2671e1b docs: Update documentation for Agent Skills system
Document the new Skills system introduced in recent commits:

- docs/architecture.md: Add Skills System section with discovery
  priority, embedded skills, script extraction, and key types
- docs/skills.md: New comprehensive guide covering SKILL.md format,
  discovery priority, embedded skills, research skill usage, and
  troubleshooting
- README.md: Update Agent Skills section with correct priority order,
  add embedded skills info, research skill usage, and link to Skills
  Guide in Documentation Map
- AGENTS.md: Add skill creation to Adding Features, skill extraction
  to Dangerous Code Paths, and new Skills System Entry Points section

All documentation links validated - no broken links or orphan files.

Agent: lamport
2026-02-05 14:26:26 +11:00
Dhanji R. Prasanna
cff32bf0ba Make research skill self-contained without external scripts
- Rewrite SKILL.md with inline instructions to spawn g3 --agent scout directly
- Extend read_file to handle embedded skill paths (<embedded:name>/SKILL.md)
- Remove scripts field from EmbeddedSkill struct (no longer needed)
- Delete extraction.rs module (was only for script extraction)
- Delete g3-research bash script
- Remove obsolete Async Research Tool section from workspace memory

Skills are now fully portable - they work when g3 is installed as a
binary without access to source files. Agents can read embedded skill
content via read_file with the special <embedded:...> path syntax.
2026-02-05 14:22:17 +11:00
Dhanji R. Prasanna
c3549ce043 refactor: Remove unused functions from skills module
- Remove is_embedded_skill() from discovery.rs (unused)
- Remove get_embedded_skills_map() from embedded.rs (unused)
- Remove associated tests for deleted functions
- Inline path check in test_repo_overrides_embedded test

This eliminates dead code warnings and reduces module surface area
without changing any behavior.

Agent: fowler
2026-02-05 14:17:56 +11:00
Dhanji R. Prasanna
38da6a56ef analysis: Update dependency graph for commits b6d2582..9443f933
Focused analysis on past 10 commits covering:
- New skills module in g3-core (parser, discovery, prompt, embedded, extraction)
- Research tool externalized to skills/research/ skill
- SkillsConfig added to g3-config
- SDLC pipeline state moved to .g3/sdlc/

Key findings:
- 4 crates changed, 29 files affected (8 added, 2 deleted, 19 modified)
- No dependency cycles detected
- Clean DAG structure in new skills module
- Cross-crate coupling via g3-core::skills and g3-config::SkillsConfig
- Compile-time coupling to skills/research/ via include_str!

Agent: euler
2026-02-05 14:02:44 +11:00
Dhanji R. Prasanna
788debb93a remove cruft from system prompt 2026-02-05 14:01:26 +11:00
Dhanji R. Prasanna
68fd7b96c1 Remove accidental Emacs lock file 2026-02-05 14:01:03 +11:00
Dhanji R. Prasanna
6cb70f26fa Fix empty Language-Specific Guidance header in system prompt
When a Rust-only workspace was detected, the Language-Specific Guidance
header was appearing with no content because Rust has an empty prompt
string (agent-specific prompts handle Rust instead).

The fix filters out empty prompt strings in get_language_prompts_for_workspace()
so the header only appears when there's actual guidance content.

Added test to verify Rust-only workspaces return None.
2026-02-05 14:00:52 +11:00
Dhanji R. Prasanna
9443f9333b refactor: Remove hardcoded Web Research section from system prompt
- Web Research instructions now come from skills/research/SKILL.md
- Skills are dynamically loaded and injected via generate_skills_prompt()
- Remove test_both_prompts_have_web_research test (no longer applicable)
- Remove unused G3Status::research_complete() function

This completes the externalization of research as a skill.
2026-02-05 13:41:53 +11:00
Dhanji R. Prasanna
0b308853a0 fix: Improve research skill with ANSI stripping and fallback extraction
- Add strip_ansi() function using perl for comprehensive escape sequence removal
- Add fallback extraction when scout doesn't output markers
- Strip g3 UI elements (session banner, tool output chrome, auto-memory messages)
- Reports are now clean plaintext without terminal formatting
2026-02-05 13:35:32 +11:00
Dhanji R. Prasanna
39e586982c feat: Externalize research tool as embedded skill
Replaces the built-in research/research_status tools with a portable
skill-based approach:

- Add embedded skills infrastructure (skills compiled into binary)
- Add repo-local skills/ directory support (highest priority)
- Create research skill with SKILL.md and g3-research shell script
- Script extraction to .g3/bin/ with version tracking
- Filesystem-based handoff via .g3/research/<id>/status.json
- Remove PendingResearchManager and all research tool code
- Update system prompt to reference skill instead of tool

Benefits:
- No special tool infrastructure needed (just shell + read_file)
- Context-efficient (reports stay on disk until needed)
- Crash-resilient (state persisted to filesystem)
- Portable (skill can be overridden per-workspace)

Breaking change: research tool calls now return a deprecation message
pointing to the research skill.
2026-02-05 13:23:26 +11:00
Dhanji R. Prasanna
bf9e3dc878 Merge sessions/interactive/213d9910 2026-02-05 13:05:57 +11:00
Dhanji R. Prasanna
89c071baf6 fix: honor --resume flag when used with --agent --chat
The --resume flag was being ignored when --agent and --chat flags were
used together. The if-else chain checked for chat mode first and
immediately returned None, skipping the --resume check entirely.

Reordered the logic to check flags.resume first, ensuring explicit
--resume is always honored regardless of other flags.

Fixes: --resume not working with --agent --chat
2026-02-05 13:05:48 +11:00
Dhanji R. Prasanna
bc2860dd3a studio sdlc: merge worktree on completion, move state to .g3/
- Add merge step before worktree cleanup when pipeline completes
- On success with commits: merge to main, then cleanup
- On failure: preserve worktree for debugging, print path
- On merge conflict: preserve worktree, print resolution instructions
- Move pipeline.json from analysis/sdlc/ to .g3/sdlc/ (gitignored)
2026-02-05 13:03:54 +11:00
Dhanji R. Prasanna
0e64f13a8a Merge feature/agent-skills-support: Agent Skills specification support 2026-02-05 12:46:53 +11:00
Dhanji R. Prasanna
6228001bfc Remove automatic session resume suggestion on startup
- Remove the interactive prompt that asked users to resume in-progress sessions
- Remove unused new_session parameter from run_interactive()
- Remove unused info_inline() function from G3Status
- Explicit --resume <session_id> flag still works
2026-02-05 12:40:27 +11:00
Dhanji R. Prasanna
8bbaf6f02e Tighten system prompt and tool definitions
Prompt changes (native.md):
- Remove duplicate 'Temporary files' section
- Consolidate 'remember' instructions into single authoritative location
- Remove motivational 'Benefits' list from Plan Mode
- Add 'Code Search Tool Selection' guidance (code_search vs rg)

Tool changes (tool_definitions.rs, tool_dispatch.rs):
- Remove screenshot tool (webdriver_screenshot remains)
- Remove coverage tool
- Reduce plan_write description from 22 lines to 1 line
- Update tool count tests (16 -> 14 core tools)

Net result: ~6 lines removed from prompt, ~56 lines removed from
tool definitions, clearer tool selection guidance added.
2026-02-05 12:36:49 +11:00
Dhanji R. Prasanna
b6d25824f3 Tighten system prompt 2026-02-05 12:01:01 +11:00
Dhanji R. Prasanna
25ad198b83 Sync agent plan mode state on CLI startup
CLI starts in plan mode by default (when not in agent mode), but was not
calling agent.set_plan_mode(true) at initialization. This meant the gate
check would not run until the user explicitly entered plan mode via /plan.
2026-02-05 11:47:38 +11:00
Dhanji R. Prasanna
b86901a86b Merge sessions/interactive/47299e3b 2026-02-05 11:47:24 +11:00
Dhanji R. Prasanna
3d3f68e6da Externalize native system prompt to markdown file
- Move system prompt for native tool calling models to prompts/system/native.md
- Use include_str! to embed at compile time
- Remove concatenated SHARED_* string constants
- Prompt is now readable/editable as a complete markdown document
- Non-native prompt still uses Rust constants (acceptable for now)
2026-02-05 11:46:49 +11:00
Dhanji R. Prasanna
0f919237ea Make plan approval gate only active in plan mode
- Add in_plan_mode flag to Agent struct
- Add set_plan_mode() and is_plan_mode() methods
- Gate check now only runs when in_plan_mode is true
- CLI calls set_plan_mode(true) on /plan command and EnterPlanMode
- CLI calls set_plan_mode(false) on approval and CTRL-D exit
- Update integration test to enable plan mode
- Fix test YAML to use Vec<Check> for negative/boundary checks
2026-02-05 11:41:52 +11:00
Dhanji R. Prasanna
3d284b8b60 Merge sessions/interactive/179ac8a6 2026-02-05 11:37:07 +11:00
Dhanji R. Prasanna
1f1a517620 feat(plan): support multiple negative and boundary checks
Change Plan Mode to allow multiple negative and boundary checks per item,
while keeping happy path as a single check.

Schema change:
- checks.negative: Check -> Vec<Check> (>=1 required)
- checks.boundary: Check -> Vec<Check> (>=1 required)
- checks.happy: Check (unchanged, single)

This better reflects real-world tasks where there are often multiple
error conditions and edge cases worth tracking.

Changes:
- Update Checks struct to use Vec<Check> for negative/boundary
- Update validation to require at least 1 of each
- Update prompts and tool definitions with new array syntax
- Add 4 new tests for multi-check scenarios
2026-02-05 11:36:45 +11:00