alex/g3 - g3 - Millerson GIT hosting

Author	SHA1	Message	Date
Dhanji R. Prasanna	d61be719c2	fix: strip orphaned tool_calls from preserved assistant message during compaction After context compaction, the preserved last assistant message retained its structured tool_calls field, but the corresponding tool_result was summarized away. This created orphaned tool_use blocks that violated the Anthropic API constraint: 'Each tool_use block must have a corresponding tool_result block in the next message', causing 400 errors. Primary fix: clear tool_calls from the preserved assistant message in extract_preserved_messages(). The tool call was already executed and its result is captured in the summary. Defense-in-depth: added strip_orphaned_tool_use() post-processing in Anthropic convert_messages() to detect and strip any orphaned tool_use blocks before they reach the API. Added 7 tests: 3 unit tests for compaction stripping, 3 unit tests for Anthropic orphan detection, 1 integration test reproducing the exact bug scenario from the h3 session.	2026-02-11 15:22:03 +11:00
Dhanji R. Prasanna	d3f0112f46	fix: store tool calls structurally for proper API roundtripping The agent would stop mid-task because native tool calls were stored as inline JSON text in Message.content. When sent back to the Anthropic API via convert_messages(), they went as plain text instead of structured tool_use/tool_result blocks. The model would occasionally get confused and emit text describing what it wanted to do instead of invoking the tool mechanism. Changes: - Add MessageToolCall struct and tool_calls/tool_result_id fields to Message - Add id field to core ToolCall struct to preserve provider tool call IDs - Update Anthropic convert_messages() to emit tool_use and tool_result blocks - Add ToolResult variant to AnthropicContent enum - Store tool calls structurally in tool message construction (not inline JSON) - Fix add_message() to preserve empty-content messages with tool_calls - Fix check_duplicate_in_previous_message() to check structured tool_calls - Generate valid IDs for JSON fallback tool calls (Anthropic pattern requirement) - Update planner create_tool_message() to use structured tool calls	2026-02-11 08:48:07 +11:00
Dhanji R. Prasanna	2a4cd1f4d6	fix: strip duplicate tool call JSON from assistant messages when LLM stutters When the LLM emits identical JSON tool calls as text content (JSON fallback mode), the raw duplicate JSON was being stored in the assistant message in conversation history. This confused the model on subsequent turns, causing it to stall or repeat itself. Root cause: raw_content_for_log used get_text_content() which returns the full parser buffer including all duplicate tool call JSONs. Fix: Added get_text_before_tool_calls() to StreamingToolParser that returns only the text before the first JSON tool call. Changed raw_content_for_log to use this method so the assistant message only contains the preamble text + the single executed tool call. Added 5 integration tests covering stuttered duplicates, triple stutter, cross-turn dedup, and different-args boundary case. Added MockResponse helpers for simulating LLM stutter patterns.	2026-02-10 19:53:11 +11:00
Dhanji R. Prasanna	f9625f1a2d	Add envelope verification token: keyed SipHash-2-4 MAC stamps envelope.yaml - Key management: 32-byte random key at ~/.g3/verification.key (chmod 600) - Token format: g3v1:<base64(SipHash-2-4 of canonical_facts + NUL + canonical_rulespec)> - stamp_envelope() called only when all rulespec predicates pass - verify_token() for cross-process validation - ActionEnvelope.verified field (Option<String>, skip_serializing_if none) - Token never shown to LLM, only written to envelope.yaml - Zero new dependencies (uses std SipHasher, existing rand/base64) - 12 unit tests covering determinism, tamper detection, backward compat	2026-02-07 17:09:37 +11:00
Dhanji R. Prasanna	edbae60ff3	Add rulespec extensions: new predicate rules, when conditions, null handling, solon agent Features: - New predicate rules: NotContains, AnyOf, NoneOf - Conditional predicates via when clauses (WhenCondition/CompiledWhenCondition) - Null handling: YAML null treated as absent for exists/not_exists - Solon agent for rulespec authoring (agents/solon.md) - Rulespec schema documentation (prompts/schemas/rulespec.schema.md) Bugfix: - Fixed when condition evaluation in datalog path: catch-all branch did naive string contains instead of delegating to evaluate_predicate_datalog(). Rules like matches (regex) were silently ignored, causing vacuous pass and letting violations through. Now delegates to evaluate_predicate_datalog() which handles all 12 rule types correctly. Tests: 34 new tests covering all new rules, null handling, when conditions, and the when+matches bugfix (butler rulespec pattern).	2026-02-07 16:38:27 +11:00
Dhanji R. Prasanna	328eecfcad	fix: extract_facts fallback for facts-prefixed selectors in datalog verification Root cause: ActionEnvelope.to_yaml_value() creates a Mapping from the facts HashMap without a 'facts:' wrapper key, but rulespec selectors may include a 'facts.' prefix (e.g. 'facts.feature.done' instead of 'feature.done'). This caused zero facts to be extracted, making all predicate evaluations fail. Fix: extract_facts() now tries the selector against the unwrapped envelope value first, and if empty, retries against a facts-wrapped version as fallback. Also: - Strengthened write_envelope tool description to require top-level facts: key, file paths for evidence, and allow free-form notes - Updated system prompt with matching rules - Added 6 new tests (4 unit, 2 integration) - Strengthened existing integration test to verify fact count > 0	2026-02-07 14:42:39 +11:00
Dhanji R. Prasanna	b045d0c5e9	fix: reject write_envelope with empty facts The write_envelope tool was silently accepting YAML without a 'facts:' top-level key. serde would ignore unknown fields and default the facts HashMap to empty, causing the predicate pipeline to always see no facts. Now validates that envelope.facts is non-empty after deserialization, returning a clear error with an example of the correct format. Adds 6 tests covering valid/invalid/boundary deserialization cases.	2026-02-07 13:24:41 +11:00
Dhanji R. Prasanna	51dfe71a2b	fix: generate actual Soufflé datalog in .dl files instead of YAML The rulespec compiler was writing serde_yaml::to_string(&compiled) into rulespec.compiled.dl files — just YAML, not datalog at all. Added format_datalog_program() that produces proper Soufflé-style datalog: - .decl relation declarations (claim_value, claim_length, predicate_pass, predicate_fail) - Fact assertions from the envelope - Rules for all 9 predicate types (exists, not_exists, equals, contains, greater_than, less_than, min_length, max_length, matches) - .output directives for query results Updated verify_envelope() to call the new function instead of serde_yaml::to_string(). Added 8 unit tests covering all rule types, edge cases, and the butler rulespec example.	2026-02-07 12:33:50 +11:00
Dhanji R. Prasanna	5085f10717	Merge sessions/interactive/07eabd99	2026-02-07 12:29:56 +11:00
Dhanji R. Prasanna	14112ff92e	Remove client-side plan approval interception Let approval input flow through the LLM instead of being short-circuited in the REPL. The LLM calls plan_approve itself, which is cleaner (single input path) and more flexible (no hardcoded misspelling list).	2026-02-06 20:16:11 +11:00
Dhanji R. Prasanna	799b4ced8e	Remove auto-submit status prompt from /project command The /project command was auto-invoking a status report ("what is the current state of the project?") as the first user message after loading project files. This was inconsistent with the --project flag behavior, which only loads files and displays status without auto-prompting. Removed the auto-submit lines so /project now behaves identically to the --project CLI flag: load files, set context, display status, done.	2026-02-06 16:12:33 +11:00
Dhanji R. Prasanna	7032e75fc6	Add write_envelope tool with verify_envelope for explicit envelope creation - New crates/g3-core/src/tools/envelope.rs with execute_write_envelope() and verify_envelope() (moved from shadow_datalog_verify in plan.rs) - write_envelope accepts YAML facts, writes envelope.yaml to session dir, then runs datalog verification against analysis/rulespec.yaml in shadow mode - plan_verify() now only checks envelope existence (no longer runs datalog) - Tool count: 13 -> 14 - Updated system prompt to instruct agents to call write_envelope before marking last plan item done - Updated integration tests to use write_envelope tool directly Workflow: write_envelope -> verify_envelope -> datalog shadow artifacts plan_write(done) -> plan_verify -> checks envelope exists	2026-02-06 16:09:07 +11:00
Dhanji R. Prasanna	f7a240a99b	refactor: decouple rulespec from plan_write, read from analysis/rulespec.yaml - Remove rulespec parameter from plan_write tool definition and execution - Remove rulespec compilation from plan_approve (no longer pre-compiles) - Remove write_rulespec, get_rulespec_path, format_rulespec_yaml/markdown from invariants.rs; read_rulespec() now takes &Path working dir - Remove save/load_compiled_rulespec, get_compiled_rulespec_path from datalog.rs - Update shadow_datalog_verify() to compile on-the-fly from analysis/rulespec.yaml, writing rulespec.compiled.dl and datalog_evaluation.txt to session dir - Remove rulespec display from plan_read output - Remove Invariants/Rulespec section from native.md system prompt - Remove rulespec from prompts.rs plan_write format and examples - Update existing tests to remove rulespec from plan_write calls - Add 3 integration tests for on-the-fly rulespec verification	2026-02-06 15:31:23 +11:00
Dhanji R. Prasanna	a93ce932a3	refactor: Clean up Cargo dependencies - remove unused, update outdated - Remove unused const_format from g3-planner (never imported) - Remove unused thiserror from workspace and 5 crates (declared but never used) - Update termimad 0.31 -> 0.34 in studio (consistency with g3-cli) - Update indicatif 0.17 -> 0.18 in g3-cli - Update ratatui 0.29 -> 0.30 in g3-cli - Update walkdir 2.4 -> 2.5 in g3-core - Update image 0.24 -> 0.25 in g3-computer-control (macOS + Linux) - Update config 0.14 -> 0.15 in workspace Blocked: reqwest 0.11 -> 0.12/0.13 requires breaking API changes to bytes_stream() used in 4 providers - needs separate migration effort. All tests pass. No behavior changes. Agent: fowler	2026-02-06 14:22:59 +11:00
Dhanji R. Prasanna	31bdcb651b	feat(cli): add multiline input support with Alt+Enter - Enable custom-bindings feature in rustyline - Bind Alt+Enter to insert newlines in interactive and accumulative modes - Update calculate_visual_lines() to handle embedded newlines correctly - Add tests for multiline visual line calculation Note: Shift+Enter is not distinguishable in standard terminals, so Alt+Enter is used as the multiline input trigger.	2026-02-06 14:09:12 +11:00
Dhanji R. Prasanna	abfac197ab	Add datalog-based invariant verification system Implement a new datalog verification layer using datafrog that: - Compiles rulespec to datalog on plan_approve - Extracts facts from action envelope using selectors - Executes datalog rules on plan_verify - Writes evaluation results to datalog_evaluation.txt (shadow mode) Key components: - crates/g3-core/src/tools/datalog.rs: Full datalog module with: - compile_rulespec(): Validates and compiles rulespec - extract_facts(): Extracts facts from envelope YAML - execute_rules(): Runs datafrog iteration - 23 comprehensive tests - crates/g3-core/src/tools/plan.rs: - execute_plan_approve(): Now compiles rulespec on approval - shadow_datalog_verify(): Runs datalog and writes to eval file Results are written to .g3/sessions/<id>/datalog_evaluation.txt for inspection, NOT injected into context window (shadow mode).	2026-02-06 13:50:54 +11:00
Dhanji R. Prasanna	bcd50190c6	Add explicit [plan mode] indicator to interactive prompt - Change plan mode prompt from ' >> ' to ' [plan mode] >> ' for clarity - Add magenta syntax highlighting for [plan mode] text in prompt - Add tests for prompt highlighting behavior	2026-02-06 11:31:07 +11:00
Dhanji R. Prasanna	f35807b728	refactor: move research tools to loadable toolset Migrate research and research_status tools from core tools to a dynamically loadable toolset, following the same pattern as webdriver. Changes: - Add 'research' toolset to TOOLSET_REGISTRY in toolsets.rs - Add create_research_tools() function with research and research_status - Remove research tools from create_core_tools() in tool_definitions.rs - Remove exclude_research field and with_research_excluded() from ToolConfig - Update tests: core tools now 13 (was 15), added 3 research toolset tests The agent must now call load_toolset('research') to use research tools. This simplifies the default tool set and removes special-case logic for the scout agent (which simply won't load the research toolset).	2026-02-06 11:17:32 +11:00
Dhanji R. Prasanna	cbced3390c	feat: JIT-injectable toolsets with load_toolset tool Implement dynamic tool loading system that allows tools to be loaded on-demand rather than included in the default set. Key changes: - Add toolsets module with registry of loadable toolsets - Add load_toolset tool that returns tool definitions for a named toolset - Add <available_toolsets> section to system prompt - Track loaded toolsets in Agent, extend tool definitions dynamically - Move webdriver (15 tools) to JIT-only loading Benefits: - Leaner default context (fewer tokens consumed) - On-demand loading when agent needs specialized tools - Extensible registry for future toolsets - Idempotent loading with helpful error messages Files: - crates/g3-core/src/toolsets.rs (new) - crates/g3-core/src/tools/toolsets.rs (new) - crates/g3-core/src/tool_definitions.rs - crates/g3-core/src/tool_dispatch.rs - crates/g3-core/src/prompts.rs - crates/g3-core/src/lib.rs - crates/g3-core/src/tools/executor.rs	2026-02-06 09:35:11 +11:00
Dhanji R. Prasanna	ff15db44c0	Restore research as first-class tool, remove research skill Restores the research tool that was previously externalized as a skill: - Add pending_research.rs: PendingResearchManager with thread-safe task tracking - Add tools/research.rs: execute_research (async), execute_research_status - Add research/research_status tool definitions with exclude_research config - Integrate PendingResearchManager into Agent and ToolContext - Inject completed research results in streaming loop Remove research skill: - Clear EMBEDDED_SKILLS array in embedded.rs - Delete skills/research/ directory - Update all tests expecting embedded research skill - Update docs and memory to reflect the change The research tool now: - Spawns scout agent in background tokio task - Returns immediately with research_id - Automatically injects results into conversation when ready - Supports status checks via research_status tool	2026-02-06 07:38:06 +11:00
Dhanji R. Prasanna	b673827076	Fix embedded skill loading: stop XML-escaping location paths The <location> field in the skills XML prompt was being XML-escaped, converting <embedded:research>/SKILL.md to &lt;embedded:research&gt;/SKILL.md. When the LLM tried to use read_file with this escaped path, it would fail. Changes: - Remove escape_xml() call from location field in prompt.rs - Add fallback handling for escaped paths in try_read_embedded_skill() - Add tests for both prompt generation and read_file handling Fixes embedded skill loading for agents like butler running outside the g3 repo.	2026-02-05 23:16:40 +11:00
Dhanji R. Prasanna	3823f8b5f3	Optimize native system prompt - 48% size reduction Removed redundant and vague content from prompts/system/native.md: - Simplified intro from 17 lines to 3 lines - Reduced Code Search section to one line - Removed duplicate Plan Mode example (kept one) - Removed Action Envelope section (rarely used correctly) - Removed verbose Memory Format details (tool description covers it) - Removed Response Guidelines (obvious to modern LLMs) Size: 8,620 chars -> 4,498 chars Also updated: - G3_IDENTITY_LINE constant for agent mode compatibility - Test assertions to check for new prompt markers - System prompt validation to use new marker string	2026-02-05 22:16:34 +11:00
Dhanji R. Prasanna	d978032044	Remove redundant AGENTS.md heading from startup output The loaded status line (✓ AGENTS.md ✓ Memory) already indicates that AGENTS.md was loaded, so the separate '>> AGENTS.md - Machine Instructions' heading line was redundant. - Remove print_project_heading() function from display.rs - Remove extract_project_heading call from interactive.rs - Clean up unused imports	2026-02-05 21:38:47 +11:00
Dhanji R. Prasanna	c6df75d886	Fix shell tool output line clipping to account for suffix The shell tool output line was wrapping because update_tool_output_line clipped the content without reserving space for the suffix that gets appended later (line count + timing info). Added suffix_overhead of 30 chars for shell tools to reserve space for: - " (9999 lines)" = ~13 chars - " | 99999 ◉ 999ms" = ~17 chars This ensures the complete line fits within terminal width without wrapping.	2026-02-05 21:23:00 +11:00
Dhanji R. Prasanna	7e2d9bc22c	Enforce rulespec creation with plan_write for new plans Solves the tautology problem where the LLM would write invariants after implementation, making them match what was done rather than constrain it. Changes: - plan_write now accepts 'rulespec' parameter - New plans REQUIRE rulespec (fails with helpful error if missing) - Plan updates don't require rulespec (backward compatible) - Rulespec is parsed, validated, and written atomically with plan - Updated system prompt with clear examples for new vs update - Updated tool definition schema - Updated all affected tests New flow: task → plan+rulespec → user reviews BOTH → approve → implement	2026-02-05 21:12:02 +11:00
Dhanji R. Prasanna	085688479b	Improve terminal width responsiveness for tool output Clip summary text and other long fields to fit terminal width: - Clip display_summary in print_tool_compact (e.g., "47 lines (2.0k chars)") - Account for header_suffix length when compressing paths in print_tool_output_header - Clip TODO item lines in print_todo_compact - Clip plan item descriptions, evidence, touches, checks, and paths in print_plan_compact - Replace hardcoded 70/40 char limits with dynamic terminal-width-based clipping All clipping uses clip_line() which handles UTF-8 safely and adds ellipsis.	2026-02-05 20:44:12 +11:00
Dhanji R. Prasanna	19162b1fe6	Exit plan mode when plan is completed or blocked When a plan reaches a terminal state (all items done or blocked) in interactive mode, automatically exit plan mode and return to normal prompt. Changes: - Add Agent::is_plan_terminal() method to check if plan is complete - Add check_and_exit_plan_mode_if_terminal() helper in interactive.rs - Call the helper after each execute_user_input() to detect completion Fixes issue where plan mode prompt ' >> ' persisted after plan completion.	2026-02-05 20:31:24 +11:00
Dhanji R. Prasanna	30627bce97	feat(cli): make tool output responsive to terminal width - Add terminal_width module with get_terminal_width(), clip_line(), compress_path(), and compress_command() utilities - Update ConsoleUiWriter to use dynamic terminal width for all tool output - Tool output lines are clipped to fit without wrapping - Tool headers use semantic compression (paths preserve filename, commands clip from right) - 4-character right margin for visual clarity - Minimum 40 columns, default 80 when terminal size unavailable - All truncation is UTF-8 safe (char counting, not byte slicing) - Add 13 unit tests for terminal width utilities	2026-02-05 20:18:30 +11:00
Dhanji R. Prasanna	b2fbcf33d0	Fix plan approval gate and add "Create a plan:" prefix for first message - Fix build warnings: add #[allow(dead_code)] to unused deserialization fields - Fix plan approval gate bug: block file changes when no plan exists (not just when plan exists but is unapproved) - Add "Create a plan: " prefix to first user message in plan mode - Add prepare_plan_mode_input() helper function for testability - Reset is_first_plan_message flag when entering plan mode via /plan command - Add tests for approval gate (no plan + no changes, no plan + changes) - Add tests for prepare_plan_mode_input (happy, negative, boundary cases)	2026-02-05 19:43:38 +11:00
Dhanji R. Prasanna	06d75f613c	feat(plan): display rulespec.yaml and envelope.yaml in plan_read/plan_write output - Add format_envelope_markdown() function in invariants.rs for rich markdown formatting of ActionEnvelope facts - Add format_yaml_value_markdown() helper for recursive YAML value display - Update execute_plan_read() to append rulespec and envelope sections - Update execute_plan_write() to append envelope section alongside rulespec - Add 3 tests for format_envelope_markdown (empty, with facts, null values) When plan_read or plan_write is called, the output now includes: - Plan YAML (as before) - Rulespec section (if rulespec.yaml exists) with invariants grouped by source - Envelope section (if envelope.yaml exists) with facts in readable format Missing files show placeholder text rather than errors.	2026-02-05 19:08:55 +11:00
Dhanji R. Prasanna	bc5c1bdf61	Fix plan UI formatting to handle Vec<Check> and display elegantly - Update ChecksCompact to use Vec<CheckCompact> for negative/boundary fields - Add progress bar visualization showing done/doing/blocked/todo counts - Show evidence for done items, checks for active items - Display all negative and boundary checks (not just first) - Add proper tree structure with └/├ prefixes - Truncate long descriptions and evidence paths - Add file path display with 📄 icon	2026-02-05 14:38:18 +11:00
Dhanji R. Prasanna	e34f37fd47	Merge sessions/sdlc/3b6c6c3e into main Resolved conflicts: - analysis/memory.md: kept condensed documentation from incoming branch - crates/g3-core/src/skills/embedded.rs: removed unused HashMap import, kept better doc comment Additional fix: - crates/g3-core/src/prompts.rs: updated test to match current prompt file content	2026-02-05 14:38:08 +11:00
Dhanji R. Prasanna	cff32bf0ba	Make research skill self-contained without external scripts - Rewrite SKILL.md with inline instructions to spawn g3 --agent scout directly - Extend read_file to handle embedded skill paths (<embedded:name>/SKILL.md) - Remove scripts field from EmbeddedSkill struct (no longer needed) - Delete extraction.rs module (was only for script extraction) - Delete g3-research bash script - Remove obsolete Async Research Tool section from workspace memory Skills are now fully portable - they work when g3 is installed as a binary without access to source files. Agents can read embedded skill content via read_file with the special <embedded:...> path syntax.	2026-02-05 14:22:17 +11:00
Dhanji R. Prasanna	c3549ce043	refactor: Remove unused functions from skills module - Remove is_embedded_skill() from discovery.rs (unused) - Remove get_embedded_skills_map() from embedded.rs (unused) - Remove associated tests for deleted functions - Inline path check in test_repo_overrides_embedded test This eliminates dead code warnings and reduces module surface area without changing any behavior. Agent: fowler	2026-02-05 14:17:56 +11:00
Dhanji R. Prasanna	6cb70f26fa	Fix empty Language-Specific Guidance header in system prompt When a Rust-only workspace was detected, the Language-Specific Guidance header was appearing with no content because Rust has an empty prompt string (agent-specific prompts handle Rust instead). The fix filters out empty prompt strings in get_language_prompts_for_workspace() so the header only appears when there's actual guidance content. Added test to verify Rust-only workspaces return None.	2026-02-05 14:00:52 +11:00
Dhanji R. Prasanna	9443f9333b	refactor: Remove hardcoded Web Research section from system prompt - Web Research instructions now come from skills/research/SKILL.md - Skills are dynamically loaded and injected via generate_skills_prompt() - Remove test_both_prompts_have_web_research test (no longer applicable) - Remove unused G3Status::research_complete() function This completes the externalization of research as a skill.	2026-02-05 13:41:53 +11:00
Dhanji R. Prasanna	39e586982c	feat: Externalize research tool as embedded skill Replaces the built-in research/research_status tools with a portable skill-based approach: - Add embedded skills infrastructure (skills compiled into binary) - Add repo-local skills/ directory support (highest priority) - Create research skill with SKILL.md and g3-research shell script - Script extraction to .g3/bin/ with version tracking - Filesystem-based handoff via .g3/research/<id>/status.json - Remove PendingResearchManager and all research tool code - Update system prompt to reference skill instead of tool Benefits: - No special tool infrastructure needed (just shell + read_file) - Context-efficient (reports stay on disk until needed) - Crash-resilient (state persisted to filesystem) - Portable (skill can be overridden per-workspace) Breaking change: research tool calls now return a deprecation message pointing to the research skill.	2026-02-05 13:23:26 +11:00
Dhanji R. Prasanna	bf9e3dc878	Merge sessions/interactive/213d9910	2026-02-05 13:05:57 +11:00
Dhanji R. Prasanna	89c071baf6	fix: honor --resume flag when used with --agent --chat The --resume flag was being ignored when --agent and --chat flags were used together. The if-else chain checked for chat mode first and immediately returned None, skipping the --resume check entirely. Reordered the logic to check flags.resume first, ensuring explicit --resume is always honored regardless of other flags. Fixes: --resume not working with --agent --chat	2026-02-05 13:05:48 +11:00
Dhanji R. Prasanna	bc2860dd3a	studio sdlc: merge worktree on completion, move state to .g3/ - Add merge step before worktree cleanup when pipeline completes - On success with commits: merge to main, then cleanup - On failure: preserve worktree for debugging, print path - On merge conflict: preserve worktree, print resolution instructions - Move pipeline.json from analysis/sdlc/ to .g3/sdlc/ (gitignored)	2026-02-05 13:03:54 +11:00
Dhanji R. Prasanna	0e64f13a8a	Merge feature/agent-skills-support: Agent Skills specification support	2026-02-05 12:46:53 +11:00
Dhanji R. Prasanna	6228001bfc	Remove automatic session resume suggestion on startup - Remove the interactive prompt that asked users to resume in-progress sessions - Remove unused new_session parameter from run_interactive() - Remove unused info_inline() function from G3Status - Explicit --resume <session_id> flag still works	2026-02-05 12:40:27 +11:00
Dhanji R. Prasanna	8bbaf6f02e	Tighten system prompt and tool definitions Prompt changes (native.md): - Remove duplicate 'Temporary files' section - Consolidate 'remember' instructions into single authoritative location - Remove motivational 'Benefits' list from Plan Mode - Add 'Code Search Tool Selection' guidance (code_search vs rg) Tool changes (tool_definitions.rs, tool_dispatch.rs): - Remove screenshot tool (webdriver_screenshot remains) - Remove coverage tool - Reduce plan_write description from 22 lines to 1 line - Update tool count tests (16 -> 14 core tools) Net result: ~6 lines removed from prompt, ~56 lines removed from tool definitions, clearer tool selection guidance added.	2026-02-05 12:36:49 +11:00
Dhanji R. Prasanna	25ad198b83	Sync agent plan mode state on CLI startup CLI starts in plan mode by default (when not in agent mode), but was not calling agent.set_plan_mode(true) at initialization. This meant the gate check would not run until the user explicitly entered plan mode via /plan.	2026-02-05 11:47:38 +11:00
Dhanji R. Prasanna	b86901a86b	Merge sessions/interactive/47299e3b	2026-02-05 11:47:24 +11:00
Dhanji R. Prasanna	3d3f68e6da	Externalize native system prompt to markdown file - Move system prompt for native tool calling models to prompts/system/native.md - Use include_str! to embed at compile time - Remove concatenated SHARED_* string constants - Prompt is now readable/editable as a complete markdown document - Non-native prompt still uses Rust constants (acceptable for now)	2026-02-05 11:46:49 +11:00
Dhanji R. Prasanna	0f919237ea	Make plan approval gate only active in plan mode - Add in_plan_mode flag to Agent struct - Add set_plan_mode() and is_plan_mode() methods - Gate check now only runs when in_plan_mode is true - CLI calls set_plan_mode(true) on /plan command and EnterPlanMode - CLI calls set_plan_mode(false) on approval and CTRL-D exit - Update integration test to enable plan mode - Fix test YAML to use Vec<Check> for negative/boundary checks	2026-02-05 11:41:52 +11:00
Dhanji R. Prasanna	3d284b8b60	Merge sessions/interactive/179ac8a6	2026-02-05 11:37:07 +11:00
Dhanji R. Prasanna	1f1a517620	feat(plan): support multiple negative and boundary checks Change Plan Mode to allow multiple negative and boundary checks per item, while keeping happy path as a single check. Schema change: - checks.negative: Check -> Vec<Check> (>=1 required) - checks.boundary: Check -> Vec<Check> (>=1 required) - checks.happy: Check (unchanged, single) This better reflects real-world tasks where there are often multiple error conditions and edge cases worth tracking. Changes: - Update Checks struct to use Vec<Check> for negative/boundary - Update validation to require at least 1 of each - Update prompts and tool definitions with new array syntax - Add 4 new tests for multi-check scenarios	2026-02-05 11:36:45 +11:00
Dhanji R. Prasanna	41839b909e	Remove stray test file	2026-02-05 11:34:15 +11:00
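
Commit d61be719c2 above describes a defense-in-depth pass that strips tool_use blocks with no matching tool_result in the next message. A minimal sketch of that idea follows. It is not the repository's code: the `Block` and `Msg` types are hypothetical, simplified stand-ins, and the real strip_orphaned_tool_use() may have a different signature and operate on different types.

```rust
use std::collections::HashSet;

// Hypothetical, simplified stand-ins for the provider message types.
#[derive(Debug, Clone, PartialEq)]
enum Block {
    Text(String),
    ToolUse { id: String },
    ToolResult { tool_use_id: String },
}

#[derive(Debug, Clone, PartialEq)]
struct Msg {
    role: &'static str,
    blocks: Vec<Block>,
}

/// Drop every ToolUse block whose id is not answered by a ToolResult
/// in the message that immediately follows it.
fn strip_orphaned_tool_use(messages: &mut Vec<Msg>) {
    for i in 0..messages.len() {
        // Collect the ids answered by the next message (empty if there is none).
        let answered: HashSet<String> = messages
            .get(i + 1)
            .map(|next| {
                next.blocks
                    .iter()
                    .filter_map(|b| match b {
                        Block::ToolResult { tool_use_id } => Some(tool_use_id.clone()),
                        _ => None,
                    })
                    .collect::<HashSet<String>>()
            })
            .unwrap_or_default();

        // Keep text and results; keep a ToolUse only if it was answered.
        messages[i].blocks.retain(|b| match b {
            Block::ToolUse { id } => answered.contains(id),
            _ => true,
        });
    }
}

fn main() {
    // After compaction: the tool_result was summarized away, the tool_use was not.
    let mut history = vec![
        Msg { role: "user", blocks: vec![Block::Text("summary of earlier turns".into())] },
        Msg {
            role: "assistant",
            blocks: vec![
                Block::Text("Running the build.".into()),
                Block::ToolUse { id: "toolu_1".into() },
            ],
        },
        Msg { role: "user", blocks: vec![Block::Text("continue".into())] },
    ];

    strip_orphaned_tool_use(&mut history);

    assert_eq!(history[1].role, "assistant");
    assert_eq!(history[1].blocks, vec![Block::Text("Running the build.".into())]);
}
```

In this sketch a tool_use followed by its tool_result is left untouched, so a pass like this could run on every request without affecting well-formed histories.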

694 Commits
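
The terminal-width commits listed above (30627bce97, 085688479b) say that clip_line() truncates by counting characters instead of slicing bytes, and adds an ellipsis. The sketch below illustrates that technique only. The signature and the exact ellipsis handling are assumptions, not the code in the g3 repository.

```rust
/// Clip `line` to at most `max_chars` characters (not bytes).
/// When truncation happens, the last kept position holds an ellipsis.
/// Assumed signature; the real clip_line() may differ.
fn clip_line(line: &str, max_chars: usize) -> String {
    // Count Unicode scalar values so multi-byte characters are never split.
    if line.chars().count() <= max_chars {
        return line.to_string();
    }
    if max_chars == 0 {
        return String::new();
    }
    // Reserve one character for the ellipsis.
    let mut clipped: String = line.chars().take(max_chars - 1).collect();
    clipped.push('…');
    clipped
}

fn main() {
    assert_eq!(clip_line("short", 10), "short");
    assert_eq!(clip_line("héllo wörld", 6), "héllo…");
    assert_eq!(clip_line("日本語のテキスト", 4), "日本語…");
}
```

Counting characters avoids panics from slicing inside a multi-byte sequence, but it is not the same as counting display columns: wide characters such as CJK glyphs usually occupy two terminal cells, which this sketch does not account for.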