Commit Graph

823 Commits

Author SHA1 Message Date
Dhanji R. Prasanna
7347d92ae8 Make plan approval gate non-destructive and baseline-aware
- Remove all file revert/delete logic from check_plan_approval_gate:
  no more git checkout or fs::remove_file calls. The gate only warns.
- Remove reverted_files field from ApprovalGateResult::Blocked.
- Add get_dirty_files() helper to snapshot dirty files as a HashSet.
- Capture baseline dirty files when plan mode starts (set_plan_mode).
  Pre-existing dirty files are excluded from gate checks so they
  never trigger blocking.
- Add 5 new unit tests covering non-destructive behavior, baseline
  exclusion, and mixed baseline/new file scenarios.
- Update integration test to match new non-destructive semantics.
2026-02-15 09:53:14 +11:00
Dhanji R. Prasanna
22b1ab93e4 Auto-detect Chrome for Testing to prevent version mismatch
When no chrome_binary is configured, auto-detect Chrome for Testing
at ~/.chrome-for-testing/ and use it instead of letting ChromeDriver
fall back to system Chrome. This prevents the frequent version
mismatch error caused by system Chrome auto-updating independently
of the ChromeDriver installed by setup-chrome-for-testing.sh.

Checks mac-arm64, mac-x64, and linux64 paths. Falls back to system
Chrome (previous behavior) if Chrome for Testing is not installed.
2026-02-14 14:56:33 +11:00
Dhanji R. Prasanna
92352e1897 Make load_toolset a compact tool for CLI display
Add load_toolset to the compact tool lists in streaming.rs and
ui_writer_impl.rs so it renders as a single-line summary instead of
the full multi-line tool definitions output.

Summary format:
  🧩 loaded '<name>' — on success
  ℹ️ already loaded  — when already loaded
   failed          — on error

The toolset name arg is extracted as display_arg in print_tool_compact
so it appears in the compact output line.
2026-02-14 14:49:57 +11:00
Dhanji R. Prasanna
74714806c0 Add evidence format guidance to plan_write tool description
Agents frequently put descriptive prose in the evidence field when
marking plan items done, causing verification errors since
parse_evidence() interprets everything as file paths. The plan_write
tool description now documents the 3 accepted evidence formats
(file path, file:line, file::test_name) and explicitly warns against
putting descriptions in evidence.
2026-02-14 14:19:54 +11:00
Dhanji R. Prasanna
1d77f3f865 fix: allow new plan_write after completed approved plan
When an approved plan was fully complete (all items done/blocked),
plan_write blocked creating a new plan with 'Cannot remove item'
error. Now checks is_complete() first — complete plans allow fresh
plan creation without carrying over approved_revision or enforcing
item ID preservation.

Adds 4 end-to-end integration tests covering happy path, negative
(in-progress still blocks), and boundary cases (all-blocked, mixed).
2026-02-14 12:27:38 +11:00
Dhanji R. Prasanna
1ad74baaa5 Readability refactor: extract mega-functions into focused helpers
Agent: carmack

4 files refactored, net -250 lines, all tests passing (417 + 71).

datalog.rs:
- Extract 7 predicate evaluation helpers from evaluate_predicate_datalog()
  (~200-line match → 12-line dispatch table)
- Extract rule_body_for_predicate() from format_datalog_program()
  (~75-line match → 2-line call)

invariants.rs:
- Extract 7 per-rule helpers from evaluate_predicate()
  (~230-line match → 12-line dispatch table)

envelope.rs:
- Simplify summary construction in verify_envelope()
- Eliminate redundant clone in stamp_envelope()

anthropic.rs:
- Introduce StreamState struct with 6 handler methods
- parse_streaming_response: ~290 lines → ~90 lines
- Max nesting depth reduced from 8 to 4 levels
2026-02-13 16:21:38 +11:00
Dhanji R. Prasanna
0410efd41b Add 1% safety buffer to context window to prevent API token limit errors
Our token estimation heuristic (chars/3 * 1.1 for code, chars/4 * 1.1 for text)
slightly undercounts over long sessions with hundreds of tool calls. This
accumulated drift of ~89 tokens caused Anthropic API 400 errors:
  'prompt is too long: 200089 tokens > 200000 maximum'

Fix: ContextWindow::new() now applies a 1% buffer, setting total_tokens to 99%
of the provider-reported limit. For a 200k window this gives 198k, providing a
2000-token safety margin that absorbs estimation drift.

All percentage calculations, compaction thresholds, and thinning triggers
operate against the buffered limit, so compaction fires earlier and we never
send a request the API will reject.
2026-02-13 15:46:53 +11:00
Dhanji R. Prasanna
a7e0b0ef9e Refactor: deduplicate JSON parsing, provider constructors, and identity function
Agent: fowler

Eliminate code-path aliasing and near-duplicates across recent commits:

1. Deduplicate find_json_object_end: Three near-identical copies in
   streaming_parser.rs, context_window.rs, and acd.rs consolidated into
   a single canonical implementation in utils.rs. All callers now route
   through the canonical version. The utils.rs version uses the most
   defensive variant (with found_start guard). (-84 lines)

2. Deduplicate provider constructors: AnthropicProvider::new() and
   GeminiProvider::new() now delegate to their respective new_with_name()
   methods instead of duplicating the full constructor body.
   (OpenAI already delegated.) (-28 lines)

3. Inline convert_cache_control: Removed identity function that just
   cloned CacheControl. Call sites now use .map(|cc| cc.clone())
   directly. (-4 lines)

Net: -65 lines, 0 behavior changes, all 683 library tests pass.
2026-02-13 12:37:09 +11:00
Dhanji R. Prasanna
bc98c65956 Compact workspace memory: -29% chars, 37 concepts preserved
Agent: huffman

Compaction Summary:
- Lines: 576 → 477 (-17%)
- Chars: 36,372 → 26,173 (-28%)
- Entries: 45 → 37 (merged 8)

Transformations:
- Removed 1 exact duplicate (Datalog Program Generation x2)
- Collapsed 6 log-style bug narratives to current-state declarations
- Merged Plan Verification into Plan Mode entry
- Merged Rulespec Changes into Invariants entry (current state only)
- Updated 12 stale char ranges against actual file positions
- Removed 13 references to deleted code (extraction.rs, shadow_datalog_verify,
  save/load_compiled_rulespec, display_welcome_message, OutputMode, etc.)
- Moved Skills System Entry Points from AGENTS.md to memory (was duplicate)
- AGENTS.md: removed 20-line skills table, kept rules/invariants only
2026-02-13 11:44:55 +11:00
Dhanji R. Prasanna
41584e4479 memory update 2026-02-13 11:35:26 +11:00
Dhanji R. Prasanna
fcb839e5fd fix: nest images inside tool_result content for Anthropic API compliance
read_image tool results placed images as top-level Image content blocks
alongside ToolResult blocks in user messages. The Anthropic API rejects
this combination, reporting orphaned tool_use IDs even though the
tool_result was present — the malformed message structure prevented
the API from recognizing it as a valid tool result.

Added ToolResultContent enum (Text | Blocks) with custom serde so that
when images are attached to a tool result, they are nested inside the
tool_result content array as structured blocks, matching the Anthropic
API's expected format for multi-modal tool results.

Regular tool results (no images) continue to use simple string content.
Regular user messages (not tool results) continue to use top-level
Image blocks.

4 new tests covering image nesting, string fallback, regular user
messages, and orphan detection with structured content.
2026-02-13 10:50:52 +11:00
Dhanji R. Prasanna
68f8f13b38 Fix research_status polling tool falsely deduplicated across auto-continue iterations
The dedup logic compared only tool name+args, ignoring the unique tool call
IDs that native providers (Anthropic) assign to each invocation. When the
model called research_status {} in iteration 1, auto-continued, and called
it again in iteration 2 with identical args but a new ID, the second call
was marked DUP IN MSG and skipped. With no tool executed and no text, the
stream errored with 'No response received from the model.'

Three-part fix:
- ID-aware DUP IN MSG: check_duplicate_in_previous_message() uses tool call
  IDs when both are non-empty (different IDs = different invocations)
- History cutoff: only checks messages from before the current iteration to
  prevent within-iteration false positives
- DUP IN ITER: last_executed_tool on IterationState catches stuttered
  duplicates across chunks within the same response

Regression test reproduces the exact bug (fails without fix, passes with).
2026-02-12 15:54:24 +11:00
Dhanji R. Prasanna
88d2b9592b Fix tool_call input tokens invisible to context window tracker
estimate_tokens() only counted message.content chars, completely
ignoring message.tool_calls[].input JSON. When sent to the API,
tool_use blocks include full input, so the token tracker massively
undercounted — in one session, 303k chars (101k tokens) of tool
input were invisible, showing 39% usage when actual was >100%.
Compaction never triggered, causing an API 400 error.

Added estimate_message_tokens() that accounts for both content and
tool_call input. Updated add_message_with_tokens(), recalculate_tokens(),
and clear_conversation() to use it.

7 unit tests + 1 integration test reproducing the exact session trace.
2026-02-11 16:12:13 +11:00
Dhanji R. Prasanna
d61be719c2 fix: strip orphaned tool_calls from preserved assistant message during compaction
After context compaction, the preserved last assistant message retained
its structured tool_calls field, but the corresponding tool_result was
summarized away. This created orphaned tool_use blocks that violated
the Anthropic API constraint: 'Each tool_use block must have a
corresponding tool_result block in the next message', causing 400 errors.

Primary fix: clear tool_calls from the preserved assistant message in
extract_preserved_messages(). The tool call was already executed and
its result is captured in the summary.

Defense-in-depth: added strip_orphaned_tool_use() post-processing in
Anthropic convert_messages() to detect and strip any orphaned tool_use
blocks before they reach the API.

Added 7 tests: 3 unit tests for compaction stripping, 3 unit tests for
Anthropic orphan detection, 1 integration test reproducing the exact
bug scenario from the h3 session.
2026-02-11 15:22:03 +11:00
Dhanji R. Prasanna
d3f0112f46 fix: store tool calls structurally for proper API roundtripping
The agent would stop mid-task because native tool calls were stored as
inline JSON text in Message.content. When sent back to the Anthropic API
via convert_messages(), they went as plain text instead of structured
tool_use/tool_result blocks. The model would occasionally get confused
and emit text describing what it wanted to do instead of invoking the
tool mechanism.

Changes:
- Add MessageToolCall struct and tool_calls/tool_result_id fields to Message
- Add id field to core ToolCall struct to preserve provider tool call IDs
- Update Anthropic convert_messages() to emit tool_use and tool_result blocks
- Add ToolResult variant to AnthropicContent enum
- Store tool calls structurally in tool message construction (not inline JSON)
- Fix add_message() to preserve empty-content messages with tool_calls
- Fix check_duplicate_in_previous_message() to check structured tool_calls
- Generate valid IDs for JSON fallback tool calls (Anthropic pattern requirement)
- Update planner create_tool_message() to use structured tool calls
2026-02-11 08:48:07 +11:00
Dhanji R. Prasanna
2a4cd1f4d6 fix: strip duplicate tool call JSON from assistant messages when LLM stutters
When the LLM emits identical JSON tool calls as text content (JSON
fallback mode), the raw duplicate JSON was being stored in the assistant
message in conversation history. This confused the model on subsequent
turns, causing it to stall or repeat itself.

Root cause: raw_content_for_log used get_text_content() which returns
the full parser buffer including all duplicate tool call JSONs.

Fix: Added get_text_before_tool_calls() to StreamingToolParser that
returns only the text before the first JSON tool call. Changed
raw_content_for_log to use this method so the assistant message only
contains the preamble text + the single executed tool call.

Added 5 integration tests covering stuttered duplicates, triple
stutter, cross-turn dedup, and different-args boundary case.

Added MockResponse helpers for simulating LLM stutter patterns.
2026-02-10 19:53:11 +11:00
Dhanji R. Prasanna
f9625f1a2d Add envelope verification token: keyed SipHash-2-4 MAC stamps envelope.yaml
- Key management: 32-byte random key at ~/.g3/verification.key (chmod 600)
- Token format: g3v1:<base64(SipHash-2-4 of canonical_facts + NUL + canonical_rulespec)>
- stamp_envelope() called only when all rulespec predicates pass
- verify_token() for cross-process validation
- ActionEnvelope.verified field (Option<String>, skip_serializing_if none)
- Token never shown to LLM, only written to envelope.yaml
- Zero new dependencies (uses std SipHasher, existing rand/base64)
- 12 unit tests covering determinism, tamper detection, backward compat
2026-02-07 17:09:37 +11:00
Dhanji R. Prasanna
edbae60ff3 Add rulespec extensions: new predicate rules, when conditions, null handling, solon agent
Features:
- New predicate rules: NotContains, AnyOf, NoneOf
- Conditional predicates via when clauses (WhenCondition/CompiledWhenCondition)
- Null handling: YAML null treated as absent for exists/not_exists
- Solon agent for rulespec authoring (agents/solon.md)
- Rulespec schema documentation (prompts/schemas/rulespec.schema.md)

Bugfix:
- Fixed when condition evaluation in datalog path: catch-all branch did
  naive string contains instead of delegating to evaluate_predicate_datalog().
  Rules like matches (regex) were silently ignored, causing vacuous pass
  and letting violations through. Now delegates to evaluate_predicate_datalog()
  which handles all 12 rule types correctly.

Tests: 34 new tests covering all new rules, null handling, when conditions,
and the when+matches bugfix (butler rulespec pattern).
2026-02-07 16:38:27 +11:00
Dhanji R. Prasanna
328eecfcad fix: extract_facts fallback for facts-prefixed selectors in datalog verification
Root cause: ActionEnvelope.to_yaml_value() creates a Mapping from the
facts HashMap without a 'facts:' wrapper key, but rulespec selectors
may include a 'facts.' prefix (e.g. 'facts.feature.done' instead of
'feature.done'). This caused zero facts to be extracted, making all
predicate evaluations fail.

Fix: extract_facts() now tries the selector against the unwrapped
envelope value first, and if empty, retries against a facts-wrapped
version as fallback.

Also:
- Strengthened write_envelope tool description to require top-level
  facts: key, file paths for evidence, and allow free-form notes
- Updated system prompt with matching rules
- Added 6 new tests (4 unit, 2 integration)
- Strengthened existing integration test to verify fact count > 0
2026-02-07 14:42:39 +11:00
Dhanji R. Prasanna
b045d0c5e9 fix: reject write_envelope with empty facts
The write_envelope tool was silently accepting YAML without a 'facts:'
top-level key. serde would ignore unknown fields and default the facts
HashMap to empty, causing the predicate pipeline to always see no facts.

Now validates that envelope.facts is non-empty after deserialization,
returning a clear error with an example of the correct format.

Adds 6 tests covering valid/invalid/boundary deserialization cases.
2026-02-07 13:24:41 +11:00
Dhanji R. Prasanna
6c8e334793 chore: update workspace memory with datalog program generation notes 2026-02-07 12:41:37 +11:00
Dhanji R. Prasanna
51dfe71a2b fix: generate actual Soufflé datalog in .dl files instead of YAML
The rulespec compiler was writing serde_yaml::to_string(&compiled) into
rulespec.compiled.dl files — just YAML, not datalog at all.

Added format_datalog_program() that produces proper Soufflé-style datalog:
- .decl relation declarations (claim_value, claim_length, predicate_pass, predicate_fail)
- Fact assertions from the envelope
- Rules for all 9 predicate types (exists, not_exists, equals, contains,
  greater_than, less_than, min_length, max_length, matches)
- .output directives for query results

Updated verify_envelope() to call the new function instead of
serde_yaml::to_string(). Added 8 unit tests covering all rule types,
edge cases, and the butler rulespec example.
2026-02-07 12:33:50 +11:00
Dhanji R. Prasanna
5085f10717 Merge sessions/interactive/07eabd99 2026-02-07 12:29:56 +11:00
Dhanji R. Prasanna
afaee8816c tweak to system prompt 2026-02-06 20:32:19 +11:00
Dhanji R. Prasanna
14112ff92e Remove client-side plan approval interception
Let approval input flow through the LLM instead of being
short-circuited in the REPL. The LLM calls plan_approve
itself, which is cleaner (single input path) and more
flexible (no hardcoded misspelling list).
2026-02-06 20:16:11 +11:00
Dhanji R. Prasanna
799b4ced8e Remove auto-submit status prompt from /project command
The /project command was auto-invoking a status report ("what is the
current state of the project?") as the first user message after loading
project files. This was inconsistent with the --project flag behavior,
which only loads files and displays status without auto-prompting.

Removed the auto-submit lines so /project now behaves identically to
the --project CLI flag: load files, set context, display status, done.
2026-02-06 16:12:33 +11:00
Dhanji R. Prasanna
7032e75fc6 Add write_envelope tool with verify_envelope for explicit envelope creation
- New crates/g3-core/src/tools/envelope.rs with execute_write_envelope()
  and verify_envelope() (moved from shadow_datalog_verify in plan.rs)
- write_envelope accepts YAML facts, writes envelope.yaml to session dir,
  then runs datalog verification against analysis/rulespec.yaml in shadow mode
- plan_verify() now only checks envelope existence (no longer runs datalog)
- Tool count: 13 -> 14
- Updated system prompt to instruct agents to call write_envelope before
  marking last plan item done
- Updated integration tests to use write_envelope tool directly

Workflow: write_envelope -> verify_envelope -> datalog shadow artifacts
          plan_write(done) -> plan_verify -> checks envelope exists
2026-02-06 16:09:07 +11:00
Dhanji R. Prasanna
f7a240a99b refactor: decouple rulespec from plan_write, read from analysis/rulespec.yaml
- Remove rulespec parameter from plan_write tool definition and execution
- Remove rulespec compilation from plan_approve (no longer pre-compiles)
- Remove write_rulespec, get_rulespec_path, format_rulespec_yaml/markdown
  from invariants.rs; read_rulespec() now takes &Path working dir
- Remove save/load_compiled_rulespec, get_compiled_rulespec_path from datalog.rs
- Update shadow_datalog_verify() to compile on-the-fly from
  analysis/rulespec.yaml, writing rulespec.compiled.dl and
  datalog_evaluation.txt to session dir
- Remove rulespec display from plan_read output
- Remove Invariants/Rulespec section from native.md system prompt
- Remove rulespec from prompts.rs plan_write format and examples
- Update existing tests to remove rulespec from plan_write calls
- Add 3 integration tests for on-the-fly rulespec verification
2026-02-06 15:31:23 +11:00
Dhanji R. Prasanna
a93ce932a3 refactor: Clean up Cargo dependencies - remove unused, update outdated
- Remove unused const_format from g3-planner (never imported)
- Remove unused thiserror from workspace and 5 crates (declared but never used)
- Update termimad 0.31 -> 0.34 in studio (consistency with g3-cli)
- Update indicatif 0.17 -> 0.18 in g3-cli
- Update ratatui 0.29 -> 0.30 in g3-cli
- Update walkdir 2.4 -> 2.5 in g3-core
- Update image 0.24 -> 0.25 in g3-computer-control (macOS + Linux)
- Update config 0.14 -> 0.15 in workspace

Blocked: reqwest 0.11 -> 0.12/0.13 requires breaking API changes to
bytes_stream() used in 4 providers - needs separate migration effort.

All tests pass. No behavior changes.

Agent: fowler
2026-02-06 14:22:59 +11:00
Dhanji R. Prasanna
31bdcb651b feat(cli): add multiline input support with Alt+Enter
- Enable custom-bindings feature in rustyline
- Bind Alt+Enter to insert newlines in interactive and accumulative modes
- Update calculate_visual_lines() to handle embedded newlines correctly
- Add tests for multiline visual line calculation

Note: Shift+Enter is not distinguishable in standard terminals, so Alt+Enter
is used as the multiline input trigger.
2026-02-06 14:09:12 +11:00
Dhanji R. Prasanna
abfac197ab Add datalog-based invariant verification system
Implement a new datalog verification layer using datafrog that:

- Compiles rulespec to datalog on plan_approve
- Extracts facts from action envelope using selectors
- Executes datalog rules on plan_verify
- Writes evaluation results to datalog_evaluation.txt (shadow mode)

Key components:
- crates/g3-core/src/tools/datalog.rs: Full datalog module with:
  - compile_rulespec(): Validates and compiles rulespec
  - extract_facts(): Extracts facts from envelope YAML
  - execute_rules(): Runs datafrog iteration
  - 23 comprehensive tests

- crates/g3-core/src/tools/plan.rs:
  - execute_plan_approve(): Now compiles rulespec on approval
  - shadow_datalog_verify(): Runs datalog and writes to eval file

Results are written to .g3/sessions/<id>/datalog_evaluation.txt
for inspection, NOT injected into context window (shadow mode).
2026-02-06 13:50:54 +11:00
Dhanji R. Prasanna
bcd50190c6 Add explicit [plan mode] indicator to interactive prompt
- Change plan mode prompt from ' >> ' to ' [plan mode] >> ' for clarity
- Add magenta syntax highlighting for [plan mode] text in prompt
- Add tests for prompt highlighting behavior
2026-02-06 11:31:07 +11:00
Dhanji R. Prasanna
f35807b728 refactor: move research tools to loadable toolset
Migrate research and research_status tools from core tools to a
dynamically loadable toolset, following the same pattern as webdriver.

Changes:
- Add 'research' toolset to TOOLSET_REGISTRY in toolsets.rs
- Add create_research_tools() function with research and research_status
- Remove research tools from create_core_tools() in tool_definitions.rs
- Remove exclude_research field and with_research_excluded() from ToolConfig
- Update tests: core tools now 13 (was 15), added 3 research toolset tests

The agent must now call load_toolset('research') to use research tools.
This simplifies the default tool set and removes special-case logic for
the scout agent (which simply won't load the research toolset).
2026-02-06 11:17:32 +11:00
Dhanji R. Prasanna
cbced3390c feat: JIT-injectable toolsets with load_toolset tool
Implement dynamic tool loading system that allows tools to be loaded
on-demand rather than included in the default set.

Key changes:
- Add toolsets module with registry of loadable toolsets
- Add load_toolset tool that returns tool definitions for a named toolset
- Add <available_toolsets> section to system prompt
- Track loaded toolsets in Agent, extend tool definitions dynamically
- Move webdriver (15 tools) to JIT-only loading

Benefits:
- Leaner default context (fewer tokens consumed)
- On-demand loading when agent needs specialized tools
- Extensible registry for future toolsets
- Idempotent loading with helpful error messages

Files:
- crates/g3-core/src/toolsets.rs (new)
- crates/g3-core/src/tools/toolsets.rs (new)
- crates/g3-core/src/tool_definitions.rs
- crates/g3-core/src/tool_dispatch.rs
- crates/g3-core/src/prompts.rs
- crates/g3-core/src/lib.rs
- crates/g3-core/src/tools/executor.rs
2026-02-06 09:35:11 +11:00
Dhanji R. Prasanna
ff15db44c0 Restore research as first-class tool, remove research skill
Restores the research tool that was previously externalized as a skill:

- Add pending_research.rs: PendingResearchManager with thread-safe task tracking
- Add tools/research.rs: execute_research (async), execute_research_status
- Add research/research_status tool definitions with exclude_research config
- Integrate PendingResearchManager into Agent and ToolContext
- Inject completed research results in streaming loop

Remove research skill:
- Clear EMBEDDED_SKILLS array in embedded.rs
- Delete skills/research/ directory
- Update all tests expecting embedded research skill
- Update docs and memory to reflect the change

The research tool now:
- Spawns scout agent in background tokio task
- Returns immediately with research_id
- Automatically injects results into conversation when ready
- Supports status checks via research_status tool
2026-02-06 07:38:06 +11:00
Dhanji R. Prasanna
b673827076 Fix embedded skill loading: stop XML-escaping location paths
The <location> field in the skills XML prompt was being XML-escaped,
converting <embedded:research>/SKILL.md to &lt;embedded:research&gt;/SKILL.md.
When the LLM tried to use read_file with this escaped path, it would fail.

Changes:
- Remove escape_xml() call from location field in prompt.rs
- Add fallback handling for escaped paths in try_read_embedded_skill()
- Add tests for both prompt generation and read_file handling

Fixes embedded skill loading for agents like butler running outside the g3 repo.
2026-02-05 23:16:40 +11:00
Dhanji R. Prasanna
65b2ec368f Add Action Envelope section back to native prompt
Restored the Action Envelope instructions with a clear, complete example
showing how to write envelope.yaml for rulespec verification.
2026-02-05 22:27:29 +11:00
Dhanji R. Prasanna
3823f8b5f3 Optimize native system prompt - 48% size reduction
Removed redundant and vague content from prompts/system/native.md:
- Simplified intro from 17 lines to 3 lines
- Reduced Code Search section to one line
- Removed duplicate Plan Mode example (kept one)
- Removed Action Envelope section (rarely used correctly)
- Removed verbose Memory Format details (tool description covers it)
- Removed Response Guidelines (obvious to modern LLMs)

Size: 8,620 chars -> 4,498 chars

Also updated:
- G3_IDENTITY_LINE constant for agent mode compatibility
- Test assertions to check for new prompt markers
- System prompt validation to use new marker string
2026-02-05 22:16:34 +11:00
Dhanji R. Prasanna
d978032044 Remove redundant AGENTS.md heading from startup output
The loaded status line (✓ AGENTS.md ✓ Memory) already indicates that
AGENTS.md was loaded, so the separate '>> AGENTS.md - Machine Instructions'
heading line was redundant.

- Remove print_project_heading() function from display.rs
- Remove extract_project_heading call from interactive.rs
- Clean up unused imports
2026-02-05 21:38:47 +11:00
Dhanji R. Prasanna
c6df75d886 Fix shell tool output line clipping to account for suffix
The shell tool output line was wrapping because update_tool_output_line
clipped the content without reserving space for the suffix that gets
appended later (line count + timing info).

Added suffix_overhead of 30 chars for shell tools to reserve space for:
- " (9999 lines)" = ~13 chars
- " | 99999 ◉ 999ms" = ~17 chars

This ensures the complete line fits within terminal width without wrapping.
2026-02-05 21:23:00 +11:00
Dhanji R. Prasanna
7e2d9bc22c Enforce rulespec creation with plan_write for new plans
Solves the tautology problem where the LLM would write invariants after
implementation, making them match what was done rather than constrain it.

Changes:
- plan_write now accepts 'rulespec' parameter
- New plans REQUIRE rulespec (fails with helpful error if missing)
- Plan updates don't require rulespec (backward compatible)
- Rulespec is parsed, validated, and written atomically with plan
- Updated system prompt with clear examples for new vs update
- Updated tool definition schema
- Updated all affected tests

New flow: task → plan+rulespec → user reviews BOTH → approve → implement
2026-02-05 21:12:02 +11:00
Dhanji R. Prasanna
085688479b Improve terminal width responsiveness for tool output
Clip summary text and other long fields to fit terminal width:

- Clip display_summary in print_tool_compact (e.g., "47 lines (2.0k chars)")
- Account for header_suffix length when compressing paths in print_tool_output_header
- Clip TODO item lines in print_todo_compact
- Clip plan item descriptions, evidence, touches, checks, and paths in print_plan_compact
- Replace hardcoded 70/40 char limits with dynamic terminal-width-based clipping

All clipping uses clip_line() which handles UTF-8 safely and adds ellipsis.
2026-02-05 20:44:12 +11:00
Dhanji R. Prasanna
19162b1fe6 Exit plan mode when plan is completed or blocked
When a plan reaches a terminal state (all items done or blocked) in
interactive mode, automatically exit plan mode and return to normal
prompt.

Changes:
- Add Agent::is_plan_terminal() method to check if plan is complete
- Add check_and_exit_plan_mode_if_terminal() helper in interactive.rs
- Call the helper after each execute_user_input() to detect completion

Fixes issue where plan mode prompt ' >> ' persisted after plan completion.
2026-02-05 20:31:24 +11:00
Dhanji R. Prasanna
30627bce97 feat(cli): make tool output responsive to terminal width
- Add terminal_width module with get_terminal_width(), clip_line(),
  compress_path(), and compress_command() utilities
- Update ConsoleUiWriter to use dynamic terminal width for all tool output
- Tool output lines are clipped to fit without wrapping
- Tool headers use semantic compression (paths preserve filename,
  commands clip from right)
- 4-character right margin for visual clarity
- Minimum 40 columns, default 80 when terminal size unavailable
- All truncation is UTF-8 safe (char counting, not byte slicing)
- Add 13 unit tests for terminal width utilities
2026-02-05 20:18:30 +11:00
Dhanji R. Prasanna
b2fbcf33d0 Fix plan approval gate and add "Create a plan:" prefix for first message
- Fix build warnings: add #[allow(dead_code)] to unused deserialization fields
- Fix plan approval gate bug: block file changes when no plan exists (not just
  when plan exists but is unapproved)
- Add "Create a plan: " prefix to first user message in plan mode
- Add prepare_plan_mode_input() helper function for testability
- Reset is_first_plan_message flag when entering plan mode via /plan command
- Add tests for approval gate (no plan + no changes, no plan + changes)
- Add tests for prepare_plan_mode_input (happy, negative, boundary cases)
2026-02-05 19:43:38 +11:00
Dhanji R. Prasanna
06d75f613c feat(plan): display rulespec.yaml and envelope.yaml in plan_read/plan_write output
- Add format_envelope_markdown() function in invariants.rs for rich markdown
  formatting of ActionEnvelope facts
- Add format_yaml_value_markdown() helper for recursive YAML value display
- Update execute_plan_read() to append rulespec and envelope sections
- Update execute_plan_write() to append envelope section alongside rulespec
- Add 3 tests for format_envelope_markdown (empty, with facts, null values)

When plan_read or plan_write is called, the output now includes:
- Plan YAML (as before)
- Rulespec section (if rulespec.yaml exists) with invariants grouped by source
- Envelope section (if envelope.yaml exists) with facts in readable format

Missing files show placeholder text rather than errors.
2026-02-05 19:08:55 +11:00
Dhanji R. Prasanna
bc5c1bdf61 Fix plan UI formatting to handle Vec<Check> and display elegantly
- Update ChecksCompact to use Vec<CheckCompact> for negative/boundary fields
- Add progress bar visualization showing done/doing/blocked/todo counts
- Show evidence for done items, checks for active items
- Display all negative and boundary checks (not just first)
- Add proper tree structure with └/├ prefixes
- Truncate long descriptions and evidence paths
- Add file path display with 📄 icon
2026-02-05 14:38:18 +11:00
Dhanji R. Prasanna
e34f37fd47 Merge sessions/sdlc/3b6c6c3e into main
Resolved conflicts:
- analysis/memory.md: kept condensed documentation from incoming branch
- crates/g3-core/src/skills/embedded.rs: removed unused HashMap import, kept better doc comment

Additional fix:
- crates/g3-core/src/prompts.rs: updated test to match current prompt file content
2026-02-05 14:38:08 +11:00
Dhanji R. Prasanna
307f04fa25 chore: Compress workspace memory after research externalization
- Remove deleted code: pending_research.rs, tools/research.rs (externalized to skill)
- Merge duplicate Agent Skills entries into unified section
- Update SDLC state path: analysis/sdlc/ → .g3/sdlc/
- Remove G3Status.resuming() (deleted in 6228001)
- Tighten verbose descriptions throughout

Metrics: 444 → 325 lines (-27%), 23.6k → 17.0k chars (-28%)
Concepts preserved: all semantic information retained

Agent: huffman
2026-02-05 14:29:48 +11:00
Dhanji R. Prasanna
74c2671e1b docs: Update documentation for Agent Skills system
Document the new Skills system introduced in recent commits:

- docs/architecture.md: Add Skills System section with discovery
  priority, embedded skills, script extraction, and key types
- docs/skills.md: New comprehensive guide covering SKILL.md format,
  discovery priority, embedded skills, research skill usage, and
  troubleshooting
- README.md: Update Agent Skills section with correct priority order,
  add embedded skills info, research skill usage, and link to Skills
  Guide in Documentation Map
- AGENTS.md: Add skill creation to Adding Features, skill extraction
  to Dangerous Code Paths, and new Skills System Entry Points section

All documentation links validated - no broken links or orphan files.

Agent: lamport
2026-02-05 14:26:26 +11:00