Commit Graph

807 Commits

Author SHA1 Message Date
Dhanji R. Prasanna
25ad198b83 Sync agent plan mode state on CLI startup
CLI starts in plan mode by default (when not in agent mode), but was not
calling agent.set_plan_mode(true) at initialization. This meant the gate
check would not run until the user explicitly entered plan mode via /plan.
2026-02-05 11:47:38 +11:00
Dhanji R. Prasanna
b86901a86b Merge sessions/interactive/47299e3b 2026-02-05 11:47:24 +11:00
Dhanji R. Prasanna
3d3f68e6da Externalize native system prompt to markdown file
- Move system prompt for native tool calling models to prompts/system/native.md
- Use include_str! to embed at compile time
- Remove concatenated SHARED_* string constants
- Prompt is now readable/editable as a complete markdown document
- Non-native prompt still uses Rust constants (acceptable for now)
2026-02-05 11:46:49 +11:00
Dhanji R. Prasanna
0f919237ea Make plan approval gate only active in plan mode
- Add in_plan_mode flag to Agent struct
- Add set_plan_mode() and is_plan_mode() methods
- Gate check now only runs when in_plan_mode is true
- CLI calls set_plan_mode(true) on /plan command and EnterPlanMode
- CLI calls set_plan_mode(false) on approval and CTRL-D exit
- Update integration test to enable plan mode
- Fix test YAML to use Vec<Check> for negative/boundary checks
2026-02-05 11:41:52 +11:00
Dhanji R. Prasanna
3d284b8b60 Merge sessions/interactive/179ac8a6 2026-02-05 11:37:07 +11:00
Dhanji R. Prasanna
1f1a517620 feat(plan): support multiple negative and boundary checks
Change Plan Mode to allow multiple negative and boundary checks per item,
while keeping happy path as a single check.

Schema change:
- checks.negative: Check -> Vec<Check> (>=1 required)
- checks.boundary: Check -> Vec<Check> (>=1 required)
- checks.happy: Check (unchanged, single)

This better reflects real-world tasks where there are often multiple
error conditions and edge cases worth tracking.

Changes:
- Update Checks struct to use Vec<Check> for negative/boundary
- Update validation to require at least 1 of each
- Update prompts and tool definitions with new array syntax
- Add 4 new tests for multi-check scenarios
2026-02-05 11:36:45 +11:00
Dhanji R. Prasanna
41839b909e Remove stray test file 2026-02-05 11:34:15 +11:00
Dhanji R. Prasanna
c347a73cbd Add plan approval gate to block file changes without approved plan
- Add check_plan_approval_gate() in tools/plan.rs that runs after each tool call
- Detects file changes via git status --porcelain when plan exists but not approved
- Reverts changes: git checkout for modified files, rm for new untracked files
- Returns blocking message instructing LLM to create/approve plan first
- Add ApprovalGateResult enum with Allowed/Blocked/NotGitRepo variants
- Add set_session_id() and set_working_dir() methods on Agent for testing
- Add integration test using MockProvider to simulate blocked write_file
2026-02-05 11:34:10 +11:00
Dhanji R. Prasanna
add8060526 Add studio sdlc command for SDLC maintenance pipeline
Implements a pipeline that orchestrates 7 g3 agents in sequence:
1. euler - dependency graph and hotspots analysis
2. breaker - whitebox exploration and edge-case discovery
3. hopper - deep testing and regression integrity
4. fowler - refactoring to deduplicate and reduce complexity
5. carmack - in-place rewriting for readability and concision
6. lamport - human-readable documentation and validation
7. huffman - semantic compression of memory

Features:
- Commit cursor tracking (--from flag to set starting point)
- Crash recovery (resumes from last incomplete stage)
- Git worktree isolation for all pipeline work
- Visual pipeline display with status icons
- Summary generation saved to .g3/sessions/sdlc/
- Pipeline state persisted to analysis/sdlc/pipeline.json

CLI:
- studio sdlc run [-c N] [--from COMMIT]
- studio sdlc status
- studio sdlc reset

Also adds huffman agent to embedded agents list.
2026-02-05 10:46:10 +11:00
Dhanji R. Prasanna
fdb1255f02 Add --resume <session-id> flag for explicit session resumption
- Add --resume CLI flag that conflicts with --new-session
- Add load_continuation_by_id() to load sessions by full or partial ID
- Support loading from latest.json or falling back to session.json
- Handle --resume in both normal and agent modes
- Agent mode validates session belongs to correct agent
2026-02-05 10:23:39 +11:00
Dhanji R. Prasanna
3046f0dd6e feat: Add invariants system for Plan Mode verification
Adds rulespec.yaml and envelope.yaml support for machine-readable
invariant checking during plan completion.

- Add invariants module with Rulespec, ActionEnvelope, and evaluation logic
- Add Invariants section to system prompt with workflow instructions
- Show rulespec/envelope file status in plan verification output
- Rulespec written during planning (captures constraints from task)
- Envelope written after implementation (documents what was built)
2026-02-04 20:49:58 +11:00
Dhanji R. Prasanna
a5f6475603 feat: implement Agent Skills specification support
Implements the Agent Skills specification (https://agentskills.io) for
portable skill packages that give the agent new capabilities.

Changes:
- Add skills module with SKILL.md parser (YAML frontmatter + markdown body)
- Implement skill discovery from ~/.g3/skills/, config extra_paths, and .g3/skills/
- Generate <available_skills> XML for system prompt injection
- Add SkillsConfig to g3-config with enabled flag and extra_paths
- Wire skills discovery into CLI startup
- Add 29 unit tests for parser, discovery, and prompt generation
- Update README with Agent Skills documentation

Skill locations (priority order):
1. ~/.g3/skills/ (global)
2. Config extra_paths
3. .g3/skills/ (workspace, highest priority)

At startup, g3 scans skill directories and injects a summary into the
system prompt. When the agent needs a skill, it reads the full SKILL.md
using the read_file tool.
2026-02-04 12:58:57 +11:00
Dhanji R. Prasanna
95d9847354 Update dependency analysis artifacts with detailed evidence
- hotspots.md: Added specific dependent file lists for each hotspot
- hotspots.md: Added cross-crate coupling points table
- hotspots.md: Added crate-level coupling scores
- limitations.md: Expanded coverage of unobservable patterns
- limitations.md: Added confidence levels for inferences
- limitations.md: Added extraction method details table

Agent: euler
2026-02-02 17:20:15 +11:00
Dhanji R. Prasanna
263a838d31 Remove redundant 'No plan exists' message from plan_read output
The UI already shows 'empty' via print_plan_compact, so returning an
empty string avoids duplicate output.
2026-02-02 17:19:01 +11:00
Dhanji R. Prasanna
e332109273 Auto-approve plans in non-interactive (autonomous/one-shot) mode
- Add auto-approval logic in execute_plan_write() when ctx.is_autonomous is true
- Update system prompt to document auto-approval behavior
- Plans still require explicit approval in interactive mode
2026-02-02 17:16:21 +11:00
Dhanji R. Prasanna
0aead8d86d fix: Enable compact UI output for plan_approve tool
Added plan_approve to the compact tool list in format_tool_result_summary()
so it displays in the same format as other tools like read_file and write_file.

The format_plan_approve_summary() function already existed but was never
called because plan_approve was missing from the matches! block.
2026-02-02 17:06:10 +11:00
Dhanji R. Prasanna
f8448e5622 feat: Plan Mode interactive flow with approval shortcuts
- Start g3 in plan mode with ' >>' prompt and welcome message
- Add is_approval_input() to detect 'approve', 'a', 'yes', etc. and misspellings
- Allow trailing punctuation (!, ., ,) on approval words
- Call plan_approve tool directly without LLM when approval detected
- Add synthetic assistant message after approval for LLM context
- Exit plan mode after successful approval, return to 'g3>' prompt
- CTRL-D in plan mode exits plan mode first, then exits g3
- /plan command enters plan mode and shows welcome message
- Agent mode (--agent) does not start in plan mode
- Add CommandResult enum to signal plan mode entry from commands
2026-02-02 16:59:52 +11:00
Dhanji R. Prasanna
9024f693fa Fix plan tool UI formatting
- Fix vertical bar continuation: │ continues all the way down, only the
  very last sub-line (boundary of last item) gets └
- Add visual gap before plan file path and change 📄 to ->
- Dedent file path to align with tree root
- Fix plan_approve to use proper compact tool format (was missing from
  is_compact_tool matches! in print_tool_compact, causing it to fall
  through to regular output with | prefix)
2026-02-02 16:29:37 +11:00
Dhanji R. Prasanna
e893794029 Rename /feature command to /plan
- Update command matching from /feature to /plan in commands.rs
- Update help text, usage message, and example
- Update workspace memory references
- /feature is no longer recognized (completely removed)
2026-02-02 16:00:09 +11:00
Dhanji R. Prasanna
8705228fda Fix input formatter bugs: apostrophe highlighting and line duplication
Fixes two bugs in the input formatter:

1. Single/double quote regex now requires word boundaries:
   - Contractions like it's, don't, won't no longer trigger highlighting
   - Only properly quoted text like 'special' or "hello" gets cyan
   - Mixed input like "it's a 'test' case" only highlights 'test'

2. Visual line calculation fix for exact terminal width:
   - When text exactly fills terminal width, cursor wraps to next line
   - Added +1 adjustment to account for this edge case
   - Extracted calculate_visual_lines() for testability

Added 9 new tests covering all edge cases.
2026-02-02 15:54:38 +11:00
Dhanji R. Prasanna
571188305a feat: add compact UI output for Plan Mode tools
Plan tools (plan_read, plan_write) now display with elegant tree-style
formatting similar to the old todo_write UI:

- State indicators: □ (todo), ◐ (doing), ■ (done), ⊘ (blocked)
- Tree prefixes (├/└) for items with child details
- Strikethrough for completed items
- Shows touches and all three checks (happy/negative/boundary)
- Displays plan file path link at the end

plan_approve uses compact single-line format like read_file:
- Shows approval status and revision number
- Handles already-approved and error cases

Changes:
- Add print_plan_compact() to UiWriter trait with default impl
- Implement print_plan_compact() in ConsoleUiWriter
- Call print_plan_compact() from execute_plan_read/write
- Add plan_read/plan_write to is_self_handled_tool()
- Add plan_approve to is_compact_tool() with format_plan_approve_summary()
- Add serde_yaml dependency to g3-cli
2026-02-02 15:30:05 +11:00
Dhanji R. Prasanna
d6b7177107 Implement plan_verify() for deterministic evidence validation
Adds a verification system that checks evidence in completed plan items:

- Evidence parsing: supports code locations (file:line, file:line-line, file only)
  and test references (file::test_name)
- Code location verification: checks file exists, validates line numbers in range
- Test reference verification: checks test file exists, searches for fn pattern
- Verification results: Verified, Warning, Error, Skipped statuses
- Loud output formatting with emoji indicators for warnings/errors
- Integration with execute_plan_write(): runs when plan is complete and approved
- 12 new unit tests covering parsing and verification

Warnings are advisory (don't block), errors are loud but also don't block.
Blocked items are skipped during verification.
2026-02-02 15:15:03 +11:00
Dhanji R. Prasanna
a63950d8f5 Add Plan Mode to replace TODO system
Plan Mode is a cognitive forcing system that requires reasoning about:
- Happy path
- Negative case
- Boundary condition

New tools:
- plan_read: Read current plan for session
- plan_write: Create/update plan with YAML content (validates structure)
- plan_approve: Mark current revision as approved

New command:
- /feature <description>: Start Plan Mode for a new feature

Plan schema requires:
- plan_id, revision, approved_revision
- items with id, description, state, touches, checks (happy/negative/boundary)
- evidence and notes required when marking items done

Verification:
- plan_verify() called automatically when all items are done/blocked

Removed:
- todo_read, todo_write tools
- todo.rs module and related tests
2026-02-02 14:38:25 +11:00
Dhanji R. Prasanna
7fc9eb0778 Fix doc-test failure in GLM adapter
Use quadruple backticks for outer code fence to properly escape
the nested code fence example showing JSON format.
2026-01-30 14:53:04 +11:00
Dhanji R. Prasanna
afc5bc8574 Readability improvements across streaming_parser, input_formatter, commands
- streaming_parser.rs: Reduced ~70 lines by removing redundant comments,
  consolidating doc comments, using slice syntax for TOOL_CALL_PATTERNS
- input_formatter.rs: Lazy regex compilation via once_cell (performance),
  cleaner function structure, reduced comment noise
- commands.rs: Extracted format_research_task_summary() and
  format_research_report_header() helpers, reduced ~40 lines of duplication
- pending_research.rs: Fixed 2 unused variable warnings in tests

All changes are behavior-preserving. 446 tests pass.

Agent: carmack
2026-01-30 14:48:08 +11:00
Dhanji R. Prasanna
51f12769d5 Merge sessions/hopper/297c7be9 2026-01-30 14:30:53 +11:00
Dhanji R. Prasanna
58bbfde6f4 test: add integration tests for streaming parser stuttering bug fix
Add characterization tests for the streaming parser stuttering bug fix (fa3c920).
These tests verify that when an LLM "stutters" and emits incomplete tool call
fragments followed by complete tool calls, the parser:

1. Does not get stuck waiting for the incomplete fragment to complete
2. Successfully parses complete tool calls that appear after the fragment

Tests cover:
- The exact pattern from butler session butler_c6ab59af2e4f991c
- Edge cases that should NOT trigger invalidation (nested JSON, patterns in strings)
- Recovery behavior after reset
- Multiple complete tool calls
- Boundary conditions (chunk boundaries, minimal patterns)

Agent: hopper
2026-01-30 14:30:27 +11:00
Dhanji R. Prasanna
3003bdebaa refactor: fix flaky test and remove dead code in recent commits
Fixes issues in the last 11 commits:

1. pending_research.rs: Fix flaky test_generate_id_uniqueness
   - Replaced random u16 suffix with atomic counter for guaranteed uniqueness
   - The timestamp+random approach could collide when generating IDs rapidly
   - Now uses static AtomicU32 counter that increments monotonically

2. embedded/adapters/glm.rs: Remove unused in_code_fence field
   - Field was written but never read (dead code)
   - Removed from struct definition, constructor, and reset()

3. embedded/adapters/glm.rs: Fix orphaned tests
   - Two tests (test_strip_code_fences, test_code_fenced_tool_call) were
     outside the #[cfg(test)] mod tests block
   - Moved closing brace to include them in the test module

All 446 library tests pass.

Agent: fowler
2026-01-30 14:28:43 +11:00
Dhanji R. Prasanna
6bb07ce4f5 Merge sessions/interactive/3c2a09df 2026-01-30 14:20:12 +11:00
Dhanji R. Prasanna
f1a5241777 Add /research <id> and /research latest commands
Allow users to view research reports directly from the CLI:

- /research - List all research tasks (unchanged)
- /research <id> - View the full report for a specific research task
- /research latest - View the most recent completed research report

Report display includes query, status, elapsed time, and full content.
2026-01-30 14:06:28 +11:00
Dhanji R. Prasanna
fa3c9203e0 Fix streaming parser bug: detect abandoned tool call fragments
When the LLM 'stutters' and emits incomplete tool call fragments like:
  {"tool": "shell", "args": {...}}
  {"tool":
  {"tool": "shell", "args": {...}}

The parser would get stuck waiting for the incomplete fragment to complete,
causing the entire response to be lost (no tool executed, no text displayed).

This was observed in butler session butler_c6ab59af2e4f991c where the user's
'send!' command produced no response.

Fix: Enhanced is_json_invalidated() to detect when a new tool call pattern
({"tool"}) appears after a newline while parsing an incomplete JSON fragment.
This indicates the previous fragment was abandoned and should be invalidated.

Safety:
- Tool patterns inside JSON strings (e.g., writing example code) are not
  affected because the check only runs outside strings
- Added tests for the stuttering pattern and the file-writing edge case
2026-01-30 14:00:18 +11:00
Dhanji R. Prasanna
f93d05f444 Add real-time research completion notifications
When background research completes, g3 now immediately prints a status
message instead of waiting for the next user interaction:

- Added ResearchCompletionNotification and broadcast channel to
  PendingResearchManager for push-based notifications
- Added spawn_research_notification_handler() in interactive mode that
  listens for completions in a background task
- When idle (at prompt): clears line, prints status, reprints prompt
- When busy (processing): prints status inline (interleaving is fine)
- Added G3Status::research_complete() for consistent formatting
- Added enable_research_notifications() method to Agent

Output format: "g3: 1 research report ... [done]"
2026-01-30 13:35:35 +11:00
Dhanji R. Prasanna
5428504777 Fix input formatting bugs: newline, line wrapping, and TTY check
Fixes three bugs in the input formatter introduced in 4e16942:

1. Bug 2 & 3 (missing newline, line duplication):
   - Changed print! to println! to add trailing newline
   - Calculate visual lines based on terminal width instead of
     logical line count, fixing duplication for wrapped lines

2. Bug 1 (^M on non-interactive prompts):
   - Added TTY check to skip formatting when stdout is not a terminal
   - Prevents terminal state corruption for stdin prompts
2026-01-30 13:28:31 +11:00
Dhanji R. Prasanna
b252ff443d Merge sessions/interactive/9681cb67 2026-01-30 13:01:00 +11:00
Dhanji R. Prasanna
5ab1598e03 feat: async research tool - runs in background, returns immediately
The research tool now spawns the scout agent in a background tokio task
and returns immediately with a research_id placeholder. This allows the
agent to continue working while research runs (30-120 seconds).

Key changes:
- New PendingResearchManager for tracking async research tasks
- research tool returns immediately with placeholder containing research_id
- research_status tool to check progress of pending research
- Auto-injection of completed research at natural break points:
  - Start of each tool iteration (before LLM call)
  - Before prompting user in interactive mode
- /research CLI command to list all research tasks
- Updated system prompt to explain async behavior

The agent can:
- Continue with other work while research runs
- Check status with research_status tool
- Yield turn to user if results are critical before continuing
2026-01-30 13:00:02 +11:00
Dhanji R. Prasanna
4e1694248f Add input formatting for interactive CLI
When users type prompts in interactive mode, the input is now
reformatted in place with enhanced highlighting:

- ALL CAPS words (2+ chars) become bold green (e.g., FIX, BUG, HTTP2)
- Quoted text ("..." or ...) becomes cyan
- Standard markdown formatting is also supported

New module: input_formatter.rs with 10 unit tests
Integrated into interactive.rs for both single-line and multiline input
2026-01-30 12:03:36 +11:00
Dhanji R. Prasanna
2e21502357 Fix --project flag not working in agent mode
- Add CommonFlags struct to group flags that apply across all modes
- Refactor run_agent_mode() to accept CommonFlags instead of individual params
- Add project loading logic for agent chat mode
- Add integration tests for --project with agent mode

This refactor prevents future bugs where new flags work in one mode
but are forgotten in another.
2026-01-30 11:28:48 +11:00
Dhanji R. Prasanna
51d22b3282 gemini model perf 2026-01-30 10:09:46 +11:00
Dhanji R. Prasanna
8191a5e8e6 feat(embedded): add GLM tool format adapter for code fence stripping
GLM-4 models wrap tool calls in markdown code fences and inline backticks,
which prevents the streaming parser from detecting them. This adapter:

- Strips ```json and ``` code fence markers during streaming
- Strips inline backticks from tool call JSON
- Handles chunked streaming correctly (buffers potential fence lines)
- Transforms GLM native format (<|assistant|>tool_name) to g3 JSON format

Also refactors embedded provider into module structure:
- embedded/mod.rs - module exports
- embedded/provider.rs - main EmbeddedProvider (moved from embedded.rs)
- embedded/adapters/mod.rs - ToolFormatAdapter trait
- embedded/adapters/glm.rs - GLM-specific adapter

Includes 22 unit tests covering edge cases like nested JSON in strings,
chunk boundary handling, and false pattern detection.

Updates README to show GLM-4 9B now works () for agentic tasks.
2026-01-29 12:52:09 +11:00
Dhanji R. Prasanna
457ba35f80 docs: Fix documentation accuracy and add missing Gemini provider
Corrections made:
- docs/architecture.md: Fix crate count from 9 to 8 (actual count)
- docs/tools.md: Fix code_search supported languages (kotlin -> haskell, scheme, racket)
- docs/CODE_SEARCH.md: Add missing Haskell and Scheme to supported languages list
- docs/providers.md: Add complete Gemini provider documentation section
- docs/configuration.md: Add Gemini configuration section

The Gemini provider (crates/g3-providers/src/gemini.rs) was fully implemented
but not documented. The code_search tool actually supports haskell and scheme
(via tree-sitter) but documentation incorrectly listed kotlin.

Agent: lamport
2026-01-29 12:06:53 +11:00
Dhanji R. Prasanna
f9e0b94cc1 tiny tweak 2026-01-29 12:02:11 +11:00
Dhanji R. Prasanna
853237e62e Update dependency analysis artifacts
Generated comprehensive static dependency analysis for g3 workspace:

- graph.json: 108 nodes (9 crates, 99 files), 186 edges
- graph.summary.md: Overview with metrics, entrypoints, fan-in/fan-out rankings
- sccs.md: No cycles detected (DAG structure confirmed)
- layers.observed.md: 4-layer crate hierarchy identified
- hotspots.md: ui_writer.rs (15 fan-in), agent_mode.rs (13 fan-out) as key nodes
- limitations.md: Documents extraction methodology and caveats

Updated AGENTS.md with artifact documentation table.

Agent: euler
2026-01-29 11:46:39 +11:00
Dhanji R. Prasanna
cba7d31996 Merge sessions/carmack/ee92b215 2026-01-29 11:40:48 +11:00
Dhanji R. Prasanna
d4941dc95a refactor(providers): improve readability of embedded.rs and gemini.rs
embedded.rs (937→789 lines, -16%):
- Extract duplicated inference setup into prepare_context() helper
- Extract stop sequence handling into find_stop_sequence() and truncate_at_stop_sequence()
- Add InferenceParams struct to consolidate request parameter extraction
- Add clear section markers for code organization
- Tests now use module-level format functions directly (no duplication)

gemini.rs:
- Extract common request building into build_request() method
- Reduces duplication between complete() and stream() methods

All 399 unit tests pass. Behavior unchanged.

Agent: carmack
2026-01-29 11:39:46 +11:00
Dhanji R. Prasanna
cb3c523edf Compact workspace memory: -7.5% size, all concepts preserved
Transformations applied:
- Fixed incorrect line numbers in Streaming Utilities (IterationState 65→166, StreamingState 17→16)
- Updated file sizes with verified byte counts (context_window.rs, streaming.rs, compaction.rs, acd.rs)
- Tightened verbose descriptions throughout
- Removed redundant "Format" column from Chat Template table
- Shortened download command (python3 -m huggingface_hub... → huggingface-cli)
- Collapsed "Known issues" log-style narrative in Embedded Provider
- Removed filler words and redundant explanations

Metrics: 224→212 lines (-5%), 12581→11630 chars (-7.5%)
All 26 semantic entries preserved.

Agent: huffman
2026-01-29 11:38:53 +11:00
Dhanji R. Prasanna
1bff9b0025 huffman tweak to cover more ground 2026-01-29 11:36:09 +11:00
Dhanji R. Prasanna
653c5f72ac Compact workspace memory: 402→224 lines (-44%), 22k→12.6k chars (-43%)
Merged duplicate entries:
- Context Window & Compaction + Context Compaction → unified section
- Streaming Markdown Formatter + Code Blocks → single entry
- CLI Argument Parsing + CLI Entry Points + CLI Module Structure → CLI Module Structure
- Auto-Memory Feature + Tool Call Tracking + Auto-Memory Reminder Format → Auto-Memory System
- Agent Mode folded into CLI Module Structure

Tightened verbose sections:
- UTF-8 pattern: removed 10-line code example, kept pattern + danger zones
- ACD Fragment Storage: replaced 15-line JSON with inline field list
- GLM-4 downloads: replaced 12-line bash with table + single download template

Entry count: 37 → 26 (-30%)
All char ranges, function names, and gotchas preserved.

Agent: huffman
2026-01-29 11:34:17 +11:00
Dhanji R. Prasanna
bd4473b75f model performance tweaks to readme 2026-01-29 11:31:29 +11:00
Dhanji R. Prasanna
1bff9d5dcc tiny tweaks to huffman 2026-01-29 11:31:17 +11:00
Dhanji R. Prasanna
7cf9c3b7bb Merge sessions/hopper/8e287188 2026-01-29 11:30:54 +11:00