Commit Graph

21 Commits

Author SHA1 Message Date
Dhanji R. Prasanna
7032e75fc6 Add write_envelope tool with verify_envelope for explicit envelope creation
- New crates/g3-core/src/tools/envelope.rs with execute_write_envelope()
  and verify_envelope() (moved from shadow_datalog_verify in plan.rs)
- write_envelope accepts YAML facts, writes envelope.yaml to session dir,
  then runs datalog verification against analysis/rulespec.yaml in shadow mode
- plan_verify() now only checks envelope existence (no longer runs datalog)
- Tool count: 13 -> 14
- Updated system prompt to instruct agents to call write_envelope before
  marking last plan item done
- Updated integration tests to use write_envelope tool directly

Workflow: write_envelope -> verify_envelope -> datalog shadow artifacts
          plan_write(done) -> plan_verify -> checks envelope exists
2026-02-06 16:09:07 +11:00
Dhanji R. Prasanna
f7a240a99b refactor: decouple rulespec from plan_write, read from analysis/rulespec.yaml
- Remove rulespec parameter from plan_write tool definition and execution
- Remove rulespec compilation from plan_approve (no longer pre-compiles)
- Remove write_rulespec, get_rulespec_path, format_rulespec_yaml/markdown
  from invariants.rs; read_rulespec() now takes &Path working dir
- Remove save/load_compiled_rulespec, get_compiled_rulespec_path from datalog.rs
- Update shadow_datalog_verify() to compile on-the-fly from
  analysis/rulespec.yaml, writing rulespec.compiled.dl and
  datalog_evaluation.txt to session dir
- Remove rulespec display from plan_read output
- Remove Invariants/Rulespec section from native.md system prompt
- Remove rulespec from prompts.rs plan_write format and examples
- Update existing tests to remove rulespec from plan_write calls
- Add 3 integration tests for on-the-fly rulespec verification
2026-02-06 15:31:23 +11:00
Dhanji R. Prasanna
f35807b728 refactor: move research tools to loadable toolset
Migrate research and research_status tools from core tools to a
dynamically loadable toolset, following the same pattern as webdriver.

Changes:
- Add 'research' toolset to TOOLSET_REGISTRY in toolsets.rs
- Add create_research_tools() function with research and research_status
- Remove research tools from create_core_tools() in tool_definitions.rs
- Remove exclude_research field and with_research_excluded() from ToolConfig
- Update tests: core tools now 13 (was 15), added 3 research toolset tests

The agent must now call load_toolset('research') to use research tools.
This simplifies the default tool set and removes special-case logic for
the scout agent (which simply won't load the research toolset).
2026-02-06 11:17:32 +11:00
Dhanji R. Prasanna
cbced3390c feat: JIT-injectable toolsets with load_toolset tool
Implement dynamic tool loading system that allows tools to be loaded
on-demand rather than included in the default set.

Key changes:
- Add toolsets module with registry of loadable toolsets
- Add load_toolset tool that returns tool definitions for a named toolset
- Add <available_toolsets> section to system prompt
- Track loaded toolsets in Agent, extend tool definitions dynamically
- Move webdriver (15 tools) to JIT-only loading

Benefits:
- Leaner default context (fewer tokens consumed)
- On-demand loading when agent needs specialized tools
- Extensible registry for future toolsets
- Idempotent loading with helpful error messages

Files:
- crates/g3-core/src/toolsets.rs (new)
- crates/g3-core/src/tools/toolsets.rs (new)
- crates/g3-core/src/tool_definitions.rs
- crates/g3-core/src/tool_dispatch.rs
- crates/g3-core/src/prompts.rs
- crates/g3-core/src/lib.rs
- crates/g3-core/src/tools/executor.rs
2026-02-06 09:35:11 +11:00
Dhanji R. Prasanna
ff15db44c0 Restore research as first-class tool, remove research skill
Restores the research tool that was previously externalized as a skill:

- Add pending_research.rs: PendingResearchManager with thread-safe task tracking
- Add tools/research.rs: execute_research (async), execute_research_status
- Add research/research_status tool definitions with exclude_research config
- Integrate PendingResearchManager into Agent and ToolContext
- Inject completed research results in streaming loop

Remove research skill:
- Clear EMBEDDED_SKILLS array in embedded.rs
- Delete skills/research/ directory
- Update all tests expecting embedded research skill
- Update docs and memory to reflect the change

The research tool now:
- Spawns scout agent in background tokio task
- Returns immediately with research_id
- Automatically injects results into conversation when ready
- Supports status checks via research_status tool
2026-02-06 07:38:06 +11:00
Dhanji R. Prasanna
7e2d9bc22c Enforce rulespec creation with plan_write for new plans
Solves the tautology problem where the LLM would write invariants after
implementation, making them match what was done rather than constrain it.

Changes:
- plan_write now accepts 'rulespec' parameter
- New plans REQUIRE rulespec (fails with helpful error if missing)
- Plan updates don't require rulespec (backward compatible)
- Rulespec is parsed, validated, and written atomically with plan
- Updated system prompt with clear examples for new vs update
- Updated tool definition schema
- Updated all affected tests

New flow: task → plan+rulespec → user reviews BOTH → approve → implement
2026-02-05 21:12:02 +11:00
Dhanji R. Prasanna
39e586982c feat: Externalize research tool as embedded skill
Replaces the built-in research/research_status tools with a portable
skill-based approach:

- Add embedded skills infrastructure (skills compiled into binary)
- Add repo-local skills/ directory support (highest priority)
- Create research skill with SKILL.md and g3-research shell script
- Script extraction to .g3/bin/ with version tracking
- Filesystem-based handoff via .g3/research/<id>/status.json
- Remove PendingResearchManager and all research tool code
- Update system prompt to reference skill instead of tool

Benefits:
- No special tool infrastructure needed (just shell + read_file)
- Context-efficient (reports stay on disk until needed)
- Crash-resilient (state persisted to filesystem)
- Portable (skill can be overridden per-workspace)

Breaking change: research tool calls now return a deprecation message
pointing to the research skill.
2026-02-05 13:23:26 +11:00
Dhanji R. Prasanna
8bbaf6f02e Tighten system prompt and tool definitions
Prompt changes (native.md):
- Remove duplicate 'Temporary files' section
- Consolidate 'remember' instructions into single authoritative location
- Remove motivational 'Benefits' list from Plan Mode
- Add 'Code Search Tool Selection' guidance (code_search vs rg)

Tool changes (tool_definitions.rs, tool_dispatch.rs):
- Remove screenshot tool (webdriver_screenshot remains)
- Remove coverage tool
- Reduce plan_write description from 22 lines to 1 line
- Update tool count tests (16 -> 14 core tools)

Net result: ~6 lines removed from prompt, ~56 lines removed from
tool definitions, clearer tool selection guidance added.
2026-02-05 12:36:49 +11:00
Dhanji R. Prasanna
1f1a517620 feat(plan): support multiple negative and boundary checks
Change Plan Mode to allow multiple negative and boundary checks per item,
while keeping happy path as a single check.

Schema change:
- checks.negative: Check -> Vec<Check> (>=1 required)
- checks.boundary: Check -> Vec<Check> (>=1 required)
- checks.happy: Check (unchanged, single)

This better reflects real-world tasks where there are often multiple
error conditions and edge cases worth tracking.

Changes:
- Update Checks struct to use Vec<Check> for negative/boundary
- Update validation to require at least 1 of each
- Update prompts and tool definitions with new array syntax
- Add 4 new tests for multi-check scenarios
2026-02-05 11:36:45 +11:00
Dhanji R. Prasanna
a63950d8f5 Add Plan Mode to replace TODO system
Plan Mode is a cognitive forcing system that requires reasoning about:
- Happy path
- Negative case
- Boundary condition

New tools:
- plan_read: Read current plan for session
- plan_write: Create/update plan with YAML content (validates structure)
- plan_approve: Mark current revision as approved

New command:
- /feature <description>: Start Plan Mode for a new feature

Plan schema requires:
- plan_id, revision, approved_revision
- items with id, description, state, touches, checks (happy/negative/boundary)
- evidence and notes required when marking items done

Verification:
- plan_verify() called automatically when all items are done/blocked

Removed:
- todo_read, todo_write tools
- todo.rs module and related tests
2026-02-02 14:38:25 +11:00
Dhanji R. Prasanna
5ab1598e03 feat: async research tool - runs in background, returns immediately
The research tool now spawns the scout agent in a background tokio task
and returns immediately with a research_id placeholder. This allows the
agent to continue working while research runs (30-120 seconds).

Key changes:
- New PendingResearchManager for tracking async research tasks
- research tool returns immediately with placeholder containing research_id
- research_status tool to check progress of pending research
- Auto-injection of completed research at natural break points:
  - Start of each tool iteration (before LLM call)
  - Before prompting user in interactive mode
- /research CLI command to list all research tasks
- Updated system prompt to explain async behavior

The agent can:
- Continue with other work while research runs
- Check status with research_status tool
- Yield turn to user if results are critical before continuing
2026-01-30 13:00:02 +11:00
Dhanji R. Prasanna
a34a3b08e9 Rename Project Memory to Workspace Memory
Rename all references from "Project Memory" to "Workspace Memory" to avoid
future conflation if a "project" concept is introduced later.

Changes:
- Rename read_project_memory() -> read_workspace_memory()
- Update all prompts, tool descriptions, and comments
- Update header parsing in memory.rs to use "# Workspace Memory"
- Update display detection for "=== Workspace Memory ==="
- Update documentation and analysis/memory.md

11 files changed, ~36 occurrences updated.
2026-01-21 14:08:42 +05:30
Dhanji R. Prasanna
9ef064a041 Add guidance to shell tool description to avoid unnecessary cd prefixes
LLMs were prefixing shell commands with `cd <workspace> &&` unnecessarily,
wasting tokens and cluttering CLI display. Added clear guidance in the
shell tool description that commands already execute in the working directory.
2026-01-14 19:00:53 +05:30
Dhanji R. Prasanna
dea0e6b1ca Compact tool output improvements
- Rename take_screenshot -> screenshot, code_coverage -> coverage (shorter names)
- Align | character across all compact tools (pad to 11 chars for str_replace)
- Make code_search a compact tool with summary display
- Show language and search name in code_search output (e.g., rust:"find structs")
- Add format_code_search_summary() to extract match/file counts from JSON response
2026-01-14 08:12:50 +05:30
Dhanji R. Prasanna
151b8c4658 Add Racket tree-sitter support, remove Kotlin
- Add tree-sitter-racket dependency (v0.24)
- Initialize Racket parser in code search
- Add .rkt, .rktl, .rktd file extensions
- Add test_racket_search test
- Remove Kotlin from supported languages (was disabled)
- Clean up duplicate test files

Supported languages: Rust, Python, JavaScript, TypeScript, Go, Java, C, C++, Racket
2026-01-13 18:44:59 +05:30
Dhanji R. Prasanna
f415dbb84b Fix ACD turn summary loss and add /dump command
ACD (Aggressive Context Dehydration) fixes:
- Fixed dehydrate_context() to extract turn summary from context window
  instead of using the passed-in final_response (which contained only
  the timing footer, not the actual LLM response)
- Removed final_response parameter from dehydrate_context() since it
  now self-extracts the last assistant message as the summary
- This ensures the actual turn summary is preserved after dehydration,
  not just the timing footer

New /dump command:
- Added /dump command to dump entire context window to tmp/ for debugging
- Shows message index, role, kind, content length, and full content
- Available in both console and machine modes

UTF-8 safety:
- Fixed truncate_to_word_boundary() to use character indices instead of
  byte indices, preventing panics on multi-byte UTF-8 characters
- Added UTF-8 string slicing guidance to AGENTS.md

Agent: g3
2026-01-12 05:13:02 +05:30
Dhanji R. Prasanna
1090e30d6c Simplify system prompt: remove coding style and parallel tool call sections
- Remove IMPORTANT FOR CODING section (~1,500 chars of coding guidelines)
- Remove <use_parallel_tool_calls> block (~500 chars)
- Remove unused const_format dependency from g3-core
- Simplify get_system_prompt_for_native() to just return base prompt
- Response Guidelines now cleanly ends the static prompt

Prompt reduced from ~8,500 to ~6,500 characters.
2026-01-11 06:35:18 +08:00
Dhanji R. Prasanna
33e5705fc3 Add research tool for web-based research via scout agent
New tool that spawns a scout agent to perform web research and return
a structured research brief. The scout agent uses webdriver to browse
the web and returns a decision-ready report.

Changes:
- Added 'research' tool definition (12 core tools total)
- Added research tool dispatch in tool_dispatch.rs
- Created tools/research.rs implementation:
  - Spawns 'g3 --agent scout <query>' as subprocess
  - Captures stdout and extracts last line (report file path)
  - Reads and returns the report file contents
- Added exclude_research flag to ToolConfig
- Scout agent (agent_name == 'scout') does NOT have access to research
  tool to prevent infinite recursion
- Updated system prompts to describe when to use research tool
- Added scout.md agent prompt with research brief output contract

The research tool is preferred for complex research tasks (APIs, SDKs,
libraries, approaches, bugs). WebDriver can still be used directly for
simple lookups or fine-grained control.
2026-01-09 15:59:19 +11:00
Dhanji R. Prasanna
777191b3cb Remove final_output tool - let summaries stream naturally
- Remove final_output from tool definitions, dispatch, and misc tools
- Update system prompts to request summaries as regular markdown text
- Remove print_final_output from UiWriter trait and all implementations
- Remove final_output handling from agent core logic
- Rename final_output_summary → summary in session continuation
- Delete final_output test files
- Update tool count tests (12→11, 27→26)

This allows LLM summaries to stream through the markdown formatter
for a more natural, responsive user experience instead of buffering
everything into a tool call.
2026-01-09 14:57:24 +11:00
Dhanji R. Prasanna
386176899e Remove vision tools (except take_screenshot) and macax tools
Vision tools removed:
- extract_text (OCR from image files)
- extract_text_with_boxes (OCR with bounding boxes)
- vision_find_text (find text in app windows)
- vision_click_text (find and click on text)
- vision_click_near_text (click near text labels)

macax tools removed:
- macax_list_apps
- macax_get_frontmost_app
- macax_activate_app
- macax_press_key
- macax_type_text

The LLM can now read images directly via read_image tool.
take_screenshot is retained for capturing application windows.

Files deleted:
- crates/g3-core/src/tools/vision.rs
- crates/g3-core/src/tools/macax.rs
- docs/macax-tools.md

Updated tool counts: 12 core + 15 webdriver = 27 total
2026-01-03 17:38:25 +11:00
Dhanji R. Prasanna
4c25e43ee4 refactoring 2025-12-26 15:16:12 +11:00