alex/g3 - g3 - Millerson GIT hosting


Author	SHA1	Message	Date
Dhanji R. Prasanna	7032e75fc6	Add write_envelope tool with verify_envelope for explicit envelope creation - New crates/g3-core/src/tools/envelope.rs with execute_write_envelope() and verify_envelope() (moved from shadow_datalog_verify in plan.rs) - write_envelope accepts YAML facts, writes envelope.yaml to session dir, then runs datalog verification against analysis/rulespec.yaml in shadow mode - plan_verify() now only checks envelope existence (no longer runs datalog) - Tool count: 13 -> 14 - Updated system prompt to instruct agents to call write_envelope before marking last plan item done - Updated integration tests to use write_envelope tool directly Workflow: write_envelope -> verify_envelope -> datalog shadow artifacts; plan_write(done) -> plan_verify -> checks envelope exists	2026-02-06 16:09:07 +11:00
Dhanji R. Prasanna	f7a240a99b	refactor: decouple rulespec from plan_write, read from analysis/rulespec.yaml - Remove rulespec parameter from plan_write tool definition and execution - Remove rulespec compilation from plan_approve (no longer pre-compiles) - Remove write_rulespec, get_rulespec_path, format_rulespec_yaml/markdown from invariants.rs; read_rulespec() now takes &Path working dir - Remove save/load_compiled_rulespec, get_compiled_rulespec_path from datalog.rs - Update shadow_datalog_verify() to compile on-the-fly from analysis/rulespec.yaml, writing rulespec.compiled.dl and datalog_evaluation.txt to session dir - Remove rulespec display from plan_read output - Remove Invariants/Rulespec section from native.md system prompt - Remove rulespec from prompts.rs plan_write format and examples - Update existing tests to remove rulespec from plan_write calls - Add 3 integration tests for on-the-fly rulespec verification	2026-02-06 15:31:23 +11:00
Dhanji R. Prasanna	f35807b728	refactor: move research tools to loadable toolset Migrate research and research_status tools from core tools to a dynamically loadable toolset, following the same pattern as webdriver. Changes: - Add 'research' toolset to TOOLSET_REGISTRY in toolsets.rs - Add create_research_tools() function with research and research_status - Remove research tools from create_core_tools() in tool_definitions.rs - Remove exclude_research field and with_research_excluded() from ToolConfig - Update tests: core tools now 13 (was 15), added 3 research toolset tests The agent must now call load_toolset('research') to use research tools. This simplifies the default tool set and removes special-case logic for the scout agent (which simply won't load the research toolset).	2026-02-06 11:17:32 +11:00
Dhanji R. Prasanna	cbced3390c	feat: JIT-injectable toolsets with load_toolset tool Implement dynamic tool loading system that allows tools to be loaded on-demand rather than included in the default set. Key changes: - Add toolsets module with registry of loadable toolsets - Add load_toolset tool that returns tool definitions for a named toolset - Add <available_toolsets> section to system prompt - Track loaded toolsets in Agent, extend tool definitions dynamically - Move webdriver (15 tools) to JIT-only loading Benefits: - Leaner default context (fewer tokens consumed) - On-demand loading when agent needs specialized tools - Extensible registry for future toolsets - Idempotent loading with helpful error messages Files: - crates/g3-core/src/toolsets.rs (new) - crates/g3-core/src/tools/toolsets.rs (new) - crates/g3-core/src/tool_definitions.rs - crates/g3-core/src/tool_dispatch.rs - crates/g3-core/src/prompts.rs - crates/g3-core/src/lib.rs - crates/g3-core/src/tools/executor.rs	2026-02-06 09:35:11 +11:00
Dhanji R. Prasanna	ff15db44c0	Restore research as first-class tool, remove research skill Restores the research tool that was previously externalized as a skill: - Add pending_research.rs: PendingResearchManager with thread-safe task tracking - Add tools/research.rs: execute_research (async), execute_research_status - Add research/research_status tool definitions with exclude_research config - Integrate PendingResearchManager into Agent and ToolContext - Inject completed research results in streaming loop Remove research skill: - Clear EMBEDDED_SKILLS array in embedded.rs - Delete skills/research/ directory - Update all tests expecting embedded research skill - Update docs and memory to reflect the change The research tool now: - Spawns scout agent in background tokio task - Returns immediately with research_id - Automatically injects results into conversation when ready - Supports status checks via research_status tool	2026-02-06 07:38:06 +11:00
Dhanji R. Prasanna	7e2d9bc22c	Enforce rulespec creation with plan_write for new plans Solves the tautology problem where the LLM would write invariants after implementation, making them match what was done rather than constrain it. Changes: - plan_write now accepts 'rulespec' parameter - New plans REQUIRE rulespec (fails with helpful error if missing) - Plan updates don't require rulespec (backward compatible) - Rulespec is parsed, validated, and written atomically with plan - Updated system prompt with clear examples for new vs update - Updated tool definition schema - Updated all affected tests New flow: task → plan+rulespec → user reviews BOTH → approve → implement	2026-02-05 21:12:02 +11:00
Dhanji R. Prasanna	39e586982c	feat: Externalize research tool as embedded skill Replaces the built-in research/research_status tools with a portable skill-based approach: - Add embedded skills infrastructure (skills compiled into binary) - Add repo-local skills/ directory support (highest priority) - Create research skill with SKILL.md and g3-research shell script - Script extraction to .g3/bin/ with version tracking - Filesystem-based handoff via .g3/research/<id>/status.json - Remove PendingResearchManager and all research tool code - Update system prompt to reference skill instead of tool Benefits: - No special tool infrastructure needed (just shell + read_file) - Context-efficient (reports stay on disk until needed) - Crash-resilient (state persisted to filesystem) - Portable (skill can be overridden per-workspace) Breaking change: research tool calls now return a deprecation message pointing to the research skill.	2026-02-05 13:23:26 +11:00
Dhanji R. Prasanna	8bbaf6f02e	Tighten system prompt and tool definitions Prompt changes (native.md): - Remove duplicate 'Temporary files' section - Consolidate 'remember' instructions into single authoritative location - Remove motivational 'Benefits' list from Plan Mode - Add 'Code Search Tool Selection' guidance (code_search vs rg) Tool changes (tool_definitions.rs, tool_dispatch.rs): - Remove screenshot tool (webdriver_screenshot remains) - Remove coverage tool - Reduce plan_write description from 22 lines to 1 line - Update tool count tests (16 -> 14 core tools) Net result: ~6 lines removed from prompt, ~56 lines removed from tool definitions, clearer tool selection guidance added.	2026-02-05 12:36:49 +11:00
Dhanji R. Prasanna	1f1a517620	feat(plan): support multiple negative and boundary checks Change Plan Mode to allow multiple negative and boundary checks per item, while keeping happy path as a single check. Schema change: - checks.negative: Check -> Vec<Check> (>=1 required) - checks.boundary: Check -> Vec<Check> (>=1 required) - checks.happy: Check (unchanged, single) This better reflects real-world tasks where there are often multiple error conditions and edge cases worth tracking. Changes: - Update Checks struct to use Vec<Check> for negative/boundary - Update validation to require at least 1 of each - Update prompts and tool definitions with new array syntax - Add 4 new tests for multi-check scenarios	2026-02-05 11:36:45 +11:00
Dhanji R. Prasanna	a63950d8f5	Add Plan Mode to replace TODO system Plan Mode is a cognitive forcing system that requires reasoning about: - Happy path - Negative case - Boundary condition New tools: - plan_read: Read current plan for session - plan_write: Create/update plan with YAML content (validates structure) - plan_approve: Mark current revision as approved New command: - /feature <description>: Start Plan Mode for a new feature Plan schema requires: - plan_id, revision, approved_revision - items with id, description, state, touches, checks (happy/negative/boundary) - evidence and notes required when marking items done Verification: - plan_verify() called automatically when all items are done/blocked Removed: - todo_read, todo_write tools - todo.rs module and related tests	2026-02-02 14:38:25 +11:00
Dhanji R. Prasanna	5ab1598e03	feat: async research tool - runs in background, returns immediately The research tool now spawns the scout agent in a background tokio task and returns immediately with a research_id placeholder. This allows the agent to continue working while research runs (30-120 seconds). Key changes: - New PendingResearchManager for tracking async research tasks - research tool returns immediately with placeholder containing research_id - research_status tool to check progress of pending research - Auto-injection of completed research at natural break points: - Start of each tool iteration (before LLM call) - Before prompting user in interactive mode - /research CLI command to list all research tasks - Updated system prompt to explain async behavior The agent can: - Continue with other work while research runs - Check status with research_status tool - Yield turn to user if results are critical before continuing	2026-01-30 13:00:02 +11:00
Dhanji R. Prasanna	a34a3b08e9	Rename Project Memory to Workspace Memory Rename all references from "Project Memory" to "Workspace Memory" to avoid future conflation if a "project" concept is introduced later. Changes: - Rename read_project_memory() -> read_workspace_memory() - Update all prompts, tool descriptions, and comments - Update header parsing in memory.rs to use "# Workspace Memory" - Update display detection for "=== Workspace Memory ===" - Update documentation and analysis/memory.md 11 files changed, ~36 occurrences updated.	2026-01-21 14:08:42 +05:30
Dhanji R. Prasanna	9ef064a041	Add guidance to shell tool description to avoid unnecessary cd prefixes LLMs were prefixing shell commands with `cd <workspace> &&` unnecessarily, wasting tokens and cluttering CLI display. Added clear guidance in the shell tool description that commands already execute in the working directory.	2026-01-14 19:00:53 +05:30
Dhanji R. Prasanna	dea0e6b1ca	Compact tool output improvements - Rename take_screenshot -> screenshot, code_coverage -> coverage (shorter names) - Align the | character across all compact tools (pad to 11 chars for str_replace) - Make code_search a compact tool with summary display - Show language and search name in code_search output (e.g., rust:"find structs") - Add format_code_search_summary() to extract match/file counts from JSON response	2026-01-14 08:12:50 +05:30
Dhanji R. Prasanna	151b8c4658	Add Racket tree-sitter support, remove Kotlin - Add tree-sitter-racket dependency (v0.24) - Initialize Racket parser in code search - Add .rkt, .rktl, .rktd file extensions - Add test_racket_search test - Remove Kotlin from supported languages (was disabled) - Clean up duplicate test files Supported languages: Rust, Python, JavaScript, TypeScript, Go, Java, C, C++, Racket	2026-01-13 18:44:59 +05:30
Dhanji R. Prasanna	f415dbb84b	Fix ACD turn summary loss and add /dump command ACD (Aggressive Context Dehydration) fixes: - Fixed dehydrate_context() to extract turn summary from context window instead of using the passed-in final_response (which contained only the timing footer, not the actual LLM response) - Removed final_response parameter from dehydrate_context() since it now self-extracts the last assistant message as the summary - This ensures the actual turn summary is preserved after dehydration, not just the timing footer New /dump command: - Added /dump command to dump entire context window to tmp/ for debugging - Shows message index, role, kind, content length, and full content - Available in both console and machine modes UTF-8 safety: - Fixed truncate_to_word_boundary() to use character indices instead of byte indices, preventing panics on multi-byte UTF-8 characters - Added UTF-8 string slicing guidance to AGENTS.md Agent: g3	2026-01-12 05:13:02 +05:30
Dhanji R. Prasanna	1090e30d6c	Simplify system prompt: remove coding style and parallel tool call sections - Remove IMPORTANT FOR CODING section (~1,500 chars of coding guidelines) - Remove <use_parallel_tool_calls> block (~500 chars) - Remove unused const_format dependency from g3-core - Simplify get_system_prompt_for_native() to just return base prompt - Response Guidelines now cleanly ends the static prompt Prompt reduced from ~8,500 to ~6,500 characters.	2026-01-11 06:35:18 +08:00
Dhanji R. Prasanna	33e5705fc3	Add research tool for web-based research via scout agent New tool that spawns a scout agent to perform web research and return a structured research brief. The scout agent uses webdriver to browse the web and returns a decision-ready report. Changes: - Added 'research' tool definition (12 core tools total) - Added research tool dispatch in tool_dispatch.rs - Created tools/research.rs implementation: - Spawns 'g3 --agent scout <query>' as subprocess - Captures stdout and extracts last line (report file path) - Reads and returns the report file contents - Added exclude_research flag to ToolConfig - Scout agent (agent_name == 'scout') does NOT have access to research tool to prevent infinite recursion - Updated system prompts to describe when to use research tool - Added scout.md agent prompt with research brief output contract The research tool is preferred for complex research tasks (APIs, SDKs, libraries, approaches, bugs). WebDriver can still be used directly for simple lookups or fine-grained control.	2026-01-09 15:59:19 +11:00
Dhanji R. Prasanna	777191b3cb	Remove final_output tool - let summaries stream naturally - Remove final_output from tool definitions, dispatch, and misc tools - Update system prompts to request summaries as regular markdown text - Remove print_final_output from UiWriter trait and all implementations - Remove final_output handling from agent core logic - Rename final_output_summary → summary in session continuation - Delete final_output test files - Update tool count tests (12→11, 27→26) This allows LLM summaries to stream through the markdown formatter for a more natural, responsive user experience instead of buffering everything into a tool call.	2026-01-09 14:57:24 +11:00
Dhanji R. Prasanna	386176899e	Remove vision tools (except take_screenshot) and macax tools Vision tools removed: - extract_text (OCR from image files) - extract_text_with_boxes (OCR with bounding boxes) - vision_find_text (find text in app windows) - vision_click_text (find and click on text) - vision_click_near_text (click near text labels) macax tools removed: - macax_list_apps - macax_get_frontmost_app - macax_activate_app - macax_press_key - macax_type_text The LLM can now read images directly via read_image tool. take_screenshot is retained for capturing application windows. Files deleted: - crates/g3-core/src/tools/vision.rs - crates/g3-core/src/tools/macax.rs - docs/macax-tools.md Updated tool counts: 12 core + 15 webdriver = 27 total	2026-01-03 17:38:25 +11:00
Dhanji R. Prasanna	4c25e43ee4	refactoring	2025-12-26 15:16:12 +11:00

21 Commits
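Commit f415dbb84b says truncate_to_word_boundary() was fixed to use character indices instead of byte indices so it no longer panics on multi-byte UTF-8. The following is a minimal sketch of that idea, not the g3 implementation: the signature (a `&str` plus a maximum character count, returning an owned `String`) and the hard-cut fallback when no whitespace is found are assumptions.

```rust
/// Sketch only: signature and fallback behaviour are assumptions, not g3's code.
/// Truncates `s` to at most `max_chars` characters (not bytes), then backs off
/// to the last whitespace so a word is not cut in half.
fn truncate_to_word_boundary(s: &str, max_chars: usize) -> String {
    // Nothing to do if the string already fits (count chars, not bytes).
    if s.chars().count() <= max_chars {
        return s.to_string();
    }
    // char_indices() yields byte offsets that are always valid char boundaries,
    // so slicing at `cut` can never panic on multi-byte UTF-8.
    let cut = s
        .char_indices()
        .nth(max_chars)
        .map(|(byte_idx, _)| byte_idx)
        .unwrap_or(s.len());
    let head = &s[..cut];
    // rfind returns the byte offset where a whole char starts, so this slice is safe too.
    match head.rfind(char::is_whitespace) {
        Some(ws) if ws > 0 => head[..ws].trim_end().to_string(),
        // No usable word break: fall back to a hard cut at the char boundary.
        _ => head.to_string(),
    }
}
```

Slicing with `&s[..max_chars]` directly would panic whenever byte `max_chars` lands inside a multi-byte character such as `é`, which is the failure mode the commit describes.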
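Commit 1f1a517620 changes checks.negative and checks.boundary from a single Check to Vec<Check> with at least one required, while checks.happy stays a single Check. A sketch of that shape and its validation, assuming a simplified Check with one description field; the real struct fields and error messages are not shown in the log, so `desc` and the message text are hypothetical.

```rust
/// Simplified stand-in for a plan check; `desc` is a hypothetical field.
#[derive(Debug, Clone)]
#[allow(dead_code)]
struct Check {
    desc: String,
}

/// Schema after 1f1a517620: one happy-path check, one or more of the others.
#[derive(Debug, Clone)]
#[allow(dead_code)]
struct Checks {
    happy: Check,         // single check (unchanged)
    negative: Vec<Check>, // >= 1 required
    boundary: Vec<Check>, // >= 1 required
}

impl Checks {
    /// Enforce the ">= 1 required" rule for negative and boundary checks.
    fn validate(&self) -> Result<(), String> {
        if self.negative.is_empty() {
            return Err("checks.negative requires at least 1 check".to_string());
        }
        if self.boundary.is_empty() {
            return Err("checks.boundary requires at least 1 check".to_string());
        }
        Ok(())
    }
}
```

Keeping `happy` as a plain `Check` rather than a one-element `Vec` lets the type system, not the validator, enforce "exactly one".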
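Commit cbced3390c describes a registry of loadable toolsets, a load_toolset tool with idempotent loading and helpful error messages, and loaded toolsets tracked on the Agent so its tool definitions can be extended dynamically. One way to model that, as a sketch only: the registry layout, the Agent fields, the return strings, and the webdriver_start tool name are hypothetical (webdriver_screenshot, research and research_status are the only tool names taken from the log).

```rust
use std::collections::HashSet;

// Hypothetical toolset contents; the log says the real webdriver set has 15 tools.
const WEBDRIVER_TOOLS: &[&str] = &["webdriver_start", "webdriver_screenshot"];
const RESEARCH_TOOLS: &[&str] = &["research", "research_status"];

/// Hypothetical registry: toolset name -> names of the tools it provides.
const TOOLSET_REGISTRY: &[(&str, &[&str])] = &[
    ("webdriver", WEBDRIVER_TOOLS),
    ("research", RESEARCH_TOOLS),
];

#[derive(Default)]
struct Agent {
    loaded_toolsets: HashSet<String>,
    tools: Vec<String>, // core tools plus any JIT-loaded ones
}

impl Agent {
    /// Extend the agent's tool list with a named toolset.
    /// Loading the same toolset twice is a no-op (idempotent).
    fn load_toolset(&mut self, name: &str) -> Result<String, String> {
        let (_, toolset) = TOOLSET_REGISTRY
            .iter()
            .find(|(n, _)| *n == name)
            .ok_or_else(|| {
                // Helpful error: list every toolset that can be loaded.
                let names: Vec<&str> = TOOLSET_REGISTRY.iter().map(|(n, _)| *n).collect();
                format!("unknown toolset '{}'; available: {}", name, names.join(", "))
            })?;
        // HashSet::insert returns false if the name was already present.
        if !self.loaded_toolsets.insert(name.to_string()) {
            return Ok(format!("toolset '{}' is already loaded", name));
        }
        self.tools.extend(toolset.iter().map(|t| t.to_string()));
        Ok(format!("loaded {} tools from '{}'", toolset.len(), name))
    }
}
```

Under this model, the scout agent from f35807b728 needs no special case: it simply never calls `load_toolset("research")`.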
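Commits 5ab1598e03 and ff15db44c0 describe a PendingResearchManager with thread-safe task tracking: the research tool returns a research_id immediately, research_status reports progress, and completed results are injected into the conversation later. The sketch below illustrates that pattern under stated assumptions: it uses std::thread in place of the tokio task the log mentions so it needs no external crates, a fixed string stands in for the scout agent's report, and every method name plus the research_N id format are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

#[derive(Debug, Clone, PartialEq)]
enum ResearchStatus {
    Pending,
    Complete(String), // holds the finished report
}

/// Cloneable handle; all clones share the same task table.
#[derive(Clone, Default)]
struct PendingResearchManager {
    tasks: Arc<Mutex<HashMap<String, ResearchStatus>>>,
    next_id: Arc<Mutex<u64>>,
}

impl PendingResearchManager {
    /// Allocate an id and record the task as pending.
    fn register(&self) -> String {
        let mut n = self.next_id.lock().unwrap();
        *n += 1;
        let id = format!("research_{}", *n);
        self.tasks.lock().unwrap().insert(id.clone(), ResearchStatus::Pending);
        id
    }

    fn complete(&self, id: &str, report: String) {
        if let Some(status) = self.tasks.lock().unwrap().get_mut(id) {
            *status = ResearchStatus::Complete(report);
        }
    }

    /// What a research_status-style check would report.
    fn status(&self, id: &str) -> Option<ResearchStatus> {
        self.tasks.lock().unwrap().get(id).cloned()
    }

    /// Remove and return finished reports so each is injected only once.
    fn take_completed(&self) -> Vec<(String, String)> {
        let mut tasks = self.tasks.lock().unwrap();
        let done: Vec<String> = tasks
            .iter()
            .filter(|(_, s)| matches!(s, ResearchStatus::Complete(_)))
            .map(|(id, _)| id.clone())
            .collect();
        done.into_iter()
            .filter_map(|id| match tasks.remove(&id) {
                Some(ResearchStatus::Complete(report)) => Some((id, report)),
                _ => None,
            })
            .collect()
    }
}

/// Start work in the background and return the id straight away.
fn spawn_research(mgr: &PendingResearchManager, query: &str) -> (String, JoinHandle<()>) {
    let id = mgr.register();
    let (worker, task_id, query) = (mgr.clone(), id.clone(), query.to_string());
    let handle = thread::spawn(move || {
        // Placeholder for running the scout agent.
        let report = format!("report for {}", query);
        worker.complete(&task_id, report);
    });
    (id, handle)
}
```

Because the caller only registers an id before spawning, it never blocks; a streaming loop could call `take_completed()` at the natural break points the log lists (start of each tool iteration, before prompting the user) to pick up finished reports.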