diff --git a/analysis/memory.md b/analysis/memory.md index 305022a..f0d542b 100644 --- a/analysis/memory.md +++ b/analysis/memory.md @@ -1,5 +1,5 @@ # Workspace Memory -> Updated: 2026-02-06T00:59:11Z | Size: 20.2k chars +> Updated: 2026-02-06T04:29:34Z | Size: 21.0k chars ### Remember Tool Wiring - `crates/g3-core/src/tools/memory.rs` [0..5000] - `execute_remember()`, `get_memory_path()`, `merge_memory()` @@ -363,4 +363,14 @@ Makes tool output responsive to terminal width - no line wrapping, with 4-char r **Datalog Flow**: 1. `plan_approve` → `compile_rulespec()` → saves `rulespec.compiled.json` -2. `plan_verify` → `shadow_datalog_verify()` → loads compiled + envelope → `extract_facts()` → `execute_rules()` → `eprint!()` (shadow mode) \ No newline at end of file +2. `plan_verify` → `shadow_datalog_verify()` → loads compiled + envelope → `extract_facts()` → `execute_rules()` → `eprint!()` (shadow mode) + +### Rulespec Changes (2026-02-06) +- Rulespec is no longer generated on-the-fly during `plan_write` — it's now read from `analysis/rulespec.yaml` (checked-in, hand-crafted) +- `read_rulespec()` in `invariants.rs` now takes `&Path` (working_dir) instead of `&str` (session_id) +- `write_rulespec()`, `get_rulespec_path()`, `format_rulespec_yaml()`, `format_rulespec_markdown()` removed from `invariants.rs` +- `save_compiled_rulespec()`, `load_compiled_rulespec()`, `get_compiled_rulespec_path()` removed from `datalog.rs` +- `shadow_datalog_verify()` now compiles rulespec on-the-fly at verify time, writes `rulespec.compiled.dl` and `datalog_evaluation.txt` to session dir +- `plan_write` tool no longer accepts `rulespec` parameter +- `plan_approve` no longer compiles rulespec +- `format_verification_results()` now takes `working_dir: Option<&Path>` as third parameter \ No newline at end of file diff --git a/crates/g3-core/src/prompts.rs b/crates/g3-core/src/prompts.rs index e41e0cf..e740060 100644 --- a/crates/g3-core/src/prompts.rs +++ b/crates/g3-core/src/prompts.rs @@ 
-58,9 +58,8 @@ Short description for providers without native calling specs: - Example: {\"tool\": \"plan_read\", \"args\": {}} - **plan_write**: Create or update the Plan with YAML content - - Format: {\"tool\": \"plan_write\", \"args\": {\"plan\": \"plan_id: my-plan\\nitems: [...]\", \"rulespec\": \"claims: [...]\\npredicates: [...]\"}} - - For NEW plans, rulespec is REQUIRED. For updates, it's optional. - - Example (new plan): {\"tool\": \"plan_write\", \"args\": {\"plan\": \"plan_id: feature-x\\nitems:\\n - id: I1\\n description: Add feature\\n state: todo\\n touches: [src/lib.rs]\\n checks:\\n happy: {desc: Works, target: lib}\\n negative:\\n - {desc: Errors, target: lib}\\n boundary:\\n - {desc: Edge, target: lib}\", \"rulespec\": \"claims:\\n - name: feature\\n selector: feature.done\\npredicates:\\n - claim: feature\\n rule: exists\\n source: task_prompt\"}} + - Format: {\"tool\": \"plan_write\", \"args\": {\"plan\": \"plan_id: my-plan\\nitems: [...]\"}} + - Example (new plan): {\"tool\": \"plan_write\", \"args\": {\"plan\": \"plan_id: feature-x\\nitems:\\n - id: I1\\n description: Add feature\\n state: todo\\n touches: [src/lib.rs]\\n checks:\\n happy: {desc: Works, target: lib}\\n negative:\\n - {desc: Errors, target: lib}\\n boundary:\\n - {desc: Edge, target: lib}\"}} - Example (update): {\"tool\": \"plan_write\", \"args\": {\"plan\": \"plan_id: feature-x\\nitems:\\n - id: I1\\n state: done\\n evidence: [src/lib.rs:42]\\n notes: Implemented\"}} - **plan_approve**: Approve the current plan revision (called by user) diff --git a/crates/g3-core/src/tool_definitions.rs b/crates/g3-core/src/tool_definitions.rs index 9f96fc8..ba6224b 100644 --- a/crates/g3-core/src/tool_definitions.rs +++ b/crates/g3-core/src/tool_definitions.rs @@ -192,17 +192,13 @@ fn create_core_tools() -> Vec<Tool> { tools.push(Tool { name: "plan_write".to_string(), - description: "Create or update the Plan for this session. For NEW plans, you MUST provide both 'plan' and 'rulespec' arguments. 
The rulespec defines invariants (constraints that must/must not hold) extracted from the task and memory. For plan UPDATES, rulespec is optional.".to_string(), + description: "Create or update the Plan for this session. Provide the plan as YAML with plan_id, revision, and items array.".to_string(), input_schema: json!({ "type": "object", "properties": { "plan": { "type": "string", "description": "The plan as YAML. Must include plan_id and items array." - }, - "rulespec": { - "type": "string", - "description": "The rulespec as YAML with claims and predicates. REQUIRED for new plans, optional for updates. Defines invariants from task_prompt and memory." } }, "required": ["plan"] diff --git a/crates/g3-core/src/tools/datalog.rs b/crates/g3-core/src/tools/datalog.rs index 45f78dc..a7ee928 100644 --- a/crates/g3-core/src/tools/datalog.rs +++ b/crates/g3-core/src/tools/datalog.rs @@ -6,10 +6,10 @@ //! //! ## Architecture //! -//! 1. **Compilation Phase** (on plan_approve): +//! 1. **Compilation Phase** (on-the-fly at plan_verify): //! - Parse rulespec claims and predicates //! - Generate datafrog relations and rules -//! - Store compiled representation for later execution +//! - Rulespec is read from `analysis/rulespec.yaml` //! //! 2. **Execution Phase** (on plan_verify): //! 
- Extract facts from action envelope using selectors @@ -34,7 +34,6 @@ use datafrog::{Iteration, Relation}; use serde::{Deserialize, Serialize}; use serde_yaml::Value as YamlValue; use std::collections::{HashMap, HashSet}; -use std::path::PathBuf; use super::invariants::{ ActionEnvelope, InvariantSource, PredicateRule, Rulespec, Selector, @@ -42,7 +41,6 @@ use super::invariants::{ #[cfg(test)] use super::invariants::{Claim, Predicate}; -use crate::paths::get_session_logs_dir; // ============================================================================ // Compiled Datalog Representation @@ -537,33 +535,6 @@ fn evaluate_predicate_datalog( } // ============================================================================ -// Storage -// ============================================================================ - -/// Get the path to the compiled rulespec file for a session. -pub fn get_compiled_rulespec_path(session_id: &str) -> PathBuf { - get_session_logs_dir(session_id).join("rulespec.compiled.json") -} - -/// Save a compiled rulespec to disk. -pub fn save_compiled_rulespec(session_id: &str, compiled: &CompiledRulespec) -> Result<()> { - let path = get_compiled_rulespec_path(session_id); - let json = serde_json::to_string_pretty(compiled)?; - std::fs::write(&path, json)?; - Ok(()) -} - -/// Load a compiled rulespec from disk. 
-pub fn load_compiled_rulespec(session_id: &str) -> Result<Option<CompiledRulespec>> { - let path = get_compiled_rulespec_path(session_id); - if !path.exists() { - return Ok(None); - } - let json = std::fs::read_to_string(&path)?; - let compiled: CompiledRulespec = serde_json::from_str(&json)?; - Ok(Some(compiled)) -} - // ============================================================================ // Formatting // ============================================================================ diff --git a/crates/g3-core/src/tools/invariants.rs b/crates/g3-core/src/tools/invariants.rs index 27baec6..107db03 100644 --- a/crates/g3-core/src/tools/invariants.rs +++ b/crates/g3-core/src/tools/invariants.rs @@ -4,15 +4,15 @@ //! - **Rulespec**: Machine-readable invariants with claims and predicates //! - **ActionEnvelope**: Evidence of work done (facts about completed work) //! -//! The rulespec is written as the penultimate step in a plan, and the -//! action envelope is written as the final step. Together they enable -//! verification that invariants extracted from the task prompt and -//! workspace memory are satisfied by the completed work. +//! The rulespec is checked into `analysis/rulespec.yaml` and read at +//! plan verification time. The action envelope is written per-session +//! and verified against the rulespec. use anyhow::{anyhow, Result}; use serde::{Deserialize, Serialize}; use serde_yaml::Value as YamlValue; use std::collections::HashMap; +use std::path::Path; use std::path::PathBuf; use crate::paths::get_session_logs_dir; @@ -685,19 +685,14 @@ fn yaml_to_display(value: &YamlValue) -> String { // File Storage // ============================================================================ -/// Get the path to the rulespec.yaml file for a session. -pub fn get_rulespec_path(session_id: &str) -> PathBuf { - get_session_logs_dir(session_id).join("rulespec.yaml") -} - /// Get the path to the envelope.yaml file for a session. 
pub fn get_envelope_path(session_id: &str) -> PathBuf { get_session_logs_dir(session_id).join("envelope.yaml") } -/// Read a rulespec from the session's rulespec.yaml file. -pub fn read_rulespec(session_id: &str) -> Result<Option<Rulespec>> { - let path = get_rulespec_path(session_id); +/// Read a rulespec from `analysis/rulespec.yaml` relative to the working directory. +pub fn read_rulespec(working_dir: &Path) -> Result<Option<Rulespec>> { + let path = working_dir.join("analysis").join("rulespec.yaml"); if !path.exists() { return Ok(None); } @@ -707,16 +702,6 @@ pub fn read_rulespec(session_id: &str) -> Result<Option<Rulespec>> { Ok(Some(rulespec)) } -/// Write a rulespec to the session's rulespec.yaml file. -pub fn write_rulespec(session_id: &str, rulespec: &Rulespec) -> Result<()> { - rulespec.validate()?; - - let path = get_rulespec_path(session_id); - let content = format_rulespec_yaml(rulespec); - std::fs::write(&path, content)?; - Ok(()) -} - /// Read an action envelope from the session's envelope.yaml file. pub fn read_envelope(session_id: &str) -> Result<Option<ActionEnvelope>> { let path = get_envelope_path(session_id); @@ -737,19 +722,6 @@ pub fn write_envelope(session_id: &str, envelope: &ActionEnvelope) -> Result<()> Ok(()) } -/// Format a rulespec as pretty YAML with comments. -fn format_rulespec_yaml(rulespec: &Rulespec) -> String { - let mut output = String::new(); - output.push_str("# Rulespec - Machine-readable invariants\n"); - output.push_str("# Generated by g3 Plan Mode\n\n"); - - let yaml = serde_yaml::to_string(rulespec) - .unwrap_or_else(|_| "# Error serializing rulespec".to_string()); - output.push_str(&yaml); - - output -} - /// Format an action envelope as pretty YAML with comments. fn format_envelope_yaml(envelope: &ActionEnvelope) -> String { let mut output = String::new(); @@ -903,77 +875,6 @@ pub fn format_evaluation_results(eval: &RulespecEvaluation) -> String { output } -/// Format a rulespec as human-readable markdown. 
-/// -/// This produces a rich, readable format suitable for tool output, -/// not raw YAML. -pub fn format_rulespec_markdown(rulespec: &Rulespec) -> String { - let mut output = String::new(); - - output.push_str("\n"); - output.push_str("### Invariants (Rulespec)\n\n"); - - if rulespec.claims.is_empty() && rulespec.predicates.is_empty() { - output.push_str("_No invariants defined._\n"); - return output; - } - - // Group predicates by source - let task_predicates: Vec<_> = rulespec.predicates.iter() - .filter(|p| p.source == InvariantSource::TaskPrompt) - .collect(); - let memory_predicates: Vec<_> = rulespec.predicates.iter() - .filter(|p| p.source == InvariantSource::Memory) - .collect(); - - // Build claim lookup for selector display - let claims: std::collections::HashMap<&str, &Claim> = rulespec.claims.iter() - .map(|c| (c.name.as_str(), c)) - .collect(); - - // Format predicates from task prompt - if !task_predicates.is_empty() { - output.push_str("**From Task:**\n"); - for pred in &task_predicates { - format_predicate_markdown(&mut output, pred, &claims); - } - output.push_str("\n"); - } - - // Format predicates from memory - if !memory_predicates.is_empty() { - output.push_str("**From Memory:**\n"); - for pred in &memory_predicates { - format_predicate_markdown(&mut output, pred, &claims); - } - output.push_str("\n"); - } - - output -} - -/// Format a single predicate as a markdown list item. 
-fn format_predicate_markdown( - output: &mut String, - pred: &Predicate, - claims: &std::collections::HashMap<&str, &Claim>, -) { - let selector = claims.get(pred.claim.as_str()) - .map(|c| c.selector.as_str()) - .unwrap_or(&pred.claim); - - let value_str = match &pred.value { - Some(v) => format!(" `{}`", yaml_to_display(v)), - None => String::new(), - }; - - output.push_str(&format!("- `{}` **{}**{}\n", selector, pred.rule, value_str)); - - if let Some(notes) = &pred.notes { - output.push_str(&format!(" - _{}_\n", notes)); - } -} - /// Format an action envelope as human-readable markdown. /// /// This produces a rich, readable format suitable for tool output, @@ -1416,57 +1317,7 @@ mod tests { // ======================================================================== // Format Rulespec Markdown Tests - // ======================================================================== - - #[test] - fn test_format_rulespec_markdown_empty() { - let rulespec = Rulespec::new(); - let output = format_rulespec_markdown(&rulespec); - - assert!(output.contains("### Invariants (Rulespec)")); - assert!(output.contains("_No invariants defined._")); - } - - #[test] - fn test_format_rulespec_markdown_with_predicates() { - let mut rulespec = Rulespec::new(); - rulespec.add_claim(Claim::new("caps", "csv_importer.capabilities")); - rulespec.add_predicate( - Predicate::new("caps", PredicateRule::Contains, InvariantSource::TaskPrompt) - .with_value(YamlValue::String("handle_tsv".to_string())) - .with_notes("User requested TSV support") - ); - rulespec.add_predicate( - Predicate::new("caps", PredicateRule::Exists, InvariantSource::Memory) - ); - - let output = format_rulespec_markdown(&rulespec); - - assert!(output.contains("### Invariants (Rulespec)")); - assert!(output.contains("**From Task:**")); - assert!(output.contains("**From Memory:**")); - assert!(output.contains("`csv_importer.capabilities`")); - assert!(output.contains("**contains**")); - 
assert!(output.contains("`handle_tsv`")); - assert!(output.contains("_User requested TSV support_")); - assert!(output.contains("**exists**")); - } - - #[test] - fn test_format_rulespec_markdown_task_only() { - let mut rulespec = Rulespec::new(); - rulespec.add_claim(Claim::new("test", "foo.bar")); - rulespec.add_predicate( - Predicate::new("test", PredicateRule::Exists, InvariantSource::TaskPrompt) - ); - - let output = format_rulespec_markdown(&rulespec); - - assert!(output.contains("**From Task:**")); - assert!(!output.contains("**From Memory:**")); - } - - // ======================================================================== + // ======================================================================== // Format Envelope Markdown Tests // ======================================================================== diff --git a/crates/g3-core/src/tools/plan.rs b/crates/g3-core/src/tools/plan.rs index d753dbd..4a50472 100644 --- a/crates/g3-core/src/tools/plan.rs +++ b/crates/g3-core/src/tools/plan.rs @@ -20,9 +20,10 @@ use crate::ToolCall; use super::executor::ToolContext; -use super::invariants::{format_envelope_markdown, format_rulespec_markdown, get_envelope_path, get_rulespec_path, read_envelope, read_rulespec, write_rulespec, Rulespec}; -use super::datalog::{compile_rulespec, save_compiled_rulespec, format_datalog_results}; -use super::datalog::{load_compiled_rulespec, extract_facts, execute_rules}; +use std::path::Path; +use super::invariants::{format_envelope_markdown, get_envelope_path, read_envelope, read_rulespec}; +use super::datalog::{compile_rulespec, format_datalog_results}; +use super::datalog::{extract_facts, execute_rules}; // ============================================================================ // Plan Schema @@ -713,22 +714,31 @@ pub fn plan_verify(plan: &Plan, working_dir: Option<&str>) -> PlanVerification { /// Shadow datalog verification - runs datalog rules and writes to evaluation file. 
/// This is for dry-run/shadow testing - results are written to /// `.g3/sessions//datalog_evaluation.txt`, NOT injected into context window. -fn shadow_datalog_verify(session_id: &str) { - // Load compiled rulespec - let compiled = match load_compiled_rulespec(session_id) { - Ok(Some(c)) => c, +fn shadow_datalog_verify(session_id: &str, working_dir: &Path) { + // Read rulespec from analysis/rulespec.yaml + let rulespec = match read_rulespec(working_dir) { + Ok(Some(rs)) => rs, Ok(None) => { - eprintln!("\n⚠️ [SHADOW] No compiled rulespec found - skipping datalog verification"); + eprintln!("\nℹ️ No analysis/rulespec.yaml found - skipping datalog verification"); return; } Err(e) => { - eprintln!("\n⚠️ [SHADOW] Failed to load compiled rulespec: {}", e); + eprintln!("\n⚠️ Failed to read analysis/rulespec.yaml: {}", e); + return; + } + }; + + // Compile rulespec on-the-fly + let compiled = match compile_rulespec(&rulespec, "plan-verify", 0) { + Ok(c) => c, + Err(e) => { + eprintln!("\n⚠️ Failed to compile rulespec: {}", e); return; } }; if compiled.is_empty() { - eprintln!("\n⚠️ [SHADOW] Compiled rulespec has no predicates - skipping datalog verification"); + eprintln!("\nℹ️ Rulespec has no predicates - skipping datalog verification"); return; } @@ -736,11 +746,11 @@ fn shadow_datalog_verify(session_id: &str) { let envelope = match read_envelope(session_id) { Ok(Some(e)) => e, Ok(None) => { - eprintln!("\n⚠️ [SHADOW] No envelope found - skipping datalog verification"); + eprintln!("\n⚠️ No envelope found - skipping datalog verification"); return; } Err(e) => { - eprintln!("\n⚠️ [SHADOW] Failed to load envelope: {}", e); + eprintln!("\n⚠️ Failed to load envelope: {}", e); return; } }; @@ -754,11 +764,21 @@ fn shadow_datalog_verify(session_id: &str) { // Format results let output = format_datalog_results(&result); - // Write to evaluation file (shadow mode - not in context window) - let eval_path = get_session_logs_dir(session_id).join("datalog_evaluation.txt"); + let 
session_dir = get_session_logs_dir(session_id); + + // Write compiled rules to .dl file + let dl_path = session_dir.join("rulespec.compiled.dl"); + let compiled_yaml = serde_yaml::to_string(&compiled).unwrap_or_default(); + if let Err(e) = std::fs::write(&dl_path, &compiled_yaml) { + eprintln!("⚠️ Failed to write compiled rules: {}", e); + } + + // Write evaluation report + let eval_path = session_dir.join("datalog_evaluation.txt"); match std::fs::write(&eval_path, &output) { Ok(_) => { - eprintln!("📊 Datalog evaluation written to: {}", eval_path.display()); + eprintln!("📊 Compiled rules: {}", dl_path.display()); + eprintln!("📊 Evaluation report: {}", eval_path.display()); } Err(e) => { eprintln!("⚠️ Failed to write datalog evaluation: {}", e); @@ -768,8 +788,8 @@ fn shadow_datalog_verify(session_id: &str) { /// Format verification results as a string for display. /// Uses loud formatting for warnings and errors. -/// If session_id is provided, also prints rulespec and envelope file locations. -pub fn format_verification_results(verification: &PlanVerification, session_id: Option<&str>) -> String { +/// If session_id is provided, also prints envelope file location and runs datalog verification. 
+pub fn format_verification_results(verification: &PlanVerification, session_id: Option<&str>, working_dir: Option<&Path>) -> String { let mut output = String::new(); let (warnings, errors) = verification.count_issues(); @@ -810,24 +830,22 @@ pub fn format_verification_results(verification: &PlanVerification, session_id: output.push_str("✅ VERIFICATION COMPLETE: All evidence validated\n"); } - // Print rulespec and envelope locations if session_id provided + // Print envelope location and run datalog verification if session_id provided if let Some(sid) = session_id { output.push_str("\n"); - output.push_str("📜 INVARIANTS\n"); - - let rulespec_path = get_rulespec_path(sid); + output.push_str("📜 ARTIFACTS\n"); + let envelope_path = get_envelope_path(sid); - - let rulespec_status = if rulespec_path.exists() { "✅" } else { "⚠️ (not found)" }; let envelope_status = if envelope_path.exists() { "✅" } else { "⚠️ (not found)" }; - - output.push_str(&format!(" {} Rulespec: {}\n", rulespec_status, rulespec_path.display())); output.push_str(&format!(" {} Envelope: {}\n", envelope_status, envelope_path.display())); - + output.push_str("\n"); // Shadow datalog verification - print to stderr, NOT included in tool output - shadow_datalog_verify(sid); + let effective_wd = working_dir + .map(|p| p.to_path_buf()) + .unwrap_or_else(|| std::env::current_dir().unwrap_or_default()); + shadow_datalog_verify(sid, &effective_wd); } output.push_str(&"═".repeat(60)); @@ -867,12 +885,6 @@ pub async fn execute_plan_read( yaml ); - // Append rulespec if present - match read_rulespec(session_id) { - Ok(Some(rulespec)) => output.push_str(&format_rulespec_markdown(&rulespec)), - _ => output.push_str("\n\n_No rulespec generated._\n"), - } - // Append envelope if present match read_envelope(session_id) { Ok(Some(envelope)) => output.push_str(&format_envelope_markdown(&envelope)), @@ -906,9 +918,6 @@ pub async fn execute_plan_write( None => return Ok("❌ Missing 'plan' argument. 
Provide the plan as YAML.".to_string()), }; - // Get optional rulespec content from args - let rulespec_yaml = tool_call.args.get("rulespec").and_then(|v| v.as_str()); - // Parse the YAML let mut plan: Plan = match serde_yaml::from_str(plan_yaml) { Ok(p) => p, @@ -917,44 +926,6 @@ pub async fn execute_plan_write( // Load existing plan to check if this is a new plan or an update let existing_plan = read_plan(session_id)?; - let is_new_plan = existing_plan.is_none(); - - // For NEW plans, rulespec is REQUIRED - // This prevents the tautology problem where invariants are written after implementation - if is_new_plan && rulespec_yaml.is_none() { - return Ok("❌ Missing 'rulespec' argument. New plans MUST include a rulespec with invariants.\n\n\ - The rulespec defines constraints that MUST or MUST NOT hold, extracted from:\n\ - - task_prompt: What the user explicitly requires\n\ - - memory: Persistent rules from workspace memory\n\n\ - Example rulespec:\n\ - ```yaml\n\ - claims:\n\ - - name: feature_capabilities\n\ - selector: \"feature.capabilities\"\n\ - predicates:\n\ - - claim: feature_capabilities\n\ - rule: contains\n\ - value: \"required_feature\"\n\ - source: task_prompt\n\ - notes: \"User explicitly requested this\"\n\ - ```".to_string()); - } - - // Parse and validate rulespec if provided - let rulespec: Option<Rulespec> = if let Some(yaml) = rulespec_yaml { - match serde_yaml::from_str(yaml) { - Ok(r) => { - let rs: Rulespec = r; - if let Err(e) = rs.validate() { - return Ok(format!("❌ Invalid rulespec: {}", e)); - } - Some(rs) - } - Err(e) => return Ok(format!("❌ Invalid rulespec YAML: {}", e)), - } - } else { - None - }; if let Some(existing) = existing_plan { // Preserve approved_revision from existing plan @@ -992,25 +963,12 @@ pub async fn execute_plan_write( return Ok(format!("❌ Failed to write plan: {}", e)); } - // Write the rulespec if provided (atomically with plan) - if let Some(ref rs) = rulespec { - if let Err(e) = write_rulespec(session_id, rs) { - 
return 
Ok(format!("❌ Failed to write rulespec: {}", e)); - } - } - // Display the plan in compact format let plan_path = get_plan_path(session_id); let plan_path_str = plan_path.to_string_lossy().to_string(); let yaml = serde_yaml::to_string(&plan)?; ctx.ui_writer.print_plan_compact(Some(&yaml), Some(&plan_path_str), true); - // Format rulespec section - use provided rulespec or read from disk - let rulespec_section = match rulespec.as_ref().or(read_rulespec(session_id).ok().flatten().as_ref()) { - Some(rs) => format_rulespec_markdown(rs), - None => "\n_No rulespec defined._\n".to_string(), - }; - // Read and format envelope if it exists let envelope_section = match read_envelope(session_id) { Ok(Some(envelope)) => format_envelope_markdown(&envelope), @@ -1021,20 +979,18 @@ pub async fn execute_plan_write( // Check if plan is now complete and trigger verification if plan.is_complete() && plan.is_approved() { let verification = plan_verify(&plan, ctx.working_dir); - let verification_output = format_verification_results(&verification, ctx.session_id); + let verification_output = format_verification_results(&verification, ctx.session_id, ctx.working_dir.map(std::path::Path::new)); return Ok(format!( - "✅ Plan updated: {}\n{}\n{}\n{}", + "✅ Plan updated: {}\n{}\n{}", plan.status_summary(), verification_output, - rulespec_section, envelope_section )); } Ok(format!( - "✅ Plan updated: {}\n{}\n{}", + "✅ Plan updated: {}\n{}", plan.status_summary(), - rulespec_section, envelope_section )) } @@ -1068,43 +1024,14 @@ pub async fn execute_plan_approve( // Approve the plan plan.approve(); - // Compile rulespec to datalog on approval - let compile_message; - match read_rulespec(session_id) { - Ok(Some(rulespec)) => { - match compile_rulespec(&rulespec, &plan.plan_id, plan.revision) { - Ok(compiled) => { - if let Err(e) = save_compiled_rulespec(session_id, &compiled) { - compile_message = format!("\n⚠️ Failed to save compiled rulespec: {}", e); - } else { - compile_message = format!( - 
"\n📜 Compiled {} invariant(s) to datalog rules.", - compiled.predicates.len() - ); - } - } - Err(e) => { - compile_message = format!("\n⚠️ Failed to compile rulespec: {}", e); - } - } - } - Ok(None) => { - compile_message = "\n⚠️ No rulespec found - datalog verification will be skipped.".to_string(); - } - Err(e) => { - compile_message = format!("\n⚠️ Failed to read rulespec: {}", e); - } - } - // Write back if let Err(e) = write_plan(session_id, &plan) { return Ok(format!("❌ Failed to save approved plan: {}", e)); } Ok(format!( - "✅ Plan approved at revision {}. You may now begin implementation.{}", - plan.revision, - compile_message + "✅ Plan approved at revision {}. You may now begin implementation.", + plan.revision )) } diff --git a/crates/g3-core/tests/stream_completion_characterization_test.rs b/crates/g3-core/tests/stream_completion_characterization_test.rs index f98aa0b..8d50838 100644 --- a/crates/g3-core/tests/stream_completion_characterization_test.rs +++ b/crates/g3-core/tests/stream_completion_characterization_test.rs @@ -622,14 +622,6 @@ items: - desc: Edge cases target: test::module"# , - "rulespec": r#"claims: - - name: test_feature - selector: test.done -predicates: - - claim: test_feature - rule: exists - source: task_prompt - notes: Test invariant"# }), }; let write_result = agent.execute_tool(&write_call).await.unwrap(); diff --git a/crates/g3-core/tests/tool_execution_roundtrip_test.rs b/crates/g3-core/tests/tool_execution_roundtrip_test.rs index 45286ba..40af83a 100644 --- a/crates/g3-core/tests/tool_execution_roundtrip_test.rs +++ b/crates/g3-core/tests/tool_execution_roundtrip_test.rs @@ -425,14 +425,6 @@ items: boundary: - desc: Edge target: test"# - , - "rulespec": r#"claims: - - name: test_feature - selector: test.done -predicates: - - claim: test_feature - rule: exists - source: task_prompt"# }), ); @@ -487,14 +479,6 @@ items: happy: {desc: Works, target: test} negative: [{desc: Errors, target: test}] boundary: [{desc: Edge, target: 
test}]"# - , - "rulespec": r#"claims: - - name: approval_test - selector: test.approved -predicates: - - claim: approval_test - rule: exists - source: task_prompt"# }), ); agent.execute_tool(&write_call).await.unwrap(); @@ -507,3 +491,214 @@ predicates: "Should approve plan: {}", result); } } + + +// ============================================================================= +// Test: plan_verify with analysis/rulespec.yaml datalog integration +// ============================================================================= + +mod plan_verify_datalog_integration { + use super::*; + + /// Helper: write a complete plan, approve it, and set up envelope. + /// Returns the actual session ID (which has a unique suffix). + async fn setup_complete_plan_with_envelope( + agent: &mut Agent, + temp_dir: &TempDir, + description: &str, + ) -> String { + agent.init_session_id_for_test(description); + let actual_session_id = agent.get_session_id().unwrap().to_string(); + + // Write a plan + let write_call = make_tool_call( + "plan_write", + serde_json::json!({ + "plan": r#"plan_id: datalog-test +revision: 1 +items: + - id: I1 + description: Implement feature + state: todo + touches: ["src/lib.rs"] + checks: + happy: {desc: Works, target: lib} + negative: [{desc: Errors, target: lib}] + boundary: [{desc: Edge, target: lib}]"# + }), + ); + agent.execute_tool(&write_call).await.unwrap(); + + // Approve + let approve_call = make_tool_call("plan_approve", serde_json::json!({})); + agent.execute_tool(&approve_call).await.unwrap(); + + // Write envelope.yaml to session dir (using actual session ID) + let session_dir = temp_dir + .path() + .join(".g3") + .join("sessions") + .join(&actual_session_id); + fs::create_dir_all(&session_dir).unwrap(); + fs::write( + session_dir.join("envelope.yaml"), + "facts: + feature: + done: true + capabilities: [handle_csv, handle_tsv] + file: src/lib.rs +", + ) + .unwrap(); + + // Create a dummy evidence file + let src_dir = temp_dir.path().join("src"); 
+        fs::create_dir_all(&src_dir).unwrap();
+        fs::write(src_dir.join("lib.rs"), "// test file").unwrap();
+
+        actual_session_id
+    }
+
+    /// Test: plan_verify compiles datalog rules on-the-fly from analysis/rulespec.yaml
+    /// and writes .dl + evaluation files to session dir
+    #[tokio::test]
+    #[serial]
+    async fn test_plan_verify_with_analysis_rulespec() {
+        let temp_dir = TempDir::new().unwrap();
+        let mut agent = create_test_agent(&temp_dir).await;
+
+        let session_id = setup_complete_plan_with_envelope(
+            &mut agent, &temp_dir, "datalog-rulespec-test"
+        ).await;
+
+        // Write analysis/rulespec.yaml
+        let analysis_dir = temp_dir.path().join("analysis");
+        fs::create_dir_all(&analysis_dir).unwrap();
+        fs::write(
+            analysis_dir.join("rulespec.yaml"),
+            "claims:
+  - name: feature_done
+    selector: feature.done
+predicates:
+  - claim: feature_done
+    rule: exists
+    source: task_prompt
+    notes: Feature must be marked done
+",
+        )
+        .unwrap();
+
+        // Mark item done - this triggers plan_verify + shadow_datalog_verify
+        let done_call = make_tool_call(
+            "plan_write",
+            serde_json::json!({
+                "plan": "plan_id: datalog-test\nrevision: 2\nitems:\n  - id: I1\n    description: Implement feature\n    state: done\n    touches: [src/lib.rs]\n    checks:\n      happy: {desc: Works, target: lib}\n      negative: [{desc: Errors, target: lib}]\n      boundary: [{desc: Edge, target: lib}]\n    evidence: [src/lib.rs:1]\n    notes: Implemented the feature"
+            }),
+        );
+        let result = agent.execute_tool(&done_call).await.unwrap();
+        assert!(result.contains("VERIFICATION"), "Should trigger verification: {}", result);
+
+        // Check that .dl and evaluation files were written to session dir
+        let session_dir = temp_dir
+            .path()
+            .join(".g3")
+            .join("sessions")
+            .join(&session_id);
+        let dl_path = session_dir.join("rulespec.compiled.dl");
+        let eval_path = session_dir.join("datalog_evaluation.txt");
+
+        assert!(dl_path.exists(), "Compiled .dl file should exist at {}", dl_path.display());
+        assert!(eval_path.exists(), "Evaluation report should exist at {}", eval_path.display());
+
+        // Verify evaluation content shows pass
+        let eval_content = fs::read_to_string(&eval_path).unwrap();
+        assert!(eval_content.contains("satisfied") || eval_content.contains("PASS"),
+            "Evaluation should show passing results: {}", eval_content);
+    }
+
+    /// Test: plan_verify works gracefully when analysis/rulespec.yaml is absent
+    #[tokio::test]
+    #[serial]
+    async fn test_plan_verify_without_rulespec() {
+        let temp_dir = TempDir::new().unwrap();
+        let mut agent = create_test_agent(&temp_dir).await;
+
+        let session_id = setup_complete_plan_with_envelope(
+            &mut agent, &temp_dir, "datalog-no-rulespec-test"
+        ).await;
+
+        // Do NOT create analysis/rulespec.yaml
+
+        // Mark item done
+        let done_call = make_tool_call(
+            "plan_write",
+            serde_json::json!({
+                "plan": "plan_id: datalog-test\nrevision: 2\nitems:\n  - id: I1\n    description: Implement feature\n    state: done\n    touches: [src/lib.rs]\n    checks:\n      happy: {desc: Works, target: lib}\n      negative: [{desc: Errors, target: lib}]\n      boundary: [{desc: Edge, target: lib}]\n    evidence: [src/lib.rs:1]\n    notes: Implemented the feature"
+            }),
+        );
+        let result = agent.execute_tool(&done_call).await.unwrap();
+        assert!(result.contains("VERIFICATION"), "Should still verify: {}", result);
+
+        // No .dl or evaluation files should exist
+        let session_dir = temp_dir
+            .path()
+            .join(".g3")
+            .join("sessions")
+            .join(&session_id);
+        assert!(!session_dir.join("rulespec.compiled.dl").exists(),
+            "No .dl file should exist without rulespec");
+        assert!(!session_dir.join("datalog_evaluation.txt").exists(),
+            "No evaluation file should exist without rulespec");
+    }
+
+    /// Test: rulespec predicate that fails against envelope shows failure
+    #[tokio::test]
+    #[serial]
+    async fn test_plan_verify_rulespec_failure() {
+        let temp_dir = TempDir::new().unwrap();
+        let mut agent = create_test_agent(&temp_dir).await;
+
+        let session_id = setup_complete_plan_with_envelope(
+            &mut agent, &temp_dir, "datalog-fail-test"
+        ).await;
+
+        // Write a rulespec that will FAIL (expects a fact that doesn't exist)
+        let analysis_dir = temp_dir.path().join("analysis");
+        fs::create_dir_all(&analysis_dir).unwrap();
+        fs::write(
+            analysis_dir.join("rulespec.yaml"),
+            "claims:
+  - name: missing_feature
+    selector: nonexistent.field
+predicates:
+  - claim: missing_feature
+    rule: exists
+    source: task_prompt
+    notes: This field does not exist in the envelope
+",
+        )
+        .unwrap();
+
+        // Mark item done
+        let done_call = make_tool_call(
+            "plan_write",
+            serde_json::json!({
+                "plan": "plan_id: datalog-test\nrevision: 2\nitems:\n  - id: I1\n    description: Implement feature\n    state: done\n    touches: [src/lib.rs]\n    checks:\n      happy: {desc: Works, target: lib}\n      negative: [{desc: Errors, target: lib}]\n      boundary: [{desc: Edge, target: lib}]\n    evidence: [src/lib.rs:1]\n    notes: Implemented the feature"
+            }),
+        );
+        agent.execute_tool(&done_call).await.unwrap();
+
+        // Check evaluation file shows failure
+        let session_dir = temp_dir
+            .path()
+            .join(".g3")
+            .join("sessions")
+            .join(&session_id);
+        let eval_path = session_dir.join("datalog_evaluation.txt");
+        assert!(eval_path.exists(), "Evaluation report should exist");
+
+        let eval_content = fs::read_to_string(&eval_path).unwrap();
+        assert!(eval_content.contains("FAIL") || eval_content.contains("fail"),
+            "Evaluation should show failing results: {}", eval_content);
+    }
+}
diff --git a/prompts/system/native.md b/prompts/system/native.md
index f42b612..e262cd8 100644
--- a/prompts/system/native.md
+++ b/prompts/system/native.md
@@ -19,7 +19,7 @@ Plan Mode is a cognitive forcing system that prevents:
 
 ## Workflow
 
-1. **Draft**: Call `plan_read` to check for existing plan, then `plan_write` with BOTH plan AND rulespec
+1. **Draft**: Call `plan_read` to check for existing plan, then `plan_write` with the plan YAML
 2. **Approval**: Ask user to approve before starting work ("'approve', or edit plan?"). In non-interactive mode (autonomous/one-shot), plans auto-approve on write.
 3. **Execute**: Implement items, updating plan with `plan_write` to mark progress
 4. **Complete**: When all items are done/blocked, verification runs automatically
@@ -44,47 +44,10 @@ When drafting a plan, you MUST:
 - Keep items ~7 by default
 - Commit to where the work will live (touches)
 - Provide all three checks (happy, negative, boundary)
-- **Include rulespec with invariants** (required for new plans)
 
 When updating a plan:
 - Cannot remove items from an approved plan (mark as blocked instead)
 - Must provide evidence and notes when marking item as done
-- Rulespec is optional for updates (already saved from initial creation)
-
-## Invariants (Rulespec)
-
-For all NEW plans, you MUST extract invariants and provide them as the `rulespec` argument to `plan_write`.
-
-### What are Invariants?
-
-Invariants are constraints that MUST or MUST NOT hold. Extract them from:
-- **task_prompt**: What the user explicitly requires ("must support TSV", "must not break existing API")
-- **memory**: Persistent rules from workspace memory ("must be Send + Sync", "must not block async runtime")
-
-### Rulespec Structure
-
-```yaml
-claims:
-  - name: csv_capabilities
-    selector: "csv_importer.capabilities"
-
-predicates:
-  - claim: csv_capabilities
-    rule: contains
-    value: "handle_tsv"
-    source: task_prompt
-    notes: "User explicitly requested TSV support"
-```
-
-### Predicate Rules
-
-- `contains`: Array contains value, or string contains substring
-- `equals`: Exact match
-- `exists`: Value is present
-- `not_exists`: Value is absent
-- `min_length` / `max_length`: Array size constraints
-- `greater_than` / `less_than`: Numeric comparisons
-- `matches`: Regex pattern match
 
 ## Example Plan
 
@@ -108,17 +71,6 @@ plan_write(
         - desc: Empty file yields empty import without error
           target: import::csv
   ",
-  rulespec: "
-    claims:
-      - name: csv_capabilities
-        selector: csv_importer.capabilities
-    predicates:
-      - claim: csv_capabilities
-        rule: contains
-        value: handle_tsv
-        source: task_prompt
-        notes: User explicitly requested TSV support
-  "
 )
 ```
 
@@ -126,7 +78,7 @@ When marking done, add `evidence` and `notes` to the item.
 
 ## Action Envelope
 
-Before marking the last plan item done, write an `envelope.yaml` file with facts about completed work. The envelope captures what was actually built so it can be verified against the rulespec.
+Before marking the last plan item done, write an `envelope.yaml` file with facts about completed work. The envelope captures what was actually built so it can be verified against invariants in `analysis/rulespec.yaml` if present.
 
 ```yaml
 facts:
@@ -141,10 +93,10 @@ facts:
 ```
 
 **Rules:**
-- Selectors in rulespec (e.g., `csv_importer.capabilities`) are evaluated against envelope facts
+- Selectors in `analysis/rulespec.yaml` (e.g., `csv_importer.capabilities`) are evaluated against envelope facts
 - Use dot notation for nested access: `api_changes.breaking`
 - Use `null` to explicitly assert absence (for `not_exists` predicates)
-- The envelope is automatically verified against the rulespec when the plan completes
+- The envelope is automatically verified against `analysis/rulespec.yaml` when the plan completes (if the file exists)
 
 # Workspace Memory