Add documentation for retry configuration in planning mode: - Document retry settings in .g3.toml under [agent] section - Note RetryConfig implementation in g3-core/src/retry.rs - Clarify hardcoded vs config-based retry values Verify existing retry loop and coach feedback parsing: - Confirm execute_with_retry() handles recoverable errors - Document feedback extraction source priority order - Provide manual verification steps for testing
117 lines
5.0 KiB
Markdown
117 lines
5.0 KiB
Markdown
{{CURRENT REQUIREMENTS}}
|
|
|
|
These requirements specify verification tasks for the planning mode's retry logic and coach
|
|
response parsing, along with documentation of where configuration is located.
|
|
|
|
## 1. Document Retry Configuration Location
|
|
|
|
**Goal**: Clarify where retry settings are configured for planning mode.
|
|
|
|
**Findings to document**:
|
|
1. Retry configuration is in the `.g3.toml` config file (or `config.example.toml` as template)
|
|
under the `[agent]` section:
|
|
```toml
|
|
[agent]
|
|
max_retry_attempts = 3 # Default mode retries
|
|
autonomous_max_retry_attempts = 6 # Used by planning/autonomous mode
|
|
```
|
|
|
|
2. The retry infrastructure is implemented in `crates/g3-core/src/retry.rs`:
|
|
- `RetryConfig` struct defines retry behavior per role
|
|
- `RetryConfig::planning("player")` and `RetryConfig::planning("coach")` create presets
|
|
- Default max retries is 3 (hardcoded in `RetryConfig::planning()`)
|
|
|
|
3. **Note**: Currently `RetryConfig::planning()` uses a hardcoded `max_retries: 3` rather than
|
|
reading from the config file's `autonomous_max_retry_attempts`. This may be intentional or
|
|
a gap to address.
|
|
|
|
**Required action**:
|
|
|
|
- add examples to config.example.toml for the coach and player retry configs.
|
|
|
|
## 2. Verify Retry Loop Functionality
|
|
|
|
**Goal**: Confirm that connection retry loops in planning mode work correctly for recoverable
|
|
errors.
|
|
|
|
**Verification approach**:
|
|
1. The retry logic is implemented in `g3_core::retry::execute_with_retry()` and is already
|
|
used by both player and coach phases in `run_coach_player_loop()` (planner.rs lines 633-640
|
|
and 682-689).
|
|
|
|
2. Error classification happens in `g3_core::error_handling::classify_error()` which identifies:
|
|
- `RecoverableError::RateLimit` (429 errors)
|
|
- `RecoverableError::NetworkError` (connection failures)
|
|
- `RecoverableError::ServerError` (5xx errors)
|
|
- `RecoverableError::Timeout` (request timeouts)
|
|
- `RecoverableError::ModelBusy` (capacity issues)
|
|
|
|
3. **Manual verification steps** (for a human tester):
|
|
- Run planning mode with a temporarily invalid API endpoint to trigger network errors
|
|
- Observe retry messages: `"⚠️ player error (attempt X/3): NetworkError - ..."`
|
|
- Observe backoff: `"🔄 Retrying player in Xs..."`
|
|
- After max retries, observe: `"🔄 Max retries (3) reached for player"`
|
|
|
|
4. **Existing test coverage**:
|
|
- `g3-core/src/retry.rs` has unit tests for `RetryConfig` construction
|
|
- `g3-core/src/error_handling.rs` has tests for `classify_error()` and delay calculations
|
|
|
|
**Required action**:
|
|
- No code changes needed if retry loops are already functioning.
|
|
- If issues are found during manual verification, document specific failure scenarios.
|
|
|
|
## 3. Verify Coach Response Parsing
|
|
|
|
**Goal**: Confirm that coach feedback extraction works correctly in planning mode.
|
|
|
|
**Current implementation**:
|
|
1. Coach feedback extraction uses `g3_core::feedback_extraction::extract_coach_feedback()`
|
|
(called at planner.rs ~line 695).
|
|
|
|
2. The extraction tries multiple sources in order:
|
|
- `FeedbackSource::SessionLog` - from session log JSON file
|
|
- `FeedbackSource::NativeToolCall` - from native tool call JSON in response
|
|
- `FeedbackSource::ConversationHistory` - from conversation history
|
|
- `FeedbackSource::TaskResultResponse` - from TaskResult parsing
|
|
- `FeedbackSource::DefaultFallback` - default message
|
|
|
|
3. Planning mode displays the extraction source:
|
|
```
|
|
📝 Coach feedback extracted from SessionLog: 1234 chars
|
|
```
|
|
|
|
**Verification approach**:
|
|
1. **Manual verification steps**:
|
|
- Run a planning mode session through at least one coach/player cycle
|
|
- Observe the feedback extraction message and confirm it shows a valid source
|
|
(preferably `SessionLog` or `NativeToolCall`, not `DefaultFallback`)
|
|
- Verify the first 25 lines of feedback are displayed correctly
|
|
- Confirm `IMPLEMENTATION_APPROVED` detection works when coach approves
|
|
|
|
2. **Existing test coverage**:
|
|
- `g3-core/src/feedback_extraction.rs` has comprehensive unit tests:
|
|
- `test_extract_balanced_json_*` - JSON parsing
|
|
- `test_try_extract_json_tool_call` - tool call extraction
|
|
- `test_is_final_output_tool_call_*` - detecting final_output calls
|
|
- `test_extracted_feedback_is_approved` - approval detection
|
|
|
|
**Required action**:
|
|
- No code changes needed if parsing is working correctly.
|
|
- If `DefaultFallback` is observed frequently during manual testing, investigate why
|
|
earlier extraction methods are failing and document findings.
|
|
|
|
## 4. Optional: Add Integration Test for Retry + Feedback Flow
|
|
|
|
**Goal**: Create a lightweight integration test that verifies the retry and feedback
|
|
extraction machinery works together.
|
|
|
|
**Scope**: Only implement if time permits and manual verification reveals issues.
|
|
|
|
**Approach**:
|
|
1. Create a test in `crates/g3-planner/tests/` that:
|
|
- Mocks an LLM provider that returns a `final_output` tool call
|
|
- Verifies `extract_coach_feedback()` successfully extracts the feedback
|
|
- Optionally simulates a recoverable error to test retry logic
|
|
|
|
2. This test should NOT require actual API calls or network access.
|