- Add TODO completion check to final_output tool in autonomous mode only
- When incomplete TODO items exist, reject final_output and prompt LLM to continue
- Non-autonomous modes (interactive, chat) are unaffected
- Add 6 tests verifying behavior in both autonomous and non-autonomous modes
Fixes issue where LLM would call final_output after completing first phase,
causing agent to stop prematurely instead of continuing with remaining phases.