Calibrate used_tokens from API prompt_tokens (ground truth) to fix
progress bar drift in interactive mode. Three issues fixed:
1. update_usage_from_response() only updated cumulative_tokens, never
   calibrated used_tokens. Now snaps used_tokens to prompt_tokens when
   available (falls back to the heuristic when prompt_tokens is 0).
2. Moved the calibration call inline during streaming (when the usage
   chunk arrives) instead of after the loop. Text-only responses, the
   most common case in interactive mode, took an early return path
   that bypassed the post-loop usage update entirely.
3. Removed the mock Usage with hardcoded prompt_tokens=100 from
   execute_single_task(), which corrupted calibration.