feat(embedded): add GLM tool format adapter for code fence stripping

GLM-4 models wrap tool calls in markdown code fences and inline backticks,
which prevents the streaming parser from detecting them. This adapter:

- Strips ```json and ``` code fence markers during streaming
- Strips inline backticks from tool call JSON
- Handles chunked streaming correctly (buffers potential fence lines)
- Transforms GLM native format (<|assistant|>tool_name) to g3 JSON format
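The fence-stripping behavior described above can be sketched roughly as follows. This is an illustrative sketch operating on a fully buffered response, not the actual adapter code; the real implementation works on streamed chunks and buffers lines that might turn out to be fences, and the function name `strip_fences` is hypothetical:

```rust
/// Illustrative sketch: remove markdown code-fence markers (``` and
/// ```json) and inline backticks from a buffered model response so the
/// tool-call JSON underneath can be parsed.
fn strip_fences(input: &str) -> String {
    input
        .lines()
        // Drop fence lines like ``` or ```json entirely.
        .filter(|line| !line.trim_start().starts_with("```"))
        // Strip inline backticks wrapping tool-call JSON. (Naive: a real
        // adapter must avoid touching backticks inside JSON string values.)
        .map(|line| line.replace('`', ""))
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let raw = "```json\n{\"tool\": \"read_file\"}\n```";
    assert_eq!(strip_fences(raw), "{\"tool\": \"read_file\"}");
    println!("{}", strip_fences(raw));
}
```

The streaming case is harder than this sketch suggests: a chunk may end mid-fence (e.g. `` `` `` followed by `` `json `` in the next chunk), which is why the adapter buffers potential fence lines rather than emitting them eagerly.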

Also refactors embedded provider into module structure:
- embedded/mod.rs - module exports
- embedded/provider.rs - main EmbeddedProvider (moved from embedded.rs)
- embedded/adapters/mod.rs - ToolFormatAdapter trait
- embedded/adapters/glm.rs - GLM-specific adapter

Includes 22 unit tests covering edge cases like nested JSON in strings,
chunk boundary handling, and false pattern detection.

Updates README to show GLM-4 9B now works (⭐⭐) for agentic tasks.
Author: Dhanji R. Prasanna
Date: 2026-01-29 12:52:09 +11:00
Parent: 457ba35f80
Commit: 8191a5e8e6
5 changed files with 871 additions and 4 deletions


@@ -133,7 +133,7 @@ g3 supports local models via llama.cpp with Metal acceleration on macOS. Here's
|-------|------|-------|---------------|-------|
| ~~Qwen3-32B~~ (Dense) | 18 GB | Slow | ❌ | Good reasoning, but flails on execution and crashes |
| Qwen3-14B | 8.4 GB | Medium | ⭐⭐ | Understands tasks but makes implementation errors |
| ~~GLM-4 9B~~ | 5.7 GB | Fast | ❌ | Uses incompatible native tool format, not JSON |
| GLM-4 9B | 5.7 GB | Fast | ⭐⭐ | Works with adapter (strips code fences) |
| Qwen3-4B | 2.3 GB | Very Fast | ❌ | Generates malformed tool calls - not for agentic use |
| ~~Qwen3-30B-A3B~~ (MoE) | 17 GB | Very Fast | ❌ | **Avoid** - loops infinitely on tool calls |