feat(embedded): add GLM tool format adapter for code fence stripping
GLM-4 models wrap tool calls in markdown code fences and inline backticks, which prevents the streaming parser from detecting them. This adapter:

- Strips ```json and ``` code fence markers during streaming
- Strips inline backticks from tool call JSON
- Handles chunked streaming correctly (buffers potential fence lines)
- Transforms GLM native format (<|assistant|>tool_name) to g3 JSON format

Also refactors the embedded provider into a module structure:

- embedded/mod.rs - module exports
- embedded/provider.rs - main EmbeddedProvider (moved from embedded.rs)
- embedded/adapters/mod.rs - ToolFormatAdapter trait
- embedded/adapters/glm.rs - GLM-specific adapter

Includes 22 unit tests covering edge cases such as nested JSON in strings, chunk-boundary handling, and false-pattern detection.

Updates the README to show that GLM-4 9B now works (⭐⭐) for agentic tasks.
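The fence-stripping half of the adapter can be sketched as below. This is a minimal illustration of the buffering idea (hold back a partial trailing line in case the next chunk completes a ``` fence), not the actual code in embedded/adapters/glm.rs; `GlmFenceStripper` and its method names are hypothetical.

```rust
/// Sketch of a streaming code-fence stripper for GLM tool-call output.
/// Assumption: the real adapter behind `ToolFormatAdapter` works line-wise;
/// this type and its API are illustrative only.
struct GlmFenceStripper {
    /// Partial trailing line that might still become a fence marker
    /// once the next chunk arrives.
    pending: String,
}

impl GlmFenceStripper {
    fn new() -> Self {
        Self { pending: String::new() }
    }

    /// Feed one streamed chunk; returns the text that is safe to emit.
    /// Complete lines equal to ```json or ``` are dropped; a trailing
    /// partial line is buffered if it could still be a fence prefix.
    fn push(&mut self, chunk: &str) -> String {
        self.pending.push_str(chunk);
        let mut out = String::new();
        while let Some(nl) = self.pending.find('\n') {
            let line: String = self.pending.drain(..=nl).collect();
            let trimmed = line.trim_end();
            if trimmed != "```json" && trimmed != "```" {
                out.push_str(&line);
            }
        }
        // If the buffered tail can no longer grow into "```json",
        // it is ordinary content: flush it immediately.
        if !self.pending.is_empty()
            && !"```json".starts_with(self.pending.trim_end())
        {
            out.push_str(&self.pending);
            self.pending.clear();
        }
        out
    }

    /// Flush whatever remains once the stream ends.
    fn finish(&mut self) -> String {
        std::mem::take(&mut self.pending)
    }
}
```

The key design point is that a chunk boundary can split a fence marker (e.g. `` `` `` arriving in one chunk and `` `json `` in the next), so the stripper must never emit a line until it has seen its newline or ruled out a fence prefix.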
@@ -133,7 +133,7 @@ g3 supports local models via llama.cpp with Metal acceleration on macOS. Here's
 | Model | Size | Speed | Agentic | Notes |
 |-------|------|-------|---------------|-------|
 | ~~Qwen3-32B~~ (Dense) | 18 GB | Slow | ❌ | Good reasoning, but flails on execution and crashes |
 | Qwen3-14B | 8.4 GB | Medium | ⭐⭐ | Understands tasks but makes implementation errors |
-| ~~GLM-4 9B~~ | 5.7 GB | Fast | ❌ | Uses incompatible native tool format, not JSON |
+| GLM-4 9B | 5.7 GB | Fast | ⭐⭐ | Works with adapter (strips code fences) |
 | Qwen3-4B | 2.3 GB | Very Fast | ❌ | Generates malformed tool calls - not for agentic use |
 | ~~Qwen3-30B-A3B~~ (MoE) | 17 GB | Very Fast | ❌ | **Avoid** - loops infinitely on tool calls |