Dhanji R. Prasanna
dc45987e8d
Add characterization tests for UTF-8 truncation and parser sanitization
Agent: hopper
Adds 32 new integration tests covering recent commits:
## UTF-8 Safe Truncation Tests (14 tests)
Covers commit f30f145 (Fix UTF-8 panics):
- Topic extraction with emoji, CJK, and multi-byte characters
- Truncation at character boundaries (not byte boundaries)
- Edge cases: inputs of exactly 50 and 51 characters; 2-, 3-, and 4-byte UTF-8 sequences
- Stub generation with multi-byte topics
- Combining characters and diacritics
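Below is a minimal sketch of the behaviour these tests pin down. It is written in Rust on the assumption that the panics came from slicing a `&str` at a fixed byte offset, which panics when the offset falls inside a multi-byte character. `truncate_chars` and `MAX_TOPIC_CHARS` are hypothetical names. The 50-character limit is inferred from the 50/51-character edge cases above.

```rust
/// Hypothetical limit, inferred from the "exactly 50 / 51 characters" edge cases.
const MAX_TOPIC_CHARS: usize = 50;

/// Hypothetical helper: truncate `s` to at most `max_chars` Unicode scalar values.
///
/// A byte slice such as `&s[..50]` panics if byte 50 is not a char boundary
/// (e.g. in the middle of "é", "漢" or "🦀"). `char_indices` only yields byte
/// offsets that start a character, so slicing at one of them is always safe.
fn truncate_chars(s: &str, max_chars: usize) -> &str {
    match s.char_indices().nth(max_chars) {
        // Byte offset where character number `max_chars` (0-based) begins:
        // everything before it is exactly `max_chars` characters.
        Some((byte_idx, _)) => &s[..byte_idx],
        // The string has `max_chars` characters or fewer: return it unchanged.
        None => s,
    }
}
```

`truncate_chars` counts `char`s (Unicode scalar values), not grapheme clusters. A cut can therefore still separate a base letter from a following combining mark. Keeping clusters intact would need grapheme segmentation, for example via the `unicode-segmentation` crate.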
## Parser Sanitization Tests (18 tests)
Covers commit 4c36cc0 (Prevent parser poisoning):
- Code block contexts (inline code, after fences, prose)
- Line boundary edge cases (empty lines, whitespace, indentation)
- Unicode handling (emoji, bullet characters, or CJK text preceding a pattern)
- Multiple patterns on same line
- Negative cases (similar but different patterns, partial patterns)
- Real-world scenarios from the original bug report
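The commit does not spell out the sanitization rule, so the following is only a hypothetical Rust sketch of the kind of behaviour these tests characterize. It makes two assumptions:

- The parser treats a made-up marker, `{"tool":`, as significant only at the start of a line, after optional indentation.
- Sanitization neutralizes every other occurrence of that marker by prefixing a backslash.

`MARKER`, `ESCAPED`, `sanitize_line`, and `sanitize` are illustrative names. The sketch does not track fenced code blocks.

```rust
/// Hypothetical marker the parser would treat as the start of a tool call.
const MARKER: &str = r#"{"tool":"#;
/// Hypothetical neutralized form: a leading backslash so the parser ignores it.
const ESCAPED: &str = r#"\{"tool":"#;

/// Keep a marker that begins the line (after optional indentation);
/// escape every other occurrence on the line.
fn sanitize_line(line: &str) -> String {
    let trimmed = line.trim_start();
    // Byte length of the leading whitespace; always a char boundary.
    let indent_len = line.len() - trimmed.len();
    if trimmed.starts_with(MARKER) {
        // MARKER is ASCII, so this offset is also a char boundary.
        let (head, rest) = line.split_at(indent_len + MARKER.len());
        format!("{}{}", head, rest.replace(MARKER, ESCAPED))
    } else {
        // Prose, inline code, or a marker after emoji/bullets/CJK: not line-start.
        line.replace(MARKER, ESCAPED)
    }
}

/// Sanitize text line by line, preserving empty lines and indentation.
fn sanitize(text: &str) -> String {
    text.split('\n')
        .map(sanitize_line)
        .collect::<Vec<_>>()
        .join("\n")
}
```

Splitting on `'\n'` leaves empty lines and leading whitespace untouched, which matters for the empty-line and indentation cases listed above. A real implementation may draw the line boundary differently.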
All tests are black-box characterization tests: they check observable
outputs through stable public interfaces without encoding internal
implementation details.
2026-01-13 11:22:46 +05:30