Beyond Perplexity: UTF-8 Validity in Byte-aware Language Models

Published in the Forty-Third International Conference on Machine Learning (ICML), 2026