diff --git a/docs/BPE.md b/docs/BPE.md
index 02dca0a..eee3bac 100644
--- a/docs/BPE.md
+++ b/docs/BPE.md
@@ -17,5 +17,6 @@
 - [Scaffold-BPE: Enhancing Byte Pair Encoding for Large Language Models with Simple and Effective Scaffold Token Removal](https://ojs.aaai.org/index.php/AAAI/article/view/34633)
 - [Parity-Aware Byte-Pair Encoding: Improving Cross-lingual Fairness in Tokenization](https://arxiv.org/pdf/2508.04796)
 - [Code Completion using Neural Attention and Byte Pair Encoding](https://arxiv.org/pdf/2004.06343)
+- [Getting the most out of your tokenizer for pre-training and domain adaptation](https://arxiv.org/html/2402.01035v2)