From 1bbb4a0999ef289d7f17cb1231f12e95576eaae6 Mon Sep 17 00:00:00 2001
From: Christian Risi <75698846+CnF-Gris@users.noreply.github.com>
Date: Thu, 25 Sep 2025 20:17:48 +0200
Subject: [PATCH] Added new paper

---
 docs/BPE.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/BPE.md b/docs/BPE.md
index 02dca0a..eee3bac 100644
--- a/docs/BPE.md
+++ b/docs/BPE.md
@@ -17,5 +17,6 @@
 - [Scaffold-BPE: Enhancing Byte Pair Encoding for Large Language Models with Simple and Effective Scaffold Token Removal](https://ojs.aaai.org/index.php/AAAI/article/view/34633)
 - [Parity-Aware Byte-Pair Encoding: Improving Cross-lingual Fairness in Tokenization](https://arxiv.org/pdf/2508.04796)
 - [Code Completion using Neural Attention and Byte Pair Encoding](https://arxiv.org/pdf/2004.06343)
+- [Getting the most out of your tokenizer for pre-training and domain adaptation](https://arxiv.org/html/2402.01035v2)