Research Material
BPE
- BPE Wikipedia
- BPE Hugging Face
- BPE GeeksForGeeks
- BPE Medium Chetna Khanna
- Stack Overflow "Explain bpe (Byte Pair Encoding) with examples?"
- Implementing a byte pair encoding(BPE) Tokenizer from scratch
- Theoretical Analysis of Byte-Pair Encoding
- A Formal Perspective on Byte-Pair Encoding
- Byte Pair Encoding is Suboptimal for Language Model Pretraining
- Byte pair encoding: a text compression scheme that accelerates pattern matching
- Controlling byte pair encoding for neural machine translation
- Scaffold-BPE: Enhancing Byte Pair Encoding for Large Language Models with Simple and Effective Scaffold Token Removal
- Parity-Aware Byte-Pair Encoding: Improving Cross-lingual Fairness in Tokenization
- Code Completion using Neural Attention and Byte Pair Encoding
- Getting the most out of your tokenizer for pre-training and domain adaptation
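
The training loop these BPE references describe can be sketched in a few lines: count adjacent symbol pairs over the corpus, merge the most frequent pair into a new symbol, and repeat. The sketch below is a minimal illustration; the function names and toy corpus are ours, not taken from any specific reference.

```python
from collections import Counter

def most_frequent_pair(words):
    # words: mapping from a tuple of symbols to its corpus frequency
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(words, pair):
    # Replace every adjacent occurrence of `pair` with the merged symbol.
    merged = pair[0] + pair[1]
    new_words = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
                out.append(merged)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        new_words[tuple(out)] = new_words.get(tuple(out), 0) + freq
    return new_words

def learn_bpe(corpus, num_merges):
    # Start from single characters; greedily merge the most frequent adjacent pair.
    words = Counter(tuple(w) for w in corpus.split())
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        merges.append(pair)
        words = merge_pair(words, pair)
    return merges

print(learn_bpe("low lower lowest newest widest", 5))
```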
Embedder
- RoFormer: Enhanced Transformer with Rotary Position Embedding
- You could have designed state of the art positional encoding
- Rotary Embeddings: A Relative Revolution
- Round and Round We Go! What makes Rotary Positional Encodings useful?
- Inside RoPE: Rotary Magic into Position Embeddings
- What Rotary Position Embedding Can Tell Us: Identifying Query and Key Weights Corresponding to Basic Syntactic or High-level Semantic Information
- A gentle introduction to Rotary Position Embedding
- Context-aware Rotary Position Embedding
- LieRE: Generalizing Rotary Position Encodings to Higher Dimensional Inputs
- Rotary Positional Embeddings (RoPE)
- Decoding Llama3: An explainer for tinkerers
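
As a companion to the RoPE references above, here is a minimal NumPy sketch of rotary position embedding applied to a query or key matrix. It uses the split-half pairing convention (some implementations interleave channels instead) and is an illustration under those assumptions, not any particular library's implementation.

```python
import numpy as np

def rope(x, base=10000.0):
    # x: (seq_len, dim) query or key vectors; dim is assumed even.
    seq_len, dim = x.shape
    half = dim // 2
    # One rotation frequency per 2-D channel pair, as in RoFormer.
    freqs = base ** (-np.arange(half) / half)        # (half,)
    angles = np.outer(np.arange(seq_len), freqs)     # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Rotate each (x1, x2) pair by its position-dependent angle.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

q = np.random.randn(8, 64)
print(rope(q).shape)  # (8, 64)
```

Because the rotation angle depends only on the absolute position, the dot product between a rotated query and key depends only on their relative offset, which is the property these papers exploit.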
Attention
- Standard Self-Attention (Attention Is All You Need)
- TransMLA: Multi-Head Latent Attention Is All You Need
- A Gentle Introduction to Multi-Head Latent Attention (MLA)
- Understanding Multi-Head Latent Attention
- DeepSeek's Multi-Head Latent Attention
- MatchFormer: Interleaving Attention in Transformers for Feature Matching
- FIT: Far-reaching Interleaved Transformers
- Gemma explained: What’s new in Gemma 3
- The Llama 4 herd: The beginning of a new era of natively multimodal AI innovation
- Attention was never enough: Tracing the rise of hybrid LLMs
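
For the standard self-attention baseline cited above ("Attention Is All You Need"), a minimal single-head sketch with optional causal masking; the weight shapes and toy inputs are illustrative assumptions, not drawn from any of the listed papers.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv, causal=True):
    # x: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_head) projection matrices.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])          # scaled dot products
    if causal:
        mask = np.triu(np.ones_like(scores, dtype=bool), 1)
        scores = np.where(mask, -np.inf, scores)     # block attention to future positions
    return softmax(scores) @ v                       # weighted sum of values

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))
Wq, Wk, Wv = (rng.standard_normal((16, 8)) for _ in range(3))
print(self_attention(x, Wq, Wk, Wv).shape)  # (4, 8)
```

The multi-head latent attention (MLA) and hybrid/interleaved variants in the references above modify this baseline mainly in how q, k, and v are projected and cached, not in the scaled dot-product core shown here.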