Fixed Markdown violations

Christian Risi 2025-09-17 12:51:14 +02:00
parent cececa14ce
commit 72eb937b47

# Byte-Pair Encoding (BPE)

## Overview

Byte-Pair Encoding (BPE) is a simple but powerful text compression and tokenization algorithm.
Originally introduced as a data compression method, it has been widely adopted in **Natural Language Processing (NLP)** to build subword vocabularies for models such as GPT and BERT.

---

## Key Idea

BPE works by iteratively replacing the most frequent pair of symbols (initially characters) with a new symbol.
Over time, frequent character sequences (e.g., common morphemes, prefixes, suffixes) are merged into single tokens.
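
As a minimal illustration of the pair-counting idea (a Python sketch of our own, using only `collections.Counter`; the string is the one used in the example below):

```python
from collections import Counter

tokens = list("aaabdaaabac")                     # start from single characters
pair_counts = Counter(zip(tokens, tokens[1:]))   # adjacent pairs

print(pair_counts.most_common(1))                # [(('a', 'a'), 4)]
```

Note that `zip` counts overlapping occurrences; the merge step itself replaces pairs left to right without overlap.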

---

## Algorithm Steps

1. **Initialization**
   - Treat each character of the input text as a token.
2. **Find Frequent Pairs**
   - Count all adjacent token pairs in the sequence.
3. **Merge Most Frequent Pair**
   - Replace the most frequent pair with a new symbol not used in the text.
4. **Repeat**
   - Continue until no frequent pairs remain or a desired vocabulary size is reached (a runnable sketch of this loop follows the list).
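
The loop above can be sketched in a few lines of Python (a minimal sketch with our own function names and stopping rule, not a reference implementation):

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Count adjacent token pairs and return (pair, count) for the most common one."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0] if pairs else (None, 0)

def bpe_train(text, max_merges=10):
    """Greedily merge the most frequent adjacent pair until none repeats."""
    tokens = list(text)                    # 1. initialization: one token per character
    merges = []                            # ordered merge table
    for _ in range(max_merges):
        pair, count = most_frequent_pair(tokens)   # 2. find frequent pairs
        if pair is None or count < 2:
            break                          # 4. stop: no pair occurs more than once
        merges.append(pair)
        merged = pair[0] + pair[1]         # 3. new symbol standing for the pair
        out, i = [], 0
        while i < len(tokens):             # replace occurrences left to right
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
                out.append(merged)
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        tokens = out
    return tokens, merges
```

In practical tokenizers the new symbol is simply the concatenation of the pair rather than a fresh letter like `Z`, but the loop is the same.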

---

## Example

Suppose the data to be encoded is:

```text
aaabdaaabac
```

### Step 1: Merge `"aa"`

Most frequent pair: `"aa"` → replace with `"Z"`

```text
ZabdZabac
Z = aa
```

---

### Step 2: Merge `"ab"`

Most frequent pair: `"ab"` → replace with `"Y"`

```text
ZYdZYac
Y = ab
Z = aa
```

---

### Step 3: Merge `"ZY"`

Most frequent pair: `"ZY"` → replace with `"X"`

```text
XdXac
X = ZY
Y = ab
Z = aa
```

At this point, no pairs occur more than once, so the process stops.
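
Running the sketch from the Algorithm Steps section on this string reproduces the first merge exactly. One caveat: at Step 2 the pairs `Za` and `ab` each occur twice, so picking `ab` is one valid tie-break among equals, and an implementation may legitimately choose the other:

```python
tokens, merges = bpe_train("aaabdaaabac")
print(merges[0])   # ('a', 'a'), i.e. the "Z = aa" merge
print(tokens)      # final tokens; later merges depend on how ties are broken
```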

---

## Decompression

To recover the original data, replacements are applied in **reverse order**:

```text
XdXac
→ ZYdZYac
→ ZabdZabac
→ aaabdaaabac
```
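
Decoding is a straightforward reverse pass over the merge table. A minimal sketch (the table hardcodes the `Z`/`Y`/`X` symbols from this example):

```python
def bpe_decode(text, table):
    """Expand symbols in reverse order of their creation."""
    for symbol, expansion in reversed(table):
        text = text.replace(symbol, expansion)
    return text

table = [("Z", "aa"), ("Y", "ab"), ("X", "ZY")]
print(bpe_decode("XdXac", table))  # aaabdaaabac
```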

---

## Advantages

- **Efficient vocabulary building**: reduces the need for massive word lists.
- **Handles rare words**: breaks them into meaningful subword units.
- **Balances character- and word-level tokenization**.

---

## Limitations

- Does not consider linguistic meaning; merges are purely frequency-based.
- May create tokens that are not linguistically natural.
- Vocabulary is fixed after training.