Fixed Markdown violations

2025-09-17 12:51:14 +02:00
parent cececa14ce
commit 72eb937b47
1 changed file with 26 additions and 17 deletions
--- a/docs/RESOURCES.md
+++ b/docs/RESOURCES.md
@@ -1,28 +1,31 @@
 # Byte-Pair Encoding (BPE)

 ## Overview
-Byte-Pair Encoding (BPE) is a simple but powerful text compression and tokenization algorithm.  
+
+Byte-Pair Encoding (BPE) is a simple but powerful text compression and tokenization algorithm.
 Originally introduced as a data compression method, it has been widely adopted in **Natural Language Processing (NLP)** to build subword vocabularies for models such as GPT and RoBERTa.

 ---

 ## Key Idea
-BPE works by iteratively replacing the most frequent pair of symbols (initially characters) with a new symbol.  
+
+BPE works by iteratively replacing the most frequent pair of symbols (initially characters) with a new symbol.
 Over time, frequent character sequences (e.g., common morphemes, prefixes, suffixes) are merged into single tokens.

 ---

 ## Algorithm Steps
-1. **Initialization**  
+
+1. **Initialization**
   - Treat each character of the input text as a token.

-2. **Find Frequent Pairs**  
+2. **Find Frequent Pairs**
   - Count all adjacent token pairs in the sequence.

-3. **Merge Most Frequent Pair**  
+3. **Merge Most Frequent Pair**
   - Replace the most frequent pair with a new symbol not used in the text.

-4. **Repeat**  
+4. **Repeat**
   - Continue until no pair occurs more than once or the desired vocabulary size is reached.

 ---
@@ -31,14 +34,15 @@ Over time, frequent character sequences (e.g., common morphemes, prefixes, suffi

 Suppose the data to be encoded is:

-```
+```text
 aaabdaaabac
 ```

 ### Step 1: Merge `"aa"`
+
 Most frequent pair: `"aa"` → replace with `"Z"`

-```
+```text
 ZabdZabac
 Z = aa
 ```
@@ -46,9 +50,10 @@ Z = aa
 ---

 ### Step 2: Merge `"ab"`
+
 Most frequent pair: `"ab"` → replace with `"Y"`

-```
+```text
 ZYdZYac
 Y = ab
 Z = aa
@@ -57,9 +62,10 @@ Z = aa
 ---

 ### Step 3: Merge `"ZY"`
+
 Most frequent pair: `"ZY"` → replace with `"X"`

-```
+```text
 XdXac
 X = ZY
 Y = ab
@@ -73,9 +79,10 @@ At this point, no pairs occur more than once, so the process stops.
 ---

 ## Decompression
+
 To recover the original data, replacements are applied in **reverse order**:

-```
+```text
 XdXac
 → ZYdZYac
 → ZabdZabac
@@ -85,13 +92,15 @@ XdXac
 ---

 ## Advantages
- **Efficient vocabulary building**: reduces the need for massive word lists.  
- **Handles rare words**: breaks them into meaningful subword units.  
- **Balances character- and word-level tokenization**.  
+
+- **Efficient vocabulary building**: reduces the need for massive word lists.
+- **Handles rare words**: breaks them into meaningful subword units.
+- **Balances character- and word-level tokenization**.

 ---

 ## Limitations
- Does not consider linguistic meaning—merges are frequency-based.  
- May create tokens that are not linguistically natural.  
- Vocabulary is fixed after training.  
+
+- Does not consider linguistic meaning—merges are frequency-based.
+- May create tokens that are not linguistically natural.
+- Vocabulary is fixed after training.
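
The walkthrough in `docs/RESOURCES.md` (`aaabdaaabac` to `XdXac` and back) can be checked with a short script. What follows is a minimal sketch, not part of this commit: the function names `merge_pair`, `bpe_compress`, and `bpe_decompress` and the `fresh_symbols` parameter are hypothetical, and the replacement symbols are assumed to be single characters that never occur in the input. One detail the document leaves open is tie-breaking: after the first merge, `"Za"` and `"ab"` both occur twice. The sketch picks the lexicographically greater pair, an assumption chosen only so that the merges match the documented steps.

```python
from collections import Counter


def merge_pair(tokens, pair, new_symbol):
    """Replace non-overlapping occurrences of `pair`, scanning left to right."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(new_symbol)
            i += 2  # skip both halves of the merged pair
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def bpe_compress(text, fresh_symbols="ZYXWVUTSRQ"):
    """Merge the most frequent adjacent pair until no pair occurs twice.

    `fresh_symbols` is an assumption of this sketch: single characters
    that must not appear in `text`. Each merge consumes one of them.
    """
    tokens = list(text)
    rules = []  # (new_symbol, (left, right)) in the order merges were made
    for symbol in fresh_symbols:
        counts = Counter(zip(tokens, tokens[1:]))
        if not counts:
            break
        # Highest count wins; ties go to the lexicographically greater pair
        # (an assumption made to reproduce the documented example).
        pair, count = max(counts.items(), key=lambda kv: (kv[1], kv[0]))
        if count < 2:
            break
        tokens = merge_pair(tokens, pair, symbol)
        rules.append((symbol, pair))
    return "".join(tokens), rules


def bpe_decompress(data, rules):
    """Undo the merges by expanding symbols in reverse order."""
    for symbol, (left, right) in reversed(rules):
        data = data.replace(symbol, left + right)
    return data


compressed, rules = bpe_compress("aaabdaaabac")
print(compressed)                         # XdXac
print(bpe_decompress(compressed, rules))  # aaabdaaabac
```

With a different tie-breaking rule (for example, first pair encountered), the second merge would be `"Za"` instead of `"ab"`; the final string for this input is still `XdXac`, but the intermediate steps and the rule table differ from the ones in the document.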