Typo correction in the markdown

2025-09-18 20:24:11 +02:00
parent 6686b47328
commit 1c715dc569
1 changed file with 13 additions and 11 deletions
--- a/docs/RESOURCES.md
+++ b/docs/RESOURCES.md
@@ -1,20 +1,22 @@
-# Byte-Pair Encoding (BPE)
+# Resources
-## Overview
+## Byte-Pair Encoding (BPE)
 ### Overview
 Byte-Pair Encoding (BPE) is a simple but powerful text compression and tokenization algorithm.
 Originally introduced as a data compression method, it has been widely adopted in **Natural Language Processing (NLP)** to build subword vocabularies for models such as GPT and RoBERTa.
 ---
-## Key Idea
+### Key Idea
 BPE works by iteratively replacing the most frequent pair of symbols (initially characters) with a new symbol.
 Over time, frequent character sequences (e.g., common morphemes, prefixes, suffixes) are merged into single tokens.
 ---
-## Algorithm Steps
+### Algorithm Steps
 1. **Initialization**
   - Treat each character of the input text as a token.
@@ -30,7 +32,7 @@ Over time, frequent character sequences (e.g., common morphemes, prefixes, suffi
 ---
-## Example
+### Example
 Suppose the data to be encoded is:
@@ -38,7 +40,7 @@ Suppose the data to be encoded is:
 aaabdaaabac
 ```
-### Step 1: Merge `"aa"`
+#### Step 1: Merge `"aa"`
 Most frequent pair: `"aa"` → replace with `"Z"`
@@ -49,7 +51,7 @@ Z = aa
 ---
-### Step 2: Merge `"ab"`
+#### Step 2: Merge `"ab"`
 Most frequent pair: `"ab"` → replace with `"Y"`
@@ -61,7 +63,7 @@ Z = aa
 ---
-### Step 3: Merge `"ZY"`
+#### Step 3: Merge `"ZY"`
 Most frequent pair: `"ZY"` → replace with `"X"`
@@ -78,7 +80,7 @@ At this point, no pairs occur more than once, so the process stops.
 ---
-## Decompression
+### Decompression
 To recover the original data, replacements are applied in **reverse order**:
@@ -91,7 +93,7 @@ XdXac
 ---
-## Advantages
+### Advantages
 - **Efficient vocabulary building**: reduces the need for massive word lists.
 - **Handles rare words**: breaks them into meaningful subword units.
@@ -99,7 +101,7 @@ XdXac
 ---
-## Limitations
+### Limitations
 - Does not consider linguistic meaning—merges are frequency-based.
 - May create tokens that are not linguistically natural.
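
Review note: the worked example and the decompression step in this file can be checked with a minimal Python sketch. The function names `compress` and `decompress`, the placeholder set `"ZYXWVU"`, and the tie-breaking rule are assumptions made for illustration; the document does not specify them. When two pairs are equally frequent (as `"Za"` and `"ab"` are after the first merge), this sketch prefers the lexicographically larger pair, an arbitrary choice that happens to reproduce the walkthrough.

```python
from collections import Counter


def compress(text, symbols="ZYXWVU"):
    """Repeatedly replace the most frequent adjacent pair with a fresh symbol.

    Assumes none of the characters in `symbols` occurs in `text`
    (a hypothetical placeholder set, not part of the document).
    Returns the compressed string and the ordered (symbol, expansion) rules.
    """
    tokens = list(text)
    rules = []
    for symbol in symbols:
        # Count every adjacent pair, overlapping occurrences included.
        counts = Counter(zip(tokens, tokens[1:]))
        if not counts:
            break
        # Assumed tie-break: highest count, then lexicographically larger pair.
        pair = max(counts, key=lambda p: (counts[p], p))
        if counts[pair] < 2:
            break  # no pair occurs more than once, so stop
        # Replace non-overlapping occurrences of the pair, left to right.
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
                merged.append(symbol)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
        rules.append((symbol, "".join(pair)))
    return "".join(tokens), rules


def decompress(text, rules):
    """Undo the merges by applying the rules in reverse order."""
    for symbol, expansion in reversed(rules):
        text = text.replace(symbol, expansion)
    return text


compressed, rules = compress("aaabdaaabac")
restored = decompress(compressed, rules)
```

Under these assumptions, `compress("aaabdaaabac")` yields `XdXac` with the rules `Z = aa`, `Y = ab`, `X = ZY`, and `decompress` recovers the original string. Tokenizers used in NLP apply the same merge loop to a corpus of words and keep the learned merges as a vocabulary instead of using single-letter placeholders.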