Typo correction in the markdown

2025-09-18 20:24:11 +02:00
parent 6686b47328
commit 1c715dc569
1 changed files with 13 additions and 11 deletions
--- a/docs/RESOURCES.md
+++ b/docs/RESOURCES.md
@@ -1,20 +1,22 @@
-# Byte-Pair Encoding (BPE)
+# Resources

-## Overview
+## Byte-Pair Encoding (BPE)
+
+### Overview

 Byte-Pair Encoding (BPE) is a simple but powerful text compression and tokenization algorithm.
 Originally introduced as a data compression method, it has been widely adopted in **Natural Language Processing (NLP)** to build subword vocabularies for models such as GPT and RoBERTa (BERT uses the closely related WordPiece algorithm).

 ---

-## Key Idea
+### Key Idea

 BPE works by iteratively replacing the most frequent pair of adjacent symbols (initially characters) with a new symbol.
 Over time, frequent character sequences (e.g., common morphemes, prefixes, suffixes) are merged into single tokens.
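
A single merge step can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the helper names `most_frequent_pair` and `merge_pair` are hypothetical, the input is the string used in the worked example, and ties between equally frequent pairs are assumed to go to the pair seen first.

```python
from collections import Counter


def most_frequent_pair(tokens):
    """Return (pair, count) for the most frequent adjacent pair, or None."""
    # Counts every adjacent pair, including overlapping occurrences.
    pairs = Counter(zip(tokens, tokens[1:]))
    if not pairs:
        return None
    # Assumption: ties go to the pair encountered first.
    return pairs.most_common(1)[0]


def merge_pair(tokens, pair, new_symbol):
    """Replace non-overlapping occurrences of `pair`, scanning left to right."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(new_symbol)
            i += 2  # skip both symbols of the merged pair
        else:
            merged.append(tokens[i])
            i += 1
    return merged


tokens = list("aaabdaaabac")
pair, _count = most_frequent_pair(tokens)
tokens = merge_pair(tokens, pair, "Z")
print(pair, "".join(tokens))
```

Repeating these two calls with a fresh symbol each time, until no pair occurs more than once, gives the full procedure.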

 ---

-## Algorithm Steps
+### Algorithm Steps

 1. **Initialization**
   - Treat each character of the input text as a token.
@@ -30,7 +32,7 @@ Over time, frequent character sequences (e.g., common morphemes, prefixes, suffi

 ---

-## Example
+### Example

 Suppose the data to be encoded is:

@@ -38,7 +40,7 @@ Suppose the data to be encoded is:
 aaabdaaabac
 ```

-### Step 1: Merge `"aa"`
+#### Step 1: Merge `"aa"`

 Most frequent pair: `"aa"` → replace with `"Z"`

@@ -49,7 +51,7 @@ Z = aa

 ---

-### Step 2: Merge `"ab"`
+#### Step 2: Merge `"ab"`

 Most frequent pair: `"ab"` (tied with `"Za"`, two occurrences each) → replace with `"Y"`

@@ -61,7 +63,7 @@ Z = aa

 ---

-### Step 3: Merge `"ZY"`
+#### Step 3: Merge `"ZY"`

 Most frequent pair: `"ZY"` → replace with `"X"`

@@ -78,7 +80,7 @@ At this point, no pairs occur more than once, so the process stops.

 ---

-## Decompression
+### Decompression

 To recover the original data, replacements are applied in **reverse order**:

@@ -91,7 +93,7 @@ XdXac
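
The reverse-order expansion can be sketched as below. The function name `decompress` and the `rules` list are illustrative assumptions; the rules mirror the three merges from the example, and the sketch assumes the placeholder symbols `X`, `Y`, `Z` never occur in the original data.

```python
def decompress(data, rules):
    """Undo merges by expanding symbols, most recent rule first."""
    for symbol, expansion in reversed(rules):
        data = data.replace(symbol, expansion)
    return data


# Merge rules in the order they were created during compression.
rules = [("Z", "aa"), ("Y", "ab"), ("X", "ZY")]
print(decompress("XdXac", rules))  # aaabdaaabac
```

Expanding `X` first matters here: it reintroduces `Z` and `Y`, which the earlier rules then expand in turn.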

 ---

-## Advantages
+### Advantages

 - **Efficient vocabulary building**: reduces the need for massive word lists.
 - **Handles rare words**: breaks them into meaningful subword units.
@@ -99,7 +101,7 @@ XdXac

 ---

-## Limitations
+### Limitations

 - Does not consider linguistic meaning—merges are frequency-based.
 - May create tokens that are not linguistically natural.