Added Chapters 12 and 13

Here we transform each word of the input into an ***embedding*** and add a vector to account for
position. This positional encoding can either be learnt or can follow these formulas (a short code sketch follows the two cases below):

- Even size:

$$
\text{positional\_encoding}_{(position, 2\text{size})} = \sin\left(\frac{position}{10000^{2\text{size}/d_{\text{model}}}}\right)
$$

- Odd size:

$$
\text{positional\_encoding}_{(position, 2\text{size} + 1)} = \cos\left(\frac{position}{10000^{2\text{size}/d_{\text{model}}}}\right)
$$
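
Below is a minimal NumPy sketch of how these two formulas can be evaluated and summed with the word embeddings, as described above. The function name `positional_encoding`, the `d_model` dimension, and the random stand-in embeddings are illustrative assumptions, not something fixed by these notes.

```python
import numpy as np

def positional_encoding(max_position: int, d_model: int) -> np.ndarray:
    """Sinusoidal positional encoding: sin on even dimensions, cos on odd ones."""
    positions = np.arange(max_position)[:, np.newaxis]        # (max_position, 1)
    two_size = np.arange(0, d_model, 2)[np.newaxis, :]        # the even indices 2*size
    angles = positions / np.power(10000, two_size / d_model)  # position / 10000^(2*size / d_model)
    encoding = np.zeros((max_position, d_model))
    encoding[:, 0::2] = np.sin(angles)                        # even size -> sin
    encoding[:, 1::2] = np.cos(angles)                        # odd size  -> cos
    return encoding

# The notes say the embedding and the positional vector are added together;
# the embeddings here are random stand-ins for the learnt word embeddings.
embeddings = np.random.randn(10, 512)                         # (sequence_length, d_model)
encoder_input = embeddings + positional_encoding(10, 512)
```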

### Encoder

> [!CAUTION]

BERT can be used as a classifier and can be fine-tuned.
The fine-tuning happens by **masking** part of the input and **predicting** the **masked words** (see the sketch after this list):

- 15% of the total words in the input are masked:
  - 80% will become a `[MASK]` token
  - 10% will become random words
  - 10% will remain unchanged
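
A rough sketch of this masking scheme, assuming whitespace-tokenised input and a generic `[MASK]` string; the function name, the toy vocabulary, and the label convention are illustrative assumptions rather than BERT's actual preprocessing code.

```python
import random

def mask_tokens(tokens, vocabulary, mask_rate=0.15):
    """Pick ~15% of the tokens; of those, 80% become [MASK],
    10% become a random word, and 10% are left unchanged."""
    masked = list(tokens)
    labels = [None] * len(tokens)          # only masked positions get a label to predict
    for i, token in enumerate(tokens):
        if random.random() < mask_rate:
            labels[i] = token              # the model must recover the original word
            roll = random.random()
            if roll < 0.8:
                masked[i] = "[MASK]"
            elif roll < 0.9:
                masked[i] = random.choice(vocabulary)
            # else: keep the original token (the remaining 10%)
    return masked, labels

words = "the cat sat on the mat".split()
print(mask_tokens(words, vocabulary=["dog", "tree", "running"]))
```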

#### BERT tasks

- **Classification**
- **Fine Tuning**
- **2-sentence tasks**
  - **Are they paraphrases?**
  - **Does one sentence follow from the other one?**
- **Feature Extraction**: allows us to extract features to use in our model (see the sketch below)
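
One way to do such feature extraction is with a pretrained BERT checkpoint. The sketch below assumes the Hugging Face `transformers` library (with PyTorch) and the `bert-base-uncased` checkpoint, neither of which is prescribed by these notes.

```python
# Extract contextual features from a pretrained BERT encoder.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("The transformer encoder builds contextual features.",
                   return_tensors="pt")
outputs = model(**inputs)

# One vector per input token, usable as features in a downstream model.
token_features = outputs.last_hidden_state    # shape: (1, num_tokens, 768)
sentence_feature = token_features[:, 0]       # the [CLS] position is a common sentence summary
```
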
### GPT-2