GassiGiuseppe
|
96610612fe
|
Batcher added
|
2025-10-10 20:10:08 +02:00 |
|
Christian Risi
|
aac7675b30
|
Pipeline fix and added a util to decode
|
2025-10-09 13:24:48 +02:00 |
|
GassiGiuseppe
|
b4ee8362a2
|
WIP training Batching
|
2025-10-07 17:41:53 +02:00 |
|
Christian Risi
|
da0bdf703b
|
Added a way to see vocabulary size
|
2025-10-04 19:42:29 +02:00 |
|
Christian Risi
|
03cdca1f00
|
Modified imports for BPE
|
2025-10-04 19:42:02 +02:00 |
|
Christian Risi
|
23d1eaf99e
|
Fixed a rare bug when training multiple times
|
2025-10-04 10:47:39 +02:00 |
|
Christian Risi
|
d2a3dfe90f
|
Fixed bug
|
2025-10-03 17:59:46 +02:00 |
|
Christian Risi
|
0ee6e48004
|
Fixed the same bug as before, but this time it is correct
|
2025-10-03 16:09:53 +02:00 |
|
Christian Risi
|
55e0d2ac23
|
Fixed an encoding bug
|
2025-10-03 16:08:11 +02:00 |
|
Christian Risi
|
c5c0c61f79
|
Fix of bugs and semantics
|
2025-10-03 13:26:58 +02:00 |
|
Christian Risi
|
6b9cb7cd35
|
Modified imports
|
2025-10-03 13:26:42 +02:00 |
|
Christian Risi
|
e8894504c6
|
Fixed a bug where a token (int) was yielded instead of a list of int
|
2025-10-03 11:44:44 +02:00 |
|
GassiGiuseppe
|
070dc1b744
|
Implemented TokeNano for the BPE encoding/decoding
|
2025-10-03 01:04:06 +02:00 |
|
GassiGiuseppe
|
8121c75a09
|
Updated NanoSocratesSplitter to also split tokens in the decode phase
|
2025-10-03 01:00:36 +02:00 |
|
GassiGiuseppe
|
a5b8692a77
|
Updated NanoSocratesSpecial to work with TokeNano
|
2025-10-03 00:59:15 +02:00 |
|
GassiGiuseppe
|
7c935d2700
|
Updated NanoSocratesBPE: corrected a minor bug in the dictionary length,
added some comments to make the code clearer
|
2025-10-03 00:57:19 +02:00 |
|
GassiGiuseppe
|
0eef2148a9
|
in NanoSocratesBPE: encode() method rewritten and tested
|
2025-10-02 12:12:44 +02:00 |
|
Christian Risi
|
856bd8909c
|
Added threshold
|
2025-10-02 11:02:03 +02:00 |
|
Christian Risi
|
2e595a3a23
|
Changed training phase to take the data directly instead of its encoding
|
2025-10-02 09:56:44 +02:00 |
|
Christian Risi
|
1eae8582b2
|
Fixed decoding phase
|
2025-10-02 09:33:58 +02:00 |
|
Christian Risi
|
aa765b4555
|
Added time checking
|
2025-10-02 08:48:45 +02:00 |
|
Christian Risi
|
0975c19e69
|
Added new method to encode from a list of tokens
|
2025-10-02 08:48:13 +02:00 |
|
Christian Risi
|
3fe4e45ceb
|
Fixed a bug while joining frequencies
|
2025-10-02 01:50:37 +02:00 |
|
Christian Risi
|
d19426fa62
|
added multithreaded training to package
|
2025-10-02 01:31:05 +02:00 |
|
Christian Risi
|
63baf29805
|
Added multithreaded training
|
2025-10-02 01:30:24 +02:00 |
|
Christian Risi
|
b80b4e4112
|
Fixed returning type hints
|
2025-10-02 01:29:57 +02:00 |
|
Christian Risi
|
7cfaf601b4
|
Refactored to remove tokens that can't be compressed anymore
|
2025-10-01 19:42:22 +02:00 |
|
Christian Risi
|
fbbe6226bb
|
Finished uploading stubs for TokeNano
|
2025-10-01 18:56:53 +02:00 |
|
Christian Risi
|
66bcf6e55f
|
Added a way to recover iteration work
|
2025-10-01 12:21:42 +02:00 |
|
Christian Risi
|
89a0a1f4bb
|
Fixed bug for utf-8 conversion
|
2025-09-30 23:58:31 +02:00 |
|
Christian Risi
|
b09bd4acba
|
Created trainer to train BPE
|
2025-09-30 13:33:40 +02:00 |
|
Christian Risi
|
c9032cab09
|
Added fit method
|
2025-09-30 13:33:28 +02:00 |
|
Christian Risi
|
2fe1ce9e9a
|
Updated Inits
|
2025-09-30 13:32:37 +02:00 |
|
Christian Risi
|
e433941405
|
Added BPE
TODO:
- complete the fit method
|
2025-09-28 18:04:44 +02:00 |
|
Christian Risi
|
b46df4f91a
|
Added Special Encoder
|
2025-09-28 18:03:47 +02:00 |
|
Christian Risi
|
d179e01971
|
Added Splitter to divide tokens from text
|
2025-09-28 18:03:16 +02:00 |
|
Christian Risi
|
b071145f6e
|
Added Chunker
|
2025-09-28 18:02:06 +02:00 |
|
Christian Risi
|
ed0255e99b
|
Updated imports
|
2025-09-28 18:01:35 +02:00 |
|
Christian Risi
|
8db35732f9
|
Added Chunker to restrict our domains
|
2025-09-26 18:50:23 +02:00 |
|
Christian Risi
|
9972ab8a51
|
Added imports
|
2025-09-26 18:48:23 +02:00 |
|