Christian Risi
|
b80b4e4112
|
Fixed return type hints
|
2025-10-02 01:29:57 +02:00 |
|
Christian Risi
|
7cfaf601b4
|
Refactored to remove tokens that can't be compressed anymore
|
2025-10-01 19:42:22 +02:00 |
|
Christian Risi
|
fbbe6226bb
|
Finished uploading stubs for TokeNano
|
2025-10-01 18:56:53 +02:00 |
|
Christian Risi
|
66bcf6e55f
|
Added a way to recover iteration work
|
2025-10-01 12:21:42 +02:00 |
|
Christian Risi
|
dbf1d99408
|
Added json utils to save and load json files
|
2025-10-01 12:20:59 +02:00 |
|
Christian Risi
|
76f24d4eb0
|
Renamed file
|
2025-09-30 23:58:43 +02:00 |
|
Christian Risi
|
89a0a1f4bb
|
Fixed bug for utf-8 conversion
|
2025-09-30 23:58:31 +02:00 |
|
Christian Risi
|
ccacea18d8
|
Created files to test BPE training
|
2025-09-30 13:33:54 +02:00 |
|
Christian Risi
|
b09bd4acba
|
Created trainer to train BPE
|
2025-09-30 13:33:40 +02:00 |
|
Christian Risi
|
c9032cab09
|
Added fit method
|
2025-09-30 13:33:28 +02:00 |
|
Christian Risi
|
7020c9e683
|
Added utils to make regexps and iterators that check for last element
|
2025-09-30 13:33:12 +02:00 |
|
Christian Risi
|
2fe1ce9e9a
|
Updated Inits
|
2025-09-30 13:32:37 +02:00 |
|
Christian Risi
|
18fc2ba9d8
|
Added Exceptions
|
2025-09-30 13:32:24 +02:00 |
|
Christian Risi
|
564b0d712e
|
Modified UML diagram
|
2025-09-28 18:05:03 +02:00 |
|
Christian Risi
|
e433941405
|
Added BPE
TODO:
- complete the fit method
|
2025-09-28 18:04:44 +02:00 |
|
Christian Risi
|
b46df4f91a
|
Added Special Encoder
|
2025-09-28 18:03:47 +02:00 |
|
Christian Risi
|
d179e01971
|
Added Splitter to divide tokens from text
|
2025-09-28 18:03:16 +02:00 |
|
Christian Risi
|
b071145f6e
|
Added Chunker
|
2025-09-28 18:02:06 +02:00 |
|
Christian Risi
|
ed0255e99b
|
Updated imports
|
2025-09-28 18:01:35 +02:00 |
|
Christian Risi
|
3e8b5c5579
|
Added test for chunker
|
2025-09-26 18:50:32 +02:00 |
|
Christian Risi
|
8db35732f9
|
Added Chunker to restrict our domains
|
2025-09-26 18:50:23 +02:00 |
|
Christian Risi
|
9552d61f8d
|
Added Exception for when we don't find a delimiter
|
2025-09-26 18:49:56 +02:00 |
|
Christian Risi
|
be8a87ce01
|
Modified the architecture for BPE
|
2025-09-26 18:49:29 +02:00 |
|
Christian Risi
|
3f48b5c428
|
Added text files to test a chunker
|
2025-09-26 18:48:44 +02:00 |
|
Christian Risi
|
9972ab8a51
|
Added imports
|
2025-09-26 18:48:23 +02:00 |
|