18 Commits

Author SHA1 Message Date
GassiGiuseppe
fc44929a7b moved spanned mask variables in init for better reliability, also tested 2025-10-07 23:15:50 +02:00
Christian Risi
1797571bb2 Added test to see if illegal tokens were included in target 2025-10-06 16:17:12 +02:00
Christian Risi
d3bba9b944 Added actual test 2025-10-06 16:06:17 +02:00
Christian Risi
d3b1f7da91 Added testing for spanned masking 2025-10-06 15:55:40 +02:00
GassiGiuseppe
c2f9344c82 little test file 2025-10-04 18:58:20 +02:00
GassiGiuseppe
25f3a5d221 Logic to test BPE 2025-10-04 18:58:04 +02:00
Christian Risi
149deb407d added cache directories 2025-10-03 18:01:05 +02:00
Christian Risi
c74689d01d Fixed tests to reflect new version of tokenizer 2025-10-03 13:27:38 +02:00
GassiGiuseppe
09f7b39512 test files updated 2025-10-03 01:04:47 +02:00
Christian Risi
a1d143187d corrected test to reflect changes in BPE trainer 2025-10-02 20:11:43 +02:00
Christian Risi
2194cc7b4f Changed test to use pool trainer 2025-10-02 09:56:05 +02:00
Christian Risi
eadba1fb82 Corrected test to reflect changes in NanoSocratesBPE 2025-10-02 09:33:47 +02:00
Christian Risi
76f24d4eb0 Renamed file 2025-09-30 23:58:43 +02:00
Christian Risi
ccacea18d8 Created files to test BPE training 2025-09-30 13:33:54 +02:00
Christian Risi
e433941405 Added BPE
TODO:
- complete the fit method
2025-09-28 18:04:44 +02:00
Christian Risi
d179e01971 Added Splitter to divide tokens from text 2025-09-28 18:03:16 +02:00
Christian Risi
3e8b5c5579 Added test for chunker 2025-09-26 18:50:32 +02:00
Christian Risi
3f48b5c428 Added text files to test a chunker 2025-09-26 18:48:44 +02:00