280 Commits

Author SHA1 Message Date
GassiGiuseppe
e1549d4458 Modified decoder and decoder for sequential architecture 2025-10-06 18:20:46 +02:00
Christian Risi
456ce724fe Added capability of returning target after truncating 2025-10-06 17:43:01 +02:00
Christian Risi
44307cd917 Added util to create padding mask 2025-10-06 17:29:05 +02:00
Christian Risi
ffdb312d58 Added a util to create truncated RDF lists 2025-10-06 17:22:13 +02:00
Christian Risi
0007c38212 Added a util to make masked inference 2025-10-06 17:02:06 +02:00
Christian Risi
9c1043e0ba Added post tokenization utils 2025-10-06 17:01:18 +02:00
Christian Risi
ee8e56798c Added new utils 2025-10-06 17:00:55 +02:00
Christian Risi
1797571bb2 Added test to see if illegal tokens were included in target 2025-10-06 16:17:12 +02:00
Christian Risi
e93710af08 Fixed illegal tokens being added in target output 2025-10-06 16:16:47 +02:00
Christian Risi
d3bba9b944 Added actual test 2025-10-06 16:06:17 +02:00
Christian Risi
b1e7af0607 Merge branch 'dev.embedder' of https://repositories.communitynotfound.work/PoliBa-DeepLearning/NanoSocrates into dev.embedder 2025-10-06 15:55:44 +02:00
Christian Risi
d3b1f7da91 Added testing for spanned masking 2025-10-06 15:55:40 +02:00
Christian Risi
c217f5dec9 Added 2 types of masking 2025-10-06 15:45:45 +02:00
Christian Risi
49f0beb6ea Updated imports 2025-10-06 15:45:28 +02:00
GassiGiuseppe
05bb460999 file to test batch attention mask 2025-10-06 13:03:20 +02:00
GassiGiuseppe
948c3fd7ac update to batch attention mask 2025-10-06 13:03:03 +02:00
GassiGiuseppe
87409fecd5 added method fot batched attention_mask 2025-10-06 12:00:11 +02:00
GassiGiuseppe
7e40a36701 wip: NanoSocratesCore 2025-10-05 22:58:06 +02:00
GassiGiuseppe
d48815cca2 added task_type and updated init 2025-10-05 18:58:42 +02:00
GassiGiuseppe
0f243eaac2 added padding_mask entry to decoder and encoder 2025-10-05 18:46:06 +02:00
GassiGiuseppe
9c83d9fa71 Merge branch 'dev.embedder' of https://repositories.communitynotfound.work/PoliBa-DeepLearning/NanoSocrates into dev.embedder 2025-10-05 18:45:33 +02:00
Christian Risi
a693cbb77e A set of utils for our pipeline 2025-10-05 18:37:43 +02:00
GassiGiuseppe
6f219f634f Added attention_mask 2025-10-05 17:49:01 +02:00
GassiGiuseppe
b303affd18 updated uml of the model 2025-10-05 16:40:19 +02:00
Christian Risi
53c4decac7 Added playgrounds for the architecture 2025-10-05 16:30:23 +02:00
Christian Risi
c60da8ba82 Refactoring 2025-10-05 15:40:29 +02:00
Christian Risi
3b5e6c099c Merge branch 'dev' into dev.embedder 2025-10-05 11:17:09 +02:00
Christian Risi
ba3a718480 Merge branch 'dev.etl' into dev 2025-10-05 11:16:54 +02:00
GassiGiuseppe
69fba7c3e9 new utility to generate a csv debug file of the output of the pipeline 2025-10-04 21:33:09 +02:00
GassiGiuseppe
76200d936d added first classes (Encoder, Decoder, Attention) for the model 2025-10-04 21:07:58 +02:00
Christian Risi
9b656e7918 Added a playground to test the embedding phase 2025-10-04 19:43:42 +02:00
Christian Risi
9a797a0485 Added embedder code for "Attention is all you need" 2025-10-04 19:43:25 +02:00
Christian Risi
3b274ad807 Added a way to take the default special token list 2025-10-04 19:43:02 +02:00
Christian Risi
8f5e2f2f0d Modifications 2025-10-04 19:42:45 +02:00
Christian Risi
da0bdf703b Added a way to see vocabulary size 2025-10-04 19:42:29 +02:00
Christian Risi
03cdca1f00 Modified imports for BPE 2025-10-04 19:42:02 +02:00
Christian Risi
7188c8678a Added imports for Embedder 2025-10-04 19:41:48 +02:00
Christian Risi
1eef25a697 Merge branch 'dev' into dev.embedder 2025-10-04 19:04:03 +02:00
Christian Risi
e9165fb146 Merge branch 'dev.bpe' into dev 2025-10-04 19:03:09 +02:00
GassiGiuseppe
bbadd4c521 update cleaning pipeline with a new method to filter also by number of films,
also updated the signature of the pipeline
2025-10-04 19:00:05 +02:00
GassiGiuseppe
c2f9344c82 little test file 2025-10-04 18:58:20 +02:00
GassiGiuseppe
25f3a5d221 Logic to test BPE 2025-10-04 18:58:04 +02:00
Christian Risi
e8ff82c40a Updated with tasks architectures 2025-10-04 10:57:12 +02:00
Christian Risi
23d1eaf99e Fixed a rare bug over training multiple times 2025-10-04 10:47:39 +02:00
Christian Risi
25a6ad1254 Added model high level architecture 2025-10-03 23:37:16 +02:00
Christian Risi
460d4f5188 Renamed directory to Playgrounds 2025-10-03 22:59:43 +02:00
Christian Risi
c6ac6df2c2 Added stubs for other libraries 2025-10-03 20:28:23 +02:00
Christian Risi
15baba54ab Sanity check to autodetect Device 2025-10-03 20:16:01 +02:00
Christian Risi
87f24878f4 Added shims for utils on using Pytorch 2025-10-03 20:11:14 +02:00
Christian Risi
999141f886 Merge branch 'dev' into dev.embedder 2025-10-03 18:08:34 +02:00