32 Commits

Author SHA1 Message Date
GassiGiuseppe
972a73758d added holdout for curated dataset 2025-10-12 19:06:09 +02:00
GassiGiuseppe
b38a011105 added curated dataset, which is 8000 2025-10-12 19:01:28 +02:00
Christian Risi
160b7dbfc0 V0.0.1 Athene 2025-10-11 19:35:43 +02:00
Christian Risi
15f203cad5 Added BPE 16k tokens vocabulary 2025-10-10 18:43:02 +02:00
Christian Risi
bed9718f27 Added BPE small vocabulary 2025-10-10 11:40:39 +02:00
Christian Risi
d1ff88da82 Added small dataset 2025-10-07 20:44:40 +02:00
Christian Risi
3f465991f0 Added toy dataset 2025-10-07 20:44:11 +02:00
GassiGiuseppe
0f95aeb122 toy dictionary for bpe implemented 2025-10-03 16:26:01 +02:00
Christian Risi
4548a683c2 Fixed DB 2025-09-25 17:57:45 +02:00
Christian Risi
87ca748f45 Updated DB to reflect new changes 2025-09-24 19:29:57 +02:00
Christian Risi
9a5d633b5e Fixed Typos 2025-09-24 19:29:07 +02:00
Christian Risi
8a22e453e4 Fixed csv 2025-09-24 14:44:25 +02:00
Christian Risi
a4b44ab2ee Fixed Typos 2025-09-24 14:04:27 +02:00
Christian Risi
74b6b609dd Fixed typos 2025-09-24 13:59:19 +02:00
Christian Risi
f696f5950b Added uri-abbreviations 2025-09-24 13:48:53 +02:00
GassiGiuseppe
edd01a2c83 Dataset updated, the new one is built with the new method
(50 new rows found ... out of 13 million)
2025-09-22 16:57:06 +02:00
GassiGiuseppe
0970cabf92 reverse.csv grammar correction of the header
the header also seemed to have been misplaced in the middle of the csv
2025-09-22 13:47:20 +02:00
GassiGiuseppe
a26d92750f Update movie-pageid.csv : grammar correction of the header 2025-09-22 12:59:35 +02:00
GassiGiuseppe
34c4782232 Dataset.db update. it seems to be correct 2025-09-20 23:33:56 +02:00
Christian Risi
1076dc8aa6 Run /Scripts/DataCleaning/SQL_Queries/db_creation.sql 2025-09-20 16:39:16 +02:00
GassiGiuseppe
0d30e90ee0 Created file for the db DatawareHouse
Also decided the first schema models in DBMerger
2025-09-20 15:53:32 +02:00
GassiGiuseppe
242d7f674f wikipedia summary file uploaded
Dataset composed of PageId and wikipedia Summary
2025-09-20 14:32:25 +02:00
GassiGiuseppe
7a1a221017 update of the database of movie-pageid
which has the film uri as subject and the wikipage id as object
2025-09-19 13:37:56 +02:00
Christian Risi
fafe6ae0f9 Modified tree structure with more TMP directories 2025-09-19 12:46:31 +02:00
GassiGiuseppe
c415b175a0 added reverse.csv with the relation incoming to films 2025-09-18 20:26:51 +02:00
Christian Risi
67bcd732b5 Updated movies 2025-09-18 18:36:52 +02:00
Christian Risi
ca8729b67c Merge branch 'dev' of https://repositories.communitynotfound.work/PoliBa-DeepLearning/NanoSocrates into dev 2025-09-18 18:36:30 +02:00
GassiGiuseppe
9dbffc52ed Added dataset of movies and their wikipedia's page link 2025-09-18 18:16:51 +02:00
Christian Risi
b7f504942a Created Dataset 2025-09-18 17:24:08 +02:00
Christian Risi
16e7ab4d9f Modified Datasets 2025-09-17 17:30:51 +02:00
Christian Risi
3e59efcf33 Generated datasets 2025-09-17 17:06:14 +02:00
Christian Risi
6afd6f91cc Added barebone structure 2025-09-17 11:02:51 +02:00