106 Commits

Author SHA1 Message Date
Christian Risi
3f48b5c428 Added text files to test a chunker 2025-09-26 18:48:44 +02:00
Christian Risi
9972ab8a51 Added imports 2025-09-26 18:48:23 +02:00
Christian Risi
90012285b5 UML Diagram to explain bpe workflows 2025-09-25 20:18:21 +02:00
Christian Risi
1bbb4a0999 Added new paper 2025-09-25 20:17:48 +02:00
Christian Risi
ee0aa583d5 Added Docs for BPE research 2025-09-25 19:10:45 +02:00
GassiGiuseppe
9440a562f2 Merge branch 'dev.etl' of https://repositories.communitynotfound.work/PoliBa-DeepLearning/NanoSocrates into dev.etl 2025-09-25 18:33:51 +02:00
Christian Risi
5eda131aac Fixed creation query to be unique even with movieID in RDFs 2025-09-25 17:58:09 +02:00
GassiGiuseppe
57884eaf2e CSV support added to path_splitter_tree
Also fixed a minor bug so that leaf nodes are printed too
2025-09-25 17:57:46 +02:00
Christian Risi
4548a683c2 Fixed DB 2025-09-25 17:57:45 +02:00
GassiGiuseppe
3eec49ffa5 WIP: added test file: clean_relationship.jupyter
to create a first cleaning pipeline
2025-09-25 16:28:24 +02:00
Christian Risi
0bc7f4b227 Fixed Typos 2025-09-25 12:37:52 +02:00
Christian Risi
f28952b0a2 Added todo 2025-09-25 12:00:26 +02:00
Christian Risi
0b626a8e09 Modified query to take all data 2025-09-25 11:53:12 +02:00
Christian Risi
b254098532 Added views to count for subjects and objects 2025-09-25 11:40:44 +02:00
Christian Risi
ee88ffe4cf Added View to filter over relationship counts 2025-09-25 11:32:03 +02:00
Christian Risi
70b4bd8645 Added Complex query 2025-09-25 11:31:34 +02:00
Christian Risi
6316d2bfc4 Added queries to take data from SQL for dataset 2025-09-25 11:27:19 +02:00
Christian Risi
87ca748f45 Updated DB to reflect new changes 2025-09-24 19:29:57 +02:00
Christian Risi
4315d70109 Merged abbreviation_datawarehouse into datawarehouse 2025-09-24 19:29:43 +02:00
Christian Risi
9a5d633b5e Fixed Typos 2025-09-24 19:29:07 +02:00
Christian Risi
a6760cd52d Updated SQL Queries to support parsing in DB 2025-09-24 19:28:55 +02:00
GassiGiuseppe
a7eb92227d Moved all db queries file in their own folder 2025-09-24 16:44:55 +02:00
GassiGiuseppe
9f221e31cd Merge branch 'dev.etl' of https://repositories.communitynotfound.work/PoliBa-DeepLearning/NanoSocrates into dev.etl 2025-09-24 16:32:52 +02:00
GassiGiuseppe
47197194d5 WIP: abbreviation_datawarehouse to create an abbreviation system 2025-09-24 16:32:09 +02:00
Christian Risi
0cdbf6f624 Added query to retrieve a dirty dataset from SQLite DB 2025-09-24 16:15:47 +02:00
Christian Risi
3e30489f86 Updated Queries for DB 2025-09-24 14:44:53 +02:00
Christian Risi
8a22e453e4 Fixed csv 2025-09-24 14:44:25 +02:00
Christian Risi
7feb4eb857 Fixed URI generation 2025-09-24 14:44:07 +02:00
Christian Risi
70af19d356 Removed unused imports and added trailing slashes 2025-09-24 14:04:48 +02:00
Christian Risi
a4b44ab2ee Fixed Typos 2025-09-24 14:04:27 +02:00
Christian Risi
74b6b609dd Fixed typos 2025-09-24 13:59:19 +02:00
Christian Risi
59796c37cb Added script to take dbpedia uris 2025-09-24 13:49:29 +02:00
Christian Risi
f696f5950b Added uri-abbreviations 2025-09-24 13:48:53 +02:00
Christian Risi
605b496da7 Added barebone UML diagram for a Cleaning Pipeline 2025-09-23 19:49:01 +02:00
Christian Risi
7d693964dd Added new directories to tree structure 2025-09-23 19:47:56 +02:00
Christian Risi
25f401b577 Fixed bug for parsing and added CLI functionalities 2025-09-23 17:58:08 +02:00
Christian Risi
14c5ade230 Added CLI functionalities 2025-09-23 17:57:38 +02:00
4c9c51f902 Added barebone to have a splitter 2025-09-23 15:34:53 +02:00
GassiGiuseppe
63c1a4a160 Added a small snippet to rebuild the DB from db_creation.sql 2025-09-22 17:52:23 +02:00
GassiGiuseppe
51114af853 DataRetrivial deleted since it does the same thing as datawarehouse.py 2025-09-22 17:51:35 +02:00
GassiGiuseppe
3a6dca0681 Info about dataset construction from CSV moved
from Python file to Markdown
2025-09-22 17:39:44 +02:00
GassiGiuseppe
346098d2b7 Added query.sql, the file with the query used to populate the dataset 2025-09-22 17:21:32 +02:00
GassiGiuseppe
64f9b41378 Built datawarehouse.py, which populates the dataset 2025-09-22 17:17:22 +02:00
GassiGiuseppe
ac1ed42c49 Folder DataCleaning renamed to DatasetMerging since it doesn't clean anything
and instead builds the dataset
2025-09-22 17:11:49 +02:00
GassiGiuseppe
edd01a2c83 Dataset updated; the new one is built with the new method
(50 new rows found ... out of 13 million)
2025-09-22 16:57:06 +02:00
GassiGiuseppe
5aa9e3fcf3 Added in DBPEDIA the query to get Film \ wiki page ID
plus some editing
2025-09-22 15:42:57 +02:00
GassiGiuseppe
0970cabf92 reverse.csv: grammar correction of the header
the header also seemed to have been misplaced in the middle of the CSV
2025-09-22 13:47:20 +02:00
GassiGiuseppe
a26d92750f Update movie-pageid.csv: grammar correction of the header 2025-09-22 12:59:35 +02:00
GassiGiuseppe
34c4782232 Dataset.db update. It seems to be correct 2025-09-20 23:33:56 +02:00
GassiGiuseppe
c5439533e6 DataRetrivial update, without df 2025-09-20 23:32:08 +02:00