113 Commits

Author SHA1 Message Date
GassiGiuseppe
2bd24ec278 Created legacy folder for old pipeline
this pipeline still works but is slower then the new,
some ot its method can be used later
2025-10-05 14:54:32 +02:00
GassiGiuseppe
69fba7c3e9 new utility to generate a csv debug file of the output of the pipeline 2025-10-04 21:33:09 +02:00
GassiGiuseppe
64e355e80c Added regex to delete new lines and * from ObjectURI 2025-09-30 15:00:07 +02:00
GassiGiuseppe
007f1e9554 minor updates 2025-09-29 18:53:33 +02:00
GassiGiuseppe
c319398ca0 little update to UML pipeline 2025-09-29 17:03:31 +02:00
GassiGiuseppe
255d8a072d First implementation of the cleaning pipeline UML 2025-09-29 16:59:52 +02:00
GassiGiuseppe
8167c9d435 Added Toy Dataset entry point into the Pipeline class
Before it was forced into the sql_endpoint,
now all the pipeline can be managed in the Pipeline class
2025-09-29 16:03:49 +02:00
GassiGiuseppe
bd72ad3571 Added file to execute the complete cleaning pipeline 2025-09-29 15:21:26 +02:00
GassiGiuseppe
6ddb7de9da Added sqlAlchemy to requirements 2025-09-29 15:19:19 +02:00
GassiGiuseppe
650b37c586 Added vscode setting to execute jupyternotebook from root dir 2025-09-26 11:24:34 +02:00
GassiGiuseppe
e521b0704e deleted TODO in path_splitter_tree, as it was already resolved 2025-09-25 19:19:11 +02:00
Christian Risi
0a698e9837 Added schema to extract from DB for BPE 2025-09-25 19:09:52 +02:00
GassiGiuseppe
9440a562f2 Merge branch 'dev.etl' of https://repositories.communitynotfound.work/PoliBa-DeepLearning/NanoSocrates into dev.etl 2025-09-25 18:33:51 +02:00
Christian Risi
5eda131aac Fixed creation query to be unique even with movieID in RDFs 2025-09-25 17:58:09 +02:00
GassiGiuseppe
57884eaf2e CSV support added to path_splitter_tree
Also resolved a minor bug to print also leaf nodes
2025-09-25 17:57:46 +02:00
Christian Risi
4548a683c2 Fixed DB 2025-09-25 17:57:45 +02:00
GassiGiuseppe
3eec49ffa5 WIP: added test file: clean_relationship.jupyter
to create a first cleaning pipeline
2025-09-25 16:28:24 +02:00
Christian Risi
0bc7f4b227 Fixed Typos 2025-09-25 12:37:52 +02:00
Christian Risi
f28952b0a2 Added todo 2025-09-25 12:00:26 +02:00
Christian Risi
0b626a8e09 Modified query to take all data 2025-09-25 11:53:12 +02:00
Christian Risi
b254098532 Added views to count for subjects and objects 2025-09-25 11:40:44 +02:00
Christian Risi
ee88ffe4cf Added View to filter over relationship counts 2025-09-25 11:32:03 +02:00
Christian Risi
70b4bd8645 Added Complex query 2025-09-25 11:31:34 +02:00
Christian Risi
6316d2bfc4 Added queries to take data from SQL for dataset 2025-09-25 11:27:19 +02:00
Christian Risi
87ca748f45 Updated DB to reflect new changes 2025-09-24 19:29:57 +02:00
Christian Risi
4315d70109 Merged abbreviation_datawarehouse into datawarehouse 2025-09-24 19:29:43 +02:00
Christian Risi
9a5d633b5e Fixed Typos 2025-09-24 19:29:07 +02:00
Christian Risi
a6760cd52d Updated SQL Queries to support parsing in DB 2025-09-24 19:28:55 +02:00
GassiGiuseppe
a7eb92227d Moved all db queries file in their own folder 2025-09-24 16:44:55 +02:00
GassiGiuseppe
9f221e31cd Merge branch 'dev.etl' of https://repositories.communitynotfound.work/PoliBa-DeepLearning/NanoSocrates into dev.etl 2025-09-24 16:32:52 +02:00
GassiGiuseppe
47197194d5 WIP abbrevietion_datawarehouse to creat an abbreviation system 2025-09-24 16:32:09 +02:00
Christian Risi
0cdbf6f624 Added query to retrieve a dirty dataset from SQLite DB 2025-09-24 16:15:47 +02:00
Christian Risi
3e30489f86 Updated Queries for DB 2025-09-24 14:44:53 +02:00
Christian Risi
8a22e453e4 Fixed csv 2025-09-24 14:44:25 +02:00
Christian Risi
7feb4eb857 Fixed URI generation 2025-09-24 14:44:07 +02:00
Christian Risi
70af19d356 Removed unused imports and added trailing slashes 2025-09-24 14:04:48 +02:00
Christian Risi
a4b44ab2ee Fixed Typos 2025-09-24 14:04:27 +02:00
Christian Risi
74b6b609dd Fixed typos 2025-09-24 13:59:19 +02:00
Christian Risi
59796c37cb Added script to take dbpedia uris 2025-09-24 13:49:29 +02:00
Christian Risi
f696f5950b Added uri-abbreviations 2025-09-24 13:48:53 +02:00
Christian Risi
605b496da7 Added barebone UML diagram for a Cleaning Pipeline 2025-09-23 19:49:01 +02:00
Christian Risi
7d693964dd Added new directories to tree structure 2025-09-23 19:47:56 +02:00
Christian Risi
25f401b577 Fixed bug for parsing and added CLI functionalities 2025-09-23 17:58:08 +02:00
Christian Risi
14c5ade230 Added CLI functionalities 2025-09-23 17:57:38 +02:00
4c9c51f902 Added barebone to have a splitter 2025-09-23 15:34:53 +02:00
GassiGiuseppe
63c1a4a160 added little snippet to rebuild db from db_creation.sql 2025-09-22 17:52:23 +02:00
GassiGiuseppe
51114af853 DataRetrivial deleted since it does the same thing as datawarehouse.py 2025-09-22 17:51:35 +02:00
GassiGiuseppe
3a6dca0681 Infos about Dataset contruction from csv moved
from python file to markdown
2025-09-22 17:39:44 +02:00
GassiGiuseppe
346098d2b7 Added query.sql , file with the query used to populate the Dataset 2025-09-22 17:21:32 +02:00
GassiGiuseppe
64f9b41378 Built datawarehouse.py which populate the dataset 2025-09-22 17:17:22 +02:00