23 Commits

Author SHA1 Message Date
GassiGiuseppe
2bd24ec278 Created legacy folder for old pipeline
This pipeline still works but is slower than the new one;
some of its methods may be reused later
2025-10-05 14:54:32 +02:00
GassiGiuseppe
69fba7c3e9 New utility to generate a CSV debug file of the pipeline output 2025-10-04 21:33:09 +02:00
GassiGiuseppe
64e355e80c Added regex to delete newlines and * from ObjectURI 2025-09-30 15:00:07 +02:00
GassiGiuseppe
8167c9d435 Added Toy Dataset entry point into the Pipeline class
Previously it was forced into the sql_endpoint;
now the whole pipeline can be managed in the Pipeline class
2025-09-29 16:03:49 +02:00
GassiGiuseppe
bd72ad3571 Added file to execute the complete cleaning pipeline 2025-09-29 15:21:26 +02:00
GassiGiuseppe
e521b0704e deleted TODO in path_splitter_tree, as it was already resolved 2025-09-25 19:19:11 +02:00
GassiGiuseppe
57884eaf2e CSV support added to path_splitter_tree
Also fixed a minor bug so that leaf nodes are printed too
2025-09-25 17:57:46 +02:00
GassiGiuseppe
3eec49ffa5 WIP: added test file: clean_relationship.jupyter
to create a first cleaning pipeline
2025-09-25 16:28:24 +02:00
Christian Risi
f28952b0a2 Added todo 2025-09-25 12:00:26 +02:00
Christian Risi
4315d70109 Merged abbreviation_datawarehouse into datawarehouse 2025-09-24 19:29:43 +02:00
GassiGiuseppe
47197194d5 WIP abbreviation_datawarehouse to create an abbreviation system 2025-09-24 16:32:09 +02:00
Christian Risi
7feb4eb857 Fixed URI generation 2025-09-24 14:44:07 +02:00
Christian Risi
70af19d356 Removed unused imports and added trailing slashes 2025-09-24 14:04:48 +02:00
Christian Risi
59796c37cb Added script to take dbpedia uris 2025-09-24 13:49:29 +02:00
Christian Risi
25f401b577 Fixed bug for parsing and added CLI functionalities 2025-09-23 17:58:08 +02:00
4c9c51f902 Added barebone to have a splitter 2025-09-23 15:34:53 +02:00
GassiGiuseppe
ac1ed42c49 Folder DataCleaning renamed to DatasetMerging since it doesn't clean anything
and instead builds the dataset
2025-09-22 17:11:49 +02:00
GassiGiuseppe
c5439533e6 DataRetrivial update, without df 2025-09-20 23:32:08 +02:00
GassiGiuseppe
8819b8e87f DataRetrivial populates the db from CSV 2025-09-20 19:56:24 +02:00
Christian Risi
3d15e03b09 Renamed file to fix spelling 2025-09-20 16:38:38 +02:00
Christian Risi
0ee2ec6fcd Spelling corrections 2025-09-20 16:37:57 +02:00
Christian Risi
95cfa5486c Added instructions to create database schema 2025-09-20 16:30:08 +02:00
GassiGiuseppe
0d30e90ee0 Created file for the db DatawareHouse
Also decided on the first schema models in DBMerger
2025-09-20 15:53:32 +02:00