GassiGiuseppe
|
7307916891
|
update sql_endpoint to work with the new pipeline
|
2025-10-05 14:58:03 +02:00 |
|
GassiGiuseppe
|
acb43fc899
|
new faster pipeline
|
2025-10-05 14:57:45 +02:00 |
|
GassiGiuseppe
|
255d801a80
|
Updated the mask rdf_mask_task.
However, since the model will build the mask itself, it is deprecated
|
2025-10-05 14:56:33 +02:00 |
|
GassiGiuseppe
|
2bd24ec278
|
Created legacy folder for the old pipeline
This pipeline still works but is slower than the new one;
some of its methods can be used later
|
2025-10-05 14:54:32 +02:00 |
|
GassiGiuseppe
|
69fba7c3e9
|
new utility to generate a csv debug file of the output of the pipeline
|
2025-10-04 21:33:09 +02:00 |
|
GassiGiuseppe
|
64e355e80c
|
Added regex to delete newlines and * from ObjectURI
|
2025-09-30 15:00:07 +02:00 |
|
GassiGiuseppe
|
007f1e9554
|
minor updates
|
2025-09-29 18:53:33 +02:00 |
|
GassiGiuseppe
|
c319398ca0
|
little update to UML pipeline
|
2025-09-29 17:03:31 +02:00 |
|
GassiGiuseppe
|
255d8a072d
|
First implementation of the cleaning pipeline UML
|
2025-09-29 16:59:52 +02:00 |
|
GassiGiuseppe
|
8167c9d435
|
Added Toy Dataset entry point into the Pipeline class
Before, it was forced into the sql_endpoint;
now the whole pipeline can be managed in the Pipeline class
|
2025-09-29 16:03:49 +02:00 |
|
GassiGiuseppe
|
bd72ad3571
|
Added file to execute the complete cleaning pipeline
|
2025-09-29 15:21:26 +02:00 |
|
GassiGiuseppe
|
6ddb7de9da
|
Added SQLAlchemy to requirements
|
2025-09-29 15:19:19 +02:00 |
|
GassiGiuseppe
|
650b37c586
|
Added VS Code setting to execute Jupyter notebooks from the root dir
|
2025-09-26 11:24:34 +02:00 |
|
GassiGiuseppe
|
e521b0704e
|
deleted TODO in path_splitter_tree, as it was already resolved
|
2025-09-25 19:19:11 +02:00 |
|
Christian Risi
|
0a698e9837
|
Added schema to extract from DB for BPE
|
2025-09-25 19:09:52 +02:00 |
|
GassiGiuseppe
|
9440a562f2
|
Merge branch 'dev.etl' of https://repositories.communitynotfound.work/PoliBa-DeepLearning/NanoSocrates into dev.etl
|
2025-09-25 18:33:51 +02:00 |
|
Christian Risi
|
5eda131aac
|
Fixed creation query to be unique even with movieID in RDFs
|
2025-09-25 17:58:09 +02:00 |
|
GassiGiuseppe
|
57884eaf2e
|
CSV support added to path_splitter_tree
Also fixed a minor bug so that leaf nodes are printed too
|
2025-09-25 17:57:46 +02:00 |
|
Christian Risi
|
4548a683c2
|
Fixed DB
|
2025-09-25 17:57:45 +02:00 |
|
GassiGiuseppe
|
3eec49ffa5
|
WIP: added test file: clean_relationship.jupyter
to create a first cleaning pipeline
|
2025-09-25 16:28:24 +02:00 |
|
Christian Risi
|
0bc7f4b227
|
Fixed Typos
|
2025-09-25 12:37:52 +02:00 |
|
Christian Risi
|
f28952b0a2
|
Added todo
|
2025-09-25 12:00:26 +02:00 |
|
Christian Risi
|
0b626a8e09
|
Modified query to take all data
|
2025-09-25 11:53:12 +02:00 |
|
Christian Risi
|
b254098532
|
Added views to count for subjects and objects
|
2025-09-25 11:40:44 +02:00 |
|
Christian Risi
|
ee88ffe4cf
|
Added View to filter over relationship counts
|
2025-09-25 11:32:03 +02:00 |
|
Christian Risi
|
70b4bd8645
|
Added Complex query
|
2025-09-25 11:31:34 +02:00 |
|
Christian Risi
|
6316d2bfc4
|
Added queries to take data from SQL for dataset
|
2025-09-25 11:27:19 +02:00 |
|
Christian Risi
|
87ca748f45
|
Updated DB to reflect new changes
|
2025-09-24 19:29:57 +02:00 |
|
Christian Risi
|
4315d70109
|
Merged abbreviation_datawarehouse into datawarehouse
|
2025-09-24 19:29:43 +02:00 |
|
Christian Risi
|
9a5d633b5e
|
Fixed Typos
|
2025-09-24 19:29:07 +02:00 |
|
Christian Risi
|
a6760cd52d
|
Updated SQL Queries to support parsing in DB
|
2025-09-24 19:28:55 +02:00 |
|
GassiGiuseppe
|
a7eb92227d
|
Moved all db queries file in their own folder
|
2025-09-24 16:44:55 +02:00 |
|
GassiGiuseppe
|
9f221e31cd
|
Merge branch 'dev.etl' of https://repositories.communitynotfound.work/PoliBa-DeepLearning/NanoSocrates into dev.etl
|
2025-09-24 16:32:52 +02:00 |
|
GassiGiuseppe
|
47197194d5
|
WIP: abbrevietion_datawarehouse to create an abbreviation system
|
2025-09-24 16:32:09 +02:00 |
|
Christian Risi
|
0cdbf6f624
|
Added query to retrieve a dirty dataset from SQLite DB
|
2025-09-24 16:15:47 +02:00 |
|
Christian Risi
|
3e30489f86
|
Updated Queries for DB
|
2025-09-24 14:44:53 +02:00 |
|
Christian Risi
|
8a22e453e4
|
Fixed csv
|
2025-09-24 14:44:25 +02:00 |
|
Christian Risi
|
7feb4eb857
|
Fixed URI generation
|
2025-09-24 14:44:07 +02:00 |
|
Christian Risi
|
70af19d356
|
Removed unused imports and added trailing slashes
|
2025-09-24 14:04:48 +02:00 |
|
Christian Risi
|
a4b44ab2ee
|
Fixed Typos
|
2025-09-24 14:04:27 +02:00 |
|
Christian Risi
|
74b6b609dd
|
Fixed typos
|
2025-09-24 13:59:19 +02:00 |
|
Christian Risi
|
59796c37cb
|
Added script to take dbpedia uris
|
2025-09-24 13:49:29 +02:00 |
|
Christian Risi
|
f696f5950b
|
Added uri-abbreviations
|
2025-09-24 13:48:53 +02:00 |
|
Christian Risi
|
605b496da7
|
Added barebone UML diagram for a Cleaning Pipeline
|
2025-09-23 19:49:01 +02:00 |
|
Christian Risi
|
7d693964dd
|
Added new directories to tree structure
|
2025-09-23 19:47:56 +02:00 |
|
Christian Risi
|
25f401b577
|
Fixed bug for parsing and added CLI functionalities
|
2025-09-23 17:58:08 +02:00 |
|
Christian Risi
|
14c5ade230
|
Added CLI functionalities
|
2025-09-23 17:57:38 +02:00 |
|
|
|
4c9c51f902
|
Added barebone to have a splitter
|
2025-09-23 15:34:53 +02:00 |
|
GassiGiuseppe
|
63c1a4a160
|
Added a small snippet to rebuild the db from db_creation.sql
|
2025-09-22 17:52:23 +02:00 |
|
GassiGiuseppe
|
51114af853
|
DataRetrivial deleted since it does the same thing as datawarehouse.py
|
2025-09-22 17:51:35 +02:00 |
|