Christian Risi
90012285b5
UML diagram to explain BPE workflows
2025-09-25 20:18:21 +02:00
GassiGiuseppe
9440a562f2
Merge branch 'dev.etl' of https://repositories.communitynotfound.work/PoliBa-DeepLearning/NanoSocrates into dev.etl
2025-09-25 18:33:51 +02:00
Christian Risi
5eda131aac
Fixed creation query to be unique even with movieID in RDFs
2025-09-25 17:58:09 +02:00
GassiGiuseppe
57884eaf2e
CSV support added to path_splitter_tree
Also fixed a minor bug so that leaf nodes are printed too
2025-09-25 17:57:46 +02:00
GassiGiuseppe
3eec49ffa5
WIP: added test file: clean_relationship.jupyter
to create a first cleaning pipeline
2025-09-25 16:28:24 +02:00
Christian Risi
0bc7f4b227
Fixed Typos
2025-09-25 12:37:52 +02:00
Christian Risi
f28952b0a2
Added todo
2025-09-25 12:00:26 +02:00
Christian Risi
0b626a8e09
Modified query to take all data
2025-09-25 11:53:12 +02:00
Christian Risi
b254098532
Added views to count subjects and objects
2025-09-25 11:40:44 +02:00
Christian Risi
ee88ffe4cf
Added view to filter by relationship counts
2025-09-25 11:32:03 +02:00
Christian Risi
70b4bd8645
Added Complex query
2025-09-25 11:31:34 +02:00
Christian Risi
6316d2bfc4
Added queries to retrieve data from SQL for the dataset
2025-09-25 11:27:19 +02:00
Christian Risi
4315d70109
Merged abbreviation_datawarehouse into datawarehouse
2025-09-24 19:29:43 +02:00
Christian Risi
a6760cd52d
Updated SQL Queries to support parsing in DB
2025-09-24 19:28:55 +02:00
GassiGiuseppe
a7eb92227d
Moved all DB query files into their own folder
2025-09-24 16:44:55 +02:00
GassiGiuseppe
9f221e31cd
Merge branch 'dev.etl' of https://repositories.communitynotfound.work/PoliBa-DeepLearning/NanoSocrates into dev.etl
2025-09-24 16:32:52 +02:00
GassiGiuseppe
47197194d5
WIP: abbrevietion_datawarehouse to create an abbreviation system
2025-09-24 16:32:09 +02:00
Christian Risi
0cdbf6f624
Added query to retrieve a dirty dataset from SQLite DB
2025-09-24 16:15:47 +02:00
Christian Risi
3e30489f86
Updated Queries for DB
2025-09-24 14:44:53 +02:00
Christian Risi
7feb4eb857
Fixed URI generation
2025-09-24 14:44:07 +02:00
Christian Risi
70af19d356
Removed unused imports and added trailing slashes
2025-09-24 14:04:48 +02:00
Christian Risi
59796c37cb
Added script to take dbpedia uris
2025-09-24 13:49:29 +02:00
Christian Risi
605b496da7
Added barebone UML diagram for a Cleaning Pipeline
2025-09-23 19:49:01 +02:00
Christian Risi
7d693964dd
Added new directories to tree structure
2025-09-23 19:47:56 +02:00
Christian Risi
25f401b577
Fixed bug for parsing and added CLI functionalities
2025-09-23 17:58:08 +02:00
Christian Risi
14c5ade230
Added CLI functionalities
2025-09-23 17:57:38 +02:00
4c9c51f902
Added a barebone splitter
2025-09-23 15:34:53 +02:00
GassiGiuseppe
63c1a4a160
Added a small snippet to rebuild the DB from db_creation.sql
2025-09-22 17:52:23 +02:00
GassiGiuseppe
51114af853
DataRetrivial deleted since it does the same thing as datawarehouse.py
2025-09-22 17:51:35 +02:00
GassiGiuseppe
3a6dca0681
Information about dataset construction from CSV moved
from the Python file to Markdown
2025-09-22 17:39:44 +02:00
GassiGiuseppe
346098d2b7
Added query.sql, the file with the query used to populate the dataset
2025-09-22 17:21:32 +02:00
GassiGiuseppe
64f9b41378
Built datawarehouse.py, which populates the dataset
2025-09-22 17:17:22 +02:00
GassiGiuseppe
ac1ed42c49
Folder DataCleaning renamed to DatasetMerging since it doesn't clean anything
and instead builds the dataset
2025-09-22 17:11:49 +02:00
GassiGiuseppe
c5439533e6
DataRetrivial update, without df
2025-09-20 23:32:08 +02:00
GassiGiuseppe
8819b8e87f
DataRetrivial populates the DB from CSV
2025-09-20 19:56:24 +02:00
Christian Risi
3d15e03b09
Renamed file to fix spelling
2025-09-20 16:38:38 +02:00
Christian Risi
0ee2ec6fcd
Spelling corrections
2025-09-20 16:37:57 +02:00
Christian Risi
95cfa5486c
Added instructions to create the database schema
2025-09-20 16:30:08 +02:00
GassiGiuseppe
0d30e90ee0
Created file for the DB DatawareHouse
Also decided the first schema models in DBMerger
2025-09-20 15:53:32 +02:00
Christian Risi
854e5f1d98
Updated file to gather data from wikipedia
2025-09-20 14:32:30 +02:00
Christian Risi
de8c2afceb
Added reconciliation
2025-09-19 22:22:09 +02:00
Christian Risi
f89dffff75
Created script to gather wikipedia abstracts
2025-09-19 19:01:38 +02:00
Christian Risi
e32444df75
Updated fetchdata to be used in terminal
Changes:
- now you can use it as if it were a CLI command
Missing:
- documentation
2025-09-19 12:35:15 +02:00
Christian Risi
b74b7ac4f0
Added new directories to make experiments and updated .gitignore
Changes:
- Added /Scripts/Experiments/Queries to keep track
of important queries, once set
- Added /Scripts/Experiments/Tmp to run quick experiments
when still unsure while exploring datasets
2025-09-19 08:43:54 +02:00
Christian Risi
22134391d9
Added Scripts/Experiment directory
This directory is to place files to make experiments
2025-09-19 08:41:46 +02:00
Christian Risi
00b87e01ea
Moved fetchdata.py to reflect working tree
old - ${Proj}/Scripts/fetchdata.py
new - ${Proj}/Scripts/DataGathering/fetchdata.py
2025-09-19 08:37:04 +02:00
Christian Risi
ce3d4bf6c5
Renamed dir from Script to Scripts
2025-09-19 08:31:00 +02:00