25 Commits

Author SHA1 Message Date
Christian Risi
605b496da7 Added barebone UML diagram for a Cleaning Pipeline 2025-09-23 19:49:01 +02:00
Christian Risi
7d693964dd Added new directories to tree structure 2025-09-23 19:47:56 +02:00
Christian Risi
25f401b577 Fixed bug for parsing and added CLI functionalities 2025-09-23 17:58:08 +02:00
Christian Risi
14c5ade230 Added CLI functionalities 2025-09-23 17:57:38 +02:00
4c9c51f902 Added barebone to have a splitter 2025-09-23 15:34:53 +02:00
GassiGiuseppe
63c1a4a160 added little snippet to rebuild db from db_creation.sql 2025-09-22 17:52:23 +02:00
GassiGiuseppe
51114af853 DataRetrivial deleted since it does the same thing as datawarehouse.py 2025-09-22 17:51:35 +02:00
GassiGiuseppe
3a6dca0681 Infos about Dataset contruction from csv moved
from python file to markdown
2025-09-22 17:39:44 +02:00
GassiGiuseppe
346098d2b7 Added query.sql , file with the query used to populate the Dataset 2025-09-22 17:21:32 +02:00
GassiGiuseppe
64f9b41378 Built datawarehouse.py which populate the dataset 2025-09-22 17:17:22 +02:00
GassiGiuseppe
ac1ed42c49 Folder DataCleaning renamed to DatasetMerging since it doesn't clean nothing
and instead Build the dataset
2025-09-22 17:11:49 +02:00
GassiGiuseppe
c5439533e6 DataRetrivial update, without df 2025-09-20 23:32:08 +02:00
GassiGiuseppe
8819b8e87f DataRetrivial populate the db from csv 2025-09-20 19:56:24 +02:00
Christian Risi
3d15e03b09 Renamed file to fix spelling 2025-09-20 16:38:38 +02:00
Christian Risi
0ee2ec6fcd Spelling corrections 2025-09-20 16:37:57 +02:00
Christian Risi
95cfa5486c Added instructions to create databse schema 2025-09-20 16:30:08 +02:00
GassiGiuseppe
0d30e90ee0 Created file for the db DatawareHouse
Also decided firsts schema models into DBMerger
2025-09-20 15:53:32 +02:00
Christian Risi
854e5f1d98 Updated file to gather data from wikipedia 2025-09-20 14:32:30 +02:00
Christian Risi
de8c2afceb Added reconciliation 2025-09-19 22:22:09 +02:00
Christian Risi
f89dffff75 Created script to gather wikipedia abstracts 2025-09-19 19:01:38 +02:00
Christian Risi
e32444df75 Updated fetchdata to be used in terminal
Changes:
  - now you can use it as if it were a cli command

Missing:
  - documentation
2025-09-19 12:35:15 +02:00
Christian Risi
b74b7ac4f0 Added new directories to make experiments and updated .gitignore
Changes:
  - Added /Scripts/Experiments/Queries to keep track
      of important queries, once set
  - Added /Scripts/Experiments/Tmp to run quick experiments
      when still unsure while explorating datasets
2025-09-19 08:43:54 +02:00
Christian Risi
22134391d9 Added Scripts/Experiment directory
This directory is to place files to make experiments
2025-09-19 08:41:46 +02:00
Christian Risi
00b87e01ea Moved fetchdata.py to reflect working tree
old - ${Proj}/Scripts/fetchdata.py
new - ${Proj}/Scripts/DataGathering/fetchdata.py
2025-09-19 08:37:04 +02:00
Christian Risi
ce3d4bf6c5 Renamed dir from Script to Scripts 2025-09-19 08:31:00 +02:00