272 Commits

Author SHA1 Message Date
Christian Risi
7d693964dd Added new directories to tree structure 2025-09-23 19:47:56 +02:00
Christian Risi
25f401b577 Fixed bug for parsing and added CLI functionalities 2025-09-23 17:58:08 +02:00
Christian Risi
14c5ade230 Added CLI functionalities 2025-09-23 17:57:38 +02:00
4c9c51f902 Added barebone to have a splitter 2025-09-23 15:34:53 +02:00
GassiGiuseppe
63c1a4a160 added little snippet to rebuild db from db_creation.sql 2025-09-22 17:52:23 +02:00
GassiGiuseppe
51114af853 DataRetrivial deleted since it does the same thing as datawarehouse.py 2025-09-22 17:51:35 +02:00
GassiGiuseppe
3a6dca0681 Infos about Dataset contruction from csv moved
from python file to markdown
2025-09-22 17:39:44 +02:00
GassiGiuseppe
346098d2b7 Added query.sql , file with the query used to populate the Dataset 2025-09-22 17:21:32 +02:00
GassiGiuseppe
64f9b41378 Built datawarehouse.py which populate the dataset 2025-09-22 17:17:22 +02:00
GassiGiuseppe
ac1ed42c49 Folder DataCleaning renamed to DatasetMerging since it doesn't clean nothing
and instead Build the dataset
2025-09-22 17:11:49 +02:00
GassiGiuseppe
edd01a2c83 Dataset updated, the new one is built with the new method
( 50 new rows found ... upon 13 milion )
2025-09-22 16:57:06 +02:00
GassiGiuseppe
5aa9e3fcf3 Added in DBPEDIA the query to get Film \ wiki page ID
plus some editing
2025-09-22 15:42:57 +02:00
GassiGiuseppe
0970cabf92 reverse.csv grammar correction of the header
it seemed to have missplaced the header also in the middle of the csv
2025-09-22 13:47:20 +02:00
GassiGiuseppe
a26d92750f Update movie-pageid.csv : grammar correction of the header 2025-09-22 12:59:35 +02:00
GassiGiuseppe
34c4782232 Dataset.db update. it seems to be correct 2025-09-20 23:33:56 +02:00
GassiGiuseppe
c5439533e6 DataRetrivial update, without df 2025-09-20 23:32:08 +02:00
GassiGiuseppe
8819b8e87f DataRetrivial populate the db from csv 2025-09-20 19:56:24 +02:00
Christian Risi
1076dc8aa6 Run /Scripts/DataCleaning/SQL_Queries/db_creation.sql 2025-09-20 16:39:16 +02:00
Christian Risi
3d15e03b09 Renamed file to fix spelling 2025-09-20 16:38:38 +02:00
Christian Risi
0ee2ec6fcd Spelling corrections 2025-09-20 16:37:57 +02:00
Christian Risi
95cfa5486c Added instructions to create databse schema 2025-09-20 16:30:08 +02:00
GassiGiuseppe
0d30e90ee0 Created file for the db DatawareHouse
Also decided firsts schema models into DBMerger
2025-09-20 15:53:32 +02:00
GassiGiuseppe
faaba17a98 Merge branch 'dev' of https://repositories.communitynotfound.work/PoliBa-DeepLearning/NanoSocrates into dev 2025-09-20 14:34:25 +02:00
Christian Risi
854e5f1d98 Updated file to gather data from wikipedia 2025-09-20 14:32:30 +02:00
GassiGiuseppe
242d7f674f wikipedia summary file uploaded
Dataset composed of PageId and wikipedia Summary
2025-09-20 14:32:25 +02:00
Christian Risi
de8c2afceb Added reconciliation 2025-09-19 22:22:09 +02:00
Christian Risi
f89dffff75 Created script to gather wikipedia abstracts 2025-09-19 19:01:38 +02:00
GassiGiuseppe
e39bad8348 Added Troubleshooting section to README
where are corrected some potential issue with git and big files
2025-09-19 13:39:56 +02:00
GassiGiuseppe
7a1a221017 update of the database of movie-pageid
which has subject has film uri and object wikipage id
2025-09-19 13:37:56 +02:00
Christian Risi
fafe6ae0f9 Modified tree structure with more TMP directories 2025-09-19 12:46:31 +02:00
Christian Risi
e32444df75 Updated fetchdata to be used in terminal
Changes:
  - now you can use it as if it were a cli command

Missing:
  - documentation
2025-09-19 12:35:15 +02:00
Christian Risi
b74b7ac4f0 Added new directories to make experiments and updated .gitignore
Changes:
  - Added /Scripts/Experiments/Queries to keep track
      of important queries, once set
  - Added /Scripts/Experiments/Tmp to run quick experiments
      when still unsure while explorating datasets
2025-09-19 08:43:54 +02:00
Christian Risi
22134391d9 Added Scripts/Experiment directory
This directory is to place files to make experiments
2025-09-19 08:41:46 +02:00
Christian Risi
82c9023849 Ignoring Scripts/Experiments files and always tracking .gitkeep files 2025-09-19 08:39:47 +02:00
Christian Risi
00b87e01ea Moved fetchdata.py to reflect working tree
old - ${Proj}/Scripts/fetchdata.py
new - ${Proj}/Scripts/DataGathering/fetchdata.py
2025-09-19 08:37:04 +02:00
Christian Risi
ce3d4bf6c5 Renamed dir from Script to Scripts 2025-09-19 08:31:00 +02:00
GassiGiuseppe
c415b175a0 added reverse.csv with the reletion incoming to films 2025-09-18 20:26:51 +02:00
GassiGiuseppe
ec81ea7930 Added file to gather wikipedia abstract from url 2025-09-18 20:26:11 +02:00
GassiGiuseppe
4bb03f86b3 Added file to study the most frequent relationship into a csv triplet 2025-09-18 20:25:25 +02:00
GassiGiuseppe
e5f201f3db DEVELOPMENT file makrdown created 2025-09-18 20:24:54 +02:00
GassiGiuseppe
1c715dc569 Typo correction in the markdown 2025-09-18 20:24:11 +02:00
GassiGiuseppe
6686b47328 Added SQL to obtain wikipedia url with movies 2025-09-18 20:23:10 +02:00
GassiGiuseppe
9a5a7d84fd Merge branch 'dev' of https://repositories.communitynotfound.work/PoliBa-DeepLearning/NanoSocrates into dev 2025-09-18 19:20:26 +02:00
GassiGiuseppe
9678ece9c0 Requirements changed
added Pandas and some other
2025-09-18 19:07:38 +02:00
Christian Risi
67bcd732b5 Updated movies 2025-09-18 18:36:52 +02:00
Christian Risi
1a4f900500 Updated git attributes 2025-09-18 18:36:42 +02:00
Christian Risi
ca8729b67c Merge branch 'dev' of https://repositories.communitynotfound.work/PoliBa-DeepLearning/NanoSocrates into dev 2025-09-18 18:36:30 +02:00
GassiGiuseppe
9dbffc52ed Added dataset of movies and their wikipedia's page link 2025-09-18 18:16:51 +02:00
Christian Risi
b7f504942a Created Dataset 2025-09-18 17:24:08 +02:00
Christian Risi
7f0c5ce8d3 Updated File for fetching 2025-09-18 17:23:56 +02:00