4c9c51f902
Added barebone to have a splitter
2025-09-23 15:34:53 +02:00
GassiGiuseppe
63c1a4a160
added little snippet to rebuild db from db_creation.sql
2025-09-22 17:52:23 +02:00
GassiGiuseppe
51114af853
DataRetrivial deleted since it does the same thing as datawarehouse.py
2025-09-22 17:51:35 +02:00
GassiGiuseppe
3a6dca0681
Info about Dataset construction from csv moved from python file to markdown
2025-09-22 17:39:44 +02:00
GassiGiuseppe
346098d2b7
Added query.sql, the file with the query used to populate the Dataset
2025-09-22 17:21:32 +02:00
GassiGiuseppe
64f9b41378
Built datawarehouse.py, which populates the dataset
2025-09-22 17:17:22 +02:00
GassiGiuseppe
ac1ed42c49
Folder DataCleaning renamed to DatasetMerging since it doesn't clean anything and instead builds the dataset
2025-09-22 17:11:49 +02:00
GassiGiuseppe
edd01a2c83
Dataset updated; the new one is built with the new method
(50 new rows found ... out of 13 million)
2025-09-22 16:57:06 +02:00
GassiGiuseppe
5aa9e3fcf3
Added in DBPEDIA the query to get Film \ wiki page ID, plus some editing
2025-09-22 15:42:57 +02:00
GassiGiuseppe
0970cabf92
reverse.csv: grammar correction of the header
The header also seemed to have been misplaced in the middle of the csv
2025-09-22 13:47:20 +02:00
GassiGiuseppe
a26d92750f
Update movie-pageid.csv: grammar correction of the header
2025-09-22 12:59:35 +02:00
GassiGiuseppe
34c4782232
Dataset.db update. It seems to be correct
2025-09-20 23:33:56 +02:00
GassiGiuseppe
c5439533e6
DataRetrivial update, without df
2025-09-20 23:32:08 +02:00
GassiGiuseppe
8819b8e87f
DataRetrivial populates the db from csv
2025-09-20 19:56:24 +02:00
Christian Risi
1076dc8aa6
Run /Scripts/DataCleaning/SQL_Queries/db_creation.sql
2025-09-20 16:39:16 +02:00
Christian Risi
3d15e03b09
Renamed file to fix spelling
2025-09-20 16:38:38 +02:00
Christian Risi
0ee2ec6fcd
Spelling corrections
2025-09-20 16:37:57 +02:00
Christian Risi
95cfa5486c
Added instructions to create database schema
2025-09-20 16:30:08 +02:00
GassiGiuseppe
0d30e90ee0
Created file for the db DatawareHouse
Also decided the first schema models in DBMerger
2025-09-20 15:53:32 +02:00
GassiGiuseppe
faaba17a98
Merge branch 'dev' of https://repositories.communitynotfound.work/PoliBa-DeepLearning/NanoSocrates into dev
2025-09-20 14:34:25 +02:00
Christian Risi
854e5f1d98
Updated file to gather data from wikipedia
2025-09-20 14:32:30 +02:00
GassiGiuseppe
242d7f674f
wikipedia summary file uploaded
Dataset composed of PageId and wikipedia Summary
2025-09-20 14:32:25 +02:00
Christian Risi
de8c2afceb
Added reconciliation
2025-09-19 22:22:09 +02:00
Christian Risi
f89dffff75
Created script to gather wikipedia abstracts
2025-09-19 19:01:38 +02:00
GassiGiuseppe
e39bad8348
Added Troubleshooting section to README, which addresses some potential issues with git and big files
2025-09-19 13:39:56 +02:00
GassiGiuseppe
7a1a221017
Update of the movie-pageid database, which has the film URI as subject and the wikipage ID as object
2025-09-19 13:37:56 +02:00
Christian Risi
fafe6ae0f9
Modified tree structure with more TMP directories
2025-09-19 12:46:31 +02:00
Christian Risi
e32444df75
Updated fetchdata to be used in terminal
Changes:
- now you can use it as if it were a cli command
Missing:
- documentation
2025-09-19 12:35:15 +02:00
Christian Risi
b74b7ac4f0
Added new directories to make experiments and updated .gitignore
Changes:
- Added /Scripts/Experiments/Queries to keep track
of important queries, once set
- Added /Scripts/Experiments/Tmp to run quick experiments
when still unsure while exploring datasets
2025-09-19 08:43:54 +02:00
Christian Risi
22134391d9
Added Scripts/Experiment directory
This directory holds files used to run experiments
2025-09-19 08:41:46 +02:00
Christian Risi
82c9023849
Ignoring Scripts/Experiments files and always tracking .gitkeep files
2025-09-19 08:39:47 +02:00
Christian Risi
00b87e01ea
Moved fetchdata.py to reflect working tree
old - ${Proj}/Scripts/fetchdata.py
new - ${Proj}/Scripts/DataGathering/fetchdata.py
2025-09-19 08:37:04 +02:00
Christian Risi
ce3d4bf6c5
Renamed dir from Script to Scripts
2025-09-19 08:31:00 +02:00
GassiGiuseppe
c415b175a0
added reverse.csv with the relations incoming to films
2025-09-18 20:26:51 +02:00
GassiGiuseppe
ec81ea7930
Added file to gather wikipedia abstract from url
2025-09-18 20:26:11 +02:00
GassiGiuseppe
4bb03f86b3
Added file to study the most frequent relationships in a csv of triplets
2025-09-18 20:25:25 +02:00
GassiGiuseppe
e5f201f3db
DEVELOPMENT markdown file created
2025-09-18 20:24:54 +02:00
GassiGiuseppe
1c715dc569
Typo correction in the markdown
2025-09-18 20:24:11 +02:00
GassiGiuseppe
6686b47328
Added SQL to obtain wikipedia url with movies
2025-09-18 20:23:10 +02:00
GassiGiuseppe
9a5a7d84fd
Merge branch 'dev' of https://repositories.communitynotfound.work/PoliBa-DeepLearning/NanoSocrates into dev
2025-09-18 19:20:26 +02:00
GassiGiuseppe
9678ece9c0
Requirements changed
added Pandas and some others
2025-09-18 19:07:38 +02:00
Christian Risi
67bcd732b5
Updated movies
2025-09-18 18:36:52 +02:00
Christian Risi
1a4f900500
Updated git attributes
2025-09-18 18:36:42 +02:00
Christian Risi
ca8729b67c
Merge branch 'dev' of https://repositories.communitynotfound.work/PoliBa-DeepLearning/NanoSocrates into dev
2025-09-18 18:36:30 +02:00
GassiGiuseppe
9dbffc52ed
Added dataset of movies and their wikipedia page links
2025-09-18 18:16:51 +02:00
Christian Risi
b7f504942a
Created Dataset
2025-09-18 17:24:08 +02:00
Christian Risi
7f0c5ce8d3
Updated File for fetching
2025-09-18 17:23:56 +02:00
Christian Risi
9838e287a4
Updated file
2025-09-18 12:03:09 +02:00
Christian Risi
ca6143ea3c
Updated Query histories
2025-09-18 11:46:32 +02:00
Christian Risi
16e7ab4d9f
Modified Datasets
2025-09-17 17:30:51 +02:00