From 0d30e90ee0668f31ae35c49f597ceb74253586a7 Mon Sep 17 00:00:00 2001
From: GassiGiuseppe
Date: Sat, 20 Sep 2025 15:53:32 +0200
Subject: [PATCH] Created file for the db DatawareHouse

Also decided firsts schema models into DBMerger
---
 Assets/Dataset/DatawareHouse/dataset.db |  0
 Scripts/DataCleaning/DBMerger.py        | 28 +++++++++++++++++++++++++
 2 files changed, 28 insertions(+)
 create mode 100644 Assets/Dataset/DatawareHouse/dataset.db
 create mode 100644 Scripts/DataCleaning/DBMerger.py

diff --git a/Assets/Dataset/DatawareHouse/dataset.db b/Assets/Dataset/DatawareHouse/dataset.db
new file mode 100644
index 0000000..e69de29
diff --git a/Scripts/DataCleaning/DBMerger.py b/Scripts/DataCleaning/DBMerger.py
new file mode 100644
index 0000000..8eb703a
--- /dev/null
+++ b/Scripts/DataCleaning/DBMerger.py
@@ -0,0 +1,28 @@
+"""
+What we have now:
+
+Wikipedia-summary : PageId / abstract
+Movies : Movie URI
+Dataset : Movie URI / Relationship / Object [RDF]
+Movies-PageId : Movie URI / PageId (wiki)
+Reverse : Subject / Relationship / Movie URI
+
+What we want:
+(we will generate MovieID)
+Movies : MovieID [PK] / Movie URI
+WikiPageIDs : MovieID [PK, FK] / PageId [IDX] (wiki) (Not important for now)
+Abstracts : MovieID [PK, FK] / abstract
+Subjects : SubjectID [PK] / RDF Subject (from either Dataset.csv or Reverse.csv) / OriginID [FK]
+Relationships : RelationshipID [PK] / RDF Relationship (not the actual relationship but the value)
+Objects : ObjectID [PK] / RDF Object / OriginID [FK]
+Origins : OriginID [PK] / Origin Name
+RDFs : RDF_ID [PK] / MovieID [FK] / SubjectID [FK] / RelationshipID [FK] / ObjectID [FK]
+
+What we will build for the model
+
+we need RDF list for each movie together with abstract
+
+: MovieID / RDF_set / abstract
+
+"""
+
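Review note: the target schema in the docstring could be sketched with the standard-library sqlite3 module roughly as follows. This is only a sketch under assumptions, not the project's actual DBMerger code. The column names (MovieURI, SubjectURI, RelationshipURI, ObjectURI, OriginName), the column types, the index name, and the helpers create_schema, table_names and rdf_sets_with_abstracts are all hypothetical; only the table names and the PK/FK/IDX roles come from the docstring.

```python
import sqlite3

# Hypothetical DDL derived from the "What we want" list in the docstring.
# Column names and types are assumptions, not the project's real schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS Movies (
    MovieID INTEGER PRIMARY KEY,
    MovieURI TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS WikiPageIDs (
    MovieID INTEGER PRIMARY KEY REFERENCES Movies(MovieID),
    PageID INTEGER
);
CREATE INDEX IF NOT EXISTS idx_wikipageids_pageid ON WikiPageIDs(PageID);
CREATE TABLE IF NOT EXISTS Abstracts (
    MovieID INTEGER PRIMARY KEY REFERENCES Movies(MovieID),
    Abstract TEXT
);
CREATE TABLE IF NOT EXISTS Origins (
    OriginID INTEGER PRIMARY KEY,
    OriginName TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS Subjects (
    SubjectID INTEGER PRIMARY KEY,
    SubjectURI TEXT NOT NULL,
    OriginID INTEGER REFERENCES Origins(OriginID)
);
CREATE TABLE IF NOT EXISTS Relationships (
    RelationshipID INTEGER PRIMARY KEY,
    RelationshipURI TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS Objects (
    ObjectID INTEGER PRIMARY KEY,
    ObjectURI TEXT NOT NULL,
    OriginID INTEGER REFERENCES Origins(OriginID)
);
CREATE TABLE IF NOT EXISTS RDFs (
    RDF_ID INTEGER PRIMARY KEY,
    MovieID INTEGER NOT NULL REFERENCES Movies(MovieID),
    SubjectID INTEGER NOT NULL REFERENCES Subjects(SubjectID),
    RelationshipID INTEGER NOT NULL REFERENCES Relationships(RelationshipID),
    ObjectID INTEGER NOT NULL REFERENCES Objects(ObjectID)
);
"""


def create_schema(conn):
    """Create all tables; SQLite enforces foreign keys only when asked to."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.commit()


def table_names(conn):
    """Return the names of the tables currently defined in the database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return [row[0] for row in rows]


def rdf_sets_with_abstracts(conn):
    """Map each MovieID to (list of RDF triples, abstract or None)."""
    query = """
        SELECT m.MovieID, s.SubjectURI, r.RelationshipURI, o.ObjectURI, a.Abstract
        FROM RDFs AS t
        JOIN Movies AS m ON m.MovieID = t.MovieID
        JOIN Subjects AS s ON s.SubjectID = t.SubjectID
        JOIN Relationships AS r ON r.RelationshipID = t.RelationshipID
        JOIN Objects AS o ON o.ObjectID = t.ObjectID
        LEFT JOIN Abstracts AS a ON a.MovieID = m.MovieID
        ORDER BY m.MovieID, t.RDF_ID
    """
    result = {}
    for movie_id, subj, rel, obj, abstract in conn.execute(query):
        # First row for a movie creates its entry; later rows append triples.
        triples, _ = result.setdefault(movie_id, ([], abstract))
        triples.append((subj, rel, obj))
    return result
```

Calling create_schema(sqlite3.connect("Assets/Dataset/DatawareHouse/dataset.db")) would populate the empty file added by this patch. rdf_sets_with_abstracts corresponds to the "MovieID / RDF_set / abstract" view the docstring asks for, with the grouping done in Python rather than in SQL.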