diff --git a/Assets/Dataset/DatawareHouse/dataset.db b/Assets/Dataset/DatawareHouse/dataset.db
new file mode 100644
index 0000000..e69de29
diff --git a/Scripts/DataCleaning/DBMerger.py b/Scripts/DataCleaning/DBMerger.py
new file mode 100644
index 0000000..8eb703a
--- /dev/null
+++ b/Scripts/DataCleaning/DBMerger.py
@@ -0,0 +1,28 @@
+"""
+What we have now:
+
+Wikipeda-summary : PageId / abstract
+Movies : Movie URI
+Dataset : Movie URI / Relationship / Object [RDF]
+Movies-PageId : Movie URI / PageId (wiki)
+Reverse : Subject / Relationship / Movie URI
+
+What we want:
+(we will generate MovieID)
+Movies : MovieID [PK] / Movie URI
+WikiPageIDs : MovieID [PK, FK] / PageId [IDX] (wiki) (not important for now)
+Abstracts : MovieID [PK, FK] / abstract
+Subjects : SubjectID [PK] / RDF Subject (from either Dataset.csv or Reverse.csv) / OriginID [FK]
+Relationships : RelationshipID [PK] / RDF Relationship (not the actual relationship but its value)
+Objects : ObjectID [PK] / RDF Object / OriginID [FK]
+Origins : OriginID [PK] / Origin Name
+RDFs : RDF_ID [PK] / MovieID [FK] / SubjectID [FK] / RelationshipID [FK] / ObjectID [FK]
+
+What we will build for the model:
+
+We need the list of RDF triples for each movie, together with its abstract:
+
+: MovieID / RDF_set / abstract
+
+"""
+
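The "What we want" schema in the DBMerger.py docstring, and the per-movie `MovieID / RDF_set / abstract` output it feeds, can be sketched with Python's built-in `sqlite3` module. This is a minimal sketch under several assumptions, not the author's implementation. It assumes the merged data lands in SQLite (the empty `dataset.db` added in the same diff suggests this, but the diff does not say how DBMerger.py will populate it). It also assumes an origin name is simply the source file name (`Dataset.csv` or `Reverse.csv`) and that `RDF_set` is a set of `(subject, relationship, object)` tuples. The column names (`MovieURI`, `RDFSubject`, `OriginName`, ...), the helpers `get_or_create`, `add_triple` and `model_rows`, and the `example.org` URIs are all hypothetical.

```python
import sqlite3

# Hypothetical SQLite DDL for the "What we want" schema in DBMerger.py.
# Column names (MovieURI, RDFSubject, OriginName, ...) are illustrative guesses.
SCHEMA = """
CREATE TABLE IF NOT EXISTS Movies (
    MovieID INTEGER PRIMARY KEY, MovieURI TEXT NOT NULL UNIQUE);
CREATE TABLE IF NOT EXISTS WikiPageIDs (
    MovieID INTEGER PRIMARY KEY REFERENCES Movies(MovieID), PageID INTEGER);
CREATE INDEX IF NOT EXISTS idx_wiki_pageid ON WikiPageIDs(PageID);
CREATE TABLE IF NOT EXISTS Abstracts (
    MovieID INTEGER PRIMARY KEY REFERENCES Movies(MovieID), Abstract TEXT);
CREATE TABLE IF NOT EXISTS Origins (
    OriginID INTEGER PRIMARY KEY, OriginName TEXT NOT NULL UNIQUE);
CREATE TABLE IF NOT EXISTS Subjects (
    SubjectID INTEGER PRIMARY KEY, RDFSubject TEXT NOT NULL,
    OriginID INTEGER REFERENCES Origins(OriginID), UNIQUE (RDFSubject, OriginID));
CREATE TABLE IF NOT EXISTS Relationships (
    RelationshipID INTEGER PRIMARY KEY, RDFRelationship TEXT NOT NULL UNIQUE);
CREATE TABLE IF NOT EXISTS Objects (
    ObjectID INTEGER PRIMARY KEY, RDFObject TEXT NOT NULL,
    OriginID INTEGER REFERENCES Origins(OriginID), UNIQUE (RDFObject, OriginID));
CREATE TABLE IF NOT EXISTS RDFs (
    RDF_ID INTEGER PRIMARY KEY,
    MovieID INTEGER NOT NULL REFERENCES Movies(MovieID),
    SubjectID INTEGER NOT NULL REFERENCES Subjects(SubjectID),
    RelationshipID INTEGER NOT NULL REFERENCES Relationships(RelationshipID),
    ObjectID INTEGER NOT NULL REFERENCES Objects(ObjectID));
"""


def get_or_create(conn, table, id_col, values):
    """Return the surrogate ID of the row matching `values`, inserting it if absent.

    Table and column names are interpolated, so only pass trusted constants.
    """
    cols = list(values)
    params = tuple(values[c] for c in cols)
    where = " AND ".join(f"{c} IS ?" for c in cols)  # IS also matches NULLs
    row = conn.execute(f"SELECT {id_col} FROM {table} WHERE {where}", params).fetchone()
    if row is not None:
        return row[0]
    placeholders = ", ".join("?" for _ in cols)
    cur = conn.execute(
        f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})", params)
    return cur.lastrowid


def add_triple(conn, movie_uri, subject, relationship, obj, origin):
    """Normalize one (subject, relationship, object) row into the RDFs table."""
    movie_id = get_or_create(conn, "Movies", "MovieID", {"MovieURI": movie_uri})
    origin_id = get_or_create(conn, "Origins", "OriginID", {"OriginName": origin})
    subject_id = get_or_create(
        conn, "Subjects", "SubjectID", {"RDFSubject": subject, "OriginID": origin_id})
    rel_id = get_or_create(
        conn, "Relationships", "RelationshipID", {"RDFRelationship": relationship})
    object_id = get_or_create(
        conn, "Objects", "ObjectID", {"RDFObject": obj, "OriginID": origin_id})
    conn.execute(
        "INSERT INTO RDFs (MovieID, SubjectID, RelationshipID, ObjectID) VALUES (?, ?, ?, ?)",
        (movie_id, subject_id, rel_id, object_id))
    return movie_id


def model_rows(conn):
    """Return (MovieID, RDF_set, abstract) tuples, one per movie that has triples."""
    query = """
        SELECT t.MovieID, s.RDFSubject, r.RDFRelationship, o.RDFObject, a.Abstract
        FROM RDFs AS t
        JOIN Subjects AS s ON s.SubjectID = t.SubjectID
        JOIN Relationships AS r ON r.RelationshipID = t.RelationshipID
        JOIN Objects AS o ON o.ObjectID = t.ObjectID
        LEFT JOIN Abstracts AS a ON a.MovieID = t.MovieID
        ORDER BY t.MovieID
    """
    grouped = {}
    for movie_id, subj, rel, obj, abstract in conn.execute(query):
        triples, _ = grouped.setdefault(movie_id, (set(), abstract))
        triples.add((subj, rel, obj))
    return [(mid, triples, abstract) for mid, (triples, abstract) in grouped.items()]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # pass the dataset.db path instead to persist
    conn.executescript(SCHEMA)
    movie = "http://example.org/movie/1"  # placeholder URIs, not real data
    mid = add_triple(conn, movie, movie, "http://example.org/director",
                     "http://example.org/person/1", "Dataset.csv")
    add_triple(conn, movie, "http://example.org/person/2",
               "http://example.org/actedIn", movie, "Reverse.csv")
    conn.execute("INSERT INTO Abstracts (MovieID, Abstract) VALUES (?, ?)",
                 (mid, "Placeholder abstract."))
    for row in model_rows(conn):
        print(row)
```

Rows from `Dataset.csv` have the movie as subject and rows from `Reverse.csv` have it as object. In both cases `add_triple` takes the movie URI explicitly, so every triple is linked to a `MovieID`. Looking up or inserting IDs one row at a time keeps the sketch short. For full CSV dumps, batching inserts inside a single transaction would likely be much faster.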