# HOW THE DATASET IS BUILT AND POPULATED

Note: the data are taken from the CSV files in 1-hop.

## CSV files composition

| CSV file          | Original structure                      | Saved as                            |
|-------------------|-----------------------------------------|-------------------------------------|
| Wikipedia-summary | PageId / abstract                       | subject, text                       |
| Movies            | Movie URI                               | "subject"                           |
| Dataset           | Movie URI / Relationship / Object [RDF] | subject, relationship, object       |
| Movies-PageId     | Movie URI / PageId (wiki)               | "subject", "object"                 |
| Reverse           | Subject / Relationship / Movie URI      | "subject", "relationship", "object" |

## Target table schema

| Table         | Columns                                                                       |
|---------------|-------------------------------------------------------------------------------|
| Movies        | MovieID [PK], Movie URI                                                       |
| WikiPageIDs   | MovieID [PK, FK], PageId [IDX] (wiki) *(Not important for now)*               |
| Abstracts     | MovieID [PK, FK], abstract                                                    |
| Subjects      | SubjectID [PK], RDF Subject (from Dataset.csv or Reverse.csv), OriginID [FK]  |
| Relationships | RelationshipID [PK], RDF Relationship (value only, not the actual relation)   |
| Objects       | ObjectID [PK], RDF Object, OriginID [FK]                                      |
| Origins       | OriginID [PK], Origin Name                                                    |
| RDFs          | RDF_ID [PK], MovieID [FK], SubjectID [FK], RelationshipID [FK], ObjectID [FK] |
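
The table schema above could be created along the following lines. This is only a minimal sketch, assuming SQLite as the database engine (the document does not name one). The column types, the `NOT NULL` constraints, the index name, and identifiers such as `MovieURI`, `RDFSubject`, and `OriginName` are assumptions derived from the column descriptions, not the project's actual DDL.

```python
import sqlite3

# Hypothetical DDL for the tables listed above; types and constraints are assumptions.
SCHEMA = """
CREATE TABLE Movies (
    MovieID  INTEGER PRIMARY KEY,
    MovieURI TEXT NOT NULL
);
CREATE TABLE WikiPageIDs (
    MovieID INTEGER PRIMARY KEY REFERENCES Movies(MovieID),
    PageID  INTEGER NOT NULL
);
CREATE INDEX idx_wikipageids_pageid ON WikiPageIDs(PageID);
CREATE TABLE Abstracts (
    MovieID  INTEGER PRIMARY KEY REFERENCES Movies(MovieID),
    Abstract TEXT NOT NULL
);
CREATE TABLE Origins (
    OriginID   INTEGER PRIMARY KEY,
    OriginName TEXT NOT NULL
);
CREATE TABLE Subjects (
    SubjectID  INTEGER PRIMARY KEY,
    RDFSubject TEXT NOT NULL,
    OriginID   INTEGER NOT NULL REFERENCES Origins(OriginID)
);
CREATE TABLE Relationships (
    RelationshipID  INTEGER PRIMARY KEY,
    RDFRelationship TEXT NOT NULL
);
CREATE TABLE Objects (
    ObjectID  INTEGER PRIMARY KEY,
    RDFObject TEXT NOT NULL,
    OriginID  INTEGER NOT NULL REFERENCES Origins(OriginID)
);
CREATE TABLE RDFs (
    RDF_ID         INTEGER PRIMARY KEY,
    MovieID        INTEGER NOT NULL REFERENCES Movies(MovieID),
    SubjectID      INTEGER NOT NULL REFERENCES Subjects(SubjectID),
    RelationshipID INTEGER NOT NULL REFERENCES Relationships(RelationshipID),
    ObjectID       INTEGER NOT NULL REFERENCES Objects(ObjectID)
);
"""


def build_schema(conn):
    """Create all tables on the given connection."""
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)


def table_names(conn):
    """Return the names of all tables currently defined in the database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    return [row[0] for row in rows]


# An in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
build_schema(conn)
```

Declaring `Origins` before `Subjects` and `Objects` is not required by SQLite, but it keeps the script portable to engines that resolve foreign-key targets at creation time.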