GassiGiuseppe 3a6dca0681 Infos about Dataset contruction from csv moved
from python file to markdown
2025-09-22 17:39:44 +02:00

HOW THE DATASET IS BUILT AND POPULATED

Note: the data are taken from CSV files in 1-hop

CSV files composition

| CSV file | Original structure | Saved as |
|---|---|---|
| Wikipedia-summary | PageId / abstract | "subject", "text" |
| Movies | Movie URI | "subject" |
| Dataset | Movie URI / Relationship / Object [RDF] | "subject", "relationship", "object" |
| Movies-PageId | Movie URI / PageId (wiki) | "subject", "object" |
| Reverse | Subject / Relationship / Movie URI | "subject", "relationship", "object" |
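
As an illustration of the renaming listed above, here is a minimal sketch that reads one CSV and yields rows keyed by the saved column names. The helper `read_renamed`, the `has_header` flag, and the sample URIs are hypothetical. The sketch also assumes the files are comma-separated with a header row, which this document does not state.

```python
import csv
import io


def read_renamed(handle, saved_columns, has_header=True):
    """Yield one dict per CSV row, keyed by the saved column names."""
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # drop the original header row (assumed present)
    for row in reader:
        if len(row) != len(saved_columns):
            raise ValueError(
                f"expected {len(saved_columns)} fields, got {len(row)}: {row!r}"
            )
        yield dict(zip(saved_columns, row))


# Made-up sample in the shape of Reverse (Subject / Relationship / Movie URI).
sample = io.StringIO(
    "Subject,Relationship,Movie URI\n"
    "http://example.org/person/1,http://example.org/director,"
    "http://example.org/movie/1\n"
)
rows = list(read_renamed(sample, ["subject", "relationship", "object"]))
```

The same helper would cover the other files by passing the matching list of saved names, for example `["subject", "object"]` for Movies-PageId.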

Target table schema

| Table | Columns |
|---|---|
| Movies | MovieID [PK], Movie URI |
| WikiPageIDs | MovieID [PK, FK], PageId [IDX] (wiki) (Not important for now) |
| Abstracts | MovieID [PK, FK], abstract |
| Subjects | SubjectID [PK], RDF Subject (from Dataset.csv or Reverse.csv), OriginID [FK] |
| Relationships | RelationshipID [PK], RDF Relationship (value only, not the actual relation) |
| Objects | ObjectID [PK], RDF Object, OriginID [FK] |
| Origins | OriginID [PK], Origin Name |
| RDFs | RDF_ID [PK], MovieID [FK], SubjectID [FK], RelationshipID [FK], ObjectID [FK] |
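
The schema above could be expressed roughly as follows. This is only a sketch using Python's built-in `sqlite3` module. The database engine, the column types, the `NOT NULL` constraints, the index name, and the space-free column names (`MovieURI`, `RDFSubject`, `OriginName`, and so on) are all assumptions, not taken from the project.

```python
import sqlite3

# Assumed DDL: keys and indexes follow the table above, everything else is a guess.
SCHEMA = """
CREATE TABLE Movies (
    MovieID  INTEGER PRIMARY KEY,
    MovieURI TEXT NOT NULL
);
CREATE TABLE WikiPageIDs (
    MovieID INTEGER PRIMARY KEY REFERENCES Movies(MovieID),
    PageId  INTEGER NOT NULL
);
CREATE INDEX idx_WikiPageIDs_PageId ON WikiPageIDs(PageId);
CREATE TABLE Abstracts (
    MovieID  INTEGER PRIMARY KEY REFERENCES Movies(MovieID),
    Abstract TEXT NOT NULL
);
CREATE TABLE Origins (
    OriginID   INTEGER PRIMARY KEY,
    OriginName TEXT NOT NULL
);
CREATE TABLE Subjects (
    SubjectID  INTEGER PRIMARY KEY,
    RDFSubject TEXT NOT NULL,
    OriginID   INTEGER NOT NULL REFERENCES Origins(OriginID)
);
CREATE TABLE Relationships (
    RelationshipID  INTEGER PRIMARY KEY,
    RDFRelationship TEXT NOT NULL
);
CREATE TABLE Objects (
    ObjectID  INTEGER PRIMARY KEY,
    RDFObject TEXT NOT NULL,
    OriginID  INTEGER NOT NULL REFERENCES Origins(OriginID)
);
CREATE TABLE RDFs (
    RDF_ID         INTEGER PRIMARY KEY,
    MovieID        INTEGER NOT NULL REFERENCES Movies(MovieID),
    SubjectID      INTEGER NOT NULL REFERENCES Subjects(SubjectID),
    RelationshipID INTEGER NOT NULL REFERENCES Relationships(RelationshipID),
    ObjectID       INTEGER NOT NULL REFERENCES Objects(ObjectID)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite ignores FKs unless enabled
conn.executescript(SCHEMA)
```

With foreign keys enabled, inserting an `Abstracts` row whose `MovieID` has no matching row in `Movies` is rejected, so `Movies` and `Origins` need to be populated before the tables that reference them.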