# HOW THE DATASET IS BUILT AND POPULATED
Note: the data are taken from CSV files in 1-hop
## CSV files composition

| CSV file         | Original structure                      | Saved as                            |
|------------------|-----------------------------------------|-------------------------------------|
| Wikipeda-summary | PageId / abstract                       | `subject`, `text`                   |
| Movies           | Movie URI                               | `subject`                           |
| Dataset          | Movie URI / Relationship / Object [RDF] | `subject`, `relationship`, `object` |
| Movies-PageId    | Movie URI / PageId (wiki)               | `subject`, `object`                 |
| Reverse          | Subject / Relationship / Movie URI      | `subject`, `relationship`, `object` |
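
As a rough illustration of how the saved column names might be used when reading the files, here is a minimal sketch. It assumes each CSV starts with a header row holding the saved names; `CSV_COLUMNS` and `read_rows` are hypothetical helpers, and the sample row uses made-up `example.org` URIs rather than real data.

```python
import csv
import io

# Column names each CSV is saved with, copied from the table above.
CSV_COLUMNS = {
    "Wikipeda-summary": ("subject", "text"),
    "Movies": ("subject",),
    "Dataset": ("subject", "relationship", "object"),
    "Movies-PageId": ("subject", "object"),
    "Reverse": ("subject", "relationship", "object"),
}


def read_rows(name, stream):
    """Yield one dict per data row of the CSV called `name`.

    Assumption: the first row is a header with the saved column names.
    The header is checked when the first row is requested.
    """
    reader = csv.DictReader(stream)
    expected = CSV_COLUMNS[name]
    if tuple(reader.fieldnames or ()) != expected:
        raise ValueError(f"{name}: expected header {expected}, got {reader.fieldnames}")
    yield from reader


# Made-up sample standing in for Dataset.csv.
sample = io.StringIO(
    "subject,relationship,object\n"
    "http://example.org/movie/1,http://example.org/director,http://example.org/person/9\n"
)
rows = list(read_rows("Dataset", sample))
```

Checking the header up front is a design choice of this sketch: it makes a file saved with the wrong column names fail early instead of producing rows with missing keys.
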
## Target table schema

| Table         | Columns                                                                       |
|---------------|-------------------------------------------------------------------------------|
| Movies        | MovieID [PK], Movie URI                                                       |
| WikiPageIDs   | MovieID [PK, FK], PageId [IDX] (wiki) *(Not important for now)*               |
| Abstracts     | MovieID [PK, FK], abstract                                                    |
| Subjects      | SubjectID [PK], RDF Subject (from Dataset.csv or Reverse.csv), OriginID [FK]  |
| Relationships | RelationshipID [PK], RDF Relationship (value only, not the actual relation)   |
| Objects       | ObjectID [PK], RDF Object, OriginID [FK]                                      |
| Origins       | OriginID [PK], Origin Name                                                    |
| RDFs          | RDF_ID [PK], MovieID [FK], SubjectID [FK], RelationshipID [FK], ObjectID [FK] |
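
One possible way to realize these tables is sketched below with Python's built-in `sqlite3` module. The choice of SQLite, the column types, the space-free column names (`MovieURI`, `RDFSubject`, `RDFRelationship`, `RDFObject`, `OriginName`), and the index name are all assumptions made for illustration; only the table names, keys, and relations come from the schema above.

```python
import sqlite3

# Assumed DDL: types and space-free column names are illustrative choices.
SCHEMA = """
CREATE TABLE Movies (
    MovieID  INTEGER PRIMARY KEY,
    MovieURI TEXT
);
CREATE TABLE WikiPageIDs (
    MovieID INTEGER PRIMARY KEY REFERENCES Movies (MovieID),
    PageId  INTEGER
);
CREATE INDEX idx_WikiPageIDs_PageId ON WikiPageIDs (PageId);
CREATE TABLE Abstracts (
    MovieID  INTEGER PRIMARY KEY REFERENCES Movies (MovieID),
    abstract TEXT
);
CREATE TABLE Origins (
    OriginID   INTEGER PRIMARY KEY,
    OriginName TEXT
);
CREATE TABLE Subjects (
    SubjectID  INTEGER PRIMARY KEY,
    RDFSubject TEXT,
    OriginID   INTEGER REFERENCES Origins (OriginID)
);
CREATE TABLE Relationships (
    RelationshipID  INTEGER PRIMARY KEY,
    RDFRelationship TEXT
);
CREATE TABLE Objects (
    ObjectID  INTEGER PRIMARY KEY,
    RDFObject TEXT,
    OriginID  INTEGER REFERENCES Origins (OriginID)
);
CREATE TABLE RDFs (
    RDF_ID         INTEGER PRIMARY KEY,
    MovieID        INTEGER REFERENCES Movies (MovieID),
    SubjectID      INTEGER REFERENCES Subjects (SubjectID),
    RelationshipID INTEGER REFERENCES Relationships (RelationshipID),
    ObjectID       INTEGER REFERENCES Objects (ObjectID)
);
"""


def create_schema(conn):
    """Create all tables on an open connection."""
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)
tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
```

Here `Origins` is created before `Subjects` and `Objects` only for readability; SQLite does not require the referenced table to exist when the referencing table is defined.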