Info about dataset construction from CSV moved from Python file to Markdown
GassiGiuseppe 2025-09-22 17:39:44 +02:00
parent 346098d2b7
commit 3a6dca0681
2 changed files with 26 additions and 45 deletions

@ -1,45 +0,0 @@
"""
What we have now: Saved AS:
Wikipeda-summary : PageId / abstract subject,text
Movies : Movie URI "subject"
Dataset : Movie URI / Relationship / Object [RDF] subject,relationship,object
Movies-PageId : Movie URI / PageId (wiki) "subject", "object"
Reverse : Subject / Relationship / Movie URI "subject","relationship","object"
What we want:
( we will generate MovieID)
Movies : MovieID [PK] / Movie URI
WikiPageIDs : MovieID [PK, FK]/ PageId [IDX] (wiki) (Not important for now)
Abstracts : MovieID [PK, FK]/ abstract
Subjects : SubjectID [PK] / RDF Subject ( both from either Dataset.csv or Reverse.csv) / OriginID [FK]
Relationships : RelationshipID [PK]/ RDF Relationship (not the actual relationship but the value)
Objects : ObjectID [PK]/ RDF Object / OriginID [FK]
Origins : OriginID [PK]/ Origin Name
RDFs : RDF_ID[PK] / MovieID [FK] / SubjectID [FK]/ RelationshipID [FK]/ ObjectID [FK]
What we will build for the model
we need RDF list for each movie together with abstract
: MovieID / RDF_set / abstract
"""
import sqlite3
# Create a SQL connection to our SQLite database
con = sqlite3.connect("data/portal_mammals.sqlite")
cur = con.cursor()
# Return all results of query (SQL string literals use single quotes)
cur.execute("SELECT plot_id FROM plots WHERE plot_type = 'Control'")
cur.fetchall()
# Return first result of query
cur.execute("SELECT species FROM species WHERE taxa = 'Bird'")
cur.fetchone()
# Be sure to close the connection
con.close()

@ -0,0 +1,26 @@
# HOW THE DATASET IS BUILT AND POPULATED
Note: the data are taken from CSV files in 1-hop
## CSV files composition
| CSV files | Original structure | Saved as (column names) |
|--------------------|---------------------------------------|-------------------------------------|
| Wikipeda-summary | PageId / abstract | subject, text |
| Movies | Movie URI | "subject" |
| Dataset | Movie URI / Relationship / Object [RDF] | subject, relationship, object |
| Movies-PageId | Movie URI / PageId (wiki) | "subject", "object" |
| Reverse | Subject / Relationship / Movie URI | "subject", "relationship", "object" |
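
A minimal sketch of reading a file in this layout, assuming each CSV begins with a header row matching the column names listed above. The sample rows, the `example.org` URIs, and the `read_triples` helper are hypothetical and are not taken from the repository.

```python
import csv
import io

# Hypothetical sample in the layout of Dataset.csv (header: subject,relationship,object).
SAMPLE_DATASET_CSV = """subject,relationship,object
http://example.org/movie/A,http://example.org/prop/director,http://example.org/person/X
http://example.org/movie/A,http://example.org/prop/starring,http://example.org/person/Y
"""


def read_triples(handle):
    """Yield (subject, relationship, object) tuples from a CSV file-like object."""
    reader = csv.DictReader(handle)
    for row in reader:
        yield row["subject"], row["relationship"], row["object"]


# In practice the handle would come from open(path, newline=""); a string is used here
# so the sketch runs without any files on disk.
triples = list(read_triples(io.StringIO(SAMPLE_DATASET_CSV)))
```

The same helper would apply to Reverse.csv, since it shares the three-column header; only the position of the movie URI differs.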
## Target table schema
| Table | Columns |
|---------------|-------------------------------------------------------------------------|
| Movies | MovieID [PK], Movie URI |
| WikiPageIDs | MovieID [PK, FK], PageId [IDX] (wiki) *(Not important for now)* |
| Abstracts | MovieID [PK, FK], abstract |
| Subjects | SubjectID [PK], RDF Subject (from Dataset.csv or Reverse.csv), OriginID [FK] |
| Relationships | RelationshipID [PK], RDF Relationship (value only, not the actual relation) |
| Objects | ObjectID [PK], RDF Object, OriginID [FK] |
| Origins | OriginID [PK], Origin Name |
| RDFs | RDF_ID [PK], MovieID [FK], SubjectID [FK], RelationshipID [FK], ObjectID [FK] |
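
One way the tables above could be expressed in SQLite is sketched below. This is an assumption-laden illustration, not the project's actual schema: the column names without spaces (`MovieURI`, `RDFSubject`, and so on), the column types, the `UNIQUE` constraint, and the index name are all invented here, and an in-memory database stands in for the real file.

```python
import sqlite3

# Assumed DDL for the tables listed above; names and types are illustrative only.
SCHEMA = """
CREATE TABLE Movies (
    MovieID INTEGER PRIMARY KEY,
    MovieURI TEXT NOT NULL UNIQUE
);
CREATE TABLE WikiPageIDs (
    MovieID INTEGER PRIMARY KEY REFERENCES Movies(MovieID),
    PageID INTEGER
);
CREATE INDEX idx_wikipageids_pageid ON WikiPageIDs(PageID);
CREATE TABLE Abstracts (
    MovieID INTEGER PRIMARY KEY REFERENCES Movies(MovieID),
    Abstract TEXT
);
CREATE TABLE Origins (
    OriginID INTEGER PRIMARY KEY,
    OriginName TEXT NOT NULL
);
CREATE TABLE Subjects (
    SubjectID INTEGER PRIMARY KEY,
    RDFSubject TEXT NOT NULL,
    OriginID INTEGER REFERENCES Origins(OriginID)
);
CREATE TABLE Relationships (
    RelationshipID INTEGER PRIMARY KEY,
    RDFRelationship TEXT NOT NULL
);
CREATE TABLE Objects (
    ObjectID INTEGER PRIMARY KEY,
    RDFObject TEXT NOT NULL,
    OriginID INTEGER REFERENCES Origins(OriginID)
);
CREATE TABLE RDFs (
    RDF_ID INTEGER PRIMARY KEY,
    MovieID INTEGER REFERENCES Movies(MovieID),
    SubjectID INTEGER REFERENCES Subjects(SubjectID),
    RelationshipID INTEGER REFERENCES Relationships(RelationshipID),
    ObjectID INTEGER REFERENCES Objects(ObjectID)
);
"""

con = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled on the connection.
con.execute("PRAGMA foreign_keys = ON")
con.executescript(SCHEMA)

# List the created tables as a quick sanity check.
table_names = sorted(
    row[0] for row in con.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
con.close()
```

`Origins` is created before `Subjects` and `Objects` only for readability; SQLite does not require referenced tables to exist at `CREATE TABLE` time.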