Folder DataCleaning renamed to DatasetMerging since it doesn't clean anything

and instead builds the dataset
2025-09-22 17:11:49 +02:00
parent edd01a2c83
commit ac1ed42c49
4 changed files with 45 additions and 28 deletions
--- a/Scripts/DataCleaning/DBMerger.py
+++ b/Scripts/DataCleaning/DBMerger.py
@@ -1,28 +0,0 @@
-"""
-What we have now:
-
-Wikipedia-summary   : PageId / abstract
-Movies              : Movie URI
-Dataset             : Movie URI / Relationship / Object [RDF]
-Movies-PageId       : Movie URI / PageId (wiki)
-Reverse             : Subject / Relationship / Movie URI
-
-What we want:
-(we will generate MovieID)
-Movies              : MovieID [PK] / Movie URI
-WikiPageIDs         : MovieID [PK, FK]/ PageId [IDX] (wiki) (Not important for now)
-Abstracts           : MovieID [PK, FK]/ abstract
-Subjects            : SubjectID [PK] / RDF Subject (from either Dataset.csv or Reverse.csv) / OriginID [FK]
-Relationships       : RelationshipID [PK]/ RDF Relationship  (not the actual relationship but the value)
-Objects             : ObjectID [PK]/ RDF Object / OriginID [FK]
-Origins             : OriginID [PK]/ Origin Name
-RDFs                : RDF_ID[PK] / MovieID [FK] / SubjectID [FK]/ RelationshipID [FK]/ ObjectID [FK]
-
-What we will build for the model:
-
-We need the RDF list for each movie together with its abstract
-
-: MovieID / RDF_set / abstract
-
-"""
-
--- a/Scripts/DatasetMerging/DBMerger.py
+++ b/Scripts/DatasetMerging/DBMerger.py
@@ -0,0 +1,45 @@
+"""
+What we have now:                                                   Saved as:
+
+Wikipedia-summary   : PageId / abstract                             subject,text
+Movies              : Movie URI                                     "subject"
+Dataset             : Movie URI / Relationship / Object [RDF]       subject,relationship,object
+Movies-PageId       : Movie URI / PageId (wiki)                     "subject", "object"
+Reverse             : Subject / Relationship / Movie URI            "subject","relationship","object"
+
+What we want:
+(we will generate MovieID)
+Movies              : MovieID [PK] / Movie URI
+WikiPageIDs         : MovieID [PK, FK]/ PageId [IDX] (wiki) (Not important for now)
+Abstracts           : MovieID [PK, FK]/ abstract
+Subjects            : SubjectID [PK] / RDF Subject (from either Dataset.csv or Reverse.csv) / OriginID [FK]
+Relationships       : RelationshipID [PK]/ RDF Relationship  (not the actual relationship but the value)
+Objects             : ObjectID [PK]/ RDF Object / OriginID [FK]
+Origins             : OriginID [PK]/ Origin Name
+RDFs                : RDF_ID[PK] / MovieID [FK] / SubjectID [FK]/ RelationshipID [FK]/ ObjectID [FK]
+
+What we will build for the model:
+
+We need the RDF list for each movie together with its abstract
+
+: MovieID / RDF_set / abstract
+
+"""
+
+import sqlite3
+
+# Create a SQL connection to our SQLite database
+con = sqlite3.connect("data/portal_mammals.sqlite")
+
+cur = con.cursor()
+
+# Return all results of query (bound parameter instead of a double-quoted SQL string)
+cur.execute("SELECT plot_id FROM plots WHERE plot_type = ?", ("Control",))
+control_plot_ids = cur.fetchall()
+
+# Return first result of query
+cur.execute("SELECT species FROM species WHERE taxa = ?", ("Bird",))
+first_bird_species = cur.fetchone()
+
+# Be sure to close the connection
+con.close()
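
Note on the "What we will build for the model" part of the docstring: the MovieID / RDF_set / abstract rows could be assembled roughly as sketched below. This is only an illustration under stated assumptions, not the code in this commit: the RDFs table is denormalized here (the docstring plans separate Subjects, Relationships and Objects tables), the column names and the helper `model_rows` are hypothetical, and the example.org rows are made-up sample data.

```python
import sqlite3

# In-memory database so the sketch does not touch any real file.
con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE Movies    (MovieID INTEGER PRIMARY KEY, MovieURI TEXT NOT NULL UNIQUE);
    CREATE TABLE Abstracts (MovieID INTEGER PRIMARY KEY, Abstract TEXT NOT NULL);
    -- Assumption: denormalized stand-in for RDFs + Subjects + Relationships + Objects.
    CREATE TABLE RDFs (
        RDF_ID       INTEGER PRIMARY KEY,
        MovieID      INTEGER NOT NULL,
        Subject      TEXT NOT NULL,
        Relationship TEXT NOT NULL,
        Object       TEXT NOT NULL
    );
    """
)

# Made-up sample data; real rows would come from the CSV files.
movie_a = "http://example.org/movie/A"
movie_b = "http://example.org/movie/B"

# "We will generate MovieID": INTEGER PRIMARY KEY lets SQLite assign the ids.
con.executemany("INSERT INTO Movies (MovieURI) VALUES (?)", [(movie_a,), (movie_b,)])
movie_ids = dict(con.execute("SELECT MovieURI, MovieID FROM Movies"))

con.executemany(
    "INSERT INTO Abstracts (MovieID, Abstract) VALUES (?, ?)",
    [(movie_ids[movie_a], "Abstract of A"), (movie_ids[movie_b], "Abstract of B")],
)
con.executemany(
    "INSERT INTO RDFs (MovieID, Subject, Relationship, Object) VALUES (?, ?, ?, ?)",
    [
        # Dataset-style triple: the movie is the subject.
        (movie_ids[movie_a], movie_a, "director", "http://example.org/person/X"),
        # Reverse-style triple: the movie is the object.
        (movie_ids[movie_a], "http://example.org/person/Y", "knownFor", movie_a),
        (movie_ids[movie_b], movie_b, "director", "http://example.org/person/Z"),
    ],
)


def model_rows(con):
    """Return one (MovieID, RDF_set, abstract) tuple per movie that has an abstract."""
    triples = {}
    query = "SELECT MovieID, Subject, Relationship, Object FROM RDFs ORDER BY RDF_ID"
    for movie_id, subject, relationship, obj in con.execute(query):
        triples.setdefault(movie_id, []).append((subject, relationship, obj))
    movies = con.execute(
        "SELECT m.MovieID, a.Abstract FROM Movies AS m "
        "JOIN Abstracts AS a ON a.MovieID = m.MovieID ORDER BY m.MovieID"
    )
    return [(movie_id, triples.get(movie_id, []), abstract) for movie_id, abstract in movies]


rows = model_rows(con)
```

Grouping in Python rather than with an SQL aggregate keeps each triple as a tuple, which may be easier to feed to a model than a concatenated string.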
--- a/Scripts/DatasetMerging/DataRetrivial.py
+++ b/Scripts/DatasetMerging/DataRetrivial.py
--- a/Scripts/DatasetMerging/SQL_Queries/db_creation.sql
+++ b/Scripts/DatasetMerging/SQL_Queries/db_creation.sql
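
Note on db_creation.sql: the file is added empty in this commit. A possible starting point, derived only from the "What we want" list in the DBMerger.py docstring, is sketched below. Column names, types and constraints are assumptions rather than the project's actual DDL, and `create_schema` is a hypothetical helper; the schema is run through Python's sqlite3 module so it can be checked quickly.

```python
import sqlite3

# Hypothetical DDL following the [PK] / [FK] / [IDX] markers in the docstring.
SCHEMA = """
CREATE TABLE IF NOT EXISTS Movies (
    MovieID  INTEGER PRIMARY KEY,
    MovieURI TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS WikiPageIDs (
    MovieID INTEGER PRIMARY KEY REFERENCES Movies(MovieID),
    PageID  INTEGER NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_wikipageids_pageid ON WikiPageIDs(PageID);
CREATE TABLE IF NOT EXISTS Abstracts (
    MovieID  INTEGER PRIMARY KEY REFERENCES Movies(MovieID),
    Abstract TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS Origins (
    OriginID   INTEGER PRIMARY KEY,
    OriginName TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS Subjects (
    SubjectID  INTEGER PRIMARY KEY,
    SubjectURI TEXT NOT NULL,
    OriginID   INTEGER NOT NULL REFERENCES Origins(OriginID)
);
CREATE TABLE IF NOT EXISTS Relationships (
    RelationshipID  INTEGER PRIMARY KEY,
    RelationshipURI TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS Objects (
    ObjectID  INTEGER PRIMARY KEY,
    ObjectURI TEXT NOT NULL,
    OriginID  INTEGER NOT NULL REFERENCES Origins(OriginID)
);
CREATE TABLE IF NOT EXISTS RDFs (
    RDF_ID         INTEGER PRIMARY KEY,
    MovieID        INTEGER NOT NULL REFERENCES Movies(MovieID),
    SubjectID      INTEGER NOT NULL REFERENCES Subjects(SubjectID),
    RelationshipID INTEGER NOT NULL REFERENCES Relationships(RelationshipID),
    ObjectID       INTEGER NOT NULL REFERENCES Objects(ObjectID)
);
"""


def create_schema(con: sqlite3.Connection) -> list[str]:
    """Create the tables and return the names of all tables in the database."""
    # SQLite does not enforce foreign keys unless this pragma is enabled.
    con.execute("PRAGMA foreign_keys = ON")
    con.executescript(SCHEMA)
    rows = con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# In-memory database for a quick check; a real run would pass a file path.
con = sqlite3.connect(":memory:")
tables = create_schema(con)
```

Declaring MovieID as INTEGER PRIMARY KEY makes it an alias for SQLite's rowid, so the generated MovieID mentioned in the docstring comes for free on insert.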