Py4Inf 14 Database

Relational Databases and SQLite
Charles Severance

Python for Informatics: Exploring Information


www.pythonlearn.com
SQLite Manager for Firefox

https://addons.mozilla.org/en-US/firefox/addon/sqlite-manager/
Relational Databases
Relational databases model data by storing
rows and columns in tables. The power of
the relational database lies in its ability to
efficiently retrieve data from those tables
and in particular where there are multiple
tables and the relationships between those
tables involved in the query.

http://en.wikipedia.org/wiki/Relational_database
Terminology
• Database - contains many tables

• Relation (or table) - contains tuples and attributes

• Tuple (or row) - a set of fields that generally represents an “object” like
a person or a music track

• Attribute (also column or field) - one of possibly many elements of
data corresponding to the object represented by the row
A relation is defined as a set of tuples that have the same attributes. A tuple usually
represents an object and information about that object. Objects are typically
physical objects or concepts. A relation is usually described as a table, which is
organized into rows and columns. All the data referenced by an attribute are in the
same domain and conform to the same constraints. (Wikipedia)
[Figure: a table with its parts labeled - Columns / Attributes, Rows / Tuples, Tables / Relations]
Two Roles in Large Projects
• Application Developer - Builds the logic for the application, the look
and feel of the application - monitors the application for problems

• Database Administrator - Monitors and adjusts the database as the
program runs in production

• Often both people participate in the building of the “Data model”


Application Structure
[Diagram labels: End User, Application Software, SQL, Database Data Model, Developer, DBA, Database Tools]
Database Administrator (DBA)
A database administrator (DBA) is a person responsible for the
design, implementation, maintenance, and repair of an
organization’s database. The role includes the development and
design of database strategies, monitoring and improving database
performance and capacity, and planning for future expansion
requirements. They may also plan, coordinate, and implement
security measures to safeguard the database.

http://en.wikipedia.org/wiki/Database_administrator
Database Model
A database model or database schema is the structure or
format of a database, described in a formal language
supported by the database management system. In other
words, a “database model” is the application of a data
model when used in conjunction with a database
management system.

http://en.wikipedia.org/wiki/Database_model
SQL
• Structured Query Language is the language we use to issue
commands to the database

• Create a table

• Retrieve some data

• Insert data

• Delete data
http://en.wikipedia.org/wiki/SQL
Common Database Systems
• Three major database management systems are in wide use
• Oracle - Large, commercial, enterprise-scale, very tweakable
• MySQL - Simpler but very fast and scalable - commercial open
source
• SQL Server - Very nice - from Microsoft (also Access)
• Many other smaller projects, free and open source
• HSQL, SQLite, PostgreSQL, ...
SQLite Database Manager

• SQLite is a very popular database - it is free and fast and small

• We have a Firefox plugin to manipulate SQLite databases

• https://addons.mozilla.org/en-US/firefox/addon/sqlite-manager/

• SQLite is embedded in Python and a number of other languages
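Because SQLite ships inside Python, a program can open a database without installing anything else. The sketch below is a minimal illustration using the standard `sqlite3` module. The in-memory database (`':memory:'`) is an assumption made here so the example leaves no file behind.

```python
import sqlite3

# ':memory:' asks SQLite for a temporary database held only in RAM.
# A file name such as 'music.sqlite' (hypothetical) would create a file instead.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# Ask the embedded engine which SQLite version it is.
cur.execute('SELECT sqlite_version()')
version = cur.fetchone()[0]
print('SQLite version:', version)

conn.close()
```

The connection object represents the database and the cursor is what sends SQL to it.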


SQLite is in lots of software...

http://www.sqlite.org/famous.html
Start Simple - A Single Table
• Let's make a table of People - with a Name and an Email using the
“wizard” user interface...
Our first table with two columns
Our table with four rows
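The same two-column table can also be built from Python instead of the wizard. This is a sketch under assumptions: the table is called `Users` to match the SQL statements used later in the deck, the `TEXT` column types are a guess, and the four names and `example.com` addresses are placeholders invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# Two columns, as in the wizard example (column types are an assumption).
cur.execute('CREATE TABLE Users (name TEXT, email TEXT)')

# Four placeholder rows; these names and addresses are hypothetical.
people = [
    ('Chuck', 'chuck@example.com'),
    ('Sally', 'sally@example.com'),
    ('Ted', 'ted@example.com'),
    ('Kristen', 'kristen@example.com'),
]
cur.executemany('INSERT INTO Users (name, email) VALUES (?, ?)', people)
conn.commit()

# Read back the column names and the number of rows.
columns = [row[1] for row in cur.execute('PRAGMA table_info(Users)')]
row_count = cur.execute('SELECT COUNT(*) FROM Users').fetchone()[0]
print(columns, row_count)
```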
SQL

• Structured Query Language is the language we use to issue
commands to the database
• Create a table
• Retrieve some data
• Insert data
• Delete data

http://en.wikipedia.org/wiki/SQL
SQL Insert

• The Insert statement inserts a row into a table

INSERT INTO Users (name, email) VALUES ('Ted', '[email protected]')


SQL Delete

• Deletes a row in a table based on a selection criterion

DELETE FROM Users WHERE email='[email protected]'


SQL: Update

• Updates a field in the rows selected by a WHERE clause

UPDATE Users SET name='Charles' WHERE email='[email protected]'
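The INSERT, DELETE, and UPDATE statements can be tried from Python as in the sketch below. The `example.com` addresses are placeholders chosen here, and the `?` placeholders are the `sqlite3` module's way of passing values separately from the SQL text.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE Users (name TEXT, email TEXT)')

# INSERT: add two rows (addresses are hypothetical).
cur.execute('INSERT INTO Users (name, email) VALUES (?, ?)',
            ('Ted', 'ted@example.com'))
cur.execute('INSERT INTO Users (name, email) VALUES (?, ?)',
            ('Chuck', 'chuck@example.com'))

# UPDATE: change the name on the row selected by the WHERE clause.
cur.execute('UPDATE Users SET name = ? WHERE email = ?',
            ('Charles', 'chuck@example.com'))
updated = cur.rowcount  # number of rows the UPDATE changed

# DELETE: remove the row selected by the WHERE clause.
cur.execute('DELETE FROM Users WHERE email = ?', ('ted@example.com',))
deleted = cur.rowcount  # number of rows the DELETE removed

conn.commit()

remaining = cur.execute('SELECT name, email FROM Users').fetchall()
print(updated, deleted, remaining)
```

Note that an UPDATE or DELETE without a WHERE clause applies to every row in the table.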


Retrieving Records: Select

• The select statement retrieves a group of records - you can either
retrieve all the records or a subset of the records with a WHERE
clause

SELECT * FROM Users

SELECT * FROM Users WHERE email='[email protected]'


Sorting with ORDER BY

• You can add an ORDER BY clause to SELECT statements to get the
results sorted in ascending or descending order

SELECT * FROM Users ORDER BY email

SELECT * FROM Users ORDER BY name
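A short sketch of SELECT with WHERE and ORDER BY run from Python follows. The three rows are placeholder data invented for the example. Ascending order is the default, and `DESC` reverses it.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE Users (name TEXT, email TEXT)')
cur.executemany('INSERT INTO Users (name, email) VALUES (?, ?)', [
    ('Ted', 'ted@example.com'),       # hypothetical rows
    ('Sally', 'sally@example.com'),
    ('Chuck', 'chuck@example.com'),
])

# A subset of the records, chosen by a WHERE clause.
cur.execute('SELECT * FROM Users WHERE email = ?', ('sally@example.com',))
matches = cur.fetchall()

# All records, sorted in ascending order (the default).
cur.execute('SELECT name FROM Users ORDER BY name')
ascending = [row[0] for row in cur]

# The same records in descending order.
cur.execute('SELECT name FROM Users ORDER BY name DESC')
descending = [row[0] for row in cur]

print(matches, ascending, descending)
```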


SQL Summary
insert into Users (name, email) values ('Ted', '[email protected]')

delete from Users where email='[email protected]'

update Users set name='Charles' where email='[email protected]'

select * from Users

select * from Users where email='[email protected]'

select * from Users order by email


This is not too exciting (so far)

• Tables pretty much look like big fast programmable spreadsheets
with rows, columns, and commands

• The power comes when we have more than one table and we can
exploit the relationships between the tables
Complex Data Models and Relationships
http://en.wikipedia.org/wiki/Relational_model
Database Design
• Database design is an art form of its own with particular skills and
experience

• Our goal is to avoid the really bad mistakes and design clean and
easily understood databases

• Others may tune performance later

• Database design starts with a picture...


Building a Data Model

• Drawing a picture of the data objects for our application and then
figuring out how to represent the objects and their relationships

• Basic Rule: Don’t put the same string data in twice - use a
relationship instead

• When there is one thing in the “real world” there should be one copy
of that thing in the database
[Figure: a single flat table with the columns Track, Len, Artist, Album, Genre, Rating, Count]
For each “piece of info”...
• Is the column an object or an attribute of another object?

• Once we define objects, we need to define the relationships between
objects.
[Diagram: Track has the attributes Len, Rating, and Count; Track belongs-to Album, Album belongs-to Artist, and Track belongs-to Genre]
Representing Relationships in a Database
We want to keep track of which band is the “creator” of each music track...
What album does this song “belong to”??

Which album is this song related to?


Database Normalization (3NF)

• There is *tons* of database theory - way too much to understand
without excessive predicate calculus
• Do not replicate data - reference data - point at data
• Use integers for keys and for references
• Add a special “key” column to each table which we will make
references to. By convention, many programmers call this
column “id”

http://en.wikipedia.org/wiki/Database_normalization
Integer Reference Pattern

We use integers to reference rows in another table
[Figure: the Artist and Album tables]
Keys
Finding our way around....
Three Kinds of Keys
• Primary key - generally an integer auto-increment field
• Logical key - What the outside world uses for lookup
• Foreign key - generally an integer key pointing to a row in another table
[Figure: Site table with the columns id, title, user_id, ...]
Primary Key Rules
Best practices
• Never use your logical key as the primary key
• Logical keys can and do change, albeit slowly
• Relationships that are based on matching string fields are far less
efficient than integers performance-wise
[Figure: User table with the columns id, login, password, name, email, created_at, modified_at, login_at]
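The difference between a primary key and a logical key can be sketched in a few lines. The example assumes a cut-down `User` table with only `id`, `login`, and `name`. The logins are hypothetical, and marking the logical key `UNIQUE` is a choice made here, not something the slide states.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# id is the primary key: an integer that SQLite assigns automatically.
# login is the logical key: what the outside world uses for lookup.
cur.execute('''
    CREATE TABLE User (
        id    INTEGER PRIMARY KEY AUTOINCREMENT,
        login TEXT UNIQUE,
        name  TEXT
    )''')

cur.execute('INSERT INTO User (login, name) VALUES (?, ?)', ('csev', 'Chuck'))
first_id = cur.lastrowid   # the id SQLite generated for this row
cur.execute('INSERT INTO User (login, name) VALUES (?, ?)', ('sally', 'Sally'))
second_id = cur.lastrowid

# The logical key can change; the primary key stays the same,
# so anything that references id is unaffected.
cur.execute('UPDATE User SET login = ? WHERE id = ?', ('drchuck', first_id))

rows = cur.execute('SELECT id, login FROM User ORDER BY id').fetchall()
print(rows)
```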
Foreign Keys
• A foreign key is when a table has a column that contains a key which
points to the primary key of another table.
• When all primary keys are integers, then all foreign keys are integers -
this is good - very good
• If you use strings as foreign keys - you show yourself to be an
uncultured swine
[Figure: Site table (id, title, user_id, ...) next to User table (id, login, ...)]
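A minimal sketch of an integer foreign key follows, assuming that `Site.user_id` points at `User.id` as the column name suggests. The `REFERENCES` clause and the `PRAGMA foreign_keys = ON` line go beyond the slides. SQLite does not enforce foreign keys unless that pragma is switched on for the connection.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
# SQLite enforces foreign keys only when asked to, per connection.
conn.execute('PRAGMA foreign_keys = ON')
cur = conn.cursor()

cur.execute('CREATE TABLE User (id INTEGER PRIMARY KEY, login TEXT UNIQUE)')
cur.execute('''
    CREATE TABLE Site (
        id      INTEGER PRIMARY KEY,
        title   TEXT,
        user_id INTEGER REFERENCES User(id)  -- foreign key: an integer
    )''')

cur.execute('INSERT INTO User (login) VALUES (?)', ('csev',))
owner_id = cur.lastrowid

# A Site row that points at an existing User row is accepted.
cur.execute('INSERT INTO Site (title, user_id) VALUES (?, ?)',
            ('My Site', owner_id))

# A Site row that points at a User row that does not exist is rejected.
try:
    cur.execute('INSERT INTO Site (title, user_id) VALUES (?, ?)',
                ('Orphan', 999))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

site_count = cur.execute('SELECT COUNT(*) FROM Site').fetchone()[0]
print(rejected, site_count)
```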
Relationship Building (in tables)
[Diagram: Track (Title, Rating, Len, Count) belongs-to Album; Album belongs-to Artist; Track belongs-to Genre]

Tables and their columns:
• Artist - id (primary key), name (logical key)
• Album - id (primary key), title (logical key), artist_id (foreign key)
• Genre - id (primary key), name (logical key)
• Track - id (primary key), title (logical key), rating, len, count,
album_id (foreign key), genre_id (foreign key)

Naming the foreign key artist_id is a convention
insert into Artist (name) values ('Led Zeppelin')
insert into Artist (name) values ('AC/DC')
insert into Genre (name) values ('Rock')
insert into Genre (name) values ('Metal')
insert into Album (title, artist_id) values ('Who Made Who', 2)
insert into Album (title, artist_id) values ('IV', 1)
insert into Track (title, rating, len, count, album_id, genre_id)
values ('Black Dog', 5, 297, 0, 2, 1)
insert into Track (title, rating, len, count, album_id, genre_id)
values ('Stairway', 5, 482, 0, 2, 1)
insert into Track (title, rating, len, count, album_id, genre_id)
values ('About to Rock', 5, 313, 0, 1, 2)
insert into Track (title, rating, len, count, album_id, genre_id)
values ('Who Made Who', 5, 207, 0, 1, 2)
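The insert statements above hard-code the integer keys (for example `artist_id` 2 for AC/DC). The sketch below loads the same rows from Python and captures each generated id with `cursor.lastrowid`, which shows where those numbers come from. The `CREATE TABLE` statements and column types are assumptions, since the deck builds its tables in the graphical tool, and the `insert` helper is a name introduced only for this example.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# Assumed schema: integer primary keys, integer foreign keys.
cur.executescript('''
CREATE TABLE Artist (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT);
CREATE TABLE Genre  (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT);
CREATE TABLE Album  (id INTEGER PRIMARY KEY AUTOINCREMENT, title TEXT,
                     artist_id INTEGER);
CREATE TABLE Track  (id INTEGER PRIMARY KEY AUTOINCREMENT, title TEXT,
                     rating INTEGER, len INTEGER, count INTEGER,
                     album_id INTEGER, genre_id INTEGER);
''')

def insert(sql, params):
    """Hypothetical helper: run one INSERT and return the new row's id."""
    cur.execute(sql, params)
    return cur.lastrowid

zeppelin = insert('INSERT INTO Artist (name) VALUES (?)', ('Led Zeppelin',))
acdc = insert('INSERT INTO Artist (name) VALUES (?)', ('AC/DC',))
rock = insert('INSERT INTO Genre (name) VALUES (?)', ('Rock',))
metal = insert('INSERT INTO Genre (name) VALUES (?)', ('Metal',))
who_made_who = insert('INSERT INTO Album (title, artist_id) VALUES (?, ?)',
                      ('Who Made Who', acdc))
iv = insert('INSERT INTO Album (title, artist_id) VALUES (?, ?)',
            ('IV', zeppelin))

# Each track stores only integers that point at its album and genre.
tracks = [
    ('Black Dog', 5, 297, 0, iv, rock),
    ('Stairway', 5, 482, 0, iv, rock),
    ('About to Rock', 5, 313, 0, who_made_who, metal),
    ('Who Made Who', 5, 207, 0, who_made_who, metal),
]
cur.executemany(
    'INSERT INTO Track (title, rating, len, count, album_id, genre_id) '
    'VALUES (?, ?, ?, ?, ?, ?)', tracks)
conn.commit()

generated_ids = (zeppelin, acdc, rock, metal, who_made_who, iv)
track_count = cur.execute('SELECT COUNT(*) FROM Track').fetchone()[0]
print(generated_ids, track_count)
```

Capturing the ids in variables avoids having to know in advance which number the database will assign.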
We have relationships!
[Figure: the contents of the Track, Album, Genre, and Artist tables]
Using Join Across Tables

http://en.wikipedia.org/wiki/Join_(SQL)
Relational Power

• By removing the replicated data and replacing it with references to a
single copy of each bit of data we build a “web” of information that
the relational database can read through very quickly - even for very
large amounts of data

• Often when you want some data it comes from a number of tables
linked by these foreign keys
The JOIN Operation

• The JOIN operation links across several tables as part of a select
operation

• You must tell the JOIN how to use the keys that make the connection
between the tables using an ON clause
select Album.title, Artist.name from Album join Artist on Album.artist_id = Artist.id

• What we want to see: Album.title, Artist.name
• The tables that hold the data: Album join Artist
• How the tables are linked: on Album.artist_id = Artist.id

select Album.title, Album.artist_id, Artist.id, Artist.name
from Album join Artist on Album.artist_id = Artist.id

[Figure: result with the columns Album.title, Album.artist_id, Artist.id, Artist.name]
select Track.title, Genre.name from Track join Genre on Track.genre_id = Genre.id

• What we want to see: Track.title, Genre.name
• The tables which hold the data: Track join Genre
• How the tables are linked: on Track.genre_id = Genre.id
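A two-table join like the ones above can be run from Python as sketched here. Only the `Artist` and `Album` tables are created, with assumed column types. The `ORDER BY Album.id` is added so the row order is predictable, and is not part of the slide's query.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# Assumed schema plus the Artist and Album rows from the insert statements.
cur.executescript('''
CREATE TABLE Artist (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Album  (id INTEGER PRIMARY KEY, title TEXT, artist_id INTEGER);
INSERT INTO Artist (name) VALUES ('Led Zeppelin');
INSERT INTO Artist (name) VALUES ('AC/DC');
INSERT INTO Album (title, artist_id) VALUES ('Who Made Who', 2);
INSERT INTO Album (title, artist_id) VALUES ('IV', 1);
''')

# The ON clause pairs each album with the artist row its artist_id points to.
cur.execute('''
    SELECT Album.title, Artist.name
    FROM Album JOIN Artist ON Album.artist_id = Artist.id
    ORDER BY Album.id''')
pairs = cur.fetchall()

for title, name in pairs:
    print(title, 'by', name)
```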
It can get complex...
select Track.title, Artist.name, Album.title, Genre.name
from Track join Genre join Album join Artist on
Track.genre_id = Genre.id and Track.album_id = Album.id
and Album.artist_id = Artist.id
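• What we want to see: Track.title, Artist.name, Album.title, Genre.name
• The tables which hold the data: Track join Genre join Album join Artist
• How the tables are linked: Track.genre_id = Genre.id and
Track.album_id = Album.id and Album.artist_id = Artist.id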
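The four-table query can be checked against the sample rows as in the sketch below. It runs the query in the form shown above (one ON clause at the end) and in an alternative form with one ON clause per JOIN, which many people find easier to read, and compares the results. The schema, column types, and the added `ORDER BY Track.id` are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# Assumed schema and the sample rows from the insert statements.
cur.executescript('''
CREATE TABLE Artist (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Genre  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Album  (id INTEGER PRIMARY KEY, title TEXT, artist_id INTEGER);
CREATE TABLE Track  (id INTEGER PRIMARY KEY, title TEXT, rating INTEGER,
                     len INTEGER, count INTEGER,
                     album_id INTEGER, genre_id INTEGER);
INSERT INTO Artist (name) VALUES ('Led Zeppelin'), ('AC/DC');
INSERT INTO Genre (name) VALUES ('Rock'), ('Metal');
INSERT INTO Album (title, artist_id) VALUES ('Who Made Who', 2), ('IV', 1);
INSERT INTO Track (title, rating, len, count, album_id, genre_id) VALUES
    ('Black Dog', 5, 297, 0, 2, 1),
    ('Stairway', 5, 482, 0, 2, 1),
    ('About to Rock', 5, 313, 0, 1, 2),
    ('Who Made Who', 5, 207, 0, 1, 2);
''')

# Form used on the slide: all link conditions in one ON clause.
single_on = '''
    SELECT Track.title, Artist.name, Album.title, Genre.name
    FROM Track JOIN Genre JOIN Album JOIN Artist
    ON Track.genre_id = Genre.id AND Track.album_id = Album.id
       AND Album.artist_id = Artist.id
    ORDER BY Track.id'''

# Alternative form: each JOIN states its own link condition.
on_per_join = '''
    SELECT Track.title, Artist.name, Album.title, Genre.name
    FROM Track
    JOIN Genre  ON Track.genre_id = Genre.id
    JOIN Album  ON Track.album_id = Album.id
    JOIN Artist ON Album.artist_id = Artist.id
    ORDER BY Track.id'''

rows_single = cur.execute(single_on).fetchall()
rows_split = cur.execute(on_per_join).fetchall()

for row in rows_split:
    print(row)
```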
Complexity Enables Speed
• Complexity makes speed possible and allows you to get very fast
results as the data size grows

• By normalizing the data and linking it with integer keys, the overall
amount of data which the relational database must scan is far lower
than if the data were simply flattened out

• It might seem like a tradeoff - spend some time designing your
database so it continues to be fast when your application is a success
Additional SQL Topics

• Indexes improve access performance for things like string fields

• Constraints on data - (cannot be NULL, etc.)

• Transactions - allow SQL operations to be grouped and done as a unit

• See SI664 - Database Design (All Semesters)
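The three topics above can be seen working together in a small sketch. The table, the index name `idx_users_email`, and the rows are all hypothetical. The example uses a `NOT NULL` constraint to make the second statement of a transaction fail, and relies on the `sqlite3` connection's `with` block, which commits if the block succeeds and rolls back if it raises.

```python
import sqlite3

conn = sqlite3.connect(':memory:')

# Constraint: name cannot be NULL.
conn.execute('''
    CREATE TABLE Users (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        email TEXT
    )''')

# Index: helps lookups on a string field such as email.
conn.execute('CREATE INDEX idx_users_email ON Users (email)')

insert_sql = 'INSERT INTO Users (name, email) VALUES (?, ?)'

# Transaction: the two inserts are done as a unit. The second one
# violates NOT NULL, so the first one is rolled back as well.
rolled_back = False
try:
    with conn:
        conn.execute(insert_sql, ('Ted', 'ted@example.com'))
        conn.execute(insert_sql, (None, 'nobody@example.com'))
except sqlite3.IntegrityError:
    rolled_back = True

count_after_failure = conn.execute('SELECT COUNT(*) FROM Users').fetchone()[0]

# When every statement succeeds, the whole group is committed.
with conn:
    conn.execute(insert_sql, ('Ted', 'ted@example.com'))
    conn.execute(insert_sql, ('Sally', 'sally@example.com'))

count_after_success = conn.execute('SELECT COUNT(*) FROM Users').fetchone()[0]
print(rolled_back, count_after_failure, count_after_success)
```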


Summary
• Relational databases allow us to scale to very large amounts of data

• The key is to have one copy of any data element and use relations
and joins to link the data to multiple places

• This greatly reduces the amount of data which must be scanned
when doing complex operations across large amounts of data

• Database and SQL design is a bit of an art form


Acknowledgements / Contributions
These slides are Copyright 2010- Charles R. Severance (
...
www.dr-chuck.com) of the University of Michigan School of
Information and open.umich.edu and made available under a
Creative Commons Attribution 4.0 License. Please maintain this
last slide in all copies of the document to comply with the
attribution requirements of the license. If you make a change,
feel free to add your name and organization to the list of
contributors on this page as you republish the materials.

Initial Development: Charles Severance, University of Michigan
School of Information

… Insert new Contributors here
