Computer School

The document discusses databases and SQL. It explains that databases were created to address limitations with file-based data storage like data duplication, inconsistency, and security issues. The relational database model uses tables (relations) that have rows (tuples) and columns (attributes). SQL is used to query and manipulate data in databases. Common SQL commands are shown for tasks like creating tables, inserting/updating/deleting rows, and joining tables. Pandas and NumPy are Python libraries for working with data, with Pandas focused on tabular data and NumPy on numeric arrays. Pandas core data structures of Series and DataFrame are introduced.

Uploaded by

Suraj Raj
COMPUTER

Ch 1 Querying and SQL Functions

Database

Previously, data was stored in files, which had several limitations, including data redundancy (duplicate
data), data inconsistency, unshareable data, unstandardized data, insecure data and incorrect data.
That is why we use a database management system (DBMS), software that manages a database (an
organized collection of data), to assist with data storage and management.

The Benefits of a Database Management System:

• Reduces data redundancy (duplicate data).
• Reduces inconsistencies in the data.
• Allows data to be shared.
• Enforces standards.
• Keeps data secure.
• Maintains integrity (keeps related data consistent across the database).

Databases have three types of data models:

• Hierarchical: data is organized in a tree-like structure.
• Network: data is organized as records connected by links.
• Relational: data is organized into tables, which are called relations.

Different terms are used in the relational model:

• Relation: a table
• Domain: the set of values that a column is allowed to contain
• Tuple: a row
• Attribute: a column
• Degree: the number of attributes
• Cardinality: the number of tuples
• Primary key: an attribute (or set of attributes) that uniquely identifies each tuple
• Candidate keys: all the attributes (or minimal combinations of attributes) that could uniquely
identify each tuple; one of them is chosen as the primary key
• Alternate keys: candidate keys that were not chosen as the primary key
• Foreign key: an attribute in one table that refers to the primary key of another table
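To make these terms concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is used instead of MySQL only so that the example runs without a server, and the student and marks tables are hypothetical. It counts the degree and cardinality of a relation and shows a foreign key being enforced.

```python
import sqlite3

# In-memory SQLite database; the tables below are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# 'student' is a relation; roll_no is its primary key.
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, city TEXT)")
# marks.roll_no is a foreign key referring to student.roll_no.
conn.execute(
    "CREATE TABLE marks (roll_no INTEGER, subject TEXT, score INTEGER, "
    "FOREIGN KEY (roll_no) REFERENCES student (roll_no))"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Kavi", "Delhi"), (2, "Shyam", "Pune"), (3, "Ravi", "Delhi")],
)

cursor = conn.execute("SELECT * FROM student")
degree = len(cursor.description)      # number of attributes (columns)
cardinality = len(cursor.fetchall())  # number of tuples (rows)
print(degree, cardinality)

# A foreign key value with no matching primary key is rejected.
try:
    conn.execute("INSERT INTO marks VALUES (99, 'Maths', 80)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
print(fk_rejected)
```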

MySQL and SQL (Structured Query Language)

• SQL is the language used to create, query and modify the data in a database. Work is done through
SQL statements, and each statement contains a command. MySQL is a free and open-source
relational database management system that uses SQL.
• MySQL is fast, reliable and shareable.

DDL | DML | TCL
Data definition language commands | Data manipulation language commands | Transaction control language commands
Allow you to perform tasks related to data definition | Allow you to perform data manipulation | Allow you to manage and control transactions
E.g., creating, altering and dropping | E.g., modification of data | E.g., undoing changes
Create, alter, drop, rename | Insert, update, delete | Commit, rollback

MySQL data types:

• A data type identifies the kind of data a column holds and the operations that can be performed on it.

Data type | Specification
CHAR | Fixed-length string (0-255)
VARCHAR | Variable-length string of letters, digits and other characters (up to 65,535, subject to the maximum row size)
TEXT | Long string (0-65,535)
INT | Integer (-2147483648 to 2147483647)
FLOAT | Single-precision floating-point number (precision 0-23)
DOUBLE | Double-precision floating-point number (precision 24-53)
DATE | YYYY-MM-DD
TIME | HH:MM:SS

Creating table

Create table <table-name>

(<column-name> <data-type>(<size>), <column-name> <data-type>(<size>));

Table structure

Describe <table-name>;

Inserting values

There are 2 ways:

To give a value for every column, in the order in which the columns were created:

Insert into <table-name> values (<value>, <value>), (<value>, <value>);

To give values for chosen columns only, list the column names first and then the values:

Insert into <table-name> (<column-name>, <column-name>) values (<value>, <value>);

Display table

Select <columns to display> from <table-name> where <condition to satisfy>;

* means you want to select all the columns of the table.
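The statements so far can be tried out with the sketch below. It assumes Python's built-in sqlite3 module in place of a MySQL server, and the student table is hypothetical. SQLite has no Describe command, so that step is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE TABLE with column names, data types and sizes.
conn.execute("CREATE TABLE student (roll_no INT, name VARCHAR(30), marks INT)")

# Way 1: supply a value for every column, in table order.
conn.execute("INSERT INTO student VALUES (1, 'Kavi', 82), (2, 'Shyam', 67)")

# Way 2: name the columns first, then give matching values.
# The column that is left out (marks) is stored as NULL.
conn.execute("INSERT INTO student (roll_no, name) VALUES (3, 'Ravi')")

# SELECT * returns every column; WHERE keeps only rows meeting the condition.
all_rows = conn.execute("SELECT * FROM student").fetchall()
high_scorers = conn.execute("SELECT name FROM student WHERE marks > 70").fetchall()
print(all_rows)
print(high_scorers)
```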

Add column

Alter table <table name> add (<column name> <data type>(<data size>));

Modifying rows

Update <table name> set <column name> = <value> where <condition>;

Without the where clause, every row of the table is updated.

Modifying column

Alter table <table name> modify <column name> <data type> (new size);

Delete rows

Delete from <table name> where <predicate>;

Delete column

Alter table <table name> drop column <column name>;

Delete table

Drop table <table name>;
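The sketch below runs the add, update, delete and drop statements in sequence. It again assumes Python's sqlite3 module rather than MySQL, and the student table is hypothetical. Modifying a column's data type is left out because SQLite does not support it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INT, name VARCHAR(30), marks INT)")
conn.execute("INSERT INTO student VALUES (1, 'Kavi', 82), (2, 'Shyam', 67), (3, 'Ravi', 45)")

# Add a column: existing rows get NULL in the new column.
conn.execute("ALTER TABLE student ADD COLUMN city VARCHAR(20)")

# Modify rows: the WHERE clause limits the update to matching rows.
conn.execute("UPDATE student SET marks = 70 WHERE roll_no = 2")

# Delete rows that satisfy the predicate.
conn.execute("DELETE FROM student WHERE marks < 50")

remaining = conn.execute("SELECT roll_no, name, marks, city FROM student").fetchall()
print(remaining)

# Delete the table itself, structure and data together.
conn.execute("DROP TABLE student")
tables_left = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(tables_left)
```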

Functions

A function is used to perform a particular task and returns zero or more values as a result. Functions
are of two types: single-row functions and aggregate functions.

Single row functions


Also known as scalar functions. They are applied to a single value and return a single value.

Aggregate Function

Aggregate functions are also called multiple row functions. These functions work on a set of records as a
whole.
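The difference between the two types can be seen in the sketch below, which assumes Python's sqlite3 module and a hypothetical student table. The function names used here (UPPER, LENGTH, COUNT, MAX, MIN, SUM, AVG) are also available in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name VARCHAR(30), marks INT)")
conn.execute("INSERT INTO student VALUES ('Kavi', 82), ('Shyam', 67), ('Ravi', 91)")

# Single-row functions: one result for each input row.
per_row = conn.execute("SELECT UPPER(name), LENGTH(name) FROM student").fetchall()

# Aggregate functions: one result for the whole set of rows.
summary = conn.execute(
    "SELECT COUNT(*), MAX(marks), MIN(marks), SUM(marks), AVG(marks) FROM student"
).fetchone()
print(per_row)
print(summary)
```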

Group by

• The GROUP BY clause groups together the rows that contain the same value in a specified column,
so that an aggregate function can be applied to each group.
• The HAVING clause specifies conditions on the groups formed by the GROUP BY clause.
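A minimal sketch of both clauses follows, assuming Python's sqlite3 module and a hypothetical sale table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (city VARCHAR(20), amount INT)")
conn.execute(
    "INSERT INTO sale VALUES ('Delhi', 100), ('Pune', 40), ('Delhi', 250), ('Pune', 30)"
)

# GROUP BY forms one group per distinct city; SUM runs once per group.
totals = conn.execute(
    "SELECT city, SUM(amount) FROM sale GROUP BY city ORDER BY city"
).fetchall()

# HAVING filters the groups (WHERE would filter rows before grouping).
big_cities = conn.execute(
    "SELECT city, SUM(amount) FROM sale GROUP BY city HAVING SUM(amount) > 100"
).fetchall()
print(totals)
print(big_cities)
```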

Select * from <name> union Select * from <name>;

Select * from <name> intersect Select * from <name>;

Select * from <name> minus Select * from <name>;

Select * from <name>, <name>;

The last statement gives the Cartesian product, that is, every combination of a row from the first
table with a row from the second. Note that minus is Oracle syntax; MySQL (8.0.31 and later) uses
except instead.
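These operations can be compared side by side in the sketch below. It assumes Python's sqlite3 module and two hypothetical tables, dance and music. SQLite, like MySQL, writes the difference operation as EXCEPT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dance (name VARCHAR(20))")
conn.execute("CREATE TABLE music (name VARCHAR(20))")
conn.execute("INSERT INTO dance VALUES ('Kavi'), ('Shyam')")
conn.execute("INSERT INTO music VALUES ('Shyam'), ('Ravi')")

# UNION: rows found in either table, duplicates removed (sorted here for display).
union = sorted(conn.execute("SELECT * FROM dance UNION SELECT * FROM music").fetchall())

# INTERSECT: rows found in both tables.
common = conn.execute("SELECT * FROM dance INTERSECT SELECT * FROM music").fetchall()

# EXCEPT: rows in the first table that are not in the second.
only_dance = conn.execute("SELECT * FROM dance EXCEPT SELECT * FROM music").fetchall()

# Cartesian product: every dance row paired with every music row.
pairs = conn.execute("SELECT * FROM dance, music").fetchall()

print(union, common, only_dance, len(pairs))
```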

Ch 2 Data handling

Introduction to Python libraries

Python libraries contain collections of built-in modules that allow us to perform many actions without
writing detailed programs for them. Each library contains a number of modules that one can
import and use. Here are some popular libraries:

• NumPy, which stands for 'Numerical Python', is a package used for numerical data analysis and
scientific computing.
• Pandas (PANel DAta) is a high-level data manipulation tool used for analysing data. It is very
easy to import and export data using the Pandas library, which has a very rich set of functions.
• Matplotlib is a library used for plotting graphs and visualisation.

Differences between Pandas and Numpy:

1. A Numpy array requires homogeneous data, while a Pandas DataFrame can have different data
types (float, int, string, datetime, etc.).
2. Pandas has a simpler interface for operations like file loading, plotting, selection, joining and GROUP
BY, which come in very handy in data-processing applications.
3. Pandas DataFrames (with column names) make it very easy to keep track of data.
4. Pandas is used when data is in Tabular Format, whereas Numpy is used for numeric array based data
manipulation.
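The first difference can be checked with a short sketch. The values and column names below are made up for illustration.

```python
import numpy as np
import pandas as pd

# A NumPy array holds one data type; mixed numbers are converted to a common type.
mixed_array = np.array([1, 2.5, 3])
print(mixed_array.dtype)   # one float type shared by every element

# A DataFrame keeps a separate data type for each named column.
frame = pd.DataFrame({"name": ["Kavi", "Ravi"], "age": [16, 17], "score": [81.5, 77.0]})
print(frame.dtypes)
```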

Data Structure in Pandas

A data structure is a collection of data values and operations that can be applied to that data. It enables
efficient storage, retrieval and modification to the data.

Series
A Series is a one-dimensional array containing a sequence of values of any data type (int, float, list,
string, etc) which by default have numeric data labels starting from zero. The data label associated with a
particular value is called its index.

Creation of series
(A) Creation of series from scalar values

import pandas as pd

series1 = pd.Series([10,20,30])

print(series1)

Output:

0 10
1 20
2 30

series2 = pd.Series(["Kavi","Shyam","Ravi"], index=[3,5,1])

print(series2)

Output:

3 Kavi

5 Shyam

1 Ravi

(B) Creation of series from numpy arrays

import numpy as np

import pandas as pd

array1 = np.array([1,2,3,4])

series3 = pd.Series(array1)

print(series3)

Output:

0 1

1 2

2 3

3 4

(C) Creation of series from dictionary

dict1 = {'India': 'NewDelhi', 'UK': 'London', 'Japan': 'Tokyo'}

series8 = pd.Series(dict1)

print(series8)

Output:

India NewDelhi

UK London

Japan Tokyo

Accessing Elements of a series


(A) Indexing

seriesNum = pd.Series([10,20,30])

seriesNum[2]

Output:

30

(B) Slicing

seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'], index=['India', 'USA', 'UK', 'France'])

seriesCapCntry[1:3] #excludes the value at index position 3

Output:

USA WashingtonDC

UK London
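Slicing by position and slicing by label treat the end point differently. The sketch below reuses the same capitals data with the explicit iloc and loc accessors; the variable names are only for illustration.

```python
import pandas as pd

capitals = pd.Series(
    ["NewDelhi", "WashingtonDC", "London", "Paris"],
    index=["India", "USA", "UK", "France"],
)

# Positional slice with iloc: the end position is excluded.
by_position = capitals.iloc[1:3]

# Label slice with loc: the end label is included.
by_label = capitals.loc["USA":"France"]

print(by_position)
print(by_label)
```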

Attributes of Series

Methods of Series
seriesTenTwenty=pd.Series(np.arange( 10, 20, 1 ))

print(seriesTenTwenty)

Output:

0 10

1 11

2 12

3 13

4 14

5 15

6 16

7 17

8 18

9 19
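As an illustration of the two headings above, the sketch below reads a few attributes and calls a few methods on the same ten values. The particular attributes and methods chosen here, and the name "tens", are assumptions made for the example.

```python
import numpy as np
import pandas as pd

series_ten_twenty = pd.Series(np.arange(10, 20, 1), name="tens")

# Attributes describe the Series and are read without parentheses.
print(series_ten_twenty.size)    # number of values
print(series_ten_twenty.empty)   # True only if there are no values
print(series_ten_twenty.name)    # name given to the Series

# Methods are called with parentheses and compute a result.
first_two = series_ten_twenty.head(2)     # first n values (default n is 5)
last_two = series_ten_twenty.tail(2)      # last n values (default n is 5)
non_missing = series_ten_twenty.count()   # number of non-NaN values
print(first_two, last_two, non_missing)
```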

Mathematical Operations on Series


(A) Addition of two Series

seriesA + seriesB

Output:

a -9.0

b NaN

c -47.0

d NaN

e 105.0

y NaN

z NaN

The second method is applied when we do not want to have NaN values in the output. We can use the
series method add() and a parameter fill_value to replace missing value with a specified value.

seriesA.add(seriesB, fill_value=0)

Output:

a -9.0

b 2.0

c -47.0

d 4.0

e 105.0

y 20.0

z 10.0
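The definitions of seriesA and seriesB are not shown in this section. The sketch below assumes values that are consistent with every output listed here, and shows how the two Series are matched by index label.

```python
import pandas as pd

# Assumed definitions, inferred from the outputs shown in this section.
seriesA = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])
seriesB = pd.Series([10, 20, -10, -50, 100], index=["z", "y", "a", "c", "e"])

# Values are matched by index label; a label present in only one Series gives NaN.
total = seriesA + seriesB

# fill_value stands in for the missing side before the addition happens.
total_filled = seriesA.add(seriesB, fill_value=0)

print(total)
print(total_filled)
```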

(B) Subtracting of two series

seriesA - seriesB

Output:

a 11.0

b NaN

c 53.0

d NaN

e -95.0

y NaN

z NaN

Let us now replace the missing values with 1000 before subtracting seriesB from seriesA using explicit
subtraction method sub().

seriesA.sub(seriesB, fill_value=1000)

a 11.0

b -998.0

c 53.0

d -996.0

e -95.0

y 980.0

z 990.0

In pandas, a missing value (None) is shown as NaN (Not a Number).

(C) Multiplication of two Series

seriesA * seriesB

Output:

a -10.0

b NaN

c -150.0

d NaN

e 500.0

y NaN

z NaN

Let us now replace the missing values with 0 before multiplication of seriesB with seriesA using explicit
multiplication method mul().

seriesA.mul(seriesB, fill_value=0)

a -10.0

b 0.0

c -150.0

d 0.0

e 500.0

y 0.0

z 0.0

(D) Division of two Series

seriesA/seriesB

Output:

a -0.10

b NaN

c -0.06
d NaN

e 0.05

y NaN

z NaN

Dataframe
A DataFrame is a two-dimensional labelled data structure, similar to a table in MySQL. It contains rows and
columns, and therefore has both a row index and a column index. Each column can have a different type of
value such as numeric, string, boolean, etc., as in the tables of a database.

Creation of a Dataframe
(A) Creation of Data Frame from NumPy ndarrays

import numpy as np

import pandas as pd

array1 = np.array([10,20,30])

array2 = np.array([100,200,300])

array3 = np.array([-10,-20,-30, -40])

dFrame5 = pd.DataFrame([array1, array3, array2], columns=[ 'A', 'B', 'C', 'D'])

dFrame5

Output:

A B C D

0 10 20 30 NaN

1 -10 -20 -30 -40.0

2 100 200 300 NaN

(B) Creation of Data Frame from List of Dictionaries

Create list of dictionaries

listDict = [{'a':10, 'b':20}, {'a':5, 'b':10, 'c':20}]

dFrameListDict = pd.DataFrame(listDict)

Output:

a b c

0 10 20 NaN

1 5 10 20.0

(C) Creation of Data Frame from Dictionary of lists

dictForest = {'State': ['Assam', 'Delhi', 'Kerala'], 'GArea': [78438, 1483, 38852], 'VDF' : [2797, 6.72,1663]}

dFrameForest= pd.DataFrame(dictForest)

dFrameForest

Output:

State GArea VDF

0 Assam 78438 2797.00

1 Delhi 1483 6.72

2 Kerala 38852 1663.00

We can change the sequence of columns in a DataFrame

dFrameForest1 = pd.DataFrame(dictForest, columns = ['State','VDF', 'GArea'])

dFrameForest1

Output:

State VDF GArea

0 Assam 2797.00 78438

1 Delhi 6.72 1483

2 Kerala 1663.00 38852

(D) Creation of Data Frame from series

seriesA = pd.Series([1,2,3,4,5], index = ['a', 'b', 'c', 'd', 'e'])

seriesB = pd.Series ([1000,2000,-1000,-5000,1000], index = ['a', 'b', 'c', 'd', 'e'])

seriesC = pd.Series([10,20,-10,-50,100], index = ['z', 'y', 'a', 'c', 'e'])

dFrame6 = pd.DataFrame(seriesA)

dFrame6

Output:

  0

a 1

b 2

c 3

d 4

e 5

dFrame7 = pd.DataFrame([seriesA, seriesB])

dFrame7

Output:

a b c d e

0 1 2 3 4 5

1 1000 2000 -1000 -5000 1000

(E) Creation of Data Frame from dictionary of series

ResultSheet = {'Arnab': pd.Series([90, 91, 97], index=['Maths','Science','Hindi']),

'Ramit': pd.Series([92, 81, 96], index=['Maths','Science','Hindi']),

'Samridhi': pd.Series([89, 91, 88], index=['Maths','Science','Hindi']),

'Riya': pd.Series([81, 71, 67], index=['Maths','Science','Hindi']),

'Mallika': pd.Series([94, 95, 99], index=['Maths','Science','Hindi'])}

ResultDF = pd.DataFrame(ResultSheet)

ResultDF

Output:

Arnab Ramit Samridhi Riya Mallika

Maths 90 92 89 81 94

Science 91 81 91 71 95

Hindi 97 96 88 67 99

Operations on rows and columns in Data Frames


(A) Adding a new column to a data frame

ResultDF['Preeti']=[89,78,76]

ResultDF

Output:

Arnab Ramit Samridhi Riya Mallika Preeti

Maths 90 92 89 81 94 89

Science 91 81 91 71 95 78

Hindi 97 96 88 67 99 76

(B) Adding a new row to a data frame

ResultDF.loc['English'] = [85, 86, 83, 80, 90, 89]

Output:

Arnab Ramit Samridhi Riya Mallika Preeti

Maths 90 92 89 81 94 89

Science 91 81 91 71 95 78

Hindi 97 96 88 67 99 76

English 85 86 83 80 90 89

(C) Deleting Rows or columns from a data frame

To delete a row, the parameter axis is assigned the value 0 and for deleting a column, the parameter axis
is assigned the value 1

ResultDF = ResultDF.drop('Science', axis=0)
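The same method deletes columns when axis=1 is given, and it accepts a list of labels to delete several at once. A minimal sketch follows, using a smaller hypothetical copy of the result data.

```python
import pandas as pd

result_df = pd.DataFrame(
    {"Arnab": [90, 91, 97], "Ramit": [92, 81, 96], "Riya": [81, 71, 67]},
    index=["Maths", "Science", "Hindi"],
)

# axis=0 removes rows by label; a list removes several at once.
without_rows = result_df.drop(["Science", "Hindi"], axis=0)

# axis=1 removes columns by label.
without_column = result_df.drop("Riya", axis=1)

# drop() returns a new DataFrame; result_df itself is unchanged.
print(without_rows)
print(without_column)
```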

(D) Renaming Row labels of a data frame

ResultDF = ResultDF.rename({'Maths':'Sub1', 'Science':'Sub2', 'English':'Sub3', 'Hindi':'Sub4'}, axis='index')

Output:

Arnab Ramit Samridhi Riya Mallika



Sub1 90 92 89 81 94

Sub2 91 81 91 71 95

Sub3 97 96 88 67 99

Sub4 97 89 78 60 45

The parameter axis='index' is used to specify that the row label is to be changed.

Renaming column labels of a data frame

ResultDF = ResultDF.rename({'Arnab':'Student1', 'Ramit':'Student2', 'Samridhi':'Student3', 'Mallika':'Student4'}, axis='columns')

Output:

Student1 Student2 Student3 Riya Student4

Maths 90 92 89 81 94

Science 91 81 91 71 95

English 97 96 88 67 99

Hindi 97 89 78 60 45

Accessing Data Frames element through indexing

(A) Label Based indexing

ResultDF

Arnab Ramit Samridhi Riya Mallika

Maths 90 92 89 81 94

Science 91 81 91 71 95

Hindi 97 96 88 67 99

ResultDF.loc['Science']

Output:

Arnab 91

Ramit 81

Samridhi 91

Riya 71

Mallika 95

dFrame10Multiples = pd.DataFrame([10,20,30,40,50])

dFrame10Multiples.loc[2]

Output:

0 30

When a single column label is passed, it returns the column as a Series.

ResultDF.loc[:,'Arnab']

Output:

Maths 90

Science 91

Hindi 97

(B) Boolean Indexing

ResultDF.loc['Maths'] > 90

Output:

Arnab False

Ramit True

Samridhi False

Riya False

Mallika True

ResultDF.loc[:, 'Arnab'] > 90

Output:

Maths False

Science True

Hindi True
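A Boolean result like the ones above is usually passed back into loc to keep only the matching rows. The sketch below assumes a two-column copy of the result data.

```python
import pandas as pd

result_df = pd.DataFrame(
    {"Arnab": [90, 91, 97], "Ramit": [92, 81, 96]},
    index=["Maths", "Science", "Hindi"],
)

# The comparison yields a Boolean Series, one True/False per row.
mask = result_df["Arnab"] > 90

# Passing the mask to loc keeps only the rows marked True.
above_ninety = result_df.loc[mask]
print(above_ninety)
```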

Accessing data frames element through slicing


ResultDF.loc['Maths': 'Science']

Output:

Arnab Ramit Samridhi Riya Mallika

Maths 90 92 89 81 94

Science 91 81 91 71 95

ResultDF.loc['Maths': 'Science', 'Arnab']

Output:

Maths 90

Science 91

ResultDF.loc['Maths': 'Science', 'Arnab':'Samridhi']

Output:

Arnab Ramit Samridhi

Maths 90 92 89

Science 91 81 91

ResultDF.loc['Maths': 'Science', ['Arnab', 'Samridhi']]

Output:

Arnab Samridhi

Maths 90 89

Science 91 91

Joining, Merging and Concatenation of Data Frames


(A) Joining
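The three operations named in the heading can be compared in one sketch. The two small frames and their column names are hypothetical, and the calls used are pd.concat, DataFrame.merge and DataFrame.join.

```python
import pandas as pd

left = pd.DataFrame({"roll_no": [1, 2, 3], "name": ["Kavi", "Shyam", "Ravi"]})
right = pd.DataFrame({"roll_no": [2, 3, 4], "marks": [67, 91, 58]})

# Concatenation stacks the frames; a column missing from one frame becomes NaN.
stacked = pd.concat([left, right], ignore_index=True)

# Merging matches rows on a shared column, like an SQL join.
merged = left.merge(right, on="roll_no", how="inner")

# Joining matches rows on the index instead of a column.
joined = left.set_index("roll_no").join(right.set_index("roll_no"), how="left")

print(stacked)
print(merged)
print(joined)
```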

Attributes of Data frames
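As an illustration, the sketch below prints some commonly used attributes of a small frame. The choice of attributes is an assumption made for the example, and the data is a cut-down version of the forest data used earlier.

```python
import pandas as pd

forest = pd.DataFrame(
    {"State": ["Assam", "Delhi", "Kerala"], "GArea": [78438, 1483, 38852]}
)

print(forest.index)     # row labels
print(forest.columns)   # column labels
print(forest.dtypes)    # data type of each column
print(forest.shape)     # (number of rows, number of columns)
print(forest.size)      # total number of values
print(forest.T)         # transpose: rows become columns
```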



Importing and Exporting Data between CSV Files and Data Frames
Importing a CSV file to a Data frame

Exporting a data frame to a csv file
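Both directions can be tried with the sketch below, which writes a small hypothetical frame to a temporary CSV file using to_csv() and loads it back using read_csv(). The file name marks.csv is made up for the example.

```python
import os
import tempfile

import pandas as pd

marks = pd.DataFrame({"name": ["Kavi", "Ravi"], "marks": [82, 91]})

# Write to a temporary folder so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "marks.csv")
    marks.to_csv(path, index=False)   # index=False leaves out the row labels
    loaded = pd.read_csv(path)        # the first line is read as the header

print(loaded)
```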


Pandas Series vs NumPy ndarray
