HKU - 7001 - 3.2 Managing Data II

Managing Data II

MSBA7001 Business Intelligence and Analytics

HKU Business School
The University of Hong Kong

Instructor: Dr. DING Chao

Agenda
• SciPy
• NumPy
• Pandas
SciPy
What is SciPy?
• SciPy (pronounced /saɪpaɪ/) is a Python-based ecosystem of
open-source software for mathematics, science, and
engineering.
• The SciPy ecosystem includes general and specialized tools
for data management and computation, productive
experimentation and high-performance computing.
• It offers over 1000 modules/packages for Python
The SciPy Ecosystem

(Figure: components of the SciPy ecosystem. Surviving captions: "It defines numerical array and matrix types"; "It makes possible Jupyter Notebook"; "It provides high-performance, easy-to-use data structures".)
NumPy
What is the problem with lists?
• Lists are fine for storing small amounts of one-dimensional
data
• But arithmetic operators do not work elementwise on lists:
+ concatenates, * repeats the list, and - and / raise a TypeError
• We need efficient arrays with elementwise arithmetic and better
multidimensional tools
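The contrast between lists and arrays can be sketched as follows. The variable names are illustrative and not part of the original slides.

```python
import numpy as np

prices = [1.0, 2.0, 3.0]

# On a list, * repeats and + concatenates; neither does arithmetic.
doubled_list = prices * 2          # [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
joined_list = prices + prices      # [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]

# Subtraction is not defined for lists at all.
try:
    prices - prices
    list_subtraction_works = True
except TypeError:
    list_subtraction_works = False

# On an array the same operators work element by element.
arr = np.array(prices)
doubled_arr = arr * 2              # array([2., 4., 6.])
summed_arr = arr + arr             # array([2., 4., 6.])

print(doubled_list, list_subtraction_works, doubled_arr)
```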
What is NumPy?
NumPy (pronounced /nʌmpaɪ/), short for Numerical Python,
is the fundamental package required for high performance
scientific computing and data analysis.
• It provides:
  - Arrays, a fast and space-efficient multidimensional array providing
    vectorized arithmetic operations and sophisticated broadcasting
    capabilities
  - Standard mathematical functions for fast operations on entire
    arrays of data without having to write loops
  - Tools for reading / writing array data to disk and working with
    memory-mapped files
  - Linear algebra, random number generation, and Fourier transform
    capabilities
The NumPy Arrays
• A NumPy array (also called ndarray) is a table of elements
(usually numbers), all of the same type, indexed by a tuple
of non-negative integers. Typical examples of multidimensional
arrays include vectors, matrices, images and spreadsheets.
• Dimensions are usually called axes; the number of axes is the
rank (available as the ndim attribute)
The NumPy Arrays

[7, 2, 9, 10]         An array of rank 1, i.e., it has 1 axis of length 4

[[5.2, 3.0, 4.5],     An array of rank 2, i.e., it has 2 axes: the first
 [9.1, 0.1, 0.3]]     of length 2, the second of length 3 (a matrix
                      with 2 rows and 3 columns)
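A short check of the two arrays above, using the ndim and shape attributes. The names vector and matrix are illustrative.

```python
import numpy as np

vector = np.array([7, 2, 9, 10])
matrix = np.array([[5.2, 3.0, 4.5],
                   [9.1, 0.1, 0.3]])

# ndim is the number of axes (the rank); shape gives the length of each axis.
print(vector.ndim, vector.shape)   # 1 (4,)
print(matrix.ndim, matrix.shape)   # 2 (2, 3)
```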
The NumPy Arrays
• NumPy array is a fast, flexible container for large data sets in
Python
• Before using NumPy, we need to import the numpy module

import numpy as np
Creating Arrays
• The easiest way to create an array is to use the array()
function
• It accepts any sequence-like object (e.g., a list or a tuple,
including nested sequences) and produces a new NumPy array
containing the data passed to it.

data1 = [6, 7.5, 8, 0, 1]
arr1 = np.array(data1, float)    # specify the data type; it could also be int
arr1
array([6. , 7.5, 8. , 0. , 1. ])
Creating Arrays
• From nested sequences, like a list of lists
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)
arr2
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

print(arr2)
[[1 2 3 4]
 [5 6 7 8]]

arr2.ndim     2         # dimension of the array
arr2.shape    (2, 4)    # structure of the array
Creating Arrays
• array() tries to infer a good data type for the array that it
creates.
• The data type is stored in a special dtype object

arr1.dtype    dtype('float64')

arr2.dtype    dtype('int32')    # the default integer type is platform dependent, so this may be dtype('int64')
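How array() picks a data type can be sketched as below. The array names are illustrative, and the exact integer width is left unchecked because it depends on the platform.

```python
import numpy as np

ints = np.array([1, 2, 3])                   # all integers -> an integer dtype
mixed = np.array([1, 2.5, 3])                # one float promotes everything to float64
forced = np.array([1, 2, 3], dtype=float)    # an explicit dtype overrides the inference

print(ints.dtype.kind)    # 'i' (the width, int32 or int64, depends on the platform)
print(mixed.dtype)        # float64
print(forced.dtype)       # float64
```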
Creating Arrays
• The size attribute gives the total number of items in the
array
• We can read the attribute from an array object, or call the
NumPy function np.size() and pass the array as an argument

arr2.size

np.size(arr2)

8
Creating ndarrays
• We can convert an array from one shape to another, usually without
copying any data (reshape returns a view of the original data where
possible). To do this, pass a tuple indicating the new
shape to the reshape method.

arr3 = np.array([0, 1, 2, 3, 4, 5, 6, 7])

arr3.reshape((4, 2))

array([[0, 1],
[2, 3],
[4, 5],
[6, 7]])
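A small sketch of what "without copying" means in practice. It assumes a contiguous array like arr3, for which reshape returns a view; the names reshaped and inferred are illustrative.

```python
import numpy as np

arr3 = np.array([0, 1, 2, 3, 4, 5, 6, 7])
reshaped = arr3.reshape((4, 2))

# arr3 itself keeps its shape; reshape returns a new array object.
print(arr3.shape)        # (8,)
print(reshaped.shape)    # (4, 2)

# For a contiguous array like this one, the result is a view:
# both names refer to the same underlying data.
reshaped[0, 0] = 99
print(arr3[0])           # 99

# -1 asks NumPy to infer that dimension from the total size.
inferred = arr3.reshape((2, -1))
print(inferred.shape)    # (2, 4)
```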
Creating Special Arrays
• In addition to array, there are a number of other
functions for creating new arrays.
• For example, zeros and ones create arrays of 0's or 1's,
respectively, with a given length or shape. empty creates an
array without initializing its values, so its contents are arbitrary.
• To create a higher dimensional array with these functions,
pass a tuple for the shape
Creating Special Arrays

np.zeros(10)
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

np.zeros((3,6))
array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

np.empty((2,3,2))
array([[[1.05442863e-311, 2.86558075e-322],
        [0.00000000e+000, 0.00000000e+000],
        [7.56599806e-307, 2.92966904e-033]],

       [[7.17473078e-091, 4.42510289e-062],
        [4.31926418e-038, 4.19564746e+175],
        [6.48224659e+170, 5.82471487e+257]]])
Creating an Array of Fixed Intervals
• arange is an array-valued version of the built-in Python
range function
np.arange(8)

array([0, 1, 2, 3, 4, 5, 6, 7])

• arange has three arguments

arange(start, stop, step)

np.arange(0,8,2)
array([0, 2, 4, 6])

• arange(8) is equivalent to arange(0,8,1)

Creating an Array of Fixed Intervals
• linspace returns evenly spaced numbers over a specified
interval. We can specify the number of values to be
generated
• linspace has a number of arguments

np.linspace(0, 10, num = 5, endpoint = True, dtype = float)
array([ 0. , 2.5, 5. , 7.5, 10. ])

  0                  starting value
  10                 end value
  num = 5            number of values to generate
  endpoint = True    if True, the end value is the last value (default is True)
  dtype = float      the type of the output array

np.linspace(0, 10, num = 5, endpoint = False, dtype = int)
array([0, 2, 4, 6, 8])
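The difference between the two functions can be sketched side by side: arange fixes the step, linspace fixes the number of points. The retstep argument is not covered in the slides and is shown only as an optional extra.

```python
import numpy as np

# arange: fix the step; the stop value itself is excluded.
by_step = np.arange(0, 10, 2.5)
print(by_step)      # values 0, 2.5, 5, 7.5

# linspace: fix the number of points; the end value is included by default.
by_count = np.linspace(0, 10, num=5)
print(by_count)     # values 0, 2.5, 5, 7.5, 10

# retstep=True also returns the spacing that linspace chose.
points, spacing = np.linspace(0, 10, num=5, retstep=True)
print(spacing)      # 2.5
```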
Creating an Array of Fixed Intervals
• arange and linspace are going to be quite useful when
we are making plots.
• They can be used to generate data for the X axis

Creating a Random Array
• We can also use the rand function in the random module to
generate random data, drawn uniformly from the interval [0, 1)
• Pass the dimensions of the array as arguments
np.random.rand(5)

array([0.47487993, 0.55756924, 0.3188104 , 0.27839417, 0.11052682])

np.random.rand(4, 2)

array([[0.02473339, 0.93360131],
[0.21580826, 0.29531976],
[0.44972526, 0.53165493],
[0.50354605, 0.35704748]])
Creating a Random Array
• We can also use the randn function in the random module to
draw data from a standard normal distribution
• Mean 0, variance 1

np.random.randn(5, 3)    # the arguments give the shape

array([[-1.08369819, -0.12103409, -0.98555855],
       [-0.89341613, -0.46729387, -0.36880701],
       [ 2.03070419, -0.23967288, -1.04078775],
       [-0.8740701 , -0.42289868,  0.00337789],
       [-0.98268423, -0.65690555,  0.60583936]])
Creating a Random Array
• To draw data from any normal distribution N(μ, σ²), use the
normal function

np.random.normal(mu, sigma, 10)    # arguments: mean, standard deviation, size

array([83.83931654, 80.63749432, 77.88960687, 77.54264136, 79.23475542,
       82.09625153, 75.7728796 , 82.87793414, 77.42665788, 78.72251953])

• For a full list of available distributions, see

https://numpy.org/doc/stable/reference/random/legacy.html
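The slides do not show the values of mu and sigma, so the sketch below assumes mu = 80 and sigma = 3 purely for illustration. It also uses np.random.seed, which the slides do not cover, to make the draws repeatable.

```python
import numpy as np

# Hypothetical parameters chosen only for illustration.
mu, sigma = 80, 3

# Seeding the legacy global generator makes the draws repeatable.
np.random.seed(0)
first = np.random.normal(mu, sigma, 10)

np.random.seed(0)
second = np.random.normal(mu, sigma, 10)

print(np.array_equal(first, second))   # True: same seed, same draws
print(first.shape)                     # (10,)

# With many draws, the sample mean and standard deviation approach mu and sigma.
big_sample = np.random.normal(mu, sigma, 100_000)
print(round(big_sample.mean(), 1), round(big_sample.std(), 1))
```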
NumPy Constants
• np.e is Euler's number e, the base of natural logarithms
np.e     2.718281828459045

• np.nan is Not a Number (NaN). It is used when there is
missing data or when the result of a mathematical calculation
is not valid, such as log(-1)
np.nan   nan

• np.pi is pi
np.pi    3.141592653589793
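NaN behaves differently from ordinary numbers, which the following sketch illustrates. The functions np.isnan and np.nanmean are not covered in the slides and are shown here as standard NumPy helpers.

```python
import numpy as np

values = np.array([1.0, np.nan, 3.0])

# NaN is not equal to anything, including itself, so use np.isnan to find it.
print(np.nan == np.nan)       # False
print(np.isnan(values))       # [False  True False]

# An ordinary mean is NaN as soon as one element is NaN ...
plain_mean = np.mean(values)
print(plain_mean)             # nan
# ... while np.nanmean ignores the missing entries.
nan_mean = np.nanmean(values)
print(nan_mean)               # 2.0

# The log of a negative number is invalid and yields NaN
# (NumPy would normally also emit a RuntimeWarning).
with np.errstate(invalid="ignore"):
    invalid_log = np.log(-1.0)
print(invalid_log)            # nan
```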
Array operations
• Arrays are important because they enable you to express
batch operations on data without writing any for loops
• This is usually called vectorization
• Any arithmetic operation between equal-size arrays is applied
elementwise
arr = np.array([[1, 2, 3],[4, 5, 6]])
arr
array([[1, 2, 3],
       [4, 5, 6]])

arr * arr
array([[ 1,  4,  9],
       [16, 25, 36]])

1 / arr
array([[1.        , 0.5       , 0.33333333],
       [0.25      , 0.2       , 0.16666667]])
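To make the idea of vectorization concrete, the sketch below computes the same elementwise square twice, once with explicit loops and once with a single array expression. The names squared_loop and squared_vec are illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# Explicit loops over rows and columns ...
squared_loop = np.empty_like(arr)
for i in range(arr.shape[0]):
    for j in range(arr.shape[1]):
        squared_loop[i, j] = arr[i, j] * arr[i, j]

# ... versus one vectorized expression.
squared_vec = arr * arr

print(np.array_equal(squared_loop, squared_vec))   # True
```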
Array operations
• The fill method sets all values in an array

zero_arr = np.zeros(5, int)
zero_arr
array([0, 0, 0, 0, 0])

zero_arr.fill(4)
zero_arr
array([4, 4, 4, 4, 4])

zero_arr[0] = 5.8    # 5.8 is truncated since the datatype is int
zero_arr
array([5, 4, 4, 4, 4])
Universal Functions
• A universal function, or ufunc, is a function that performs
elementwise operations on data in ndarrays
• Examples are sqrt and exp. They take one array, so they are called unary ufuncs

arr = np.arange(10)
print(arr)
[0 1 2 3 4 5 6 7 8 9]

np.sqrt(arr)
array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])

np.exp(arr)
array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])
Universal Functions
• Others, such as add or maximum, take 2 arrays and return a single
array as the result. They are binary ufuncs

x = np.random.randn(8)
print(x)
[ 0.59054432  0.79793215 -0.44441787  0.74250776 -1.30106831  0.01595154
 -0.63769349 -0.2519021 ]

y = np.random.randn(8)
print(y)
[ 0.55356053 -1.54007186 -0.40315681 -2.00758763  1.08729518  0.40433778
  0.1940852  -0.57839798]

np.maximum(x, y)
array([ 0.59054432,  0.79793215, -0.40315681,  0.74250776,  1.08729518,
        0.40433778,  0.1940852 , -0.2519021 ])
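Because the random example above changes on every run, here is a deterministic sketch of the same binary ufuncs with small hand-picked arrays. The comparison with np.max is an addition for contrast.

```python
import numpy as np

x = np.array([1.0, -2.0, 3.0])
y = np.array([0.5, 4.0, -1.0])

added = np.add(x, y)          # same result as x + y
larger = np.maximum(x, y)     # elementwise larger value
print(added)                  # values 1.5, 2, 2
print(larger)                 # values 1, 4, 3

# Contrast with the reduction np.max, which collapses one array to a single number.
print(np.max(x))              # 3.0
```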
Universal Functions
• Some unary ufuncs
Universal Functions
• Some binary ufuncs

• For a full list of ufuncs, go to

https://numpy.org/doc/stable/reference/ufuncs.html
Mathematical and Statistical Methods
• Aggregations (often called reductions) like sum, mean, and
standard deviation std can either be used by calling the
array method or using the top level NumPy function

arr.mean() np.mean(arr)

arr.sum() np.sum(arr)

arr.std() np.std(arr)
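The two calling styles can be checked against each other as below. The axis argument is not mentioned on the slide; it is included as a common extension and should be read as an addition.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

# Method form and function form give the same result.
print(arr.mean(), np.mean(arr))    # 3.5 3.5
print(arr.sum(), np.sum(arr))      # 21 21

# axis=0 aggregates down the rows (one value per column),
# axis=1 aggregates across the columns (one value per row).
col_sums = arr.sum(axis=0)
row_sums = arr.sum(axis=1)
print(col_sums)                    # [5 7 9]
print(row_sums)                    # [ 6 15]

# std computes the population standard deviation by default (ddof=0).
print(arr.std() == np.std(arr))    # True
```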
Mathematical and Statistical Methods
• Basic array statistical methods
Reading from and Writing to Text Files
• The loadtxt function reads text file data into an array (a 2D array
for a file with several rows and columns)
• The savetxt function performs the inverse operation: writing
an array to a delimited text file.
values1 = np.random.random((10, 5))
np.savetxt(r'../data/nparray.txt', values1)

values2 = np.loadtxt(r'../data/nparray.txt', dtype = float)

print(values2)
[[0.54135298 0.92492694 0.9705508  0.39579461 0.79874527]
 [0.63508815 0.22996917 0.05120709 0.02846381 0.12284775]
 [0.22021252 0.82902275 0.28549183 0.78106408 0.50466581]
 ……
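A self-contained round trip is sketched below. It writes to a temporary directory instead of ../data so that it runs anywhere; the file name, the comma delimiter and the fmt string are illustrative choices, not taken from the slides.

```python
import os
import tempfile

import numpy as np

values = np.arange(6, dtype=float).reshape((2, 3))

# Write to a temporary directory so the sketch does not depend on ../data existing.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "nparray.csv")
    np.savetxt(path, values, delimiter=",", fmt="%.2f")
    restored = np.loadtxt(path, delimiter=",")

print(restored)
print(np.array_equal(values, restored))    # True
```

The same delimiter has to be passed to both functions; savetxt separates values with a space by default, and loadtxt splits on whitespace by default.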
Summary
• Creating NumPy arrays
• Array operations
• Universal functions
• Reading from and writing to text files
Exercise 1
• Given a random array of integers, write a program to
calculate the variance of the array. Do not use built-in
NumPy function np.var()
Exercise 2
• Given a random array, write a program to compute the
moving average at the interval of 3.

Given: [8 8 3 7 7 0 4 2 5 2 2 2]

Output: [6.3, 6. , 5.7, 4.7, 3.7, 2. , 3.7, 3. , 3. , 2. ]

• Hint: numpy.cumsum() returns an array of cumulative sum

arr = np.array([2, 3, 1, -1, 3, 5])

cumsum = np.cumsum(arr)
print(cumsum)

[ 2 5 6 5 8 13]
pandas
What is pandas?
• pandas contains high-level data structures and manipulation
tools designed to make data analysis fast and easy in
Python.
• pandas has two workhorse data structures: Series and
DataFrame.
• To use pandas, first import the module

import pandas as pd
Creating a Series
• A Series is a one-dimensional array-like object containing an
array of data (of any NumPy data type) and an associated
array of data labels, called its index.
• The simplest Series is formed from only an array of data
• We can use the Series constructor to create a Series object

obj1 = pd.Series([4, 7, -5, 3])

obj1

0 4
1 7
2 -5
3 3
dtype: int64
Creating a Series
• The string representation of a series displayed interactively
shows the index on the left and the values on the right.
• By default, the index consists of the integers 0 through N – 1
• Often it will be desirable to create a series with an index
identifying each data point

obj2 = pd.Series([4, 7, -5, 3], index = ['a', 'b', 'c', 'd'])

obj2

a 4
b 7
c -5
d 3
dtype: int64
Creating a Series
• We can also transform a Python dictionary to a Series

sdata = {'Ohio': 35000, 'Texas': 71000,

'Oregon': 16000, 'Utah': 5000}
obj3 = pd.Series(sdata)
obj3

Ohio 35000
Texas 71000
Oregon 16000
Utah 5000
dtype: int64
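When a dictionary is combined with an explicit index, pandas matches the index labels against the dictionary keys. The sketch below assumes a hypothetical list of states and a hypothetical name obj4 to show this.

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}

# Passing an index selects and orders the dictionary keys.
# 'California' is not a key in sdata, so its value is NaN (missing);
# 'Utah' is not in the index, so it is left out.
states = ['California', 'Ohio', 'Oregon', 'Texas']
obj4 = pd.Series(sdata, index=states)
print(obj4)

print(obj4.isna().tolist())    # [True, False, False, False]
```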
Selecting Index and Values in a Series
• We can get the array representation and the index object of a
Series via its values and index attributes.
• A Series' index can be altered in place by assignment

obj2.values
array([ 4,  7, -5,  3], dtype=int64)

obj2.index
Index(['a', 'b', 'c', 'd'], dtype='object')

obj2.index = ['Bob', 'Steve', 'Jeff', 'Ryan']
obj2
Bob      4
Steve    7
Jeff    -5
Ryan     3
dtype: int64
Selecting Index and Values in a Series
• We can use the index to find values and also change the
values.
• Very much like a dictionary
• (These examples use obj2 with its original index 'a' to 'd'.)

obj2['a']
4

obj2['d'] = 6
obj2[['c','a','d']]
c   -5
a    4
d    6
dtype: int64
Series Operations
• Array operations, such as filtering with a Boolean array,
scalar multiplication, or applying math functions, will
preserve the index-value link

obj2[obj2 > 0]
a    4
b    7
d    6
dtype: int64

obj2 * 2
a     8
b    14
c   -10
d    12
dtype: int64

np.exp(obj2)
a      54.598150
b    1096.633158
c       0.006738
d     403.428793
dtype: float64
Replacing Values
• The replace method provides a simple and flexible way to
modify a subset of values
data = pd.Series([1., -999., 2., -999., -1000., 3.])
data
0       1.0
1    -999.0
2       2.0
3    -999.0
4   -1000.0
5       3.0
dtype: float64

import numpy as np
data.replace(-999, np.nan)
0       1.0
1       NaN
2       2.0
3       NaN
4   -1000.0
5       3.0
dtype: float64
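replace also accepts a list or a dictionary, which is what makes it flexible. The sketch below reuses the Series from the slide; the names cleaned and mapped are illustrative.

```python
import numpy as np
import pandas as pd

data = pd.Series([1., -999., 2., -999., -1000., 3.])

# Replace several sentinel values with the same replacement.
cleaned = data.replace([-999, -1000], np.nan)
print(cleaned.isna().sum())           # 3

# Replace each sentinel with its own value by passing a dictionary.
mapped = data.replace({-999: np.nan, -1000: 0})
print(mapped.tolist())

# replace returns a new Series; the original is unchanged.
print(data[1])                        # -999.0
```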
DataFrame
• A DataFrame represents a tabular, spreadsheet-like data
structure containing an ordered collection of columns, each
of which can be a different value type (numeric, string,
Boolean, etc.).
• The DataFrame has both a row and column index; it can be
thought of as a dictionary of Series, all sharing the
same index.
Creating a DataFrame
• One of the most common ways to create a DataFrame is from
a dictionary of equal-length lists
• The resulting DataFrame will have its index assigned
automatically as with Series
data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
'year': [2000, 2001, 2002, 2001, 2002],
'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame1 = pd.DataFrame(data)
frame1

state year pop

0 Ohio 2000 1.5
1 Ohio 2001 1.7
2 Ohio 2002 3.6
3 Nevada 2001 2.4
4 Nevada 2002 2.9
Creating a DataFrame
• If you specify a sequence of columns, the DataFrame’s
columns will be exactly what you pass

pd.DataFrame(data, columns =
['year', 'state', 'pop'])

year state pop

0 2000 Ohio 1.5
1 2001 Ohio 1.7
2 2002 Ohio 3.6
3 2001 Nevada 2.4
4 2002 Nevada 2.9
Creating a DataFrame
• Call the describe method to see the basic statistics of the
numerical values in the dataframe

frame1.describe()
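The output of describe is missing from the slide. The sketch below rebuilds frame1 from the earlier slide and inspects the result; summary is an illustrative name.

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame1 = pd.DataFrame(data)

# By default describe() summarizes only the numeric columns,
# so the text column 'state' is left out.
summary = frame1.describe()
print(summary)

print(summary.columns.tolist())    # ['year', 'pop']
print(summary.index.tolist())      # count, mean, std, min, 25%, 50%, 75%, max
```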
Adding a Column to a DataFrame
• If you pass a column that isn’t contained in data, it will
appear with NaN values in the result

frame2 = pd.DataFrame(data,
columns = ['year', 'state', 'pop', 'debt'],
index = ['one', 'two', 'three', 'four', 'five'])
frame2

year state pop debt

one 2000 Ohio 1.5 NaN
two 2001 Ohio 1.7 NaN
three 2002 Ohio 3.6 NaN
four 2001 Nevada 2.4 NaN
five 2002 Nevada 2.9 NaN
Selecting Index and Values from a DataFrame

• We can retrieve a column either with dictionary-style indexing
on the column label or as an attribute

frame2.columns
Index(['year', 'state', 'pop', 'debt'], dtype='object')

frame2['state']    or    frame2.state

one        Ohio
two        Ohio
three      Ohio
four     Nevada
five     Nevada
Name: state, dtype: object
Selecting Index and Values from a DataFrame

• To retrieve values from a row:

• Use loc with the row label
• Or, use iloc with the integer position of the row

frame2.loc['three']    or    frame2.iloc[2]

year     2002
state    Ohio
pop       3.6
debt      NaN
Name: three, dtype: object
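The difference between label-based and position-based selection can be sketched as follows. The frame here is a simplified version of frame2 without the 'debt' column, and the slicing examples go slightly beyond what the slide shows.

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame = pd.DataFrame(data, columns=['year', 'state', 'pop'],
                     index=['one', 'two', 'three', 'four', 'five'])

# The label 'three' and the integer position 2 refer to the same row.
by_label = frame.loc['three']
by_position = frame.iloc[2]
print(by_label.equals(by_position))        # True

# loc also accepts a row label and a column label together.
print(frame.loc['three', 'pop'])           # 3.6

# Slices differ: loc includes the end label, iloc excludes the end position.
print(frame.loc['one':'three'].shape)      # (3, 3)
print(frame.iloc[0:2].shape)               # (2, 3)
```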
Modify Values in a DataFrame
• Values in columns can be modified by assignment
• For example, the empty 'debt' column could be assigned a
scalar value or an array of values
frame2.debt = 16.5
frame2
       year   state  pop  debt
one    2000    Ohio  1.5  16.5
two    2001    Ohio  1.7  16.5
three  2002    Ohio  3.6  16.5
four   2001  Nevada  2.4  16.5
five   2002  Nevada  2.9  16.5

frame2['debt'] = np.arange(5)
frame2
       year   state  pop  debt
one    2000    Ohio  1.5     0
two    2001    Ohio  1.7     1
three  2002    Ohio  3.6     2
four   2001  Nevada  2.4     3
five   2002  Nevada  2.9     4
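A column can also be assigned from a Series, in which case pandas aligns the values by index label rather than by position. This goes beyond the slide; the Series val and its values are hypothetical.

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame2 = pd.DataFrame(data,
                      columns=['year', 'state', 'pop', 'debt'],
                      index=['one', 'two', 'three', 'four', 'five'])

# Hypothetical values for three of the five rows.
val = pd.Series([-1.2, -1.5, -1.7], index=['two', 'four', 'five'])

# Assignment aligns on the index labels; rows without a match get NaN.
frame2['debt'] = val
print(frame2)

print(frame2['debt'].isna().tolist())    # [True, False, True, False, False]
```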
Reading from a CSV file
• pandas features a number of functions for reading tabular
data as a DataFrame object
• The most used one is read_csv
• Let’s use advertising.csv

,TV,radio,newspaper,sales       <- header
1,230.1,37.8,69.2,22.1          <- data
2,44.5,39.3,45.1,10.4
3,17.2,45.9,69.3,9.3
4,151.5,41.3,58.5,18.5
5,180.8,10.8,58.4,12.9
Reading from a CSV file

df = pd.read_csv(r'../advertising.csv')
df

Our data already has an index column, so reading it this way adds a second, default index

type(df) pandas.core.frame.DataFrame
Reading from a CSV file
• If the data in the file already has an index column, you can
specify it through index_col =

df = pd.read_csv(r'../data/advertising.csv',
index_col = 0)
df
Reading from a CSV file
• Use head or tail to narrow down the view of data
• The argument is the number of observations you want to
view. Default is 5

df.head(3)

df.tail(2)
Reading from a CSV file
• By default, it reads the first row as the header
• We can specify if the data has no header by header = None
• We can also assign header names by passing a list to the
argument names =

pd.read_csv(r'../data/advertising.csv',
index_col = 0, header = None).head(3)

pd.read_csv(r'../data/advertising.csv',
index_col = 0, names = ['col1', 'col2',
'col3', 'col4']).head(3)
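The options above can be tried without the actual file by reading the sample rows from a string. The sketch below uses io.StringIO for that; the combination of header=0 with names, and the column name 'id', are assumptions added to show one way of renaming columns in a file that already has a header row.

```python
import io

import pandas as pd

csv_text = """,TV,radio,newspaper,sales
1,230.1,37.8,69.2,22.1
2,44.5,39.3,45.1,10.4
3,17.2,45.9,69.3,9.3
"""

# Default: the first row is the header, and the unnamed first column is read as data.
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default.columns.tolist())   # ['Unnamed: 0', 'TV', 'radio', 'newspaper', 'sales']

# index_col=0 uses the first column as the row index instead.
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df_indexed.columns.tolist())   # ['TV', 'radio', 'newspaper', 'sales']
print(df_indexed.index.tolist())     # [1, 2, 3]

# header=0 together with names replaces the existing header row
# instead of reading it as a row of data.
df_named = pd.read_csv(io.StringIO(csv_text), index_col=0, header=0,
                       names=['id', 'col1', 'col2', 'col3', 'col4'])
print(df_named.columns.tolist())     # ['col1', 'col2', 'col3', 'col4']
```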
Writing to a CSV file
• Using DataFrame’s to_csv method, we can write the data
out to a CSV file

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame = pd.DataFrame(data)

frame.to_csv(r'../data/states.csv')

,state,year,pop        <- first column name is empty since it's the index
0,Ohio,2000,1.5
1,Ohio,2001,1.7
2,Ohio,2002,3.6
3,Nevada,2001,2.4
4,Nevada,2002,2.9
Writing to a CSV file
• By default, the delimiter is a comma, the first row is the header,
and the index is written as the first column
• We can also specify the delimiter by sep =
• We can specify no header by header = False
• We can also specify no index by index = False

frame.to_csv(r'../data/states1.csv',
sep = '|', index = False, header = False)

Ohio|2000|1.5
Ohio|2001|1.7
Ohio|2002|3.6
Nevada|2001|2.4
Nevada|2002|2.9
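The effect of these options can be inspected without writing a file: when no path is given, to_csv returns the CSV text as a string. The sketch below relies on that behavior; the names default_text and custom_text are illustrative.

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame = pd.DataFrame(data)

# With no path, to_csv returns the CSV text as a string.
default_text = frame.to_csv()
print(default_text.splitlines()[0])    # ,state,year,pop

# Custom delimiter, no index column, no header row.
custom_text = frame.to_csv(sep='|', index=False, header=False)
print(custom_text.splitlines()[0])     # Ohio|2000|1.5
print(len(custom_text.splitlines()))   # 5
```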
Summary
• Creating a series
• Creating a dataframe
• Modifying a dataframe
• Removing duplicates
• Reading from and writing to csv files
Exercise 3
• Given a dataframe of three columns, write a program to add
a 4th column which is a lag of the 1st column, a 5th column
which is the lead of the 2nd column
Exercise 4
• Read “advertising.csv”. Take the first 5 rows, and add a
‘social’ column and populate the column with 5 random
values rounded to one decimal place. Then write the new
data to a csv file named “ads_new.csv”
More Exercises
• NumPy and pandas are a big universe. We can only cover a
very thin surface to get you started
• Practice is the only way to progress
• Find more exercises and solutions below

• NumPy: https://github.com/rougier/numpy-100
• pandas: https://www.machinelearningplus.com/python/101-pandas-exercises-python/
Before We Move On
(Figure: course roadmap)
• Managing Data: Text, CSV, JSON; Regular Expression; NumPy; Pandas
• StatsModels
• Web Scraping: Beautiful Soup
• Data Visualization: Tableau; Matplotlib
Install BeautifulSoup 4
• To prepare for the coming sessions, you need to install a
powerful Python package called BeautifulSoup 4.
• Follow the instructions on the following page to
download and install the package
https://www.crummy.com/software/BeautifulSoup/
• The latest version is 4.9
• To test whether you have successfully installed the package, run
the following code in Python. If it runs without raising an error,
the package is installed.
from bs4 import BeautifulSoup
