Numpy Array
4. NUMPY ARRAYS
NumPy is a powerful library for numerical computing in Python. It provides support for arrays, which
are more efficient than Python lists for numerical operations. Here are some basic and advanced
operations you can perform with NumPy arrays.
Creating NumPy Arrays
numpy.array(): Create an array from a Python list or tuple
numpy.zeros(): Create an array filled with zeros
numpy.ones(): Create an array filled with ones
numpy.random.rand(): Create an array of random values drawn uniformly from [0, 1)
NumPy Array Properties
shape: A tuple giving the size of each dimension (its length is the number of dimensions)
dtype: The data type of the array elements
size: The total number of elements in the array
Indexing and Slicing
arr[index]: Access a single element
arr[start:stop:step]: Access a slice from start up to (but not including) stop, taking every step-th element
arr[start:stop]: Access a slice of elements with default step size 1
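As a quick illustration of the indexing and slicing forms listed above, here is a small sketch on an assumed six-element array. The values are arbitrary.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60])

# arr[index]: a single element (indices start at 0; negative indices count from the end)
first = arr[0]             # 10
last = arr[-1]             # 60

# arr[start:stop]: elements start .. stop-1 with the default step of 1
middle = arr[1:4]          # [20 30 40]

# arr[start:stop:step]: every second element from index 0 up to (not including) 6
every_other = arr[0:6:2]   # [10 30 50]

# A negative step walks backwards; omitting start and stop covers the whole array
reversed_arr = arr[::-1]   # [60 50 40 30 20 10]

print(first, last)
print(middle)
print(every_other)
print(reversed_arr)
```

Basic slices like these return views, so assigning into a slice also changes the original array.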
Basic Operations
arr + arr: Element-wise addition
arr - arr: Element-wise subtraction
arr * arr: Element-wise multiplication
arr / arr: Element-wise division
Advanced Operations
numpy.dot(): Compute the dot product of two arrays
numpy.cross(): Compute the cross product of two arrays
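A minimal sketch of the two advanced operations, using two assumed 3-element vectors (the case np.cross is most commonly used for):

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

# Dot product: sum of element-wise products, 1*4 + 2*5 + 3*6 = 32
dot = np.dot(u, v)

# Cross product: a vector perpendicular to both inputs
cross = np.cross(u, v)     # [-3  6 -3]

print("dot:", dot)
print("cross:", cross)

# Perpendicular vectors have a dot product of 0
print(np.dot(cross, u), np.dot(cross, v))   # 0 0
```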
23 | P a g e
Downloaded by Kalai ilaiya ([email protected])
Output:
4
[3 4 5]
Output:
15
3.0
Reshaping an array
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
arr = arr.reshape(2, 3)
print(arr)
Output:
[[1 2 3]
 [4 5 6]]
Array Comparison
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([1, 2, 4])
print(np.equal(arr1, arr2))
Output:
[ True  True False]
Concatenating arrays
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2))
print(arr)
Output:
[1 2 3 4 5 6]
Splitting an array
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
arr1, arr2 = np.split(arr, 2)
print(arr1)
print(arr2)
Output:
[1 2 3]
[4 5 6]
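np.split() also accepts a list of split positions instead of a number of equal parts. The sketch below uses the same six-element array to show both forms and what happens when the array cannot be divided evenly.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# An integer means "this many equal parts"; it must divide the length evenly
halves = np.split(arr, 2)          # two arrays: [1 2 3] and [4 5 6]

# A list of indices gives the split points instead: arr[:2], arr[2:5], arr[5:]
parts = np.split(arr, [2, 5])

for p in parts:
    print(p)

# 6 elements cannot be split into 4 equal parts, so NumPy raises a ValueError
try:
    np.split(arr, 4)
except ValueError as err:
    print("Error:", err)
```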
import numpy as np
# Creating arrays
a = np.array([1, 2, 3])
b = np.array([(1, 2, 3), (4, 5, 6)])
c = np.arange(0, 10, 2)
d = np.linspace(0, 1, 5)
e = np.zeros((2, 3))
f = np.ones((2, 3))
g = np.eye(3)
h = np.random.random((2, 3))
# Displaying arrays
print("Array a:\n", a)
print("Array b:\n", b)
print("Array c (arange):\n", c)
print("Array d (linspace):\n", d)
print("Array e (zeros):\n", e)
print("Array f (ones):\n", f)
print("Array g (identity matrix):\n", g)
print("Array h (random values):\n", h)
# Array properties
print("Shape of array b:", b.shape)
print("Size of array b:", b.size)
print("Data type of array a:", a.dtype)
# Array operations
i = np.array([1, 2, 3])
j = np.array([4, 5, 6])
print("i + j:\n", i + j)
print("i * j:\n", i * j)
# Matrix operations
k = np.array([[1, 2], [3, 4]])
l = np.array([[5, 6], [7, 8]])
print("Matrix product of k and l:\n", np.dot(k, l))
# Aggregate functions
m = np.array([1, 2, 3, 4, 5])
print("Sum of array m:", np.sum(m))
print("Mean of array m:", np.mean(m))
print("Standard deviation of array m:", np.std(m))
5. UNIVERSAL FUNCTIONS
Universal functions (ufuncs) in NumPy are functions that operate element-wise on arrays, supporting
broadcasting, type casting, and other standard features. They are essential for performing vectorized
operations, which are both more concise and more efficient than using Python loops.
Key Characteristics of Ufuncs
1. Element-wise Operations: Ufuncs apply operations element-wise, which means they operate
on each element of the input arrays independently.
2. Broadcasting: Ufuncs support broadcasting, which allows them to work with arrays of different
shapes in a flexible manner.
3. Performance: Ufuncs are implemented in C and are optimized for performance, making them
much faster than equivalent Python loops.
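The three characteristics can be seen in a short sketch; the arrays are arbitrary examples. It also checks which objects are actual ufuncs by testing against the np.ufunc type.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])
row = np.array([10, 20, 30])

# Element-wise: each output element depends only on the matching input elements
squared = np.multiply(matrix, matrix)

# Broadcasting: the 1-D row is added to every row of the 2-D matrix
shifted = np.add(matrix, row)

print(squared)
print(shifted)

# np.add and np.multiply are ufunc objects; np.sum is an ordinary function
print(isinstance(np.add, np.ufunc))   # True
print(isinstance(np.sum, np.ufunc))   # False

# Binary ufuncs also provide methods such as reduce
print(np.add.reduce(row))             # 60
```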
Common Ufuncs:
The examples below use:
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
Arithmetic Operations: np.add, np.subtract, np.multiply, np.divide, np.power
print("Addition:", np.add(a, b))
print("Subtraction:", np.subtract(a, b))
print("Multiplication:", np.multiply(a, b))
print("Division:", np.divide(a, b))
print("Power:", np.power(a, 2))
Mathematical Functions: np.sqrt (square root), np.exp (exponential), np.log (natural logarithm)
print("Square Root:", np.sqrt(a))
print("Exponential:", np.exp(a))
print("Logarithm:", np.log(a))
Trigonometric Functions: np.sin, np.cos, np.tan
angle = np.array([0, np.pi/2, np.pi])
print("Sine:", np.sin(angle))
print("Cosine:", np.cos(angle))
print("Tangent:", np.tan(angle))
Statistical Functions: np.mean, np.std, np.sum
(Strictly speaking, these are aggregation functions rather than ufuncs: they reduce an array, or one axis of it, to summary values instead of working element by element.)
print("Mean:", np.mean(a))
print("Standard Deviation:", np.std(a))
print("Sum:", np.sum(a))
Comparison Operators: np.greater, np.less, np.equal
print("Greater Than:", np.greater(a, b))
print("Less Than:", np.less(a, b))
print("Equal:", np.equal(a, b))
Example Code:
import numpy as np
# Arrays
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
# Arithmetic Operations
print("Addition:", np.add(a, b))
print("Subtraction:", np.subtract(a, b))
print("Multiplication:", np.multiply(a, b))
print("Division:", np.divide(a, b))
# Mathematical Functions
print("Square Root:", np.sqrt(a))
print("Exponential:", np.exp(a))
print("Logarithm:", np.log(a))
# Trigonometric Functions
angle = np.array([0, np.pi/2, np.pi])
print("Sine:", np.sin(angle))
print("Cosine:", np.cos(angle))
print("Tangent:", np.tan(angle))
# Statistical Functions
print("Mean:", np.mean(a))
print("Standard Deviation:", np.std(a))
print("Sum:", np.sum(a))
# Comparison Operators
print("Greater Than:", np.greater(a, b))
print("Less Than:", np.less(a, b))
print("Equal:", np.equal(a, b))
6. AGGREGATIONS
Aggregation in data science refers to the process of summarizing or combining multiple data points to produce a
single result or a smaller set of results. This is a fundamental concept used to simplify and analyze large datasets,
making it easier to draw insights and make decisions. Aggregation can be performed in various ways, depending
on the type of data and the analysis being conducted.
Definition: Aggregation is the process of combining multiple pieces of data to produce a summary
result.
Purpose: The primary purpose of aggregation is to simplify and summarize data, making it easier to
analyze and interpret. This helps in identifying trends, patterns, and anomalies.
6.1 AGGREGATION TECHNIQUES
Group By: Grouping data based on one or more columns and then applying an aggregation function. For
example, grouping sales data by region and then calculating the total sales per region.
Pivot Tables: Reshaping data by turning unique values from one column into multiple columns,
providing a summarized dataset.
Rolling Aggregation: Calculating aggregates over a rolling window, such as a moving average.
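A minimal pandas sketch of the three techniques. The DataFrame and its column names (region, quarter, sales) are hypothetical example data.

```python
import pandas as pd

# Hypothetical sales data used only for illustration
df = pd.DataFrame({
    'region':  ['North', 'North', 'South', 'South'],
    'quarter': ['Q1', 'Q2', 'Q1', 'Q2'],
    'sales':   [250, 400, 150, 100],
})

# Group By: total sales per region
by_region = df.groupby('region')['sales'].sum()

# Pivot table: regions become rows, quarters become columns
pivot = df.pivot_table(index='region', columns='quarter', values='sales', aggfunc='sum')

# Rolling aggregation: 2-period moving average (the first value is NaN
# because the window is not yet full)
series = pd.Series([10, 20, 30, 40])
moving_avg = series.rolling(window=2).mean()   # NaN, 15.0, 25.0, 35.0

print(by_region)
print(pivot)
print(moving_avg)
```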
1. Sum:
Adds up all the values in a column or group of data points.
SQL function: SUM()
total_sales = df['sales'].sum()
2. Mean (Average):
Calculates the average (mean) value of a column or group of data points.
SQL function: AVG()
average_age = df['age'].mean()
3. Median:
Finds the middle value in a dataset, which is less affected by outliers than the mean.
median_income = df['income'].median()
4. Mode:
Identifies the most frequently occurring value in a dataset. In pandas, mode() returns a Series, because several values can tie for most frequent.
most_common_category = df['category'].mode()
5. Count:
Counts the number of non-missing entries in a dataset. To count how often each distinct value occurs, pandas provides value_counts().
count_of_sales = df['sales'].count()
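Two details of mode() and count() are easy to miss: mode() can return several values, and count() ignores missing values. A short sketch on a hypothetical Series:

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value
s = pd.Series([3, 1, 3, np.nan, 1, 2])

# 1 and 3 both occur twice, so mode() returns both
modes = s.mode()

# count() skips missing values, while len() counts every row
print(s.count())          # 5
print(len(s))             # 6
print(modes.tolist())     # [1.0, 3.0]
```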
6. Min and Max:
Finds the minimum and maximum values in a column or group of data points.
SQL functions: MIN(), MAX()
min_salary = df['salary'].min()
max_salary = df['salary'].max()
7. Standard Deviation and Variance:
Measure the spread or dispersion of the data around the mean. In pandas, std() and var() compute sample statistics (dividing by n - 1) by default, whereas NumPy's np.std() and np.var() divide by n unless ddof=1 is passed.
SQL functions: STDDEV(), VAR()
std_dev = df['scores'].std()
variance = df['scores'].var()
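Because pandas and NumPy use different defaults for the standard deviation, the same data can give two different answers. A small sketch with an arbitrary list of values:

```python
import numpy as np
import pandas as pd

values = [2, 4, 4, 4, 5, 5, 7, 9]   # arbitrary example data, mean = 5

# pandas divides by (n - 1) by default (sample statistics, ddof=1)
pd_std = pd.Series(values).std()
pd_var = pd.Series(values).var()

# NumPy divides by n by default (population statistics, ddof=0)
np_std = np.std(values)

# Passing ddof=1 makes NumPy match the pandas default
np_std_sample = np.std(values, ddof=1)

print(pd_std, np_std, np_std_sample)
```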
8. Group By:
Aggregates data based on one or more categories: rows are grouped by one or more columns, and an aggregation function is applied to each group. It is usually combined with the other aggregation functions above.
sales_by_region = df.groupby('region')['sales'].sum()
6.2 APPLICATIONS OF AGGREGATION
Descriptive Statistics:
Aggregation is used to describe the main features of a dataset quantitatively. For example, summarizing
the central tendency and dispersion of data.
Data Cleaning:
Aggregation can help in identifying and handling missing values, outliers, and inconsistencies in the
data.
Data Visualization:
Aggregated data is often used to create plots and charts, making it easier to visualize trends and
patterns.
Feature Engineering:
Aggregation can be used to create new features from existing data, improving the performance of
machine learning models.
Reporting:
Aggregated data is commonly used in business reports and dashboards to provide a high-level overview
of key metrics.
Example Code: Using Pandas
import pandas as pd
# Sample data
data = {
'region': ['North', 'South', 'East', 'West', 'North', 'South'],
'sales': [250, 150, 200, 300, 400, 100],
'expenses': [100, 50, 80, 120, 150, 60]
}
df = pd.DataFrame(data)
# Sum of sales
total_sales = df['sales'].sum()
# Average expenses
average_expenses = df['expenses'].mean()
# Sales by region
sales_by_region = df.groupby('region')['sales'].sum()
print(f"Total Sales: {total_sales}")
print(f"Average Expenses: {average_expenses}")
print("Sales by Region:")
print(sales_by_region)
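groupby() can also compute several aggregations in one call using named aggregation (supported in recent pandas versions). This sketch reuses the sample data above; the output column names total_sales, average_expenses, and orders are chosen for illustration.

```python
import pandas as pd

data = {
    'region': ['North', 'South', 'East', 'West', 'North', 'South'],
    'sales': [250, 150, 200, 300, 400, 100],
    'expenses': [100, 50, 80, 120, 150, 60]
}
df = pd.DataFrame(data)

# Each keyword becomes an output column: (source column, aggregation)
summary = df.groupby('region').agg(
    total_sales=('sales', 'sum'),
    average_expenses=('expenses', 'mean'),
    orders=('sales', 'count'),
)
print(summary)
```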
7. COMPUTATION ON ARRAYS
Computation on arrays is a fundamental aspect of data science, enabling efficient data manipulation,
analysis, and machine learning. Arrays, especially as implemented in libraries like NumPy, provide a
powerful way to handle large datasets and perform a wide range of mathematical operations. Here,
we'll explore the essential aspects of array computations in data science.
Key Concepts
1. Array Creation and Initialization
Creating and initializing arrays is the first step in performing any computation. Arrays can be created
from lists, using functions like np.array, or from scratch using functions like np.zeros, np.ones, and
np.full.
import numpy as np
# From a list
arr = np.array([1, 2, 3, 4])
# From scratch
zeros = np.zeros((3, 3))
ones = np.ones((2, 2))
full = np.full((2, 3), 7)
2. Array Operations
NumPy supports a variety of element-wise operations, such as addition, subtraction, multiplication, and
division, as well as more complex mathematical functions like exponentiation, logarithms, and
trigonometric functions.
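# Element-wise operations
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
total = arr1 + arr2        # [5 7 9]
product = arr1 * arr2      # [ 4 10 18]
exponent = np.exp(arr1)    # e**1, e**2, e**3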
4. Aggregation
Aggregation functions like sum, mean, median, min, and max help summarize data.
arr = np.array([[1, 2, 3], [4, 5, 6]])
total_sum = np.sum(arr)
column_mean = np.mean(arr, axis=0)
row_max = np.max(arr, axis=1)
5. Linear Algebra
NumPy provides support for linear algebra operations, including dot products, matrix multiplication,
determinants, and inverses.
# Dot product
vec1 = np.array([1, 2])
vec2 = np.array([3, 4])
dot_product = np.dot(vec1, vec2)
# Matrix multiplication
mat1 = np.array([[1, 2], [3, 4]])
mat2 = np.array([[5, 6], [7, 8]])
mat_mult = np.matmul(mat1, mat2)
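The paragraph above also mentions determinants and inverses, which live in the np.linalg module. A minimal sketch using the same 2x2 matrix as mat1:

```python
import numpy as np

mat = np.array([[1, 2], [3, 4]])

# Determinant: 1*4 - 2*3 = -2 (a floating-point result, so compare approximately)
det = np.linalg.det(mat)

# Inverse: the matrix that gives the identity when multiplied with mat
inv = np.linalg.inv(mat)

# The @ operator is equivalent to np.matmul for 2-D arrays
identity = mat @ inv

print(det)
print(inv)
print(np.allclose(identity, np.eye(2)))   # True
```

np.linalg.inv() raises a LinAlgError when the matrix is singular (its determinant is 0).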
7.1 BROADCASTING
Broadcasting is a powerful mechanism in NumPy that allows element-wise operations on arrays of different shapes. When performing arithmetic operations, NumPy stretches any dimension of size 1 so that it matches the corresponding dimension of the other array. The stretching is conceptual: the data is not actually copied, which keeps the computation efficient.
Broadcasting Rules:
To understand how broadcasting works, it's important to know the rules that govern it:
If the arrays differ in their number of dimensions, the shape of the smaller array is padded with
ones on its left side.
If the shape of the two arrays does not match in any dimension, the array with shape equal to 1
in that dimension is stretched to match the other shape.
If in any dimension the sizes are different and neither is equal to 1, an error (a ValueError) is raised.
Broadcasting follows a set of rules to make arrays compatible for element-wise operations:
Align Shapes: If the arrays have different numbers of dimensions, the shape of the smaller array
is padded with ones on its left side.
Shape Compatibility: Arrays are compatible for broadcasting if, in every dimension, either the dimension sizes are equal or one of them is 1.
Result Shape: The resulting shape is the maximum size along each dimension from the input
arrays.
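The rules can be checked directly by looking at result shapes. In this sketch the arrays are arbitrary; only their shapes matter.

```python
import numpy as np

a = np.ones((2, 3))                 # shape (2, 3)
row = np.arange(3)                  # shape (3,): padded to (1, 3), then stretched to (2, 3)
col = np.arange(2).reshape(2, 1)    # shape (2, 1): stretched to (2, 3)

print((a + row).shape)              # (2, 3)
print((a + col).shape)              # (2, 3)

# (2, 1) and (3,) -> (2, 1) and (1, 3): the result takes the maximum along each axis
print((col + row).shape)            # (2, 3)

# (2, 3) and (2,) -> (2, 3) and (1, 2): 3 != 2 and neither is 1, so this fails
try:
    a + np.arange(2)
except ValueError as err:
    print("Cannot broadcast:", err)
```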
Examples of Broadcasting
Example 1: Adding a Scalar to an Array
import numpy as np
arr = np.array([1, 2, 3])
scalar = 5
result = arr + scalar
print(result)
Output: [6 7 8]
Output: [[2 4 6]
[5 7 9]]
Example 3: More Complex Broadcasting
arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([[1], [2]])
result = arr1 + arr2
print(result)
Output: [[2 3 4]
[6 7 8]]
Practical Applications
Normalizing Data
Broadcasting is useful for normalizing data, that is, subtracting the mean of each feature (column) and dividing by that feature's standard deviation.
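A minimal sketch of this normalization, assuming a 3x3 sample matrix with one feature per column (the data is an assumption for illustration). Each column is normalized with its mean and population standard deviation (NumPy's default, ddof=0).

```python
import numpy as np

# Assumed sample data: 3 samples (rows) x 3 features (columns)
data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]], dtype=float)

# Per-column statistics have shape (3,) and broadcast across every row
mean = data.mean(axis=0)   # [4. 5. 6.]
std = data.std(axis=0)     # population standard deviation, sqrt(6) for each column

normalized = (data - mean) / std
print(normalized)
```

With this assumed data, every column becomes approximately [-1.2247, 0, 1.2247].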
Output:
[[-1.22474487 -1.22474487 -1.22474487]
[0. 0. 0. ]
[ 1.22474487 1.22474487 1.22474487]]
Element-wise Operations
Broadcasting simplifies scaling each column of a matrix by a different factor.
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
scaling_factors = np.array([0.1, 0.2, 0.3])
scaled_matrix = matrix * scaling_factors
print(scaled_matrix)
Output:
[[0.1 0.4 0.9]
[0.4 1. 1.8]
[0.7 1.6 2.7]]
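To apply the factors to rows instead of columns, they need a column-vector shape. A small sketch reusing the matrix above with arbitrary row factors:

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# A shape (3,) array lines up with the last axis, so it scales columns
col_factors = np.array([0.1, 0.2, 0.3])
scaled_cols = matrix * col_factors

# Reshaped to (3, 1), the factors line up with the first axis and scale rows
row_factors = np.array([1, 10, 100]).reshape(3, 1)
scaled_rows = matrix * row_factors

print(scaled_rows)
```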
8. FANCY INDEXING
Fancy indexing, also known as advanced indexing, is a technique (used in Python mainly with NumPy and pandas) that allows more flexible and powerful ways to access and manipulate arrays or dataframes. It uses arrays or sequences of indices to select specific elements.
Instead of accessing elements one by one, you pass a list or array of indices and obtain all the selected elements at once. This technique can be used both for reading from and for writing to arrays.
NumPy Fancy Indexing
NumPy is a fundamental package for scientific computing with Python, providing support for
arrays and matrices.
In NumPy, fancy indexing is done by passing arrays of indices inside square brackets. Here’s an
example:
import numpy as np
# Create a NumPy array
arr = np.array([10, 20, 30, 40, 50])
# Fancy indexing with a list of indices
indices = [0, 2, 4]
subset = arr[indices]
print(subset)  # Output: [10 30 50]
Boolean Indexing
Another form of fancy indexing is boolean indexing, where you use boolean arrays to select elements:
mask = arr > 30
subset = arr[mask]
print(subset)  # Output: [40 50]
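Fancy and boolean indexing can also appear on the left side of an assignment, and they work on multi-dimensional arrays. A short sketch with arbitrary example arrays:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Writing with fancy indexing: set the elements at positions 1 and 3
arr[[1, 3]] = 0
print(arr)                     # [10  0 30  0 50]

# Writing with a boolean mask: cap every value above 30
arr[arr > 30] = 30
print(arr)                     # [10  0 30  0 30]

# 2-D: a list of row indices selects whole rows, in the given order
grid = np.arange(12).reshape(3, 4)
print(grid[[2, 0]])

# Two index arrays pick individual elements: (0, 1) and (2, 3)
print(grid[[0, 2], [1, 3]])    # [ 1 11]

# Fancy indexing returns a copy, so changing the result leaves grid unchanged
picked = grid[[0, 1]]
picked[0, 0] = 99
print(grid[0, 0])              # 0
```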
9. SORTING ARRAYS
Sorting means putting elements in an ordered sequence.
An ordered sequence is any sequence whose elements follow an order, such as numeric or alphabetical, ascending or descending.
NumPy provides both a function, numpy.sort(), and an ndarray method, sort(), for sorting arrays.
Sorting in NumPy
1. Simple Sorting
numpy.sort() returns a sorted copy of the array.
numpy.ndarray.sort() sorts the array in-place.
import numpy as np
arr = np.array([3, 1, 2, 5, 4])
sorted_arr = np.sort(arr)
print(sorted_arr) # Output: [1 2 3 4 5]
arr.sort()
print(arr)
Output: [1 2 3 4 5]
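np.sort() returns values in ascending order. A common way to get descending order, sketched on the same array, is to reverse the sorted result:

```python
import numpy as np

arr = np.array([3, 1, 2, 5, 4])

# Reverse the ascending result with a slice
descending = np.sort(arr)[::-1]
print(descending)              # [5 4 3 2 1]

# Alternative for numeric data: sort the negated values, then negate back
descending_alt = -np.sort(-arr)
print(descending_alt)          # [5 4 3 2 1]

# Strings sort alphabetically
words = np.array(['banana', 'cherry', 'apple'])
print(np.sort(words))          # ['apple' 'banana' 'cherry']
```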
2. Sorting Multi-dimensional Arrays
You can sort along a specified axis using the axis parameter.
arr_2d = np.array([[3, 1, 2], [5, 4, 6]])
sorted_arr_2d = np.sort(arr_2d, axis=0)  # Sort along axis 0: each column is sorted independently
print(sorted_arr_2d)  # Every column is already in order here, so the result equals the input
Output: [[3 1 2]
 [5 4 6]]
sorted_arr_2d = np.sort(arr_2d, axis=1)  # Sort along axis 1: each row is sorted independently
print(sorted_arr_2d)
Output: [[1 2 3]
[4 5 6]]
3. Argsort for Indirect Sorting
numpy.argsort() returns the indices that would sort an array.
arr = np.array([3, 1, 2, 5, 4])
indices = np.argsort(arr)
print(indices)
Output: [1 2 0 4 3]
sorted_arr = arr[indices]
print(sorted_arr)
Output: [1 2 3 4 5]
Output: [(2, 'second', 100) (3, 'third', 150) (1, 'first', 200)]
Custom Sorting
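You can use numpy.lexsort() for custom sorting on several keys. It returns the indices that would sort the data, and the last key in the tuple is the primary key, so the example below sorts by name and uses age to break ties.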
names = np.array(['Betty', 'John', 'Alice', 'Alice'])
ages = np.array([25, 34, 30, 22])
indices = np.lexsort((ages, names))
sorted_data = list(zip(names[indices], ages[indices]))
print(sorted_data)
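To see the key priority of np.lexsort() concretely, this sketch prints the indices for the same data and then swaps the keys:

```python
import numpy as np

names = np.array(['Betty', 'John', 'Alice', 'Alice'])
ages = np.array([25, 34, 30, 22])

# The LAST key is the primary key: sort by name, then break ties by age
indices = np.lexsort((ages, names))
print(indices)                 # [3 2 0 1]

for name, age in zip(names[indices], ages[indices]):
    print(name, age)

# Swapping the keys makes age the primary key
by_age = np.lexsort((names, ages))
print(by_age)                  # [3 0 2 1]
```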