
Numpy Array

The document provides an overview of data science fundamentals, focusing on the use of NumPy for numerical computing in Python. It covers key concepts such as creating and manipulating NumPy arrays, performing basic and advanced operations, and utilizing universal functions for efficient computations. Additionally, it discusses aggregation techniques for data analysis, highlighting their purpose and applications in summarizing datasets.

Uploaded by

nl3454343

OCS353 - DATA SCIENCE FUNDAMENTALS

8. Environment Management
• %env: List environment variables
• %store: Store a variable in the IPython database

9. Extension Management
• %load_ext: Load an IPython extension
• %unload_ext: Unload an IPython extension

10. Help and Documentation
• %help: Show help for a magic command or function
• %quickref: Show a quick reference guide for IPython

11. Memory Usage
• %memit: Measure memory usage of a statement or expression
• %mprun: Profile memory usage of a statement or expression

12. Parallel Computing
• %parallel: Run a command in parallel
• %px: Run a command in parallel with Xtrae
• %pxresult: Get the result of a parallel computation

13. Other
• %install_ext: Install an IPython extension
• %install_nbext: Install a Jupyter notebook extension
• %uninstall_ext: Uninstall an IPython extension
• %uninstall_nbext: Uninstall a Jupyter notebook extension

4. NUMPY ARRAYS

NumPy is a powerful library for numerical computing in Python. It provides support for arrays, which
are more efficient than Python lists for numerical operations. Here are some basic and advanced
operations you can perform with NumPy arrays.
Creating NumPy Arrays
• numpy.array(): Create an array from a Python list or tuple
• numpy.zeros(): Create an array filled with zeros
• numpy.ones(): Create an array filled with ones
• numpy.random.rand(): Create an array with random values drawn uniformly from [0, 1)
NumPy Array Properties
• shape: A tuple giving the size of each dimension (its length is the number of dimensions)
• dtype: The data type of the array elements
• size: The total number of elements in the array
Indexing and Slicing
• arr[index]: Access a single element
• arr[start:stop:step]: Access a slice of elements
• arr[start:stop]: Access a slice of elements with default step size 1
Basic Operations
• arr + arr: Element-wise addition
• arr - arr: Element-wise subtraction
• arr * arr: Element-wise multiplication
• arr / arr: Element-wise division
Advanced Operations
• numpy.dot(): Compute the dot product of two arrays
• numpy.cross(): Compute the cross product of two arrays

• numpy.inner(): Compute the inner product of two arrays
• numpy.outer(): Compute the outer product of two arrays
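The four product functions listed under Advanced Operations can be compared on a pair of small vectors. This is a minimal sketch; the vectors u and v are illustrative values, not taken from the notes.

```python
import numpy as np

# Illustrative 3-element vectors (assumed values)
u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

dot_uv = np.dot(u, v)      # 1*4 + 2*5 + 3*6 = 32
inner_uv = np.inner(u, v)  # for 1-D arrays this equals the dot product
cross_uv = np.cross(u, v)  # vector perpendicular to both u and v
outer_uv = np.outer(u, v)  # 3x3 matrix with outer_uv[i, j] = u[i] * v[j]

print("dot:", dot_uv)
print("inner:", inner_uv)
print("cross:", cross_uv)
print("outer:\n", outer_uv)
```

For 1-D inputs, dot and inner return a scalar, cross returns a vector, and outer returns a matrix, which is a quick way to tell them apart.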
Array Functions
• numpy.sum(): Compute the sum of all elements in an array
• numpy.mean(): Compute the mean of all elements in an array
• numpy.median(): Compute the median of all elements in an array
• numpy.std(): Compute the standard deviation of all elements in an array
• numpy.var(): Compute the variance of all elements in an array
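As a rough illustration of the array functions above, the sketch below applies each one to the same small dataset. The values are assumed for the example, and the ddof=1 call is included only to show that NumPy computes the population (not sample) statistic by default.

```python
import numpy as np

# Illustrative dataset (assumed values)
values = np.array([2, 4, 4, 4, 5, 5, 7, 9])

total = np.sum(values)        # 40
mean = np.mean(values)        # 5.0
median = np.median(values)    # 4.5, the average of the two middle values
variance = np.var(values)     # 4.0, population variance (divides by n)
std_dev = np.std(values)      # 2.0, square root of the variance

# ddof=1 switches to the sample statistic (divides by n - 1)
sample_std = np.std(values, ddof=1)

print(total, mean, median, variance, std_dev, sample_std)
```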
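Array Comparison
• numpy.equal(): Test element-wise whether two arrays are equal (==)
• numpy.not_equal(): Test element-wise whether two arrays differ (!=)
• numpy.greater(): Test element-wise whether the first array is greater than the second (>)
• numpy.less(): Test element-wise whether the first array is less than the second (<)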

Creating a NumPy array
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)

Output:
[1 2 3 4 5]

Basic operations
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
print(arr1 + arr2)
print(arr1 * arr2)

Output:
[5 7 9]
[ 4 10 18]

Indexing and slicing
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(arr[3])
print(arr[2:5])

Output:
4
[3 4 5]

Array functions
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(np.sum(arr))
print(np.mean(arr))

Output:
15
3.0

Reshaping an array
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
arr = arr.reshape(2, 3)
print(arr)

Output:
[[1 2 3]
 [4 5 6]]

Array Comparison
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([1, 2, 4])
print(np.equal(arr1, arr2))

Output:
[ True  True False]

Concatenating arrays
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2))
print(arr)

Output:
[1 2 3 4 5 6]

Splitting an array
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
arr1, arr2 = np.split(arr, 2)
print(arr1)
print(arr2)

Output:
[1 2 3]
[4 5 6]

Example Program: Overall Operations using Numpy Array

import numpy as np

# Creating arrays
a = np.array([1, 2, 3])
b = np.array([(1, 2, 3), (4, 5, 6)])
c = np.arange(0, 10, 2)
d = np.linspace(0, 1, 5)
e = np.zeros((2, 3))
f = np.ones((2, 3))
g = np.eye(3)
h = np.random.random((2, 3))

# Displaying arrays
print("Array a:\n", a)
print("Array b:\n", b)
print("Array c (arange):\n", c)
print("Array d (linspace):\n", d)
print("Array e (zeros):\n", e)
print("Array f (ones):\n", f)
print("Array g (identity matrix):\n", g)
print("Array h (random values):\n", h)

# Array properties
print("Shape of array b:", b.shape)
print("Size of array b:", b.size)
print("Data type of array a:", a.dtype)

# Array operations
i = np.array([1, 2, 3])
j = np.array([4, 5, 6])
print("i + j:\n", i + j)
print("i * j:\n", i * j)

# Matrix operations
k = np.array([[1, 2], [3, 4]])
l = np.array([[5, 6], [7, 8]])
print("Matrix product of k and l:\n", np.dot(k, l))

# Aggregate functions
m = np.array([1, 2, 3, 4, 5])
print("Sum of array m:", np.sum(m))
print("Mean of array m:", np.mean(m))
print("Standard deviation of array m:", np.std(m))

# Indexing and slicing


print("First element of array a:", a[0])
n = np.array([1, 2, 3, 4, 5])
print("Elements from index 1 to 3 of array n:", n[1:4])

5. UNIVERSAL FUNCTIONS

Universal functions (ufuncs) in NumPy are functions that operate element-wise on arrays, supporting
broadcasting, type casting, and other standard features. They are essential for performing vectorized
operations, which are both more concise and more efficient than using Python loops.
Key Characteristics of Ufuncs
1. Element-wise Operations: Ufuncs apply operations element-wise, which means they operate
on each element of the input arrays independently.
2. Broadcasting: Ufuncs support broadcasting, which allows them to work with arrays of different
shapes in a flexible manner.
3. Performance: Ufuncs are implemented in C and are optimized for performance, making them
much faster than equivalent Python loops.
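The three characteristics above can be seen in a short sketch that compares a ufunc with the equivalent Python loop. The array contents and variable names are illustrative assumptions; no timing is claimed here, only that both approaches give the same result.

```python
import numpy as np

values = np.arange(1, 6)  # [1 2 3 4 5]

# Element-wise squaring with an explicit Python loop
squares_loop = np.array([x ** 2 for x in values])

# The same computation as a single vectorized ufunc call
squares_ufunc = np.square(values)

# Broadcasting and type casting: an integer array plus a float scalar
# yields a float array of the same shape
shifted = np.add(values, 0.5)

print(squares_loop, squares_ufunc)
print(shifted, shifted.dtype)
print(isinstance(np.square, np.ufunc))  # np.square is a genuine ufunc object
```

To measure the speed difference on a particular machine, the %timeit magic can be applied to each version with a much larger array.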
Common Ufuncs:

Each entry below gives the category, the function, and an example. All examples share this setup:
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

Arithmetic Operations
• np.add
result = np.add(a, b)
print("Addition:", result)
• np.subtract
result = np.subtract(a, b)
print("Subtraction:", result)
• np.multiply
result = np.multiply(a, b)
print("Multiplication:", result)
• np.divide
result = np.divide(a, b)
print("Division:", result)
• np.power
result = np.power(a, 2)
print("Power:", result)

Mathematical Functions
• Square Root: np.sqrt
result = np.sqrt(a)
print("Square Root:", result)
• Exponential: np.exp
result = np.exp(a)
print("Exponential:", result)
• Logarithm: np.log
result = np.log(a)
print("Logarithm:", result)
• Trigonometric Functions: np.sin, np.cos, np.tan
angle = np.array([0, np.pi/2, np.pi])
print("Sine:", np.sin(angle))
print("Cosine:", np.cos(angle))
print("Tangent:", np.tan(angle))

Statistical Functions (strictly speaking these are aggregation functions rather than ufuncs, because they reduce an array to a summary value)
• Mean: np.mean
result = np.mean(a)
print("Mean:", result)
• Standard Deviation: np.std
result = np.std(a)
print("Standard Deviation:", result)
• Sum: np.sum
result = np.sum(a)
print("Sum:", result)

Comparison Operators
• Greater Than: np.greater
result = np.greater(a, b)
print("Greater Than:", result)
• Less Than: np.less
result = np.less(a, b)
print("Less Than:", result)
• Equal: np.equal
result = np.equal(a, b)
print("Equal:", result)

Example Code:
import numpy as np
# Arrays
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
# Arithmetic Operations
print("Addition:", np.add(a, b))
print("Subtraction:", np.subtract(a, b))
print("Multiplication:", np.multiply(a, b))
print("Division:", np.divide(a, b))
# Mathematical Functions
print("Square Root:", np.sqrt(a))
print("Exponential:", np.exp(a))
print("Logarithm:", np.log(a))
# Trigonometric Functions
angle = np.array([0, np.pi/2, np.pi])
print("Sine:", np.sin(angle))
print("Cosine:", np.cos(angle))
print("Tangent:", np.tan(angle))
# Statistical Functions
print("Mean:", np.mean(a))
print("Standard Deviation:", np.std(a))
print("Sum:", np.sum(a))
# Comparison Operators
print("Greater Than:", np.greater(a, b))
print("Less Than:", np.less(a, b))
print("Equal:", np.equal(a, b))
6. AGGREGATIONS

Aggregation in data science refers to the process of summarizing or combining multiple data points to produce a
single result or a smaller set of results. This is a fundamental concept used to simplify and analyze large datasets,
making it easier to draw insights and make decisions. Aggregation can be performed in various ways, depending
on the type of data and the analysis being conducted.

Definition: Aggregation is the process of combining multiple pieces of data to produce a summary
result.

Purpose: The primary purpose of aggregation is to simplify and summarize data, making it easier to
analyze and interpret. This helps in identifying trends, patterns, and anomalies.

6.1 AGGREGATION TECHNIQUES

Group By: Grouping data based on one or more columns and then applying an aggregation function. For
example, grouping sales data by region and then calculating the total sales per region.

Pivot Tables: Reshaping data by turning unique values from one column into multiple columns,
providing a summarized dataset.
Rolling Aggregation: Calculating aggregates over a rolling window, such as a moving average.
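The three techniques above can be sketched with pandas as follows. The DataFrame contents and the column names region, quarter, and sales are assumptions made for illustration.

```python
import pandas as pd

# Illustrative sales data (assumed values)
df = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "sales": [100, 150, 80, 120],
})

# Group By: total sales per region
by_region = df.groupby("region")["sales"].sum()

# Pivot Table: one row per region, one column per quarter
pivot = df.pivot_table(index="region", columns="quarter",
                       values="sales", aggfunc="sum")

# Rolling Aggregation: moving average over a window of 2 rows;
# the first entry is NaN because the window is not yet full
moving_avg = df["sales"].rolling(window=2).mean()

print(by_region)
print(pivot)
print(moving_avg)
```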

Common Aggregation Techniques:

1. Sum:
• Adds up all the values in a dataset. Commonly used to calculate total sales, total expenses, etc.
• Calculates the total value of a column or group of data points.

Function: SUM() (SQL-style name); in pandas:

total_sales = df['sales'].sum()

2. Mean (Average):
• Calculates the average value of a dataset.
• Calculates the mean value of a column or group of data points.

Function: AVG() (SQL-style name); in pandas:

average_age = df['age'].mean()
3. Median:
Finds the middle value in a dataset, which is less affected by outliers than the mean.
median_income = df['income'].median()
4. Mode:
Identifies the most frequently occurring value in a dataset. In pandas, mode() returns a Series because several values can tie for most frequent.
most_common_category = df['category'].mode()
5. Count:
Counts the number of non-missing entries in a dataset. To count the occurrences of each distinct value, pandas provides value_counts().
count_of_sales = df['sales'].count()
6. Min and Max:
• Finds the minimum and maximum values in a dataset.
• Finds the maximum or minimum value in a column or group of data points.
Functions: MIN(), MAX() (SQL-style names); in pandas:

min_salary = df['salary'].min()
max_salary = df['salary'].max()
7. Standard Deviation and Variance:
• Measure the spread or dispersion of the data around the mean.
• Calculate the spread or dispersion of a column or group of data points.
Functions: STDDEV(), VAR() (SQL-style names); in pandas:

std_dev = df['scores'].std()
variance = df['scores'].var()
8. Group By:
• Aggregates data based on one or more categories. This is often used in conjunction with other aggregation functions.
• Groups data by one or more columns and applies aggregations.
sales_by_region = df.groupby('region')['sales'].sum()
6.2 APPLICATIONS OF AGGREGATION
Descriptive Statistics:
Aggregation is used to describe the main features of a dataset quantitatively. For example, summarizing
the central tendency and dispersion of data.
Data Cleaning:
Aggregation can help in identifying and handling missing values, outliers, and inconsistencies in the
data.
Data Visualization:
Aggregated data is often used to create plots and charts, making it easier to visualize trends and
patterns.
Feature Engineering:
Aggregation can be used to create new features from existing data, improving the performance of
machine learning models.
Reporting:
Aggregated data is commonly used in business reports and dashboards to provide a high-level overview
of key metrics.
Example Code: Using Pandas
import pandas as pd
# Sample data
data = {
'region': ['North', 'South', 'East', 'West', 'North', 'South'],
'sales': [250, 150, 200, 300, 400, 100],
'expenses': [100, 50, 80, 120, 150, 60]

}
df = pd.DataFrame(data)
# Sum of sales
total_sales = df['sales'].sum()
# Average expenses
average_expenses = df['expenses'].mean()
# Sales by region
sales_by_region = df.groupby('region')['sales'].sum()
print(f"Total Sales: {total_sales}")
print(f"Average Expenses: {average_expenses}")
print("Sales by Region:")
print(sales_by_region)
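Several aggregations can also be computed per group in one call. The sketch below reuses the sample data from the example above and relies on pandas named aggregation; the output column names total_sales, mean_expenses, and n_rows are illustrative choices.

```python
import pandas as pd

# Same sample data as the example above
df = pd.DataFrame({
    "region": ["North", "South", "East", "West", "North", "South"],
    "sales": [250, 150, 200, 300, 400, 100],
    "expenses": [100, 50, 80, 120, 150, 60],
})

# One row per region, one column per (source column, aggregation) pair
summary = df.groupby("region").agg(
    total_sales=("sales", "sum"),
    mean_expenses=("expenses", "mean"),
    n_rows=("sales", "count"),
)
print(summary)
```

This keeps related statistics in a single table, which is convenient for the reporting and feature-engineering uses described in Section 6.2.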
7. COMPUTATION ON ARRAYS
Computation on arrays is a fundamental aspect of data science, enabling efficient data manipulation,
analysis, and machine learning. Arrays, especially as implemented in libraries like NumPy, provide a
powerful way to handle large datasets and perform a wide range of mathematical operations. Here,
we'll explore the essential aspects of array computations in data science.
Key Concepts
1. Array Creation and Initialization
Creating and initializing arrays is the first step in performing any computation. Arrays can be created
from lists, using functions like np.array, or from scratch using functions like np.zeros, np.ones, and
np.full.
import numpy as np
# From a list
arr = np.array([1, 2, 3, 4])
# From scratch
zeros = np.zeros((3, 3))
ones = np.ones((2, 2))
full = np.full((2, 3), 7)

2. Array Operations
NumPy supports a variety of element-wise operations, such as addition, subtraction, multiplication, and
division, as well as more complex mathematical functions like exponentiation, logarithms, and
trigonometric functions.
# Element-wise operations
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

sum_arr = arr1 + arr2


diff_arr = arr1 - arr2

prod_arr = arr1 * arr2


quot_arr = arr1 / arr2
3. Broadcasting
Broadcasting (covered in detail in Section 7.1) allows operations on arrays of different but compatible shapes, so arrays can be combined without explicitly reshaping or copying them.
arr = np.array([[1, 2, 3], [4, 5, 6]])
scalar = 3
result = arr + scalar # Broadcasting scalar to the shape of arr

4. Indexing and Slicing


Efficiently accessing and manipulating array elements is crucial. NumPy provides powerful indexing
and slicing capabilities.
arr = np.array([[1, 2, 3], [4, 5, 6]])
# Indexing
element = arr[1, 2] # Accessing the element at row 1, column 2
# Slicing
slice_arr = arr[:, 1:3] # Slicing columns 1 to 2 for all rows

5. Aggregation
Aggregation functions like sum, mean, median, min, and max help summarize data. The axis argument selects the direction: axis=0 aggregates down each column, and axis=1 aggregates across each row.
arr = np.array([[1, 2, 3], [4, 5, 6]])
total_sum = np.sum(arr)
column_mean = np.mean(arr, axis=0)
row_max = np.max(arr, axis=1)

6. Linear Algebra
NumPy provides support for linear algebra operations, including dot products, matrix multiplication,
determinants, and inverses.
# Dot product
vec1 = np.array([1, 2])
vec2 = np.array([3, 4])
dot_product = np.dot(vec1, vec2)
# Matrix multiplication
mat1 = np.array([[1, 2], [3, 4]])
mat2 = np.array([[5, 6], [7, 8]])
mat_mult = np.matmul(mat1, mat2)
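Determinants and inverses are mentioned above but not shown. A minimal sketch using the numpy.linalg module follows; the matrix is the same 2x2 example used above, written with float entries.

```python
import numpy as np

mat = np.array([[1.0, 2.0], [3.0, 4.0]])

# Determinant: 1*4 - 2*3 = -2 (up to floating-point rounding)
det = np.linalg.det(mat)

# Inverse: exists because the determinant is non-zero
inv = np.linalg.inv(mat)

# Multiplying a matrix by its inverse should give the identity matrix
identity_check = mat @ inv

print("Determinant:", det)
print("Inverse:\n", inv)
print("mat @ inv:\n", identity_check)
```

Because the results are floating-point, comparisons are best made with np.isclose or np.allclose rather than ==.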
7.1 BROADCASTING
Broadcasting is a powerful mechanism in NumPy (a popular library for numerical computations in
Python) that allows for element-wise operations on arrays of different shapes. When performing
arithmetic operations, NumPy automatically stretches the smaller array along the dimension with size
1 to match the shape of the larger array. This allows for efficient computation without the need for
explicitly replicating the data.
Broadcasting Rules:
To understand how broadcasting works, it's important to know the rules that govern it:

• Rule 1 (align shapes): If the arrays differ in their number of dimensions, the shape of the array with fewer dimensions is padded with ones on its left side.
• Rule 2 (stretch): If the shapes of the two arrays do not match in some dimension, the array whose size is 1 in that dimension is stretched to match the other shape.
• Rule 3 (error): If in any dimension the sizes differ and neither is equal to 1, an error is raised.
In other words, two arrays are compatible for broadcasting if, in every dimension after padding, the sizes are equal or one of them is 1. The resulting shape takes the maximum size along each dimension of the input arrays.
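The rules can be checked without building any data by asking NumPy for the broadcast result shape. This sketch assumes NumPy 1.20 or later, where np.broadcast_shapes is available; the shapes are illustrative.

```python
import numpy as np

# (3,) is padded to (1, 3), then stretched to (2, 3)
shape_a = np.broadcast_shapes((2, 3), (3,))

# The size-1 column is stretched across the 3 columns
shape_b = np.broadcast_shapes((2, 3), (2, 1))

# Both inputs are stretched: (3, 1) and (1, 4) give (3, 4)
shape_c = np.broadcast_shapes((3, 1), (1, 4))

print(shape_a, shape_b, shape_c)

# Incompatible case: (2,) is padded to (1, 2); the trailing sizes
# 3 and 2 differ and neither is 1, so NumPy raises ValueError
error_message = None
try:
    np.ones((2, 3)) + np.ones((2,))
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```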
Examples of Broadcasting
Example 1: Adding a Scalar to an Array
import numpy as np
arr = np.array([1, 2, 3])
scalar = 5
result = arr + scalar
print(result)

Output: [6 7 8]

Example 2: Adding Arrays of Different Shapes


arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([1, 2, 3])
result = arr1 + arr2
print(result)

Output: [[2 4 6]
[5 7 9]]
Example 3: More Complex Broadcasting
arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([[1], [2]])
result = arr1 + arr2
print(result)

Output: [[2 3 4]
[6 7 8]]
Practical Applications
Normalizing Data
Broadcasting is useful for normalizing data, subtracting the mean, and dividing by the standard
deviation for each feature.

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])


mean = np.mean(data, axis=0)
std = np.std(data, axis=0)
normalized_data = (data - mean) / std
print(normalized_data)

Output:
[[-1.22474487 -1.22474487 -1.22474487]
[0. 0. 0. ]
[ 1.22474487 1.22474487 1.22474487]]

Element-wise Operations
Broadcasting simplifies scaling each column of a matrix by a different factor.
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
scaling_factors = np.array([0.1, 0.2, 0.3])
scaled_matrix = matrix * scaling_factors
print(scaled_matrix)

Output:
[[0.1 0.4 0.9]
[0.4 1. 1.8]
[0.7 1.6 2.7]]
8. FANCY INDEXING
Fancy indexing, also known as advanced indexing, is a technique in data science and programming
(particularly in Python with NumPy and pandas) that allows for more flexible and powerful ways to
access and manipulate data arrays or dataframes. It involves using arrays or sequences of indices to
select specific elements or slices from an array or dataframe.
Fancy indexing refers to using arrays of indices to access multiple elements of an array simultaneously.
Instead of accessing elements one by one, you can pass a list or array of indices to obtain a subset of
elements. This technique can be used for both reading from and writing to arrays.
NumPy Fancy Indexing
• NumPy is a fundamental package for scientific computing with Python, providing support for arrays and matrices.
In NumPy, fancy indexing is done by passing arrays of indices inside square brackets. Here’s an
example:
import numpy as np
# Create a NumPy array
arr = np.array([10, 20, 30, 40, 50])
# Fancy indexing with a list of indices
indices = [0, 2, 4]
subset = arr[indices]
print(subset)

Output: [10 30 50]
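Fancy indexing also works on the left-hand side of an assignment, which is the writing case mentioned earlier. A minimal sketch, using an illustrative array named scores so that it does not overwrite arr from the example above:

```python
import numpy as np

scores = np.array([10, 20, 30, 40, 50])

# Assign one value to several positions at once
scores[[0, 2, 4]] = 0
print(scores)  # positions 0, 2 and 4 are now zero

# In-place arithmetic on a selected subset
scores[[1, 3]] += 5
print(scores)  # positions 1 and 3 increased by 5
```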

Boolean Indexing
Another form of fancy indexing is boolean indexing, where you use boolean arrays to select elements:
mask = arr > 30
subset = arr[mask]
print(subset)

Output: [40 50]


Fancy Indexing in 2D Arrays
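Fancy indexing can also be applied to multi-dimensional arrays. The row and column index arrays are paired element by element, so the example below selects the elements at positions (0, 2), (1, 1), and (2, 0):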
# Create a 2D array
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Fancy indexing with row and column indices
row_indices = [0, 1, 2]
col_indices = [2, 1, 0]
subset = arr2d[row_indices, col_indices]
print(subset)
Output: [3 5 7]
Fancy Indexing in pandas
Using loc and iloc:
loc is used for label-based indexing, while iloc is used for integer-based indexing.
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
'A': [10, 20, 30, 40, 50],
'B': [5, 10, 15, 20, 25]
})
# Fancy indexing with .iloc (integer-location based indexing)
subset = df.iloc[[0, 2, 4]]
print(subset)
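The example above shows only iloc. For comparison, here is a sketch of label-based selection with loc. The DataFrame named labeled and its string index labels are assumptions added for illustration; the column values match the example above.

```python
import pandas as pd

# Same values as above, but with explicit string labels as the index
labeled = pd.DataFrame(
    {"A": [10, 20, 30, 40, 50], "B": [5, 10, 15, 20, 25]},
    index=["a", "b", "c", "d", "e"],
)

# Select rows by label and columns by name
by_label = labeled.loc[["a", "c", "e"], ["B"]]

# loc also accepts a boolean condition for the rows
by_condition = labeled.loc[labeled["A"] > 30, "A"]

print(by_label)
print(by_condition)
```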
Combined Indexing Techniques
Fancy indexing can be combined with other indexing techniques to achieve complex selections, for example by choosing specific rows and specific columns in one call:
# Combined indexing
subset = df.iloc[[0, 2, 4], [0, 1]]
print(subset)

Applications in Data Science


Fancy indexing is particularly useful in various data science tasks, including:
• Data Cleaning: Selecting and modifying subsets of data based on certain conditions.
• Data Analysis: Efficiently extracting and analyzing specific parts of a dataset.
• Machine Learning: Preprocessing data by selecting specific features or samples.
• Visualization: Selecting specific data points to visualize.
• Data Selection: Extract specific elements, rows, or columns from large datasets.
• Data Filtering: Filter data based on conditions or criteria.
• Data Transformation: Apply operations to specific subsets of data.
• Efficient Computations: Perform efficient computations on selected data without looping.
9. SORTING ARRAYS
Sorting means putting elements in an ordered sequence.
An ordered sequence is any sequence whose elements follow a defined order, such as numeric or alphabetical, ascending or descending.
NumPy offers both a function, numpy.sort(), and an ndarray method, sort(), for sorting an array.
Sorting in NumPy
1. Simple Sorting
• numpy.sort() returns a sorted copy of the array.
• numpy.ndarray.sort() sorts the array in-place.
import numpy as np
arr = np.array([3, 1, 2, 5, 4])
sorted_arr = np.sort(arr)
print(sorted_arr) # Output: [1 2 3 4 5]
arr.sort()
print(arr)
Output: [1 2 3 4 5]
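Both calls above sort in ascending order. One common idiom for descending order is to reverse the ascending result with a slice, as in this sketch (the array named values is illustrative):

```python
import numpy as np

values = np.array([3, 1, 2, 5, 4])

# Sort ascending, then reverse with a step of -1
descending = np.sort(values)[::-1]
print(descending)

# np.sort returned a copy, so the original array is unchanged
print(values)
```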
2. Sorting Multi-dimensional Arrays
• You can sort along a specified axis using the axis parameter.
arr_2d = np.array([[3, 1, 2], [5, 4, 6]])
sorted_arr_2d = np.sort(arr_2d, axis=0) # Sort along axis 0: each column is sorted from top to bottom
print(sorted_arr_2d)

Output: [[3 1 2]
[5 4 6]]
sorted_arr_2d = np.sort(arr_2d, axis=1) # Sort along axis 1: each row is sorted from left to right
print(sorted_arr_2d)

Output: [[1 2 3]
[4 5 6]]
3. Argsort for Indirect Sorting
• numpy.argsort() returns the indices that would sort an array.
arr = np.array([3, 1, 2, 5, 4])
indices = np.argsort(arr)
print(indices)

Output: [1 2 0 4 3]
sorted_arr = arr[indices]
print(sorted_arr)

Output: [1 2 3 4 5]

Sorting by Multiple Keys
• You can sort a structured array by multiple fields by passing the field names to the order parameter in priority order.
data = np.array([(1, 'first', 200),
(2, 'second', 100),
(3, 'third', 150)],
dtype=[('id', 'i4'), ('name', 'U10'), ('score', 'i4')])
sorted_data = np.sort(data, order=['score', 'id'])
print(sorted_data)

Output: [(2, 'second', 100) (3, 'third', 150) (1, 'first', 200)]
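Custom Sorting
• numpy.lexsort() performs an indirect sort on several keys. The last key in the tuple is the primary sort key, so the example below sorts by name and uses age to break ties.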
names = np.array(['Betty', 'John', 'Alice', 'Alice'])
ages = np.array([25, 34, 30, 22])
indices = np.lexsort((ages, names))
sorted_data = list(zip(names[indices], ages[indices]))
print(sorted_data)

Output: [('Alice', 22), ('Alice', 30), ('Betty', 25), ('John', 34)]

10. STRUCTURED DATA


NumPy's Structured Arrays:
NumPy's structured arrays (closely related to record arrays) are a powerful feature for handling heterogeneous data, where each element can have multiple fields of different data types. Structured arrays allow you to define complex data structures and perform efficient operations on them.
Creating Structured Arrays
1. Defining Data Types
• You can define a structured array by specifying a list of tuples, where each tuple represents a field's name and its data type.
import numpy as np
dtype = [('name', 'U10'), ('age', 'i4'), ('weight', 'f4')]
data = np.array([('Alice', 25, 55.5), ('Bob', 30, 75.2)], dtype=dtype)
print(data)

Output: [('Alice', 25, 55.5) ('Bob', 30, 75.2)]


2. Accessing Fields
• You can access individual fields of the structured array using the field names.
names = data['name']
ages = data['age']
weights = data['weight']
print(names)

Output: ['Alice' 'Bob']


print(ages)
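Because each field is itself an array, fields can be used for filtering, aggregation, and sorting. The sketch below is self-contained and uses an array named people with the same dtype as above; the third record ('Cara') is a hypothetical row added for illustration.

```python
import numpy as np

dtype = [('name', 'U10'), ('age', 'i4'), ('weight', 'f4')]
people = np.array(
    [('Alice', 25, 55.5), ('Bob', 30, 75.2), ('Cara', 35, 62.0)],
    dtype=dtype,
)

# Filter records with a boolean mask built from one field
older = people[people['age'] > 28]

# Aggregate a single field
mean_weight = people['weight'].mean()

# Sort whole records by one field
by_weight = np.sort(people, order='weight')

print(older['name'])
print(mean_weight)
print(by_weight['name'])
```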
