
Unit-2

ANALYTICS ON MACHINE
LEARNING
SYNOPSIS
• Machine Learning Pipeline – Pre-processing – Visualization – Feature Selection – Training Model Parameters – Model Evaluation: Sensitivity, Specificity, PPV, NPV, FPR, Accuracy, ROC, Precision-Recall Curves, Valued Target Variables – Python: Variables and Types, Data Structures and Containers, Pandas DataFrame: Operations – Scikit-Learn: Pre-processing, Feature Selection
Machine Learning Pipeline
• A machine learning pipeline is a sequence of data processing steps, where each step in the pipeline feeds into the next one, ultimately leading to the creation of a machine learning model.
• A well-organized pipeline helps automate and streamline the end-to-end process of building, training, and deploying machine learning models.
• These are the key components of a typical machine learning pipeline:
key components
1.Data Collection: Gather relevant data from various sources. This may
involve accessing databases, APIs, or other data repositories.
2. Data Cleaning and Preprocessing: Handle missing data, outliers, and
inconsistencies.
• Transform and normalize data to make it suitable for machine learning
algorithms.
• Perform feature engineering to create new features or modify existing
ones.
3. Exploratory Data Analysis (EDA): Understand the characteristics of
the data through statistical analysis and visualization.
• Identify patterns, trends, and relationships that may inform model
selection and feature engineering.
CONT..
4.Feature Selection: Choose the most relevant features to be used in
the model.
• This step is essential for improving model efficiency and reducing
overfitting.
5. Model Selection: Choose the appropriate machine learning
algorithm based on the nature of the problem (e.g., classification,
regression, clustering).
• Split the data into training and testing sets for model evaluation.
6. Model Training: Train the selected model on the training data.
Adjust hyperparameters to optimize model performance.
CONT…
7. Model Evaluation: Assess the model's performance on the testing data
using relevant evaluation metrics.
• Use techniques like cross-validation to ensure robustness.
8. Model Tuning: Fine-tune the model based on performance metrics. Adjust
hyper parameters or try different algorithms to improve results.
9. Model Deployment: Integrate the trained model into the production
environment where it can make predictions on new, unseen data.
• Set up monitoring for the deployed model's performance.
10. Feedback Loop: Collect feedback on model performance from real-
world usage.
• Iterate on the model or pipeline based on user feedback or changes in the
data distribution
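• As a simple illustration, these stages can be chained in code. Below is a minimal sketch using scikit-learn; the synthetic dataset, StandardScaler, and LogisticRegression are illustrative assumptions, not the only possible choices.
# Minimal end-to-end pipeline sketch (assumed example data and model choices)
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# 1. Data collection (simulated here with a synthetic dataset)
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
# 2-4. Preprocessing and modelling chained into a single pipeline
pipe = Pipeline([('scaler', StandardScaler()), ('clf', LogisticRegression())])
# 5. Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# 6. Train the model
pipe.fit(X_train, y_train)
# 7. Evaluate on unseen data
print("Accuracy:", accuracy_score(y_test, pipe.predict(X_test)))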
PREPROCESSING
• Preprocessing is a crucial step in the machine learning pipeline where raw data is transformed, cleaned, and organized to make it suitable for training machine learning models.
• The goal of preprocessing is to enhance the quality of the data and improve the performance and interpretability of the models. These are common preprocessing steps:
1. Data Cleaning:
• Handling Missing Data: Healthcare datasets may have missing values due to incomplete records or data entry errors. Techniques include imputation using the mean, median, or predictive models, or excluding incomplete records if they are minimal.
• Removing Duplicates: Ensuring that patient records or other data entries are unique to avoid redundancy.
• Correcting Errors: Identifying and rectifying inaccuracies in medical data, such as incorrect diagnosis codes or mismatched patient identifiers.
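• A small pandas sketch of these cleaning steps is shown below; the DataFrame and its columns ('patient_id', 'age', 'blood_pressure') are hypothetical examples, not from a real dataset.
# Hypothetical data-cleaning sketch with pandas
import pandas as pd
import numpy as np
records = pd.DataFrame({'patient_id': [1, 2, 2, 3],
                        'age': [34, np.nan, np.nan, 61],
                        'blood_pressure': [120, 135, 135, 900]})  # 900 is an obvious entry error
# Handling missing data: impute age with the median
records['age'] = records['age'].fillna(records['age'].median())
# Removing duplicates: keep one record per patient
records = records.drop_duplicates(subset='patient_id')
# Correcting errors: flag implausible blood pressure values as missing
records.loc[records['blood_pressure'] > 300, 'blood_pressure'] = np.nan
print(records)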
2. Data Transformation:
• Normalization/Standardization: Scaling continuous variables like lab results or vital signs to ensure consistency and comparability. This can be important for algorithms sensitive to feature scales.
• Encoding Categorical Variables: Converting categorical data such as diagnosis codes or treatment types into numerical formats. Methods include one-hot encoding for categorical variables like medication types or label encoding for hierarchical categories.
3. Feature Engineering:
• Creating New Features: Generate additional features that may capture more information about the problem. This can include interaction terms, polynomial features, or domain-specific features.
• Dimensionality Reduction: Reduce the number of features while preserving important information. Techniques include Principal Component Analysis (PCA) or feature selection methods.
4. Data Integration:
• Combining Data Sources: Merging data from various sources such as electronic health records (EHRs), lab results, imaging systems, and patient surveys to create a comprehensive dataset.
• Harmonizing Data Formats: Standardizing data formats and terminologies across different sources to ensure consistency. For instance, converting different coding systems like ICD-9 to ICD-10.
Cont.…
5. Handling Imbalanced Data:
• Resampling Techniques: Addressing class imbalances in medical conditions, such as rare diseases, by oversampling underrepresented classes or undersampling overrepresented classes (a small resampling sketch follows this slide).
6. Data Reduction:
• Dimensionality Reduction: Applying techniques such as Principal
Component Analysis (PCA) to reduce the number of features while
retaining important information, which can be useful in handling
high-dimensional data.
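• A minimal oversampling sketch using sklearn.utils.resample is shown below; the tiny DataFrame and the 'disease' column are hypothetical, and libraries such as imbalanced-learn offer more advanced resampling methods.
# Oversampling a minority class with sklearn.utils.resample (hypothetical data)
import pandas as pd
from sklearn.utils import resample
df = pd.DataFrame({'lab_value': [1.2, 0.8, 1.5, 2.1, 0.9, 3.0],
                   'disease':   [0,   0,   0,   0,   1,   1]})
majority = df[df['disease'] == 0]
minority = df[df['disease'] == 1]
# Draw samples with replacement from the minority class until the classes are balanced
minority_upsampled = resample(minority, replace=True, n_samples=len(majority), random_state=42)
balanced = pd.concat([majority, minority_upsampled])
print(balanced['disease'].value_counts())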
Cont.…
7. Outlier Detection and Handling:
• Identifying Outliers: Detecting anomalies in patient data, such as unusually high or low lab results, which may indicate errors or rare conditions.
• Handling Outliers: Deciding whether to adjust, exclude, or retain outliers based on their impact on the analysis.
8. Data Augmentation:
• Enhancing Data Quality: For certain types of data, such as medical
images, augmentation techniques like rotation or flipping can help
improve the model's robustness and generalization.
Cont.…
9. Data Privacy and Security:
• Ensuring Compliance: Implementing measures to anonymize or de-identify sensitive patient information to comply with regulations like HIPAA (Health Insurance Portability and Accountability Act) and GDPR (General Data Protection Regulation).
10. Data Validation:
• Ensuring Data Accuracy: Performing checks to validate data
accuracy and consistency before analysis. This can involve cross-
referencing data with clinical guidelines or external sources.
Challenges in Healthcare Data
Preprocessing:
• Data Quality Issues: Healthcare data can be noisy, incomplete, or
inaccurate, which requires careful cleaning and validation.
• Complex Data Types: Handling diverse data types such as structured
EHRs, unstructured clinical notes, and images requires different
preprocessing techniques.
• Regulatory Compliance: Ensuring that preprocessing activities
adhere to data protection laws and ethical guidelines.
VISUALIZATION
• Visualization is a powerful tool in the field of machine learning, aiding
in the exploration, analysis, and communication of patterns and
insights within data.
• Here are some key aspects of visualization in the context of machine learning:
Cont.…
1. Exploratory Data Analysis (EDA): Understand the structure, distribution, and relationships within the data before applying machine learning algorithms. EDA helps identify data patterns, anomalies, and feature relationships that inform feature engineering and model selection.
• Univariate Plots: Histograms, box plots, and kernel density plots help understand the distribution of individual features.
• Histograms: Display the distribution of a single feature.
• Box Plots: Show the distribution, median, and outliers of a feature.
• Pair Plots: Visualize pairwise relationships between features.
• Correlation Matrices: Show the correlation coefficients between pairs of features.
Cont..
• Bivariate Plots: Scatter plots, pair plots, and heatmaps reveal
relationships between pairs of features.
1. Scatter Plots: Scatter plots are used to visualize the relationship
between two numerical variables.
They help identify correlations, trends, and patterns in data.
2. Heatmaps:
• Purpose: Heatmaps display data in a matrix format, where individual values are represented by colors.
• They are particularly useful for visualizing correlations or other metrics between pairs of variables.
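• The plots described above can be produced with matplotlib and seaborn. The sketch below uses a small synthetic DataFrame; the column names ('age', 'bmi', 'glucose') are illustrative assumptions.
# EDA visualization sketch: histogram, scatter plot, and correlation heatmap
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
rng = np.random.default_rng(0)
df = pd.DataFrame({'age': rng.normal(50, 12, 200),
                   'bmi': rng.normal(27, 4, 200),
                   'glucose': rng.normal(100, 15, 200)})
df['age'].plot(kind='hist', title='Age distribution')                  # univariate histogram
plt.show()
df.plot(kind='scatter', x='bmi', y='glucose', title='BMI vs glucose')  # bivariate scatter plot
plt.show()
sns.heatmap(df.corr(), annot=True, cmap='coolwarm')                    # correlation heatmap
plt.title('Correlation matrix')
plt.show()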
Cont.…
2. Feature Distribution: Visualize the distribution of individual
features to understand their characteristics and identify outliers or
anomalies.
3. Correlation Analysis: Heat maps or correlation matrices help
visualize the correlation between different features, assisting in
feature selection and understanding relationships.
4. Data Summary: Use summary statistics and visualizations to
provide an overall picture of the dataset, including mean, median,
and standard deviation.
Cont.…
5.Model Performance: Visualize model performance metrics such as
accuracy, precision, recall, and F1 score using bar charts, line graphs,
or confusion matrices.
6. Learning Curves: Plot learning curves to visualize how the
performance of a machine learning model changes over time as it is
trained on more data.
7. ROC [Receiver Operating Characteristic] Curves and Precision-
Recall Curves: These curves visualize the trade-off between true
positive rate and false positive rate, or precision and recall, providing
insights into the model's performance across different thresholds.
Cont.…
8.Feature Importance: Bar charts or horizontal bar plots can
be used to display the importance of different features in a
model, helping with feature selection.
9. Decision Boundaries: Visualize decision boundaries for
classification models to understand how the model separates
different classes in the feature space.
10. Error Analysis: Visualize misclassified instances or
prediction errors to understand where the model is struggling
and identify potential areas for improvement.
FEATURE SELECTION
• Feature selection is the process of choosing a subset of relevant
features from a larger set of features to build more efficient and
accurate machine learning models.
• By selecting the most informative features, you can improve model
performance, reduce overfitting, and enhance interpretability. These
are some common techniques for feature selection:
1. Filter Methods:
• Correlation-based Methods: Remove features that are highly correlated with each other since they may provide redundant information. The Pearson correlation coefficient or other correlation measures can be used.
• Variance Thresholding: Eliminate features with low variance. Features with little variation are less informative.
• Statistical Tests: Use statistical tests (e.g., t-tests, chi-square tests) to assess the relevance of each feature to the target variable.
2.Wrapper Methods:
• Recursive Feature Elimination (RFE): Iteratively remove
the least important features and train the model until the
desired number of features is reached.
• Forward Selection: Start with an empty set of features and
add the most relevant feature in each iteration until a
stopping criterion is met.
• Backward Elimination: Start with all features and eliminate
the least important feature in each iteration until a stopping
criterion is met.
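• One way to implement forward and backward selection is scikit-learn's SequentialFeatureSelector (available in scikit-learn 0.24 and later); the dataset and estimator in this sketch are illustrative assumptions.
# Forward selection sketch with SequentialFeatureSelector
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
X, y = load_breast_cancer(return_X_y=True)
estimator = LogisticRegression(max_iter=5000)
# direction='forward' adds features one at a time; direction='backward' eliminates them
sfs = SequentialFeatureSelector(estimator, n_features_to_select=5, direction='forward')
sfs.fit(X, y)
print("Selected feature indices:", sfs.get_support(indices=True))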
3.Embedded Methods:
• LASSO (Least Absolute Shrinkage and Selection Operator):
Introduce a penalty term based on the absolute magnitude of
coefficients during model training. This encourages sparsity in the
feature space, effectively performing feature selection.
• Tree-based Methods: Decision trees and ensemble methods like
Random Forests can provide feature importance. Features with
higher importance are more relevant.
• Regularization Techniques: Include regularization terms (e.g., L1
regularization) in the model training process to penalize the
magnitude of coefficients, leading to feature selection.
Cont..
4.Dimensionality Reduction:
• Principal Component Analysis (PCA): Transform the original
features into a new set of uncorrelated features (principal
components) that retain most of the variance in the data.
• Linear Discriminant Analysis (LDA): Similar to PCA, but LDA also
considers class labels and aims to maximize the separability between
classes.
5. Information Gain/Mutual Information: Calculate the information
gain or mutual information between each feature and the target variable.
Features with higher information gain are considered more
informative.
Cont..
6. Recursive Feature Addition (RFA): Similar to RFE but in the opposite direction.
• Start with an empty set and add features in each iteration based on their relevance.
7. SelectKBest and SelectPercentile: From the scikit-learn library, these functions allow you to select the top k features or the top percentage of features based on statistical tests.
TRAINING MODEL PARAMETERS
• Training a machine learning model involves setting its parameters to
specific values so that it can learn patterns from the training data.
• Parameters are the internal variables that the model adjusts during
the training process.
• The values of these parameters determine the performance and
behavior of the model.
• Here are some key concepts related to training model parameters:
1. Hyperparameters:
• Hyperparameters are external configuration settings for the model.
They are set before the training process begins and are not learned
from the data.
• Examples of hyperparameters include the learning rate,
regularization strength, the number of hidden layers in a neural
network, and the number of trees in a random forest.
• Tuning hyperparameters is a critical step in optimizing the
performance of a machine learning model
Cont.…
2.Learning Rate: The learning rate is a hyperparameter that controls the
step size during the optimization process.
It determines how much the model's parameters are updated in each iteration.
• Too high of a learning rate can cause the model to overshoot the optimal
values, while too low of a learning rate can lead to slow convergence.
3. Regularization:
• Regularization is a technique used to prevent overfitting by adding a
penalty term to the loss function based on the complexity of the model.
• Common regularization methods include L1 regularization (Lasso) and L2
regularization (Ridge).
• The strength of regularization is controlled by a hyperparameter.
Cont.…
4. Number of Hidden Layers and Neurons (for Neural Networks):
In neural networks, the architecture is defined by the number of hidden
layers and the number of neurons (nodes) in each layer.
The choice of architecture depends on the complexity of the problem
and the amount of available data.
These are hyperparameters that need to be tuned.
5. Batch Size: Batch size is the number of training examples used in
one iteration of gradient descent.
It is a hyperparameter that affects the convergence and computational
efficiency of the training process.
Cont.…
6.Number of Trees (for Tree-based Models): In ensemble models like
random forests or gradient boosting, the number of trees is a
hyperparameter.
• Increasing the number of trees can improve model performance, but it
also increases computational complexity.
7. Activation Functions (for Neural Networks): Activation functions
control the output of each neuron in a neural network.
• Common activation functions include ReLU, Sigmoid, and Tanh.
• Choosing the appropriate activation function is a hyperparameter
decision.
Cont.…
8.Loss Function: The loss function measures the difference between the
model's predictions and the actual target values.
Different models and tasks may require different loss functions (e.g., mean
squared error for regression, cross-entropy for classification).
9. Optimizer: The optimizer is the algorithm used to update the model's
parameters during training.
Examples include Stochastic Gradient Descent (SGD), Adam, and RMSprop.
The choice of optimizer is a hyperparameter.
10. Epochs: An epoch is one complete pass through the entire training
dataset.
The number of epochs is a hyperparameter that determines how many times the
model will see the entire dataset during training.
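• Hyperparameters such as the number of trees or tree depth are typically tuned by searching over candidate values with cross-validation. The sketch below uses GridSearchCV; the estimator and the parameter grid are illustrative assumptions.
# Hyperparameter tuning sketch with GridSearchCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
param_grid = {'n_estimators': [50, 100],   # number of trees
              'max_depth': [3, 5, None]}   # maximum tree depth
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5, scoring='accuracy')
search.fit(X, y)
print("Best hyperparameters:", search.best_params_)
print("Best cross-validated accuracy:", search.best_score_)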
EVALUATION
• When evaluating the performance of a classification model, several metrics are used to assess its effectiveness in predicting the correct class labels. Here are some commonly used metrics:
1. Sensitivity (True Positive Rate or Recall): Sensitivity measures the proportion of actual positive instances that are correctly identified by the model. Sensitivity = True Positives / (True Positives + False Negatives)
2. Specificity (True Negative Rate): Specificity measures the proportion of actual negative instances that are correctly identified by the model. Specificity = True Negatives / (True Negatives + False Positives)
Cont.…
3.Precision (Positive Predictive Value): Precision measures the
proportion of predicted positive instances that are actually positive.
Precision = True Positives / (True Positives + False Positives)
4. Negative Predictive Value (NPV): NPV measures the proportion of
predicted negative instances that are actually negative.
NPV = True Negatives / (True Negatives + False Negatives)
5. False Positive Rate (FPR): FPR measures the proportion of actual
negative instances that are incorrectly classified as positive by the
model. FPR = False Positives / (False Positives + True Negatives)
Cont.…
6. Accuracy: Accuracy measures the overall correctness of the model, considering both true positive and true negative instances. Accuracy = (True Positives + True Negatives) / Total Instances
7. Receiver Operating Characteristic (ROC) Curve: The ROC curve is a graphical representation of the trade-off between sensitivity and specificity at various thresholds. It is created by plotting the true positive rate against the false positive rate at different threshold values.
Cont.…
8.Area Under the ROC Curve (AUC-ROC): AUC-ROC quantifies
the overall performance of a classification model.
A higher AUC indicates better discrimination between positive and
negative instances.
9. Precision-Recall Curve: Similar to the ROC curve, the precision-
recall curve is a graphical representation of the trade-off between
precision and recall at different thresholds
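• Both curves can be plotted with scikit-learn and matplotlib. The sketch below trains a simple classifier on synthetic data purely to obtain predicted probabilities; the dataset and model are illustrative assumptions.
# ROC and precision-recall curve sketch
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve, auc, precision_recall_curve
X, y = make_classification(n_samples=400, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
y_scores = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict_proba(X_test)[:, 1]
fpr, tpr, _ = roc_curve(y_test, y_scores)                        # ROC curve
plt.plot(fpr, tpr, label=f'AUC = {auc(fpr, tpr):.2f}')
plt.plot([0, 1], [0, 1], linestyle='--')                         # chance level
plt.xlabel('False Positive Rate'); plt.ylabel('True Positive Rate'); plt.legend(); plt.show()
precision, recall, _ = precision_recall_curve(y_test, y_scores)  # precision-recall curve
plt.plot(recall, precision)
plt.xlabel('Recall'); plt.ylabel('Precision'); plt.show()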
Cont.…
10. F1 Score: The F1 score is the harmonic mean of precision and
recall, providing a balance between the two metrics.
F1 Score = 2 * (Precision * Recall) / (Precision + Recall)
11. Matthews Correlation Coefficient (MCC): MCC takes into
account true positives, true negatives, false positives, and false
negatives to provide a balanced measure of classification
performance.
Cont.…
12. Balanced Accuracy: Balanced accuracy considers the imbalance in the distribution of classes and calculates an accuracy score that accounts for this imbalance.
13. Cohen's Kappa: Cohen's Kappa measures the agreement
between the predicted and actual labels, adjusted for the possibility of
random agreement.
14. Confusion Matrix: A confusion matrix provides a tabular
summary of the number of true positives, true negatives, false
positives, and false negatives.
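• All of the metrics above can be derived from the entries of the confusion matrix. A short sketch, with made-up labels purely for illustration:
# Computing evaluation metrics from a confusion matrix
from sklearn.metrics import confusion_matrix
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]    # actual labels (illustrative)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]    # predicted labels (illustrative)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)               # recall / true positive rate
specificity = tn / (tn + fp)               # true negative rate
precision = tp / (tp + fp)                 # positive predictive value (PPV)
npv = tn / (tn + fn)                       # negative predictive value
fpr = fp / (fp + tn)                       # false positive rate
accuracy = (tp + tn) / (tp + tn + fp + fn)
f1 = 2 * precision * sensitivity / (precision + sensitivity)
print(sensitivity, specificity, precision, npv, fpr, accuracy, f1)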
• Variables are named storage locations for data.
• Data Types include integers, floats, strings, and booleans, each
representing different kinds of data.
• Lists are mutable, ordered collections.
• Tuples are immutable, ordered collections.
• Dictionaries store key-value pairs and are mutable.
• Sets are unordered collections of unique items.
• Strings are immutable sequences of characters used for text.
VARIABLE AND TYPES
1.Variables: In Python, variables are used to store data. They are
created by assigning a value to a name using the ‘= ‘operator.
Python is dynamically typed, meaning you don't need to declare the
type of a variable explicitly.
The type is inferred from the value assigned to the variable.
x = 5  # 'x' is a variable storing an integer
name = "Alice"  # 'name' is a variable storing a string
is_active = True  # 'is_active' is a variable storing a boolean
2. Data Types
• Integers (int): Whole numbers without a fractional part.
• Explanation: Integers can be positive, negative, or zero.
• Example: age = 25  # An integer value
• Floating-Point Numbers (float): Numbers that contain a decimal point.
• Explanation: Floats represent real numbers and can be positive or negative.
• Example: height = 5.9  # A floating-point value
Cont..
• Strings (str): Sequences of characters enclosed in quotes.
• Explanation: Strings are used to represent text and are immutable (cannot be changed after creation).
• Example: greeting = "Hello, world!"  # A string value
• Booleans (bool): A type with two possible values: True or False.
• Explanation: Booleans are often used in conditional statements and loops.
• Example: is_active = True  # A Boolean value
DATA STRUCTURES AND CONTAINERS
1. Lists: Lists are ordered collections of items that are mutable (can be
changed). They can contain items of different types, including other
lists.
• Characteristics:
• Ordered: Items maintain the order in which they were added.
• Indexed: Items can be accessed using their index (position).
• Mutable: You can change, add, or remove items after creation.
2. Tuples
Definition: Tuples are ordered collections of items that are immutable
(cannot be changed once created). They can contain items of different
types.
Characteristics:
• Ordered: Items maintain the order in which they were added.
• Indexed: Items can be accessed using their index.
• Immutable: Once created, the items in a tuple cannot be modified
3. Dictionaries
• Definition: Dictionaries are unordered collections of key-value pairs.
Each key must be unique, and values are accessed via their keys.
• Characteristics:
• Unordered: The order of items is not guaranteed (Python 3.7+
maintains insertion order).
• Key-Value Pairs: Data is stored in pairs, where each key maps to a
specific value.
• Mutable: You can add, change, or remove key-value pairs.
4. Sets
Definition: Sets are unordered collections of unique items. They are
useful for membership tests, removing duplicates, and set operations
(like union, intersection).
• Characteristics:
• Unordered: The order of items is not guaranteed.
• Unique Elements: Sets do not allow duplicate values.
• Mutable: You can add or remove items, but the items themselves must
be immutable.
5. Strings: Strings are immutable sequences of characters. They are used for storing and manipulating text data.
Characteristics:
• Immutable: Once created, the contents of a string cannot be changed.
• Indexed: Characters in a string can be accessed using their index.
• Support Various Methods: Strings have a variety of methods for
manipulation (e.g., .upper(), .replace()).
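• A short sketch of these container types (the example values are arbitrary):
vitals = [98.6, 99.1, 97.8]              # list: ordered and mutable
vitals.append(100.2)
point = (12.5, 7.3)                      # tuple: ordered and immutable
patient = {'id': 101, 'name': 'Alice'}   # dictionary: key-value pairs
patient['age'] = 30
codes = {'I10', 'E11', 'I10'}            # set: duplicates are dropped automatically
note = "Patient stable"                  # string: immutable text
print(note.upper(), len(vitals), patient['name'], codes, point[0])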
PANDAS DATA FRAME: OPERATIONS
• Creating a DataFrame: From various sources (e.g., dictionary, CSV).
• Viewing Data: Using .head(), .tail(), .info(), and .describe().
• Selecting Data: Accessing rows and columns with .loc[], .iloc[], and
conditions.
• Modifying Data: Adding, updating, or deleting columns and values.
• Aggregating Data: Calculating statistics, grouping, and pivot tables.
• Sorting and Ordering: Sorting by values or index.
• Handling Missing Data: Detecting, filling, or dropping missing values.
• Merging and Joining: Combining DataFrames based on common
columns or indices.
Cont…
Pandas is a powerful data manipulation library in Python. DataFrames are two-dimensional labeled data structures.
import pandas as pd
# Creating a DataFrame
data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [28, 25, 32],
        'City': ['New York', 'San Francisco', 'Los Angeles']}
df = pd.DataFrame(data)
# Displaying the DataFrame
print(df)
Operations on Pandas DataFrames:
# Accessing columns
print(df['Name'])
print(df.Age)
# Descriptive statistics
print(df.describe())
# Filtering data
filtered_df = df[df['Age'] > 25]
# Adding a new column
df['Salary'] = [50000, 60000, 75000]
# Grouping data (mean of the numeric 'Age' column per city)
grouped_df = df.groupby('City')['Age'].mean()
# Merging Data Frames
other_data = {'City': ['New York', 'San Francisco', 'Los Angeles'],
'Population': [8500000, 870887, 3980400]}
other_df = pd.DataFrame(other_data)
merged_df = pd.merge(df, other_df, on='City')
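• A few more common operations, continuing with the DataFrame df created above; this short sketch illustrates viewing, selection, sorting, and handling missing data.
# Viewing data
print(df.head(2))                        # first rows
print(df.info())                         # column types and non-null counts
# Selecting data
print(df.loc[0, 'Name'])                 # label-based selection
print(df.iloc[1:3, 0:2])                 # position-based selection
# Sorting
print(df.sort_values(by='Age', ascending=False))
# Handling missing data
print(df.isna().sum())                   # count missing values per column
df_filled = df.fillna({'Salary': 0})     # fill any missing salaries with 0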
SCIKIT-LEARN: PREPROCESSING, FEATURE SELECTION
• Scikit-learn is a popular machine learning library in Python that
provides tools for data preprocessing, feature selection, and various
machine learning algorithms.
Preprocessing:
• Preprocessing involves preparing your data to improve the
performance of your machine learning model.
• Common preprocessing tasks include scaling features, encoding
categorical variables, and handling missing values.
1. Scaling Features:
• Standardization: This process involves scaling features so they have a mean of 0 and a standard deviation of 1. It is useful when features have different units or scales.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
• Normalization: This scales features to lie within a specific range, usually [0, 1].
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_normalized = scaler.fit_transform(X)
2. Encoding Categorical Variables:
• One-Hot Encoding: Converts categorical variables into a format that can be provided to ML algorithms to do a better job in prediction.
from sklearn.preprocessing import OneHotEncoder
encoder = OneHotEncoder(sparse=False)
X_encoded = encoder.fit_transform(X_categorical)
• Label Encoding: Converts categorical labels into numeric values.
from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y_categorical)
3. Handling Missing Values:
• Imputation: Filling missing values using the mean, median, or mode.
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(strategy='mean')
X_imputed = imputer.fit_transform(X)
Cont.…
4. Feature Engineering: sklearn.preprocessing.PolynomialFeatures generates polynomial and interaction features from the original features, which can be useful for models that benefit from non-linear relationships.
5. Dimensionality Reduction: sklearn.decomposition.PCA reduces the dimensionality of data by projecting it onto a lower-dimensional space while preserving as much variance as possible. sklearn.decomposition.NMF factorizes the data into non-negative matrices, useful for dimensionality reduction and feature extraction.
6. Pipeline Integration: sklearn.pipeline.Pipeline combines multiple preprocessing steps and modeling into a single object, allowing for streamlined and reproducible workflows. For example, you can chain feature selection, preprocessing, and model training in one pipeline, as in the sketch below.
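• A minimal sketch of such a pipeline is shown below; the dataset and the particular steps (scaling, SelectKBest, PCA, logistic regression) are illustrative choices, not the only possibilities.
# Chaining preprocessing, feature selection, and a model in one Pipeline
from sklearn.datasets import load_breast_cancer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
X, y = load_breast_cancer(return_X_y=True)
pipe = Pipeline([('scale', StandardScaler()),               # preprocessing
                 ('select', SelectKBest(f_classif, k=15)),  # feature selection
                 ('pca', PCA(n_components=5)),              # dimensionality reduction
                 ('clf', LogisticRegression(max_iter=1000))])
scores = cross_val_score(pipe, X, y, cv=5)
print("Mean cross-validated accuracy:", scores.mean())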
Feature Selection
• Feature selection involves choosing the most relevant features for your model.
• It can help in reducing overfitting, improving model performance, and speeding up the training process.
1. Filter Methods:
• Univariate Selection: Selects features based on univariate statistical tests.
from sklearn.feature_selection import SelectKBest, chi2
selector = SelectKBest(score_func=chi2, k=10)
X_new = selector.fit_transform(X, y)
• Variance Threshold: Removes features with low variance.
from sklearn.feature_selection import VarianceThreshold
selector = VarianceThreshold(threshold=0.01)
X_reduced = selector.fit_transform(X)
2. Wrapper Methods:
• Recursive Feature Elimination (RFE): Recursively removes the least important features.
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
rfe = RFE(model, n_features_to_select=5)
X_rfe = rfe.fit_transform(X, y)
3. Embedded Methods:
• Feature Importance from Trees: Tree-based methods like Random Forest can be used to determine feature importance.
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()
model.fit(X, y)
importances = model.feature_importances_
Cont…
• L1 Regularization (Lasso): Can be used for feature selection by penalizing the absolute size of the coefficients.
from sklearn.linear_model import Lasso
model = Lasso(alpha=0.1)
model.fit(X, y)
selected_features = model.coef_ != 0
Nptel links
• NPTEL video lecture on scikit-learn (original slide linked a Google search result; video ID 4Lo10fugSOE)
• NPTEL video lecture on Pandas DataFrame operations (original slide linked a Google search result; video ID 6DTFIKF8QIg)