
Scikit-Learn Cheat Sheet

Simple tools for data mining, data analysis, and machine learning

by Numan
Scikit-Learn is a Python library that provides simple and efficient tools for data mining, data analysis, and machine learning.
Data Preprocessing
sklearn.preprocessing.StandardScaler()
Standardizes features by removing the mean and
scaling to unit variance.

sklearn.preprocessing.MinMaxScaler()
Scales features to a given range, typically [0, 1].

sklearn.preprocessing.OneHotEncoder()
Converts categorical values into one-hot encoded
binary vectors.

sklearn.preprocessing.LabelEncoder()
Encodes labels with values between zero and the
number of classes minus one.

sklearn.impute.SimpleImputer()
Handles missing values by replacing them with specified
values (e.g., mean, median).
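A minimal sketch of how these preprocessing tools are typically combined (the arrays X_num and X_cat are illustrative examples, not part of the library):

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

X_num = np.array([[1.0, 200.0], [2.0, np.nan], [3.0, 600.0]])  # numeric features with a missing value
X_cat = np.array([['red'], ['blue'], ['red']])                 # a single categorical column

X_num = SimpleImputer(strategy='mean').fit_transform(X_num)    # replace NaN with the column mean
X_num = StandardScaler().fit_transform(X_num)                  # zero mean, unit variance per column
X_cat = OneHotEncoder().fit_transform(X_cat).toarray()         # one binary column per category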
Test-Train Split

Splits arrays or matrices into random train and test subsets.
train_data, test_data = sklearn.model_selection.train_test_split(
    data,              # array or matrix to split
    test_size=0.2,     # hold out 20% of the samples for testing
    shuffle=True,
    random_state=42,
)

Don’t forget to specify the random state, so that the results are reproducible!
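In practice you usually split a feature matrix and its labels together. A minimal sketch, assuming arrays X and y of matching length:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y,              # features and labels are split in the same order
    test_size=0.2,
    random_state=42,
)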
Model Training
sklearn.linear_model.LinearRegression()
Fits a linear model with coefficients to minimize the
residual sum of squares.

sklearn.linear_model.LogisticRegression()
Applies logistic regression for binary or multiclass
classification tasks.

sklearn.tree.DecisionTreeClassifier()
A decision tree classifier that uses a tree structure to
make predictions.

sklearn.ensemble.RandomForestClassifier()
A meta-estimator that fits a number of decision trees
on various sub-samples of the dataset.
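All of these estimators share the same fit/predict interface. A minimal sketch, assuming the X_train, y_train, and X_test arrays from the split above:

from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)        # learn from the training data
y_pred = model.predict(X_test)     # predict labels for unseen samples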
Model Evaluation (Classification)
sklearn.metrics.accuracy_score()
Calculates the accuracy classification score
(proportion of correct predictions).

sklearn.metrics.precision_score()
Measures precision; useful for binary classification to
assess the positive class.

sklearn.metrics.recall_score()
Measures recall, which is the ability of the classifier to
find all positive samples.

sklearn.metrics.f1_score()
Computes the F1 score, which balances precision and
recall.
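A minimal sketch of computing these scores, assuming true labels y_test and predictions y_pred for a binary classification task:

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

print(accuracy_score(y_test, y_pred))   # fraction of correct predictions
print(precision_score(y_test, y_pred))  # of predicted positives, how many are correct
print(recall_score(y_test, y_pred))     # of actual positives, how many were found
print(f1_score(y_test, y_pred))         # harmonic mean of precision and recall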
Model Evaluation (Regression)
sklearn.metrics.mean_absolute_error()
Computes the mean absolute error for regression
tasks.

sklearn.metrics.mean_squared_error()
Calculates the MSE regression loss, measuring how
close a regression line is to a set of data points.

sklearn.metrics.r2_score()
Calculates R squared - a regression performance
measure based on variance explained.
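A minimal sketch for regression metrics, assuming true targets y_true and predictions y_pred:

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

print(mean_absolute_error(y_true, y_pred))  # average absolute deviation
print(mean_squared_error(y_true, y_pred))   # average squared deviation
print(r2_score(y_true, y_pred))             # proportion of variance explained (1.0 is perfect)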
Cross-Validation

Evaluates a score by cross-validation on different subsets of the data.

scores = sklearn.model_selection.cross_val_score(
    estimator=model,
    X=X_train,
    y=y_train,
    cv=5,                # splitting strategy: 5 folds
    scoring='accuracy',
)

Learning the parameters of a prediction function and testing it on the same data will lead to overfitting.
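cross_val_score returns one score per fold; it is common to report their mean and spread (a minimal sketch, continuing the snippet above):

print(scores)                        # one accuracy value per fold
print(scores.mean(), scores.std())   # average accuracy across folds and its variability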
Hyperparameter Tuning

Hyperparameter tuning is the process of selecting the optimal values for a machine learning model’s hyperparameters.

sklearn.model_selection.GridSearchCV()
Performs exhaustive search over specified
parameter values for an estimator.
Basically, brute-force search.

sklearn.model_selection.RandomizedSearchCV()
Randomly samples parameter settings. Uses
fewer resources than GridSearchCV.
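A minimal sketch of a grid search, assuming the RandomForestClassifier from above and an illustrative parameter grid:

from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

param_grid = {'n_estimators': [50, 100, 200], 'max_depth': [None, 5, 10]}

search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    cv=5,                 # 5-fold cross-validation for every parameter combination
    scoring='accuracy',
)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)  # best combination and its cross-validated score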
Pipeline Creation

Use Pipeline to group multiple processing steps together:

pipeline = sklearn.pipeline.Pipeline([
    ('scaler', sklearn.preprocessing.StandardScaler()),         # step 1: standardize features
    ('classifier', sklearn.ensemble.RandomForestClassifier()),  # step 2: fit the classifier
])

You can still use hyperparameter tuning on your pipelines, as if they were models!
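For example, a pipeline parameter is addressed by its step name plus a double underscore (a minimal sketch, assuming the pipeline above):

param_grid = {'classifier__n_estimators': [50, 100, 200]}  # '<step name>__<parameter>'
search = sklearn.model_selection.GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)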
Don’t forget to subscribe!

Kostya Numan
