Scikit-Learn Cheat Sheet
by Numan
Scikit-Learn is a Python library that provides simple and
efficient tools for data mining, data analysis, and
machine learning.
Data Preprocessing
sklearn.preprocessing.StandardScaler()
Standardizes features by removing the mean and
scaling to unit variance.
sklearn.preprocessing.MinMaxScaler()
Scales features to a given range, typically [0, 1].
sklearn.preprocessing.OneHotEncoder()
Converts categorical values into one-hot encoded
binary vectors.
sklearn.preprocessing.LabelEncoder()
Encodes target labels as integers between zero and the
number of classes minus one.
sklearn.impute.SimpleImputer()
Handles missing values by replacing them with specified
values (e.g., mean, median).
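A minimal sketch of the preprocessing tools above applied to toy data. The arrays and variable names are made up for illustration; only the scikit-learn classes come from the cheat sheet.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import (
    LabelEncoder,
    MinMaxScaler,
    OneHotEncoder,
    StandardScaler,
)

# Toy numeric data with one missing value (illustrative only).
X_num = np.array([[1.0, 10.0], [2.0, np.nan], [3.0, 30.0]])

# Replace NaN with the column mean: (10 + 30) / 2 = 20.
X_filled = SimpleImputer(strategy="mean").fit_transform(X_num)

# Standardize each column to zero mean and unit variance.
X_std = StandardScaler().fit_transform(X_filled)

# Rescale each column to the default range [0, 1].
X_minmax = MinMaxScaler().fit_transform(X_filled)

# One-hot encode a categorical feature column (expects 2-D input).
colors = np.array([["red"], ["green"], ["red"]])
X_onehot = OneHotEncoder().fit_transform(colors).toarray()

# Integer-encode target labels (expects 1-D input); classes are sorted.
y = LabelEncoder().fit_transform(["cat", "dog", "cat"])
```

In practice the scalers and the imputer would usually be fit on the training split only and then applied to the test split with transform().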
Classification Models
sklearn.linear_model.LogisticRegression()
Applies logistic regression for binary or multiclass
classification tasks.
sklearn.tree.DecisionTreeClassifier()
A decision tree classifier that uses a tree structure to
make predictions.
sklearn.ensemble.RandomForestClassifier()
A meta-estimator that fits a number of decision trees
on various sub-samples of the dataset and averages
their predictions.
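A hedged sketch of fitting the three classifiers above on a held-out split. The iris dataset, the 80/20 split via train_test_split, and the hyperparameter values are assumptions chosen for illustration, not part of the original sheet.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Bundled toy dataset: 150 samples, 3 classes (assumed for the example).
X, y = load_iris(return_X_y=True)

# Hold out 20% of the rows for testing; stratify keeps class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model on the training split and score it on the test split.
test_accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_accuracy[name] = model.score(X_test, y_test)
```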
Model Evaluation (Classification)
sklearn.metrics.accuracy_score()
Calculates the accuracy classification score
(proportion of correct predictions).
sklearn.metrics.precision_score()
Measures precision: the fraction of samples predicted
positive that are actually positive.
sklearn.metrics.recall_score()
Measures recall, which is the ability of the classifier to
find all positive samples.
sklearn.metrics.f1_score()
Computes the F1 score, which balances precision and
recall.
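The four classification metrics above can be checked by hand on a small example. The labels below are made up; they give 3 true positives, 1 false positive, 1 false negative, and 1 true negative.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
)

# Illustrative binary labels (1 = positive class).
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / total = 4 / 6
precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3 / 4
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 3 / 4
f1 = f1_score(y_true, y_pred)                # harmonic mean of the two = 0.75
```

For multiclass targets, precision_score, recall_score, and f1_score need an explicit average argument such as "macro" or "weighted".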
Model Evaluation (Regression)
sklearn.metrics.mean_absolute_error()
Computes the mean absolute error for regression
tasks.
sklearn.metrics.mean_squared_error()
Calculates the MSE regression loss: the average squared
difference between predicted and actual values.
sklearn.metrics.r2_score()
Calculates R squared (coefficient of determination), the
proportion of target variance explained by the model.
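A small worked example of the regression metrics above. The target and prediction values are invented so the arithmetic is easy to follow.

```python
from sklearn.metrics import (
    mean_absolute_error,
    mean_squared_error,
    r2_score,
)

# Illustrative targets and predictions; errors are 1, 0, -2, -1.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]

mae = mean_absolute_error(y_true, y_pred)  # (1 + 0 + 2 + 1) / 4 = 1.0
mse = mean_squared_error(y_true, y_pred)   # (1 + 0 + 4 + 1) / 4 = 1.5
r2 = r2_score(y_true, y_pred)              # 1 - SS_res / SS_tot = 1 - 6 / 14.75
```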
Cross-Validation
sklearn.model_selection.cross_val_score(
    estimator=model,
    X=X_train,
    y=y_train,
    cv=5,  # number of folds (splitting strategy)
    scoring='accuracy',
)
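The call above returns one score per fold. A runnable sketch, assuming the iris dataset as training data and a LogisticRegression as the model (both are stand-ins, not from the original sheet):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Stand-in training data and model for the names used in the call above.
X_train, y_train = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Train and score the model on 5 different train/validation splits.
scores = cross_val_score(
    estimator=model,
    X=X_train,
    y=y_train,
    cv=5,
    scoring="accuracy",
)

# A common summary is the mean of the per-fold scores.
mean_score = scores.mean()
```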
sklearn.model_selection.GridSearchCV()
Performs exhaustive search over specified
parameter values for an estimator.
Basically, brute-force search.
sklearn.model_selection.RandomizedSearchCV()
Randomly samples a fixed number of parameter settings
(n_iter). Usually cheaper than GridSearchCV.
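A minimal sketch comparing the two searches on the same parameter grid. The dataset, the grid values, and n_iter=3 are assumptions picked to keep the example fast.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# 2 x 3 = 6 candidate settings (illustrative values).
param_grid = {"n_estimators": [10, 50], "max_depth": [2, 4, None]}

# Exhaustive search: evaluates all 6 settings with 3-fold CV.
grid = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
grid.fit(X, y)

# Randomized search: evaluates only n_iter=3 sampled settings.
rand = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_grid,
    n_iter=3,
    cv=3,
    random_state=0,
)
rand.fit(X, y)

best_grid_params = grid.best_params_
best_rand_params = rand.best_params_
```

After fitting, best_estimator_ on either object holds the model refit on all the data with the best setting found.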
Pipeline Creation
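from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline([
    ('scaler', StandardScaler()),  # step 1: standardize features
    ('classifier', RandomForestClassifier()),  # step 2: fit the model
])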
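Once built, a pipeline behaves like a single estimator. A hedged usage sketch, assuming the iris dataset and an 80/20 split (neither is in the original sheet):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", RandomForestClassifier(random_state=0)),
])

# fit() fits the scaler on the training data, then the classifier
# on the scaled training data.
pipeline.fit(X_train, y_train)

# predict() and score() apply the already-fitted scaler to the test data
# before calling the classifier, so test statistics never leak into training.
predictions = pipeline.predict(X_test)
accuracy = pipeline.score(X_test, y_test)
```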
Kostya Numan