Alfeo, Introduction To PyCharm, PyLint, PyTest, and CVS
OVERVIEW
1. Installation
2. Work with PyCharm projects
   I. Configure the interpreter
   II. Requirements management
   III. Project navigation and run
3. Code quality checking
   I. Pylint
   II. Plugin installation and usage
8. Python basics to design a structured project pipeline
   I. Tabular data with pandas.DataFrame
   II. Dataframe manipulation
   III. Data segregation with Sklearn
   IV. Model hyperparametrization
   V. From JSON to Object
   VI. Performance evaluation
   VII. Model deployment
   VIII. Project profiling
INSTALLATION
INSTALLATION 1/3
1. Install Git using only the default settings in the installation process
• https://git-scm.com/downloads
2. Install PyCharm
• Linux -> https://download.jetbrains.com/python/pycharm-community-anaconda-2020.2.3.tar.gz
• Windows -> https://download.jetbrains.com/python/pycharm-community-anaconda-2020.2.3.exe
• macOS -> https://download.jetbrains.com/python/pycharm-community-anaconda-2020.2.3.dmg
INSTALLATION 2/3
During PyCharm installation enable “open folder as project”
INSTALLATION 3/3
• Accept the JetBrains Privacy Policy
• Choose the UI theme you prefer
• Do not install any featured plugin
• Install Miniconda: it includes the conda environment manager, Python, the packages they depend on, and a small number of other useful packages (e.g. pip).
• Start using PyCharm
OPEN A PROJECT
1. Unzip “simpleClassifier.rar”
GUI
[Figure: PyCharm main window, with the menu, the toolbar, and the Run/Debug buttons labelled]
INTERPRETER CONFIGURATION
• File > Settings > Project > Python interpreter > Show all
• Then: Conda Environment > New environment
• The first part of the Location and conda executable paths may differ slightly, depending on where miniconda3 is installed.
• Apply > Ok
WARNING
• Each package can also be installed via the GUI or from the Terminal, using:
• “pip install <name_lib>” to install a single package
• “pip install -r requirements.txt” to install the packages listed in requirements.txt
2. Select “Python”
4. Click OK
2. Search for “pylint”
3. Install the Pylint plugin
4. Restart PyCharm
1. Double click on a Python file in the Project View
• Code navigation.
ENABLE PYTEST
1. Open File > Settings > Tools > Python Integrated Tools
2. Set “pytest” as the “Default test runner”
“-q --disable-warnings --tb=no”
-q, quiet output
--disable-warnings, don't show most of the warnings
--tb=no, don't show any traceback
TEST PARAMETRIZATION
VERSION CONTROL ON
PYCHARM
VERSION CONTROL
In PyCharm, version control can be accessed locally:
1. via VCS > Local History
2. Local History stores only local and recent changes in the project, highlighting them
2. Click Add
… ON GITHUB
Let’s say you have a central repository for your project: the contribution of each coder will be highlighted by the commits they made. The next slide shows how to “sign” your commits.
Color code:
- Blue: modified
- Green: commit done
- Red: not in the VCS
- White: push done
Explain clearly the change provided.
After the first commit, the “before and after” comparison will appear here.
RECAP
PyCharm is an integrated development environment (IDE) widely used in the Python community. PyCharm provides:
• Project and code navigation: create a Python project by opening a folder as a project or from scratch, by creating or importing the Python files. Navigate the project with specialized project views, and jump between files, classes, methods and usages.
• Python refactoring: rename, extract methods, introduce variables and constants.
• Configurable Python interpreter and virtual environment support.
• Coding assistance and analysis: code completion, syntax and error highlighting, quick fixes, guided import, integrated Python debugger, linting and unit testing with code coverage.
• Version control integration: a unified user interface for many VCSs and hosting services such as GitHub, supporting commit, push, pull, changelists, cloning, and even branching, merging, and conflict resolution via Git.
MORE RESOURCES…
ON LINTING
• https://www.python.org/dev/peps/pep-0008/
• https://plugins.jetbrains.com/plugin/11084-pylint
ON TESTING
• https://www.jetbrains.com/help/pycharm/pytest.html
• https://www.jetbrains.com/pycharm/guide/technologies/pytest/
• pytest.mark lets you set metadata on your test functions. Some built-in markers other than pytest.mark.parametrize are:
  • @pytest.mark.skip: always skip a test function
  • @pytest.mark.skipif: skip a test function if a certain condition is met
  • @pytest.mark.xfail: produce an expected failure outcome if a certain condition is met
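The three markers listed above could be used as in the following minimal sketch. The test names, conditions, and reasons are made up for illustration and are not part of the lab project.

```python
import sys

import pytest


@pytest.mark.skip(reason="illustration: this test is never run")
def test_always_skipped():
    assert False  # never executed, because the test is always skipped


@pytest.mark.skipif(sys.version_info < (3, 8), reason="needs Python 3.8 or newer")
def test_skipped_on_old_python():
    # Runs only when the interpreter is at least Python 3.8
    assert sum([1, 2]) == 3


@pytest.mark.xfail(sys.platform == "win32", reason="illustration: expected to fail on Windows")
def test_expected_failure_on_windows():
    # On Windows the failure is reported as "xfailed" instead of "failed"
    assert sys.platform != "win32"
```

Running pytest on a file containing these functions should report the first test as skipped; the outcome of the other two depends on the interpreter version and the platform.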
ON VERSION CONTROL
PYCHARM DOCS
• https://www.jetbrains.com/help/pycharm/enabling-version-control.html
• https://www.jetbrains.com/help/pycharm/manage-projects-hosted-on-github.html
• https://www.jetbrains.com/help/pycharm/contribute-to-projects.html
VIDEO TUTORIAL
• https://www.youtube.com/watch?v=jFnYQbUZQlA
• https://www.youtube.com/watch?v=_w9XWHDSSa4
• https://www.youtube.com/watch?v=AHkiCKG-JhM
EXERCISE
1. Run IrisClassifier.py with a different number of training epochs, i.e. the "max_iter" parameter described in the MLPClassifier() documentation page. How does the performance change when using 50 epochs?
A POSSIBLE SOLUTION
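import pytest
from IrisClassifier import IrisClassifier  # assumed: the class is defined in IrisClassifier.py

# Numbers of training epochs to test (illustrative values; 50 comes from the exercise)
epochs = [50, 200]

@pytest.mark.parametrize('epoch', epochs)
def test_evaluations(epoch):
    # Run the whole pipeline once for each number of epochs
    i = IrisClassifier(epoch)
    i.ingestion()
    i.segregation()
    i.train()
    res = i.evaluation()
    assert res > 0.75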
PROJECT ORGANIZATION
• Each class in your project should provide one or more functionalities that work with inputs and configurations to generate outputs. Inputs, configurations and outputs have to be stored in dedicated folders. Each class should be able to save/load data and configurations from those folders.
• Let’s start with a very simple example. We aim to classify a yearly set of time series. We need at least 3 folders:
  • data -> tabular data, from raw to segregated form
  • configs -> configurations and models
  • results -> outcomes, figures and scores obtained
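The folder layout above can be sketched with the standard library only. The ProjectFolders class and its method names are hypothetical helpers introduced for illustration; the demo runs in a temporary directory so that it does not touch a real project.

```python
import json
import tempfile
from pathlib import Path


class ProjectFolders:
    """Hypothetical helper that creates and uses the three project folders."""

    def __init__(self, root="."):
        self.root = Path(root)
        self.data = self.root / "data"
        self.configs = self.root / "configs"
        self.results = self.root / "results"
        # Create the folders if they do not exist yet
        for folder in (self.data, self.configs, self.results):
            folder.mkdir(parents=True, exist_ok=True)

    def save_config(self, name, config):
        """Save a configuration dictionary as configs/<name>.json."""
        path = self.configs / f"{name}.json"
        with open(path, "w") as f:
            json.dump(config, f)
        return path

    def load_config(self, name):
        """Load the configuration stored in configs/<name>.json."""
        with open(self.configs / f"{name}.json") as f:
            return json.load(f)


# Demo in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    folders = ProjectFolders(tmp)
    folders.save_config("modelConfiguration", {"n_estimators": 3})
    loaded = folders.load_config("modelConfiguration")
    created = sorted(p.name for p in folders.root.iterdir())

print(created, loaded)
```

Keeping the paths in one place means each class of the pipeline can receive the same object instead of hard-coding folder names.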
2. DATAFRAME MANIPULATION
WITH PANDAS.DATAFRAME
from pandas import read_csv

# Read csv
data = read_csv('data/completeDataset.csv')
# Check the content of the dataframe:
# describe() provides statistics for each column in the dataframe
print(data.describe())
# Remove column with constant values:
# drop the column (axis=1) whose label is 'h24'
data = data.drop('h24', axis=1)
# Remove anomalous instances:
# loc selects instances by using a boolean array
data = data.loc[data['Anomalous'] < 1]
print(data.describe())
# Save as csv
data.to_csv('data/preprocessedDataset.csv', index=False)
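When the constant column is not known in advance, it can be detected rather than named. The sketch below works on a tiny in-memory dataframe with made-up values; only the column names 'h24' and 'Anomalous' come from the slide, while 'h1' and all the numbers are assumptions for illustration.

```python
import pandas as pd

# Tiny illustrative dataframe (made-up values): 'h24' is constant
data = pd.DataFrame({
    "h1": [0.2, 0.5, 0.9, 0.4],
    "h24": [1.0, 1.0, 1.0, 1.0],
    "Anomalous": [0, 1, 0, 0],
})

# A column with a single distinct value is constant
constant_columns = [col for col in data.columns if data[col].nunique() == 1]
data = data.drop(constant_columns, axis=1)

# Boolean mask: keep only the rows that are not flagged as anomalous
data = data.loc[data["Anomalous"] < 1]

print(constant_columns)
print(data)
```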
4. ML MODEL PARAMETRIZATION
VIA GRID SEARCH
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from pandas import read_csv
from numpy import ravel
import json

# Read the data
training_data = read_csv('data/trainingData.csv')
training_labels = read_csv('data/trainingLabels.csv')
# Set up grid search for a RandomForestClassifier, the ML model
# (random_state is used for reproducibility)
rf = RandomForestClassifier(random_state=0)
# Values to test: n_estimators is the name of a hyperparameter
# of RandomForestClassifier
parameters = {'n_estimators': (1, 2, 3)}
# Apply grid search; ravel transforms the labels dataframe into a 1-D ndarray
gs = GridSearchCV(rf, parameters)
gs.fit(training_data, ravel(training_labels))
# Save the best hyperparameters as JSON
config_path = 'config/modelConfiguration.json'
with open(config_path, 'w') as f:
    json.dump(gs.best_params_, f)
import json

# Use a simple class to get the model's parameters
class Params:
    def __init__(self, n_estimators):
        self.n_estimators = n_estimators

# Load the best configuration from the JSON file.
# Remember, it should be compliant with the expected JSON schema,
# i.e. pass the validation
config_path = 'config/modelConfiguration.json'
with open(config_path, 'r') as f:
    params = json.load(f)

# Create a new object, passing the JSON dictionary as keyword arguments
# to convert the JSON data into a custom Python object
params_object = Params(**params)
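The validation step mentioned above is not shown on the slide. A full JSON Schema check would normally rely on a dedicated library such as jsonschema; the sketch below is only a hand-written stand-in that checks key names and types with the standard library. EXPECTED_FIELDS and validate are hypothetical names, and the JSON string replaces the configuration file.

```python
import json


class Params:
    def __init__(self, n_estimators):
        self.n_estimators = n_estimators


# Hypothetical, hand-written "schema": expected keys and their types
EXPECTED_FIELDS = {"n_estimators": int}


def validate(params):
    """Raise ValueError unless params has exactly the expected keys and types."""
    if set(params) != set(EXPECTED_FIELDS):
        raise ValueError(f"unexpected keys: {sorted(params)}")
    for key, expected_type in EXPECTED_FIELDS.items():
        if not isinstance(params[key], expected_type):
            raise ValueError(f"{key} should be of type {expected_type.__name__}")
    return params


# Stand-in for the content of the configuration file
params = validate(json.loads('{"n_estimators": 3}'))
params_object = Params(**params)
print(params_object.n_estimators)
```

Validating before calling Params(**params) turns a confusing TypeError about unexpected keyword arguments into an explicit error about the configuration file.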
6. ML MODEL PERFORMANCE ASSESSMENT
from numpy import ravel
from pandas import read_csv
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
import json
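Only the imports of this slide are available here. As a rough sketch of how such an assessment could proceed, the example below trains a RandomForestClassifier with parameters read from JSON and scores it with accuracy_score. The Iris dataset and the inline JSON string are stand-ins chosen so that the example is self-contained; they are assumptions, not the files used in the lab.

```python
import json

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): the lab reads its own CSV files instead
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Stand-in for the JSON configuration saved by the grid search
params = json.loads('{"n_estimators": 3}')

# Build the model with the loaded hyperparameters and train it
model = RandomForestClassifier(random_state=0, **params)
model.fit(X_train, y_train)

# Compare the predictions on unseen data with the true labels
score = accuracy_score(y_test, model.predict(X_test))
print(f"accuracy: {score:.2f}")
```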
DATA PRE-PROCESSING
WITH SCIKIT-LEARN
• Data discretization
• Data normalization with min-max or standard scaler
• Feature computation and extraction with NumPy and SciPy
• Split data into training and testing sets
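Two of the steps above, splitting and min-max normalization, can be sketched as follows. The data are made up, and the 75%/25% split is an assumption borrowed from the final exercise. The scaler is fitted on the training set only, so that no information from the testing set leaks into the preprocessing.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Made-up data: 8 instances with 2 features, and binary labels
X = np.arange(16, dtype=float).reshape(8, 2)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# 75% training, 25% testing; stratify keeps the class proportions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit the scaler on the training data, then apply it to both sets
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train.shape, X_test.shape)
```

Note that the scaled testing values may fall slightly outside [0, 1], since the minimum and maximum come from the training set.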
ML MODEL IMPLEMENTATION
WITH SCIKIT-LEARN
• MLPClassifier (classification)
• RandomForestClassifier (classification)
• DecisionTreeRegressor (regression)
• MLPRegressor (regression)
• KMeans (partition-based clustering)
• DBSCAN (density-based clustering)
• IsolationForest (anomaly detection)
• OneClassSVM (anomaly detection)
ARCHITECTURE EVALUATION
• Choose the right performance metric
• Performance visualization
• K-Folds and Stratified-K-Fold cross-validation
• Hyperparameters search via grid search
• Import/export object parametrization with JSON
• Validate the JSON schema
• Dump/load fitted model with joblib
• How much time does an instruction take to execute?
• Official PyCharm profiling tool (Professional edition only)
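To answer the timing question without the PyCharm profiler, one option is the timeit module of the standard library. The function sum_of_squares below is a made-up workload used only for illustration.

```python
import timeit


def sum_of_squares(n):
    """Made-up workload to be timed."""
    return sum(i * i for i in range(n))


# Call the function 100 times per measurement and repeat the measurement
# 5 times; the minimum is the least affected by other processes
timings = timeit.repeat(lambda: sum_of_squares(1000), number=100, repeat=5)
best_per_call = min(timings) / 100

print(f"best time per call: {best_per_call * 1e6:.1f} microseconds")
```

The printed value depends on the machine, so no expected output is given here.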
EXERCISE
• Given the csvs used in this lab, build a few classes that:
  • read the csvs “1” and “2” and append them
  • remove the columns with constant values
  • select only data from Cluster 0 (‘Cluster’ = 0)
  • generate training and testing sets (respectively 75% and 25% of the whole dataset) using only columns ‘h1’-‘h23’ as data and the column ‘Anomalous’ as target
  • build an ML model using IsolationForest
  • train the model with the training data
  • predict the anomalous label with the testing data
  • compute the accuracy of your model. Consider that in your labels 1 marks anomalous instances, whereas IsolationForest returns -1 for anomalous instances.
  • save the trained ML model
  • load the trained model and use it with the “New” data
  • print the resulting accuracy on screen.
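The label mismatch mentioned in the exercise can be handled with a simple mapping before computing the accuracy. The sketch below uses made-up arrays in place of real model output, and assumes that normal instances are labelled 0 in the ‘Anomalous’ column.

```python
import numpy as np

# Made-up example values
true_labels = np.array([0, 0, 1, 0, 1])        # dataset labels: 1 = anomalous
forest_output = np.array([1, 1, -1, -1, -1])   # IsolationForest: -1 = anomalous, 1 = normal

# Map -1 -> 1 (anomalous) and 1 -> 0 (normal) to match the dataset labels
predicted_labels = np.where(forest_output == -1, 1, 0)

# Fraction of instances whose mapped prediction matches the true label
accuracy = np.mean(predicted_labels == true_labels)
print(predicted_labels, accuracy)
```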