100% found this document useful (1 vote)

117 views11 pages

Random Forest: Implementaciones de Scikit-Learn Sobre QSAR

The document discusses implementations of random forest and ID3 decision tree algorithms from Scikit-Learn on a QSAR oral toxicity dataset. It loads and prepares the data, trains random forest and ID3 models on train data and tests them on test data. It reports the accuracy, classification report and ROC AUC scores. It also performs cross validation on the two models and hyperparameter tuning of random forest using randomized search.

Uploaded by

Richard Jimenez

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

100% found this document useful (1 vote)

117 views11 pages

Random Forest: Implementaciones de Scikit-Learn Sobre QSAR

Uploaded by

Richard Jimenez

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 11

13/4/2020 Scikit-Learn

Implementaciones de Scikit-learn sobre QSAR

Random Forest
In [2]:

import pandas as pd
import numpy as np
dataset= pd.read_csv("qsar_oral_toxicity.csv", sep=';', prefix='x', header=None)
dataset.head()

Out[2]:

x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 ... x1015 x1016 x1017 x1018 x1019 x1020

0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0

1 0 0 1 0 0 0 0 0 0 0 ... 0 0 0 0 0 0

2 0 0 0 0 0 0 0 0 0 0 ... 0 0 1 0 0 0

3 0 0 0 0 0 0 0 1 0 0 ... 0 0 0 0 0 0

4 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0

5 rows × 1025 columns

In [3]:

from sklearn import preprocessing, model_selection

enc = preprocessing.OrdinalEncoder()
enc.fit(dataset[['x1024']])
for i, cat in enumerate(enc.categories_[0]): print("{} -> {}".format(cat, i))
dataset['output'] = enc.transform(dataset[['x1024']])
dataset.head()

negative -> 0
positive -> 1

Out[3]:

x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 ... x1016 x1017 x1018 x1019 x1020 x1021

0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0

1 0 0 1 0 0 0 0 0 0 0 ... 0 0 0 0 0 0

2 0 0 0 0 0 0 0 0 0 0 ... 0 1 0 0 0 0

3 0 0 0 0 0 0 0 1 0 0 ... 0 0 0 0 0 0

4 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0

5 rows × 1026 columns

localhost:8888/nbconvert/html/Entrega2/ Scikit-Learn.ipynb?download=false 1/11

13/4/2020 Scikit-Learn

In [4]:

train, test = model_selection.train_test_split(dataset, test_size=0.2, random_state=42)

train.x1024.value_counts()

Out[4]:

negative 6609
positive 584
Name: x1024, dtype: int64

In [5]:

from sklearn.ensemble import RandomForestClassifier

X_train = train.iloc[:, 0:1024].values
Y_train = train.output
clf = RandomForestClassifier(n_estimators=100, max_features="sqrt", max_depth=None, min
_samples_split=2)
clf = clf.fit(X_train, Y_train)

In [6]:

X_test = test.iloc[:, 0:1024].values

Y_test = test.output
test_pred=clf.predict(X_test)

In [7]:

from sklearn import metrics

print("\nAcierto:", metrics.accuracy_score(test.output, test_pred))
print(metrics.classification_report(test.output, test_pred))

Acierto: 0.9394107837687604
precision recall f1-score support

0.0 0.95 0.99 0.97 1642

1.0 0.79 0.42 0.55 157

accuracy 0.94 1799

macro avg 0.87 0.70 0.76 1799
weighted avg 0.93 0.94 0.93 1799

In [8]:

from sklearn.metrics import roc_auc_score

roc_value = roc_auc_score(test.output, test_pred)
print(roc_value)

0.7047099622178948

ID3

localhost:8888/nbconvert/html/Entrega2/ Scikit-Learn.ipynb?download=false 2/11

13/4/2020 Scikit-Learn

In [9]:

from sklearn import tree

clf1 = tree.DecisionTreeClassifier()
clf1 = clf1.fit(X_train, Y_train)
test_pred1=clf1.predict(X_test)

In [10]:

print("\nAcierto:", metrics.accuracy_score(test.output, test_pred1))

print(metrics.classification_report(test.output, test_pred1))

Acierto: 0.9049471928849361
precision recall f1-score support

0.0 0.95 0.94 0.95 1642

1.0 0.46 0.52 0.49 157

accuracy 0.90 1799

macro avg 0.71 0.73 0.72 1799
weighted avg 0.91 0.90 0.91 1799

In [11]:

roc_value1 = roc_auc_score(test.output, test_pred1)

print(roc_value1)

0.731913853697138

Cross Validation
In [12]:

seed = 1
scoring = 'accuracy'

In [13]:

models = []
models.append(('CART', tree.DecisionTreeClassifier()))
models.append(('RF', RandomForestClassifier()))
results = []
names = []
for name, model in models:
kfold = model_selection.KFold(n_splits=10, random_state=None)
cv_results = model_selection.cross_val_score(model, X_train, Y_train, cv=kfold,
scoring=scoring)
results.append(cv_results)
names.append(name)
msg = "%s: %f (%f)" % (name, cv_results.mean(), cv_results.std())
print(msg)

CART: 0.910466 (0.010126)

RF: 0.939664 (0.010440)

localhost:8888/nbconvert/html/Entrega2/ Scikit-Learn.ipynb?download=false 3/11

13/4/2020 Scikit-Learn

In [18]:

from sklearn.model_selection import RandomizedSearchCV

n_estimators = [int(x) for x in np.linspace(start = 100, stop = 1000, num = 10)]
max_features = ['auto', 'sqrt']
max_depth = [int(x) for x in np.linspace(10, 110, num = 11)]
max_depth.append(None)
min_samples_split = [2, 5, 10]
min_samples_leaf = [1, 2, 4]
bootstrap = [True, False]

random_grid = {'n_estimators': n_estimators,

'max_features': max_features,
'max_depth': max_depth,
'min_samples_split': min_samples_split,
'min_samples_leaf': min_samples_leaf,
'bootstrap': bootstrap}
print (random_grid)

{'n_estimators': [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000], 'max
_features': ['auto', 'sqrt'], 'max_depth': [10, 20, 30, 40, 50, 60, 70, 8
0, 90, 100, 110, None], 'min_samples_split': [2, 5, 10], 'min_samples_lea
f': [1, 2, 4], 'bootstrap': [True, False]}

localhost:8888/nbconvert/html/Entrega2/ Scikit-Learn.ipynb?download=false 4/11

13/4/2020 Scikit-Learn

In [15]:

from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor()
rf_random = RandomizedSearchCV(estimator = rf, param_distributions = random_grid, n_ite
r = 10, cv = 5, verbose=2, random_state=42)
rf_random.fit(X_train, Y_train)
rf_random.best_params_

localhost:8888/nbconvert/html/Entrega2/ Scikit-Learn.ipynb?download=false 5/11

13/4/2020 Scikit-Learn

Fitting 5 folds for each of 10 candidates, totalling 50 fits

[CV] n_estimators=100, min_samples_split=10, min_samples_leaf=2, max_featu
res=sqrt, max_depth=50, bootstrap=True

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent wo

rkers.

[CV] n_estimators=100, min_samples_split=10, min_samples_leaf=2, max_feat

ures=sqrt, max_depth=50, bootstrap=True, total= 12.9s
[CV] n_estimators=100, min_samples_split=10, min_samples_leaf=2, max_featu
res=sqrt, max_depth=50, bootstrap=True

[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 12.8s remaining:

0.0s

localhost:8888/nbconvert/html/Entrega2/ Scikit-Learn.ipynb?download=false 6/11

13/4/2020 Scikit-Learn

[CV] n_estimators=100, min_samples_split=10, min_samples_leaf=2, max_feat

ures=sqrt, max_depth=50, bootstrap=True, total= 13.1s
[CV] n_estimators=100, min_samples_split=10, min_samples_leaf=2, max_featu
res=sqrt, max_depth=50, bootstrap=True
[CV] n_estimators=100, min_samples_split=10, min_samples_leaf=2, max_feat
ures=sqrt, max_depth=50, bootstrap=True, total= 13.6s
[CV] n_estimators=100, min_samples_split=10, min_samples_leaf=2, max_featu
res=sqrt, max_depth=50, bootstrap=True
[CV] n_estimators=100, min_samples_split=10, min_samples_leaf=2, max_feat
ures=sqrt, max_depth=50, bootstrap=True, total= 13.8s
[CV] n_estimators=100, min_samples_split=10, min_samples_leaf=2, max_featu
res=sqrt, max_depth=50, bootstrap=True
[CV] n_estimators=100, min_samples_split=10, min_samples_leaf=2, max_feat
ures=sqrt, max_depth=50, bootstrap=True, total= 13.9s
[CV] n_estimators=300, min_samples_split=10, min_samples_leaf=4, max_featu
res=sqrt, max_depth=90, bootstrap=False
[CV] n_estimators=300, min_samples_split=10, min_samples_leaf=4, max_feat
ures=sqrt, max_depth=90, bootstrap=False, total= 1.0min
[CV] n_estimators=300, min_samples_split=10, min_samples_leaf=4, max_featu
res=sqrt, max_depth=90, bootstrap=False
[CV] n_estimators=300, min_samples_split=10, min_samples_leaf=4, max_feat
ures=sqrt, max_depth=90, bootstrap=False, total= 1.1min
[CV] n_estimators=300, min_samples_split=10, min_samples_leaf=4, max_featu
res=sqrt, max_depth=90, bootstrap=False
[CV] n_estimators=300, min_samples_split=10, min_samples_leaf=4, max_feat
ures=sqrt, max_depth=90, bootstrap=False, total= 1.1min
[CV] n_estimators=300, min_samples_split=10, min_samples_leaf=4, max_featu
res=sqrt, max_depth=90, bootstrap=False
[CV] n_estimators=300, min_samples_split=10, min_samples_leaf=4, max_feat
ures=sqrt, max_depth=90, bootstrap=False, total= 1.1min
[CV] n_estimators=300, min_samples_split=10, min_samples_leaf=4, max_featu
res=sqrt, max_depth=90, bootstrap=False
[CV] n_estimators=300, min_samples_split=10, min_samples_leaf=4, max_feat
ures=sqrt, max_depth=90, bootstrap=False, total= 1.1min
[CV] n_estimators=300, min_samples_split=2, min_samples_leaf=2, max_featur
es=auto, max_depth=60, bootstrap=False
[CV] n_estimators=300, min_samples_split=2, min_samples_leaf=2, max_featu
res=auto, max_depth=60, bootstrap=False, total=49.4min
[CV] n_estimators=300, min_samples_split=2, min_samples_leaf=2, max_featur
es=auto, max_depth=60, bootstrap=False
[CV] n_estimators=300, min_samples_split=2, min_samples_leaf=2, max_featu
res=auto, max_depth=60, bootstrap=False, total=30.2min
[CV] n_estimators=300, min_samples_split=2, min_samples_leaf=2, max_featur
es=auto, max_depth=60, bootstrap=False
[CV] n_estimators=300, min_samples_split=2, min_samples_leaf=2, max_featu
res=auto, max_depth=60, bootstrap=False, total=18.2min
[CV] n_estimators=300, min_samples_split=2, min_samples_leaf=2, max_featur
es=auto, max_depth=60, bootstrap=False
[CV] n_estimators=300, min_samples_split=2, min_samples_leaf=2, max_featu
res=auto, max_depth=60, bootstrap=False, total= 8.3min
[CV] n_estimators=300, min_samples_split=2, min_samples_leaf=2, max_featur
es=auto, max_depth=60, bootstrap=False
[CV] n_estimators=300, min_samples_split=2, min_samples_leaf=2, max_featu
res=auto, max_depth=60, bootstrap=False, total= 9.5min
[CV] n_estimators=700, min_samples_split=5, min_samples_leaf=1, max_featur
es=sqrt, max_depth=30, bootstrap=True
[CV] n_estimators=700, min_samples_split=5, min_samples_leaf=1, max_featu
res=sqrt, max_depth=30, bootstrap=True, total= 27.8s
[CV] n_estimators=700, min_samples_split=5, min_samples_leaf=1, max_featur
es=sqrt, max_depth=30, bootstrap=True
[CV] n_estimators=700, min_samples_split=5, min_samples_leaf=1, max_featu
localhost:8888/nbconvert/html/Entrega2/ Scikit-Learn.ipynb?download=false 7/11
13/4/2020 Scikit-Learn

res=sqrt, max_depth=30, bootstrap=True, total= 28.6s

[CV] n_estimators=700, min_samples_split=5, min_samples_leaf=1, max_featur
es=sqrt, max_depth=30, bootstrap=True
[CV] n_estimators=700, min_samples_split=5, min_samples_leaf=1, max_featu
res=sqrt, max_depth=30, bootstrap=True, total= 21.6s
[CV] n_estimators=700, min_samples_split=5, min_samples_leaf=1, max_featur
es=sqrt, max_depth=30, bootstrap=True
[CV] n_estimators=700, min_samples_split=5, min_samples_leaf=1, max_featu
res=sqrt, max_depth=30, bootstrap=True, total= 21.4s
[CV] n_estimators=700, min_samples_split=5, min_samples_leaf=1, max_featur
es=sqrt, max_depth=30, bootstrap=True
[CV] n_estimators=700, min_samples_split=5, min_samples_leaf=1, max_featu
res=sqrt, max_depth=30, bootstrap=True, total= 21.0s
[CV] n_estimators=500, min_samples_split=10, min_samples_leaf=1, max_featu
res=auto, max_depth=80, bootstrap=False
[CV] n_estimators=500, min_samples_split=10, min_samples_leaf=1, max_feat
ures=auto, max_depth=80, bootstrap=False, total=12.0min
[CV] n_estimators=500, min_samples_split=10, min_samples_leaf=1, max_featu
res=auto, max_depth=80, bootstrap=False
[CV] n_estimators=500, min_samples_split=10, min_samples_leaf=1, max_feat
ures=auto, max_depth=80, bootstrap=False, total=15.1min
[CV] n_estimators=500, min_samples_split=10, min_samples_leaf=1, max_featu
res=auto, max_depth=80, bootstrap=False
[CV] n_estimators=500, min_samples_split=10, min_samples_leaf=1, max_feat
ures=auto, max_depth=80, bootstrap=False, total=15.4min
[CV] n_estimators=500, min_samples_split=10, min_samples_leaf=1, max_featu
res=auto, max_depth=80, bootstrap=False
[CV] n_estimators=500, min_samples_split=10, min_samples_leaf=1, max_feat
ures=auto, max_depth=80, bootstrap=False, total=13.1min
[CV] n_estimators=500, min_samples_split=10, min_samples_leaf=1, max_featu
res=auto, max_depth=80, bootstrap=False
[CV] n_estimators=500, min_samples_split=10, min_samples_leaf=1, max_feat
ures=auto, max_depth=80, bootstrap=False, total=16.0min
[CV] n_estimators=200, min_samples_split=10, min_samples_leaf=1, max_featu
res=sqrt, max_depth=60, bootstrap=False
[CV] n_estimators=200, min_samples_split=10, min_samples_leaf=1, max_feat
ures=sqrt, max_depth=60, bootstrap=False, total= 9.8s
[CV] n_estimators=200, min_samples_split=10, min_samples_leaf=1, max_featu
res=sqrt, max_depth=60, bootstrap=False
[CV] n_estimators=200, min_samples_split=10, min_samples_leaf=1, max_feat
ures=sqrt, max_depth=60, bootstrap=False, total= 10.1s
[CV] n_estimators=200, min_samples_split=10, min_samples_leaf=1, max_featu
res=sqrt, max_depth=60, bootstrap=False
[CV] n_estimators=200, min_samples_split=10, min_samples_leaf=1, max_feat
ures=sqrt, max_depth=60, bootstrap=False, total= 9.8s
[CV] n_estimators=200, min_samples_split=10, min_samples_leaf=1, max_featu
res=sqrt, max_depth=60, bootstrap=False
[CV] n_estimators=200, min_samples_split=10, min_samples_leaf=1, max_feat
ures=sqrt, max_depth=60, bootstrap=False, total= 10.2s
[CV] n_estimators=200, min_samples_split=10, min_samples_leaf=1, max_featu
res=sqrt, max_depth=60, bootstrap=False
[CV] n_estimators=200, min_samples_split=10, min_samples_leaf=1, max_feat
ures=sqrt, max_depth=60, bootstrap=False, total= 10.1s
[CV] n_estimators=1000, min_samples_split=2, min_samples_leaf=2, max_featu
res=auto, max_depth=50, bootstrap=False
[CV] n_estimators=1000, min_samples_split=2, min_samples_leaf=2, max_feat
ures=auto, max_depth=50, bootstrap=False, total=25.0min
[CV] n_estimators=1000, min_samples_split=2, min_samples_leaf=2, max_featu
res=auto, max_depth=50, bootstrap=False
[CV] n_estimators=1000, min_samples_split=2, min_samples_leaf=2, max_feat
ures=auto, max_depth=50, bootstrap=False, total=28.5min
localhost:8888/nbconvert/html/Entrega2/ Scikit-Learn.ipynb?download=false 8/11
13/4/2020 Scikit-Learn

[CV] n_estimators=1000, min_samples_split=2, min_samples_leaf=2, max_featu

res=auto, max_depth=50, bootstrap=False
[CV] n_estimators=1000, min_samples_split=2, min_samples_leaf=2, max_feat
ures=auto, max_depth=50, bootstrap=False, total=34.1min
[CV] n_estimators=1000, min_samples_split=2, min_samples_leaf=2, max_featu
res=auto, max_depth=50, bootstrap=False
[CV] n_estimators=1000, min_samples_split=2, min_samples_leaf=2, max_feat
ures=auto, max_depth=50, bootstrap=False, total=30.8min
[CV] n_estimators=1000, min_samples_split=2, min_samples_leaf=2, max_featu
res=auto, max_depth=50, bootstrap=False
[CV] n_estimators=1000, min_samples_split=2, min_samples_leaf=2, max_feat
ures=auto, max_depth=50, bootstrap=False, total=31.1min
[CV] n_estimators=100, min_samples_split=5, min_samples_leaf=2, max_featur
es=sqrt, max_depth=10, bootstrap=True
[CV] n_estimators=100, min_samples_split=5, min_samples_leaf=2, max_featu
res=sqrt, max_depth=10, bootstrap=True, total= 2.2s
[CV] n_estimators=100, min_samples_split=5, min_samples_leaf=2, max_featur
es=sqrt, max_depth=10, bootstrap=True
[CV] n_estimators=100, min_samples_split=5, min_samples_leaf=2, max_featu
res=sqrt, max_depth=10, bootstrap=True, total= 2.2s
[CV] n_estimators=100, min_samples_split=5, min_samples_leaf=2, max_featur
es=sqrt, max_depth=10, bootstrap=True
[CV] n_estimators=100, min_samples_split=5, min_samples_leaf=2, max_featu
res=sqrt, max_depth=10, bootstrap=True, total= 2.1s
[CV] n_estimators=100, min_samples_split=5, min_samples_leaf=2, max_featur
es=sqrt, max_depth=10, bootstrap=True
[CV] n_estimators=100, min_samples_split=5, min_samples_leaf=2, max_featu
res=sqrt, max_depth=10, bootstrap=True, total= 2.1s
[CV] n_estimators=100, min_samples_split=5, min_samples_leaf=2, max_featur
es=sqrt, max_depth=10, bootstrap=True
[CV] n_estimators=100, min_samples_split=5, min_samples_leaf=2, max_featu
res=sqrt, max_depth=10, bootstrap=True, total= 2.1s
[CV] n_estimators=600, min_samples_split=2, min_samples_leaf=4, max_featur
es=auto, max_depth=100, bootstrap=True
[CV] n_estimators=600, min_samples_split=2, min_samples_leaf=4, max_featu
res=auto, max_depth=100, bootstrap=True, total=11.2min
[CV] n_estimators=600, min_samples_split=2, min_samples_leaf=4, max_featur
es=auto, max_depth=100, bootstrap=True
[CV] n_estimators=600, min_samples_split=2, min_samples_leaf=4, max_featu
res=auto, max_depth=100, bootstrap=True, total= 9.5min
[CV] n_estimators=600, min_samples_split=2, min_samples_leaf=4, max_featur
es=auto, max_depth=100, bootstrap=True
[CV] n_estimators=600, min_samples_split=2, min_samples_leaf=4, max_featu
res=auto, max_depth=100, bootstrap=True, total= 9.6min
[CV] n_estimators=600, min_samples_split=2, min_samples_leaf=4, max_featur
es=auto, max_depth=100, bootstrap=True
[CV] n_estimators=600, min_samples_split=2, min_samples_leaf=4, max_featu
res=auto, max_depth=100, bootstrap=True, total= 9.8min
[CV] n_estimators=600, min_samples_split=2, min_samples_leaf=4, max_featur
es=auto, max_depth=100, bootstrap=True
[CV] n_estimators=600, min_samples_split=2, min_samples_leaf=4, max_featu
res=auto, max_depth=100, bootstrap=True, total= 9.0min
[CV] n_estimators=1000, min_samples_split=5, min_samples_leaf=2, max_featu
res=auto, max_depth=50, bootstrap=True
[CV] n_estimators=1000, min_samples_split=5, min_samples_leaf=2, max_feat
ures=auto, max_depth=50, bootstrap=True, total=15.8min
[CV] n_estimators=1000, min_samples_split=5, min_samples_leaf=2, max_featu
res=auto, max_depth=50, bootstrap=True
[CV] n_estimators=1000, min_samples_split=5, min_samples_leaf=2, max_feat
ures=auto, max_depth=50, bootstrap=True, total=16.6min
[CV] n_estimators=1000, min_samples_split=5, min_samples_leaf=2, max_featu
localhost:8888/nbconvert/html/Entrega2/ Scikit-Learn.ipynb?download=false 9/11
13/4/2020 Scikit-Learn

res=auto, max_depth=50, bootstrap=True

[CV] n_estimators=1000, min_samples_split=5, min_samples_leaf=2, max_feat
ures=auto, max_depth=50, bootstrap=True, total=16.4min
[CV] n_estimators=1000, min_samples_split=5, min_samples_leaf=2, max_featu
res=auto, max_depth=50, bootstrap=True
[CV] n_estimators=1000, min_samples_split=5, min_samples_leaf=2, max_feat
ures=auto, max_depth=50, bootstrap=True, total=16.7min
[CV] n_estimators=1000, min_samples_split=5, min_samples_leaf=2, max_featu
res=auto, max_depth=50, bootstrap=True
[CV] n_estimators=1000, min_samples_split=5, min_samples_leaf=2, max_feat
ures=auto, max_depth=50, bootstrap=True, total=15.5min

[Parallel(n_jobs=1)]: Done 50 out of 50 | elapsed: 476.0min finished

Out[15]:

{'n_estimators': 200,
'min_samples_split': 10,
'min_samples_leaf': 1,
'max_features': 'sqrt',
'max_depth': 60,
'bootstrap': False}

In [16]:

rf_random.best_params_

Out[16]:

{'n_estimators': 200,
'min_samples_split': 10,
'min_samples_leaf': 1,
'max_features': 'sqrt',
'max_depth': 60,
'bootstrap': False}

comparamos
In [22]:

clf2 = RandomForestClassifier(n_estimators=200, max_features="sqrt", max_depth=60, min_

samples_split=10, min_samples_leaf=1, bootstrap= False)
clf2 = clf.fit(X_train, Y_train)

localhost:8888/nbconvert/html/Entrega2/ Scikit-Learn.ipynb?download=false 10/11

13/4/2020 Scikit-Learn

In [23]:

test_pred2=clf2.predict(X_test)
print("\nAcierto:", metrics.accuracy_score(test.output, test_pred2))
print(metrics.classification_report(test.output, test_pred2))

Acierto: 0.9382990550305725
precision recall f1-score support

0.0 0.95 0.99 0.97 1642

1.0 0.78 0.41 0.54 157

accuracy 0.94 1799

macro avg 0.86 0.70 0.75 1799
weighted avg 0.93 0.94 0.93 1799

In [ ]:

localhost:8888/nbconvert/html/Entrega2/ Scikit-Learn.ipynb?download=false 11/11

ZKBio Security How To Find Password 2023
50% (2)
ZKBio Security How To Find Password 2023
3 pages
Lecture 3 - Single Channel Queue (Updated)
No ratings yet
Lecture 3 - Single Channel Queue (Updated)
15 pages
Explore: Lampiran 5 Hasil Pengolahan SPSS
No ratings yet
Explore: Lampiran 5 Hasil Pengolahan SPSS
4 pages
ANOVA Step by Step
No ratings yet
ANOVA Step by Step
4 pages
Introduction To Machine Learning: Jaime S. Cardoso
100% (1)
Introduction To Machine Learning: Jaime S. Cardoso
52 pages
Merging - Scaled - 1D - & - Trying - Different - CLassification - ML - Models - .Ipynb - Colaboratory
100% (1)
Merging - Scaled - 1D - & - Trying - Different - CLassification - ML - Models - .Ipynb - Colaboratory
16 pages
FIN435 Individual Assignment
No ratings yet
FIN435 Individual Assignment
13 pages
Madras University Business Statistics Syllabus
No ratings yet
Madras University Business Statistics Syllabus
1 page
Xgboost in Online Transaction Fraud Detection
100% (1)
Xgboost in Online Transaction Fraud Detection
8 pages
Notite Seminar 5
No ratings yet
Notite Seminar 5
7 pages
ML Lab6.Ipynb - Colaboratory
100% (1)
ML Lab6.Ipynb - Colaboratory
5 pages
TR 10002802v010401p
No ratings yet
TR 10002802v010401p
285 pages
Outliers, Hypothesis and Natural Language Processing
100% (1)
Outliers, Hypothesis and Natural Language Processing
7 pages
Mean Deviation and Standard Deviation
No ratings yet
Mean Deviation and Standard Deviation
1 page
PublishedPaperNo.8 2022
100% (1)
PublishedPaperNo.8 2022
14 pages
Statistics - Mindmap
No ratings yet
Statistics - Mindmap
1 page
Etzc434 Mar23 FN
No ratings yet
Etzc434 Mar23 FN
2 pages
Regression Anallysis Hands0n 1
100% (1)
Regression Anallysis Hands0n 1
3 pages
Weather Forecasting Basepaper
100% (1)
Weather Forecasting Basepaper
14 pages
Class_SCA_03
No ratings yet
Class_SCA_03
25 pages
Chap08 PPT
No ratings yet
Chap08 PPT
77 pages
STAT422 - Slides
No ratings yet
STAT422 - Slides
52 pages
Charmi Shah 20bcp299 Lab2
100% (1)
Charmi Shah 20bcp299 Lab2
7 pages
Re P Ub L I C o F I R e L And: EDI CT OF Government
No ratings yet
Re P Ub L I C o F I R e L And: EDI CT OF Government
402 pages
Beta Survival Models: David Hubbard, Benoît Rostykus, Yves Raimond, Tony Jebara
No ratings yet
Beta Survival Models: David Hubbard, Benoît Rostykus, Yves Raimond, Tony Jebara
11 pages
Variosalgoritmos - Jupyter Notebook
100% (1)
Variosalgoritmos - Jupyter Notebook
9 pages
Assignment10 4
100% (1)
Assignment10 4
3 pages
Assignment No - 6-1
100% (1)
Assignment No - 6-1
3 pages
Cramer-Rao Lower Bound Computation Via The Characteristic Function
No ratings yet
Cramer-Rao Lower Bound Computation Via The Characteristic Function
15 pages
EMF CheatSheet V4
100% (1)
EMF CheatSheet V4
2 pages
Glass Classification
100% (2)
Glass Classification
3 pages
K Means Clustering
100% (1)
K Means Clustering
10 pages
VARMA For Battery Voltage Forecasting 3
No ratings yet
VARMA For Battery Voltage Forecasting 3
50 pages
RFID Prox Credentials
No ratings yet
RFID Prox Credentials
1 page
Lab 3. Linear Regression 230223
100% (1)
Lab 3. Linear Regression 230223
7 pages
Heart Disease Prediction - Jupyter Notebook
100% (1)
Heart Disease Prediction - Jupyter Notebook
9 pages
A) What Is Motivation Behind Ensemble Methods? Give Your Answer in Probabilistic Terms
100% (1)
A) What Is Motivation Behind Ensemble Methods? Give Your Answer in Probabilistic Terms
6 pages
Importing Libraries: Import As Import As Import As From Import As From Import From Import Import
100% (1)
Importing Libraries: Import As Import As Import As From Import As From Import From Import Import
11 pages
Uji Normalitas: One-Sample Kolmogorov-Smirnov Test
No ratings yet
Uji Normalitas: One-Sample Kolmogorov-Smirnov Test
2 pages
K-NN (Nearest Neighbor)
100% (1)
K-NN (Nearest Neighbor)
17 pages
Very High-Density 802.11ac NETWORKS: Engineering and Configuration Guide
No ratings yet
Very High-Density 802.11ac NETWORKS: Engineering and Configuration Guide
110 pages
Lecture 4 Data Analysis Summary Measures 1
No ratings yet
Lecture 4 Data Analysis Summary Measures 1
24 pages
Principles Design Experiments: Statistical
No ratings yet
Principles Design Experiments: Statistical
7 pages
To Pattern Recognition: CSE555, Fall 2021 Chapter 1, DHS
100% (1)
To Pattern Recognition: CSE555, Fall 2021 Chapter 1, DHS
39 pages
Salary Prediction LinearRegression
100% (1)
Salary Prediction LinearRegression
7 pages
Practice Problems3.1 - 4 - Solution
No ratings yet
Practice Problems3.1 - 4 - Solution
9 pages
Introduction to Boosting: Slides Adapted from Che Wanxiang (车万翔) at HIT, and Robin Dhamankar of Many thanks!
100% (1)
Introduction to Boosting: Slides Adapted from Che Wanxiang (车万翔) at HIT, and Robin Dhamankar of Many thanks!
41 pages
Wiley - Applied Econometric Time Series, 4th Edition - 978!1!118-80856-6
No ratings yet
Wiley - Applied Econometric Time Series, 4th Edition - 978!1!118-80856-6
2 pages
Cardio Screen RF
100% (1)
Cardio Screen RF
27 pages
Unit-5 Decision Trees and Ensemble Learning
100% (1)
Unit-5 Decision Trees and Ensemble Learning
162 pages
Heart: Our "Goal" Predict The Presence of Heart Disease in The Patient
100% (1)
Heart: Our "Goal" Predict The Presence of Heart Disease in The Patient
73 pages
Statistics MCQs - Hypothesis Testing For One Population Part 1 - Examrace
No ratings yet
Statistics MCQs - Hypothesis Testing For One Population Part 1 - Examrace
7 pages
Csi 5155 ML Project Report
100% (1)
Csi 5155 ML Project Report
24 pages
CS550 Regression Aug12
100% (1)
CS550 Regression Aug12
63 pages
Stats Lecture 06. Normal Distribution Data
No ratings yet
Stats Lecture 06. Normal Distribution Data
46 pages
Measures of Dispersion Kurtosis and Skewness
No ratings yet
Measures of Dispersion Kurtosis and Skewness
19 pages
Data Science With Python Relationship
No ratings yet
Data Science With Python Relationship
30 pages
Classification Problems
100% (1)
Classification Problems
25 pages
Bagging and Boosting
100% (1)
Bagging and Boosting
19 pages
Tcp/Ip Application Note: LPWA Module Series
No ratings yet
Tcp/Ip Application Note: LPWA Module Series
53 pages
Linear Regression: What Is Regression Analysis?
100% (1)
Linear Regression: What Is Regression Analysis?
21 pages
Designing A Wi-Fi Deployment Using Ekahau Site Survey Pro PDF
100% (1)
Designing A Wi-Fi Deployment Using Ekahau Site Survey Pro PDF
10 pages
Python Numpy (1) : Intro To Multi-Dimensional Array & Numerical Linear Algebra
100% (1)
Python Numpy (1) : Intro To Multi-Dimensional Array & Numerical Linear Algebra
27 pages
Sales Forecasting
100% (1)
Sales Forecasting
10 pages
C2M2 - Assignment: 1 Risk Models Using Tree-Based Models
100% (1)
C2M2 - Assignment: 1 Risk Models Using Tree-Based Models
38 pages
Class - Xii Mathematics Ncert Solutions: S A PA
No ratings yet
Class - Xii Mathematics Ncert Solutions: S A PA
10 pages
Research Methods Unit 4
No ratings yet
Research Methods Unit 4
6 pages
ML Project Shivani Pandey
100% (2)
ML Project Shivani Pandey
49 pages
Bagging and Boosting Regression Algorithms
100% (1)
Bagging and Boosting Regression Algorithms
84 pages
HW1
100% (1)
HW1
8 pages
Uncertainty Assessment For DC Power Supply: SKC Compliance Lab
No ratings yet
Uncertainty Assessment For DC Power Supply: SKC Compliance Lab
18 pages
Odds Ratio
No ratings yet
Odds Ratio
16 pages
ML Lect1
100% (1)
ML Lect1
51 pages
Lecture 4 Linear Regression
100% (1)
Lecture 4 Linear Regression
44 pages
Decision Trees: at Some Point of Time You Have To Take A Decision Sitting On A Tree
100% (1)
Decision Trees: at Some Point of Time You Have To Take A Decision Sitting On A Tree
19 pages
0.1 Stock Data
100% (1)
0.1 Stock Data
4 pages
Visualizing Network Data
No ratings yet
Visualizing Network Data
13 pages
SVM (Support Vector Machine) For Classification - by Aditya Kumar - Towards Data Science
100% (1)
SVM (Support Vector Machine) For Classification - by Aditya Kumar - Towards Data Science
28 pages
PR01
100% (1)
PR01
41 pages
Econometrics I: Chapter 3: Two Variable Regression Model: The Problem of Estimation
No ratings yet
Econometrics I: Chapter 3: Two Variable Regression Model: The Problem of Estimation
35 pages
Book
100% (1)
Book
480 pages
Loading The Dataset: 'Churn - Modelling - CSV'
No ratings yet
Loading The Dataset: 'Churn - Modelling - CSV'
6 pages
Lack of Fit Test
No ratings yet
Lack of Fit Test
5 pages
Linear Regression With LM Function, Diagnostic Plots, Interaction Term, Non-Linear Transformation of The Predictors, Qualitative Predictors
100% (1)
Linear Regression With LM Function, Diagnostic Plots, Interaction Term, Non-Linear Transformation of The Predictors, Qualitative Predictors
15 pages
IRIS BPNN - Ipynb - Colaboratory
100% (1)
IRIS BPNN - Ipynb - Colaboratory
4 pages
Decision Tree
No ratings yet
Decision Tree
43 pages
Lecture 03 Gradient Descent
No ratings yet
Lecture 03 Gradient Descent
26 pages
Logistic Regression
100% (1)
Logistic Regression
29 pages
Classification and Regression Trees
100% (1)
Classification and Regression Trees
60 pages
Vinee
100% (1)
Vinee
28 pages
Parallelism of Statistics and Machine Learning & Logistic Regression Versus Random Forest
100% (1)
Parallelism of Statistics and Machine Learning & Logistic Regression Versus Random Forest
72 pages
Day 5 Supervised Technique-Decision Tree For Classification PDF
100% (1)
Day 5 Supervised Technique-Decision Tree For Classification PDF
58 pages
Columbia Seaborn Tutorial
No ratings yet
Columbia Seaborn Tutorial
12 pages
Classification With Decision Trees: Instructor: Qiang Yang
100% (1)
Classification With Decision Trees: Instructor: Qiang Yang
62 pages
ML UNIT-2 Notes
No ratings yet
ML UNIT-2 Notes
15 pages
Figure Style and Scale: Darkgrid Whitegrid Dark White Ticks Darkgrid
No ratings yet
Figure Style and Scale: Darkgrid Whitegrid Dark White Ticks Darkgrid
15 pages
Weka Tutorial
No ratings yet
Weka Tutorial
2 pages
Hopfield Networks: Fundamentals and Applications of The Neural Network That Stores Memories
From Everand
Hopfield Networks: Fundamentals and Applications of The Neural Network That Stores Memories
Fouad Sabry
No ratings yet

Random Forest: Implementaciones de Scikit-Learn Sobre QSAR

Uploaded by

Random Forest: Implementaciones de Scikit-Learn Sobre QSAR

Uploaded by

13/4/2020 Scikit-Learn

Implementaciones de Scikit-learn sobre QSAR

x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 ... x1015 x1016 x1017 x1018 x1019 x1020

5 rows × 1025 columns

from sklearn import preprocessing, model_selection

x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 ... x1016 x1017 x1018 x1019 x1020 x1021

5 rows × 1026 columns

localhost:8888/nbconvert/html/Entrega2/ Scikit-Learn.ipynb?download=false 1/11

train, test = model_selection.train_test_split(dataset, test_size=0.2, random_state=42)

from sklearn.ensemble import RandomForestClassifier

X_test = test.iloc[:, 0:1024].values

from sklearn import metrics

0.0 0.95 0.99 0.97 1642

accuracy 0.94 1799

from sklearn.metrics import roc_auc_score

localhost:8888/nbconvert/html/Entrega2/ Scikit-Learn.ipynb?download=false 2/11

from sklearn import tree

print("\nAcierto:", metrics.accuracy_score(test.output, test_pred1))

0.0 0.95 0.94 0.95 1642

accuracy 0.90 1799

roc_value1 = roc_auc_score(test.output, test_pred1)

CART: 0.910466 (0.010126)

localhost:8888/nbconvert/html/Entrega2/ Scikit-Learn.ipynb?download=false 3/11

from sklearn.model_selection import RandomizedSearchCV

random_grid = {'n_estimators': n_estimators,

localhost:8888/nbconvert/html/Entrega2/ Scikit-Learn.ipynb?download=false 4/11

from sklearn.ensemble import RandomForestRegressor

localhost:8888/nbconvert/html/Entrega2/ Scikit-Learn.ipynb?download=false 5/11

Fitting 5 folds for each of 10 candidates, totalling 50 fits

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent wo

[CV] n_estimators=100, min_samples_split=10, min_samples_leaf=2, max_feat

[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 12.8s remaining:

localhost:8888/nbconvert/html/Entrega2/ Scikit-Learn.ipynb?download=false 6/11

[CV] n_estimators=100, min_samples_split=10, min_samples_leaf=2, max_feat

res=sqrt, max_depth=30, bootstrap=True, total= 28.6s

[CV] n_estimators=1000, min_samples_split=2, min_samples_leaf=2, max_featu

res=auto, max_depth=50, bootstrap=True

[Parallel(n_jobs=1)]: Done 50 out of 50 | elapsed: 476.0min finished

clf2 = RandomForestClassifier(n_estimators=200, max_features="sqrt", max_depth=60, min_

localhost:8888/nbconvert/html/Entrega2/ Scikit-Learn.ipynb?download=false 10/11

0.0 0.95 0.99 0.97 1642

accuracy 0.94 1799

localhost:8888/nbconvert/html/Entrega2/ Scikit-Learn.ipynb?download=false 11/11

You might also like