Course DataCamp Classification With XGBoost
EXTREME GRADIENT BOOSTING WITH XGBOOST
Sergey Fogelson
VP of Analytics, Viacom
Before we get to XGBoost...
Need to understand the basics of
Supervised classification
Decision trees
Boosting
Labels: 1 or 0
Accuracy
(tp + tn) / (tp + tn + fp + fn)
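The accuracy formula above can be written as a one-line helper; the confusion counts below are hypothetical, purely for illustration:

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of correct predictions: (tp + tn) / (tp + tn + fp + fn)."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical confusion counts: 40 true positives, 45 true negatives,
# 10 false positives, 5 false negatives -> 85 correct out of 100.
print(accuracy(tp=40, tn=45, fp=10, fn=5))  # → 0.85
```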
Example: Netflix
What is XGBoost?
Optimized gradient-boosting machine learning library
Scala
Julia
Java
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split

# class_data: DataFrame loaded earlier; features are all columns
# except the last, the binary label is the last column
X, y = class_data.iloc[:, :-1], class_data.iloc[:, -1]
X_train, X_test, y_train, y_test = train_test_split(X, y,
    test_size=0.2, random_state=123)
xg_cl = xgb.XGBClassifier(objective='binary:logistic',
    n_estimators=10, seed=123)
xg_cl.fit(X_train, y_train)
preds = xg_cl.predict(X_test)
accuracy = float(np.sum(preds == y_test)) / y_test.shape[0]
accuracy: 0.78333
Visualizing a decision tree
1 https://www.ibm.com/support/knowledgecenter/en/SS3RA7_15.0.0/com.ibm.spss.modeler.help/nodes_treebuilding.htm
2 http://scott.fortmann-roe.com/docs/BiasVariance.html
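At its core, a decision tree is just a sequence of nested threshold tests ending in a class label at each leaf. A minimal hand-written sketch, with feature names and thresholds invented for illustration:

```python
def predict(petal_length, petal_width):
    # Each internal node tests one feature against a threshold;
    # each leaf returns a class label. Values here are made up.
    if petal_length < 2.5:
        return "class_0"
    elif petal_width < 1.8:
        return "class_1"
    else:
        return "class_2"

print(predict(1.4, 0.2))  # → class_0
print(predict(4.5, 1.3))  # → class_1
```

A learned tree (like those visualized in the reference above) is this same structure, with splits chosen automatically to separate the training data.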
Boosting overview
Not a specific machine learning algorithm
1 https://xgboost.readthedocs.io/en/latest/model.html
Accuracy: 0.88315
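The boosting idea itself can be sketched without any library: fit a weak learner (here a one-split decision stump) to the residuals of the ensemble so far, add it with a shrinkage factor, and repeat. This is a pure-Python illustration, not the course's XGBoost code; the toy data and learning rate are invented.

```python
def fit_stump(xs, ys):
    """Find the single-threshold split minimizing squared error."""
    best = None
    for t in xs:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((y - lm) ** 2 for y in left)
               + sum((y - rm) ** 2 for y in right))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x < t else rm

def boost(xs, ys, n_rounds=20, lr=0.5):
    """Sequentially fit stumps to the residuals of the ensemble."""
    preds = [0.0] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + lr * stump(x) for p, x in zip(preds, xs)]
    return lambda x: sum(lr * s(x) for s in stumps)

# Invented toy data: many weak stumps combine into a strong learner.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.0, 3.1, 2.9]
model = boost(xs, ys)
mse = sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

Each individual stump is barely better than a constant prediction, but the weighted sum drives the training error far below that of any single stump, which is the essence of boosting.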
When to use XGBoost
You have a large number of training samples
Greater than 1000 training samples and fewer than 100 features
When to NOT use XGBoost
Computer vision