Assignment II
Then we vectorize all the required features so that they can be fed as input to the
logistic regression model.
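As a rough illustration of what vectorizing means here, the sketch below packs selected columns of each row into a single numeric feature vector, which is conceptually what a vector-assembling step produces. It is plain Python, not the Spark code used in the assignment, and the column names and values are hypothetical.

```python
def assemble_features(rows, feature_cols):
    """Return one list of floats per row, ordered as in feature_cols."""
    return [[float(row[col]) for col in feature_cols] for row in rows]


# Hypothetical rows and column names, for illustration only.
rows = [
    {"age": 34, "income": 52000, "clicks": 3},
    {"age": 21, "income": 18000, "clicks": 7},
]
features = assemble_features(rows, ["age", "income", "clicks"])
```

Each inner list plays the role of the feature vector that the model consumes for one row.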
We define a logistic regression model and train it on the training dataset.
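To make the training step concrete, here is a minimal sketch of fitting a logistic regression with batch gradient descent on the log loss. This is an assumption-laden stand-in: Spark's own optimizer is different, and the toy data, learning rate, and epoch count below are hypothetical choices for illustration.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.1, epochs=1000):
    """Fit weights w and bias b by batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log loss w.r.t. the linear score
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    """Predicted probability of the positive class for each row."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


# Hypothetical toy data: the label is 1 when the single feature is large.
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0, 0, 1, 1]
w, b = train_logistic_regression(X_train, y_train)
probs = predict_proba(X_train, w, b)
```

The fit loop is where all the repeated passes over the data happen, which is consistent with the training step dominating the run time.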
We then evaluate the model using the area under the ROC curve, accuracy, and F1-score.
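For reference, the three metrics can be sketched in plain Python as follows. The labels and scores are hypothetical, the 0.5 threshold is an assumption, and the F1 here is computed for the positive class only, so it may differ from a value that an evaluator averages across classes.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities, for illustration only.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Area under ROC is computed from the raw scores, while accuracy and F1 depend on the chosen threshold.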
Use the Spark web UI to determine which task takes the most of your
execution time.
The fit command took the most time. It spawned 106 jobs with 126 stages. The longest
stage took 7 seconds, as shown above.