Human Activity Recognition Using Smartphone Data
import warnings
import pandas as pd
warnings.filterwarnings('ignore')
# Read the test data from a CSV file into a pandas DataFrame
test = pd.read_csv('dataset/test.csv')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7352 entries, 0 to 7351
Columns: 563 entries, tBodyAcc-mean()-X to Activity
dtypes: float64(561), int64(1), object(1)
memory usage: 31.6+ MB
In [7]: # Count the occurrences of each unique value in the 'subject' column of the training data
train['subject'].value_counts()
Out[7]: subject
25 409
21 408
26 392
30 383
28 382
27 376
23 372
17 368
16 366
19 360
1 347
29 344
3 341
15 328
6 325
14 323
22 321
11 316
7 308
5 302
8 281
Name: count, dtype: int64
Data Preprocessing
Checking for Duplicate values
In [9]: # Print the number of duplicate rows in the training dataset
print('Number of duplicates in train :', train.duplicated().sum())
There is almost the same number of observations across all six activities, so
the classes are not noticeably imbalanced.
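One way to back up the balance claim with numbers is to compute each class's share of the data and the ratio between the largest and smallest class. The sketch below uses only the standard library and made-up labels; `class_shares`, `imbalance_ratio`, and `toy_labels` are hypothetical names, not part of the notebook. On a pandas column, `train['Activity'].value_counts(normalize=True)` would give the same shares.

```python
from collections import Counter


def class_shares(labels):
    """Return each class's share of the total number of observations."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def imbalance_ratio(labels):
    """Ratio of the largest class count to the smallest (1.0 means perfectly balanced)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Made-up labels for illustration only; these are NOT the notebook's data.
toy_labels = ["WALKING"] * 5 + ["SITTING"] * 4 + ["LAYING"] * 5

shares = class_shares(toy_labels)
ratio = imbalance_ratio(toy_labels)
print(shares)
print(round(ratio, 2))
```

A ratio close to 1 supports the "no imbalance" reading. What counts as "close" is a judgment call that depends on the task.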
Out[23]: count
fBodyAcc 79
fBodyGyro 79
fBodyAccJerk 79
tGravityAcc 40
tBodyAcc 40
tBodyGyroJerk 40
tBodyGyro 40
tBodyAccJerk 40
tBodyAccMag 13
tGravityAccMag 13
tBodyAccJerkMag 13
tBodyGyroMag 13
tBodyGyroJerkMag 13
fBodyAccMag 13
fBodyBodyAccJerkMag 13
fBodyBodyGyroMag 13
fBodyBodyGyroJerkMag 13
angle 7
subject 1
Activity 1
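The cell that produced this table is not shown here. A plausible way to get such counts is to strip the statistic and axis suffixes from each column name and tally what remains. The sketch below assumes that approach; `feature_group` and `sample_columns` are hypothetical names, and the column list is a small illustrative sample rather than the full 563 columns.

```python
from collections import Counter


def feature_group(column_name):
    """Strip statistic and axis suffixes, e.g. 'tBodyAcc-mean()-X' -> 'tBodyAcc'."""
    return column_name.split("-")[0].split("(")[0]


# A handful of names in the dataset's naming style, for illustration only.
sample_columns = [
    "tBodyAcc-mean()-X",
    "tBodyAcc-mean()-Y",
    "fBodyAccMag-std()",
    "angle(tBodyAccMean,gravity)",
    "subject",
    "Activity",
]

group_counts = Counter(feature_group(col) for col in sample_columns)
for group, count in group_counts.most_common():
    print(group, count)
```

Applied to `train.columns`, the same function would be expected to yield one row per signal family, which is the shape of the table above.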
# Map a kernel density estimate (KDE) plot of 'tBodyAccMag-mean()' onto the FacetGrid,
# setting hist=False to plot a smooth curve, and add a legend
# Note: sns.distplot is deprecated in recent seaborn releases; sns.kdeplot draws the same kind of curve
facetgrid.map(sns.distplot, 'tBodyAccMag-mean()', hist=False).add_legend()
# Add a horizontal dashed line at y=0.08, spanning from 10% to 90% of the axes width
plt.axhline(y=0.08, xmin=0.1, xmax=0.9, dashes=(3, 3))
# Create a scatter plot of the PCA-transformed data, with the first component on the x-axis and the second on the y-axis
# Color the points based on the 'Activity' column from the training dataset
sns.scatterplot(x=pca[:, 0], y=pca[:, 1], hue=train['Activity'])
# Display the plot
plt.show()
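The scatter plot above assumes `pca` is a two-column array of projected samples, but the cell that built it is not shown. As a rough sketch of what a two-component PCA projection does, the NumPy version below centers the data and projects it onto the two directions of greatest variance. `pca_2d` and `X_demo` are hypothetical names, and the data is synthetic. The notebook may well use a library implementation such as scikit-learn's `PCA` instead.

```python
import numpy as np


def pca_2d(X):
    """Project the rows of X onto the two directions of greatest variance."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:2].T


rng = np.random.default_rng(0)
# Synthetic stand-in for the feature matrix: 100 samples, 5 features with different spreads.
X_demo = rng.normal(size=(100, 5)) * np.array([5.0, 2.0, 1.0, 0.5, 0.1])

projected = pca_2d(X_demo)
print(projected.shape)
```

The first column of the result carries at least as much variance as the second, which is why it is conventionally placed on the x-axis.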
# Create a scatter plot of the t-SNE-transformed data, with the first component on the x-axis and the second on the y-axis
# Color the points based on the 'Activity' column from the training dataset
sns.scatterplot(x=tsne[:,0], y=tsne[:,1], hue=train['Activity'])
# Separate the features from the target labels in the testing dataset
X_test = test.drop(['subject', 'Activity'], axis=1)  # Features (independent variables) for testing
y_test = test['Activity']  # Target labels (dependent variable) for testing
Logistic Regression model with hyperparameter tuning and cross validation
In [51]: # Define the parameter grid for tuning the logistic regression model
parameters = {'max_iter': [100, 200, 500]}
# Make predictions on the testing data using the tuned logistic regression model
y_pred_lr = lr_classifier_rs.predict(X_test)
In [53]: # Compute the accuracy score by comparing the predicted labels (y_pred_lr) with the true labels (y_test)
lr_accuracy = accuracy_score(y_test, y_pred_lr)
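For reference, accuracy is simply the fraction of predictions that match the true labels. The standard-library sketch below spells that out on made-up labels. `accuracy`, `toy_true`, and `toy_pred` are hypothetical names used only for illustration, and the toy values are not results from the notebook.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels for illustration only.
toy_true = ["WALKING", "SITTING", "LAYING", "SITTING"]
toy_pred = ["WALKING", "STANDING", "LAYING", "SITTING"]

toy_accuracy = accuracy(toy_true, toy_pred)
print(toy_accuracy)  # 3 of 4 predictions match -> 0.75
```

Because accuracy treats every mistake the same way, it says nothing about which activities are being confused with each other.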
Kernel SVM with Hyperparameter Tuning and Cross Validation
In [58]: # Define the set of hyperparameters for tuning the SVM classifier
parameters = {
'kernel': ['linear', 'rbf', 'poly', 'sigmoid'],
'C': [100, 50]
}
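The step that fits the search object is not visible in this excerpt. As a hedged illustration of how a grid like this could be passed to scikit-learn's `RandomizedSearchCV`, the sketch below runs the search on synthetic data. The values of `n_iter`, `cv`, and `random_state`, as well as the names `svm_search`, `X_demo`, and `y_demo`, are assumptions and are not taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# Synthetic stand-in data; the notebook would use its own training features and labels.
X_demo, y_demo = make_classification(
    n_samples=150, n_features=10, n_informative=5, n_classes=3, random_state=0
)

parameters = {
    'kernel': ['linear', 'rbf', 'poly', 'sigmoid'],
    'C': [100, 50],
}

# n_iter, cv and random_state are assumed values chosen to keep the example fast.
svm_search = RandomizedSearchCV(
    SVC(), param_distributions=parameters, n_iter=4, cv=3, random_state=0
)
svm_search.fit(X_demo, y_demo)
print(svm_search.best_params_)
```

With only eight combinations in this grid, a randomized search with a small `n_iter` samples a subset of them, while an exhaustive `GridSearchCV` would try them all.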
# Print out the best estimator and the best set of parameters found by RandomizedSearchCV
get_best_randomsearch_results(svm_rs)
Out[58]: ▸ RandomizedSearchCV
▸ estimator: SVC
▸ SVC
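The helper `get_best_randomsearch_results` is called throughout but its definition does not appear in this excerpt. The version below is only a guess at what it might look like. It relies on the `best_estimator_` and `best_params_` attributes that a fitted `RandomizedSearchCV` exposes. To keep the example dependency-free, it is exercised with a stand-in object (`fake_search`) whose values are placeholders, not real search results.

```python
from types import SimpleNamespace


def get_best_randomsearch_results(search):
    """Print and return the best estimator and parameters of a fitted search object."""
    print("Best estimator :", search.best_estimator_)
    print("Best set of parameters :", search.best_params_)
    return search.best_estimator_, search.best_params_


# Stand-in mimicking a fitted search; the values are placeholders, not notebook output.
fake_search = SimpleNamespace(
    best_estimator_="SVC(C=50)",
    best_params_={"kernel": "rbf", "C": 50},
)

best_estimator, best_params = get_best_randomsearch_results(fake_search)
```

In the notebook the same call would receive the fitted search object, for example `svm_rs`.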
In [59]: # Make predictions on the testing data using the tuned SVM model
y_pred_svm = svm_rs.predict(X_test)
In [60]: # Compute the accuracy score by comparing the predicted labels (y_pred_svm) with the true labels (y_test)
svm_accuracy = accuracy_score(y_test, y_pred_svm)
# Print out the best estimator and the best set of parameters found by RandomizedSearchCV
get_best_randomsearch_results(dt_classifier_rs)
Out[78]: ▸ RandomizedSearchCV
▸ estimator: DecisionTreeClassifier
▸ DecisionTreeClassifier
In [79]: # Make predictions on the testing data using the tuned Decision Tree model
y_pred_dt = dt_classifier_rs.predict(X_test)
In [80]: # Compute the accuracy score by comparing the predicted labels (y_pred_dt) with the true labels (y_test)
dt_accuracy = accuracy_score(y_test, y_pred_dt)
# Print the accuracy score obtained using the Decision Tree model
print('Accuracy Score Using Decision Tree :', round(dt_accuracy, 2))
# Print out the best estimator and the best set of parameters found by RandomizedSearchCV
get_best_randomsearch_results(rf_classifier_rs)
Out[83]: ▸ RandomizedSearchCV
▸ estimator: RandomForestClassifier
▸ RandomForestClassifier
In [84]: # Make predictions on the testing data using the tuned Random Forest model
y_pred_rf = rf_classifier_rs.predict(X_test)
In [85]: # Compute the accuracy score by comparing the predicted labels (y_pred_rf) with the true labels (y_test)
rf_accuracy = accuracy_score(y_test, y_pred_rf)
# Print the accuracy score obtained using the Random Forest model
print('Accuracy Score Using Random Forest :', round(rf_accuracy, 2))
--------------------------------------
Accuracy Scores for all the models :
--------------------------------------
Logistic Regression : 0.96
Support Vector Classifier : 0.97
Decision Tree Classifier : 0.85
Random Forest Classifier : 0.92
--------------------------------------
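The scores in the table can also be ranked programmatically, which helps when more models are added later. The short sketch below copies the four accuracy values from the table into a dictionary. The names `accuracy_scores`, `ranked`, `best_model`, and `best_score` are illustrative and do not come from the notebook.

```python
# Accuracy values copied from the summary table above.
accuracy_scores = {
    "Logistic Regression": 0.96,
    "Support Vector Classifier": 0.97,
    "Decision Tree Classifier": 0.85,
    "Random Forest Classifier": 0.92,
}

# Sort models from highest to lowest accuracy.
ranked = sorted(accuracy_scores.items(), key=lambda item: item[1], reverse=True)
best_model, best_score = ranked[0]

for name, score in ranked:
    print(f"{name:<28}: {score:.2f}")
print("Best by accuracy:", best_model)
```

The gap between the top two models is a single percentage point, so the ranking should be read with some caution.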
Conclusion:
In summary, the Support Vector Classifier appears to be the best-performing model based on the provided accuracy scores. However, further evaluation metrics such as precision, recall, and F1-score could provide additional insights into the models' performance, especially in scenarios with imbalanced classes or different priorities for false positives and false negatives. Additionally, it's essential to consider the computational complexity and interpretability of each model when selecting the final model for deployment.
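To make the suggested follow-up concrete, the sketch below computes per-class precision, recall, and F1 from scratch on made-up labels. `per_class_metrics`, `toy_true`, and `toy_pred` are hypothetical names, and the toy labels are not predictions from any of the models above. In practice, scikit-learn's `classification_report(y_test, y_pred)` reports the same quantities.

```python
def per_class_metrics(y_true, y_pred):
    """Return precision, recall and F1 for every label seen in y_true or y_pred."""
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)  # correctly predicted as label
        fp = sum(t != label and p == label for t, p in pairs)  # wrongly predicted as label
        fn = sum(t == label and p != label for t, p in pairs)  # label missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Made-up labels for illustration only.
toy_true = ["SITTING", "SITTING", "STANDING", "STANDING", "WALKING"]
toy_pred = ["SITTING", "STANDING", "STANDING", "STANDING", "WALKING"]

toy_metrics = per_class_metrics(toy_true, toy_pred)
for label, values in toy_metrics.items():
    print(label, {name: round(value, 2) for name, value in values.items()})
```

In this toy case one SITTING sample is predicted as STANDING, which lowers recall for SITTING and precision for STANDING. Overall accuracy alone would not show that pattern.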