Home Assignment 1 - Cognitive Analytics CSBA 3009
Data Visualizations:
Basics of Statistics:
1. How can you use statistical methods to identify and analyze patterns in large-scale cognitive
datasets?
2. Participants in a cognitive experiment are asked to choose between two options in a
decision-making task. How can you use a binomial distribution to model their choices and
analyze the factors influencing their decisions?
3. EEG data often follows a normal distribution. How can this be used to identify potential
outliers or anomalies in the data that might indicate cognitive dysfunction?
4. Explain the concept of Shannon entropy in the context of cognitive data. How can it be used
to measure the uncertainty or complexity of cognitive states, like attention or emotional
arousal?
5. How can the Central Limit Theorem be used to justify the use of parametric statistical tests
on cognitive data, even if the individual data points may not be normally distributed?
6. Researchers analyze reaction times from hundreds of participants in a visual attention task.
How can they leverage the Central Limit Theorem to estimate the population mean and
standard deviation of reaction times, even if the individual data follows a non-normal
distribution?
7. A study investigates the number of times participants press a button during a cognitive task.
It follows a Poisson distribution with a mean of 5 presses. Calculate the probability of a
participant pressing the button exactly 3 times.
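For the calculation in question 7, the Poisson probability mass function is P(X = k) = e^(-lambda) * lambda^k / k!. The following is a minimal sketch of that formula using only the Python standard library; the function name poisson_pmf is illustrative and not part of the assignment.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Return P(X = k) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


# Mean of 5 presses, exactly 3 presses (values taken from question 7).
p_three = poisson_pmf(3, 5.0)
print(f"P(X = 3) = {p_three:.4f}")  # approximately 0.1404
```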
Data Mining:
1. Differentiate between data mining and traditional data analysis. How does data mining
specifically benefit cognitive computing applications?
2. Explain the concept of feature selection in data mining and its importance in building
effective cognitive computing models.
3. Describe two different data mining algorithms (e.g., decision trees, clustering) and explain
their suitability for specific cognitive computing tasks.
4. A study measures EEG data of 50 participants in three different cognitive states (relaxed,
focused, distracted). You apply K-means clustering with K=3. Calculate the within-cluster and
between-cluster sums of squares to evaluate the clustering quality.
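Question 4 does not supply data, so the sketch below uses a small made-up set of two-feature points and a made-up cluster assignment purely to illustrate the two quantities. The within-cluster sum of squares (WCSS) adds up squared distances from each point to its cluster centroid, and the between-cluster sum of squares (BCSS) adds up, for each cluster, the cluster size times the squared distance from its centroid to the overall centroid. All names and values here are assumptions for illustration, not results from any study.

```python
from collections import defaultdict


def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_sums_of_squares(points, labels):
    """Return (WCSS, BCSS) for a given assignment of points to clusters."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)

    grand = centroid(points)
    wcss = 0.0
    bcss = 0.0
    for members in clusters.values():
        c = centroid(members)
        wcss += sum(sq_dist(p, c) for p in members)
        bcss += len(members) * sq_dist(c, grand)
    return wcss, bcss


# Hypothetical data: two EEG-derived features per participant, and the
# cluster labels that a K-means run with K=3 might have produced.
points = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.2),
          (5.0, 8.0), (5.5, 8.2), (4.8, 7.9),
          (9.0, 1.0), (9.2, 1.3), (8.8, 0.9)]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]

wcss, bcss = cluster_sums_of_squares(points, labels)
total = sum(sq_dist(p, centroid(points)) for p in points)
print(f"WCSS = {wcss:.3f}, BCSS = {bcss:.3f}, total = {total:.3f}")
```

A low WCSS relative to BCSS suggests compact, well-separated clusters. The two always add up to the total sum of squares, which gives a quick sanity check on the calculation.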
Regression Analysis:
1. Differentiate between linear and non-linear regression in the context of cognitive computing.
When would you choose each type of model?
2. Imagine a study measuring reaction times in a visual search task. You use linear regression to
predict reaction time based on the number of distractor items present. Interpret the
coefficients of your model and explain their meaning.
3. Explain how polynomial regression can be used to model the relationship between cognitive
performance and age in a large-scale study.
4. Explain the concept of residual analysis and its importance in evaluating the fit and
assumptions of a regression model.
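As a rough illustration of questions 2 and 4, the sketch below fits a simple least-squares line to made-up reaction times and then computes the residuals. The numbers are hypothetical and exist only to show the mechanics: the slope estimates the change in reaction time per additional distractor, the intercept estimates the reaction time with no distractors, and the residuals are what residual analysis would inspect for patterns.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: number of distractors and reaction time in milliseconds.
distractors = [2, 4, 6, 8, 10, 12]
rt_ms = [520.0, 565.0, 610.0, 640.0, 700.0, 735.0]

intercept, slope = fit_line(distractors, rt_ms)

# Residual = observed value minus fitted value.
residuals = [y - (intercept + slope * x) for x, y in zip(distractors, rt_ms)]

print(f"intercept = {intercept:.1f} ms, slope = {slope:.2f} ms per distractor")
print("residuals:", [round(r, 2) for r in residuals])
```

If the residuals show a curve or a widening spread when plotted against the fitted values, that would hint that the linearity or constant-variance assumption does not hold for the data.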
Imagine a dataset with 1000 images labelled with six different emotions (happy, sad, angry, etc.) and
corresponding facial feature measurements (e.g., eyebrow position, mouth curvature). You want to
compare Random Forest and Logistic Regression for predicting emotion based on the features.
i. Split the data into training and testing sets. Train a Random Forest and a Logistic Regression
model on the training set to predict emotion based on facial features.
ii. Calculate accuracy, precision, recall, and F1-score for both models on the testing set.
Compare their performance and discuss which model is better suited for this task.
iii. Analyze the feature importance scores in the Random Forest model. Which features are most
important for predicting emotion?
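The metrics named in step ii can be computed from any pair of true and predicted label lists, whichever model produced the predictions. The sketch below uses macro-averaging (an unweighted mean over classes), which is one common choice for multi-class problems and is an assumption here, since the task does not specify an averaging scheme. The labels are made up for illustration, and the function name macro_metrics is hypothetical.

```python
def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1-score."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    k = len(classes)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


# Hypothetical test-set labels and predictions from one of the two models.
y_true = ["happy", "sad", "angry", "happy", "sad", "angry"]
y_pred = ["happy", "sad", "happy", "happy", "angry", "angry"]

metrics = macro_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Running the same function on the predictions of both models gives a like-for-like comparison on the testing set.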
An online learning platform collects data on student performance, learning styles, and engagement.
You want to use Random Forest and Logistic Regression to predict which learning materials are most
effective for each student.
i. Train both models to predict student performance on specific learning materials based on
the collected data. Evaluate both models using metrics such as Root Mean Squared Error (RMSE).
ii. Compare the interpretability of both models. How easy is it to understand why a student is
predicted to perform well or poorly on specific material?
iii. Discuss the advantages and limitations of using Random Forest and Logistic Regression for
personalized learning applications.