ai
1. Stock Prices Predictor: Build a model that predicts stock prices based on historical
data.
Key Concepts: Time series analysis, regression models.
2. Sentiment Analyzer: Create a tool to analyze customer sentiment from social media
posts or product reviews.
Key Concepts: Natural Language Processing (NLP), text classification.
3. Movie Ticket Price Predictor: Develop a system to predict movie ticket prices based
on factors such as demand, release date, and competing movies.
Key Concepts: Regression analysis, feature engineering.
4. Student Results Predictor: Use previous student performance data to predict exam
results.
Key Concepts: Classification, regression.
5. Human Activity Recognition: Classify human activities (like walking or running)
based on smartphone sensor data.
Key Concepts: Time series data, classification models.
1. Classification: The goal is to categorize data into predefined classes (e.g., spam or not
spam).
2. Regression: Here, you predict continuous values (e.g., house prices, stock prices).
3. Clustering: Grouping data points into clusters based on similarity (e.g., customer
segmentation).
4. Anomaly Detection: Detect outliers or unusual patterns in data (e.g., fraud detection in
banking transactions).
5. Recommendation: Suggest items or actions based on patterns (e.g., recommending
products on an e-commerce site).
1. Business Understanding: Define the business objectives and identify how AI can
address them.
2. Data Understanding: Gather relevant data and explore it to gain insights.
3. Data Preparation: Clean and preprocess the data to make it ready for modeling.
4. Modeling: Select and apply the right algorithms to build your model.
5. Evaluation: Measure the model’s performance using relevant metrics.
6. Deployment: Implement the model in a production environment to start using it for
predictions.
7. Monitoring: Continuously monitor the model to ensure it remains accurate over time.
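The phases from Data Preparation to Monitoring can be sketched on a toy problem. The sketch below is only an illustration under stated assumptions, not a production workflow: it uses the midterm and final scores from this document's student-grades example, adds one hypothetical incomplete record to show cleaning, fits a simple straight-line model by least squares, and uses hypothetical new records and a hypothetical alert threshold for the monitoring step.

```python
import math

# Data understanding: (midterm, final) score pairs.
# The last record is a hypothetical incomplete entry added to illustrate cleaning.
raw_records = [(80, 85), (70, 78), (90, 92), (60, 65), (75, 80), (None, 70)]

# Data preparation: drop records with missing values.
records = [(x, y) for x, y in raw_records if x is not None and y is not None]
xs = [x for x, _ in records]
ys = [y for _, y in records]

# Modeling: fit final = slope * midterm + intercept by ordinary least squares.
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in records)
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x


def predict(midterm_score):
    """Deployment: the fitted model wrapped as a function others can call."""
    return slope * midterm_score + intercept


def rmse(pairs):
    """Root mean squared error of predict() over (midterm, final) pairs."""
    return math.sqrt(sum((predict(x) - y) ** 2 for x, y in pairs) / len(pairs))


# Evaluation: measured on the training data here for brevity;
# a held-out test set would normally be used instead.
training_rmse = rmse(records)

# Monitoring: check the error on newly observed data against a tolerance.
new_records = [(85, 80), (65, 75)]   # hypothetical new observations
ALERT_THRESHOLD = 5.0                # hypothetical tolerance, in score points
needs_review = rmse(new_records) > ALERT_THRESHOLD

print(f"Model: final = {slope:.2f} * midterm + {intercept:.2f}")
print(f"Training RMSE: {training_rmse:.2f}")
print(f"Model needs review: {needs_review}")
```

In this sketch the error on the new records exceeds the threshold, which is the kind of signal the Monitoring phase is meant to catch.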
Interpretation:
A lower MSE indicates better model performance; it means the predictions are closer to the
actual values.
Since errors are squared, larger errors have a disproportionately large effect on MSE,
making it sensitive to outliers.
Example:
Suppose we’re predicting the test scores of students based on the number of hours they studied.
Interpretation:
The Mean Squared Error is 9.2: on average, the squared difference between the
predicted and actual test scores is 9.2 (in squared points).
2. Root Mean Squared Error (RMSE)
Definition:
RMSE is the square root of the MSE. It provides the error metric in the same units as the
target variable, making it more interpretable.
Formula:
RMSE = √MSE = √( (1/n) × Σ (predicted – actual)² ), where n is the number of predictions.
Example:
Using the MSE calculated above:
1. Calculate RMSE:
RMSE = √9.2 ≈ 3.033
Interpretation:
The RMSE of approximately 3.033 means that, on average, the model’s predictions are
about 3 points off from the actual test scores.
Practical Example: House Price Prediction
Let’s consider a real-world scenario where we’re predicting house prices based on features like
size, location, and number of bedrooms.
House   Actual Price ($ thousands)   Predicted Price ($ thousands)
A       250                          240
B       300                          310
C       150                          145
D       400                          390
E       350                          360
Calculating MSE:
1. Calculate the squared errors:
House A: (240 – 250)² = (-10)² = 100
House B: (310 – 300)² = (10)² = 100
House C: (145 – 150)² = (-5)² = 25
House D: (390 – 400)² = (-10)² = 100
House E: (360 – 350)² = (10)² = 100
2. Sum the squared errors:
Total = 100 + 100 + 25 + 100 + 100 = 425
3. Calculate MSE:
MSE = 425/5 = 85
Calculating RMSE:
1. Calculate RMSE:
RMSE = √85 ≈ 9.22
Interpretation:
MSE: The average squared error is 85 in units of (thousand dollars)², which equals
85,000,000 squared dollars. Because the unit is squared, it is hard to interpret directly.
RMSE: On average, the model’s predictions are off by about $9,220. This is more
interpretable and indicates the typical prediction error in the same units as house prices.
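The house price calculation above can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the helper name compute_mse is chosen here for illustration, and the predicted prices are the ones used in the squared-error steps.

```python
import math

# Prices in thousands of dollars, taken from the worked example.
actual = [250, 300, 150, 400, 350]
predicted = [240, 310, 145, 390, 360]


def compute_mse(actual, predicted):
    """Average of the squared differences between predicted and actual values."""
    squared_errors = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared_errors) / len(squared_errors)


mse = compute_mse(actual, predicted)
rmse = math.sqrt(mse)

print(f"MSE  = {mse:.2f} (thousand dollars, squared)")  # MSE  = 85.00
print(f"RMSE = {rmse:.2f} (thousand dollars)")          # RMSE = 9.22
```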
Key Points:
Use RMSE for Interpretability: Since RMSE is in the same units as the target
variable, it is often more intuitive when communicating model performance.
Sensitivity to Outliers: Both MSE and RMSE are sensitive to large errors due to the
squaring of differences. This means that a few large errors can significantly increase the
MSE and RMSE.
Comparing Models: These metrics are useful for comparing different models. A model
with a lower RMSE is generally preferred over one with a higher RMSE on the same
dataset.
Additional Example: Predicting Student Grades
Imagine you are developing a model to predict students’ final exam scores based on their
midterm scores.
Student   Midterm Score   Actual Final Score   Predicted Final Score
1         80              85                   83
2         70              78                   76
3         90              92                   88
4         60              65                   68
5         75              80                   79
Calculating MSE:
1. Calculate the squared errors:
Student 1: (83 – 85)² = 4
Student 2: (76 – 78)² = 4
Student 3: (88 – 92)² = 16
Student 4: (68 – 65)² = 9
Student 5: (79 – 80)² = 1
2. Sum the squared errors:
Total = 4 + 4 + 16 + 9 + 1 = 34
3. Calculate MSE:
MSE = 34/5 = 6.8
Calculating RMSE:
1. Calculate RMSE:
RMSE = √6.8 ≈ 2.608
Interpretation:
MSE: The average squared error is 6.8.
RMSE: The model’s predictions are, on average, about 2.6 points away from the actual
final exam scores.
Whether this level of error is acceptable depends on the context (e.g., the grading scale
and how precise the predictions need to be).
Understanding the Impact of Outliers
Suppose in the previous example, one student’s predicted final score was significantly off:
Updated RMSE:
RMSE = √100.4 ≈ 10.02
Interpretation:
The RMSE increased from approximately 2.6 to 10 due to one large error.
This demonstrates how sensitive MSE and RMSE are to outliers.
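The effect of a single large error can be checked with a short Python sketch. The altered prediction is not shown in this section, so the value used below is an assumption: changing Student 3's predicted score from 88 to 70 (an error of 22 points) is one change that is consistent with the stated MSE of 100.4.

```python
import math


def compute_rmse(actual, predicted):
    """Square root of the average squared difference."""
    squared_errors = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


actual = [85, 78, 92, 65, 80]
predicted = [83, 76, 88, 68, 79]

# Assumption: Student 3's prediction becomes 70 instead of 88.
# This value is hypothetical; it was chosen because it reproduces MSE = 100.4.
predicted_with_outlier = [83, 76, 70, 68, 79]

rmse_before = compute_rmse(actual, predicted)
rmse_after = compute_rmse(actual, predicted_with_outlier)

print(f"RMSE without the large error: {rmse_before:.2f}")  # 2.61
print(f"RMSE with the large error:    {rmse_after:.2f}")   # 10.02
```

Four of the five predictions are unchanged, yet the RMSE roughly quadruples, because the single 22-point error contributes 484 to the sum of squared errors.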
Practical Considerations:
Data Cleaning: Always check for outliers or errors in your data that might
disproportionately affect your error metrics.
Model Improvement: If the RMSE is high, consider improving your model by:
Adding more relevant features.
Using a different modeling technique.
Performing feature engineering.
Comparing Models: Use RMSE and MSE to compare different models or versions of a
model to choose the one with the best predictive performance.
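As an illustration of comparing models with RMSE, the sketch below scores two sets of predictions against the same actual final exam scores. The first set is the model from the student-grades example. The second is a hypothetical baseline, introduced here only for comparison, that predicts each final score to equal the midterm score.

```python
import math


def compute_rmse(actual, predicted):
    """Square root of the average squared difference."""
    squared_errors = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


midterm = [80, 70, 90, 60, 75]
actual_final = [85, 78, 92, 65, 80]

candidates = {
    "model from the example": [83, 76, 88, 68, 79],
    # Hypothetical baseline: assume the final score equals the midterm score.
    "baseline (final = midterm)": midterm,
}

# Score every candidate on the same data so the RMSE values are comparable.
scores = {name: compute_rmse(actual_final, preds)
          for name, preds in candidates.items()}
best = min(scores, key=scores.get)

for name, rmse in scores.items():
    print(f"{name}: RMSE = {rmse:.2f}")
print(f"Lowest RMSE: {best}")
```

The comparison is only meaningful because both candidates are evaluated on the same students; RMSE values from different datasets should not be ranked against each other.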
For students, this life cycle forms the foundation of practical AI projects. Understanding each
phase in detail and practicing hands-on implementation through examples will prepare them for
building AI systems.
AI Development Platforms
Students are encouraged to explore a range of AI development platforms, each offering specific
features for different stages of the AI model life cycle:
Engages audiences: Stories transport listeners to different places and times, creating a
shared experience.
Facilitates cross-cultural understanding: Storytelling helps to establish a sense
of belonging and identity, making it particularly valuable in diverse cultural contexts.
Makes information relatable: When data is presented as a story, it becomes more
relatable and easier to grasp, reducing the ambiguity that often accompanies raw data.
Example:
Consider indigenous cultures where stories are used to pass down history and moral lessons.
Similarly, in data science, storytelling helps make data more understandable by linking it to
real-world scenarios.
Step 3: Examine data relationships: Find connections between data points. For
example, a dataset showing an increase in student grades might relate to a new learning
tool introduced in the class.
Step 4: Create a simple narrative: Develop a storyline that includes conflict or a
challenge that needs resolution, such as students’ declining interest in a subject and how a
new teaching method solved the problem.
Example:
The spike in “excited” students from 19% to 38% post-intervention represents a positive
outcome, which could be attributed to interactive lessons or more hands-on learning
experiences.
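When describing a change like this, it helps to be explicit about whether the figure is a percentage-point change or a relative change. A minimal sketch of the arithmetic for the 19% to 38% example (variable names are illustrative):

```python
before_pct = 19.0  # share of "excited" students before the intervention
after_pct = 38.0   # share after the intervention

# Absolute change, measured in percentage points.
point_change = after_pct - before_pct

# Relative change, measured as a percent of the starting value.
relative_change = point_change / before_pct * 100

print(f"Increase: {point_change:.0f} percentage points")  # 19
print(f"Relative increase: {relative_change:.0f}%")       # 100
```

Both statements are accurate: the share rose by 19 percentage points, which means it doubled.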
Memorability: People are more likely to retain information presented as a narrative than
a set of facts or numbers.
Example:
Instead of saying “80% of students passed the exam,” framing it as “8 out of every 10 students
reached their academic goal” makes the information more relatable and memorable.
Example:
Consider a dataset showing student performance over the course of a year. If the data shows
that 70% of students improved their grades after implementing an AI-based learning tool, this
serves as the key fact for the story. However, without context, it’s just a number.
2. Visuals
Visuals help to make data more understandable and engaging. They can highlight trends,
outliers, and patterns that may not be obvious when looking at raw data. Common visuals include
charts, graphs, infographics, or maps.
Example:
In the example of student performance, instead of just stating that 70% of students improved,
a bar chart could be used to visually show the percentage of students who improved,
remained the same, or declined. A comparison of grades before and after the implementation of
the AI tool could be presented using side-by-side bar graphs.
Before the AI tool:
50% scored between 50-70%.
20% scored between 70-90%.
10% scored above 90%.
After the AI tool:
30% scored between 50-70%.
40% scored between 70-90%.
30% scored above 90%.
This visual clearly shows the improvement and makes the impact of the AI tool easy to grasp.
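A side-by-side comparison like this can be sketched even without a plotting library. The example below is a rough text-based illustration using only standard Python and the before and after figures listed above, copied as given; in practice a charting tool would normally draw a grouped bar chart instead.

```python
# Share of students (percent) in each score band, copied from the figures above.
score_bands = ["50-70%", "70-90%", "above 90%"]
before = [50, 20, 10]
after = [30, 40, 30]


def bar(percent, scale=2):
    """Draw one '#' for every `scale` percentage points."""
    return "#" * (percent // scale)


lines = []
for band, b, a in zip(score_bands, before, after):
    # Two rows per score band: one for before, one for after.
    lines.append(f"{band:>10} before | {bar(b):<25} {b}%")
    lines.append(f"{'':>10} after  | {bar(a):<25} {a}%")

print("\n".join(lines))
```

Placing the before and after bars for each score band next to each other makes the shift toward the higher bands visible at a glance.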
3. Narrative
The narrative ties the data and visuals together, providing context and meaning. It
explains why the data is important, what it means, and how it relates to the audience. The
narrative should guide the audience through the data in a logical and compelling way,
highlighting the key insights.
Example:
For the student performance data, the narrative could be: