
AI, Machine Learning and Big Data
Big Data:
Definition:

Big Data refers to extremely large and complex datasets that cannot
be effectively processed using traditional data management tools. It
involves collecting, storing, processing, and analyzing vast amounts of
structured, semi-structured, and unstructured data to extract
meaningful insights.

Key Characteristics of Big Data (The 5 Vs):

1️⃣ Volume – Massive amounts of data generated every second.
2️⃣ Velocity – Data is produced and processed at high speed (real-time processing).
3️⃣ Variety – Data comes in multiple formats (text, images, videos...).
4️⃣ Veracity – Ensuring data accuracy and reliability.
5️⃣ Value – Extracting meaningful insights for decision-making.
Collection process:

#1 DATA MINING
Involves extracting valuable information from large datasets to identify patterns, correlations, and trends for decision-making.
Data types that can be obtained:
• Structured data: Spreadsheets, databases
• Unstructured data: Text, images, audio
• Semi-structured data: XML, JSON
• Time-series data: Stock prices, sensor readings
• Spatial data: Geographic information systems data

#2 SOCIAL MEDIA ANALYTICS
Involves analyzing data from social media platforms to understand user behavior, sentiments, and trends. It provides insights into audience preferences and content engagement.
Data types that can be obtained:
• Text: Posts, comments, tweets
• Unstructured data: Images and videos
• User interactions: Likes, shares, follows
• Demographic data: Age, gender, location
• Sentiment data: Positive, negative, neutral sentiments

#3 SURVEYS AND FEEDBACK FORMS
Used to collect direct input from individuals on various topics. They provide structured data that can be analyzed to understand preferences, opinions, and experiences.
Data types that can be obtained:
• Quantitative data: Ratings, scores, numerical responses
• Qualitative data: Open-ended text responses
• Demographic data: Age, gender, occupation
• Psychographic data: Interests, values, lifestyle
#5 IOT, SENSORS, AND TELEMETRY DEVICES
Used to collect real-time data from connected devices, such as smartwatches, industrial sensors, or GPS trackers. This data is valuable for predictive maintenance, energy management, environmental monitoring, and improving operational efficiency.
Data types that can be obtained:
• Sensor data: Temperature, humidity, pressure
• Telemetry data: Location, speed, acceleration
• Operational data: Machine status, energy consumption
• Environmental data: Air quality, soil moisture

#6 USAGE OF PUBLIC RECORDS AND DATABASES
Public records and databases are sources of structured data maintained by government agencies, public institutions, and private organizations. These records include demographic data, economic indicators, health statistics, legal documents, and geographic information.
Data types that can be obtained:
• Demographic data: Population statistics
• Economic data: Employment rates, GDP figures
• Health data: Disease prevalence, healthcare outcomes
• Geographic data: Maps, spatial datasets
Open source Tools

Web scraping:
• Beautiful Soup
• Scrapy
• Puppeteer
• Selenium
• Apache Flume
• Apache NiFi
• Logstash

IoT & Sensors:
• Node-RED
• Eclipse Kura
• ThingsBoard
• Apache NiFi
• OpenHAB
• Prometheus
• Apache Kafka

Social Media Analytics:
• Tweepy
• SocialFeedManager
• OSINT Framework
• Logstash
• Apache Kafka
• Mediacloud

Surveys & Feedback:
• LimeSurvey
• Google Forms
• OpenDataKit (ODK)
• SurveyJS
• Apache NiFi
• KoBoToolbox

Data Mining:
• Scrapy
• Apache Nutch
• Weka
• Orange
• RapidMiner

Databases & Public Records:
• CKAN
• OpenRefine
• Datasette
• Apache NiFi
• Metabase
• Apache Flume
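To make the collection step concrete, here is a minimal web-scraping sketch in Python using the Requests and Beautiful Soup libraries (Beautiful Soup appears in the list above; Requests, the URL, and the page structure are assumptions for illustration, not a real data source).

# Minimal web-scraping sketch with Beautiful Soup (placeholder URL and tags).
import requests
from bs4 import BeautifulSoup

url = "https://example.com/articles"  # placeholder page to scrape
response = requests.get(url, timeout=10)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")

# Collect the text of every <h2> heading on the page (assumed page structure).
headlines = [h2.get_text(strip=True) for h2 in soup.find_all("h2")]
for headline in headlines:
    print(headline)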
Storage process:
Storing data in distributed file systems or databases optimized for big data.

Types of Storage:

• Relational Databases (RDBMS): Data is stored in tables with a fixed schema.
• NoSQL Databases: Flexible storage for semi-structured or unstructured data.
• Data Lakes: Storing raw data in its native format for future processing.
• Distributed File Systems: Splitting data across multiple storage nodes.

Open-source tools:

POSTGRESQL: Powerful, open-source relational database system.
MYSQL: Popular open-source RDBMS for transactional data.
MONGODB: NoSQL database for flexible, document-based storage.
CASSANDRA: Distributed NoSQL database designed for handling large amounts of data.
HDFS: Part of the Hadoop ecosystem, used for distributed storage of large files.
Processing process:
Before analyzing data, it must be processed and cleaned to ensure accuracy and usability.

Key Steps in Processing:

• Data Cleaning → Handle missing values, remove duplicates, and correct inconsistencies.
• Data Transformation → Normalize, standardize, or encode categorical variables.
• Feature Engineering → Create new meaningful features to improve model performance.
• Data Reduction → Apply dimensionality reduction techniques like PCA or feature selection.
• Exploratory Data Analysis (EDA) → Use statistics and visualizations to identify patterns and relationships.
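As a small illustration of the cleaning and transformation steps, here is a sketch using pandas and scikit-learn (scikit-learn is not named in the slides but is a common open-source choice; the columns and values are invented).

# Minimal data-preparation sketch with pandas and scikit-learn (invented columns).
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "age": [25, 32, None, 41, 29],              # contains a missing value
    "income": [30000, 45000, 52000, None, 39000],
    "city": ["Tunis", "Sfax", "Tunis", "Sousse", "Sfax"],
})

# Data cleaning: drop duplicates, fill missing numeric values with the column median.
df = df.drop_duplicates()
df[["age", "income"]] = df[["age", "income"]].fillna(df[["age", "income"]].median())

# Data transformation: standardize numeric features, one-hot encode the categorical one.
df[["age", "income"]] = StandardScaler().fit_transform(df[["age", "income"]])
df = pd.get_dummies(df, columns=["city"])

print(df.head())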

Open-source tools:

APACHE SPARK: Fast, in-memory data processing engine.
APACHE HADOOP: Framework for distributed storage and processing (MapReduce).
APACHE FLINK: Stream processing framework.
AIRFLOW: Workflow automation for scheduling and monitoring data pipelines.
DASK: Parallel computing for big data processing in Python.
Visualisation Tools:

1. GRAFANA: Used for monitoring and visualizing time-series data. It's widely used with time-series databases.
2. TABLEAU PUBLIC: A free version that allows you to create and share visualizations online.
3. POWER BI: Allows users to create reports and visualizations on their local machines.
4. LOOKER: Used for exploring, analyzing, and sharing real-time business analytics.
5. KIBANA: Allows users to visualize and explore data stored in Elasticsearch using interactive dashboards, graphs, charts, maps, and tables.

Which visualization tool should I choose for my needs?
Machine Learning
Definition:
Machine Learning (ML) is a subset of Artificial
Intelligence that focuses on developing algorithms that
enable computers to learn and make decisions from
data without being explicitly programmed.

Types of ML:

01 Supervised Learning

02 Unsupervised Learning

03 Reinforcement Learning
1- Supervised Learning
Supervised learning is when a model is trained on labeled data, meaning it knows the correct answer during training.

-> Labeled data is data that has been tagged with a correct answer or classification.

In supervised learning, a supervisor acts as a teacher: the algorithm learns from the labeled training data and is then given a new set of examples, on which it should produce correct outcomes based on what it learned.

Types of supervised learning:


• Classification
• Regression
Classification :
Classification teaches a machine to sort things into categories.
The model learns from labeled examples and predicts the class
of new data (e.g., email spam vs. non-spam).

Classification Algorithms:
There are two types of learners in machine learning classification:

Eager learners :
Eager learners are ML algorithms that build a model during
the training phase and are ready to make predictions
immediately when the training is complete. These
algorithms "eagerly" build the entire model before being
presented with any new data.

• Logistic Regression.
• Support Vector Machine.
• Decision Trees.
Logistic Regression.
Logistic Regression is a statistical method used for binary
classification problems. It predicts the probability that a given
input belongs to a specific class (usually labeled as 0 or 1) based
on one or more input features.

Types of Logistic Regression:

1. Binomial: The dependent variable can have only two possible types, such as 0 or 1, Pass or Fail, etc.
2. Multinomial: The dependent variable can have 3 or more possible unordered types, such as “cat”, “dog”, or “sheep”.
3. Ordinal: The dependent variable can have 3 or more possible ordered types, such as “low”, “medium”, or “high”.
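A minimal binomial logistic regression sketch on synthetic data, using scikit-learn (the library choice is an assumption, not something the slides prescribe):

# Binomial logistic regression sketch on synthetic data (scikit-learn assumed).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression()
model.fit(X_train, y_train)

# predict_proba returns the estimated probability of each class (0 and 1).
print(model.predict_proba(X_test[:3]))
print("Test accuracy:", model.score(X_test, y_test))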
Support Vector Machine (SVM)
Support Vector Machine (SVM) is a ML method used
to classify data into categories. It finds the best line
or boundary (hyperplane) that divides the data into
two groups.

Decision Tree
A decision tree is a ML method used for classification
or regression. It works like a flowchart, where each
step (called a node) asks a question or makes a
decision to split the data into groups until it reaches a
final decision.
Lazy learners :

Lazy learners, or instance-based learners, don't create any model immediately from the training data, and this is where the lazy aspect comes from. They just memorize the training data, and each time there is a need to make a prediction, they search for the nearest neighbors in the whole training data, which makes them very slow during prediction.

• K-Nearest Neighbor (KNN).
• Case-based reasoning.

K-Nearest Neighbor (KNN)
The KNN algorithm predicts the label or value for a new data
point by looking at the closest K data points (neighbors) in the
training dataset.
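To illustrate this lazy-learning behaviour, a short KNN sketch with scikit-learn (assumed library); note that fitting mostly amounts to storing the training data.

# K-Nearest Neighbors sketch: the "model" is essentially the stored training data.
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)  # K = 5 closest neighbors
knn.fit(X_train, y_train)                  # fitting mostly memorizes the data

print("Predicted classes:", knn.predict(X_test[:5]))
print("Test accuracy:", knn.score(X_test, y_test))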

Case-based reasoning (CBR)

CBR is an AI technique where solutions to new problems are derived by finding and adapting solutions from similar past problems. It mimics human decision-making by recalling past experiences and applying them to current situations.
Regression :
Regression is used to predict a continuous numerical value based on one or more input features
(independent variables). It helps establish a relationship between input and output variables.
Unlike classification, which assigns categories, regression estimates a real number.

Types of Regression
1. Simple Linear Regression
This assumes that there is a linear relationship between the independent and dependent variables, meaning that the change in the dependent variable is proportional to the change in the independent variable. For example, predicting the price of a house based on its size.

Formula: 𝑌 = a𝑋 + 𝑏

2. Multiple Linear Regression
Multiple linear regression extends simple linear regression by using multiple independent variables to predict the target variable. For example, predicting the price of a house based on multiple features such as size, number of rooms, etc.

Formula: 𝑌 = 𝑏₀ + 𝑏₁𝑋₁ + 𝑏₂𝑋₂ + ... + 𝑏𝑛𝑋𝑛

3. Polynomial Regression
It's used to model non-linear relationships between the dependent variable and the independent variables. It adds polynomial terms to the linear regression model to capture more complex relationships. For example, when we want to predict a non-linear trend like population growth over time.

Formula: 𝑌 = 𝑏₀ + 𝑏₁𝑋 + 𝑏₂𝑋² + 𝑏₃𝑋³ + ...
4. Support Vector Regression (SVR)
SVR is based on the principles of Support Vector Machines (SVM).
It is used for predicting continuous values (regression tasks) while
maintaining the basic concept of maximizing the margin between
the data points and the decision boundary.

5. Ridge and Lasso Regression

Ridge and lasso regression are regularized versions of linear regression that help avoid overfitting by penalizing large coefficients. We use these types of regression algorithms when there is a risk of overfitting due to too many features.

6. Decision tree and Random forest


Uses tree-based models to predict continuous values.
🔹 Decision Tree Regression → Splits data into smaller parts recursively.
🔹 Random Forest Regression → Uses multiple decision trees for better
accuracy.
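A brief sketch contrasting a few of these regression variants on synthetic data, using scikit-learn (an assumed library choice; the data is random and only meant to show the idea):

# Regression sketch: linear, polynomial, ridge, and random forest on synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import PolynomialFeatures
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 2.0 * X[:, 0] + 0.5 * X[:, 0] ** 2 + rng.normal(0, 1, 100)  # non-linear target

linear = LinearRegression().fit(X, y)                     # simple linear regression
poly = PolynomialFeatures(degree=2).fit_transform(X)      # add an X^2 term
poly_model = LinearRegression().fit(poly, y)              # polynomial regression
ridge = Ridge(alpha=1.0).fit(poly, y)                     # regularized version
forest = RandomForestRegressor(random_state=0).fit(X, y)  # tree-based regression

print("Linear R^2:", linear.score(X, y))
print("Polynomial R^2:", poly_model.score(poly, y))
print("Ridge R^2:", ridge.score(poly, y))
print("Random forest R^2:", forest.score(X, y))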
Before you use supervised learning

Requirements before performing supervised learning :

• No missing values
• Data in numeric format
• Data stored as pandas DataFrames or Series, or NumPy arrays.

→ Prevents biased or incorrect model predictions.


→ Ensures compatibility with machine learning algorithms
→ Optimizes performance and enables efficient computations.
Measuring model performance:

In classification, accuracy is a commonly used metric:
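Accuracy is the number of correct predictions divided by the total number of predictions. A minimal check with scikit-learn (an assumed library):

# Accuracy = number of correct predictions / total number of predictions.
from sklearn.metrics import accuracy_score

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]

print(accuracy_score(y_true, y_pred))  # 5 correct out of 6 -> 0.833...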


2- Unsupervised Learning
Unsupervised learning is a type of ML where the model is trained on unlabeled data. This means that the
input data does not come with predefined labels or target outputs. The model's objective is to find hidden
patterns, structures, or relationships in the data by itself.
Unlike supervised learning, where the goal is to predict an output based on input-output pairs, unsupervised
learning aims to identify the underlying structure of the data.

Types of unsupervised learning:


• Clustering
• Dimensionality Reduction
• Anomaly Detection
Clustering
Clustering is a technique that involves grouping similar data points together based on their features, without any
labeled data. The main goal of clustering is to partition the data into distinct groups (called clusters), where the
data points within each cluster are more similar to each other than to those in other clusters.

Types of Clustering:
1) K-MEANS:
It divides the data into a fixed number (K) of clusters based on the similarity of the data points.

1. Choose the number of clusters (K).
2. Initialize K centroids randomly: The algorithm picks K points randomly from the dataset as initial cluster centers.
3. Assign each data point to the nearest centroid.
4. Update centroids: The centroids are recalculated by taking the mean of all the points assigned to that cluster.
5. Repeat steps 3 and 4: The algorithm continues iterating until:
⚬ The centroids do not change significantly.
⚬ A predefined number of iterations is reached.
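The steps above are what a typical K-means implementation runs internally; a minimal sketch with scikit-learn on synthetic data (the library choice and the data are assumptions):

# K-means sketch: K = 3 clusters on synthetic 2-D data (scikit-learn assumed).
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)          # steps 2-5: init, assign, update, repeat

print("Cluster centers:\n", kmeans.cluster_centers_)
print("First 10 labels:", labels[:10])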
2) Hierarchical Clustering
It creates a hierarchy of clusters by either merging smaller clusters (agglomerative) or dividing a large cluster
into smaller clusters (divisive).

There are two main types of hierarchical clustering:


1.Agglomerative Clustering (Bottom-Up Approach):
⚬ Starts by treating each data point as a single cluster.
⚬ Iteratively merges the closest pairs of clusters until all points are in one cluster.
2.Divisive Clustering (Top-Down Approach):
⚬ Starts with all data points in one cluster.
⚬ Iteratively splits the cluster into smaller clusters until each point is in its own cluster.

The agglomerative algorithm works as follows:

1. Initialize: Treat each data point as a single cluster.
2. Compute Distance Matrix: Calculate the distance between all pairs of clusters.
3. Merge Closest Clusters: Merge the two closest clusters into a single cluster.
4. Update Distance Matrix: Recalculate the distances between the new cluster and the remaining clusters.
5. Repeat: Repeat steps 3 and 4 until all data points are in one cluster.
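A short agglomerative (bottom-up) clustering sketch using SciPy (SciPy is not named in the slides; the data is synthetic):

# Agglomerative (bottom-up) hierarchical clustering sketch using SciPy.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])

# linkage() repeatedly merges the two closest clusters (Ward criterion here).
Z = linkage(X, method="ward")

# Cut the resulting hierarchy into 2 flat clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)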
3) DBSCAN (Density-based Spatial Clustering of Applications with Noise):
Unlike algorithms like K-means, which force data to group around fixed centroids, DBSCAN groups points based on their
density and can also detect noise (points that don't belong to any cluster).

It classifies points into three categories:


1.Core Points: Points with at least MinPts (minimum points) within a distance ε (epsilon).
2.Border Points: Points that have fewer than MinPts neighbors but are within ε of a core point.
3.Noise Points (Outliers): Points that do not belong to any cluster.

Steps in DBSCAN Algorithm:

1. Select a random point P.
2. Find all points within distance ε from P.
⚬ If P has at least MinPts neighbors, it becomes a core point and forms a new cluster.
⚬ If P has fewer than MinPts neighbors, it is marked as noise (it may later become part of a cluster).
3. Expand the cluster by adding all density-connected points.
4. Repeat the process until all points are either clustered or marked as noise.

DBSCAN Parameters
🔹 ε (epsilon): Maximum distance between two points to be considered neighbors.
🔹 MinPts (minimum points): Minimum number of points required to form a dense cluster.
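A minimal DBSCAN sketch with scikit-learn (assumed library); eps and min_samples correspond to the ε and MinPts parameters described above:

# DBSCAN sketch: density-based clustering, noise points get the label -1.
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

db = DBSCAN(eps=0.2, min_samples=5)   # eps = ε, min_samples = MinPts
labels = db.fit_predict(X)

print("Clusters found:", set(labels) - {-1})
print("Noise points:", list(labels).count(-1))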
Dimensionality Reduction
It’s the process of reducing the number of features (variables) in a dataset while preserving as much important information as possible.
It helps to :
• Improve computational efficiency.
• Reduce overfitting by eliminating redundant features.
• Enhance visualization (especially for high-dimensional data).

Types of Dimensionality Reduction


1) Feature Selection: Keeps only the most important features and removes the rest.
• Methods:
⚬ Filter Methods: Choose the most important features from the data based on simple statistical measures (like how much they vary or their correlation with the target). This is done before training the model.
⚬ Wrapper Methods: Evaluate feature subsets by training and testing a model to find the best combination for accurate predictions.
⚬ Embedded Methods: Automatically select important features during the training process.
2) Feature Extraction: Transforms existing features into a smaller set of new features that still contain useful information.
• Methods:
⚬ Principal Component Analysis (PCA): Reduces data while keeping the most important features. Useful when features are highly correlated.
⚬ Linear Discriminant Analysis (LDA): Similar to PCA but focuses on class separation.
⚬ t-SNE: Useful for visualizing high-dimensional data in 2D or 3D, but not for making predictions. Used in facial recognition to group similar-looking faces together.
⚬ Autoencoders (deep learning approach): Neural networks that learn a compressed version of the data.
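A short PCA sketch with scikit-learn (an assumed library), projecting the four iris features onto two principal components:

# PCA sketch: reduce the 4 iris features to 2 principal components.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)   # PCA is sensitive to feature scale

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Reduced shape:", X_2d.shape)            # (150, 2)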
Anomaly detection
Anomaly detection is used to identify unusual patterns or outliers in data that deviate significantly from the majority of the
dataset. These techniques are commonly applied in fraud detection, network security, and other areas where unusual
behaviors or events need to be flagged

Types of Outliers:
• Univariate outliers exist in a single variable in isolation. They are extreme or abnormal values that deviate from the
typical range of values for that specific feature.
• Multivariate outliers are found by combining the values of multiple variables at the same time.

For univariate outlier detection, the most popular methods are:


1. Z-score (standard score): Measures how many standard deviations a data point is away from the mean. Generally, instances with a z-score over 3 are chosen as outliers.
2. Interquartile range (IQR): The IQR is the difference between the third quartile (Q3) and the first quartile (Q1) of the data. The outlier detection rule is: lower bound = Q1 − 1.5×IQR, upper bound = Q3 + 1.5×IQR; any value outside this range is considered an outlier.
3. Modified z-scores: They are similar to regular z-scores but are more robust to outliers. They use the median instead of the mean and the Median Absolute Deviation (MAD) instead of the standard deviation to calculate how far a data point is from the center of the distribution.
4. ARIMA: Used for forecasting time-series data. If the actual value is very different from the predicted one, it's an anomaly.
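A small NumPy sketch of the z-score and IQR rules described above (the data is invented for illustration):

# Univariate outlier detection sketch: z-score and IQR rules (invented data).
import numpy as np

data = np.array([10, 12, 11, 13, 12, 11, 12, 10, 11, 13,
                 12, 11, 10, 12, 13, 11, 12, 10, 11, 12, 95])  # 95 is the outlier

# Z-score rule: flag points more than 3 standard deviations from the mean.
z_scores = (data - data.mean()) / data.std()
print("Z-score outliers:", data[np.abs(z_scores) > 3])

# IQR rule: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
outliers = data[(data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)]
print("IQR outliers:", outliers)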
For multivariate outliers :
1.Isolation Forest: uses a collection of isolation trees (similar to decision trees) that recursively divide complex datasets
until each instance is isolated. The instances that get isolated the quickest are considered outliers.
2.Local Outlier Factor (LOF): LOF measures the local density deviation of a sample compared to its neighbors. Points
with significantly lower density are chosen as outliers.
3.Clustering techniques: techniques such as k-means or hierarchical clustering divide the dataset into groups. Points
that don’t belong to any group or are in their own little clusters are considered outliers.
4.Angle-based Outlier Detection (ABOD): ABOD measures the angles between individual points. Instances with odd
angles can be considered outliers.
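A compact sketch of two of these multivariate detectors using scikit-learn (an assumed library; a prediction of -1 marks a point flagged as an outlier):

# Multivariate outlier detection sketch: Isolation Forest and Local Outlier Factor.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)),     # normal points
               [[8, 8], [9, -7]]])             # two obvious outliers

iso = IsolationForest(contamination=0.02, random_state=0).fit(X)
print("Isolation Forest flags:", np.where(iso.predict(X) == -1)[0])

lof = LocalOutlierFactor(n_neighbors=20, contamination=0.02)
print("LOF flags:", np.where(lof.fit_predict(X) == -1)[0])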
Reinforcement Learning
RL is a type of ML where an agent learns to make decisions by interacting with an environment to maximize a reward. Instead of
learning from labeled data, RL learns through trial and error, receiving feedback in the form of rewards or penalties.

Key Components of RL:


1.Agent → The learner or decision-maker (e.g., a robot, AI player in a game).
2.Environment → Everything the agent interacts with (e.g., a game, real world).
3.State (S) → The current situation of the agent in the environment.
4.Action (A) → The possible moves the agent can take.
5.Reward (R) → Feedback signal for each action (positive for good, negative for bad).
6.Policy (π) → The strategy the agent follows to decide actions.
7.Value Function (V) → Measures how good a state is for future rewards.
8.Q-Value (Q) → Measures the expected reward for taking an action in a given state.

How RL Works:
1. The agent observes the state of the environment.
2. It chooses an action based on its current policy.
3. The environment responds by changing the state and giving a reward.
4. The agent updates its knowledge (policy) to maximize future rewards.
5. The process repeats until the agent learns the best strategy.
Types of Reinforcement Learning :
1.Model-Free RL : The agent learns through trial and error without knowing the environment's exact rules.
⚬ Examples: Q-Learning, Deep Q-Networks (DQN).
2.Model-Based RL : The agent builds a model of the environment and uses it to make decisions.
⚬ Examples: Monte Carlo Tree Search (MCTS).
3.On-Policy vs. Off-Policy Learning :
⚬ On-Policy: Learns from the actions it takes (e.g., SARSA).
⚬ Off-Policy: Learns from different policies (e.g., Q-Learning).

Popular Algorithms in RL:
1. Q-Learning: A value-based method that learns the optimal action-value function Q(s,a).
2. Deep Q-Networks (DQN): An extension of Q-learning that uses neural networks to approximate the Q-function, enabling it to handle high-dimensional inputs like images.
3. Policy Gradient Methods: Algorithms like REINFORCE and Actor-Critic learn policies directly by optimizing the expected return.
4. Proximal Policy Optimization (PPO): A popular policy gradient algorithm known for its stability and efficiency.
5. Monte Carlo Tree Search (MCTS): Often used in combination with RL for games like Go and Chess.
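A minimal tabular Q-learning sketch; the tiny chain environment, reward scheme, and hyperparameters are all invented for illustration:

# Tabular Q-learning sketch on a toy 5-state chain; reaching the last state gives reward 1.
import numpy as np

n_states, n_actions = 5, 2              # actions: 0 = move left, 1 = move right
Q = np.zeros((n_states, n_actions))     # Q-table: expected return per (state, action)
alpha, gamma, epsilon = 0.1, 0.9, 0.3   # learning rate, discount, exploration rate

rng = np.random.default_rng(0)
for episode in range(500):
    state = 0
    for step in range(100):                             # cap the episode length
        # Epsilon-greedy policy: explore sometimes, otherwise act greedily.
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))
        else:
            action = int(np.argmax(Q[state]))
        next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
        reward = 1.0 if next_state == n_states - 1 else 0.0
        # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state
        if state == n_states - 1:                       # reached the goal state
            break

print(np.round(Q, 2))   # moving right (action 1) typically ends up with the higher values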
Deep Learning
Definition:
DL is a subset of ML, which itself is a subfield of AI. It involves the
use of artificial neural networks to model and solve complex
problems. These neural networks are inspired by the structure and
function of the human brain, consisting of layers of interconnected
nodes (neurons) that process data.

Types of Neural Networks


• Feedforward Neural Networks (FNNs): The simplest type, where data
flows in one direction from input to output.
• Convolutional Neural Networks (CNNs): Specialized for image and
video processing. They use convolutional layers to automatically and
adaptively learn spatial hierarchies of features.
• Recurrent Neural Networks (RNNs): Used for sequential data like
time series or text. They have loops that allow information to persist
across time steps.
• Long Short-Term Memory Networks (LSTMs): A type of RNN designed
to handle long-term dependencies in sequences.
• Transformers: Introduced for natural language processing (NLP), they
use self-attention mechanisms to weigh the importance of different
parts of the input sequence.
Training Process

• Forward Propagation: Input data is passed through the network layer by layer to produce an output.

• Loss Function: Measures the difference between the predicted output and the true output. The goal of training is to minimize this loss, so the predictions get closer to the actual values. Common loss functions include mean squared error (MSE) for regression tasks and cross-entropy for classification tasks.

• Backpropagation: The error is propagated backward through the network to adjust the weights of the neurons. This is done using gradient descent or its variants (e.g., stochastic gradient descent, Adam).

• Optimization Algorithms: Techniques like gradient descent, momentum, and adaptive learning rate methods (e.g., RMSProp, Adam) are used to minimize the loss function.
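To make forward propagation, the loss, and backpropagation concrete, here is a tiny one-hidden-layer network trained with plain gradient descent in NumPy (the architecture and data are invented; real projects would typically rely on a deep learning framework):

# Tiny one-hidden-layer network trained with gradient descent (NumPy only).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, :1] + X[:, 1:]) ** 2                 # a simple non-linear target

W1, b1 = rng.normal(size=(2, 8)) * 0.5, np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)) * 0.5, np.zeros(1)
lr = 0.01

for step in range(3000):
    # Forward propagation
    h = np.maximum(0, X @ W1 + b1)             # ReLU hidden layer
    y_pred = h @ W2 + b2
    loss = np.mean((y_pred - y) ** 2)          # MSE loss

    # Backpropagation: gradients of the MSE with respect to each parameter
    grad_y = 2 * (y_pred - y) / len(X)
    grad_W2 = h.T @ grad_y
    grad_b2 = grad_y.sum(axis=0)
    grad_h = (grad_y @ W2.T) * (h > 0)         # ReLU derivative
    grad_W1 = X.T @ grad_h
    grad_b1 = grad_h.sum(axis=0)

    # Gradient descent update
    W1 -= lr * grad_W1
    b1 -= lr * grad_b1
    W2 -= lr * grad_W2
    b2 -= lr * grad_b2

    if step % 1000 == 0:
        print("step", step, "MSE:", round(float(loss), 3))

print("final MSE:", round(float(loss), 3))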
Recommendation Systems
Definition:
Recommendation systems are algorithms designed to suggest
relevant items to users, such as products, movies, music, or content,
based on their preferences, behavior, or similar users' data.

Types of Recommendation Systems


1)Collaborative Filtering :
It recommends items to a user based on what other similar users have
liked or interacted with. The core idea is that people who agreed in the
past are likely to agree again in the future.
• User-based Collaborative Filtering recommends items based
on the behavior of similar users.
• Item-based Collaborative Filtering recommends items that are
similar to items the user has liked.
There are 2 main kinds of collaborative filtering systems: memory-based and
model-based.

Memory-based :
Memory-based systems represent users and items as a matrix. They are an
extension of the k-nearest neighbors (KNN) algorithm because they aim to find
their “nearest neighbors,” which can be similar users or similar items.

Model-based :
One of the most commonly used model-based collaborative filtering algorithms
is matrix factorization. This dimensionality reduction method decomposes the
user-item matrix into two smaller matrices—one for users and another for
items. The 2 matrices are then multiplied together to predict the missing values
(or the recommendations) in the larger matrix.
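A small sketch of model-based collaborative filtering via matrix factorization, using NumPy's SVD on a toy user-item rating matrix (the ratings and the number of latent factors are invented; 0 stands for a missing rating):

# Matrix factorization sketch: approximate a user-item matrix with a low-rank SVD.
import numpy as np

# Rows = users, columns = items; 0 means "not rated yet".
R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)

U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2                                     # keep 2 latent factors
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The reconstructed matrix fills in the missing entries with predicted scores.
print(np.round(R_hat, 2))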

2) Content-based :
Content-based filtering recommends items by comparing the features of items
with a user's profile or preferences. It focuses on the attributes of the items
themselves.

• Example: If a user frequently watches action movies, the system recommends other action movies with similar characteristics (e.g., director, actors, genre).
3) Hybrid :
Combines both collaborative filtering and content-based filtering methods to overcome the limitations of each by
combining the strengths of both approaches. Provides more tailored recommendations by considering both user
behavior and item features.

4)Context-Aware Filtering :
It’s a recommendation system approach that takes into account the context in which a
user is making a decision or interacting with a system. The context could refer to various
factors such as time, location, device, or mood.
While traditional recommendation systems focus mainly on user preferences and item
attributes, context-aware systems consider the surrounding conditions to refine the
recommendations further.

5) Deep Neural Network Models for Recommendation :


1. Deep Neural Networks (DNN) : Learns patterns in user-item interactions using a
multi-layered neural network.
• How it works:
⚬ Converts users and items into numerical vectors (embeddings).
⚬ Passes them through multiple layers of a neural network.
⚬ Predicts how much a user will like an item.
• Used for: General recommendation tasks.
2. Neural Collaborative Filtering (NCF) : Replaces traditional collaborative filtering with a deep learning approach.
• How it works:
⚬ Users and items are mapped to embeddings.
⚬ A neural network learns complex interactions between them.
⚬ Outputs a recommendation score.
• Used for: Personalized recommendations based on user-item interactions.

3. Autoencoders (Variational Autoencoders - VAE) : Learns hidden patterns in user preferences and reconstructs missing
data.
• How it works:
⚬ Compresses (encodes) user preferences into a smaller representation.
⚬ Reconstructs (decodes) preferences to fill in missing interactions.
• Used for: Handling missing data (e.g., cold start problems).

4. Recurrent Neural Networks (RNN - LSTM, GRU) : Models user behavior over time.
• How it works:
⚬ Takes a sequence of user interactions (e.g., past clicks).
⚬ Uses LSTM (Long Short-Term Memory) or GRU (Gated Recurrent Units) to remember past preferences.
⚬ Predicts the next item the user might like.
• Used for: Sequential recommendations (e.g., music, news).
5. Convolutional Neural Networks (CNN) : Extracts important features from content
(images, text, etc.).
• How it works:
⚬ Uses convolutional layers to analyze item characteristics.
⚬ Matches similar items based on extracted features.
• Used for: Content-based recommendations.

6. DLRM (Deep Learning Recommendation Model) :


DLRM is developed by Facebook (Meta). It is designed to efficiently handle large-
scale personalized recommendations by combining dense (numerical) and sparse
(categorical) features.

7.Recommendations based on popularity


Recommendations based on popularity refer to systems or methods that suggest items, content, or services to users based on their popularity among a larger group. The idea is that items that are widely liked or used by many people are more likely to be relevant or interesting to an individual.
These systems don't rely on the user's own preferences or past behavior, but rather on the collective behavior of others.

8.Recommendation by Clustering
Clustering-based recommendation is a technique in recommendation systems that groups users or items into clusters based on
their similarities. Once the clusters are formed, recommendations are made based on the preferences of other users or items in
the same cluster.
• Build vector representations of items using techniques like TF-IDF.
• Apply clustering algorithms to group similar users or items.
• Associate a user with a cluster of similar items.
• Recommend items from the same cluster.
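A brief sketch of this recipe using TF-IDF vectors and K-means from scikit-learn (the library and the toy item descriptions are assumptions):

# Clustering-based recommendation sketch: TF-IDF item vectors + K-means clusters.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

items = [
    "action movie with car chases and explosions",
    "explosive action thriller with a car chase",
    "romantic comedy about a wedding",
    "light comedy romance set at a wedding party",
]

vectors = TfidfVectorizer().fit_transform(items)          # build item vectors
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

# Recommend items from the same cluster as the item the user liked (item 0 here).
liked = 0
recommendations = [i for i, lab in enumerate(labels) if lab == labels[liked] and i != liked]
print("Items in the same cluster as item 0:", recommendations)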
THANK YOU
FOR YOUR INTEREST
