Presentation
Big Data refers to extremely large and complex datasets that cannot
be effectively processed using traditional data management tools. It
involves collecting, storing, processing, and analyzing vast amounts of
structured, semi-structured, and unstructured data to extract
meaningful insights.
Data types that can be obtained:
• Sensor data: temperature, humidity, pressure
• Telemetry data: location, speed, acceleration
• Operational data: machine status, energy consumption
• Environmental data: air quality, soil moisture
• Demographic data: population statistics
• Economic data: employment rates, GDP figures
• Health data: disease prevalence, healthcare outcomes
• Geographic data: maps, spatial datasets
Open-Source Tools
Types of Storage:
Open-source visualization tools:
1. Power BI: allows users to create reports and visualizations on their local machines.
2. Tableau Public: a free version that allows you to create and share visualizations online.
3. Looker: used for exploring, analyzing, and sharing real-time business analytics.
5. Kibana: allows users to visualize and explore data stored in Elasticsearch using interactive dashboards, graphs, charts, maps, and tables.
Which visualization tool should I choose for my needs?
Machine Learning
Definition:
Machine Learning (ML) is a subset of Artificial
Intelligence that focuses on developing algorithms that
enable computers to learn and make decisions from
data without being explicitly programmed.
Types of ML:
01 Supervised Learning
02 Unsupervised Learning
03 Reinforcement Learning
1- Supervised Learning
Supervised learning is when a model is trained on labeled data, meaning it knows the correct answer during training.
Classification Algorithms:
There are two types of learners in machine learning classification:
Eager learners:
Eager learners are ML algorithms that build a model during the training phase and are ready to make predictions as soon as training is complete. These algorithms "eagerly" build the entire model before being presented with any new data.
• Logistic Regression
• Support Vector Machine
• Decision Trees
Logistic Regression
Logistic Regression is a statistical method used for binary
classification problems. It predicts the probability that a given
input belongs to a specific class (usually labeled as 0 or 1) based
on one or more input features.
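As a minimal sketch (assuming scikit-learn is available; the dataset here is synthetic and purely illustrative):

```python
# Minimal sketch: binary classification with logistic regression (scikit-learn assumed).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data for illustration: 2 classes, a few numeric features.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression()
model.fit(X_train, y_train)                  # weights are learned during training (eager learner)
print(model.predict_proba(X_test[:3]))       # predicted probability of each class
print(model.score(X_test, y_test))           # accuracy on held-out data
```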
Decision Tree
A decision tree is a ML method used for classification
or regression. It works like a flowchart, where each
step (called a node) asks a question or makes a
decision to split the data into groups until it reaches a
final decision.
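A minimal sketch of a decision tree classifier, assuming scikit-learn and using its bundled Iris dataset for illustration:

```python
# Minimal sketch: a decision tree classifier (scikit-learn assumed).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)

# The fitted tree is a flowchart of questions on feature values.
print(export_text(tree, feature_names=["sepal len", "sepal wid", "petal len", "petal wid"]))
```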
Lazy learners:
Lazy learners or instance-based learners, don’t create any model immediately from the training data, and this is where
the lazy aspect comes from. They just memorize the training data, and each time there is a need to make a prediction,
they search for the nearest neighbor from the whole training data, which makes them very slow during prediction.
• K-Nearest Neighbor(KNN).
• Case-based reasoning.
K-Nearest Neighbor (KNN)
The KNN algorithm predicts the label or value for a new data
point by looking at the closest K data points (neighbors) in the
training dataset.
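A minimal KNN sketch, again assuming scikit-learn and the Iris dataset; note that fit() mostly just stores the data, and the work happens at prediction time:

```python
# Minimal sketch: K-Nearest Neighbors (scikit-learn assumed).
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# A lazy learner: fit() essentially memorizes the training data.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X, y)

# Prediction searches for the 5 closest training points and votes on their labels.
print(knn.predict([[5.1, 3.5, 1.4, 0.2]]))
```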
Types of Regression
1. Simple Linear Regression
It assumes a linear relationship between the independent and dependent variables, meaning that a change in the independent variable produces a proportional change in the dependent variable.
3. Polynomial Regression
It’s used to model non-linear relationships between the dependent variable and the independent variables. It adds polynomial terms to the linear regression model to capture more complex relationships.
Data requirements:
• No missing values
• Data in numeric format
• Data stored as pandas DataFrames or Series, or NumPy arrays
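A minimal sketch contrasting simple linear and polynomial regression, assuming scikit-learn and NumPy with synthetic data:

```python
# Minimal sketch: linear vs. polynomial regression (scikit-learn and NumPy assumed).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

# Synthetic non-linear data: y = x^2 plus noise, numeric and without missing values.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.3, size=100)

linear = LinearRegression().fit(X, y)                                          # straight-line fit
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

print("linear R^2:", linear.score(X, y))
print("polynomial R^2:", poly.score(X, y))                                     # captures the curvature
```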
Types of Clustering:
1) K-Means:
It divides the data into a fixed number (K) of clusters based on the similarity of the data points.
2) DBSCAN:
Parameters:
🔹 ε (epsilon): maximum distance between two points for them to be considered neighbors.
🔹 MinPts (minimum points): minimum number of points required to form a dense cluster.
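A minimal sketch of both clustering methods, assuming scikit-learn; the data and parameter values are illustrative:

```python
# Minimal sketch: K-Means and DBSCAN clustering (scikit-learn assumed; toy data).
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, DBSCAN

# Synthetic data with 3 separated groups, for illustration.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# K-Means: the number of clusters K is fixed up front.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("K-Means labels:", set(kmeans.labels_))

# DBSCAN: clusters emerge from density; eps plays the role of epsilon, min_samples of MinPts.
dbscan = DBSCAN(eps=0.8, min_samples=5).fit(X)
print("DBSCAN labels (-1 = noise):", set(dbscan.labels_))
```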
Dimensionality Reduction
It’s the process of reducing the number of features (variables) in a dataset while preserving as much important information as possible.
It helps to:
• Improve computational efficiency.
• Reduce overfitting by eliminating redundant features.
• Enhance visualization (especially for high-dimensional data).
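The slides don't name a specific technique here, but principal component analysis (PCA) is a common choice; a minimal sketch assuming scikit-learn:

```python
# Minimal sketch: dimensionality reduction with PCA (an assumed example method; scikit-learn).
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)                # 64 features per sample (8x8 images)

# Keep enough components to explain ~95% of the variance.
pca = PCA(n_components=0.95, svd_solver="full")
X_reduced = pca.fit_transform(X)

print(X.shape, "->", X_reduced.shape)              # far fewer features, most information preserved
```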
Types of Outliers:
• Univariate outliers exist in a single variable in isolation. They are extreme or abnormal values that deviate from the
typical range of values for that specific feature.
• Multivariate outliers are found by combining the values of multiple variables at the same time.
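The slides don't prescribe a detection method, but the IQR rule is a common way to flag univariate outliers; a minimal sketch assuming pandas:

```python
# Minimal sketch: flagging univariate outliers with the IQR rule (pandas assumed; toy data).
import pandas as pd

s = pd.Series([10, 12, 11, 13, 12, 95, 11, 10, 12, -40])   # two extreme values on purpose

q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
outliers = s[(s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)]

print(outliers)   # values far outside the typical range (95 and -40)
```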
How RL Works:
1. The agent observes the state of the environment.
2. It chooses an action based on its current policy.
3. The environment responds by changing the state and giving a reward.
4. The agent updates its knowledge (policy) to maximize future rewards.
5. The process repeats until the agent learns the best strategy.
Types of Reinforcement Learning:
1. Model-Free RL: The agent learns through trial and error without knowing the environment's exact rules.
⚬ Examples: Q-Learning, Deep Q-Networks (DQN).
2. Model-Based RL: The agent builds a model of the environment and uses it to make decisions.
⚬ Examples: Monte Carlo Tree Search (MCTS).
3. On-Policy vs. Off-Policy Learning:
⚬ On-Policy: Learns from the actions it actually takes (e.g., SARSA).
⚬ Off-Policy: Learns about one policy while following another (e.g., Q-Learning).
Popular Algorithms in RL:
1. Q-Learning: A value-based method that learns the optimal action-value function Q(s,a) (a minimal tabular sketch follows this list).
2. Deep Q-Networks (DQN): An extension of Q-Learning that uses neural networks to approximate the Q-function, enabling it to handle high-dimensional inputs like images.
3. Policy Gradient Methods: Algorithms like REINFORCE and Actor-Critic learn policies directly by optimizing the expected return.
4. Proximal Policy Optimization (PPO): A popular policy gradient algorithm known for its stability and efficiency.
5. Monte Carlo Tree Search (MCTS): Often used in combination with RL for games like Go and Chess.
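To make the observe-act-reward-update loop and Q-Learning concrete, here is the tabular sketch promised above, on a tiny made-up corridor environment (the environment and all hyperparameter values are illustrative assumptions):

```python
# Minimal sketch: tabular Q-Learning on a made-up 5-cell corridor (illustrative only).
import random

N_STATES, ACTIONS = 5, [0, 1]            # actions: 0 = left, 1 = right; goal is the rightmost cell
alpha, gamma, epsilon = 0.1, 0.9, 0.1    # learning rate, discount factor, exploration rate
Q = [[0.0, 0.0] for _ in range(N_STATES)]

def step(state, action):
    """Environment: move left/right; reward 1 only when the goal cell is reached."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

for episode in range(500):
    state, done = 0, False
    while not done:
        # Observe the state, choose an action (epsilon-greedy policy).
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[state][a])
        next_state, reward, done = step(state, action)
        # Update knowledge: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
        Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
        state = next_state

print([max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES)])  # learned policy: mostly "right"
```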
Deep Learning
Definition:
DL is a subset of ML, which itself is a subfield of AI. It involves the
use of artificial neural networks to model and solve complex
problems. These neural networks are inspired by the structure and
function of the human brain, consisting of layers of interconnected
nodes (neurons) that process data.
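As a minimal illustration (assuming Keras/TensorFlow is installed; the data and layer sizes are arbitrary), a small feed-forward network of interconnected neurons might look like:

```python
# Minimal sketch: a small feed-forward neural network (Keras/TensorFlow assumed; synthetic data).
import numpy as np
from tensorflow import keras

# Synthetic binary-classification data, purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = (X.sum(axis=1) > 0).astype("float32")

# Layers of interconnected nodes (neurons) that process the data.
model = keras.Sequential([
    keras.Input(shape=(8,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
print(model.evaluate(X, y, verbose=0))      # [loss, accuracy]
```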
Recommendation Systems
1) Collaborative Filtering:
Memory-based:
Memory-based systems represent users and items as a matrix. They are an extension of the k-nearest neighbors (KNN) algorithm because they aim to find their “nearest neighbors,” which can be similar users or similar items.
Model-based:
One of the most commonly used model-based collaborative filtering algorithms
is matrix factorization. This dimensionality reduction method decomposes the
user-item matrix into two smaller matrices—one for users and another for
items. The 2 matrices are then multiplied together to predict the missing values
(or the recommendations) in the larger matrix.
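A minimal matrix factorization sketch in NumPy; the ratings, number of latent factors, and hyperparameters are illustrative assumptions:

```python
# Minimal sketch: matrix factorization for collaborative filtering (NumPy; illustrative values).
import numpy as np

# Tiny user-item rating matrix; 0 means "missing / not yet rated".
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)

n_users, n_items, k = R.shape[0], R.shape[1], 2       # k latent factors
rng = np.random.default_rng(0)
P = rng.normal(scale=0.1, size=(n_users, k))          # user-factor matrix
Q = rng.normal(scale=0.1, size=(n_items, k))          # item-factor matrix

lr, reg = 0.01, 0.02
for _ in range(2000):                                  # SGD over observed entries only
    for u, i in zip(*R.nonzero()):
        err = R[u, i] - P[u] @ Q[i]
        pu = P[u].copy()
        P[u] += lr * (err * Q[i] - reg * P[u])
        Q[i] += lr * (err * pu - reg * Q[i])

print(np.round(P @ Q.T, 1))                            # the product fills in the missing (0) entries
```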
2) Content-based:
Content-based filtering recommends items by comparing the features of items
with a user's profile or preferences. It focuses on the attributes of the items
themselves.
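A minimal content-based sketch using TF-IDF features and cosine similarity, assuming scikit-learn; the item descriptions are made up:

```python
# Minimal sketch: content-based filtering with TF-IDF + cosine similarity (scikit-learn assumed).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

items = {
    "Movie A": "space adventure with robots and aliens",
    "Movie B": "romantic comedy set in Paris",
    "Movie C": "robots fight aliens in deep space",
}

vectors = TfidfVectorizer().fit_transform(items.values())   # item attributes -> feature vectors
sim = cosine_similarity(vectors)

# A user who liked "Movie A" gets the most similar other item recommended.
liked = 0
best = max((i for i in range(len(items)) if i != liked), key=lambda i: sim[liked, i])
print(list(items)[best])                                    # -> "Movie C"
```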
4) Context-Aware Filtering:
It’s a recommendation system approach that takes into account the context in which a
user is making a decision or interacting with a system. The context could refer to various
factors such as time, location, device, or mood.
While traditional recommendation systems focus mainly on user preferences and item
attributes, context-aware systems consider the surrounding conditions to refine the
recommendations further.
3. Autoencoders (Variational Autoencoders, VAE): Learn hidden patterns in user preferences and reconstruct missing data (a minimal sketch follows this list).
• How it works:
⚬ Compresses (encodes) user preferences into a smaller representation.
⚬ Reconstructs (decodes) preferences to fill in missing interactions.
• Used for: Handling missing data (e.g., cold start problems).
4. Recurrent Neural Networks (RNN - LSTM, GRU): Model user behavior over time.
• How it works:
⚬ Takes a sequence of user interactions (e.g., past clicks).
⚬ Uses LSTM (Long Short-Term Memory) or GRU (Gated Recurrent Units) to remember past preferences.
⚬ Predicts the next item the user might like.
• Used for: Sequential recommendations (e.g., music, news).
5. Convolutional Neural Networks (CNN): Extract important features from content (images, text, etc.).
• How it works:
⚬ Uses convolutional layers to analyze item characteristics.
⚬ Matches similar items based on extracted features.
• Used for: Content-based recommendations.
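Here is the autoencoder sketch promised above: it reconstructs a toy user-item interaction matrix so the reconstructed scores can be ranked as recommendations (Keras/TensorFlow assumed; data and layer sizes are illustrative):

```python
# Minimal sketch: an autoencoder that reconstructs user-item interactions (Keras assumed; toy data).
import numpy as np
from tensorflow import keras

n_users, n_items = 100, 40
rng = np.random.default_rng(0)
interactions = (rng.random((n_users, n_items)) > 0.8).astype("float32")   # sparse 0/1 matrix

model = keras.Sequential([
    keras.Input(shape=(n_items,)),
    keras.layers.Dense(8, activation="relu"),                # encoder: compress preferences
    keras.layers.Dense(n_items, activation="sigmoid"),       # decoder: reconstruct all item scores
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(interactions, interactions, epochs=10, batch_size=16, verbose=0)

# Reconstructed scores for items the user has not interacted with can be ranked as recommendations.
scores = model.predict(interactions[:1], verbose=0)[0]
print(np.argsort(scores)[::-1][:5])                          # top-5 item indices for user 0
```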
8. Recommendation by Clustering
Clustering-based recommendation is a technique in recommendation systems that groups users or items into clusters based on their similarities. Once the clusters are formed, recommendations are made based on the preferences of other users or items in the same cluster (see the sketch after this list).
• Build vector representations of items using techniques like TF-IDF.
• Apply clustering algorithms to group similar users or items.
• Associate a user with a cluster of similar items.
• Recommend items from the same cluster.
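A minimal sketch of these four steps, assuming scikit-learn and a handful of made-up item descriptions:

```python
# Minimal sketch: clustering-based recommendation with TF-IDF + K-Means (scikit-learn; toy items).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

items = [
    "space robots aliens adventure",
    "robots aliens space battle",
    "romance comedy paris",
    "paris love story comedy",
]

# 1) Vector representations of items with TF-IDF.
X = TfidfVectorizer().fit_transform(items)

# 2) Cluster similar items.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# 3) Associate the user with the cluster of an item they liked, 4) recommend from that cluster.
liked = 0
recommendations = [i for i, lab in enumerate(labels) if lab == labels[liked] and i != liked]
print(recommendations)   # indices of items in the same cluster as the liked one
```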
THANK YOU
FOR YOUR INTEREST