Data Processing in AI
TensorFlow Transform (tf.Transform) can be used to preprocess data using exactly the same code for both training a
model and serving inferences in production. TensorFlow Transform is a library for preprocessing input data for TensorFlow, including
creating features that require a full pass over the training dataset. For example, using TensorFlow Transform you could:
Convert strings to integers by generating a vocabulary over all of the input values
Convert floats to integers by assigning them to buckets, based on the observed data distribution
TensorFlow has built-in support for manipulations on a single example or a batch of examples. tf.Transform extends these
capabilities to support full passes over the entire training dataset.
The output of tf.Transform is exported as a TensorFlow graph which you can use for both training and serving. Using the same
graph for both training and serving can prevent skew, since the same transformations are applied in both stages.
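As a rough sketch, a preprocessing_fn for the two conversions listed above might look like the following; the feature names category and price are hypothetical stand-ins for real columns.
```python
# A hedged sketch of a tf.Transform preprocessing_fn; 'category' and 'price'
# are made-up feature names standing in for real columns.
import tensorflow_transform as tft

def preprocessing_fn(inputs):
    """Maps raw feature tensors to transformed features."""
    outputs = {}
    # Strings -> integer ids via a vocabulary computed over the full training dataset.
    outputs['category_id'] = tft.compute_and_apply_vocabulary(inputs['category'])
    # Floats -> integer bucket indices based on the observed distribution.
    outputs['price_bucket'] = tft.bucketize(inputs['price'], num_buckets=10)
    return outputs
```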
Improved Model Performance: Well-handled data improves how ML algorithms work. When data is clean and organized,
algorithms can identify patterns better, making models more accurate and adaptable.
Reduced Computational Costs: Effective data processing can significantly reduce computational costs. By getting rid of
unimportant or duplicated information, the dataset becomes smaller, making it easier and cheaper to handle.
Enabling Feature Engineering: Feature engineering, which involves using knowledge in a specific field to create
features that improve ML algorithms, depends a lot on properly handled data. When data processing is done well, it helps
pull out useful features, making models perform better.
Compliance and Security: Data processing makes sure that data follows important laws like GDPR. It also involves
making sensitive information anonymous, which boosts data security and privacy.
Data Cleaning: This step involves removing or correcting inaccuracies, handling missing values, and eliminating
duplicates. Techniques used include filling in missing values, spotting outliers, and making sure data is on the same scale.
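A minimal pandas sketch of these cleaning steps, using made-up column names and thresholds:
```python
# Illustrative cleaning of a tiny, made-up table: deduplicate, fill missing
# values, and drop an obvious outlier.
import pandas as pd

df = pd.DataFrame({'age': [25, None, 40, 40, 130],
                   'city': ['Pune', 'Delhi', None, None, 'Pune']})

df = df.drop_duplicates()                          # eliminate duplicate rows
df['age'] = df['age'].fillna(df['age'].median())   # fill missing numeric values
df['city'] = df['city'].fillna('unknown')          # fill missing categorical values
df = df[df['age'].between(0, 110)]                 # drop an implausible age (outlier)
```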
Data Integration: Bringing together data from various sources to create a single view. This often involves sorting out
differences in data formats and getting rid of duplicates.
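For illustration only, two hypothetical sources can be merged on a shared key after reconciling duplicates:
```python
# Two made-up sources (a CRM table and a billing table) joined into one view.
import pandas as pd

crm = pd.DataFrame({'customer_id': [1, 2, 2], 'name': ['Asha', 'Ben', 'Ben']})
billing = pd.DataFrame({'customer_id': [1, 2], 'total_spend': [120.0, 75.5]})

crm = crm.drop_duplicates(subset='customer_id')               # remove duplicate records
combined = crm.merge(billing, on='customer_id', how='left')   # single unified view
```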
Data Transformation: Converting data into a format that’s good for analyzing. This could mean making sure it’s all on the
same scale, putting it in a standard format, or changing categories into numbers.
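A hedged sketch of two common transformations, scaling a numeric column and turning a categorical one into numbers (the column names are assumptions):
```python
# Scale 'income' to zero mean and unit variance, and one-hot encode 'segment'.
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({'income': [30000, 52000, 47000], 'segment': ['A', 'B', 'A']})

df['income_scaled'] = StandardScaler().fit_transform(df[['income']]).ravel()  # same scale
df = pd.get_dummies(df, columns=['segment'])                                  # categories -> 0/1 columns
```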
Data Reduction: Making the data simpler without losing the important information. This could involve techniques like
dimensionality reduction, using things like Principal Component Analysis (PCA), or choosing only the most important
features.
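A minimal PCA sketch with scikit-learn; the random data and the choice of two components are purely illustrative:
```python
# Reduce 10 features to the 2 directions of highest variance.
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(100, 10)           # 100 samples, 10 original features
pca = PCA(n_components=2)             # keep 2 principal components
X_reduced = pca.fit_transform(X)      # shape (100, 2)
print(pca.explained_variance_ratio_)  # variance retained by each component
```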
Image Processing: In computer vision tasks, processing data involves resizing images, making sure they’re all on the
same scale, and using techniques like rotating, flipping, and scaling to add variety. These steps make the model stronger
and work better.
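An illustrative TensorFlow sketch of these steps; the file name example.jpg and the 224x224 target size are assumptions:
```python
# Resize, rescale to [0, 1], and apply simple augmentations to one image.
import tensorflow as tf

image = tf.io.decode_jpeg(tf.io.read_file('example.jpg'), channels=3)
image = tf.image.resize(image, [224, 224])        # same size for every image
image = tf.cast(image, tf.float32) / 255.0        # put pixels on the same scale
image = tf.image.random_flip_left_right(image)    # augmentation: random flip
image = tf.image.rot90(image)                     # augmentation: rotate 90 degrees
```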
Time Series Analysis: Data processing for time series includes dealing with missing time points, making the data
smoother, and pulling out features like average trends over time. Handling time-series data well is crucial for
forecasting models that predict things like stock prices, weather, or sales trends.
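A small pandas sketch of this kind of preparation, with a made-up daily sales series:
```python
# Reinstate a missing day, fill it from its neighbours, and extract a smoothed trend.
import pandas as pd

idx = pd.date_range('2024-01-01', periods=6, freq='D').delete(2)  # one day missing
sales = pd.Series([10, 12, 15, 14, 18], index=idx)

sales = sales.asfreq('D')               # reinstate the missing time point as NaN
sales = sales.interpolate()             # fill it by interpolation
trend = sales.rolling(window=3).mean()  # 3-day moving average as a trend feature
```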
Data Quality Issues: Inconsistent, incomplete, or noisy data can pose significant challenges. Developing robust methods
to clean and preprocess such data is crucial for effective ML and AI applications.
Real-time Processing: Lots of applications need to process data as it comes in, which can be hard because it has to be
done fast and well. Doing this in real-time is critical for applications like fraud detection, autonomous driving, and real-time
analytics.
Ethical and Legal Considerations: Making sure data processing follows the rules and is ethical is really important. This
means keeping data private, getting permission to use it, and being clear about how it’s used.
Pandas: A powerful Python library used for handling and analyzing data, offering the necessary tools and functions to tidy
up, convert, and examine data effectively.
Apache Spark: Spark is a distributed engine for processing large amounts of data. It's well suited to big data
because it can process data in memory across a cluster, working with data quickly without writing every intermediate result to disk first.
TensorFlow and PyTorch: Although mainly used to create ML models, these frameworks also come with tools for
preparing data, such as libraries for processing images and text.
SQL and NoSQL Databases: Databases such as MySQL, PostgreSQL, MongoDB, and Cassandra offer strong features for
storing and finding data, helping with different data processing jobs.
Activation functions
An activation function decides whether a neuron should be activated or not.
The primary role of the activation function is to transform the summed weighted input of a node into
an output value that is fed to the next hidden layer or returned as the output.
Diagram 1
This neural network is made of interconnected neurons. Each of them is characterized by its weights, bias,
and activation function.
Input Layer The input layer takes raw input from the domain. No computation is performed at this
layer. Nodes here just pass on the information (features) to the hidden layer.
Hidden Layer As the name suggests, the nodes of this layer are not exposed. They provide an
abstraction to the neural network.
The hidden layer performs all kinds of computation on the features entered through the input layer
and transfers the result to the output layer.
Output Layer It's the final layer of the network; it takes the information learned through the
hidden layers and delivers the final value as a result.
Activation functions introduce an additional step at each layer during forward propagation. Let's
suppose we have a neural network working without activation functions. In that case, every
neuron will only be performing a linear transformation on the inputs using the weights and biases. It
doesn't matter how many hidden layers we attach to the neural network; all layers will
behave in the same way, because the composition of two linear functions is itself a linear function.
Although the neural network becomes simpler, learning any complex task is impossible, and our model
would be just a linear regression model.
Diagram 1
Non-Linear Activation Functions: A linear activation function amounts to a simple linear regression model.
Because of its limited power, it does not allow the model to create complex mappings between the
network's inputs and outputs, which is why non-linear activation functions are used.
Sigmoid / Logistic Activation Function:
This function takes any real value as input and outputs values in the range of 0 to 1.
The larger the input (more positive), the closer the output value will be to 1.0, whereas the smaller the
input (more negative), the closer the output will be to 0.0, as shown below.
Diagram 2
Diagram 3
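A minimal NumPy sketch of the sigmoid, sigma(x) = 1 / (1 + e^(-x)):
```python
# Sigmoid squashes any real input into the range (0, 1).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # approx [0.007, 0.5, 0.993]
```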
ReLU Function:
ReLU stands for Rectified Linear Unit.
Although it gives an impression of a linear function, ReLU has a derivative and allows for
backpropagation while remaining computationally efficient.
The main catch here is that the ReLU function does not activate all the neurons at the same time.
The neurons will only be deactivated if the output of the linear transformation is less than 0.
Diagram 4
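A minimal NumPy sketch of ReLU, f(x) = max(0, x):
```python
# Negative inputs are zeroed out (the neuron is deactivated); positive inputs pass through.
import numpy as np

def relu(x):
    return np.maximum(0, x)

print(relu(np.array([-2.0, 0.0, 3.0])))  # [0. 0. 3.]
```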
ACTIVATION FUNCTION
ACTIVATION LEVEL - DISCRETE OR CONTINUOUS
HARD LIMIT FUNCTION (DISCRETE)
• Binary Activation function
• Bipolar activation function
• Identity function
SIGMOIDAL ACTIVATION FUNCTION (CONTINUOUS)
• Binary Sigmoidal activation function
• Bipolar Sigmoidal activation function
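For reference, the functions listed above can be sketched in NumPy as follows; these are written from the usual textbook formulas, so treat the exact forms as assumptions.
```python
# Discrete (hard limit) and continuous (sigmoidal) activation functions.
import numpy as np

def binary_step(x):        # hard limit: 1 if x >= 0, else 0
    return np.where(x >= 0, 1, 0)

def bipolar_step(x):       # hard limit: +1 if x >= 0, else -1
    return np.where(x >= 0, 1, -1)

def identity(x):           # identity: output equals input
    return x

def binary_sigmoid(x):     # continuous, range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def bipolar_sigmoid(x):    # continuous, range (-1, 1); equals tanh(x / 2)
    return (1.0 - np.exp(-x)) / (1.0 + np.exp(-x))
```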
Unit - 1
In machine learning, it's essential to grasp the difference between training error and test error to create models
that generalize effectively to new, unseen data. In this discussion, we'll delve into these concepts, illustrate how
they behave, and offer strategies to manage and interpret these errors effectively.
Training Error refers to the model's error rate on the dataset it was trained on. It shows how well the model has
learned the training data:
- Low Training Error: Suggests that the model fits the training data well.
- High Training Error: Indicates that the model is too simple and fails to capture the underlying patterns in the
data (underfitting).
As the model complexity increases, the training error tends to decrease because the model can fit more details of
the training data. For example, a very deep decision tree can perfectly classify the training data, resulting in a
training error near zero.
Test Error measures the model's error rate on a separate, unseen dataset (the test set). It assesses how well the
model generalizes to new data:
- Low Test Error: Shows good generalization, meaning the model performs well on new, unseen data.
- High Test Error: Suggests poor generalization, indicating the model has overfitted the training data and is
capturing noise instead of true patterns.
Initially, as the model complexity increases, the test error decreases because the model captures more relevant
patterns. However, beyond a certain point, the test error starts to rise, indicating overfitting.
Diagram 1
1. Underfitting Region: Both training and test errors are high because the model is too simplistic.
2. Optimal Fit Region: Training error is low, and test error is also low, showing good generalization.
3. Overfitting Region: Training error continues to decrease, but test error begins to increase as the model
becomes too complex and starts to fit noise in the training data.
Overfitting happens when a model learns the noise and random fluctuations in the training data, not just the
underlying patterns. This makes the model very sensitive to the specific instances in the training set, leading to
high variance and poor performance on new data.
Implications:
- The model performs well on training data but poorly on test data.
- It fails to generalize, which is a significant issue in real-world applications where the aim is to predict or classify
new, unseen instances.
1. Pruning: In decision trees, pruning removes parts of the tree that provide little predictive power, simplifying the
model.
2. Cross-Validation: Techniques like k-fold cross-validation provide a more accurate estimate of test error, aiding
in model selection (see the code sketch after this list).
3. Regularization: Methods such as L1 and L2 regularization add a penalty for complexity to the loss function,
discouraging overfitting.
4. Ensemble Methods: Combining the predictions of multiple models (e.g., Random Forests, Gradient Boosting)
can improve generalization and reduce overfitting.
5. Early Stopping: In iterative training processes, stopping training when performance on a validation set begins
to degrade can prevent overfitting.
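As a hedged sketch, pruning (via `max_depth`) and 5-fold cross-validation can be combined on synthetic data to compare tree complexities:
```python
# Compare a shallow, a moderate, and an unrestricted decision tree using
# cross-validated accuracy as an estimate of generalization.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

for depth in [2, 5, None]:                      # None = fully grown (unpruned) tree
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    scores = cross_val_score(tree, X, y, cv=5)  # 5-fold cross-validation
    print(f"max_depth={depth}: mean CV accuracy {scores.mean():.3f}")
```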
Example:
Suppose you are building a decision tree to predict house prices based on features like size, location, and age.
- Initial Model: A shallow tree might have a high training error (e.g., 15%) and a high test error (e.g., 20%),
indicating underfitting.
- Optimal Model: A moderately deep tree might reduce the training error to 5% and the test error to 10%,
indicating a good fit.
- Overfitted Model: An extremely deep tree might further reduce the training error to 1% but increase the test
error to 15%, showing overfitting.
By applying techniques like pruning or cross-validation, you can aim to find the optimal balance where the model
generalizes well without fitting the noise.
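A rough illustration of this shallow / moderate / deep comparison on synthetic regression data; the features and the exact error values are made up and will not match the percentages quoted above.
```python
# Train trees of increasing depth and compare training vs. test error.
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=3, noise=20.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in [2, 6, None]:  # shallow, moderate, unrestricted
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X_train, y_train)
    train_err = mean_absolute_error(y_train, tree.predict(X_train))
    test_err = mean_absolute_error(y_test, tree.predict(X_test))
    print(f"depth={depth}: train error {train_err:.1f}, test error {test_err:.1f}")
```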
Balancing training and test errors is key to building robust machine learning models. By understanding these errors
and applying strategies to mitigate overfitting, you ensure that models perform well on training data and generalize
effectively to new data. This balance is crucial for the successful deployment of machine learning models in real-
world applications.
To measure the training error of your decision tree model using the accuracy_score function from Scikit-learn, you
need to follow a series of steps to set up your model, make predictions on the training data, and then evaluate
these predictions. Here's a detailed guide on how to do it:
1. Import Necessary Libraries: You need to import the necessary components from Scikit-learn, including the
model you are using (such as a decision tree) and the accuracy_score function.
2. Prepare Your Data: Ensure your dataset is divided into features (`X`) and the target variable (`y`). If you have
separate training and testing datasets, make sure you are using the training data (`X_train` and `y_train`).
3. Train Your Model: Fit the decision tree classifier to your training data.
4. Make Predictions: Use the trained model to predict the labels of the training data.
5. Calculate Training Error: Compare the predicted labels with the actual labels of the training data using the
accuracy_score function, then compute the training error as 1 - accuracy.
- DecisionTreeClassifier: This is the decision tree model from Scikit-learn. You can adjust its parameters to
change the complexity of the model.
- accuracy_score: This function computes the accuracy, the fraction of correctly predicted samples to the total
samples.
- Training Error: It represents the proportion of training samples that were incorrectly classified by the model.
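Putting the steps together, a minimal sketch on a synthetic dataset (your own features and labels would replace make_classification) might look like this:
```python
# 1. Import the model and the accuracy_score function.
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# 2. Prepare the data: features X and target y, split into train and test sets.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# 3. Train the decision tree on the training data.
model = DecisionTreeClassifier(max_depth=4, random_state=42)
model.fit(X_train, y_train)

# 4. Predict the labels of the training data itself.
y_train_pred = model.predict(X_train)

# 5. Training error = 1 - training accuracy.
train_error = 1 - accuracy_score(y_train, y_train_pred)
print(f"Training error: {train_error:.3f}")
```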