
What is Perceptron | The Simplest Artificial Neural Network

Last Updated : 21 Oct, 2024

The Perceptron is one of the simplest artificial neural network architectures, introduced by Frank Rosenblatt in 1957. It is primarily used for binary classification.

At that time, traditional methods like Statistical Machine Learning and Conventional Programming were commonly used for predictions. Despite being one of the simplest forms of artificial neural networks, the Perceptron model proved to be highly effective in solving specific classification problems, laying the groundwork for advancements in AI and machine learning.

This article covers the fundamentals of the perceptron model: its architecture, working principles, and applications, with both theory and practical implementations in NumPy and PyTorch.

What is Perceptron?

Perceptron is a type of neural network that performs binary classification: it maps input features to an output decision, classifying data into one of two categories, such as 0 or 1.

Perceptron consists of a single layer of input nodes that are fully connected to a layer of output nodes. It is particularly good at learning linearly separable patterns. It utilizes a variation of artificial neurons called Threshold Logic Units (TLU), first introduced by Warren McCulloch and Walter Pitts in the 1940s. This foundational model has played a crucial role in the development of more advanced neural networks and machine learning algorithms.

Types of Perceptron

  1. Single-Layer Perceptron: a perceptron limited to learning linearly separable patterns. It is effective for tasks where the data can be divided into distinct categories by a straight line. While powerful in its simplicity, it struggles with more complex problems where the relationship between inputs and outputs is non-linear.
  2. Multi-Layer Perceptron: consists of two or more layers and has enhanced processing capabilities, making it adept at handling more complex patterns and relationships within the data.

Basic Components of Perceptron

A Perceptron is composed of key components that work together to process information and make predictions.

  • Input Features: The perceptron takes multiple input features, each representing a characteristic of the input data.
  • Weights: Each input feature is assigned a weight that determines its influence on the output. These weights are adjusted during training to find the optimal values.
  • Summation Function: The perceptron calculates the weighted sum of its inputs, combining them with their respective weights.
  • Activation Function: The weighted sum is passed through the Heaviside step function, comparing it to a threshold to produce a binary output (0 or 1).
  • Output: The final output is determined by the activation function, often used for binary classification tasks.
  • Bias: The bias term helps the perceptron make adjustments independent of the input, improving its flexibility in learning.
  • Learning Algorithm: The perceptron adjusts its weights and bias using a learning algorithm, such as the Perceptron Learning Rule, to minimize prediction errors.

These components enable the perceptron to learn from data and make predictions. While a single perceptron can handle simple binary classification, complex tasks require multiple perceptrons organized into layers, forming a neural network.

How does Perceptron work?

A weight is assigned to each input node of a perceptron, indicating the importance of that input in determining the output. The Perceptron’s output is calculated as a weighted sum of the inputs, which is then passed through an activation function to decide whether the Perceptron will fire.

The weighted sum is computed as:

z = w_1x_1 + w_2x_2 + \ldots + w_nx_n = X^TW

The step function compares this weighted sum to a threshold. If the sum is greater than or equal to the threshold, the output is 1; otherwise, it is 0. The most common activation function used in Perceptrons is the Heaviside step function:

h(z) = \begin{cases} 0 & \text{if } z < \text{Threshold} \\ 1 & \text{if } z \geq \text{Threshold} \end{cases}

A perceptron consists of a single layer of Threshold Logic Units (TLU), with each TLU fully connected to all input nodes.

Figure: Threshold Logic Units

In a fully connected layer, also known as a dense layer, all neurons in one layer are connected to every neuron in the previous layer.

The output of the fully connected layer is computed as:

f_{W,b}(X)=h(XW+b)

where X is the input matrix, W is the weight matrix (one weight per input for each output neuron), b is the bias vector, and h is the step function.
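
As a quick illustration of this formula (the numbers below are made up for demonstration and are not part of the implementation later in the article), the output of a single-TLU dense layer can be computed in NumPy:

Python
import numpy as np

# Illustrative values: 3 samples with 2 features each
X = np.array([[0.5, 1.2],
              [1.0, -0.3],
              [-0.7, 0.8]])
W = np.array([[0.4], [0.6]])   # weights for a single TLU (one output neuron)
b = np.array([-0.5])           # bias for that neuron

Z = X @ W + b                  # weighted sums XW + b, shape (3, 1)
output = (Z >= 0).astype(int)  # Heaviside step with threshold 0
print(output.ravel())          # [1 0 0]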

During training, the Perceptron's weights are adjusted to minimize the difference between the predicted output and the actual output. This is achieved using supervised learning algorithms like the delta rule or the Perceptron learning rule.

The weight update formula is:

w_{i,j} = w_{i,j} +\eta (y_j -\hat y_j)x_i

Where:

  • w_{i,j} is the weight between the i^{th} input and j^{th} output neuron,
  • x_i is the i^{th} input value,
  • y_j is the actual (target) value and \hat{y}_j is the predicted value,
  • \eta is the learning rate, controlling how much the weights are adjusted.

This process enables the perceptron to learn from data and improve its prediction accuracy over time.
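
To make the update rule concrete, here is a minimal sketch of one training step for a single TLU (the numbers are hypothetical and chosen only for illustration):

Python
import numpy as np

eta = 0.1                      # learning rate
x = np.array([1.0, 2.0])       # one training sample
w = np.array([0.2, -0.4])      # current weights
b = 0.0                        # current bias
y = 1                          # actual (target) label

z = x @ w + b                  # weighted sum = -0.6
y_hat = 1 if z >= 0 else 0     # prediction = 0, so y - y_hat = 1

w = w + eta * (y - y_hat) * x  # updated weights: [0.3, -0.2]
b = b + eta * (y - y_hat)      # updated bias: 0.1
print(w, b)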

Example: Perceptron in Action

Let’s take a simple example of classifying whether a given fruit is an apple or not based on two inputs: its weight (in grams) and its color (on a scale of 0 to 1, where 1 means red). The perceptron receives these inputs, multiplies them by their weights, adds a bias, and applies the activation function to decide whether the fruit is an apple or not.

  • Input 1 (Weight): 150 grams
  • Input 2 (Color): 0.9 (since the fruit is mostly red)
  • Weights: [0.5, 1.0]
  • Bias: 1.5

The perceptron’s weighted sum would be:

(150 * 0.5) + (0.9 * 1.0) + 1.5 = 75 + 0.9 + 1.5 = 77.4

Let’s assume the activation function uses a threshold of 75. Since 77.4 > 75, the perceptron classifies the fruit as an apple (output = 1).
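
A few lines of Python reproduce this calculation (the weights, bias, and threshold are the illustrative values above, not learned parameters):

Python
inputs = [150, 0.9]      # weight in grams, color score
weights = [0.5, 1.0]
bias = 1.5
threshold = 75

z = sum(w * x for w, x in zip(weights, inputs)) + bias  # 77.4
prediction = 1 if z >= threshold else 0                 # 1 -> classified as an apple
print(z, prediction)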

Building and Training Single Layer Perceptron Model

To build the perceptron model, we are going to implement the following steps:

Step 1: Initialize the weight and learning rate

We consider the weight values for the number of inputs + 1 (with the additional +1 accounting for the bias term). This ensures that both the inputs and bias are included during training.

class Perceptron:
    def __init__(self, num_inputs, learning_rate=0.01):
        # Initialize the weights (num_inputs + 1 for the bias term)
        self.weights = np.random.rand(num_inputs + 1)  # Random initialization
        self.learning_rate = learning_rate  # Learning rate

Step 2: Define the Linear Layer

The first step is to calculate the weighted sum of the inputs. This is done using the formula: Z = XW + b, where X represents the inputs, W the weights, and b the bias.

    def linear(self, inputs):
        Z = inputs @ self.weights[1:].T + self.weights[0]  # Weighted sum: XW + b
        return Z

Step 3: Define the Activation Function

The Heaviside Step function is used as the activation function, which compares the weighted sum to a threshold. If the sum is greater than or equal to 0, it outputs 1; otherwise, it outputs 0.

    def Heaviside_step_fn(self, z):
        if z >= 0:
            return 1  # Output 1 if the input is >= 0
        else:
            return 0  # Output 0 otherwise

Step 4: Define the Prediction

Use the linear function followed by the activation function to generate predictions based on the input features.

    def predict(self, inputs):
        Z = self.linear(inputs)  # Pass inputs through the linear layer
        try:
            pred = []
            for z in Z:  # For batch inputs
                pred.append(self.Heaviside_step_fn(z))
        except TypeError:  # Z is a scalar for a single input
            return self.Heaviside_step_fn(Z)
        return pred  # Return predictions

Step 5: Define the Loss Function

The loss function calculates the error between the predicted output and the actual output. In the Perceptron, the loss is the difference between the target value and the predicted value.

    def loss(self, prediction, target):
        loss = target - prediction  # Error: difference between target and prediction
        return loss

Step 6: Define Training

In this step, weights and bias are updated according to the error calculated from the loss function. The Perceptron learning rule is applied to adjust the weights to minimize the error.

    def train(self, inputs, target):
        prediction = self.predict(inputs)  # Get prediction
        error = self.loss(prediction, target)  # Calculate error (loss)
        self.weights[1:] += self.learning_rate * error * inputs  # Update weights
        self.weights[0] += self.learning_rate * error  # Update bias

Step 7: Fit the Model

The fitting process involves training the model over multiple iterations (epochs) to adjust the weights and bias. This allows the Perceptron to learn from the data and improve its prediction accuracy over time.

    def fit(self, X, y, num_epochs):
        for epoch in range(num_epochs):
            for inputs, target in zip(X, y):  # Loop through the dataset
                self.train(inputs, target)  # Train on each input-target pair

Complete Code:

Python
# Import the necessary library
import numpy as np

# Build the Perceptron Model
class Perceptron:
    
    def __init__(self, num_inputs, learning_rate=0.01):
        # Initialize the weight and learning rate
        self.weights = np.random.rand(num_inputs + 1)
        self.learning_rate = learning_rate
    
    # Define the linear layer (weighted sum: XW + b)
    def linear(self, inputs):
        Z = inputs @ self.weights[1:].T + self.weights[0]
        return Z
    
    # Define the Heaviside Step function.
    def Heaviside_step_fn(self, z):
        if z>=0:
            return 1
        else:
            return 0
        
    # Define the Prediction
    def predict(self, inputs):
        Z = self.linear(inputs)
        try:
            pred = []
            for z in Z:
                pred.append(self.Heaviside_step_fn(z))
        except TypeError:  # Z is a scalar for a single input
            return self.Heaviside_step_fn(Z)
        return pred
    
    # Define the Loss function
    def loss(self, prediction, target):
        loss = target - prediction
        return loss
    
    #Define training
    def train(self, inputs, target):
        prediction = self.predict(inputs)
        error = self.loss(prediction, target)
        self.weights[1:] += self.learning_rate * error * inputs
        self.weights[0]  += self.learning_rate * error
        
    # Fit the model
    def fit(self, X, y, num_epochs):
        for epoch in range(num_epochs):
            for inputs, target in zip(X, y):
                self.train(inputs, target)

Binary Classification on a Linearly Separable Dataset

Here, we walk through the process of building, training, and evaluating a Perceptron model for binary classification on a synthetic, linearly separable dataset. It covers data preprocessing, model training, and performance evaluation. We will follow these steps:

  1. Import Libraries
  2. Generate Dataset using make_blobs()
  3. Train-Test Split with train_test_split()
  4. Scale Features using StandardScaler()
  5. Initialize Perceptron with appropriate input size
  6. Train the Model with fit() over 100 epochs
  7. Predict on test data and evaluate accuracy by comparing predictions with actual labels
  8. Visualize Results using a scatter plot
Python
# Import the necessary library
import numpy as np
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Generate a linearly separable dataset with two classes
X, y = make_blobs(n_samples=1000,
                  n_features=2, 
                  centers=2, 
                  cluster_std=3,
                  random_state=23)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    test_size=0.2,
                                                    random_state=23,
                                                    shuffle=True
                                                   )

# Scale the input features to have zero mean and unit variance
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Set the random seed for reproducibility
np.random.seed(23)

# Initialize the Perceptron with the appropriate number of inputs
perceptron = Perceptron(num_inputs=X_train.shape[1])

# Train the Perceptron on the training data
perceptron.fit(X_train, y_train, num_epochs=100)

# Prediction
pred = perceptron.predict(X_test)

# Test the accuracy of the trained Perceptron on the testing data
accuracy = np.mean(pred == y_test)
print("Accuracy:", accuracy)

# Plot the dataset
plt.scatter(X_test[:, 0], X_test[:, 1], c=pred)
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()

Output:

Accuracy: 0.975
Figure: Scatter plot of the classified data points

Binary Classification using Perceptron with PyTorch

In this section, we implement a perceptron model using PyTorch to perform binary classification on linearly separable data generated with make_blobs().

The steps include:

  1. Data Preparation: A synthetic dataset with two features is created, scaled, and split into training and test sets.
  2. Perceptron Model: A single-layer perceptron is implemented using PyTorch's nn.Module.
  3. Training: The Perceptron is trained for 10 epochs, updating the weights and bias manually with the perceptron learning rule and a fixed learning rate.
  4. Evaluation: The model's performance is evaluated by calculating the accuracy on the test set.
  5. Visualization: The test dataset is visualized, with the predictions color-coded for easy interpretation.
Python
# Import the necessary libraries
import torch
import torch.nn as nn
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Generate a linearly separable dataset with two classes
X, y = make_blobs(n_samples=1000,
                  n_features=2, 
                  centers=2, 
                  cluster_std=3,
                  random_state=23)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    test_size=0.2,
                                                    random_state=23,
                                                    shuffle=True
                                                   )

# Scale the input features to have zero mean and unit variance
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Convert the data to PyTorch tensors
X_train = torch.tensor(X_train, dtype=torch.float32, requires_grad=False)
X_test = torch.tensor(X_test, dtype=torch.float32, requires_grad=False)
y_train = torch.tensor(y_train, dtype=torch.float32, requires_grad=False)
y_test = torch.tensor(y_test, dtype=torch.float32, requires_grad=False)

# reshape the target tensor to match the predicted output tensor
y_train = y_train.reshape(-1, 1)
y_test = y_test.reshape(-1, 1)

torch.manual_seed(23)  # Set the random seed for reproducibility

# Define the Perceptron model
class Perceptron(nn.Module):
    def __init__(self, num_inputs):
        super(Perceptron, self).__init__()
        self.linear = nn.Linear(num_inputs, 1)
        
    # Heaviside Step function
    def heaviside_step_fn(self,Z):
        Class = []
        for z in Z:
            if z >=0:
                Class.append(1)
            else:
                Class.append(0)
        return torch.tensor(Class)
    
    def forward(self, x):
        Z = self.linear(x)
        return self.heaviside_step_fn(Z)
        

# Initialize the Perceptron with the appropriate number of inputs
perceptron = Perceptron(num_inputs=X_train.shape[1])

# loss function
def loss(y_pred,Y):
    cost = y_pred-Y
    return cost


# Learning Rate
learning_rate = 0.001

# Train the Perceptron on the training data
num_epochs = 10
for epoch in range(num_epochs):
    Losses = 0
    for Input, Class in zip(X_train, y_train):
        # Forward pass
        predicted_class = perceptron(Input)
        error = loss(predicted_class, Class)
        Losses += error
        # Perceptron Learning Rule

        # Model Parameter
        w = perceptron.linear.weight
        b = perceptron.linear.bias

        # Manually update the model parameters
        w = w - learning_rate * error * Input
        b = b - learning_rate * error

        # Assign the updated weight & bias parameters to the linear layer
        perceptron.linear.weight = nn.Parameter(w)
        perceptron.linear.bias   = nn.Parameter(b)
    print('Epoch [{}/{}], weight:{}, bias:{} Loss: {:.4f}'.format(
        epoch+1,num_epochs,
        w.detach().numpy(),
        b.detach().numpy(),
        Losses.item()))

# Test the accuracy of the trained Perceptron on the testing data
pred = perceptron(X_test)

accuracy = (pred==y_test[:,0]).float().mean()
print("Accuracy on Test Dataset:", accuracy.item())
    
# Plot the dataset
plt.scatter(X_test[:, 0], X_test[:, 1], c=pred)
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()

Output:

Epoch [1/10], weight:[[ 0.01072957 -0.7055903 ]], bias:[0.07482227] Loss: 4.0000
Epoch [2/10], weight:[[ 0.0140219 -0.70487624]], bias:[0.07082226] Loss: 4.0000
Epoch [3/10], weight:[[ 0.0175706 -0.70405596]], bias:[0.06782226] Loss: 3.0000
. . .
Epoch [9/10], weight:[[ 0.03782528 -0.69902927]], bias:[0.05182226] Loss: 2.0000
Epoch [10/10], weight:[[ 0.04085522 -0.6981565 ]], bias:[0.04982227] Loss: 2.0000
Accuracy on Test Dataset: 0.9750000238418579
Figure: Scatter plot of the classified data points

Limitations of Perceptron

The Perceptron was a significant breakthrough in the development of neural networks, proving that simple networks could learn to classify patterns. However, the Perceptron model has certain limitations that can make it unsuitable for some tasks:

  • Limited to linearly separable problems
  • Struggles with convergence when handling non-separable data
  • Requires labeled data for training
  • Sensitive to input scaling
  • Lacks hidden layers for complex decision-making

To overcome these limitations, more advanced neural network architectures, such as Multilayer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs), have been developed. These models can learn more complex patterns and are widely used in modern machine learning and deep learning applications.
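
For instance, the XOR function is the classic example of a problem that is not linearly separable. The sketch below (which assumes the NumPy Perceptron class defined earlier in this article is available) will typically misclassify at least one of the four points, no matter how many epochs are run:

Python
import numpy as np

# XOR truth table: not linearly separable
X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_xor = np.array([0, 1, 1, 0])

np.random.seed(0)
p = Perceptron(num_inputs=2)        # class from the NumPy section above
p.fit(X_xor, y_xor, num_epochs=100)

pred = p.predict(X_xor)
print("Predictions:", pred)
print("Accuracy:", np.mean(np.array(pred) == y_xor))  # at most 0.75 for XOR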

