ML File Syllabus
EXPERIMENT - 1
AIM:
Exploring and demonstrating Python.
THEORY:
Python is a high-level, interpreted programming language known for its simplicity
and readability. It is widely used in various fields such as web development, data analysis,
machine learning, automation, and more. Python's syntax is designed to be easy to read and
write, making it an excellent choice for beginners and experienced programmers alike.
1. Classes: Classes bundle data (attributes) and behaviour (methods) together and are the
basis of object-oriented programming in Python.
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def greet(self):
        return f'Hello, my name is {self.name} and I am {self.age} years old.'

# Create an instance of the class
person = Person('Adi', 19)
print(person.greet())
2. Functions: Functions are blocks of reusable code that perform a specific task. They allow
for modular and organized code, making it easier to manage and debug. Python functions
are defined using the def keyword followed by the function name and parameters.
def add(a, b):
    return a + b

def subtract(a, b):
    return a - b
3. SciPy: SciPy stands for "Scientific Python" and is an open-source Python library used for
scientific and technical computing. It builds on NumPy and provides a large collection of
mathematical algorithms and convenience functions, making it easier to perform scientific
and engineering tasks. Here are a few key components of SciPy:
1. Linear Algebra: Provides functions for matrix operations, solving linear systems,
eigenvalue problems, and more.
2. Optimization: Contains functions for finding the minimum or maximum of functions
(optimization), including linear programming and curve fitting.
3. Integration: Offers methods for calculating integrals, including numerical integration and
ordinary differential equations (ODE) solvers.
4. Statistics: Includes functions for statistical distributions, hypothesis testing, and
descriptive statistics.
5. Signal Processing: Provides tools for filtering, signal analysis, and Fourier transforms.
Linear Algebra
import numpy as np
from scipy import linalg

# Creating a matrix
A = np.array([[1, 2], [3, 4]])

# Determinant of A and solution of the linear system Ax = b
print("Determinant:", linalg.det(A))
b = np.array([5, 6])
print("Solution of Ax = b:", linalg.solve(A, b))
This code demonstrates how to compute the determinant of a matrix and solve a linear system
of equations using SciPy's linear algebra module.
OUTPUT
Statistics
import numpy as np
from scipy import stats

# Creating a dataset
data = np.array([1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5])

# Descriptive statistics (count, min/max, mean, variance, skewness, kurtosis)
print(stats.describe(data))

# Performing a one-sample t-test against a hypothesised mean of 3
t_stat, p_value = stats.ttest_1samp(data, 3)
print("T-statistic:", t_stat)
print("P-value:", p_value)
This code demonstrates how to compute descriptive statistics and perform a t-test using
SciPy's statistics module.
OUTPUT
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Standardise the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train a linear SVM, predict on the test set and evaluate
clf = SVC(kernel='linear')
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
This code demonstrates how to load the Iris dataset, preprocess the data, train a Support
Vector Machine (SVM) classifier, make predictions, and evaluate the model.
OUTPUT
from sklearn import datasets
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

iris = datasets.load_iris()
X = iris.data
# Fit a KMeans model with 3 clusters and plot the first two features coloured by cluster
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
plt.scatter(X[:, 0], X[:, 1], c=kmeans.fit_predict(X))
plt.show()
This code demonstrates how to load the Iris dataset, train a KMeans clustering model, and
visualize the clusters.
OUTPUT
This code demonstrates how to use cross-validation to evaluate the performance of a Random
Forest classifier on the Iris dataset.
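The corresponding code is not present in the file; a minimal sketch of what it could look like, assuming 5-fold cross-validation of a RandomForestClassifier on the Iris dataset, is shown here.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

iris = load_iris()
clf = RandomForestClassifier(n_estimators=100, random_state=42)
# 5-fold cross-validation: the data is split into 5 parts, each used once for validation
scores = cross_val_score(clf, iris.data, iris.target, cv=5)
print("Cross-validation scores:", scores)
print("Mean accuracy:", scores.mean())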
OUTPUT
This code demonstrates how to apply Min-Max scaling to a sample dataset to normalize the
features.
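The code for this part is also missing; a small sketch, assuming an arbitrary sample matrix (the values below are illustrative only), could look like this.
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Sample dataset: two features on very different scales (illustrative values)
data = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
scaler = MinMaxScaler()
scaled = scaler.fit_transform(data)  # rescales each feature to the [0, 1] range
print(scaled)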
OUTPUT:
EXPERIMENT - 2
AIM:
Perform data preprocessing tasks such as outlier detection, handling missing values,
analyzing redundancy, and normalization on different datasets.
THEORY:
Data preprocessing is a crucial step in the machine learning pipeline. It ensures that the data
fed into models is clean, consistent, and formatted appropriately. Poor data quality can
significantly degrade the performance of machine learning algorithms.
CODE
# missing values
import pandas as pd

df = pd.read_csv('students.csv')
print(df.isnull().sum())                    # count missing values in each column
df = df.fillna(df.mean(numeric_only=True))  # one common strategy: fill numeric gaps with the column mean
# outlier detection
import pandas as pd

df = pd.read_csv('salaries.csv')
salaries = df['Salary']
Q1 = salaries.quantile(0.25)
Q3 = salaries.quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
outliers = df[(salaries < lower_bound) | (salaries > upper_bound)]
print(outliers)  # rows whose salary falls outside the 1.5 * IQR bounds
# redundancy analysis
import pandas as pd

df = pd.read_csv('products.csv')
redundant_columns = []
for col1 in df.columns:
    for col2 in df.columns:
        if col1 != col2 and df[col1].equals(df[col2]):
            redundant_columns.append((col1, col2))
print(redundant_columns)  # pairs of columns holding identical values
# normalization
import pandas as pd

df = pd.read_csv('athletes.csv')
# Min-max normalisation: rescale every numeric column to the [0, 1] range
numeric_cols = df.select_dtypes(include='number').columns
df[numeric_cols] = (df[numeric_cols] - df[numeric_cols].min()) / (df[numeric_cols].max() - df[numeric_cols].min())
print(df.head())
OUTPUT
EXPERIMENT - 3
AIM:
Write a program to implement Linear Regression using any appropriate dataset.
THEORY:
What is Linear Regression?
Linear Regression is a supervised machine learning algorithm used to model the relationship
between a dependent variable (target) and one or more independent variables (features). It
assumes a linear relationship between the variables — that is, the change in the target
variable is proportional to the change in the feature variable(s).
The goal of Linear Regression is to find the best-fitting straight line (regression line) that
minimizes the difference between the actual data points and the predicted values from the
model.
It is commonly used for predictive analysis, such as estimating sales, prices, or salaries based
on certain inputs.
CODE
import pandas as pd
from sklearn.linear_model import LinearRegression
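The remainder of the code is not visible in the file. A minimal sketch of one plausible completion, continuing from the imports above, is given below; the file name 'salary_data.csv' and the column names 'YearsExperience' and 'Salary' are assumptions made only for illustration.
from sklearn.model_selection import train_test_split

# Hypothetical dataset: years of experience vs. salary
df = pd.read_csv('salary_data.csv')
X = df[['YearsExperience']]   # feature matrix (2-D)
y = df['Salary']              # target vector

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)

print("Coefficient:", model.coef_)
print("Intercept:", model.intercept_)
print("R^2 on test data:", model.score(X_test, y_test))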
OUTPUT
EXPERIMENT - 4
AIM:
Write a program to exhibit the working of the decision tree based ID3 algorithm. With the
help of an appropriate data set, build the decision tree and classify a new sample.
THEORY:
ID3 is a decision tree algorithm developed by Ross Quinlan. It builds a decision tree from a
dataset by using a top-down, greedy approach to select the attribute that maximizes
Information Gain.
At each step, ID3 selects the feature that maximizes Information Gain. This means it chooses
the attribute that best separates the data into classes.
CODE
import pandas as pd
import numpy as np
import math
from collections import Counter

data = [
    ['Sunny', 'Hot', 'High', 'Weak', 'No'],
    ['Sunny', 'Hot', 'High', 'Strong', 'No'],
    ['Overcast', 'Hot', 'High', 'Weak', 'Yes'],
    ['Rain', 'Mild', 'High', 'Weak', 'Yes'],
    ['Rain', 'Cool', 'Normal', 'Weak', 'Yes'],
    ['Rain', 'Cool', 'Normal', 'Strong', 'No'],
    ['Overcast', 'Cool', 'Normal', 'Strong', 'Yes'],
    ['Sunny', 'Mild', 'High', 'Weak', 'No'],
    ['Sunny', 'Cool', 'Normal', 'Weak', 'Yes'],
    ['Rain', 'Mild', 'Normal', 'Weak', 'Yes'],
    ['Sunny', 'Mild', 'Normal', 'Strong', 'Yes'],
    ['Overcast', 'Mild', 'High', 'Strong', 'Yes'],
    ['Overcast', 'Hot', 'Normal', 'Weak', 'Yes'],
    ['Rain', 'Mild', 'High', 'Strong', 'No']
]
# Build a DataFrame; the column names follow the standard PlayTennis dataset
df = pd.DataFrame(data, columns=['Outlook', 'Temperature', 'Humidity', 'Wind', 'PlayTennis'])

def entropy(target_col):
    # Shannon entropy of the class column
    values, counts = np.unique(target_col, return_counts=True)
    return -np.sum([(counts[i] / np.sum(counts)) * math.log2(counts[i] / np.sum(counts))
                    for i in range(len(values))])

def info_gain(data, split_attribute_name, target_name='PlayTennis'):
    # Information gain = entropy of the whole set minus the weighted entropy after the split
    total_entropy = entropy(data[target_name])
    vals, counts = np.unique(data[split_attribute_name], return_counts=True)
    weighted_entropy = np.sum([
        (counts[i] / np.sum(counts)) *
        entropy(data.where(data[split_attribute_name] == vals[i]).dropna()[target_name])
        for i in range(len(vals))
    ])
    return total_entropy - weighted_entropy

def ID3(data, original_data, features, target_attribute_name='PlayTennis', parent_node_class=None):
    # Stopping condition 1: all remaining samples share one class
    if len(np.unique(data[target_attribute_name])) <= 1:
        return np.unique(data[target_attribute_name])[0]
    # Stopping condition 2: no samples left, return the majority class of the original data
    elif len(data) == 0:
        return np.unique(original_data[target_attribute_name])[
            np.argmax(np.unique(original_data[target_attribute_name], return_counts=True)[1])
        ]
    # Stopping condition 3: no features left, return the parent node's majority class
    elif len(features) == 0:
        return parent_node_class
    else:
        # Majority class of the current node, passed down as the parent class
        parent_node_class = np.unique(data[target_attribute_name])[
            np.argmax(np.unique(data[target_attribute_name], return_counts=True)[1])
        ]
        # Pick the feature with the highest information gain
        gains = [info_gain(data, feature, target_attribute_name) for feature in features]
        best_feature = features[np.argmax(gains)]
        tree = {best_feature: {}}
        remaining_features = [f for f in features if f != best_feature]
        for value in np.unique(data[best_feature]):
            sub_data = data.where(data[best_feature] == value).dropna()
            subtree = ID3(sub_data, original_data, remaining_features, target_attribute_name,
                          parent_node_class)
            tree[best_feature][value] = subtree
        return tree

features = list(df.columns)
features.remove('PlayTennis')
tree = ID3(df, df, features)
print("Decision Tree:", tree)
EXPERIMENT - 5
AIM:
Write a program to demonstrate the working of the decision tree based C4.5 algorithm.
With the help of the data set used in the above experiment, build the decision tree and
classify a new sample.
THEORY:
C4.5 is a decision tree algorithm developed by Ross Quinlan as an extension of ID3. It
addresses many of ID3’s limitations, especially around continuous data, pruning, and
overfitting.
It is widely used for classification problems and forms the basis for more advanced
algorithms like C5.0 and Random Forest.
C4.5 improves over ID3’s Information Gain by using Gain Ratio, which penalizes attributes
with many values.
Advantages of C4.5
● Can handle both categorical and numerical data.
● Deals with missing values.
● Uses pruning to improve generalization.
● Uses Gain Ratio to prevent bias toward many-valued attributes.
● Widely used and robust for practical classification problems.
CODE
import pandas as pd
import numpy as np
import math

# play_tennis.csv is assumed to hold the same 14-row PlayTennis dataset used in Experiment 4
df = pd.read_csv('play_tennis.csv')

def entropy(target_col):
    values, counts = np.unique(target_col, return_counts=True)
    return -np.sum([(counts[i] / np.sum(counts)) * math.log2(counts[i] / np.sum(counts))
                    for i in range(len(values))])

def info_gain(data, split_attribute_name, target_name="PlayTennis"):
    total_entropy = entropy(data[target_name])
    vals, counts = np.unique(data[split_attribute_name], return_counts=True)
    weighted_entropy = np.sum([
        (counts[i] / np.sum(counts)) *
        entropy(data.where(data[split_attribute_name] == vals[i]).dropna()[target_name])
        for i in range(len(vals))
    ])
    return total_entropy - weighted_entropy

def split_info(data, split_attribute_name):
    # Intrinsic information of the split, used by C4.5 to normalise the information gain
    vals, counts = np.unique(data[split_attribute_name], return_counts=True)
    return -np.sum([(counts[i] / np.sum(counts)) * math.log2(counts[i] / np.sum(counts))
                    for i in range(len(vals))])

def gain_ratio(data, split_attribute_name, target_name="PlayTennis"):
    ig = info_gain(data, split_attribute_name, target_name)
    si = split_info(data, split_attribute_name)
    return ig / si if si != 0 else 0

def C45(data, original_data, features, target_attribute_name="PlayTennis", parent_node_class=None):
    if len(np.unique(data[target_attribute_name])) <= 1:
        return np.unique(data[target_attribute_name])[0]
    elif len(data) == 0:
        return np.unique(original_data[target_attribute_name])[
            np.argmax(np.unique(original_data[target_attribute_name], return_counts=True)[1])
        ]
    elif len(features) == 0:
        return parent_node_class
    else:
        parent_node_class = np.unique(data[target_attribute_name])[
            np.argmax(np.unique(data[target_attribute_name], return_counts=True)[1])
        ]
        # C4.5 chooses the attribute with the highest gain ratio instead of raw information gain
        ratios = [gain_ratio(data, feature, target_attribute_name) for feature in features]
        best_feature = features[np.argmax(ratios)]
        tree = {best_feature: {}}
        remaining_features = [f for f in features if f != best_feature]
        for value in np.unique(data[best_feature]):
            sub_data = data.where(data[best_feature] == value).dropna()
            tree[best_feature][value] = C45(sub_data, original_data, remaining_features,
                                            target_attribute_name, parent_node_class)
        return tree

def classify(sample, tree):
    # Walk down the tree until a leaf (class label) is reached
    if not isinstance(tree, dict):
        return tree
    attribute = next(iter(tree))
    value = sample.get(attribute)
    if value not in tree[attribute]:
        return None
    return classify(sample, tree[attribute][value])

features = list(df.columns)
features.remove('PlayTennis')
tree = C45(df, df, features)
print("C4.5 Decision Tree:", tree)
# Example prediction
new_sample = {'Outlook': 'Sunny', 'Temperature': 'Cool', 'Humidity': 'High', 'Wind': 'Strong'}
prediction = classify(new_sample, tree)
print("Prediction for new sample:", prediction)
OUTPUT
EXPERIMENT - 6
AIM:
Write a program to demonstrate the working of decision tree based CART algorithm.
Build the decision tree and classify a new sample using a suitable dataset. Compare the
performance of ID3, C4.5, and CART in terms of accuracy, recall, precision and
sensitivity.
THEORY:
ID3 Algorithm
ID3 (Iterative Dichotomiser 3) is one of the earliest decision tree algorithms. It uses
Information Gain as the splitting criterion, which tends to favor attributes with many distinct
values. ID3 works only with categorical data and does not handle missing values. It also lacks
pruning, which makes it prone to overfitting, especially on noisy datasets.
● Splitting criterion: Information Gain
● Data types supported: Categorical only
● Missing value handling: Not supported
● Pruning: Not performed
● Output tree: Multi-way
● Performance (general):
   ● Accuracy: Around 85–90%
   ● Precision: Approximately 0.82
   ● Recall/Sensitivity: Approximately 0.84
   ● F1-Score: Around 0.83
C4.5 Algorithm
C4.5 is an improvement over ID3, also developed by Ross Quinlan. It uses Gain Ratio as the
splitting criterion, which corrects the bias seen in Information Gain. C4.5 supports both
categorical and continuous features and can handle missing values effectively. It performs
post-pruning, which helps prevent overfitting and improves generalization.
● Splitting criterion: Gain Ratio
● Data types supported: Categorical and continuous
● Missing value handling: Supported
● Pruning: Post-pruning is applied
● Output tree: Multi-way
● Performance (general):
   ● Accuracy: Around 88–93%
   ● Precision: Approximately 0.86
   ● Recall/Sensitivity: Approximately 0.89
   ● F1-Score: Around 0.87
CART Algorithm
CART (Classification and Regression Trees) is a binary decision tree algorithm that uses the
Gini Index to determine the best splits. It supports both categorical and continuous features
and handles missing values well. CART constructs strictly binary trees and is capable of both
classification and regression, making it more versatile. It also includes a cost-complexity
pruning mechanism to avoid overfitting.
● Splitting criterion: Gini Index
● Data types supported: Categorical and continuous
● Missing value handling: Supported
● Pruning: Cost-complexity pruning is applied
● Output tree: Binary only
● Performance (general):
   ● Accuracy: Around 87–92%
   ● Precision: Approximately 0.85
   ● Recall/Sensitivity: Approximately 0.87
   ● F1-Score: Around 0.86
CODE
import pandas as pd
import numpy as np
from collections import Counter
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
df = pd.read_csv('play_tennis.csv')
data = df.values.tolist()
headers = df.columns.tolist()
# Split a dataset on an attribute value (rows matching the value go left, the rest go right;
# the equality test is assumed here because all attributes in this dataset are categorical)
def test_split(index, value, dataset):
    left, right = [], []
    for row in dataset:
        if row[index] == value:
            left.append(row)
        else:
            right.append(row)
    return left, right

# Gini index of a candidate split (lower is purer)
def gini_index(groups, classes):
    n_instances = float(sum(len(group) for group in groups))
    gini = 0.0
    for group in groups:
        size = float(len(group))
        if size == 0:
            continue
        score = 0.0
        for class_val in classes:
            p = [row[-1] for row in group].count(class_val) / size
            score += p * p
        gini += (1.0 - score) * (size / n_instances)
    return gini

def get_split(dataset):
    class_values = list(set(row[-1] for row in dataset))
    best_index, best_value, best_score, best_groups = 999, None, 999, None
    for index in range(len(dataset[0]) - 1):
        for row in dataset:
            groups = test_split(index, row[index], dataset)
            gini = gini_index(groups, class_values)
            if gini < best_score:
                best_index, best_value, best_score, best_groups = index, row[index], gini, groups
    return {'index': best_index, 'value': best_value, 'groups': best_groups}

def to_terminal(group):
    outcomes = [row[-1] for row in group]
    return max(set(outcomes), key=outcomes.count)

def entropy(data):
    labels = [row[-1] for row in data]
    counter = Counter(labels)
    total = len(data)
    return -sum((count / total) * np.log2(count / total) for count in counter.values())

def majority_class(data):
    return Counter([row[-1] for row in data]).most_common(1)[0][0]

# Evaluate models
# id3, c45, build_tree, predict_tree and predict are assumed to be defined earlier in the
# full lab file; their definitions are not visible in this extract.
def evaluate_model(model_type):
    true_labels = [row[-1] for row in data]
    predictions = []
    if model_type == 'ID3':
        tree = id3(data, list(range(len(data[0]) - 1)))
        for row in data:
            pred = predict_tree(tree, row)
            predictions.append(pred if pred else majority_class(data))
    elif model_type == 'C4.5':
        tree = c45(data, list(range(len(data[0]) - 1)))
        for row in data:
            pred = predict_tree(tree, row)
            predictions.append(pred if pred else majority_class(data))
    elif model_type == 'CART':
        tree = build_tree(data, max_depth=5, min_size=1)
        for row in data:
            predictions.append(predict(tree, row))
    # Compute the performance measures ('Yes' is treated as the positive class)
    accuracy = accuracy_score(true_labels, predictions)
    precision = precision_score(true_labels, predictions, pos_label='Yes')
    recall = recall_score(true_labels, predictions, pos_label='Yes')
    f1 = f1_score(true_labels, predictions, pos_label='Yes')
    print(f"{model_type} Results:")
    print("Accuracy:", round(accuracy, 3))
    print("Precision:", round(precision, 3))
    print("Recall / Sensitivity:", round(recall, 3))
    print("F1-Score:", round(f1, 3))
    print()

evaluate_model('ID3')
evaluate_model('C4.5')
evaluate_model('CART')
OUTPUT
EXPERIMENT - 7
AIM:
Build an Artificial Neural Network by implementing the Backpropagation algorithm and test
the same using appropriate data sets.
THEORY:
An Artificial Neural Network is a computational model inspired by the structure and
functioning of the biological brain. It is a key technique in the field of machine learning and
deep learning, used for recognizing complex patterns and solving problems like
classification, regression, and prediction.
Backpropagation:
Forward Pass: Input is passed through the network to generate output.
Loss Calculation: Difference between predicted and actual output is calculated using a loss
function (e.g., Mean Squared Error).
Backward Pass: Gradients are calculated using the chain rule to update weights and biases
(Backpropagation).
Weight Update: Weights are updated using gradient descent.
CODE
import numpy as np
def sigmoid(x):
return 1 / (1 + np.exp(-x))
def sigmoid_derivative(x):
return x * (1 - x)
# XOR Dataset
X = np.array([[0, 0],
[0, 1],
[1, 0],
[1, 1]])
y = np.array([[0],
[1],
[1],
[0]])
input_layer_neurons = 2
hidden_layer_neurons = 2
output_neurons = 1
np.random.seed(1)
wh = np.random.uniform(size=(input_layer_neurons, hidden_layer_neurons))
bh = np.random.uniform(size=(1, hidden_layer_neurons))
wo = np.random.uniform(size=(hidden_layer_neurons, output_neurons))
bo = np.random.uniform(size=(1, output_neurons))
epochs = 10000
learning_rate = 0.1
for i in range(epochs):
    # Forward Propagation
    hidden_input = np.dot(X, wh) + bh
    hidden_output = sigmoid(hidden_input)
    output_input = np.dot(hidden_output, wo) + bo
    predicted_output = sigmoid(output_input)
    # Backward Propagation
    error = y - predicted_output
    d_predicted_output = error * sigmoid_derivative(predicted_output)
    error_hidden_layer = d_predicted_output.dot(wo.T)
    d_hidden_layer = error_hidden_layer * sigmoid_derivative(hidden_output)
    # Weight and bias updates (gradient descent)
    wo += hidden_output.T.dot(d_predicted_output) * learning_rate
    bo += np.sum(d_predicted_output, axis=0, keepdims=True) * learning_rate
    wh += X.T.dot(d_hidden_layer) * learning_rate
    bh += np.sum(d_hidden_layer, axis=0, keepdims=True) * learning_rate

print("Predicted output after training:")
print(predicted_output)
OUTPUT
EXPERIMENT - 8
AIM:
Write a program to implement the Naïve Bayesian classifier for appropriate dataset and
compute the performance measures of the model.
THEORY:
Naïve Bayes is a probabilistic machine learning algorithm based on Bayes’ Theorem,
particularly useful for classification tasks. It assumes that all features are independent of each
other, which is often not true in practice, but still gives good results—hence the name
"naïve."
Bayes’ Theorem:
P(H | X) = [P(X | H) * P(H)] / P(X)
Where:
● P(H | X) is the posterior probability of hypothesis H given the observed data X.
● P(X | H) is the likelihood of observing X when H is true.
● P(H) is the prior probability of H.
● P(X) is the probability of the data X (the evidence).
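As an illustrative example (the numbers are arbitrary): if P(H) = 0.3, P(X | H) = 0.8 and
P(X) = 0.5, then P(H | X) = (0.8 × 0.3) / 0.5 = 0.48.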
CODE
from collections import defaultdict
import math
dataset = [
['Sunny', 'Hot', 'High', 'Weak', 'No'],
['Sunny', 'Hot', 'High', 'Strong', 'No'],
['Overcast', 'Hot', 'High', 'Weak', 'Yes'],
['Rain', 'Mild', 'High', 'Weak', 'Yes'],
['Rain', 'Cool', 'Normal', 'Weak', 'Yes'],
['Rain', 'Cool', 'Normal', 'Strong', 'No'],
['Overcast', 'Cool', 'Normal', 'Strong', 'Yes'],
['Sunny', 'Mild', 'High', 'Weak', 'No'],
['Sunny', 'Cool', 'Normal', 'Weak', 'Yes'],
['Rain', 'Mild', 'Normal', 'Weak', 'Yes'],
['Sunny', 'Mild', 'Normal', 'Strong', 'Yes'],
['Overcast', 'Mild', 'High', 'Strong', 'Yes'],
['Overcast', 'Hot', 'Normal', 'Weak', 'Yes'],
['Rain', 'Mild', 'High', 'Strong', 'No']
]
X = [row[:-1] for row in dataset]
y = [row[-1] for row in dataset]
classes = set(y)

# Training: count class frequencies and feature-value frequencies per class,
# then convert the counts into probabilities
def train_naive_bayes(X, y):
    total_samples = len(y)
    label_probs = defaultdict(float)
    feature_probs = defaultdict(lambda: defaultdict(lambda: defaultdict(float)))
    for i in range(total_samples):
        label = y[i]
        label_probs[label] += 1
        for j in range(len(X[i])):
            feature_value = X[i][j]
            feature_probs[j][feature_value][label] += 1
    # P(feature = value | class): divide each count by the class count
    for j in feature_probs:
        for value in feature_probs[j]:
            for label in feature_probs[j][value]:
                feature_probs[j][value][label] /= label_probs[label]
    # P(class): divide by the total number of samples
    for label in label_probs:
        label_probs[label] /= total_samples
    return label_probs, feature_probs

# Prediction
def predict_naive_bayes(sample, label_probs, feature_probs):
    scores = {}
    for label in label_probs:
        log_prob = math.log(label_probs[label])
        for i in range(len(sample)):
            value = sample[i]
            if value in feature_probs[i] and label in feature_probs[i][value]:
                log_prob += math.log(feature_probs[i][value][label])
            else:
                log_prob += math.log(1e-6)  # small constant for unseen feature/class pairs
        scores[label] = log_prob
    return max(scores, key=scores.get)

# Train model
label_probs, feature_probs = train_naive_bayes(X, y)
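The performance-measure portion of the code is not visible in the file; a minimal sketch, assuming the model is evaluated on the same 14 training samples (the dataset is too small for a separate test split), is shown below.
# Evaluate the trained model on the training samples themselves
predictions = [predict_naive_bayes(sample, label_probs, feature_probs) for sample in X]
correct = sum(1 for pred, actual in zip(predictions, y) if pred == actual)
accuracy = correct / len(y)
print("Predictions:", predictions)
print("Accuracy:", accuracy)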
OUTPUT
EXPERIMENT - 9
AIM:
Write a program to implement k-Nearest Neighbor algorithm to classify any dataset of your
choice. Print both correct and wrong predictions.
THEORY:
k-Nearest Neighbor is a supervised machine learning algorithm used for classification and
regression tasks. It is instance-based or lazy learning, meaning it doesn't learn a model during
training, but rather stores the training data and makes decisions during prediction. It classifies
a data point based on how its neighbors are classified.
CODE
import numpy as np
from collections import Counter
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
# Euclidean distance between two feature vectors
def euclidean_distance(row1, row2):
    return np.sqrt(np.sum((np.array(row1) - np.array(row2)) ** 2))

# k-NN algorithm: find the k closest training rows and take a majority vote on their labels
def knn_predict(X_train, y_train, test_row, k):
    distances = []
    for i in range(len(X_train)):
        dist = euclidean_distance(test_row, X_train[i])
        distances.append((dist, y_train[i]))
    distances.sort(key=lambda x: x[0])
    k_nearest_labels = [label for (_, label) in distances[:k]]
    most_common = Counter(k_nearest_labels).most_common(1)
    return most_common[0][0]
iris = load_iris()
X = iris.data
y = iris.target
# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
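The prediction loop itself is missing from the file; a minimal sketch that prints both correct and wrong predictions, assuming k = 3, is given below.
k = 3
correct = 0
for i in range(len(X_test)):
    pred = knn_predict(X_train, y_train, X_test[i], k)
    if pred == y_test[i]:
        correct += 1
        print(f"Correct: predicted {iris.target_names[pred]}, actual {iris.target_names[y_test[i]]}")
    else:
        print(f"Wrong:   predicted {iris.target_names[pred]}, actual {iris.target_names[y_test[i]]}")
print("Accuracy:", correct / len(X_test))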
OUTPUT
EXPERIMENT - 10
AIM:
Apply k-Means clustering algorithm on suitable datasets and comment on the quality of
clustering.
THEORY:
What is K-Means?
K-Means is an unsupervised learning algorithm used for clustering data into groups (called
clusters). It groups data points such that those in the same cluster are more similar to each
other than to those in other clusters.
It is widely used in market segmentation, pattern recognition, image compression, and other
applications where labeled data is not available.
Key Concepts
Unsupervised: No labeled output is required; the algorithm tries to discover natural
groupings.
K: The number of clusters you want to divide your data into.
Centroid: The center of a cluster. It’s the average of all points in the cluster.
CODE
import csv
import random
import math
def load_dataset(filename):
    with open(filename, 'r') as file:
        reader = csv.reader(file)
        next(reader)  # skip header
        dataset = []
        for row in reader:
            income = float(row[2])
            score = float(row[3])
            dataset.append([income, score])
    return dataset
def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Assign every point to its nearest centroid
def assign_clusters(dataset, centroids):
    clusters = [[] for _ in centroids]
    for point in dataset:
        distances = [euclidean_distance(point, centroid) for centroid in centroids]
        cluster_idx = distances.index(min(distances))
        clusters[cluster_idx].append(point)
    return clusters

def update_centroids(clusters):
    new_centroids = []
    for cluster in clusters:
        if cluster:
            mean = [sum(col) / len(col) for col in zip(*cluster)]
            new_centroids.append(mean)
        else:
            new_centroids.append([0] * len(clusters[0][0]))  # placeholder
    return new_centroids

# Within-cluster sum of squares: total squared distance of points to their centroid
def compute_wcss(clusters, centroids):
    return sum(euclidean_distance(point, centroids[i]) ** 2
               for i, cluster in enumerate(clusters) for point in cluster)

def k_means(dataset, k, max_iterations=100):
    centroids = random.sample(dataset, k)
    for _ in range(max_iterations):
        clusters = assign_clusters(dataset, centroids)
        new_centroids = update_centroids(clusters)
        if new_centroids == centroids:  # stop when the centroids no longer move
            break
        centroids = new_centroids
    return clusters, centroids, compute_wcss(clusters, centroids)

if __name__ == "__main__":
    dataset = load_dataset("customers.csv")
    clusters, centroids, wcss = k_means(dataset, k=3)
    for i, cluster in enumerate(clusters):
        print(f"Cluster {i+1}: {len(cluster)} customers")
    print(f"\n📉 WCSS (lower is better): {wcss:.2f}")
OUTPUT
EXPERIMENT - 11
AIM:
Write a program to implement ensemble algorithms - AdaBoost and Bagging using the
appropriate dataset and evaluate their performance on that dataset.
THEORY:
What are Ensemble Methods?
Ensemble methods are machine learning techniques that combine the predictions of multiple
models to produce a more accurate and robust prediction than any single model could
achieve. By aggregating several learners, ensemble methods help reduce overfitting, variance,
and bias.
CODE
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
# Load dataset
iris = load_iris()
X, y = iris.data, iris.target
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# AdaBoost
ada_model = AdaBoostClassifier(n_estimators=50)
ada_model.fit(X_train, y_train)
ada_preds = ada_model.predict(X_test)
print(f"AdaBoost Accuracy: {accuracy_score(y_test, ada_preds):.2f}")
OUTPUT
EXPERIMENT - 12
AIM:
Select any two datasets based on their statistics and perform comparison among all the
implemented algorithms using them.
THEORY:
CODE
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris, fetch_california_housing
from sklearn.linear_model import LogisticRegression, LinearRegression, Ridge
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.svm import SVC, SVR
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import (accuracy_score, precision_recall_fscore_support,
                             confusion_matrix, mean_squared_error, r2_score)
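The body of the comparison code is missing from the file. A minimal sketch of one plausible version is given below; the train/test split, default hyperparameters, and macro-averaged metrics are all assumptions made for illustration.
# Classification comparison on the Iris dataset
iris = load_iris()
Xc_train, Xc_test, yc_train, yc_test = train_test_split(iris.data, iris.target,
                                                        test_size=0.3, random_state=42)
classifiers = {
    'Logistic Regression': LogisticRegression(max_iter=1000),
    'Decision Tree': DecisionTreeClassifier(),
    'Random Forest': RandomForestClassifier(),
    'SVM': SVC(),
    'k-NN': KNeighborsClassifier(),
    'Naive Bayes': GaussianNB(),
}
for name, model in classifiers.items():
    model.fit(Xc_train, yc_train)
    preds = model.predict(Xc_test)
    acc = accuracy_score(yc_test, preds)
    prec, rec, f1, _ = precision_recall_fscore_support(yc_test, preds, average='macro')
    print(f"{name}: accuracy={acc:.3f}, precision={prec:.3f}, recall={rec:.3f}, f1={f1:.3f}")

# Regression comparison on the California Housing dataset
housing = fetch_california_housing()
Xr_train, Xr_test, yr_train, yr_test = train_test_split(housing.data, housing.target,
                                                        test_size=0.3, random_state=42)
regressors = {
    'Linear Regression': LinearRegression(),
    'Ridge': Ridge(),
    'Decision Tree': DecisionTreeRegressor(),
    'Random Forest': RandomForestRegressor(),
    'SVR': SVR(),
    'k-NN': KNeighborsRegressor(),
}
for name, model in regressors.items():
    model.fit(Xr_train, yr_train)
    preds = model.predict(Xr_test)
    print(f"{name}: MSE={mean_squared_error(yr_test, preds):.3f}, R2={r2_score(yr_test, preds):.3f}")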
OUTPUT
EXPERIMENT - 13
AIM:
Conduct a survey of at least five different machine learning tools available.
THEORY:
Machine learning tools are software platforms or libraries that provide functionalities to
build, train, evaluate, and deploy machine learning models. These tools can range from
programming libraries to complete no-code platforms and are essential for data scientists and
ML engineers.
SURVEY
Each machine learning tool surveyed in this experiment has unique strengths and is suited for
specific use cases:
● Scikit-learn is highly suitable for beginners and practitioners working with classical
machine learning algorithms such as regression, classification, and clustering. Its
simplicity and extensive documentation make it ideal for educational purposes and
prototyping.
● PyTorch is preferred in academic and research environments due to its flexibility and
dynamic computation graph, making it easier to debug and experiment with custom
deep learning models.
● TensorFlow is a powerful tool for building and deploying deep learning models at
scale. It is especially well-suited for production environments due to its robust
deployment features, including TensorFlow Serving and TensorFlow Lite.
● Google AutoML is designed for non-technical users or those looking for rapid model
development without the need for in-depth knowledge of machine learning. It
automates most of the model-building pipeline, including preprocessing, training, and
deployment.
● Weka is a GUI-based tool that is easy to use and valuable for educational purposes
and initial data analysis. However, it lacks the capabilities needed for modern deep
learning tasks.
In summary, the selection of a machine learning tool should be based on the specific
requirements of the task, including the complexity of the model, the level of user expertise,
and the intended deployment environment.