INDUSTRIAL TRAINING REPORT

“Data Science with Machine Learning”

“Sofcon Scortek Private Limited”

Submitted in partial fulfillment of the requirements

for the award of degree of

Bachelor of Technology
In
“Computer Science & Engineering”

Submitted By: -
Amritansh Srivastava
2105250100007

BUDDHA INSTITUTE OF TECHNOLOGY

(Affiliated to Dr. A.P.J. Abdul Kalam Technical University, Lucknow)
CL-1, Sector-7, GIDA, GORAKHPUR
INDUSTRIAL TRAINING REPORT

“Data Science with Machine Learning”

“Sofcon Scortek Private Limited”

Submitted in partial fulfillment of the requirements

for the award of degree of

Bachelor of Technology
In
“Computer Science & Engineering”

Submitted By: -
Nitesh Pandey
2105250100038

BUDDHA INSTITUTE OF TECHNOLOGY

(Affiliated to Dr. A.P.J. Abdul Kalam Technical University, Lucknow)
CL-1, Sector-7, GIDA, GORAKHPUR

INDUSTRIAL TRAINING REPORT

“Data Science with Machine Learning”

“Sofcon Scortek Private Limited”

Submitted in partial fulfillment of the requirements

for the award of degree of

Bachelor of Technology
In
“Computer Science & Engineering”

Submitted By: -
MD Arman
2105250100031

BUDDHA INSTITUTE OF TECHNOLOGY

(Affiliated to Dr. A.P.J. Abdul Kalam Technical University, Lucknow)
CL-1, Sector-7, GIDA, GORAKHPUR

INDUSTRIAL TRAINING REPORT

“Data Science with Machine Learning”

“Sofcon Scortek Private Limited”

Submitted in partial fulfillment of the requirements

for the award of degree of

Bachelor of Technology
In
“Computer Science & Engineering”

Submitted By: -
MD Salman
2105250100032

BUDDHA INSTITUTE OF TECHNOLOGY

(Affiliated to Dr. A.P.J. Abdul Kalam Technical University, Lucknow)
CL-1, Sector-7, GIDA, GORAKHPUR

DECLARATION

I, Amritansh Srivastava, a student of B. Tech final year, CSE branch, declare that I
have completed my Industrial Training, which is a part of the curriculum, from 8th
July to 8th August at “Sofcon Scortek Private Limited” on “Data Science with
Machine Learning”. This report is submitted by me to the Department of Computer
Science and Engineering, BUDDHA INSTITUTE OF TECHNOLOGY, Gorakhpur,
affiliated to Dr. A.P.J. Abdul Kalam Technical University, Lucknow.

Date: 20-11-2024

Name: Amritansh Srivastava

ACKNOWLEDGMENT

It gives me a great sense of pleasure to present the report of the Industrial Training
undertaken during B. Tech Third Year. I owe a special debt of gratitude to Mr. Aman
Gupta (Trainer) at Sofcon Scortek Private Limited for his constant support and
guidance throughout the course of our work. His sincerity, thoroughness and
perseverance have been a constant source of inspiration for us. It is only through his
cognizant efforts that our endeavors have seen the light of day.

I also take the opportunity to acknowledge the contribution of Mr. Ranjeet Singh,
Assistant Professor at Buddha Institute of Technology, GIDA, Gorakhpur
(U.P.), for his full support and assistance during the training.

Signature Name: Amritansh Srivastava

Date : 20-11-2024 Roll No. : 2105250100007

CERTIFICATE

This is to certify that the report of the vocational training on “Data
Science with Machine Learning” is the work carried out by
Amritansh Srivastava, a student of the 7th semester in the Computer
Science & Engineering branch at Buddha Institute of Technology, GIDA,
Gorakhpur, affiliated to Dr. A.P.J. Abdul Kalam Technical University
(U.P.), India, under the guidance and supervision of Aman Gupta.

To the best of my knowledge and belief the report

 Embodies the work of the candidate himself/herself.
 Has duly been completed.
 Fulfills the requirement of the ordinance relating to vocational
training/internship w.r.t. the university curriculum.

For being referred to the examiners.

Signature Signature
HOD (CSE) T&P In-Charge (CSE)

CONTENTS

1. INTRODUCTION
a. About organization
b. Introduction of Training

2. DETAILS OF TRAINING
 Chapter 1- Introduction to Data Science and Machine Learning
 Chapter 2- Python for Data Science
 Chapter 3- Data Wrangling and Preprocessing
 Chapter 4- Exploratory Data Analysis (EDA)
 Chapter 5- Introduction to Machine Learning Algorithms
 Chapter 6- Model Building and Evaluation
 Chapter 7- Project – Heart Disease Prediction using EDA

3. CONCLUSION

4. REFERENCES
ABSTRACT

About Sofcon Scortek Private Limited

Sofcon Scortek Private Limited is a leading training and consultancy
organization specializing in industrial automation, data science, machine
learning, and advanced technical skill development. Established with the
vision to bridge the gap between academic learning and industry
requirements, the organization provides cutting-edge training programs
designed to equip students and professionals with practical, job-oriented
skills.

The company focuses on innovative learning methodologies, offering
courses in domains like Data Science, Artificial Intelligence, Internet
of Things (IoT), and Robotics. It emphasizes hands-on experience
through project-based learning, guided by industry experts with extensive
practical exposure.

Sofcon Scortek has a strong reputation for its industry-aligned curriculum,
modern infrastructure, and experienced trainers like Mr. Aman Gupta,
ensuring trainees gain insights into real-world challenges and solutions.
The organization has established partnerships with leading companies to
provide placement assistance, making it a preferred choice for
career-oriented individuals.

Sofcon Scortek continues to empower aspiring professionals with the
skills and knowledge necessary to excel in rapidly evolving technological
landscapes.

Introduction to Training

The industrial training in Data Science with Machine Learning was
conducted by Sofcon Scortek Private Limited, a reputed organization
known for its practical and industry-oriented approach to skill
development. The training aimed to equip participants with
comprehensive knowledge and hands-on experience in key areas of data
analysis, machine learning, and predictive modeling.

Under the mentorship of Mr. Aman Gupta, the program focused on
mastering tools and techniques such as Python programming, NumPy,
Pandas, Matplotlib, Seaborn, and advanced concepts like data
wrangling, exploratory data analysis (EDA), feature engineering, and
model evaluation.

The training also included a capstone project, "Heart Disease Prediction,"
which allowed participants to apply their learning to real-world data,
analyze patterns, and extract actionable insights. This project not only
strengthened technical skills but also provided practical exposure to
addressing real-world challenges in the healthcare domain.

This industrial training served as a crucial stepping stone for gaining
expertise in data science and machine learning, bridging academic
knowledge with professional application.

Details of Training

The industrial training on Data Science with Machine Learning was
conducted at Sofcon Scortek Private Limited under the guidance of Mr.
Aman Gupta. The training program was designed to provide an in-depth
understanding of the data science and machine learning lifecycle, from
data preprocessing and exploratory data analysis (EDA) to model building
and evaluation. The training not only focused on theoretical concepts but
also provided hands-on experience through practical projects.

Below is a detailed breakdown of the training, including all the chapters
and the project undertaken during the training.

Chapter 1: Introduction to Data Science and Machine Learning

Objective:

 To introduce the fundamental concepts and applications of data
science and machine learning in modern industries.

Key Concepts Covered:

 Data Science Overview:

o Definition of data science and its role in decision-making.
o The interdisciplinary nature of data science, combining
statistics, computer science, and domain knowledge.
o Importance of data-driven decision-making in business,
healthcare, and other sectors.
 Machine Learning Overview:
o What machine learning is and how it fits within data science.
o Types of machine learning:
 Supervised Learning: Learning from labeled data to
make predictions or classifications (e.g., linear
regression, decision trees).
 Unsupervised Learning: Learning from data without
labels to identify patterns (e.g., clustering, PCA).
 Reinforcement Learning: Learning through rewards
and punishments (e.g., game-playing AI).
 Applications of Machine Learning:
o Healthcare (e.g., heart disease prediction, medical image
analysis).
o Finance (e.g., fraud detection, stock market predictions).
o E-commerce (e.g., recommendation systems).
o Manufacturing (e.g., predictive maintenance).

Tools and Technologies:

 Python as the main programming language for data science.
 Jupyter Notebooks for interactive data analysis.

Chapter 2: Python for Data Science

Objective:

 To build proficiency in Python, focusing on libraries and tools
essential for data science.

Key Concepts Covered:

 Python Basics:
o Introduction to Python syntax, data types (lists, tuples,
dictionaries, sets).
o Functions, loops, and conditionals for handling data
operations.
 Python Libraries for Data Science:
o Pandas:
 DataFrames, Series, handling CSV and Excel files, data
filtering and manipulation.
 Operations like merging, grouping, and pivoting data.
o NumPy:
 Arrays, matrix operations, and basic statistical
functions for numerical data.
o Matplotlib and Seaborn:
 Data visualization techniques, creating plots (scatter,
line, bar, histograms).
 Customizing plots and visualizing complex data
relationships.
 Data Preprocessing:
o Importing and cleaning data using Pandas.
o Handling missing data, duplicates, and irrelevant
information.
o Transforming data for analysis, such as converting text to
numbers.
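
The preprocessing steps listed above can be sketched with Pandas as follows. This is only an illustrative sketch: the records, the column names (patient_id, sex, age, notes) and the choice of median filling are assumptions made for the example and are not taken from the training material. In practice the data would be read from a file, for example with pd.read_csv.

```python
import numpy as np
import pandas as pd

# Hypothetical records used only for illustration.
raw = pd.DataFrame(
    {
        "patient_id": [1, 2, 2, 3, 4],
        "sex": ["male", "female", "female", "male", None],
        "age": [63, 41, 41, np.nan, 57],
        "notes": ["a", "b", "b", "c", "d"],  # treated as irrelevant here
    }
)

clean = (
    raw.drop_duplicates()        # remove the repeated row for patient 2
    .drop(columns=["notes"])     # discard irrelevant information
    .dropna(subset=["sex"])      # drop rows whose category is missing
    .reset_index(drop=True)
)

# Fill the remaining missing age with the median of the known ages.
clean["age"] = clean["age"].fillna(clean["age"].median())

# Convert text to numbers with an explicit mapping.
clean["sex_code"] = clean["sex"].map({"female": 0, "male": 1})

# Grouping: average age per category.
mean_age_by_sex = clean.groupby("sex")["age"].mean()
print(clean)
print(mean_age_by_sex)
```

Filling with the median is one of several reasonable choices; the mean or the mode could be substituted depending on the column.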

Chapter 3: Data Wrangling and Preprocessing

Objective:

 To learn techniques for cleaning and preparing raw data for
analysis and modeling.

Key Concepts Covered:

 Handling Missing Data:

o Imputation: Replacing missing values with mean, median, or
mode.
o Dropping missing values based on specific conditions.
 Outliers and Noise Removal:
o Identifying outliers using statistical techniques (e.g.,
Z-scores, IQR).
o Handling noisy data using smoothing techniques (e.g.,
moving averages).
 Data Transformation:
o Feature scaling (normalization, standardization) for
numerical data.
o Encoding categorical variables using one-hot encoding and
label encoding.
 Feature Engineering:
o Creating new features from existing ones (e.g., extracting
year from a date column).
o Combining features to form interaction terms.
 Data Splitting:
o Splitting data into training, validation, and testing sets using
train_test_split from scikit-learn (applied twice, since each call
produces two subsets).
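
A minimal sketch of the wrangling steps in this chapter is given below. The data are randomly generated stand-ins, and the column names (chol, chest_pain, target), the 1.5 x IQR rule and the 60/20/20 split ratio are assumptions chosen for illustration rather than details from the training.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical columns, not the training dataset).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame(
    {
        "chol": rng.normal(240, 40, n),
        "chest_pain": rng.choice(["typical", "atypical", "none"], n),
        "target": rng.integers(0, 2, n),
    }
)
df.loc[0, "chol"] = 900.0    # injected outlier
df.loc[1, "chol"] = np.nan   # injected missing value

# Imputation: replace missing values with the median.
df["chol"] = df["chol"].fillna(df["chol"].median())

# Outlier removal with the IQR rule: keep values within 1.5 * IQR of the quartiles.
q1, q3 = df["chol"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[df["chol"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# One-hot encoding of the categorical column.
X = pd.get_dummies(df.drop(columns="target"), columns=["chest_pain"])
y = df["target"]

# Two calls to train_test_split give a 60/20/20 train/validation/test split.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, random_state=42
)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, random_state=42
)

# Standardization: fit the scaler on the training set only, then reuse it.
scaler = StandardScaler().fit(X_train[["chol"]])
X_train_scaled = X_train.assign(chol=scaler.transform(X_train[["chol"]]).ravel())
X_val_scaled = X_val.assign(chol=scaler.transform(X_val[["chol"]]).ravel())
```

Fitting the scaler on the training portion alone keeps information from the validation and test sets out of the preprocessing step.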

Chapter 4: Exploratory Data Analysis (EDA)

Objective:

 To explore and analyze data to uncover patterns, relationships,
and insights.

Key Concepts Covered:

 Visualizing Data:
o Using Matplotlib and Seaborn to create meaningful
visualizations.
o Univariate analysis: Histograms, box plots to understand
individual feature distributions.
o Bivariate analysis: Scatter plots, heatmaps, and pair plots to
examine relationships between features.
 Statistical Analysis:
o Calculating descriptive statistics (mean, median, mode,
variance, etc.).
o Understanding the distribution of data using measures of
central tendency and spread.
 Correlation Analysis:
o Computing correlation coefficients (e.g., Pearson, Spearman)
to identify relationships between numerical features.
o Visualizing correlation matrices with heatmaps.
 Dimensionality Reduction:
o Introduction to techniques like PCA (Principal Component
Analysis) to reduce the number of features while retaining
most of the variance.
 Identifying Trends and Patterns:
o Exploring data for trends (e.g., seasonal patterns,
correlations between features).
o Identifying potential biases or imbalances in the dataset.
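
The statistical side of EDA described above can be illustrated with a short sketch. The data are synthetic and the column names (x, x_cubed, noise) are hypothetical; they are chosen so that the difference between the Pearson and Spearman coefficients is visible, since x_cubed depends on x monotonically but not linearly.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data (hypothetical columns) for illustration only.
rng = np.random.default_rng(1)
x = rng.normal(size=300)
df = pd.DataFrame(
    {
        "x": x,
        "x_cubed": x ** 3,               # monotonic but nonlinear in x
        "noise": rng.normal(size=300),   # unrelated to x
    }
)

# Descriptive statistics: count, mean, std, min, quartiles, max.
summary = df.describe()

# Pearson measures linear association; Spearman measures monotonic association.
pearson = df.corr(method="pearson")
spearman = df.corr(method="spearman")

# PCA on standardized features; the ratio shows the variance kept per component.
scaled = StandardScaler().fit_transform(df)
pca = PCA(n_components=2).fit(scaled)
explained = pca.explained_variance_ratio_

print(summary)
print(pearson.round(2))
print(spearman.round(2))
print(explained)
```

Either correlation matrix can be passed to a heatmap function such as seaborn.heatmap for the visual inspection mentioned above.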

Chapter 5: Introduction to Machine Learning Algorithms

Objective:

 To introduce various machine learning algorithms used for
building predictive models.

Key Concepts Covered:

 Supervised Learning Algorithms:

o Linear Regression: A simple algorithm for predicting
continuous values.
o Logistic Regression: Used for binary classification tasks.
o Decision Trees: Building decision rules based on feature
values.
o Random Forest: An ensemble method that combines
multiple decision trees.
o Support Vector Machines (SVM): Finding the optimal
boundary between classes in classification problems.
 Unsupervised Learning Algorithms:
o K-means Clustering: Grouping data points into clusters
based on similarities.
o Hierarchical Clustering: Building a tree-like structure of
clusters.
o PCA (Principal Component Analysis): A method to reduce
the dimensionality of data while preserving most of the variance.
 Model Evaluation Techniques:
o Cross-validation: Splitting data into subsets to train and
validate models on different sets.
o Confusion Matrix: Tabulating true/false positives and negatives
to evaluate classification models.
o Accuracy, Precision, Recall, F1-Score: Common performance
metrics.
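
To make the relationship between the confusion matrix and the listed metrics concrete, the following sketch computes them by hand for a binary problem. The function names and the ten example labels are hypothetical and exist only for this illustration; scikit-learn offers equivalent functions in sklearn.metrics.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 4 positives and 6 negatives, with one error of each kind.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Here the counts are 3 true positives, 1 false positive, 1 false negative and 5 true negatives, so accuracy is 0.8 while precision, recall and F1 are each 0.75.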

Chapter 6: Model Building and Evaluation

Objective:

 To apply machine learning algorithms on datasets, evaluate their
performance, and optimize models.

Key Concepts Covered:

 Model Training and Testing:

o Training models on training data and evaluating
performance on test data.
o Splitting data into training and testing sets using
train_test_split.
 Hyperparameter Tuning:
o Tuning hyperparameters (e.g., maximum tree depth in decision
trees, number of clusters in K-means) for optimal
performance.
o Grid Search and Random Search techniques for finding the
best parameters.
 Model Evaluation:
o Using evaluation metrics such as accuracy, precision, recall,
F1-score, and AUC-ROC curve to compare models.
o Choosing the best model based on performance metrics.
 Model Deployment:
o Brief introduction to deploying machine learning models for
real-time predictions.
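
The training, tuning and evaluation steps above might look like the following sketch using scikit-learn. The dataset is generated with make_classification as a stand-in, and the parameter grid, the F1 scoring choice and the 75/25 split are assumptions made for illustration, not settings reported from the training.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data used as a stand-in.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Grid Search: try every combination with 5-fold cross-validation.
param_grid = {"max_depth": [2, 4, 6, None], "min_samples_leaf": [1, 5, 10]}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0), param_grid, cv=5, scoring="f1"
)
search.fit(X_train, y_train)

# Evaluate the refitted best model once on the held-out test set.
best_tree = search.best_estimator_
test_auc = roc_auc_score(y_test, best_tree.predict_proba(X_test)[:, 1])

print(search.best_params_)
print(round(test_auc, 3))
```

RandomizedSearchCV from the same module follows the same pattern when the grid is too large to search exhaustively.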

Chapter 7: Project – Heart Disease Prediction using EDA

Objective:

 To apply the concepts learned during the training on a real-world
dataset and build a machine learning model for predicting heart
disease.

Project Breakdown:

 Data Collection:
o The dataset, Heart Disease UCI, was imported from a CSV
file into a Pandas DataFrame.
o Features included age, gender, blood pressure, cholesterol
levels, exercise-induced angina, etc.
 Data Preprocessing:
o Handling missing values and encoding categorical variables.
o Normalizing numerical data to bring all features to a similar
scale.
 Exploratory Data Analysis (EDA):
o Analyzed the dataset using summary statistics and
visualizations to understand the relationships between
features.
o Identified key features like cholesterol levels and age that
correlated with heart disease.
 Model Building:
o Applied Logistic Regression, Random Forest, and Decision
Trees to predict heart disease.
o Tuned hyperparameters for each model to improve
performance.
 Model Evaluation:
o Evaluated models using confusion matrices, precision, recall,
F1-score, and the ROC curve.
o Compared the models and selected the best-performing
one.
 Final Model:
o Built and trained the best-performing model (e.g., Random
Forest) and predicted the likelihood of heart disease based
on input features.
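
The model comparison described in the project breakdown can be sketched as below. This is not the project's actual code or data: the helper compare_models is hypothetical, the features are synthetic stand-ins produced by make_classification, and the model settings are illustrative defaults. With the real dataset, X and y would come from the Pandas DataFrame loaded from the CSV file.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def compare_models(X, y, random_state=0):
    """Train the three classifiers and return their ROC AUC on a held-out set."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )
    models = {
        # Scaling is applied inside the pipeline so it is fitted on training data only.
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "decision_tree": DecisionTreeClassifier(
            max_depth=4, random_state=random_state
        ),
        "random_forest": RandomForestClassifier(
            n_estimators=200, random_state=random_state
        ),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        probabilities = model.predict_proba(X_test)[:, 1]
        scores[name] = roc_auc_score(y_test, probabilities)
    return scores


# Synthetic stand-in for the heart disease features and labels.
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=0
)
scores = compare_models(X, y)
best_name = max(scores, key=scores.get)
print(scores)
print(best_name)
```

Which model scores highest depends on the data, so the selection should be repeated on the actual dataset rather than inferred from this synthetic example.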

Conclusion of the Training on Data Science with Machine Learning

The Data Science with Machine Learning training, conducted at
Sofcon Scortek Private Limited, provided a comprehensive overview
of essential concepts and tools needed to excel in the rapidly growing
field of data science and machine learning. The training successfully
bridged the gap between theory and practical application, equipping
participants with the skills to analyze, preprocess, and model real-world
data effectively.

Key Learning Outcomes

1. Mastery of Python for Data Science:

3. Hands-On Machine Learning Techniques:
o In-depth coverage was given to different machine learning
algorithms, including Supervised Learning (e.g., Linear
Regression, Decision Trees, Random Forests) and
Unsupervised Learning (e.g., K-Means Clustering, PCA).
o Participants worked with real-world datasets to build
predictive models, optimizing them using techniques like
cross-validation, hyperparameter tuning, and performance
evaluation metrics such as accuracy, precision, recall, and
F1-score.
4. Real-World Project – Heart Disease Prediction:
o The heart disease prediction project was a key component of
the training. Participants applied their skills to a healthcare
dataset to predict whether a person is at risk of heart disease
based on various health indicators.
o The project involved several stages: data cleaning, EDA,
feature engineering, model building, and model
evaluation. They explored relationships in the data and built
machine learning models such as Logistic Regression and
Random Forest to predict the outcome.
o This hands-on project helped participants gain practical
experience and reinforced the importance of each step in the
data science workflow.

Skills Gained

 Data Manipulation: Participants can now efficiently manipulate
large datasets using Pandas and NumPy, enabling them to prepare
data for analysis and machine learning.
 Visualization: Using Matplotlib and Seaborn, they can create
compelling visualizations to interpret data and present insights
clearly.
 Modeling: With knowledge of multiple machine learning
algorithms, participants are now equipped to build, evaluate, and
optimize predictive models.

 Problem-Solving: The training improved their ability to tackle
real-world problems by translating business or research questions
into machine learning tasks.

Project Outcomes

The heart disease prediction project allowed participants to:

 Gain exposure to the entire data science lifecycle, from data

The project's results showcased the effectiveness of machine learning in
healthcare, highlighting its potential in predicting serious conditions like
heart disease and improving decision-making processes in healthcare
management.

Overall Impact of the Training

 Skill Development: The training played a significant role in
enhancing technical skills in Python programming, machine
learning, and data science techniques.
 Practical Knowledge: Participants gained practical, hands-on
experience by working on real-world datasets, ensuring they can
apply theoretical knowledge to solve practical problems.
 Industry Readiness: With the growing importance of data science
and machine learning in various industries, participants are now
better equipped to pursue careers or further education in data
science, AI, and machine learning.
 Confidence in Implementing Machine Learning: The training
has given participants the confidence to implement machine
learning models, conduct data analysis, and contribute to data-
driven decision-making in any field.
Conclusion

The training at Sofcon Scortek Private Limited has been an invaluable
learning experience. It not only enhanced technical capabilities but also
fostered a deeper understanding of data science and machine learning
concepts. The hands-on approach, combined with real-world projects,
has prepared participants to take on complex data challenges and apply
machine learning techniques to solve real-world problems effectively.
This training serves as a stepping stone toward becoming proficient in
the field of data science and machine learning.

References

1. W3C. (n.d.). Introduction to Data Science. W3C. Retrieved from
https://www.w3.org
2. GeeksforGeeks. (2023, October 25). Python Programming
Language for Data Science. GeeksforGeeks. Retrieved from
https://www.geeksforgeeks.org/python-programming-language-for-data-science/
3. Python.org. (n.d.). Pandas Documentation. Python.org. Retrieved
from https://pandas.pydata.org/pandas-docs/stable/
4. Scikit-learn. (n.d.). Supervised Learning. Scikit-learn. Retrieved
from https://scikit-learn.org/stable/supervised_learning.html
5. Kaggle. (n.d.). Heart Disease UCI Dataset. Kaggle. Retrieved
from https://www.kaggle.com/ronitf/heart-disease-uci
6. Towards Data Science. (2022, April 30). Exploratory Data
Analysis (EDA) for Beginners. Towards Data Science. Retrieved
from https://towardsdatascience.com/eda-for-beginners
7. Analytics Vidhya. (2021, August 14). A Complete Guide to Data
Preprocessing. Analytics Vidhya. Retrieved from
https://www.analyticsvidhya.com

I, Amritansh Srivastava , Student of B. Tech final year CSE branch, declare that I
have completed my Industrial Training from 8th July to 8th August which is a part of
the curriculum at “Sofcon Scortek Private Limited ” on “ Data Science with
Machine Learning” which is submitted by me to Department of Computer Science
and Engineering, BUDDHA INSTITUTE OF TECHNOLOGY, Gorakhpur
affiliated to Dr. A.P.J.Abdul Kalam Technical University, Lucknow.

46
Date: 20-11-2024

Name: Amritansh Srivastava

ACKNOWLEDGMENT

It gives me a great sense of pleasure to present the report of Industrial Training

47
I also take the opportunity to acknowledge the contribution of Mr. Ranjeet Singh
Assistant Professor at Buddha Institute of Technology, GIDA, Gorakhpur
(U.P) for his full support and assistance during the Training.

Signature Name: Amritansh Srivastava

Date : 20-11-2024 Roll No. : 2105250100007

CERTIFICATE

This is to certify that the report of my vocational training on “Data

48
To the best of my knowledge and belief the report
 Embodies the work of the candidate himself/herself.
 Has duly been completed.
 Fulfills the requirement of the ordinance relating to vocational
training/internship w.r.t. the university curriculum.

For being referred to the examiners.

Signature Signature
HOD (CSE) T&P In-Charge (CSE)

CONTENTS

1. INTRODUCTION
a. About organization
b. Introduction of Training

2. DETAILS OF TRAINING
• Chapter 1- Introduction to Data Science and Machine Learning
• Chapter 2- Python for Data Science
• Chapter 3- Data Wrangling and Preprocessing
• Chapter 4- Exploratory Data Analysis (EDA)
• Chapter 5- Introduction to Machine Learning Algorithms
• Chapter 6- Model Building and Evaluation
• Chapter 7- Project – Heart Disease Prediction using EDA

3. CONCLUSION

4. REFERENCES

ABSTRACT

About Sofcon Scortek Private Limited

Sofcon Scortek Private Limited is a leading training and consultancy

The company focuses on innovative learning methodologies, offering
courses in domains like Data Science, Artificial Intelligence, Internet
of Things (IoT), and Robotics. It emphasizes hands-on experience
through project-based learning, guided by industry experts with extensive
practical exposure.

Sofcon Scortek has a strong reputation for its industry-aligned curriculum,

Sofcon Scortek continues to empower aspiring professionals with the
skills and knowledge necessary to excel in rapidly evolving technological
landscapes.

Introduction to Training

The industrial training in Data Science with Machine Learning was

Under the mentorship of Mr. Aman Gupta, the program focused on
mastering tools and techniques such as Python programming, NumPy,
Pandas, Matplotlib, Seaborn, and advanced concepts like data
wrangling, exploratory data analysis (EDA), feature engineering, and
model evaluation.

The training also included a capstone project, "Heart Disease Prediction,"

This industrial training served as a crucial stepping stone for gaining
expertise in data science and machine learning, bridging academic
knowledge with professional application.

Details of Training

The industrial training on Data Science with Machine Learning was

Below is a detailed breakdown of the training, including all the chapters
and the project undertaken during the training.

Chapter 1: Introduction to Data Science and Machine Learning

Objective:

• To introduce the fundamental concepts and applications of data
science and machine learning in modern industries.

Key Concepts Covered:

• Data Science Overview:
o Definition of data science and its role in decision-making.
o The interdisciplinary nature of data science, combining
statistics, computer science, and domain knowledge.
o Importance of data-driven decision-making in business,
healthcare, and other sectors.
• Machine Learning Overview:
o What machine learning is and how it fits within data science.
o Types of machine learning:
▪ Supervised Learning: Learning from labeled data to
make predictions or classifications (e.g., linear
regression, decision trees).
▪ Unsupervised Learning: Learning from data without
labels to identify patterns (e.g., clustering, PCA).
▪ Reinforcement Learning: Learning through rewards
and penalties (e.g., game-playing AI).
• Applications of Machine Learning:
o Healthcare (e.g., heart disease prediction, medical image
analysis).
o Finance (e.g., fraud detection, stock market predictions).
o E-commerce (e.g., recommendation systems).
o Manufacturing (e.g., predictive maintenance).
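
To make the idea of supervised learning listed above concrete, the following is a minimal sketch of fitting a straight line to labeled data by ordinary least squares, written in plain Python. The function name fit_line and the sample points are assumptions made for illustration only; they are not taken from the training material.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labeled examples that follow y = 2x + 1 exactly.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
prediction = slope * 5 + intercept  # predicted label for an unseen input x = 5
```

The model learns its two parameters from labeled pairs and is then used to predict the label of an input it has not seen, which is the defining pattern of supervised learning.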

Tools and Technologies:

• Python as the main programming language for data science.
• Jupyter Notebooks for interactive data analysis.

Chapter 2: Python for Data Science

Objective:

• To build proficiency in Python, focusing on libraries and tools
essential for data science.

Key Concepts Covered:

• Python Basics:
o Introduction to Python syntax, data types (lists, tuples,
dictionaries, sets).
o Functions, loops, and conditionals for handling data
operations.
• Python Libraries for Data Science:
o Pandas:
▪ DataFrames, Series, handling CSV and Excel files, data
filtering and manipulation.
▪ Operations like merging, grouping, and pivoting data.
o NumPy:
▪ Arrays, matrix operations, and basic statistical
functions for numerical data.
o Matplotlib and Seaborn:
▪ Data visualization techniques, creating plots (scatter,
line, bar, histograms).
▪ Customizing plots and visualizing complex data
relationships.
• Data Preprocessing:
o Importing and cleaning data using Pandas.
o Handling missing data, duplicates, and irrelevant
information.
o Transforming data for analysis, such as converting text to
numbers.
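
The preprocessing steps listed above (removing duplicates, filling missing values, converting text to numbers) can be sketched without any library, as below. In practice Pandas methods such as DataFrame.drop_duplicates and DataFrame.fillna would typically be used instead; the records, column names, and helper functions here are hypothetical and serve only to show the logic.

```python
from statistics import mean

# Hypothetical raw records; None marks a missing value.
rows = [
    {"age": 63, "sex": "male", "chol": 233},
    {"age": 41, "sex": "female", "chol": None},
    {"age": 63, "sex": "male", "chol": 233},  # exact duplicate of the first row
    {"age": 57, "sex": "female", "chol": 354},
]


def drop_duplicates(records):
    """Keep the first occurrence of each record, preserving order."""
    seen, unique = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def fill_missing_with_mean(records, column):
    """Replace None in a numeric column with the mean of the observed values."""
    observed = [r[column] for r in records if r[column] is not None]
    fill = mean(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in records]


def encode_labels(records, column):
    """Map each distinct text label to an integer code (sorted for determinism)."""
    codes = {label: i for i, label in enumerate(sorted({r[column] for r in records}))}
    return [{**r, column: codes[r[column]]} for r in records], codes


clean = drop_duplicates(rows)
clean = fill_missing_with_mean(clean, "chol")
clean, sex_codes = encode_labels(clean, "sex")
```

Mean imputation is only one possible choice for missing values; dropping the affected rows or using the median are common alternatives.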

Chapter 3: Data Wrangling and Preprocessing

Objective:

• To learn techniques for cleaning and preparing raw data for
analysis and modeling.

Key Concepts Covered:

• Handling Missing Data:

train_test_split from scikit-learn.

Chapter 4: Exploratory Data Analysis (EDA)

Objective:

• To explore and analyze data to uncover patterns, relationships,
and insights.

Key Concepts Covered:

• Visualizing Data:
o Using Matplotlib and Seaborn to create meaningful
visualizations.
o Univariate analysis: Histograms, box plots to understand
individual feature distributions.
o Bivariate analysis: Scatter plots, heatmaps, and pair plots to
examine relationships between features.
• Statistical Analysis:
o Calculating descriptive statistics (mean, median, mode,
variance, etc.).
o Understanding the distribution of data using measures of
central tendency and spread.
• Correlation Analysis:
o Computing correlation coefficients (e.g., Pearson, Spearman)
to identify relationships between numerical features.
o Visualizing correlation matrices with heatmaps.
• Dimensionality Reduction:
o Introduction to techniques like PCA (Principal Component
Analysis) to reduce the number of features while retaining
most of the variance.
• Identifying Trends and Patterns:
o Exploring data for trends (e.g., seasonal patterns,
correlations between features).
o Identifying potential biases or imbalances in the dataset.
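
As an illustration of the descriptive statistics and correlation analysis described above, the sketch below computes the mean, median, and population variance of one feature and the Pearson correlation coefficient between two features, using only the Python standard library. The feature values are hypothetical and are not taken from the project dataset.

```python
from math import sqrt
from statistics import mean, median, pvariance


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = mean(xs), mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical values for two numerical features.
age = [29, 41, 50, 58, 67]
chol = [180, 200, 230, 250, 270]

summary = {"mean": mean(age), "median": median(age), "variance": pvariance(age)}
r = pearson(age, chol)  # close to +1 means a strong positive linear relationship
```

A coefficient near +1 or -1 indicates a strong linear relationship, while a value near 0 indicates little linear relationship; it says nothing about causation.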

Chapter 5: Introduction to Machine Learning Algorithms

Objective:

• To introduce various machine learning algorithms used for
building predictive models.

Key Concepts Covered:

• Supervised Learning Algorithms:

Chapter 6: Model Building and Evaluation

Objective:

• To apply machine learning algorithms to datasets, evaluate their
performance, and optimize models.

Key Concepts Covered:

• Model Training and Testing:

o Training models on training data and evaluating
performance on test data.
o Splitting data into training and testing sets using
train_test_split from scikit-learn.
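
The idea behind a train/test split can be sketched in plain Python as follows. This is an illustrative stand-in, not scikit-learn's implementation; the function name split_train_test, its parameters, and the sample data are assumptions made for this example.

```python
import random


def split_train_test(samples, labels, test_fraction=0.25, seed=0):
    """Shuffle indices reproducibly and hold out a fraction for testing."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return (pick(samples, train_idx), pick(samples, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))


X = [[i, i * 2] for i in range(8)]  # hypothetical feature rows
y = [i % 2 for i in range(8)]       # hypothetical binary labels
X_train, X_test, y_train, y_test = split_train_test(X, y)
```

Keeping the test rows out of training gives an estimate of how the model behaves on data it has not seen, and fixing the seed makes the split repeatable.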

Chapter 7: Project – Heart Disease Prediction using EDA

Objective:

• To apply the concepts learned during the training on a real-world
dataset and build a machine learning model for predicting heart
disease.

Project Breakdown:

• Data Collection:
o The dataset, Heart Disease UCI, was imported from a CSV
file into a Pandas DataFrame.
o Features included age, gender, blood pressure, cholesterol
levels, exercise-induced angina, etc.
• Data Preprocessing:
o Handling missing values and encoding categorical variables.
o Normalizing numerical data to bring all features to a similar
scale.
• Exploratory Data Analysis (EDA):
o Analyzed the dataset using summary statistics and
visualizations to understand the relationships between
features.
o Identified key features like cholesterol levels and age that
correlated with heart disease.
• Model Building:
o Applied Logistic Regression, Random Forest, and Decision
Trees to predict heart disease.
o Tuned hyperparameters for each model to improve
performance.
• Model Evaluation:
o Evaluated models using confusion matrices, precision, recall,
F1-score, and the ROC curve.
o Compared the models and selected the best-performing
one.
• Final Model:
o Built and trained the best-performing model (e.g., Random
Forest) and predicted the likelihood of heart disease based
on input features.
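
The evaluation metrics named in the breakdown above follow directly from the four cells of a binary confusion matrix. The sketch below shows the arithmetic in plain Python; scikit-learn provides confusion_matrix, precision_score, recall_score, and f1_score for the same purpose. The label lists are hypothetical and are not results from the project.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and their harmonic mean (F1); 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical ground truth and predictions (1 = heart disease, 0 = no disease).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
```

In a medical screening setting, recall is often watched closely, because a false negative means a patient with the condition is missed.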

Conclusion of the Training on Data Science with Machine Learning

The Data Science with Machine Learning training, conducted at

Key Learning Outcomes

1. Mastery of Python for Data Science:

o The training started with a detailed introduction to Python,
the primary language for data science. Participants became
proficient in using libraries like Pandas, NumPy,
Matplotlib, and Seaborn, which are essential for data
manipulation, analysis, and visualization.
o By understanding how to manipulate data structures and
visualize relationships within the data, participants gained the
ability to clean, preprocess, and transform raw data into
insightful visual representations.
2. Data Preprocessing and Exploratory Data Analysis
(EDA):
o Emphasis was placed on data preprocessing techniques such
as handling missing data, encoding categorical variables, and
scaling numerical features.
o Participants learned how to perform Exploratory Data
Analysis (EDA), including how to use statistical methods
and visualizations to uncover patterns, trends, and anomalies
in the data.
o This skill is crucial in building a strong foundation for
machine learning models, as it helps identify the most
important features for model development.
3. Hands-On Machine Learning Techniques:
o In-depth coverage was given to different machine learning
algorithms, including Supervised Learning (e.g., Linear
Regression, Decision Trees, Random Forests) and
Unsupervised Learning (e.g., K-Means Clustering, PCA).
o Participants worked with real-world datasets to build
predictive models, optimizing them using techniques like
cross-validation and hyperparameter tuning, and assessing
them with performance metrics such as accuracy, precision,
recall, and F1-score.
4. Real-World Project – Heart Disease Prediction:
o The heart disease prediction project was a key component of
the training. Participants applied their skills to a healthcare
dataset to predict whether a person is at risk of heart disease
based on various health indicators.
o The project involved several stages: data cleaning, EDA,
feature engineering, model building, and model
evaluation. Participants explored relationships in the data and
built machine learning models such as Logistic Regression and
Random Forest to predict the outcome.
o This hands-on project helped participants gain practical
experience and reinforced the importance of each step in the
data science workflow.
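
Cross-validation, mentioned in the outcomes above, rotates which part of the data is held out so that every sample is used for validation exactly once. The following is a minimal sketch of how k-fold index sets can be generated in plain Python; scikit-learn offers KFold and cross_val_score for real use. The function name k_fold_indices is an assumption for this example.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size


folds = list(k_fold_indices(10, 3))
```

A model would be trained once per fold on the train indices and scored on the validation indices, and the scores averaged. This sketch does not shuffle, so data that is ordered (for example by class) should be shuffled first.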

Skills Gained

• Data Manipulation: Participants can now efficiently manipulate

Project Outcomes

The heart disease prediction project allowed participants to:

• Gain exposure to the entire data science lifecycle, from data
preprocessing to model deployment.
• Develop a solid understanding of how to analyze and interpret
healthcare data to make informed predictions.
• Learn how to work with various machine learning models and
assess their effectiveness using performance metrics.

The project's results showcased the effectiveness of machine learning in
healthcare, highlighting its potential in predicting serious conditions like
heart disease and improving decision-making processes in healthcare
management.

Overall Impact of the Training

• Skill Development: The training played a significant role in

Conclusion

The training at Sofcon Scortek Private Limited has been an invaluable

References

1. W3C. (n.d.). Introduction to Data Science. W3C. Retrieved
from https://www.w3.org
2. GeeksforGeeks. (2023, October 25). Python Programming
Language for Data Science. GeeksforGeeks. Retrieved from
https://www.geeksforgeeks.org/python-programming-language-for-data-science/
3. pandas development team. (n.d.). pandas Documentation.
Retrieved from https://pandas.pydata.org/pandas-docs/stable/
4. Scikit-learn. (n.d.). Supervised Learning. Scikit-learn.
Retrieved from https://scikit-learn.org/stable/supervised_learning.html
5. Kaggle. (n.d.). Heart Disease UCI Dataset. Kaggle.
Retrieved from https://www.kaggle.com/ronitf/heart-disease-uci
6. Towards Data Science. (2022, April 30). Exploratory Data
Analysis (EDA) for Beginners. Towards Data Science. Retrieved
from https://towardsdatascience.com/eda-for-beginners
7. Analytics Vidhya. (2021, August 14). A Complete Guide to
Data Preprocessing. Analytics Vidhya. Retrieved from
https://www.analyticsvidhya.com

I, Amritansh Srivastava, a student in the final year of the B.Tech CSE branch,
declare that I completed my Industrial Training on “Data Science with Machine
Learning” at “Sofcon Scortek Private Limited” from 8th July to 8th August, as
part of the curriculum. This report is submitted by me to the Department of
Computer Science and Engineering, BUDDHA INSTITUTE OF TECHNOLOGY, Gorakhpur,
affiliated to Dr. A.P.J. Abdul Kalam Technical University, Lucknow.

Date: 20-11-2024

Name: Amritansh Srivastava

ACKNOWLEDGMENT

It gives me a great sense of pleasure to present the report of Industrial Training

perseverance have been a constant source of inspiration for us. It is only through
his cognizant efforts that our endeavors have seen the light of day.

Signature Name: Amritansh Srivastava

Date : 20-11-2024 Roll No. : 2105250100007

CERTIFICATE

This is to certify that the report of the vocational training on “Data
Science with Machine Learning” is the work carried out by
Amritansh Srivastava, studying in the 7th semester of the Computer Science &
Engineering branch at Buddha Institute of Technology, GIDA,
Gorakhpur, affiliated to Dr. A.P.J. Abdul Kalam Technical University
(U.P.), India, under the guidance and supervision of Aman Gupta.

To the best of my knowledge and belief the report

For being referred to the examiners.

Signature Signature
HOD (CSE) T&P In-Charge (CSE)

CONTENTS

1. INTRODUCTION
a. About organization
b. Introduction of Training

2. DETAILS OF TRAINING
• Chapter 1- Introduction to Data Science and Machine Learning
• Chapter 2- Python for Data Science
• Chapter 3- Data Wrangling and Preprocessing
• Chapter 4- Exploratory Data Analysis (EDA)
• Chapter 5- Introduction to Machine Learning Algorithms
• Chapter 6- Model Building and Evaluation
• Chapter 7- Project – Heart Disease Prediction using EDA

3. CONCLUSION

4. REFERENCES

ABSTRACT

About Sofcon Scortek Private Limited

Sofcon Scortek Private Limited is a leading training and consultancy

designed to equip students and professionals with practical, job-oriented
skills.

The company focuses on innovative learning methodologies, offering

Sofcon Scortek has a strong reputation for its industry-aligned curriculum,

Sofcon Scortek continues to empower aspiring professionals with the
skills and knowledge necessary to excel in rapidly evolving technological
landscapes.

Introduction to Training

The industrial training in Data Science with Machine Learning was

Under the mentorship of Mr. Aman Gupta, the program focused on
mastering tools and techniques such as Python programming, NumPy,
Pandas, Matplotlib, Seaborn, and advanced concepts like data
wrangling, exploratory data analysis (EDA), feature engineering, and
model evaluation.

The training also included a capstone project, "Heart Disease Prediction,"

This industrial training served as a crucial stepping stone for gaining
expertise in data science and machine learning, bridging academic
knowledge with professional application.

Details of Training

The industrial training on Data Science with Machine Learning was

and evaluation. The training not only focused on theoretical concepts but
also provided hands-on experience through practical projects.
