IT REPORT
Bachelor of Technology in
Department of Computer Science & Engineering
(2024-25)
ACKNOWLEDGEMENT
I take this opportunity to express my deep gratitude to Satyam Mishra for his
excellent guidance, encouragement and inspiration throughout the practical
training. Without his invaluable guidance, this practical training report
would never have been a successful one.
I also thank all the faculty members of the Department of Computer Science &
Engineering and my family for their love, support and encouragement during
this time. I want to thank my dear friends, those who remain here and those
who have already left, for making this period such an extraordinary experience
in my life.
Last but not least, my heartfelt thanks to everyone who has been a pillar of support
during the arduous time of training.
Satwik Garg
PREFACE
The present report is the outcome of the offline practical training on “DATA SCIENCE”
provided by “AU FUTURE SKILL ACADEMY”. The objective of this training is to learn
about DATA SCIENCE with practical applications and to gain a strong command of it. This
course provided me with hands-on experience and exposure to developing Front-End
applications for browsers. This course also helped in building a strong foundation in
Front-End development, which provided me with the tools to build a responsive web application.
This report explains the concepts learnt during the internship along with a description of
how these technologies are used for the creation of the project.
TABLE OF CONTENTS
Title Page
Candidate’s Declaration
Training Certificate from Company
Certificate by the Department
Acknowledgment
Table of Contents
List of Figures
Abstract
Chapter 1: Introduction
  About company
  Training Platform
  Training Starting Date
  Training Ending Date
  Total Training Duration
  Date of Certification
  Training Pictures/Images
  Conclusion
Chapter 2: Technical Training Platform
  Introduction
  Reason for selecting this platform
  Profile of organization
  Conclusion
Chapter 3: Overview of Technology Learnt
  Python
  Machine learning
LIST OF FIGURES
2 Picture of Dashboard
3 Picture of Website
4 Picture of Website
4 Training project
5 Training project
6 Training project
7 Training project
8 Training project
9 Training project
8 Training project
9 Pic of PyCharm
10 Pic of Website
ABSTRACT
Data science is an increasingly vital field that empowers organizations to derive valuable insights from data. Python,
a versatile and powerful programming language, has emerged as a preferred choice for data scientists due to its rich
ecosystem of libraries and tools. This course, "Data Science", is designed to equip participants with the essential
skills and knowledge to harness the potential of Python for data analysis, machine learning, and data visualization.
Throughout this course, participants delve into the core concepts and techniques of data science, from data
cleaning and preprocessing to advanced machine learning algorithms. They gain hands-on experience with
popular Python libraries, including NumPy, pandas, Matplotlib, and scikit-learn, enabling them to manipulate,
analyze, and visualize data effectively. By the end of this course, participants will have a strong foundation in data
science using Python, enabling them to make data-driven decisions, build predictive models, and present their
findings in a compelling and informative manner. Python is one of the fastest growing programming languages in the
world and is used by software developers, mathematicians, data analysts, scientists, network engineers, students and
accountants. It is dynamically typed, which is sometimes called "loose typing": when you define a function, you
do not have to declare the types of its parameters or of the value it returns.
Python's simple, easy-to-understand syntax emphasizes readability, making programs less costly to maintain.
Python supports modules and packages to encourage program modularity and code reuse.
Keywords: Data Science, Python, Data Analysis, Machine Learning, Data Visualization, Data Preprocessing,
NumPy, pandas, Matplotlib, scikit-learn, Data Exploration, Data Cleaning, Predictive Models, Real-World Projects,
Descriptive Statistics, Supervised Learning, Unsupervised Learning, Hands-On Experience, Career Advancement,
Practical Skills
CHAPTER 1
INTRODUCTION
DATA SCIENCE
1.1 Introduction about the Internship:
The 8-week summer internship for the 2024-25 session was on “Data Science” from Celebal Technologies
Pvt. Ltd. It is a globally accepted IT consultant and solutions provider and a leader in open-source platform
design and e-commerce technology. Its core strength lies in IT services such as custom business software
application development, e-commerce solutions, web development, mobile applications, graphic design,
animation, and installation of content management systems. The aim is to acquire knowledge and skills
that are applicable to a wide range of software development professions, to take on the challenges of
modern web development, to keep up with the competitiveness of the tech industry, and to contribute to the
creation of innovative and intuitive online applications.
Aim:
The aim of a summer internship in data science is to provide hands-on experience in analyzing and
interpreting complex data. Interns typically learn to work with data sets, apply machine learning
algorithms, and use tools like Python, R, and SQL to solve real-world problems.
1.3 Conclusion:
The data science internship at AU Skill Academy provided invaluable practical exposure to real-world data science
projects. Working with diverse data sets, machine learning models, and advanced analytics tools enhanced
both my technical and problem-solving skills. The internship fostered a deeper understanding of how data
science is applied to business solutions, particularly in areas like predictive analytics, automation, and
AI-driven decision-making.
Through mentorship and collaboration with industry professionals, I developed a strong foundation in the
end-to-end data science process, from data collection and preprocessing to model deployment and
performance evaluation. This experience has significantly strengthened my career prospects and fueled my
passion for a future in the data science field. The dynamic work culture at AU Skill Academy,
along with its innovative approaches, has been truly inspiring.
CHAPTER 2
2.1 Introduction:
For the 56-day internship, intended to make us industry ready, I trained on the subject titled “Data
Science” at AU Skills Academy, Jaipur.
Our mentor was Mr. Satyam Mishra, who gave us in-depth knowledge of each topic of Data Science,
helped clear our doubts and guided us in our improvement.
solution they come up with.
Mission:
Its goal is to empower businesses and organizations to use information technology wisely to improve
their prospects in their particular industries. It aims to align many aspects of a company's operations
and core beliefs with ideas that have a bigger effect.
Vision:
Its vision is to enable a company or group to disseminate its excellent work or products quickly
and efficiently. The goal is to assist IT entrepreneurs nationwide and to lower the startlingly
high percentage of software project failures.
2.2 Conclusion:
In conclusion, the decision to select the AU Skill Academy platform was driven by a combination
of factors that promise a rewarding and enriching learning experience.
The program's comprehensive curriculum, experienced instructors, practical approach, and
reputation for delivering a high-quality experience make it a standout choice.
By choosing this platform, I am confident that I have gained the skills and knowledge required for both
professional and personal advancement.
This opportunity also enabled me to further expand my creativity while adopting professional ethical
values as a basis for venturing into a professional career in the future. Working in an industry
environment will surely help me become an asset to any esteemed organization or industry.
CHAPTER 3
OVERVIEW OF TECHNOLOGY LEARNT
3.1 Introduction:
Chapter 3 describes the technologies learnt and the level of exposure to each of them in the company.
3.2 Technology Description
3.2.1 Python:
Python is an object-oriented programming language that has gained popularity due to its easy syntax
and vast collection of libraries, which remove a lot of hard work and help us achieve our goals much
more easily. It was created by Guido van Rossum and first released in 1991.
It is widely used for:
• Web development (server-side)
• Machine Learning
• Data Analysis
• System scripting
Although there are many programming languages, Python has found its own place among them.
The following are the main reasons for considering Python over other programming languages:
• Python has an expressive syntax that can carry out certain tasks much more easily, thus
reducing the number of lines of code.
• Python is reliable and easy to understand, as its syntax is close to the English language.
• Python plays an important role in handling big data, which is at the forefront of the current IT industry.
Installation and documentation
If you use macOS or Linux, Python may already be installed on your computer. If
not, you can download the latest version by visiting the Python home page at http://www.python.org,
where you will also find plenty of documentation and other useful information. Windows users can also
download Python from this website. Don’t forget this website; it is your first point of reference for all things
Python. You will find there the excellent Python Tutorial by Guido van Rossum. You may find it useful to
read along in the Tutorial as a supplement to this document.
Multiple Variable Assignment: assigning values to more than one variable in a single statement is one of
the most basic concepts in Python. Variables refer to values stored in memory, and in the Python language
we do not need to declare a data type such as int, float or char for them.
We just write the variable names and give them their values.
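The idea can be illustrated with a minimal sketch. The variable names and values below are chosen only for illustration and do not come from the training material.

```python
# Assign several values to several names in one statement.
x, y, z = 10, 2.5, "data"

# Assign the same value to several names at once.
a = b = c = 0

# No type declaration is needed; the type follows from the assigned value.
types = (type(x).__name__, type(y).__name__, type(z).__name__)
print(types)  # ('int', 'float', 'str')

# Swap two values without a temporary variable.
x, y = y, x
print(x, y)  # 2.5 10
```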
Library files or modules in python
A library file (module) consists of one or more functions with their definitions. Python has a huge
collection of modules, each of which can be imported by writing: >>> import module_name. To create our own
module, we first define a function and then save the file in .py format, using the name we want the
module to have. To import the created module, the simplest approach is to save the module file in the same
directory as the file that uses it.
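The following sketch shows one way this could look. The module name `greetings` and its `greet` function are hypothetical, and the module file is written to a temporary directory only so that the example is self-contained; normally the file would simply sit next to the script that imports it.

```python
import importlib
import math          # a module from the standard library
import sys
import tempfile
from pathlib import Path

# Write a tiny module file (hypothetical name: greetings.py).
module_dir = Path(tempfile.mkdtemp())
(module_dir / "greetings.py").write_text(
    "def greet(name):\n"
    "    return 'Hello, ' + name\n"
)

# Python looks for modules in the directories listed in sys.path.
# The directory of the running script is on that list by default,
# which is why saving the module next to the script works.
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()

# Equivalent to writing: import greetings
greetings = importlib.import_module("greetings")

print(greetings.greet("world"))  # Hello, world
print(math.sqrt(16))             # 4.0
```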
• Integers support all the usual arithmetic operations: once assigned to variables, they can be
added, subtracted, multiplied, divided, etc.
• Strings support addition (concatenation). Other arithmetic operators such as subtraction and
division are not defined for strings.
• A string can, however, be multiplied by an integer, which repeats the string that number of times.
NOTE: 1. Using square brackets, we can access the element at a particular index of a sequence such as
a string or a list.
2. A slice does not include the element at its end index. Spaces count as characters too, so to include
the last character we use an end index one greater than its position.
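The points above can be checked with a short sketch; the sample values are illustrative only.

```python
# Arithmetic on integers.
a, b = 7, 2
print(a + b, a - b, a * b, a / b)  # 9 5 14 3.5

# Strings: + concatenates, * repeats.
word = "data"
joined = word + " science"   # 'data science'
repeated = word * 3          # 'datadatadata'

# Indexing starts at 0, and a space is a character like any other.
first = joined[0]            # 'd'

# A slice stops before its end index, so [0:4] covers indices 0 to 3.
prefix = joined[0:4]         # 'data'

# To include the character at index 4 (the space), the end index must be 5.
with_space = joined[0:5]     # 'data '
```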
For Loop:
A for loop steps through a sequence one value at a time. Used with the range function, it produces a
series of numbers that starts at one index and stops just before another, which is very effective for
working through a data structure or a series of numbers. Inside the loop we can access or print each
single value. We can also use this loop with lists and tuples.
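A small sketch of a for loop over a range, a list and a tuple follows; the values are made up for illustration.

```python
# range(1, 6) yields 1, 2, 3, 4, 5: the stop value 6 is excluded.
squares = []
for n in range(1, 6):
    squares.append(n * n)
print(squares)  # [1, 4, 9, 16, 25]

# A for loop can walk through a list directly.
languages = ["Python", "R", "SQL"]
lengths = {}
for name in languages:
    lengths[name] = len(name)

# The same works for a tuple.
total = 0
for value in (1.5, 2.5, 3.0):
    total += value
print(total)  # 7.0
```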
While Loop:
A while loop keeps running until the condition that controls it becomes false. The loop will be
infinite if the condition never becomes false. In these
loops, we have three control statements:
• Break – This statement is used to terminate a while loop or a for loop.
• Continue – This is used to skip the remaining code and go back to the head of the loop.
• Pass – This is a placeholder statement that does nothing; we use it where a statement is required but no code needs to run.
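The three control statements can be seen together in a minimal sketch; the counter limits are arbitrary.

```python
count = 0
visited = []
while count < 10:
    count += 1
    if count % 2 == 0:
        continue      # skip even numbers and go back to the loop head
    if count > 7:
        break         # leave the loop early
    visited.append(count)
print(visited)  # [1, 3, 5, 7]


def not_implemented_yet():
    pass  # placeholder: a body is required here, but nothing should happen
```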
Functions
Functions are the essence of any programming language. They allow us to reuse code more efficiently
and save a lot of space and time. In Python IDLE, we open a new window, write the program's code there
and save it as a file. After this, we can use that program
file anywhere during coding. The ‘def’ keyword is used to define a function in the file. Python also
provides built-in functions, e.g. abs(), bool(), dir(), eval(), exec(), help(), etc.
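As a brief sketch, the functions `mean` and `describe` below are hypothetical examples of user-defined functions, shown next to a few of the built-in functions named above.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def describe(values, label="sample"):
    """Return a short text summary; `label` has a default value."""
    return f"{label}: n={len(values)}, mean={mean(values):.2f}"


# A function is defined once and can then be reused anywhere.
print(describe([2, 4, 9]))              # sample: n=3, mean=5.00
print(describe([1, 2], label="pair"))   # pair: n=2, mean=1.50

# Built-in functions need no definition or import.
print(abs(-3), bool(0), bool("x"))      # 3 False True
```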
CHAPTER 4
PROJECT DESCRIPTION
This project focuses on building a personalized movie recommendation system using the
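dataset from TMDB (The Movie Database). The system utilizes two key datasets:
tmdb_5000_movies.csv: Contains the movie metadata, such as genres, popularity, keywords,
and overview.
tmdb_5000_credits_new.csv: Includes data about the movie's cast, crew, and other
production-related information.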
The goal of this project is to recommend movies to users based on various factors like genres,
popularity, keywords, and other movie attributes. The approach integrates both content-based
filtering and collaborative filtering techniques, leveraging data such as:
By combining these methods, the recommendation engine aims to suggest movies that align
closely with a user's preferences, improving their overall viewing experience.
1. Data Loading:
o The first step is to import the necessary libraries such as Pandas and NumPy.
o The movie metadata and credits data are loaded from CSV files (tmdb_5000_movies.csv
and tmdb_5000_credits_new.csv) using pandas.read_csv().
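The loading step might look like the sketch below. The helper name `load_tmdb` is hypothetical, and the two inline CSV snippets are made-up stand-ins (their column names are assumptions) so that the example runs without the real files.

```python
import io

import pandas as pd


def load_tmdb(movies_source, credits_source):
    """Read the movie metadata and the credits data into two DataFrames."""
    movies = pd.read_csv(movies_source)
    credits = pd.read_csv(credits_source)
    return movies, credits


# Placeholder data; pandas.read_csv accepts a file path or a file-like object.
movies_csv = io.StringIO("id,title,popularity\n1,Movie A,10.5\n2,Movie B,7.25\n")
credits_csv = io.StringIO("movie_id,cast\n1,Actor One\n2,Actor Two\n")

movies, credits = load_tmdb(movies_csv, credits_csv)
print(movies.shape, credits.shape)  # (2, 3) (2, 2)
```

With the real files in the working directory, the call would be `load_tmdb("tmdb_5000_movies.csv", "tmdb_5000_credits_new.csv")`.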
2. Data Preprocessing:
o Data Cleaning: Handling missing values, formatting data (e.g., splitting genres, keywords),
and merging relevant datasets.
o Data Transformation: Extracting necessary columns (like genres, popularity, overview,
etc.), creating new columns for model input, and processing text-based data (e.g., movie
descriptions, keywords).
o Feature Engineering: Creating new features based on the available data such as keyword-
based tags or genre vectors.
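A minimal sketch of these preprocessing steps is given below. It assumes that the genres column holds a JSON list of records with a "name" field and that the credits table links to movies through a `movie_id` column; the helper `names_from_json`, the `tags` column and the sample rows are illustrative, not the project's actual code or data.

```python
import json

import pandas as pd

# Made-up stand-in rows.
movies = pd.DataFrame({
    "id": [1, 2],
    "title": ["Movie A", "Movie B"],
    "overview": ["A pilot explores a distant moon.", None],
    "genres": [
        '[{"id": 28, "name": "Action"}, {"id": 878, "name": "Science Fiction"}]',
        '[{"id": 18, "name": "Drama"}]',
    ],
})
credits = pd.DataFrame({"movie_id": [1, 2], "cast": ["Actor One", "Actor Two"]})


def names_from_json(text):
    """Turn a JSON list of {"name": ...} records into a list of one-word names."""
    return [entry["name"].replace(" ", "") for entry in json.loads(text)]


# Merging: combine metadata and credits on the movie identifier.
df = movies.merge(credits, left_on="id", right_on="movie_id", how="inner")

# Data cleaning: replace missing overviews with an empty string.
df["overview"] = df["overview"].fillna("")

# Data transformation: split the genres field into a list of names.
df["genre_list"] = df["genres"].apply(names_from_json)

# Feature engineering: build one lower-case text tag per movie.
df["tags"] = (
    df["overview"] + " " + df["genre_list"].apply(" ".join)
).str.lower().str.strip()

print(df[["title", "tags"]])
```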
3. Exploratory Data Analysis (EDA):
Initial analysis and visualization of the data to understand its structure, distribution, and key patterns.
EDA may include plotting the genre distribution, analyzing popularity, and understanding user
behavior.
4. Model Building:
Content-Based Filtering: This model recommends movies based on similarity in content, such as
genre, keywords, or the movie's description (overview).
o Techniques like TF-IDF (Term Frequency-Inverse Document Frequency) and cosine
similarity are often used here to calculate how similar two movies are based on their
descriptions or tags.
Collaborative Filtering (optional): This model recommends movies based on user interactions and
ratings. It may use matrix factorization techniques such as Singular Value Decomposition
(SVD) or Alternating Least Squares (ALS).
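The matrix factorization idea can be sketched with NumPy's SVD on a toy rating matrix. Everything here is an assumption made for illustration: the ratings are invented, unrated movies are stored as 0 (a crude simplification), and `low_rank_scores` and `recommend_unseen` are hypothetical helper names rather than the project's method.

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are movies, 0 means "not rated".
ratings = np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 1, 0, 5],
    [1, 0, 5, 4, 1],
    [0, 1, 4, 5, 0],
], dtype=float)


def low_rank_scores(matrix, k):
    """Approximate `matrix` using only its k largest singular values."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]


def recommend_unseen(user, scores, ratings):
    """Return the index of the unrated movie with the highest predicted score."""
    unseen = np.where(ratings[user] == 0)[0]
    return int(unseen[np.argmax(scores[user, unseen])])


# Keeping two latent factors smooths the matrix and fills in the gaps.
scores = low_rank_scores(ratings, k=2)
print(recommend_unseen(0, scores, ratings))
```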
5. Similarity Calculations:
The similarity between movies (based on their metadata such as genres, keywords, etc.) is
computed using cosine similarity or other similarity metrics.
The system then suggests movies based on how closely they match the user's interests or the
current movie.
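The similarity step could be sketched with scikit-learn as follows. The three tag strings are invented examples, and using `TfidfVectorizer` with English stop words is one reasonable choice rather than a confirmed detail of the project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up tag strings, one per movie.
tags = [
    "space pilot alien planet action sciencefiction",
    "alien invasion space battle action",
    "family drama small town",
]

# Turn each tag string into a TF-IDF weighted vector.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(tags)

# Cosine similarity between every pair of movies: a 3 x 3 matrix whose
# diagonal is 1 because every movie is identical to itself.
similarity = cosine_similarity(tfidf)
print(similarity.round(2))
```

Movies that share words (the first two) receive a positive score, while movies with no words in common receive a score of 0.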
6. Generating Recommendations:
After computing the similarity, the system ranks the movies based on the score and recommends
the top N movies to the user.
This could involve suggesting popular or trending movies or personalized recommendations based
on user preferences.
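Ranking by score and returning the top N movies can be sketched as below. The titles and the similarity values are placeholders, and `recommend` is a hypothetical helper name.

```python
import numpy as np

titles = ["Movie A", "Movie B", "Movie C", "Movie D"]

# Placeholder similarity matrix (symmetric, with 1.0 on the diagonal).
similarity = np.array([
    [1.0, 0.8, 0.1, 0.4],
    [0.8, 1.0, 0.2, 0.3],
    [0.1, 0.2, 1.0, 0.6],
    [0.4, 0.3, 0.6, 1.0],
])


def recommend(title, n=2):
    """Return the n titles most similar to `title`, best match first."""
    idx = titles.index(title)
    # Sort indices by descending similarity and drop the movie itself.
    ranked = np.argsort(similarity[idx])[::-1]
    ranked = [i for i in ranked if i != idx][:n]
    return [titles[i] for i in ranked]


print(recommend("Movie A"))        # ['Movie B', 'Movie D']
print(recommend("Movie C", n=1))   # ['Movie D']
```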
7. Evaluation:
The effectiveness of the recommendation system is evaluated using metrics such as precision,
recall, or Mean Squared Error (MSE) (if collaborative filtering is used).
In some cases, the system might be tested on real user interactions to check how relevant the
recommendations are.
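As a sketch of these metrics, the functions below compute precision and recall over the top k recommendations and the mean squared error between actual and predicted ratings. The function names and the sample inputs are illustrative assumptions.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall of the first k recommendations."""
    top_k = recommended[:k]
    hits = len(set(top_k) & set(relevant))
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted ratings."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# One of the top three recommendations (B) is among the three relevant movies.
precision, recall = precision_recall_at_k(["A", "B", "C", "D"], {"B", "D", "E"}, k=3)
print(round(precision, 3), round(recall, 3))  # 0.333 0.333

mse = mean_squared_error([4, 3, 5], [3, 3, 4])
print(round(mse, 3))  # 0.667
```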
8. Deployment (Optional):
The recommendation system could be integrated into a web application using technologies like
Flask or Django for serving recommendations in real-time.
Technologies Used:
1. Programming Language:
o Python: The primary programming language used in this project due to its rich ecosystem
for data science and machine learning.
2. Libraries/Frameworks:
o Pandas: For data loading, cleaning, and manipulation.
o NumPy: For numerical computations.
o Scikit-learn: For implementing TF-IDF vectorization, cosine similarity, and possibly
collaborative filtering algorithms.
o NLTK or spaCy: For text processing and natural language processing tasks (optional, if
more advanced text handling is required).
o Matplotlib/Seaborn: For data visualization during the exploratory data analysis (EDA)
phase.
3. Recommendation Algorithms:
o Content-Based Filtering: Utilizing TF-IDF (or other vectorization techniques) and cosine
similarity.
o Collaborative Filtering: If used, matrix factorization such as Scikit-learn’s TruncatedSVD,
or models from libraries like Surprise or LightFM.
4. Data Sources:
o TMDB Datasets: Containing movie metadata and credits information for training the
recommendation model.
Fig.1: Training project
Fig.2: Training project
Fig.4: Training project
Fig.6: Pic of PyCharm
Fig.8: Pic of Website
CHAPTER 5
CONCLUSION
All the projects in this internship were successfully designed and tested for accuracy and quality. During the
internship we accomplished all the objectives, and the projects meet the needs of the desired solution. The
developed projects can be used in text-to-speech conversion, in performing tasks with the help of Python
scripts, and in the analysis of investment amounts for the concerned requests.
Machine learning can detect patterns in seemingly unstructured or unconnected data, allowing conclusions
and predictions to be made. Tech businesses that acquire user data can use such strategies to transform that
data into valuable or profitable information.
Machine learning has also made inroads into the transportation industry, such as with driverless cars,
which have the potential to lower the number of accidents.
For example, with driverless cars, training data such as the speed limit on the highway or the location of
busy streets is supplied to the algorithm and examined using data science approaches.
Common applications of machine learning include:
• Image Recognition
• Speech Recognition
• Internet Search
• Fraud Detection
• Recommendation System