Internship Report (Data Science)
Belgaum, Karnataka-590018
Bachelor of Engineering in
Computer Science and Engineering
Submitted by
CHAITHANYA HG
1HK21CS031
Under the Guidance of
CERTIFICATE
This is to certify that the Internship work entitled “Data Science with AIML” was carried out by
Ms. Chaithanya HG, 1HK21CS031, a bona fide student of HKBK College of
Engineering, in partial fulfillment of the requirements for the award of Bachelor of Engineering / Bachelor
of Technology in Computer Science and Engineering of the Visvesvaraya
Technological University, Belgaum, during the year 2023-24. It is certified that all
corrections/suggestions indicated for Internal Assessment have been incorporated in the
Report deposited in the departmental library.
ACKNOWLEDGEMENT
First of all, I would like to take this opportunity to express my heartfelt gratitude to the leadership
of HKBK College of Engineering, Mr. C M Ibrahim, Chairman, HKBKGI, and Mr. C M
Faiz, Director, HKBKGI, for providing facilities throughout the course.
I express my sincere gratitude to Dr. Mohammed Riyaz Ahmed, Principal, HKBCE, for his
support, which inspired me towards the attainment of knowledge.
I am grateful to Prof. Seema Shivapur and Prof. J Mary Stella, Assistant Professors,
Department of Computer Science and Engineering, for providing me with useful insights,
corrections and valuable guidance.
I would also like to thank my external guide, Mr. Harsha G H from Cranes Varsity, for
giving me an opportunity to work as an Intern in the field of Data Science with AIML.
Finally, I thank the Almighty, all the staff members of the CSE Department, my family members
and friends for their constant support and encouragement in carrying out the Internship
work.
CHAITHANYA HG
[1HK21CS031]
ABSTRACT
The Amazon Trending Books Data Science Project further delves into sentiment analysis of
book reviews to gauge reader satisfaction and its impact on sales. Natural Language Processing
(NLP) techniques are utilized to process and analyze textual data, extracting sentiments and
key themes from customer reviews. Additionally, clustering algorithms are applied to segment
books into different categories based on their features, enabling a more granular understanding
of market segments and reader preferences. The project also incorporates time series analysis
to study the temporal dynamics of book sales, identifying seasonal patterns and cyclical trends.
This analysis helps in understanding how external factors such as holidays, literary awards, and
media adaptations influence book sales. Furthermore, the project explores the use of
recommendation systems to suggest trending books to users based on their reading history and
preferences, enhancing the personalized shopping experience on Amazon.
To ensure the robustness of the findings, the project employs cross-validation techniques and
rigorous statistical testing. The insights derived from the analysis are then compiled into
comprehensive reports and dashboards, providing actionable intelligence for stakeholders. This
holistic approach not only advances the understanding of book market dynamics but also
highlights the interdisciplinary nature of data science, combining aspects of web scraping, data
engineering, machine learning, and business analytics. Overall, the Amazon Trending Books
Data Science Project exemplifies the transformative potential of data science in the digital
marketplace, offering a blueprint for leveraging data to drive business decisions and enhance
customer engagement. Through the practical application of Python and data science
methodologies, the project underscores the critical role of data in navigating and thriving in the
competitive landscape of online retail.
TABLE OF CONTENTS
ACKNOWLEDGEMENT
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
CHAPTER 1
COMPANY PROFILE
CHAPTER 2
ABOUT THE PROJECT
CHAPTER 3
TECHNICAL DESCRIPTION
CHAPTER 4
DESIGN MODEL
CHAPTER 5
SPECIFIC OUTCOMES
CHAPTER 6
SCREENSHOTS
SUMMARY
REFERENCES
LIST OF FIGURES
CHAPTER 1
COMPANY PROFILE
Cranes Varsity is a pioneering technical training institute turned EdTech platform that has offered
technology education services for over 25 years. A division of Cranes Software International
Ltd, Cranes Varsity was established with the vision of bridging the gap between
technology academia and industry. The team continuously strives to be an organization
that brings together technology and education, empowering aspiring professionals to seek
assured placements and a lucrative career path. A trusted partner of 5000+ reputed
academic, corporate and defence organizations, Cranes Varsity has trained 1 Lakh+
engineers via its network of 2000+ universities and colleges and placed 70,000+ engineers at
major Indian corporates and MNCs. Its 50,000+ alumni testify to this legacy and serve as
ambassadors of the Cranes Varsity brand through their jobs worldwide.
Cranes Varsity carries a legacy of being the authorized training partner for Texas Instruments,
MathWorks, Wind River and ARM. It has training leadership in Embedded,
MATLAB and DSP, and has extended its training domains to emerging industry trends such as Automotive,
IoT, VLSI, Java full-stack, Data Science and Business Analytics. Cranes Varsity offers training
to graduates under the Finishing School Model, industry-connect university programs,
upskilling programs for working professionals, and customized training for the corporate and
defence sectors. Its high-impact, hands-on technology training helps
engineering students, graduates, and working professionals become quickly employable in niche,
high-end engineering fields. The in-house placement team further ensures that these students
get placed in leading corporate firms with which Cranes Varsity has decades-old
relationships. The institute stands by its principle, “We Assist Until We Place”. As a trusted
recruitment and training partner, it engages with corporates through the “Hire, Train &
Deploy” model.
Cranes Varsity offers an array of high-end technology training in Embedded & Automotive
Systems, C, C++, MATLAB, RTOS, Linux, LDD, BSP, Embedded Testing, IoT Architecture,
Protocols – Edge node Computing, Gateway & Security with industrial IoT, DSP &
MATLAB, VLSI design, Java technologies, Cloud Computing, Azure, Python, Data Science
& Analytics, Tableau, Artificial Intelligence with Machine Learning, Deep Learning, NLP,
Business Intelligence and more. Its learning approach model is EEE (Educate, Evolve,
Employment), delivered through pedagogical practices that integrate Learning Management Systems
(LMS). The institute continuously aims for participants’ satisfaction and placement commitment
through focused training by its subject-matter experts and professionals.
CHAPTER 2
ABOUT THE PROJECT
Introduction
With the rise of technology today, dependence on e-commerce and
online payments has grown rapidly. Credit cards provide convenience to
users, but the fraud associated with them causes inconvenience. Because credit card
information is confidential, banks and other financial enterprises do not want to disclose
information about their customers. Risk management is therefore critical for financial enterprises to
survive in such a competitive industry.
Objectives
The primary objectives of this project include:
2. Enhance Detection Efficiency: Improve the efficiency of fraud detection systems to promptly
identify and mitigate fraudulent activities in real time.
3. Reduce False Positives: Minimize false positive alerts to avoid inconveniencing genuine
cardholders while maintaining high detection rates for fraudulent transactions.
4. Handle Imbalanced Data: Address the imbalance between fraudulent and non-fraudulent
transactions in the dataset by employing techniques such as oversampling, undersampling, or
using algorithms designed for imbalanced data.
5. Ensure Scalability: Create models that are scalable and capable of handling large volumes
of transactions, ensuring robust performance as transaction volumes grow.
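Objective 4 can be illustrated with a minimal sketch of random undersampling and oversampling using pandas. The toy DataFrame, the `Class` label column (1 = fraud), and the helper names `undersample` and `oversample` are hypothetical and chosen only for illustration; they are not taken from the project's dataset or code.

```python
import pandas as pd


def undersample(df, label_col, seed=0):
    """Randomly keep only as many rows per class as the smallest class has."""
    n_min = df[label_col].value_counts().min()
    parts = [g.sample(n=n_min, random_state=seed) for _, g in df.groupby(label_col)]
    # Shuffle so the classes are not grouped together in the result.
    return pd.concat(parts).sample(frac=1, random_state=seed).reset_index(drop=True)


def oversample(df, label_col, seed=0):
    """Randomly duplicate rows of smaller classes until all classes match the largest."""
    n_max = df[label_col].value_counts().max()
    parts = []
    for _, g in df.groupby(label_col):
        parts.append(g)
        if len(g) < n_max:
            # Draw the missing rows with replacement from the same class.
            parts.append(g.sample(n=n_max - len(g), replace=True, random_state=seed))
    return pd.concat(parts).sample(frac=1, random_state=seed).reset_index(drop=True)


# Hypothetical toy data: 95 legitimate (0) and 5 fraudulent (1) transactions.
transactions = pd.DataFrame({
    "Amount": range(100),
    "Class": [0] * 95 + [1] * 5,
})

balanced_down = undersample(transactions, "Class")
balanced_up = oversample(transactions, "Class")
print(balanced_down["Class"].value_counts().to_dict())
print(balanced_up["Class"].value_counts().to_dict())
```

Resampling of this kind is normally applied to the training split only, so that the evaluation data keeps the original class ratio.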
Methodology
1. Define Objectives: Clearly articulate the goals of the project, such as reducing fraud losses,
improving detection accuracy, or minimizing false positives.
2. Data Collection: Gather relevant data sources, including historical credit card transaction
data that contains both fraudulent and non-fraudulent transactions.
3. Data Cleaning: Handle missing values, duplicate entries, and outliers that may adversely
affect model performance.
4. Feature Engineering: Extract relevant features from the data that can help distinguish
between fraudulent and legitimate transactions. This may include transaction amount, time of
day, location, etc.
5. Normalization/Scaling: Normalize or scale numerical features to ensure uniformity and
improve model convergence during training.
6. Visualize Data: Use tools like histograms, box plots, and scatter plots to understand the
distribution of features and identify potential patterns or anomalies.
7. Model Training: Train multiple models using the selected algorithms on the pre-processed
data, using techniques like cross-validation to assess model performance and mitigate
overfitting.
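The cleaning, scaling, and cross-validated training steps above can be sketched as follows. This is only an illustration under stated assumptions: the data is synthetic, the column names (`TxnId`, `Amount`, `Hour`, `Class`) and the labelling rule are invented for the example, and logistic regression stands in for whichever algorithms the project actually selected.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic transactions (all column names are assumptions).
rng = np.random.default_rng(0)
n = 500
data = pd.DataFrame({
    "TxnId": np.arange(n),
    "Amount": rng.exponential(scale=50.0, size=n),
    "Hour": rng.integers(0, 24, size=n).astype(float),
})
# Assumed labelling rule for the toy data: large amounts are more often fraudulent.
data["Class"] = (data["Amount"] + rng.normal(0, 20, size=n) > 120).astype(int)
data.loc[::50, "Amount"] = np.nan          # inject some missing values
data = pd.concat([data, data.iloc[:10]])   # inject ten duplicate rows

# Step 3: data cleaning (remove duplicates, fill missing amounts with the median).
clean = data.drop_duplicates().copy()
clean["Amount"] = clean["Amount"].fillna(clean["Amount"].median())

X = clean[["Amount", "Hour"]]
y = clean["Class"]

# Steps 5 and 7: scaling and training live in one pipeline, so the scaler is
# fitted only on the training part of each cross-validation fold.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="recall")
print("Recall per fold:", scores.round(3))
```

Stratified folds keep the fraud ratio roughly constant across folds, which matters when the positive class is rare.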
Expected Outcomes
The expected outcomes of a credit card fraud detection project encompass several key
objectives aimed at bolstering security, efficiency, and reliability in financial transactions. By
leveraging advanced machine learning algorithms, the project aims to significantly improve
detection accuracy, thereby reducing the incidence of fraudulent transactions slipping through
undetected. This enhancement will not only safeguard financial institutions from substantial
monetary losses but also fortify customer trust by minimizing disruptions caused by false
positives.
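The trade-off between detection rate and false positives described above can be made concrete with a few standard metrics. The sketch below uses plain Python; the label lists are made-up values for illustration, not project results, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 means fraud."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def fraud_metrics(y_true, y_pred):
    """Precision, recall, and false positive rate, guarding against empty denominators."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


# Made-up example: 6 legitimate and 4 fraudulent transactions.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 0, 1, 1, 1, 0]
metrics = fraud_metrics(y_true, y_pred)
print(metrics)
```

Recall measures how many fraudulent transactions are caught, while the false positive rate measures how often genuine cardholders are flagged; plain accuracy can look high on imbalanced data even when recall is poor.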
CHAPTER 3
TECHNICAL DESCRIPTION
This chapter explores the technical details of the Amazon trending books data analysis
and visualization project. It outlines the methodologies, tools, and technologies used
throughout the project, offering a comprehensive understanding of the processes involved in
data collection, processing, analysis, and visualization.
This technical description outlines the systematic approach and methodologies involved in
developing a credit card fraud detection machine learning project, emphasizing data
preprocessing, model selection, evaluation, deployment, maintenance, security, compliance,
and scalability. Adjustments may be made based on specific project requirements,
organizational constraints, and technological advancements.
CHAPTER 4
DESIGN MODEL
This chapter presents the design model, which provides a structured approach to analyzing and
visualizing the dataset of trending books, exploring aspects such as authors, genres,
prices, and ratings. It also integrates machine learning for predictive analysis, aiming to provide
deeper insights into trends and patterns within the dataset.
[Figure: design model stages: Import libraries; Data Visualization and Exploration; Machine Learning integration; Visualization of Model Results]
• Python Libraries:
• Plotting Functions: Use Matplotlib and Seaborn to visualize insights such as:
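# Import needed for plotting
import matplotlib.pyplot as plt
# df is assumed to be the trending-books pandas DataFrame loaded earlier
# Descriptive statistics
print(df.describe())
# Visualization of genres: horizontal bar chart of the five most frequent genres
df2_top5_genres = df['genre'].value_counts().head(5)
plt.barh(df2_top5_genres.index, df2_top5_genres.values, color="blue")
plt.xlabel('Count')
plt.ylabel('Genre')
plt.title('Top 5 Book Genres')
plt.show()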
✓ Author Analysis: Calculate the number of books and points for each author based on
their ranks (df.groupby().sum()).
✓ Genre Analysis: Count occurrences of each genre and visualize the top genres
(df['genre'].value_counts()).
✓ Price and Rating Analysis: Identify the most expensive books (df.sort_values('book price',
ascending=False).head(5)) and authors with the highest average ratings
(df[['rating','author']].groupby('author').mean().sort_values('rating',ascending=False).head(10)).
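The author, genre, price, and rating analyses listed above can be sketched on a small stand-in table. The rows below are invented purely for illustration; only the column names (`author`, `genre`, `book price`, `rating`) follow the snippets in this chapter. The rank-based "points" calculation is omitted because the report does not define how points are derived.

```python
import pandas as pd

# Hypothetical sample rows standing in for the trending-books dataset.
df = pd.DataFrame({
    "author": ["Author A", "Author A", "Author B", "Author C", "Author C", "Author C"],
    "genre": ["Fiction", "Fiction", "Self-Help", "Fantasy", "Fantasy", "Fiction"],
    "book price": [12.0, 15.5, 9.0, 20.0, 18.0, 7.5],
    "rating": [4.5, 4.1, 4.8, 3.9, 4.2, 4.0],
})

# Author analysis: number of books per author, most prolific first.
books_per_author = df.groupby("author").size().sort_values(ascending=False)

# Genre analysis: the most frequent genres.
top_genres = df["genre"].value_counts().head(5)

# Price analysis: the most expensive books.
most_expensive = df.sort_values("book price", ascending=False).head(5)

# Rating analysis: authors with the highest average rating.
avg_rating = (
    df.groupby("author")["rating"].mean().sort_values(ascending=False).head(10)
)

print(books_per_author, top_genres, most_expensive, avg_rating, sep="\n\n")
```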
5. Machine Learning
✓ Actual vs Predicted Ratings: Plot a scatter plot (plt.scatter()) to compare actual ratings
against predicted ratings.
✓ Residuals Analysis: Plot a histogram (plt.hist()) to visualize the distribution of
residuals (difference between actual and predicted ratings), assessing the model's fit.
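The two diagnostics above can be sketched numerically. The report does not name the regression model or its input features, so this example substitutes an ordinary least-squares line fitted with `np.polyfit` on synthetic data; the `price` feature and the generated ratings are assumptions made only for illustration.

```python
import numpy as np

# Synthetic stand-in data (assumed): rating loosely depends on price plus noise.
rng = np.random.default_rng(42)
price = rng.uniform(5, 30, size=200)
actual = 4.0 + 0.01 * price + rng.normal(0, 0.2, size=200)

# Fit a straight line by least squares and compute the predictions.
slope, intercept = np.polyfit(price, actual, deg=1)
predicted = slope * price + intercept

# Residuals: difference between actual and predicted ratings.
residuals = actual - predicted
rmse = float(np.sqrt(np.mean(residuals ** 2)))

# Bin the residuals the same way a histogram plot would.
counts, bin_edges = np.histogram(residuals, bins=10)
print("RMSE:", round(rmse, 3))
print("Residual histogram counts:", counts)
```

With Matplotlib available, `plt.scatter(actual, predicted)` and `plt.hist(residuals, bins=10)` would draw the two plots described above from these arrays. Residuals centred on zero with no visible skew suggest that the model has no systematic bias.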
CHAPTER 5
SPECIFIC OUTCOMES
This chapter describes the specific outcomes of the project. By following a structured approach
and leveraging Python libraries for data analysis and machine learning, stakeholders can derive
actionable insights that drive business decisions in the dynamic book market. These outcomes
enable informed strategies for marketing, inventory management, pricing, and overall business growth,
aligning with current trends and consumer preferences in the industry.
1. Market Insights:
✓ Genre Popularity: Identify the most popular genres based on frequency counts
and visualize their distribution.
✓ Author Performance: Determine top authors by the number of books and
average ratings, understanding their impact on book trends.
✓ Price Analysis: Discover the most expensive books and their genres, providing
insights into pricing strategies and consumer behavior.
2. Predictive Analysis
3. Visual Insights:
4. Strategic Decision-Making:
5. Business Impact:
6. Future Planning:
CHAPTER 6
SCREENSHOTS
credit_card_data.info()
credit_card_data.isnull().sum()
Figure 6 Finding the average values and making all non-null values
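The inspection calls captioned above can be reproduced on a small stand-in table. The real columns of `credit_card_data` are not listed in the report, so the columns and values below are hypothetical, and filling missing values with the column mean is only one reading of the Figure 6 caption.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for credit_card_data (column names are assumptions).
credit_card_data = pd.DataFrame({
    "Time": [0.0, 1.0, 2.0, 3.0],
    "Amount": [10.0, np.nan, 30.0, 20.0],
    "Class": [0, 0, 1, 0],
})

# Column types and non-null counts.
credit_card_data.info()

# Number of missing values per column.
missing_before = credit_card_data.isnull().sum()
print(missing_before)

# Assumed cleaning step: replace missing amounts with the column average.
amount_mean = credit_card_data["Amount"].mean()
credit_card_data["Amount"] = credit_card_data["Amount"].fillna(amount_mean)
missing_after = credit_card_data.isnull().sum()
print(missing_after)
```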
SUMMARY
The Credit Card Fraud Detection Machine Learning Project aims to develop a robust system
using advanced algorithms to accurately identify fraudulent transactions. Leveraging historical
credit card transaction data, the project involves comprehensive data preprocessing, including
cleaning, feature engineering, and normalization. Machine learning models such as logistic
regression, random forests, and gradient boosting are trained and evaluated to achieve high
detection accuracy while minimizing false positives. The deployment of the model integrates
with real-time transaction processing systems, enabling prompt detection and response to
suspicious activities. Continuous monitoring and optimization ensure the system adapts to
evolving fraud patterns, enhancing security, operational efficiency, and compliance with
regulatory standards.
This summary encapsulates the key objectives, methodologies, and expected outcomes of a
typical Credit Card Fraud Detection Machine Learning Project, highlighting its significance in
financial security and operational excellence.
REFERENCES
➢ https://www.cranesvarsity.com/
➢ https://www.kaggle.com/code/hainescity/amazon-s-top-100-trending-books-inspect-and-eda
➢ https://www.datacamp.com/blog/predictive-analytics-guide
➢ https://colab.research.google.com/drive/1SsWPvVb7SdNtqjtY4FRko-ixcoVcgeUL