Data Science Intern _ Assignment

The assignment involves analyzing an eCommerce Transactions dataset through exploratory data analysis (EDA), building predictive models, and generating actionable insights. Key tasks include performing EDA and deriving business insights, creating a Lookalike Model for customer recommendations, and conducting customer segmentation using clustering techniques. Deliverables include Jupyter Notebooks, PDF reports, and a specific file naming convention for submission.

Uploaded by

saipavan3337

Data Science Assignment: eCommerce Transactions Dataset
Overview:
You are provided with an eCommerce Transactions dataset consisting of three files:
Customers.csv, Products.csv, and Transactions.csv. Your task is to perform
exploratory data analysis (EDA), build predictive models, and derive actionable insights. This
assignment will test your data analysis, machine learning, and business insight generation skills.

Customers.csv:
https://drive.google.com/file/d/1bu_--mo79VdUG9oin4ybfFGRUSXAe-WE/view?usp=sharing
Products.csv:
https://drive.google.com/file/d/1IKuDizVapw-hyktwfpoAoaGtHtTNHfd0/view?usp=sharing
Transactions.csv:
https://drive.google.com/file/d/1saEqdbBB-vuk2hxoAf4TzDEsykdKlzbF/view?usp=sharing

Files Description:
1. Customers.csv
○ CustomerID: Unique identifier for each customer.
○ CustomerName: Name of the customer.
○ Region: Continent where the customer resides.
○ SignupDate: Date when the customer signed up.
2. Products.csv
○ ProductID: Unique identifier for each product.
○ ProductName: Name of the product.
○ Category: Product category.
○ Price: Product price in USD.
3. Transactions.csv
○ TransactionID: Unique identifier for each transaction.
○ CustomerID: ID of the customer who made the transaction.
○ ProductID: ID of the product sold.
○ TransactionDate: Date of the transaction.
○ Quantity: Quantity of the product purchased.
○ TotalValue: Total value of the transaction.
○ Price: Price of the product sold.
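
As a minimal sketch of how the three files relate, the snippet below joins each transaction to its customer and product through CustomerID and ProductID, using only Python's standard library. The sample rows are made up for illustration and are not taken from the real dataset; only the column names come from the description above. The consistency check assumes that TotalValue is meant to equal Quantity × Price, which the assignment does not state. In practice a library such as pandas would be a common choice for this step.

```python
import csv
import io

# Hypothetical sample rows; only the column names come from the assignment.
CUSTOMERS_CSV = """CustomerID,CustomerName,Region,SignupDate
C0001,Alice Example,Europe,2023-01-15
C0002,Bob Example,Asia,2023-03-02
"""
PRODUCTS_CSV = """ProductID,ProductName,Category,Price
P001,Sample Novel,Books,12.50
P002,Sample Headphones,Electronics,80.00
"""
TRANSACTIONS_CSV = """TransactionID,CustomerID,ProductID,TransactionDate,Quantity,TotalValue,Price
T00001,C0001,P001,2024-02-01,2,25.00,12.50
T00002,C0002,P002,2024-02-03,1,80.00,80.00
T00003,C0001,P002,2024-02-10,1,80.00,80.00
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def join_tables(customers, products, transactions):
    """Attach customer and product attributes to every transaction row."""
    customers_by_id = {c["CustomerID"]: c for c in customers}
    products_by_id = {p["ProductID"]: p for p in products}
    joined = []
    for t in transactions:
        customer = customers_by_id[t["CustomerID"]]
        product = products_by_id[t["ProductID"]]
        joined.append({
            **t,
            "Region": customer["Region"],
            "SignupDate": customer["SignupDate"],
            "Category": product["Category"],
            "ProductName": product["ProductName"],
        })
    return joined


def inconsistent_totals(rows, tolerance=0.01):
    """Return IDs of transactions where TotalValue differs from Quantity * Price.

    Assumption: TotalValue should equal Quantity * Price.
    """
    bad = []
    for r in rows:
        expected = int(r["Quantity"]) * float(r["Price"])
        if abs(expected - float(r["TotalValue"])) > tolerance:
            bad.append(r["TransactionID"])
    return bad


joined_rows = join_tables(
    read_rows(CUSTOMERS_CSV), read_rows(PRODUCTS_CSV), read_rows(TRANSACTIONS_CSV)
)
```

With the real files, `read_rows` would be replaced by opening each downloaded CSV; the join logic stays the same.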

Assignment Tasks:
Task 1: Exploratory Data Analysis (EDA) and Business Insights

1. Perform EDA on the provided dataset.

2. Derive at least 5 business insights from the EDA.
○ Write these insights in short point-wise sentences (maximum 100 words per
insight).
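
One possible starting point for the EDA, sketched under the assumption that the transactions have already been joined with customer regions and product categories: total revenue per group, sorted from highest to lowest. The rows below are hypothetical and the helper name `revenue_by` is illustrative only.

```python
from collections import defaultdict

# Hypothetical joined rows (transaction + customer Region + product Category).
rows = [
    {"Region": "Europe", "Category": "Books", "TotalValue": 25.0},
    {"Region": "Asia", "Category": "Electronics", "TotalValue": 80.0},
    {"Region": "Europe", "Category": "Electronics", "TotalValue": 80.0},
]


def revenue_by(rows, key):
    """Sum TotalValue per distinct value of `key`, highest total first."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["TotalValue"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


region_revenue = revenue_by(rows, "Region")
category_revenue = revenue_by(rows, "Category")
```

Aggregates like these are only raw material; each business insight still needs a sentence explaining what the numbers imply.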

Deliverables:

● A Jupyter Notebook/Python script containing your EDA code.

● A PDF report with business insights (maximum 500 words).

Task 2: Lookalike Model

Build a Lookalike Model that takes a user's information as input and recommends 3 similar
customers based on their profile and transaction history. The model should:

● Use both customer and product information.

● Assign a similarity score to each recommended customer.
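
A minimal sketch of one way to meet these two requirements, assuming each customer has already been turned into a numeric feature vector and that cosine similarity is used as the score. Neither the feature layout nor the similarity measure is prescribed by the assignment; both are illustrative choices, and the values below are invented.

```python
import math

# Hypothetical per-customer feature vectors. One plausible layout:
# [total spend, number of transactions, spend on Books, spend on Electronics]
features = {
    "C0001": [105.0, 2.0, 25.0, 80.0],
    "C0002": [80.0, 1.0, 0.0, 80.0],
    "C0003": [30.0, 3.0, 30.0, 0.0],
    "C0004": [210.0, 4.0, 50.0, 160.0],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_lookalikes(customer_id, features, k=3):
    """Return the k most similar other customers as (customer_id, score) pairs."""
    target = features[customer_id]
    scores = [
        (other_id, cosine_similarity(target, vector))
        for other_id, vector in features.items()
        if other_id != customer_id
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


top_for_c0001 = top_lookalikes("C0001", features)
```

Cosine similarity ignores overall magnitude, so a customer who buys the same mix at twice the volume scores close to 1.0. If magnitude should matter, scaling the features and using a distance-based score may be a better fit.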

Deliverables:

● Give the top 3 lookalikes with their similarity scores for the first 20 customers
(CustomerID: C0001 - C0020) in Customers.csv. Create a “Lookalike.csv” file that contains
just one map: Map<cust_id, List<cust_id, score>>
● A Jupyter Notebook/Python script explaining your model development.
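
The Map<cust_id, List<cust_id, score>> notation leaves the exact CSV layout open. The sketch below shows one possible interpretation, assumed here rather than confirmed by the assignment: one row per customer, with the list of (lookalike ID, score) pairs serialized as JSON in a second column. The column names and scores are hypothetical.

```python
import csv
import io
import json

# Hypothetical results: customer ID -> list of (lookalike ID, similarity score).
lookalikes = {
    "C0001": [("C0004", 0.98), ("C0002", 0.91), ("C0003", 0.68)],
    "C0002": [("C0001", 0.91), ("C0004", 0.90), ("C0003", 0.12)],
}


def write_lookalike_csv(lookalikes, out_file):
    """Write one row per customer: its ID and a JSON list of [id, score] pairs."""
    writer = csv.writer(out_file)
    writer.writerow(["cust_id", "lookalikes"])  # assumed column names
    for cust_id, matches in lookalikes.items():
        pairs = [[other_id, round(score, 4)] for other_id, score in matches]
        writer.writerow([cust_id, json.dumps(pairs)])


# Written to an in-memory buffer here; a real run would open the output file instead.
buffer = io.StringIO()
write_lookalike_csv(lookalikes, buffer)
csv_text = buffer.getvalue()
```

When writing to disk, opening the file with `newline=""` avoids blank lines between rows on some platforms.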

Evaluation Criteria:

● Model accuracy and logic.

● Quality of recommendations and similarity scores.
Task 3: Customer Segmentation / Clustering

Perform customer segmentation using clustering techniques. Use both profile information
(from Customers.csv) and transaction information (from Transactions.csv).

● You have the flexibility to choose any clustering algorithm and any number of clusters
between 2 and 10.
● Calculate clustering metrics, including the Davies-Bouldin (DB) Index (evaluation will be based on this).
● Visualise your clusters using relevant plots.
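
For reference, the DB Index can be computed directly from cluster assignments. The sketch below is a plain-Python illustration of the standard Davies-Bouldin definition (average over clusters of the worst-case ratio of within-cluster scatter to centroid separation), assuming Euclidean distance; lower values indicate better-separated clusters. The points and labels are invented. In a real notebook, `sklearn.metrics.davies_bouldin_score` would typically be used instead.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def davies_bouldin(points, labels):
    """Davies-Bouldin index with Euclidean distance; lower is better.

    Assumes at least two clusters with distinct centroids.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("Davies-Bouldin index needs at least two clusters")

    centroids = {label: centroid(pts) for label, pts in clusters.items()}
    # Scatter: mean distance from each point in a cluster to that cluster's centroid.
    scatter = {
        label: sum(math.dist(p, centroids[label]) for p in pts) / len(pts)
        for label, pts in clusters.items()
    }

    total = 0.0
    for i in clusters:
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Hypothetical 2-D customer features, already assigned to two clusters.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
db_index = davies_bouldin(points, labels)
```

Because the index depends on distances, features on very different scales (for example total spend versus transaction count) are usually standardized before clustering.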

Deliverables:

● A report on your clustering results, including:

○ The number of clusters formed.
○ DB Index value.
○ Other relevant clustering metrics.
● A Jupyter Notebook/Python script containing your clustering code.

Evaluation Criteria:

● Clustering logic and metrics.

● Visual representation of clusters.

Submission Instructions:
1. GitHub Link
○ Upload all the PDF and code files in a public GitHub repository.
2. File Naming Convention:
○ Use the following naming convention for all your files:
■ FirstName_LastName_EDA.pdf
■ FirstName_LastName_EDA.ipynb
■ FirstName_LastName_Lookalike.csv
■ FirstName_LastName_Lookalike.ipynb
■ FirstName_LastName_Clustering.pdf
■ FirstName_LastName_Clustering.ipynb
Evaluation Process:
Your submissions will be evaluated based on the following criteria:

Task                         Weightage
Exploratory Data Analysis    25%
Business Insights            15%
Lookalike Model              30%
Customer Segmentation        30%

Given the large number of submissions, the evaluation will be automated as much as possible.
Ensure your file formats and naming conventions are accurate to avoid disqualification.
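
Since evaluation is automated, it may be worth checking file names before submitting. The sketch below is a hypothetical helper (not part of the assignment) that compares a list of file names against the six names listed under File Naming Convention and reports any that are missing. It assumes an exact, case-sensitive match is required.

```python
# Suffixes taken from the File Naming Convention list in this assignment.
REQUIRED_SUFFIXES = [
    "EDA.pdf",
    "EDA.ipynb",
    "Lookalike.csv",
    "Lookalike.ipynb",
    "Clustering.pdf",
    "Clustering.ipynb",
]


def missing_files(first_name, last_name, filenames):
    """Return the expected file names that are absent from `filenames`, sorted."""
    expected = {
        f"{first_name}_{last_name}_{suffix}" for suffix in REQUIRED_SUFFIXES
    }
    return sorted(expected - set(filenames))


# Hypothetical submission with a lowercase typo in one file name.
submitted = [
    "Jane_Doe_EDA.pdf",
    "Jane_Doe_EDA.ipynb",
    "Jane_Doe_lookalike.csv",  # wrong case: should be Lookalike.csv
    "Jane_Doe_Lookalike.ipynb",
    "Jane_Doe_Clustering.pdf",
    "Jane_Doe_Clustering.ipynb",
]
problems = missing_files("Jane", "Doe", submitted)
```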

Final Note:
This comprehensive assignment requires critical thinking and practical application of data
science concepts. Focus on creating clean, efficient code and providing meaningful insights that
can help the company improve its business strategy.

Good luck!
