Coding and Communication in Statistics Presentation 2024

This document outlines a project focused on analyzing and visualizing a dataset related to women's clothing e-commerce sales. It details the project's requirements, dataset selection, research questions, data preparation, and planned visualizations using R programming. The goal is to derive insights into customer preferences, sales trends, and revenue generation to inform business strategies.

Uploaded by

Asif Ali

Coding and Communication in Statistics Presentation

2024

Student Name:

Student ID:

Submission Date: 1 November 2024

Course/Unit Name:

Instructor’s Name:
Table of Contents

1. Introduction

2. Project Requirements

3. Dataset Selection and Research Questions

4. Data Preparation and Cleaning

5. Data Visualization

6. Conclusion

7. Appendix
1. Introduction

The purpose of this assignment is to demonstrate effective data analysis, data visualization, and
communication skills by presenting a story based on a selected dataset. The assignment evaluates the
ability to use data analysis to address specific research questions and to deliver simple, engaging
findings to an audience of peers with similar mathematical backgrounds.

In order to accomplish this task, I will:

• Identify an appropriate dataset and formulate one or more research questions.
• Perform data cleaning and transformation as needed to prepare the dataset for analysis.
• Use R programming to create visuals that support the story and highlight key insights.
• Create a slide presentation using Markdown to visually represent the data story and clearly
communicate the findings.

In addition, a concise recording of the presentation, approximately 5 minutes in length, will be made
for evaluation. The submission will include the R project files, the slide presentation, the source
files, and the recorded presentation, packaged as a zip file.

2. Project Requirements

The basic requirements are summarized below:

• Submission format: A zip file containing the R project, slide presentation, source files, and
recording.
• Presentation: Case analysis presented in an informative, peer-accessible format.
• Running time: About 5 minutes.
• Software and tools: R programming, Markdown for slides, Zoom or OBS Studio for recording.
• Dataset restrictions: Specific dedicated datasets such as Mario Kart and PISA are not used.

3. Dataset Selection and Research Questions

Dataset

The dataset for this study is women_clothing_ecommerce_sales. It contains sales data from an
e-commerce store that specializes in selling women’s clothing. The dataset has several key attributes
that capture important information about each order:

• order_id: Unique identifier for each order.
• order_date: The date of the order.
• sku: The Stock Keeping Unit, a unique code that identifies each item sold.
• color: The color of the clothing item.
• size: The size of the clothing item (e.g. S, M, L).
• unit_price: Price per unit of the product.
• quantity: The quantity of each item ordered in a particular transaction.
• revenue: The total revenue generated by the sale of the product(s) in a particular order.

The dataset contains at least 528 rows, each representing a unique order transaction with all relevant
data points. It provides a wealth of information for analyzing sales patterns, customer preferences, and
revenue, and offers insight into e-commerce business performance and customer behavior.
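
The column layout described above can be sketched as a small R data frame. This is only an illustration: the rows are made up, the CSV file name in the comment is an assumption, and treating revenue as unit_price times quantity is also an assumption rather than something the dataset documentation states.

```r
# Toy rows standing in for the real file; every value here is invented for illustration.
sales <- data.frame(
  order_id   = c(1L, 2L, 3L),
  order_date = c("2022-06-01", "2022-06-01", "2022-06-02"),
  sku        = c("SKU-A", "SKU-B", "SKU-A"),
  color      = c("Black", "Navy", "Black"),
  size       = c("M", "L", "S"),
  unit_price = c(298, 258, 298),
  quantity   = c(1, 2, 1),
  stringsAsFactors = FALSE
)

# Assumption: revenue equals unit_price * quantity for each order line.
sales$revenue <- sales$unit_price * sales$quantity

# With the real data, the frame would instead come from a file, for example:
# sales <- read.csv("women_clothing_ecommerce_sales.csv")  # file name assumed

str(sales)      # column names and types
summary(sales)  # quick numeric summaries
```

Running str() and summary() first is a quick way to confirm that the column names and types match the description before any cleaning is done.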

Research Question(s)

The following research questions will guide the analysis.

Which colors and sizes sell best in women’s clothing?
• The purpose of this question is to identify the most popular products based on sales volume,
which can help the business optimize inventory and better meet customer needs.
How does revenue vary across dates and seasons?
• By analyzing sales over time, this question seeks to reveal seasonal trends or peak revenue
periods, so that promotions and stock levels can be planned around peak sales periods.
What is the relationship between unit price and quantity sold?
• This question asks whether the products that sell in large quantities tend to be cheap or
expensive, providing insight into the pricing strategies that can generate the most revenue.
What factors contribute most to overall revenue?
• This question will help identify the top revenue-generating products, allowing the business to
prioritize high-value products and focus its promotional efforts on them.

Rationale for Dataset Selection

The women_clothing_ecommerce_sales dataset was chosen for its relevance to general retail performance
questions and its ability to tell a data-driven story about e-commerce sales. It enables analysis of
both business performance measures (such as revenue and sales volume) and consumer buying patterns. In
addition, categorical variables such as color and size provide opportunities for a variety of
attractive visuals that support a compelling story and actionable ideas.

4. Data Preparation and Cleaning

In this work, the following data cleaning and preparation steps were performed to ensure accuracy and
suitability for visual analysis.

Data Cleanup Steps

• Missing values: Checked for missing values in key fields such as order_date, sku, color, size,
unit_price, quantity, and revenue. Rows with missing values in critical fields such as order_date or
sku were removed in order to maintain data integrity, while missing values in non-critical fields were
kept or omitted as needed.
• Data type changes: order_date was converted to Date format so that proper time-series analysis
can work, and columns such as unit_price and quantity were converted to numeric types to facilitate
calculation and aggregation.
• Outlier identification and removal: Outliers in unit_price and quantity were identified using
interquartile range (IQR) analysis. Extreme outliers that might indicate errors (e.g., unit_price
values that are far too high) were checked and excluded to prevent distortion in the visualizations.
• Duplicate entries: Checked for duplicate rows, focusing especially on order_id, in order to avoid
redundant data in the visualizations. Any duplicate orders were removed.
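
The cleanup steps above can be sketched in base R as follows. This is a minimal sketch on made-up rows, not the project's actual script: the toy values, the helper name is_outlier, and the conventional 1.5 × IQR fence are all assumptions introduced for illustration.

```r
# Made-up raw rows: one missing date, one duplicated order, one extreme price.
raw <- data.frame(
  order_id   = c(1, 2, 2, 3, 4, 5, 6, 7),
  order_date = c("2022-06-01", "2022-06-02", "2022-06-02", NA,
                 "2022-06-03", "2022-06-04", "2022-06-05", "2022-06-06"),
  sku        = c("A", "B", "B", "C", "D", "E", "F", "G"),
  unit_price = c("250", "260", "260", "270", "255", "265", "258", "5000"),
  quantity   = c("1", "2", "2", "1", "1", "1", "2", "1"),
  stringsAsFactors = FALSE
)

# 1. Missing values: drop rows that lack a critical field.
clean <- raw[!is.na(raw$order_date) & !is.na(raw$sku), ]

# 2. Data types: dates as Date, price and quantity as numeric.
clean$order_date <- as.Date(clean$order_date)
clean$unit_price <- as.numeric(clean$unit_price)
clean$quantity   <- as.numeric(clean$quantity)

# 3. Outliers: flag values outside the 1.5 * IQR fences (hypothetical helper).
is_outlier <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  fence <- 1.5 * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}
clean <- clean[!is_outlier(clean$unit_price) & !is_outlier(clean$quantity), ]

# 4. Duplicates: keep the first row for each order_id.
clean <- clean[!duplicated(clean$order_id), ]

print(clean)
```

In practice, flagged outliers would be inspected before being dropped, since an unusually high price can be a genuine premium product rather than an entry error.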

Data Transformation and Aggregation

Some transformations and aggregations were made to answer the research questions and increase the
clarity of the visualizations:

• Revenue aggregation by date: Revenue was summed for each unique order_date to obtain total
daily revenue. This aggregation supported the time series analysis of revenue.
• Categorical grouping: Where applicable, products were grouped by size or color to clearly
identify customer preferences and facilitate bar chart visualization.
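
As a rough illustration of these two aggregations, the sketch below uses base R's aggregate() on a few made-up rows. The object names daily_revenue and units_by_group are hypothetical, chosen only for this example.

```r
# Made-up cleaned rows for illustration.
sales <- data.frame(
  order_date = as.Date(c("2022-06-01", "2022-06-01", "2022-06-02", "2022-06-03")),
  color      = c("Black", "Navy", "Black", "Black"),
  size       = c("M", "L", "M", "S"),
  quantity   = c(1, 2, 1, 3),
  revenue    = c(250, 520, 250, 780)
)

# Total revenue for each unique order date (input for a time series plot).
daily_revenue <- aggregate(revenue ~ order_date, data = sales, FUN = sum)

# Units sold for each color and size combination (input for a bar chart).
units_by_group <- aggregate(quantity ~ color + size, data = sales, FUN = sum)

print(daily_revenue)
print(units_by_group)
```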

5. Data Visualization

In this section, I describe the visualizations planned for analyzing the
women_clothing_ecommerce_sales dataset. These visualizations will be created in R and designed to be
highly informative, providing insights into sales trends and consumer preferences related to
womenswear.

Planned Visualizations and Their Purposes

Bar Chart: Popular colors and sizes

• Purpose: This bar chart will show the distribution of products sold across colors and sizes, and
will help answer the question of which colors and sizes are most popular with customers.
• Visualization Type: Stacked bar chart of color and size.
• Communication: Using contrasting colors and clear fonts will make this chart easier for
viewers to interpret quickly.
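
One possible way to draw such a chart is sketched below with base R graphics. The data are made up, and the choice of base barplot() (rather than a plotting package) and of the "Dark 3" palette are assumptions made to keep the example self-contained.

```r
# Made-up order lines for illustration.
sales <- data.frame(
  color    = c("Black", "Black", "Navy", "Navy", "Red"),
  size     = c("M", "S", "M", "L", "M"),
  quantity = c(3, 1, 2, 2, 1)
)

# Units sold cross-tabulated: rows are sizes, columns are colors.
units <- xtabs(quantity ~ size + color, data = sales)

# With a matrix input, barplot() stacks the rows (sizes) within each column (color).
barplot(
  units,
  col         = hcl.colors(nrow(units), palette = "Dark 3"),
  legend.text = TRUE,
  xlab        = "Color",
  ylab        = "Units sold",
  main        = "Units sold by color and size"
)
```

Stacking sizes within each color keeps the total per color readable while still showing the size mix.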

Line Plot: Revenue Over Time

• Purpose: The line plot will show how revenue changes from day to day, highlighting any
seasonal trend or period of high sales.
• Visualization Type: Line plot with revenue on the y-axis and order date on the x-axis.
• Consistency: This plot will use simple fonts and clear date tick marks, so that sales over time
are easy to read.
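
A minimal sketch of this line plot in base R is shown below. The daily totals are invented, and marking the peak day with a red point is an illustrative choice rather than part of the original plan.

```r
# Made-up daily revenue totals for one week.
daily_revenue <- data.frame(
  order_date = as.Date("2022-06-01") + 0:6,
  revenue    = c(770, 250, 780, 400, 650, 900, 300)
)

# Revenue on the y-axis against order date on the x-axis.
plot(
  daily_revenue$order_date, daily_revenue$revenue,
  type = "l",
  xlab = "Order date",
  ylab = "Revenue",
  main = "Daily revenue over time"
)

# Highlight the day with the highest revenue.
peak <- daily_revenue[which.max(daily_revenue$revenue), ]
points(peak$order_date, peak$revenue, pch = 19, col = "red")
```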

Scatter plot: Unit price versus sales volume

• Purpose: This scatter plot will reveal the relationship between product prices and sales volume,
and provide insight into the effectiveness of pricing strategies.
• Visualization Type: Scatter plot with a trend line to show the relationship.
• Consistency: The scatter plot will use distinguishable point colors and sizes to increase
readability.
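
The scatter plot with a trend line could be sketched as follows. The data points are invented, and using a straight-line fit from lm() as the trend line is an assumption, since the text does not specify which kind of trend line is intended.

```r
# Made-up price and quantity pairs for illustration.
sales <- data.frame(
  unit_price = c(200, 220, 250, 260, 280, 300),
  quantity   = c(5, 4, 4, 3, 2, 1)
)

# Assumed trend line: ordinary least squares fit of quantity on unit price.
fit <- lm(quantity ~ unit_price, data = sales)

plot(
  sales$unit_price, sales$quantity,
  pch  = 19,
  col  = "steelblue",
  xlab = "Unit price",
  ylab = "Quantity sold",
  main = "Unit price versus quantity sold"
)
abline(fit, col = "red", lwd = 2)

# Strength and direction of the linear relationship.
price_qty_cor <- cor(sales$unit_price, sales$quantity)
print(price_qty_cor)
```

A negative slope would suggest that cheaper items sell in larger quantities, although a correlation in order data alone does not establish that price causes the difference.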

Pie chart: Revenue from top sources

• Purpose: A pie chart will show the contribution of the highest-grossing products, helping to
identify high-value products.
• Visualization Type: A pie chart showing the share of total revenue by product.
• Engagement: Using bright, contrasting colors for each slice will make the chart more
visually appealing and informative.
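
A sketch of the revenue-share pie chart is given below. The revenue figures are made up, and both the cut-off of three top products and the "Other" slice for the remainder are assumptions added so that the slices sum to total revenue.

```r
# Made-up order lines for illustration.
sales <- data.frame(
  sku     = c("A", "B", "A", "C", "B", "D"),
  revenue = c(500, 300, 250, 150, 200, 100)
)

# Total revenue per product, largest first.
rev_by_sku <- sort(tapply(sales$revenue, sales$sku, sum), decreasing = TRUE)

# Keep the top three products and pool the rest so shares refer to total revenue.
top    <- head(rev_by_sku, 3)
slices <- c(top, Other = sum(rev_by_sku) - sum(top))
share  <- round(100 * slices / sum(slices), 1)

pie(
  slices,
  labels = paste0(names(slices), " (", share, "%)"),
  col    = hcl.colors(length(slices), palette = "Dark 3"),
  main   = "Share of total revenue by product"
)
```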

Visual Appeal

For greater visibility, each graph will be designed with simplicity and clarity in mind. Clear fonts,
titles, and legends will be included to facilitate interpretation, and the choice of colors will ensure
that each graphic is visually appealing without overwhelming the audience.

6. Conclusion

To summarize, this project analyzed and visualized an e-commerce sales dataset for a women’s clothing
brand to gain useful insights into key business areas. We examined trends in customer preference,
product performance, and revenue flows through several types of visualization. The bar chart identified
the most popular colors and sizes, which is crucial information for inventory planning, while the line
plot of revenue over time emphasized peak sales periods. The scatter plot of unit price against
quantity sold illustrated how pricing affects sales, and the pie chart of top-selling products showed
how revenue is distributed across products. These visualizations were more accessible and easier to
read than the underlying data, and they pointed out trends that could help inventory management,
pricing strategies, and marketing. Overall, the project emphasized the value of choosing a relatable
dataset and suitable visuals to substantiate business decisions. Future work could build on this
analysis to segment customers and implement predictive modeling.

7. Appendix

RStudio Analysis Screenshots
