Coding and Communication in Statistics Presentation 2024
Student Name:
Student ID:
Course/Unit Name:
Instructor’s Name:
Table of Contents
Introduction
Project Requirements
Data Visualization
Conclusion
Appendix
1. Introduction
The purpose of this assignment is to demonstrate effective data analysis, data visualization, and
communication skills by presenting a story based on a selected data set. The assignment evaluates the
ability to use data analysis to address specific research questions and to deliver simple, engaging
findings to an audience of peers with similar mathematical backgrounds. The tasks are to:
Identify an appropriate data set and formulate one or more research questions.
Perform data cleansing and transformation as needed to prepare data sets for analysis.
Use R programming to create visuals that support the story and highlight key insights.
Create a slide presentation using Markdown to visually represent the data story and clearly
communicate the findings.
In addition, the presentation will be recorded for evaluation in a concise, streamlined format of
approximately 5 minutes. The submission is a zip file containing the R project files, the slide
presentation, the source files, and the recorded presentation.
2. Project Requirements
Submission format: Zip file containing the R project, slide presentation, source files, and recording.
Presentation: Case analysis presented in an informative, peer-accessible format.
Running Time: About 5 minutes.
Software and tools: R programming, Markdown for slides, Zoom or OBS Studio for recording.
Dataset restrictions: Specific designated datasets such as Mario Kart and PISA are not used.
Dataset
The dataset used for this study is women_clothing_ecommerce_sales. It contains sales data from an
e-commerce store that specializes in selling women’s clothing. The dataset includes several key
attributes that capture important information about each order, in columns such as order_id,
order_date, sku, color, size, unit_price, quantity, and revenue.
Research Question(s)
The women_clothing_ecommerce_sales dataset was chosen for its relevance to general retail performance
questions and its ability to tell a data-driven story about e-commerce growth. The dataset enables
analysis of both business performance indicators (such as revenue and sales volume) and consumer buying
patterns. In addition, the availability of categorical variables such as color and size provides
opportunities for a variety of attractive visuals that support a compelling story and actionable
ideas.
In this work, the following data cleaning and preparation steps were performed to ensure accuracy and
suitability for visual analysis.
Missing values: Checked for missing values in key columns such as order_date, sku, color, size,
unit_price, quantity, and revenue. Rows with missing values in critical fields such as order_date
or sku were removed to maintain data integrity, while missing values in non-critical fields were
kept or omitted where necessary.
Date conversion: order_date was converted to Date format to enable proper time-series analysis.
Numeric conversion: Columns such as unit_price and quantity were converted to numeric type to
facilitate calculation and aggregation.
Outlier identification and removal: Outliers in unit_price and quantity were identified using
interquartile range (IQR) analysis. Extreme outliers that might indicate errors (e.g., implausibly
high unit_price values) were checked and excluded to prevent distortion in the visualizations.
Duplicate entries: Checked for duplicate rows, focusing especially on order_id, to avoid
redundant data in the plots. Any duplicate orders were removed.
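The cleaning steps above can be sketched in base R as follows. This is a minimal illustration under
stated assumptions, not the project's actual script: the toy rows are invented, the helper names
clean_sales and within_iqr are hypothetical, the date strings are assumed to be in ISO format
("%Y-%m-%d"), and the 1.5 x IQR fence is the conventional choice, which this report does not specify.

```r
# Toy rows standing in for women_clothing_ecommerce_sales (all values invented).
sales_raw <- data.frame(
  order_id   = c(1, 2, 2, 3, 4, 5, 6, 7),
  order_date = c("2022-06-01", "2022-06-01", "2022-06-01", NA,
                 "2022-06-02", "2022-06-03", "2022-06-03", "2022-06-04"),
  sku        = c("A1", "B2", "B2", "C3", "A1", "B2", "C3", "A1"),
  color      = c("Red", "Blue", "Blue", "Red", NA, "Blue", "Red", "Red"),
  size       = c("M", "L", "L", "S", "M", "M", "S", "L"),
  unit_price = c("250", "265", "265", "278", "255", "260", "9999", "270"),
  quantity   = c("1", "2", "2", "1", "1", "1", "1", "2"),
  stringsAsFactors = FALSE
)

# TRUE for values inside [Q1 - k * IQR, Q3 + k * IQR]; NA values are kept.
# k = 1.5 is an assumed, conventional multiplier.
within_iqr <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- k * (q[2] - q[1])
  is.na(x) | (x >= q[1] - fence & x <= q[2] + fence)
}

clean_sales <- function(df) {
  # Missing values: drop rows that lack a critical field.
  df <- df[!is.na(df$order_date) & !is.na(df$sku), ]

  # Type conversion: dates to Date, price and quantity to numeric.
  df$order_date <- as.Date(df$order_date, format = "%Y-%m-%d")
  df$unit_price <- as.numeric(df$unit_price)
  df$quantity   <- as.numeric(df$quantity)

  # Outliers: keep rows whose price and quantity both fall inside the IQR fences.
  df <- df[within_iqr(df$unit_price) & within_iqr(df$quantity), ]

  # Duplicates: keep the first row for each order_id.
  df <- df[!duplicated(df$order_id), ]
  rownames(df) <- NULL
  df
}

sales_clean <- clean_sales(sales_raw)
print(sales_clean)
```

In this sketch the row with a missing color is retained because color is treated as a non-critical
field, while the row with a missing order_date is dropped.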
Some modifications and combinations were made to answer the research questions and increase the
clarity of the visualization:
Revenue aggregation by date: Revenue was summed for each unique order_date to obtain total
daily revenue. This aggregation supported the time-series analysis of revenue.
Categorical grouping: Additional groupings were made where applicable, such as grouping
products by size or color, to clearly identify customer preferences and facilitate bar chart
visualization.
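The two aggregations described above could look like this in base R. The sketch assumes a cleaned
data frame with order_date, color, size, quantity, and revenue columns; the rows below are invented
for illustration, and aggregate() is only one of several ways to compute these summaries.

```r
# Invented example rows; column names follow the dataset description.
orders <- data.frame(
  order_date = as.Date(c("2022-06-01", "2022-06-01", "2022-06-02",
                         "2022-06-03", "2022-06-03")),
  color      = c("Red", "Blue", "Red", "Blue", "Red"),
  size       = c("M", "L", "M", "M", "S"),
  quantity   = c(1, 2, 1, 1, 3),
  revenue    = c(250, 530, 255, 260, 810),
  stringsAsFactors = FALSE
)

# Total revenue per day, used for the revenue-over-time line plot.
daily_revenue <- aggregate(revenue ~ order_date, data = orders, FUN = sum)

# Units sold per color and per size, used for the bar charts.
units_by_color <- aggregate(quantity ~ color, data = orders, FUN = sum)
units_by_size  <- aggregate(quantity ~ size,  data = orders, FUN = sum)

print(daily_revenue)
print(units_by_color)
print(units_by_size)
```

Each result is a small data frame with one row per group, which can be passed directly to a
plotting function.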
5. Data Visualization
This section describes the visualizations planned for analyzing the
women_clothing_ecommerce_sales dataset. These visualizations will be created in R and designed to be
highly informative, providing insights into sales trends and consumer preferences related to
womenswear.
Visual Appeal
For readability, each graph will be designed with simplicity and clarity in mind. Legible fonts, titles,
and legends will be included to facilitate interpretation, and the choice of colors will ensure that
each graphic is visually appealing without overwhelming the audience.
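As an illustration of these design choices, the sketch below draws a grouped bar chart with a title,
axis labels, a legend, and a restrained single-hue palette. It uses base R graphics so that it has
no package dependencies; the plotting library used in the actual project is not stated here. The
counts, the function name plot_units, and the hex colors are all hypothetical.

```r
# Invented counts: rows are sizes, columns are colors.
units <- matrix(
  c(4, 6, 2,
    3, 5, 1),
  nrow = 3,
  dimnames = list(size = c("S", "M", "L"), color = c("Blue", "Red"))
)

plot_units <- function(units) {
  # One shade per size, light to dark (palette covers up to three sizes).
  size_cols <- c("#c6dbef", "#6baed6", "#08519c")[seq_len(nrow(units))]

  mids <- barplot(
    units,
    beside      = TRUE,                      # side-by-side bars within each color
    col         = size_cols,
    border      = NA,                        # no outlines, less visual clutter
    main        = "Units sold by color and size",
    xlab        = "Color",
    ylab        = "Units sold",
    las         = 1,                         # horizontal axis labels
    legend.text = rownames(units),
    args.legend = list(title = "Size", bty = "n")
  )
  invisible(mids)                            # bar midpoints, e.g. for adding labels
}

plot_units(units)
```

When run non-interactively, R writes the figure to its default graphics device; in an R Markdown
slide the same call would render the chart inline.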
6. Conclusion
To summarize, this project analyzed and visualized an e-commerce sales dataset for a women’s clothing
brand in order to gain useful insights into key business areas. We examined trends in customer
preference, product performance, and revenue flows through different types of visualizations. The bar
chart showed which colors and sizes are popular, which is important for inventory planning, while the
line plot of revenue over time highlighted peak sales periods. The scatter plot of unit price against
quantity sold illustrated how pricing relates to sales, and the pie chart of top-selling products showed
how revenue is distributed across products. These visualizations were more accessible and easier to
read than the underlying data, and they pointed out trends that could inform inventory management,
pricing strategies, and marketing. Overall, the project emphasized the value of choosing relatable
datasets and appropriate visuals to support business decisions. Future work could build on this
analysis with customer segmentation and predictive modeling.
7. Appendix