Data_Curation_Example_2

The document provides an overview of a dataset containing Amazon UK products, comprising 1,680,129 rows and 10 columns, sourced from Kaggle. It details the data curation process, including data wrangling, feature engineering, and recorded inconsistencies, while highlighting key attributes such as product categories and revenue potential. Additionally, it outlines the data table schema, specifying the types and descriptions of various fields within the dataset.

Uploaded by

Osama Rashayda
Copyright
© All Rights Reserved

Revenue Potential in Amazon UK
Data Curation by Anonymous

General Dataset Information:


File Name: amz_uk_processed_data.csv
Description: Amazon UK Products Dataset 2023
Dataset Details: 1,680,129 Rows & 10 Columns
Size: 635,804 KB (620 MB)
Source: Kaggle - Dataset Link

Data Profile:
• Import the dataset into a SQLite database using DBeaver
• Examine the dataset through SQL
• Query the essential data associated with the problem statements
• Export the data into Excel
• Apply Excel filters to the dataset
• Examine each attribute's unique values for inconsistencies
• Notable features (prior to export):
    • 217 unique product categories
    • A large number of records for each category
    • reviews: concrete evidence of buyers
    • isBestSeller: a possible gauge of product popularity
    • boughtInLastMonth: a possible gauge of potential revenue
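
The import-and-query steps above can also be scripted. The following is a minimal sketch using Python's standard `csv` and `sqlite3` modules rather than DBeaver; the table name `products`, the helper `load_rows`, and the three sample rows are assumptions made for illustration and stand in for amz_uk_processed_data.csv.

```python
import csv
import io
import sqlite3

# Hypothetical three-row sample standing in for amz_uk_processed_data.csv.
SAMPLE_CSV = """asin,title,stars,reviews,price,isBestSeller,boughtInLastMonth,categoryName
A1,Paint set,4.5,120,12.99,False,300,Arts & Crafts
A2,Sketch pad,4.2,15,6.50,False,50,Arts & Crafts
A3,USB cable,4.7,900,4.99,True,2000,Electronics
"""


def load_rows(conn, csv_file, table="products"):
    """Create `table` from the CSV header and insert every row as text."""
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    return len(header)


conn = sqlite3.connect(":memory:")
load_rows(conn, io.StringIO(SAMPLE_CSV))

# Count records per category, the kind of profiling query run before export.
category_counts = conn.execute(
    "SELECT categoryName, COUNT(*) FROM products "
    "GROUP BY categoryName ORDER BY COUNT(*) DESC"
).fetchall()
print(category_counts)
```

For the real file, the `io.StringIO` sample would be replaced by an open file handle, and the database path by a file on disk so that DBeaver can open the same database.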

Recorded Inconsistency:
• Some records contain text strings in numerical fields.
• Exaggerated unit prices are suspected to be outliers and test listings.
• Many records have exaggerated unit prices and no reviews.
• Some records have exaggerated unit prices and many reviews.
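
The checks behind these observations might be sketched as follows. This is an illustration only: the document does not state what counts as an exaggerated price, so `PRICE_THRESHOLD` is a hypothetical cut-off, and `is_number` and `flag_record` are hypothetical helper names.

```python
import math

NUMERIC_FIELDS = ("stars", "reviews", "price", "boughtInLastMonth")
PRICE_THRESHOLD = 1000.0  # hypothetical cut-off; the source gives no figure


def is_number(value):
    """Return True when `value` parses as a finite float."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return False
    return math.isfinite(number)


def flag_record(record):
    """Label one raw record with the inconsistency it shows, if any."""
    bad_fields = [f for f in NUMERIC_FIELDS if not is_number(record.get(f))]
    if bad_fields:
        return "non-numeric: " + ", ".join(bad_fields)
    if float(record["price"]) > PRICE_THRESHOLD:
        if float(record["reviews"]) == 0:
            return "high price, no reviews"
        return "high price, has reviews"
    return "ok"


# Made-up records, one per case described in the list above.
records = [
    {"stars": "4.5", "reviews": "120", "price": "12.99", "boughtInLastMonth": "300"},
    {"stars": "0", "reviews": "0", "price": "99999.00", "boughtInLastMonth": "0"},
    {"stars": "4.8", "reviews": "2500", "price": "4500.00", "boughtInLastMonth": "10"},
    {"stars": "4.0", "reviews": "N/A", "price": "9.99", "boughtInLastMonth": "5"},
]
flags = [flag_record(r) for r in records]
print(flags)
```

Keeping "high price, no reviews" separate from "high price, has reviews" mirrors the distinction drawn above: the first group is the more likely home of test listings, while the second may contain genuine expensive products.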
Data Wrangling:
• Using SQL, sorted attributes by field type. Some attributes contained text strings where numerical values were expected. 10 records were filtered.
• Engineered a “RevenueByReviews” column by multiplying the reviews and price attributes.
• Engineered a “PotentialRevenue” column by multiplying the boughtInLastMonth and price attributes.
• Engineered a “Year Recorded” column by extracting the year from the “Date Recorded” attribute.
• Filtered “CategoryName” for records that pertain to “Art Supply”. The Kids' Art & Craft Supplies, Handmade Artwork, Arts & Crafts, and Handmade Gifts categories were kept.
• Isolated the matching CategoryName records and exported the results to Excel. 46,324 records were isolated.
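
The two revenue columns and the category filter can be expressed in a single SQL query. The sketch below runs it through Python's `sqlite3` module against three made-up rows; the table name `products` and the sample values are assumptions, and the “Year Recorded” step is left out because the format of “Date Recorded” is not given.

```python
import sqlite3

# The four categories treated as "Art Supply" in the wrangling steps.
ART_CATEGORIES = (
    "Kids' Art & Craft Supplies",
    "Handmade Artwork",
    "Arts & Crafts",
    "Handmade Gifts",
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products ("
    "asin TEXT, reviews INTEGER, price REAL, "
    "boughtInLastMonth INTEGER, categoryName TEXT)"
)
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?, ?)",
    [
        ("A1", 100, 10.0, 50, "Arts & Crafts"),
        ("A2", 20, 5.0, 0, "Handmade Gifts"),
        ("A3", 900, 4.0, 2000, "Electronics"),
    ],
)

# One "?" per category keeps the names out of the SQL string itself.
placeholders = ", ".join("?" for _ in ART_CATEGORIES)
art_rows = conn.execute(
    "SELECT asin, "
    "reviews * price AS RevenueByReviews, "
    "boughtInLastMonth * price AS PotentialRevenue "
    "FROM products "
    f"WHERE categoryName IN ({placeholders}) "
    "ORDER BY asin",
    ART_CATEGORIES,
).fetchall()
print(art_rows)
```

Passing the category names as query parameters avoids having to escape the apostrophe in "Kids' Art & Craft Supplies" by hand.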

Data Table Schema:

Field              Type     Description
ASIN               STRING   Product identifier from Amazon
title              STRING   Title of the product
imgUrl             STRING   URL of the product image
productURL         STRING   URL of the product's Amazon page
stars              REAL     Product rating
reviews            INTEGER  Number of reviews
price              REAL     Current unit price of the product
isBestSeller       BOOLEAN  Whether the product had the Amazon BestSeller status
boughtInLastMonth  INTEGER  Number of products bought last month, according to Amazon
categoryName       STRING   Name of the category this product belongs to
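
As one possible way to apply this schema in SQLite, the sketch below builds a `CREATE TABLE` statement from the ten fields. The mapping of STRING to `TEXT` and BOOLEAN to `INTEGER` (0 or 1) is an assumption based on SQLite's storage classes, and the table name `products` and the helper `create_table_sql` are hypothetical.

```python
import sqlite3

# Field names from the schema, mapped to SQLite column types.
# STRING -> TEXT and BOOLEAN -> INTEGER are assumptions for SQLite.
SCHEMA = {
    "ASIN": "TEXT",
    "title": "TEXT",
    "imgUrl": "TEXT",
    "productURL": "TEXT",
    "stars": "REAL",
    "reviews": "INTEGER",
    "price": "REAL",
    "isBestSeller": "INTEGER",
    "boughtInLastMonth": "INTEGER",
    "categoryName": "TEXT",
}


def create_table_sql(table, schema):
    """Build a CREATE TABLE statement from a {column: type} mapping."""
    columns = ", ".join(f'"{name}" {sql_type}' for name, sql_type in schema.items())
    return f'CREATE TABLE "{table}" ({columns})'


conn = sqlite3.connect(":memory:")
conn.execute(create_table_sql("products", SCHEMA))

# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(products)")]
print(columns)
```

Declaring the numeric columns up front makes stray text values easier to spot after import, since they remain stored as text instead of being converted.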
