Data_Curation_Example_2
Amazon UK
Data Curation by Anonymous
Data Profile:
• Import the dataset into a SQLite database using DBeaver
• Examine the dataset with SQL
• Query the data relevant to the problem statements
• Export the data to Excel
• Apply Excel filters to the dataset
• Examine each attribute's unique values for inconsistencies
• Notable Features (Prior to Export)
• 217 unique product categories
• Large number of records in each category
• Reviews - Concrete evidence of buyers
• isBestSeller - Possible gauge of product popularity
• broughtInLastMonth - Possible gauge of potential revenue
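The category counts above could be reproduced with a query along the following lines. This is a minimal sketch, not the query actually used: the table name `products`, the column name `categoryName`, and the three toy rows are assumptions made for illustration, and Python's built-in `sqlite3` module stands in for the DBeaver session.

```python
import sqlite3

# Hypothetical stand-in for the imported dataset. The table name `products`
# and the column `categoryName` are assumptions; the rows are toy data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products "
    "(title TEXT, categoryName TEXT, reviews INTEGER, isBestSeller INTEGER)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [
        ("Kettle", "Kitchen", 120, 1),
        ("Toaster", "Kitchen", 15, 0),
        ("Desk lamp", "Lighting", 0, 0),
    ],
)

# Number of distinct product categories in the table.
unique_categories = conn.execute(
    "SELECT COUNT(DISTINCT categoryName) FROM products"
).fetchone()[0]

# Record count per category, largest first (ties broken alphabetically).
rows_per_category = conn.execute(
    "SELECT categoryName, COUNT(*) FROM products "
    "GROUP BY categoryName ORDER BY COUNT(*) DESC, categoryName"
).fetchall()

print(unique_categories)
print(rows_per_category)
```

The same two `SELECT` statements can be pasted into a DBeaver SQL editor once the table and column names are adjusted to match the imported dataset.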
• Recorded Inconsistencies:
• Some records consist of text strings in numerical fields.
• Exaggerated unit prices are suspected to be outliers and test
listings.
• Many records have exaggerated unit prices with no reviews.
• Some records have exaggerated unit prices with many reviews.
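One way to separate the two groups of exaggerated prices noted above is to flag every record over a price cutoff and label it by whether it has reviews. The sketch below is illustrative only: the cutoff `PRICE_THRESHOLD`, the table name `products`, the column names `price` and `reviews`, and the toy rows are all assumptions, since the notes do not state how "exaggerated" was defined.

```python
import sqlite3

# Hypothetical cutoff for an "exaggerated" unit price; the notes give no
# numeric definition, so this value is purely illustrative.
PRICE_THRESHOLD = 10_000.0

# Toy stand-in for the dataset; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (title TEXT, price REAL, reviews INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        ("Kettle", 24.99, 120),
        ("Placeholder item", 99999.0, 0),
        ("Luxury watch", 15000.0, 340),
    ],
)

# Flag records above the cutoff and label each by review presence.
# High price with no reviews is the pattern suspected of being a test listing.
suspected = conn.execute(
    """
    SELECT title,
           CASE WHEN reviews = 0 THEN 'no reviews' ELSE 'has reviews' END
    FROM products
    WHERE price > ?
    ORDER BY title
    """,
    (PRICE_THRESHOLD,),
).fetchall()

print(suspected)
```

Records labelled 'has reviews' may be genuine high-value products rather than test listings, so they probably deserve a manual check before being dropped.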
• Data Wrangling:
• Using SQL, sorted each attribute by field type. Some numerical
attributes contain text strings; the 10 affected records were
filtered out.
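Text values in a numerical field can also be located directly with SQLite's built-in `typeof()` function, instead of sorting and inspecting by eye. The sketch below assumes a table named `products` with a `price` column declared as `REAL`; both names and the toy rows are hypothetical, and this is not necessarily how the 10 records were found.

```python
import sqlite3

# Toy stand-in for the dataset; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (title TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [
        ("Kettle", 24.99),
        ("Broken row", "N/A"),  # text that cannot be converted to a number
        ("Toaster", 19.5),
    ],
)

# SQLite stores non-numeric text as TEXT even in a REAL column, so
# typeof() exposes the rows whose stored value is not a number.
bad_rows = conn.execute(
    "SELECT title, price FROM products WHERE typeof(price) = 'text'"
).fetchall()

# Filter the offending rows out and count what remains.
conn.execute("DELETE FROM products WHERE typeof(price) = 'text'")
remaining = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]

print(bad_rows)
print(remaining)
```

Note that a text value that looks numeric, such as '12.5', is converted to a number when inserted into a `REAL` column, so this check only catches values that are not numeric at all.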