Individual Coursework (Replacing In-Class Test) : Big Data (6CS030)
Student Id :
Student Name : Robin KC
Cohort/Batch :4
Submitted to :
Submitted on : <dd-mm-yy>
1. Report
2. Sample data
The employee ‘Steven King’ has no MANAGER_ID value, as shown in the following
figure.
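A check like this could be sketched in Python as below. This is only a minimal illustration, not the method used in this coursework: the CSV text, the helper name `rows_missing` and every row other than Steven King's empty MANAGER_ID are assumptions made for the example.

```python
import csv
import io

# Illustrative sample only: the second row is made up for the example.
# Only Steven King's empty MANAGER_ID reflects the report.
SAMPLE_CSV = """FIRST_NAME,LAST_NAME,MANAGER_ID
Steven,King,
Example,Employee,100
"""


def rows_missing(rows, column):
    """Return the rows whose value in `column` is empty or whitespace only."""
    return [row for row in rows if not (row.get(column) or "").strip()]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
missing = rows_missing(rows, "MANAGER_ID")

# Print the name of every employee without a manager.
for row in missing:
    print(row["FIRST_NAME"], row["LAST_NAME"])
```

The same helper can be pointed at any other column name to list the rows where that column is empty.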
2.1.2 Outliers
An outlier is a value that falls outside the expected range for its field. This type
of data problem can be seen in the ‘SALARY’ field.
Here, Sigal Tobias has a salary of 128000, which is out of range for the JOB_ID
‘PU_CLERK’.
This problem can also be seen in the ‘COMMISSION_PCT’ field.
In COMMISSION_PCT, most of the values lie in the range 0 to 1, so the values 8500
and 150 are treated as outliers.
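Both checks could be sketched in Python as follows. This is a rough illustration under stated assumptions: the function names, the rule "more than 10 times the median salary of the same JOB_ID" and all sample values except 128000, 8500 and 150 are hypothetical and chosen only for the example.

```python
from collections import defaultdict
from statistics import median


def commission_outliers(values, low=0.0, high=1.0):
    """Return the values that fall outside the range low..high."""
    return [v for v in values if v is not None and not (low <= v <= high)]


def salary_outliers(records, ratio=10.0):
    """Return records whose salary exceeds `ratio` times the median
    salary of their JOB_ID group (the ratio is an assumed threshold)."""
    by_job = defaultdict(list)
    for _name, job_id, salary in records:
        by_job[job_id].append(salary)
    medians = {job_id: median(salaries) for job_id, salaries in by_job.items()}
    return [
        (name, job_id, salary)
        for name, job_id, salary in records
        if salary > ratio * medians[job_id]
    ]


# Made-up PU_CLERK rows, apart from Sigal Tobias's 128000 from the report.
employees = [
    ("Employee A", "PU_CLERK", 2500),
    ("Employee B", "PU_CLERK", 2600),
    ("Employee C", "PU_CLERK", 2800),
    ("Employee D", "PU_CLERK", 3100),
    ("Sigal Tobias", "PU_CLERK", 128000),
]
flagged_salaries = salary_outliers(employees)

# Made-up commission values, apart from 8500 and 150 from the report.
flagged_commissions = commission_outliers([0.1, 0.25, 8500, 0.3, 150])

print(flagged_salaries)
print(flagged_commissions)
```

The median is used instead of the mean because a single extreme salary shifts the mean strongly but barely moves the median.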
In the ‘HIREDATE’ field, the hire dates of different employees appear in different
formats, as shown below. This is also a data problem, and a single standard date
format should be used for every record.
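One possible way to bring such dates into a single format is sketched below in Python. The list of input formats and the sample dates are assumptions, because the exact formats in the data set are only visible in the figure. ISO 8601 (YYYY-MM-DD) is used here as the target format purely as an example.

```python
from datetime import datetime

# Assumed input formats; adjust to the formats actually present in the data.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%m-%Y")


def to_iso(date_text):
    """Return the date as YYYY-MM-DD, or None if no assumed format matches."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(date_text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    return None


# Hypothetical hire dates written in three different styles.
for raw in ["2003-06-17", "17/06/2003", "17-06-2003", "June 2003"]:
    print(raw, "->", to_iso(raw))
```

Values that return None match none of the assumed formats and would need to be checked by hand.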
3. Evidence
Output:
Output:
3.2.3 Total number of missing values in ‘COMMISSION_PCT’
Output: