data science 1 (1)
20201216
In the name of God, the Most Gracious, the Most Merciful
H.w #1
We first need to identify what problem Al-Azhar University has with its
data. By investigating the data records, we can choose suitable data
preprocessing methods to clean the data. For example, because of the
incident in Gaza, all data records related to the semester before the
incident went missing. To work around this problem, we can fill the
missing values with a global constant, use mean/mode imputation, or
apply another solution, depending on which data is missing. We also
need to detect whether any of the data has been corrupted and implement
the necessary fixes.
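The filling methods mentioned above can be sketched roughly as follows.
This is only a minimal illustration: the helper names (fill_constant,
fill_mean, fill_mode) and the sample columns are hypothetical and are not
real university records. Missing values are assumed to be stored as None.

```python
from statistics import mean, mode

def fill_constant(values, constant):
    """Replace every missing entry (None) with one fixed placeholder."""
    return [constant if v is None else v for v in values]

def fill_mean(values):
    """Replace missing numeric entries with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in values]

def fill_mode(values):
    """Replace missing entries with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    most_common = mode(observed)
    return [most_common if v is None else v for v in values]

# Hypothetical columns used only for illustration.
grades = [70, None, 90, 80, None]
majors = ["IT", "IT", None, "Law"]

print(fill_constant(grades, -1))  # global constant
print(fill_mean(grades))          # mean imputation for numeric data
print(fill_mode(majors))          # mode imputation for categorical data
```

Mean imputation only makes sense for numeric columns, while mode
imputation also works for categorical ones, so the choice depends on the
type of data that was lost.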
3) Select only one dataset, giving your opinion on the data and
justifying your answer:
a) x=(20,17,19,5,60,13,18,19,4,15,10,7,8)
b) y=(99,30,20,80,88,77,3,77,60,89,55,44)
If we calculate the mean, median, and range for both datasets, we find
that x is the better choice.
Dataset x:
sorted = 4,5,7,8,10,13,15,17,18,19,19,20,60
mean = 215/13 ≈ 16.54
median = 15
range = 60-4 = 56
Dataset y:
sorted = 3,20,30,44,55,60,77,77,80,88,89,99
mean = 722/12 ≈ 60.17
median = (60+77)/2 = 68.5
range = 99-3 = 96
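The mean, median, and range of both datasets can be checked with a short
script. This is a small sketch using Python's standard statistics module;
the summarize helper is a name introduced here only for illustration.

```python
from statistics import mean, median

x = [20, 17, 19, 5, 60, 13, 18, 19, 4, 15, 10, 7, 8]
y = [99, 30, 20, 80, 88, 77, 3, 77, 60, 89, 55, 44]

def summarize(data):
    """Return the mean (2 decimals), median, and range of a list of numbers."""
    return {
        "mean": round(mean(data), 2),
        "median": median(data),
        "range": max(data) - min(data),
    }

print("x:", summarize(x))  # x: {'mean': 16.54, 'median': 15, 'range': 56}
print("y:", summarize(y))  # y: {'mean': 60.17, 'median': 68.5, 'range': 96}
```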
After comparing the means, we find that dataset y has the higher mean
(about 60.17 versus about 16.54 for x). The median of y (68.5) is also
much higher than the median of x (15), because most of the values in y
are larger. The range of y (96) is larger than the range of x (56)
because of its extreme values 99 and 3, which affects consistency and
causes greater variability.
For these reasons, dataset x is better: even with its single outlier
(60), it has the smaller range and the lower variance.
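The claim that x has the lower variance can be verified directly. The
sketch below assumes population variance (pvariance); using the sample
variance instead would change the numbers slightly but not the
comparison. The spread helper is a name introduced here for illustration.

```python
from statistics import pvariance, pstdev

x = [20, 17, 19, 5, 60, 13, 18, 19, 4, 15, 10, 7, 8]
y = [99, 30, 20, 80, 88, 77, 3, 77, 60, 89, 55, 44]

def spread(data):
    """Return (population variance, population standard deviation)."""
    return pvariance(data), pstdev(data)

var_x, std_x = spread(x)
var_y, std_y = spread(y)

print("x: variance =", round(var_x, 2), "std dev =", round(std_x, 2))
print("y: variance =", round(var_y, 2), "std dev =", round(std_y, 2))
print("x has lower variance:", var_x < var_y)
```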