05_Data_Transformation_Exploration_Visualization
Eleni-Rosalina Andrinopoulou
Department of Biostatistics, Erasmus Medical Center
@erandrinopoulou
In this Section
▶ Data transformation
▶ Data exploration
▶ Data visualization
▶ A lot of practice
Data Transformation
▶ Round continuous variables
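Base R's round() takes the number of decimal places as its digits argument. The vectors below are made-up values, not taken from pbc. With the real data, a column such as pbc$bili or pbc$age would be rounded the same way.

```r
# Made-up bilirubin and age values (illustration only)
bili <- c(0.34, 1.27, 14.56, 3.08)
age  <- c(53.06, 56.41, 42.97, 47.75)

bili_1dp  <- round(bili, digits = 1)  # one decimal place
age_years <- round(age)               # digits = 0 (the default): whole years

bili_1dp   # 0.3 1.3 14.6 3.1
age_years  # 53 56 43 48

# Exact halves are rounded to the even digit (see ?round)
round(c(0.5, 1.5, 2.5))  # 0 2 2
```

A rounded copy loses information, so it is usually safer to round only for display or reporting and keep the original column for analysis.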
▶ Convert numeric variables to factors
▶ Sort a single variable (ascending by default)
head(sort(pbc$bili))
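Here is a minimal example of turning a numeric code into a factor with factor(). It assumes the pbc coding of trt described in the survival package documentation (1 = D-penicillamine, 2 = placebo, NA = not randomised). The values themselves are made up.

```r
# Made-up treatment codes using the pbc coding:
# 1 = D-penicillamine, 2 = placebo, NA = not randomised
trt <- c(2, 2, 1, NA, 1, 2)

# levels lists the codes and labels the matching names, in the same order
trt_f <- factor(trt, levels = c(1, 2),
                labels = c("D-penicillamine", "placebo"))
trt_f

table(trt_f, useNA = "ifany")  # D-penicillamine 2, placebo 3, <NA> 1

# as.integer() gives the position of each level (1 or 2), not a label
as.integer(trt_f)
```

Each code is matched to its label by position, so levels and labels must be listed in the same order. A code that is missing from levels silently becomes NA.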
▶ Sort the data set in either ascending or descending order
▶ The variable by which we sort can be a numeric, string or factor
head(pbc[order(pbc$bili), ])
id time status trt age sex ascites hepato spiders edema bili chol
8 8 2466 2 2 53.05681 f 0 0 0 0 0.3 280
36 36 3611 0 2 56.41068 f 0 0 0 0 0.3 172
163 163 2055 2 1 53.49760 f 0 0 0 0 0.3 233
84 84 4032 0 2 55.83025 f 0 0 0 0 0.4 263
108 108 2583 2 1 50.35729 f 0 0 0 0 0.4 127
135 135 3150 0 1 42.96783 f 0 0 0 0 0.4 263
albumin copper alk.phos ast trig platelet protime stage
8 4.00 52 4651.2 28.38 189 373 11.0 3
36 3.39 18 558.0 71.30 96 311 10.6 2
163 4.08 20 622.0 66.65 68 358 9.9 3
84 3.76 29 1345.0 137.95 74 181 11.2 3
108 3.50 14 1062.0 49.60 84 334 10.3 2
135 3.57 123 836.0 74.40 121 445 11.0 2
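The slide also mentions descending order and non-numeric sort variables. The example below uses a made-up mini data frame with pbc-like column names. sort() returns the sorted values of one vector. order() returns row positions, which can then index the whole data set.

```r
# Hypothetical mini data set with pbc-like columns (made-up values)
toy <- data.frame(
  id   = 1:5,
  sex  = c("f", "m", "f", "f", "m"),
  age  = c(53.1, 61.4, 42.9, 47.8, 55.0),
  bili = c(0.3, 14.4, 0.4, 2.1, 1.3)
)

# sort() returns the sorted values of a single vector
sort(toy$bili, decreasing = TRUE)      # 14.4 2.1 1.3 0.4 0.3

# order() returns row positions, so it can reorder the whole data set
toy_desc <- toy[order(toy$bili, decreasing = TRUE), ]
toy_desc$id                            # 2 4 5 3 1

# Character (and factor) variables can be sort keys too
toy_by_sex <- toy[order(toy$sex), ]
toy_by_sex$id                          # 1 3 4 2 5 (ties keep their original order)
```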
▶ Sort by more than one variable: ties in the first variable are broken by the second
▶ The variables by which we sort can be numeric, string or factor
head(pbc[order(pbc$bili, pbc$age), ])
id time status trt age sex ascites hepato spiders edema bili chol
8 8 2466 2 2 53.05681 f 0 0 0 0.0 0.3 280
163 163 2055 2 1 53.49760 f 0 0 0 0.0 0.3 233
36 36 3611 0 2 56.41068 f 0 0 0 0.0 0.3 172
135 135 3150 0 1 42.96783 f 0 0 0 0.0 0.4 263
320 320 2403 0 NA 44.00000 f NA NA NA 0.5 0.4 NA
168 168 2713 0 2 47.75359 f 0 1 0 0.0 0.4 257
albumin copper alk.phos ast trig platelet protime stage
8 4.00 52 4651.2 28.38 189 373 11.0 3
163 4.08 20 622.0 66.65 68 358 9.9 3
36 3.39 18 558.0 71.30 96 311 10.6 2
135 3.57 123 836.0 74.40 121 445 11.0 2
320 3.81 NA NA NA NA 226 10.5 3
168 3.80 44 842.0 97.65 110 NA 9.2 2
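Sometimes the second key should run in the opposite direction. A minus sign reverses a numeric key. For keys of any type, method = "radix" accepts one decreasing flag per key. The sketch below uses made-up values with ties in bili.

```r
# Hypothetical data with ties in bili (made-up values)
toy <- data.frame(
  id   = 1:6,
  bili = c(0.4, 0.3, 0.4, 0.3, 0.5, 0.4),
  age  = c(47.8, 56.4, 43.0, 53.1, 44.0, 50.4)
)

# Both keys ascending: ties in bili are broken by age
asc_both <- toy[order(toy$bili, toy$age), ]

# bili ascending, age descending: the minus sign reverses a numeric key
mixed <- toy[order(toy$bili, -toy$age), ]

# Same result without the minus trick (also works for character keys)
mixed_radix <- toy[order(toy$bili, toy$age,
                         decreasing = c(FALSE, TRUE), method = "radix"), ]

asc_both$id  # 4 2 3 1 6 5
mixed$id     # 2 4 6 1 3 5
```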
▶ Reshape data between wide and long format
?reshape
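Below is a small stats::reshape() example on a made-up long-format data set with two bilirubin measurements per patient. The id, visit and bili values are hypothetical. direction = "wide" spreads the v.names column into one column per value of timevar. direction = "long" stacks those columns back into one.

```r
# Hypothetical long format: one row per patient and visit (made-up values)
long <- data.frame(
  id    = c(1, 1, 2, 2, 3, 3),
  visit = c(1, 2, 1, 2, 1, 2),
  bili  = c(0.5, 0.7, 1.2, 1.9, 3.1, 2.8)
)

# Long -> wide: one row per id, one bili column per visit (bili.1, bili.2)
wide <- reshape(long, direction = "wide",
                idvar = "id", timevar = "visit", v.names = "bili")
wide

# Wide -> long: stack bili.1 and bili.2 back into a single bili column
back <- reshape(wide, direction = "long",
                idvar = "id", timevar = "visit", times = c(1, 2),
                varying = list(c("bili.1", "bili.2")), v.names = "bili")
back[order(back$id, back$visit), ]
```

The long result is stacked by visit rather than by patient, so it is re-sorted with order() for display.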
13
Data Exploration
▶ Hints
mean(pbc$age)
[1] 50.74155
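pbc$age has no missing values. Other pbc columns do: the sorted output in the transformation part shows NA in chol, copper and platelet, for example. For such columns, mean() returns NA unless na.rm = TRUE is set. Here are a few common exploration functions applied to made-up values:

```r
# Made-up ages and sexes, with one missing age (illustration only)
age <- c(53.1, 61.4, NA, 47.8, 55.0, 38.2)
sex <- c("f", "m", "f", "f", "m", "f")

mean(age)                             # NA: one missing value propagates
mean(age, na.rm = TRUE)               # 51.1, missing values dropped first
median(age, na.rm = TRUE)             # 53.1
summary(age)                          # quartiles, mean and the number of NA's
table(sex)                            # counts per category: f 4, m 2
tapply(age, sex, mean, na.rm = TRUE)  # mean age per sex
```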
Data Visualization
Take care!
[Figure: Serum bilirubin, one labelled segment per distinct value from 0.3 to 28]
[Figure: two panels of serum bilirubin, First vs Last, y axis from 0 to 20]
[Figure: serum bilirubin, First vs Last]
[Figure: serum bilirubin, First vs Last, y axis from 0.60 to 0.80]
[Figure: serum bilirubin, First vs Last, y axis from 10 to 25]
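These panels plot the same variable on different y-axis ranges, and the range alone changes how large the First to Last difference looks. One way to keep panels comparable is to compute a single range that includes zero and pass it to every panel through ylim. Below is an example with made-up values, using boxplot() as the plot type (an assumption):

```r
# Made-up bilirubin values at the first and last visit (illustration only)
first <- c(0.9, 1.4, 2.2, 0.7)
last  <- c(1.8, 3.5, 6.1, 1.2)

# One shared y-axis range, starting at zero, for both panels
ylims <- range(0, first, last)

op <- par(mfrow = c(1, 2))   # two panels side by side
boxplot(first, ylim = ylims, main = "First", ylab = "Serum bilirubin")
boxplot(last,  ylim = ylims, main = "Last",  ylab = "Serum bilirubin")
par(op)                      # restore the previous layout
```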
Continuous variables
[Figure: pbc$age on the x axis (30 to 80), y axis from 0 to 15]
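Here is a histogram drawn with hist(), using made-up ages and explicit 10-year bins. With the real data, the first argument would be pbc$age. hist() also returns the breaks and counts it used, which helps when checking the plot.

```r
# Made-up ages in years (illustration only)
age <- c(38.2, 42.9, 44.0, 47.8, 49.5, 50.4,
         53.1, 55.0, 56.4, 58.7, 61.4, 66.9)

h <- hist(age, breaks = seq(30, 80, by = 10),
          main = "Histogram of age", xlab = "Age (years)", col = "lightgrey")

h$breaks  # 30 40 50 60 70 80
h$counts  # 1 4 5 2 0 (intervals are right-closed, e.g. (40, 50])
```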
[Figure: age, axis from 30 to 80]
?plot
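?plot opens the help page for the generic plot() function. The example below sets a few commonly used arguments: title, axis labels, point symbol, colour and a log scale. The values are made up, and the log scale is an optional choice for a skewed variable such as bilirubin. See ?par for the full list of graphical parameters.

```r
# Made-up age and bilirubin values for six patients (illustration only)
age  <- c(38.2, 44.0, 47.8, 53.1, 56.4, 61.4)
bili <- c(0.4, 1.3, 0.9, 2.1, 3.5, 14.4)

plot(age, bili,
     main = "Bilirubin versus age",  # title
     xlab = "Age (years)",           # x-axis label
     ylab = "Serum bilirubin",       # y-axis label
     pch  = 19,                      # filled circles
     col  = "steelblue",             # point colour
     log  = "y")                     # log scale on the y axis
```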
Continuous variables per group
[Figure: age (30 to 80) per group]
[Figure: plot by sex, groups m and f]
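A formula such as age ~ sex produces one plot per group. The example below uses boxplot() on a made-up data frame. The exact plot type on the slide is not recoverable, so boxplot() is an assumption here.

```r
# Hypothetical mini data set (made-up values)
toy <- data.frame(
  age = c(53.1, 61.4, 42.9, 47.8, 55.0, 38.2, 66.9, 50.4),
  sex = factor(c("f", "m", "f", "f", "m", "f", "m", "f"))
)

# One box of age per level of sex
b <- boxplot(age ~ sex, data = toy,
             xlab = "sex", ylab = "age", col = c("pink", "lightblue"))

b$names      # group labels in factor-level order: "f" "m"
b$n          # observations per group: 5 3
b$stats[3, ] # group medians: 47.8 61.4
```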
[Figure: curves for male and female; x axis bili from −2 to 12, y axis from 0 to 0.15]
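Below, one density curve per group is overlaid on a single plot, using made-up bilirubin values. By default density() extends the curve cut = 3 bandwidths beyond the data. That is why the density of a positive variable such as bili can reach below zero on the x axis.

```r
# Made-up bilirubin values per sex (illustration only)
bili_m <- c(0.6, 0.9, 1.4, 2.2, 3.8, 0.8)
bili_f <- c(0.4, 0.7, 1.1, 1.0, 5.6, 0.9, 2.5, 0.5)

# Estimate each group's density separately
d_m <- density(bili_m)
d_f <- density(bili_f)

# Set the y range from both curves so neither is cut off
ymax <- max(d_m$y, d_f$y)
plot(d_m, ylim = c(0, ymax), col = "blue", main = "", xlab = "bili")
lines(d_f, col = "red")
legend("topright", legend = c("male", "female"),
       col = c("blue", "red"), lty = 1)

min(d_f$x)  # below 0, although all values are positive
```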
Categorical variables
[Figure: status categories censored, transplant and dead]
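Here is a bar chart of a categorical variable. It assumes the pbc status coding documented in the survival package (0 = censored, 1 = transplant, 2 = dead). The status values are made up.

```r
# Made-up status codes using the pbc coding (0 = censored, 1 = transplant, 2 = dead)
status <- c(0, 2, 0, 0, 1, 2, 0, 2, 0, 1)
status_f <- factor(status, levels = 0:2,
                   labels = c("censored", "transplant", "dead"))

counts <- table(status_f)          # censored 5, transplant 2, dead 3
barplot(counts, ylab = "Number of patients", col = "grey")

# Proportions are easier to compare across data sets of different size
prop.table(counts)                 # 0.5 0.2 0.3
```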
Summary