
Find Percentile Rank for Groups in an R Data Frame
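A percentile is a value below which a given percentage of the observations fall. For example, if a value lies at the 50th percentile, about 50 percent of the values lie at or below it. The number 50 here is the percentile rank of that value. To find the percentile rank within groups of an R data frame, we can group the data with group_by and then use the mutate function of the dplyr package, dividing the rank of each value by the number of values in its group. The result is a proportion between 0 and 1 (multiply by 100 for a percentage), and tied values share the average of their ranks.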
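Before working with groups, it may help to see the calculation on a single vector. The following is a minimal sketch, assuming percentile rank is taken as rank divided by the number of values (the approach used in the examples that follow). The vector x and the name pr are made up for illustration.

```r
# Hypothetical values; 20 appears twice to show how ties are handled
x <- c(10, 20, 20, 40)

# rank() gives tied values the average of their ranks by default: 1, 2.5, 2.5, 4
r <- rank(x)

# Dividing by the number of values gives a proportion between 0 and 1
pr <- r / length(x)
pr
```

Multiplying pr by 100 expresses the same result as a percentage.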
Example
Consider the below data frame −
# Group and Response are random draws, so your values will differ
Group <- sample(LETTERS[1:4], 20, replace = TRUE)
Response <- rpois(20, 5)
df1 <- data.frame(Group, Response)
df1
Output
   Group Response
1      D        5
2      B        7
3      D        5
4      C        4
5      D        5
6      C        5
7      A       10
8      D        3
9      B        2
10     D        0
11     B        4
12     D        5
13     A        3
14     A        6
15     D        2
16     A        7
17     A        6
18     C        2
19     A        9
20     C        3
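Because sample and rpois draw random values, running the code again gives a different data frame each time. One way to make the data repeatable is to call set.seed first. This is a sketch only: the helper make_df and the seed 123 are arbitrary choices for illustration and were not used to produce the output shown here.

```r
# Hypothetical helper: builds the example data frame from a fixed seed
make_df <- function(seed) {
  set.seed(seed)  # fixes the random number stream
  Group <- sample(LETTERS[1:4], 20, replace = TRUE)
  Response <- rpois(20, 5)
  data.frame(Group, Response)
}

first <- make_df(123)
second <- make_df(123)

# The same seed yields the same data frame
identical(first, second)
```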
Example
Loading dplyr package −
library(dplyr)
Finding percentile rank of response for Groups −
Example
df1 %>%
  group_by(Group) %>%
  mutate(Percentile_Rank = rank(Response) / length(Response))
Output
# A tibble: 20 x 3
# Groups:   Group [4]
   Group Response Percentile_Rank
   <chr>    <int>           <dbl>
 1 D            5           0.786
 2 B            7               1
 3 D            5           0.786
 4 C            4            0.75
 5 D            5           0.786
 6 C            5               1
 7 A           10               1
 8 D            3           0.429
 9 B            2           0.333
10 D            0           0.143
11 B            4           0.667
12 D            5           0.786
13 A            3           0.167
14 A            6           0.417
15 D            2           0.286
16 A            7           0.667
17 A            6           0.417
18 C            2            0.25
19 A            9           0.833
20 C            3             0.5
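The rank-divided-by-count formula is only one definition of percentile rank. dplyr also provides percent_rank(), which divides the number of values less than the current one by the group size minus 1, and cume_dist(), which gives the proportion of values less than or equal to the current one. The sketch below compares the three on a small hand-made data frame. The data frame scores and the column names are hypothetical and chosen only for illustration.

```r
library(dplyr)

# Hypothetical data with a tie in group A
scores <- data.frame(
  Group = c("A", "A", "A", "A", "B", "B"),
  Response = c(10, 20, 20, 40, 5, 7)
)

ranked <- scores %>%
  group_by(Group) %>%
  mutate(
    Avg_Rank_Prop = rank(Response) / n(),  # same formula as above; n() is the group size
    Pct_Rank = percent_rank(Response),     # (values less than current) / (n - 1)
    Cume_Dist = cume_dist(Response)        # proportion of values <= current
  ) %>%
  ungroup()

ranked
```

The three columns differ mainly in how they treat ties and in whether the smallest value in a group is assigned 0 or a positive proportion, so the choice depends on which definition your analysis calls for.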
Example
# Class and Y are random draws, so your values will differ
Class <- sample(c("I", "II", "III"), 20, replace = TRUE)
Y <- rnorm(20, 25, 3.27)
df2 <- data.frame(Class, Y)
df2
Output
   Class        Y
1    III 32.88152
2    III 23.35048
3    III 19.78199
4    III 26.05137
5      I 26.16563
6    III 20.30466
7      I 22.93382
8     II 30.03620
9      I 16.89365
10     I 27.33329
11     I 27.46550
12   III 27.59028
13    II 27.40766
14   III 23.29442
15    II 28.69237
16    II 31.25723
17    II 22.58002
18   III 22.48583
19     I 26.08357
20   III 24.51681
Finding percentile rank of response for Class −
Example
df2 %>%
  group_by(Class) %>%
  mutate(Percentile_Rank = rank(Y) / length(Y))
Output
# A tibble: 20 x 3
# Groups:   Class [3]
   Class     Y Percentile_Rank
   <chr> <dbl>           <dbl>
 1 III    32.9               1
 2 III    23.4           0.556
 3 III    19.8           0.111
 4 III    26.1           0.778
 5 I      26.2           0.667
 6 III    20.3           0.222
 7 I      22.9           0.333
 8 II     30.0             0.8
 9 I      16.9           0.167
10 I      27.3           0.833
11 I      27.5               1
12 III    27.6           0.889
13 II     27.4             0.4
14 III    23.3           0.444
15 II     28.7             0.6
16 II     31.3               1
17 II     22.6             0.2
18 III    22.5           0.333
19 I      26.1             0.5
20 III    24.5           0.667
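If dplyr is not available, the same grouped calculation can be sketched in base R with ave(), which applies a function within each level of a grouping variable and returns the results in the original row order. The names scores_base and pr_fun and the seed 42 are assumptions made for this illustration, so the values will not match the output shown above.

```r
set.seed(42)  # arbitrary seed so the sketch is repeatable

Class <- sample(c("I", "II", "III"), 20, replace = TRUE)
Y <- rnorm(20, 25, 3.27)
scores_base <- data.frame(Class, Y)

# Rank divided by group size, the same formula used with mutate
pr_fun <- function(v) rank(v) / length(v)

# ave() applies pr_fun separately within each Class
scores_base$Percentile_Rank <- ave(scores_base$Y, scores_base$Class, FUN = pr_fun)

scores_base
```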