
Subset Rows of an R Data Frame Based on Duplicate Values
Duplicate values are a common problem in data analysis. To find the rows of an R data frame that hold duplicated values in a particular column, call the duplicated function inside the subset function. duplicated flags a value as a duplicate only from its second occurrence onward, so the output contains the repeated rows but not the first row in which each value appears.
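To make the "first occurrence is not flagged" behaviour concrete, here is a minimal sketch on a plain vector. The vector v and the name dup are hypothetical, chosen only for illustration.

```r
# Hypothetical vector used only for illustration
v <- c(7, 6, 2, 6, 1, 7)

# duplicated() returns TRUE for each element that already appeared earlier
dup <- duplicated(v)
dup
# FALSE FALSE FALSE  TRUE FALSE  TRUE

# Indexing with the logical vector keeps only the repeated occurrences
v[dup]
# 6 7
```

The first 7 and the first 6 are not flagged; only their later repeats are. The same logical vector is what subset uses to choose rows in the examples that follow.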
Example
Consider the data frame below:
x1<-1:20
x2<-rpois(20,4)
df1<-data.frame(x1,x2)
df1
Output
   x1 x2
1   1  7
2   2  6
3   3  2
4   4  6
5   5  1
6   6  7
7   7  5
8   8  2
9   9  2
10 10  2
11 11  3
12 12  2
13 13  1
14 14  3
15 15  3
16 16  3
17 17  5
18 18  5
19 19  7
20 20  3
Subset the rows of df1 that have duplicated values in column x2:
Example
subset(df1,duplicated(x2))
Output
   x1 x2
4   4  6
6   6  7
8   8  2
9   9  2
10 10  2
12 12  2
13 13  1
14 14  3
15 15  3
16 16  3
17 17  5
18 18  5
19 19  7
20 20  3
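Because rpois draws random values, rerunning the example will generally give different rows. The sketch below hard-codes the x2 values printed in the output above so the result is reproducible, and shows the equivalent bracket indexing along with its complement. The names repeats and firsts are assumptions introduced here for illustration.

```r
# x2 values copied from the df1 output above so the result is reproducible
x1 <- 1:20
x2 <- c(7, 6, 2, 6, 1, 7, 5, 2, 2, 2, 3, 2, 1, 3, 3, 3, 5, 5, 7, 3)
df1 <- data.frame(x1, x2)

# Bracket indexing selects the same rows as subset(df1, duplicated(x2))
repeats <- df1[duplicated(df1$x2), ]

# Negating the condition keeps the first occurrence of each x2 value instead
firsts <- df1[!duplicated(df1$x2), ]

rownames(repeats)
rownames(firsts)
# "1"  "2"  "3"  "5"  "7"  "11"
```

Every row lands in exactly one of the two results, so nrow(repeats) + nrow(firsts) equals nrow(df1).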
Example
y1<-LETTERS[1:20]
y2<-sample(0:5,20,replace=TRUE)
df2<-data.frame(y1,y2)
df2
Output
   y1 y2
1   A  5
2   B  4
3   C  1
4   D  2
5   E  3
6   F  4
7   G  1
8   H  4
9   I  3
10  J  1
11  K  5
12  L  5
13  M  0
14  N  3
15  O  5
16  P  0
17  Q  1
18  R  4
19  S  2
20  T  3
Subset the rows of df2 that have duplicated values in column y2:
Example
subset(df2,duplicated(y2))
Output
   y1 y2
6   F  4
7   G  1
8   H  4
9   I  3
10  J  1
11  K  5
12  L  5
14  N  3
15  O  5
16  P  0
17  Q  1
18  R  4
19  S  2
20  T  3
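duplicated also accepts a fromLast argument, which scans from the end of the vector so that the last occurrence of each value is the one left unflagged. The sketch below hard-codes the y2 values from the df2 output above, since sample is random; the name earlier is an assumption introduced for illustration.

```r
# y2 values copied from the df2 output above so the result is reproducible
y1 <- LETTERS[1:20]
y2 <- c(5, 4, 1, 2, 3, 4, 1, 4, 3, 1, 5, 5, 0, 3, 5, 0, 1, 4, 2, 3)
df2 <- data.frame(y1, y2)

# With fromLast = TRUE, the last row holding each value is excluded
# and every earlier row holding that value is returned
earlier <- subset(df2, duplicated(y2, fromLast = TRUE))

as.character(earlier$y1)
# "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N"
```

In this particular data the last occurrences of the six values sit in rows 15 to 20, so rows 1 to 14 are returned. Combining both directions with duplicated(y2) | duplicated(y2, fromLast = TRUE) would select every row whose value occurs more than once, first occurrences included.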