dplyr is a popular R package for data manipulation and analysis, and part of the tidyverse. It provides a small set of functions ("verbs") that make common data-wrangling tasks more readable and efficient. Below is a dplyr cheat sheet covering common operations:
Basic Operations
Install and Load dplyr:
install.packages("dplyr")  # only needed once per R installation
library(dplyr)             # needed in every new R session
Create a Data Frame:
df <- data.frame(
ID = c(1, 2, 3),
Name = c("Alice", "Bob", "Charlie"),
Age = c(25, 30, 22)
)
Selecting Columns
Select Columns by Name:
selected_df <- select(df, ID, Age)
Select Columns by Index:
selected_df <- select(df, 1, 3)
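Besides plain names and positions, select() accepts tidyselect helpers and ranges. The sketch below rebuilds the example df so it runs on its own. It assumes dplyr 1.0 or later, which is needed for where().

```r
library(dplyr)

df <- data.frame(
  ID = c(1, 2, 3),
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22)
)

# Drop a column with a minus sign
no_name <- select(df, -Name)

# Keep only numeric columns (where() needs dplyr >= 1.0)
numeric_cols <- select(df, where(is.numeric))

# Select a contiguous range of columns by name
id_to_name <- select(df, ID:Name)

# Rename while selecting (new_name = old_name)
renamed <- select(df, EmployeeID = ID, Age)
```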
Filtering Rows
Filter Rows by Condition:
filtered_df <- filter(df, Age > 25)
Multiple Conditions (AND):
filtered_df <- filter(df, Age > 25 & Name == "Bob")
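filter() also treats comma-separated conditions as AND, and it works with | (OR) and %in% (membership). This is a minimal, self-contained sketch using the same example data.

```r
library(dplyr)

df <- data.frame(
  ID = c(1, 2, 3),
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22)
)

# Comma-separated conditions are combined with AND
both <- filter(df, Age > 20, Name != "Bob")

# Use | for OR
either <- filter(df, Age < 23 | Name == "Alice")

# %in% keeps rows whose value is in the given set
chosen <- filter(df, Name %in% c("Alice", "Charlie"))
```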
Arranging Data
Arrange Data by Column:
arranged_df <- arrange(df, Age)
Descending Order:
arranged_df <- arrange(df, desc(Age))
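arrange() accepts several columns, and later columns break ties in earlier ones. The staff data below is hypothetical and includes a tie in Age so the second key has an effect.

```r
library(dplyr)

# Hypothetical data with two people aged 30
staff <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "Dana"),
  Age = c(25, 30, 22, 30)
)

# Sort by Age (descending), then break ties by Name (ascending)
sorted <- arrange(staff, desc(Age), Name)
```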
Mutating Data
Create New Column:
mutated_df <- mutate(df, Salary = Age * 1000)
Multiple Mutations:
mutated_df <- mutate(df, Salary = Age * 1000, Bonus = Age * 0.1)
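Within a single mutate() call, later expressions can use columns created earlier in the same call. The sketch also shows if_else() and across(); it assumes dplyr 1.0 or later for across(). The Over25 column name is illustrative only.

```r
library(dplyr)

df <- data.frame(
  ID = c(1, 2, 3),
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22)
)

# Bonus is computed from Salary, which was created earlier in the same call
pay <- mutate(df, Salary = Age * 1000, Bonus = Salary * 0.1)

# if_else() is dplyr's type-strict vectorized if/else
flagged <- mutate(df, Over25 = if_else(Age > 25, "yes", "no"))

# across() applies one function to several columns
scaled <- mutate(df, across(c(ID, Age), ~ .x * 2))
```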
Summarizing Data
Summarize Data:
summary_df <- summarize(df, Mean_Age = mean(Age), Max_Age = max(Age))
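Summary functions such as mean() return NA if any input is missing, so na.rm = TRUE is often needed. n() counts rows. The df_na data below is hypothetical and contains one missing age.

```r
library(dplyr)

# Hypothetical data with a missing value
df_na <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, NA, 22)
)

summary_df <- summarize(
  df_na,
  Count = n(),                         # number of rows
  Mean_Age = mean(Age, na.rm = TRUE),  # ignore the NA when averaging
  Missing = sum(is.na(Age))            # how many ages are missing
)
```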
Grouping Data
Group by Column:
grouped_df <- group_by(df, Name)
Summarize by Group:
summary_grouped_df <- summarize(grouped_df, Mean_Age = mean(Age))
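Grouping is most useful when the grouping column has repeated values. In the example df, every Name is unique, so each group has one row. The staff data below is hypothetical. The .groups = "drop" argument (dplyr 1.0 and later) returns an ungrouped result.

```r
library(dplyr)

# Hypothetical data where Dept repeats
staff <- data.frame(
  Dept = c("Sales", "Sales", "IT", "IT", "IT"),
  Age = c(25, 35, 30, 40, 50)
)

# One summary row per department; result is ungrouped
by_dept <- staff %>%
  group_by(Dept) %>%
  summarize(Mean_Age = mean(Age), Headcount = n(), .groups = "drop")
```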
Chaining Operations
Using %>% (pipe):
result_df <- df %>%
filter(Age > 25) %>%
select(ID, Name) %>%
arrange(desc(Name))
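Since R 4.1, base R also includes a native pipe, |>. For simple chains of dplyr verbs like the one above, it behaves the same as %>%. This sketch runs the pipeline both ways on the example data and assumes R 4.1 or later.

```r
library(dplyr)

df <- data.frame(
  ID = c(1, 2, 3),
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22)
)

# Native pipe (requires R >= 4.1)
result_native <- df |>
  filter(Age > 25) |>
  select(ID, Name) |>
  arrange(desc(Name))

# magrittr pipe, as used elsewhere in this cheat sheet
result_magrittr <- df %>%
  filter(Age > 25) %>%
  select(ID, Name) %>%
  arrange(desc(Name))
```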
Joining Data Frames
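Inner Join (keeps only rows whose ID appears in both df1 and df2):
merged_df <- inner_join(df1, df2, by = "ID")
Left Join (keeps every row of df1, filling unmatched df2 columns with NA):
merged_df <- left_join(df1, df2, by = "ID")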
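The join snippets above assume that df1 and df2 already exist. The two tables below are hypothetical and only partly overlap on ID, so the differences between join types are visible. anti_join() is included to show how to find rows of df1 that have no match.

```r
library(dplyr)

# Hypothetical tables sharing an ID column; IDs 2 and 3 overlap
df1 <- data.frame(ID = c(1, 2, 3), Name = c("Alice", "Bob", "Charlie"))
df2 <- data.frame(ID = c(2, 3, 4), Dept = c("Sales", "IT", "HR"))

# Only IDs present in both tables
inner <- inner_join(df1, df2, by = "ID")

# All rows of df1; Alice (ID 1) gets NA for Dept
left <- left_join(df1, df2, by = "ID")

# Rows of df1 with no match in df2
unmatched <- anti_join(df1, df2, by = "ID")
```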
Other Useful Functions
Distinct Values:
distinct_df <- distinct(df, Name)
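By default, distinct(df, Name) returns only the Name column. Use .keep_all = TRUE to keep the other columns from the first row of each distinct value. The people data below is hypothetical and includes a repeated name.

```r
library(dplyr)

# Hypothetical data where "Alice" appears twice
people <- data.frame(
  Name = c("Alice", "Bob", "Alice"),
  Age = c(25, 30, 26)
)

# Unique names only; other columns are dropped
names_only <- distinct(people, Name)

# Keep all columns from the first occurrence of each name
first_rows <- distinct(people, Name, .keep_all = TRUE)
```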
Count Frequencies:
count_df <- count(df, Name)
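count() is shorthand for grouping and counting rows. sort = TRUE puts the most frequent values first. The visits data below is hypothetical and has repeated names, so the counts are greater than 1. The sketch also shows the longer group_by() and summarize() form that produces the same counts.

```r
library(dplyr)

# Hypothetical data with repeated names
visits <- data.frame(Name = c("Alice", "Bob", "Alice", "Charlie", "Alice", "Bob"))

# Frequency table, most frequent first (count column is named n)
counts <- count(visits, Name, sort = TRUE)

# Equivalent long form
counts_long <- visits %>%
  group_by(Name) %>%
  summarize(n = n()) %>%
  arrange(desc(n))
```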
Rename Columns:
renamed_df <- rename(df, EmployeeID = ID, FullName = Name)
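rename() uses new_name = old_name pairs. When the same transformation should apply to many names, rename_with() (dplyr 1.0 and later) applies a function to the column names instead. This sketch uses base R's toupper() and tolower().

```r
library(dplyr)

df <- data.frame(
  ID = c(1, 2, 3),
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22)
)

# Apply a function to every column name
upper <- rename_with(df, toupper)

# .cols restricts which columns are renamed
lower_some <- rename_with(df, tolower, .cols = c(Name, Age))
```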
Case When:
df <- mutate(df, Category = case_when(Age > 25 ~ "Senior", TRUE ~ "Junior"))
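case_when() checks conditions in order and uses the first one that matches. TRUE ~ ... acts as the fallback. The Band column and its cut-offs below are illustrative only. In dplyr 1.1.0 and later, the .default argument can replace the TRUE ~ line.

```r
library(dplyr)

df <- data.frame(
  ID = c(1, 2, 3),
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22)
)

# Conditions are checked top to bottom; the first match wins
df <- mutate(df, Band = case_when(
  Age < 23 ~ "Junior",
  Age < 28 ~ "Mid",
  TRUE ~ "Senior"  # fallback for all remaining rows
))
```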
This cheat sheet provides a quick reference for common dplyr operations in R. For more detailed information, refer to the official dplyr documentation.