dplyr Cheat Sheet

dplyr is a popular R package for data manipulation. It provides a small set of verb-like functions (select, filter, arrange, mutate, summarize) that make common data wrangling tasks readable and concise. This cheat sheet covers the most common operations:

Basic Operations

Install and Load dplyr:

install.packages("dplyr")
library(dplyr)

Create a Data Frame:

df <- data.frame(
  ID = c(1, 2, 3),
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22)
)

Selecting Columns

Select Columns:

selected_df <- select(df, ID, Age)

Select Columns by Position:

selected_df <- select(df, 1, 3)

Filtering Rows

Filter Rows by Condition:

filtered_df <- filter(df, Age > 25)

Multiple Conditions (AND):

filtered_df <- filter(df, Age > 25 & Name == "Bob")
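Other ways to combine conditions, as a small sketch: filter() also accepts several conditions separated by commas, which are combined with AND, and | expresses OR. The sample data frame is repeated here only so the snippet runs on its own; the names and_df and or_df are illustrative.

```r
library(dplyr)

# Same sample data as in the "Create a Data Frame" step
df <- data.frame(
  ID = c(1, 2, 3),
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22)
)

# Comma-separated conditions are combined with AND
and_df <- filter(df, Age > 25, Name == "Bob")

# | keeps rows that match either condition
or_df <- filter(df, Age > 25 | Name == "Alice")
```

With this data, and_df holds only Bob's row, while or_df holds the rows for Alice and Bob.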

Arranging Data

Arrange Data by Column:

arranged_df <- arrange(df, Age)

Descending Order:

arranged_df <- arrange(df, desc(Age))

Mutating Data

Create New Column:

mutated_df <- mutate(df, Salary = Age * 1000)

Multiple Mutations:

mutated_df <- mutate(df, Salary = Age * 1000, Bonus = Age * 0.1)
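Reusing a new column in the same call: mutate() evaluates its arguments in order, so a later expression can refer to a column created earlier in the same call. The sketch below assumes, purely for illustration, that Bonus should be 10% of Salary.

```r
library(dplyr)

# Same sample data as in the "Create a Data Frame" step
df <- data.frame(
  ID = c(1, 2, 3),
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22)
)

# Bonus is computed from Salary, which is created just before it
mutated_df <- mutate(df, Salary = Age * 1000, Bonus = Salary * 0.1)
```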

Summarizing Data

Summarize Data:

summary_df <- summarize(df, Mean_Age = mean(Age), Max_Age = max(Age))

Grouping Data

Group by Column:

grouped_df <- group_by(df, Name)

Summarize by Group:

summary_grouped_df <- summarize(grouped_df, Mean_Age = mean(Age))
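Grouping with repeated values: every Name in the sample data is unique, so each group above contains a single row. The sketch below uses a hypothetical staff data frame with a Dept column (an assumption, not part of the original data) to show a grouped summary where groups hold more than one row.

```r
library(dplyr)

# Hypothetical data: the Dept column is an assumption added for illustration
staff <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "Dana"),
  Dept = c("Sales", "Sales", "IT", "IT"),
  Age = c(25, 30, 22, 28)
)

# One output row per department, with the mean age and the group size
grouped_staff <- group_by(staff, Dept)
dept_summary <- summarize(grouped_staff, Mean_Age = mean(Age), N = n())
```

Here dept_summary has two rows: IT with a mean age of 25 and Sales with a mean age of 27.5, each with N equal to 2.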

Chaining Operations

Using %>% (pipe):

result_df <- df %>%
  filter(Age > 25) %>%
  select(ID, Name) %>%
  arrange(desc(Name))
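What the pipe does, as a sketch: x %>% f(y) passes x as the first argument of f, so the chain above is equivalent to the nested call below. The name nested_df is illustrative.

```r
library(dplyr)

# Same sample data as in the "Create a Data Frame" step
df <- data.frame(
  ID = c(1, 2, 3),
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22)
)

# Piped form: read top to bottom
result_df <- df %>%
  filter(Age > 25) %>%
  select(ID, Name) %>%
  arrange(desc(Name))

# Equivalent nested form: read inside out
nested_df <- arrange(select(filter(df, Age > 25), ID, Name), desc(Name))
```

With the sample data, both forms return a single row (ID 2, Name "Bob"), since Bob is the only person older than 25.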

Joining Data Frames

Inner Join (df1 and df2 are two data frames that share an ID column; only matching rows are kept):

merged_df <- inner_join(df1, df2, by = "ID")

Left Join (keeps every row of df1; unmatched columns from df2 are filled with NA):

merged_df <- left_join(df1, df2, by = "ID")
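Worked join example: df1 and df2 are not defined in this cheat sheet, so the sketch below makes up two small data frames (hypothetical contents, including the City column) to show how the two joins differ.

```r
library(dplyr)

# Hypothetical inputs: the contents of df1 and df2 are assumptions
df1 <- data.frame(ID = c(1, 2, 3), Name = c("Alice", "Bob", "Charlie"))
df2 <- data.frame(ID = c(2, 3, 4), City = c("Paris", "Oslo", "Lima"))

# Inner join keeps only IDs present in both inputs (2 and 3)
inner_df <- inner_join(df1, df2, by = "ID")

# Left join keeps all rows of df1 (IDs 1, 2, 3); City is NA for ID 1
left_df <- left_join(df1, df2, by = "ID")
```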

Other Useful Functions

Distinct Values:

distinct_df <- distinct(df, Name)

Count Frequencies:

count_df <- count(df, Name)
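How count() relates to grouping, as a sketch: count(x, col) gives the same counts as grouping by col and summarizing with n(). The visits data frame below is hypothetical and only serves to provide repeated values.

```r
library(dplyr)

# Hypothetical data with repeated names
visits <- data.frame(Name = c("Alice", "Bob", "Alice", "Alice", "Bob"))

# Shortcut: one row per distinct Name, with the count in column n
count_df <- count(visits, Name)

# Longer equivalent using group_by() and summarize()
manual_df <- summarize(group_by(visits, Name), n = n())
```

Both results report 3 rows for Alice and 2 for Bob.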

Rename Columns:

renamed_df <- rename(df, EmployeeID = ID, FullName = Name)

Case When:

df <- mutate(df, Category = case_when(Age > 25 ~ "Senior", TRUE ~ "Junior"))
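More than two categories: case_when() checks its conditions in order and uses the first one that matches, so more specific conditions should come first. The thresholds and the "Mid" label below are assumptions chosen for illustration.

```r
library(dplyr)

# Same sample data as in the "Create a Data Frame" step
df <- data.frame(
  ID = c(1, 2, 3),
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22)
)

# Conditions are tested top to bottom; TRUE acts as the fallback
labeled_df <- mutate(df, Category = case_when(
  Age >= 30 ~ "Senior",
  Age >= 25 ~ "Mid",
  TRUE ~ "Junior"
))
```

With the sample data, Alice is labeled "Mid", Bob "Senior", and Charlie "Junior".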

This cheat sheet provides a quick reference for common dplyr operations in R. For more detailed information, refer to the official dplyr documentation.