Introduction to R II

Section 1: Basics, NAs & summarize

1) Try using the help function for ddply, a function in the plyr package. Hint: this requires loading the package (installing it first if it's not installed) and then using ? or help()
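A minimal sketch of the install-if-missing pattern followed by both ways of opening the help page. One assumption beyond the exercise: if you also use dplyr later, attach plyr first, because attaching plyr after dplyr masks dplyr's summarise() and mutate() with plyr's versions.

```r
# Install plyr only if it is not already available, then attach it
if (!requireNamespace("plyr", quietly = TRUE)) {
  install.packages("plyr")
}
library(plyr)

# Both forms open the documentation page for ddply()
?ddply
help("ddply", package = "plyr")
```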

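2) Load the tidyr and dplyr packages and make sure the iris dataset is loaded (tidyr provides gather(); dplyr provides summarise(), filter(), group_by(), the join functions and the %>% pipe)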
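A short sketch of the setup, assuming both packages are already installed. Note that the built-in iris (from base R's datasets package) has no missing values; the NA mentioned in exercise 4 presumably comes from a modified copy used in the course.

```r
# tidyr: gather(); dplyr: summarise(), filter(), group_by(), *_join() and %>%
library(tidyr)
library(dplyr)

# iris ships with base R; data() makes sure it is in the workspace
data(iris)
```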


3) Inspect the structure (str()) and the columns of the iris dataset, which should already be loaded, and print the data frame too.
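One way to do the inspection, sketched with base R functions only:

```r
# Column types plus a preview of each column's values
str(iris)

# Column names only
names(iris)

# The full data frame (150 rows); head() shows just the first 6
print(iris)
head(iris)
```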


4) Use na.omit() to remove the row containing an NA
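Because the stock iris has no missing values, this sketch injects one by hand (a hypothetical stand-in for the course's modified data) so that na.omit() has something to drop:

```r
# Copy iris and blank out one value to simulate the missing entry
iris_na <- iris
iris_na$Sepal.Width[1] <- NA

# na.omit() drops every row that contains at least one NA
iris_clean <- na.omit(iris_na)

nrow(iris_na)     # 150
nrow(iris_clean)  # 149
```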


6) Take the mean of Sepal.Width using mean()

mean(iris$Sepal.Width, na.rm = TRUE)
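The na.rm argument matters whenever a column still contains NAs; a tiny vector shows the difference:

```r
x <- c(1, NA, 3)

mean(x)                # NA: a single missing value makes the result NA
mean(x, na.rm = TRUE)  # 2: the NA is dropped before averaging
```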

7) Try using summarize to do the same thing

summarise(iris, mean_SWidth = mean(Sepal.Width, na.rm = TRUE))

Section 2: Tidying the data

8) Convert the smiths dataset (included with tidyr) to long format, using both time and subject as identifying variables

gathered_smith <- gather(smiths, type, value, age:height)
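gather() still works but is superseded in tidyr 1.0.0 and later. The same reshape can be sketched with pivot_longer(); the column names "type" and "value" are kept from the exercise:

```r
library(tidyr)

# cols selects the measurement columns; subject and time stay as identifiers
long_smiths <- pivot_longer(smiths,
                            cols = age:height,
                            names_to = "type",
                            values_to = "value")

# 2 people x 3 measurements = 6 rows
long_smiths
```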

9) Load trustData.csv and qcTrust.csv and merge them. Try it both automatically (letting the join find the shared columns) and by specifying “sub” as the ID variable

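trust_data <- read.csv("trustData.csv")
qc <- read.csv("qcTrust.csv")

# Explicit key: join only on the "sub" column
head(inner_join(trust_data, qc, by = "sub"))

# Automatic (natural) join: uses every column name the two tables share
joined_data <- inner_join(trust_data, qc)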
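Since the CSV files are not reproduced here, this sketch uses two small hypothetical data frames with the columns the later exercises rely on (sub, choice, gender). When "sub" is the only shared column, the explicit and automatic joins give the same result; if the tables shared another non-key column, by = "sub" would instead keep both copies with .x/.y suffixes.

```r
library(dplyr)

# Hypothetical stand-ins for trustData.csv and qcTrust.csv
trust_demo <- data.frame(sub = c(1, 1, 2, 2),
                         choice = c(1, 0, 1, 1))
qc_demo <- data.frame(sub = c(1, 2),
                      gender = c("female", "male"))

# Explicit key
joined_demo <- inner_join(trust_demo, qc_demo, by = "sub")

# Natural join: prints a message naming the columns it joined by
natural_demo <- inner_join(trust_demo, qc_demo)
```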

10) Subset the joined data to compute the mean choice on female trials only

female_data <- filter(joined_data, gender == "female")
mean(female_data$choice, na.rm = TRUE)
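The same subset-then-average step can be written as a single chain. This sketch uses a small hypothetical joined data frame in place of joined_data:

```r
library(dplyr)

# Hypothetical joined data with the two columns the exercise needs
joined_demo <- data.frame(sub = c(1, 1, 2, 2, 3, 3),
                          gender = c("female", "female", "male", "male", "female", "female"),
                          choice = c(1, 0, 1, 1, 1, 1))

# Keep female trials, then average their choices
female_mean <- joined_demo %>%
  filter(gender == "female") %>%
  summarise(mean_choice = mean(choice, na.rm = TRUE))

female_mean  # mean_choice = 0.75
```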

Section 3: dplyr

13) Load the french_fries data and inspect it using a few different methods. The dataset comes with the reshape2 package, so install and load that package first
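A sketch of loading and inspecting the dataset, using the same install-if-missing pattern as exercise 1:

```r
# reshape2 provides the french_fries dataset
if (!requireNamespace("reshape2", quietly = TRUE)) {
  install.packages("reshape2")
}
library(reshape2)
data(french_fries)

str(french_fries)      # column types
head(french_fries)     # first 6 rows
summary(french_fries)  # per-column summaries, including NA counts
dim(french_fries)      # number of rows and columns
```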


14) Convert the data to long format

gathered_fries <- gather(french_fries, type, rating, potato:painty)
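A quick sanity check on the reshape, assuming tidyr and reshape2 are installed: gathering five rating columns should multiply the row count by five, since gather() keeps NA ratings by default.

```r
library(tidyr)
library(reshape2)

gathered_fries <- gather(french_fries, type, rating, potato:painty)

# Each original row becomes one row per flavour attribute
nrow(gathered_fries) == 5 * nrow(french_fries)
unique(gathered_fries$type)
```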

Try the following exercises both with and without chaining (the %>% pipe):

16) Use group_by and summarise to get the mean rating at each time point

gathered_fries %>%
  group_by(time) %>%
  summarise(mean_rating = mean(rating, na.rm = TRUE))
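For the "without chaining" variant, the same computation can be sketched either as nested calls or with an intermediate object; both produce the same table as the piped version:

```r
library(dplyr)
library(tidyr)
library(reshape2)

gathered_fries <- gather(french_fries, type, rating, potato:painty)

# Nested: read from the inside out
mean_by_time <- summarise(group_by(gathered_fries, time),
                          mean_rating = mean(rating, na.rm = TRUE))

# Step by step with a named intermediate result
by_time <- group_by(gathered_fries, time)
mean_by_time2 <- summarise(by_time, mean_rating = mean(rating, na.rm = TRUE))
```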

17) Let's also get the SD and the number of trials at each time point. Hint: length() returns the length of a vector, and na.omit() removes the NAs from it

gathered_fries %>%
  group_by(time) %>%
  summarise(mean_rating = mean(rating, na.rm = TRUE),
            SD = sd(rating, na.rm = TRUE),
            N = length(na.omit(rating)))  # non-missing ratings; n() would also count NAs
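The difference between n() and counting non-missing values only shows up when there are NAs. A sketch on hypothetical ratings with one missing value:

```r
library(dplyr)

# Hypothetical ratings: time 1 has one NA
demo <- data.frame(time = c(1, 1, 1, 2, 2),
                   rating = c(2, NA, 4, 5, 7))

counts <- demo %>%
  group_by(time) %>%
  summarise(n_rows = n(),                       # counts the NA row too
            n_valid = length(na.omit(rating)),  # non-missing ratings only
            SD = sd(rating, na.rm = TRUE))
```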

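18) Scale the ratings within each time period. Note that base R's transform() ignores group_by() and would scale across all time points at once, so use dplyr's mutate() instead

gathered_fries <- gathered_fries %>%
  group_by(time) %>%
  mutate(scale_rating = as.numeric(scale(rating))) %>%  # z-score within each time point
  ungroup()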
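Whether the grouping is respected is easiest to see on a tiny hypothetical data frame whose two time points sit on very different scales. mutate() after group_by() scales within each group, while base R's transform() ignores the grouping and scales over all rows:

```r
library(dplyr)

# Hypothetical ratings for two time points
demo <- data.frame(time = c(1, 1, 1, 2, 2, 2),
                   rating = c(1, 2, 3, 10, 20, 30))

# Grouped scaling; scale() returns a one-column matrix, so flatten it
scaled <- demo %>%
  group_by(time) %>%
  mutate(scale_rating = as.numeric(scale(rating))) %>%
  ungroup()

# transform() scales across all six rows regardless of time
scaled_global <- transform(demo, scale_rating = as.numeric(scale(rating)))

scaled$scale_rating  # -1 0 1 -1 0 1
```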

19) To check that the scaling worked, take the mean of the scaled ratings at each time point again

gathered_fries %>%
  group_by(time) %>%
  summarise(mean_scaled = mean(scale_rating, na.rm = TRUE))

As you can see, they are all near zero because they were scaled!

20) One last problem: keep only the subjects whose proportion of later (coded 1) choices falls between 30 % and 70 %.

Hint: use group_by() and summarise() to determine which subjects are not “valid”, then anti_join() them back against the original data

invalid_subs <- trust_data %>%
  group_by(sub) %>%
  summarise(mean_choice = mean(choice)) %>%
  filter(mean_choice <= 0.3 | mean_choice >= 0.7)

# anti_join() keeps only the rows whose sub does NOT appear in invalid_subs
valid_data <- anti_join(trust_data, invalid_subs, by = "sub")
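semi_join() and anti_join() are mirror images: given a table of subjects, semi_join() keeps the matching rows and anti_join() keeps the rest. A sketch on hypothetical data where subject 1 always chooses 1 and subject 2 is mixed:

```r
library(dplyr)

# Hypothetical trial-level data
trust_demo <- data.frame(sub = rep(1:2, each = 4),
                         choice = c(1, 1, 1, 1, 1, 0, 1, 0))

props <- trust_demo %>%
  group_by(sub) %>%
  summarise(mean_choice = mean(choice))

valid_demo <- filter(props, mean_choice > 0.3, mean_choice < 0.7)

kept    <- semi_join(trust_demo, valid_demo, by = "sub")  # rows of valid subjects
dropped <- anti_join(trust_demo, valid_demo, by = "sub")  # rows of everyone else
```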