This problem set will require knowledge of both Lesson 1 and Lesson 2. If you haven't read over those, go ahead and do that then come back. Or, if you think you got it, carry on!
Basic Math and Logic with R
Store the value 18 in a variable x
and the value 23 in the varialbe r_is_cool
.
- Write an simple expression in your console that adds those two variables
- Write an expression to test whether or not
x
is greater than or equal tor_is_cool
- Write an expression to test whether
x
is less thanr_is_cool
ANDx
+ 10 is greater thanr_is_cool
- What is the sum of the following vector:
c(TRUE, TRUE, FALSE, TRUE)
?
Indexing Vectors
Copy the following into your console:
what_the_vec <- c(seq.int(from = 8, to = 72, by = 2))
Now, answer the following questions.
- What is the 14th value of this vector,
what_the_vec
? - Store the 2nd, 8th, and 20th value of the vector in a new vector,
new_vec
- Write an expression to test whether the 2nd value of
new_vec
is smaller than the first - Remove the first value from the vector
new_vec
(recall that we can use a '-' sign to remove values) and store the remaining two innew_vec2
Indexing Data Frames
For this exercise, we are going to use the iris dataset, which is pre-loaded into base R. First, we should take a look at it.
- What are the dimensions of the iris data set? (How many observations and variables?)
- What kind of variables are present? (character, numeric, etc.)
Now that you know what kind of data you are working with, let's work on finding values within using base R.
- Since each column is a vector, we can store them in their own vector, if we'd like. Store the fourth column,
Petal.Width
in a new variable,pet_width
. - Using the
mean()
function, find the mean ofpet_width
- What is the second value of the third column of
iris
? - How many values in the second column,
Sepal.Width
, are greater than 3.2 (remember thatTRUE
= 1)? - What is that as a percentage? (recall that
nrow()
gives the number of rows... so we just need that and the answer to #3) - Using the
table()
function, what is the count of the various species within our dataset? - There are three different species in our data. Remove "setosa" and store the new data in
new_data
- What is average
Sepal.Length
of ournew_data
- In base R, create a new column for
new_data
calledlarge_petal_flag
. Using theifelse()
function, assign the value 1 to rows with aPetal.Length
greater than 4.5, and a value of 0 to those that don't. - What is the sum of
new_data$large_petal_flag
? What is this as a proportion? (hint: taking the average of binary variables gives you the proportion if they are 1/0)
Dplyr
Working in base R is not that fun... but it really can come in handy understanding how to do things the hard way. Now let's switch over to using our dplyr verbs. Remember, to use dplyr, you must have it installed. We will install the entire tidyverse by typing install.packages("tidyverse")
. No need to do this if you've already done it. If tidyverse
is already installed, just type library(tidyverse)
and you will have all the tools at your disposal.
Using the iris
data set, and our dplyr verbs, do the following...
- Create a new column,
double_petal
, withmutate()
that multipliesPetal.Width
by 2. Store this new data frame infancy_iris
- Using our
filter()
command, filter out all values ofdouble_petal
less than 2.3 and greater than 4. How many rows are in that data set? - Suppose we want to see the maximum
Sepal.Length
for each species - how can we do that? (hint: you will need to group by the variableSpecies
and use thesummarise
verb)
There are a few other verbs that we didn't cover in our lessons (for the sake of brevity). But some really useful ones to know along the way are pivot_wider
and pivot_longer
. We will go over these a bit in class but don't have time to cover them here.