A.2 Choosing rows with filter
For now, we’ll focus on the NBA data from the 2021-22 season. Using base R syntax, we could start by subsetting our data frame to keep only the rows where the column lg is nba and the column season is 2022.
date away home ascore hscore lg season season.type gid
1 2021-10-19 BKN MIL 104 127 nba 2022 reg 22100001
2 2021-10-19 GSW LAL 121 114 nba 2022 reg 22100002
gkey
1 nba.2021-10-19.BKNvMIL
2 nba.2021-10-19.GSWvLAL
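As a minimal sketch of what such a base R subset might look like, the block below builds a small stand-in data frame and keeps only the matching rows. The name d follows the text; the stand-in data frame is an assumption made for illustration (its first two rows copy values from the output above, the other two are made up), and the result name nba22 is hypothetical.

```r
# Hypothetical stand-in for the full games data frame `d`.
# Rows 1-2 copy values from the printed output; rows 3-4 are made up
# so that the subset has something to exclude.
d <- data.frame(
  date   = c("2021-10-19", "2021-10-19", "2020-12-25", "2021-06-01"),
  away   = c("BKN", "GSW", "AAA", "CCC"),
  home   = c("MIL", "LAL", "BBB", "DDD"),
  ascore = c(104, 121, 90, 80),
  hscore = c(127, 114, 95, 85),
  lg     = c("nba", "nba", "nba", "other"),
  season = c(2022, 2022, 2021, 2021)
)

# Base R: the logical expression goes in the row position of d[rows, columns].
# Leaving the column position empty keeps every column.
nba22 <- d[d$lg == "nba" & d$season == 2022, ]
nba22
```

Note that every column reference has to be spelled out with d$, and that the trailing comma inside the brackets is required to select rows rather than columns.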
Since we are going to start learning tidyverse, we’ll show an alternative approach using the filter function from the dplyr package.
date away home ascore hscore lg season season.type gid
1 2021-10-19 BKN MIL 104 127 nba 2022 reg 22100001
2 2021-10-19 GSW LAL 121 114 nba 2022 reg 22100002
gkey
1 nba.2021-10-19.BKNvMIL
2 nba.2021-10-19.GSWvLAL
The first argument is the data frame, and the remaining argument is a logical expression that determines which rows we want. It’s the same logical expression as in our base R approach above, but note that we write lg instead of d$lg and season instead of d$season. The function understands that the data frame is d, so the columns of d can be referred to without using d$. Many tidyverse functions work similarly and reduce repeated typing of data frames, dollar signs, and quotation marks.
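To make the comparison concrete, here is a short sketch that runs both approaches side by side. It assumes the dplyr package is installed and again uses a small stand-in for d (first two rows taken from the output shown earlier, the rest made up); the result names are hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for the full games data frame `d`.
# Rows 1-2 copy values from the printed output; rows 3-4 are made up.
d <- data.frame(
  date   = c("2021-10-19", "2021-10-19", "2020-12-25", "2021-06-01"),
  away   = c("BKN", "GSW", "AAA", "CCC"),
  home   = c("MIL", "LAL", "BBB", "DDD"),
  lg     = c("nba", "nba", "nba", "other"),
  season = c(2022, 2022, 2021, 2021)
)

# Base R: columns must be referenced through the data frame with d$
by_base <- d[d$lg == "nba" & d$season == 2022, ]

# dplyr: the data frame comes first, then the logical expression,
# in which bare column names are looked up inside d
by_filter <- filter(d, lg == "nba" & season == 2022)

# filter() also accepts several conditions separated by commas,
# which it combines with a logical AND
by_filter_commas <- filter(d, lg == "nba", season == 2022)

by_filter
```

All three results contain the same two games. Which form to use is largely a matter of taste, but the filter version tends to be easier to read as the conditions get longer.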