A.2 Choosing rows with filter

For now, we’ll focus on the NBA data from the 2021-22 season. Using base R syntax, we could subset our data frame to keep only the rows where the column lg is 'nba' and the column season is 2022.

Code
temp = d[d$lg == 'nba' & d$season == 2022,]
head(temp,2)
        date away home ascore hscore  lg season season.type      gid
1 2021-10-19  BKN  MIL    104    127 nba   2022         reg 22100001
2 2021-10-19  GSW  LAL    121    114 nba   2022         reg 22100002
                    gkey
1 nba.2021-10-19.BKNvMIL
2 nba.2021-10-19.GSWvLAL
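
To see what the expression inside the square brackets is doing, here is a minimal sketch on a made-up data frame. The name toy and its values are invented for illustration and are not part of d. Each comparison produces one TRUE/FALSE value per row, & combines them element by element, and the resulting logical vector picks the rows.

```r
# Toy data frame; the values are invented for illustration only.
toy = data.frame(
  lg     = c('nba', 'nba', 'nfl', 'nba'),
  season = c(2022, 2021, 2022, 2022),
  hscore = c(127, 98, 24, 114)
)

# Each comparison yields one TRUE/FALSE value per row.
is_nba  = toy$lg == 'nba'
is_2022 = toy$season == 2022

# & is TRUE only where both conditions are TRUE.
keep = is_nba & is_2022
print(keep)

# The row selector goes before the comma; leaving the column slot
# empty keeps every column.
toy_sub = toy[keep, ]
print(toy_sub)
```

Only the first and fourth rows satisfy both conditions, so toy_sub has two rows.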

Since we are going to start learning the tidyverse, we’ll show an alternative approach using the filter function from the dplyr package.

Code
temp = filter(d, lg == 'nba' & season == 2022)
head(temp,2)
        date away home ascore hscore  lg season season.type      gid
1 2021-10-19  BKN  MIL    104    127 nba   2022         reg 22100001
2 2021-10-19  GSW  LAL    121    114 nba   2022         reg 22100002
                    gkey
1 nba.2021-10-19.BKNvMIL
2 nba.2021-10-19.GSWvLAL

The first argument is the data frame, and the remaining argument is a logical expression that determines which rows we want. It’s the same logical expression as in our base R approach above, but note that we write lg instead of d$lg and season instead of d$season. The function understands that the data frame is d, so the columns of d can be referred to without using d$. Many tidyverse functions work similarly and reduce repeated typing of data frame names, dollar signs, and quotation marks.
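
As a small, self-contained sketch of the same idea, the example below uses a made-up data frame (toy, with invented values) rather than d. It shows that conditions can also be passed to filter as separate arguments, which are combined with "and". It also shows one difference from bracket subsetting that is easy to miss: when a condition evaluates to NA, filter drops the row, while base R returns a row of NAs.

```r
library(dplyr)

# Toy data frame; the values are invented, and one season is missing.
toy = data.frame(
  lg     = c('nba', 'nba', 'nfl', 'nba'),
  season = c(2022, 2021, 2022, NA),
  hscore = c(127, 98, 24, 114)
)

# Conditions joined with & ...
a = filter(toy, lg == 'nba' & season == 2022)

# ... or given as separate arguments, which filter() combines with AND.
b = filter(toy, lg == 'nba', season == 2022)

# Base R version of the same condition.
base_sub = toy[toy$lg == 'nba' & toy$season == 2022, ]

print(a)
print(b)

# filter() keeps only rows where the condition is TRUE (1 row here);
# base R also returns an all-NA row for the NA condition (2 rows here).
print(nrow(a))
print(nrow(base_sub))
```

In the NBA example above, no values of lg or season appear to be missing among the selected rows, which is why both approaches print the same result there.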