A.4 Choosing columns with select

Let’s get rid of some columns we won’t use. In base R, we could subset columns like this:

temp = d[,c('date'  , 'gid', 
            'away'  , 'home', 
            'ascore', 'hscore')]
head(temp)
        date      gid away home ascore hscore
1 2021-10-19 22100001  BKN  MIL    104    127
2 2021-10-19 22100002  GSW  LAL    121    114
3 2021-10-20 22100011  OKC  UTA     86    107
4 2021-10-20 22100013  SAC  POR    124    121
5 2021-10-20 22100012  DEN  PHX    110     98
6 2021-10-20 22100010  ORL  SAS     97    123

In the tidyverse, we can use select and avoid typing quotes repeatedly.

temp = d %>% 
  select(date  , gid, 
         away  , home, 
         ascore, hscore)
head(temp)
        date      gid away home ascore hscore
1 2021-10-19 22100001  BKN  MIL    104    127
2 2021-10-19 22100002  GSW  LAL    121    114
3 2021-10-20 22100011  OKC  UTA     86    107
4 2021-10-20 22100013  SAC  POR    124    121
5 2021-10-20 22100012  DEN  PHX    110     98
6 2021-10-20 22100010  ORL  SAS     97    123

We add line breaks to group the columns given to select in a reasonable way. The first line holds the game information (date and gid), the second line has the teams, and the third line has the scores. Note that ascore, the away team’s score, sits directly under away, the away team, and likewise hscore sits under home.
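To make the comparison concrete, here is a minimal, self-contained sketch that runs both approaches on a small stand-in for d. The object name toy, the gkey values, and the position of the lg, season, season.type, and gkey columns are assumptions made for illustration; the game values are copied from the output above.

```r
library(dplyr)

# Toy stand-in for d. Game values come from the output shown above;
# the gkey values and the overall column order are made up.
toy = data.frame(
  lg          = c('nba', 'nba', 'nba'),
  season      = c(2022, 2022, 2022),
  season.type = c('reg', 'reg', 'reg'),
  date        = as.Date(c('2021-10-19', '2021-10-19', '2021-10-20')),
  away        = c('BKN', 'GSW', 'OKC'),
  home        = c('MIL', 'LAL', 'UTA'),
  ascore      = c(104, 121, 86),
  hscore      = c(127, 114, 107),
  gid         = c(22100001, 22100002, 22100011),
  gkey        = c('k1', 'k2', 'k3')
)

# Base R: column names are quoted strings inside c()
base_way = toy[, c('date'  , 'gid',
                   'away'  , 'home',
                   'ascore', 'hscore')]

# dplyr: bare column names, no quotes
tidy_way = toy %>%
  select(date  , gid,
         away  , home,
         ascore, hscore)

# Both results carry the same columns in the same order
identical(names(base_way), names(tidy_way))
```

In both cases the columns come back in the order they were listed, not in the order they appear in the original data frame.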

We can also use - to specify which columns we don’t want. This keeps the same six columns as above, although they stay in their original order in d rather than the order we listed them in:

temp = d %>% 
  select(-lg, -season, 
         -season.type, -gkey)
head(temp)
        date away home ascore hscore      gid
1 2021-10-19  BKN  MIL    104    127 22100001
2 2021-10-19  GSW  LAL    121    114 22100002
3 2021-10-20  OKC  UTA     86    107 22100011
4 2021-10-20  SAC  POR    124    121 22100013
5 2021-10-20  DEN  PHX    110     98 22100012
6 2021-10-20  ORL  SAS     97    123 22100010
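The output above has gid as the last column, whereas listing the columns by name placed it second. The sketch below checks this on a small stand-in data frame: both calls keep the same set of columns, only the order differs. The names toy, kept, dropped, and reordered are hypothetical, as are the gkey values and the column order of toy, which was chosen so that gid comes after the score columns as in the output above.

```r
library(dplyr)

# Toy stand-in for d (gkey values and column order are assumptions)
toy = data.frame(
  lg          = c('nba', 'nba'),
  season      = c(2022, 2022),
  season.type = c('reg', 'reg'),
  date        = as.Date(c('2021-10-19', '2021-10-19')),
  away        = c('BKN', 'GSW'),
  home        = c('MIL', 'LAL'),
  ascore      = c(104, 121),
  hscore      = c(127, 114),
  gid         = c(22100001, 22100002),
  gkey        = c('k1', 'k2')
)

# Keep by name: columns come back in the order listed
kept = toy %>%
  select(date, gid, away, home, ascore, hscore)

# Drop by name: remaining columns keep their original order
dropped = toy %>%
  select(-lg, -season, -season.type, -gkey)

names(kept)
names(dropped)

# Same set of columns, regardless of order
setequal(names(kept), names(dropped))

# If the order matters, reorder by the names of the first result
reordered = dropped %>%
  select(all_of(names(kept)))
identical(names(reordered), names(kept))
```

Dropping columns is convenient when only a few are unwanted, while naming the columns to keep also gives control over their order.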

If we want to choose rows and columns in the same step, we can chain filter and select together in the same block of code using the pipe. Let’s finalize this data by saving the result as the object d instead of temp, and also restrict it to regular season games.

d = d %>% 
  filter(lg == 'nba', 
         season == 2022, 
         season.type == 'reg') %>%
  select(date  , gid, 
         away  , home, 
         ascore, hscore)
head(d)
        date      gid away home ascore hscore
1 2021-10-19 22100001  BKN  MIL    104    127
2 2021-10-19 22100002  GSW  LAL    121    114
3 2021-10-20 22100011  OKC  UTA     86    107
4 2021-10-20 22100013  SAC  POR    124    121
5 2021-10-20 22100012  DEN  PHX    110     98
6 2021-10-20 22100010  ORL  SAS     97    123

We start a new line after every %>%, put each logical expression on its own line, and split up the column names given to select in a reasonable way, all for improved readability.
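As a rough illustration of the combined step, the sketch below runs the same filter and select chain on a small stand-in data frame. Only the first two rows are taken from the output above; the other three rows, including the lg value 'xyz', the season.type value 'other', and the team codes, are invented so that each filter condition has something to remove. The object names toy and out are also hypothetical.

```r
library(dplyr)

# Toy stand-in for d. Rows 1-2 use values from the output above;
# rows 3-5 are invented so each condition excludes one row.
toy = data.frame(
  lg          = c('nba', 'nba', 'nba', 'xyz', 'nba'),
  season      = c(2022, 2022, 2021, 2022, 2022),
  season.type = c('reg', 'reg', 'reg', 'reg', 'other'),
  date        = as.Date(c('2021-10-19', '2021-10-19', '2021-01-01',
                          '2021-10-21', '2022-05-01')),
  gid         = c(22100001, 22100002, 1, 2, 3),
  away        = c('BKN', 'GSW', 'AAA', 'CCC', 'EEE'),
  home        = c('MIL', 'LAL', 'BBB', 'DDD', 'FFF'),
  ascore      = c(104, 121, 100, 90, 95),
  hscore      = c(127, 114, 99, 101, 88)
)

out = toy %>%
  # Rows: all three conditions must hold
  filter(lg == 'nba',
         season == 2022,
         season.type == 'reg') %>%
  # Columns: keep only the game information, teams, and scores
  select(date  , gid,
         away  , home,
         ascore, hscore)

out
```

The order of the two steps matters here: filter has to come first, because select drops the lg, season, and season.type columns that the filter conditions refer to.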