A.16 Joining two data frames with left_join

Now that dm has two location columns, loc and prev.loc, we can join the distance information to our data frame.

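# Attach the distance for each (loc, prev.loc) pair; every row of dd is kept
dd = dd %>%
  left_join(dm, by = c('loc', 'prev.loc'))

# Inspect the result for one team
dd %>%
  filter(team == 'PHI')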
# A tibble: 82 × 11
# Groups:   team [1]
   date       team  score opp   opp.score gid     ha    days.rest loc   prev.loc
   <date>     <chr> <dbl> <chr>     <dbl> <chr>   <chr>     <dbl> <chr> <chr>   
 1 2021-10-20 PHI     117 NOP          97 221000… away         NA NOP   <NA>    
 2 2021-10-22 PHI     109 BKN         114 221000… home          2 PHI   NOP     
 3 2021-10-24 PHI     115 OKC         103 221000… away          2 OKC   PHI     
 4 2021-10-26 PHI      99 NYK         112 221000… away          2 NYK   OKC     
 5 2021-10-28 PHI     110 DET         102 221000… home          2 PHI   NYK     
 6 2021-10-30 PHI     122 ATL          94 221000… home          2 PHI   PHI     
 7 2021-11-01 PHI     113 POR         103 221000… home          2 PHI   PHI     
 8 2021-11-03 PHI     103 CHI          98 221001… home          2 PHI   PHI     
 9 2021-11-04 PHI     109 DET          98 221001… away          1 DET   PHI     
10 2021-11-06 PHI     114 CHI         105 221001… away          2 CHI   DET     
# ℹ 72 more rows
# ℹ 1 more variable: miles <int>

There is now a column miles that gives the distance between the current game’s location and the location of that team’s previous game.

The left in left_join means that we keep every row of the first data frame (here dd), even when it has no match in the second data frame (here dm). Such rows get NA in the columns that come from dm. Rows of the second data frame that match nothing in the first are dropped.
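To make this concrete, here is a minimal sketch on made-up data. The tibbles games and dists are hypothetical stand-ins for dd and dm, and the mileage values are invented. The row with no match in dists survives the join with miles set to NA, and anti_join lists the keys that failed to match.

```r
library(dplyr)

# Hypothetical stand-ins for dd and dm; the mileage values are invented
games <- tibble(
  team     = c("PHI", "PHI", "PHI"),
  loc      = c("PHI", "OKC", "NYK"),
  prev.loc = c("NOP", "PHI", "OKC")
)
dists <- tibble(
  loc      = c("PHI", "OKC"),
  prev.loc = c("NOP", "PHI"),
  miles    = c(100L, 200L)
)

joined <- games %>%
  left_join(dists, by = c("loc", "prev.loc"))

# All three rows of games are kept; the unmatched NYK/OKC row has miles = NA
joined

# anti_join returns the rows of games that have no match in dists
unmatched <- games %>%
  anti_join(dists, by = c("loc", "prev.loc"))
unmatched
```

Running a check like this after a real join is a quick way to see which location pairs are missing from the distance table.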

The other join functions differ in which rows they keep:

  • right_join - keep all rows of the second data frame, regardless of whether there is a match.
  • inner_join - keep only rows that have a match in both data frames. (Rows in 1st AND 2nd)
  • full_join - keep all rows from both data frames, regardless of whether there is a match. (Rows in 1st OR 2nd)
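The differences between the four joins are easiest to see by comparing row counts on a small example. The sketch below uses hypothetical tibbles x and y with invented mileage values (not dd and dm): two keys appear in both, one only in x, and one only in y.

```r
library(dplyr)

# Hypothetical toy data; the mileage values are invented
x <- tibble(
  loc      = c("PHI", "OKC", "NYK"),
  prev.loc = c("NOP", "PHI", "OKC")
)
y <- tibble(
  loc      = c("PHI", "OKC", "BOS"),
  prev.loc = c("NOP", "PHI", "NYK"),
  miles    = c(100L, 200L, 300L)
)

keys <- c("loc", "prev.loc")

left  <- left_join(x, y, by = keys)   # every row of x
right <- right_join(x, y, by = keys)  # every row of y
inner <- inner_join(x, y, by = keys)  # only keys found in both
full  <- full_join(x, y, by = keys)   # keys found in either

# Number of rows kept by each join
c(left = nrow(left), right = nrow(right),
  inner = nrow(inner), full = nrow(full))
```

With this data the left and right joins each keep three rows, the inner join keeps the two shared keys, and the full join keeps all four distinct keys.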