3.2 Intro to NBA games data

We will start by working with game summary data, which provides the teams, game date, game score, and other information for every regular season and playoff game. We can do a pretty decent job of answering many of these questions with this data alone.

First, let’s load the packages we’ll use, read in the data, and look at the first and last two rows.

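library(tidyverse)               # data manipulation (includes the %>% pipe)
library(pubtheme)                # plotting theme used for figures
d = readRDS('data/games.rds')    # game summary data for all leagues
head(d, 2)                       # first two rows
tail(d, 2)                       # last two rows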

Here is a description of the columns in this data frame.

  • date. The date of the game.
  • away. The away team.
  • home. The home team.
  • ascore. The away team’s score.
  • hscore. The home team’s score.
  • lg. The league abbreviation.
  • season. The season in which the game took place.
  • season.type. Regular season ("reg") or postseason ("post").
  • gid. Game ID from the league’s website, ESPN, or another data source.
  • gkey. Primary key for the game.
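To make the column descriptions concrete, here is a minimal sketch on a made-up two-row data frame that uses the same column names. The teams, scores, and dates are illustrative, not rows from games.rds, and the derived columns margin and home_win are hypothetical additions that show how the score columns combine.

```r
# Made-up rows using the column names described above (NOT real games.rds data)
games_toy <- data.frame(
  date        = as.Date(c("2021-10-19", "2021-10-20")),
  away        = c("Team A", "Team C"),
  home        = c("Team B", "Team D"),
  ascore      = c(104L, 112L),
  hscore      = c(110L, 101L),
  lg          = "nba",
  season      = 2022L,   # a 2021-22 NBA game is labeled season 2022
  season.type = "reg"
)

# Hypothetical derived column: home margin, positive when the home team wins
games_toy$margin <- games_toy$hscore - games_toy$ascore
# Hypothetical derived column: did the home team win?
games_toy$home_win <- games_toy$hscore > games_toy$ascore

print(games_toy[, c("away", "home", "ascore", "hscore", "margin", "home_win")])
```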

Note that season is a numeric column. For leagues whose regular season spans two calendar years (e.g., NBA 2021-22, NHL 2021-22), the later year is used as the season (e.g., 2022). For leagues whose regular season falls within one calendar year (e.g., MLB, NFL), that year is used, so 2022 refers to the MLB season that started in the spring of 2022 and the NFL season that started in the fall of 2022.
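The season convention can be written as a small function. This is a minimal sketch, not how the data was actually built: season_from_date() is a hypothetical helper, and it assumes each league has a fixed month (start_month) before which a game belongs to the season that started the previous year. The dates are illustrative. A fixed cutoff can misassign games from seasons that were shifted or interrupted, such as those affected by the 2020 pandemic.

```r
# Hypothetical helper: map game dates to the numeric `season` convention.
# Assumption: games before `start_month` belong to the season that began
# in the previous calendar year.
season_from_date <- function(date, two_year, start_month) {
  yr <- as.integer(format(date, "%Y"))
  mo <- as.integer(format(date, "%m"))
  # Year in which the game's season started
  start_year <- yr - (mo < start_month)
  # Two-year leagues (e.g., NBA, NHL) use the later year;
  # one-year leagues (e.g., MLB, NFL) use the starting year.
  if (two_year) start_year + 1L else start_year
}

# NBA 2021-22: an October opener and a June finals game are both season 2022
nba_seasons <- season_from_date(as.Date(c("2021-10-19", "2022-06-16")),
                                two_year = TRUE, start_month = 8)
# NFL 2022: a September game and a January playoff game are both season 2022
nfl_seasons <- season_from_date(as.Date(c("2022-09-08", "2023-01-15")),
                                two_year = FALSE, start_month = 8)
# MLB 2022: the whole season sits in one calendar year
mlb_seasons <- season_from_date(as.Date(c("2022-04-07", "2022-11-05")),
                                two_year = FALSE, start_month = 1)

print(nba_seasons)
print(nfl_seasons)
print(mlb_seasons)
```

The function uses a plain if rather than ifelse() for two_year because ifelse() returns a result the length of its test, which would truncate the output to a single value when two_year is a single TRUE or FALSE.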

We won’t use the gkey column much for now. It distinguishes games from different leagues that may share the same gid, and it is also more readable than gid.
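To see why gid alone is not a safe key, consider a made-up example where two leagues reuse the same ID. The exact format of gkey isn’t described here, so this sketch uses the pair (lg, gid) as a stand-in for a league-aware key. The rows are hypothetical.

```r
# Made-up rows: the same gid appears in two different leagues
ids_toy <- data.frame(
  lg  = c("nba", "nhl", "nba"),
  gid = c("1001", "1001", "1002")
)

# anyDuplicated() returns 0 when there are no duplicates
gid_unique  <- anyDuplicated(ids_toy$gid) == 0              # gid alone
pair_unique <- anyDuplicated(ids_toy[, c("lg", "gid")]) == 0 # (lg, gid) pair

print(gid_unique)
print(pair_unique)
```

On the real data, anyDuplicated(d$gkey) == 0 checks whether gkey is unique across all rows.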

Each row in the data frame contains game summary data for one game from one of several leagues (NBA, NHL, NFL, MLB, CFB, and MCBB):

table(d$lg)

   cfb   mcbb    mlb    nba    nfl    nhl 
 89476 118508  35258  67881   4863  55411 

These games span many seasons. Here are the first six and the last 30 seasons in the data:

table(d$season, d$lg) %>% head()      # earliest six seasons
table(d$season, d$lg) %>% tail(30)    # most recent 30 seasons
      
       cfb mcbb mlb nba nfl nhl
  1872   5    0   0   0   0   0
  1873   8    0   0   0   0   0
  1874  10    0   0   0   0   0
  1875  17    0   0   0   0   0
  1876  14    0   0   0   0   0
  1877  14    0   0   0   0   0
      
        cfb mcbb  mlb  nba  nfl  nhl
  1993  632    0    0 1183    0 1093
  1994  636    0    0 1184    0 1182
  1995  640    0    0 1180    0  705
  1996  662    0    0 1257    0 1152
  1997  666    0    0 1261    0 1148
  1998  674    0    0 1260    0 1148
  1999  686    0    0  791    0 1193
  2000  702    0    0 1264    0 1231
  2001  709    0    0 1260    0 1316
  2002  772 4821    0 1260    0 1320
  2003  771 5019    0 1277    0 1319
  2004  724 5020    0 1271    0 1319
  2005  718 5112    0 1314  268    0
  2006  792 5116    0 1319  268 1313
  2007  798 5497    0 1309  268 1311
  2008  804 5601 2449 1316  268 1315
  2009  808 5604 2450 1315  268 1317
  2010  808 5793 2458 1312  268 1319
  2011  812 5796 2458 1311  268 1319
  2012  840 5778 2460 1074  268 1316
  2013  855 5830 2458 1315  268  806
  2014 1343 5995 2459 1319  269 1323
  2015 1348 5949 2457 1311  268 1319
  2016 1516 5921 2460 1316  268 1321
  2017 1465 5977 2457 1309  269 1317
  2018 1483 6022 2451 1312  268 1355
  2019 1560 6066 2458 1312  268 1358
  2020  589 5976  933 1142  269 1213
  2021 2448 5282 2429 1165  286  952
  2022 3716 6333 2421 1317  286 1401
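Since this section focuses on the NBA, a natural next step is to keep only NBA games and split the counts by season.type. Here is a minimal sketch on a hypothetical four-row data frame; with the real data, the same subsetting applies to d.

```r
# Hypothetical rows with the lg, season, and season.type columns
lg_toy <- data.frame(
  lg          = c("nba", "nba", "nba", "nhl"),
  season      = c(2021L, 2022L, 2022L, 2022L),
  season.type = c("reg", "reg", "post", "reg")
)

# Keep NBA games only, then count games by season and season type
nba <- lg_toy[lg_toy$lg == "nba", ]
counts <- table(nba$season, nba$season.type)
print(counts)
```

With the real data, a tidyverse equivalent would be d %>% filter(lg == 'nba') %>% count(season, season.type).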