A.1 Intro to NBA games data

We will start with game summary data, which provides the teams, game date, game score, and other information for every regular season and playoff game. This data alone lets us answer many of these questions reasonably well.

First, let’s load the packages we’ll use.

Code
library(tidyverse) ## or you can load just dplyr

The tidyverse is “an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures” (https://www.tidyverse.org/). Posit (formerly RStudio) publishes a variety of cheatsheets here: https://posit.co/resources/cheatsheets/

Now we’ll load the data and look at the first two and last two rows.

Code
d = readRDS('data/games.rds')
head(d,2)
tail(d,2)
        date away home ascore hscore  lg season season.type      gid
1 2021-10-19  BKN  MIL    104    127 nba   2022         reg 22100001
2 2021-10-19  GSW  LAL    121    114 nba   2022         reg 22100002
                    gkey
1 nba.2021-10-19.BKNvMIL
2 nba.2021-10-19.GSWvLAL
             date     away     home ascore hscore   lg season season.type
371396 2002-03-30 Maryland   Kansas     97     88 mcbb   2002        post
371397 2002-04-01  Indiana Maryland     52     64 mcbb   2002        post
             gid gkey
371396 224000062 <NA>
371397 224000063 <NA>
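
The last two rows above show `<NA>` in `gkey`, so it may be worth checking how complete a column is before relying on it. The following is a minimal sketch, not the book's own code: `toy` is a hypothetical stand-in for `d` built from the four rows printed above, and `missing_by_lg` is an illustrative name.

```r
library(dplyr)

# Hypothetical stand-in for d: the four rows printed above, with a subset of columns
toy <- data.frame(
  date   = as.Date(c("2021-10-19", "2021-10-19", "2002-03-30", "2002-04-01")),
  away   = c("BKN", "GSW", "Maryland", "Indiana"),
  home   = c("MIL", "LAL", "Kansas", "Maryland"),
  ascore = c(104, 121, 97, 52),
  hscore = c(127, 114, 88, 64),
  lg     = c("nba", "nba", "mcbb", "mcbb"),
  gkey   = c("nba.2021-10-19.BKNvMIL", "nba.2021-10-19.GSWvLAL", NA, NA)
)

# For each league, count the games and how many of them lack a gkey
missing_by_lg <- toy %>%
  group_by(lg) %>%
  summarise(n_games = n(), n_missing_gkey = sum(is.na(gkey)))

print(missing_by_lg)
```

Replacing `toy` with `d` should give the same kind of summary for the full data, assuming the column names match those shown in the output.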

The data frame contains game summary data for several leagues: NBA, NHL, NFL, MLB, CFB, and MCBB.

Code
table(d$lg)

   cfb   mcbb    mlb    nba    nfl    nhl 
 89476 118508  35258  67881   4863  55411 
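
The same tally can also be produced with dplyr's `count()`, which returns a data frame instead of a table object and so may be easier to pipe into later steps. This is a sketch on a small hypothetical data frame `toy`, not on the real `d`:

```r
library(dplyr)

# Hypothetical league labels standing in for d$lg
toy <- data.frame(lg = c("nba", "nba", "nhl", "mcbb", "nba", "nhl"))

# One row per league, with the number of games in column n, largest first
lg_counts <- toy %>% count(lg, sort = TRUE)

print(lg_counts)
```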

The data covers many seasons, and coverage differs by league.

Code
head(table(d$season, d$lg))
tail(table(d$season, d$lg),30)
      
       cfb mcbb mlb nba nfl nhl
  1872   5    0   0   0   0   0
  1873   8    0   0   0   0   0
  1874  10    0   0   0   0   0
  1875  17    0   0   0   0   0
  1876  14    0   0   0   0   0
  1877  14    0   0   0   0   0
      
        cfb mcbb  mlb  nba  nfl  nhl
  1993  632    0    0 1183    0 1093
  1994  636    0    0 1184    0 1182
  1995  640    0    0 1180    0  705
  1996  662    0    0 1257    0 1152
  1997  666    0    0 1261    0 1148
  1998  674    0    0 1260    0 1148
  1999  686    0    0  791    0 1193
  2000  702    0    0 1264    0 1231
  2001  709    0    0 1260    0 1316
  2002  772 4821    0 1260    0 1320
  2003  771 5019    0 1277    0 1319
  2004  724 5020    0 1271    0 1319
  2005  718 5112    0 1314  268    0
  2006  792 5116    0 1319  268 1313
  2007  798 5497    0 1309  268 1311
  2008  804 5601 2449 1316  268 1315
  2009  808 5604 2450 1315  268 1317
  2010  808 5793 2458 1312  268 1319
  2011  812 5796 2458 1311  268 1319
  2012  840 5778 2460 1074  268 1316
  2013  855 5830 2458 1315  268  806
  2014 1343 5995 2459 1319  269 1323
  2015 1348 5949 2457 1311  268 1319
  2016 1516 5921 2460 1316  268 1321
  2017 1465 5977 2457 1309  269 1317
  2018 1483 6022 2451 1312  268 1355
  2019 1560 6066 2458 1312  268 1358
  2020  589 5976  933 1142  269 1213
  2021 2448 5282 2429 1165  286  952
  2022 3716 6333 2421 1317  286 1401
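
Because the season-by-league table is wide and mostly zeros for some leagues, a compact per-league summary of the first and last season present can be easier to read. The sketch below assumes a data frame with `season` and `lg` columns like `d`; the rows in `toy` are made up for illustration and do not reflect the true coverage of any league.

```r
library(dplyr)

# Hypothetical rows; the seasons here are illustrative, not the real coverage
toy <- data.frame(
  season = c(1872, 2022, 2002, 2022, 2008, 2022, 2022),
  lg     = c("cfb", "cfb", "mcbb", "mcbb", "mlb", "mlb", "nba")
)

# First season, last season, and number of games for each league
coverage <- toy %>%
  group_by(lg) %>%
  summarise(
    first_season = min(season),
    last_season  = max(season),
    n_games      = n()
  )

print(coverage)
```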

See Chapter 2 for an explanation of the data.