A.1 Intro to NBA games data
We will start by working with game summary data, which provides the teams, game date, final score, and other information for every regular-season and playoff game. This data alone is enough to answer many of these questions reasonably well.
First, let’s load the packages we’ll use.
The tidyverse is “an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures” (https://www.tidyverse.org/). Posit (formerly RStudio) provides a variety of cheatsheets at https://posit.co/resources/cheatsheets/.
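The package-loading chunk itself does not appear in this text. A minimal version, assuming the tidyverse is the main package needed (the original chunk may load others), looks like this:

```r
# Minimal sketch: the original chunk may attach additional packages.
# install.packages("tidyverse")  # run once if the package is not installed yet
library(tidyverse)  # attaches ggplot2, dplyr, tidyr, readr, and the other core packages
```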
Now we’ll load in the data and look at the first two rows and last two rows.
        date away home ascore hscore  lg season season.type      gid                   gkey
1 2021-10-19  BKN  MIL    104    127 nba   2022         reg 22100001 nba.2021-10-19.BKNvMIL
2 2021-10-19  GSW  LAL    121    114 nba   2022         reg 22100002 nba.2021-10-19.GSWvLAL

             date     away     home ascore hscore   lg season season.type       gid gkey
371396 2002-03-30 Maryland   Kansas     97     88 mcbb   2002        post 224000062 <NA>
371397 2002-04-01  Indiana Maryland     52     64 mcbb   2002        post 224000063 <NA>
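The loading code is not shown here, so the sketch below assumes the data live in a base R data frame named `games` (a hypothetical name; the file it is read from is not given in this text). It rebuilds a tiny stand-in from the four rows printed above to show how `head()` and `tail()` produce this view.

```r
# Hypothetical: the real data would be read from a file not named in this text.
# For illustration, rebuild a tiny stand-in from the four rows printed above.
games <- data.frame(
  date        = as.Date(c("2021-10-19", "2021-10-19", "2002-03-30", "2002-04-01")),
  away        = c("BKN", "GSW", "Maryland", "Indiana"),
  home        = c("MIL", "LAL", "Kansas", "Maryland"),
  ascore      = c(104, 121, 97, 52),   # away team's score
  hscore      = c(127, 114, 88, 64),   # home team's score
  lg          = c("nba", "nba", "mcbb", "mcbb"),
  season      = c(2022, 2022, 2002, 2002),
  season.type = c("reg", "reg", "post", "post"),
  gid         = c(22100001, 22100002, 224000062, 224000063),
  gkey        = c("nba.2021-10-19.BKNvMIL", "nba.2021-10-19.GSWvLAL", NA, NA),
  stringsAsFactors = FALSE
)

head(games, 2)  # first two rows
tail(games, 2)  # last two rows
```

Because `games` is a plain data frame rather than a tibble, the printed rows carry their original row numbers, which is why the last rows above are labeled 371396 and 371397.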
The data frame contains game summary data for six leagues: NBA, NHL, NFL, MLB, CFB (college football), and MCBB (men’s college basketball). The number of games per league, 371,397 in total, is:
  cfb   mcbb   mlb   nba  nfl   nhl
89476 118508 35258 67881 4863 55411
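Counts like these are what base R’s `table()` returns for a single column. A sketch, using a hypothetical stand-in for the `lg` column (the original code is not shown and may use `dplyr::count()` instead):

```r
# Hypothetical stand-in: league labels for a handful of games
games <- data.frame(lg = c("nba", "nba", "mcbb", "mcbb", "nhl"))

# Count games per league, one cell per distinct value of lg
league_counts <- table(games$lg)
league_counts

# Total number of games across all leagues
sum(league_counts)
```

On the full data, `sum(league_counts)` should equal the number of rows, which matches the last row number (371397) printed earlier.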
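These games span many seasons. Each season is identified by a single year; for example, the NBA games played in October 2021 belong to season 2022. The number of games per league in the first six and the last 30 seasons is: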
      cfb mcbb  mlb  nba nfl  nhl
1872    5    0    0    0   0    0
1873    8    0    0    0   0    0
1874   10    0    0    0   0    0
1875   17    0    0    0   0    0
1876   14    0    0    0   0    0
1877   14    0    0    0   0    0

      cfb mcbb  mlb  nba nfl  nhl
1993  632    0    0 1183   0 1093
1994  636    0    0 1184   0 1182
1995  640    0    0 1180   0  705
1996  662    0    0 1257   0 1152
1997  666    0    0 1261   0 1148
1998  674    0    0 1260   0 1148
1999  686    0    0  791   0 1193
2000  702    0    0 1264   0 1231
2001  709    0    0 1260   0 1316
2002  772 4821    0 1260   0 1320
2003  771 5019    0 1277   0 1319
2004  724 5020    0 1271   0 1319
2005  718 5112    0 1314 268    0
2006  792 5116    0 1319 268 1313
2007  798 5497    0 1309 268 1311
2008  804 5601 2449 1316 268 1315
2009  808 5604 2450 1315 268 1317
2010  808 5793 2458 1312 268 1319
2011  812 5796 2458 1311 268 1319
2012  840 5778 2460 1074 268 1316
2013  855 5830 2458 1315 268  806
2014 1343 5995 2459 1319 269 1323
2015 1348 5949 2457 1311 268 1319
2016 1516 5921 2460 1316 268 1321
2017 1465 5977 2457 1309 269 1317
2018 1483 6022 2451 1312 268 1355
2019 1560 6066 2458 1312 268 1358
2020  589 5976  933 1142 269 1213
2021 2448 5282 2429 1165 286  952
2022 3716 6333 2421 1317 286 1401
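A season-by-league cross-tabulation like this one can be built with base R’s `table()` on two columns. The sketch below uses a hypothetical stand-in data frame and converts the result with `as.data.frame.matrix()` so that `head()` and `tail()` return whole seasons as rows; the original code is not shown and may differ (for example, `dplyr::count()` followed by `tidyr::pivot_wider()`).

```r
# Hypothetical stand-in for the full game data
games <- data.frame(
  season = c(1872, 1872, 2021, 2022, 2022, 2022),
  lg     = c("cfb", "cfb", "nba", "nba", "nba", "nhl")
)

# Cross-tabulate: one row per season, one column per league
season_by_lg <- as.data.frame.matrix(table(games$season, games$lg))

head(season_by_lg)      # earliest seasons (up to 6 rows)
tail(season_by_lg, 30)  # most recent seasons (up to 30 rows)
```

`table()` fills every season/league combination with no games with a 0, which is why leagues show zeros in the years before the data begin to cover them.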
See Chapter 2 for an explanation of the data.