3.2 Intro to NBA games data
We will start by working with game summary data, which provides the teams, game date, game score, and other information for every regular season and playoff game. We can do a pretty decent job of answering many of these questions with this data.
First let’s load in some packages we’ll use, load the data, and look at the first and last two rows.
Here is a description of the columns in this data frame.
date. The date of the game.
away. The away team.
home. The home team.
ascore. The away team’s score.
hscore. The home team’s score.
lg. The league abbreviation.
season. The season in which the game took place.
season.type. Regular season (reg) or postseason (post).
gid. Game ID from the league’s website, ESPN, or other data source.
gkey. Primary key for that game.
Note that season is a numeric column. For leagues whose regular season spans two years (e.g. NBA 2021-22, NHL 2021-22), the later year is used as the season (e.g. 2022). For other leagues where the regular season only spans one year (e.g. MLB, NFL), that year is used, so 2022 refers to the MLB season that started in spring of 2022 and the NFL season that started in fall of 2022.
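The season convention can be sketched as a small helper. This is only an illustration in Python, not code from the data source: the name `season_label` and the two league sets are hypothetical, and only the leagues named explicitly above are classified.

```python
# Hypothetical helper illustrating the season-labelling convention described above.
# Only the leagues the text names explicitly are classified here.
TWO_YEAR_LEAGUES = {"nba", "nhl"}  # regular season spans two calendar years
ONE_YEAR_LEAGUES = {"mlb", "nfl"}  # regular season is labelled with a single year


def season_label(lg: str, start_year: int) -> int:
    """Return the numeric season for a league whose regular season begins in start_year."""
    if lg in TWO_YEAR_LEAGUES:
        return start_year + 1  # the later year is used, e.g. 2021-22 -> 2022
    if lg in ONE_YEAR_LEAGUES:
        return start_year
    raise ValueError(f"season convention not specified for league {lg!r}")


print(season_label("nba", 2021))  # 2022
print(season_label("nfl", 2022))  # 2022
```

The helper raises an error for any other league rather than guessing a convention the text does not state.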
We won’t use the gkey column much for now. It is there to distinguish games from different leagues that may have the same gid. It is also more readable than gid.
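To see why gid alone cannot serve as a key across leagues, here is a minimal sketch in Python. The rows and ID values are made up for illustration and are not real game IDs. The sketch does not reproduce the actual gkey format. It only shows that a gid can repeat across leagues while the pair (lg, gid) stays unique.

```python
from collections import Counter

# Made-up rows for illustration only; these are not real game IDs.
games = [
    {"lg": "nba", "gid": "1001"},
    {"lg": "nhl", "gid": "1001"},  # same gid as the NBA row, different league
    {"lg": "nba", "gid": "1002"},
]

# Count how often each gid, and each (lg, gid) pair, appears.
gid_counts = Counter(g["gid"] for g in games)
pair_counts = Counter((g["lg"], g["gid"]) for g in games)

gid_is_unique = all(n == 1 for n in gid_counts.values())
pair_is_unique = all(n == 1 for n in pair_counts.values())

print(gid_is_unique)   # False
print(pair_is_unique)  # True
```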
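Each row in the data frame contains game summary data for one game from one of several leagues (NBA, NHL, NFL, MLB, CFB, and MCBB). The number of games per league is:
cfb mcbb mlb nba nfl nhl
89476 118508 35258 67881 4863 55411
The games span many seasons. The counts by season and league, for the earliest seasons and for 1993 through 2022, are: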
season cfb mcbb mlb nba nfl nhl
1872 5 0 0 0 0 0
1873 8 0 0 0 0 0
1874 10 0 0 0 0 0
1875 17 0 0 0 0 0
1876 14 0 0 0 0 0
1877 14 0 0 0 0 0
season cfb mcbb mlb nba nfl nhl
1993 632 0 0 1183 0 1093
1994 636 0 0 1184 0 1182
1995 640 0 0 1180 0 705
1996 662 0 0 1257 0 1152
1997 666 0 0 1261 0 1148
1998 674 0 0 1260 0 1148
1999 686 0 0 791 0 1193
2000 702 0 0 1264 0 1231
2001 709 0 0 1260 0 1316
2002 772 4821 0 1260 0 1320
2003 771 5019 0 1277 0 1319
2004 724 5020 0 1271 0 1319
2005 718 5112 0 1314 268 0
2006 792 5116 0 1319 268 1313
2007 798 5497 0 1309 268 1311
2008 804 5601 2449 1316 268 1315
2009 808 5604 2450 1315 268 1317
2010 808 5793 2458 1312 268 1319
2011 812 5796 2458 1311 268 1319
2012 840 5778 2460 1074 268 1316
2013 855 5830 2458 1315 268 806
2014 1343 5995 2459 1319 269 1323
2015 1348 5949 2457 1311 268 1319
2016 1516 5921 2460 1316 268 1321
2017 1465 5977 2457 1309 269 1317
2018 1483 6022 2451 1312 268 1355
2019 1560 6066 2458 1312 268 1358
2020 589 5976 933 1142 269 1213
2021 2448 5282 2429 1165 286 952
2022 3716 6333 2421 1317 286 1401
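Tables like the ones above come from counting rows by league and by (season, league) pair. Here is a minimal sketch in Python using only the standard library. The rows are made up for illustration, and the variable names are hypothetical. This is not the code that produced the counts shown.

```python
from collections import Counter

# Made-up rows for illustration; the real data frame has one row per game.
games = [
    {"lg": "nba", "season": 2022},
    {"lg": "nba", "season": 2022},
    {"lg": "nhl", "season": 2022},
    {"lg": "nba", "season": 2021},
]

# One-way count: games per league.
by_league = Counter(g["lg"] for g in games)

# Two-way count: games per (season, league) pair.
by_season_league = Counter((g["season"], g["lg"]) for g in games)

leagues = sorted(by_league)
print("season", *leagues)
for season in sorted({g["season"] for g in games}):
    # A Counter returns 0 for combinations with no games.
    print(season, *(by_season_league[(season, lg)] for lg in leagues))
```

A zero in such a table means only that the data contain no games for that league in that season.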