Data Analysis

2.2 Gallery

The gallery is coming soon! For easy browsing, we will create a gallery of tables and visualizations, each with a hyperlink to where it appears in this book.

For now, you can see several examples and templates on the pubtheme GitHub page at https://github.com/bmacGTPM/pubtheme.