1.3 Learning Objectives

In these notes we will:

  • Ask interesting questions and develop a strategy to answer those questions using data. This might seem like the easy part, but it is often the hardest.
  • Build comfort with coding and working hands-on with data. You will explore, visualize, and analyze data with R, and learn best practices for writing code that is organized, commented, and fully reproducible. You will also gain experience producing production-ready code that can run in automated overnight processes, and using other tools that are useful in many industry jobs and that prepare you for conducting your own research.
  • Get experience with interactive data exploration and visualization, including deciding what questions you want to ask about the data, what aspects of the data you want to highlight in your visualizations, and how to implement your ideas in R. We’ll focus on best practices in planning visualizations, sketching them by hand, and creating them using ggplot2, plotly, leaflet, gganimate, and other visualization packages. We’ll create static, interactive, and animated visualizations, and we’ll develop interactive web apps using shiny.
  • Learn statistical modeling with regression, generalized linear models, mixed-effects regression, and other topics such as harmonic regression, regularization methods, maximum likelihood, splines, Monte Carlo simulation, resampling methods, model selection, variable selection, and model diagnostics. We will focus on getting hands-on experience with determining what question we want to ask about the data, brainstorming the most appropriate approach(es) to answering the question, implementing those approaches in R, and assessing and interpreting the results of the analysis. We’ll also build the theoretical understanding needed to apply these techniques appropriately.

In short, you will gain experience in all aspects of the data analysis workflow:

  1. Defining the problem, question, or goal
  2. Choosing and acquiring data, and assessing its quality
  3. Cleaning and wrangling data
  4. Exploring and visualizing data
  5. Analyzing data and building predictive models
  6. Interpreting, visualizing, and communicating the results
  7. Developing recommended courses of action

We’ll focus more on steps 1, 4, 5, and 6 than on the others.