9.2 Create a basic framework

  1. Main steps. Identify the 3-5 (or more) main steps of the project (for example, data cleaning and modeling, or steps more specific to this project) and the details of what needs to be done in each one.
  2. Sub steps. For each of those main steps, think about how to break the work into a few parts. If you are working in a group, split the work so that members of your group can work on the parts in parallel.
  3. Main script. Create a main.r script that serves as an outline for all of your scripts or functions. It might look like this:
## main.r
## This script runs all code related to the gymnastics case study.
## It reads data and generates all outputs.

source('R/get.data.r')
source('R/prep.data.r')
source('R/fit.model.r')
## etc
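Because main.r only runs once every file it sources exists, it can help to create those files up front. The sketch below uses a hypothetical helper, make.skeleton(), written in base R; the step names are the ones from the example above, and the header lines it writes into each file are only a suggestion.

```r
## make.skeleton: a hypothetical helper (not part of the case study) that
## creates R/<step>.r outline files and a main.r that sources them in order.
make.skeleton = function(root, steps = c("get.data", "prep.data", "fit.model")) {
  r.dir = file.path(root, "R")
  dir.create(r.dir, recursive = TRUE, showWarnings = FALSE)
  paths = file.path(r.dir, paste0(steps, ".r"))

  for (i in seq_along(steps)) {
    ## Never overwrite a file you have already started writing
    if (!file.exists(paths[i])) {
      writeLines(c(paste0("## ", steps[i], ".r"),
                   "## Inputs:  (what this step reads)",
                   "## Outputs: (what this step produces)"),
                 paths[i])
    }
  }

  main.path = file.path(root, "main.r")
  if (!file.exists(main.path)) {
    writeLines(c("## main.r",
                 "## This script runs all code related to the gymnastics case study.",
                 "## It reads data and generates all outputs.",
                 "",
                 paste0("source('R/", steps, ".r')")),
               main.path)
  }
  invisible(paths)
}

## Usage: build the skeleton in a temporary folder, then run main.r from there
root  = file.path(tempdir(), "gymnastics")
files = make.skeleton(root)
old   = setwd(root)
source("main.r")   # works, although each step file only contains comments so far
setwd(old)
```

Because existing files are never overwritten, the helper can be rerun after adding a new step name without losing work.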

If you use functions instead of scripts, main.r might look like this:

## main.r
## This script runs all code related to the gymnastics case study.
## It reads data and generates all outputs.

## Load functions here
source('R/myfunctions.r')


## Get and prep data, fit models, etc
df = get.data()
dc = prep.data(df)
m  = fit.model(dc)
## etc
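Before any real code exists, the functions can start as stubs so that main.r runs end to end from the beginning. The sketch below defines hypothetical placeholder versions of get.data(), prep.data(), and fit.model() inline, using made-up toy data; in your project they would live in R/myfunctions.r and be loaded with source().

```r
## Hypothetical stand-ins for the functions in R/myfunctions.r. They are
## defined inline here so the sketch runs on its own; the toy data are made up.
get.data = function() {
  ## In the real project this would read the csv files, e.g. with read.csv()
  data.frame(name       = c("A. Smith", "B. Jones", "C. Lee", "C. Lee"),
             difficulty = c(5.8, 6.2, 5.5, 5.5),
             score      = c(13.9, 14.6, 13.1, 13.1))
}

prep.data = function(df) {
  ## Placeholder cleaning step: drop exact duplicate rows
  unique(df)
}

fit.model = function(dc) {
  ## Placeholder model: score as a linear function of difficulty
  lm(score ~ difficulty, data = dc)
}

## The body of main.r: each step's output is the next step's input
df = get.data()
dc = prep.data(df)
m  = fit.model(dc)
```

Replacing one stub at a time keeps the whole pipeline runnable while you work, and lets group members fill in different functions in parallel.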
  4. Outline. For each of your main steps from #1, create a new file containing either functions or scripts, and write an outline in that file. For example, a script prep.data.r might look like this:
## prep.data.r
## This script prepares the data from the csv files for modeling

## First, deduplicate names, clean dates, etc

## Then, create some predictors

## etc (add more subheadings for other main tasks)
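If you prefer functions, the same outline can become the body of a function, with the subheadings kept as comments. This is a minimal sketch that assumes the raw data have a name column and a date column stored as text in year-month-day form; "deduplicate names" is interpreted narrowly here as trimming stray whitespace and dropping repeated name/date rows, which may not be all your data need.

```r
## prep.data.r, rewritten as a function that keeps the outline headings.
## The column names (name, date) and the date format are assumptions.
prep.data = function(df) {
  ## First, deduplicate names, clean dates, etc
  df$name = trimws(df$name)                        # "A. Smith " -> "A. Smith"
  df = df[!duplicated(df[, c("name", "date")]), ]  # keep one row per name and date
  df$date = as.Date(df$date, format = "%Y-%m-%d")

  ## Then, create some predictors
  df$year = as.integer(format(df$date, "%Y"))

  ## etc (add more subheadings for other main tasks)
  df
}

## Made-up input with a duplicated athlete entry
raw = data.frame(name  = c("A. Smith", "A. Smith ", "B. Jones"),
                 date  = c("2019-08-10", "2019-08-10", "2021-07-25"),
                 score = c(14.1, 14.1, 13.8),
                 stringsAsFactors = FALSE)
dc = prep.data(raw)
```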

Think about the inputs, the outputs, and how the data will need to be structured at each stage, and write notes to yourself about these in the files. Note that you are creating .R files, not .Rmd files.
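One way to turn those notes about inputs and outputs into something R can check is a small helper called at the start of each step. check.columns() below is hypothetical, as are the column names; it stops with an informative message when a stage receives data without the structure it expects.

```r
## check.columns: a hypothetical helper that stops if a stage's input
## is missing any of the columns that stage expects.
check.columns = function(df, required, stage) {
  missing = setdiff(required, names(df))
  if (length(missing) > 0) {
    stop(stage, ": missing column(s) ", paste(missing, collapse = ", "),
         call. = FALSE)
  }
  invisible(df)
}

## Example: fit.model expects cleaned data with these (hypothetical) columns
dc = data.frame(name = c("A. Smith", "B. Jones"), year = c(2019L, 2021L))
check.columns(dc, c("name", "year"), "fit.model")   # passes silently
res = tryCatch(check.columns(dc, c("name", "score"), "fit.model"),
               error = conditionMessage)
res   # the error message, naming the stage and the missing column
```

A failure at a stage boundary then points directly at the step that produced the wrong structure, rather than surfacing later as an obscure modeling error.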

Once again, we are not bound to the exact structure that we create in this part. The main file can be modified as we change individual steps of the project or the end goal, but at least we establish a structure that is easy to modify.

  5. Visualize the pipeline (optional). If you write functions, you can try using the package vizdataflow https://github.com/bmacGTPM/vizdataflow to visualize your workflow. This is a package I started developing; it is only at version 0.1 (alpha), but you still might find it useful. This step is purely optional. If you do use it, I’d be interested in your feedback: what do you wish the package could do?