
Data Integration Model for personal EXposures

Introduction

This package implements the DIMEX model. The package is derived from the SPFFinalReport code (created for the case study and report for the final SPF reporting period).

See also the Configuration and Development vignettes.

Installation

To install the development version of dimex from GitHub:

# install.packages("remotes")
remotes::install_github("https://github.com/UoMResearchIT/SPFFinalReport")

See development installation for development work on the dimex package.

Data workflow

Data phases

flowchart LR
  A("Raw") --> B("Wrangled")
  B --> C("Output")

  1. Initial input is raw data (e.g. ‘.csv’ files, ‘.nc’ files).
  2. Raw data is then processed by transforming, merging, etc.; these operations are collectively called ‘wrangling’.
  3. The wrangled data is finally used for producing output.

Environments

For each of these data phases, there are three possible environments:

  • main: Data produced by the code
    • data that is produced by running the current version of the code
    • can be used in automated tests to compare with reference data
  • ref: Reference data
    • historical data
    • for comparison with data being produced from current code
    • to ensure the code is working as expected and continues to do so when any changes are made to the code
  • test: Test data
    • small enough to use for unit/integration tests
    • can be reliably reproduced
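
As a rough illustration of how the main and ref environments relate, the sketch below compares data produced by the current code against reference data using base R. The data frames and the `matches_ref` name are stand-ins invented for this example; how dimex itself stores, loads, and compares these data sets is not shown here.

```r
# Hypothetical stand-ins: in practice these would be read from the
# 'ref' and 'main' environments respectively
ref_data <- data.frame(id = 1:3, exposure = c(10.2, 11.5, 9.8))
main_data <- data.frame(id = 1:3, exposure = c(10.2, 11.5, 9.8))

# all.equal() returns TRUE on a match, or a character vector
# describing the differences; isTRUE() collapses that to a logical
matches_ref <- isTRUE(all.equal(main_data, ref_data))

if (!matches_ref) {
  stop("Data produced by the current code differs from the reference data")
}
```

A check of this kind is what allows an automated test to flag a code change that unexpectedly alters the data.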

Configuration

See the Configuration vignette for configuration information.

Workflow: Running the pipeline

The main entry point for running the code is the run_workflow() function.

If called without any arguments, it runs all workflow steps. You can also:

  • Check which steps will run by setting banner_only = TRUE, in which case no steps are run and a banner is printed for each step instead; and
  • Run a subset of steps by supplying a comma-separated string of step IDs, for example, “4a, 4b” or “1a”.
# Run each step in the workflow with the default config
run_workflow()

# Run steps 4a and 4b only
run_workflow("4a, 4b")

# Don't run any steps but print out the banner to show each step
run_workflow(banner_only = TRUE)

# Run steps 4a and 4b only, with the specified config.
# `cfg` and `cfg_overrides` must already exist; see the
# [Configuration](config.html) vignette for information on
# how to create a non-default config
run_workflow("4a, 4b", cfg = cfg, cfg_overrides = cfg_overrides)
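
To make the step ID format concrete, here is a minimal sketch of how a comma-separated string such as "4a, 4b" could be split into individual step IDs in base R. The `parse_step_ids()` helper is hypothetical and written only for this example; it is not part of dimex, and run_workflow() may handle its argument differently.

```r
# Hypothetical helper, not part of dimex: split a comma-separated
# string of step IDs into a character vector
parse_step_ids <- function(steps) {
  # Split on literal commas, then strip surrounding whitespace
  ids <- trimws(strsplit(steps, ",", fixed = TRUE)[[1]])
  # Drop empty entries, e.g. from a leading or doubled comma
  ids[nzchar(ids)]
}

parse_step_ids("4a, 4b")
parse_step_ids("1a")
```

Trimming whitespace means "4a,4b" and "4a, 4b" yield the same IDs under this sketch.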

Development

For doing development work on the dimex package, see the Development vignette.