Workflow customisation and extension

Introduction

This vignette covers the utilities for modifying workflow targets before project directories are generated. Most users will not need this functionality, as workflows can easily be edited in the _targets.R file after project generation. However, where workflow targets need to be modified routinely, the package supports their programmatic customisation and extension.

If you are not already familiar with the basics of the package, see the Introduction vignette for details on how to get started.

This vignette does not cover what makes good workflow targets. For more information on this topic, see the targets package documentation.

Firstly, load the package:

library(metaboWorkflows)
#> 
#> Attaching package: 'metaboWorkflows'
#> The following object is masked from 'package:base':
#> 
#>     args

Generating custom workflow targets

The target() function can be used to define a workflow target programmatically. The following defines a tar_target called a_target that executes the R expression 1 + 1, sets the memory argument to 'persistent' and is preceded by a comment.

workflow_target <- target('a_target',
                          1 + 1,
                          type = 'tar_target',
                          args = list(memory = 'persistent'), 
                          comment = 'A target')

This creates an S4 object of class Target that contains the target definition. Printing workflow_target will display the R code for the target definition:

workflow_target
#> ## A target
#> tar_target(
#>   a_target,
#>   1 + 1,
#>   memory = "persistent"
#> )

The object can be modified further using the accessor methods for the Target class. For instance, the following replaces the target's R command:

command(workflow_target) <- rlang::expr(1 * 2)

workflow_target
#> ## A target
#> tar_target(
#>   a_target,
#>   1 * 2,
#>   memory = "persistent"
#> )

See ?`Target-accessors` for more details of the available accessor methods.
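The printed definition is ordinary R code, so it can be helpful to think of it as an unevaluated call. The following base R sketch builds an equivalent call and swaps its command, analogous to command<-. It is only an illustration and is not necessarily how the Target class stores its definition internally; tar_target is used purely as a symbol and is never evaluated, so the targets package is not required.

```r
# Build an unevaluated call corresponding to the printed target definition.
# tar_target is only used as a symbol here and is never evaluated.
target_call <- call(
  "tar_target",
  as.name("a_target"),
  quote(1 + 1),
  memory = "persistent"
)
print(target_call)

# Swap the command (the third element of the call), analogous to
# command(workflow_target) <- rlang::expr(1 * 2)
target_call[[3]] <- quote(1 * 2)
print(target_call)
```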

It is recommended that any functions used in the R command are called using the pkg::function notation. This ensures that these dependency packages can be detected and installed by renv during project directory generation.
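Why explicit prefixes help can be illustrated with a small base R sketch. The hypothetical command_packages() helper below walks a command expression and collects the package names used with ::. It is not part of metaboWorkflows, and renv performs its own, more thorough, dependency discovery; the sketch only shows that prefixed calls make a command's dependencies recoverable from the code alone.

```r
# Hypothetical helper: collect package names referenced via pkg::fun
command_packages <- function(expr) {
  # Symbols and constants cannot reference a package
  if (!is.call(expr)) {
    return(character(0))
  }
  # pkg::fun (and pkg:::fun) is a call to `::` whose first argument is the package
  if (identical(expr[[1]], as.name("::")) || identical(expr[[1]], as.name(":::"))) {
    return(as.character(expr[[2]]))
  }
  # Otherwise recurse into the function position and every argument
  pkgs <- unlist(lapply(as.list(expr), command_packages))
  unique(as.character(pkgs))
}

# Prefixed call: the package can be identified
command_packages(quote(readr::read_csv(sample_information_file)))

# Unprefixed call: nothing identifies which package read_csv() comes from
command_packages(quote(read_csv(sample_information_file)))
```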

For any custom target whose plot or table output is to be included in the R Markdown report, the target name should include plot or summary respectively, to ensure that the relevant R Markdown code chunks are generated in the report. For targets prefixed with summary, the table caption is generated from the part of the target name after summary, with _ replaced by a space.

Targets prefixed with parameters or results will also generate R Markdown report chunks that print information about the target object.
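The naming conventions above can be summarised with two hypothetical base R helpers. Neither report_chunk_type() nor summary_caption() is part of metaboWorkflows, and matching on a prefix followed by an underscore is an assumption; they only restate the rules that the name determines the kind of report chunk and that a summary table caption comes from the rest of the name with underscores replaced by spaces. Any further caption formatting the package applies is not reproduced here.

```r
# Hypothetical illustration of the naming conventions, not package code
report_chunk_type <- function(target_name) {
  prefixes <- c("plot", "summary", "parameters", "results")
  # Assumes the prefix is followed by an underscore
  matched <- prefixes[startsWith(target_name, paste0(prefixes, "_"))]
  if (length(matched) == 0) NA_character_ else matched[1]
}

summary_caption <- function(target_name) {
  # Drop the leading "summary_" and replace remaining underscores with spaces
  gsub("_", " ", sub("^summary_", "", target_name))
}

report_chunk_type("summary_processed_features")
summary_caption("summary_processed_features")
report_chunk_type("parameters_spectral_processing")
report_chunk_type("a_target")
```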

Modifying existing workflow template targets

We can first define an example workflow:

file_paths <- metaboData::filePaths('FIE-HRMS','BdistachyonEcotypes')
sample_information <- metaboData::runinfo('FIE-HRMS','BdistachyonEcotypes')

workflow_input <- inputFilePath(file_paths,sample_information)

workflow_definition <- defineWorkflow(workflow_input,
                                      'FIE-HRMS fingerprinting',
                                      'Example project')

Printing workflow_definition provides an overview of the definition.

workflow_definition
#> Workflow:  FIE-HRMS fingerprinting 
#> 
#> Project name: Example project 
#> Directory path: . 
#> Use renv: TRUE 
#> Docker: TRUE 
#> GitHub repository: FALSE 
#> Private repository: FALSE 
#> GitHub Actions: FALSE 
#> Parallel plan: jfmisc::suitableParallelPlan() 
#> Force creation: FALSE 
#> 
#> File path workflow input
#> # files: 68
#> 
#> # targets: 37

Inspecting a workflow

Because workflow targets are interdependent, it is essential to inspect the resulting pipeline carefully after modifying a workflow. The package contains a number of tools, based on those available in the targets package, for inspecting workflow definitions before project generation.

A tibble containing information about the targets within a workflow definition can be returned using:

manifest(workflow_definition)
#> # A tibble: 42 × 3
#>    name                                    command                       pattern
#>    <chr>                                   <chr>                         <chr>  
#>  1 parameters_molecular_formula_assignment "assignments::assignmentPara… NA     
#>  2 parameters_correlations                 "metabolyseR::analysisParame… NA     
#>  3 file_paths_list                         "\"data/file_paths.txt\""     NA     
#>  4 sample_information_file                 "\"data/runinfo.csv\""        NA     
#>  5 mzML_files                              "readLines(file_paths_list)"  NA     
#>  6 sample_information                      "readr::read_csv(sample_info… NA     
#>  7 mzML                                    "mzML_files"                  map(mz…
#>  8 parameters_spectral_processing          "binneR::detectParameters(mz… NA     
#>  9 results_spectral_processing             "binneR::binneRlyse(mzML, sa… NA     
#> 10 parameters_pre_treatment                "metaboMisc::detectPretreatm… NA     
#> # ℹ 32 more rows

The workflow network graph can be plotted to visualise the links between the individual targets.

glimpse(workflow_definition)

This is useful when modifying a workflow as it allows its integrity to be visually inspected, ensuring that targets are correctly connected.

The workflow can also be validated to check for any potential problems.

validate(workflow_definition)

An error or a warning will be thrown if problems are encountered.

Workflow structure

The template workflow targets are arranged into modules, which group related targets. The modules also define the section headings of the R Markdown report.

The modules defined in the example workflow are shown below.

modules(workflow_definition)
#> [1] "input"                        "spectral_processing"         
#> [3] "pre_treatment"                "molecular_formula_assignment"
#> [5] "modelling"                    "correlations"                
#> [7] "report"

The workflow target definitions are stored as a list nested by module, which can be accessed using the targets() method. The targets in the input module are shown below.

targets(workflow_definition)$input
#> $file_paths_list
#> ## Retrieve data file paths
#> tarchetypes::tar_file(
#>   file_paths_list,
#>   "data/file_paths.txt"
#> )
#> 
#> $mzML
#> ## Track individual data files
#> hrmtargets::tar_export(
#>   mzML,
#>   readLines(file_paths_list)
#> )
#> 
#> $sample_information_file
#> ## Sample information file path
#> tarchetypes::tar_file(
#>   sample_information_file,
#>   "data/runinfo.csv"
#> )
#> 
#> $sample_information
#> ## Parse sample information
#> tar_target(
#>   sample_information,
#>   readr::read_csv(sample_information_file)
#> )
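Because the definitions are a plain list nested by module, ordinary list operations apply to it. The sketch below uses a mock nested list: the input module holds the target names shown above, a hypothetical custom module holds a_target, and the character strings are placeholders standing in for Target objects. It lists the targets per module and the flattened target names.

```r
# Mock of the module-nested structure; the strings stand in for Target objects.
# The "custom" module is hypothetical.
workflow_targets <- list(
  input = list(
    file_paths_list = "Target",
    mzML = "Target",
    sample_information_file = "Target",
    sample_information = "Target"
  ),
  custom = list(
    a_target = "Target"
  )
)

# Number of targets in each module
lengths(workflow_targets)

# All target names, flattened across modules
unlist(lapply(workflow_targets, names), use.names = FALSE)

# A single target, accessed by module then by name
workflow_targets$input$sample_information
```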

Modifying individual workflow targets

There are a number of convenience methods for modifying the individual targets within a workflow definition.

These methods include targetRemove(), which is used below. See ?`workflow-edit` for details of all the available methods.

As a simple example, the following will remove the mzML target from the input module of the workflow definition.

workflow_definition <- targetRemove(workflow_definition,
                                    'input',
                                    'mzML')

The modified workflow can then be visualised.

glimpse(workflow_definition)

As can be seen above, the file_paths_list target is now isolated. This suggests that the generated workflow project may not function as expected and that pipeline errors could be encountered.
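The isolation that glimpse() reveals visually can also be reasoned about directly. As a rough base R sketch, assuming that a target depends on another whenever that target's name appears as a symbol in its command, the hypothetical isolated_targets() helper below reports targets with no incoming or outgoing links. The commands are those of the input module targets that remain after removing mzML; other modules are ignored, so this is not a substitute for glimpse() or validate().

```r
# Hypothetical helper: find targets with no links to any other target
isolated_targets <- function(commands) {
  target_names <- names(commands)
  # For each target, the other targets whose names appear in its command
  upstream <- lapply(commands, function(cmd) intersect(all.names(cmd), target_names))
  has_upstream <- lengths(upstream) > 0
  has_downstream <- target_names %in% unlist(upstream)
  target_names[!has_upstream & !has_downstream]
}

# Input module commands remaining after removing the mzML target
remaining_commands <- list(
  file_paths_list = "data/file_paths.txt",
  sample_information_file = "data/runinfo.csv",
  sample_information = quote(readr::read_csv(sample_information_file))
)

isolated_targets(remaining_commands)
```

Before the removal, the mzML target read from file_paths_list, which is the link this check no longer finds.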

Modifying workflow modules

As with individual targets, there are also methods for modifying whole modules of targets.

These methods include moduleReplace(), which is used below. See ?`workflow-edit` for more details on these module methods.

For example, the following replaces the spectral_processing module with an alternative list of targets.

workflow_definition <- moduleReplace(workflow_definition,
                                     'spectral_processing',
                                     list(
                                       a_target = target('a_target',
                                                         1 + 1,
                                                         args = list(memory = 'persistent'),
                                                         comment = 'A target')
                                     ))

The modified workflow definition can then be visualised to check the resulting pipeline.

glimpse(workflow_definition)

As can be seen above, replacing this module has orphaned a number of targets, including the replacement target, whereas previously the targets were connected across modules.

With this modification, a workflow project generated from this definition is unlikely to execute successfully, and errors would be encountered. Further modifications would be needed to ensure that a valid pipeline is generated.
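A related check is to look for commands that still reference targets that no longer exist. In the hypothetical sketch below, removed holds two target names that, judging from their names in the manifest, belonged to the replaced spectral_processing module, and the downstream command calls a made-up function pre_treat() rather than the package's actual template command. The dangling_references() helper is likewise illustrative and not part of metaboWorkflows.

```r
# Hypothetical helper: commands that reference names of removed targets
dangling_references <- function(commands, removed) {
  refs <- lapply(commands, function(cmd) intersect(all.names(cmd), removed))
  refs[lengths(refs) > 0]
}

# Target names from the manifest that appear to belong to the replaced module
removed <- c("parameters_spectral_processing", "results_spectral_processing")

# Placeholder commands; pre_treat() is a made-up function for illustration
remaining_commands <- list(
  a_target = quote(1 + 1),
  pre_treatment_placeholder = quote(pre_treat(results_spectral_processing))
)

dangling_references(remaining_commands, removed)
```

Any target reported here would need its command updated, or the missing target restored, before the pipeline could run.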