Accessing raw mass spectrometry data from a grover API
Source: vignettes/client-usage.Rmd
Overview
This vignette outlines the client-side functionality of the grover package for accessing a raw mass spectrometry data repository hosted by a grover API. This functionality includes querying the repository contents, file transfer, file information retrieval, sample information retrieval and raw data conversion.
To get started, we first load the grover package.
library(grover)
Example API
For this example, we will run an example grover API provided by the package. This will run as a background process, allowing us to interact with the API without having to use an alternative R console. To activate it, run the following:
grover_host <- grover(host = "127.0.0.1",
                      port = 8000,
                      auth = "1234",
                      repository = system.file('repository',
                                               package = 'grover'))
api <- groverAPI(grover_host,
                 background = TRUE,
                 log_dir = paste0(tempdir(),'/logs'),
                 temp_dir = paste0(tempdir(),'/temp'))
For further details on hosting a grover API see the Hosting a grover API vignette.
grover API host details
In order to access the API, we need to first provide the host details of the grover API. There are two ways to do this. The primary method is through the use of a configuration file that can then be parsed and specified when the API is activated. This should be in YAML format and have the structure shown below:
host: 127.0.0.1
port: 8000
auth: 1234
This specifies the host address, the port and the authentication key, which must match the key with which the host has been configured.
The package contains an example file that we can parse here using the readGrover() function.
grover_client <- readGrover(system.file('grover_client.yml',package = 'grover'))
This returns an S4 object of class GroverClient that contains the API host information. The host information can be viewed by printing the object:
print(grover_client)
#>
#> Grover Information
#>
#> Host: 127.0.0.1
#> Port: 8000
#> Authentication: 1234
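For a host other than the packaged example, a configuration file with the structure described above could be written from R before being parsed. The following is a minimal sketch only: the location `config_path` is an arbitrary choice made for illustration, the field values simply repeat those of the example, and readGrover() is only attempted when the grover package is installed.

```r
# Write a client configuration file containing the three documented fields.
# `config_path` is a hypothetical location chosen for this sketch.
config_path <- file.path(tempdir(), "grover_client.yml")

writeLines(
  c("host: 127.0.0.1",
    "port: 8000",
    "auth: 1234"),
  config_path
)

# Inspect the file contents before parsing
config_lines <- readLines(config_path)
print(config_lines)

# Parse the file only if the grover package is available
if (requireNamespace("grover", quietly = TRUE)) {
  grover_client <- grover::readGrover(config_path)
  print(grover_client)
}
```

Keeping the key in a file rather than in a script means the same analysis code can be pointed at a different host by swapping the file.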
The second method, and the method that will be used for this example, is to specify the host details directly using the grover() function like the following:
grover_client <- grover(host = "127.0.0.1",
                        port = 8000,
                        auth = "1234")
This enables us to access the grover API hosting the example data repository included in the package.
Using this host information, we can first check that the API is running using the extant() function as shown below:
extant(grover_client)
#> [1] TRUE
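In a script, it can be convenient to turn this check into a guard that runs before any other request. The sketch below wraps extant() in a hypothetical helper, `api_available()`, which is not part of the package. Whether extant() returns FALSE or raises an error for an unreachable host is not stated here, so the helper treats both cases as "not available".

```r
# Hypothetical helper: TRUE only when grover is installed and the API responds.
api_available <- function(client) {
  if (!requireNamespace("grover", quietly = TRUE)) {
    return(FALSE)
  }
  # Treat any error (for example a refused connection) as "not available"
  isTRUE(tryCatch(grover::extant(client), error = function(e) FALSE))
}

# Usage, assuming `grover_client` was created with grover() as shown above:
# if (!api_available(grover_client)) stop("The grover API is not reachable")
```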
Querying the data repository contents
We can list the instruments available within the data repository using the following:
listInstruments(grover_client)
#> [1] "Thermo-Exactive"
As can be seen above, there is a single instrument available named Thermo-Exactive. To list the experiment directories available for this instrument, we can use the code below.
listDirectories(grover_client,'Thermo-Exactive')
#> [1] "Experiment_1"
This shows a single experiment data directory available. We can then list the contents of this directory to identify the data files available:
listFiles(grover_client,'Thermo-Exactive','Experiment_1')
#> [1] "QC01.raw"
We can see that there is a single raw data file available in this example repository.
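The three listing calls above can be chained to walk the whole repository. The sketch below is one possible way to do this, not a function provided by the package: `list_repository()` is a hypothetical helper, and the listing functions are passed as arguments so that the sketch can be tried with stand-in functions when no API is running. It assumes that each listing function returns a character vector, as the outputs above suggest.

```r
# Hypothetical helper that walks instrument -> directory -> file and returns
# one row per file. The default listing functions are only evaluated when
# they are actually used, so stand-ins can be supplied instead.
list_repository <- function(client,
                            instruments_fn = grover::listInstruments,
                            directories_fn = grover::listDirectories,
                            files_fn = grover::listFiles) {
  rows <- list()
  for (instrument in instruments_fn(client)) {
    for (directory in directories_fn(client, instrument)) {
      files <- files_fn(client, instrument, directory)
      if (length(files) == 0) next
      rows[[length(rows) + 1]] <- data.frame(
        instrument = instrument,
        directory = directory,
        file = files,
        stringsAsFactors = FALSE
      )
    }
  }
  if (length(rows) == 0) {
    # Empty repository: return a zero-row table with the same columns
    return(data.frame(instrument = character(),
                      directory = character(),
                      file = character(),
                      stringsAsFactors = FALSE))
  }
  do.call(rbind, rows)
}

# Stand-in listing functions that mimic the example repository
contents <- list_repository(
  client = NULL,
  instruments_fn = function(client) "Thermo-Exactive",
  directories_fn = function(client, instrument) "Experiment_1",
  files_fn = function(client, instrument, directory) "QC01.raw"
)
print(contents)
```

With a running API, the same helper would be called as `list_repository(grover_client)`, which falls back to the package's own listing functions.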
File information
File information such as file size and creation date can be retrieved. We can see this for the example file QC01.raw using the following:
fileInfo(grover_client,'Thermo-Exactive','Experiment_1','QC01.raw')
#> # A tibble: 1 × 6
#> instrument directory file extension size birth_time
#> <chr> <chr> <chr> <chr> <fs::bytes> <dbl>
#> 1 Thermo-Exactive Experiment_1 QC01.raw raw 6.88M 1676905442.
This can also be done for an entire directory, which is useful when multiple files are available.
directoryFileInfo(grover_client,'Thermo-Exactive','Experiment_1')
#> # A tibble: 1 × 6
#> instrument directory file extension size birth_time
#> <chr> <chr> <chr> <chr> <fs::bytes> <dbl>
#> 1 Thermo-Exactive Experiment_1 QC01.raw raw 6.88M 1676905442.
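The birth_time column is returned as a number rather than a date. The sketch below shows one way it might be converted into a readable timestamp, under the assumption (not confirmed here) that the value counts seconds since 1970-01-01 00:00:00 UTC. A small stand-in data frame is used in place of the returned tibble so that the example runs without an API. The size column is left out because only its rounded value is displayed above.

```r
# Stand-in for the returned table, using the values printed above.
# The printed birth_time is rounded for display, so the converted time
# should be read as approximate to the second.
file_info <- data.frame(
  instrument = "Thermo-Exactive",
  directory = "Experiment_1",
  file = "QC01.raw",
  extension = "raw",
  birth_time = 1676905442,
  stringsAsFactors = FALSE
)

# Assumption: birth_time is a Unix timestamp in seconds
file_info$created <- as.POSIXct(file_info$birth_time,
                                origin = "1970-01-01",
                                tz = "UTC")

print(format(file_info$created, "%Y-%m-%d %H:%M:%S"))
```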
Transfer files
Individual files can be transferred from the repository using transferFile(), stipulating the instrument, experiment directory and file name. The outDir argument allows us to declare where the file will be downloaded to. In the example below, the file will be transferred to the current working directory.
transferFile(grover_client,
             'Thermo-Exactive',
             'Experiment_1',
             'QC01.raw',
             outDir = '.')
Similarly, we can transfer an entire directory:
transferDirectory(grover_client,
                  'Thermo-Exactive',
                  'Experiment_1',
                  outDir = '.')
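When downloading to somewhere other than the working directory, it may help to create the target directory first and to inspect it afterwards. The sketch below is an illustration under assumptions: `out_dir` is a hypothetical location, it is not stated here whether outDir must already exist, and the transfer is only attempted when the grover package is installed, with any error reported as a message.

```r
# Hypothetical download location; any writable path would do
out_dir <- file.path(tempdir(), "grover_downloads")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# Attempt the transfer only when grover is installed; errors such as a
# refused connection are reported as a message rather than stopping the script
if (requireNamespace("grover", quietly = TRUE)) {
  client <- grover::grover(host = "127.0.0.1", port = 8000, auth = "1234")
  tryCatch(
    grover::transferFile(client,
                         "Thermo-Exactive",
                         "Experiment_1",
                         "QC01.raw",
                         outDir = out_dir),
    error = function(e) message("Transfer failed: ", conditionMessage(e))
  )
}

# List whatever has arrived in the download directory
print(list.files(out_dir))
```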
Sample information
Thermo .raw mass spectrometry data files contain sample meta information within the file headers. This can be extracted and retrieved, in the form of a tibble, for a given file using:
sampleInfo(grover_client,'Thermo-Exactive','Experiment_1','QC01.raw')
#>
#> QC01.raw ✔
#> # A tibble: 1 × 39
#> `RAW file` RAW file …¹ Creat…² Opera…³ Numbe…⁴ Descr…⁵ Instr…⁶ Instr…⁷ Instr…⁸
#> <chr> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr>
#> 1 QC01.raw 64 04/25/… Thermo 2 "" Exacti… Thermo… C:/Xca…
#> # … with 30 more variables: `Serial number` <chr>, `Software version` <chr>,
#> # `Firmware version` <chr>, Units <chr>, `Mass resolution` <chr>,
#> # `Number of scans` <dbl>, `Number of ms2 scans` <dbl>, `Scan range` <dbl>,
#> # `Time range` <dbl>, `Mass range` <dbl>, `Scan filter (first scan)` <chr>,
#> # `Scan filter (last scan)` <chr>, `Total number of filters` <chr>,
#> # `Sample name` <chr>, `Sample id` <chr>, `Sample type` <chr>,
#> # `Sample comment` <chr>, `Sample vial` <chr>, `Sample volume` <chr>, …
Similarly, the sample information for an entire experiment run can be retrieved with:
runInfo(grover_client,'Thermo-Exactive','Experiment_1')
#>
#> Genrating run info table for Experiment_1 containing 1 .raw files
#> # A tibble: 1 × 39
#> `RAW file` RAW file …¹ Creat…² Opera…³ Numbe…⁴ Descr…⁵ Instr…⁶ Instr…⁷ Instr…⁸
#> <chr> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr>
#> 1 QC01.raw 64 04/25/… Thermo 2 "" Exacti… Thermo… C:/Xca…
#> # … with 30 more variables: `Serial number` <chr>, `Software version` <chr>,
#> # `Firmware version` <chr>, Units <chr>, `Mass resolution` <chr>,
#> # `Number of scans` <dbl>, `Number of ms2 scans` <dbl>, `Scan range` <dbl>,
#> # `Time range` <dbl>, `Mass range` <dbl>, `Scan filter (first scan)` <chr>,
#> # `Scan filter (last scan)` <chr>, `Total number of filters` <chr>,
#> # `Sample name` <chr>, `Sample id` <chr>, `Sample type` <chr>,
#> # `Sample comment` <chr>, `Sample vial` <chr>, `Sample volume` <chr>, …
Raw file conversion to .mzML format
With grover it is also possible to retrieve .mzML format data files, converted from the .raw files. This file conversion uses the command line tool msconvert, implemented in R by the msconverteR package.
File conversion
To retrieve the example .raw file in .mzML format, the convertFile() function can be used. This takes similar inputs to transferFile(), shown previously.
convertFile(grover_client,
            'Thermo-Exactive',
            'Experiment_1',
            'QC01.raw',
            outDir = '.')
convertDirectory(grover_client,
                 'Thermo-Exactive',
                 'Experiment_1',
                 outDir = '.')
Conversion arguments
The args argument can be supplied to these conversion functions to pass specific conversion criteria to msconvert. The grover package contains a number of helper functions to simplify their use. The available functions are listed below.
conversionArgsMSlevel1
conversionArgsMSlevel2
conversionArgsMSlevel3
conversionArgsNegativeMode
conversionArgsPeakPick
conversionArgsPositiveMode
Calling one of these functions returns the appropriate argument string to be passed to msconvert.
conversionArgsPeakPick()
#> [1] "peakPicking true 1-"
Multiple functions can also be combined.
paste(conversionArgsPeakPick(),conversionArgsNegativeMode())
#> [1] "peakPicking true 1- polarity negative"
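Because each helper returns a plain character string, combining them amounts to joining strings with a single space. The sketch below illustrates this with the strings taken from the outputs above, so it runs without the package. The second string is inferred from the combined output rather than from a direct call, and `combine_args()` is a hypothetical convenience function, not part of grover.

```r
# Argument strings copied from the outputs shown above.
# "polarity negative" is inferred from the combined output.
peak_pick <- "peakPicking true 1-"
negative_mode <- "polarity negative"

# Hypothetical helper: join any number of argument strings with single spaces
combine_args <- function(...) {
  paste(c(...), collapse = " ")
}

combined <- combine_args(peak_pick, negative_mode)
print(combined)
```

A helper of this kind only saves typing when more than two arguments are combined; for two, `paste()` as shown above is just as clear.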
A full list of the available msconvert arguments can be found in the msconvert documentation. The example below shows the use of conversionArgsPeakPick() to retrieve centroided data in .mzML format.
convertFile(grover_client,
            'Thermo-Exactive',
            'Experiment_1',
            'QC01.raw',
            args = conversionArgsPeakPick(),
            outDir = '.')
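After a conversion, a script may want to confirm that the expected files have arrived. The sketch below derives the expected file names and reports any that are missing. It assumes that a converted file keeps the base name of the .raw file and gains a .mzML extension, which is not confirmed here, and `raw_files` and `out_dir` are illustrative values.

```r
# Raw files that were requested for conversion (illustrative)
raw_files <- c("QC01.raw")

# Assumption: the converted file keeps the base name with a .mzML extension
expected_mzml <- paste0(tools::file_path_sans_ext(raw_files), ".mzML")

# Directory passed as outDir in the conversion call
out_dir <- "."

# Report any expected files that are not present in the output directory
missing_files <- expected_mzml[!file.exists(file.path(out_dir, expected_mzml))]
if (length(missing_files) > 0) {
  message("Not yet converted: ", paste(missing_files, collapse = ", "))
}
```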