Overview

This vignette outlines the steps required to host a web API that provides access to a mass spectrometry data repository. It covers the repository structure, the host configuration, running the API and request logging.

To get started we can first load the grover package.
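Loading the package is presumably the standard call below, assuming grover is already installed:

```r
# Attach the grover package (assumes it has been installed beforehand)
library(grover)
```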

Data repository structure

The structure of the data repository should consist of a hierarchy of directories within a central data repository directory. The structure is outlined below.


repository
└── instrument
    └── experiment
        └── sample.raw

Within the central data repository directory are instrument directories, one for each mass spectrometer from which the data were generated. Within these instrument directories, the data should be organised into a directory for each experiment. The experiment directories should then contain the .raw mass spectrometry data files and any other associated files, such as meta information or instrument methods.
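As a minimal sketch, a skeleton of this layout could be created with base R. The directory names and the empty sample.raw placeholder are hypothetical and only mirror the generic names in the diagram; a real repository would use actual instrument and experiment names.

```r
# Hypothetical locations, used only for illustration
repository <- file.path(tempdir(), "repository")
experiment_dir <- file.path(repository, "instrument", "experiment")

# recursive = TRUE also creates the repository and instrument directories
dir.create(experiment_dir, recursive = TRUE, showWarnings = FALSE)

# An empty placeholder standing in for a real .raw data file
invisible(file.create(file.path(experiment_dir, "sample.raw")))

# List the hierarchy relative to the repository root
list.files(repository, recursive = TRUE)
#> [1] "instrument/experiment/sample.raw"
```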

Configuration

To activate the API, we first need to specify the host details. This can be done in two ways. The primary method is to use a configuration file, which is parsed and then supplied when the API is activated. The file should be in YAML format and have the structure shown below:

host: 127.0.0.1
port: 8000
auth: 1234
repository: ./data

This specifies the host address, the port, an authentication key for data security, and the path on the host system to the data repository directory.
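If it is convenient to generate such a file from R, a sketch using only base functions might look like the following. The file location config_path is hypothetical and the values simply repeat the example above.

```r
# Hypothetical file location; the values mirror the example configuration
config_path <- file.path(tempdir(), "grover_host.yml")

# Write one "key: value" pair per line
writeLines(c("host: 127.0.0.1",
             "port: 8000",
             "auth: 1234",
             "repository: ./data"),
           config_path)

# Inspect the file before handing its path to the parser
readLines(config_path)
#> [1] "host: 127.0.0.1"    "port: 8000"         "auth: 1234"        
#> [4] "repository: ./data"
```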

The package contains an example file that we can parse here using the readGrover() function.

grover_host <- readGrover(system.file('grover_host.yml',package = 'grover'))

This returns an S4 object of class GroverHost that contains the API host information. The host information can be viewed by printing the object:

print(grover_host)
#> 
#> Grover Information
#> 
#> Host:        127.0.0.1 
#> Port:         8000 
#> Authentication:   1234 
#> Repository:   ./data

The second method, and the method that will be used for this example, is to specify the host details directly using the grover() function like the following:

grover_host <- grover(host = "127.0.0.1",
                     port = 8000,
                     auth = "1234",
                     repository = system.file('repository',
                                              package = 'grover'))

This enables us to host the example data repository included in the package.

Running the API

The API can be activated using groverAPI(). For the purposes of this example, the background argument will also be specified as TRUE to enable the API to be run in a background process. This will enable us to interact with the API without having to move to an alternative R console. Run the following to start the API:

api <- groverAPI(grover_host,
                 background = TRUE,
                 log_dir = paste0(tempdir(),'/logs'),
                 temp_dir = paste0(tempdir(),'/temp'))

The log_dir argument has also been specified. See the final section for details on request logging by the API.

Running the API in the background returns a callr package process object. We can test the status of the background process in which the API is running using:

api$is_alive()
#> [1] TRUE

From the client side, the extant() function can be used to check whether the API is live.

extant(grover_host)
#> [1] TRUE
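A background process may not accept requests the instant it is launched, so a script might poll until the API responds. The sketch below uses a hypothetical helper, wait_until(), which is not part of grover. It treats any error raised by the check as "not ready yet", which is an assumption about how a failed request would surface.

```r
# wait_until() is a hypothetical helper, not part of grover
wait_until <- function(check, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    # An error from the check (e.g. a refused connection) counts as "not ready"
    ready <- tryCatch(isTRUE(check()), error = function(e) FALSE)
    if (ready) return(TRUE)
    # Give up once the timeout (in seconds) has elapsed
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}

# Assumed usage with the host object defined earlier:
# wait_until(function() extant(grover_host))
```

The helper returns TRUE as soon as the check succeeds and FALSE if the timeout passes first, so the caller can decide whether to stop or retry.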

For further details on client-side access to the hosted mass spectrometry data, see the Accessing raw mass spectrometry data from a grover API vignette.

Finally, the background API process can be terminated using:

api$kill()
#> [1] TRUE

Further details on options for securely deploying web APIs can be found in the plumber package documentation.

Request logging

The log_dir argument was specified in the call to groverAPI() in the previous section, where it was set to a logs directory within the R temporary directory. The snippet below lists the log files generated by our API request.

logs <- list.files(paste0(tempdir(),'/logs'),full.names = TRUE)
logs
#> [1] "/tmp/RtmpG5uCr0/logs/grover_2023-02-20.log"

The contents of this log file can be accessed as below:

readLines(logs[1])
#> [1] "INFO [2023-02-20 15:06:04] GET /extant 1234 200 0.034"

The log entry above shows our single request using extant() whilst the API was live. After the log level, the entry includes, in order of appearance, the date and time of the request, the request type, the requested function, the auth argument specified, the status of the request and the processing time of the request.
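For further analysis, an entry of this shape could be split into named fields with base R. This is only a sketch: the pattern is inferred from the single example line above, and the field names are chosen here for illustration rather than taken from the package.

```r
# The example entry shown above
log_line <- "INFO [2023-02-20 15:06:04] GET /extant 1234 200 0.034"

# One capture group per field, in order of appearance
pattern <- "^([A-Z]+) \\[([^]]+)\\] ([A-Z]+) ([^ ]+) ([^ ]+) ([0-9]+) ([0-9.]+)$"

# regexec() returns the full match followed by the groups; drop the full match
fields <- regmatches(log_line, regexec(pattern, log_line))[[1]][-1]
names(fields) <- c("level", "time", "type", "endpoint",
                   "auth", "status", "duration")

fields[["endpoint"]]
#> [1] "/extant"
as.numeric(fields[["duration"]])
#> [1] 0.034
```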