Perform functional enrichment analyses of explanatory features using the FELLA R package.

Usage

functionalEnrichment(
  x,
  organism,
  methods = availableMethods(),
  split = c("none", "trends"),
  organism_data = organismData(organism),
  adduct_rules_table = adduct_rules(),
  ...
)

# S4 method for RandomForest
functionalEnrichment(
  x,
  organism,
  methods = availableMethods(),
  split = c("none", "trends"),
  organism_data = organismData(organism),
  adduct_rules_table = adduct_rules(),
  ...
)

Arguments

x

object of S4 class RandomForest

organism

the KEGG code for the organism of interest

methods

the enrichment techniques to apply. Any of those returned by availableMethods() can be used.

split

split the explanatory features into further groups based on their trends. See details.

organism_data

an object of S4 class FELLA.DATA

adduct_rules_table

the adduct ionisation rules for matching m/z features to KEGG compounds. Format should be as returned from mzAnnotation::adduct_rules.

...

arguments to pass to metabolyseR::explanatoryFeatures

Value

An object of S4 class FunctionalEnrichment.

Details

When split = 'trends', the explanatory features are split into further groups based on their trends. This is not supported for unsupervised random forest.

For random forest classification, this is for binary comparisons only. Functional enrichment is performed separately on the up regulated and down regulated explanatory features for each comparison. The up regulated and down regulated groups are based on the trends of the log2 ratios between the comparison classes. Up regulated explanatory features have a higher median intensity in the right-hand class than in the left-hand class of the comparison. The opposite is true for the down regulated explanatory features.
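
The trend rule described above can be illustrated with a minimal base R sketch. This is not the package's internal implementation. The data frame, the feature names and the treatment of a log2 ratio of exactly zero as down regulated are all assumptions made for illustration only.

```r
## Hypothetical intensities for two features across the binary comparison 'ABR1~ABR5'
intensities <- data.frame(
  class = rep(c('ABR1', 'ABR5'), each = 3),
  feature_a = c(10, 12, 11, 40, 44, 48),
  feature_b = c(80, 76, 84, 20, 19, 21)
)

## Left-hand and right-hand classes of the comparison
comparison <- c(left = 'ABR1', right = 'ABR5')

features <- c('feature_a', 'feature_b')

## Median intensity of each feature within each class (features x classes matrix)
class_medians <- sapply(
  comparison,
  function(cls) {
    sapply(
      intensities[intensities$class == cls, features],
      median
    )
  }
)

## log2 ratio of the right-hand class median to the left-hand class median
log2_ratio <- log2(class_medians[, 'right'] / class_medians[, 'left'])

## A positive ratio means a higher median in the right-hand class
## (assumption: a ratio of exactly zero is labelled down regulated here)
trend <- ifelse(log2_ratio > 0, 'up regulated', 'down regulated')

trend
```

In this toy data, feature_a has the higher median in the right-hand class (ABR5) and so falls in the up regulated group, while feature_b falls in the down regulated group.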

For random forest regression, the explanatory features are split based on their Spearman's correlation coefficient with the response variable prior to functional enrichment analysis, giving positively correlated and negatively correlated subgroups.
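
The regression split can be sketched in base R in the same way. Again, this is only an illustration under assumptions: the response values, the feature names and the use of zero as the cut-off between the two subgroups are hypothetical and may differ from what the package does internally.

```r
## Hypothetical continuous response variable and feature intensities
response <- c(1, 2, 3, 4, 5, 6)

intensities <- data.frame(
  feature_a = c(5, 9, 14, 20, 31, 45),
  feature_b = c(90, 70, 65, 40, 22, 10)
)

## Spearman's correlation coefficient of each feature with the response
spearman <- sapply(
  intensities,
  function(feature) cor(feature, response, method = 'spearman')
)

## Assign each feature to a subgroup by the sign of its coefficient
## (assumption: a coefficient of exactly zero is labelled negatively correlated here)
correlation_group <- ifelse(
  spearman > 0,
  'positively correlated',
  'negatively correlated'
)

## Feature names collected per subgroup
groups <- split(names(correlation_group), correlation_group)

groups
```

Each subgroup would then be passed to the enrichment step separately, in the same way as the up regulated and down regulated groups for classification.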

Examples

## Perform random forest on the example data
random_forest <- assigned_data %>%
  metabolyseR::randomForest(
    cls = 'class'
  )

## Perform functional enrichment analysis
functionalEnrichment(
  random_forest,
  'bdi',
  methods = 'hypergeom',
  organism_data = organismData(
    'bdi',
    database_directory = system.file(
      'bdi',
      package = 'riches'),
    internal_directory = FALSE
  )
)
#> Loading KEGG graph data...
#> Done.
#> Loading hypergeom data...
#> Loading matrix...
#> Done.
#> Loading diffusion data...
#> Loading matrix...
#> 'diffusion.matrix.RData' not present in:/home/runner/work/_temp/Library/riches/bdi/diffusion.matrix.RData. Simulated permutations may execute slower for diffusion.
#> Done.
#> Loading rowSums...
#> 'diffusion.rowSums.RData' not present in:/home/runner/work/_temp/Library/riches/bdi/diffusion.rowSums.RData. Z-scores won't be available for diffusion.
#> Done.
#> Loading pagerank data...
#> Loading matrix...
#> 'pagerank.matrix.RData' not present in:/home/runner/work/_temp/Library/riches/bdi/pagerank.matrix.RData. Simulated permutations may execute slower for pagerank.
#> Done.
#> Loading rowSums...
#> 'pagerank.rowSums.RData' not present in:/home/runner/work/_temp/Library/riches/bdi/pagerank.rowSums.RData. Z-scores won't be available for pagerank.
#> Done.
#> Data successfully loaded.
#> 
#> class
#> ABR1~ABR5~ABR6~BD21
#> Running hypergeom...
#> Starting hypergeometric p-values calculation...
#> Done.
#> 
#> Random forest classification 
#> 
#> Samples:	 60 
#> Features:	 1706 
#> Response:	 class 
#> # comparisons:	 1 
#> 
#> General data:
#> - KEGG graph:
#>   * Nodes:  13085 
#>   * Edges:  42910 
#>   * Density:  0.0002506364624 
#>   * Categories:
#>     + pathway [136]
#>     + module [218]
#>     + enzyme [993]
#>     + reaction [6703]
#>     + compound [5035]
#>   * Size:  7.1 Mb 
#> - KEGG names are ready.
#> -----------------------------
#> Hypergeometric test:
#> - Matrix is ready
#>   * Dim:  5035 x 136 
#>   * Size:  473.8 Kb
#> -----------------------------
#> Heat diffusion:
#> - Matrix not loaded.
#> - RowSums not loaded.
#> -----------------------------
#> PageRank:
#> - Matrix not loaded.
#> - RowSums not loaded.
#> 
#> 87 m/z features matched to KEGG compounds.
#> 153 explanatory m/z features.
#> 
#> 
#> class_ABR1~ABR5~ABR6~BD21 
#> Compounds in the input: 41
#>  [1] "C00042" "C02170" "C00009" "C01620" "C01454" "C03765" "C06224" "C07085"
#>  [9] "C07086" "C07211" "C07215" "C10700" "C00168" "C00383" "C01146" "C09315"
#> [17] "C00493" "C04236" "C16588" "C17696" "C01179" "C01197" "C05350" "C12623"
#> [25] "C00257" "C00514" "C00800" "C00817" "C00880" "C15930" "C05533" "C01750"
#> [33] "C03951" "C12249" "C12626" "C16409" "C00209" "C03758" "C16666" "C06181"
#> [41] "C21525"
#> Background compounds: 117
#> -----------------------------
#> Hypergeometric test: ready.
#> Top 15 p-values:
#> bdi00010 bdi00020 bdi00030 bdi00040 bdi00051 bdi00052 bdi00053 bdi00061 
#>        1        1        1        1        1        1        1        1 
#> bdi00062 bdi00071 bdi00073 bdi00100 bdi00130 bdi00190 bdi00195 
#>        1        1        1        1        1        1        1 
#> 
#> -----------------------------
#> Heat diffusion: not performed
#> -----------------------------
#> PageRank: not performed

## An example using split trends
## Perform binary random forest classification on the example data 
random_forest <- assigned_data %>% 
  metabolyseR::randomForest(
    cls = 'class',
    binary = TRUE
  )

## Perform functional enrichment analysis
functionalEnrichment(
  random_forest,
  'bdi',
  methods = 'hypergeom',
  split = 'trends',
  organism_data = organismData(
    'bdi',
    database_directory = system.file(
      'bdi',
      package = 'riches'),
    internal_directory = FALSE
  )
)
#> Loading KEGG graph data...
#> Done.
#> Loading hypergeom data...
#> Loading matrix...
#> Done.
#> Loading diffusion data...
#> Loading matrix...
#> 'diffusion.matrix.RData' not present in:/home/runner/work/_temp/Library/riches/bdi/diffusion.matrix.RData. Simulated permutations may execute slower for diffusion.
#> Done.
#> Loading rowSums...
#> 'diffusion.rowSums.RData' not present in:/home/runner/work/_temp/Library/riches/bdi/diffusion.rowSums.RData. Z-scores won't be available for diffusion.
#> Done.
#> Loading pagerank data...
#> Loading matrix...
#> 'pagerank.matrix.RData' not present in:/home/runner/work/_temp/Library/riches/bdi/pagerank.matrix.RData. Simulated permutations may execute slower for pagerank.
#> Done.
#> Loading rowSums...
#> 'pagerank.rowSums.RData' not present in:/home/runner/work/_temp/Library/riches/bdi/pagerank.rowSums.RData. Z-scores won't be available for pagerank.
#> Done.
#> Data successfully loaded.
#> 
#> class
#> ABR1~ABR5
#> down regulated
#> Running hypergeom...
#> Starting hypergeometric p-values calculation...
#> Done.
#> 
#> class
#> ABR1~ABR5
#> up regulated
#> Running hypergeom...
#> Starting hypergeometric p-values calculation...
#> Done.
#> 
#> class
#> ABR1~ABR6
#> down regulated
#> Running hypergeom...
#> Starting hypergeometric p-values calculation...
#> Done.
#> 
#> class
#> ABR1~ABR6
#> up regulated
#> Running hypergeom...
#> Starting hypergeometric p-values calculation...
#> Done.
#> 
#> class
#> ABR1~BD21
#> down regulated
#> Running hypergeom...
#> Starting hypergeometric p-values calculation...
#> Done.
#> 
#> class
#> ABR1~BD21
#> up regulated
#> Running hypergeom...
#> Starting hypergeometric p-values calculation...
#> Done.
#> 
#> class
#> ABR5~ABR6
#> down regulated
#> Running hypergeom...
#> Starting hypergeometric p-values calculation...
#> Done.
#> 
#> class
#> ABR5~ABR6
#> up regulated
#> Running hypergeom...
#> Starting hypergeometric p-values calculation...
#> Done.
#> 
#> class
#> ABR5~BD21
#> down regulated
#> Running hypergeom...
#> Starting hypergeometric p-values calculation...
#> Done.
#> 
#> class
#> ABR5~BD21
#> up regulated
#> Running hypergeom...
#> Starting hypergeometric p-values calculation...
#> Done.
#> 
#> class
#> ABR6~BD21
#> down regulated
#> Running hypergeom...
#> Starting hypergeometric p-values calculation...
#> Done.
#> 
#> class
#> ABR6~BD21
#> up regulated
#> Running hypergeom...
#> Starting hypergeometric p-values calculation...
#> Done.
#> 
#> Random forest classification 
#> 
#> Samples:	 60 
#> Features:	 1706 
#> Response:	 class 
#> # comparisons:	 6 
#> 
#> General data:
#> - KEGG graph:
#>   * Nodes:  13085 
#>   * Edges:  42910 
#>   * Density:  0.0002506364624 
#>   * Categories:
#>     + pathway [136]
#>     + module [218]
#>     + enzyme [993]
#>     + reaction [6703]
#>     + compound [5035]
#>   * Size:  7.1 Mb 
#> - KEGG names are ready.
#> -----------------------------
#> Hypergeometric test:
#> - Matrix is ready
#>   * Dim:  5035 x 136 
#>   * Size:  473.8 Kb
#> -----------------------------
#> Heat diffusion:
#> - Matrix not loaded.
#> - RowSums not loaded.
#> -----------------------------
#> PageRank:
#> - Matrix not loaded.
#> - RowSums not loaded.
#> 
#> 87 m/z features matched to KEGG compounds.
#> 443 explanatory m/z features.
#> 
#> 
#> class_ABR1~ABR5_down regulated 
#> Compounds in the input: 14
#>  [1] "C00493" "C04236" "C16588" "C17696" "C01750" "C03951" "C12249" "C12626"
#>  [9] "C16409" "C00209" "C00183" "C00431" "C00719" "C15987"
#> Background compounds: 117
#> -----------------------------
#> Hypergeometric test: ready.
#> Top 15 p-values:
#>      bdi00944      bdi00010      bdi00020      bdi00030      bdi00040 
#> 0.01836274845 1.00000000000 1.00000000000 1.00000000000 1.00000000000 
#>      bdi00051      bdi00052      bdi00053      bdi00061      bdi00062 
#> 1.00000000000 1.00000000000 1.00000000000 1.00000000000 1.00000000000 
#>      bdi00071      bdi00073      bdi00100      bdi00130      bdi00190 
#> 1.00000000000 1.00000000000 1.00000000000 1.00000000000 1.00000000000 
#> 
#> -----------------------------
#> Heat diffusion: not performed
#> -----------------------------
#> PageRank: not performed
#> 
#> class_ABR1~ABR5_up regulated 
#> Compounds in the input: 1
#> [1] "C05533"
#> Background compounds: 117
#> -----------------------------
#> Hypergeometric test: ready.
#> Top 15 p-values:
#> bdi00010 bdi00020 bdi00030 bdi00040 bdi00051 bdi00052 bdi00053 bdi00061 
#>        1        1        1        1        1        1        1        1 
#> bdi00062 bdi00071 bdi00073 bdi00100 bdi00130 bdi00190 bdi00195 
#>        1        1        1        1        1        1        1 
#> 
#> -----------------------------
#> Heat diffusion: not performed
#> -----------------------------
#> PageRank: not performed
#> 
#> class_ABR1~ABR6_down regulated 
#> Compounds in the input: 14
#>  [1] "C00122" "C01384" "C00022" "C00222" "C00149" "C00497" "C03064" "C01750"
#>  [9] "C03951" "C12249" "C12626" "C16409" "C00209" "C00059"
#> Background compounds: 117
#> -----------------------------
#> Hypergeometric test: ready.
#> Top 15 p-values:
#>      bdi00944      bdi00650      bdi00230      bdi00010      bdi00020 
#> 0.01836274845 0.47823280545 0.54578579364 1.00000000000 1.00000000000 
#>      bdi00030      bdi00040      bdi00051      bdi00052      bdi00053 
#> 1.00000000000 1.00000000000 1.00000000000 1.00000000000 1.00000000000 
#>      bdi00061      bdi00062      bdi00071      bdi00073      bdi00100 
#> 1.00000000000 1.00000000000 1.00000000000 1.00000000000 1.00000000000 
#> 
#> -----------------------------
#> Heat diffusion: not performed
#> -----------------------------
#> PageRank: not performed
#> 
#> class_ABR1~ABR6_up regulated 
#> Compounds in the input: 12
#>  [1] "C00009" "C01454" "C03765" "C06224" "C07085" "C07086" "C07211" "C07215"
#>  [9] "C10700" "C09315" "C03758" "C16666"
#> Background compounds: 117
#> -----------------------------
#> Hypergeometric test: ready.
#> Top 15 p-values:
#>    bdi00195    bdi00260    bdi00410    bdi00510    bdi00564    bdi00565 
#> 0.634032634 0.634032634 0.634032634 0.634032634 0.634032634 0.634032634 
#>    bdi00740    bdi00780    bdi00902    bdi00965    bdi03015    bdi03018 
#> 0.634032634 0.634032634 0.634032634 0.634032634 0.634032634 0.634032634 
#>    bdi03030    bdi03060    bdi03410 
#> 0.634032634 0.634032634 0.634032634 
#> 
#> -----------------------------
#> Heat diffusion: not performed
#> -----------------------------
#> PageRank: not performed
#> 
#> class_ABR1~BD21_down regulated 
#> Compounds in the input: 13
#>  [1] "C00257" "C00514" "C00800" "C00817" "C00880" "C15930" "C01750" "C03951"
#>  [9] "C12249" "C12626" "C16409" "C06181" "C21525"
#> Background compounds: 117
#> -----------------------------
#> Hypergeometric test: ready.
#> Top 15 p-values:
#>      bdi00944      bdi00040      bdi00010      bdi00020      bdi00030 
#> 0.01311624889 0.59275588604 1.00000000000 1.00000000000 1.00000000000 
#>      bdi00051      bdi00052      bdi00053      bdi00061      bdi00062 
#> 1.00000000000 1.00000000000 1.00000000000 1.00000000000 1.00000000000 
#>      bdi00071      bdi00073      bdi00100      bdi00130      bdi00190 
#> 1.00000000000 1.00000000000 1.00000000000 1.00000000000 1.00000000000 
#> 
#> -----------------------------
#> Heat diffusion: not performed
#> -----------------------------
#> PageRank: not performed
#> 
#> class_ABR1~BD21_up regulated 
#> Compounds in the input: 23
#>  [1] "C00042" "C02170" "C01620" "C09315" "C01179" "C01197" "C05350" "C12623"
#>  [9] "C01494" "C12204" "C01454" "C03765" "C06224" "C07085" "C07086" "C07211"
#> [17] "C07215" "C10700" "C02646" "C05629" "C16706" "C03758" "C16666"
#> Background compounds: 117
#> -----------------------------
#> Hypergeometric test: ready.
#> Top 15 p-values:
#>     bdi00360     bdi00940     bdi00950     bdi00010     bdi00020     bdi00030 
#> 0.2477100692 0.6436571765 0.6436571765 1.0000000000 1.0000000000 1.0000000000 
#>     bdi00040     bdi00051     bdi00052     bdi00053     bdi00061     bdi00062 
#> 1.0000000000 1.0000000000 1.0000000000 1.0000000000 1.0000000000 1.0000000000 
#>     bdi00071     bdi00073     bdi00100 
#> 1.0000000000 1.0000000000 1.0000000000 
#> 
#> -----------------------------
#> Heat diffusion: not performed
#> -----------------------------
#> PageRank: not performed
#> 
#> class_ABR5~ABR6_down regulated 
#> Compounds in the input: 18
#>  [1] "C00122" "C01384" "C00022" "C00222" "C00149" "C00497" "C03064" "C00059"
#>  [9] "C00168" "C00383" "C01146" "C00158" "C00311" "C00679" "C03921" "C04575"
#> [17] "C20889" "C20896"
#> Background compounds: 117
#> -----------------------------
#> Hypergeometric test: ready.
#> Top 15 p-values:
#>     bdi00053     bdi00650     bdi00020     bdi01200     bdi00630     bdi00250 
#> 0.1374405752 0.1433068595 0.2622158130 0.2622158130 0.3298539545 0.3970016756 
#>     bdi04146     bdi00010     bdi00030     bdi00040     bdi00051     bdi00052 
#> 0.8926599210 1.0000000000 1.0000000000 1.0000000000 1.0000000000 1.0000000000 
#>     bdi00061     bdi00062     bdi00071 
#> 1.0000000000 1.0000000000 1.0000000000 
#> 
#> -----------------------------
#> Heat diffusion: not performed
#> -----------------------------
#> PageRank: not performed
#> 
#> class_ABR5~ABR6_up regulated 
#> Compounds in the input: 13
#>  [1] "C00009" "C01454" "C03765" "C06224" "C07085" "C07086" "C07211" "C07215"
#>  [9] "C10700" "C00183" "C00431" "C00719" "C15987"
#> Background compounds: 117
#> -----------------------------
#> Hypergeometric test: ready.
#> Top 15 p-values:
#>     bdi00071     bdi00195     bdi00260     bdi00280     bdi00310     bdi00330 
#> 0.5596707819 0.5596707819 0.5596707819 0.5596707819 0.5596707819 0.5596707819 
#>     bdi00410     bdi00510     bdi00564     bdi00565     bdi00740     bdi00780 
#> 0.5596707819 0.5596707819 0.5596707819 0.5596707819 0.5596707819 0.5596707819 
#>     bdi00902     bdi00970     bdi02010 
#> 0.5596707819 0.5596707819 0.5596707819 
#> 
#> -----------------------------
#> Heat diffusion: not performed
#> -----------------------------
#> PageRank: not performed
#> 
#> class_ABR5~BD21_down regulated 
#> Compounds in the input: 9
#> [1] "C00257" "C00514" "C00800" "C00817" "C00880" "C15930" "C00209" "C06181"
#> [9] "C21525"
#> Background compounds: 117
#> -----------------------------
#> Hypergeometric test: ready.
#> Top 15 p-values:
#>     bdi00040     bdi00010     bdi00020     bdi00030     bdi00051     bdi00052 
#> 0.2426873159 1.0000000000 1.0000000000 1.0000000000 1.0000000000 1.0000000000 
#>     bdi00053     bdi00061     bdi00062     bdi00071     bdi00073     bdi00100 
#> 1.0000000000 1.0000000000 1.0000000000 1.0000000000 1.0000000000 1.0000000000 
#>     bdi00130     bdi00190     bdi00195 
#> 1.0000000000 1.0000000000 1.0000000000 
#> 
#> -----------------------------
#> Heat diffusion: not performed
#> -----------------------------
#> PageRank: not performed
#> 
#> class_ABR5~BD21_up regulated 
#> Compounds in the input: 25
#>  [1] "C00042" "C02170" "C01620" "C09315" "C01179" "C01197" "C05350" "C12623"
#>  [9] "C01494" "C12204" "C05533" "C00183" "C00431" "C00719" "C15987" "C01454"
#> [17] "C03765" "C06224" "C07085" "C07086" "C07211" "C07215" "C10700" "C03758"
#> [25] "C16666"
#> Background compounds: 117
#> -----------------------------
#> Hypergeometric test: ready.
#> Top 15 p-values:
#> bdi00010 bdi00020 bdi00030 bdi00040 bdi00051 bdi00052 bdi00053 bdi00061 
#>        1        1        1        1        1        1        1        1 
#> bdi00062 bdi00071 bdi00073 bdi00100 bdi00130 bdi00190 bdi00195 
#>        1        1        1        1        1        1        1 
#> 
#> -----------------------------
#> Heat diffusion: not performed
#> -----------------------------
#> PageRank: not performed
#> 
#> class_ABR6~BD21_down regulated 
#> Compounds in the input: 19
#>  [1] "C00168" "C00383" "C01146" "C00257" "C00514" "C00800" "C00817" "C00880"
#>  [9] "C15930" "C01750" "C03951" "C12249" "C12626" "C16409" "C00209" "C00975"
#> [17] "C03459" "C06181" "C21525"
#> Background compounds: 117
#> -----------------------------
#> Hypergeometric test: ready.
#> Top 15 p-values:
#>      bdi00944      bdi00010      bdi00020      bdi00030      bdi00040 
#> 0.07110291008 1.00000000000 1.00000000000 1.00000000000 1.00000000000 
#>      bdi00051      bdi00052      bdi00053      bdi00061      bdi00062 
#> 1.00000000000 1.00000000000 1.00000000000 1.00000000000 1.00000000000 
#>      bdi00071      bdi00073      bdi00100      bdi00130      bdi00190 
#> 1.00000000000 1.00000000000 1.00000000000 1.00000000000 1.00000000000 
#> 
#> -----------------------------
#> Heat diffusion: not performed
#> -----------------------------
#> PageRank: not performed
#> 
#> class_ABR6~BD21_up regulated 
#> Compounds in the input: 19
#>  [1] "C00042" "C02170" "C01620" "C01454" "C03765" "C06224" "C07085" "C07086"
#>  [9] "C07211" "C07215" "C10700" "C01179" "C01197" "C05350" "C12623" "C00009"
#> [17] "C05533" "C01127" "C05946"
#> Background compounds: 117
#> -----------------------------
#> Hypergeometric test: ready.
#> Top 15 p-values:
#>     bdi00360     bdi00071     bdi00190     bdi00195     bdi00220     bdi00240 
#> 0.3910288852 0.5136155834 0.5136155834 0.5136155834 0.5136155834 0.5136155834 
#>     bdi00260     bdi00261     bdi00270     bdi00310     bdi00330     bdi00340 
#> 0.5136155834 0.5136155834 0.5136155834 0.5136155834 0.5136155834 0.5136155834 
#>     bdi00350     bdi00380     bdi00400 
#> 0.5136155834 0.5136155834 0.5136155834 
#> 
#> -----------------------------
#> Heat diffusion: not performed
#> -----------------------------
#> PageRank: not performed