Perform functional enrichment analyses of explanatory features using the FELLA R package.
Usage
functionalEnrichment(
  x,
  organism,
  methods = availableMethods(),
  split = c("none", "trends"),
  organism_data = organismData(organism),
  adduct_rules_table = adduct_rules(),
  ...
)
# S4 method for RandomForest
functionalEnrichment(
  x,
  organism,
  methods = availableMethods(),
  split = c("none", "trends"),
  organism_data = organismData(organism),
  adduct_rules_table = adduct_rules(),
  ...
)
Arguments
- x
an object of S4 class RandomForest
- organism
the KEGG code for the organism of interest
- methods
the enrichment techniques to build. Any of those returned by availableMethods.
- split
split the explanatory features into further groups based on their trends. See details.
- organism_data
an object of S4 class FELLA.DATA
- adduct_rules_table
the adduct ionisation rules for matching m/z features to KEGG compounds. The format should be as returned from mzAnnotation::adduct_rules.
- ...
arguments to pass to metabolyseR::explanatoryFeatures
Details
For argument split = 'trends', the explanatory features can be split into further groups based on their trends. This is not supported for unsupervised random forest.
For random forest classification, this is for binary comparisons only. Functional enrichment is performed separately on the up regulated and down regulated explanatory features for each comparison. The up regulated and down regulated groups are based on the trends of log2 ratios between the comparison classes. The up regulated explanatory features have a higher median intensity in the right-hand class compared to the left-hand class of the comparison. The opposite is true for the down regulated explanatory features.
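The up/down rule for a single binary comparison can be illustrated with a minimal base R sketch. The toy data frame, the feature names f1 and f2 and the class labels A and B are all hypothetical, and this is not the package's internal code; in particular, how the package treats a log2 ratio of exactly zero is not stated here, so the sketch simply assigns it to the down regulated group.

```r
# Hypothetical toy intensities for the binary comparison 'A~B'
# (left-hand class A, right-hand class B).
intensities <- data.frame(
  class = c('A', 'A', 'A', 'B', 'B', 'B'),
  f1 = c(10, 11, 12, 40, 44, 48),
  f2 = c(64, 72, 80, 16, 18, 20)
)
features <- c('f1', 'f2')

# Median intensity of each feature within each class
median_left <- sapply(intensities[intensities$class == 'A', features], median)
median_right <- sapply(intensities[intensities$class == 'B', features], median)

# log2 ratio of the right-hand class over the left-hand class
log2_ratio <- log2(median_right / median_left)

# A positive ratio means a higher median in the right-hand class.
# Assumption: ratios of exactly zero fall into 'down regulated' here.
trend <- ifelse(log2_ratio > 0, 'up regulated', 'down regulated')

print(data.frame(feature = features, log2_ratio = log2_ratio, trend = trend))
```

Each trend group would then be passed to the enrichment step on its own, which is why the split output reports one result per comparison and direction.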
For random forest regression, the explanatory features are split based on their Spearman's correlation coefficient with the response variable prior to functional enrichment analysis, giving positively correlated and negatively correlated subgroups.
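The regression split can be sketched in the same way using stats::cor with method = 'spearman'. The response values and the features f1 and f2 below are hypothetical, and the handling of a coefficient of exactly zero is an assumption of this sketch rather than documented package behaviour.

```r
# Hypothetical continuous response and two toy features
response <- c(1, 2, 3, 4, 5, 6)
features <- data.frame(
  f1 = c(2, 4, 5, 9, 12, 20),
  f2 = c(30, 22, 25, 10, 8, 3)
)

# Spearman's correlation coefficient of each feature with the response
rho <- sapply(features, function(x) cor(x, response, method = 'spearman'))

# Assumption: coefficients of exactly zero fall into the negative group here.
trend <- ifelse(rho > 0, 'positively correlated', 'negatively correlated')

# Feature names grouped by the sign of their correlation
groups <- split(names(rho), trend)
print(groups)
```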
Examples
## Perform random forest on the example data
random_forest <- assigned_data %>%
  metabolyseR::randomForest(
    cls = 'class'
  )
## Perform functional enrichment analysis
functionalEnrichment(
  random_forest,
  'bdi',
  methods = 'hypergeom',
  organism_data = organismData(
    'bdi',
    database_directory = system.file(
      'bdi',
      package = 'riches'),
    internal_directory = FALSE
  )
)
#> Loading KEGG graph data...
#> Done.
#> Loading hypergeom data...
#> Loading matrix...
#> Done.
#> Loading diffusion data...
#> Loading matrix...
#> 'diffusion.matrix.RData' not present in:/home/runner/work/_temp/Library/riches/bdi/diffusion.matrix.RData. Simulated permutations may execute slower for diffusion.
#> Done.
#> Loading rowSums...
#> 'diffusion.rowSums.RData' not present in:/home/runner/work/_temp/Library/riches/bdi/diffusion.rowSums.RData. Z-scores won't be available for diffusion.
#> Done.
#> Loading pagerank data...
#> Loading matrix...
#> 'pagerank.matrix.RData' not present in:/home/runner/work/_temp/Library/riches/bdi/pagerank.matrix.RData. Simulated permutations may execute slower for pagerank.
#> Done.
#> Loading rowSums...
#> 'pagerank.rowSums.RData' not present in:/home/runner/work/_temp/Library/riches/bdi/pagerank.rowSums.RData. Z-scores won't be available for pagerank.
#> Done.
#> Data successfully loaded.
#>
#> class
#> ABR1~ABR5~ABR6~BD21
#> Running hypergeom...
#> Starting hypergeometric p-values calculation...
#> Done.
#>
#> Random forest classification
#>
#> Samples: 60
#> Features: 1706
#> Response: class
#> # comparisons: 1
#>
#> General data:
#> - KEGG graph:
#> * Nodes: 13085
#> * Edges: 42910
#> * Density: 0.0002506364624
#> * Categories:
#> + pathway [136]
#> + module [218]
#> + enzyme [993]
#> + reaction [6703]
#> + compound [5035]
#> * Size: 7.1 Mb
#> - KEGG names are ready.
#> -----------------------------
#> Hypergeometric test:
#> - Matrix is ready
#> * Dim: 5035 x 136
#> * Size: 473.8 Kb
#> -----------------------------
#> Heat diffusion:
#> - Matrix not loaded.
#> - RowSums not loaded.
#> -----------------------------
#> PageRank:
#> - Matrix not loaded.
#> - RowSums not loaded.
#>
#> 87 m/z features matched to KEGG compounds.
#> 153 explanatory m/z features.
#>
#>
#> class_ABR1~ABR5~ABR6~BD21
#> Compounds in the input: 41
#> [1] "C00042" "C02170" "C00009" "C01620" "C01454" "C03765" "C06224" "C07085"
#> [9] "C07086" "C07211" "C07215" "C10700" "C00168" "C00383" "C01146" "C09315"
#> [17] "C00493" "C04236" "C16588" "C17696" "C01179" "C01197" "C05350" "C12623"
#> [25] "C00257" "C00514" "C00800" "C00817" "C00880" "C15930" "C05533" "C01750"
#> [33] "C03951" "C12249" "C12626" "C16409" "C00209" "C03758" "C16666" "C06181"
#> [41] "C21525"
#> Background compounds: 117
#> -----------------------------
#> Hypergeometric test: ready.
#> Top 15 p-values:
#> bdi00010 bdi00020 bdi00030 bdi00040 bdi00051 bdi00052 bdi00053 bdi00061
#> 1 1 1 1 1 1 1 1
#> bdi00062 bdi00071 bdi00073 bdi00100 bdi00130 bdi00190 bdi00195
#> 1 1 1 1 1 1 1
#>
#> -----------------------------
#> Heat diffusion: not performed
#> -----------------------------
#> PageRank: not performed
## An example using split trends
## Perform binary random forest classification on the example data
random_forest <- assigned_data %>%
  metabolyseR::randomForest(
    cls = 'class',
    binary = TRUE
  )
## Perform functional enrichment analysis
functionalEnrichment(
  random_forest,
  'bdi',
  methods = 'hypergeom',
  split = 'trends',
  organism_data = organismData(
    'bdi',
    database_directory = system.file(
      'bdi',
      package = 'riches'),
    internal_directory = FALSE
  )
)
#> Loading KEGG graph data...
#> Done.
#> Loading hypergeom data...
#> Loading matrix...
#> Done.
#> Loading diffusion data...
#> Loading matrix...
#> 'diffusion.matrix.RData' not present in:/home/runner/work/_temp/Library/riches/bdi/diffusion.matrix.RData. Simulated permutations may execute slower for diffusion.
#> Done.
#> Loading rowSums...
#> 'diffusion.rowSums.RData' not present in:/home/runner/work/_temp/Library/riches/bdi/diffusion.rowSums.RData. Z-scores won't be available for diffusion.
#> Done.
#> Loading pagerank data...
#> Loading matrix...
#> 'pagerank.matrix.RData' not present in:/home/runner/work/_temp/Library/riches/bdi/pagerank.matrix.RData. Simulated permutations may execute slower for pagerank.
#> Done.
#> Loading rowSums...
#> 'pagerank.rowSums.RData' not present in:/home/runner/work/_temp/Library/riches/bdi/pagerank.rowSums.RData. Z-scores won't be available for pagerank.
#> Done.
#> Data successfully loaded.
#>
#> class
#> ABR1~ABR5
#> down regulated
#> Running hypergeom...
#> Starting hypergeometric p-values calculation...
#> Done.
#>
#> class
#> ABR1~ABR5
#> up regulated
#> Running hypergeom...
#> Starting hypergeometric p-values calculation...
#> Done.
#>
#> class
#> ABR1~ABR6
#> down regulated
#> Running hypergeom...
#> Starting hypergeometric p-values calculation...
#> Done.
#>
#> class
#> ABR1~ABR6
#> up regulated
#> Running hypergeom...
#> Starting hypergeometric p-values calculation...
#> Done.
#>
#> class
#> ABR1~BD21
#> down regulated
#> Running hypergeom...
#> Starting hypergeometric p-values calculation...
#> Done.
#>
#> class
#> ABR1~BD21
#> up regulated
#> Running hypergeom...
#> Starting hypergeometric p-values calculation...
#> Done.
#>
#> class
#> ABR5~ABR6
#> down regulated
#> Running hypergeom...
#> Starting hypergeometric p-values calculation...
#> Done.
#>
#> class
#> ABR5~ABR6
#> up regulated
#> Running hypergeom...
#> Starting hypergeometric p-values calculation...
#> Done.
#>
#> class
#> ABR5~BD21
#> down regulated
#> Running hypergeom...
#> Starting hypergeometric p-values calculation...
#> Done.
#>
#> class
#> ABR5~BD21
#> up regulated
#> Running hypergeom...
#> Starting hypergeometric p-values calculation...
#> Done.
#>
#> class
#> ABR6~BD21
#> down regulated
#> Running hypergeom...
#> Starting hypergeometric p-values calculation...
#> Done.
#>
#> class
#> ABR6~BD21
#> up regulated
#> Running hypergeom...
#> Starting hypergeometric p-values calculation...
#> Done.
#>
#> Random forest classification
#>
#> Samples: 60
#> Features: 1706
#> Response: class
#> # comparisons: 6
#>
#> General data:
#> - KEGG graph:
#> * Nodes: 13085
#> * Edges: 42910
#> * Density: 0.0002506364624
#> * Categories:
#> + pathway [136]
#> + module [218]
#> + enzyme [993]
#> + reaction [6703]
#> + compound [5035]
#> * Size: 7.1 Mb
#> - KEGG names are ready.
#> -----------------------------
#> Hypergeometric test:
#> - Matrix is ready
#> * Dim: 5035 x 136
#> * Size: 473.8 Kb
#> -----------------------------
#> Heat diffusion:
#> - Matrix not loaded.
#> - RowSums not loaded.
#> -----------------------------
#> PageRank:
#> - Matrix not loaded.
#> - RowSums not loaded.
#>
#> 87 m/z features matched to KEGG compounds.
#> 443 explanatory m/z features.
#>
#>
#> class_ABR1~ABR5_down regulated
#> Compounds in the input: 14
#> [1] "C00493" "C04236" "C16588" "C17696" "C01750" "C03951" "C12249" "C12626"
#> [9] "C16409" "C00209" "C00183" "C00431" "C00719" "C15987"
#> Background compounds: 117
#> -----------------------------
#> Hypergeometric test: ready.
#> Top 15 p-values:
#> bdi00944 bdi00010 bdi00020 bdi00030 bdi00040
#> 0.01836274845 1.00000000000 1.00000000000 1.00000000000 1.00000000000
#> bdi00051 bdi00052 bdi00053 bdi00061 bdi00062
#> 1.00000000000 1.00000000000 1.00000000000 1.00000000000 1.00000000000
#> bdi00071 bdi00073 bdi00100 bdi00130 bdi00190
#> 1.00000000000 1.00000000000 1.00000000000 1.00000000000 1.00000000000
#>
#> -----------------------------
#> Heat diffusion: not performed
#> -----------------------------
#> PageRank: not performed
#>
#> class_ABR1~ABR5_up regulated
#> Compounds in the input: 1
#> [1] "C05533"
#> Background compounds: 117
#> -----------------------------
#> Hypergeometric test: ready.
#> Top 15 p-values:
#> bdi00010 bdi00020 bdi00030 bdi00040 bdi00051 bdi00052 bdi00053 bdi00061
#> 1 1 1 1 1 1 1 1
#> bdi00062 bdi00071 bdi00073 bdi00100 bdi00130 bdi00190 bdi00195
#> 1 1 1 1 1 1 1
#>
#> -----------------------------
#> Heat diffusion: not performed
#> -----------------------------
#> PageRank: not performed
#>
#> class_ABR1~ABR6_down regulated
#> Compounds in the input: 14
#> [1] "C00122" "C01384" "C00022" "C00222" "C00149" "C00497" "C03064" "C01750"
#> [9] "C03951" "C12249" "C12626" "C16409" "C00209" "C00059"
#> Background compounds: 117
#> -----------------------------
#> Hypergeometric test: ready.
#> Top 15 p-values:
#> bdi00944 bdi00650 bdi00230 bdi00010 bdi00020
#> 0.01836274845 0.47823280545 0.54578579364 1.00000000000 1.00000000000
#> bdi00030 bdi00040 bdi00051 bdi00052 bdi00053
#> 1.00000000000 1.00000000000 1.00000000000 1.00000000000 1.00000000000
#> bdi00061 bdi00062 bdi00071 bdi00073 bdi00100
#> 1.00000000000 1.00000000000 1.00000000000 1.00000000000 1.00000000000
#>
#> -----------------------------
#> Heat diffusion: not performed
#> -----------------------------
#> PageRank: not performed
#>
#> class_ABR1~ABR6_up regulated
#> Compounds in the input: 12
#> [1] "C00009" "C01454" "C03765" "C06224" "C07085" "C07086" "C07211" "C07215"
#> [9] "C10700" "C09315" "C03758" "C16666"
#> Background compounds: 117
#> -----------------------------
#> Hypergeometric test: ready.
#> Top 15 p-values:
#> bdi00195 bdi00260 bdi00410 bdi00510 bdi00564 bdi00565
#> 0.634032634 0.634032634 0.634032634 0.634032634 0.634032634 0.634032634
#> bdi00740 bdi00780 bdi00902 bdi00965 bdi03015 bdi03018
#> 0.634032634 0.634032634 0.634032634 0.634032634 0.634032634 0.634032634
#> bdi03030 bdi03060 bdi03410
#> 0.634032634 0.634032634 0.634032634
#>
#> -----------------------------
#> Heat diffusion: not performed
#> -----------------------------
#> PageRank: not performed
#>
#> class_ABR1~BD21_down regulated
#> Compounds in the input: 13
#> [1] "C00257" "C00514" "C00800" "C00817" "C00880" "C15930" "C01750" "C03951"
#> [9] "C12249" "C12626" "C16409" "C06181" "C21525"
#> Background compounds: 117
#> -----------------------------
#> Hypergeometric test: ready.
#> Top 15 p-values:
#> bdi00944 bdi00040 bdi00010 bdi00020 bdi00030
#> 0.01311624889 0.59275588604 1.00000000000 1.00000000000 1.00000000000
#> bdi00051 bdi00052 bdi00053 bdi00061 bdi00062
#> 1.00000000000 1.00000000000 1.00000000000 1.00000000000 1.00000000000
#> bdi00071 bdi00073 bdi00100 bdi00130 bdi00190
#> 1.00000000000 1.00000000000 1.00000000000 1.00000000000 1.00000000000
#>
#> -----------------------------
#> Heat diffusion: not performed
#> -----------------------------
#> PageRank: not performed
#>
#> class_ABR1~BD21_up regulated
#> Compounds in the input: 23
#> [1] "C00042" "C02170" "C01620" "C09315" "C01179" "C01197" "C05350" "C12623"
#> [9] "C01494" "C12204" "C01454" "C03765" "C06224" "C07085" "C07086" "C07211"
#> [17] "C07215" "C10700" "C02646" "C05629" "C16706" "C03758" "C16666"
#> Background compounds: 117
#> -----------------------------
#> Hypergeometric test: ready.
#> Top 15 p-values:
#> bdi00360 bdi00940 bdi00950 bdi00010 bdi00020 bdi00030
#> 0.2477100692 0.6436571765 0.6436571765 1.0000000000 1.0000000000 1.0000000000
#> bdi00040 bdi00051 bdi00052 bdi00053 bdi00061 bdi00062
#> 1.0000000000 1.0000000000 1.0000000000 1.0000000000 1.0000000000 1.0000000000
#> bdi00071 bdi00073 bdi00100
#> 1.0000000000 1.0000000000 1.0000000000
#>
#> -----------------------------
#> Heat diffusion: not performed
#> -----------------------------
#> PageRank: not performed
#>
#> class_ABR5~ABR6_down regulated
#> Compounds in the input: 18
#> [1] "C00122" "C01384" "C00022" "C00222" "C00149" "C00497" "C03064" "C00059"
#> [9] "C00168" "C00383" "C01146" "C00158" "C00311" "C00679" "C03921" "C04575"
#> [17] "C20889" "C20896"
#> Background compounds: 117
#> -----------------------------
#> Hypergeometric test: ready.
#> Top 15 p-values:
#> bdi00053 bdi00650 bdi00020 bdi01200 bdi00630 bdi00250
#> 0.1374405752 0.1433068595 0.2622158130 0.2622158130 0.3298539545 0.3970016756
#> bdi04146 bdi00010 bdi00030 bdi00040 bdi00051 bdi00052
#> 0.8926599210 1.0000000000 1.0000000000 1.0000000000 1.0000000000 1.0000000000
#> bdi00061 bdi00062 bdi00071
#> 1.0000000000 1.0000000000 1.0000000000
#>
#> -----------------------------
#> Heat diffusion: not performed
#> -----------------------------
#> PageRank: not performed
#>
#> class_ABR5~ABR6_up regulated
#> Compounds in the input: 13
#> [1] "C00009" "C01454" "C03765" "C06224" "C07085" "C07086" "C07211" "C07215"
#> [9] "C10700" "C00183" "C00431" "C00719" "C15987"
#> Background compounds: 117
#> -----------------------------
#> Hypergeometric test: ready.
#> Top 15 p-values:
#> bdi00071 bdi00195 bdi00260 bdi00280 bdi00310 bdi00330
#> 0.5596707819 0.5596707819 0.5596707819 0.5596707819 0.5596707819 0.5596707819
#> bdi00410 bdi00510 bdi00564 bdi00565 bdi00740 bdi00780
#> 0.5596707819 0.5596707819 0.5596707819 0.5596707819 0.5596707819 0.5596707819
#> bdi00902 bdi00970 bdi02010
#> 0.5596707819 0.5596707819 0.5596707819
#>
#> -----------------------------
#> Heat diffusion: not performed
#> -----------------------------
#> PageRank: not performed
#>
#> class_ABR5~BD21_down regulated
#> Compounds in the input: 9
#> [1] "C00257" "C00514" "C00800" "C00817" "C00880" "C15930" "C00209" "C06181"
#> [9] "C21525"
#> Background compounds: 117
#> -----------------------------
#> Hypergeometric test: ready.
#> Top 15 p-values:
#> bdi00040 bdi00010 bdi00020 bdi00030 bdi00051 bdi00052
#> 0.2426873159 1.0000000000 1.0000000000 1.0000000000 1.0000000000 1.0000000000
#> bdi00053 bdi00061 bdi00062 bdi00071 bdi00073 bdi00100
#> 1.0000000000 1.0000000000 1.0000000000 1.0000000000 1.0000000000 1.0000000000
#> bdi00130 bdi00190 bdi00195
#> 1.0000000000 1.0000000000 1.0000000000
#>
#> -----------------------------
#> Heat diffusion: not performed
#> -----------------------------
#> PageRank: not performed
#>
#> class_ABR5~BD21_up regulated
#> Compounds in the input: 25
#> [1] "C00042" "C02170" "C01620" "C09315" "C01179" "C01197" "C05350" "C12623"
#> [9] "C01494" "C12204" "C05533" "C00183" "C00431" "C00719" "C15987" "C01454"
#> [17] "C03765" "C06224" "C07085" "C07086" "C07211" "C07215" "C10700" "C03758"
#> [25] "C16666"
#> Background compounds: 117
#> -----------------------------
#> Hypergeometric test: ready.
#> Top 15 p-values:
#> bdi00010 bdi00020 bdi00030 bdi00040 bdi00051 bdi00052 bdi00053 bdi00061
#> 1 1 1 1 1 1 1 1
#> bdi00062 bdi00071 bdi00073 bdi00100 bdi00130 bdi00190 bdi00195
#> 1 1 1 1 1 1 1
#>
#> -----------------------------
#> Heat diffusion: not performed
#> -----------------------------
#> PageRank: not performed
#>
#> class_ABR6~BD21_down regulated
#> Compounds in the input: 19
#> [1] "C00168" "C00383" "C01146" "C00257" "C00514" "C00800" "C00817" "C00880"
#> [9] "C15930" "C01750" "C03951" "C12249" "C12626" "C16409" "C00209" "C00975"
#> [17] "C03459" "C06181" "C21525"
#> Background compounds: 117
#> -----------------------------
#> Hypergeometric test: ready.
#> Top 15 p-values:
#> bdi00944 bdi00010 bdi00020 bdi00030 bdi00040
#> 0.07110291008 1.00000000000 1.00000000000 1.00000000000 1.00000000000
#> bdi00051 bdi00052 bdi00053 bdi00061 bdi00062
#> 1.00000000000 1.00000000000 1.00000000000 1.00000000000 1.00000000000
#> bdi00071 bdi00073 bdi00100 bdi00130 bdi00190
#> 1.00000000000 1.00000000000 1.00000000000 1.00000000000 1.00000000000
#>
#> -----------------------------
#> Heat diffusion: not performed
#> -----------------------------
#> PageRank: not performed
#>
#> class_ABR6~BD21_up regulated
#> Compounds in the input: 19
#> [1] "C00042" "C02170" "C01620" "C01454" "C03765" "C06224" "C07085" "C07086"
#> [9] "C07211" "C07215" "C10700" "C01179" "C01197" "C05350" "C12623" "C00009"
#> [17] "C05533" "C01127" "C05946"
#> Background compounds: 117
#> -----------------------------
#> Hypergeometric test: ready.
#> Top 15 p-values:
#> bdi00360 bdi00071 bdi00190 bdi00195 bdi00220 bdi00240
#> 0.3910288852 0.5136155834 0.5136155834 0.5136155834 0.5136155834 0.5136155834
#> bdi00260 bdi00261 bdi00270 bdi00310 bdi00330 bdi00340
#> 0.5136155834 0.5136155834 0.5136155834 0.5136155834 0.5136155834 0.5136155834
#> bdi00350 bdi00380 bdi00400
#> 0.5136155834 0.5136155834 0.5136155834
#>
#> -----------------------------
#> Heat diffusion: not performed
#> -----------------------------
#> PageRank: not performed