Modelling accessor methods — binaryComparisons • metabolyseR

Methods for accessing modelling results.

Usage

binaryComparisons(x, cls = "class")

# S4 method for AnalysisData
binaryComparisons(x, cls = "class")

mtry(x, cls = "class")

# S4 method for AnalysisData
mtry(x, cls = "class")

type(x)

# S4 method for RandomForest
type(x)

# S4 method for Univariate
type(x)

response(x)

# S4 method for RandomForest
response(x)

# S4 method for Univariate
response(x)

metrics(x)

# S4 method for RandomForest
metrics(x)

# S4 method for list
metrics(x)

# S4 method for Analysis
metrics(x)

predictions(x)

# S4 method for RandomForest
predictions(x)

# S4 method for list
predictions(x)

# S4 method for Analysis
predictions(x)

importanceMetrics(x)

# S4 method for RandomForest
importanceMetrics(x)

importance(x)

# S4 method for RandomForest
importance(x)

# S4 method for Univariate
importance(x)

# S4 method for list
importance(x)

# S4 method for Analysis
importance(x)

proximity(x, idx = NULL)

# S4 method for RandomForest
proximity(x, idx = NULL)

# S4 method for list
proximity(x, idx = NULL)

# S4 method for Analysis
proximity(x, idx = NULL)

explanatoryFeatures(x, ...)

# S4 method for Univariate
explanatoryFeatures(
  x,
  threshold = 0.05,
  value = c("adjusted.p.value", "p.value")
)

# S4 method for RandomForest
explanatoryFeatures(
  x,
  metric = "false_positive_rate",
  value = c("value", "p-value", "adjusted_p-value"),
  threshold = 0.05
)

# S4 method for list
explanatoryFeatures(x, ...)

# S4 method for Analysis
explanatoryFeatures(x, ...)

Arguments

x: S4 object of class AnalysisData,RandomForest, Univariate, Analysis or a list.
cls: sample information column to use
idx: sample information column to use for sample names. If NULL, the sample row number will be used. Sample names should be unique for each row of data.
...: arguments to parse to method for specific class
threshold: threshold below which explanatory features are extracted
value: the importance value to threshold. See the usage section for possible values for each class.
metric: importance metric for which to retrieve explanatory features

Methods

binaryComparisons: Return a vector of all possible binary comparisons for a given sample information column.
mtry: Return the default mtry random forest parameter value for a given sample information column.
type: Return the type of random forest analysis.
response: Return the response variable name used for a random forest analysis.
metrics: Retrieve the model performance metrics for a random forest analysis
predictions: Retrieve the out of bag model response predictions for a random forest analysis.
importanceMetrics: Retrieve the available feature importance metrics for a random forest analysis.
importance: Retrieve feature importance results.
proximity: Retrieve the random forest sample proximities.
explanatoryFeatures: Retrieve explanatory features.

Examples

library(metaboData)

d <- analysisData(abr1$neg[,200:300],abr1$fact)

## Return possible binary comparisons for the `day` response column
binaryComparisons(d,cls = 'day')
#>  [1] "1~2" "1~3" "1~4" "1~5" "1~H" "2~3" "2~4" "2~5" "2~H" "3~4" "3~5" "3~H"
#> [13] "4~5" "4~H" "5~H"

## Return the default random forest `mtry` parameter for the `day` response column
mtry(d,cls = 'day')
#> [1] 10

## Perform random forest analysis
rf_analysis <- randomForest(d,cls = 'day')

## Return the type of random forest
type(rf_analysis)
#> [1] "classification"

## Return the response variable name used
response(rf_analysis)
#> [1] "day"

## Retrieve the model performance metrics
metrics(rf_analysis)
#> # A tibble: 4 × 5
#>   response comparison  .metric  .estimator .estimate
#>   <chr>    <chr>       <chr>    <chr>          <dbl>
#> 1 day      1~2~3~4~5~H accuracy multiclass    0.567 
#> 2 day      1~2~3~4~5~H kap      multiclass    0.48  
#> 3 day      1~2~3~4~5~H margin   NA            0.0424
#> 4 day      1~2~3~4~5~H roc_auc  hand_till     0.886 

## Retrieve the out of bag model response predictions
predictions(rf_analysis)
#> # A tibble: 120 × 13
#>    response comparison rep   sample obs   pred  margin    `1`   `2`   `3`    `4`
#>    <chr>    <chr>      <chr>  <int> <fct> <fct> <marg>  <dbl> <dbl> <dbl>  <dbl>
#>  1 day      1~2~3~4~5… 1          1 2     2      0.05… 0.191  0.273 0.113 0.124 
#>  2 day      1~2~3~4~5… 1          2 3     2     -0.10… 0.0782 0.397 0.291 0.0838
#>  3 day      1~2~3~4~5… 1          3 4     4      0.02… 0.197  0.149 0.144 0.223 
#>  4 day      1~2~3~4~5… 1          4 1     H     -0.02… 0.210  0.2   0.210 0.0769
#>  5 day      1~2~3~4~5… 1          5 2     2      0.12… 0.0843 0.343 0.197 0.152 
#>  6 day      1~2~3~4~5… 1          6 1     H     -0.09… 0.283  0.174 0.114 0.0272
#>  7 day      1~2~3~4~5… 1          7 2     H     -0.17… 0.0788 0.241 0.217 0.0345
#>  8 day      1~2~3~4~5… 1          8 4     4      0.02… 0.075  0.16  0.27  0.295 
#>  9 day      1~2~3~4~5… 1          9 H     3     -0.10… 0.168  0.196 0.261 0.136 
#> 10 day      1~2~3~4~5… 1         10 H     1     -0.01… 0.306  0.167 0.144 0.0722
#> # ℹ 110 more rows
#> # ℹ 2 more variables: `5` <dbl>, H <dbl>

## Show the available feature importance metrics
importanceMetrics(rf_analysis)
#>  [1] "1"                    "2"                    "3"                   
#>  [4] "4"                    "5"                    "H"                   
#>  [7] "MeanDecreaseAccuracy" "MeanDecreaseGini"     "false_positive_rate" 
#> [10] "selection_frequency" 

## Retrieve the feature importance results
importance(rf_analysis)
#> # A tibble: 1,010 × 5
#>    response comparison  feature metric                  value
#>    <chr>    <chr>       <chr>   <chr>                   <dbl>
#>  1 day      1~2~3~4~5~H N200    1                    0       
#>  2 day      1~2~3~4~5~H N200    2                    0       
#>  3 day      1~2~3~4~5~H N200    3                    0       
#>  4 day      1~2~3~4~5~H N200    4                    0       
#>  5 day      1~2~3~4~5~H N200    5                    0       
#>  6 day      1~2~3~4~5~H N200    H                    0       
#>  7 day      1~2~3~4~5~H N200    MeanDecreaseAccuracy 0       
#>  8 day      1~2~3~4~5~H N200    MeanDecreaseGini     6.00e- 2
#>  9 day      1~2~3~4~5~H N200    false_positive_rate  2.35e-40
#> 10 day      1~2~3~4~5~H N200    selection_frequency  1.6 e+ 1
#> # ℹ 1,000 more rows

## Retrieve the sample proximities
proximity(rf_analysis)
#> # A tibble: 14,400 × 5
#>    response comparison  sample1 sample2 proximity
#>    <chr>    <chr>         <int>   <dbl>     <dbl>
#>  1 day      1~2~3~4~5~H       1       1    1     
#>  2 day      1~2~3~4~5~H       1       2    0.0704
#>  3 day      1~2~3~4~5~H       1       3    0.0580
#>  4 day      1~2~3~4~5~H       1       4    0.0930
#>  5 day      1~2~3~4~5~H       1       5    0.0556
#>  6 day      1~2~3~4~5~H       1       6    0.0435
#>  7 day      1~2~3~4~5~H       1       7    0.0556
#>  8 day      1~2~3~4~5~H       1       8    0.0441
#>  9 day      1~2~3~4~5~H       1       9    0.106 
#> 10 day      1~2~3~4~5~H       1      10    0     
#> # ℹ 14,390 more rows

## Retrieve the explanatory features
explanatoryFeatures(rf_analysis,metric = 'false_positive_rate',threshold = 0.05)
#> # A tibble: 35 × 5
#>    response comparison  feature metric                  value
#>    <chr>    <chr>       <chr>   <chr>                   <dbl>
#>  1 day      1~2~3~4~5~H N229    false_positive_rate 5.75e-129
#>  2 day      1~2~3~4~5~H N259    false_positive_rate 4.88e- 72
#>  3 day      1~2~3~4~5~H N277    false_positive_rate 3.98e- 67
#>  4 day      1~2~3~4~5~H N255    false_positive_rate 3.27e- 53
#>  5 day      1~2~3~4~5~H N213    false_positive_rate 4.92e- 45
#>  6 day      1~2~3~4~5~H N200    false_positive_rate 2.35e- 40
#>  7 day      1~2~3~4~5~H N221    false_positive_rate 1.80e- 38
#>  8 day      1~2~3~4~5~H N299    false_positive_rate 4.91e- 36
#>  9 day      1~2~3~4~5~H N245    false_positive_rate 9.75e- 27
#> 10 day      1~2~3~4~5~H N279    false_positive_rate 2.38e- 20
#> # ℹ 25 more rows