2 GORIC(A) weights benchmarks

The GORIC(A) weights benchmarks come from several percentiles of sets of GORIC(A) weights assuming that the specified set of population values is true (for the sample size under consideration). More specifically, the benchmarks are, currently, based on the 5th, 35th, 50th, 65th, and 95th percentiles of the GORIC(A) weights for the preferred hypothesis and of the ratios of the GORIC(A) weights for the preferred hypothesis versus the other hypotheses. Bear in mind that research (e.g., a simulation study) is needed to obtain more insight into how well these choices work.
Notably, you can decide a priori which percentiles you believe reflect the different types of support, and pre-register them together with your informative hypothesis/-es.
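
As a minimal sketch of how such percentile benchmarks arise, the base-R snippet below extracts the five percentiles from a vector of simulated ratios of GORIC(A) weights. The vector `sim_ratios` is a hypothetical stand-in drawn from a log-normal distribution; in practice, these values would come from the benchmark simulation under the assumed population.

```r
# Hypothetical stand-in for simulated ratios of GORIC(A) weights
# (in practice, these come from the benchmark simulation).
set.seed(123)
sim_ratios <- rlnorm(1000, meanlog = 0.5, sdlog = 1)

# The five percentiles currently used as benchmarks
probs <- c(0.05, 0.35, 0.50, 0.65, 0.95)
benchmarks <- quantile(sim_ratios, probs = probs)
benchmarks
```

If you pre-register other percentiles, only `probs` needs to change.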

You can compare your GORIC(A) weight and/or ratios of GORIC(A) weights of the preferred hypothesis to the benchmarks to draw a conclusion regarding the strength of support for the hypothesis (given the assumed set of population parameter values and given the sample size). If the benchmarks show a maximum/bounded support (see below in Example 2), then there is support for the overlap of two or more hypotheses (which is also signaled by equal log-likelihood values). Otherwise, the GORIC(A) weights (ratios) benchmarks can be used to qualify the height of support (see below in Example 1).

2.1 How to use benchmarks

2.1.1 Labelling

We are not in favor of cut-off points (or ‘surrounding anchors’), but we need them when we want to label the height of support via the benchmarks. We therefore propose the following:

Benchmark (percentile)	Height of support
below 5th	no support
between 5th and 35th	low support
between 35th and 65th	medium support
between 65th and 95th	high (compelling) support
over 95th	very large (tremendous) support

We advise using some kind of null model as the assumed population (possibly, using multiple ones). Then, you can see how extreme your finding is (or is not) for this null population. The more extreme your finding, the more support there is for your informative, theory-based, inequality-constrained hypothesis.

You can of course use other percentile levels (namely, ones at which you consider a finding extreme enough). However, make sure to define these before seeing the data, and preferably also pre-register them (together with your informative hypothesis/-es).
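
The labelling table can be sketched as a small helper in base R. The function name `label_support()` is hypothetical (it is not part of restriktor), and the numbers in the usage lines are the no-effect benchmarks reported in Example 1 further on. Note that `findInterval()` assigns a value that lies exactly on a cut-off to the higher category, which is an arbitrary choice of this sketch.

```r
# Hypothetical helper: label the height of support by comparing a sample value
# with the 5th, 35th, 65th, and 95th percentile benchmarks.
label_support <- function(sample_value, cutoffs) {
  stopifnot(length(cutoffs) == 4, !is.unsorted(cutoffs))
  labels <- c("no support",
              "low support",
              "medium support",
              "high (compelling) support",
              "very large (tremendous) support")
  # findInterval() returns how many cut-offs are <= sample_value (0 to 4)
  labels[findInterval(sample_value, cutoffs) + 1]
}

# Cut-offs: 5th, 35th, 65th, and 95th percentiles under the null (Example 1)
null_cutoffs <- c(0.252, 1.342, 2.167, 2.562)
label_support(5.498, null_cutoffs)
label_support(1.000, null_cutoffs)
```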

2.1.2 Use minimum effect

You may want to use a minimum effect. One option (the one we advise, when it is doable) is to specify your hypothesis such that it inspects, for example, minimum differences between parameters (e.g., $\mu_1 - \mu_2 >$ 0.2 instead of $\mu_1 > \mu_2$, that is, $\mu_1 - \mu_2 >$ 0). Notably, one then possibly does not need benchmarks anymore: Finding a ratio of GORIC(A) weights of 1 or higher would probably suffice.

A second option (although not something we advise) is to investigate benchmarks using a minimum effect size (or multiple ones). Then, you compare your sample values with the distribution of (ratios of) GORIC(A) weights under a population with this minimum effect size.
In the function, you should specify the population parameter values (reflecting specific effect sizes). In case of an ANOVA model, you can also specify the effect size level(s) (for Cohen’s $f$). Note that the benchmarks differ when you assume different effect sizes or different population parameter estimates.

2.1.3 Sensitivity analysis

If of interest, as a sensitivity analysis, you can calculate the benchmarks for multiple sets of population parameter values (or population effect sizes). Note that this may also complicate drawing conclusions (especially when the assumed sets of population parameter values differ much, like when using multiple effect sizes). We advise doing this for multiple null populations (setting some or all of the inequality restrictions to equalities).

If your preferred hypothesis does not have the highest fit and you want to inspect benchmarks, we also advise inspecting multiple ratios of population parameter estimates, where some are in agreement with your hypothesis and others are in agreement with the data.

2.1.4 Defaults

Once more, we advise inspecting populations in which some of the inequalities of your hypothesis/-es of interest are set to equalities. This can also be very helpful if the log-likelihood values seem to be close: This way, you can also check for the support of a boundary hypothesis, as discussed below in Example 3.

By default, two populations are used: i) a null population in which the population effect size is (or the population parameter values are) set to 0 and ii) a population based on the observed effect size (`model_type = "means"`) or on the observed parameter values (`model_type = "asymp"`, the default). One can override this by using the `pop_es` or `pop_est` argument, respectively, as discussed below in Example 3.

2.1.5 General R code

# If ANOVA model:

# In practice, you may want to increase the number of iterations (iter), say to 2000, and use parallel computing
# future::plan(multisession, workers = 8) # windows machines
# future::plan(multicore, workers = 8) # unix machines
benchmarks_means <- benchmark(goric_object, model_type = "means", iter = 400)
benchmarks_means 
plot(benchmarks_means) 
# Use 'pop_es' to specify own null population(s).   
#
## If ANOVA or other model:
benchmarks_asymp <- benchmark(goric_object, iter = 400)
# 'model_type = "asymp"', the default
benchmarks_asymp
plot(benchmarks_asymp) 
# Use 'pop_est' to specify own null population(s).
# 
# print() options:
# output_type = c("rgw", "gw", "rlw", "ld", "all")
# hypo_rate_threshold = <number>
#
# plot() options:
# output_type = c("rgw", "gw", "rlw", "ld")
# percentiles = NULL 
# x_lim = c(<min>, <max>) 
# log_scale = FALSE/TRUE

2.2 Examples

Next, we will discuss two ANOVA examples. More specifically, we will inspect the case-specific benchmark values (using the ratio of means as in the data). We will look at
- an ANOVA example where we evaluate $H_1: \mu_1 > \mu_2 > \mu_3$ versus its complement, and $H_1$ is true;
- an ANOVA example where we evaluate two overlapping hypotheses, namely $H_1: \mu_1 > \mu_2 > \mu_3$ and $H_2: \mu_1 > \mu_2, \ \mu_3$, together with the unconstrained, and $H_1$ is true (and thus the others are as well, but they are not the most parsimonious one).

Later on (in another section), we will also discuss the following example:
- an ANOVA example where we evaluate $H_1: \mu_1 > \mu_2 > \mu_3$ versus its complement, and the border $\mu_1 = \mu_2 > \mu_3$ is true. Notably, here, both hypotheses are true, $H_1$ is the most parsimonious one, and we want to conclude that the border is true.

For a description of interpreting GORIC(A) output, see ‘Guidelines_output_GORIC’ (https://github.com/rebeccakuiper/Tutorials).

2.2.1 Example 1 (ANOVA): $H_1$ vs its complement

# H1 vs complement (default) - H1 is true
H1 <- "D1 > D2 > D3" # mu1 > mu2 > mu3

# Apply GORIC #
set.seed(123) # set seed: to obtain same results when you re-run it
results_1c <- goric(fit, hypotheses = list(H1))
results_1c

## restriktor (0.6-30): generalized order-restricted information criterion: 
## 
## Results:
##         model    loglik  penalty    goric  loglik.weights  penalty.weights  goric.weights
## 1          H1  -155.075    2.833  315.816           0.704            0.697          0.845
## 2  complement  -155.939    3.667  319.212           0.296            0.303          0.155
## 
## Conclusion:
## The order-restricted hypothesis 'H1' has 5.46 times more support than its complement.

So, we find that $H_1: \mu_1 > \mu_2 > \mu_3$ is 0.85 $/$ 0.15 $\approx$ 5.46 times more supported than its complement (the ratio is computed from the unrounded weights). But how much is that? How can we label that amount of relative support (for this study)?
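
For illustration, the weights and their ratio can be recomputed by hand from the GORIC values in the output above. This is a sketch in base R using the usual information-criterion weight formula; the object names are our own, and the GORIC values are the rounded ones from the table, so small rounding differences are possible.

```r
# GORIC values taken from the output above (rounded to three decimals)
goric_values <- c(H1 = 315.816, complement = 319.212)

# Weights: exp(-0.5 * difference with the smallest GORIC), normalised to sum to 1
delta <- goric_values - min(goric_values)
goric_weights <- exp(-0.5 * delta) / sum(exp(-0.5 * delta))
round(goric_weights, 3)

# Relative support of H1 versus its complement
ratio_H1_vs_complement <- goric_weights[["H1"]] / goric_weights[["complement"]]
round(ratio_H1_vs_complement, 2)
```

This reproduces weights of about 0.845 and 0.155 and a ratio of about 5.46, which is why dividing the rounded weights 0.85 and 0.15 does not give exactly the reported ratio.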

We can use benchmarks to label the amount of relative support found. The benchmark function will give (by default) the distribution of the relative support for our hypothesis assuming a population as in our data set (‘Observed’). The distribution is depicted in a plot, and the percentiles of that distribution are shown in a table. This provides more insight into the (un)certainty in the amount of relative support. Additionally, we can compare this distribution to other ones, assuming other populations. The default is to compare it to a null, ‘No-effect’ distribution, assuming that the parameters are the same (so, assuming a null population). This is demonstrated and explained next.

# Benchmarks based on null
set.seed(123) # set seed: to obtain same results when you re-run it
benchmarks_1c <- benchmark(results_1c, iter = 200)

## Calculating asymptotic benchmark for population estimates = No-effect 
## Calculating asymptotic benchmark for population estimates = Observed

#benchmarks_1c # use in R file
print(benchmarks_1c, color = FALSE) # use in Rmd file, since Rmd cannot deal with colored text

## 
## Benchmark Results 
## ---------------------------------------------------------------------- 
## Preferred Hypothesis: H1
## Error probability Preferred Hypothesis vs. Complement: 0.154
## Sample Size: 120
## Number of Parameters: 3
## Population Estimates (PE):
##               D1     D2     D3
## No-effect  0.628  0.628  0.628
## Observed   1.117  0.513  0.253
## 
## 
## =================================================================================== 
## Benchmark: Percentiles of Ratio-of-GORICA-weights for the Preferred Hypothesis 'H1' 
## ----------------------------------------------------------------------------------- 
## Population estimates = No-effect
##                   Sample 5%    35%   50%   65%   95%  
## H1 vs. complement 5.498  0.252 1.342 1.682 2.167 2.562
## 
## Population estimates = Observed
##                   Sample 5%    35%   50%   65%   95%    hypothesis_rate
## H1 vs. complement 5.498  2.223 3.238 4.387 7.237 51.841 0.995

#plot(benchmarks_1c) 
plot(benchmarks_1c, log_scale = TRUE)

From the goric output, you can conclude i) that there is support for $H_1: \mu_1 > \mu_2 > \mu_3$ and ii) that $H_1$ is 0.85 $/$ 0.15 $\approx$ 5.46 times more supported than its complement. The probability that $H_1$ is not the best is 15.39% (namely, the goric.weights value for the complement of $H_1$, which is also given in the benchmark output by `$error_prob_pref_hypo`, that is, the ‘Error probability Preferred Hypothesis vs. Complement’). This already gives insight into the (un)certainty and, therefore, helps in qualifying the results. Additionally, the benchmarks can help:

Based on the benchmarks, you can check how plausible your finding is (given the assumed population parameters and given the sample size). If you want to compare your results with the situation in which one or more equalities hold true, use a null population (here, an effect size of 0, indicating that the three means are equal). Then, you obtain insight into how (un)likely / how extreme your finding (based on inequalities) is.¹

When assuming that there is no effect in the population (i.e., under the null), that is, all three group means are equal, the ratio of GORIC weights of $H_1$ versus its complement (i.e., 5.46) is larger than the 95th percentile (i.e., 2.56). Hence, our finding would be very extreme if the null were true. Using the table with cut-off values above, this indicates that there is very large (tremendous) support, when assuming no population effect (given a group size of 40 for each of the 3 groups).

Additionally, when inspecting the distributions of the ratio of GORIC weights of $H_1$ versus its complement under the null population and under the population based on the observed effect size (see plot above), we can see that they differ, that is, they do not overlap much. This also indicates that our finding would be very unlikely if the null were true.

The output regarding the ‘Ratio-of-GORIC-weights’ also gives the hypothesis rate under the observed values (either the observed effect size (`model_type = "means"`) or the observed parameter estimates (`model_type = "asymp"`, the default)). That is, it gives the proportion of times the hypothesis is chosen (when compared to the mentioned competing hypothesis; here, H1 vs. complement), assuming that the population values equal the observed values.
Here, the hypothesis rate is 0.995, meaning that, when the population parameter values equal the observed estimates (1.117, 0.513, and 0.253), H1 is selected over the competing hypothesis (here, the complement) in 99.5% of the simulated samples. By default, a threshold of 1 is used, meaning that H1 is selected if the ratio of GORIC weights is greater than 1. If you want to use another threshold, say 2, then use the following code:

hypo_rate_2 <- print(benchmarks_1c, hypo_rate_threshold = 2)

hypo_rate_2$hypothesis_rate[[2]]

## vs. complement 
##           0.97
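
Conceptually, the hypothesis rate is the proportion of simulated ratios that exceed the threshold. The following base-R sketch illustrates this; the helper `hypothesis_rate()` and the vector `sim_ratios` are hypothetical and only mimic the idea, they are not the internals of the benchmark function.

```r
# Hypothetical helper: proportion of simulated ratios above a threshold
hypothesis_rate <- function(ratios, threshold = 1) {
  mean(ratios > threshold)
}

# Hypothetical simulated ratios of GORIC(A) weights (H1 vs. complement)
sim_ratios <- c(0.8, 1.5, 2.4, 3.0, 5.2, 7.1, 0.9, 2.1, 1.9, 12.0)

rate_1 <- hypothesis_rate(sim_ratios)                 # default threshold of 1
rate_2 <- hypothesis_rate(sim_ratios, threshold = 2)  # stricter threshold of 2
c(rate_1, rate_2)
```

A stricter threshold can only lower the rate (or leave it unchanged), which matches the drop from 0.995 to 0.97 in the output above.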

Notably, the gray line in the plot denotes a ratio of GORIC(A) weights of 1, which means that the support for both hypotheses is equal. In that case, you would be indecisive. The reason to plot this line is to remember that a ratio of GORIC(A) weights ranges from 0 to infinity, where 0 to 1 denotes the same support as 1 to infinity for the opposite comparison (e.g., a support of 0.2 for $H_1$ versus $H_c$ is the same as a support of 1/0.2=5 for $H_c$ versus $H_1$).
One can, for visibility, apply a log10 transformation to the x-axis, by using `log_scale = TRUE`:

plot(benchmarks_1c, log_scale = TRUE)

Notably, the x-axis is transformed, not the ratio values themselves. In this way, the distribution of the ratios is probably less skewed, since ratios of 0 to 1 have the same space on the x-axis as 1 to infinity does.
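
The symmetry on the log10 scale can be checked with a few lines of base R, using the ratio of 0.2 mentioned above (the object names are our own):

```r
ratio_H1_vs_Hc <- 0.2                  # support for H1 versus Hc
ratio_Hc_vs_H1 <- 1 / ratio_H1_vs_Hc   # same evidence, opposite comparison: 5

# On the log10 scale, both lie equally far from 0 (i.e., from a ratio of 1)
log_H1_vs_Hc <- log10(ratio_H1_vs_Hc)
log_Hc_vs_H1 <- log10(ratio_Hc_vs_H1)
c(log_H1_vs_Hc, log_Hc_vs_H1)
```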

Log-likelihood check:
Before inspecting the height of the support, one may want to establish whether there is support for the overlap or boundary of hypotheses. Since we evaluate an informative hypothesis $H_1$ versus its complement, we should check whether there is support for a boundary hypothesis (in which one or more inequalities in $H_1$ is replaced by an equality). For this, one should inspect the log-likelihood / fit values of the hypotheses. When these are close (i.e., the ratio of loglik weights is close to 1 or the difference in loglik values is close to 0), then there is support for (one of their) boundaries. In this case, the loglik values are -155.07 and -155.94, with corresponding loglik.weights of 0.7 and 0.3 (and thus a difference of approximately 0.86 and a ratio of approximately 2.37). Since it is hard to judge what is close, one can inspect the benchmarks for the ratio of log-likelihood (loglik) weights and/or for the differences in log-likelihood values. This should then be done for a null population in which such a boundary is true. We discuss this in the next section.
For now, we assume that the loglik values are not close.
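
The loglik difference, loglik weights, and their ratio mentioned above can be recomputed from the two loglik values in the goric output. This is a base-R sketch with our own object names; because the loglik values are rounded to three decimals, the third decimal of the results may differ slightly from the output.

```r
# Log-likelihood values taken from the goric output (rounded)
loglik <- c(H1 = -155.075, complement = -155.939)

# Difference in loglik values (H1 minus complement)
loglik_diff <- loglik[["H1"]] - loglik[["complement"]]

# Loglik weights: exp(loglik), normalised (subtracting the maximum for stability)
loglik_weights <- exp(loglik - max(loglik)) / sum(exp(loglik - max(loglik)))

# The ratio of loglik weights equals exp(difference in loglik values)
ratio_loglik_weights <- exp(loglik_diff)

round(c(difference = loglik_diff, ratio = ratio_loglik_weights), 2)
round(loglik_weights, 2)
```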

Population information:
In the data generation, we used a ratio of population means of 3:2:1, implying that $H_1$ is correct. More specifically, we used population mean values of approximately 0.92, 0.61, and 0.31. This implies that Cohen’s $f$ is .25; thus, there is a medium population effect size (with the means in the same order as hypothesized). We then sampled 40 observations for each of the three groups, ran an ANOVA (with three groups), and applied the GORIC. Note that Cohen (1992) suggests that a minimum group size of 52 is needed to find a medium effect when doing null hypothesis testing.
Notably, the sample/observed effect size is .411 (with sample means of 1.12, 0.51, and 0.25), which can be seen as a high effect size. This also explains why we, despite the medium population effect size, find tremendous support for our hypothesis.
If we sampled more observations, the GORIC(A) weight for $H_1$ would converge to 1 (denoting full support for $H_1$). Note that the benchmarks for the GORIC(A) weight for $H_1$ will remain the same for a null population and will go to 1 for a non-null population. Note also that the error probability then goes to 0, and that the ratio of GORIC(A) weights of $H_1$ versus its complement then goes to infinity.
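
The link between the ratio of means, the population means, and Cohen’s $f$ can be sketched in base R. The sketch assumes equal group sizes and a within-group standard deviation of 1 (an assumption on our side; the text does not state it), and the helper `cohens_f()` is hypothetical.

```r
# Hypothetical helper: Cohen's f for equal group sizes
# (standard deviation of the group means divided by the within-group sd)
cohens_f <- function(means, sigma = 1) {
  sqrt(mean((means - mean(means))^2)) / sigma
}

# Population means as reported in the text (rounded); sigma = 1 is an assumption
pop_means <- c(0.92, 0.61, 0.31)
f_pop <- cohens_f(pop_means, sigma = 1)
round(f_pop, 2)

# Reverse direction: scale the ratio 3:2:1 such that Cohen's f equals 0.25
ratio <- c(3, 2, 1)
means_from_ratio <- 0.25 * ratio / cohens_f(ratio, sigma = 1)
round(means_from_ratio, 2)
```

Under these assumptions, the ratio 3:2:1 with $f$ = .25 gives back the population means of approximately 0.92, 0.61, and 0.31.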

2.2.1.1 Footnote

2.2.2 Example 2 (ANOVA): Overlapping hypotheses

# H1, H2, and unconstrained (default) - subset/overlap, that is, H1 is true
H1 <- "D1 > D2 > D3"               # mu1 > mu2 > mu3
H2 <- "D1 > D2" # H2: D1 > D2, D3  # mu1 > mu2,  mu3

# Apply GORIC #
set.seed(123) 
results_12u <- goric(fit, hypotheses = list(H1, H2))

results_12u

## restriktor (0.6-30): generalized order-restricted information criterion: 
## 
## Results:
##            model    loglik  penalty    goric  loglik.weights  penalty.weights  goric.weights  goric.weights_without_unc
## 1             H1  -155.075    2.833  315.816           0.333            0.548          0.548                      0.661
## 2             H2  -155.075    3.500  317.149           0.333            0.281          0.281                      0.339
## 3  unconstrained  -155.075    4.000  318.149           0.333            0.171          0.171

## 
## Conclusion:
## - The order-restricted hypothesis 'H1' is the best in the set, as it has the highest GORIC(A) weight.
## - Since 'H1' has a higher GORIC(A) weight than the unconstrained hypothesis, it is not considered weak. We can now inspect the relative support for 'H1' against the other order-restricted hypotheses:
##   * 'H1' is 1.948 times more supported than 'H2' (This relative support reached its maximum, see Note).

round(results_12u$ratio.gw, 3)

##               vs. H1 vs. H2 vs. unconstrained
## H1             1.000  1.948             3.211
## H2             0.513  1.000             1.649
## unconstrained  0.311  0.607             1.000

# Benchmarks
set.seed(123) 
benchmarks_12u <- benchmark(results_12u, model_type = "means", iter = 400)

## Calculating means benchmark for effect-size = 0 (No-effect)
## Calculating means benchmark for effect-size = 0.2381031 (Observed)

#benchmarks_12u # R file
print(benchmarks_12u, color = FALSE) # Rmd file

## 
## Benchmark Results 
## ---------------------------------------------------------------------- 
## Preferred Hypothesis: H1
## Error probability Preferred Hypothesis vs. Complement: 0.154
## Number of Groups: 3
## Group Sizes: 120, 120, 120
## Ratio of Population Means: 
## Population Effect-Sizes (Cohens f): 0.000, 0.238
## Observed Effect-Size (Cohens f): 0.238
## 
## 
## =================================================================================== 
## Benchmark: Percentiles of Ratio-of-GORICA-weights for the Preferred Hypothesis 'H1' 
## ----------------------------------------------------------------------------------- 
## Population effect-size = No-effect
##                      Sample 5%    35%   50%   65%   95%   hypothesis_rate
## H1 vs. H2            1.948  0.365 1.587 1.911 1.948 1.948 0.812          
## H1 vs. unconstrained 3.211  0.367 1.908 2.411 2.926 3.211 0.825          
## 
## Population effect-size = Observed
##                      Sample 5%    35%   50%   65%   95%   hypothesis_rate
## H1 vs. H2            1.948  1.910 1.948 1.948 1.948 1.948 0.998          
## H1 vs. unconstrained 3.211  3.147 3.211 3.211 3.211 3.211 0.998

#
# Plots of benchmarks
#plot(benchmarks_12u, output_type = "rgw") 
#plot(benchmarks_12u, output_type = "rgw", log_scale = TRUE)
#
#plot_out <- plot(benchmarks_12u) # save all plots in object plot_out
#plot(plot_out$grobs$`H1 vs. H2`) # call separate plot
#
## Plots of benchmarks with log10 transformation of x-axis
#plot_out_log <- plot(benchmarks_12u, output_type = "rgw", log_scale = TRUE) # save all plots in object plot_out_log
#plot(plot_out_log$grobs$`H1 vs. H2`) # call separate plot

#plot(benchmarks_12u, output_type = "rgw", log_scale = TRUE)

From the GORIC output, we can see that neither hypothesis is weak and that $H_1$ is the best. Additionally, we can see that the log-likelihood values are exactly the same, and thus there is support for the overlap of the hypotheses, which is $H_1$ here (since $H_1$ is a subset of $H_2$).

If we now check the GORIC(A) weights benchmarks (especially for non-nulls), we again find that there is support for the overlap (here, $H_1$): Namely, the benchmarks for multiple percentiles (here, from the 35th percentile on) have the same value as our finding, that is, the sample value.
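
As a small sketch, one can flag such bounded support by counting how many percentile benchmarks coincide with the sample value. The numbers are those of ‘H1 vs. H2’ under the observed population in the output above; the rule of flagging when two or more percentiles coincide is a hypothetical choice for illustration, not part of the benchmark function.

```r
# 'H1 vs. H2' under the observed population (values from the output above)
sample_ratio <- 1.948
percentiles_observed <- c("5%" = 1.910, "35%" = 1.948, "50%" = 1.948,
                          "65%" = 1.948, "95%" = 1.948)

# Which percentiles equal the sample value (up to rounding in the output)?
at_sample_value <- abs(percentiles_observed - sample_ratio) < 1e-3
names(percentiles_observed)[at_sample_value]

# Hypothetical rule: flag bounded support if two or more percentiles coincide
bounded_support <- sum(at_sample_value) >= 2
bounded_support
```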

Notably, if there is support for the overlap or boundary, it is not meaningful to use the benchmarks to label the height of support for $H_1$, since the ratio of support reached its maximum. Hence, we do not proceed with labeling the height of support (by comparing the sample (ratio of) GORIC weights value(s) to the GORIC weights benchmarks under the null population). Instead, we conclude that there is support for the overlap.
If of interest, one can next inspect the support for the overlap, here $H_1$, versus its complement. When that does not result in support for a boundary, then the height of the support for $H_1$ versus its complement can be labelled using the benchmarks. This is then helpful information for future research: Generating a new theory and having another value to compare future results to.
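
The maximum itself can be recovered from the penalties. The GORIC values in the output are consistent with GORIC = -2 loglik + 2 penalty, so when the loglik values are equal, the ratio of GORIC(A) weights reduces to the exponent of the difference in penalties. A base-R sketch with the (rounded) penalty values from the output above:

```r
# Penalty values from the goric output (rounded to three decimals)
penalty <- c(H1 = 2.833, H2 = 3.500, unconstrained = 4.000)

# With equal loglik values, the ratio of GORIC(A) weights of H1 versus another
# hypothesis equals exp(penalty_other - penalty_H1): the maximum it can attain
max_ratio_H1 <- exp(penalty - penalty[["H1"]])
round(max_ratio_H1, 2)
```

Up to rounding of the penalties, this gives the values 1.948 (H1 vs. H2) and 3.211 (H1 vs. unconstrained) at which the benchmarks above are capped.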

Log-likelihood check:
Here, one clearly finds that the loglik / fit values are exactly the same. Hence, the ratio of loglik weights is exactly 1 and the difference in loglik values is exactly 0. Consequently, there is support for the overlap. In this case, we do not need to check the log-likelihood benchmarks.
Note that the log-likelihood benchmarks (under a null population) give insight into the distribution of loglik weights ratios and of the loglik differences in case some or all of the group means would be the same.

Population information:
In the data generation, we used a ratio of population means of 3:2:1, implying that $H_1$ is correct. More specifically, we used population mean values of approximately 0.92, 0.61, and 0.31. This implies that Cohen’s $f$ is .25; thus, there is a medium population effect size (with the means in the same order as hypothesized). We then sampled 40 observations for each of the three groups, ran an ANOVA (with three groups), and applied the GORIC. Note that Cohen (1992) suggests that a minimum group size of 52 is needed to find a medium effect when doing null hypothesis testing. Notably, the sample/observed effect size is .411 (with sample means of 1.12, 0.51, and 0.25), which can be seen as a high effect size.
Sampling more observations does not (really) affect the GORIC(A) weights for $H_1$, $H_2$, and the unconstrained: They converge to the bounds (i.e., the maximum support) they can take on. The benchmarks for the GORIC(A) weights will also attain the maximum value, as will the ratio of weights, and this holds for each positive population effect size. For a null population, the GORIC(A) weight benchmarks will remain the same.

3 Log-likelihood benchmarks

Before inspecting the height of the support, one may want to establish whether there is support for the overlap / boundary of hypotheses (otherwise the height of the support may be bounded). Since we evaluate an informative hypothesis $H_1$ versus its complement, we should check whether there is support for a boundary hypothesis (in which one or more inequalities in $H_1$ is replaced by an equality). For this, one should inspect the log-likelihood / fit values of the hypotheses. When these are close (i.e., the ratio of loglik weights is close to 1 or the difference in loglik values is close to 0), then there is support for (one of their) boundaries.

Since it is hard to judge what is close, one can inspect the benchmarks for the ratio of log-likelihood (loglik) weights and for the differences in log-likelihood values. This should then be done for a population in which such a boundary is true.

Next, we will inspect two examples: one in which the border is true (Example 3) and one where it is not (Example 1 continued). We will first use a group size of 40, like in the example above; afterwards, we will inspect a higher sample size to obtain insight into the asymptotic properties of the loglik weights.

For now, we will treat the loglik values as the same if the loglik difference lies between the 5th and 95th percentiles of the loglik benchmarks for a null population. You can of course use a narrower range to be stricter (preferably, pre-register this as well).
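
This decision rule can be sketched as a one-line check in base R. The helper `loglik_close()` is hypothetical; the numbers in the first call are the sample loglik difference and the no-effect percentiles reported in Example 3, and the second call uses an illustrative larger difference.

```r
# Hypothetical helper: is the sample loglik difference within the
# 5th-95th percentile range of the loglik benchmarks under a null population?
loglik_close <- function(sample_diff, p05, p95) {
  sample_diff >= p05 && sample_diff <= p95
}

# Example 3: sample difference and no-effect benchmarks (5th and 95th percentile)
close_border <- loglik_close(-0.072, p05 = -2.170, p95 = 0.048)

# Illustrative larger difference, evaluated against the same range
close_large <- loglik_close(0.864, p05 = -2.170, p95 = 0.048)

c(close_border, close_large)
```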

Remarks:
- The loglik benchmarks need to be investigated more thoroughly.
- The benchmark function contains two types of loglik benchmarks:
  * the loglik ratios (i.e., ratio of loglik weights; output_type = "rlw"), which can take on values between 0 and infinity, where 1 means that the loglik values are the same (denoting equal fit);
  * the loglik differences (output_type = "ld"), which can take on values between minus infinity and infinity, where 0 means that the loglik values are the same (denoting equal fit).
- One could think about doing a likelihood ratio test (LRT), using a Chi-square distribution or, because of the inequality restrictions, a Chi-bar-square distribution. However, the LRT is appropriate for nested models, while we are interested in models that are non-nested and even non-overlapping (except for the boundary).
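
The two types of loglik benchmarks carry the same information: the ratio of loglik weights is the exponent of the loglik difference. The base-R sketch below checks this with the no-effect benchmarks of Example 3 (`model_type = "means"`); the values are copied from the rounded output, so agreement is only up to rounding.

```r
# Loglik differences (output_type = "ld"): sample value and no-effect percentiles
ld  <- c(Sample = -0.072, "5%" = -2.170, "35%" = -0.546,
         "50%" = -0.290, "65%" = -0.095, "95%" = 0.048)

# Ratios of loglik weights (output_type = "rlw") for the same population
rlw <- c(Sample = 0.931, "5%" = 0.114, "35%" = 0.579,
         "50%" = 0.748, "65%" = 0.909, "95%" = 1.049)

# exp(loglik difference) reproduces the ratio of loglik weights
rlw_from_ld <- exp(ld)
round(rbind(reported = rlw, from_ld = rlw_from_ld), 3)
```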

3.1 Example 3 (ANOVA): Border is true

# H1 vs complement (default) - border (namely, mu1 = mu2 > mu3) is true
H1 <- "D1 > D2 > D3" # mu1 > mu2 > mu3

# Apply GORIC #
set.seed(123) 
results_1c_border <- goric(fit_border, hypotheses = list(H1))
results_1c_border

## restriktor (0.6-30): generalized order-restricted information criterion: 
## 
## Results:
##         model    loglik  penalty    goric  loglik.weights  penalty.weights  goric.weights
## 1          H1  -150.549    2.833  306.765           0.482            0.697          0.682
## 2  complement  -150.477    3.667  308.288           0.518            0.303          0.318
## 
## Conclusion:
## The order-restricted hypothesis 'H1' has 2.14 times more support than its complement.

Before we inspect the height of support for the preferred hypothesis, we should check whether there is support for the boundary/border of the two (non-overlapping) hypotheses. By eyeballing, we believe the log-likelihood values are close. To obtain better evidence for the closeness of the loglik values, we will use the log-likelihood benchmarks for several specifications of null populations.

Next, you find code for using the default; here, the default null population where the effect size is set to zero (and the default population based on the sample effect size):

#Default null (when using `model_type = "means"`)
# Loglik benchmarks based on default null / no effect sizes, that is,
# setting all three means equal in the population
set.seed(123) 
benchmarks_1c_border <- benchmark(results_1c_border, model_type = "means", iter = 400)

## Calculating means benchmark for effect-size = 0 (No-effect)
## Calculating means benchmark for effect-size = 0.2102195 (Observed)

# loglik diff
#print(benchmarks_1c_border, output_type = "ld") # in R file
print(benchmarks_1c_border, output_type = "ld", color = FALSE) # in Rmd file

## 
## Benchmark Results 
## ---------------------------------------------------------------------- 
## Preferred Hypothesis: H1
## Error probability Preferred Hypothesis vs. Complement: 0.318
## Number of Groups: 3
## Group Sizes: 120, 120, 120
## Ratio of Population Means: 
## Population Effect-Sizes (Cohens f): 0.000, 0.210
## Observed Effect-Size (Cohens f): 0.210
## 
## 
## ================================================================================================ 
## Benchmark: Percentiles of Differences in Log-likelihood Values for the Preferred Hypothesis 'H1' 
## ------------------------------------------------------------------------------------------------ 
## Population effect-size = No-effect
##                   Sample 5%     35%    50%    65%    95%  
## H1 vs. complement -0.072 -2.170 -0.546 -0.290 -0.095 0.048
## 
## Population effect-size = Observed
##                   Sample 5%     35%    50%    65%   95%  
## H1 vs. complement -0.072 -1.505 -0.257 -0.058 0.002 0.738

plot(benchmarks_1c_border, output_type = "ld")

# ratio loglik weights
#print(benchmarks_1c_border, output_type = "rlw") # in R file
print(benchmarks_1c_border, output_type = "rlw", color = FALSE) # in Rmd file

## 
## Benchmark Results 
## ---------------------------------------------------------------------- 
## Preferred Hypothesis: H1
## Error probability Preferred Hypothesis vs. Complement: 0.318
## Number of Groups: 3
## Group Sizes: 120, 120, 120
## Ratio of Population Means: 
## Population Effect-Sizes (Cohens f): 0.000, 0.210
## Observed Effect-Size (Cohens f): 0.210
## 
## 
## =========================================================================================== 
## Benchmark: Percentiles of Ratio-of-log-likelihood-weights for the Preferred Hypothesis 'H1' 
## ------------------------------------------------------------------------------------------- 
## Population effect-size = No-effect
##                   Sample 5%    35%   50%   65%   95%  
## H1 vs. complement 0.931  0.114 0.579 0.748 0.909 1.049
## 
## Population effect-size = Observed
##                   Sample 5%    35%   50%   65%   95%  
## H1 vs. complement 0.931  0.222 0.773 0.944 1.002 2.092

#plot(benchmarks_1c_border, output_type = "rlw", x_lim = c(0, 2.5))
plot(benchmarks_1c_border, output_type = "rlw", log_scale = TRUE)

Notably, 'log_scale = TRUE' can be helpful when looking at ratios of weights (either GORICA or log-likelihood weights). In that case, the x-axis is transformed (not the ratio values themselves). In this way, the distribution of the ratios is probably less skewed, since ratios of 0 to 1 have the same space on the x-axis as 1 to infinity does.

However, we feel it is more informative to inspect multiple null populations (based on setting some or all of the inequalities to equalities) and a population based on the sample:

# Specifying multiple null populations, that is,
# using all possibilities of setting inequalities to equalities.
# Here, we will use the default `model_type` (i.e., "asymp") which takes population parameter values (instead of population effect sizes) 
est <- coef(fit_border)
pop_est <- matrix(c(
                  mean(est[1:3]), mean(est[1:3]), mean(est[1:3]),
                  mean(est[1:2]), mean(est[1:2]), est[3],
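                  # NOTE: as written, this row averages D1 and D2 (not D1 and D3); for mu1 = mu3
                  # one would expect mean(est[c(1, 3)]). Kept as is to match the output shown below.
                  mean(est[1:2]), est[2], mean(est[1:2]),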
                  est[1], mean(est[2:3]), mean(est[2:3]),
                  
                  est[1], est[2], est[3]
                  ),
                  byrow = TRUE, ncol = length(est))
rownames(pop_est) <- c("PE_123eq", "PE_12eq", "PE_13eq", "PE_23eq", "Observed")
#
set.seed(123) 
benchmarks_1c_border_allpos <- benchmark(results_1c_border, pop_est = pop_est, iter = 200)

## Calculating asymptotic benchmark for population estimates = PE_123eq 
## Calculating asymptotic benchmark for population estimates = PE_12eq 
## Calculating asymptotic benchmark for population estimates = PE_13eq 
## Calculating asymptotic benchmark for population estimates = PE_23eq 
## Calculating asymptotic benchmark for population estimates = Observed

#
# loglik difference
#print(benchmarks_1c_border_allpos, output_type = "ld") # R file
print(benchmarks_1c_border_allpos, output_type = "ld", color = FALSE) # Rmd file

## 
## Benchmark Results 
## ---------------------------------------------------------------------- 
## Preferred Hypothesis: H1
## Error probability Preferred Hypothesis vs. Complement: 0.318
## Sample Size: 120
## Number of Parameters: 3
## Population Estimates (PE):
##              D1     D2     D3
## PE_123eq  0.696  0.696  0.696
## PE_12eq   0.913  0.913  0.264
## PE_13eq   0.913  0.949  0.913
## PE_23eq   0.877  0.606  0.606
## Observed  0.877  0.949  0.264
## 
## 
## ================================================================================================ 
## Benchmark: Percentiles of Differences in Log-likelihood Values for the Preferred Hypothesis 'H1' 
## ------------------------------------------------------------------------------------------------ 
## Population estimates = PE_123eq
##                   Sample 5%     35%    50%    65%    95%  
## H1 vs. complement -0.072 -2.212 -0.539 -0.314 -0.060 0.107
## 
## Population estimates = PE_12eq
##                   Sample 5%     35%    50%        65%   95%  
## H1 vs. complement -0.072 -1.405 -0.083 -8.426e-06 0.106 1.122
## 
## Population estimates = PE_13eq
##                   Sample 5%     35%    50%    65%    95%  
## H1 vs. complement -0.072 -2.262 -0.631 -0.288 -0.103 0.095
## 
## Population estimates = PE_23eq
##                   Sample 5%     35%    50%    65%   95%  
## H1 vs. complement -0.072 -1.050 -0.118 -0.014 0.021 0.469
## 
## Population estimates = Observed
##                   Sample 5%     35%    50%    65%   95%  
## H1 vs. complement -0.072 -1.611 -0.283 -0.043 0.002 0.603

plot(benchmarks_1c_border_allpos, output_type = "ld")

#plot(benchmarks_1c_border_allpos, output_type = "ld", x_lim = c(-.25,.25))
#
# ratio of loglik weights
#print(benchmarks_1c_border_allpos, output_type = "rlw") # R file
print(benchmarks_1c_border_allpos, output_type = "rlw", color = FALSE) # Rmd file

## 
## Benchmark Results 
## ---------------------------------------------------------------------- 
## Preferred Hypothesis: H1
## Error probability Preferred Hypothesis vs. Complement: 0.318
## Sample Size: 120
## Number of Parameters: 3
## Population Estimates (PE):
##              D1     D2     D3
## PE_123eq  0.696  0.696  0.696
## PE_12eq   0.913  0.913  0.264
## PE_13eq   0.913  0.949  0.913
## PE_23eq   0.877  0.606  0.606
## Observed  0.877  0.949  0.264
## 
## 
## =========================================================================================== 
## Benchmark: Percentiles of Ratio-of-log-likelihood-weights for the Preferred Hypothesis 'H1' 
## ------------------------------------------------------------------------------------------- 
## Population estimates = PE_123eq
##                   Sample 5%    35%   50%   65%   95%  
## H1 vs. complement 0.931  0.110 0.583 0.731 0.942 1.113
## 
## Population estimates = PE_12eq
##                   Sample 5%    35%   50%   65%   95%  
## H1 vs. complement 0.931  0.245 0.920 1.000 1.112 3.072
## 
## Population estimates = PE_13eq
##                   Sample 5%    35%   50%   65%   95%  
## H1 vs. complement 0.931  0.104 0.532 0.750 0.902 1.100
## 
## Population estimates = PE_23eq
##                   Sample 5%    35%   50%   65%   95%  
## H1 vs. complement 0.931  0.350 0.889 0.986 1.021 1.599
## 
## Population estimates = Observed
##                   Sample 5%    35%   50%   65%   95%  
## H1 vs. complement 0.931  0.200 0.754 0.958 1.002 1.828

#plot(benchmarks_1c_border_allpos, output_type = "rlw")
#plot(benchmarks_1c_border_allpos, output_type = "rlw", x_lim = c(0.950, 1.025))
plot(benchmarks_1c_border_allpos, output_type = "rlw", log_scale = TRUE)

In this example, as can be seen from both the tables and the plots, the difference in log-likelihood values and the ratio of log-likelihood weights lie between the 5th and the 95th percentile of the corresponding benchmarks (for all null populations). Based on this, we conclude that the log-likelihood values of $H_1$ and its complement are close (i.e., their loglik weights ratio is close to 1 and their difference is close to 0). One could possibly also conclude that there is a bit more support for a boundary hypothesis in which either the first two or the last two means are the same (because the sample value is less extreme for those null populations; that is, the sample value is closest to the 50th percentile of those null distributions).

Hence, we found that there is support for a boundary hypothesis (or even two). This means that we should not label the height of the support for $H_1$ versus its complement, because this support is now bounded. Notably, one can still say that $H_1$ is the preferred hypothesis out of the set consisting of $H_1$ and its complement. Additionally, one could evaluate one or more boundary hypotheses, to obtain extra information for future research.
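
The comparison described above can be sketched in a few lines of base R. The numbers are copied from the log-likelihood difference table for this example; the object names (`sample_ld`, `null_bounds`, `inside`) are illustrative and not part of restriktor.

```r
# Sample difference in log-likelihood values (H1 vs. complement), from the "ld" output
sample_ld <- -0.072

# 5th and 95th percentiles per null population, copied from the "ld" benchmark table
null_bounds <- rbind(
  PE_123eq = c(p05 = -2.212, p95 = 0.107),
  PE_12eq  = c(p05 = -1.405, p95 = 1.122),
  PE_13eq  = c(p05 = -2.262, p95 = 0.095),
  PE_23eq  = c(p05 = -1.050, p95 = 0.469)
)

# TRUE when the sample value lies inside the 5%-95% range of that null population
inside <- sample_ld > null_bounds[, "p05"] & sample_ld < null_bounds[, "p95"]
print(inside)
```

If all entries are `TRUE`, the sample value is not extreme for any of the null populations, which is the situation in which we refrain from labelling the height of the support.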

Population information:
In the data generation, we used a ratio of population means of 2.5:2.5:1; implying that the boundary of $H_1$ and its complement is correct (and that $H_1$ is preferred over its complement, since it is more parsimonious). More specifically, we used population mean values of approximately 0.88, 0.88, and 0.35. This implies that Cohen’s $f$ is .25; thus, there is a medium population effect size. We then sampled 40 observations per group, ran an ANOVA (with three groups), and applied the GORIC. Notably, the sample/observed effect size is .318 (with sample means of 0.877, 0.949, and 0.264).
If we were to sample more observations, the GORIC(A) weight for $H_1$ would converge to 1 (denoting full support for $H_1$). Note that the error probability then goes to 0 and the ratio of GORIC(A) weights of $H_1$ versus its complement goes to infinity. Nevertheless, this is not of interest here: we are now interested in the closeness of the log-likelihood values.
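
As a rough check of the stated population effect size, Cohen's $f$ can be computed from the approximate population means. This is a minimal sketch that assumes equal group sizes and a common within-group standard deviation of 1; that value is an assumption made here for illustration and is not stated in the text.

```r
# Approximate population means used in the data generation (from the text)
mu <- c(0.88, 0.88, 0.35)

# Assumption: common within-group standard deviation of 1 and equal group sizes
sigma <- 1

# Cohen's f: standard deviation of the group means (dividing by the number
# of groups) relative to the within-group standard deviation
f <- sqrt(mean((mu - mean(mu))^2)) / sigma
round(f, 2)
```

Under these assumptions, the result rounds to .25, in line with the medium effect size mentioned above.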

3.1.1 Higher sample size

# Now, group size is 200 (instead of 40)

# H1 vs complement (default) - border (i.e., mu1 = mu2 > mu3) is true
H1 <- "D1 > D2 > D3" # mu1 > mu2 > mu3

# Apply GORIC #
set.seed(123) 
results_1c_border_200 <- goric(fit_border_200, hypotheses = list(H1))
results_1c_border_200

## restriktor (0.6-30): generalized order-restricted information criterion: 
## 
## Results:
##         model    loglik  penalty     goric  loglik.weights  penalty.weights  goric.weights
## 1          H1  -829.928    2.833  1665.522           0.509            0.697          0.704
## 2  complement  -829.962    3.667  1667.258           0.491            0.303          0.296
## 
## Conclusion:
## The order-restricted hypothesis 'H1' has 2.38 times more support than its complement.
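
The weights in this output can be reproduced by hand from the reported log-likelihood and penalty values, using GORIC = -2 * loglik + 2 * penalty and the usual information-criterion weights. The sketch below uses only base R; the helper `ic_weights()` is an illustrative name and not a restriktor function.

```r
# Values copied from the goric output above
loglik  <- c(H1 = -829.928, complement = -829.962)
penalty <- c(H1 = 2.833,    complement = 3.667)

# GORIC value per hypothesis
goric <- -2 * loglik + 2 * penalty

# Information-criterion weights: exp(-0.5 * IC), rescaled to sum to 1
# (subtracting the minimum first keeps the exponentials numerically stable)
ic_weights <- function(ic) {
  delta <- ic - min(ic)
  exp(-delta / 2) / sum(exp(-delta / 2))
}

goric_weights   <- ic_weights(goric)        # based on fit and complexity
loglik_weights  <- ic_weights(-2 * loglik)  # based on fit only
penalty_weights <- ic_weights(2 * penalty)  # based on complexity only

round(rbind(loglik_weights, penalty_weights, goric_weights), 3)

# Relative support of H1 versus its complement
unname(goric_weights["H1"] / goric_weights["complement"])
```

The loglik weights are close to 0.5 each, which already signals that the fit of the two hypotheses is nearly equal; the preference for $H_1$ in the GORIC weights then stems almost entirely from its smaller penalty.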

We will now also check whether there is support for the border of the two (non-overlapping) hypotheses:

# Specifying multiple null populations, that is,
# using all possibilities of setting inequalities to equalities.
# Here, we will use the default `model_type` (i.e., "asymp") which takes population parameter values (instead of population effect sizes) 
est <- coef(fit_border_200)
pop_est <- matrix(c(
                  mean(est[1:3]), mean(est[1:3]), mean(est[1:3]),
                  mean(est[1:2]), mean(est[1:2]), est[3],
                  mean(est[1:2]), est[2], mean(est[1:2]),
                  est[1], mean(est[2:3]), mean(est[2:3]),
                  
                  est[1], est[2], est[3]
                  ),
                  byrow = TRUE, ncol = length(est))
rownames(pop_est) <- c("PE_123eq", "PE_12eq", "PE_13eq", "PE_23eq", "Observed")
#
set.seed(123) 
benchmarks_1c_border_allpos_200 <- benchmark(results_1c_border_200, pop_est = pop_est, iter = 200)

## Calculating asymptotic benchmark for population estimates = PE_123eq 
## Calculating asymptotic benchmark for population estimates = PE_12eq 
## Calculating asymptotic benchmark for population estimates = PE_13eq 
## Calculating asymptotic benchmark for population estimates = PE_23eq 
## Calculating asymptotic benchmark for population estimates = Observed

#
# loglik difference
#print(benchmarks_1c_border_allpos_200, output_type = "ld") # R file
print(benchmarks_1c_border_allpos_200, output_type = "ld", color = FALSE) # Rmd file

## 
## Benchmark Results 
## ---------------------------------------------------------------------- 
## Preferred Hypothesis: H1
## Error probability Preferred Hypothesis vs. Complement: 0.296
## Sample Size: 600
## Number of Parameters: 3
## Population Estimates (PE):
##              D1     D2     D3
## PE_123eq  0.729  0.729  0.729
## PE_12eq   0.874  0.874  0.440
## PE_13eq   0.874  0.861  0.874
## PE_23eq   0.886  0.650  0.650
## Observed  0.886  0.861  0.440
## 
## 
## ================================================================================================ 
## Benchmark: Percentiles of Differences in Log-likelihood Values for the Preferred Hypothesis 'H1' 
## ------------------------------------------------------------------------------------------------ 
## Population estimates = PE_123eq
##                   Sample 5%     35%    50%    65%    95%  
## H1 vs. complement 0.035  -2.212 -0.539 -0.314 -0.060 0.107
## 
## Population estimates = PE_12eq
##                   Sample 5%     35%    50%        65%   95%  
## H1 vs. complement 0.035  -1.405 -0.083 -8.426e-06 0.106 1.180
## 
## Population estimates = PE_13eq
##                   Sample 5%     35%    50%    65%    95%  
## H1 vs. complement 0.035  -2.268 -0.577 -0.285 -0.090 0.083
## 
## Population estimates = PE_23eq
##                   Sample 5%     35%    50%       65%   95%  
## H1 vs. complement 0.035  -1.024 -0.081 6.979e-05 0.073 1.004
## 
## Population estimates = Observed
##                   Sample 5%     35%    50%   65%   95%  
## H1 vs. complement 0.035  -0.665 -0.006 0.061 0.252 1.514

plot(benchmarks_1c_border_allpos_200, output_type = "ld") # x_lim = c(<min>, <max>)

#
# ratio of loglik weights
#print(benchmarks_1c_border_allpos_200, output_type = "rlw") # R file
print(benchmarks_1c_border_allpos_200, output_type = "rlw", color = FALSE) # Rmd file

## 
## Benchmark Results 
## ---------------------------------------------------------------------- 
## Preferred Hypothesis: H1
## Error probability Preferred Hypothesis vs. Complement: 0.296
## Sample Size: 600
## Number of Parameters: 3
## Population Estimates (PE):
##              D1     D2     D3
## PE_123eq  0.729  0.729  0.729
## PE_12eq   0.874  0.874  0.440
## PE_13eq   0.874  0.861  0.874
## PE_23eq   0.886  0.650  0.650
## Observed  0.886  0.861  0.440
## 
## 
## =========================================================================================== 
## Benchmark: Percentiles of Ratio-of-log-likelihood-weights for the Preferred Hypothesis 'H1' 
## ------------------------------------------------------------------------------------------- 
## Population estimates = PE_123eq
##                   Sample 5%    35%   50%   65%   95%  
## H1 vs. complement 1.035  0.110 0.583 0.731 0.942 1.113
## 
## Population estimates = PE_12eq
##                   Sample 5%    35%   50%   65%   95%  
## H1 vs. complement 1.035  0.245 0.920 1.000 1.112 3.257
## 
## Population estimates = PE_13eq
##                   Sample 5%    35%   50%   65%   95%  
## H1 vs. complement 1.035  0.104 0.562 0.752 0.914 1.086
## 
## Population estimates = PE_23eq
##                   Sample 5%    35%   50%   65%   95%  
## H1 vs. complement 1.035  0.359 0.922 1.000 1.075 2.729
## 
## Population estimates = Observed
##                   Sample 5%    35%   50%   65%   95%  
## H1 vs. complement 1.035  0.514 0.994 1.062 1.287 4.546

#plot(benchmarks_1c_border_allpos_200, output_type = "rlw") # x_lim = c(<min>, <max>)
plot(benchmarks_1c_border_allpos_200, output_type = "rlw", log_scale = TRUE)

In this example with a much higher sample size, too, the sample difference in log-likelihood values and the sample ratio of log-likelihood weights lie between the 5th and the 95th percentiles of the corresponding benchmarks (for all null populations). Based on this, we conclude that the log-likelihood values of $H_1$ and its complement are close (i.e., their loglik weights ratio is close to 1 and their loglik difference is close to 0). Additionally, one can argue that there is more support for a boundary hypothesis in which group means 1 and 2 are the same, since the sample value is the closest to the 50th percentile of that null distribution.
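
The two output types carry the same information: since loglik weights are proportional to exp(loglik), the ratio of loglik weights equals the exponent of the difference in log-likelihood values. The sketch below checks this for the sample values reported for the two border examples; the object names are illustrative.

```r
# Sample differences in log-likelihood values (H1 vs. complement), from the
# "ld" outputs for group sizes 40 and 200
ld <- c(n40 = -0.072, n200 = 0.035)

# Ratio of loglik weights implied by each difference
rlw <- exp(ld)
round(rlw, 3)

# Sample ratios reported in the "rlw" outputs, for comparison
# (small deviations stem from the rounding of the ld values)
reported <- c(n40 = 0.931, n200 = 1.035)
abs(rlw - reported)
```

A difference close to 0 therefore always corresponds to a ratio close to 1, so either output type can be used to judge closeness.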

3.2 Example 1 (ANOVA) Ctd.

In this subsection, we look at Example 1 again, where we evaluate `H1 <- "D1 > D2 > D3"` versus its complement, as in Example 3. In this example, $H_1$ is true in the population, whereas in Example 3 the truth lies on the border.

In this example, the loglik values are -155.07 and -155.94, with corresponding loglik.weights of 0.7 and 0.3 (and thus a difference of approximately 0.86 and a ratio of approximately 2.37). Since it is hard to judge what is close, one can inspect the benchmarks for the ratio of log-likelihood (loglik) weights and for the differences in log-likelihood values:

# Specifying multiple null populations, that is,
# using all possibilities of setting inequalities to equalities.
# Here, we will use the default `model_type` (i.e., "asymp") which takes population parameter values (instead of population effect sizes) 
est <- coef(fit)
pop_est <- matrix(c(
                  mean(est[1:3]), mean(est[1:3]), mean(est[1:3]),
                  mean(est[1:2]), mean(est[1:2]), est[3],
                  mean(est[1:2]), est[2], mean(est[1:2]),
                  est[1], mean(est[2:3]), mean(est[2:3]),

                  est[1], est[2], est[3]
                  ),
                  byrow = TRUE, ncol = length(est))
rownames(pop_est) <- c("PE_123eq", "PE_12eq", "PE_13eq", "PE_23eq", "Observed")
#
set.seed(123) 
benchmarks_1c_allpos <- benchmark(results_1c, pop_est = pop_est, iter = 200)

## Calculating asymptotic benchmark for population estimates = PE_123eq 
## Calculating asymptotic benchmark for population estimates = PE_12eq 
## Calculating asymptotic benchmark for population estimates = PE_13eq 
## Calculating asymptotic benchmark for population estimates = PE_23eq 
## Calculating asymptotic benchmark for population estimates = Observed

#
# loglik difference
#print(benchmarks_1c_allpos, output_type = "ld") # R file
print(benchmarks_1c_allpos, output_type = "ld", color = FALSE) # Rmd file

## 
## Benchmark Results 
## ---------------------------------------------------------------------- 
## Preferred Hypothesis: H1
## Error probability Preferred Hypothesis vs. Complement: 0.154
## Sample Size: 120
## Number of Parameters: 3
## Population Estimates (PE):
##              D1     D2     D3
## PE_123eq  0.628  0.628  0.628
## PE_12eq   0.815  0.815  0.253
## PE_13eq   0.815  0.513  0.815
## PE_23eq   1.117  0.383  0.383
## Observed  1.117  0.513  0.253
## 
## 
## ================================================================================================ 
## Benchmark: Percentiles of Differences in Log-likelihood Values for the Preferred Hypothesis 'H1' 
## ------------------------------------------------------------------------------------------------ 
## Population estimates = PE_123eq
##                   Sample 5%     35%    50%    65%    95%  
## H1 vs. complement 0.871  -2.212 -0.539 -0.314 -0.060 0.107
## 
## Population estimates = PE_12eq
##                   Sample 5%     35%    50%        65%   95%  
## H1 vs. complement 0.871  -1.405 -0.083 -8.426e-06 0.101 1.078
## 
## Population estimates = PE_13eq
##                   Sample 5%     35%    50%    65%    95%   
## H1 vs. complement 0.871  -5.719 -1.975 -1.246 -0.753 -0.002
## 
## Population estimates = PE_23eq
##                   Sample 5%     35%    50%       65%   95%  
## H1 vs. complement 0.871  -1.024 -0.069 7.963e-04 0.100 1.159
## 
## Population estimates = Observed
##                   Sample 5%     35%   50%   65%   95%  
## H1 vs. complement 0.871  -0.064 0.333 0.868 1.352 2.989

plot(benchmarks_1c_allpos, output_type = "ld") # x_lim = c(<min>, <max>)

#
# ratio of loglik weights
#print(benchmarks_1c_allpos, output_type = "rlw") # R file
print(benchmarks_1c_allpos, output_type = "rlw", color = FALSE) # Rmd file

## 
## Benchmark Results 
## ---------------------------------------------------------------------- 
## Preferred Hypothesis: H1
## Error probability Preferred Hypothesis vs. Complement: 0.154
## Sample Size: 120
## Number of Parameters: 3
## Population Estimates (PE):
##              D1     D2     D3
## PE_123eq  0.628  0.628  0.628
## PE_12eq   0.815  0.815  0.253
## PE_13eq   0.815  0.513  0.815
## PE_23eq   1.117  0.383  0.383
## Observed  1.117  0.513  0.253
## 
## 
## =========================================================================================== 
## Benchmark: Percentiles of Ratio-of-log-likelihood-weights for the Preferred Hypothesis 'H1' 
## ------------------------------------------------------------------------------------------- 
## Population estimates = PE_123eq
##                   Sample 5%    35%   50%   65%   95%  
## H1 vs. complement 2.390  0.110 0.583 0.731 0.942 1.113
## 
## Population estimates = PE_12eq
##                   Sample 5%    35%   50%   65%   95%  
## H1 vs. complement 2.390  0.245 0.920 1.000 1.106 2.938
## 
## Population estimates = PE_13eq
##                   Sample 5%    35%   50%   65%   95%  
## H1 vs. complement 2.390  0.003 0.139 0.288 0.471 0.998
## 
## Population estimates = PE_23eq
##                   Sample 5%    35%   50%   65%   95%  
## H1 vs. complement 2.390  0.359 0.933 1.001 1.105 3.188
## 
## Population estimates = Observed
##                   Sample 5%    35%   50%   65%   95%   
## H1 vs. complement 2.390  0.938 1.395 2.381 3.865 19.893

#plot(benchmarks_1c_allpos, output_type = "rlw") # x_lim = c(<min>, <max>)
plot(benchmarks_1c_allpos, output_type = "rlw", log_scale = TRUE)

In this example, the sample difference in log-likelihood values and the sample ratio of log-likelihood weights are larger than the 95th percentile of the benchmarks for the null distributions ‘PE_123eq’ and ‘PE_13eq’ (see the benchmarks output table). For the other two null distributions (‘PE_12eq’ and ‘PE_23eq’, where the means of groups 1 and 2 and those of groups 2 and 3, respectively, are set equal), the sample value lies between the 65th and 95th percentiles. When looking at the distributions, the sample value seems more likely to come from the non-null population (‘Observed’). Nevertheless, it could also fit the null distributions ‘PE_12eq’ and ‘PE_23eq’. Hence, we cannot rule out that the log-likelihood values are close, which would indicate support for the means of groups 1 and 2, or those of groups 2 and 3, being the same.
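
The position of the sample value relative to the percentiles can also be looked up programmatically. The following base R sketch uses the percentiles copied from the log-likelihood difference table of this example; the band labels and object names are illustrative choices, not restriktor output.

```r
# Sample difference in log-likelihood values (H1 vs. complement)
sample_ld <- 0.871

# Percentiles per assumed population, copied from the "ld" benchmark table
percentiles <- rbind(
  PE_123eq = c(-2.212, -0.539, -0.314,     -0.060,  0.107),
  PE_12eq  = c(-1.405, -0.083, -8.426e-06,  0.101,  1.078),
  PE_13eq  = c(-5.719, -1.975, -1.246,     -0.753, -0.002),
  PE_23eq  = c(-1.024, -0.069,  7.963e-04,  0.100,  1.159),
  Observed = c(-0.064,  0.333,  0.868,      1.352,  2.989)
)
colnames(percentiles) <- c("5%", "35%", "50%", "65%", "95%")

# Labels for the six bands delimited by the five percentiles
bands <- c("below 5th", "5th-35th", "35th-50th",
           "50th-65th", "65th-95th", "above 95th")

# findInterval() counts how many percentiles the sample value exceeds
band_of_sample <- apply(percentiles, 1, function(p) {
  bands[findInterval(sample_ld, p) + 1]
})
print(band_of_sample)
```

This reproduces the reading of the table given above: the sample value lies above the 95th percentile for ‘PE_123eq’ and ‘PE_13eq’, and between the 65th and 95th percentiles for ‘PE_12eq’ and ‘PE_23eq’.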

Next, we will inspect the case where we have a higher sample size.

3.2.1 Higher sample size

# Now, group size is 200 (instead of 40)

# H1 vs complement (default)
H1 <- "D1 > D2 > D3" # mu1 > mu2 > mu3

# Apply GORIC #
set.seed(123) 
results_1c_200 <- goric(fit_200, hypotheses = list(H1))
results_1c_200

## restriktor (0.6-30): generalized order-restricted information criterion: 
## 
## Results:
##         model    loglik  penalty     goric  loglik.weights  penalty.weights  goric.weights
## 1          H1  -829.928    2.833  1665.522           0.889            0.697          0.949
## 2  complement  -832.009    3.667  1671.351           0.111            0.303          0.051
## 
## Conclusion:
## The order-restricted hypothesis 'H1' has 18.43 times more support than its complement.

Also here, we will check whether there is support for the border of the two (non-overlapping) hypotheses:

# Specifying multiple null populations, that is,
# using all possibilities of setting inequalities to equalities.
# Here, we will use the default `model_type` (i.e., "asymp") which takes population parameter values (instead of population effect sizes) 
est <- coef(fit_200)
pop_est <- matrix(c(
                  mean(est[1:3]), mean(est[1:3]), mean(est[1:3]),
                  mean(est[1:2]), mean(est[1:2]), est[3],
                  mean(est[1:2]), est[2], mean(est[1:2]),
                  est[1], mean(est[2:3]), mean(est[2:3]),
                  
                  est[1], est[2], est[3]
                  ),
                  byrow = TRUE, ncol = length(est))
rownames(pop_est) <- c("PE_123eq", "PE_12eq", "PE_13eq", "PE_23eq", "Observed")
#
set.seed(123) 
benchmarks_1c_allpos_200 <- benchmark(results_1c_200, pop_est = pop_est, iter = 200)

## Calculating asymptotic benchmark for population estimates = PE_123eq 
## Calculating asymptotic benchmark for population estimates = PE_12eq 
## Calculating asymptotic benchmark for population estimates = PE_13eq 
## Calculating asymptotic benchmark for population estimates = PE_23eq 
## Calculating asymptotic benchmark for population estimates = Observed

#
# loglik difference
#print(benchmarks_1c_allpos_200, output_type = "ld") # R file
print(benchmarks_1c_allpos_200, output_type = "ld", color = FALSE) # Rmd file

## 
## Benchmark Results 
## ---------------------------------------------------------------------- 
## Preferred Hypothesis: H1
## Error probability Preferred Hypothesis vs. Complement: 0.051
## Sample Size: 600
## Number of Parameters: 3
## Population Estimates (PE):
##              D1     D2     D3
## PE_123eq  0.634  0.634  0.634
## PE_12eq   0.755  0.755  0.392
## PE_13eq   0.755  0.589  0.755
## PE_23eq   0.921  0.491  0.491
## Observed  0.921  0.589  0.392
## 
## 
## ================================================================================================ 
## Benchmark: Percentiles of Differences in Log-likelihood Values for the Preferred Hypothesis 'H1' 
## ------------------------------------------------------------------------------------------------ 
## Population estimates = PE_123eq
##                   Sample 5%     35%    50%    65%    95%  
## H1 vs. complement 2.088  -2.212 -0.539 -0.314 -0.060 0.107
## 
## Population estimates = PE_12eq
##                   Sample 5%     35%    50%        65%   95%  
## H1 vs. complement 2.088  -1.405 -0.083 -8.426e-06 0.106 1.149
## 
## Population estimates = PE_13eq
##                   Sample 5%     35%    50%    65%    95%   
## H1 vs. complement 2.088  -6.364 -2.361 -1.556 -0.984 -0.012
## 
## Population estimates = PE_23eq
##                   Sample 5%     35%    50%       65%   95%  
## H1 vs. complement 2.088  -1.024 -0.069 7.963e-04 0.100 1.280
## 
## Population estimates = Observed
##                   Sample 5%    35%   50%   65%   95%  
## H1 vs. complement 2.088  0.067 1.185 1.917 2.548 4.925

plot(benchmarks_1c_allpos_200, output_type = "ld") # x_lim = c(<min>, <max>)

#
# ratio of loglik weights
#print(benchmarks_1c_allpos_200, output_type = "rlw") # R file
print(benchmarks_1c_allpos_200, output_type = "rlw", color = FALSE) # Rmd file

## 
## Benchmark Results 
## ---------------------------------------------------------------------- 
## Preferred Hypothesis: H1
## Error probability Preferred Hypothesis vs. Complement: 0.051
## Sample Size: 600
## Number of Parameters: 3
## Population Estimates (PE):
##              D1     D2     D3
## PE_123eq  0.634  0.634  0.634
## PE_12eq   0.755  0.755  0.392
## PE_13eq   0.755  0.589  0.755
## PE_23eq   0.921  0.491  0.491
## Observed  0.921  0.589  0.392
## 
## 
## =========================================================================================== 
## Benchmark: Percentiles of Ratio-of-log-likelihood-weights for the Preferred Hypothesis 'H1' 
## ------------------------------------------------------------------------------------------- 
## Population estimates = PE_123eq
##                   Sample 5%    35%   50%   65%   95%  
## H1 vs. complement 8.070  0.110 0.583 0.731 0.942 1.113
## 
## Population estimates = PE_12eq
##                   Sample 5%    35%   50%   65%   95%  
## H1 vs. complement 8.070  0.245 0.920 1.000 1.112 3.156
## 
## Population estimates = PE_13eq
##                   Sample 5%    35%   50%   65%   95%  
## H1 vs. complement 8.070  0.002 0.094 0.211 0.374 0.988
## 
## Population estimates = PE_23eq
##                   Sample 5%    35%   50%   65%   95%  
## H1 vs. complement 8.070  0.359 0.933 1.001 1.105 3.596
## 
## Population estimates = Observed
##                   Sample 5%    35%   50%   65%    95%    
## H1 vs. complement 8.070  1.069 3.270 6.802 12.787 138.242

#plot(benchmarks_1c_allpos_200, output_type = "rlw") # x_lim = c(<min>, <max>)
plot(benchmarks_1c_allpos_200, output_type = "rlw", log_scale = TRUE)

In this example with a higher sample size, the sample difference in log-likelihood values and the sample ratio of log-likelihood weights are higher than the 95th percentile of the corresponding benchmarks (for all null populations; see the benchmarks output table). Based on this, we conclude that the log-likelihood values of $H_1$ and its complement are not close. Thus, there is no support for a boundary hypothesis, only for $H_1$. In that case, we can inspect the height of the support via the GORIC(A) benchmarks (as was done in a previous section).

Note that the benchmarks are based on the GORICA, not GORIC. Therefore, in case you used the GORIC, the reported sample value in the benchmark output may differ somewhat from your finding (in the goric output). For those interested: This has to do with using a different scaling of the covariance matrix; namely, dividing by N vs N-1.

Guidelines interpretation GORIC(A) benchmark output

Rebecca M. Kuiper and Leonard Vanbrabant

2026-02-05

1 Introduction