The information criteria GORIC and GORICA - referred to as GORIC(A) - are AIC-type information criteria and can evaluate one or more informative, theory-based hypotheses. Hence, you do not have to specify a null hypothesis nor (only) equality restrictions. You can (also) compare the size and/or ordering of mean parameters or of (standardized) regression-type parameters. You could, for example, evaluate the hypothesis \(H_m: \mu_1 > \mu_2 > \mu_3\) (a simple ordering of mean parameters) or \(H_m: \beta_1 - \beta_2 > \beta_3 - \beta_4\) (a possible representation of an interaction effect in terms of regression parameters). The goal of the GORIC(A) is to select the best from a set of candidate hypotheses/models.
The GORIC(A) is an information criterion and, therefore, balances fit and complexity. Fit denotes the compatibility of the hypothesis with the data, expressed by the maximum log likelihood part. Complexity reflects the size of the hypothesis in terms of (expected) number of parameters, expressed by the penalty part. Stated otherwise, GORIC(A) selects the hypothesis that describes the data best with the fewest number of parameters, out of a set of candidate hypotheses.
The hypothesis with the smallest GORIC(A) value is the preferred one in the set of candidate hypotheses; since that hypothesis then has the smallest (Kullback–Leibler) distance to the truth. The GORIC(A) values themselves are not interpretable; only the differences between the values can be inspected (i.e., the smaller, the better). To improve the interpretation, GORIC(A) weights can be computed.
A GORIC(A) weight (\(w_m\)) represent the relative likelihood of a hypothesis (\(H_m\)) given the data and the set of hypotheses. The hypothesis with the largest GORIC(A) weight is the preferred one in the set of candidate hypotheses. Additionally, the ratio of two GORICA weights (\(w_m/w_{m'}\)) can be used to quantify their relative support. This leads to conclusions like Hypothesis \(H_1\) is \(w_1/w_2\) more supported than the competing hypothesis \(H_2\).
The set of hypotheses should consist of the hypotheses of interest (reflecting one or more theories from the literature or ones based on expertise), possibly with a failsafe hypothesis (to prevent from selecting a hypothesis which is the best of the worst). I will distinguish different types of set of hypotheses: The first distinction is based on the number of informative hypotheses: one or more; The second distinction is based on the choice of safeguard hypothesis. In the next section, I will describe these type of sets and give guidelines for interpretation the results from each of them. I will also remark on how to specify your hypothesis such that the GORIC(A) output helps you even more. Subsequently, I will discuss two special cases situations one should be aware of:
The case of equal fit, which can happen in case of overlapping hypotheses. This is discussed together with finding support for the boundary of hypotheses;
The case of just below maximum fit, which can occur when there is at least one equality restriction in (one of) the hypothesis under investigation.
Consequently, be aware of specifying overlapping hypotheses and hypotheses containing one or more equality restrictions.
In general, the GORIC(A) output is interpreted as follows:
The hypothesis with the highest GORIC(A) weight is the preferred one: the best from the set.
The GORIC(A) weight of that hypothesis denotes its relative
strength in the set (given the data).
The GORIC(A) weights reflect the (un)certainty regarding that
hypothesis being the best - but be aware that it is conditional on
the other hypotheses in the set.
Additionally, one can compare the strength of that hypothesis versus a competing hypothesis using the ratio of their GORIC(A) weights.
As discussed later on, you need to inspect a bit more output to check for special cases (like support for overlap of the hypotheses of interest). Generally good advise - as will become clear later on - before interpreting the ratios of GORIC(A) weights, one needs to check the (ratio of) fit/loglik weights of the preferred hypotheses with that of the other hypotheses in the set. Additionally, you may want to take into account the sample size you have.
Next, I will go into more detail for each type of hypothesis set.
Here, I assume you have one informative hypothesis, that is, there are no competing hypotheses of interest. Then, I would advise you to evaluate it with its complement (an option in the software), as opposed to the unconstrained hypothesis. Nevertheless, I will describe both situations next.
The complement of a hypothesis denotes all the other possible hypotheses (i.e., all the other possible orderings); so, excluding the one of interest. The complement acts like a competing hypothesis. Either your (informative) hypothesis or its complement is the best and the ratio of their GORIC(A) weights denotes the relative strength.
Say, your hypothesis of interest has a GORIC(A) weight of \(w_m\) and, thus, its complement of \(w_c = 1 - w_m\). Then, your hypothesis has \(w_m/(1-w_m)\) more support than its complement. This relative support can go from (approximately) zero to infinity - the first denoting infinite support for the complement and the latter infinite support your hypothesis. When the ratio is close to 1, then both hypotheses are (approximately) equally good (in terms of balancing fit and complexity). Notably, the ratio of GORIC(A) weights together with other output can also denote support for their boundary, which is discussed in the Special cases Section.
When inspecting a hypothesis versus its complement, one can interpret \(1 - w_m = w_c\) as a error probability. That is, the probability that \(H_m\) is not the best one is \(w_c*100\%\). Note that in case of multiple hypotheses, especially in case of overlapping hypotheses, I would not advise on using this (conditional) error probability interpretation, since \(w_m\) and thus \(1-w_m\) depends on the other hypotheses in the set.
# H1 vs complement
H1 <- "D1 > D2 > D3" # Denoting mu1 > mu2 > mu3
# Apply GORIC #
set.seed(123)
results_1c <- goric(fit, hypotheses = list(H1 = H1), comparison = "complement")
results_1c
restriktor (0.6-10): generalized order-restricted information criterion:
Results:
model loglik penalty goric loglik.weights penalty.weights goric.weights
1 H1 -123.384 2.832 252.431 0.679 0.698 0.830
2 complement -124.133 3.668 255.601 0.321 0.302 0.170
Conclusion:
The order-restricted hypothesis 'H1' has 4.88 times more support than its complement.
From this, you can conclude that \(H_1:
\mu_1 > \mu_2 > \mu_3\) is \(.83/.17 \approx 4.88\) time more supported
than its complement.
Additionally, one could say that the probability that \(H_1\) is not the best is 17%.
Population information:
In the data generation, I used a ratio of population means of 3:2:1;
implying that \(H_1\) is correct. More
specifically, I used population mean values of approximately 0.98, 0.65,
and 0.33. This implies that Cohen’s \(d\) is approximately .27; thus, there is a
small to medium population effect size (which are in the same order as
hypothesized). I then sampled 100 observations, ran an ANOVA (with three
groups), and applied the GORIC.
When I would sample more observations, the GORIC(A) weight for \(H_1\) converges to 1 (denoting full support
for \(H_1\)).
Note: As will become clear in the Special cases Section, one should first check the ratio of fit/loglik weights (i.e. ratio of loglik.weights), since there could be support for the boundary of these two hypotheses. Since the loglik.weights differ (i.e., the ratio of loglik.weights is not close to 1), there is no support for the boundary of these hypotheses; only support for \(H_1\).
The unconstrained hypothesis denotes all the possible hypotheses (i.e., all the possible orderings); so, including the one of interest. Since it covers the hypothesis of interest, it cannot act like a competing hypothesis. The unconstrained can only be used as a failsafe:
If the unconstrained is better (i.e., has a higher weight), then your hypothesis of interest is weak, that is, it is not supported by the data.
If your hypothesis is better (i.e., has a higher
weight), then it is not weak. In this case, the
ratio of GORIC(A) weights is bounded (as described in the Example
section below) and should, therefore, not be interpreted. Stated
otherwise, in this case, the ratio of GORIC(A) weights will never go to
infinity; as it would do in the case of using the complement. Here, a
ratio of GORIC(A) weights equal to its bound denotes full support; even
when the sample size is small.
Therefore, I advise using the complement in case of one
informative hypothesis; because, then, the resulting ratio of GORIC(A)
weights is not bounded, and the weights reflect the uncertainty
there is (in case of a small sample size).
# H1 vs unconstrained (default)
H1 <- "D1 > D2 > D3" # mu1 > mu2 > mu3
# Apply GORIC #
set.seed(123)
results_1u <- goric(fit, hypotheses = list(H1 = H1))
results_1u
restriktor (0.6-10): generalized order-restricted information criterion:
Results:
model loglik penalty goric loglik.weights penalty.weights goric.weights
1 H1 -123.384 2.832 252.431 0.679 0.698 0.830
2 complement -124.133 3.668 255.601 0.321 0.302 0.170
Conclusion:
The order-restricted hypothesis 'H1' has 4.88 times more support than its complement.
From this, you can conclude that \(H_1: \mu_1 > \mu_2 > \mu_3\) is more supported than the unconstrained (since \(.763/.237 > 1\)) and is, therefore, not a weak hypothesis.
When you look at the fit/loglik value, you can see that they are the
same for both hypotheses (also reflected by the loglik.weights; leading
to a ratio of loglik.weights of 1). The unconstrained has, by
definition, the maximum fit. Here, \(H_1\) also has maximum fit, meaning that
its restrictions are in agreement with the data. Since both have the
same fit, their GORIC(A) weights solely depend on the penalty values
and, thus, are bounded: they equal the penalty.weights (and thus not 1
and 0, what you otherwise would find in case of full support). Finding
equal fit (or GORIC(A) weights being equal to the penalty weights) means
that there is support for the overlap (see also the Special cases
Section).
The informative hypothesis \(H_1\) is
completely contained in the unconstrained (by definition). Hence, the
overlap of the two hypotheses is \(H_1\). Therefore, we can say that \(H_1\) has (full) support: it has the
maximum fit and is (by definition) more parsimonious than the
unconstrained. Note that, now, we cannot say anything regarding the
uncertainty (which we could in the case of using the complement as the
failsafe).
Population information:
In the data generation, I used a ratio of population means of 3:2:1;
implying that \(H_1\) is correct. More
specifically, I used population mean values of approximately 0.98, 0.65,
and 0.33. This implies that Cohen’s \(d\) is approximately .27; thus, there is a
small to medium population effect size (which are in the same order as
hypothesized). I then sampled 100 observations, ran an ANOVA (with three
groups), and applied the GORIC.
When I would sample more observations, the GORIC(A) weight for \(H_1\) remains the same. When I would have
sampled less observations, the fit/loglik values may differ. Then, the
GORIC(A) weight of \(H_1\) is less than
the penalty weight of \(H_1\).
Note: In case of one informative hypothesis of interest, I would advise on using the complement instead (as done in the previous section). Then, the support for a true hypothesis increases with sample size (and/or effect size). Moreover, you have information regarding the uncertainty of your decision: In the example above, one would conclude that \(H_1\) has (full) support; while using the complement (in the previous example), one would conclude that \(H_1\) is preferred and that there is an uncertainty of 17%, namely a 17% chance that \(H_1\) is not the best.
Let us assume that the literature states two competing informative hypotheses. Then, a safeguard hypothesis is needed if these informative hypotheses do not cover the whole parameter space, that is, do not cover all possible theories/orderings. Namely, when both informative hypotheses are weak hypotheses, GORIC(A) selects the best out of a set of weak hypotheses (i.e., best of the worst). To refrain from this, one should then include a safeguard hypothesis.
The complement of a set of hypotheses denotes all the other possible hypotheses (i.e., all the other possible orderings); so, excluding the ones of interest. The complement acts like another competing hypothesis. One can also compare its strength to that of the hypotheses of interest.
As always, the hypothesis with the highest GORIC(A) weight is the preferred one.
If the best hypothesis is the complement of the set, then the
hypotheses of interest are weak. One should not compare the strength of
the hypotheses of interest (to avoid choosing the best from worst). Bear
in mind that one can, if that would be of interest, compare the strength
of the complement versus each of the hypotheses of interest by
inspecting their ratio of GORIC(A) weights.
Notably, one can take on an additional exploratory approach to create
one or more new hypotheses for future research.
If the best hypothesis is one of the hypotheses of interest, then
it is not weak and can be compared to the other hypothesis/-es of
interest (including the complement). Note that one can compare all the
non-weak hypotheses to all the hypotheses of interest.
Be aware of the special cases, as will be discussed in the Special cases
Section.
Important: This option is unfortunately not yet available in GORIC(A) software.
The unconstrained hypothesis denotes all the possible hypotheses (i.e., all the possible orderings); so, including the ones of interest. Since it covers the hypotheses of interest, it cannot act like a competing hypothesis. The unconstrained can only be used as a failsafe:
If the best hypothesis is the unconstrained is, then your
hypotheses of interest are weak, that is, they are not supported by the
data. One should not compare the strength of the hypotheses of interest
(to avoid choosing the best from worst).
Notably, one can take on an additional exploratory approach to create
one or more new hypotheses for future research.
If the best hypothesis is one of the hypotheses of interest, then
it is not weak and can be compared to the other hypothesis/-es of
interest. Note that one can compare all the non-weak hypotheses to all
the hypotheses of interest.
Be aware of the special cases, as will be discussed in the Special cases
Section.
# H1, H2, and unconstrained (default)
H1 <- "D1 > D2 > D3" # mu1 > mu2 > mu3
H2 <- "D1 < D2 < D3" # mu1 < mu2 < mu3
# Apply GORIC #
set.seed(123)
results_12u <- goric(fit, hypotheses = list(H1 = H1, H2 = H2))
results_12u
restriktor (0.6-10): generalized order-restricted information criterion:
Results:
model loglik penalty goric loglik.weights penalty.weights goric.weights goric.weights_without_unc
1 H1 -123.384 2.832 252.431 0.500 0.433 0.763 1.000
2 H2 -131.346 2.832 268.356 0.000 0.433 0.000 0.000
3 unconstrained -123.384 4.000 254.767 0.500 0.135 0.237
Conclusion:
- The order-restricted hypothesis 'H1' is the best in the set, as it has the highest GORIC(A) weight.
- Since 'H1' has a higher GORIC(A) weight than the unconstrained hypothesis, it is not considered weak. We can now inspect the relative support for 'H1' against the other order-restricted hypotheses:
* 'H1' is 2.871e+03 times more supported than 'H2'.
vs. H1 vs. H2 vs. unconstrained
H1 1.000 2871.230 3.216
H2 0.000 1.000 0.001
unconstrained 0.311 892.888 1.000
From this, you can conclude that \(H_1: \mu_1 > \mu_2 > \mu_3\) is not a weak hypothesis, since it is \(.763/.237 > 1\) times more supported than the unconstrained. Therefore, \(H_1\) can be compared to its competing hypothesis \(H_2: \mu_1 < \mu_2 < \mu_3\). Note that \(H_2\) is weak \((.000/.237 < 1\) or \(.000 < .237)\); so, we already know that \(H_1\) is better, but not yet how much better.
From the GORIC weights (or their ratios in results_12u$ratio.gw - H1
vs. H2), you can conclude that \(H_1\)
is .763/.000 = an infinite times (or 2871.23 times) more supported than
\(H_2\). This denotes (almost) full
support for \(H_1\).
In comparison, if you would calculate the GORIC(A) weights for the set
consisting of solely \(H_1\) and \(H_2\) (thus, without the unconstrained),
you would obtain GORIC weights of 1.000 and .000, respectively; denoting
full support for \(H_1\).
Population information:
In the data generation, I used a ratio of population means of 3:2:1;
implying that \(H_1\) is correct. More
specifically, I used population mean values of approximately 0.98, 0.65,
and 0.33. This implies that Cohen’s \(d\) is approximately .27; thus, there is a
small to medium population effect size (which are in the same order as
hypothesized in \(H_1\)). I then
sampled 100 observations, ran an ANOVA (with three groups), and applied
the GORIC.
When I would sample more observations, it does not affect the results in
this case. In case \(H_2\) did receive
some support (i.e., had a GORIC(A) weight larger than 0), increasing the
same size would lead to GORIC(A) weights as reported here, where the
ratio of GORIC(A) weights for \(H_1\)
and the unconstrained is fixed and the ratio of GORIC(A) weights for
\(H_1\) and \(H_2\) is infinite.
Note: As will become clear in the Special cases Section, one should first check the ratio of fit/loglik weights (i.e. ratio of loglik.weights) of the best hypothesis \(H_1\) and the other hypotheses, since there could be support for the boundary of \(H_1\) and one of the other hypotheses. Since the loglik.weights of \(H_1\) and its competing (non-overlapping) hypothesis \(H_2\) differ (i.e., their ratio of loglik.weights is not close to 1), there is no support for the boundary of these hypotheses; only support for \(H_1\).
Only if your hypotheses of interest cover all theories/orderings
(i.e., cover the whole parameter space), then you do not need a failsafe
hypothesis. Namely, the truth is covered in one or more
hypotheses.
Be ware of the special cases, like overlapping hypotheses, as will be
discussed in the Special cases Section.
# H1, H2, and unconstrained (default)
H1 <- "D1 > D2" # # => H1: D1 > D2, D3 # mu1 > mu2, mu3
H2 <- "D1 < D2" # # => H2: D1 < D2, D3 # mu1 < mu2, mu3
# Apply GORIC #
set.seed(123)
results_12 <- goric(fit, hypotheses = list(H1, H2), comparison = "none")
results_12
restriktor (0.6-10): generalized order-restricted information criterion:
Results:
model loglik penalty goric loglik.weights penalty.weights goric.weights
1 H1 -123.384 3.500 253.767 0.978 0.500 0.978
2 H2 -127.183 3.500 261.366 0.022 0.500 0.022
Conclusion:
The order-restricted hypothesis 'H1' has 44.68 times more support than 'H2'.
vs. H1 vs. H2
H1 1.000 44.682
H2 0.022 1.000
From this, you can conclude that \(H_1: \mu_1 > \mu_2, \ \mu_3\) is \(.978/.022\) \(\approx\) \(45\) times more supported than \(H_2: \mu_1 < \mu_2, \ \mu_3\).
Population information:
In the data generation, I used a ratio of population means of 3:2:1;
implying that \(H_1\) is correct. More
specifically, I used population mean values of approximately 0.98, 0.65,
and 0.33. This implies that Cohen’s \(d\) is approximately .27; thus, there is a
small to medium population effect size (which are in the same order as
hypothesized in \(H_1\)). I then
sampled 100 observations, ran an ANOVA (with three groups), and applied
the GORIC.
When I would sample more observations, the GORIC(A) weights of \(H_1\) and \(H_2\) will converge to 1 and 0,
respectively, leading to an infinite support for \(H_1\) versus \(H_2\); stated otherwise, leading to full
support for \(H_1\).
Note: As will become clear in the Special cases Section, one should first check the ratio of fit/loglik weights (i.e. ratio of loglik.weights) of the best hypothesis \(H_1\) and one of the other hypotheses, since there could be support for the boundary of \(H_1\) and the other hypotheses. Since the loglik.weights differ (i.e., the ratio of loglik.weights is not close to 1), there is no support for the boundary of \(H_1\) and \(H_2\); only support for \(H_1\).
A hypothesis is considered the best of the set if it has the highest GORIC(A) weight. In that case, all ratios of GORIC(A) weights of that hypothesis versus another one is 1 or higher. In case of decision making, you may want to pre-specify a cut-off value \(x\) for the ratio of GORIC(A) weights: if the ratio of GORIC(A) weights is larger than \(x\), then I am willing to choose (to make policy based on) this best hypothesis. It can be hard to pre-specify \(x\); unless there is already previous research perhaps. Therefore, my advise is to create the hypotheses in such a way that finding a ratio of 1 (or higher) is enough evidence for the preferred hypothesis. I would also advise against overlapping hypotheses and using equality restrictions; for more detail, see the special cases discussed in the Special cases Section.
For example, let us assume that we compare an outcome (e.g., a standardized level of happiness) for three types of treatments: a new treatment (A), the established treatment (B), and a placebo treatment (P). You could evaluate \(H_{1g}: \mu_A > \mu_B > \mu_P\) versus its complement \(H_c\). Now, it can be hard to decide to go for Treatment A when \(H_{1g}\) is 1 or perhaps 1.1 times more supported than \(H_c\). Then, it could be handy (if possible) to specify the hypothesis in such a way that a ratio of (just over) 1 makes you willing to choose for Treatment A. You could specify a minimum difference between Treatments A and B and additional state (as a check) that Treatment B (and then also A) does better than the placebo: \(H_{1m}: \mu_A - \mu_B > .1, \ \mu_B > \mu_P\).
Note: \(H_{1g}: \mu_A > \mu_B > \mu_P\) can be rewritten as \(H_{1g}: \mu_A - \mu_B > 0, \ \mu_B > \mu_P\). In \(H_{1m}\), the (minimum) difference/bound/threshold of 0 is replaced by .1, the minimum difference in treatments you would like to find.
# General hypothesis vs complement
H1g <- "D1 > D2 > D3" # mu1 > mu2 > mu3
# Apply GORIC #
set.seed(123)
results_12 <- goric(fit, hypotheses = list(H1g = H1g), comparison = "complement")
results_12
restriktor (0.6-10): generalized order-restricted information criterion:
Results:
model loglik penalty goric loglik.weights penalty.weights goric.weights
1 H1g -123.384 2.832 252.431 0.831 0.698 0.919
2 complement -124.978 3.668 257.292 0.169 0.302 0.081
Conclusion:
The order-restricted hypothesis 'H1g' has 11.36 times more support than its complement.
vs. H1g vs. complement
H1g 1.000 11.363
complement 0.088 1.000
From this, you can conclude that \(H_{1g}:
\mu_1 > \mu_2 > \mu_3\) is \(.919/.081 \approx 11.4\) times more
supported than its complement. Since the ratio is larger than 1, \(H_{1g}\) is the best.
[Note: Since the fit/loglik weights differ (i.e., the ratio of
loglik.weights is not close to 1), there is no support for the boundary
- as discussed in the Special cases Section - but solely for \(H_{1g}\).]
Sometimes you may want a minimum support for your hypothesis of interest. You could do this by pre-specifying a cut-off value for the ratio: when the ratio of GORIC(A) weights is higher than that value, then you select the hypothesis. Alternatively, you could make you hypothesis more specific, such that a ratio of 1 or more is sufficient to select the hypothesis:
# More specific hypothesis vs complement
H1m <- "D1 - D2 > .1, D2 > D3" # mu1 - mu2 > .1, mu2 > mu3
# Apply GORIC #
set.seed(123)
results_12 <- goric(fit, hypotheses = list(H1m = H1m), comparison = "complement")
results_12
restriktor (0.6-10): generalized order-restricted information criterion:
Results:
model loglik penalty goric loglik.weights penalty.weights goric.weights
1 H1m -123.384 2.832 252.431 0.736 0.698 0.865
2 complement -124.409 3.668 256.154 0.264 0.302 0.135
Conclusion:
The order-restricted hypothesis 'H1m' has 6.43 times more support than its complement.
vs. H1m vs. complement
H1m 1.000 6.433
complement 0.155 1.000
From this, you can conclude that \(H_{1m}:
\mu_1 - \mu_2 > .1, \ \mu_2 > \mu_3\) is \(.865/.135 \approx 6.4 > 1\) times more
supported than its complement. Since the difference in treatment means
is at least as large as pre-specified, you can now convincingly go for
Treatment A.
[Note: Since the fit/loglik weights differ (i.e., the ratio of
loglik.weights is not close to 1), there is no support for the boundary
- as discussed in the Special cases Section - but solely for \(H_{1m}\).]
Population information:
In the data generation, I used a ratio of population means of 3:2.5:1;
implying that both \(H_{1g}\) and \(H_{1m}\) are correct. More specifically, I
used population mean values of approximately 0.89, 0.74, and 0.30. This
implies that Cohen’s \(d\) is
approximately .25; thus, there is a small to medium population effect
size (in the same ordering as hypothesized). I then sampled 100
observations, ran an ANOVA (with three groups), and applied the
GORIC.
When I would sample more observations, the GORIC(A) weight of both \(H_{1g}\) and \(H_{1m}\) will converge to 1, reflecting
full support; and the ratio of GORIC(A) weights of the hypothesis of
interest versus its complement will go to infinity.
Notes:
When specifying a bound for a positive difference (e.g., the .1 in the example above), you should use the minimum difference you would like to find. If you use a higher bound/threshold then needed, the complexity does not change, while the fit decreases; which leads to less support for your hypothesis of interest.
In case you want to inspect an absolute difference, you can make use of the abs() function (e.g., \(|\beta_1| > |\beta_2|\) can be reflected in the code (using b1 and b2) via abs(b1) > abs(b2); or, when using bounds/thresholds, abs(b1) - abs(b2) > .1); as is explained in GORIC(A) tutorials.
You should be aware when some of the hypotheses overlap. Note that all hypotheses overlap with the unconstrained hypotheses (per definition). Also, competing hypotheses can overlap; e.g., \(\beta_1 > \beta_2, \ \beta_3\) and \(\beta_1 > \beta_2 > \beta_3\) overlap, since the latter is a subset / special case of the first.
When hypotheses overlap and the truth lies in this overlapping part (i.e., in the subset):
Then, the overlapping hypotheses share support, that is, they divide the support. Consequently, none of them will obtain full support (i.e., a GORIC(A) weight of 1); and their relative support (i.e., ratio of GORIC(A) weights) is even bounded. Bear in mind that the most parsimonious hypothesis will be the best one.
Then, the fit/loglik values of the overlapping hypotheses will be equal (i.e., the ratio of loglik.weights will be 1), that is, these overlapping hypotheses will have the same maximum log likelihood. In that case, the ratio of GORIC(A) weights will be solely based on the penalty part (and thus equal the so-called penalty weights).
Then, it does not make sense to interpret the ratio of GORIC(A)
weights of these overlapping hypotheses, you just know that there is
support for the overlap of the hypotheses.
If possible and if of interest, you can specify the overlap and evaluate
that versus its complement (to obtain the support for the overlapping
part). This can be of interest for future research. Notably, if the
overlap cannot easily be specified, one can alternatively inspect the
best hypothesis versus its complement.
Thus, when hypotheses have the same fit values (and thus when the ratio of their loglik.weights equal 1), you know that there is support for the overlap of the hypotheses. Note that, when one hypothesis is a subset of another, this implies support for the subset. Note further that, in some cases, this implies support for the boundary of hypotheses, as will be discussed in a section later on.
In contrast, when there is support for only one of the overlapping hypotheses, this implies support for the non-overlapping part. Bear in mind that the GORIC(A) weight itself addresses the support for the complete hypothesis, not the overlap. If possible and if of interest, you can specify the (non-)overlapping part and evaluate that versus its complement (to obtain the support for this (non-)overlapping part). This can be of interest for future research: You then create a more specific hypothesis or more adjusted set of hypotheses to be evaluated in future research which aids in theory building.
# H1, H2, and unconstrained (default) - subset true
H1 <- "D1 > D2 > D3" # mu1 > mu2 > mu3
H2 <- "D1 > D2" # H2: D1 > D2, D3 # mu1 > mu2, mu3
# Apply GORIC #
set.seed(123)
results_12u <- goric(fit, hypotheses = list(H1, H2))
results_12u
restriktor (0.6-10): generalized order-restricted information criterion:
Results:
model loglik penalty goric loglik.weights penalty.weights goric.weights goric.weights_without_unc
1 H1 -123.384 2.832 252.431 0.333 0.548 0.548 0.661
2 H2 -123.384 3.500 253.767 0.333 0.281 0.281 0.339
3 unconstrained -123.384 4.000 254.767 0.333 0.171 0.171
Conclusion:
- The order-restricted hypothesis 'H1' is the best in the set, as it has the highest GORIC(A) weight.
- Since 'H1' has a higher GORIC(A) weight than the unconstrained hypothesis, it is not considered weak. We can now inspect the relative support for 'H1' against the other order-restricted hypotheses:
* 'H1' is 1.950 times more supported than 'H2' (This relative support reached its maximum, see Note.).
vs. H1 vs. H2 vs. unconstrained
H1 1.000 1.950 3.216
H2 0.513 1.000 1.649
unconstrained 0.311 0.607 1.000
From this, you can conclude that \(H_1: \mu_1 > \mu_2 > \mu_3\) and \(H_2: \mu_1 > \mu_2, \ \mu_3\) are not weak, since they are \(.548/.171 > 1\) and \(.281/.171 > 1\), respectively, times more supported than the unconstrained. Therefore, \(H_1\) and \(H_2\) can be compared.
From the GORIC weights, we can conclude that \(H_1\) is the best. Before we interpret the ratio of GORIC weights (i.e., goric.weights), we need to check the fit/loglik values of \(H_1\) and \(H_2\). These are the same, that is, the ratio of loglik.weights is 1. Hence, the relative support (denoted by the GORIC weight) is bounded and, now, attains the maximum value. Thus, there is support for the overlap of the hypotheses, which is \(H_1\) here (since \(H_1\) is a subset of \(H_2\)).
You could choose to investigate the support for this overlap, by evaluating \(H_1\) versus its complement. We already did this in the first example subsection, there we found that \(H_1\) is 4.88 times more supported than its complement (with an error probability of 17%). For future research, you could advise to evaluate \(H_1\) versus its complement.
Population information:
In the data generation, I used a ratio of population means of 3:2:1;
implying that \(H_1\) is correct. More
specifically, I used population mean values of approximately 0.98, 0.65,
and 0.33. This implies that Cohen’s \(d\) is approximately .27; thus, there is a
small to medium population effect size (which are in the same order as
hypothesized). I then sampled 100 observations, ran an ANOVA (with three
groups), and applied the GORIC.
When I would sample more observations, it does not affect the GORIC(A)
weights for \(H_1\), \(H_2\), and the unconstrained.
# H1, H2, and unconstrained (default) - non-overlapping part true
H1 <- "D1 > D2 > D3" # mu1 > mu2 > mu3
H2 <- "D1 > D2" # H2: D1 > D2, D3 # mu1 > mu2, mu3
# Apply GORIC #
set.seed(123)
results_12u <- goric(fit, hypotheses = list(H1, H2))
results_12u
restriktor (0.6-10): generalized order-restricted information criterion:
Results:
model loglik penalty goric loglik.weights penalty.weights goric.weights goric.weights_without_unc
1 H1 -124.316 2.832 254.297 0.164 0.548 0.323 0.434
2 H2 -123.384 3.500 253.767 0.418 0.281 0.421 0.566
3 unconstrained -123.384 4.000 254.767 0.418 0.171 0.255
Conclusion:
- The order-restricted hypothesis 'H2' is the best in the set, as it has the highest GORIC(A) weight.
- Since 'H2' has a higher GORIC(A) weight than the unconstrained hypothesis, it is not considered weak. We can now inspect the relative support for 'H2' against the other order-restricted hypotheses:
* 'H2' is 1.303 times more supported than 'H1'.
vs. H1 vs. H2 vs. unconstrained
H1 1.000 0.767 1.265
H2 1.303 1.000 1.649
unconstrained 0.790 0.607 1.000
From this, you can conclude that \(H_1: \mu_1 > \mu_2 > \mu_3\) and \(H_2: \mu_1 > \mu_2, \ \mu_3\) are not weak, since since they are \(.323/.255 > 1\) and \(.421/.255 > 1\), respectively, times more supported than the unconstrained. Therefore, \(H_1\) and \(H_2\) can be compared.
From the GORIC weights, we can conclude that \(H_2\) is the best. Since the fit/loglik values of \(H_2\) and \(H_1\) are not the same (i.e., the ratio of loglik.weights is not close to 1), there is no support for their overlap (which would have been their boundary then; more details can be found in a section later on). From the GORIC weights goric.weights (or their ratios in results_12u$ratio.gw - H1 vs. H2), you can conclude that \(H_2\) is \(.421/.323\) times (or \(1.30\) times) more supported than \(H_1\). Notably, if you would calculate the GORIC(A) weight for the set consisting of solely \(H_1\) and \(H_2\), you would obtain weights of \(.323/(.323+.421) \approx .43\) and \(.57\) (denoting the same relative support of course).
Since we know/see that \(H_1\) and \(H_2\) overlap, we can take it one step further. The support for \(H_2\) indicates support for the non-overlapping part of \(H_2\) and \(H_1\). In this case, it is possible to specify the the non-overlapping part, namely \(\mu_1 > \mu_2 < \mu_3\). If of interest, you could evaluate that versus its complement (as done below). For future research, you could advise to evaluate this non-overlapping part versus its complement.
Population information:
In the data generation, I used a ratio of population means of 3:2:3;
implying that \(H_2\) is correct. More
specifically, I used population mean values of approximately 0.62, 0.41,
and 0.62. This implies that Cohen’s \(d\) is approximately .107; thus, there is a
small population effect size (in the same ordering as hypothesized in
\(H_1\)). I then sampled 100
observations, ran an ANOVA (with three groups), and applied the
GORIC.
When I would sample more observations, the ratio of GORIC(A) weights for
\(H_2\) versus \(H_1\) would go to infinity. Note that the
GORIC(A) weight of \(H_1\) would go to
0, but that of \(H_2\) not to 1 since
the unconstrained always obtains support (because it also includes \(H_2\) / the true ordering).
# non-overlapping part vs its complement
Hn <- "D1 > D2 < D3" # mu1 > mu2 < mu3
# Apply GORIC #
set.seed(123)
results_n <- goric(fit, hypotheses = list(Hn = Hn), comparison = "complement")
results_n
restriktor (0.6-10): generalized order-restricted information criterion:
Results:
model loglik penalty goric loglik.weights penalty.weights goric.weights
1 Hn -123.384 3.168 253.103 0.718 0.541 0.750
2 complement -124.316 3.332 255.297 0.282 0.459 0.250
Conclusion:
The order-restricted hypothesis 'Hn' has 2.99 times more support than its complement.
When hypotheses do not overlap, there can mathematically still be overlap, namely the boundary of these hypotheses. This is always the case when evaluating an informative hypothesis versus its complement, but can also be the case for other pairs of hypotheses. As a very simple example: The overlap/boundary of \(\beta_1 > 0\) and its complement (here, \(\beta_1 < 0\)) is \(\beta_1 = 0\).
Consequently, when non-overlapping hypotheses have (approximately) the same fit values (and thus when the ratio of loglik.weights (approximately) equal 1), you know that there is support for the boundary of these hypotheses.
# H1 vs complement
H1 <- "D1 > D2 > D3" # mu1 > mu2 > mu3
# Apply GORIC #
set.seed(123)
results_1c <- goric(fit, hypotheses = list(H1), comparison = "complement")
results_1c
restriktor (0.6-10): generalized order-restricted information criterion:
Results:
model loglik penalty goric loglik.weights penalty.weights goric.weights
1 H1 -123.450 2.832 252.564 0.483 0.698 0.683
2 complement -123.384 3.668 254.103 0.517 0.302 0.317
Conclusion:
The order-restricted hypothesis 'H1' has 2.16 times more support than its complement.
D1 D2 D3
0.9342210 0.4515234 0.5260562
From the GORIC weights, we can conclude that \(H_1\) is the best. Before interpreting the
ratio of GORIC(A) weights, one needs to check the fit/loglik values (or
the ratio of their weights). From the output, you can see that the fit
values of \(H_1\) and its complement
resemble (-123.450 vs -123.384) and, thus, their loglik.weights are
almost the same (.483 vs .517; i.e., their ratio is close to 1).
Likewise, you can see that the GORICA weights resemble the penalty
weights (for \(H_1\), .683 vs .698).
This indicates that there is support for the boundary of the hypotheses.
Bear in mind that the overlap of a hypothesis and its complement is the
boundary of them.
In this example, the boundaries are:
\(\mu_1 = \mu_2 > \mu_3\),
\(\mu_1 > \mu_2 = \mu_3\), and
\(\mu_1 = \mu_2 = \mu_3\).
Now, you can take two approaches: i) do an exploratory evaluation of
these hypotheses (as done below) or ii) inspect the sample means (by
looking at coef(fit)). Here, based on both approaches, you would
conclude that there is support for \(\mu_1
> \mu_2 = \mu_3\). This hypothesis can then be proposed to be
evaluated in future research.
If of interest, one can also evaluate this boundary hypothesis versus
its complement. Notably, such a boundary hypothesis contains at least
one equality (here, there is one), which you may want to rephrase as an
about-equality, as discusses in a next section; and exemplified in the
the follow-up analysis in a next section.
Population information:
In the data generation, I used a ratio of population means of 3:2:2;
implying that (the boundaries of) \(H_1\) and its complement are both correct
(where \(H_1\) is the most parsimonious
one). More specifically, I used population mean values of approximately
0.71, 0.482, and 0.48. This implies that Cohen’s \(d\) is approximately .11; thus, there is a
small population effect size. I then sampled 100 observations, ran an
ANOVA (with three groups), and applied the GORIC.
When I would sample more observations, the fit values will become more
closer and the GORIC(A) weights will converge to the penalty
weights.
# Check on three boundary hypotheses
Hb1 <- "D1 = D2 > D3" # mu1 = mu2 > mu3
Hb2 <- "D1 > D2 = D3" # mu1 > mu2 = mu3
Hb3 <- "D1 = D2 = D3" # mu1 = mu2 = mu3
# Apply GORIC #
set.seed(123)
results_b123 <- goric(fit, hypotheses = list(Hb1 = Hb1, Hb2 = Hb2, Hb3 = Hb3))
results_b123
restriktor (0.6-10): generalized order-restricted information criterion:
Results:
model loglik penalty goric loglik.weights penalty.weights goric.weights goric.weights_without_unc
1 Hb1 -126.132 2.500 257.263 0.031 0.258 0.050 0.060
2 Hb2 -123.450 2.500 251.900 0.458 0.258 0.725 0.876
3 Hb3 -126.570 2.000 257.139 0.020 0.426 0.053 0.064
4 unconstrained -123.384 4.000 254.767 0.490 0.058 0.173
Conclusion:
- The order-restricted hypothesis 'Hb2' is the best in the set, as it has the highest GORIC(A) weight.
- Since 'Hb2' has a higher GORIC(A) weight than the unconstrained hypothesis, it is not considered weak. We can now inspect the relative support for 'Hb2' against the other order-restricted hypotheses:
* 'Hb2' is 14.611 times more supported than 'Hb1'.
* 'Hb2' is 13.734 times more supported than 'Hb3'.
# H1, H2, and unconstrained (default) - non-overlapping part true
H1 <- "D1 > D2 > D3" # mu1 > mu2 > mu3
H2 <- "D1 > D2 < D3" # mu1 > mu2 < mu3
# Apply GORIC #
set.seed(123)
results_12u <- goric(fit, hypotheses = list(H1, H2))
results_12u
restriktor (0.6-10): generalized order-restricted information criterion:
Results:
model loglik penalty goric loglik.weights penalty.weights goric.weights goric.weights_without_unc
1 H1 -123.450 2.832 252.564 0.319 0.494 0.477 0.567
2 H2 -123.384 3.168 253.103 0.341 0.353 0.364 0.433
3 unconstrained -123.384 4.000 254.767 0.341 0.154 0.159
Conclusion:
- The order-restricted hypothesis 'H1' is the best in the set, as it has the highest GORIC(A) weight.
- Since 'H1' has a higher GORIC(A) weight than the unconstrained hypothesis, it is not considered weak. We can now inspect the relative support for 'H1' against the other order-restricted hypotheses:
* 'H1' is 1.310 times more supported than 'H2'.
vs. H1 vs. H2 vs. unconstrained
H1 1.000 1.310 3.009
H2 0.764 1.000 2.298
unconstrained 0.332 0.435 1.000
vs. H1 vs. H2 vs. unconstrained
H1 1.000 1.399 3.216
H2 0.715 1.000 2.298
unconstrained 0.311 0.435 1.000
From this, you can conclude that \(H_1: \mu_1 > \mu_2 > \mu_3\) and \(H_2: \mu_1 > \mu_2 < \mu_3\) are not weak, since since they are \(.477/.159 > 1\) and \(.364/.159 > 1\), respectively, times more supported than the unconstrained. Therefore, \(H_1\) and \(H_2\) can be compared.
From the GORIC weights, we can conclude that \(H_1\) is the best. When checking the
fit/loglik values (or the ratio of their weights), you can see that the
fit values of \(H_1\) and \(H_2\) resemble and, thus, their
loglik.weights are almost the same (i.e., the ratio of loglik.weights is
close to 1). Likewise, you can see that the GORICA weights of \(H_1\) vs \(H_2\) (in results_12u$ratio.gw - H1 vs. H2)
resemble the penalty weights of \(H_1\)
vs \(H_2\) (in results_12u$ratio.pw -
H1 vs. H2). This indicates that there is support for the overlap of
these hypotheses. In this case, the overlap means \(\mu_1 > \mu_2\) (i.e., the part \(H_1\) and \(H_2\) have in common) together with the
boundary of \(\mu_2 > \mu_3\) and
\(\mu_2 < \mu_3\), which is \(\mu_2 = \mu_3\); hence, \(\mu_1 > \mu_2 = \mu_3\). This hypothesis
can then be proposed to be evaluated in future research.
If of interest, one can evaluate this new hypothesis versus its
complement. Notably, this hypothesis contains an equality, which you may
want to rephrase as an about-equality, as discussed in a next section
and exemplified in the next subsection.
Population information:
In the data generation, I used a ratio of population means of 3:2:2;
implying that (the boundaries of) \(H_1\) and \(H2\) are both correct (where both
hypotheses are equally parsimonious). More specifically, I used
population mean values of approximately 0.71, 0.482, and 0.48. This
implies that Cohen’s \(d\) is
approximately .11; thus, there is a small population effect size. I then
sampled 100 observations, ran an ANOVA (with three groups), and applied
the GORIC.
When I would sample more observations, the fit values will become more
closer and the GORIC(A) weights will converge to the penalty
weights.
# overlap H1 and H2 versus its complement
Hb <- "D1 > D2, -.1 < D2 - D3 < .1" # mu1 > mu2, -.1 < mu2 - mu3 < .1
# Apply GORIC #
set.seed(123)
results_b <- goric(fit, hypotheses = list(Hb = Hb), comparison = "complement")
results_b
restriktor (0.6-10): generalized order-restricted information criterion:
Results:
model loglik penalty goric loglik.weights penalty.weights goric.weights
1 Hb -123.384 2.500 251.767 0.944 0.731 0.979
2 complement -126.207 3.500 259.413 0.056 0.269 0.021
Conclusion:
The order-restricted hypothesis 'Hb' has 45.75 times more support than its complement.
Note 1:
A ratio of GORIC(A) weights of 1 means that the hypotheses are equally
likely (in terms of balancing fit and complexity). Then, both hypotheses
are equally supported. This does not per se mean that there is support
for their overlap (or boundary). Only when the penalty values of the
hypotheses are the same, this coincides. For example, the penalty of
\(\beta_1 > 0\) equals that of its
complement (here, \(\beta_1 < 0\)),
thus their ratio of GORIC(A) weights will equal 1 when \(\beta_1 = 0\).
Bear in mind that when the ratio of GORIC(A) weights equals the ratio of
penalty weights (which does not need to be 1), then there is support for
the overlap (or boundary). Likewise, one can look for equal fit values
(reflected by a ratio of loglik.weights of 1), as done in all the
examples.
Note 2:
There are no cut-off values for fit values to resemble or for the ratio
of loglik.weights being close to 1 (only gut feelings). Since the fit
values depend on sample size, it is probably better to look the ratio of
loglik.weights. Additionally, it could help to check the estimates that
belong to those hypotheses (using, e.g., results_0c$ormle) and see if
they meaningfully differ.
Even if an equality restriction (e.g., \(\beta_1 = \beta_2\), that is, \(\beta_1 - \beta_2 = 0\)) is true in the population, the probability of finding this in your data is 0. In your data, the relationship will never be exactly equal. Therefore, the fit value (i.e., the maximum log likelihood) of a true hypothesis with equality restriction will never equate the maximum fit (i.e., the maximum log likelihood of the unconstrained), but a value (just) below this maximum. Consequently, a true hypothesis with equality restriction will never obtain a GORIC(A) weight of 1, but one lower than 1.
Hence, be aware of interpreting your results when there is at least one equality restriction in the set. Moreover, only include an equality restriction when you are really interested in it. If possible, I would advise you to specify an about-equality restriction instead, as discussed next (after the example).
# Equality versus its complement
H1 <- "D1 = D2 > 0, D3 > 0" # mu1 = mu2 > 0, mu3 > 0
# Apply GORIC #
set.seed(123)
results_0c <- goric(fit, hypotheses = list(H1), comparison = "complement")
results_0c
restriktor (0.6-10): generalized order-restricted information criterion:
Results:
model loglik penalty goric loglik.weights penalty.weights goric.weights
1 H1 -139.114 2.000 282.228 0.499 0.818 0.817
2 complement -139.111 3.500 285.222 0.501 0.182 0.183
Conclusion:
The order-restricted hypothesis 'H1' has 4.47 times more support than its complement.
vs. H1 vs. complement
H1 1.000 0.997
complement 1.003 1.000
vs. H1 vs. complement
H1 1.000 4.469
complement 0.224 1.000
From this, you can conclude that \(H_1: \mu_1 = \mu_2 > 0, \ \mu_3 > 0\) is more supported than its complement. When you look at the fit/loglik part, you see that both fit/loglik values resemble; that is, the ratio of loglik.weights is close to 1 (namely, .997). Thus, you can conclude that there is support for the overlap/boundary of these hypotheses.
Here, the complement receives the highest fit; that is, the estimates are in agreement with the complement. In this case, the fit of the complement is based on the fit for \(\mu_1 > 0, \mu_2 > 0, \mu_3 > 0\) (so, the hypothesis that does not restrict the first two means to be equal). Consequently, the preferred hypothesis \(H_1\) does not have the highest fit/loglik (it has the best fit-complexity balance). Notably, since \(H_1\) contains an equality, it will also never have maximum fit. Moreover, the weight for a true equality hypothesis will never be 1. This could be a reason to evaluate an about-equality hypothesis instead, as discussed in the next section.
Population information:
In the data generation, I used a ratio of population means of 3:3:1.5;
implying that (the boundaries of) \(H_1\) and its complement are both correct
(where \(H_1\) is the most parsimonious
one). More specifically, I used population mean values of approximately
0.70, 0.70, and 0.35. This implies that Cohen’s \(d\) is approximately .16; thus, there is a
small to medium population effect size. I then sampled 100 observations,
ran an ANOVA (with three groups), and applied the GORIC.
When I would sample more observations, the fit/loglik values of \(H_1\) and its complement will remain close
(with the highest value for the complement) and the results are
conclusion-wise the same; but, more importantly, the weight for \(H_1\) will not converge to 1.
Instead of equality restrictions (e.g., \(\beta_1 = \beta_2\), that is, \(\beta_1 - \beta_2 = 0\)) one can specify
about-equality restrictions (e.g., \(-0.2\) \(<\) \(\beta_1
- \beta_2\) \(<\) \(0.2\)). In this case, the hypothesis can
have a maximum fit and a GORIC(A) weight of 1.
In practice, it can be hard to specify such a range. Notably, when the
range is too broad, the hypothesis will per definition have maximum fit.
In addition, the broader the range, the lower the fit for the
complement. Thus, you should specify the range meaningfully; preferably,
based on literature and/or expertise or perhaps data-based.
If you use a data-based approach to
specify the range, it is best to use a training set
(when the sample size is large enough). In that case, one uses one part
of the data to come up with a range value for the about-equality
restriction and the other part of the data to evaluate the
about-equality restriction.
When the sample size is not large enough, you could for example use a
cut-off value from the literature together with the standard deviation
(i.e., the standard errors of the mean difference times the square root
of the sample size). Bear in mind that you use the data twice, so only
use this when creating a training set is not possible.
# Equality versus its complement
H1 <- "-0.2 < D1 - D2 < .2, D1 > 0, D2 > 0, D3 > 0"
# -0.2 < mu1 - mu2 < .2, mu1 > 0, mu2 > 0, mu3 > 0
# Apply GORIC #
set.seed(123)
results_12 <- goric(fit, hypotheses = list(H1), comparison = "complement")
results_12
restriktor (0.6-10): generalized order-restricted information criterion:
Results:
model loglik penalty goric loglik.weights penalty.weights goric.weights
1 H1 -139.111 2.000 282.222 0.926 0.818 0.983
2 complement -141.645 3.500 290.290 0.074 0.182 0.017
Conclusion:
The order-restricted hypothesis 'H1' has 56.48 times more support than its complement.
vs. H1 vs. complement
H1 1.000 12.602
complement 0.079 1.000
vs. H1 vs. complement
H1 1.000 56.479
complement 0.018 1.000
D1 D2 D3
0.8073380 0.8251897 0.3860030
From this, you can conclude that \(H_1:
-0.2 < \mu_1 - \mu_2 < .2,\) \(\mu_1 > 0\), \(\mu_2 > 0\), \(\mu_3 > 0\) is \(.908/.092 \approx 9.89\) times more
supported than its complement. Now, the fit/loglik value of this
about-equality hypothesis \(H_1\) is
the highest (while it will never be the case for an equality
hypothesis).
Note that the fit values of \(H_1\) and
its complement are not close: the ratio of loglik.weights is \(.573/.427 \approx 1.34\) (this ratio
changes with the choice of range).
Population information: In the data generation, I used a ratio of
population means of 3:3:1.5; implying that \(H_1\) is correct. More specifically, I used
population mean values of approximately 0.70, 0.70, and 0.35. This
implies that Cohen’s \(d\) is
approximately .16; thus, there is a small to medium population effect
size. I then sampled 100 observations, ran an ANOVA (with three groups),
and applied the GORIC.
When I would sample more observations, the ratio of loglik weights of
\(H_1\) and its complement will go to
infinity and, most importantly, the GORIC(A) weight of \(H_1\) will converge to 1.
The GORIC(A), like the AIC or any other information criterion,
balances fit and complexity. You are thus looking for a trade-off: The
hypothesis that describes the data best with the fewest number of
parameters. Consequently, the preferred/best hypothesis does not need to
have the highest fit - then, the difference in complexity outweighs the
difference in fit.
When the best hypothesis does not have maximum fit, you know that the
estimates (i.e., your data) are not fully in agreement with the
preferred hypothesis; they are in agreement with the hypothesis which
has maximum fit. Per definition, this scenario goes hand in hand with a
GORIC(A) weight for the preferred hypothesis being lower than 1.
In such cases, the sample size is not large enough to find convincing evidence. In such a case, the GORIC(A) weights will also denote this uncertainty, since none of the weights will be close to 1 (and perhaps the weights will even be close together).
When you encounter such a situation, you may want to additionally
take on an additional exploratory approach to come up with a few
hypotheses for future research (by looking at sample means/estimates
taking into account what theoretically makes sense or by specifying some
plausible hypotheses which theoretically make sense). Then, in future
research, one can evaluate the new set of hypotheses with a larger
sample size.
When you believe the lower fit is solely because of a small sample or
when you cannot create other theoretically meaningful hypotheses, you
should advise to inspect the same hypothesis with a larger sample size
in future research.
# Equality versus its complement
H1 <- "D1 > D2 > D3" # mu1 > mu2 > mu3
# Apply GORIC #
set.seed(123)
results_2c <- goric(fit, hypotheses = list(H1 = H1), comparison = "complement")
results_2c
restriktor (0.6-10): generalized order-restricted information criterion:
Results:
model loglik penalty goric loglik.weights penalty.weights goric.weights
1 H1 -120.398 2.832 246.459 0.419 0.698 0.625
2 complement -120.072 3.668 247.481 0.581 0.302 0.375
Conclusion:
The order-restricted hypothesis 'H1' has 1.67 times more support than its complement.
vs. H1 vs. complement
H1 1.0 1.667
complement 0.6 1.000
Here, we see that our informative hypothesis \(H_1\) is the preferred one, but it does not have the best fit (ratio of loglik.weights is \(.419 / .581\) \(\approx\) \(.72\) \(<\) \(1\)).
Based on the sample size (\(N=100\), that is, approximately 33 observations per group), one could decide to leave the hypothesis like it is and only advise to increase the sample size in future research. Alternatively, you could also go for an additional exploratory approach to come up with one or a few competing hypotheses for future research.
Population information:
In the data generation, I used a ratio of population means of
1.7:1.6:1.5; implying that \(H_1\) is
correct. More specifically, I used population mean values of
approximately 0.57, 0.53, and 0.50. This implies that Cohen’s \(d\) is approximately .03; thus, there is a
very small population effect size. I then sampled 100 observations, ran
an ANOVA (with three groups), and applied the GORIC.
Notably, I sampled multiple times 100 observations (by using a different
seed value every time) until I found this situation: Support for the
correct \(H_1\), but \(H_1\) does not have highest fit.
When I would sample more observations, the GORIC(A) weight for \(H_1\) converges to 1 (denoting full support
for \(H_1\)).
# Equality versus its complement
H1 <- "D1 - D2 > .5, D2 > D3" # mu1 - mu2 > .5, mu2 > mu3
# Apply GORIC #
set.seed(123)
results_3c <- goric(fit, hypotheses = list(H1 = H1), comparison = "complement")
results_3c
restriktor (0.6-10): generalized order-restricted information criterion:
Results:
model loglik penalty goric loglik.weights penalty.weights goric.weights
1 H1 -139.436 2.832 284.536 0.419 0.698 0.625
2 complement -139.111 3.668 285.558 0.581 0.302 0.375
Conclusion:
The order-restricted hypothesis 'H1' has 1.67 times more support than its complement.
vs. H1 vs. complement
H1 1.0 1.667
complement 0.6 1.000
Here, we see that our informative hypothesis \(H_1\) is the preferred one, but it does not have the best fit (ratio of loglik.weights is \(.419 / .581\) \(\approx\) \(.72\) \(<\) \(1\)).
Based on your sample size, you could decide to leave the hypothesis like it is and only advise to increase the sample size in future research. Alternatively, you could also go for an additional exploratory approach to come up with one or a few competing hypotheses for future research.
Population information:
In the data generation, I used a ratio of population means of 3:2:1 and
population mean values of approximately 0.98, 0.65, and 0.33. Note that
Cohen’s \(d\) is approximately .27;
thus, there is a small to medium population effect size. Note further
that the difference between the first two population means is
approximately \(.33 < .5\). Hence,
\(H_1\) is incorrect (and, thus, its
complement is correct). I then sampled 100 observations, ran an ANOVA
(with three groups), and applied the GORIC.
Notably, I sampled two times 100 observations (by using a different seed
value) to find this situation: Support for the incorrect \(H_1\), and the incorrect \(H_1\) does not have the highest fit.
When I would sample more observations, the GORIC(A) weight for \(H_1\) converges to 0 (denoting no support
for \(H_1\)).
When the preferred hypothesis does not have the highest fit, this is an indication for having a too small sample - whether the hypothesis is correct or not. In case of a small sample, the GORIC(A) weights will also reflect this uncertainty - which can also be seen (indirectly) at the end of the last section, where I inspect cases for various sample sizes.
Let me also repeat the following:
There are no cut-off values for fit values to resemble or for the ratio
of loglik.weights being close to 1 (only gut feelings). Since the fit
values depend on sample size, it is probably better to look the ratio of
loglik.weights. Additionally, it could help to check the estimates that
belong to those hypotheses (using, e.g., results_0c$ormle) and see if
they meaningfully differ.
All of the above also holds true for model selection using Bayes factors (BFs; comparable to ratio of GORIC(A) weights) and posterior model probabilities (PMPs; comparable to the GORIC(A) weights). However, there is one extra element to take into account in case of Bayesian model selection: prior sensitivity in case of at least one equality restriction. In the R package bain, one can do this by checking the results for multiple so-called fraction values; as shown in the first example below.
In terms of conclusions (i.e., there is support for \(H_1\) or not), the output of GORIC(A) and bain are (often if not always) the same. In terms of the amount of support, bain renders more support for hypotheses with equality restrictions than the GORIC(A) does. This is the case when the equality restrictions are correct but also when they are incorrect; where the first is shown in the first example below and the latter in the second example below.
# Equality versus its complement
H1 <- "D1 = D2 > 0 & D3 > 0"
# Apply bain
set.seed(123)
bain1_1 <- bain(fit, H1)
bain1_1
Bayesian informative hypothesis testing for an object of class lm (ANOVA):
Fit Com BF.u BF.c PMPa PMPb PMPc
H1 1.628 0.071 22.800 22.800 1.000 0.958 0.958
Hu 0.042
Hc 0.042
Hypotheses:
H1: D1=D2>0&D3>0
Note: BF.u denotes the Bayes factor of the hypothesis at hand versus the unconstrained hypothesis Hu. BF.c denotes the Bayes factor of the hypothesis at hand versus its complement. PMPa contains the posterior model probabilities of the hypotheses specified. PMPb adds Hu, the unconstrained hypothesis. PMPc adds Hc, the complement of the union of the hypotheses specified.
Bayesian informative hypothesis testing for an object of class lm (ANOVA):
Fit Com BF.u BF.c PMPa PMPb PMPc
H1 1.628 0.101 16.122 16.122 1.000 0.942 0.942
Hu 0.058
Hc 0.058
Hypotheses:
H1: D1=D2>0&D3>0
Note: BF.u denotes the Bayes factor of the hypothesis at hand versus the unconstrained hypothesis Hu. BF.c denotes the Bayes factor of the hypothesis at hand versus its complement. PMPa contains the posterior model probabilities of the hypotheses specified. PMPb adds Hu, the unconstrained hypothesis. PMPc adds Hc, the complement of the union of the hypotheses specified.
Bayesian informative hypothesis testing for an object of class lm (ANOVA):
Fit Com BF.u BF.c PMPa PMPb PMPc
H1 1.628 0.124 13.164 13.164 1.000 0.929 0.929
Hu 0.071
Hc 0.071
Hypotheses:
H1: D1=D2>0&D3>0
Note: BF.u denotes the Bayes factor of the hypothesis at hand versus the unconstrained hypothesis Hu. BF.c denotes the Bayes factor of the hypothesis at hand versus its complement. PMPa contains the posterior model probabilities of the hypotheses specified. PMPb adds Hu, the unconstrained hypothesis. PMPc adds Hc, the complement of the union of the hypotheses specified.
From this, you can see (from BF.c) that the equality hypothesis \(H_1\) is approximately \(23\), \(16\), and \(13\) times more supported than its complement when using a fraction of \(1\), \(2\), and \(3\), respectively.
Population information:
In the data generation, I used a ratio of population means of 3:3:1.5;
implying that \(H_1\) is correct. More
specifically, I used population mean values of approximately 0.70, 0.70,
and 0.35. This implies that Cohen’s \(d\) is approximately .16; thus, there is a
small to medium population effect size. I then sampled 100 observations,
ran an ANOVA (with three groups), and applied bain.
Comparison bain and GORIC(A):
In terms of conclusions (i.e., there is support for \(H_1\) or not), the output of GORIC(A) and
bain are the same. In terms of the amount of support, bain renders more
support for equality restrictions than the GORIC(A). In this example,
this is good property, but: bain rendering more support for equality
restrictions happens both when the equality restrictions are
true/correct and when they are incorrect; as will be shown in the
examples below. Bear in mind that I advise on using about-equality
restrictions instead of equality restrictions.
# Equality versus its complement
H1 <- "D1 = D2 > 0 & D3 > 0"
# Apply bain
set.seed(123)
bain1_1 <- bain(fit, H1)
bain1_1
Bayesian informative hypothesis testing for an object of class lm (ANOVA):
Fit Com BF.u BF.c PMPa PMPb PMPc
H1 1.393 0.071 19.504 19.504 1.000 0.951 0.951
Hu 0.049
Hc 0.049
Hypotheses:
H1: D1=D2>0&D3>0
Note: BF.u denotes the Bayes factor of the hypothesis at hand versus the unconstrained hypothesis Hu. BF.c denotes the Bayes factor of the hypothesis at hand versus its complement. PMPa contains the posterior model probabilities of the hypotheses specified. PMPb adds Hu, the unconstrained hypothesis. PMPc adds Hc, the complement of the union of the hypotheses specified.
# Apply GORIC #
set.seed(123)
results_12 <- goric(fit, hypotheses = list(H1), comparison = "complement")
results_12
restriktor (0.6-10): generalized order-restricted information criterion:
Results:
model loglik penalty goric loglik.weights penalty.weights goric.weights
1 H1 -139.260 2.000 282.520 0.463 0.818 0.794
2 complement -139.111 3.500 285.222 0.537 0.182 0.206
Conclusion:
The order-restricted hypothesis 'H1' has 3.86 times more support than its complement.
From the bain output, you can see (from BF.c) that the incorrect equality hypothesis \(H_1\) is approximately \(19.5\) times more supported than its complement (when using a fraction of \(1\)). From the GORIC output, you can see that the incorrect equality hypothesis \(H_1\) is approximately \(3.8\) times more supported than its complement.
Population information:
In the data generation, I used a ratio of population means of 3:2.5:1;
implying that \(H_1\) is correct. More
specifically, I used population mean values of approximately 0.89, 0.74,
and 0.30. This implies that Cohen’s \(d\) is approximately .25; thus, there is a
small to medium population effect size. I then sampled 100 observations,
ran an ANOVA (with three groups), and applied bain and the GORIC.
When I would sample more observations, the support for \(H_1\) will go to 0, for both methods; thus,
rendering full support for the complement. Note that the GORIC(A) will
conclude this sooner (i.e., for smaller samples) than bain will:
Below, you can find output for a sample size (\(N\)) of 1000 and 10000. In case of \(N = 1000\), GORIC already shows support for the correct complement: The complement is \(.825/.175\) \(\approx\) \(4.71\) (or \(1/.212\)) times more supported than \(H_1\); while bain still favors the incorrect equality restriction: \(H_1\) is \(.776/.224\) \(\approx\) \(3.46\) times more supported than its complement. In case of \(N = 10000\), both methods show full support for the complement of \(H_1\).
Bayesian informative hypothesis testing for an object of class lm (ANOVA):
Fit Com BF.u BF.c PMPa PMPb PMPc
H1 0.263 0.076 3.458 3.458 1.000 0.776 0.776
Hu 0.224
Hc 0.224
Hypotheses:
H1: D1=D2>0&D3>0
Note: BF.u denotes the Bayes factor of the hypothesis at hand versus the unconstrained hypothesis Hu. BF.c denotes the Bayes factor of the hypothesis at hand versus its complement. PMPa contains the posterior model probabilities of the hypotheses specified. PMPb adds Hu, the unconstrained hypothesis. PMPc adds Hc, the complement of the union of the hypotheses specified.
# Apply GORIC #
set.seed(123)
results_12_N1000 <- goric(fit, hypotheses = list(H1), comparison = "complement")
results_12_N1000
restriktor (0.6-10): generalized order-restricted information criterion:
Results:
model loglik penalty goric loglik.weights penalty.weights goric.weights
1 H1 -1346.431 2.000 2696.861 0.045 0.818 0.175
2 complement -1343.380 3.500 2693.760 0.955 0.182 0.825
Conclusion:
The order-restricted hypothesis 'H1' has 0.175 / 0.825 < 1 times more support than its complement. That is, the complement has 4.71 times more support than 'H1'
Bayesian informative hypothesis testing for an object of class lm (ANOVA):
Fit Com BF.u BF.c PMPa PMPb PMPc
H1 0.000 0.076 0.000 0.000 1.000 0.000 0.000
Hu 1.000
Hc 1.000
Hypotheses:
H1: D1=D2>0&D3>0
Note: BF.u denotes the Bayes factor of the hypothesis at hand versus the unconstrained hypothesis Hu. BF.c denotes the Bayes factor of the hypothesis at hand versus its complement. PMPa contains the posterior model probabilities of the hypotheses specified. PMPb adds Hu, the unconstrained hypothesis. PMPc adds Hc, the complement of the union of the hypotheses specified.
# Apply GORIC #
set.seed(123)
results_12_N10000 <- goric(fit, hypotheses = list(H1), comparison = "complement")
results_12_N10000
restriktor (0.6-10): generalized order-restricted information criterion:
Results:
model loglik penalty goric loglik.weights penalty.weights goric.weights
1 H1 -13428.294 2.000 26860.588 0.000 0.818 0.000
2 complement -13411.461 3.500 26829.922 1.000 0.182 1.000
Conclusion:
The order-restricted hypothesis 'H1' has 0.000 / 1.000 < 1 times more support than its complement. That is, the complement has Inf times more support than 'H1'