![[Graphics:Images/index_gr_1.gif]](Images/index_gr_1.gif)
This is, for all intents and purposes, the basic scheme of a normal significance test. The red portion signifies α, the probability of rejecting the null hypothesis given that the null is true (the area to the right of the significance criterion). The blue area is β, the probability of failing to reject the null given that the alternate hypothesis is true. The yellow area is the power of the test, the probability of rejecting the null given that the alternate hypothesis is true.
One problem with the use of null hypothesis significance testing is the tendency for those who fail to reject the null to accept it instead, whether overtly or covertly. Given the logical problems with that move, it is a statement one is not allowed to make. In practice, the significance test is generally not used to disprove a point-null hypothesis, which is assumed to be inherently false. The purpose of the test is to decide whether the state of affairs looks more like it would under the null-or-something-like-it or more like it would under the alternate-or-something-like-it. That's the point of making use of variance in statistical testing. So whether the point null is true is not the question for people who decide to accept the null. Their intent is generally to say that the state of affairs is closer to the null hypothesis than to the alternate, despite not being able to "legally" accept the null rather than passively fail to reject it.
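As a rough numerical companion to the picture, the three areas can be computed for a one-tailed z-test. This is only a sketch under assumptions that the text does not state: both hypotheses are taken to be unit-variance normal distributions, the null is centered at 0, and the alternate is centered at a hypothetical `separation` (in standard-error units). The function name `error_rates` is illustrative.

```python
from statistics import NormalDist


def error_rates(alpha, separation):
    """Return (criterion, beta, power) for a one-tailed z-test.

    Assumes (hypothetically) a null distribution N(0, 1) and an
    alternate distribution N(separation, 1).
    """
    null = NormalDist(mu=0.0, sigma=1.0)
    alternate = NormalDist(mu=separation, sigma=1.0)
    # Criterion: the point with area alpha to its right under the null.
    criterion = null.inv_cdf(1.0 - alpha)
    # Beta: area of the alternate to the left of the criterion.
    beta = alternate.cdf(criterion)
    # Power: area of the alternate to the right of the criterion.
    power = 1.0 - beta
    return criterion, beta, power


if __name__ == "__main__":
    criterion, beta, power = error_rates(alpha=0.05, separation=1.9)
    print(f"criterion={criterion:.3f}  beta={beta:.3f}  power={power:.3f}")
```

With α = .05, an assumed separation of about 1.9 gives a power of roughly .60. Setting the separation to 0 makes the "power" collapse to α, which is a handy sanity check.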
What are some ways to remedy this common error? One way is to make an allowance for acceptance of the null under conditions in which it is expressly true-or-close-to-it (for lack of better words). Currently, it is acceptable in practice to "accept the null for all intents and purposes" under circumstances in which one has very high power to find a significant effect of the desired size, has good control over all extraneous variables, and still fails to reject the null. A more active or direct, yes/no answer is perhaps desirable though.
A few ways one could "accept the null":
By rejecting the alternate:
![[Graphics:Images/index_gr_2.gif]](Images/index_gr_2.gif)
Here, we have a decently large "power" to accept the null, over 50% in this example, though that is still not the best. It also depends on exactly where the distribution under the alternate hypothesis is located.
Staying away from depending upon the alternate for criteria, we could do what is in essence a standard significance test on the null hypothesis, but with a ridiculously high (for normal uses) α, say around .45:
![[Graphics:Images/index_gr_3.gif]](Images/index_gr_3.gif)
Now, if one fails to reject the null, there is built into the procedure very high power to reject. Thus, one can likely accept the null "for all intents and purposes". The power to accept the null is relatively high, around 50%, but could still be much better.
In keeping with better practice, one could improve this technique from the one-tailed version above to a two-tailed version:
![[Graphics:Images/index_gr_4.gif]](Images/index_gr_4.gif)
Now we maintain the high power to reject the null and confine the region of null acceptance to a smaller area nearer to the null. This procedure, as before, could be done by using a regular significance test, but with a very high α, say .90 or .95, and a two-tailed criterion. The power to accept the null with this procedure is surprisingly low, merely 10% in the example given (1-α in general). Thus, actual acceptance of the null hypothesis can be assumed to be relatively rare and confined to those cases in which the null is likely the best representative of the true state of the world.
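To make the two-tailed acceptance region concrete, here is a minimal sketch that computes how narrow the band around the null becomes for a very high α. It assumes a standard normal test statistic. The names `acceptance_band` and `p_in_band`, and the alternate mean of 1.9 in the usage lines, are hypothetical choices for illustration only.

```python
from statistics import NormalDist


def acceptance_band(alpha_high):
    """Half-width (in z units) of the two-tailed null-acceptance band.

    alpha_high is the deliberately large alpha (e.g. .90 or .95).
    """
    # Each tail holds alpha_high / 2, so the band ends at this quantile.
    return NormalDist().inv_cdf(1.0 - alpha_high / 2.0)


def p_in_band(half_width, mean=0.0):
    """Probability that a N(mean, 1) statistic lands inside the band."""
    dist = NormalDist(mu=mean, sigma=1.0)
    return dist.cdf(half_width) - dist.cdf(-half_width)


if __name__ == "__main__":
    half_width = acceptance_band(0.90)
    print(f"band: |z| < {half_width:.4f}")
    print(f"p(in band | null)      = {p_in_band(half_width):.4f}")
    print(f"p(in band | mean 1.9)  = {p_in_band(half_width, 1.9):.4f}")
```

Under the null the band captures 1-α of the distribution by construction. Under an alternate that sits well away from the null it captures much less, which is why acceptance stays confined to results very close to the null.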
Still, simply picking a location on the number line and testing it seems a rather unsatisfactory way to maintain a null hypothesis. What would be more convincing is an acceptance of the null hypothesis after a failed attempt to reject it (where the story normally ends, unless someone is misusing significance testing logic). This calls for a contingency plan of two significance tests (ignoring the added error of multiple testing): first, an initial test, as is the norm, to attempt to reject the null hypothesis; then, if and only if that test fails to reject, a second test to determine the validity of accepting the null hypothesis.
We have two options for a procedure to accept the null with the guideline of using a two-test procedure:
Double-rejection-attempt - after failing to reject the null, one attempts to reject the alternate
Rejection-attempt-accept - after failing to reject the null, one attempts another rejection of the null with a very high α
To decide which to use, we can compare possible error rates under both scenarios.
α: the probability of rejecting the null given that the null is true will remain at its nominal level in both schemes (if the null is rejected in the first step, the probability of an error is α; if not rejected, α is irrelevant for further testing)
β: the probability of failing to reject the null given that the alternate is true will remain at the same level similarly in both schemes (if one fails to reject the null on the first step, the probability of error is β, if one rejects the null, β is irrelevant)
pseudo-β: the probability of accepting the null given that the alternate is true. It is part of β, and it is the portion on which the two schemes differ. This can be seen as a more heinous error than β in general, so it will determine which procedure is better.
Under the double-rejection-attempt scheme, one must first fail to reject the null (β), then reject the alternate (1-β′):
pseudo-β = β(1-β′), where β′, the probability of failing to reject the alternate in the second test, varies depending on the separation of the distributions
Under rejection-attempt-accept, one must still fail to reject the null on the first step (β), then fail to reject the null under more stringent standards (1-α′):
pseudo-β = β(1-α′), where α′, the very high criterion of the second test, is set by the experimenter
As 1-α′ is generally going to be lower than 1-β′, this is the preferred method, with a lower pseudo-β, the probability of accepting the null given the truth of the alternate.
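The comparison of the two schemes comes down to one product, sketched below. The numbers are assumptions for illustration: β = .40, a hypothetical .50 chance that the test of the alternate fails to reject it, and an experimenter-chosen two-tailed criterion of .95 for the second test of the null. The name `pseudo_beta` and its parameter `p_block` are invented for this sketch.

```python
def pseudo_beta(beta, p_block):
    """Probability of accepting the null when the alternate is true.

    beta: probability of failing to reject the null in the first test.
    p_block: probability that the second test prevents acceptance,
        i.e. the quantity subtracted from 1 in the formulas above.
    """
    return beta * (1.0 - p_block)


if __name__ == "__main__":
    beta = 0.40  # assumed, corresponding to a power of .60
    # Double-rejection-attempt: hypothetical .50 chance of failing
    # to reject the alternate in the second test.
    print("double-rejection-attempt:", round(pseudo_beta(beta, 0.50), 4))
    # Rejection-attempt-accept: second test of the null at .95.
    print("rejection-attempt-accept:", round(pseudo_beta(beta, 0.95), 4))
```

Because the experimenter chooses the second criterion directly, the second scheme's pseudo-β can be pushed down without knowing how far apart the distributions are.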
Type III error will remain negligible in both scenarios.
So, while not reducing overall error, we can, with this method, divide β into two components: one component corresponds to accepting the null hypothesis given that the alternate is true, while the other component corresponds to simply failing to reject the null hypothesis given that the alternate is true. We now have three possible courses of action in any given significance test:
-Reject the null
-Accept the null
-No decision
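The three courses of action can be sketched as a small decision function. This is one possible reading of the procedure, not a definitive implementation: it assumes a standard normal test statistic, a one-tailed first test at α, and a two-tailed second test at a very high criterion, here called `alpha_prime`. The function name `decide` and the default values are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def decide(z, alpha=0.05, alpha_prime=0.90):
    """Classify a z statistic as 'reject', 'accept', or 'no decision'."""
    # Step 1: the usual one-tailed attempt to reject the null.
    if z > STD_NORMAL.inv_cdf(1.0 - alpha):
        return "reject"
    # Step 2: two-tailed test at the very high alpha_prime. Failing to
    # reject even here means z sits in a narrow band around the null.
    half_width = STD_NORMAL.inv_cdf(1.0 - alpha_prime / 2.0)
    if abs(z) < half_width:
        return "accept"
    # Neither rejected outright nor close enough to the null to accept.
    return "no decision"


if __name__ == "__main__":
    for z in (2.0, 0.05, 1.0):
        print(z, "->", decide(z))
```

The second step only runs when the first fails to reject, which mirrors the "if and only if" contingency of the two-test plan.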
The probabilities of each of these courses under the truth of the differing options are given below (assuming α = .05 and q, the probability that the second test accepts the null, = .05 two-tailed, .45 one-tailed):
![[Graphics:Images/index_gr_14.gif]](Images/index_gr_14.gif)
p(reject H₀ | H₀) = α = .05
p(reject H₀ | H₁) = 1 - β = varies, approx .60 in graphic
p(accept H₀ | H₀) = (1-α)(q) = .4275 1t, .0475 2t
p(accept H₀ | H₁) = βq = varies, approx .18 1t, .02 2t in graphic
p(no decision | H₀) = (1-α)(1-q) = .5225 1t, .9025 2t
p(no decision | H₁) = β(1-q) = varies, approx .22 1t, .38 2t in graphic
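As a quick check that the probabilities are done properly, the three outcomes should sum to 1 under each hypothesis:
p(reject H₀ | H₀) + p(accept H₀ | H₀) + p(no decision | H₀) = α + (1-α) = 1
p(reject H₀ | H₁) + p(accept H₀ | H₁) + p(no decision | H₁) = (1-β) + β = 1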
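The same bookkeeping can be checked numerically. The sketch below assumes α = .05, β = .40 (a power of about .60), and a second-step acceptance probability, called `q` here, of .45 one-tailed or .05 two-tailed, applied the same way under both hypotheses. The name `outcome_probabilities` is made up for this sketch.

```python
def outcome_probabilities(alpha, beta, q):
    """Return outcome probabilities under the null and the alternate.

    q is the (assumed) probability that the second test accepts the
    null once the first test has failed to reject it.
    """
    under_null = {
        "reject": alpha,
        "accept": (1.0 - alpha) * q,
        "no decision": (1.0 - alpha) * (1.0 - q),
    }
    under_alternate = {
        "reject": 1.0 - beta,
        "accept": beta * q,
        "no decision": beta * (1.0 - q),
    }
    return under_null, under_alternate


if __name__ == "__main__":
    for label, q in (("one-tailed", 0.45), ("two-tailed", 0.05)):
        h0, h1 = outcome_probabilities(alpha=0.05, beta=0.40, q=q)
        # Each set of three outcomes should sum to 1.
        assert abs(sum(h0.values()) - 1.0) < 1e-12
        assert abs(sum(h1.values()) - 1.0) < 1e-12
        print(label)
        print("  under null:     ", {k: round(v, 4) for k, v in h0.items()})
        print("  under alternate:", {k: round(v, 4) for k, v in h1.items()})
```

Splitting the non-rejection mass into "accept" and "no decision" never changes the totals, so the nominal α and β are untouched whatever value of `q` is chosen.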
-Confidence intervals (CIs) have two forms, one of which is flawed and the other mostly useless for experimentation
-when used to see whether a CI encompasses 0, or whether two CIs overlap, the same problems as garden-variety NHST occur
-when used purely descriptively, one can't make any decisional errors, because one can't make ANY DECISIONS - unfortunately for CIs, we need to make decisions in psych research - whether to continue exploring or not, whether to use treatment A or treatment B, whether effect X matters, etc
This technique remains pragmatic in that it keeps p-values and NHST in all its glory, in keeping with the status quo for publishing, etc, and it retains the usefulness afforded by NHST when done properly
-at the same time, it reduces (hopefully, assuming it's used) false null acceptances and findings of "no difference", etc
-one other nice aspect is the possibility of reducing non-decision outcomes ("fail to reject H₀, so...um...") by taking a slice of them out for acceptance rather than non-decision
Is it a perfect system? No.
Does it fix all the problems there are with NHST? No.
Can it still be useful though? Yes.