Means Grouping based on Studentized Midrange

The purpose of this paper is to develop two multiple comparison procedures based on methods of clustering means: the mean grouping tests based on the midrange (MGM) and on the range (MGR). The first is based on the studentized midrange distribution and the second on the studentized range distribution. In terms of type I error and power, the proposed tests performed similarly to the tests used for comparison. Like the mean grouping tests already available in the literature, the MGM and MGR tests did not control the experimentwise error rate in almost all evaluated scenarios. However, under the complete $H_1$ hypothesis, these tests showed high power, especially the MGM test. Thus, what we propose is another alternative test whose results are free of ambiguity, not a replacement for the traditional tests already present in the literature.


Introduction
In experimental statistics, a recurring problem is the simultaneous testing of several hypotheses: the global type I error increases as the number of comparisons between treatments increases, which is known as the multiplicity effect. Statistical procedures designed to adequately control for multiplicity effects are called multiple comparison procedures (MCPs).
One of the biggest problems in the study of multiple comparison procedures is the lack of transitivity (ambiguity of results), which occurs when two levels of a factor differ from each other but neither differs from a third level, making the results difficult to interpret. Different methodologies have been proposed to circumvent this situation, for example, methods based on grouping analysis. Grouping analysis uses the characteristics of the objects as the criterion for separating them; the idea is to unite objects with similar characteristics into groups. An efficient alternative for circumventing the ambiguity problem of MCPs is the Scott-Knott test (Scott & Knott, 1974).
Although the Scott-Knott test solves the ambiguity problem of MCPs, its performance presents some problems, such as small deviations of the type I error rate under complete $H_0$ and high type I error rates under partial $H_0$. In addition, in certain situations the partition criterion of this test forms groups in which the difference between consecutive means within a group is greater than the difference between the consecutive means that delimit the groups.
Many other grouping tests similar to the Scott-Knott test have been presented in the specialized literature (Bhering et al., 2008; Calinski & Corsten, 1985; Shimokawa & Goto, 2011). None of these methods solved all the problems mentioned above and, in some cases, they performed even worse than the reference Scott-Knott test. A considerable number of these methods are based on the externally studentized range or on the F statistic. The search for alternative statistics to perform the grouping was the hallmark of these works, since the Scott-Knott test uses the likelihood ratio. A very interesting statistic is the midrange since, according to Rider (1957), it is more efficient than the arithmetic mean (that is, it is an estimator of the population mean with smaller variance) in some populations, such as the cosine, parabolic, rectangular and inverted parabolic populations. Based on this, the midrange was seen as a possible alternative for developing multiple comparison methods.
In the literature, works on the midrange were published by Gumbel (1958) and David & Nagaraja (2003), among others, who obtained the distribution and density functions of the midrange for a normally distributed population. Batista & Ferreira (2017) developed the density, distribution and quantile functions of the externally studentized midrange, with both theoretical and numerical methods. As a consequence of these works, an R package named SMR (Batista & Ferreira, 2014a) was created; it implements the algorithms published in Batista & Ferreira (2014b) and computes the cumulative distribution function, the density function and the quantiles of the distribution of this statistic. Batista & Ferreira (2020) published two tests based on this same distribution as alternatives to Tukey's test.
However, $Q$ is not an ancillary statistic for $\mu$, which prevents the development of a test based directly on this statistic. Batista & Ferreira (2014a) obtained its probability density function and cumulative distribution function for the case $Y_i \sim N(0, \sigma^2)$. For $Y_i \sim N(\mu, \sigma^2)$, the expectation of $Q$ is given by an expression involving $\Gamma(\nu/2)$, from which, when $\mu = 0$, one has $E[Q] = 0$. This is fundamental information for the development of the tests proposed in this work.
The externally studentized range is defined as the ratio between $W = Y_{(n)} - Y_{(1)}$ and $S$, in which $S$ is an estimator of the population standard deviation $\sigma$, associated with $\nu$ degrees of freedom and distributed independently of $W$. Its distribution function and density function are expressed in terms of $\phi(y)$ and $\Phi(y)$, the density function and the distribution function of a standard normal random variable, with $y \in \mathbb{R}$, and of $f_X(x; \nu)$, the density function of $X$ given in (4). That is, regardless of the parameters of the original normal distribution, the density and distribution functions of this statistic are always expressed in terms of the standard normal distribution.
Considering both the externally studentized range distribution and the externally studentized midrange distribution, the objective of this work is to develop two mean grouping methods that are free of ambiguity, control the type I error and have high power. The performance of the tests is evaluated by Monte Carlo simulation, considering the experimentwise type I error rate and the power.

Materials and Methods
Considering $n$ treatments and $r$ replications, the tests are proposed for a random sample in which $Y_{ij}$ is the random observation of the $i$th treatment in its $j$th replication, $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, r$. The $i$th treatment mean is $\bar{Y}_{i.} = \sum_{j=1}^{r} Y_{ij}/r$. This sample is subjected to analysis of variance, adopting the model $Y_{ij} = \mu + \tau_i + \epsilon_{ij}$ (7), in which $\epsilon_{ij} \sim N(0, \sigma^2)$ and $\mu_i = \mu + \tau_i$ is the mean of the $i$th treatment. The mean squared error (MSE) obtained from this analysis estimates $\sigma^2$. It is well known that $\bar{Y}_{i.}$ and the MSE are independently distributed and that the variance of $\bar{Y}_{i.}$ is estimated by $\mathrm{MSE}/r$; see Graybill (1961) and Searle (1987).
Under the null hypothesis $H_0: \mu_1 = \mu_2 = \ldots = \mu_n = \mu$, the $n$ treatments have a common mean $\mu$. In this particular case, the order statistics $\bar{Y}_{(1).}, \bar{Y}_{(2).}, \ldots, \bar{Y}_{(n).}$ are centered on $\mu$. Therefore, the externally studentized midrange of the treatment means has a distribution function that depends on $\mu$ (under $H_0$), as presented in expressions (2) and (3). However, $\mu$ is unknown and hardly ever equal to zero in real situations. Thus, the distribution of $Q$ with unknown $\mu \neq 0$ cannot be used directly in the proposed test.
We therefore chose to use the midrange distribution in the specific case where $\mu = 0$, expressions (5) and (6), and to adapt the test statistic to adjust for the sample mean. The distribution is centered on $\mu$, which is unknown, so a correction was applied to the statistic in order to use the distribution centered on 0. Since $R = (\bar{Y}_{(1).} + \bar{Y}_{(n).})/2$ has a distribution centered on $\mu$, the corrected statistic was $R_n = R - \bar{Y}^{*}_{.}$, wherein $\bar{Y}^{*}_{.}$ is an estimator of $\mu$. Initially, the overall mean $\bar{Y}_{..} = \sum_{i=1}^{n} \bar{Y}_{i.}/n$ was considered as the estimator $\bar{Y}^{*}_{.}$. However, when the data were simulated under $H_1$, this mean estimated the overall mean of the parameters, $\mu = \sum_{i=1}^{n} \mu_i/n$, where $\mu_i = E(\bar{Y}_{i.})$. Under $H_1$, the simulations showed that the quantity $(\mu_1 + \mu_n)/2$ was close to $\mu$ and, therefore, $E(R_n) = (\mu_1 + \mu_n)/2 - \mu \approx 0$. As a result, the power of the test was very low.
Under $H_0$, it is clear that the expected value $E(R_n)$ is null, which at first would support the direct use of the externally studentized midrange in the test. However, a preliminary evaluation via simulation showed that the experimentwise type I error rates were too high. Initially, it was speculated that this happened because the statistic is a function of $\bar{Y}_{..}$, which also has a sampling error associated with it. Thus, an initial minimum significant difference (MSD) that accounted for the standard error of $\bar{Y}_{..}$ was built. With this change, the test controlled the type I error adequately but presented low power.
This happened because $E(R_n)$, although different from zero under $H_1$, presented values not far from zero in magnitude. Thus, an estimator of $\mu$ was sought that would maximize $E(R_n)$ under $H_1$ while keeping $E(R_n) = 0$ under $H_0$. For this purpose, $\bar{Y}^{*}_{.}$ was taken as the mean of one of the two potential groups to be obtained in the test, with the partition placed between the two consecutive ordered means with the largest difference.
To obtain an estimator with a smaller standard error, the mean of the group containing the larger number of means, between the two groups considered, was used. This estimator was determined based on empirical criteria and validated through Monte Carlo simulation. Therefore, consider the partitions $\bar{Y}_{(1).}, \bar{Y}_{(2).}, \ldots, \bar{Y}_{(k).}$ and $\bar{Y}_{(k+1).}, \bar{Y}_{(k+2).}, \ldots, \bar{Y}_{(n).}$, whose point $k$ corresponds to the value of $j$ at which $\max_j(\bar{Y}_{(j+1).} - \bar{Y}_{(j).})$ occurs, for $j = 1, 2, \ldots, n-1$. If there are ties, with the maximum occurring at two or more positions, say $k_1, k_2, \ldots$, a partition is still formed. Given the partition, $\bar{Y}^{*}_{.}$ is the mean of whichever of the two groups contains more means, and the final test statistic uses this estimator.
The MSD used to decide whether or not to reject the hypothesis was initially based on $q_{(\alpha/2;n,\nu)}$, the $100\alpha/2\%$ upper-tail quantile of the distribution of $Q$, expression (6), with $n$ treatments and $\nu$ degrees of freedom. Preliminary Monte Carlo simulations showed that the experimentwise type I error rate was far below the nominal significance levels and that the power was low. Thus, the contribution of $\bar{Y}^{*}_{.}$ to the MSD, $(1/\sqrt{n}) \times \sqrt{\mathrm{MSE}/r}$, should be reduced by a factor between 0 and 1. By trial and error in a Monte Carlo simulation process, a value that converged to $\sqrt{2}/2$ was found. The final MSD therefore applies this factor and uses the same quantile $q_{(\alpha/2;n,\nu)}$.
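The choice of the partition point and of the estimator of $\mu$ described above can be illustrated with a minimal Python sketch. The function names are hypothetical. Two choices are assumptions made only for illustration: the first maximum is used when gaps are tied, and the first group is used when both groups have the same size. The sketch returns only the unstudentized corrected midrange $R - \bar{Y}^{*}_{.}$; it does not compute the MSD.

```python
from statistics import mean


def partition_point(ordered_means):
    """Return k so that the largest gap between consecutive ordered means
    lies between positions k and k + 1 (1-based).
    Assumption: with tied gaps, the first maximum is taken."""
    gaps = [b - a for a, b in zip(ordered_means, ordered_means[1:])]
    return gaps.index(max(gaps)) + 1


def larger_group_mean(ordered_means, k):
    """Mean of whichever group (first k means or the remaining ones) is larger.
    Assumption: with equal sizes, the first group is used."""
    first, second = ordered_means[:k], ordered_means[k:]
    larger = first if len(first) >= len(second) else second
    return mean(larger)


def corrected_midrange(ordered_means):
    """Midrange of the ordered means minus the larger-group mean."""
    k = partition_point(ordered_means)
    midrange = (ordered_means[0] + ordered_means[-1]) / 2
    return midrange - larger_group_mean(ordered_means, k)


# Illustrative (made-up) treatment means.
means = sorted([12.0, 1.0, 10.0, 2.0, 11.0])
k = partition_point(means)          # largest gap is between 2.0 and 10.0
y_star = larger_group_mean(means, k)
r_n = corrected_midrange(means)
```

With these made-up means the partition falls after the second ordered mean, so the estimator is the mean of the three largest values.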

Mean grouping test based on the midrange (MGM)
The MCP was proposed using the criterion of forming a partition of $m$ ordered means at position $k$, with $\bar{Y}^{*}_{m}$ defined as discussed previously. The steps for applying this test are:
1. Set $m = n$ and take the ordered treatment means $\bar{Y}_{(1).}, \bar{Y}_{(2).}, \ldots, \bar{Y}_{(m).}$;
2. Determine $k$ and $\bar{Y}^{*}_{m}$ as discussed previously;
3. Determine the value of the statistic $R_m$;
4. Obtain the MSD, $\Delta_m$, where $q_{(\alpha/2;m,\nu)}$ is the $100\alpha/2\%$ upper-tail quantile of the distribution of $Q$, expression (6), with $m$ treatments and $\nu$ degrees of freedom;
5. If $|R_m| \le \Delta_m$, the stopping criterion is met: the $m$ means are considered not different and the group is marked as not partitionable. Otherwise, consider the means $\bar{Y}_{(1).}, \bar{Y}_{(2).}, \ldots, \bar{Y}_{(k).}$ as different from the means $\bar{Y}_{(k+1).}, \bar{Y}_{(k+2).}, \ldots, \bar{Y}_{(m).}$ and go to step 6;
6. For each group obtained and marked as partitionable, take $m$ as the number of means in that group and repeat steps 2 to 5, with one caveat: in step 4, a modified minimum significant difference must be used, again based on the quantile $q_{(\alpha/2;m,\nu)}$ of the distribution of $Q$, expression (6), with $m$ treatments and $\nu$ degrees of freedom.
The reason for changing the MSD between expressions (11) and (12) is to increase the power while controlling the type I error rate at the nominal significance level. This was verified through simulation. Step 6 (the repetition of steps 2 to 5) is carried out for all groups until no group can be partitioned into two new ones or until every group contains a single mean.
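The recursive structure of the steps above can be sketched as follows. This is a minimal illustration in Python, not the authors' implementation: `group_means`, `max_gap_position` and `should_split` are hypothetical names, and the decision rule is passed in as a function because it depends on the statistic and the MSD of the test. The toy rule in the usage lines (split when the largest gap exceeds 5) is only a stand-in for the comparison of the statistic with the MSD.

```python
def max_gap_position(ordered_means):
    """Position k (1-based) of the largest gap between consecutive means.
    Assumption: with tied gaps, the first maximum is taken."""
    gaps = [b - a for a, b in zip(ordered_means, ordered_means[1:])]
    return gaps.index(max(gaps)) + 1


def group_means(ordered_means, should_split):
    """Recursively partition ordered means into groups.

    should_split(group) must return True when the group is to be split
    (statistic exceeds the MSD) and False when it is not partitionable.
    """
    # A single mean, or a group that fails the test, is a final group.
    if len(ordered_means) < 2 or not should_split(ordered_means):
        return [list(ordered_means)]
    k = max_gap_position(ordered_means)
    # Apply the same procedure to each of the two new groups.
    return (group_means(ordered_means[:k], should_split)
            + group_means(ordered_means[k:], should_split))


def toy_rule(group):
    """Stand-in decision rule: split when the largest gap exceeds 5."""
    gaps = [b - a for a, b in zip(group, group[1:])]
    return max(gaps) > 5


groups = group_means([1.0, 2.0, 10.0, 11.0, 30.0], toy_rule)
```

Because every split happens at a single position of the ordered means, each mean ends up in exactly one group, which is what removes the ambiguity of the results.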

Mean grouping test based on the studentized range (MGR)
A version similar to the Scott-Knott test (Scott & Knott, 1974), based on the studentized range, was also proposed. The essence of the test is the same as that of the proposed MGM test (Section 2.1). The potential partition point is the position of the largest difference between consecutive ordered means. Thus, for $\bar{Y}_{(1).}, \bar{Y}_{(2).}, \ldots, \bar{Y}_{(m).}$, the partition is considered at the position $k$ where the maximum of $\bar{Y}_{(j+1).} - \bar{Y}_{(j).}$ occurs, for $j = 1, 2, \ldots, m-1$. The upper $100\alpha\%$ quantile, $q_{(\alpha;m,\nu)}$, of the externally studentized range must be used when applying the test. The steps for applying the test are:
1. Set $m = n$ and consider the $m$ ordered means $\bar{Y}_{(1).}, \bar{Y}_{(2).}, \ldots, \bar{Y}_{(m).}$;
2. Determine $k$ as discussed previously;
3. Calculate the test statistic $q_m$;
4. Obtain the MSD, $\Delta_m$;
5. If $q_m \le \Delta_m$, the $m$ means are considered not different; mark the group as not partitionable and go to step 6. Otherwise, consider the means $\bar{Y}_{(1).}, \bar{Y}_{(2).}, \ldots, \bar{Y}_{(k).}$ as different from the means $\bar{Y}_{(k+1).}, \ldots, \bar{Y}_{(m).}$ and go to step 6;
6. For each group obtained and marked as partitionable, take $m$ as the number of means in that group and repeat steps 2 to 5.
This procedure is carried out for all groups until no group can be partitioned into two new ones or until every group contains a single mean.

Performance evaluation of the proposed tests
Two strategies were considered in this work. The first was to evaluate the experimentwise error rate (EER) of the proposed multiple comparison tests. The second was to evaluate the power of the tests. In both cases, Monte Carlo simulation was carried out in the R software (R Core Team, 2022).
In each simulation, the multiple comparison tests were applied at a pre-established nominal significance level $\alpha$, verifying whether or not the null hypothesis was rejected. This process was repeated $N^* = 5000$ times in each case. In the first case, the proportion of experiments with at least one incorrect decision gives the empirical EER; in the second, the proportion of correct decisions (rejections) gives the empirical power.
To evaluate the empirical EER simulated via Monte Carlo, the exact binomial test with a confidence coefficient of 99% was used to test the hypothesis $H_0: \alpha = 5\%$ against $H_1: \alpha \neq 5\%$ and $H_0: \alpha = 1\%$ against $H_1: \alpha \neq 1\%$. If the null hypothesis is rejected and the empirical EER is significantly (p-value < 0.01) lower than the nominal level, the test is considered conservative. If the empirical EER is significantly (p-value < 0.01) higher than the nominal level, the test is considered liberal. If the observed EER does not differ significantly from the nominal level (p-value > 0.01), the test is considered exact (Oliveira & Ferreira, 2010).
Considering $y$ as the number of rejected null hypotheses in the $N^* = 5000$ Monte Carlo simulations, for a nominal significance level $\alpha$, the test statistic is obtained from the relation between the F distribution and the binomial distribution (Leemis & Trivedi, 1996), with success probability $p = \alpha$, under $H_0$. This statistic has an F distribution with $\nu_1 = 2(N^* - y)$ and $\nu_2 = 2(y + 1)$ degrees of freedom. If $F < F_{0.005}$ or $F > F_{0.995}$, the null hypothesis must be rejected at the 1% significance level, wherein $F_{0.005}$ and $F_{0.995}$ are the quantiles of the F distribution with $\nu_1$ and $\nu_2$ degrees of freedom (Oliveira & Ferreira, 2010).
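The classification into conservative, exact or liberal can be illustrated with a short Python sketch. As an assumption made for simplicity, it computes the two binomial tail probabilities directly instead of going through the F statistic, and it treats the test as equal-tailed (0.5% in each tail for a 1% level), so borderline cases may be classified differently from the F-based rule. The function names are hypothetical.

```python
from math import exp, lgamma, log


def binom_log_pmf(x, n, p):
    """Logarithm of the binomial probability P(X = x) for X ~ Bin(n, p)."""
    return (lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
            + x * log(p) + (n - x) * log(1 - p))


def classify_eer(y, n_sim, alpha, level=0.01):
    """Classify an empirical EER of y / n_sim against the nominal alpha.

    Assumption: equal-tailed exact binomial test at the given level.
    """
    # P(X <= y): small values mean the observed rate is too low.
    lower = sum(exp(binom_log_pmf(x, n_sim, alpha)) for x in range(y + 1))
    # P(X >= y): small values mean the observed rate is too high.
    upper = sum(exp(binom_log_pmf(x, n_sim, alpha))
                for x in range(y, n_sim + 1))
    if lower < level / 2:
        return "conservative"
    if upper < level / 2:
        return "liberal"
    return "exact"


# 250 rejections in 5000 experiments is exactly the nominal 5%.
result_nominal = classify_eer(250, 5000, 0.05)
# 146 rejections (2.92%) and 916 rejections (18.32%) are far from 5%.
result_low = classify_eer(146, 5000, 0.05)
result_high = classify_eer(916, 5000, 0.05)
```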
In both steps, the data were simulated according to the statistical model described in (7), where $\mu$ is the general constant, fixed at 100 in all cases without loss of generality, $\tau_i$ is the effect of the $i$th treatment, and $\epsilon_{ij}$ is the random error, independently and normally distributed with mean 0 and common variance $\sigma^2$, also fixed at 100 without loss of generality, with $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, r$, wherein $r$ is the number of replications.
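A minimal Python sketch of one simulated experiment is given below, assuming the one-way model with $\mu = 100$ and $\sigma^2 = 100$ (so $\sigma = 10$). The function names are hypothetical, and the MSE is computed with the usual estimator for a completely randomized design with $n(r-1)$ degrees of freedom, which is an assumption of this sketch.

```python
import random


def simulate_experiment(effects, r, mu=100.0, sigma=10.0, rng=None):
    """Generate r observations per treatment from mu + tau_i + N(0, sigma^2).

    effects: list with the treatment effects tau_i (all zero under complete H0).
    """
    rng = rng or random.Random()
    return [[mu + tau + rng.gauss(0.0, sigma) for _ in range(r)]
            for tau in effects]


def treatment_means_and_mse(data):
    """Return the treatment means and the MSE.

    Assumption: completely randomized design, so the MSE is the pooled
    within-treatment sum of squares divided by n(r - 1).
    """
    n, r = len(data), len(data[0])
    means = [sum(row) / r for row in data]
    ss_error = sum((y - m) ** 2 for row, m in zip(data, means) for y in row)
    return means, ss_error / (n * (r - 1))


# One experiment under complete H0 with n = 5 treatments and r = 4 replications.
data = simulate_experiment([0.0] * 5, r=4, rng=random.Random(1))
means, mse = treatment_means_and_mse(data)
ordered_means = sorted(means)   # input for the grouping tests
```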
In the first step, the evaluation of the EER, the treatment effects $\tau_i$ were set to 0 for all $i = 1, 2, \ldots, n$. Therefore, the data were generated under the complete null hypothesis, that is, with all treatments having the same parametric mean. The EER ($\alpha$) was estimated by the proportion of experiments with at least one incorrectly detected difference among the $N^*$ simulated experiments, namely $\hat{\alpha} = \sum_{k=1}^{N^*} I(E_k = 1)/N^*$, wherein $E_k$ is a binary variable that takes the value 1 if at least one type I error occurred in the $k$th experiment and 0 otherwise, for $k = 1, 2, \ldots, N^*$, and $I(E_k = 1)$ is the indicator function that returns 1 if the equality holds and 0 otherwise.
In the second step, the evaluation of power, the treatment effects were simulated under two options: complete $H_1$ (alternative hypothesis) and partial $H_0$ (null hypothesis). In the first case, the effect of treatment 1 was set to 0, that is, $\tau_1 = 0$, and the others were fixed as a function of $\delta$, with $\delta = 1, 2, 4, 8, 16$ and $32$ representing the number of standard errors of the difference between means that specifies the effect of consecutive treatments, for $i = 2, 3, \ldots, n$. The power was computed as the proportion of rejections among the comparisons of means that differ by a given multiple of $\delta$, relative to the total number of comparisons involving that difference. Between consecutive treatments, for example, there are $n - 1$ comparisons per experiment and $N^*(n - 1)$ comparisons in total, which gives the power of detecting $\delta$ standard errors of the difference between means. In the same way, for neighbours two steps apart (first and third means, second and fourth, up to the antepenultimate and last ordered means) there are $n - 2$ comparisons per experiment involving $2\delta$ standard errors of the difference between means to be detected. This procedure is repeated until the first and last means are compared, that is, $(n - 1)\delta$ standard errors to be detected in only 1 comparison per experiment and a total of $N^*$ comparisons over all simulated experiments.
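The bookkeeping of comparisons under complete $H_1$ can be checked with a few lines of Python. The function name is hypothetical; for each step $s$ between ordered treatments it returns the number of comparisons per experiment, $n - s$, and the total over $N^*$ experiments, each involving a difference of $s\delta$ standard errors.

```python
def comparisons_by_step(n, n_sim):
    """Map each step s = 1, ..., n - 1 to (comparisons per experiment,
    total comparisons over n_sim experiments)."""
    return {s: (n - s, n_sim * (n - s)) for s in range(1, n)}


counts = comparisons_by_step(n=5, n_sim=5000)
# Consecutive treatments (s = 1): 4 comparisons per experiment, 20000 in total.
# First versus last (s = 4): 1 comparison per experiment, 5000 in total.
per_experiment = sum(c for c, _ in counts.values())   # all pairs of 5 means
```

Summing over all steps recovers the $n(n-1)/2$ pairwise comparisons of an experiment, 10 for $n = 5$.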
The second option for the power study, a simulation under partial $H_0$, involved two groups of means, with $k_1 = \lfloor n/2 \rfloor$ and $k_2 = n - k_1$ means, where $\lfloor x \rfloor$ denotes the largest integer less than or equal to $x$. The means of the first group were all equal, with effects $\tau_i = 0$, $i = 1, 2, 3, \ldots, k_1$, without loss of generality. The second group, with $k_2$ means, also had equal effects, set according to $\delta$, for which the values $\delta = 1, 2, 4, 8, 16$ were considered. In this case, the proportion of rejections among the $N^* k_1 k_2$ comparisons between means of different groups in the $N^*$ simulated experiments provided an estimate of the power. The intragroup comparisons also allowed the EER under partial $H_0$ to be evaluated: the proportion of experiments with at least one rejection of the null hypothesis of equality between two means of the same group was the estimate of this error rate. All the tests were applied to each simulated scenario, the EER (intragroup comparisons) and the power (intergroup comparisons) were computed, and the results were compared.
Several configurations with different values of $n$ and $r$ were considered in both steps: $n = 5, 10, 20, 40$ and $100$, and $r = 4, 10$ and $20$. The nominal significance levels of 1% and 5% were also considered. The coefficient of variation adopted was 10% because the simulated results showed that, for a normal population, the evaluated MCPs were not influenced by the coefficient of variation in terms of experimentwise type I error and power, since the differences between means in the simulation were always expressed in standard errors. Preliminary analyses with the proposed tests showed the same behaviour. Therefore, the simulations were fixed at a single coefficient of variation.

Results and Discussion
The performance of the proposed tests is compared with the results of existing tests in the literature, with emphasis on the Tukey, SNK and Scott-Knott tests; the first two are classical references in the literature and the last is the reference for the proposed tests. An updated reference on these tests can be found in Cui et al. (2021), and an updated recommendation for multiple comparisons in Sauder & DeMars (2019). The Tukey and SNK tests were also simulated, to confirm the results that already exist for them in the literature. The results found by Ramos & Vieira (2014), Ramos & Ferreira (2009), Conrado et al. (2017), Bernhardson (1975), Carmer & Swanson (1973), Bhering et al. (2008) and Einot & Gabriel (1975) are also used in the discussion. The performance evaluation is based on the type I error and the power of the tests, under several arrangements. The results are presented and discussed through tables and graphs to facilitate exposition and interpretation. Since the simulation covered many scenarios, only some are presented, given the amount of simulated data and the fact that the tests performed similarly in some scenarios.
The first evaluation of the tests was based on the EER. Two scenarios were evaluated: complete $H_0$ and partial $H_0$ ($H_{0p}$). Tables 1 and 2 show the results of the EER under complete $H_0$. The MGM and MGR tests controlled the experimentwise error rate, because in no case was the empirical EER rejected by the exact binomial test with $F \ge F_{0.995}$. However, in some cases the empirical EER of the MGM test was rejected by the exact binomial test with $F \le F_{0.005}$, making it conservative. The results of Carmer & Swanson (1973) and Bernhardson (1975) agree with those of this work (Tables 1 and 2).
Regardless of the number of replications, the proposed tests (MGM and MGR) controlled the EER (Tables 1 and 2). The same was verified by Borges & Ferreira (2003) when they evaluated the performance of the Tukey and SNK tests, thus confirming the results of Tables 1 and 2. They used the same simulation methodology as the present work with respect to the number of replications and the coefficient of variation.
The reason for this is the simulation design adopted. Treatment parameters are linked to the number of replications, and the difference between means is always preserved in terms of the standard error, which is therefore related to the coefficient of variation and the number of replications. However, for the Scott-Knott test, these same authors observed that, with a large number of replications ($r = 20$), the test was liberal only when the number of treatments was small ($n = 5$). Conrado et al. (2017) also verified this for the adjusted Scott-Knott test, when $\alpha > 12\%$ in the simulated experimental conditions.
In Figures 1 and 2, which represent the evaluation of the experimentwise error rate, two red lines can be observed, one solid and one dashed. The first delimits the region in which EER values below the line were lower than the overall nominal level, that is, the hypothesis $H_0: \alpha = 5\%$ was rejected by the exact binomial test because $F \le F_{0.005}$; in this case the test is conservative. The second delimits the region in which EER values above the line were higher than the overall nominal level, that is, the hypothesis $H_0: \alpha = 5\%$ was rejected by the exact binomial test because $F > F_{0.995}$; in this case the test is liberal.
In Figure 1, the MGM, MGR, Tukey and SNK tests controlled the EER, since none of the tests evaluated by the exact binomial test crossed the red lines that identify the rejection of the $H_0$ hypothesis.
* The red lines delimit the rejection region of the exact binomial test.
Regarding the number of treatments, the graphical representation for the arrangement $r = 20$ and $\alpha = 5\%$ in Figure 2 shows that the MGR and MGM tests control the EER. In the MGM test, the EER decreases as the number of treatments increases, to the point of being conservative for both $\alpha = 1\%$ and $\alpha = 5\%$ (Tables 1 and 2). The other simulated settings can be seen in Tables 1 and 2, and the results were similar to those shown in Figure 2. For the Tukey, SNK and Scott-Knott tests, considering a normal population, Borges & Ferreira (2003) showed that the experimentwise error rate remains equal to the overall significance level regardless of the number of treatments. However, when considering non-normal populations, Carmer & Swanson (1973) and Perecin & Malheiros (1989) evaluated the t-Bayesian test proposed by Waller & Duncan (1969) and found high type I error rates per experiment for this test. Carmer & Swanson (1973) observed that, for 5, 10 and 20 treatments and significance level $\alpha = 5\%$, the type I error rates per experiment were 15.6%, 18.4% and 18.7%, respectively, confirming that it is a liberal test.
An interesting fact about the proposed MGR test is that its EER is identical to that of the Tukey and SNK tests, regardless of the number of replications and treatments, under complete $H_0$. This is due to the similarity in the theoretical development of the tests. For example, the Tukey and SNK tests have the same MSD for the first difference, between the extreme means (lowest and highest), as observed by Carmer & Swanson (1973). This was also observed by Borges & Ferreira (2003), considering the normal distribution and a coefficient of variation of 10% (the same simulation conditions as this study), in which the Tukey and SNK tests present equal EER.
However, the assumption under the null hypothesis that all treatment means are equal is very rarely met in real experiments. Based on this, in the following subsection we considered the scenario in which the simulation was based on the partial null hypothesis ($H_{0p}$), which is another way to evaluate the type I error; see Tables 3 to 5.
* In the tables, the symbol "--" indicates that the EER was rejected by the exact binomial test with $F \le F_{0.005}$, and the symbol "++" indicates that the EER was rejected by the exact binomial test with $F \ge F_{0.995}$.
Figure 3 shows the performance of the tests as a function of the difference between consecutive means ($\delta$), fixing the number of treatments ($n = 5$, 20 and 100) and the number of replications ($r = 10$). In general, the proposed tests exceeded the established significance levels, especially when the difference between groups of consecutive means was greater than $2\sigma_{\bar{Y}}$. For Ramos & Vieira (2014), the CCR and CCF tests, as well as their bootstrap versions, presented these problems from $\delta \ge 1$, making them liberal tests. Borges & Ferreira (2003) evaluated the Scott-Knott test for $\delta = 0.5$ and 4, and $\alpha = 5\%$. In almost all scenarios evaluated, the Scott-Knott test exceeded the nominal level adopted, being also a liberal test.
According to the simulation performed in the present study, Tukey's test is the only test that controls the EER at the overall significance level regardless of the configuration of the experiment. Carmer & Swanson (1973) found that the Tukey and Scheffé tests did not exceed an EER of 3.1% in any configuration, considering a significance level of $\alpha = 5\%$. In this same study, it was also observed that the Duncan and t-Bayesian tests have the highest type I error rates per experiment under partial $H_0$.
The SNK test controlled the EER only for small differences between consecutive means. When the difference between groups of consecutive means was greater than $4\sigma_{\bar{Y}}$, the EER of this test exceeded the nominal level for both $\alpha = 0.01$ and $\alpha = 0.05$, which characterizes it as a liberal test, as also confirmed by Carmer & Swanson (1973). The MGR test controlled the overall significance level when the difference between groups of consecutive means was greater than or equal to $8\sigma_{\bar{Y}}$. The MGM test controlled the overall significance level only for a large number of treatments ($n = 100$) and $\delta \le 2$; see Tables 3 and 4.
The number of replications did not have a marked influence on the tests evaluated, as can be seen in Figure 4, which presents the EER of the proposed tests as a function of the number of replications, for $n = 10$, $\alpha = 5\%$ and a difference between consecutive group means of $4\sigma_{\bar{Y}}$.
The number of treatments influenced the EER at the overall significance level. The SNK test, for $\delta \le 2$, presented an EER in agreement with the significance level adopted, regardless of the number of treatments; see Tables 3 and 4. However, when $\delta > 2$ the test presented EER > $\alpha$, becoming a liberal test. For the MGR test with $\delta < 8$, the EER increased considerably with the number of treatments. For $\delta \ge 8$, the EER of this test stabilized and became identical to the overall significance level. When $\delta = 16$, the EER is identical to that of Tukey's test, which is due to the similarity between the structures of the two methods. Therefore, under partial $H_0$ with $\delta \ge 8$, it makes no difference which of the two is used. The problem is that differences of this size between means are not very common in practical situations and, in fact, the real difference between the means is unknown.
For the MGM test, when $\delta \ge 8$, the EER also increases with the number of treatments, making it a very liberal test under partial $H_0$. When $\delta < 8$, the EER as a function of the number of treatments ($n$) behaves like a parabola (Figure 5), peaking when the number of treatments is equal to 20.
In the second step, the tests were compared through the study of power. Several factors were considered: number of replications, number of treatments, differences between means, significance level and number of populations. For the last of these, two groups were simulated that had equal means internally and differed from each other by $\delta$ standard errors, that is, under partial $H_0$. The power was also evaluated under the hypothesis $H_1$, in which comparisons between groups of different means were considered.
The power of the tests under complete $H_1$ was influenced by the number of treatments. Table 9 shows the performance of the Tukey, SNK, MGM and MGR tests for a difference between means of $2\sigma_{\bar{Y}}$, $r = 4$ replications and a significance level of $\alpha = 5\%$.
The power of the SNK test increased with the number of treatments ($n$), while the power of Tukey's test decreased. Tukey's test has almost 0% power when the number of treatments equals 100 for $\delta = 1$. The same can be verified for the Scheffé test, according to Carmer & Swanson (1973). This shows that the Tukey and Scheffé tests are not recommended for pairwise comparisons of means with a large number of treatments. This result also shows the importance of the type I error for these tests: a very conservative test has a higher type II error and, consequently, lower power. Another power evaluation was performed for the MGR, MGM and Scott-Knott tests (Figure 6), as a function of the number of treatments, fixing the difference between means at $2\sigma_{\bar{Y}}$, $r = 4$ replications and $\alpha = 5\%$. The power of the Scott-Knott test was taken from the work of Silva et al. (1999), who evaluated this test in the same experimental scenario as the present study.
The power of the MGM test increased with the number of treatments. This behaviour was also verified for the Scott-Knott test. The MGR test, however, practically did not change power with the number of treatments; there was only a small decrease in power as n increased. For n = 5, the power of the MGM, Scott-Knott and MGR tests was 42.99%, 39.45% and 37.59%, respectively. For n = 100, the power of the tests was 72.64%, 48.45% and 33.51%, respectively. The MGM test showed greater power than the Scott-Knott and MGR tests. The latter showed the worst performance among the three tests.
For a broader evaluation, all the tests presented to date were compared with the performance evaluations of other tests performed by Perecin & Malheiros (1989). These authors showed that the test with the greatest power was the t-Bayesian test, followed by the t-test, Duncan test, modified Newman-Keuls test and the Newman-Keuls test. All of them had power above 60% for a difference of 4σ Ȳ , this effect being most pronounced for the t-Bayesian test, with power above 78%. However, this is due to the high type I error rates per experiment under the H 0 hypothesis.
Figure 7 shows the power of the tests in the scenario δ = 4, r = 4 replications and α = 5%. This scenario served as the basis for presenting the other situations, since the results were equivalent. The MGM, t-Bayesian and Scott-Knott tests showed the greatest power, with the MGM test standing out. Tukey's test presented the worst performance. The other tests presented intermediate power, between that of the MGM test (the most powerful) and that of Tukey's test (the least powerful).
The number of replications was another aspect that influenced the power of the tests, but not as strongly as the number of treatments. Figure 8 presents the evaluation of power as a function of the number of replications. This evaluation was analyzed in three scenarios: (a) 5 treatments, (b) 20 treatments and (c) 100 treatments, for a difference between means of 2σ Ȳ and α = 5%.
For a small number of treatments, Figure 8(a), the power of the tests increased with the number of replications, mainly from 4 to 10. However, when the number of treatments increased (n ≥ 20), Figures 8(b) and 8(c), the power of the tests hardly changed with the number of replications. This may be due to the higher accuracy of the estimation of the residual variance: with many treatments, the residual degrees of freedom are high regardless of the number of replications. With a small number of treatments, however, the degrees of freedom are small for a small number of replications and high for a large number of replications. Thus, the number of replications has the greatest effect on power in the latter situation, once the accuracy of the experiment is fixed.
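The degrees-of-freedom argument can be made concrete with a short sketch. It assumes a completely randomized design, in which the residual degrees of freedom are n(r - 1); the design is not stated in this section, so this formula is an assumption used only for illustration.

```python
def residual_df(n_treatments, n_replications):
    """Residual degrees of freedom, ASSUMING a completely randomized design."""
    return n_treatments * (n_replications - 1)

# Compare the gain from r = 4 to r = 10 for each number of treatments
for n in (5, 20, 100):
    df_r4 = residual_df(n, 4)
    df_r10 = residual_df(n, 10)
    print(f"n = {n:3d}: df(r=4) = {df_r4:3d}, df(r=10) = {df_r10:3d}")
```

Under this assumption, with n = 100 the residual variance is already estimated with hundreds of degrees of freedom at r = 4, so additional replications add little precision to that estimate, whereas with n = 5 the degrees of freedom triple from r = 4 to r = 10.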
Another evaluation of power performance was based on the difference between means. The power of the tests increased rapidly as the difference between means increased. Silva et al. (1999) and Perecin & Malheiros (1989) showed that when the difference between means was equal to or greater than 6σ Ȳ , the percentages of correct decisions of the tests evaluated were high.
Figure 9 presents the power of the tests for real differences between means of 1 to 32σ Ȳ , with 4 replications and α = 0.05. The scenario was divided according to the number of treatments: (a) n = 5, (b) n = 20 and (c) n = 100. When δ ≤ 6, Figure 9(a), the MGR (for n = 5) and MGM (for n = 20 and 100) tests obtained higher power than the others. When δ > 6, almost all the tests reached 100% power (Figures 9(b) and 9(c)), although the proposed tests converged more slowly to this value. With 100 treatments, the power of these tests did not exceed 50%, even when the actual difference between means was 32σ Ȳ (Figure 9(c)). It is interesting to note that the power of the SNK test tends to be slightly higher than that of the Tukey test in all configurations, as can also be seen in Borges & Ferreira (2003), and that this test converged more quickly to 100%.
To compare the power of the proposed tests with the power of other tests found in the literature, real differences between means from 2 to 32σ Ȳ were considered, for 5, 20 and 100 treatments, with 4 replications and α = 0.05 (Figure 10).
For a small real difference between means, regardless of the size of n, the MGM test obtained higher power than the others. This advantage became more pronounced as n increased, especially relative to the Tukey test, the least powerful test in this situation.
The t-Bayesian and Duncan tests stood out in power, as expected, because these two tests have high type I error rates per experiment (Bernhardson, 1975), that is, they are liberal tests. For liberal tests, a high type I experimentwise error rate implies a small type II error rate and consequently high power. As the difference between means increased, these two tests converged more quickly to 100%.
Unlike the MGM test, the MGR test showed lower power than the Scott-Knott test. However, in the performance evaluation, it was classified as having intermediate power among the evaluated tests. The modified SNK and SNK tests also showed intermediate power, lower than that of the MGR test for n ≤ 5 and higher than that of the MGR test for n > 5 (Figure 10).
When the actual difference between means increased, the power of the t-Bayesian and Duncan tests increased, although the MGM test maintained high power and control of the type I error per experiment. The Scott-Knott and modified SNK tests obtained higher power than the MGR test, with slightly higher values as n increased. Interestingly, in almost all configurations, the modified SNK and SNK tests were practically the same, except for differences between means from 4 to 8σ Ȳ .
Comparing the power of the MGM and MGR tests with that of the CCR, CCF, CCRb and CCFb tests (Ramos & Ferreira, 2009; Ramos & Vieira, 2014), we noticed that the tests proposed in this work showed greater power for smaller differences between means (δ ≤ 2). As this difference increases, the CCRb and CCFb tests have greater power than the other tests.
A very relevant aspect of the proposed tests (MGR and MGM) was that, although they may have converged more slowly to the maximum percentage of correct decisions (100%), for small values of δ these tests were superior to the other tests presented in this paper (Figure 9). In real experiments, this is the most common situation.
Figure 11 shows the setting for real differences between means of 4 to 32σ Ȳ , for n = 5, 20 and 100 treatments, with 4 replications and α = 0.05. For this scenario, the proposed tests were compared with the Tukey and SNK tests. Regardless of the number of treatments, the MGM test obtained the highest power, followed by the SNK and MGR tests. Once again, the test with the worst performance was Tukey's test. When the initial value of the difference between means was greater than 4σ Ȳ , the power of the tests quickly converged to 100%, since the real difference between means was very large.
In the present study, it was verified that the initial values of the real differences between means influenced the power of the proposed tests. This was not verified in any other study. Consider the power of the MGM test as an example. In Table 9, the difference between means (δ) for the n = 5, r = 4 and α = 0.05 scenario ranged from 1 to 32. In Table 7, δ ranged from 2 to 32, and in Table 8, from 4 to 32.
Note that the initial δ values were different. Thus, for these three scenarios, considering the same difference between means of 4σ Ȳ , the power was 34.94%, 69.91% and 89.58%, respectively (Figure 12).
This shows that the power of the proposed tests and of the SNK test increased as the population means became more heterogeneous. However, this did not happen with the MGR test. When the initial values of δ went from 1 to 32 to 2 to 32, the power of this test increased when evaluating the same difference between means (4σ Ȳ ).
However, when the initial values of δ went from 2 to 32 to 4 to 32, the power of this test increased. Thus, what is observed is that the MGR test tends to be more powerful when the population means evaluated are more homogeneous than when they are more heterogeneous. For the Tukey test, power remained constant for the same difference as the population means became increasingly heterogeneous. This can be explained by the fact that the Tukey test is very conservative. Excessive control of the type I error ends up reducing power, as predicted in the literature.
Another way to evaluate the tests is under partial H 0 . The evaluation took into account the number of treatments, number of replications, difference between means and level of significance.
The number of treatments influenced the power under partial H 0 , although the number of replications did not show as much influence. In Figure 13, it was observed that increasing the number of treatments (n) decreases the power of the tests. However, when the real difference between means was 4σ Ȳ , Figure 13(c), the power of the MGR test started to increase with n, and it was the only test to reach power around 90% when n = 100. This test and the MGM test obtained the highest percentages of correct decisions. However, when δ ≤ 4, the power values did not exceed 30%. Even so, the Tukey test performed worst in almost all situations. As n increased, its power came close to 0%. For δ > 8, almost all the tests converged to 100% power. The power of the tests evaluated had little practical significance, since the EER of all these tests was higher than the level of significance adopted, under partial H 0 . Only the Tukey and Scheffé tests had an EER identical to the nominal level, as verified in Carmer & Swanson (1973). However, the power of these tests came to 0% in certain situations.
A characteristic that can be improved in the MGM and MGR tests, to achieve control of the type I error per experiment and high power under partial H 0 , is the influence of the unknown population mean on the MSD of the tests, since the distribution of the midrange centred at µ depends on the location parameter.

Application
We apply the proposed tests using the experiment of Figueiredo et al. (2015). The results will be compared with Scott-Knott's test.

Example 1
The experiment was performed in a triple-lattice design evaluating the genotypes in 7 environments. For this study, the evaluation of the environments will not be taken into consideration, since it goes beyond the objective of this work. The evaluation of the Scott-Knott test for the flowering period was also performed by Figueiredo et al. (2015) and is presented in Table 10. The additional information from this study was an analysis of variance whose residual mean square was 6.3078 with 252 degrees of freedom. The number of replications with which the genotype means were estimated was 21. A form of data entry that is not very common in routines is the treatment means, which will be used in this example. It will be shown that, by entering only the residual mean square, the degrees of freedom and the number of replications, the MRtest function can perform the procedure of the four proposed tests.
Table 11 presents the test results for comparison, as well as the consecutive differences between the ordered means, to assist in the comparison of the test results. The lines in which one of the tests separated the group of means are also emphasized.
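As a small illustration of how the summary statistics quoted above enter the tests, the sketch below computes the estimated standard error of a genotype mean from the residual mean square and the number of replications. The variable names are illustrative, the formula assumes equally replicated means, and the code does not reproduce the MRtest function.

```python
import math

# Summary statistics quoted in the text for Example 1
residual_mean_square = 6.3078
residual_df = 252
replications = 21

# Estimated standard error of a treatment mean (ASSUMES equal replication)
se_mean = math.sqrt(residual_mean_square / replications)

# Estimated standard error of a difference between two treatment means
se_diff = math.sqrt(2 * residual_mean_square / replications)

print(f"standard error of a mean:       {se_mean:.4f}")
print(f"standard error of a difference: {se_diff:.4f}")
```

The standard error of a mean is about 0.548 days here, which gives a scale for reading the consecutive differences between ordered means discussed next.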
The results show that the proposed tests (MGM and MGR) separated the groups of means more finely than Scott-Knott's test, and in a more coherent way. The proposed tests showed very similar results. The means were ordered to facilitate discussion. Note the difference in test results in the first groups of means. The means of the genotypes BR507 and BR506 were considered statistically equal by the Scott-Knott test, but different by the MGM and MGR tests. Subsequently, the means of genotypes BR506 and BR508 were considered statistically equal by the proposed tests, but different by the Scott-Knott test. There is an inconsistency in the Scott-Knott test, which is very common in practice. Note the difference ȲBR507 - ȲBR506 = 0.84. The value of 0.84 between these two means was not enough for the Scott-Knott test to detect that they are sampled from populations with different means. However, this same test found that the difference ȲBR506 - ȲBR508 = 0.63 was significant, and therefore that the mean effects of these genotypes are different. This is due to the philosophy behind the Scott-Knott test: the separation of groups occurs by the likelihood ratio between groups, so the differences between the limiting means of each group can often be smaller than the differences between consecutive means within the groups. Unlike Scott-Knott's test, the MGM and MGR tests are more consistent in this respect. The difference of 0.63 between the BR506 and BR508 genotypes was not sufficient for the proposed tests to declare these two genotypes statistically different, whereas the larger difference of 0.84 between the BR507 and BR506 genotypes was declared significant.
However, in one situation the MGR test did not avoid this problem either. For the difference between the genotypes V82393 and V82392, which was 1.31, the MGR test did not detect a difference between these means, and neither did Scott-Knott's test. This is an inconsistency because the smaller difference of 0.84 between the BR507 and BR506 genotypes was detected as significant by the MGR test. For the MGM test, this does not occur; the difference of 1.31 between genotypes V82393 and V82392 was detected as significant. Only in one situation did none of the tests detect significance, for a difference of 0.86 (between the genotypes CMSXS639 and CMSXS642). The smallest significant difference detected by the MGM and MGR tests was 0.84, and by the Scott-Knott test was 0.63. Thus, all differences above these values should also be detected as significant. It is worth remembering that, for the proposed tests, the values 0.84 and 0.86 are very close and lie at the threshold at which these tests detect a significant difference between means.
In all other situations in which Scott-Knott's test differentiated the groups of means, the MGM and MGR tests were also able to detect the difference, with the MGM test further refining the group separation.
The greater and more coherent group separation of the MGM and MGR tests is due to the way the tests were constructed. The separation of groups in these tests takes into account the greatest consecutive difference between means, and this was decisive in avoiding the inconsistency that often occurs in the Scott-Knott test.
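The consistency argument above can be checked mechanically. The sketch below encodes only the four consecutive differences quoted in the text, together with the decision each test is reported to have made, and counts "inversions": cases in which a test declared a smaller difference significant while leaving a larger one non-significant. The data structure and function name are illustrative, and the full Table 11 is not reproduced.

```python
# Decisions reported in the text for four consecutive differences (in days):
# True = declared significant, False = declared non-significant.
decisions = {
    "Scott-Knott": {0.84: False, 0.63: True, 1.31: False, 0.86: False},
    "MGM":         {0.84: True,  0.63: False, 1.31: True,  0.86: False},
    "MGR":         {0.84: True,  0.63: False, 1.31: False, 0.86: False},
}

def inversions(gap_decisions):
    """Pairs (significant gap, non-significant gap) with the significant one smaller."""
    significant = [g for g, s in gap_decisions.items() if s]
    non_significant = [g for g, s in gap_decisions.items() if not s]
    return [(s, n) for s in significant for n in non_significant if s < n]

counts = {test: len(inversions(d)) for test, d in decisions.items()}
for test, pairs in ((t, inversions(d)) for t, d in decisions.items()):
    print(test, pairs)
```

On these four differences the MGM test shows a single inversion, the near-threshold pair 0.84 versus 0.86 noted in the text, while the MGR and Scott-Knott tests show more.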
Example 2
Calinski & Corsten (1985) proposed two grouping methods, one based on the F distribution, which we will call the CF test, and the other based on the studentized range, which we will call the CR test. For these authors, the idea of the tests was to provide unambiguous results and a separation into small groups that are more homogeneous than any other grouping, that is, groups with lower intra-group variances. They exemplified the two tests by applying them to the experiment analyzed by Duncan (1955) and later by Scott & Knott (1974), in which the yields (bushels per acre) of seven varieties of barley were compared in a randomized block design with 6 blocks. The means of the varieties were:
To verify the homogeneity of the groups, we applied a weighted average of the variances of the formed groups, in which the weights were the degrees of freedom computed in each group. For example, for the CF test, two groups were formed, (1-4)(5-7). The variance of the means for the first group was 30.33667, and for the second it was 4.443333. The first group has four means and, therefore, 3 degrees of freedom. The second group has 2 degrees of freedom. Thus, the weighted mean of the variances of the groups for the CF test is (30.33667 × 3 + 4.443333 × 2)/5 = 19.97933. For the other tests, the weighted average of the variances is given in Table 13. It can be observed that the MGM test formed the most homogeneous groups and the MGR test the most heterogeneous groups. One could think that this occurred because the MGM test formed more groups. However, as noted by Calinski & Corsten (1985), the difference between the consecutive means of treatments 1 and 2, of the order of 2.33σ Ȳ , was greater than the difference between treatments 4 and 5, of the order of 1.67σ Ȳ . These last treatments were the boundary treatments at which the groups were split. However, in the groups formed by the CF, CR and Scott-Knott tests, the inclusion of treatment 1 in the group (1-4) gave this group a greater sum of squares and, consequently, a greater variance. This shows that the treatment included in this group differs from the other treatments and, therefore, should not be included in the group. This was verified by the MGM test, which resulted in the formation of groups (1)(2-4)(5-7).
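The weighted average of the group variances used above can be reproduced with a few lines of code. The function name is illustrative; the variances and degrees of freedom are those quoted in the text for the CF test.

```python
def weighted_group_variance(variances, degrees_of_freedom):
    """Average of within-group variances weighted by each group's degrees of freedom."""
    weighted_sum = sum(v * df for v, df in zip(variances, degrees_of_freedom))
    return weighted_sum / sum(degrees_of_freedom)

# CF test: groups (1-4) and (5-7), with 3 and 2 degrees of freedom
cf_value = weighted_group_variance([30.33667, 4.443333], [3, 2])
print(round(cf_value, 5))  # 19.97934 (the text reports 19.97933)
```

A group containing a single mean, such as group (1) formed by the MGM test, has zero degrees of freedom and therefore contributes nothing to either the numerator or the denominator.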

Something that also draws attention is that Ramos & Vieira (2014) evaluated these tests, and the power of the CF and CR tests is greater than the power of the MGM test. Considering a scenario similar to this experiment, with 5 treatments and 4 replications under complete H 1 , the power of the CF and CR tests to detect a difference between means of 2σ Ȳ or more is greater than approximately 30%. For the MGM and MGR tests, the power is of the order of 25% and 21%, respectively. Still, the tests proposed by Calinski & Corsten (1985) could not separate treatment 1 from treatments 2 to 4. This can be explained because, for the CF and CR tests to have reached the formation of this group, they had a type I error rate per experiment ranging from 1% up to 53.1%.
Thus, the groups formed by the MGM test were more homogeneous not because the test separated more groups, but because it formed groups with similar characteristics.
The results presented in the two examples do not mean that this will always happen for the MGM and MGR tests. However, one can observe the good characteristics of these tests, whereas classical tests such as Scott-Knott could not detect such differences. Thus, the idea is not to show that the proposed tests are better than those already present in the literature, but to offer an alternative procedure with good characteristics: control of the type I error per experiment under complete H 1 , high power, and no ambiguity in its results, that is, a new alternative for the formation of groups of means.
Based on Carmer & Swanson (1973), one should not develop an MCP giving total emphasis to the type I error, because from this point of view one can see the fragility of Tukey's test. Nor should one seek the formation of smaller, more homogeneous groups at any cost, since this ends up generating inconsistencies, as shown in Example 2. However, we cannot make choices like those made by Carmer & Swanson (1973) when choosing Fisher's protected t-test or the t-Bayesian test, which perform well in some evaluations but have very high type I error rates. One should be very cautious in choosing an MCP; the search for good multiple comparison procedures continues, since this still represents a gap in science.

Conclusions
The proposed MGM and MGR tests performed better than the Scott-Knott test in most of the evaluations performed, except for the type I error per experiment under partial H 0 , which even the Scott-Knott test does not control. Among all the tests evaluated in this study, the MGM test presented the best performance in almost all evaluation configurations, with the added advantage of not presenting ambiguity in its results.
However, we noticed some limitations of the proposed tests (MGM and MGR). As the number of treatments increases in the simulated scenarios, under partial H 0 , the empirical type I error rate increases. The only test presented that controls the type I error rate at the nominal significance level is Tukey's test, but its power decreases as the number of treatments in the simulated scenarios increases. This means that the choice of the best test must be made with caution, weighing the advantages and disadvantages and the experimental scenario in which the test is being applied. We are presenting two more tests that can be used in multiple comparison procedures, giving the researcher more options for decision making.
Carmer & Swanson (1973) and Boardman & Moffitt (1971) verified this same behaviour for Scheffé's test, considering 4000 experiments. For n = 20 treatments, the EER of this test reached almost 0% type I error per experiment, making it a very conservative test.

partial H 0 (Figure 5), mainly for the MGM and MGR tests. But the behaviours of the tests were different from each other. Initially, Tukey's test preserved the EER at the significance level when the number of treatments varied. Compared to the performance evaluation made by Borges & Ferreira (2003), it was found that the results of the EER for Tukey's test are similar.
* The red lines delimit the rejection region by the exact binomial test.
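The limits mentioned in the note above can be approximated with a short sketch. It is only an illustration under stated assumptions: the number of simulated experiments N_SIM is hypothetical (the value used in this study is not given in this part of the text), and the limits are computed directly from binomial quantiles, whereas the table notes express the same exact test through F quantiles.

```python
import math

def binom_pmf(k, n, p):
    """Binomial probability mass, computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def binom_quantile(q, n, p):
    """Smallest k such that P(X <= k) >= q for X ~ Binomial(n, p)."""
    cdf = 0.0
    for k in range(n + 1):
        cdf += binom_pmf(k, n, p)
        if cdf >= q:
            return k
    return n

def eer_limits(n_sim, alpha, confidence=0.99):
    """Approximate lower and upper limits (in %) for an empirical EER
    that is compatible with the nominal level alpha."""
    tail = (1 - confidence) / 2
    lower = binom_quantile(tail, n_sim, alpha)
    upper = binom_quantile(1 - tail, n_sim, alpha)
    return 100 * lower / n_sim, 100 * upper / n_sim

N_SIM = 2000  # HYPOTHETICAL number of simulated experiments
lower_pct, upper_pct = eer_limits(N_SIM, alpha=0.05)
print(f"EER compatible with 5%: roughly {lower_pct:.2f}% to {upper_pct:.2f}%")
```

An empirical EER below the lower limit would correspond to the "--" mark (conservative) and one above the upper limit to the "++" mark (liberal) used in the tables.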

Figure 5. Type I error rate per experiment, in percentage, of the Tukey, SNK, MGR and MGM tests, depending on the number of treatments, under partial H 0 , for (a) δ = 2, (b) δ = 4 and (c) δ = 16, with r = 10 and α = 5%, assessed by the exact binomial test with a confidence coefficient of 99%.

Figure 6. Power of the Scott-Knott, MGR and MGM tests, in percentage, under complete H 1 , to detect a difference between means of 2σȲ , with r = 4 replications, depending on the number of treatments, for α = 0.05.

Figure 7. Power of the Duncan, Scott-Knott, MGM, MGR, SNK, modified SNK, t, t-Bayesian and Tukey tests, in percentage, under complete H 1 , to detect a difference between means of 4σȲ , with r = 4 replications, depending on the number of treatments, for α = 0.05.

Figure 8. Power of the Scott-Knott, MGM, MGR, SNK and Tukey tests, in percentage, under complete H 1 , to detect a difference between means of 2σȲ , with (a) n = 5, (b) n = 20 and (c) n = 100, depending on the number of replications, for α = 0.05.

Figure 10. Power of the Scott-Knott, MGM, MGR, SNK, Tukey, Duncan, modified SNK and t-Bayesian tests, in percentage, under complete H 1 , to detect differences between means from 2σȲ to 32σȲ , considering the number of treatments (a) n = 5, (b) n = 20 and (c) n = 100, 4 replications, and a significance level of 5%.

Figure 11. Power of the MGM, MGR, SNK and Tukey tests, in percentage, under complete H 1 , to detect differences between means from 4σȲ to 32σȲ , considering the number of treatments (a) n = 5, (b) n = 20 and (c) n = 100, 4 replications, and a significance level of 5%.

Figure 12. Power of the MGM, MGR, SNK and Tukey tests, in percentage, under complete H 1 , according to the initial values of the actual differences between means, for a difference of 4σȲ , with n = 5, r = 4 and α = 0.05.

Figure 13. Power of the MGM, MGR, SNK and Tukey tests, in percentage, under partial H 0 , depending on the number of treatments, to detect differences between means of (a) 1σȲ , (b) 2σȲ , (c) 4σȲ and (d) 8σȲ , with 4 replications and a significance level of 5%.

Table 1. Experimentwise error rate, in percentage, of the Tukey, SNK, MGR and MGM tests, as a function of the number of treatments and of the number of replications, under complete H 0 , at the significance level α = 1%, evaluated by the exact binomial test with a confidence coefficient of 99%.
* The symbol "--" indicates that the EER was rejected by the exact binomial test, such that F ≤ F 0.005 . The "++" symbol indicates that the EER was rejected by the exact binomial test, such that F ≥ F 0.995 .
Tukey and SNK tests have the control of the EER. Silva et al. (1999) and Borges & Ferreira (2003), evaluating the performance of the Scott-Knott test with the simulation settings of the present study, except for N = 2000 simulations, observed that some values of the EER were higher than the nominal levels of significance (α) of 1% and 5%. The values of the EER that exceeded the value of α (liberal tests) were those in which the number of treatments was 5, although they did not move far from the nominal values. Conrado et al. (2017) presented a version of the Scott-Knott test for unbalanced designs. The performance evaluation of the adjusted Scott-Knott test showed results similar to those found by Silva et al. (1999) and Borges & Ferreira (2003). In some situations, the test exceeded the overall level of significance. This can be justified by the Monte Carlo error. Ramos & Ferreira (2009) as well as Ramos & Vieira (2014) developed bootstrap versions of the tests created by Calinski & Corsten (1985). They also evaluated the original tests and those that were developed. The tests developed by Calinski & Corsten (1985) are Calinski-Corsten's test based on the studentized range distribution (CCR test) and Calinski-Corsten's test based on the F distribution (CCF test). The bootstrap versions of these tests will be called the CCRb and CCFb tests, respectively. All tests were considered exact under the null hypothesis and normal distribution. This is because the tests were evaluated under non-normal conditions, something that was not the objective of this work.

Table 2. Experimentwise error rate, in percentage, of the Tukey, SNK, MGR and MGM tests, depending on the number of treatments and the number of replications, under complete H 0 , at the significance level α = 5%, assessed by the exact binomial test with a confidence coefficient of 99%.
* The symbol "--" indicates that the EER was rejected by the exact binomial test, such that F ≤ F 0.005 . The "++" symbol indicates that the EER was rejected by the exact binomial test, such that F ≥ F 0.995 .

Table 3. Experimentwise error rate, in percentage, of the Tukey, SNK, MGR and MGM tests, depending on the number of treatments and the number of replications, under partial H 0 , at the significance level α = 5% and δ = 1σȲ , evaluated by the exact binomial test with a confidence coefficient of 99%.

Table 4. Experimentwise error rate, in percentage, of the Tukey, SNK, MGR and MGM tests, depending on the number of treatments and the number of replications, under partial H 0 , at the significance level α = 5% and δ = 2σȲ , assessed by the exact binomial test with a confidence coefficient of 99%.
* The symbol "--" indicates that the EER was rejected by the exact binomial test, such that F ≤ F 0.005 . The "++" symbol indicates that the EER was rejected by the exact binomial test, such that F ≥ F 0.995 .

Table 5. Experimentwise error rate, in percentage, of the Tukey, SNK, MGR and MGM tests, depending on the number of treatments and the number of replications, under partial H 0 , at the significance level α = 5% and δ = 4σȲ , assessed by the exact binomial test with a confidence coefficient of 99%.
* The symbol "--" indicates that the EER was rejected by the exact binomial test, such that F ≤ F 0.005 . The "++" symbol indicates that the EER was rejected by the exact binomial test, such that F ≥ F 0.995 .

Table 6. Power of the Tukey, SNK, MGR and MGM tests, in percentage, to detect a difference between means from one standard error (1σȲ ) to 32σȲ , depending on the number of treatments, with 4 replications (r = 4), under complete H 1 , at the significance level of 5%.

Table 7. Power of the Tukey, SNK, MGR and MGM tests, in percentage, to detect a difference between means from two standard errors (2σȲ ) to 32σȲ , depending on the number of treatments, with 4 replications (r = 4), under complete H 1 , at the significance level of 5%.

Table 8. Power of the Tukey, SNK, MGR and MGM tests, in percentage, to detect a difference between means from four standard errors (4σȲ ) to 32σȲ , depending on the number of treatments, with 4 replications (r = 4), under complete H 1 , at the significance level of 1%.

Table 9. Power of the Tukey, SNK, MGR and MGM tests, in percentage, to detect a difference between means from one standard error (1σȲ ) to 32σȲ , as a function of the number of treatments, with 4 replications (r = 4), under complete H 1 , at the significance level of 5%.

Table 10. Selection of twenty-five sorghum genotypes based on the flowering period, evaluated by the Scott-Knott test.

Table 11. Results of the MGM, MGR and Scott-Knott tests evaluating the 25 sorghum genotypes presented in Example 1.

Table 12. Result of multiple comparison tests.
The results for the tests proposed by Calinski & Corsten (1985), the Scott-Knott test, and the MGM and MGR tests are given in Table 12. Equal letters indicate treatment means that are statistically equal; different letters indicate means of different groups.

Table 13. Weighted average of the variances of the groups formed by the multiple comparison tests.