Analysis of variance (ANOVA) is a very useful tool in biological and industrial contexts, and it has undoubtedly gained ground in modern fields such as geomatics, where data captured with remote or proximal sensors predominate. ANOVA has been one of the standard tools for analyzing experimental data for decades. However, there are now so many ways of applying it that stating in the methodology that an ANOVA (AOV, ANAVAR, ANDEVA) was applied is not enough to make clear what lies behind the method or whether it was used properly. Aspects such as balance, blocking, the presence of covariates, repeated measurements over time, georeferenced information, and discrete or categorical responses of a univariate or multivariate nature each call for their own analytic strategies for estimating the parameters of the associated model (Christensen, 2019).
Currently, remote sensing technologies, such as drone imagery and specialized cameras, allow the creation of spectral indices (Adak et al., 2021). The resulting data usually have a structure very different from that of the experimental design, which obliges users to aggregate them to experimental plots or other units so that the information can be treated as an additional source of variation in the ANOVA or simply as the response itself. Because these data involve some form of georeferencing, they may also carry a spatial dependency structure (and, in many cases, temporal or spatio-temporal ones). Variables such as vegetation cover and climatic and meteorological measurements vary with many factors, including the time scale. It is therefore necessary to carry out evaluations with updated information from two or more periods, so that temporal variations in the object of study can be identified before any cause is attributed to the change. This process is known as multitemporal evaluation: the object is compared within the same geographical area across different periods in order to determine its variation, and several evaluation methods exist for this purpose (Gao et al., 2012). Overlooking these aspects in the inferential process can invalidate the analysis, especially when the data are analyzed with the methods usually taught in basic courses on experimental design (Kenny, 1995; Kenny & Judd, 1986; Zimmerman et al., 2007; Zimmerman & Zumbo, 2010).
It is surprising that many users of the technique report incompletely, within the ANOVA results themselves, the aspects that must be considered before the ANOVA (the experimental design), and give insufficient detail about the analyses that follow it (interactions or multiple comparisons) for readers to be certain of the scope and interpretation of the results (Acutis et al., 2012). This lets the statistical software package set the tone for substantive decisions in data analysis, usurping to some extent the leading role of the user, perhaps on the assumption that the software's fame or recognition guarantees that everything implemented in it is directly applicable or interpretable.
Several documents highlight the misuse of ANOVA in two-way designs due to heterogeneous variance (Rowell & Walters, 2006), highlighting defective data (Pearce, 2006). The defects reported include failure to specify the type of ANOVA (Abubakar et al., 2022), confusion in the choice of the type of effects (fixed or random) (Bennington & Thayne, 1994), too-ready acceptance of the weak argument that not rejecting the null hypothesis implies a strong conclusion (Rong, 2000), and inappropriate use of the error terms in the construction of the F statistic or of the type of sums of squares (Li & Lomax, 2011). They even include the belief, supported in recognized literature, that only a couple of assumptions are necessary to make use of the technique (Acutis et al., 2012). This belief ignores the key assumption of independence (Lindman, 1992), which is closely linked to the reality of data captured with sophisticated sensors or equipment used in agriculture, such as performance monitors, spectrometric techniques, radar or satellite imagery, or images from spectral or hyperspectral cameras. All this information is usually converted into indices that relate to the physical-chemical and microbiological properties of soils or water, as well as to the vegetation, specifically the development of the crop. In turn, this information is quite often incorporated into the structure of the experimental design in order to associate the data with each observation or experimental unit and thus, using ANOVA, evaluate the effect of the treatments (Atik & Akdemir, 2022; Barbosa et al., 2020; Ding et al., 2020; Firozjaei et al., 2020; Stoy et al., 2022; Volcani et al., 2005; Ya’acob et al., 2014). In general, how SIMPLE (Simple, Informative, Meaningful, Powerful, Logical, Effective) (Mcintosh, 2015) were the analyses in the documents cited above?
For data that in many cases have been shown to have spatial or temporal dependency, shouldn't the essential aspects of the ANOVA be described adequately, so that the dependency structure present in the data is taken into account? Knowing the weaknesses of ANOVA in certain contexts and with certain types of data can surely enhance its benefits.
Remote sensor data and similar technologies can give rise to forms of spatial dependence such as spatial lag or overlap (Shukl & Subrahmanyam, 1999). Such dependence can be found in data associated with experiments in which some design model was fitted for the comparison of treatments, using as response(s) either a performance indicator evaluated in the field or an estimate generated by some type of sensor (Wang et al., 2016). Some of these indices are aggregated to each polygon that makes up the experimental plot and then incorporated into the model, sometimes as a response and sometimes as covariates (if treatments do not affect the covariate, or if a multiple-slope model has been specified). The literature points out that these indices show some kind of spatial dependence (dos Santos et al., 2021; H. Zhang et al., 2011). However, the ritual of traditional analysis, driven mainly by familiarity with methods that usually do not treat spatial dependence, means that many excellent data captures are analyzed in a way that invalidates the field or laboratory work because of the analytic technique (Gotway & Cressie, 1990).
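To make the idea of spatial lag concrete, one common formalization is the spatial autoregressive (lag) model sketched below. This is an illustrative assumption, not a model stated in the text: the symbols $y$ (response vector), $X$ (design matrix of the treatment structure), $\beta$ (treatment parameters), $W$ (spatial weights matrix), $\rho$ (spatial lag coefficient), and $\sigma^2$ (error variance) are introduced here only for illustration.

```latex
% Spatial lag model (assumed notation): the response at a location
% depends on a weighted average of the responses at neighboring locations.
\begin{align}
  y &= \rho W y + X\beta + \varepsilon,
      \qquad \varepsilon \sim N(0, \sigma^2 I) \\
  y &= (I - \rho W)^{-1}\,(X\beta + \varepsilon) \\
  \operatorname{Var}(y) &= \sigma^2\,(I - \rho W)^{-1}\,(I - \rho W^{\top})^{-1}
\end{align}
```

Under this sketch, setting $\rho = 0$ recovers the usual linear model behind the ANOVA, with independent observations. For $\rho \neq 0$ the covariance matrix of $y$ is no longer diagonal, so the independence assumption does not hold even if the errors $\varepsilon$ themselves are independent, normal, and homoscedastic.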
Given the reasons and documentation presented, it is important to understand how ANOVA can be invalidated, at least in the context of spatial lag; the same results extend to other situations that may have the same effect, such as edge effects, competition, or poor model specification (Christensen & Bedrick, 1997). Using conditional simulation, spatial dependence was generated from a Gaussian variogram with fully defined parameters, and 1000 simulations were obtained. The final goal was to show, using the Moran index and the spatial lag coefficient, that the ANOVA is invalid when spatial lag is present, even when the other assumptions usually regarded as the most important in the ANOVA context are fulfilled.
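As a rough illustration of the Moran index mentioned above, the following minimal sketch computes Moran's I for values laid out on a regular grid of plots. It is not the procedure used in this work: the rook-contiguity neighborhood, the binary weights, and the function names `rook_neighbors` and `morans_i` are assumptions made only for this example. In an ANOVA setting the values passed in would typically be the model residuals.

```python
from typing import List, Sequence


def rook_neighbors(n_rows: int, n_cols: int) -> List[List[int]]:
    """For each cell of an n_rows x n_cols grid (row-major index), list the
    cells that share an edge with it (rook contiguity, assumed here)."""
    neighbors = []
    for r in range(n_rows):
        for c in range(n_cols):
            cell = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    cell.append(rr * n_cols + cc)
            neighbors.append(cell)
    return neighbors


def morans_i(values: Sequence[float], neighbors: List[List[int]]) -> float:
    """Moran's I with binary weights: w_ij = 1 if j is a neighbor of i."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]              # deviations from the mean
    s0 = sum(len(nb) for nb in neighbors)       # sum of all weights
    cross = sum(z[i] * z[j] for i in range(n) for j in neighbors[i])
    denom = sum(zi * zi for zi in z)            # total sum of squares
    return (n / s0) * cross / denom


# Two toy 4 x 4 layouts (hypothetical values, for illustration only).
nb = rook_neighbors(4, 4)
checkerboard = [float((r + c) % 2) for r in range(4) for c in range(4)]
two_halves = [0.0 if c < 2 else 1.0 for r in range(4) for c in range(4)]

print(morans_i(checkerboard, nb))  # strongly negative: neighbors always differ
print(morans_i(two_halves, nb))    # positive: similar values cluster together
```

Values near zero are what one would expect when the independence assumption is plausible. Clearly positive or negative values, as in the two toy layouts, suggest spatial structure that a standard ANOVA does not account for.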