Consumer perception of brand equity based on environmental sustainability in the Amazon: development and validation of a scale

Purpose: Brands have gradually become the core dimension and strategic asset of branding for organizations of all sizes, and many companies today adopt various forms of green marketing activities as part of their strategies. In this sense, this study aimed to develop and investigate the psychometric properties of precision and validity of a scale for evaluating brand equity based on the environmental sustainability of the Amazon in the Brazilian context. The scale was validated by adopting the norms described in the Standards for Educational and Psychological Testing. As samples for testing, eight companies from the Amazônia UP program participated in this research, with data interviews applied to 262 potential consumers. Findings: A scale with 23 items was constructed and validated; after evaluation by the specialists, 22 items were retained, divided into four dimensions: Perception of Quality - POQ (5 questions); Strategic Brand Positioning - SBP (10 questions); Willingness to Buy - WIB (2 questions); and Innovation in Retail - BRI (4 questions). Three items were modified because they presented CVC below 0.8, being accepted after the adjustments. The instrument items showed good internal consistency (0.877) across their domains. As for the DIF data, the scale works invariantly for older and younger people for almost all items, except item BRI 04. The cross-culturally validated scale can be applied to potential consumers aged between 19 and 64 years of both sexes.


Introduction
The innovative human being, like an innovative company, is always looking for solutions and can see opportunities in foreseen problems and risks, showing initiative and being proactive and visionary (Alencar et al., 2019; Loiola et al., 2016). According to Morit et al. (2018), the conditioning factors of success in entrepreneurship can be grouped into three sets: individual characteristics of the founder, conditions determined by the company's environment, and structural and strategic aspects of the new business. Morit et al. (2018) describe these factors as symbiotic when planning to change a business image focused on innovation: changes in the strategic characteristics of the new business image also require engagement in changing the business environment, transforming its entire conceptual structure. In this context, companies make significant changes to their brand equity, aiming to achieve innovative growth and a favorable competitive position, and seek to associate environmental preservation policies with their brand.
These companies seek to coordinate product and process innovation and to effectively disseminate their value proposition through sustainable management, using brand equity via strategic brand management. From this perspective, brand equity is the set of assets and liabilities associated with a brand, brand name, or symbol that can add to or subtract from the value of a product or service, supporting branding that incorporates sustainable practices in creating a solid brand. This strategy may influence reliability and a subsequent preference of consumers for products linked to the company (Rau & Bres, 2014).
In this sense, this study aims to develop and investigate the psychometric properties of precision and validity of a scale for evaluating brand equity based on the environmental sustainability of the Amazon in the Brazilian context. With this objective in mind, the research is organized as follows. First, we present the dimensions analyzed for the development of the scale. Next, we present the methodology adopted to validate and refine the proposed scale. Finally, we discuss the results obtained and provide our final remarks.

Study Design
Operationalization of Variables
Brand equity comprises a series of dimensions that the brand can develop, manage, and control, and different conceptual structures include different dimensions. This research takes as its theoretical reference the dimensions postulated by Aaker (1991, 1996): awareness, loyalty, perceived quality, and associations. The initial dimensions and findings from a scoping review were developed by cross-cultural ('forward') translation of selected items from the original instruments. Two translators performed the translation, one a linguist and the other a researcher from the area of the experiment. Two translators were selected because, when translating a scale, several equivalences with the original must be sought, such as cultural, semantic, technical, content, and criterion equivalence (Guillemin et al., 1993). Furthermore, the current literature in the area highlights the need to avoid a literal translation of the items, which can be achieved through the variability of translators (Callegaro Borsa et al., 2012).
After the translators reached consensus, a structured scale was drafted, initially containing 23 items dealing with the brand equity of brands related to environmental sustainability in Amazonia. The four dimensions proposed in this research were: Perception of Quality - POQ (5 questions); Strategic Brand Positioning - SBP (10 questions); Willingness to Buy - WIB (2 questions); and Branding Innovation - BRI (4 questions).
We investigated data from 7 brands that sell products using the Amazônia brand and that participate in the Amazônia UP program, developed by the Instituto do Homem e Meio Ambiente da Amazônia (IMAZON). The brands come from the following product categories: cosmetics, tourism, clothing, accessories, and food. To characterize brand perception, we gathered images from each brand's website, images of the products, posts on social networks, and descriptive texts, as well as the prototype of the e-commerce website, using data provided by the brands.
The total sample consisted of 263 potential consumers aged between 16 and 64 (ages 19 to 30 = 40%; 31 to 64 = 60%). Interviewees were selected using the quota method to guarantee homogeneity regarding age and sex, and participants included consumers from all regions of Brazil. The survey was carried out from March to July 2022.

Dimensions analyzed
The first dimension was initially composed of 10 (ten) questions translated from the scale developed by Grigorescu et al. (2019), which assess the interviewee's perception of the strategic positioning of the brand (SBP) in the Amazon region of Brazil, as well as the participants' perception of the importance of the company/startup's relationship with environmental sustainability in the region. Of the 21 (twenty-one) items of the original scale developed by Grigorescu et al. (2019), 10 (ten) were used, with minor language modifications.
The second dimension has five items and comprises the perception of product quality and the brand equity related to the environmental sustainability initially proposed by each brand (PEB). This construct was evaluated using five pictures/items of the brand, five pictures of the (digital) store, four pictures/items of the premium product (with a value above the market average), and five pictures representing the business's intention toward environmental sustainability in the Amazon (production/extraction process). Based on these figures/products, 5 of the 11 items dealing with perception of quality were adapted from the questionnaires developed by Netemeyer et al. (2004) and Yoo et al. (2017).
The third dimension contains three items, which seek to understand how willing participants would be to purchase products from companies related to environmental sustainability, given the data presented (willingness to buy - DCM). To measure willingness to buy, the 3 (three) items arranged in the work by Arnett et al. (2003) were used.
Finally, the fourth dimension assesses how much the business presents branding innovation (BRI). This dimension was set up by asking participants to answer four questions addressing innovation, uniqueness, and brand design; the questions were adapted from the questionnaires proposed by Lin (2015) and Netemeyer et al. (2004).

Cross-cultural translation of the items into the instrument
The translation and adaptation of the items used from the original questionnaires followed five essential steps recommended by the literature: (1) translation of the instrument from the source language to the target language, (2) synthesis of the translations, (3) analysis of the synthesized version by expert judges, (4) back-translation to the source language, and (5) pilot study (Gjersing et al., 2010; Hambleton, 1993). For the purposes of this research, phase (3) was performed during content validation by CVC for the full scale, and phases (4) and (5) after completion of content validation.

Scale validation
After the formulation and structuring of the items for the data collection scale, it was submitted for validation based on the tripartite model described in the Standards for Educational and Psychological Testing (Frey & Association, 2018), which organizes the validation process into the following phases:

Content validity
To measure the effectiveness of the proposed scale, the Content Validity Coefficient (CVC) method was used as a verification parameter. This technique involves seeking the consensus of a community of judges (that is, professionals effectively involved in the research area). The CVC method is based on the structured application of the knowledge and experience of experts in the field, assuming that the joint judgment of a given process, if well organized, is better than a single person's opinion (Gurgel et al., 2021). Following the item acceptance criteria suggested by the literature (Nakano & Siqueira, 2012; Saiful & Yusoff, 2019), items reaching at least 0.70 for each item (I-CVC) and 0.80 for the general scale (S-CVC), preferably above 0.90, were accepted; items below 0.60 were eliminated.
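As an illustration, the item-level and scale-level coefficients can be computed as sketched below. The judge ratings, the 1-5 rating scale, and the number of judges are hypothetical, a minimal sketch of the usual CVC formula rather than the study's actual data.

```python
import numpy as np

# Hypothetical ratings: 4 judges score each item on a 1-5 scale
# for one criterion (e.g. clarity). Values are illustrative only.
ratings = np.array([
    [5, 4, 5, 4],   # item 1
    [3, 4, 3, 3],   # item 2
    [5, 5, 4, 5],   # item 3
])
J = ratings.shape[1]          # number of judges
v_max = 5                     # maximum possible score

cvc_initial = ratings.mean(axis=1) / v_max   # CVC per item before correction
pe = (1 / J) ** J                            # error term for judge bias
cvc_item = cvc_initial - pe                  # corrected I-CVC
cvc_scale = cvc_item.mean()                  # S-CVC for the whole scale

accepted = cvc_item >= 0.70                  # per-item acceptance cutoff
```

With these toy ratings, item 2 falls below the 0.70 cutoff and would be revised, mirroring the correction step described above.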
The evaluation of the scale by the judges was performed blindly; the confidentiality of the participants guarantees greater fidelity to the results at the end of the experiment, allows better exposure of the results, and maintains their comprehensibility (Hasson et al., 2000). After validation of the items, their order was randomized using Excel, and data collection was carried out from March to May 2022.

Pilot public and initial application of the tool.
An online survey was chosen considering the pandemic context and the intended population coverage. According to Walter (2016), due to its advantages, such as speed and reach, researchers are increasingly using this format to reach specific populations while reducing costs. Participating respondents were selected using the quota method to ensure homogeneity in terms of age, gender (50% female and 50% male), and socio-professional position. In addition, participants included higher education students.
Pilot validation of the tool
Validity based on internal structure
Validity based on internal structure reflects the degree to which the structure of correlations between items conforms to the construct the test intends to measure (Frey & Association, 2018). Latent variables, or constructs, are phenomena that are not directly observable but can be inferred through the relationships between observable variables that can be measured directly, such as items on a scale (Reckase, 1985).
Therefore, in this research, an exploratory factor analysis (EFA) was carried out to evaluate the factor structure of the proposed scale. The analysis was implemented using a polychoric matrix, with Robust Diagonally Weighted Least Squares (RDWLS) as the extraction method (Asparouhov et al., 2010). The number of factors to retain was decided using Parallel Analysis with random permutation of the observed data (Timmerman & Lorenzo-Seva, 2011).
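The retention logic of parallel analysis can be sketched as follows. This simplified version uses Pearson correlations of simulated continuous data rather than the polychoric matrices and minimum-rank factoring of the Timmerman and Lorenzo-Seva procedure; all data and the two-factor structure are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

def parallel_analysis(data, n_perm=100):
    """Retain factors whose observed eigenvalues exceed the 95th
    percentile of eigenvalues obtained from column-wise permuted data."""
    n, p = data.shape
    obs_eig = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    perm_eig = np.empty((n_perm, p))
    for i in range(n_perm):
        permuted = np.column_stack(
            [rng.permutation(data[:, j]) for j in range(p)]
        )
        perm_eig[i] = np.linalg.eigvalsh(
            np.corrcoef(permuted, rowvar=False))[::-1]
    threshold = np.percentile(perm_eig, 95, axis=0)
    return int(np.sum(obs_eig > threshold))

# Toy data: six items driven by two uncorrelated latent factors
latent = rng.normal(size=(300, 2))
data = np.column_stack(
    [latent[:, 0] + 0.5 * rng.normal(size=300) for _ in range(3)]
    + [latent[:, 1] + 0.5 * rng.normal(size=300) for _ in range(3)]
)
n_factors = parallel_analysis(data)
```

Permutation destroys the inter-item correlations while preserving each item's marginal distribution, so only factors stronger than chance-level structure survive the comparison.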
To facilitate the interpretation of the retained factors (Damásio, 2012), the Robust Promin oblique rotation method was adopted (Lorenzo-Seva & Ferrando, 2019). Seeking better refinement of the scale, it was assumed that, for an item to be accepted, it must have a factor loading above 0.60 and cross-loadings lower than 0.20 (Howard, 2016).
To evaluate the adequacy of the model, that is, how well it reproduces the covariance or correlational structure of the variables, the following indices were used: the root mean square error of approximation (RMSEA), the Tucker-Lewis index (TLI), and the comparative fit index (CFI). According to Brown (2015), the RMSEA value must be less than 0.08, and its confidence interval should not reach 0.10; CFI and TLI values must be greater than 0.90, preferably 0.95. The model's reliability, that is, how consistently and reproducibly it measures the intended latent variables, was tested using the Composite Reliability indicator, which must have values greater than or equal to 0.70 (Hair et al., 2009).
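For reference, composite reliability can be computed directly from standardized loadings. The five loadings below are illustrative values, not the study's estimates, and error variances are assumed to equal 1 - lambda^2, as is usual with standardized solutions.

```python
import numpy as np

def composite_reliability(loadings):
    """Composite reliability (construct reliability) from standardized
    loadings, assuming error variance = 1 - loading**2."""
    lam = np.asarray(loadings, dtype=float)
    errors = 1.0 - lam ** 2
    return lam.sum() ** 2 / (lam.sum() ** 2 + errors.sum())

# Illustrative loadings for a five-item factor
cr = composite_reliability([0.75, 0.70, 0.80, 0.65, 0.72])
```

For this toy factor the result is about 0.85, above the 0.70 cutoff cited from Hair et al. (2009).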
The stability of the factors was evaluated using the H-index (Lorenzo-Seva & Ferrando, 2019), which assesses how well a set of items represents a common factor. H values range from 0 to 1; high values (> 0.80) suggest a well-defined latent variable, likely to be stable across different studies, while lower values indicate instability between studies (Ferrando & Lorenzo-Seva, 2018). Finally, the discrimination parameter and the item thresholds were evaluated using the Reckase (1985) parameterization. In this phase, the Factor software was used (Lorenzo-Seva, 2003).

Validity based on relationships with external measures
To evaluate evidence of validity based on relationships with other variables arranged in other questionnaires (Souza et al., 2017), as well as the asymmetry of information, convergent and discriminant validation tests were performed.

Convergent and discriminant validity
The convergent validity of the factors was evaluated using the average variance extracted (AVE), computed from standardized factor loadings and item measurement errors. The AVE estimates were obtained from the factor loadings and measurement errors produced by Confirmatory Factor Analysis with the Weighted Least Squares Mean and Variance adjusted (WLSMV) estimator. AVE values greater than 0.50 indicate good convergent validity (Maroco et al., 2014).
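A minimal sketch of the AVE computation, under the same standardized-loading assumption (error variance = 1 - lambda^2); the loadings are illustrative, not the study's estimates. Note that under this assumption the AVE reduces to the mean of the squared loadings.

```python
import numpy as np

def average_variance_extracted(loadings):
    """AVE from standardized loadings; error variance assumed 1 - loading**2,
    so the denominator equals the number of items."""
    lam = np.asarray(loadings, dtype=float)
    return (lam ** 2).sum() / ((lam ** 2).sum() + (1.0 - lam ** 2).sum())

# Illustrative loadings for a five-item factor
ave = average_variance_extracted([0.75, 0.70, 0.80, 0.65, 0.72])
```

Here the toy factor yields an AVE of about 0.53, just above the 0.50 criterion of Maroco et al. (2014).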

Validity of Invariance of Measures
Multi-group Confirmatory Factor Analysis (MGCFA) was performed to verify the invariance of the proposed model's measures in two samples selected at random, each comprising 50% of the collected data (T = Test and A = Sample) (J. C. Borsa & de Sousa, 2018). Comparative Fit Index (CFI) and chi-square difference tests were used to assess scale invariance. To assume measurement invariance, compared with the previous model, the CFI of the test model must not change by more than 0.01 (Cheung & Rensvold, 2002), and the χ² difference must present p > 0.05 (Milfont & Fischer, 2010; Zaiţ & Bertea, 2011).

Validity based on the item response pattern
The psychometric properties of the adapted instrument were evaluated using the Rasch model for polytomous data (Andrich, 1978). In addition, person and item reliability indicators and performance deviations were evaluated through the infit and outfit indices.
Regarding reliability indices, values greater than 0.70, or preferably greater than 0.80, are expected (Linacre, 2012). The infit and outfit indices quantify residuals for the items in the tested model (Bond & Fox, 2013a; Linacre, 2012). Infit assesses unexpected response patterns from individuals with a latent trait (theta) level equivalent to the item's difficulty level. Outfit, in turn, checks for unexpected response patterns from those with a theta (θ) level below or above the item's difficulty level. The infit and outfit estimates can be evaluated through mean-square (MNSQ) and z-standardized (ZSTD) indicators.
For MNSQ, the expected default value is 1. Values greater than 1 indicate that the estimates showed more variation than expected by the Rasch model (i.e., misfit), while values lower than 1 indicate less variation than expected (i.e., overfit). Acceptable MNSQ scores typically range from 0.7 to 1.3 logits (Bond & Fox, 2013b; Boone, 2016), but a less conservative range of 0.5-1.5 logits can also be used (B. D. Wright et al., 1994). As for ZSTD, values above |2| indicate that the estimate does not adequately fit the data.
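For the dichotomous Rasch case (the polytomous model used here adds category thresholds), the two mean-square statistics can be sketched as follows; the ability values, the item difficulty, and the simulated responses are all illustrative.

```python
import numpy as np

def rasch_fit(theta, b, responses):
    """Infit/outfit mean-square statistics for one dichotomous Rasch item.
    theta: person ability estimates; b: item difficulty; responses: 0/1."""
    p = 1.0 / (1.0 + np.exp(-(theta - b)))   # model-expected score
    var = p * (1.0 - p)                      # model variance per person
    z2 = (responses - p) ** 2 / var          # squared standardized residuals
    outfit = z2.mean()                       # unweighted mean square
    infit = (z2 * var).sum() / var.sum()     # information-weighted mean square
    return infit, outfit

# Toy data generated to agree with the model, so both indices should be near 1
rng = np.random.default_rng(0)
theta = rng.normal(size=500)
b = 0.0
responses = (rng.random(500) < 1 / (1 + np.exp(-(theta - b)))).astype(int)
infit, outfit = rasch_fit(theta, b, responses)
```

Because outfit is unweighted, a few very unexpected responses from persons far from the item's difficulty inflate it much more than infit, which is why the two indices are reported together.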
The DIF was evaluated using the Mantel procedure (B. Wright, 1994). Items whose difficulty estimates were statistically different for men and women (p ≤ 0.05) were inspected. The magnitude of the DIF was interpreted through the DIF contrast: values between |0.00| and |0.43| are considered low/negligible; values between |0.44| and |0.64| are considered moderate; and values above |0.64| are considered high (Linacre, 2012).
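Applying these cutoffs is straightforward; the helper below simply encodes Linacre's (2012) thresholds for the absolute DIF contrast in logits.

```python
def dif_magnitude(contrast):
    """Classify a DIF contrast (difference in item difficulty between
    groups, in logits) following Linacre's (2012) cutoffs."""
    c = abs(contrast)
    if c <= 0.43:
        return "negligible"
    if c <= 0.64:
        return "moderate"
    return "high"

# A contrast of 0.48 logits, for instance, is classified as moderate
label = dif_magnitude(0.48)
```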

Cross-cultural translation of the items into the instrument.
The translation and cultural adaptation process generated the Brazilian Portuguese version. No significant difficulties were encountered during translation, although some minor changes to grammatical structures were needed. Based on the translators' consensus, 23 adapted items were accepted: 19 polytomous (Likert scale 1-5), two dichotomous (Yes/No), and one qualitative question.

Validity by internal measures
Content validity
Based on the scores attributed by the judges, the CVC was calculated to assess their agreement on the scale. Items that did not reach a value greater than or equal to 0.8 in at least two of the established criteria (clarity, pertinence, and relevance) were corrected.
Five changes were suggested in the item statements. RSA 02, "Can you quickly remember the brand symbol or logo?", was corrected to "Do you think this brand symbol/logo is easy to remember?" (clarity = 0.430; pertinence = 0.650; relevance = 0.700). BRI 01, "When imagining the product, I realized that the quality of the products is high", was changed to "When imagining a product of this brand, I believe that the quality is high" (clarity = 0.896; pertinence = 0.963; relevance = 0.963). QLM 01, "I feel that when I purchase the services, I will receive the quality I expect.", was changed to "I feel that by purchasing the services of this brand, the quality will meet my expectations." POQ 02, "While viewing the services of the Brand, would you be able to differentiate from products from other companies?", was changed to "After seeing the image above, would you be able to differentiate branded services from other companies?" (clarity = 0.830; pertinence = 0.963; relevance = 0.896). WIB 03, "If the prices of the Brand were to rise a little, I would buy from the competition, even with the responsibility of environmental sustainability.", was changed to "If the brand's prices went up a little, I would buy from the competition, even though I know they care about environmental sustainability." (clarity = 0.896; pertinence = 0.963; relevance = 0.963). The other items listed to compose the scale met the established criteria.
After corrections, the Content Validity Coefficient for the full scale, considering the average CVCt for the three aspects judged (clarity, coherence, and relevance), was CVCt = 0.87. For clarity of items, CVCt was 0.84. For the relevance of the set of items to the construct the scale proposes to measure, CVCt was 0.91. Finally, for the items' relevance to the research, CVCt was 0.90.
Validity based on internal structure
Bartlett's test of sphericity (2926.2, df = 210; p < 0.001) and the KMO index (0.89) indicated the interpretability of the items' correlation matrix. For reliability, Cronbach's alpha presented acceptable values (α = 0.925). The parallel analysis suggested four factors as most representative of the data (Table 2). Note: four factors are retained because four factors from the actual data have a higher percentage of explained variance than random data.
The factor loadings of the items can be seen in Table 3, together with the Composite Reliability indices and estimates of the replicability of the factor scores (H-index; Ferrando & Lorenzo-Seva, 2018). From the analysis of the factor loadings, it was possible to identify which items belong to which dimension, that is, to reorganize the dimensions around the items that best explain them, and to identify which items did not reach sufficient factor loadings to be kept in the instrument. The factor structure yielded few cross-loadings, and loadings were higher on the initially expected factors than on the others. Only one item (Branding Innovation #01, "The brand offers more innovative products than other companies I know.") did not reach an adequate loading on the expected factor; items with factor loadings below 0.30 were excluded. Among the remaining items, only one (item 19) showed a pattern of cross-loadings and was allocated to the factor with the highest loading.
The composite reliability of the factors was acceptable (above 0.70) for the SBP factor and marginally acceptable for the other factors. For the measure of replicability of the factor structure (H-index; Ferrando & Lorenzo-Seva, 2018), the data suggest that all factors may be replicable in future studies (H > 0.80). The fit indices of the proposed instrument indicate a good statistical fit for the model: Comparative Fit Index (CFI) and Tucker-Lewis index (TLI) values above 0.95 indicate good instrument fit (Tucker & Lewis, 1973; West et al., 2012).
It is essential to highlight that the unidimensionality indicators did not support treating the scale as unidimensional: Unidimensional Congruence (UniCo) was 0.845 and Explained Common Variance (ECV) was 0.886, below the 0.95 threshold at which data can be treated as essentially unidimensional (Ferrando & Lorenzo-Seva, 2018), although the Mean of Item Residual Absolute Loadings (MIREAL) of 0.177 was below the 0.300 cutoff. Table 4 breaks down the quality and effectiveness of the factors; the Factor Determinacy Index (FDI) can be interpreted as the number of different factor levels that can be differentiated based on the factor score estimates. Discriminant validity results are shown in Table 5. According to Fornell and Larcker (1981), when the average variance extracted of each of two factors is greater than or equal to the square of the correlation between these factors, discriminant validity is attested; that is, the factors explain more of the variability of their own items than the variability explained by the other factor (J. Borsa & Damasio, 2018). Table 6 shows the configural, metric, and scalar multi-group invariance analysis results of the scale in the different groups investigated.
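The Fornell-Larcker criterion can be checked pairwise as sketched below. The AVE values are those reported in the text for SBP, POQ, WIB, and BRI, but the inter-factor correlation matrix is purely illustrative, since the actual correlations appear only in Table 5.

```python
import numpy as np

def fornell_larcker(ave, factor_corr):
    """Discriminant validity holds for a factor pair when both AVEs are
    greater than or equal to the squared inter-factor correlation."""
    ave = np.asarray(ave, dtype=float)
    r2 = np.asarray(factor_corr, dtype=float) ** 2
    k = len(ave)
    ok = np.ones((k, k), dtype=bool)
    for i in range(k):
        for j in range(i + 1, k):
            ok[i, j] = ok[j, i] = min(ave[i], ave[j]) >= r2[i, j]
    return ok

ave = [0.845, 0.964, 0.889, 0.677]          # SBP, POQ, WIB, BRI (reported)
corr = np.array([[1.0, 0.6, 0.5, 0.4],      # hypothetical correlations
                 [0.6, 1.0, 0.5, 0.3],
                 [0.5, 0.5, 1.0, 0.4],
                 [0.4, 0.3, 0.4, 1.0]])
valid = fornell_larcker(ave, corr)
```

With these hypothetical correlations, every squared correlation (at most 0.36) stays below the smallest AVE (0.677), so all pairs pass the criterion.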

Measurement invariance
The Comparative Fit Index difference tests (ΔCFI) indicate that the measure presents good invariance adjustment. To assume measurement invariance, the ΔCFI value must not be greater than 0.01. In this sense, it is possible to observe, through the ΔCFI values, that the scale structure is stable, with no response bias in the samples for the different groups.

Validity based on the item response pattern
The initial analyses showed adequate reliability indices for the items (Reliability = 0.98; Separation Index = 7.48) and for persons (Reliability = 0.90; Separation Index = 2.93), which suggests that the estimates obtained tend to be replicable. Regarding the performance deviations (infit and outfit), the instrument presents acceptable values (Bond & Fox, 2013b; Boone, 2016), both for the items (Infit MNSQ = 1.00, ZSTD = -0.70; Outfit MNSQ = 1.06, ZSTD = -0.13) and for the participants (Infit MNSQ = 1.10, ZSTD = -0.10; Outfit MNSQ = 1.06, ZSTD = -0.20).
The item-person map (Fig. 1) shows that, in general, participants presented a level of latent trait greater than the difficulty of the items (mean theta = 0.67; mean item difficulty = 0). The most difficult items were WIB01, "After seeing the brand image, and knowing that one of its differentials compared to competitors is that it preserves the Amazon, how much would you be willing to pay (more or less) for the services sold?" (d = 1.83), and WIB03, "If the brand's prices went up a little, I would buy from the competition, even though I know they care about environmental sustainability" (d = 1.43), while the easiest item was SBP 07, "I would consume this brand" (d = -0.96).
It is also possible to see in the graph that most of the items are grouped between -1.5 and approximately 1.5 logits, so there is greater precision in the theta estimates of participants allocated within this range of the continuum; the Test Information Curve supports this reading. Regarding the statistics of the items individually, Table 7 presents the threshold indicators. The thresholds of all items showed an increasing structure, as theoretically expected. Regarding the DIF for age groups, Table 8 presents the results found. Only BRI 04 ("Do you think this brand symbol/logo is easy to remember?") showed significant DIF, with a moderate effect size (DIF contrast = 0.48). The information functions for each item are best illustrated by the Item Characteristic Curves (ICCs), as shown in Fig. 3a. These curves show that item information is not a static quantity but is conditional on the item's difficulty levels. For example, as shown in the item-person map, item WIB01 requires a higher respondent endorsement level (θ) for positive responses (agree and strongly agree).

Discussion
The main objective of this study was to develop and investigate the psychometric properties of precision and validity of a scale for evaluating brand equity based on the environmental sustainability of the Amazon in the Brazilian context. Consumer attitudes towards brand equity and environmental sustainability have been highlighted in the literature as dimensions relevant to choice, persistence in choosing products, and business performance, and to adapting such dimensions for environmentally sustainable brands (Aprile & Punzo, 2022; Cowan & Dai, 2014; Defrancesco et al., 2017).
The cross-cultural adaptation process is essential when a scale is used in a different language, environment, and time, to reduce the risk of bias in a study (Herdman et al., 1998). To develop a more suitable and easily understood instrument, we adapted the procedures recommended by the Standards for psychometric instruments. In this study, the thematic adaptation of the items did not raise discrepant translation or cross-cultural adaptation adjustments.
The central adaptations made to the original questionnaires were the addition of three questions, the cross-cultural adjustment of terms specific to the field of study, and the conversion of the original profiles of the questionnaires to the objective proposed in this research. The validation phases followed the Standards' psychometric instrument validation guidelines.
The Standards have important general implications for comparative psychometric studies investigating consumer perceptions of environmentally sustainable brands: standardized scale assessment tools should demonstrate a similar interpretation of test items cross-culturally.
The changes suggested for items RSA 02, QLM 01, and WIB 03 produced more colloquial wordings that were easier for the interview participants to understand; a person filling out the questionnaire would thus have no great difficulty interpreting the question. The change in item BRI 01 involved replacing the term "perceive" with "believe", since imagining that a product has some quality attribute is semantically closer to belief than to perception. Given that the CVCi presented satisfactory results for all aspects judged, the change seemed adequate, suggesting that the term was semantically well adapted.
Regarding the content validation of the instrument, proposed by the Standards and performed using the content validity coefficient, the CVCt for the entire instrument presented satisfactory values of language clarity (CVCt = 0.84), practical relevance (CVCt = 0.91), and theoretical relevance (CVCt = 0.90), with a Content Validity Coefficient for the test as a whole of CVCt = 0.87 and an error PeJ = 0.003.
The exploratory factor structure confirmed the initial assignments proposed for constructing the instrument, establishing the final model with four factors. The Perception of Quality factor (POQ) was assigned five questions describing the perception of the interviewed potential consumer when the evaluated brand presented the product. The Strategic Brand Positioning factor (SBP) was assigned ten questions. The SBP dimension is theoretically based on the concept of strategic positioning described by López and Alcañiz (2000), who define the strategic positioning of the brand as the brand's performance in the search for the factors that make up business success, such as technological innovation and its relation to the ethical concerns of the consumer and the environment. Finally, the willingness-to-buy factor (WIB) is composed of 2 questions and the innovation-in-retail factor (BRI) of 4 questions.
In the composite reliability analysis, the PEB, DCM, and BRI factors presented values only marginally accepted by the literature. According to Valentini and Damásio (2016), the change in the homogeneity of the factor loadings may be due to the smaller number of items in the factor, which limits the interpretation of composite reliability results.
Validation based on relationships with external measures was evaluated using the Average Variance Extracted (AVE). The proposed scale obtained adequate values for the SBP (0.845), POQ (0.964), WIB (0.889), and BRI (0.677) factors. According to Maroco et al. (2014), the AVE value for each dimension must be greater than 0.50 to be accepted. Furthermore, the pairwise analysis of the factors showed satisfactory values to attest to the discriminant validity of the instrument.
The invariance data from the multi-group confirmatory factor analysis (MGCFA), through the configural, metric, and scalar invariance values, show that the measured structures and the scalar units are the same across the groups studied, since the Comparative Fit Index difference values (ΔCFI) are below 0.01. Factor invariance is an essential component of the iterative process of demonstrating measurement equivalence of latent constructs across groups, including gender and age subpopulations.
When measurement equivalence is present, the relationship between the latent variable and the observed variables remains unchanged across the analyzed populations (Raju et al., 2002). This premise succinctly demonstrates the importance of measurement equivalence in psychometric instruments, suggesting that it is possible to apply the scale when the objective is to analyze, in different groups, consumers' perceptions of brands related to environmental sustainability in the Amazon.
Validity based on the response pattern to items, assessed with Item Response Theory (IRT) techniques and combined with invariance analyses, allows evaluating the similarity of the items of a given instrument across different groups (Sireci, 2021). In this research, the IRT results support the validation of the instrument for this phase. Furthermore, the item reliability (0.98) and item separation index (7.48), as well as the person reliability (0.90) and person separation index (2.93), suggest that the estimates obtained tend to be replicable. The infit and outfit values of the items and of the participants prove acceptable (Bond and Fox, 2013b; Boone, 2016). Item WIB01 required the highest level of θ for positive endorsement (agree and strongly agree): the consumer is willing to pay a higher (premium) price only up to a limit of 10%. These results are close to those obtained by D'Amico et al. (2016) when analyzing the probability that a consumer will (or will not) pay a premium for a product with sustainable certification, and indicate that brand equity sustainability by itself does not promote the consumer's interest in paying more for the product, corroborating the findings of Rahmani et al. (2019). On the other hand, item SBP07, which presented the lowest level of θ for positive endorsement (agree and strongly agree), shows that the consumer, after knowing all the information about the brand and accepting the products' price plus a maximum 10% premium, would tend to consume the brands presented in the survey.
The instrument information curve concisely shows how well the items, in general, provide statistical information about the latent trait. Information and standard error are mathematical functions of each other: SE(θ) = 1/√I(θ). Higher information along the θ scale leads to lower standard errors, resulting in more accurate θ estimates. In addition, the scale information values can be used to determine the level of θ at which the instrument measures most reliably. The range of information obtained by the validated items, between 0 ≤ θ ≤ +4, indicates that the instrument best applies to potential consumers who tend to respond moderately (between disagree and agree).
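The inverse relation between information and standard error can be verified numerically; the information values below are arbitrary examples, not taken from the study:

```python
import math

def standard_error(information):
    """IRT relation: SE(theta) = 1 / sqrt(I(theta))."""
    return 1.0 / math.sqrt(information)

# Quadrupling the information halves the standard error of the theta estimate:
for info in (1.0, 4.0, 16.0):
    print(info, standard_error(info))
```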
The transformation of the θ endorsement estimates present in the instrument characteristic curve makes it possible to verify that the test participants, on average, responded 50% above the positive score of the scale (> 4, "agree" on the Likert scale). The transformation to the original scale metric provides a more familiar frame of reference for interpreting the sum of scores (Bean and Bowen, 2021b), has properties favorable to reliability in the analysis of the latent trait, and makes score-based approaches more accurate (Edwards and Wirth, 2009). However, no ceiling or floor effects, nor participants or items corroborating such an occurrence, were identified.
The results of the Differential Item Functioning (DIF) analysis support the premise that only item BRI04 ("Do you believe that this brand symbol/logo is easy to remember?") presents DIF across subjects of different age groups who have the same level of the latent variable (Callegaro Borsa et al., 2012). Furthermore, the literature suggests that short-term recognition losses in adults may be linked to the activity performed in situations of divided attention (Craik, 2018), a fact that may have affected participants in the older age group while answering the questionnaire in a digital environment.
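The logic of a uniform DIF check, comparing endorsement rates between age groups after matching respondents on the latent trait, can be illustrated with a crude standardization-style sketch. This is not the method actually used in the study, and the endorsement proportions below are entirely hypothetical:

```python
def dif_flag(strata, threshold=0.10):
    """Flag uniform DIF when, averaged over ability-matched strata, the
    endorsement gap between two groups exceeds a threshold.
    `strata` maps a score level to (prop_group_a, prop_group_b)."""
    gaps = [abs(a - b) for a, b in strata.values()]
    return sum(gaps) / len(gaps) > threshold

# Hypothetical endorsement proportions (younger, older), matched on total score:
bri04 = {"low": (0.40, 0.22), "mid": (0.65, 0.48), "high": (0.85, 0.70)}
sbp07 = {"low": (0.38, 0.36), "mid": (0.60, 0.62), "high": (0.84, 0.81)}
print(dif_flag(bri04), dif_flag(sbp07))
```

An item like the hypothetical `bri04` pattern would be flagged because older respondents endorse it less often at every matched ability level, mirroring the age-related DIF reported for BRI04.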
The final scale, with 22 items and available in electronic version, has good psychometric properties. The pilot questionnaire, based on the work of Netemeyer et al. (2004), Yoo et al. (2017), Grigorescu et al. (2019), and Lin (2015), comprised a total of 32 items covering five dimensions and was reduced to 22 items covering four dimensions. The dimensions contain items on the perception of product quality, the perceived positioning of the brand, the willingness of the potential consumer to buy, and the innovation perceived in the branding of the analyzed brand.
Although the objective of this research is to develop and validate a psychometric scale, the present findings contribute to the scientific literature on sustainable brands, as effective branding plans require broad knowledge about the perceptions of consumers and potential consumers regarding the dimensions responsible for the strategic positioning of brand equity.

Conclusion
In short, the proposed scale proves to be psychometrically sound and shows the expected relationships with known variables associated with the perception of potential consumers. Future studies should investigate further evidence of validity, including additional external measures, and apply the scale to other consumer groups to further test its usefulness. For future studies, we also recommend administering the instrument in person and comparing respondents' performance and the scale's psychometric properties with those obtained here.