Study area and sampling sites
Awash River Basin represents one of the most utilized landscape catchments in Ethiopia (Berhe et al., 2013; Degefu et al., 2013; Asfaw et al., 2014). It has an area of 112,696 km2, is home to nearly 15 million people, and consequently is one of the most important basins in the country (Tesfaye & Wolff, 2014; Englmaier et al., 2020). This river emerges in the central highlands in Ethiopia at an altitude of 3000 m.a.s.l (Berhe et al., 2013; Likasa, 2013; Bussi et al., 2021) and is subjected to severe pollution primarily from its tributary rivers (Berhe et al., 2013; Asfaw et al., 2014; Getachew et al., 2020). The river flows 1250 km (Englmaier et al., 2020) through KHD to the northeast along the Great Rift Valley where it finally drains into Lake Abe (Likasa, 2013) at the border of Ethiopia and Djibouti (Berhe et al., 2013; Tesfaye & Wolff, 2014). The upper Awash River basin (UARB), located between 8o and 9oN and 38oand 39oE (Reys, 2016) (Fig. 1), covers an area of 11,600 km2 including Addis Ababa city, the capital of Ethiopia (Bussi et al., 2021). The KHD, constructed in 1960 with an installed capacity of 42 MW (Degefu et al., 2015) and a dam height of 42 meters (Reis et al., 2011), is located in the UARB (Fig. 1).
Figure 1
There are more than 1000 meters of elevation drop from the source of Awash River at US1 (2386 m.a.s.l) to the downstream of KHD at DS5 (1345 m.a.s.l). Several tributary rivers and streams join the Awash River at different sections from the left and right sides (Fig. 1). Before sampling, 15 sites were selected with an average distance of 12 kilometers between two successive sampling points. Three study reaches were defined, upstream (US), midstream (MS), and downstream (DS) of KHD. Five sites in the US reach, selected near the source of the Awash River, are minimally impaired. However, five sites selected in the MS reach above the KHD, and five sites selected in the DS reach were highly degraded from anthropogenic activities (Degefu et al., 2013; Asfaw et al., 2014; Getachew et al., 2021a).
Aquatic habitat selection
Both macroinvertebrates and physicochemical samples were collected from riffle/run habitats. Habitats were defined using velocity to depth ratio validation and visual identification after Queensland_Government (2019). The habitat was categorized as riffle when the velocity to depth ratio is > 0.032/s, as a run for velocities between 0.0124/s and 0.032/s.
Macroinvertebrate sampling and identification
Two-minute kick samples (Getachew et al., 2021b) were collected with a D-frame dip net with a mesh size of 0.5mm (Barbour et al., 1999; Hauer & Resh, 2017) from riffle/run habitats. Immediately after the collection of samples, large leaves, sticks, and stones were individually rinsed and inspected for organisms and discarded. Samples were then preserved in 96% ethanol alcohol after Barbour et al. (1999) before being sorted and identified to the family level. By considering similar climatic conditions relative to other non-tropical regions, the South African benthic macroinvertebrate keys (Harrison, 2009; Schael, 2009) were used for taxa identification at the family level.
Physicochemical and benthic substrates data collection
The river width (m), depth (m), velocity (m/s), and temperature (oC) were measured at each site. To determine the river depth at each site, an average of three measurements was taken. The surface river velocity was computed by dividing the 10m distance, measured at each site for sampling, by the average time the floating object took to travel that distance and multiplying the quotient by the correction factor (0.85) to estimate the average water velocity of the stream (Ngoma & Wang, 2018). The GPS reading such as altitude (m.a.s.l) and the location of sites were recorded in the field using a global positioning system (GERMIN-72H). Substrate types such as bedrock, boulder, cobble, gravel, fine gravel, and sand and fines were visually characterized by changes in habitat features at each sampling site. These substrate types have been converted to a single substrate index (SI) values after Jowett and Richardson (1990) using the formula: SI = 0.08* bedrock + 0.07* boulder + 0.06* cobble + 0.05 *gravel + 0.04*fine gravel + 0.03* sand and fines.
The DO, pH, temperature, and EC were measured in-situ using a digital handheld portable multi-parameter (HACH HQ40D probe). The Measurements were done starting from downstream to the upstream to minimize disturbance of the sites before sampling. Similarly, turbidity was measured using a handheld turbidity meter. Nitrate-nitrogen and phosphate were measured using DR3900, HACH spectrophotometer in the laboratory. The BOD5 was also measured using the Winkler method (Carpenter, 2006).
Metric selection
Taxon richness (Mwedzi et al., 2020), Simpson index (Simpson, 1949), Shannon index, evenness, Margalef’s index (Washington, 1984; Clarke et al., 2001), the Biological Monitoring Working Party (BMWP) score (Armitage et al., 1983), and ETHbios score developed for Ethiopia (Lakew & Moog, 2015), %EPT, and EPT richness calculated from orders of Ephemeroptera, Plecoptera, and Trichoptera were some of the selected metrics. Also, the functional feeding groups (FFG) such as %Scrapers, %Shredders, %Collectors, %Filterers, %Predators (Wang et al., 2013), were considered to evaluate differences in the macroinvertebrate communities between the three study reaches in the UARB.
Data analysis
We used the relative abundances of the macroinvertebrate taxa to carry out the multivariate tests to control for the sampling times and habitat differences during sampling. Several multivariate analyses techniques were used to assess variations in macroinvertebrate community structure and composition between the three study reaches.
Analysis of Similarity (ANOSIM)
The ANOSIM (Clarke & Green, 1988), which allows a test of the null hypothesis that there are no macroinvertebrate community structure differences among groups of samples collected from three reaches (Clarke & Gorley, 2006), was carried out in PRIMER 6, using log(x + 1) transformed relative abundance data with 999 permutations. The differences among the reaches were measured by the global R statistic that was calculated as \(\text{R}=\frac{4({\stackrel{-}{\text{r}}}_{\text{B}}-{\stackrel{-}{\text{r}}}_{\text{W}})}{\text{n}(\text{n}-1)}\), where \({\stackrel{-}{r}}_{B}\) and \({\stackrel{-}{r}}_{W}\) are the mean rank between-group similarity and within-group similarity, respectively, and n is the total number of samples (Cao et al., 2005). ANOSIM calculates a test statistic R that ranges from − 1 to 1 (Chapman & Underwood, 1999). When the R is closer to 1, there is a good separation among the groups and a value closer to 0 shows weak separation (Chapman & Underwood, 1999; Clarke & Gorley, 2006) and the negative values indicate that the samples within a group are less similar to one another than to the samples of other groups, probably due to inappropriate sampling designs (Chapman & Underwood, 1999). An associated P-values to R-statistics in the analysis of ANOSIM highlights the significance level of the test (Clarke & Gorley, 2006).
Non-metric multidimensional scaling (MDS)
The MDS plot in PRIMER 6 (Clarke & Gorley, 2006) was also used based on the relative abundance of the macroinvertebrate taxa to assess the (dis)similarity of macroinvertebrate communities in the three study reaches. Also, the data were log(x + 1) transformed to improve the normality of the data before ordination. The taxa resemblance matrix was then analyzed using Bray-Curtis similarity with 999 permutations as a function of sampling sites. As a rule of thumb, an MDS ordination with a stress value equal to or below 0.1 is considered fair while values equal to or below 0.05 indicate a good fit (Clarke, 1993). Also, we used the SIMPER routine in PRIMER (Clarke & Gorley, 2006) on the relative abundance data to determine the percentage (dis)similarity in macroinvertebrate communities between the three reaches.
Shannon diversity index
The Shannon diversity index was also analyzed using the natural logarithm in PRIMER to compare the diversity of macroinvertebrates in each reach. The water quality of the river in the upstream, midstream, and downstream reaches of KHD is determined from information provided in Fernando et al. (1998) and Baliton et al. (2020) as very high, high, moderate, poor, and very poor with a corresponding Shannon diversity index of ≥ 3.5, 3.00-3.49, 2.50–2.99, 2.00-2.49, and ≤ 1.99, respectively.
Kruskal-Wallis test
The Kruskal-Wallis test was carried out to assess differences in continuous dependent variables by a categorical independent variable with assumptions that the samples drawn from the population are random and the observations are independent of each other. This test is used when the assumptions of one-way ANOVA are not met. The Kruskal-Wallis test was used to examine for significant differences in a range of metrics among the three study reaches. The metrics analyzed using the Kruskal-Wallis test include taxa richness, total abundance, the BMWP (Armitage et al., 1983), ETHbios (Aschalew & Moog, 2015), Shannon-Wiener index, evenness, Margalef’s index, Simpson index, %Scrappers, %Shredders, %Collectors, %Filterer and %Predators, %EPT, and EPT richness. A post hoc test, for the statistically significant metrics results in a Kruskal-Wallis test, was then run to determine where the differences lay using a Bonferroni-corrected p-value to reduce the instance of false positives. The Bonferroni corrected P-values were analyzed in SPSS version 20 statistical software.
Canonical Correspondence Analysis (CCA)
Detrended Correspondence Analysis (DCA) was applied using CANOCO 4.5 (Ter Braak & Smilauer, 2002) to examine whether Redundancy Analysis (RDA) or Canonical Correspondence Analysis (CCA) would be appropriate (Ter Braak & Wiertz, 1994) to analyze the data. The DCA yielded gradient lengths that were higher than three standard deviations, therefore CCA was used for the data analysis. Macroinvertebrate abundance data were log-transformed, log(x + 1), before analysis to obtain homogeneity of variance. Environmental variables, except for pH, were square-root transformed in PRIMER statistical software (Clarke & Gorley, 2006) and standardized since the variables were measured in different units and transferred to CANOCO 4.5 model for analysis. In the CCA model, forward selection of environmental variables with Monte Carlo permutations of 499 was used to determine whether the variables exerted a significant P-value of < 0.05 impact on macroinvertebrate distributions. The statistical significance of eigenvalues and species-environment correlations generated by the CCA were tested using Monte Carlo permutations.
BIOENV routine
The BIOENV routine in PRIMER, the typical setup in the exploration of environmental variables that best correlate to sample similarities of the biological community (Clarke & Gorley, 2006), was also used in this study. In this case, the similarity matrix of the community is fixed, while subsets of the environmental variables are used in the calculation of the environmental similarity matrix. A Spearman correlation coefficient was then calculated between the two matrices and the best subset of environmental variables was identified with a 999 permutation test. The similarity matrix of environmental data was based on normalized "Euclidean" distances to remove the effect of different units between parameters and the biological variable matrix was based on a Bray-Curtis similarity measure on the log (x + 1) transformed data (Dixon, 2003; Clarke & Gorley, 2006).