Soil Science Society of America Journal 63:1829-1835 (1999)
© 1999 Soil Science Society of America
DIVISION S-6-SOIL & WATER MANAGEMENT & CONSERVATION
Variability in Soil Erosion Data from Replicated Plots
Mark A. Nearing (a),
Gerard Govers (b), and
L. Darrell Norton (a)
a National Soil Erosion Lab., USDA-ARS, West Lafayette, IN 47907-1196 USA
b Lab. for Experimental Geomorphology, Physical and Regional Geography, Katholieke Universiteit Leuven, Redingenstraat 16, 3000 Leuven, Belgium
nearing{at}ecn.purdue.edu
ABSTRACT
Understanding and quantifying the large, unexplained variability in soil erosion data are critical for advancing erosion science, evaluating soil erosion models, and designing erosion experiments. We hypothesized that it is possible to quantify the variability between replicated soil erosion field plots under natural rainfall, and thus to determine the principal factor or factors that correlate with the magnitude of that variability. Data from replicated plot pairs for 2061 storms, 797 annual erosion measurements, and 53 multi-year erosion totals were used, representing thirteen different soil types and site locations. The relative differences between replicated plot pair data tended to be smaller for larger magnitudes of measured soil loss, indicating that soil loss magnitude was a principal factor explaining the variance in the soil loss data. Using this result as a working assumption, we estimated the coefficient of variation of within-treatment, plot replicate values of measured soil loss. The coefficient of variation between replicates decreased as a power function of measured soil loss, and this relationship was independent of whether the measurements were event, annual, or multi-year values. Coefficients of variation ranged from on the order of 14% for a measured soil loss of 20 kg/m² to greater than 150% for a measured soil loss of less than 0.01 kg/m². These results have important implications both for experimental design and for using erosion data to evaluate the prediction capability of erosion models.
INTRODUCTION
DATA FROM SOIL EROSION PLOTS contain a great amount of unexplained variability, which is a critical consideration both in using erosion data to evaluate the performance of soil erosion models and in experimental design. This variability arises from both natural and measurement variability. When measured rates of erosion are compared to predicted values, a portion of any difference between the two is to be expected from model error, but a portion will also be due to the unexplained deviation of the measured sample value from the representative mean value for the particular treatment. Variability is also the essential criterion for estimating the number of experimental replicates needed to establish confidence intervals on treatment means and to compare experimental treatments. The quantification of variance in soil erosion data is therefore critical to the advancement of erosion science.
Unfortunately, knowledge of variability in soil erosion data is quite limited. Only one erosion study has been conducted with enough replicated erosion plots to allow an in-depth analysis of variability. Wendt et al. (1986) measured soil erosion rates on 40 cultivated, fallow experimental plots located at Kingdom City, MO, in 1981. All 40 plots were cultivated and otherwise treated identically. The coefficients of variation for the 25 storms ranged from 18 to 91%, with 15 of the storms below 30%. The more erosive storms tended to show less variability: of the 15 storms with mean erosion rates greater than 0.1 kg/m² (1.0 Mg/ha), 13 had coefficients of variation of less than 30%. The results of the study indicated that "only minor amounts of observed variability could be attributed to any of several measured plot properties, and plot differences expressed by the 25 events did not persist in prior or subsequent runoff and soil loss observations at the site." While the study of Wendt et al. (1986) is informative, it is limited in scope to a single treatment for a single year at a single site.
Ruttimann et al. (1995) reported a statistical analysis of data from four sites, each with five to six reported treatments. Each treatment had three replications. Reported coefficients of variation of soil loss ranged from 3.4 to 173.2%, with an average of 71%. The authors concluded by suggesting "as many replications as possible" for erosion experiments.
Part of evaluating the performance of soil erosion models involves comparing model predictions to data from measured plots or small watersheds. Without some knowledge of the level of variability in erosion data, it is difficult to separate the portion of the observed error that comes from the model prediction from the portion that results from unexplained variability in the measured value itself. As a result, it is difficult to define quantitative criteria for model acceptability when comparing model results to measured data. Risse et al. (1993) applied the Universal Soil Loss Equation (USLE) to 1700 plot-years of data from 208 natural runoff plots. Annual values of measured soil loss averaged 3.51 kg/m², with an average magnitude of prediction error of 2.13 kg/m², or approximately 60% of the mean. Zhang et al. (1996) applied the Water Erosion Prediction Project (WEPP) computer simulation model to 290 annual values and obtained an average measured soil loss of 2.18 kg/m², with an average magnitude of prediction error of 1.34 kg/m², or approximately 61% of the mean. In both cases, the relative errors tended to be greater for the lower soil loss values. Given these results and others from similar studies (Liu et al., 1997; Rapp, 1994; Govers, 1991), the questions remain: are the predictions good enough relative to the measured data, and what is an acceptable and expected level of model prediction error?
As mentioned above, comprehensive data with a large number of replications for evaluating unexplained variance in measured soil erosion data are lacking. However, from a large collection of differences between replicated plots it is possible to estimate the population variance of real-world, replicated erosion plots. The objective of this study was to quantify variability between replicated soil erosion field plots under natural rainfall. The procedure was to evaluate the differences between soil loss values from a large number of replicated plot pairs and to use that information to estimate population variances for soil erosion plots in general. We also make recommendations and discuss the implications of the results for experimental design.
Methods and materials
Soil Erosion Plot Data
The soil erosion plot data used for this study were taken from the repository of the USDA-ARS National Soil Erosion Research Laboratory located at West Lafayette, IN. Event values of soil loss came from seven sites in the USA (Table 1), for a total of 2061 replicated storm events. Annual values of soil loss were used from 13 sites (Table 2), for a total of 797 replicated pairs of plots. Multi-year totals of soil loss were taken as the sums of the annual values. The plots ranged from 2 to 8 m in width, and most were 22 m in length. Slopes ranged from 3 to 16% in gradient. A more detailed description of the plots used in this study may be found in Risse et al. (1993).
Table 1 Site, cropping and management, and data collection period for the replicated plot data for individual events

Table 2 Site, cropping and management, and data collection period for the replicated plot data for annual soil loss values
Relative Differences in Replicated Erosion Plot Data
The first part of this research, as stated above, was intended to quantify the differences between soil loss values from a large number of replicated plots. Because of the great range of the measured soil loss values used in this study, we chose to use a relative difference term, Rdiff (non-dimensional), which we define as

$$R_{diff} = \frac{M_1 - M_2}{M_1 + M_2} \qquad [1]$$

where M1 and M2 are the paired values of soil loss from two replicate plots. Rdiff may range from -1 to +1, and when $M_1 = M_2$, then $R_{diff} = 0$. For each pair of plots, A and B, two values of Rdiff were computed. For the first Rdiff value, the measured soil loss from Plot A is designated as M1 and that from Plot B as M2. For the second Rdiff value, the measured soil loss from Plot B is designated as M1 and that from Plot A as M2.
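A minimal Python sketch of Eq. [1], computing both orderings for a plot pair as described above. The function names are illustrative, and raising an error when both plots record zero soil loss is an assumption: the text does not say how such pairs were handled.

```python
def relative_difference(m1, m2):
    """Eq. [1]: Rdiff = (M1 - M2) / (M1 + M2), bounded by -1 and +1."""
    total = m1 + m2
    if total == 0:
        # Assumption: a pair with zero loss on both plots has no defined Rdiff.
        raise ValueError("Rdiff is undefined when both plots have zero soil loss")
    return (m1 - m2) / total


def both_orderings(loss_a, loss_b):
    """Return the two Rdiff values for plots A and B: (A as M1, B as M1)."""
    return relative_difference(loss_a, loss_b), relative_difference(loss_b, loss_a)


if __name__ == "__main__":
    print(both_orderings(0.75, 0.25))  # (0.5, -0.5)
    print(both_orderings(0.25, 0.0))   # (1.0, -1.0): M2 = 0 gives Rdiff = +1
```

The second call shows why only $R_{diff} = +1$ can appear on the log-scaled plots: its mirror value, $-1$, belongs to $M_1 = 0$.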
Values of Rdiff are plotted against M1 in Fig. 1, 2, and 3 for event, annual, and multi-year totals, respectively. Note that although the two Rdiff values for a pair of plots are necessarily equal in absolute value, with one positive and the other negative, the graphs are not, and cannot be, symmetrical. This is because the Rdiff values are plotted against M1, which differs for the two members of the pair. Also note that while there are several values of $R_{diff} = 1$, which indicates that M2 equals zero, no points are plotted for $R_{diff} = -1$. In the latter case M1 equals zero (see Eq. [1]), and since the x-axis of the graphs is logarithmic, values of $R_{diff} = -1$ cannot be plotted.

Fig. 1 Relative differences in event data of soil loss between replicated plots, Rdiff, as computed by Eq. [1] vs. the measured soil loss value, M1 (kg/m²), for the data from Table 1

Fig. 2 Relative differences in annual data of soil loss between replicated plots, Rdiff, as computed by Eq. [1] vs. the measured soil loss value, M1 (kg/m²), for the data from Table 2

Fig. 3 Relative differences in multi-year data of soil loss between replicated plots, Rdiff, as computed by Eq. [1] vs. the measured soil loss value, M1 (kg/m²), for the data from Table 2
The most apparent trend in Fig. 1, 2, and 3 is that the spread of the Rdiff values about the Rdiff = 0 line decreases with increasing measured soil loss, M1. This means that for large measured soil losses, the relative differences between measured values from replicated plots tend to be smaller. It follows, as we show mathematically in the next section, that the coefficient of variation of measured soil loss among replicates also decreases with increasing soil loss.
Within-Treatment Variance of the Replicated Plots
Our goal was to compute the within-treatment variance of soil loss from pairs of replicated data. The essential problem is how to estimate the variance of the measured values from the variance of differences between replicated plots, when the two plots in each pair come from the same population but different pairs come from different populations. To address this problem, we must assume that the variances of the replicate populations are a function only of the magnitude of soil loss. This assumption is not comprehensive, because variance among replicates is undoubtedly a function of other factors as well. However, no other information on the controls of variation is available. Also, results from this study suggest that the magnitude of soil loss does capture a significant part of the variation in population variance (Fig. 1, 2, and 3). Thus, we do not consider this assumption to be restrictive for our purposes. The second assumption of the analyses is that the two replicate plots are members of the same population, which is an implicit assumption in any replicated experiment. In practice there are certainly differences between replicates in any real-world case. On the other hand, these are the same differences that the experimentalist faces in conducting fieldwork, and that the modeler faces in using the data to evaluate a model. Also, the study of Wendt et al. (1986) suggested that even when extensive plot properties are known, they are not necessarily of significant value in explaining the observed variance between plots. Thus, we do not consider this second assumption to be restrictive in terms of real-world application.
Suppose we take a random pair of values, M1 and M2, from a distribution of measured soil loss values (Fig. 4). In this case, we can assume that the variances $\sigma^2_{M_1}$ and $\sigma^2_{M_2}$ are equal and that the covariance $\sigma_{M_1,M_2}$ is zero. Since (Walpole and Myers, 1993)

$$\sigma^2_{M_1 - M_2} = \sigma^2_{M_1} + \sigma^2_{M_2} - 2\sigma_{M_1,M_2} \qquad [2]$$

substituting $\sigma^2_{M_1} = \sigma^2_{M_2} = \sigma^2_M$ and $\sigma_{M_1,M_2} = 0$ gives $\sigma^2_{M_1 - M_2} = 2\sigma^2_M$, so we can estimate the variance of the population, $\sigma^2_M$, from $\sigma^2_{M_1 - M_2}$ as

$$\sigma^2_M = \frac{\sigma^2_{M_1 - M_2}}{2} \qquad [3]$$

over small ranges of M within which $\sigma^2_M$ changes minimally.

Fig. 4 Theoretical, schematic diagram of a probability density function for a population of soil losses, with points indicated for illustrative purposes of two soil loss values M1 and M2
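The step from Eq. [2] to Eq. [3] can be checked numerically. The sketch below is an illustration under assumed conditions, not part of the original analysis: it draws independent pairs from a single hypothetical lognormal population (equal variances, zero covariance) and compares half the sample variance of the differences with the known population variance. The lognormal parameters are arbitrary choices.

```python
import math
import random
import statistics


def variance_from_pair_differences(pairs):
    """Eq. [3]: estimate sigma^2_M as half the sample variance of M1 - M2."""
    diffs = [m1 - m2 for m1, m2 in pairs]
    return statistics.variance(diffs) / 2.0


if __name__ == "__main__":
    rng = random.Random(42)
    mu, sigma = -1.0, 0.5  # hypothetical lognormal parameters (log scale)
    # Both members of each pair come from the same population, independently.
    pairs = [(rng.lognormvariate(mu, sigma), rng.lognormvariate(mu, sigma))
             for _ in range(200_000)]
    # Known variance of a lognormal distribution with these parameters.
    true_var = (math.exp(sigma ** 2) - 1.0) * math.exp(2.0 * mu + sigma ** 2)
    print(f"population variance: {true_var:.5f}")
    print(f"Eq. [3] estimate:    {variance_from_pair_differences(pairs):.5f}")
```

The two printed values should agree closely; if the pair members were correlated (nonzero covariance in Eq. [2]), halving the difference variance would no longer recover $\sigma^2_M$.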
To estimate the variances, the data for the replicates were ranked in increasing order of M1, and the coefficient of variation, CVM, was estimated by Eq. [3] and [4] sequentially for groups of 90, 30, and 25 replicated pairs for the event, annual, and multi-year data, respectively. The group size was a compromise between the desire to include as many pairs as possible in each computation, to obtain the best estimate of CV for each group, and the desire to keep the spread of M1 values within each group as small as possible. Larger groups could be used for the data sets with more data points. The choice of the actual group size for each data set was ultimately a subjective decision.
The procedure for the computation of CVM was as follows. After the data pairs were ranked in increasing order of M1, the difference M2 - M1 was computed for each pair. The sample variance of these differences, $s^2_{M_2 - M_1}$, was computed for pairs 1 through 90 (in the case of the event data), as well as the average of M1, $\bar{M}_1$, for that same group. The value of $\sigma^2_M$ was then computed from $s^2_{M_2 - M_1}$ by Eq. [3] (the variance of M2 - M1 equals that of M1 - M2). The computation

$$CV_M = \frac{\sigma_M}{\bar{M}_1} \qquad [4]$$

is then straightforward. The second value of CVM is computed for the group of pairs 2 through 91, and so forth. CVM was plotted against $\bar{M}_1$ for the event, annual, and multi-year data (Fig. 5).
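The ranking and moving-group procedure can be sketched as follows. This is a minimal illustration assuming the replicate data are supplied as (M1, M2) tuples; whether each plot pair enters once or in both orderings (as for Rdiff) is left to the caller, since the text does not state which was used for the CV computation. Function and variable names are hypothetical.

```python
import statistics


def moving_window_cv(pairs, window=90):
    """Estimate CV_M over overlapping groups of `window` replicate pairs.

    pairs  -- (M1, M2) soil losses (e.g. kg/m^2) for replicated plot pairs
    window -- group size (90, 30 and 25 in the text for event, annual and
              multi-year data); assumes mean M1 > 0 in every group
    Returns a list of (mean M1, CV_M) tuples, one per group.
    """
    if window < 2:
        raise ValueError("window must contain at least two pairs")
    ranked = sorted(pairs, key=lambda p: p[0])           # rank by increasing M1
    if len(ranked) < window:
        raise ValueError("fewer pairs than the window size")
    results = []
    for start in range(len(ranked) - window + 1):        # pairs 1-90, 2-91, ...
        group = ranked[start:start + window]
        diffs = [m2 - m1 for m1, m2 in group]            # M2 - M1 for each pair
        var_m = statistics.variance(diffs) / 2.0         # Eq. [3]
        mean_m1 = statistics.fmean(m1 for m1, _ in group)
        results.append((mean_m1, var_m ** 0.5 / mean_m1))  # Eq. [4]
    return results
```

For instance, `moving_window_cv(pairs, window=30)` would correspond to the grouping described for the annual data. Because consecutive groups share all but one pair, the resulting points are strongly autocorrelated, a limitation taken up in the Discussion.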

Fig. 5 Estimated coefficients of variation, CVM (fraction), as computed by Eq. [3] and [4] from sequences of soil loss differences from replicated plots as a function of the measured soil loss for event, annual, and multi-year data
The analysis produced two important results: (i) the logarithm of CVM was linearly related to the logarithm of $\bar{M}_1$ for all three sets of data (Fig. 5), and (ii) the log-linear (power) relationship was not statistically different (in terms of either slopes or intercepts) among the event, annual, and multi-year data. Thus, we were able to combine and sequentially order the data from all three data sets, estimate CVM as before using Eq. [3] and [4] with groups of 90 sequentially paired plot data, and graph CVM against $\bar{M}_1$. The resultant relationship was:
 | (5) |
 | (6) |
where CVM is expressed as a fraction and M is in units of kg/m² (Fig. 6).

Fig. 6 Estimated coefficients of variation, CVM (fraction), as computed by Eq. [3] and [4] from sequences of soil loss differences from replicated plots as a function of the measured soil loss for the data combined
The high coefficient of determination of the log-linear regression (Eq. [5]) lends support to the assumption made previously that the variance between replicates is largely a function of the magnitude of the soil loss.
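A power relationship of the kind in Eq. [5] and [6] is obtained by ordinary least squares on log-transformed values. The sketch below fits ln(CV) = ln(a) + b ln(M) and returns a, b, and r². It does not reproduce the published coefficients; the demonstration data are generated from a hypothetical power law (a = 0.5, b = -0.3) chosen only for illustration. Because moving-group CV estimates overlap, the points are not independent, so standard regression significance tests on such points would be optimistic.

```python
import math


def fit_power_law(m_values, cv_values):
    """Fit CV = a * M**b by least squares on ln(CV) versus ln(M).

    Returns (a, b, r_squared). All inputs must be positive.
    """
    xs = [math.log(m) for m in m_values]
    ys = [math.log(cv) for cv in cv_values]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope in log-log space (the exponent)
    a = math.exp(y_bar - b * x_bar)     # back-transformed intercept (the coefficient)
    r_squared = sxy ** 2 / (sxx * syy) if syy > 0 else 1.0
    return a, b, r_squared


if __name__ == "__main__":
    # Hypothetical, noise-free data from CV = 0.5 * M**-0.3 (illustration only).
    m = [0.001, 0.01, 0.1, 1.0, 10.0]
    cv = [0.5 * x ** -0.3 for x in m]
    print(fit_power_law(m, cv))
```

Natural and base-10 logarithms give the same exponent b; only the way the intercept is back-transformed differs.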
Replicate Variability and Numbers of Replicates for Experiments
Recommended numbers of replications for experimental design purposes can be estimated on the basis of confidence intervals about the mean (Walpole and Myers, 1993; Ott, 1977). The number of replicates, nR, required to determine the mean value of soil loss, M, can be estimated as

$$n_R = \left(\frac{z_{\alpha/2}\, CV_M}{\epsilon}\right)^2 \qquad [7]$$

where $z_{\alpha/2}$ is the distance in each direction from the mean on the standard normal curve that encloses an area of $(1 - \alpha)$ under the curve, and $\epsilon$ is the desired degree of accuracy expressed as a fraction of the mean. For example, if the coefficient of variation is expected to be 0.50, and we want to design an experiment for which we can be 90% confident that the population mean of soil loss is within plus or minus 40% of the measured mean, then $z_{\alpha/2} = 1.645$, $CV_M = 0.50$, and $\epsilon = 0.40$. Application of Eq. [7] results in a value of nR of 4.2. Thus we would want to design an experiment with at least five replications for this case or, alternatively, lower our expectations regarding the accuracy of the measured mean. One might, for example, choose to use three plots and expect that the population mean will be within plus or minus 47% of the measured value.
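Eq. [7] and its inverse (the accuracy attainable with a fixed number of plots) are easy to script. In this minimal sketch, the z value is taken from Python's standard-library normal distribution rather than a table, so it differs from 1.645 only in later decimal places; the function names are illustrative.

```python
import math
from statistics import NormalDist


def z_two_sided(confidence):
    """z_{alpha/2} for a two-sided interval with the given confidence level."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def replicates_needed(cv, accuracy, confidence=0.90):
    """Eq. [7]: n_R = (z_{alpha/2} * CV_M / epsilon)**2, before rounding up."""
    return (z_two_sided(confidence) * cv / accuracy) ** 2


def achievable_accuracy(cv, n_replicates, confidence=0.90):
    """Eq. [7] solved for epsilon: z_{alpha/2} * CV_M / sqrt(n_R)."""
    return z_two_sided(confidence) * cv / math.sqrt(n_replicates)


if __name__ == "__main__":
    n = replicates_needed(cv=0.50, accuracy=0.40)
    print(f"n_R = {n:.1f} -> use {math.ceil(n)} plots")
    print(f"3 plots: within +/- {achievable_accuracy(0.50, 3):.0%}")
```

Run as a script, this reproduces the worked example: nR of about 4.2 (rounded up to five plots) and about 47% for three plots. Like Eq. [7], it uses the normal quantile; for very small numbers of plots a Student t quantile would give somewhat larger requirements.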
It is important to note that Eq. [7] is to be used for estimating mean experimental values for a single treatment. If one instead wishes to compare means between treatments, the number of replicates for each treatment must be approximately twice that given by Eq. [7]. For our example above, if we wanted to be able to differentiate between two treatments whose means were 40% different, we would want $n_R = 2 \times 4.2 = 8.4$, or at least nine, replications of each treatment. Neither of these examples takes into consideration testing for type II errors, in which case the required number of replications might be greatly increased (Walpole and Myers, 1993; Ott, 1977).
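The doubling rule for comparing two treatments follows because the difference between two independent treatment means, each based on n plots with the same variance, has twice the variance of a single mean. A minimal sketch, assuming equal coefficients of variation in both treatments and, like the text, ignoring type II error; the function name is hypothetical:

```python
import math
from statistics import NormalDist


def replicates_per_treatment(cv, detectable_difference, confidence=0.90):
    """Approximate plots per treatment to separate two means that differ by
    `detectable_difference` (a fraction of the mean): twice Eq. [7].

    Assumes equal CV in both treatments and ignores type II error.
    """
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    # Var(mean_1 - mean_2) = 2 * sigma^2 / n, hence the factor of 2.
    return 2.0 * (z * cv / detectable_difference) ** 2


if __name__ == "__main__":
    n = replicates_per_treatment(cv=0.50, detectable_difference=0.40)
    print(f"{n:.1f} -> {math.ceil(n)} plots per treatment")
```

With CV = 0.50 and a 40% difference, rounding up gives nine plots per treatment, matching the example in the text.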
Equations [6] and [7] may be used together to estimate recommended numbers of replicates for erosion studies. However, a problem in application immediately arises: Eq. [6] requires prior knowledge of the very quantity we are trying to measure, the soil loss M. The number of replicates will vary greatly depending on the measured soil loss, because the coefficient of variation varies greatly with the magnitude of soil loss. This problem is unavoidable. A second unavoidable problem in application arises because Eq. [6] represents a mean relationship between measured values and variance. Individual data sets may have considerably more (or less) variance than estimated (Fig. 6).
Discussion
Given the limitations in the information presented here on using our variability estimates for experimental design purposes, use of Eq. [6] and [7] can only be considered a guide rather than an absolute. Nonetheless, the results do provide basic information which can be helpful. For example, we now have quantitative evidence that more plots are necessary to obtain the same level of confidence in erosion data when erosion rates are low. For a certain number of available plots, the experimentalist can estimate the level of differences between treatments that one might expect to differentiate statistically.
One limitation of the analyses conducted here is related to the group sizes used for estimating the variance of (M1 - M2). A different choice of group sizes would affect the level of fit between the coefficient of variation and soil loss magnitude (Fig. 6); smaller group sizes would result in a lower r² value. Thus the reported level of fit for Fig. 5 and 6 should not be taken as an absolute, but only as an estimate for the specific method used here. There are also some problems with the regression because the observations are not independent (the first observation point for the event data is based on data pairs 1 to 90, the second on pairs 2 to 91, and so on). Also, the first and last data pairs are used only once, while pairs in the middle of the range are used 90 times, so the extreme pairs are not given the same weight in the regression as the mid-range pairs. While these limitations may compromise the accuracy (r²) of the relationships derived here between the coefficient of variation and soil loss magnitude, there is no doubt about the basic results: i.e., the magnitudes and trend of the coefficients of variation shown in Fig. 6 are essentially correct for the data used in this study.
In the case where erosion rates are expected to be very low, the estimated variance will show that a precise measure of soil loss is not practical. When, for example, the coefficient of variation is of the order of 100% (Fig. 6), one theoretically would be required to have 68 plot replicates in order to be 90% confident that the sample mean falls within 20% of the population mean. In that case, one might choose to use five replicates to be 90% confident that the sample mean falls within 74% of the population mean. What this implies in a practical sense is that small differences in field conditions may not be measurable, and that one might limit experiments to those which evaluate major differences. One can also make the argument from the practical perspective that there may be little interest in differentiating small differences in treatments when erosion rates are low in any case.
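The figures in this example follow directly from Eq. [7] with $CV_M = 1.00$ and $z_{\alpha/2} = 1.645$:

$$n_R = \left(\frac{1.645 \times 1.00}{0.20}\right)^2 \approx 67.7 \;\Rightarrow\; 68 \text{ replicates}, \qquad \epsilon = \frac{1.645 \times 1.00}{\sqrt{5}} \approx 0.74$$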
The experimentalist also now knows that time may be used to his/her benefit when differentiation between treatments is desired. Since the relative variance (coefficient of variation) between replicates decreases with the measured amount of soil loss regardless (apparently) of the period over which the data are collected, one might choose to measure soil loss over a longer period of time to reduce the relative variance between plots. Moreover, Fig. 6 suggests that not only does the coefficient of variation decrease as measured soil loss increases, but its scatter also decreases with increasing measured soil loss (although we did not attempt to quantify this observation here). If so, one might be more assured that the relationship expressed by Eq. [6] will be reliable at greater soil losses.
These results also have significant implications for evaluating erosion models. Scientists very often attempt to test a model by comparing model predictions against data from erosion plots or similar data (Zhang et al., 1996; Risse et al., 1993; Rapp, 1994; Liu et al., 1997). The question in such cases inevitably arises as to whether the predictions are or are not satisfactory relative to the data. This issue has not been adequately addressed, largely because we rarely have information on the nature of erosion variability. It is perhaps best illustrated by example. Suppose that the model evaluator is working with a measured value of soil loss of 1.37 kg/m² and a model-predicted value of 1.10 kg/m². Suppose further that the mean value of the population of replicates is 1.30 kg/m², though one cannot know this from the information given by a single plot. In this example, the prediction error is 1.10 - 1.37 = -0.27 kg/m² (negative indicates under-prediction). In fact, of the -0.27 kg/m², -0.20 kg/m² is due to the model missing the population mean (1.10 - 1.30 = -0.20 kg/m²), and -0.07 kg/m² is due to the measured value deviating from the population mean (1.30 - 1.37 = -0.07 kg/m²). In essence, the issue for the model evaluator is to partition the prediction error into its two components: the part associated with the difference between the prediction and the population mean, and the part associated with the difference between the individual sample value and the population mean. While it is not the intent of this study to outline a comprehensive solution to this problem, it is clear that the type of quantification of population variance conducted in this study is essential to its resolution.
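The partition in this example is simple arithmetic once a population mean is assumed. The sketch below, with a hypothetical function name, reproduces it; in practice the population mean is unknown, which is exactly why an estimate of replicate variability such as Eq. [6] is needed to judge how large the sampling part is likely to be.

```python
def partition_prediction_error(predicted, measured, population_mean):
    """Split (predicted - measured) into a model part and a sampling part.

    model part:    predicted - population_mean
    sampling part: population_mean - measured
    The two parts sum to the total prediction error.
    """
    total = predicted - measured
    model_part = predicted - population_mean
    sampling_part = population_mean - measured
    return total, model_part, sampling_part


if __name__ == "__main__":
    # Values from the worked example in the text (kg/m^2).
    total, model, sampling = partition_prediction_error(1.10, 1.37, 1.30)
    print(f"total {total:+.2f}, model {model:+.2f}, sampling {sampling:+.2f} kg/m^2")
```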
This study provides estimates of variability for within-treatment soil loss data which have not been available heretofore. This study quantifies explicitly for the first time that the coefficient of variation in soil erosion data tends to be much greater when measured soil loss values are relatively small. Though it yet remains a subjective task to design soil loss plot studies to meet desired levels of measurement confidence, the information provided here gives important guidance for the experimental designer. Also, for the first time we have information which may help us to incorporate measurement variability into our analyses of the capability for soil erosion models to predict measured erosion data. This is an important subject which warrants further investigation.
Received for publication August 5, 1998.
REFERENCES
- Govers G. Rill erosion on arable land in central Belgium: rates, controls, and predictability. Catena 1991;18:133-155.
- Liu B.Y., Nearing M.A., Baffaut C., Ascough J.C., II. The WEPP watershed model: III. Comparisons to measured data from small watersheds. Trans. ASAE 1997;40:945-951.
- Ott L. An introduction to statistical methods and data analysis. North Scituate, MA: Duxbury Press, 1977.
- Rapp J.F. Error assessment of the Revised Universal Soil Loss Equation using natural runoff plot data. M.S. thesis. Tucson, AZ: School of Renewable Natural Resources, Univ. of Arizona, 1994.
- Risse L.M., Nearing M.A., Nicks A.D., Laflen J.M. Assessment of error in the universal soil loss equation. Soil Sci. Soc. Am. J. 1993;57:825-833.
- Ruttimann M., Schaub D., Prasuhn V., Ruegg W. Measurement of runoff and soil erosion on regularly cultivated fields in Switzerland - some critical considerations. Catena 1995;25:127-139.
- Walpole R.E., Myers R.H. Probability and statistics for engineers and scientists, 5th ed. Englewood Cliffs, NJ: Prentice Hall, 1993.
- Wendt R.C., Alberts E.E., Hjelmfelt A.T., Jr. Variability of runoff and soil loss from fallow experimental plots. Soil Sci. Soc. Am. J. 1986;50:730-736.
- Zhang X.C., Nearing M.A., Risse L.M., McGregor K.C. Evaluation of runoff and soil loss predictions using natural runoff plot data. Trans. ASAE 1996;39:855-863.