Dep. of Crop and Soil Sciences, Michigan State Univ., East Lansing, MI 48824-1325
* Corresponding author (kravche1@msu.edu).
ABSTRACT
Abbreviations: CV, coefficient of variation; IDW, inverse distance weighting; KK, ordinary point kriging with true spatial structure known; KU, ordinary point kriging with spatial structure determined based on a sample variogram; N/S ratio, nugget to sill ratio; OM, organic matter content.
INTRODUCTION
A substantial amount of research has been conducted on the number of samples needed to characterize the central tendency of a soil property with a specified degree of accuracy (McBratney and Webster, 1983; Webster and Oliver, 1990). However, the number of samples needed to obtain an accurate map has attracted much less attention. Typically, the larger the number of samples, the more accurate the map of the soil property (Wollenhaupt et al., 1994; Mueller et al., 2001). However, the cost of sample collection and analysis can quickly exceed any potential benefits from applying site-specific management. Hence, when choosing the number of soil samples for mapping soil properties, the expected gain in map accuracy needs to be balanced against sampling costs. Although previous research suggests that soil sampling on 60-m grids (Hammond, 1992) or even 30-m grids (Franzen and Peck, 1993) might be needed to develop soil property maps of acceptable accuracy, most commercial soil sampling is conducted on a 1-ha grid basis.
If a soil property can be accurately mapped from a reasonable number of samples, the map can be of substantial value for site-specific management. However, if the number of samples required to produce an accurate map is prohibitively large, a producer would be better off using uniform management based on the mean value of the soil property for the field. Insufficiently intensive sampling is a waste of time and money because it does not provide the level of accuracy needed for successful site-specific management.
Although the importance of spatial structure for accurate mapping is generally recognized (Leenaers et al., 1990; Flatman and Yfantis, 1996; Sadler et al., 1998), no quantitative information exists on the mapping accuracy that can be achieved with a given number of soil samples for soil properties with a given spatial structure. Spatial structure in the data is described by a geostatistical function called the variogram. The variogram parameters that characterize spatial structure include the N/S ratio and the spatial correlation range. The N/S ratio is the proportion of the total variability (the sill) that consists of short-range variability that cannot be described by the geostatistical model in the studied field. The spatial correlation range is the distance over which soil property values are correlated with each other. Small N/S ratios and large spatial correlation ranges usually indicate that the variable can be mapped with higher accuracy (Isaaks and Srivastava, 1989).
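As an illustration of how the N/S ratio and the range shape a variogram, the following minimal Python sketch assumes a spherical model (the model later fitted to the sample variograms in this study) with the total sill normalized to 1; the function names are hypothetical. The nugget equals the N/S ratio times the sill, and the curve reaches the sill at the correlation range.

```python
import numpy as np

def spherical_variogram(h, nugget, partial_sill, corr_range):
    """Spherical variogram model with a nugget effect.

    gamma(0) = 0; for 0 < h < corr_range, gamma rises as
    nugget + partial_sill * (1.5 r - 0.5 r^3) with r = h / corr_range;
    at and beyond the range it equals the sill (nugget + partial_sill).
    """
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / corr_range, 1.0)              # scaled lag, capped at 1
    gamma = nugget + partial_sill * (1.5 * r - 0.5 * r**3)
    return np.where(h > 0, gamma, 0.0)               # zero semivariance at zero lag

def split_sill(ns_ratio, sill=1.0):
    """Return (nugget, partial sill) for a given nugget-to-sill ratio."""
    nugget = ns_ratio * sill
    return nugget, sill - nugget

# Weak (0.6), medium (0.3), and strong (0.1) structure with a 97-m range
lags = [10.0, 50.0, 97.0, 200.0]
for ns in (0.6, 0.3, 0.1):
    c0, c1 = split_sill(ns)
    print(ns, np.round(spherical_variogram(lags, c0, c1, 97.0), 3))
```

Because the nugget is reached immediately after zero lag, a larger N/S ratio leaves less of the sill to be explained by distance, which is why weak structures offer little for an interpolator to exploit.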
Another factor that affects mapping accuracy is the interpolation procedure used to convert discrete sample data into a continuous map. The two methods most commonly used in agricultural practice are IDW and kriging (Franzen and Peck, 1995; Weisz et al., 1995). A number of studies have compared the performance of these methods in agricultural settings (Laslett et al., 1987; Warrick et al., 1988; Wollenhaupt et al., 1994; Gotway et al., 1996; Kravchenko and Bullock, 1999; Mueller et al., 2001). However, the results are contradictory, with some authors favoring IDW and others favoring kriging.
Most of these studies used either cross-validation or jack-knifing (independent test data sets) to compare the accuracy of the interpolation procedures. Conclusions based on a single soil data set used in cross-validation, or on a single randomly selected test data set used in jack-knifing, may not be reliable. The fact that most comparisons between interpolation procedures were based on a single sample realization is partly to blame for the controversy surrounding the performance of IDW and kriging in agricultural applications. Recent studies that attempted to alleviate this problem by using either multiple test data sets or grids of the same size with different starting points (Chang et al., 1999; Mueller et al., 2001) reported substantial variation in the accuracy obtained for different test data sets. Another reason for the discrepancies is that the parameters needed for optimum performance of an interpolation procedure (such as the optimal power value for IDW or the optimal number of nearest neighbors for both IDW and kriging) vary depending on the variability and spatial structure of the soil data. For example, the coefficient of variation, skewness, and kurtosis of a data set have been reported to affect the optimum power value for IDW (Weber and Englund, 1994; Gotway et al., 1996; Kravchenko and Bullock, 1999; Mueller et al., 2001).
From a theoretical standpoint, kriging is the optimal interpolation procedure (Isaaks and Srivastava, 1989). However, its correct application requires an accurate determination of the spatial structure via variogram construction and model fitting. At least 50 to 100 samples might be required to obtain a reliable variogram that correctly describes spatial structure (Webster and Oliver, 1992). Even when a sufficient number of samples is available, sample variogram calculation and variogram model fitting can be tedious and time-consuming. Although IDW does not have the statistical advantages of kriging, it is less laborious and there are no restrictions on the number of samples needed to perform interpolation.
The objectives of the study are (i) to evaluate the effect of sampling density on the mapping accuracy of soil properties with diverse spatial structures and variability, and (ii) to compare the performance of IDW and ordinary kriging for interpolating soil properties with diverse spatial structures and variability. Although the configuration of the sampling scheme can significantly affect map accuracy, only point sampling on a regular grid is considered in the study.
MATERIALS AND METHODS
Three soil properties with different variabilities were used in the study. Soil OM represented a soil property with low variability (CV of 12.0%). Soil K content represented a soil property with medium variability (CV of 40%). Soil P content represented a soil property with high variability (CV of 65%). The level of data variability matters in site-specific management, since soil properties with high variability are potentially better candidates for site-specific management than more uniformly distributed soil properties (Schmidt et al., 2002). On the other hand, maps of soil properties with higher variability can be less accurate than maps of soil properties with lower variability (Isaaks and Srivastava, 1989).
Based on the original 256 soil samples for each of the three soil properties, data sets with three different spatial structures were simulated using a simulated annealing procedure (Deutsch and Journel, 1998). The simulated annealing algorithm produces a new data set with desired statistical or geostatistical characteristics based on the original data. The procedure begins by creating an initial data set in which a random value is assigned to each grid node. The random values are drawn from the distribution of the soil property constructed from the 256 measured soil samples. Then, the variogram of the initial simulated data set is calculated and compared with the variogram of the desired spatial structure. Next, the data set is perturbed by drawing a new value for a randomly selected grid node, and the variogram of the perturbed data set is again calculated and compared with the desired variogram. If the perturbed value brings the observed and desired variograms closer together, it is retained; otherwise, a new random value is drawn and the calculations and comparisons are repeated. The process continues until the variogram of the perturbed data set closely matches the desired spatial structure. The objective functions and convergence criteria of the simulated annealing procedure are described in detail by Deutsch and Journel (1998).
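The perturbation loop described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the GSLIB implementation described by Deutsch and Journel (1998): the objective is the sum of squared relative deviations from a target variogram computed only along grid rows and columns, the variogram is recomputed in full after every perturbation (practical implementations update it incrementally), and the sample values, target range, and grid size are hypothetical. The `t0` temperature adds the standard annealing rule of occasionally accepting a worse state; with `t0=0` the loop keeps only improving perturbations, as in the description above.

```python
import numpy as np

def grid_variogram(z, max_lag):
    """Semivariance along rows and columns for lags 1..max_lag (in grid steps)."""
    gammas = []
    for k in range(1, max_lag + 1):
        dx = (z[:, k:] - z[:, :-k]).ravel()          # pairs k columns apart
        dy = (z[k:, :] - z[:-k, :]).ravel()          # pairs k rows apart
        diffs = np.concatenate([dx, dy])
        gammas.append(0.5 * np.mean(diffs**2))
    return np.array(gammas)

def anneal(sample, target, shape, n_iter=20000, t0=1e-3, cooling=0.999, seed=0):
    """Perturb a random grid until its variogram approaches `target`.

    sample : 1-D array; node values are drawn from it (empirical distribution)
    target : desired semivariances for lags 1..len(target)
    Returns the final grid and its objective value.
    """
    rng = np.random.default_rng(seed)
    z = rng.choice(sample, size=shape)               # random initial image
    max_lag = len(target)

    def objective(img):
        # Sum of squared relative deviations from the target variogram
        return np.sum(((grid_variogram(img, max_lag) - target) / target) ** 2)

    obj = objective(z)
    t = t0
    for _ in range(n_iter):
        i, j = rng.integers(shape[0]), rng.integers(shape[1])
        old = z[i, j]
        z[i, j] = rng.choice(sample)                 # new value at one random node
        new_obj = objective(z)
        # Always keep improvements; with t > 0, sometimes keep a worse state too
        if new_obj <= obj or (t > 0 and rng.random() < np.exp((obj - new_obj) / t)):
            obj = new_obj
        else:
            z[i, j] = old                            # reject: restore the old value
        t *= cooling                                 # cool the temperature
    return z, obj

# Hypothetical stand-in for 256 measured values of a skewed soil property
sample = np.random.default_rng(1).lognormal(mean=3.0, sigma=0.5, size=256)
lags = np.arange(1, 6)
r = np.minimum(lags / 8.0, 1.0)                      # range of 8 grid steps (assumed)
target = sample.var() * (0.1 + 0.9 * (1.5 * r - 0.5 * r**3))   # N/S ratio 0.1
grid, final_obj = anneal(sample, target, shape=(15, 15), n_iter=3000)
```

Drawing replacement values from the measured sample keeps the histogram of the simulated field close to that of the original data, so only the spatial arrangement changes.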
In this study, each simulated data set consisted of 2209 data points located on a 47 by 47 grid with 9.7 m between grid points. The simulated data sets were assumed to represent an exhaustively sampled field.
For each soil property, three spatial structure scenarios representing weak, medium, and strong spatial structure were considered. The N/S ratio was used to characterize the strength of the spatial structure. An N/S ratio of 0.6 corresponded to a weak spatial structure; that is, 60% of the data variability consisted of unexplainable, short-distance, random variation. Medium and strong spatial structures had N/S ratios of 0.3 and 0.1, respectively. The selected N/S ratios are consistent with those reported in the literature for various soil properties, which range from 0.01 to 1.0 (Cambardella and Karlen, 1999; Chang et al., 1999; Mueller et al., 2001), with most reported variograms having N/S ratios of 0.1 to 0.6.
A spatial correlation range of 97 m was used in all simulations. This range was selected as the average correlation range of the observed soil properties in the studied field. It was determined from preliminary sample variograms calculated from the 256 original soil samples and is consistent with correlation ranges for soil properties reported in the literature (Cambardella and Karlen, 1999; McBratney and Pringle, 1999; Kravchenko and Bullock, 1999; Mueller et al., 2001). Examples of sample variograms for the exhaustive simulated P data sets with N/S ratios of 0.1, 0.3, and 0.6 are shown in Fig. 1.
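For reference, a sample variogram of the kind shown in Fig. 1 can be computed with the classical method-of-moments (Matheron) estimator. The Python sketch below is illustrative: lag classes are centered on multiples of `lag_width` (a common but assumed convention), directions are pooled (omnidirectional), and the 10 x 10 grid and its values are hypothetical. Applying the same all-pairs approach to the 2209-point exhaustive grid would involve about 2.4 million pairs, which is feasible but memory-intensive.

```python
import numpy as np

def sample_variogram(xy, z, lag_width, n_lags):
    """Method-of-moments (Matheron) sample variogram, omnidirectional.

    gamma(h) = sum over pairs in a lag class of (z_i - z_j)^2 / (2 N(h)).
    Lag class k collects pairs with distance in [(k - 0.5) w, (k + 0.5) w),
    w = lag_width, k = 0..n_lags-1. Returns the mean pair distance,
    semivariance, and pair count N(h) for each non-empty class.
    """
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    i, j = np.triu_indices(len(z), k=1)              # every pair exactly once
    dist = np.hypot(*(xy[i] - xy[j]).T)
    sq = (z[i] - z[j]) ** 2
    cls = np.floor(dist / lag_width + 0.5).astype(int)
    keep = cls < n_lags
    counts = np.bincount(cls[keep], minlength=n_lags)
    h_sum = np.bincount(cls[keep], weights=dist[keep], minlength=n_lags)
    sq_sum = np.bincount(cls[keep], weights=sq[keep], minlength=n_lags)
    ok = counts > 0
    return h_sum[ok] / counts[ok], sq_sum[ok] / (2 * counts[ok]), counts[ok]

# Usage on a small hypothetical 10 x 10 grid with 9.7-m spacing
gx, gy = np.meshgrid(np.arange(10) * 9.7, np.arange(10) * 9.7)
xy = np.column_stack([gx.ravel(), gy.ravel()])
z = np.random.default_rng(0).normal(size=100)
h, gamma, n_pairs = sample_variogram(xy, z, lag_width=9.7, n_lags=10)
```

Centering the classes on multiples of the grid spacing avoids splitting pairs that lie exactly one spacing apart across two classes because of floating-point rounding.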
The two criteria used to evaluate and compare map accuracy were the mean square error (MSE) and the goodness-of-prediction criterion G (Agterberg, 1984; Gotway et al., 1996). The MSE was calculated as the mean of the squared differences between the actual test data values and the map estimates. The G criterion was calculated as
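$$G = \left[1 - \frac{\sum_{i=1}^{N}\left[Z(x_i) - Z^*(x_i)\right]^2}{\sum_{i=1}^{N}\left[Z(x_i) - \bar{Z}\right]^2}\right] \times 100 \qquad [1]$$
where N is the number of test locations, Z(xi) and Z*(xi) are the measured value and the map estimate at test location xi, and $\bar{Z}$ is the sample mean. A positive G indicates that the map is more accurate than using the sample mean everywhere.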
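Both criteria are simple to compute. The Python sketch below is a minimal illustration; passing the field average (for example, the mean of the grid samples) as `mean_value` is an assumption about which mean is used, and the test values are hypothetical. G equals 100 for a perfect map, 0 when the map is no better than the mean, and is negative when it is worse.

```python
import numpy as np

def mse(actual, predicted):
    """Mean square error between test values and map estimates."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.mean((actual - predicted) ** 2)

def g_criterion(actual, predicted, mean_value):
    """Goodness-of-prediction G in percent, relative to using mean_value everywhere."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    sse_map = np.sum((actual - predicted) ** 2)       # errors of the map
    sse_mean = np.sum((actual - mean_value) ** 2)     # errors of the field average
    return 100.0 * (1.0 - sse_map / sse_mean)

# Hypothetical test values and map estimates
z_test = [12.0, 15.0, 9.0, 14.0]
z_map = [11.5, 14.0, 10.0, 13.5]
print(mse(z_test, z_map), g_criterion(z_test, z_map, mean_value=12.5))
```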
For both inverse distance and kriging interpolation methods, the value of variable Z at an unsampled location x0, Z*(x0), is estimated based on the data from the surrounding locations, Z(xi) as
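$$Z^*(x_0) = \sum_{i=1}^{n} w_i Z(x_i) \qquad [2]$$
$$w_i = \frac{d_i^{-p}}{\sum_{j=1}^{n} d_j^{-p}} \qquad [3]$$
where n is the number of closest samples used in the estimation, wi is the weight assigned to Z(xi), and, for IDW (Eq. [3]), di is the distance between x0 and xi and p is the power (exponent).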
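A minimal Python sketch of IDW with a restricted neighborhood follows, together with a grid search that keeps the power and neighbor count giving the lowest test-set MSE. The candidate powers (1 to 4) and neighbor counts (4 to 20) are assumptions chosen for illustration, the function names are hypothetical, and the data in the usage lines are synthetic.

```python
import numpy as np

def idw(xy_data, z_data, xy_new, power=2.0, n_closest=8):
    """Inverse distance weighting using the n_closest samples per location.

    Z*(x0) = sum_i w_i Z(x_i), with w_i = d_i**-power / sum_j d_j**-power.
    """
    xy_data = np.asarray(xy_data, dtype=float)
    z_data = np.asarray(z_data, dtype=float)
    xy_new = np.asarray(xy_new, dtype=float)
    est = np.empty(len(xy_new))
    for k, x0 in enumerate(xy_new):
        d = np.hypot(*(xy_data - x0).T)
        idx = np.argsort(d)[:n_closest]              # nearest samples only
        if d[idx[0]] == 0.0:                         # target sits on a sample
            est[k] = z_data[idx[0]]
            continue
        w = d[idx] ** -float(power)
        est[k] = np.sum(w * z_data[idx]) / np.sum(w)  # normalized weights
    return est

def best_idw_settings(xy_data, z_data, xy_test, z_test,
                      powers=(1, 2, 3, 4), neighbors=range(4, 21)):
    """Return (MSE, power, n_closest) with the lowest MSE on the test data."""
    best = None
    for p in powers:
        for n in neighbors:
            err = np.mean((idw(xy_data, z_data, xy_test, p, n) - z_test) ** 2)
            if best is None or err < best[0]:
                best = (err, p, n)
    return best

# Synthetic example: 64 samples and 30 test points in a 400 x 400 m field
rng = np.random.default_rng(0)
xy_grid = rng.uniform(0, 400, size=(64, 2))
z_grid = np.sin(xy_grid[:, 0] / 80) + rng.normal(0, 0.2, size=64)
xy_test = rng.uniform(0, 400, size=(30, 2))
z_test = np.sin(xy_test[:, 0] / 80)
mse_best, p_best, n_best = best_idw_settings(xy_grid, z_grid, xy_test, z_test)
```

A low power spreads weight over many neighbors and smooths the map, while a high power makes the estimate follow the nearest sample, so the best power depends on how much short-range noise the data contain.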
Kriging calculates the wi values by taking into account the spatial structure of the data, as represented by a variogram model fitted to the sample variogram (Isaaks and Srivastava, 1989; Goovaerts, 1997). Ordinary point kriging was used in the study. As with IDW, kriging performance is affected by the number of closest samples used in the estimation. For each test data set, numbers of closest samples from 4 to 20 were considered, and the number that resulted in the lowest MSE was retained.
The other factor affecting the accuracy of kriging is the quality of the variogram model. Two scenarios for fitting variogram models to sample variograms were considered. In the first scenario, it was assumed that the true spatial structure of the data, that is, the true variogram parameters, was known for grid data sets of all sizes. Sample variograms of the exhaustive data sets were fitted with variogram models, and the model parameters were then used to create interpolated soil property maps for all grid sizes. For the remainder of the paper, this scenario is called kriging with known spatial structure (KK). In this scenario, the accuracy of kriging was evaluated under optimal, albeit unrealistic, conditions. In the second scenario, sample variograms were calculated from the grid data, and the variogram parameters were obtained by fitting variogram models to these sample variograms using weighted least squares (Geostatistical Analyst of ArcMap 8.1, ESRI, 2000). This scenario is hereafter called kriging with unknown spatial structure (KU).
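The ordinary kriging weights can be obtained by solving a small linear system for each estimation point. The Python sketch below is a standard formulation in semivariogram form, not the software used in the study; it assumes a spherical variogram with an N/S ratio of 0.1, a 97-m range, and a unit sill as defaults, and the function names are hypothetical.

```python
import numpy as np

def spherical(h, nugget=0.1, psill=0.9, a=97.0):
    """Spherical variogram (defaults: N/S ratio 0.1, 97-m range, unit sill)."""
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / a, 1.0)
    return np.where(h > 0, nugget + psill * (1.5 * r - 0.5 * r**3), 0.0)

def ordinary_kriging(xy_data, z_data, xy_new, vario=spherical, n_closest=12):
    """Ordinary point kriging with the n_closest samples around each target.

    Solves  sum_j w_j gamma(x_i, x_j) + mu = gamma(x_i, x0),  sum_j w_j = 1
    and returns estimates sum_i w_i Z(x_i) and kriging variances
    sum_i w_i gamma(x_i, x0) + mu.
    """
    xy_data = np.asarray(xy_data, dtype=float)
    z_data = np.asarray(z_data, dtype=float)
    xy_new = np.asarray(xy_new, dtype=float)
    est = np.empty(len(xy_new))
    var = np.empty(len(xy_new))
    for k, x0 in enumerate(xy_new):
        d0 = np.hypot(*(xy_data - x0).T)
        idx = np.argsort(d0)[:n_closest]             # nearest samples only
        pts = xy_data[idx]
        n = len(idx)
        dij = np.hypot(*(pts[:, None, :] - pts[None, :, :]).transpose(2, 0, 1))
        lhs = np.ones((n + 1, n + 1))
        lhs[:n, :n] = vario(dij)
        lhs[n, n] = 0.0                              # Lagrange-multiplier row/column
        rhs = np.ones(n + 1)
        rhs[:n] = vario(d0[idx])
        sol = np.linalg.solve(lhs, rhs)
        w, mu = sol[:n], sol[n]
        est[k] = w @ z_data[idx]
        var[k] = w @ rhs[:n] + mu
    return est, var
```

The neighbor search could be repeated for `n_closest` values from 4 to 20, keeping the value with the lowest test-set MSE, in the same way as described above.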
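The weighted least-squares step of the KU scenario can be approximated as follows. This Python sketch is not the ArcMap Geostatistical Analyst algorithm: it weights each lag by its pair count N(h) (an assumption; other schemes, such as weights proportional to N(h)/γ(h)², are also common), searches the range over a user-supplied grid, and solves for the nugget and partial sill in closed form because the spherical model is linear in those two parameters once the range is fixed. The sample variogram values in the usage lines are hypothetical.

```python
import numpy as np

def spherical_shape(h, a):
    """Spherical structure with unit sill and no nugget."""
    r = np.minimum(np.asarray(h, dtype=float) / a, 1.0)
    return 1.5 * r - 0.5 * r**3

def fit_spherical_wls(lags, gamma, counts, ranges):
    """Weighted least-squares fit of nugget + partial_sill * spherical(h; a).

    Returns (nugget, partial_sill, range) minimizing sum N(h) * residual^2.
    """
    lags, gamma, counts = (np.asarray(v, dtype=float) for v in (lags, gamma, counts))
    sw = np.sqrt(counts)                             # sqrt weights for lstsq
    best = None
    for a in ranges:
        X = np.column_stack([np.ones_like(lags), spherical_shape(lags, a)])
        coef, *_ = np.linalg.lstsq(X * sw[:, None], gamma * sw, rcond=None)
        # Crude constraint: clip negative nugget or partial sill to zero
        c0, c1 = np.maximum(coef, 0.0)
        sse = np.sum(counts * (gamma - c0 - c1 * spherical_shape(lags, a)) ** 2)
        if best is None or sse < best[0]:
            best = (sse, c0, c1, a)
    _, c0, c1, a = best
    return c0, c1, a

# Hypothetical sample variogram: lags (m), semivariances, and pair counts
lags = [20, 40, 60, 80, 100, 120, 140]
gam = [0.35, 0.55, 0.75, 0.88, 0.97, 1.02, 0.99]
npairs = [420, 760, 1000, 1150, 1200, 1180, 1100]
nugget, psill, range_m = fit_spherical_wls(lags, gam, npairs, ranges=np.arange(40, 201))
```

With few lags shorter than the range, many trial ranges fit almost equally well, which is one way the unreliable variograms from sparse grids show up in practice.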
RESULTS AND DISCUSSION
The relationships between G values and the number of samples for the data sets with medium and low CVs followed patterns similar to those for the data sets with high CV (Tables 2 and 3). However, for data with an N/S ratio of 0.1, G values at the smaller grids (9 to 25 grid points) were higher in the data sets with high and medium CV than in the data sets with low CV.
In summary, for data sets with strong spatial structure (N/S ratio of 0.1) and high CV, KK was more accurate than the field average at all sample sizes; that is, the mean G values were significantly greater than zero (P = 0.01) (Table 1). For data sets with strong spatial structure and medium CV, kriging was more accurate than the field average at all but the smallest grid (9 grid points) (Table 2). For data sets with strong spatial structure and low CV, kriging was more accurate than the field average at grids with 36 to 529 grid points, but not at grids with 9, 16, and 25 grid points (Table 3). For data sets with medium spatial structure (N/S ratio of 0.3) and all CV values, kriging was always more accurate than the field average for grids with 49 to 529 grid points, and occasionally more accurate at smaller grids (9, 16, 25, and 36 grid points) (Tables 1-3). For data sets with weak spatial structure (N/S ratio of 0.6), kriging was more accurate than the field average only at the most intensive grids (529 grid points for high CV; 144, 225, and 529 for medium CV; 225 and 529 for low CV) (Tables 1-3).
The range of G values obtained from the 100 test data sets was surprisingly large (Tables 1-3), supporting concerns that a single test data set is not sufficient for drawing conclusions about the performance of sampling schemes or interpolation methods. For example, G values for low CV with an N/S ratio of 0.3 and 36 grid samples ranged from -3.9 to 23.5%, leaving considerable room for inconsistent conclusions about the performance of the mapping strategy if results from only a single test data set were available.
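One way to check whether a mean G over 100 test data sets is significantly greater than zero is a one-sided, one-sample t-test. The paper does not state which test was used, so the Python sketch below is an assumption; the G values in it are randomly generated placeholders, not study results, and 2.365 is the approximate one-sided 1% critical value of Student's t with 99 degrees of freedom.

```python
import math
import random
import statistics

def mean_g_test(g_values, t_crit=2.365):
    """One-sided one-sample t-test of H0: mean G <= 0 against H1: mean G > 0.

    t_crit defaults to the approximate 1% one-sided critical value for
    99 degrees of freedom (that is, 100 test data sets).
    Returns (mean G, t statistic, True if mean G is significantly > 0).
    """
    n = len(g_values)
    mean_g = statistics.mean(g_values)
    se = statistics.stdev(g_values) / math.sqrt(n)
    t = mean_g / se
    return mean_g, t, t > t_crit

# Placeholder G values (%) for 100 hypothetical test data sets
rng = random.Random(42)
g = [rng.gauss(8.0, 6.0) for _ in range(100)]
mean_g, t, significant = mean_g_test(g)
print(f"G range {min(g):.1f} to {max(g):.1f}%, mean {mean_g:.1f}%, "
      f"t = {t:.2f}, significant: {significant}")
```

Reporting the minimum and maximum alongside the test makes the spread across test data sets visible, which a single test set would hide.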
Effect of Inaccurate Description of Spatial Structure
Sample variograms for simulated data with low CV and an N/S ratio of 0.1 are shown in Fig. 4. The spatial structure was clearly seen in the variogram based on 529 grid points with a 20-m distance between grid points. The variogram parameters obtained by fitting a spherical model to these data were very similar to those obtained for the exhaustive data set. The difference between the variogram parameters determined from the samples and the true values increased as the number of grid points decreased. However, the spatial structure was still clearly seen in the variograms based on 225 and 144 grid points, with distances between grid points of 30 and 40 m, respectively. Spatial structure was poorly represented in the sample variograms calculated from 81, 64, and 49 points, with respective distances of 50, 60, and 70 m between grid points. Although 81 data points is formally sufficient for calculating a reliable variogram, the poor results at this grid size are most likely caused by the distance between grid points being approximately half of the spatial correlation range of the studied soil property. Grid sampling schemes that produce only a few sample variogram points at distances shorter than the correlation range are known to represent the spatial structure of the data poorly. The importance of not only the number of grid points but also the distance between them is supported by the results of Shi et al. (2000), who observed G values as high as 28% for P interpolation with just 30 grid samples. However, in their study the distance between grid points (50 m) was less than one-third of the spatial correlation range for P.
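The effect of grid spacing relative to the 97-m range can be made concrete by counting the point pairs that fall at lags shorter than the range. The Python sketch below assumes square grids (23 x 23 at 20 m, 15 x 15 at 30 m, 12 x 12 at 40 m, 9 x 9 at 50 m, 8 x 8 at 60 m, and 7 x 7 at 70 m), which match the grid sizes and spacings given in the text but are an assumed layout. For the 49-point grid, only the 70-m lag lies within the range, and the diagonal lag (about 99 m) already exceeds it.

```python
import itertools
import math

def short_lag_pairs(n_side, spacing, corr_range):
    """Count pairs on an n_side x n_side square grid separated by less than
    corr_range, keyed by separation distance (m, rounded to 0.1 m)."""
    pts = [(i * spacing, j * spacing) for i in range(n_side) for j in range(n_side)]
    counts = {}
    for (x1, y1), (x2, y2) in itertools.combinations(pts, 2):
        d = round(math.hypot(x2 - x1, y2 - y1), 1)
        if d < corr_range:
            counts[d] = counts.get(d, 0) + 1
    return counts

# Grid layouts assumed from the point counts and spacings in the text
for n_side, spacing in [(23, 20), (15, 30), (12, 40), (9, 50), (8, 60), (7, 70)]:
    c = short_lag_pairs(n_side, spacing, 97.0)
    print(f"{n_side**2} points at {spacing} m: lags below 97 m = {sorted(c)}, "
          f"{sum(c.values())} pairs")
```

A sample variogram needs several distinct lags below the range to reveal its rising part; at 50 m and wider spacing there are only one or two, so the nugget and range cannot be pinned down.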
Kriging with unknown spatial structure for high CV and all N/S ratios performed similarly to KK at the three densest grids (with 144, 225, and 529 grid points) and substantially worse than KK at smaller grid sizes. For data sets with medium CV, the performances of KU and KK were similar at the four densest grids (with 81, 144, 225, and 529 grid points). For low CV, KU and KK performed equally well for all grids (from 49 to 529 grid points). This observation suggests that when the true variogram is unknown, higher accuracy of kriging maps can be achieved for soil properties with lower variability than for those with high variability.
Performance of Optimal Inverse Distance Weighting
The IDW exponent values that produced the lowest MSE were recorded for each of the 100 test data sets, and their means are shown in Table 4. Contrary to previous observations of Gotway et al. (1996), Kravchenko and Bullock (1999), and Mueller et al. (2001), no relationship was observed between the optimal exponent values and data CVs. However, N/S ratios were strongly related to the optimal exponent values. The highest exponent values were observed for data with an N/S ratio of 0.1, and slightly lower values for data with an N/S ratio of 0.3. The lowest exponent value, 1, was optimal for almost all grid sizes with an N/S ratio of 0.6 in data sets from all three CV groups. For data sets with a small number of points (25 or fewer), an exponent of 1 was also optimal regardless of CV and N/S ratio.
There are several possibilities for obtaining variogram parameters when data sets are too small or too widely spaced to produce a reliable variogram. First, for certain soil properties, such as soil texture, topographic or landscape information can help in deducing variogram parameters (van Groenigen et al., 1999). Second, average or proportional average variograms developed by McBratney and Pringle (1999) from published experimental variograms can provide approximate estimates of spatial structure for different soil properties. However, the accuracy of this approach will be limited, since spatial structure varies widely from field to field. For example, there is evidence that field management (e.g., organic vs. conventional) may be responsible for differences in the spatial structure of soil properties among fields (Cambardella and Karlen, 1999). Third, taking a number of additional samples at distances shorter than the grid spacing may provide the short-distance information needed to determine the variogram nugget and range. Computer simulation procedures have been developed for determining the optimum number and location of these additional short-distance samples (van Groenigen et al., 1999). Further research is needed to determine the applicability of these options for different soil properties in different fields.
There was no consistent difference between the performance of IDW and KU, except that for soil properties with an N/S ratio of 0.1 and medium and low CV, KU performed somewhat better than IDW (Tables 1-3). Kriging with unknown spatial structure was not possible for grids with 49 or fewer grid points.
For KU, the variogram parameters were optimized by achieving the best fit between the sample variogram and the variogram model; they were not optimized based on the test data. In contrast, the optimal IDW exponent was selected based on the prediction accuracy for the test data. Hence, the comparison between KU and IDW in this study was somewhat unfair to kriging. Kravchenko and Bullock (1999) found that kriging with variogram parameters optimized using cross-validation performed better than IDW in most of the studied fields. Selecting variogram parameters based on either cross-validation or jack-knifing criteria would improve the performance of KU. However, to our knowledge, no software does this automatically, and manual fitting is often too labor-intensive and time-consuming to be generally recommended.
No matter how carefully the spatial structure is determined, KU is not expected to perform better than KK. Although KK produced statistically significantly more accurate results than IDW in the majority of cases, the magnitude of the differences was relatively small. Therefore, if the spatial structure of the data is not known, IDW can be expected to produce results almost as accurate as those of kriging.
CONCLUSIONS
Kriging with known true variogram parameters performed significantly better than IDW for the majority of grid sizes and spatial structures. The only data set size at which the accuracy of kriging did not differ from that of IDW for all soil properties and spatial structures was 529 grid points with 20 m between grid points, indicating that the choice of interpolation procedure is not important for large, intensively sampled data sets.
Kriging with variogram parameters estimated from the sample variogram performed similarly to IDW for data sets with sufficient points. However, it was much less accurate than IDW when a reliable sample variogram could not be obtained because of either an insufficient number of data points or too large a distance between them. Even when the distance between grid points exceeded the spatial correlation range of the data, IDW was still a valuable interpolation method, particularly for soil properties with medium and strong spatial structure. Hence, IDW is recommended for small data sets for which variogram parameters are not known and for data sets with large distances between grid points.
Received for publication August 14, 2002.