Soil Science Society of America Journal 66:988-998 (2002)
© 2002 Soil Science Society of America

DIVISION S-8—NUTRIENT MANAGEMENT & SOIL & PLANT ANALYSIS

Development of Reflectance Spectral Libraries for Characterization of Soil Properties

Keith D. Shepherd* and Markus G. Walsh

International Centre for Research in Agroforestry (ICRAF), P.O. Box 30677, Nairobi, Kenya

* Corresponding author (k.shepherd{at}cgiar.org)


    ABSTRACT
 
Methods for rapid estimation of soil properties are needed for quantitative assessments of land management problems. We developed a scheme for building and using soil spectral libraries for rapid nondestructive estimation of soil properties based on diffuse reflectance spectroscopy. A diverse library of over 1000 archived topsoils from eastern and southern Africa was used to test the approach. Air-dried soils were scanned using a portable spectrometer (0.35–2.5 µm) with an artificial light source. Soil properties were calibrated to soil reflectance using multivariate adaptive regression splines (MARS), and screening tests were developed for various soil fertility constraints using classification trees. A random sample of one-third of the soils was withheld for validation purposes. Validation r² values for regressions were: exchangeable Ca, 0.88; effective cation-exchange capacity (ECEC), 0.88; exchangeable Mg, 0.81; organic C concentration, 0.80; clay content, 0.80; sand content, 0.76; and soil pH, 0.70. Validation likelihood ratios for diagnostic screening tests were: ECEC <4.0 cmolc kg⁻¹, 10.8; pH <5.5, 5.6; potential N mineralization >4.1 mg kg⁻¹ d⁻¹, 2.9; extractable P <7 mg kg⁻¹, 2.9; exchangeable K <0.2 cmolc kg⁻¹, 2.6. We show the response of prediction accuracy to sample size and demonstrate how the predictive value of spectral libraries can be iteratively increased through detection of spectral outliers among new samples. The spectral library approach opens up new possibilities for modeling, assessment, and management of risk in soil evaluations in agricultural, environmental, and engineering applications. Further research should test the use of soil reflectance in pedotransfer functions for prediction of soil functional attributes.

Abbreviations: ECEC, effective cation-exchange capacity • ECECclay, ECEC divided by clay fraction • MARS, multivariate adaptive regression splines • RMSE, root mean square error


    INTRODUCTION
 
CONVENTIONAL ASSESSMENTS of soil capacity to perform specific production, engineering, or environmental functions rely on local calibration of observations on soil functional capacity to measured soil properties. However, soil analyses are expensive and dense sampling is required to adequately characterize spatial variability of an area, making broad-scale quantitative evaluation difficult (Dent and Young, 1981). New, rapid methods to quantify soil properties and variability are needed for the development of risk-based systems of soil interpretations that are designed to quantify prediction uncertainty so that users may be able to employ such information in decision-making (Dewayne Mays, 1996; McKenzie et al., 2000).

Diffuse reflectance spectroscopy is now routinely used for the rapid nondestructive characterization of a wide range of materials (Davies and Giangiacomo, 2000). Spectral signatures of materials are defined by their reflectance or absorbance, as a function of wavelength in the electromagnetic spectrum. Under controlled conditions, the signatures result from electronic transitions of atoms and vibrational stretching and bending of structural groups of atoms that form molecules or crystals. Fundamental features in reflectance spectra occur at energy levels that allow molecules to rise to higher vibrational states. For example, the fundamental features related to various components of soil organic matter generally occur in the mid- to thermal-infrared range (2.5–25 µm), but their overtones (at one half, one third, one fourth, etc., of the wavelength of the fundamental feature) occur in the near-infrared (0.7–1.0 µm) and short-wave infrared (1.0–2.5 µm) regions. Soil clay minerals have very distinct spectral signatures in the short-wave infrared region because of strong absorption of the overtones of SO₄²⁻, CO₃²⁻, and OH⁻ and combinations of fundamental features of, for example, H₂O and CO₂ (Hunt, 1982; Clark, 1999). The visible (0.4–0.7 µm) region has been widely used for color determinations in soil and geological applications as well as in the identification of Fe oxides and hydroxides (Ben-Dor et al., 1999). Since the mid-1980s, developments in instrument technology and chemometrics (the application of mathematical and statistical techniques to chemical data) have led to the increased use of spectroscopy in the laboratory and field and from space platforms, notably in geological studies (Clark, 1999).

Recent research has demonstrated the ability of reflectance spectroscopy to provide nondestructive rapid prediction of soil physical, chemical, and biological properties in the laboratory (Ben-Dor and Banin, 1995; Janik et al., 1998; Reeves et al., 1999). There has been some success with reflectance spectroscopy for sensing of soil organic matter in the field (Sudduth and Hummel, 1993), and for the discrimination of major soil types from satellite multi-spectral and aircraft hyperspectral data (Baumgardner et al., 1985; Coleman et al., 1993; Palacios-Orueta et al., 1999). Despite these indications of the potential of the technique, there are few examples of the application of reflectance spectroscopy for nondestructive assessment of soils (Janik et al., 1998; Myer, 1998). Although geological spectral libraries exist that include soil mineral spectra (e.g., Clark, 1999), there are few examples of soil spectral libraries that include a wide diversity of soils with information on physical, chemical, and biological properties (Ben-Dor et al., 1999; Malley et al., 2000; Chang, 2001). In particular there has been little focus on the development of soil spectral libraries for application to risk-based approaches to soil evaluation that explicitly consider uncertainty in predictions and interpretations of soil properties.

We propose a scheme for the use of spectral libraries as a tool for building risk-based approaches to soil evaluation (Fig. 1). The ability to rapidly and nondestructively characterize soils using reflectance spectroscopy permits thorough sampling of the variation within a target population of soils (Stenberg et al., 1995). Soil properties or attributes of soil functional capacity are measured for only a selection of soils, designed to sample the variation in the spectral library, and then calibrated to soil reflectance. If, on the basis of cross-validation or holdout-validation methods, calibrations are found to be insufficiently accurate for user requirements, the calibration sample size can be increased. The resultant calibrations between soil functional attributes and soil reflectance are then used to predict the soil functional attributes for the entire soil library and for new samples that belong to the same population as the library soils. Poorly described soils, whose spectra are not representative of the library spectra, are further characterized and added to the calibration library.



 
Fig. 1. Logical scheme for use of reflectance spectral libraries in a risk-based approach to prediction of soil functional attributes.

 
The success of the spectral library approach will depend primarily on the ability to build robust models for prediction of soil attributes from soil reflectance spectra. The aim of this study was to test the overall spectral library approach for the prediction of several important soil properties and soil fertility tests. The specific objectives were to test basic relationships between soil properties and soil reflectance, and investigate the response of prediction error to (i) variation in calibration sample size, and (ii) screening for library outliers when predicting new samples (corresponding to the decision nodes in Fig. 1).


    MATERIALS AND METHODS
 
Soil Characterization
The library soils consisted of topsoil (0- to 15- or 0- to 20-cm depth) samples taken from multilocation experiments, on-farm trials, and soil surveys conducted in eastern and southern Africa during 1993 through 1999 for which basic soil properties had been analyzed by the soil laboratory of the International Centre for Research in Agroforestry. The library included soils from Malawi, Kenya, Rwanda, Tanzania, Uganda, Zambia, and Zimbabwe, taken from a wide variety of landscape positions, parent materials, and land uses. Although the soils were not formally classified on-site, they were sampled from areas broadly mapped (1:1 million) as Soil Taxonomy orders Alfisols, Andisols, Aridisols, Entisols, Histosols, Inceptisols, Mollisols, Oxisols, Ultisols, and Vertisols.

The soils were air-dried, passed through a 2-mm sieve, and stored in paper bags at room temperature. They were analyzed using standard methods widely used for tropical soils. Soil pH was determined in water using a 1:2.5 soil/solution ratio. Samples were extracted with 1 M KCl using a 1:10 soil/solution ratio, and analyzed by NaOH titration for exchangeable acidity, by atomic absorption spectrometry for exchangeable Ca and Mg, and by flame photometry for exchangeable Na (ISFEIP, 1972; Yurimaguas Experiment Station Staff, 1989). Samples with pH ≥5.5 were assumed to have zero exchangeable acidity and samples with pH <7.5, zero exchangeable Na. Samples were extracted with 0.5 M NaHCO₃ + 0.01 M EDTA (pH 8.5, modified Olsen) using a 1:10 soil/solution ratio and analyzed by flame photometry for exchangeable K and colorimetrically (molybdenum blue) for extractable P (ISFEIP, 1972; Yurimaguas Experiment Station Staff, 1989). Organic C was determined colorimetrically after H₂SO₄–dichromate oxidation at 150°C (Heanes, 1984). Nitrogen mineralization potential was determined by measuring ammonium production in 7-d anaerobic incubations at 40°C (Keeney, 1982). Particle-size distribution was determined using the hydrometer method after pretreatment with H₂O₂ to remove organic matter (Gee and Bauder, 1986). Effective cation-exchange capacity was calculated as the sum of exchangeable acidity and exchangeable bases, and ECECclay was calculated as ECEC divided by the clay fraction.
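The two derived quantities in the last sentence can be written out as a short sketch. It assumes that the exchangeable bases summed are the four measured above (Ca, Mg, K, and Na), all in cmolc per kg, and that clay content is supplied in g per kg; the function names and the example values are hypothetical and are not taken from the study.

```python
def effective_cec(ph, acidity, ca, mg, k, na):
    """ECEC (cmolc/kg) as the sum of exchangeable acidity and bases.

    Assumption: the bases summed are Ca, Mg, K and Na.
    """
    # Rules stated in the text: acidity is taken as zero at pH >= 5.5
    # and exchangeable Na as zero at pH < 7.5.
    if ph >= 5.5:
        acidity = 0.0
    if ph < 7.5:
        na = 0.0
    return acidity + ca + mg + k + na


def ecec_clay(ecec, clay_g_per_kg):
    """ECEC divided by the clay fraction (kg clay per kg soil)."""
    clay_fraction = clay_g_per_kg / 1000.0
    if clay_fraction <= 0:
        raise ValueError("clay fraction must be positive")
    return ecec / clay_fraction


# Hypothetical acid soil with 250 g/kg clay.
example_ecec = effective_cec(ph=5.0, acidity=0.5, ca=2.0, mg=1.0, k=0.3, na=0.1)
example_ecec_clay = ecec_clay(example_ecec, clay_g_per_kg=250.0)
```

Expressing ECEC per unit clay gives values on the cmolc per kg clay scale, which is the scale on which clay activity classes are usually compared.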

Reflectance Measurements
Soil diffuse reflectance spectra were recorded for each library sample using a FieldSpec FR spectroradiometer (Analytical Spectral Devices Inc., Boulder, Colorado) at wavelengths from 0.35 to 2.5 µm with a spectral sampling interval of 1 nm. The optical setup was as recommended by the instrument manufacturers (Analytical Spectral Devices Inc., 1997) and is commonly used in geological applications. Samples were illuminated from above (Fig. 2) with two tungsten quartz halogen filament lamps in housings with aluminum reflectors (Lowel pro-lamp, Lowel-Light Manufacturer Inc., New York, NY) with a 50-W bulb of ~3200 K color temperature (WelchAllyn, Skaneateles Falls, NY). The lamps were placed on each side of the sample, with the light beam 30° from vertical, at a distance of 50 cm between the lamps and the sample. Reflected light was collected with a 25° field-of-view foreoptic angled at 30° from vertical and perpendicular to the plane of illumination at a distance of 5 cm from the sample (Fig. 2).



 
Fig. 2. Optical setup used for reflectance measurements with a portable spectrometer. A pistol grip is used to point the optical cable at the soil sample. The sample is placed on a rotary stage and illuminated by tungsten quartz halogen lamps.

 
Air-dried soil samples ground to pass a 2-mm sieve were packed in 12-mm deep, 55-mm diam. polystyrene petri dishes. Air-dried soils were used for convenience and to minimize effects of variation in soil moisture on reflectance (Ben-Dor et al., 1999). The dishes were over-filled with soil and the excess soil was then scraped off using a blade to ensure a flat surface flush with the top of the dish. To sample within-dish variation, reflectance spectra were recorded at four positions, successively rotating the sample through 90° between readings. Variation in readings within dishes can occur because the individual spectrometers in the instrument have different fields of view (Analytical Spectral Devices Inc., 1997). The average of ten spectra (the manufacturer's default value) was recorded at each position to minimize instrument noise. Before reading each sample, ten white reference spectra were recorded using calibrated Spectralon (Labsphere, Sutton, NH) placed at the same distance from the foreoptic as the soil sample. Reflectance readings for each wavelength band were expressed relative to the average of the white reference readings. Preliminary investigations showed that coefficients of variation in average relative reflectance were ~1% among rotations within a sample dish, and 2% for replicate dishes from a soil sample. With this method, a single operator can comfortably scan several hundred samples a day.
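The arithmetic behind the relative reflectance and the coefficient of variation can be sketched as follows. This is an illustrative sketch only: the function names and the two-band example readings are hypothetical, and the sample (n - 1) form of the standard deviation is an assumption, since the text does not say which form was used.

```python
def mean_spectrum(spectra):
    """Band-wise mean of equally long spectra (lists of floats)."""
    n = len(spectra)
    return [sum(band) / n for band in zip(*spectra)]


def relative_reflectance(sample_readings, white_readings):
    """Mean sample reading divided by the mean white reference, per band."""
    sample = mean_spectrum(sample_readings)
    white = mean_spectrum(white_readings)
    return [s / w for s, w in zip(sample, white)]


def coefficient_of_variation(values):
    """CV in percent, using the sample (n - 1) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return 100.0 * variance ** 0.5 / mean


# Hypothetical raw readings for two bands at four rotations of one dish.
rotations = [[40.0, 50.0], [42.0, 52.0], [38.0, 48.0], [40.0, 50.0]]
white = [[100.0, 100.0], [100.0, 100.0]]
rel = relative_reflectance(rotations, white)
```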

Statistical Methods
Multivariate relationships among soil properties were analyzed to establish to what degree good spectral calibrations could have resulted from interdependencies among soil variables. Conditional independence assumptions among soil properties were tested using graphical linear modeling approaches (Edwards, 2000). Graphical modeling is a form of multivariate analysis that uses graphs to represent models. The graphs display the structure of both associational and causal dependencies among the variables in the model, allowing the conditional as well as the marginal associations to be studied. By considering the conditional associations among variables the approach helps to identify spurious associations that can occur when studying only marginal pairwise associations among variables. Where necessary, Box-Cox (Box and Cox, 1964) transformations were applied prior to analysis to obtain approximately multivariate normally distributed values. A backwards selection procedure was then applied to conditional independence assumptions among soil variables. This iteratively tests all the marginal associations among variables in the model and deletes those associations that do not significantly contribute towards the fit of the fully saturated base model, using maximum likelihood estimation.
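For reference, the one-parameter Box-Cox transformation and its inverse can be sketched as below. The transformation parameter (here `lam`) is chosen per variable; the values used in the study are not given in this section, so the example value is purely illustrative.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of a positive value y with parameter lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive values")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


def inverse_box_cox(z, lam):
    """Back-transform a Box-Cox value z to the original scale."""
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)


# Round trip with an illustrative parameter value.
z = box_cox(4.2, 0.5)
y_back = inverse_box_cox(z, 0.5)
```

The inverse is what a back-transformation of predictions to the original measurement scale amounts to.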

The raw spectral reflectance data were preprocessed prior to statistical analysis as follows. Relative reflectance spectra were resampled by selecting every tenth nanometer value from 0.35 to 2.5 µm. This was done to reduce the volume of data for analysis and to match it more closely to the spectral resolution of the instrument (3 to 10 nm). The reflectance values were then transformed with first derivative processing (differentiation with second-order polynomial smoothing with a window width of 20 nm) using a Savitzky-Golay filter, as described by Fearn (2000). Derivative transformation is known to minimize variation among samples caused by variation in grinding and optical set-up (Marten and Naes, 1989). Multiplicative scatter correction (used to compensate for additive and multiplicative effects in spectral data) and normalization (sample-wise scaling) of the reflectance data (both described in Vandeginste et al., 1998) did not improve calibrations and were not used. Wavebands in regions of low signal-to-noise ratio or displaying noise because of splicing between the individual spectrometers (Analytical Spectral Devices Inc., 1997) were omitted, leaving 198 wavebands for analysis. The omitted bands were 0.35 through 0.38 µm, 0.97 through 1.01 µm, and 2.46 through 2.50 µm.
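The preprocessing chain can be sketched in a few lines. The sketch assumes that the derivative is taken after resampling, so that a 20-nm window corresponds to three points at 10-nm spacing, and that window positions without a full set of neighbors are dropped; neither detail is stated in the text. Function names are hypothetical, wavelengths are handled in nanometers, and the number of bands this sketch returns depends on the edge handling, so it is not meant to reproduce the 198 wavebands reported.

```python
def resample_every(values, step=10):
    """Keep every `step`-th value (1-nm data to 10-nm data)."""
    return values[::step]


def savgol_first_derivative(y, half_width=1, spacing=10.0):
    """First derivative from a local least-squares quadratic fit.

    For a symmetric window of 2 * half_width + 1 equally spaced points,
    the fitted slope at the window centre is
    sum(i * y_i) / (spacing * sum(i ** 2)).
    Points without a full window on both sides are dropped.
    """
    offsets = range(-half_width, half_width + 1)
    norm = spacing * sum(i * i for i in offsets)
    return [
        sum(i * y[c + i] for i in offsets) / norm
        for c in range(half_width, len(y) - half_width)
    ]


def drop_bands(wavelengths, values, omitted):
    """Remove bands inside any inclusive (low, high) range in `omitted`."""
    kept = [
        (w, v)
        for w, v in zip(wavelengths, values)
        if not any(low <= w <= high for low, high in omitted)
    ]
    return [w for w, _ in kept], [v for _, v in kept]


# Omitted ranges from the text, in nm.
OMITTED_NM = [(350, 380), (970, 1010), (2460, 2500)]

# Synthetic quadratic "spectrum" sampled every 10 units: slope is 2 * x.
xs = list(range(0, 100, 10))
deriv = savgol_first_derivative([x * x for x in xs])
```

Because the fit is quadratic and the window symmetric, the derivative of a quadratic test signal is recovered exactly, which gives a convenient check on the implementation.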

Variation in overall spectral shape among samples was explored by displaying spectra identified by a central-composite sampling design based on Euclidean distances from the center sample in principal component space (Massart et al., 1997; CAMO Inc., 1998). The first three principal components were used as the design factors with equal weighting given to each component. The spectra situated closest to the center, cube (distance of one standard deviation from center), and star (distance of 1.98 standard deviations from center) points of the design were selected. Principal components analysis was conducted with the Unscrambler version 7.5 (CAMO Inc., 1998). Spectra were also plotted continuum-removed, using ENVI (Research Systems Inc., 1999), to help detect subtle differences in spectral absorption features among soils. Continuum removal is used to normalize reflectance spectra so that individual absorption features can be compared from a common baseline. The continuum is a convex hull, consisting of straight-line segments fitted over local spectral maxima (Research Systems Inc., 1999). The patterns of correlation between individual soil variables and derivative reflectance at each waveband were also explored.
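Continuum removal as described above can be sketched with an upper convex hull followed by division. This is a generic illustration under the assumption of strictly increasing wavelengths and at least two bands; it is not the ENVI implementation, and the function names and example values are hypothetical.

```python
def upper_hull(points):
    """Upper convex hull of (x, y) points given in increasing x order."""
    hull = []
    for p in points:
        # Pop the last hull point while it lies on or below the straight
        # line joining the point before it to the new point p.
        while len(hull) >= 2:
            (ox, oy), (ax, ay) = hull[-2], hull[-1]
            cross = (ax - ox) * (p[1] - oy) - (ay - oy) * (p[0] - ox)
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def continuum_removed(wavelengths, reflectance):
    """Divide each reflectance value by the hull value at its wavelength."""
    hull = upper_hull(list(zip(wavelengths, reflectance)))
    result = []
    seg = 0
    for w, r in zip(wavelengths, reflectance):
        # Advance to the hull segment that contains wavelength w.
        while seg < len(hull) - 2 and w > hull[seg + 1][0]:
            seg += 1
        (x0, y0), (x1, y1) = hull[seg], hull[seg + 1]
        continuum = y0 + (y1 - y0) * (w - x0) / (x1 - x0)
        result.append(r / continuum)
    return result


# Hypothetical four-band spectrum with an absorption feature at 500 nm.
cr = continuum_removed([400, 500, 600, 700], [0.2, 0.1, 0.4, 0.3])
```

Bands that lie on the hull take the value 1, and absorption features appear as dips below 1, so features at different wavelengths can be compared from a common baseline.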

Individual soil variables were then calibrated against the 198 reflectance wavebands using MARS (MARS version 2.0, Salford Systems Inc., San Diego, CA). Multivariate adaptive regression splines is a new approach to regression modeling developed for data mining applications (Friedman, 1991; Steinberg et al., 2001). Data mining approaches are appropriate for large multivariate data sets when there is little theoretical knowledge available to guide the model-building process. Multivariate adaptive regression splines is a nonlinear multiple regression technique that builds flexible models by fitting piecewise linear regressions. When a target variable is regressed against a predictor variable, the slope of the regression line is allowed to change at certain points (termed knots) along the predictor axis. The variables and knot positions used are found via an intensive search procedure. Each such relationship, which may include interaction terms, is represented as a basis function (Steinberg et al., 2001). In our analyses, however, no interactions between predictor variables were allowed. Multivariate adaptive regression splines first constructs an overly large model by adding basis functions, which are then deleted in order of least contribution to the model until an optimal model is found. The number of degrees of freedom charged for knot optimization was determined using ten-fold cross-validation. The maximum number of basis functions was varied to provide the best model in terms of lowest generalized cross-validation measure. In ten-fold cross-validation the calibration data are divided into ten roughly equal parts, each containing a similar distribution for the dependent variable. Nine parts of the data are used to develop a calibration model, which is then tested on the remaining one tenth of the data. This process is repeated until each part of the data has been withheld. The results of the ten tests are then combined to provide error rates for the calibration model (Massart et al., 1997). Generalized cross-validation is an approximate version of cross-validation that is less computationally demanding (Friedman, 1991); it is the average-squared residual of the fit to the data times a penalty to account for the increased variance associated with increasing model complexity (i.e., number of basis functions). After experimenting with several alternative calibration methods, MARS was found to give the best average prediction performance on holdout validation samples. These alternative methods included partial least squares regression; classification and regression trees (CART; Brieman et al., 1984; Steinberg and Cola, 1997), including the use of bootstrap aggregation and adaptive resampling and combining (i.e., averaging of a large number of trees generated by resampling and replacement from the original training data); hybrid models, with separate partial least squares models combined from subsets of data identified using regression trees; and schemes for spectral matching.
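The building blocks of such a model can be illustrated with a short sketch: a hinge (piecewise linear) basis function with a knot, an additive prediction without interaction terms, and the generalized cross-validation criterion in the form given by Friedman (1991). This is not the Salford Systems implementation. The effective number of parameters is passed in directly because the way it is derived from the number of basis functions and the degrees of freedom charged per knot varies between implementations; all names and example values are hypothetical.

```python
def hinge(x, knot, direction=1):
    """Basis function max(0, x - knot), or max(0, knot - x) if direction is -1."""
    return max(0.0, direction * (x - knot))


def predict(x, intercept, terms):
    """Additive model: intercept plus weighted hinge functions.

    `terms` is a list of (coefficient, band_index, knot, direction) tuples.
    Each term uses a single waveband, so there are no interaction terms.
    """
    return intercept + sum(c * hinge(x[j], k, d) for c, j, k, d in terms)


def gcv(actual, predicted, effective_params):
    """Average squared residual times a penalty for model complexity."""
    n = len(actual)
    if effective_params >= n:
        raise ValueError("effective_params must be smaller than the sample size")
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n
    return mse / (1.0 - effective_params / n) ** 2


# Hypothetical two-band example: the slope for band 1 changes at knot 1.5.
y_hat = predict([0.5, 2.0], 1.0, [(2.0, 1, 1.5, 1), (-1.0, 0, 1.0, -1)])
score = gcv([1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0], effective_params=2)
```

The penalty term grows as the effective number of parameters approaches the sample size, which is what discourages overly large models during the backward deletion of basis functions.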

For each soil variable, calibration models were developed on a random sample of two-thirds of the soil samples of the entire library. The same random selection pattern was used for all the calibrations but the sample selection varied according to the number of samples available for each soil property. The calibrations were tested by predicting the soil variables on validation data sets composed of the remaining one-third of the samples. The calibrations were developed on the transformed soil variables, but the calibration and validation results were back-transformed for evaluation of predictive performance. No samples were omitted from the analysis in either the calibration or validation data sets. Prediction success was evaluated on predicted and actual observations using the coefficient of determination (r2), root mean square error (RMSE) and bias. Root mean square error and bias were also calculated separately for each quartile of the predicted variable.
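The three evaluation statistics can be sketched as follows. The sketch assumes that r² is the squared Pearson correlation between actual and predicted values and that bias is the mean of predicted minus actual; the text does not state either convention explicitly. The quartile helper simply splits the samples into four equal-count groups after sorting by predicted value, which is one plausible reading of "each quartile of the predicted variable", and it needs at least four samples.

```python
def rmse(actual, predicted):
    """Root mean square error between paired values."""
    n = len(actual)
    return (sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n) ** 0.5


def bias(actual, predicted):
    """Mean of (predicted - actual); the sign convention is an assumption."""
    return sum(p - a for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Squared Pearson correlation between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    s_ap = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    s_aa = sum((a - mean_a) ** 2 for a in actual)
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    return s_ap * s_ap / (s_aa * s_pp)


def rmse_by_quartile(actual, predicted):
    """RMSE within four equal-count groups ordered by predicted value."""
    pairs = sorted(zip(predicted, actual))
    n = len(pairs)
    result = []
    for q in range(4):
        chunk = pairs[q * n // 4:(q + 1) * n // 4]
        result.append(rmse([a for _, a in chunk], [p for p, _ in chunk]))
    return result


# Hypothetical values: errors appear only at the high end of the range.
obs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
pred = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0, 10.0]
by_quartile = rmse_by_quartile(obs, pred)
```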

To test predictive performance for given threshold values of selected soil variables, a number of soil fertility screening tests were defined based on critical limits commonly reported in the literature (Cochrane et al., 1985; Landon, 1991). Samples were classified either as abnormal or normal based on a cut-off value defined by the critical limit (Table 1). Classification trees were used to develop calibrations for each soil test using CART version 4.0 (Steinberg and Cola, 1997) with the 198 reflectance wavebands as predictor variables. A classification tree is built from decision rules that repeatedly split the data set into increasingly homogeneous subsets. The decision rules use the predictor variables to give the best separation of classes in the target variable (here, normal and abnormal cases) in terms of greatest reduction in variance. Output from the model fitting procedure is a decision tree. The splitting rule used in these analyses was the Gini index of diversity (a measure of node impurity, described by Brieman et al., 1984). The classification trees were grown using a randomly selected calibration data set consisting of two-thirds of the samples, using ten-fold cross-validation (Brieman et al., 1984). The predictive ability of the resulting models was then further tested using the remaining one-third of the samples withheld for validation.
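The Gini index and its use in choosing a single split can be illustrated with a minimal sketch. It searches one waveband for the threshold that gives the largest drop in weighted Gini impurity; a full tree repeats this search over all wavebands and recursively within each subset. The sketch is not the CART 4.0 implementation: thresholds are placed at observed values rather than at midpoints for simplicity, and the function names and data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini index of diversity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Threshold on one waveband giving the largest drop in Gini impurity.

    Returns (threshold, impurity_reduction); samples with value <= threshold
    go to the left subset. Returns (None, 0.0) if no split reduces impurity.
    """
    n = len(values)
    parent = gini(labels)
    best = (None, 0.0)
    for t in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        if parent - child > best[1]:
            best = (t, parent - child)
    return best


# Hypothetical derivative reflectance values at one waveband.
band = [0.1, 0.2, 0.8, 0.9]
status = ["normal", "normal", "abnormal", "abnormal"]
threshold, reduction = best_split(band, status)
```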


 
Table 1. Definition of soil fertility screening tests. Critical limits and interpretations are for illustrative purposes only.

 
To evaluate the stability of the soil test predictions using small calibration samples, a second set of trees was grown using only 10% of the samples for calibration and the remaining 90% of samples for validation. Calibration sample size was also varied systematically from 5 to 67% of soils for selected variables while keeping the validation data set constant at 33% of library samples. Again, no samples were omitted from the analysis in either the calibration or validation data sets. Predictive performance was assessed using diagnostic screening tests, commonly used in clinical medicine (e.g., Jones and Payne, 1997). The aim is to evaluate how successful a particular test is in diagnosing abnormal as distinct from normal cases. Predictive performance was assessed using the sensitivity (percentage of abnormal cases correctly predicted), specificity (percentage of normal cases correctly predicted), and the positive likelihood ratio (percentage sensitivity/[100 - percentage specificity]), which indicates the value of the test for increasing certainty about a positive diagnosis (e.g., Jones and Payne, 1997). The product of the pretest odds for the tested population and the likelihood ratio of the test gives the posttest odds of abnormality in an individual soil. Confidence intervals were estimated as given in Simel et al. (1991).
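These definitions translate directly into a few lines of arithmetic. The sketch below uses hypothetical confusion-matrix counts and a hypothetical pretest probability; it converts the posttest odds back to a probability for readability, a step the text does not itself describe.

```python
def screening_stats(tp, fn, tn, fp):
    """Sensitivity (%), specificity (%), and positive likelihood ratio.

    tp/fn: abnormal cases predicted abnormal/normal;
    tn/fp: normal cases predicted normal/abnormal.
    """
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    if specificity == 100.0:
        lr_positive = float("inf")  # no false positives observed
    else:
        lr_positive = sensitivity / (100.0 - specificity)
    return sensitivity, specificity, lr_positive


def posttest_probability(pretest_probability, likelihood_ratio):
    """Posttest odds = pretest odds x likelihood ratio, as a probability."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


# Hypothetical validation counts and a 20% prevalence of abnormal soils.
sens, spec, lr = screening_stats(tp=80, fn=20, tn=90, fp=10)
post = posttest_probability(0.2, lr)
```

With these made-up numbers a positive spectral test raises the probability of abnormality from 0.2 to about 0.67, which illustrates how a likelihood ratio well above 1 increases certainty about a positive diagnosis.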

A procedure for screening new samples for outlier detection was developed. A principal components model is fitted to derivative reflectance spectra for the existing library samples. Spectral outliers in new samples are identified with respect to this model using a classification method (significance = 5%) known as soft independent modeling of class analogy. This method tests whether new samples are members of the existing library class or not based on measures of object-to-model distance and leverage (CAMO Inc., 1998; Vandeginste et al., 1998).


    RESULTS AND DISCUSSION
 
Soil Library Properties
Although not all analyses were available for all soils, there was wide variation in individual soil properties in the soil library (Table 2). The wide range in ECECclay values indicated a wide range in clay mineralogy from soils dominated by 1:1 type clays (<20 cmolc kg⁻¹) to those dominated by 2:1 types (>40 cmolc kg⁻¹). There were comparatively few samples with high levels of organic C (>30 g kg⁻¹), exchangeable acidity (>0.1 cmolc kg⁻¹), exchangeable Na (>0.1 cmolc kg⁻¹), or silt (>350 g kg⁻¹). The generally low correlation and partial correlation coefficients among soil properties indicated a reasonable spread of soil properties in the multivariate data space (Table 3). In fact, the graphical model analysis showed that exchangeable Ca was conditionally independent (P = 0.05) of extractable P and exchangeable K, and that extractable P was conditionally independent of silt. Thus their association with the other soil variables could account for the correlation among these variables.


 
Table 2. Number of soils and the 2.5th, 25th, and 50th (median), 75th, and 97.5th percentiles for each soil property in the soils library.

 

 
Table 3. Correlation coefficients (upper triangle) and partial correlation coefficients (lower triangle) for selected soil properties (n = 661).

 
Spectral Library Properties
Albedo (relative reflectance averaged across the entire spectrum) of all the soils in the library ranged from 0.09, with an almost featureless spectrum, to 0.54 (Fig. 3a). Other workers have shown that differences in soil albedo are broadly related to soil organic matter concentrations (Ben-Dor et al., 1999). The untransformed soil spectra followed the same basic shape as described by other workers (Ben-Dor et al., 1999) with prominent absorption features around 1.4, 1.9, and 2.2 µm. These features are associated with clay minerals, for example OH features of free water at 1.4 and 1.9 µm, and lattice OH features at 1.4 and 2.2 µm (Hunt, 1982). There was interesting variation among soils in absorption features in the visible range (Fig. 3b) and in the broad absorption feature around 0.9 µm (Fig. 3a). These features are commonly associated with Fe²⁺ and Fe³⁺ (Hunt, 1982), but can also be influenced by organic matter (Ben-Dor et al., 1999).



 
Fig. 3. Diffuse reflectance spectra from the Africa soils library. (A) Samples with smallest Euclidean distance to central composite design points (Design) for the first three principal components of the entire soils library (n = 1170) and the spectra with highest (High) and lowest (Low) albedo. (B) The visible wavelength part of the spectra shown continuum removed to emphasize the absorption features. (C) Correlation of soil organic C and clay concentration and effective cation-exchange capacity (ECEC) with the first derivatives of the relative reflectance at different wavelengths.

 
Basic soil physical and chemical properties showed high correlation with derivative reflectance values near the principal absorption features within the visible (0.4–0.6 µm) and short-wave infrared (1.4, 1.9, 2.2, 2.3, and 2.4 µm) wavelength regions (Fig. 3c). There were few areas of the spectrum that displayed low correlation (r² < 0.2) with soil properties. Other workers (e.g., Palacios-Orueta and Ustin, 1998) have also found that for the purpose of calibrating soil properties to spectral characteristics, it is preferable to use information over the entire spectrum, rather than attempting to interpret individual absorption features. Soil spectra result from overlapping absorption features of many organic and inorganic components, thus subtle differences in spectral shape may provide valuable information about soil properties.

Prediction of Soil Properties using Multivariate Adaptive Regression Splines
Good calibrations (r2 > 0.75) were obtained for soil pH, ECEC, exchangeable Ca, exchangeable Mg, organic C, and particle-size distribution (Table 4). The calibration models with the largest r2 values for generalized cross-validation for a given attribute also resulted in the largest validation r2 values, indicating that the generalized ten-fold cross-validation was effective in safeguarding against over-fitting. The level of prediction accuracy achieved on the validation data sets (Fig. 4) is sufficiently high for studies in which spatial or temporal variability of an attribute is large relative to the accuracy of its measurement, as typically found in large-area applications and farm advisory work. Root mean squared error was larger at high than low values for ECEC, exchangeable Ca, exchangeable Mg, sand, and organic C. Bias was also larger at high than low values of organic C; it increased from -0.3 g kg-1 for predicted values below 24 g kg-1 to 2.7 for values above 24 g kg-1. The poorer predictive performance at high values for these variables may be because of error in the laboratory analytical methods rather than genuine lack of prediction power. Increased analytical error could be expected at higher concentrations because of greater variability in amounts of ion extracted and the need for increasing number of dilutions. This hypothesis was supported by trends in the available data on variability in duplicate laboratory determinations. For example, the RMSE for ECEC laboratory duplicates increased from 0.4 at <15 cmolc kg-1 to 1.1 at >15 cmolc kg-1. The RMSE for organic C laboratory duplicates increased dramatically from 0.9 at <20 g kg-1 to 7.4 at >0.2 g kg-1. The H2SO4–dichromate oxidation method also underestimates organic C in these soils at values of >0.2 g kg-1 compared with the dry combustion method (A. Albrecht, personal communication, 2000). 
Other sources of analytical error can be expected because of (i) variation in analytical technique among batches analyzed over several years, (ii) changes in soil properties between the times of analytical and spectral measurement, and (iii) variation among subsamples used for analytical and spectral measurements.
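
The summary statistics discussed above (r2, bias, and RMSE, overall and for low versus high predicted values) can be computed from paired actual and predicted values. The following Python sketch is illustrative only. It assumes that r2 is the squared Pearson correlation between actual and predicted values and that bias is the mean of predicted minus actual; neither convention is spelled out in this section, and the function names are hypothetical.

```python
import math


def validation_stats(actual, predicted):
    """Return (r2, bias, rmse) for paired actual and predicted values.

    Assumptions (not stated in the source): r2 is the squared Pearson
    correlation, and bias is the mean of (predicted - actual).
    """
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    bias = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r2 = cov * cov / (var_a * var_p)
    return r2, bias, rmse


def stats_by_threshold(actual, predicted, threshold):
    """Compute the statistics separately for predictions below the threshold
    and for predictions at or above it.

    Each group needs at least two samples with differing values.
    """
    low = [(a, p) for a, p in zip(actual, predicted) if p < threshold]
    high = [(a, p) for a, p in zip(actual, predicted) if p >= threshold]
    return validation_stats(*zip(*low)), validation_stats(*zip(*high))
```

Calling `stats_by_threshold(actual, predicted, 24.0)` on organic C values in g kg-1 would give the kind of low-range versus high-range comparison of bias and RMSE described in the text.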


Table 4. Calibration of soil properties to first derivative reflectance spectra using multivariate adaptive regression splines. The coefficient of determination (r2), bias and root mean square error (RMSE) are given for back-transformed data.

 


Fig. 4. Scatterplot comparison of actual and predicted values for different soil properties for the validation data sets. Calibration models were developed with multivariate adaptive regression splines (MARS) using a random selection of two-thirds of the total number of soils and tested on the remaining one-third holdout sample. No samples were omitted from either calibration or validation datasets. Error bars show the root mean square error (RMSE) for the quartile ranges of the predicted values.

 
Very poorly predicted values in the validation data sets (standardized residuals of >3 for differences between predicted and actual values) showed no particular pattern with regard to soil type or region. However, all the poorly predicted values for ECEC (n = 4) and exchangeable Ca (n = 8) had pH values of >7.5. It is probable that presence of free CaCO3 inflated the exchangeable Ca and ECEC values in these samples. Recalibration after omitting values with pH >7.0 reduced the validation RMSE from 3.8 to 2.8 for ECEC and from 2.8 to 2.5 for exchangeable Ca (n = 319).

Because exchangeable Ca and Mg displayed high partial correlation, the conditional independence assumptions between actual and predicted values for these two variables were explored further with graphical modeling, using a coherent backwards selection procedure. It was established that both the relationship between actual and predicted exchangeable Ca and that between actual and predicted exchangeable Mg displayed conditional dependence (P = 0.05). Thus the spectral test provided more information about exchangeable Mg than was provided by the actual and predicted exchangeable Ca values.

Calibration models established for exchangeable K, extractable P, and N mineralization potential were not stable (validation r2 < 0.5). Janik et al. (1998) also observed poor prediction of bicarbonate-extractable K and P with mid-infrared analysis. Chang et al. (2001) reported that the ability to predict levels of extractable cations varied with the extraction method, but the reasons for the differences were not clear. Because soil supply of nutrients to plants depends on many interrelated soil factors, further work should investigate whether plant response to N, P, and K can be better predicted from soil reflectance than from soil extractions.

Prediction of Soil Tests using Classification Trees
For many agricultural and engineering applications, such as soil fertility evaluation, it is often sufficient to classify a soil with respect to a critical test value, rather than needing a precise estimate of a soil property. Using a one-third holdout sample for validation, reasonable predictive performance was achieved for all the soil screening tests (Table 5) with positive likelihood ratios ranging from 2.7 to 11.4. Although exchangeable K and extractable P were predicted moderately poorly from soil reflectance using MARS calibrations, the relationships were still strong enough to permit reasonable discrimination of soils falling above or below specific cut-off values. There are few comparable data from screening tests in the soil science literature (Dewayne Mays, 1996), but for comparison purposes, the values reported here fall within the range of likelihood ratios commonly published for screening tests in the medical literature (Jones and Payne, 1997).


Table 5. Prediction of soil tests from reflectance spectra using classification trees. Validation data are given for two calibration sample sizes: (i) two-thirds and (ii) ten percent of the total soils.

 
In practice, prediction success will depend not only on the performance of the screening test, but also on the prevalence of abnormal cases in the sampled population. If abnormal cases were relatively abundant, say at a population prevalence of 50%, the percentage of cases that would be correctly diagnosed by the tests in Table 5 would range from 73 to 91%. However, if abnormal cases had a population prevalence of only 10%, then the corresponding correctly diagnosed percentages would range from 20 to 52%. On the other hand, in the case of low extractable P, the actual prevalence rate in the validation data set was 63%, giving 83% correct diagnosis of P-deficient cases. In some cases, there may be scope for adjusting the critical cut-off limit to increase the prevalence of abnormal cases in the sampled population. It is also possible to adjust the performance of a test by varying splitting rules, which may improve prediction where costs of misclassification are unequal. For example, the cost of misclassifying a P-deficient soil may be judged to be greater than misclassifying a P-sufficient soil, in which case the sensitivity of the test will be more important than its specificity. In this case, by adjusting the priors when fitting the calibration tree, the sensitivity of the low-extractable P test was improved from 70 (Table 5) to 86%, although at the expense of specificity which decreased from 75 to 49%. If abnormal cases were then screened with a further diagnostic test of high specificity, then extremely accurate diagnosis would be possible.
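
The dependence of diagnostic success on prevalence can be made concrete with a small Python sketch. It assumes that the "percentage of cases correctly diagnosed" corresponds to the post-test probability of a positive result, obtained from the positive likelihood ratio through the odds form of Bayes' theorem. That reading is an interpretation rather than something stated in the text, and the function names are hypothetical.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """Positive likelihood ratio: true-positive rate over false-positive rate."""
    return sensitivity / (1.0 - specificity)


def post_test_probability(likelihood_ratio, prevalence):
    """Probability that a test-positive case is truly abnormal.

    Converts prevalence to prior odds, multiplies by the likelihood ratio,
    and converts the posterior odds back to a probability.
    """
    prior_odds = prevalence / (1.0 - prevalence)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Sensitivity and specificity quoted in the text for the low-extractable
    # P test, before and after adjusting the priors.
    print(positive_likelihood_ratio(0.70, 0.75))
    print(positive_likelihood_ratio(0.86, 0.49))
    # Likelihood ratio of 2.9 at the 63% prevalence of the validation set.
    print(post_test_probability(2.9, 0.63))
    # The same test at prevalences of 50% and 10%.
    print(post_test_probability(2.9, 0.50))
    print(post_test_probability(2.9, 0.10))
```

Under this reading, a likelihood ratio of 2.9 at a prevalence of 63% gives a post-test probability of about 83%, in line with the figure quoted for P-deficient cases, and a likelihood ratio of 2.7 at a prevalence of 50% gives about 73%, matching the lower end of the quoted range.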

Although using full spectrum data always resulted in the best predictions, there were often good surrogate node splitters in different parts of the spectrum. This indicates possible flexibility in the choice of wavelength ranges for calibrations. For example, likelihood ratios for the low ECECclay test for models using the full spectrum were 8.3 (95% confidence interval, 5.1–13.3); visible wavelength ranges, 4.7 (3.2–6.8); and short-wave infrared, 7.7 (5.0–11.9). For low extractable P, the corresponding likelihood ratios were 2.9 (2.2–3.8), 2.4 (1.8–3.1), and 2.2 (1.8–2.8), respectively. These results indicate opportunity for use of simplified spectrometer designs for specific screening tests, e.g., use of commercially available handheld spectrometers covering the visible and near-infrared ranges.

Because soil reflectance provides an integrated measure of several fundamental soil properties, including surface charge characteristics, particle-size distribution, and organic C, soil functional attributes could be predicted better directly from soil reflectance than indirectly from laboratory soil tests. For example, Lins and Cox (1989) found that prediction of optimum P fertilizer rate from extractable P was greatly improved when clay content or surface area was considered. Our results demonstrate moderate ability of reflectance measurements to discriminate soils with low extractable P, as well as good prediction of clay and surface charge characteristics.

Response to Calibration Sample Size
The response of predictive performance of the MARS regressions to variation in calibration sample size was investigated for three key soil properties (Fig. 5). Predictive performance decreased gradually with decreasing sample size at large sample sizes, but rapidly decreased as sample size decreased below about 100 to 200 samples. Prediction performance for ECEC was less sensitive to sample size than that for clay or organic C. We suggest that initial investments into building reasonably large calibration libraries (several hundred soils) are worthwhile to allow such responses to be investigated. Once calibration sample size is large enough to provide stable results, then only calibration maintenance will be required to include library outliers among new samples (Fig. 1).
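
The procedure behind such a sample-size response can be sketched as follows: hold out a fixed one-third validation set, fit the calibration on progressively larger random subsets of the remaining soils, and record the validation r2 each time. The Python sketch below is a stand-in, not the analysis used here. It replaces MARS on spectra with a one-predictor least-squares line on synthetic data, computes r2 as 1 - SSE/SST (an assumption), and uses nested calibration subsets. All names and data are hypothetical.

```python
import random


def make_synthetic(n, seed=0):
    """Generate hypothetical (x, y) pairs: a linear signal plus Gaussian noise."""
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 10.0) for _ in range(n)]
    y = [2.0 * xi + rng.gauss(0.0, 3.0) for xi in x]
    return x, y


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def validation_r2(actual, predicted):
    """Coefficient of determination computed as 1 - SSE/SST."""
    mean_a = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - sse / sst


def sample_size_response(x, y, fractions, seed=0):
    """Return (calibration size, validation r2) for each calibration fraction.

    One-third of the samples is held out once for validation; each fraction
    is expressed relative to the total number of samples.
    """
    rng = random.Random(seed)
    order = list(range(len(x)))
    rng.shuffle(order)
    n_valid = len(order) // 3
    valid, pool = order[:n_valid], order[n_valid:]
    results = []
    for fraction in fractions:
        n_cal = max(2, round(fraction * len(x)))
        cal = pool[:n_cal]  # slicing caps the size at the available pool
        slope, intercept = fit_line([x[i] for i in cal], [y[i] for i in cal])
        predicted = [slope * x[i] + intercept for i in valid]
        results.append((len(cal), validation_r2([y[i] for i in valid], predicted)))
    return results


if __name__ == "__main__":
    x, y = make_synthetic(300)
    for n_cal, r2 in sample_size_response(x, y, [0.05, 0.10, 0.20, 0.40, 0.67]):
        print(n_cal, round(r2, 3))
```

Repeating the loop over several random seeds and averaging would give a smoother curve than a single draw.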



Fig. 5. Response of coefficient of determination to sample size for effective cation-exchange capacity (ECEC), organic C concentration and clay content. Results are for predicted and observed values based on a validation data set of 33% of the library soils, as calibration sample size was varied from 5 to 67% of the total number of library soils. Calibrations were conducted using MARS regressions.

 
For soil tests, the effect of reducing calibration sample size from 67 to 10% of the library soils varied with the individual test and performance measure (Table 5). For low ECEC and low extractable-P tests, sensitivity was little affected but specificity decreased with the smaller sample, whereas for high-exchangeable K the reverse was true. Performance of high ECEC, low ECECclay, and low-exchangeable K tests was relatively insensitive to sample size, whereas for low pH, high ECECclay, high-extractable P, and high N mineralization potential, both sensitivity and specificity were equally affected. However, positive likelihood ratios were greater than 2.0 in all tests using a 10% calibration sample, except for the extractable P tests, for which the ratios were ~1.5. If abnormal cases were at a population prevalence of 50%, the percentage of cases that would be correctly diagnosed using the 10% calibration samples would range from 60 to 87% for the different tests in Table 5.

Our results indicate that in some cases, very small calibration sample sizes may provide adequate predictive performance. For example, when a calibration tree for high ECECclay was built on a random sample of only 34 soils and used to predict abnormal cases for the remaining 647 library soils, using the prevalence rate of 27% abnormal cases that occurred in the library, a predictive efficiency of 86% was obtained, with a positive predictive value of 71% and negative predictive value of 92%. On the other hand, where diagnostic performance is less than desired, combining additional screening tests based on other information, such as land use, topography, or satellite imagery, could be a preferable strategy to that of increasing the calibration sample size. Calibrations could also be improved by restricting geographical extent (e.g., Sudduth and Hummel, 1996), but global models may be more robust than local models in terms of ability to predict new samples.

Library Outlier Detection
To test the outlier screening procedure (Fig. 1), the ECEC values at pH <7.0 for southern African soils (n = 274) were taken as the existing spectral library calibration data set, and the corresponding values for eastern Africa (n = 697) were taken as new samples to be predicted. A principal components model was fitted to the southern Africa spectra and then outliers in the eastern Africa spectra identified with respect to this model using soft independent modeling of class analogy. The outlying spectra (n = 53) were then added to the southern Africa calibration data set (Table 6). In a second test, an additional 86 randomly selected eastern African soils were added to the calibration data set, giving a total of 20% of the eastern African soils included in the calibration. These strategies were compared with random sampling of the same number of soils (Table 6). Multivariate adaptive regression splines models were fitted to the three calibration data sets.
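
The screening step can be illustrated with a much-simplified stand-in for soft independent modeling of class analogy: fit a principal components model to the library spectra, then flag new samples whose residual distance from that model is large relative to the library's own residuals. The Python sketch below uses a single principal component found by power iteration, three made-up "wavebands", and a hypothetical cut-off of three times the library's root mean square residual. None of these choices comes from this study, which does not state the number of components or the test criterion in this section.

```python
import math
import random


def fit_one_component_model(library):
    """Fit a one-component principal components model to library spectra.

    Returns the mean spectrum, the leading principal axis (found by power
    iteration), and the root mean square residual distance of the library.
    """
    n, p = len(library), len(library[0])
    mean = [sum(row[j] for row in library) / n for j in range(p)]
    centered = [[row[j] - mean[j] for j in range(p)] for row in library]

    axis = [1.0 / math.sqrt(p)] * p
    for _ in range(100):
        scores = [sum(c * a for c, a in zip(row, axis)) for row in centered]
        new_axis = [sum(s * row[j] for s, row in zip(scores, centered))
                    for j in range(p)]
        norm = math.sqrt(sum(v * v for v in new_axis))
        axis = [v / norm for v in new_axis]

    residuals = [residual_distance(row, mean, axis) for row in library]
    rms_residual = math.sqrt(sum(r * r for r in residuals) / n)
    return mean, axis, rms_residual


def residual_distance(spectrum, mean, axis):
    """Distance from a spectrum to the line through `mean` along `axis`."""
    centered = [s - m for s, m in zip(spectrum, mean)]
    score = sum(c * a for c, a in zip(centered, axis))
    return math.sqrt(sum((c - score * a) ** 2 for c, a in zip(centered, axis)))


def flag_outliers(new_spectra, model, multiplier=3.0):
    """Flag spectra whose residual distance exceeds multiplier * library RMS."""
    mean, axis, rms_residual = model
    return [residual_distance(s, mean, axis) > multiplier * rms_residual
            for s in new_spectra]


def make_library(n=30, seed=0):
    """Hypothetical library: spectra along one direction plus small noise."""
    rng = random.Random(seed)
    library = []
    for i in range(n):
        t = i / (n - 1)
        library.append([t * w + rng.gauss(0.0, 0.05) for w in (1.0, 2.0, 3.0)])
    return library


if __name__ == "__main__":
    model = fit_one_component_model(make_library())
    new_spectra = [[0.5, 1.0, 1.5], [3.0, -3.0, 0.0]]
    print(flag_outliers(new_spectra, model))
```

In a workflow like the one described above, the flagged samples would be sent for reference analysis and added to the calibration set, while unflagged samples would be predicted from the existing calibration.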


Table 6. Prediction of effective cation-exchange capacity (ECEC) for eastern African soils from calibrations based on southern African soils, with and without inclusion of spectral outliers and random samples from the eastern African soils. Multivariate adaptive regression splines (MARS) regressions were used for prediction.

 
Incorporation of outliers significantly improved prediction of ECEC in eastern African soils (Table 6), but addition of further randomly selected samples was necessary to achieve high prediction accuracy (r2 > 0.8). The additional random samples probably helped to more adequately sample the variability in attribute data from soils with similar spectra. Using spectral outliers resulted in better prediction than using the same number of randomly selected soils, but there was less advantage to including spectral outliers with a larger random sample (Data Set 5 in Table 6). However, incorporation of outliers in the calibration library increased the range of the calibration and this strategy is generally preferable to expand the population of soils that can be predicted. From these results, we recommend that in addition to library outliers, a certain percentage of all new samples be analyzed and added to the calibration library. These samples would then also serve validation purposes. Further studies are required to identify optimal procedures for the selection of spectral outliers and to test systematic versus random sampling of the spectral data space. Whether global spectral libraries, covering large geographical extent, are more robust than local libraries should also be investigated.


    CONCLUSIONS
We have described a conceptual framework for use of spectral libraries in predicting soil properties with sparse data sets and tested these concepts using a spectral library of diverse African topsoils. Reflectance spectroscopy used in the laboratory can provide rapid and simultaneous prediction of several fundamental soil physical and chemical properties. Calibrations and screening tests for various soil properties and soil fertility constraints can be developed, based on a limited number of samples selected from soil spectral libraries, to an accuracy level that is typically acceptable for large-area applications. Even for site-specific management, the method would allow large numbers of samples to be taken from a field, which may give a better overall estimate of a given soil property than more accurate measurements of the property at a lower density sampling. The number of calibration samples required depends on the strength of the calibration with soil reflectance for a given soil attribute and the required level of accuracy. When predicting new soil samples, detection of spectral outliers allows the population of soils for which predictions are applicable to be systematically increased, thereby iteratively increasing the value of the spectral library. Computer programs could be developed for routine use of spectral libraries as an integral part of the spectrometer software.

A spectral library approach provides a tool for generalizing results of soil assessments that are conducted at a limited number of sites, and thereby increases the efficiency of expensive and time-consuming soil-related studies. The rapid nature of the measurement allows soil variability to be more adequately sampled than with conventional approaches and thereby facilitates risk-based approaches to soil assessments. For example, knowledge of the uncertainty in prediction of soil functional attributes, taking into account soil variability, allows users to make informed decisions about the trade-off between the cost of the measurement and the risk (or potential for regret) associated with using the prediction.

Further investigations should test reflectance spectroscopy for direct prediction of a wide range of soil functional attributes for agricultural, environmental, and engineering applications, both in the laboratory and field, and develop operational schemes for its use in risk-based soil assessments. Because soil reflectance provides an integrated measure of a number of fundamental soil properties, such calibrations could perform better, and would certainly be more rapid, than pedotransfer functions based on conventional measurements of soil properties. Soil functional attributes that are often predicted from basic soil properties tested in this study include net primary productivity, plant growth response to soil constraints and ameliorants, soil erodibility, soil compressibility and shrinkage, water retention and conductivity, and capacity to adsorb wastes and pollutants.

The spectral library approach provides a coherent framework for linking soil information with remote sensing information for improved spatial prediction of soil functional capacity. Remote sensing of soil properties directly from space platforms is hampered by problems such as atmospheric interference, shade and shadow effects, mixtures of materials within pixels, and variation in soil moisture content. Studies on the effect of soil moisture content on calibrations between soil functional attributes and soil reflectance would help to evaluate the potential of reflectance spectroscopy in the field. Future studies should explore approaches that combine soil spectral libraries, and other geo-referenced information, such as from digital terrain models and field observations, with information from multi- and hyper-spectral remote sensing imagery (e.g., Shepherd and Walsh, 2000).


    ACKNOWLEDGMENTS
 
We thank Paul Smithson and Peter Mbugua for facilitating access to the ICRAF soil laboratory databases and Luka Anjeho for field technical support. We gratefully acknowledge the Rockefeller Foundation and Swedish International Development Cooperation Agency for financial support. We thank three anonymous reviewers for helpful comments on an earlier draft of this manuscript.

Received for publication May 30, 2001.

