Published online 12 March 2007
Published in Soil Sci Soc Am J 71:507-514 (2007)
DOI: 10.2136/sssaj2005.0391
© 2007 Soil Science Society of America
677 S. Segoe Rd., Madison, WI 53711 USA
SOIL PHYSICS
Rapid Prediction of Soil Water Retention using Mid Infrared Spectroscopy
L. J. Janik (a),
R. H. Merry (a),
S. T. Forrester (a,*),
D. M. Lanyon (b) and
A. Rawson (c)
a CSIRO Land & Water PMB 2, Glen Osmond, South Australia 5064
b CSIRO Sustainable Ecosystems Waite Rd. Urrbrae South Australia 5062
c New South Wales Dep. of Natural Resources c/o Faculty of Science and Agriculture Charles Sturt Univ., Leeds Pde, Orange PO Box 883 NSW 2800
* Corresponding author (sean.forrester{at}csiro.au).
ABSTRACT
Soil-water properties vary widely with soil composition and texture, but measurements are often time consuming and expensive to determine using traditional laboratory methods. Mid-infrared (MIR) spectroscopy is sensitive to soil composition, allowing multivariate calibrations to be derived between volumetric soil water retention and MIR spectra. Mid-infrared partial least squares (PLS) models can be derived from the spectra of soils and reference data, and can be used to predict the water retention solely from the MIR spectra of unknown samples. Regressions between laboratory-determined volumetric water retentions, θv, at matric suctions from 1 to 1500 kPa and values predicted by MIR PLS analysis are presented for a broad variety of surface soils from southern Australia. Cross-validation produced coefficient of determination values ranging from 0.67 to 0.87 and standard error of cross-validation in the range 4.1 to 3.2. Prediction robustness was tested using an independent set of samples for values of θv at field capacity (10-kPa suction) and permanent wilting point (1500-kPa suction). The prediction standard error for the test set was higher than for cross-validation. This was attributed to a mismatch between spectra for the test set and those of the calibration samples, resulting in a reduced ability of the calibration samples to model the test set spectra. The MIR PLS prediction method performed at least as well as some pedotransfer functions and was shown to be a rapid and inexpensive method for the prediction of volumetric soil moisture content for a range of soil types at a range of matric suctions.
Abbreviations: MIR, mid-infrared; NSW, New South Wales; PC, principal component; PCA, principal components analysis; PLS, partial least squares; RPD, residual predictive deviation; SECV, standard error of cross-validation; SEP, standard error of prediction
INTRODUCTION
Soil water retention is an important property for determining moisture content and is affected by soil density, particle size, mineral and organic composition, and pore-space density and distribution. Tests for soil water retention are, however, underutilized largely due to the relatively high cost and long turnaround time of the laboratory analysis. The most commonly used measurements for water retention are the volumetric water retention percentage (θv) at field capacity (which is arguably in the matric suction range 8 to 33 kPa) and at wilting point (with a matric suction of 1500 kPa). These two values broadly describe the extractable plant-available water, but a multipoint curve of the distribution of volumetric water contents from saturation to 1500 kPa is required for a full description of the water holding capacity of the soil.
Current laboratory methods for the determination of water retention require accurately maintained matric suctions on intact sections of soil cores for lengthy periods (Cresswell, 2002). These methods use ceramic suction plates for matric suctions up to 80 kPa or pressure plate extractors for matric suctions from 80 to 1500 kPa. Determining these values is expensive and time consuming and so this service is generally not offered for routine soil testing. A rapid, inexpensive alternative method to determine soil water retention, with acceptable precision, is needed as a surrogate for the laboratory method.
Pedotransfer functions and physicoempirical models have been used in response to this need, using more readily available and less expensive soil data (e.g., Arya and Paris, 1981; Rawls et al., 1982; Haverkamp and Parlange, 1986; Vereecken et al., 1989; da Silva and Kay, 1997). These functions are based on relationships developed between volumetric soil water retention and other soil properties such as soil texture, clay content, sand content, bulk density, and organic matter content (Rawls et al., 1982; Saxton et al., 1986). Clay content, sand content, and bulk density have been described as the most important of these soil properties to predict the water retention at field capacity and wilting point (Saxton et al., 1986), although soil organic matter content has been shown to be of some importance (Rawls et al., 1982; da Silva and Kay, 1997). Minasny et al. (1999) evaluated a number of different approaches to the development of pedotransfer functions for water retention using a data set for 840 Australian soils and found that pedotransfer functions developed elsewhere could not be applied directly to Australian soils, due to some unique soil properties and different specifications for particle size fractions. As noted by Bastet et al. (1997), pedotransfer function performance varied according to the pedological origin of the soil on which they were developed. They addressed these problems by examining a combination of modeling techniques, using particle-size distribution and bulk density data to successfully predict water content at different matric potentials. Parametric estimation using extended nonlinear regression was found to be the preferred method.
The relationship between soil water retention and soil structure can be partly explained by the underlying soil composition and chemistry. For example, porous soils are more likely to contain heavy clays and organic matter, which cause porosity due to expansion and contraction with successive wetting and drying cycles. Compacted soils, by comparison, have high density, low pore volumes, and are likely to be dominated by sand and nonreactive soil minerals (Williams et al., 1983).
Soil mineral and organic matter components produce specific infrared spectral signatures due to the vibrations of the molecular groups within them (Janik et al., 1998; Reeves et al., 2001). In the near infrared (NIR), covering the 700- to 2500-nm spectral region, the spectra of soils show vibrational absorbances due to O–H in minerals, and to O–H, C–H, and N–H organic functional groups in soil organic matter (Viscarra Rossel and McBratney, 1998a, 1998b; Reeves et al., 1999). The mid-infrared (MIR), with vibrations in the spectral region from 4000 cm⁻¹ (2500 nm) to 400 cm⁻¹ (25000 nm), is sensitive to groups containing protons and also to heavier atoms such as in Si–O, Al–O, and Fe–O groups in minerals (Nguyen et al., 1999; van der Marel and Beutelspacher, 1976; Janik and Skjemstad, 1995; Janik et al., 1998; Reeves et al., 2001). Quartz (sand) and kaolinite clays give particularly strong spectral signatures near 1100 to 1000 cm⁻¹ (Si–O stretching vibration) and 3690 to 3620 cm⁻¹ (clay lattice Al–OH vibrations), respectively, and tend to occur more in compacted soils and those with high bulk density (Mullins et al., 1987; Dixon and Weed, 1989). Soil organic matter can be identified by peaks due to alkyl –CH2 at 2930 to 2850 cm⁻¹, protein amide near 1680 cm⁻¹, carboxylate anion at 1600 and 1400 cm⁻¹, and carboxylic acids near 1720 cm⁻¹ (van der Marel and Beutelspacher, 1976; McCarty et al., 2002). Spectral absorbances of many of these vibrations can be quantified and correlated with soil water properties and have been used to predict the soil water content (Viscarra Rossel et al., 2006). Some minerals, for example smectitic clays, have a negative interlayer charge balanced by Na, Ca, and K cations with varying degrees of hydration characterized by MIR peaks near 3450 and 1630 cm⁻¹ (Zviaginal et al., 2004). Mid-infrared spectroscopy may therefore offer an alternative to pedotransfer functions for the determination of soil water retention.
Partial least squares can be used to model the relationships between infrared spectral intensities and soil properties through derived PLS loadings, scores, and coefficients (Janik and Skjemstad, 1995; Janik et al., 1998). The PLS scores are, in effect, the scaling terms for the loadings used to model the spectra in the PLS calibration set, and the spectral intensities can be scaled with the PLS coefficients to allow the prediction of analyte concentrations from spectra of the unknowns (Haaland and Thomas, 1988). Similar predictions have been reported for soil analysis using NIR spectra (e.g., Odlare et al., 2005; Chang et al., 2005), but the MIR is expected to perform better due to its high sensitivity to quartz, a major constituent in most soils, as well as its sensitivity to clay composition. The MIR PLS method should therefore be able to provide a rapid and inexpensive surrogate method for the prediction of θv directly from soil spectra with good analytical accuracy.
Unfortunately, however, the full potential of MIR PLS to predict soil analyte properties is not always achieved. Partial least squares models assume that calibration models developed for a particular calibration set can also model the spectra of the unknown samples. If the compositional or analytical profile of some of the unknowns (outliers) is substantially different from that of the samples in the calibration set, then the values of their PLS scores will lie outside the range of score values for the calibration spectra (score space). One solution to this problem is to analyze some of the extreme outliers by standard methods, include these in the calibration set, and remodel the PLS regression. This kind of validation, although crucial to test the validity of PLS predictions, is sometimes not performed to save cost, resulting in an overoptimistic PLS model that may be unable to cope with true unknowns.
We set out to show that PLS regression, calibrated for a small set of widely variable soil types (CSIRO Land and Water data set), can be used as a simple and rapid surrogate method for the prediction of soil water retention from MIR spectra. Furthermore, the PLS model can be expanded to allow the prediction of samples from a much larger set of significantly different soil types (New South Wales Department of Natural Resource Management [DNRM] data set).
MATERIALS AND METHODS
Soils
Ninety-six soil samples with widely varying soil properties were sourced from 36 sites across New South Wales, Victoria, and South Australia in southeastern Australia. The samples and data were provided by the CSIRO Land and Water (Canberra laboratory). This primary data set was called the CSIRO calibration set. A further 916 soil samples (from across New South Wales) were provided by the New South Wales DNRM to test or validate the PLS prediction models and were called the NSW validation set. Subsamples of the soil cores taken from the top 100 mm were air dried and ground to pass a 2-mm sieve. Further subsamples (approximately 7 g of each soil) were crushed in a vibrating ring mill equipped with a 50-mm-diameter, 50-g steel puck for 60 s to reduce the particle size to <0.1-mm diameter for spectral scanning.
Water Retention Data
Volumetric water retention values (θv) for soil equilibrated at matric suctions of 1, 3, 5, 10, and 50 kPa were determined by suction plate (Cresswell, 2002), and values at 500 and 1500 kPa were determined gravimetrically by pressure plate (Cresswell, 2002, Method 504.02). The gravimetric water content values were converted to volumetric percentages (% v/v) using the sample bulk density data (Cresswell, 2002). The data for these soil samples, described in Table 1, showed the wide range of variation in these soils typical of many soils across southeastern Australia. Only data for field capacity (10 kPa) and the permanent wilting point (1500 kPa) were provided by DNRM for the NSW validation set shown in Table 2.
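The conversion from gravimetric to volumetric water content mentioned above can be sketched as follows. This is a minimal illustration and not the authors' procedure: it assumes the usual relation θv = θg × (bulk density / water density) × 100, with the density of water taken as 1.0 g cm⁻³, and the function name and sample values are hypothetical.

```python
def gravimetric_to_volumetric_percent(theta_g, bulk_density, water_density=1.0):
    """Convert gravimetric water content to volumetric water content (% v/v).

    theta_g       -- g water per g oven-dry soil
    bulk_density  -- soil bulk density, g cm^-3
    water_density -- density of water, g cm^-3 (assumed 1.0)
    """
    if theta_g < 0 or bulk_density <= 0 or water_density <= 0:
        raise ValueError("water content must be >= 0 and densities must be > 0")
    return 100.0 * theta_g * bulk_density / water_density


# Hypothetical sample: 0.20 g/g water content, bulk density 1.40 g cm^-3
theta_v = gravimetric_to_volumetric_percent(0.20, 1.40)
print(f"theta_v = {theta_v:.1f} % v/v")
```

Because bulk density enters as a multiplier, an error in the bulk density measurement carries through proportionally to the volumetric value.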
Table 1. Volumetric water retention statistics for the CSIRO calibration set showing the number of samples, minimum and maximum (Min–Max), range, median, and standard deviation at each matric suction.
Table 2. Volumetric water retention statistics for the New South Wales validation set, showing the number of samples, minimum and maximum (Min–Max), range, median, and standard deviation at each matric suction.
Mid-Infrared Soil Spectra
Mid-infrared diffuse reflectance spectra were collected using approximately 100 mg of the air-dried soils scanned in a PerkinElmer Spectrum-One Fourier transform mid-infrared (FTIR) spectrometer (PerkinElmer, Wellesley, MA). Scans were for 60 s in the frequency (wavenumber) range 7800 to 450 cm⁻¹ (wavelength range 1280–22000 nm) at a resolution of 8 cm⁻¹, although the frequency range was restricted to 4000 to 500 cm⁻¹ (2500–20000 nm), the optimal frequency range for this spectrometer. The FTIR spectrometer was equipped with an extended range KBr beam splitter, a high-intensity ceramic source, a deuterium triglycine sulfate (DTGS) Peltier-cooled detector, and a PerkinElmer autofocusing diffuse reflectance accessory. Spectra were expressed in absorbance units [log(1/reflectance)]. Background reference scans were performed on silicon carbide (SiC) disks, assumed to have a reflectivity of 1 (100%).
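The two unit relations used in this section, wavenumber to wavelength and reflectance to absorbance, can be illustrated with a short sketch. The function names are hypothetical, and the base-10 logarithm is an assumption (it is the usual spectroscopic convention, but the text states only "log").

```python
import math


def wavenumber_to_wavelength_nm(wavenumber_cm1):
    """Wavelength in nm for a wavenumber in cm^-1 (1 cm = 1e7 nm)."""
    if wavenumber_cm1 <= 0:
        raise ValueError("wavenumber must be positive")
    return 1.0e7 / wavenumber_cm1


def reflectance_to_absorbance(reflectance):
    """Absorbance units, log(1/R); base 10 is assumed here.

    Reflectance is relative to the background reference, which is
    taken to have a reflectivity of 1.
    """
    if reflectance <= 0:
        raise ValueError("reflectance must be positive")
    return math.log10(1.0 / reflectance)


# Limits of the restricted spectral range used for calibration
for wavenumber in (4000, 500):
    print(wavenumber, "cm^-1 =", wavenumber_to_wavelength_nm(wavenumber), "nm")
```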
Partial Least Squares Analysis
Spectra were first exported into Grams-SPC format (Thermo Electron Grams/AI, Thermo Fisher Scientific, Waltham, MA), and then into Unscrambler Version 9.1 (Camo Software AS, Oslo, Norway) for PLS calibration. As explained by Geladi and Kowalski (1986), PLS regression is a bilinear modeling method where spectral and dependent variable reference data are projected onto a small number of "latent" variables (PLS loadings). The procedure for PLS analysis adopted here is similar to that described by Haaland and Thomas (1988), and later by Janik and Skjemstad (1995) and Janik et al. (1998) for soils. The calibration spectra, which were first mean centered, and corresponding analytic reference data are transformed during PLS calibration into a small set of PLS loadings and loading scaling terms (scores), combining the spectral and θv values in PLS calibration models. These calibrations are optimized (trained) to determine the minimum number of required PLS terms (factors) by using cross-validation, where each sample is removed in turn from the calibration set and its value predicted from the remaining samples; this is known as leave-one-out cross-validation. The PLS loading weights, where the first few loading weights are the primary spectral signatures of the soil components correlated with the property of interest, e.g., water retention in this case, can also be determined to help understand the relationship between spectral features and water retention values at the various matric suctions. The loading weights therefore give qualitative information on the correlations between the measured data and the sample properties, while the distribution of scores in scatter plots can be used to describe the relationships between samples or scores combined with regression coefficients to yield predictions.
RESULTS AND DISCUSSION
Figure 1 depicts the average of the CSIRO calibration spectra. Peaks are clearly evident for kaolinite and gibbsite in the hard-setting acidic soils, smectite in high-pH and heavy-clay soils, and quartz in sandy soils (Nguyen et al., 1999). Quartz (as sand) usually dominates Australian soils, resulting in many strong MIR diffuse reflectance peaks in the 2000 to 1800 cm⁻¹ region. These quartz peaks are often overlapped with other clay mineral and soil organic matter peaks in the spectral region from 1400 to 500 cm⁻¹ (Janik et al., 1998). The quartz peaks near 1100 to 1000 cm⁻¹ are severely distorted due to reflectance effects from large-particle-size sand (Nguyen et al., 1999).

Fig. 1. Mid-infrared (MIR) diffuse reflectance mean spectrum of samples from the CSIRO calibration soil set. Assignments of the major mineral and organic components are indicated.
The principal components analysis (PCA) score2 vs. score1 map for the CSIRO calibration set is depicted in Fig. 2a, with principal components PC-1 and PC-2 accounting for 59 and 19% of the total spectral variability, respectively. Positive peaks in the first PCA loading, shown in Fig. 2b, suggest that PC-1 is due largely to kaolinite clay (Al–OH stretching vibrations at 3692–3620 cm⁻¹) and gibbsite (peaks at 3524, 3452, and 3388 cm⁻¹). As discussed above, most of the negative peaks below 2000 cm⁻¹ (particularly 2000–1788 cm⁻¹) and the positive peak near 1100 cm⁻¹ are due to quartz. Figure 2b also suggests that the second principal component, PC-2, is due almost entirely to gibbsite. There is some evidence of carbonate, with an absorbance peak near 2516 cm⁻¹, and organic matter (alkyl peaks at 2930–2850 cm⁻¹).

Fig. 2. (a) Principal components analysis (PCA) score2 (PC-2) vs. score1 (PC-1) plot and (b) the corresponding PCA loadings for the CSIRO calibration samples in the 4000 to 450 cm⁻¹ frequency range.
The first two PLS loading weights, determined by cross-validation of the CSIRO calibration set for water contents at 10 and 1500 kPa, are illustrated in Fig. 3a and 3b. These represent the "pure" soil components that correlate most strongly with water contents at 10 and 1500 kPa. Additional soil components contribute more strongly in the subsequent loading weights. In Fig. 3a and 3b, the first loading weight was characterized by peaks corresponding to kaolinite clay, gibbsite, and quartz (negative peaks). In the second loading weight, there was a small contribution from organic matter (alkyl peaks near 2850–2930 cm⁻¹) at 10-kPa suction. The peaks due to quartz and organic matter were slightly stronger at 10 than 1500 kPa.

Fig. 3. Partial least squares loading weights for principal components score1 (PC-1) and score2 (PC-2) for volumetric water retention for the CSIRO calibration set at matric suctions (a) 10 kPa and (b) 1500 kPa.
Table 3 presents a summary of the PLS cross-validation regression statistics for the CSIRO calibration set. The regression coefficient of determination (R2) and standard error of cross-validation (SECV) provide a measure of the fit of predicted values with respect to the PLS regression. They do not, however, fully describe the capability of PLS to predict accurately. Other metrics such as regression slope, bias, and the residual predictive deviation (RPD) are also considered important. The RPD is the ratio of the standard deviation (SD) to the SECV or to the standard error of prediction (SEP) for actual predictions, where an RPD > 3 is considered to be of analytic quality, while RPDs between 2 and 3 are considered "good," 1.5 to 2 "medium," and RPDs <1.5 are considered to be "poor," with indicator accuracy only (Cozzolino et al., 2005).
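The regression statistics and the RPD classes described above can be computed with a short sketch such as the one below. It is not the authors' code. The standard error is taken here as the root mean square of the residuals without bias correction, the slope is that of predicted values regressed on measured values, the treatment of the class boundaries (exactly 1.5, 2, or 3) is an assumption, and the measured and predicted values are hypothetical.

```python
import math


def validation_statistics(measured, predicted):
    """Slope, bias, R2, standard error, and RPD for predicted vs. measured values."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    s_mm = sum((m - mean_m) ** 2 for m in measured)
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    s_mp = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    # Standard error (SECV or SEP): RMS residual, no bias correction (assumption)
    se = math.sqrt(sum((p - m) ** 2 for m, p in zip(measured, predicted)) / n)
    sd = math.sqrt(s_mm / (n - 1))  # sample SD of the reference data
    return {
        "slope": s_mp / s_mm,
        "bias": mean_p - mean_m,
        "r2": s_mp ** 2 / (s_mm * s_pp),
        "se": se,
        "rpd": sd / se,
    }


def classify_rpd(rpd):
    """RPD quality classes as quoted from Cozzolino et al. (2005)."""
    if rpd > 3:
        return "analytic quality"
    if rpd >= 2:
        return "good"
    if rpd >= 1.5:
        return "medium"
    return "poor"


# Hypothetical measured and predicted water retentions (% v/v)
stats = validation_statistics([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 37.0])
print(stats, classify_rpd(stats["rpd"]))
```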
Table 3. Partial least squares (PLS) cross-validation statistics for volumetric water retention at matric suctions from 1 to 1500 kPa for the CSIRO calibration set, showing the optimum number of PLS components (PCs) used in the calibration model, the regression slope, intercept, and bias, plus the coefficient of determination (R2), standard error of cross-validation (SECV), and ratio of SD to SECV (RPD) for predicted volumetric water content at each matric suction.
Cross-validation for the full CSIRO calibration set produced R2 values between 0.67 and 0.87, and SECVs in the range 4.1 to 3.2. Figures 4a and 4b illustrate the results of the cross-validation for the water retention at 10- and 1500-kPa suctions, with an approximately even spread of prediction deviation along the full range of data values. For water retention at 1- to 10-kPa suctions, illustrated in Table 3, the optimum number of PLS factors used to minimize the cross-validation error was between 5 and 7, whereas 9 PLS factors were required for the water contents at 50- to 500-kPa suction. Using more factors increased the risk of overfitting the model and reducing the accuracy of predictions for unknowns. Regression bias was very low, but regression slopes ranged from 0.73 to 0.79 for water contents at 1 and 3 kPa, respectively, and from 0.81 to 0.91 for water contents at 50- to 1500-kPa suctions. The RPDs ranged from approximately 1.9 to almost 3, and were thus considered to have "medium" prediction quality.

Fig. 4. Partial least squares (PLS) cross-validation regression plot for the prediction of volumetric water retention θv (% v/v) at (a) 10 kPa suction and (b) 1500 kPa suction for the CSIRO calibration set.
The PCA score map for the combined CSIRO calibration set plus the NSW validation set (Fig. 5a) indicates that a significant number of samples from the NSW validation set are outside the range of the CSIRO calibration PCA scores. This is a potential problem, particularly for those soils very high in kaolinite (high PC-1), as shown in the loading weight spectrum in Fig. 5b. The poor coverage of the scores from the NSW validation set, relative to those for the CSIRO calibration set, means that extrapolation by the model would be required for the prediction of samples high in kaolinite. This would be expected to reduce subsequent PLS prediction accuracy.
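One simple way to flag samples whose scores fall outside the calibration score range is a per-component minimum and maximum check, sketched below. This is an assumption for illustration only: the text does not state the exact rule used to identify outliers from the score plot, and the function names and score values are hypothetical.

```python
def score_ranges(calibration_scores):
    """Minimum and maximum calibration score for each principal component."""
    return [(min(column), max(column)) for column in zip(*calibration_scores)]


def outside_calibration_range(calibration_scores, new_scores):
    """Indices of new samples with any score outside the calibration range."""
    ranges = score_ranges(calibration_scores)
    flagged = []
    for index, scores in enumerate(new_scores):
        if any(s < low or s > high for s, (low, high) in zip(scores, ranges)):
            flagged.append(index)
    return flagged


# Hypothetical (PC-1, PC-2) scores
calibration = [(-1.0, -0.5), (0.5, 0.8), (1.2, -0.2)]
validation = [(0.0, 0.0), (2.0, 0.1), (0.3, -0.9)]
print(outside_calibration_range(calibration, validation))
```

A rectangular range check of this kind ignores gaps inside the calibration cloud; a distance-based measure would be a stricter alternative.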

Fig. 5. (a) Principal components analysis (PCA) score2 (PC-2) vs. score1 (PC-1) plot for the combined spectral data for the CSIRO calibration samples, New South Wales (NSW) validation samples, and the 35 NSW validation samples representing outliers with scores lying outside the extremities of the CSIRO calibration samples, and (b) the corresponding PCA loadings for PC-1 and PC-2.
For the combined CSIRO calibration plus NSW validation sets, the loading plot in Fig. 5b suggests that PC-1 is due largely to sand (positive loadings for quartz in the 1788–2000 cm⁻¹ region) and a small negative contribution due to kaolinite at 3620 to 3692 cm⁻¹. Component PC-2 is due partly to gibbsite and partly to soil organic matter (with peaks near 2930 and 2860 cm⁻¹ as well as between 1450 and 1720 cm⁻¹). It therefore appears, from the PCA loadings, that a significant difference between the CSIRO calibration and NSW validation data sets may be due to higher sand content in the NSW validation set.
A test of the robustness of the CSIRO PLS calibration model is its predictive ability for an independent soil set, for example the NSW validation set. Before testing the prediction of the NSW validation set, however, cross-validation of the NSW validation set was first performed to determine the best prediction to be expected for the NSW spectra, assuming the lowest prediction errors would come from an "internal" cross-validation model rather than from a model used to predict an "external" independent test set. Cross-validation of water retentions at 10 and 1500 kPa from the NSW validation set resulted in errors similar to those of the CSIRO calibration set, with R2 values of 0.80 and 0.85, SECVs of 5.7 and 3.0, and regression slopes of 0.81 and 0.86, respectively (see Table 4). The RPD values of 2.5 and 2.8, respectively, were consistent with medium-high accuracy.
Table 4. Partial least squares (PLS) cross-validation and prediction statistics for the New South Wales (NSW) validation data set presenting statistics for volumetric water retentions θv (% v/v) at matric suctions of 10 kPa and 1500 kPa. Statistics are shown for predictions using the CSIRO calibration and for cross-validation and predictions using the CSIRO calibration plus 35 NSW validation samples (CSIRO + 35 NSW). Also shown are the prediction statistics using log_e preprocessing. Data shown for the optimum number of PLS components (PCs) used in the model, the regression slope, intercept and bias, and the coefficient of determination (R2), standard error (where SE refers to SE of cross-validation or SE of prediction) and ratio of SD to SE (RPD).
Prediction of NSW validation data from the CSIRO calibration model was far less accurate, with R2 values of 0.54 and 0.72, SEP of 6.7 and 4.3, and regression slopes of 0.50 and 0.82 at 10 and 1500 kPa, respectively. The resulting RPDs of 2.1 and 2.0, respectively, were consistent with low-medium accuracy. The regression plots between values predicted from the CSIRO calibration and laboratory reference values are illustrated in Fig. 6a and 6b for suctions of 10 and 1500 kPa, respectively.

Fig. 6. Regression plots for volumetric water retention predicted by mid-infrared partial least squares (MIR PLS) (% v/v) at matric suctions (a) 10 kPa and (b) 1500 kPa for the New South Wales validation set using the CSIRO calibration samples.
To improve the ability to predict water retention for the NSW validation set, a relatively small number of representative outliers from the NSW validation data were added to the CSIRO calibration model, which increased its spectral diversity. Thirty-five extreme outliers (3.8% of the data set) were selected from the PCA score plot of the combined CSIRO calibration and NSW validation samples (shown in Fig. 5a as NSW outliers). Prediction of the NSW validation set using the combined CSIRO calibration plus the 35 NSW validation outliers resulted in a significantly improved prediction (Table 4). Figures 7a and 7b present the regression between the predicted and measured water retentions at 10 and 1500 kPa for the remaining NSW validation data set. By comparison, the root mean square residual error of the predicted and observed water contents presented by Minasny et al. (1999) ranged from 0.0723 to 0.1466 m³ m⁻³ at 10 kPa and from 0.0228 to 0.1127 m³ m⁻³ at 1500 kPa, compared with 0.0707 m³ m⁻³ (SEP = 6.3% v/v) at 10 kPa and 0.0500 m³ m⁻³ (3.8% v/v) at 1500 kPa obtained using our MIR PLS method.

Fig. 7. Regression plots for predicted volumetric water retention (% v/v) at matric suctions (a) 10 kPa and (b) 1500 kPa for the New South Wales (NSW) validation set using the combined CSIRO calibration plus 35 NSW validation samples.
The MIR PLS method discussed above should be, in principle, more accurate than pedotransfer functions, in that it is based on the spectral patterns resulting from the composition of the soils, including the amount and type of clay minerals and organic matter present in the samples. The MIR PLS method does not require predetermined functional relationships between soil properties and water content. Apart from the dependence on soil chemistry and composition, the spectral characteristics can also be affected by particle size (Nguyen et al., 1999), which may be related indirectly to soil water retention properties. The MIR PLS method therefore does not require prior knowledge of specific soil parameters such as organic matter, clay, or sand content needed for successful evaluation using pedotransfer functions. With regard to pedotransfer functions, the implied relationship between clay content and soil water retention may not always be valid in Australia due to the wide variation in types of clay. Clay type can vary from kaolinite and illite with hard-setting soil properties to heavy clays with self-mulching soil properties. The MIR spectra are affected by changes in the mineralogical variability but pedotransfer functions may fail to account for this variability.
Mid-infrared PLS calibration models can be readily developed from archived soil data and MIR spectra, as was done here. If calibration data is not available, however, acquiring new analytical data can be time consuming and therefore expensive. The calibration samples should always be tested with an independent test set and should include the range of unknown soils to be analyzed or predictions may be unreliable.
Partial least squares cross-validation is a method commonly used to maximize the amount of information available for training the PLS model from small and expensive calibration data by taking advantage of all the calibration reference data. This method can, in some circumstances, result in overoptimistic statistics in that the validation data used in cross-validation are not truly independent since they are derived from within the same calibration set. Models from small calibration sets may be required to predict results for much larger soil sets, exacerbating this problem. The relatively small CSIRO calibration set used in this study is typical of the limited data sets available in Australia. While the MIR PLS cross-validation resulted in apparently good cross-validation accuracy in this study, we were unable to confirm its robustness for "real" predictions without testing it with truly independent samples such as the NSW validation set.
Where relatively large data sets become available, they can be randomly split into separate "calibration" and "validation" sets. This method provides a more independent validation, since a lower proportion of calibration samples would be used to validate the model, but still suffers from some interdependence between calibration and validation data sets and can result in a higher (but more realistic) validation error.
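A random split of this kind can be sketched as below. The function name, the default validation fraction, and the fixed seed are hypothetical choices for illustration; the text does not prescribe any of them.

```python
import random


def split_calibration_validation(sample_ids, validation_fraction=0.3, seed=0):
    """Randomly split sample identifiers into calibration and validation lists."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_validation = round(len(ids) * validation_fraction)
    return ids[n_validation:], ids[:n_validation]


# Hypothetical data set of 100 numbered samples
calibration_ids, validation_ids = split_calibration_validation(range(100), 0.25, seed=1)
print(len(calibration_ids), len(validation_ids))
```

Because both subsets are drawn from the same sampling campaign, a split like this still shares site and laboratory effects between calibration and validation, which is the interdependence noted above.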
In this study, the lack of accuracy of the original CSIRO calibration set in predicting the NSW validation data was evident. As seen in Fig. 6a and 6b, prediction of the NSW test samples from the CSIRO set resulted in reduced R2 values of 0.54 and 0.72, and increased SEP values of 6.7 and 4.3 compared with the CSIRO cross-validation results.
Of particular concern in the prediction of water content for NSW samples from the CSIRO calibration was that the regression slopes were very low, between 0.50 and 0.82. Inclusion of 35 outlier samples from the NSW validation set into the CSIRO calibration set significantly improved the accuracy of predicting the NSW validation set. The process of upgrading the calibration data by including some samples with known analytical values from the unknown sample set confirms the general need to validate predictions for routine analysis of unknown samples, as well as to provide a means of improving the predictive ability of the PLS calibrations for new samples.
There was a marked curvature of the regression in Fig. 6a and 6b, with values being significantly underpredicted as θv increased. Such curvature can arise from either a nonlinear MIR intensity response to water content or a different chemistry or sample composition at low water retention compared with high water retention. The regression curvature was marginally reduced on including the extra NSW samples with the CSIRO calibration samples, as seen in Fig. 7a and 7b. In an attempt to eliminate prediction curvature, the θv values for the CSIRO calibration samples were preprocessed using log_e before developing the calibration model. The choice of the log_e function to correct regression curvature was purely speculative, since a number of nonlinear functions could have been used with similar effect. After prediction, the predicted values were converted back to the original volumetric percentage units using the exp(x) function, where x is the predicted log_e θv value. Figures 8a and 8b depict the regressions for predicted θv at matric suctions of 10 and 1500 kPa, respectively, for the NSW data set after using the log_e processing function, with the regression statistics included in Table 4. The regression R² values were unchanged for the 10-kPa prediction and increased from 0.74 to 0.77 for the 1500-kPa prediction, with a marginal increase in the SEP, due largely to increases in prediction error at high θv values. The value of any linearization preprocessing transform therefore remains questionable except perhaps for predictions of low values of θv. These results support the use of MIR PLS as an alternative to current pedotransfer functions in predicting soil water retention at a range of matric potentials.
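The log_e preprocessing and exp back-transformation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: a one-variable ordinary least squares fit (the hypothetical helper `fit_line`) stands in for the PLS model, a single made-up "spectral score" stands in for the MIR spectrum, and the θv values are invented.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x (a stand-in for the PLS model)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
        (xi - mean_x) ** 2 for xi in x
    )
    a = mean_y - b * mean_x
    return a, b

# Hypothetical calibration data: one score per sample and measured theta_v (% v/v).
score = [0.0, 1.0, 2.0, 3.0, 4.0]
theta_v = [5.0, 8.0, 13.0, 21.0, 34.0]

# Step 1: transform the reference values with the natural logarithm.
log_theta = [math.log(t) for t in theta_v]

# Step 2: calibrate against the transformed values.
a, b = fit_line(score, log_theta)

# Step 3: predict in log units, then convert back with exp(x).
predicted = [math.exp(a + b * s) for s in score]
print([round(p, 2) for p in predicted])
```

Because the model is fitted in log units, back-transformed predictions are always positive, and a given error in log units corresponds to a larger absolute error at high θv than at low θv. This is one plausible reason for larger prediction errors at high values after such a transform.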

Fig. 8. Regression plots for predicted volumetric water retention (% v/v) at matric suctions (a) 10 kPa and (b) 1500 kPa for the New South Wales (NSW) validation set using the combined CSIRO calibration plus 35 NSW validation samples after applying a log_e preprocessing transform to the water retention data before partial least squares analysis.
 |
CONCLUSIONS
|
|---|
Mid-infrared with PLS is a simple, rapid, and inexpensive surrogate method to predict θv across the range of matric suctions from 1 to 1500 kPa. Predictions are based on the link between soil texture, bulk density, and the spectral signatures of some clay minerals, organic matter, and quartz. Accuracy of the PLS predictions appears to be better than that of some pedotransfer functions that simply rely on texture and bulk density data.
While the MIR PLS cross-validation appears to provide estimations with an accuracy comparable to those using pedotransfer functions, predictions for independent validation samples show that true unknown samples can be poorly predicted using a single calibration model for soils that are markedly different from those of the calibration samples. Where the spectral characteristics of the unknowns, as determined by their PLS loadings and scores, lie outside those of the calibration set, prediction errors can be unacceptably high. Prediction accuracy of unknowns can be significantly improved by analyzing a relatively small subset of samples from the unknown soil set and incorporating these within an expanded calibration set, leading to a more robust prediction model. Considerable regression curvature was observed for the prediction of θv of unknowns by MIR PLS regression. Apart from a slight reduction in curvature resulting from the incorporation of spectral outlier samples from the unknown sample set into the calibration set, regression curvature could be almost eliminated by use of a log_e preprocessing function on the θv calibration values. This log_e transform, however, barely improved the R² and significantly increased the SEP, so that the use of linearization preprocessing transforms remains questionable.
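One way to identify unknowns whose scores lie outside those of the calibration set is a standardized distance in score space. The sketch below is an assumption-laden illustration rather than the criterion used in this study: it presumes the PLS scores have already been computed, treats the score columns as uncorrelated, and uses an arbitrary threshold of 3.0. The function names and all values are hypothetical.

```python
import math

def score_distance(scores, calib_scores):
    """Standardized distance of one sample's scores from the calibration centre.

    Each score component is scaled by the calibration mean and variance for
    that component (this treats components as uncorrelated, an assumption).
    """
    n = len(calib_scores)
    d2 = 0.0
    for j, value in enumerate(scores):
        column = [row[j] for row in calib_scores]
        mean = sum(column) / n
        var = sum((c - mean) ** 2 for c in column) / (n - 1)
        d2 += (value - mean) ** 2 / var
    return math.sqrt(d2)

def select_outliers(unknown_scores, calib_scores, threshold=3.0):
    """Return indices of unknowns lying farther than `threshold` from the centre."""
    return [
        i
        for i, s in enumerate(unknown_scores)
        if score_distance(s, calib_scores) > threshold
    ]

# Made-up two-component scores for calibration and unknown samples.
calib_scores = [[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0], [0.0, 0.0]]
unknown_scores = [[0.5, 0.5], [4.0, 0.0], [0.0, -3.5]]

flagged = select_outliers(unknown_scores, calib_scores)
print(flagged)
```

Samples flagged in this way would be candidates for laboratory analysis and inclusion in an expanded calibration set, in the spirit of the 35 NSW samples added to the CSIRO calibration.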
The MIR PLS method provides similar, if not better, accuracy than some pedotransfer functions for predicting soil water retention, and it is easier to use. It also has the added advantage of quick processing time and much lower cost, because the sole input to a PLS model for simultaneous predictions of water retention ranging from field capacity to the wilting point is an infrared spectrum, scanned in minutes.
 |
NOTES
|
|---|
All rights reserved. No part of this periodical may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Permission for printing and for reprinting the material contained herein has been obtained by the publisher.
Received for publication December 5, 2005.
 |
REFERENCES
|
|---|
- Arya, L.M., and J.F. Paris. 1981. A physicoempirical model to predict the soil moisture characteristic from particle size distribution and bulk density data. Soil Sci. Soc. Am. J. 45:10231030.[ISI]
- Bastet, G., A. Bruand, M. Voltz, M. Bornand, and P. Quétin. 1997. Performance of available pedotransfer functions for predicting the water retention properties of French soils. p. 981991. In M.Th. van Genuchten et al. (ed.) Characterization and measurement of the hydraulic properties of unsaturated porous media. Univ. of California, Riverside, CA.
- Chang, G.W., D.A. Laird, and G.R. Hurburgh. 2005. Influence of soil moisture on near-infrared reflectance spectroscopic measurement of soil properties. Soil Sci. 170:244255.
- Cozzolino, D., A.F. Montossi, and R. San Julian. 2005. The use of visible (VIS) and near infrared (NIR) reflectance spectroscopy to predict fibre diameter in both clean and greasy wool samples. Anim. Sci. 80:333338.
- Cresswell, H.P. 2002. The soil water characteristic. p. 5984. In N.J. McKenzie et al. (ed.) Soil physical measurement and interpretation for land evaluation. CSIRO Publ., Melbourne.
- da Silva, L.M., and B.D. Kay. 1997. Estimating the least limiting water range of soils from properties and management. Soil Sci. Soc. Am. J. 61:877883.
- Dixon, J.B., and S.B. Weed. 1989. Minerals in soil environments. 2nd ed. SSSA Book Ser. 1. SSSA, Madison, WI.
- Geladi, P., and B.R. Kowalski. 1986. Partial least-squares regression: A tutorial. Anal. Chim. Acta 185:117.
- Haaland, D.M., and V.T. Thomas. 1988. Partial least-squares methods for spectral analyses. 1. Relation to other quantitative calibration methods and the extraction of qualitative information. Anal. Chem. 60:11931202.
- Haverkamp, R., and J.Y. Parlange. 1986. Predicting the water-retention curve from particle-size distribution: 1. Sandy soils without organic matter. Soil Sci. 142:325339.
- Janik, L.J., R.H. Merry, and J.O. Skjemstad. 1998. Can mid infrared diffuse reflectance analysis replace soil extractions? Aust. J. Exp. Agric. 38:681696.
- Janik, L.J., and J.O. Skjemstad. 1995. Characterization and analysis of soils using mid-infrared partial least squares. II. Correlations with some laboratory data. Aust. J. Soil Res. 33:637650.
- McCarty, G.W., J.B. Reeves III, R.F. Follett, and J.M. Kimble. 2002. Mid-infrared and near-infrared diffuse reflectance spectroscopy for soil carbon measurement. Soil Sci. Soc. Am. J. 66:640646.[Abstract/Free Full Text]
- Minasny, B., A.B. McBratney, and K.L. Bristow. 1999. Comparison of different approaches to the development of pedotransfer functions for water retention curves. Geoderma 93:225253.[CrossRef][ISI]
- Mullins, C.E., I.M. Young, A.G. Bengough, and G.J. Ley. 1987. Hard-setting soils. Soil Use Manage. 3:7983.
- Nguyen, T.T., L.J. Janik, and M. Raupach. 1999. Diffuse reflectance infrared Fourier transform (DRIFT) spectroscopy in soil studies. Aust. J. Soil Res. 29:4967.
- Odlare, M., K. Svensson, and M. Pell. 2005. Near infrared reflectance spectroscopy for assessment of spatial soil variation in an agricultural field. Geoderma 126:193202.[CrossRef][ISI]
- Rawls, W.J., D.L. Brakensiek, and K.E. Saxton. 1982. Estimation of soil water properties. Trans. ASAE 25:13161320.
- Reeves, J.B., III, G.W. McCarty, and J.J. Meisenger. 1999. Near-infrared diffuse reflectance spectroscopy for the analysis of agricultural soil. J. Near Infrared Spectrosc. 7:179193.
- Reeves, J.B., III, G.W. McCarty, and V.B. Reeves. 2001. Mid-infrared diffuse reflectance spectroscopy for the quantitative analysis of agricultural soil. J. Agric. Food Chem. 49:766772.[CrossRef][ISI][Medline]
- Saxton, K.E., W.J. Rawls, J.S. Romberger, and R.I. Papendick. 1986. Estimating generalized soil-water characteristics from texture. Soil Sci. Soc. Am. J. 50:10311036.
- van der Marel, H.W., and H. Beutelspacher. 1976. Clay and related minerals. In H.W. van der Marel and H. Beutelspacher (ed.) Atlas of infrared spectroscopy of clay minerals and their admixtures. Elsevier Scientific, Amsterdam.
- Vereecken, H., J. Maes, J. Feyen, and P. Darius. 1989. Estimating the soil moisture retention characteristic from texture, bulk density and carbon content. Soil Sci. 148:389403.
- Viscarra Rossel, R.A., and A.B. McBratney. 1998a. Soil chemical analytical accuracy and costs: Implications from precision agriculture. Aust. J. Exp. Agric. 38:765775.
- Viscarra Rossel, R.A., and A.B. McBratney. 1998b. Laboratory evaluation of a proximal sensing technique for simultaneous measurement of clay and water content. Geoderma 85:1939.[CrossRef][ISI]
- Viscarra Rossel, R.A., D.J.J. Walvoort, A.B. McBratney, L.J. Janik, and J.O. Skjemstad. 2006. Visible, near-infrared, mid-infrared or combined diffuse reflectance spectroscopy for simultaneous assessment of various soil properties. Geoderma 131:5975.[CrossRef][ISI]
- Williams, J., R.E. Prebble, W.T. Williams, and C.T. Hignett. 1983. The influence of texture, structure and clay mineralogy on the soil moisture characteristic. Aust. J. Soil Res. 21:1532.
- Zviaginal, B.B., D.K. McCarty, J. Rodo, and V.A. Drits. 2004. Interpretation of infrared spectra of dioctahedral smectites in the region of OH stretching vibrations. Clays Clay Miner. 52:399410.[Abstract/Free Full Text]