Published online 12 March 2007
Published in Soil Sci Soc Am J 71:507-514 (2007)
DOI: 10.2136/sssaj2005.0391
© 2007 Soil Science Society of America
677 S. Segoe Rd., Madison, WI 53711 USA

SOIL PHYSICS

Rapid Prediction of Soil Water Retention using Mid Infrared Spectroscopy

L. J. Janik (a), R. H. Merry (a), S. T. Forrester (a,*), D. M. Lanyon (b), and A. Rawson (c)

(a) CSIRO Land & Water, PMB 2, Glen Osmond, South Australia 5064
(b) CSIRO Sustainable Ecosystems, Waite Rd., Urrbrae, South Australia 5062
(c) New South Wales Dep. of Natural Resources, c/o Faculty of Science and Agriculture, Charles Sturt Univ., Leeds Pde, Orange, PO Box 883, NSW 2800

* Corresponding author (sean.forrester@csiro.au).


    ABSTRACT
Soil-water properties vary widely with soil composition and texture, but they are often time consuming and expensive to measure using traditional laboratory methods. Mid-infrared (MIR) spectroscopy is sensitive to soil composition, allowing multivariate calibrations to be derived between volumetric soil water retention and MIR spectra. Mid-infrared partial least squares (PLS) models can be derived from the spectra of soils and reference data, and can be used to predict the water retention solely from the MIR spectra of unknown samples. Regressions between laboratory-determined volumetric water retentions, θv, at matric suctions from 1 to 1500 kPa and values predicted by MIR PLS analysis are presented for a broad variety of surface soils from southern Australia. Cross-validation produced coefficient of determination values ranging from 0.67 to 0.87 and standard errors of cross-validation from 4.1 to 3.2% v/v. Prediction robustness was tested using an independent set of samples for values of θv at field capacity (10-kPa suction) and permanent wilting point (1500-kPa suction). The prediction standard error for the test set was higher than for cross-validation. This was attributed to a mismatch between the spectra of the test set and those of the calibration samples, which reduced the ability of the calibration samples to model the test set spectra. The MIR PLS prediction method performed at least as well as some pedotransfer functions and was shown to be a rapid and inexpensive method for the prediction of volumetric soil moisture content for a range of soil types at a range of matric suctions.

Abbreviations: MIR, mid-infrared • NSW, New South Wales • PC, principal component • PCA, principal components analysis • PLS, partial least squares • RPD, residual predictive deviation • SECV, standard error of cross-validation • SEP, standard error of prediction


    INTRODUCTION
Soil water retention is an important property for determining moisture content and is affected by soil density, particle size, mineral and organic composition, and pore-space density and distribution. Tests for soil water retention are, however, underutilized largely due to the relatively high cost and long turnaround time of the laboratory analysis. The most commonly used measurements for water retention are the volumetric water retention percentage (θv) at field capacity (which is arguably in the matric suction range 8–33 kPa) and at wilting point (with a matric suction of 1500 kPa). These two values broadly describe the extractable plant-available water, but a multipoint curve of the distribution of volumetric water contents from saturation to 1500 kPa is required for a full description of the water holding capacity of the soil.

Current laboratory methods for the determination of water retention require accurately maintained matric suctions on intact sections of soil cores for lengthy periods (Cresswell, 2002). These methods use ceramic suction plates for matric suctions up to 80 kPa or pressure plate extractors for matric suctions from 80 to 1500 kPa. Determining these values is expensive and time consuming and so this service is generally not offered for routine soil testing. A rapid, inexpensive alternative method to determine soil water retention, with acceptable precision, is needed as a surrogate for the laboratory method.

Pedotransfer functions and physicoempirical models have been used in response to this need, using more readily available and less expensive soil data (e.g., Arya and Paris, 1981; Rawls et al., 1982; Haverkamp and Parlange, 1986; Vereecken et al., 1989; da Silva and Kay, 1997). These functions are based on relationships developed between volumetric soil water retention and other soil properties such as soil texture, clay content, sand content, bulk density, and organic matter content (Rawls et al., 1982; Saxton et al., 1986). Clay content, sand content, and bulk density have been described as the most important of these soil properties for predicting the water retention at field capacity and wilting point (Saxton et al., 1986), although soil organic matter content has been shown to be of some importance (Rawls et al., 1982; da Silva and Kay, 1997). Minasny et al. (1999) evaluated a number of different approaches to the development of pedotransfer functions for water retention using a data set for 840 Australian soils and found that pedotransfer functions developed elsewhere could not be applied directly to Australian soils, due to some unique soil properties and different specifications for particle size fractions. As noted by Bastet et al. (1997), the performance of pedotransfer functions varies according to the pedological origin of the soils on which they were developed. Minasny et al. (1999) addressed these problems by examining a combination of modeling techniques, using particle-size distribution and bulk density data to successfully predict water content at different matric potentials. Parametric estimation using extended nonlinear regression was found to be the preferred method.

The relationship between soil water retention and soil structure can be partly explained by the underlying soil composition and chemistry. For example, porous soils are more likely to contain heavy clays and organic matter, which cause porosity due to expansion and contraction with successive wetting and drying cycles. Compacted soils, by comparison, have high density, low pore volumes, and are likely to be dominated by sand and nonreactive soil minerals (Williams et al., 1983).

Soil mineral and organic matter components give specific infrared spectral signatures due to the vibrations of their molecular groups (Janik et al., 1998; Reeves et al., 2001). In the near infrared (NIR), covering the 700- to 2500-nm spectral region, the spectra of soils show vibrational absorbances due to –OH in minerals, and to –OH, –CH, and –NH organic functional groups in soil organic matter (Viscarra Rossel and McBratney, 1998a, 1998b; Reeves et al., 1999). The mid-infrared (MIR), with vibrations in the spectral region from 4000 cm–1 (2500 nm) to 400 cm–1 (25000 nm), is sensitive to groups containing protons and also to heavier atoms such as in Si–O, Al–O, and Fe–O groups in minerals (Nguyen et al., 1999; van der Marel and Beutelspacher, 1976; Janik and Skjemstad, 1995; Janik et al., 1998; Reeves et al., 2001). Quartz (sand) and kaolinite clays give particularly strong spectral signatures near 1100 to 1000 cm–1 (Si–O stretching vibration) and 3690 to 3620 cm–1 (clay lattice Al–OH vibrations), respectively, and tend to occur more in compacted soils and those with high bulk density (Mullins et al., 1987; Dixon and Weed, 1989). Soil organic matter can be identified by peaks due to alkyl –CH2 at 2930 to 2850 cm–1, protein amide near 1680 cm–1, carboxylate anion at 1600 and 1400 cm–1, and carboxylic acids near 1720 cm–1 (van der Marel and Beutelspacher, 1976; McCarty et al., 2002). Spectral absorbances of many of these vibrations can be quantified and correlated with soil water properties and have been used to predict the soil water content (Viscarra Rossel et al., 2006). Some minerals, for example smectitic clays, have a negative interlayer charge balanced by Na, Ca, and K cations with varying degrees of hydration characterized by MIR peaks near 3450 and 1630 cm–1 (Zviaginal et al., 2004). Mid-infrared spectroscopy may therefore offer an alternative to pedotransfer functions for the determination of soil water retention.

Partial least squares can be used to model the relationships between infrared spectral intensities and soil properties through derived PLS loadings, scores, and coefficients (Janik and Skjemstad, 1995; Janik et al., 1998). The PLS scores are, in effect, the scaling terms for the loadings used to model the spectra in the PLS calibration set, and the spectral intensities can be scaled with the PLS coefficients to allow the prediction of analyte concentrations from spectra of the unknowns (Haaland and Thomas, 1988). Similar predictions have been reported for soil analysis using NIR spectra (e.g., Odlare et al., 2005; Chang et al., 2005), but the MIR is expected to perform better due to its high sensitivity to quartz, a major constituent in most soils, as well as its sensitivity to clay composition. The MIR PLS method should therefore be able to provide a rapid and inexpensive surrogate method for the prediction of θv directly from soil spectra with good analytical accuracy.

Unfortunately, however, the full potential of MIR PLS to predict soil analyte properties is not always achieved. Partial least squares models assume that calibration models developed for a particular calibration set can also model the spectra of the unknown samples. If the compositional or analytical profile of some of the unknowns is substantially different from that of the samples in the calibration set, these unknowns are outliers: the values of their PLS scores will lie outside the range of score values for the calibration spectra (the score space). One solution to this problem is to analyze some of the extreme outliers by standard methods, include them in the calibration set, and remodel the PLS regression. This kind of validation, although crucial to test the validity of PLS predictions, is sometimes omitted to save cost, resulting in an overoptimistic PLS model that may be unable to cope with true unknowns.

We set out to show that PLS regression, calibrated for a small set of widely variable soil types (CSIRO Land and Water data set), can be used as a simple and rapid surrogate method for the prediction of soil water retention from MIR spectra. Furthermore, the PLS model can be expanded to allow the prediction of samples from a much larger set of significantly different soil types (New South Wales Department of Natural Resource Management [DNRM] data set).


    MATERIALS AND METHODS
Soils
Ninety-six soil samples with widely varying soil properties were sourced from 36 sites across New South Wales, Victoria, and South Australia in southeastern Australia. The samples and data were provided by CSIRO Land and Water (Canberra laboratory). This primary data set was called the CSIRO calibration set. A further 916 soil samples (from across New South Wales) were provided by the New South Wales DNRM to test or validate the PLS prediction models; these were called the NSW validation set. Subsamples of the soil cores taken from the top 100 mm were air dried and ground to pass a 2-mm sieve. Further subsamples (approximately 7 g of each soil) were crushed in a vibrating ring mill equipped with a 50-mm-diameter, 50-g steel puck for 60 s to reduce the particle size to <0.1-mm diameter for spectral scanning.

Water Retention Data
Volumetric water retention values (θv) for soil equilibrated at matric suctions of 1, 3, 5, 10, and 50 kPa were determined by suction plate (Cresswell, 2002), and water contents at 500 and 1500 kPa were determined gravimetrically by pressure plate (Cresswell, 2002, Method 504.02). The gravimetric water content values were converted to volumetric percentages (% v/v) using the sample bulk density data (Cresswell, 2002). The data for these soil samples, described in Table 1, showed a wide range of variation, typical of many soils across southeastern Australia. Only data for field capacity (10 kPa) and the permanent wilting point (1500 kPa) were provided by DNRM for the NSW validation set shown in Table 2.
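The gravimetric-to-volumetric conversion mentioned above can be sketched as follows. This is a minimal illustration only, assuming the standard relation θv = θg × (bulk density / water density) with the density of water taken as 1.0 g cm–3; the function name and the example values are hypothetical and are not taken from Cresswell (2002) or from the data sets used here.

```python
def gravimetric_to_volumetric(theta_g_pct, bulk_density, water_density=1.0):
    """Convert gravimetric water content (% w/w) to volumetric (% v/v).

    bulk_density and water_density are assumed to be in g cm^-3.
    """
    if bulk_density <= 0 or water_density <= 0:
        raise ValueError("densities must be positive")
    return theta_g_pct * bulk_density / water_density


# Hypothetical sample: 20% w/w water at a bulk density of 1.4 g cm^-3.
theta_v = gravimetric_to_volumetric(20.0, 1.4)
print(round(theta_v, 1))
```

Because the result scales directly with bulk density, any error in the bulk density measurement carries through proportionally to the volumetric value.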


Table 1. Volumetric water retention statistics for the CSIRO calibration set showing the number of samples, minimum and maximum (Min–Max), range, median, and standard deviation at each matric suction.†

 

Table 2. Volumetric water retention statistics for the New South Wales validation set, showing the number of samples, minimum and maximum (Min–Max), range, median, and standard deviation at each matric suction.†

 
Mid-Infrared Soil Spectra
Mid-infrared diffuse reflectance spectra were collected using approximately 100 mg of the air-dried soils scanned in a PerkinElmer Spectrum-One Fourier transform mid-infrared (FTIR) spectrometer (PerkinElmer, Wellesley, MA). Scans were for 60 s in the frequency (wavenumber) range 7800 to 450 cm–1 (wavelength range 1280–22000 nm) at a resolution of 8 cm–1, although the frequency range was restricted to 4000 to 500 cm–1 (2500–20000 nm), the optimal frequency range for this spectrometer. The FTIR spectrometer was equipped with an extended range KBr beam splitter, a high-intensity ceramic source, a deuterium triglycine sulfate (DTGS) Peltier-cooled detector, and a PerkinElmer autofocusing diffuse reflectance accessory. Spectra were expressed in absorbance units [log(1/reflectance)]. Background reference scans were performed on silicon carbide (SiC) disks, assumed to have a reflectivity of 1 (100%).
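The two unit conversions used in this section, wavenumber to wavelength and reflectance to absorbance, can be sketched as below. This is an illustrative sketch, not the instrument software: the function names are hypothetical, and the logarithm in log(1/reflectance) is assumed to be base 10, which is the usual convention but is not stated explicitly in the text.

```python
import math


def wavenumber_to_wavelength_nm(wavenumber_cm1):
    """Convert wavenumber (cm^-1) to wavelength (nm), using 1 cm = 1e7 nm."""
    if wavenumber_cm1 <= 0:
        raise ValueError("wavenumber must be positive")
    return 1.0e7 / wavenumber_cm1


def reflectance_to_absorbance(reflectance):
    """Apparent absorbance log10(1/R) for a reflectance R in (0, 1]."""
    if not 0.0 < reflectance <= 1.0:
        raise ValueError("reflectance must be in (0, 1]")
    return math.log10(1.0 / reflectance)


# Limits of the restricted spectral range quoted above.
print(wavenumber_to_wavelength_nm(4000))  # 2500.0
print(wavenumber_to_wavelength_nm(500))   # 20000.0
# A sample reflecting 10% of the reference signal (hypothetical value).
print(round(reflectance_to_absorbance(0.1), 3))
```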

Partial Least Squares Analysis
Spectra were first exported into Grams-SPC format (Thermo Electron Grams/AI, Thermo Fisher Scientific, Waltham, MA), and then into Unscrambler Version 9.1 (Camo Software AS, Oslo, Norway) for PLS calibration. As explained by Geladi and Kowalski (1986), PLS regression is a bilinear modeling method in which spectral and dependent-variable reference data are projected onto a small number of "latent" variables (PLS loadings). The procedure for PLS analysis adopted here is similar to that described by Haaland and Thomas (1988), and later by Janik and Skjemstad (1995) and Janik et al. (1998) for soils. The calibration spectra, which were first mean centered, and the corresponding analytical reference data are transformed during PLS calibration into a small set of PLS loadings and loading scaling terms (scores), combining the spectral and θv values in PLS calibration models. These calibrations are optimized (trained) to determine the minimum number of required PLS terms (factors) by using cross-validation, where each sample is removed in turn from the calibration set and its value predicted from the remaining samples; this is known as leave-one-out cross-validation. The PLS loading weights can also be examined to help understand the relationship between spectral features and water retention values at the various matric suctions; the first few loading weights are the primary spectral signatures of the soil components correlated with the property of interest, in this case water retention. The loading weights therefore give qualitative information on the correlations between the measured data and the sample properties, while the distribution of scores in scatter plots can be used to describe the relationships between samples, and scores combined with regression coefficients yield predictions.
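The calibration and leave-one-out cross-validation steps described above can be illustrated with a small, self-contained sketch. It is not the Unscrambler implementation: it assumes a single-response PLS model fitted with the NIPALS algorithm on mean-centered data, uses a tiny synthetic data set with three "spectral" channels, and reports the cross-validation error as a root mean square without bias correction. All function names and data values are hypothetical.

```python
import math


def pls1_fit(X, y, n_factors):
    """Fit a single-response PLS model (NIPALS) on mean-centered data."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # spectral residuals
    f = [v - y_mean for v in y]                                # response residuals
    factors = []
    for _ in range(n_factors):
        # Loading weight: spectral direction that covaries most with the response.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # no remaining covariance to model
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate: remove what this factor explains before fitting the next one.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        factors.append((w, load, q))
    return x_mean, y_mean, factors


def pls1_predict(model, x):
    """Predict the response for one spectrum by sequential projection."""
    x_mean, y_mean, factors = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q in factors:
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * lj for ej, lj in zip(e, load)]
    return y_hat


def loo_secv(X, y, n_factors):
    """Root-mean-square error of leave-one-out cross-validation."""
    total = 0.0
    for i in range(len(X)):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_factors)
        total += (pls1_predict(model, X[i]) - y[i]) ** 2
    return math.sqrt(total / len(X))


# Synthetic "spectra" and a response that is an exact linear function of them.
X = [[1.0, 2.0, 0.5], [2.0, 1.0, 1.5], [3.0, 4.0, 2.0],
     [4.0, 3.0, 0.0], [5.0, 6.0, 1.0], [6.0, 5.0, 2.5]]
y = [2.0 * a - b + 0.5 * c + 1.0 for a, b, c in X]
for k in (1, 2, 3):
    print(k, loo_secv(X, y, k))
```

In practice the number of factors would be chosen where the cross-validation error stops decreasing appreciably, which is the optimization step the text describes.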


    RESULTS AND DISCUSSION
Figure 1 depicts the average of the CSIRO calibration spectra. Peaks are clearly evident for kaolinite and gibbsite in the hard-setting acidic soils, smectite in high-pH and heavy-clay soils, and quartz in sandy soils (Nguyen et al., 1999). Quartz (as sand) usually dominates Australian soils, resulting in many strong MIR diffuse reflectance peaks in the 2000 to 1800 cm–1 region. These quartz peaks are often overlapped with other clay mineral and soil organic matter peaks in the spectral region from 1400 to 500 cm–1 (Janik et al., 1998). The quartz peaks near 1100 to 1000 cm–1 are severely distorted due to reflectance effects from large-particle-size sand (Nguyen et al., 1999).


Fig. 1. Mid-infrared (MIR) diffuse reflectance mean spectrum of samples from the CSIRO calibration soil set. Assignments of the major mineral and organic components are indicated.

 
The principal components analysis (PCA) score2 vs. score1 map for the CSIRO calibration set is depicted in Fig. 2a, with principal components PC-1 and PC-2 accounting for 59 and 19% of the total spectral variability, respectively. Positive peaks in the first PCA loading, shown in Fig. 2b, suggest that PC-1 is due largely to kaolinite clay (Al–OH stretching vibrations at 3692–3620 cm–1) and gibbsite (peaks at 3524, 3452, and 3388 cm–1). As discussed above, most of the negative peaks below 2000 cm–1 (particularly 2000–1788 cm–1) and the positive peak near 1100 cm–1 are due to quartz. Figure 2b also suggests that the second principal component, PC-2, is due almost entirely to gibbsite. There is some evidence of carbonate, with an absorbance peak near 2516 cm–1, and organic matter (alkyl peaks at 2930–2850 cm–1).


Fig. 2. (a) Principal components analysis (PCA) score2 (PC-2) vs. score1 (PC-1) plot and (b) the corresponding PCA loadings for the CSIRO calibration samples in the 4000 to 450 cm–1 frequency range.

 
The first two PLS loading weights, determined by cross-validation of the CSIRO calibration set for water contents at 10 and 1500 kPa, are illustrated in Fig. 3a and 3b. These represent the "pure" soil components that correlate most strongly with water contents at 10 and 1500 kPa. Additional soil components contribute more strongly in the subsequent loading weights. In Fig. 3a and 3b, the first loading weight was characterized by peaks corresponding to kaolinite clay, gibbsite, and quartz (negative peaks). In the second loading weight, there was a small contribution from organic matter (alkyl peaks near 2850–2930 cm–1) at 10-kPa suction. The peaks due to quartz and organic matter were slightly stronger at 10 than 1500 kPa.


Fig. 3. Partial least squares loading weights for principal components score1 (PC-1) and score2 (PC-2) for volumetric water retention for the CSIRO calibration set at matric suctions (a) 10 kPa and (b) 1500 kPa.

 
Table 3 presents a summary of the PLS cross-validation regression statistics for the CSIRO calibration set. The regression coefficient of determination (R2) and standard error of cross-validation (SECV) provide a measure of the fit of predicted values with respect to the PLS regression. They do not, however, fully describe the capability of PLS to predict accurately. Other metrics such as regression slope, bias, and the residual predictive deviation (RPD) are also considered important. The RPD is the ratio of the standard deviation (SD) to the SECV or to the standard error of prediction (SEP) for actual predictions, where an RPD > 3 is considered to be of analytic quality, while RPDs between 2 and 3 are considered "good," 1.5 to 2 "medium," and RPDs <1.5 are considered to be "poor," with indicator accuracy only (Cozzolino et al., 2005).
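The RPD calculation and the quality classes quoted above can be written out as a short sketch. The function names and the example numbers are hypothetical, and the treatment of values falling exactly on a class boundary (for example an RPD of exactly 2) is an assumption, since the text does not specify it.

```python
def rpd(sd_reference, standard_error):
    """Residual predictive deviation: SD of reference values / SECV or SEP."""
    if standard_error <= 0:
        raise ValueError("standard error must be positive")
    return sd_reference / standard_error


def rpd_class(value):
    """Quality class using the thresholds quoted from Cozzolino et al. (2005)."""
    if value > 3.0:
        return "analytic"
    if value >= 2.0:   # boundary handling is an assumption
        return "good"
    if value >= 1.5:   # boundary handling is an assumption
        return "medium"
    return "poor"


# Hypothetical example: reference SD of 12.0% v/v and SECV of 4.8% v/v.
value = rpd(12.0, 4.8)
print(round(value, 2), rpd_class(value))
```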


Table 3. Partial least squares (PLS) cross-validation statistics for volumetric water retention at matric suctions from 1 to 1500 kPa for the CSIRO calibration set, showing the optimum number of PLS components (PCs) used in the calibration model, the regression slope, intercept, and bias, plus the coefficient of determination (R2), standard error of cross-validation (SECV), and ratio of SD to SECV (RPD) for predicted volumetric water content at each matric suction.

 
Cross-validation for the full CSIRO calibration set produced R2 values between 0.67 and 0.87, and SECVs in the range 4.1 to 3.2% v/v. Figures 4a and 4b illustrate the results of the cross-validation for the water retention at 10- and 1500-kPa suctions, with an approximately even spread of prediction deviation along the full range of data values. For water retention at 1- to 10-kPa suctions, listed in Table 3, the optimum number of PLS factors used to minimize the cross-validation error was between 5 and 7, whereas 9 PLS factors were required for the water contents at 50- to 500-kPa suction. Using more factors increases the risk of overfitting the model and reducing the accuracy of predictions for unknowns. Regression bias was very low, but regression slopes ranged from 0.73 to 0.79 for water contents at 1 and 3 kPa, respectively, and from 0.81 to 0.91 for water contents at 50- to 1500-kPa suctions. The RPDs ranged from approximately 1.9 to almost 3, and thus spanned the "medium" to "good" prediction quality classes defined above.
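The regression statistics reported here (slope, intercept, bias, R2, and standard error) can be computed from paired measured and predicted values as in the sketch below. The definitions are assumptions made for illustration: the regression is of predicted on measured values, R2 is the squared Pearson correlation, and the standard error is the root mean square difference without bias correction. The software used in the study may define these quantities slightly differently, and the data are hypothetical.

```python
import math


def validation_stats(measured, predicted):
    """Summary statistics for predicted vs. measured values."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    sxx = sum((m - mean_m) ** 2 for m in measured)
    syy = sum((p - mean_p) ** 2 for p in predicted)
    sxy = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    slope = sxy / sxx
    return {
        "slope": slope,
        "intercept": mean_p - slope * mean_m,
        "bias": mean_p - mean_m,
        "r2": sxy ** 2 / (sxx * syy),
        "se": math.sqrt(sum((p - m) ** 2
                            for m, p in zip(measured, predicted)) / n),
    }


# Hypothetical predictions that are compressed toward the mean (slope < 1).
measured = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 20.0, 28.0, 36.0]
for name, value in validation_stats(measured, predicted).items():
    print(name, round(value, 3))
```

The example shows why a high R2 alone is not sufficient: the predictions are perfectly correlated with the measurements, yet the slope below 1 means high values are underpredicted.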


Fig. 4. Partial least squares (PLS) cross-validation regression plot for the prediction of volumetric water retention θv (% v/v) at (a) 10 kPa suction and (b) 1500 kPa suction for the CSIRO calibration set.

 
The PCA score map for the combined CSIRO calibration set plus the NSW validation set (Fig. 5a) indicates that a significant number of samples from the NSW validation set are outside the range of the CSIRO calibration PCA scores. This is a potential problem, particularly for those soils very high in kaolinite (high PC-1), as shown in the loading weight spectrum in Fig. 5b. The poor coverage of the NSW validation scores by the CSIRO calibration scores means that extrapolation by the model would be required for the prediction of samples high in kaolinite. This would be expected to reduce subsequent PLS prediction accuracy.
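One simple way to flag samples that fall outside the calibration score space is a range check on each principal component, sketched below. This is an assumption made for illustration: the text identifies such samples from the PCA score plot and does not describe a numerical rule. The function name and the score values are hypothetical.

```python
def outside_score_range(cal_scores, new_scores):
    """Indices of new samples with any PC score outside the calibration range."""
    n_pc = len(cal_scores[0])
    low = [min(s[k] for s in cal_scores) for k in range(n_pc)]
    high = [max(s[k] for s in cal_scores) for k in range(n_pc)]
    return [i for i, s in enumerate(new_scores)
            if any(s[k] < low[k] or s[k] > high[k] for k in range(n_pc))]


# Hypothetical (PC-1, PC-2) scores.
calibration = [(-1.0, -0.5), (0.0, 0.8), (1.2, 0.1)]
validation = [(0.5, 0.0), (2.0, 0.0), (0.0, -0.9)]
print(outside_score_range(calibration, validation))  # [1, 2]
```

A bounding-box test of this kind is deliberately crude; a distance-based measure in score space would be a natural refinement, but either approach only indicates where the model must extrapolate.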


Fig. 5. (a) Principal components analysis (PCA) score2 (PC-2) vs. score1 (PC-1) plot for the combined spectral data for the CSIRO calibration samples, New South Wales (NSW) validation samples, and the 35 NSW validation samples representing outliers with scores lying outside the extremities of the CSIRO calibration samples, and (b) the corresponding PCA loadings for PC-1 and PC-2.

 
For the combined CSIRO calibration plus NSW validation sets, the loading plot in Fig. 5b suggests that PC-1 is due largely to sand (positive loadings for quartz in the 1788–2000 cm–1 region) and a small negative contribution due to kaolinite at 3620 to 3692 cm–1. Component PC-2 is due partly to gibbsite and partly to soil organic matter (with peaks near 2930 and 2860 cm–1 as well as between 1450 and 1720 cm–1). It therefore appears, from the PCA loadings, that a significant difference between the CSIRO calibration and NSW validation data sets may be due to higher sand content in the NSW validation set.

A test of the robustness of the CSIRO PLS calibration model is its predictive ability for an independent soil set, for example the NSW validation set. Before testing the prediction of the NSW validation set, however, cross-validation of the NSW validation set was first performed to determine the best prediction to be expected for the NSW spectra, assuming the lowest prediction errors would come from an "internal" cross-validation model rather than from a model used to predict an "external" independent test set. Cross-validation of water retentions at 10 and 1500 kPa from the NSW validation set resulted in errors similar to those of the CSIRO calibration set, with R2 values of 0.80 and 0.85, SECVs of 5.7 and 3.0, and regression slopes of 0.81 and 0.86, respectively (see Table 4). The RPD values of 2.5 and 2.8, respectively, were consistent with medium-high accuracy.


Table 4. Partial least squares (PLS) cross-validation and prediction statistics for the New South Wales (NSW) validation data set, presenting statistics for volumetric water retentions θv (% v/v) at matric suctions of 10 kPa and 1500 kPa. Statistics are shown for predictions using the CSIRO calibration and for cross-validation and predictions using the CSIRO calibration plus 35 NSW validation samples (CSIRO + 35 NSW). Also shown are the prediction statistics using loge preprocessing. Data are shown for the optimum number of PLS components (PCs) used in the model, the regression slope, intercept, and bias, and the coefficient of determination (R2), standard error (where SE refers to SE of cross-validation or SE of prediction), and ratio of SD to SE (RPD).

 
Prediction of NSW validation data from the CSIRO calibration model was far less accurate, with R2 values of 0.54 and 0.72, SEP of 6.7 and 4.3, and regression slopes of 0.50 and 0.82 at 10 and 1500 kPa, respectively. The resulting RPDs of 2.1 and 2.0, respectively, were consistent with low-medium accuracy. The regression plots between values predicted from the CSIRO calibration and laboratory reference values are illustrated in Fig. 6a and 6b for suctions of 10 and 1500 kPa, respectively.


Fig. 6. Regression plots for volumetric water retention predicted by mid-infrared partial least squares (MIR PLS) (% v/v) at matric suctions (a) 10 kPa and (b) 1500 kPa for the New South Wales validation set using the CSIRO calibration samples.

 
To improve the ability to predict water retention for the NSW validation set, a relatively small number of representative outliers from the NSW validation data were added to the CSIRO calibration model, which increased its spectral diversity. Thirty-five extreme outliers (3.8% of the data set) were selected from the PCA score plot of the combined CSIRO calibration and NSW validation samples illustrated in Fig. 5a (shown in Fig. 5a as NSW outliers). Prediction of the NSW validation set using the combined CSIRO calibration plus the 35 NSW validation outliers resulted in a significantly improved prediction (Table 4). Figures 7a and 7b present the regression between the predicted and measured water retentions at 10 and 1500 kPa for the remaining NSW validation data set. By comparison, the root mean square residual error of the predicted and observed water contents presented by Minasny et al. (1999) ranged from 0.0723 to 0.1466 m3 m–3 at 10 kPa and from 0.0228 to 0.1127 m3 m–3 at 1500 kPa, compared with 0.0707 m3 m–3 (SEP = 6.3% v/v) at 10 kPa and 0.0500 m3 m–3 (3.8% v/v) at 1500 kPa obtained using our MIR PLS method.


Fig. 7. Regression plots for predicted volumetric water retention (% v/v) at matric suctions (a) 10 kPa and (b) 1500 kPa for the New South Wales (NSW) validation set using the combined CSIRO calibration plus 35 NSW validation samples.

 
The MIR PLS method discussed above should be, in principle, more accurate than pedotransfer functions, in that it is based on the spectral patterns resulting from the composition of the soils, including the amount and type of clay minerals and organic matter present in the samples. The MIR PLS method does not require predetermined functional relationships between soil properties and water content. Apart from the dependence on soil chemistry and composition, the spectral characteristics can also be affected by particle size (Nguyen et al., 1999), which may be related indirectly to soil water retention properties. The MIR PLS method therefore does not require prior knowledge of specific soil parameters such as organic matter, clay, or sand content needed for successful evaluation using pedotransfer functions. With regard to pedotransfer functions, the implied relationship between clay content and soil water retention may not always be valid in Australia due to the wide variation in types of clay. Clay type can vary from kaolinite and illite with hard-setting soil properties to heavy clays with self-mulching soil properties. The MIR spectra are affected by changes in the mineralogical variability but pedotransfer functions may fail to account for this variability.

Mid-infrared PLS calibration models can be readily developed from archived soil data and MIR spectra, as was done here. If calibration data are not available, however, acquiring new analytical data can be time consuming and therefore expensive. The calibration samples should always be tested with an independent test set and should include the range of unknown soils to be analyzed, or predictions may be unreliable.

Partial least squares cross-validation is a method commonly used to maximize the amount of information available for training the PLS model from small and expensive calibration data by taking advantage of all the calibration reference data. This method can, in some circumstances, result in overoptimistic statistics in that the validation data used in cross-validation are not truly independent since they are derived from within the same calibration set. Models from small calibration sets may be required to predict results for much larger soil sets, exacerbating this problem. The relatively small CSIRO calibration set used in this study is typical of the limited data sets available in Australia. While the MIR PLS cross-validation resulted in apparently good cross-validation accuracy in this study, we were unable to confirm its robustness for "real" predictions without testing it with truly independent samples such as the NSW validation set.

Where relatively large data sets become available, they can be randomly split into separate "calibration" and "validation" sets. This method provides a more independent validation, since a lower proportion of calibration samples would be used to validate the model, but still suffers from some interdependence between calibration and validation data sets and can result in a higher (but more realistic) validation error.
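A random split of this kind can be sketched as follows. The function name, the 30% validation fraction, and the fixed seed are hypothetical choices made for illustration; no such split was applied to the data sets in this study.

```python
import random


def split_calibration_validation(sample_ids, validation_fraction=0.3, seed=0):
    """Randomly partition sample identifiers into calibration and validation sets."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_val = int(round(len(ids) * validation_fraction))
    return ids[n_val:], ids[:n_val]


calibration_ids, validation_ids = split_calibration_validation(range(20))
print(len(calibration_ids), len(validation_ids))  # 14 6
```

Because both subsets are drawn from the same population, a split like this tests the model on unseen samples but not on unseen soil types, which is the residual interdependence noted above.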

In this study, the lack of accuracy of the original CSIRO calibration set in predicting the NSW validation data was evident. As seen in Fig. 6a and 6b, prediction of the NSW test samples from the CSIRO set resulted in reduced R2 values of 0.54 and 0.72, and increased SEP values of 6.7 and 4.3 compared with the CSIRO cross-validation results.

Of particular concern in the prediction of water content for NSW samples from the CSIRO calibration was that the regression slopes were very low, between 0.50 and 0.82. Inclusion of 35 outlier samples from the NSW validation set in the CSIRO calibration set significantly improved the accuracy of predicting the NSW validation set. The process of upgrading the calibration data by including some samples with known analytical values from the unknown sample set confirms the general need to validate predictions for routine analysis of unknown samples, and it provides a means of improving the predictive ability of the PLS calibrations for new samples.
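The three statistics discussed above (R², prediction error, and regression slope) can be computed from paired measured and predicted values, as in the sketch below. Several choices here are assumptions: R² is taken as the squared correlation coefficient, the slope is that of predicted regressed on measured values, and the error is a root mean square error without bias correction, which may differ from the SEP definition used in this study. The data are hypothetical.

```python
from math import sqrt

def prediction_statistics(measured, predicted):
    """Return R^2, slope (predicted on measured), and RMSE of prediction."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    s_mm = sum((m - mean_m) ** 2 for m in measured)
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    s_mp = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    r2 = s_mp ** 2 / (s_mm * s_pp)      # squared Pearson correlation
    slope = s_mp / s_mm                 # least-squares slope, predicted vs. measured
    rmsep = sqrt(sum((p - m) ** 2 for m, p in zip(measured, predicted)) / n)
    return {"r2": r2, "slope": slope, "rmsep": rmsep}

# Hypothetical values (% v/v): predictions compress the high end of the range.
measured = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 19.0, 26.0, 33.0]
stats = prediction_statistics(measured, predicted)
```

In this constructed case the points lie exactly on a line, so R² is 1 even though the slope is only 0.7 and the high values are underpredicted. It shows why the slope is worth reporting alongside R².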

There was a marked curvature of the regression in Fig. 6a and 6b, with values increasingly underpredicted as θv increased. Such curvature can arise either from a nonlinear MIR intensity response to water content or from a different chemistry or sample composition at low water retention compared with high water retention. The regression curvature was marginally reduced on including the extra NSW samples with the CSIRO calibration samples, as seen in Fig. 7a and 7b. In an attempt to eliminate prediction curvature, the θv values for the CSIRO calibration samples were preprocessed using log_e before developing the calibration model. The choice of the log_e function to correct regression curvature in this case was purely speculative, since a number of nonlinear functions could have been used with a similar effect. After prediction, the predicted values were converted back to the original volumetric percentage units using the exp(x) function, where x is the predicted log_e(θv) value. Figures 8a and 8b depict the regressions for predicted θv at matric suctions of 10 and 1500 kPa, respectively, for the NSW data set after using the log_e processing function, with the regression statistics included in Table 4. The regression R² values were unchanged for the 10-kPa prediction and increased from 0.74 to 0.77 for the 1500-kPa prediction, with a marginal increase in the SEP, due largely to increases in prediction error at high θv values. The value of any linearization preprocessing transform therefore remains questionable, except perhaps for predictions of low values of θv. These results support the use of MIR PLS as an alternative to current pedotransfer functions in predicting soil water retention at a range of matric potentials.
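The transform and back-transform described above can be sketched in a few lines. The function names and data are hypothetical, and the constant error of 0.1 in log units is an arbitrary illustration rather than a result from this study.

```python
from math import exp, log

def to_log_units(theta_v_values):
    """Apply the natural-log transform to water retention values (% v/v)."""
    if any(v <= 0 for v in theta_v_values):
        raise ValueError("log transform requires strictly positive values")
    return [log(v) for v in theta_v_values]

def from_log_units(predicted_log_values):
    """Convert predictions made in log units back to % v/v with exp(x)."""
    return [exp(x) for x in predicted_log_values]

# Hypothetical reference values at low, medium, and high water retention.
theta_v = [5.0, 20.0, 40.0]
log_theta = to_log_units(theta_v)

# Suppose a model trained on log values is off by the same amount (+0.1)
# for every sample; this is an assumed error, chosen only for illustration.
predicted_log = [x + 0.1 for x in log_theta]
predicted_theta = from_log_units(predicted_log)
abs_errors = [p - t for p, t in zip(predicted_theta, theta_v)]
```

A fixed error in log units becomes a fixed multiplicative factor after exp(x), so the absolute error in the original units grows in proportion to θv. This may be one reason the prediction error increased at high θv values after the transform, although that link is not tested here.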


Fig. 8. Regression plots for predicted volumetric water retention (% v/v) at matric suctions (a) 10 kPa and (b) 1500 kPa for the New South Wales (NSW) validation set using the combined CSIRO calibration plus 35 NSW validation samples after preprocessing the water retention data with a log_e preprocessing transform before partial least squares analysis.

 

    CONCLUSIONS
 
Mid-infrared spectroscopy with PLS is a simple, rapid, and inexpensive surrogate method to predict θv across the range of matric suctions from 1 to 1500 kPa. Predictions are based on the link between soil texture, bulk density, and the spectral signatures of some clay minerals, organic matter, and quartz. Accuracy of the PLS predictions appears to be better than that of some pedotransfer functions that simply rely on texture and bulk density data.

While the MIR PLS cross-validation appears to provide estimations with an accuracy comparable to those using pedotransfer functions, predictions for independent validation samples show that true unknown samples can be poorly predicted using a single calibration model for soils that are markedly different from those of the calibration samples. Where the spectral characteristics of the unknowns, as determined by their PLS loadings and scores, lie outside those of the calibration set, prediction errors can be unacceptably high. Prediction accuracy of unknowns can be significantly improved by analyzing a relatively small subset of samples from the unknown soil set and incorporating these within an expanded calibration set, leading to a more robust prediction model. Considerable regression curvature was observed for the prediction of θv unknowns by MIR PLS regression. Apart from a slight reduction in curvature resulting from the incorporation of spectral outlier samples from the unknown sample set into the calibration set, regression curvature could be almost eliminated by use of a log_e preprocessing function on the θv calibration values. This log_e transform, however, barely improved the R² and significantly increased the SEP, so that the use of linearization preprocessing transforms remains questionable.

The MIR PLS method provides similar, if not better, accuracy than some pedotransfer functions for predicting soil water retention, and it is easier to use. It also has the added advantages of quick processing and much lower cost, because the sole input to the PLS model is an infrared spectrum, scanned in minutes, from which water retention can be predicted simultaneously at suctions ranging from field capacity to the wilting point.


    NOTES
 
Received for publication December 5, 2005.

