SSSAJ Journal of Natural Resources and Life Sciences Education
Soil Science Society of America Journal 66:83-91 (2002)
© 2002 Soil Science Society of America

DIVISION S-2 - SOIL CHEMISTRY

Principal Component Analysis Approach for Modeling Sulfur K-XANES Spectra of Humic Acids

Suzanne Beauchemin*(a,c), Dean Hesterberg(b), and Mario Beauchemin(a,c)

a Agriculture and Agri-Food Canada, Soils and Crops Research and Development Centre, 2560 Hochelaga Blvd., Sainte-Foy, QC, Canada G1V 2J3
b Department of Soil Science, North Carolina State University, Box 7619, 3235 Williams Hall, Raleigh, NC 27695-7619
c Canada Centre for Remote Sensing, 588 Booth Street, 4th floor, Ottawa, ON, Canada K1A 0Y7

* Corresponding author (Suzanne.Beauchemin@NRCan.gc.ca)


    ABSTRACT
Quantitative application of x-ray absorption near edge structure (XANES) spectroscopy to soils and other geochemical systems requires determining the proportions of the multiple chemical species that contribute to the measured spectrum. Two common approaches to fitting XANES spectra are spectral deconvolution and least-squares linear combination fitting (LCF). The objective of this research was to evaluate principal component analysis (PCA) coupled with target transformation to model S K-XANES spectra of humic acid samples, and to compare the results with least-squares LCF. Principal component analysis provided a statistical basis for choosing the number of standard species to include in the fitting model. Target transformation identified which standards were statistically more likely to explain the spectra of the humic acid samples. The selected standards and the scaling coefficients obtained by the PCA approach deviated by ≤6 mol% from results obtained by performing LCF using a large number of binary, ternary, and quaternary combinations of seven S standards. Because no energy shift is allowed in the PCA approach, fitting may be refined, when appropriate, by subsequently applying a least-squares method that includes energy offset parameters. Statistical ranking of the most likely standard spectra contributing to the unknown spectra enhanced LCF by reducing the analysis to a smaller set of standard spectra. The PCA approach is a valuable complement to other spectral fitting techniques because it provides statistical criteria that improve insight into the data and lead to a more objective approach to fitting.

Abbreviations: IE, imbedded error function; IND, indicator function; LCF, linear combination fitting; PCA, principal component analysis; XANES, x-ray absorption near edge structure; XAS, x-ray absorption spectroscopy.


    INTRODUCTION
PRINCIPAL COMPONENT ANALYSIS has been used since the 1970s in chemistry to gain insight into spectral data of systems containing a mixture of chemical compounds that contribute to a spectral signature. Different examples related to absorption and emission spectra, gas chromatography, mass spectrometry, and nuclear magnetic resonance spectroscopy are discussed in Malinowski (1991) and show the power of this multivariate statistical tool to interpret large datasets. Similarly, the PCA approach has been used successfully in the unmixing modeling of remote sensing spectra (Huete, 1986; Huete and Escadafal, 1991; Garcia-Haro et al., 1996). Those studies have all outlined its ability to separate a spectral mixture into independent sources of variability. PCA has also been used in the modeling of XANES spectroscopic data (Fernandez-Garcia et al., 1995; Wasserman, 1997; Wasserman et al., 1996, 1998; Ressler et al., 2000), and some software packages for treating x-ray absorption spectroscopic (XAS) data have integrated PCA as a tool. Yet, it is not a widely used approach for XAS data analysis. Another statistical approach for extracting factors from spectral mixtures is partial least squares (PLS) analysis (Tobias, 1995), but this method has been less commonly used than PCA.

X-ray absorption near-edge structure spectroscopic data can be modeled by spectral deconvolution, or by least-squares fitting using a linear combination of known species to fit an unknown spectrum (Fendorf, 1999). In spectral deconvolution, the shape of x-ray absorption peaks is considered to be Lorentzian, although the instrumental contribution is Gaussian and often dominates (Fendorf, 1999). Thus, single Gaussian or a combination of Lorentzian and Gaussian line shapes have been used in fitting (Frank et al., 1994; Huffman et al., 1991, respectively). Spectral deconvolution can determine oxidation states and quantify species by comparing areas of the Gaussian and Lorentzian peaks with those of standard species (Reynolds et al., 1999). In least-squares LCF, one determines the proportions of the spectra for selected standards that, when summed, yield the least-squares fit to the spectrum for an unknown sample (Vairavamurthy et al., 1994). Least-squares LCF has the advantage of including an optional energy offset parameter, which makes the energy calibration between samples and standards less critical in some cases than in the deconvolution method (Fendorf, 1999). Also, least-squares LCF can be done across a wide energy range of the spectrum and thus enhances the discrimination of different species having identical spectral features at a given energy but distinct features at other energies (Vairavamurthy et al., 1994). Reynolds et al. (1999) took advantage of both methods by first performing spectral deconvolution to identify and quantify As species, and then refining the results using least-squares LCF with energy offset parameters.

Both spectral deconvolution and LCF methods first consider a priori information for fitting unknown spectra. Band resolution techniques such as spectral deconvolution rely on an a priori assumption regarding the shape of the band, while least-squares LCF involves fitting pure standard species to resolve an unknown mixture. Principal component analysis works in a different way by first considering the statistical variance within an experimental data set composed of a group of unknown samples. The data set is then redefined into a reduced number of independent sources of variability. A subsequent analysis, the target transformation, offers the possibility to test which standard species are most likely part of the solution. Therefore, a major advantage of the PCA approach is that no a priori assumption is needed regarding the shape of the band. Also, suspected components can be evaluated individually without a priori knowledge of other species present in the sample (Malinowski, 1991). Furthermore, target transformation is of clear interest when working with complex matrices such as soil or sediment in which many different chemical forms of an element may coexist. In such cases, the number of possible standard species to consider is often large, requiring the testing of many possible combinations of standards in the fitting. Such modeling can be time-consuming and computationally intensive. In addition, the result converged upon by some iterative computer algorithms used to determine the least-squares fit in LCF may depend on the initial guess supplied by the user (Synergy Software, 1997).

In regard to XAS of geochemical systems, Wasserman (1997) reviewed briefly the theory of PCA and showed its application to speciating Fe in coal samples from eight different sources by XANES spectroscopy. Ressler et al. (2000) performed PCA and least-squares fitting on Mn K-XANES spectra to identify and quantify the chemical species in Mn particulates emitted from gasoline engines containing a Mn tricarbonyl fuel additive. Principal component analysis was used on Cu K-XANES data to provide additional insight on the type of Cu(II) bonding in goethite-humate complexes (Alcacio et al., 2001). Fernandez-Garcia et al. (1995) applied PCA to resolve the evolution of Cu and Pb species in Cu-Pb bimetallic catalysts during reduction treatment based on the changes in their XANES spectra. In other applications, PCA was used to interpret dispersive time-resolved x-ray absorption spectra for vanadium phosphate catalysts (Coulston et al., 1997) and extended x-ray absorption fine structure (EXAFS) spectroscopic data from uranyl solution (Wasserman et al., 1999) or Mo and TiO2 catalysts (Fay et al., 1992).

In the present study, the PCA technique was applied to S K-XANES data originating from soil humic acid samples. The speciation of these samples using XANES spectroscopy and least-squares LCF was reported by Hutchison et al. (2001), who also discussed chemical aspects of the results. The aim of the present research was to determine the relative merits of the PCA approach using the same dataset. The specific objectives were to (i) briefly review PCA and target transformation, (ii) apply the PCA approach to S K-XANES data, and (iii) compare the results from the PCA approach with those obtained by Hutchison et al. (2001) using least-squares LCF.


    MATERIALS AND METHODS
Sulfur XANES data
The S K-XANES data of Hutchison et al. (2001) were used for the application of PCA. Spectra from five humic acid samples from a 4-h aeration experiment plus a humic acid sample that was not subjected to aeration (total of six samples) were selected for the present study. In brief, the study of Hutchison et al. (2001) aimed to determine the effect of pH and aeration on S oxidation state in humic acid extracted from a salt marsh soil in eastern North Carolina. Five pH levels were tested during the aeration of humic acid solutions: pH 11.5, 12.0, 12.2, 12.5, and 13.0 (samples hereafter identified as HA_OXpH11.5 to HA_OXpH13.0). The other humic acid sample originated from a distinct initial base extraction done at pH 12.8 under O2-free conditions (Humic acid 2 in Hutchison et al., 2001). X-ray absorption near edge structure spectra from seven standards were selected to compose the target matrix: elemental S, benzyl disulfide, cysteic acid (sulfonate), benzyl sulfoxide, methionine (organic sulfide), Na2SO4 (inorganic sulfate), and chitin sulfate (ester sulfate). Elemental S, benzyl disulfide, and methionine represent reduced forms of S, benzyl sulfoxide is a weakly oxidized form, and the sulfonate and sulfate species are highly oxidized forms (Vairavamurthy et al., 1997). Sulfur K-XANES data were collected in fluorescence mode at beamline X-19A at the National Synchrotron Light Source, Brookhaven National Laboratory (New York).

Principal Component Analysis Approach
We used the PCA approach described in Malinowski (1991). Malinowski (1991) refers to PCA as principal factor analysis to avoid confusion with the term component, which has a different meaning in chemistry. However, this can be misleading because factor analysis is a distinct approach used in confirmatory analysis and differs from PCA in the way it extracts the variance from the data (Tabachnick and Fidell, 1989; Hatcher, 1994). In this paper, we use the terms PCA and target transformation instead of significant factor analysis and target factor analysis, respectively, as used by Malinowski (1991). We use the term PCA approach for the entire analysis sequence, including PCA, target transformation, and estimation of the real matrices (Fig. 1).



Fig. 1. Flow chart showing the main steps of the PCA approach.

 
The code provided by Malinowski (1991) for PCA and target transformation was translated into SAS/IML language using PROC IML (SAS system 6.12, SAS Institute Inc., Cary, NC), and it was adapted to include some other tests.

Selection and Preparation of Datasets
The set of experimental samples selected must first be factor analyzable, which means that the data can be modeled as a linear sum of product terms (Malinowski, 1991). This assumption is usually valid for most XANES data as the overall sample absorption at a given energy is considered to comprise the weighted sums of absorption from its individual constituents (Fendorf and Sparks, 1996; Wasserman, 1997). The data must also be reliable and complete. In case of missing points, PCA can be performed on a smaller set of complete data (Malinowski, 1991). Apart from these basic considerations, the size and choice of the experimental data set depends on the research objective and the availability of data (Malinowski, 1991). The experimental data set can comprise samples of heterogeneous nature (e.g., Huete and Escadafal, 1991) or more homogeneous material for which the effect of time (e.g., Coulston et al., 1997) or of different treatments (e.g., Fernandez-Garcia et al., 1995) is studied.

In the current study, two input matrices were constructed from S K-XANES data (Fig. 1). Normalized data (energies normalized to the elemental S edge, baseline, and background corrected) were interpolated as needed to obtain a common energy scale for all spectra to construct the column matrix used in the PCA. Because the number of rows of the data matrix is an important variable in the estimation of different statistical criteria used in the PCA approach, the interpolation process should not generate more rows (data points) than were initially measured. The analysis was restricted to the energy range from -5 to 15 eV (relative to the elemental S K-edge at 2472 eV). The data matrix was composed of the six unknown spectral mixtures represented by the six humic acid spectra, each spectrum being a column (or vector) of the 200-row by six-column matrix. Similarly, the target matrix comprised the seven spectra of the standard species to be tested (seven suspected vectors).
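The construction of the column matrix can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' SAS/IML code; the function name `build_data_matrix`, the linear interpolation, and the synthetic spectra are assumptions made only for the example.

```python
import numpy as np

def build_data_matrix(spectra, e_min=-5.0, e_max=15.0, n_rows=200):
    """Interpolate each (energy, absorbance) pair onto a shared grid.

    spectra: list of (energy, absorbance) 1-D array pairs, with energies in eV
    relative to the elemental S K-edge and sorted in increasing order.
    Returns the common grid and an (n_rows x n_spectra) column matrix.
    """
    grid = np.linspace(e_min, e_max, n_rows)
    columns = [np.interp(grid, energy, mu) for energy, mu in spectra]
    return grid, np.column_stack(columns)

# Synthetic stand-ins for six measured spectra (hypothetical, not real XANES data).
rng = np.random.default_rng(0)
fake_spectra = []
for k in range(6):
    energy = np.sort(rng.uniform(-6.0, 16.0, 260))   # 260 measured points each
    mu = 0.5 * (1.0 + np.tanh(energy - 0.3 * k))     # smooth edge-like step
    fake_spectra.append((energy, mu))

grid, D = build_data_matrix(fake_spectra)
print(D.shape)  # (200, 6)
```

Here the 200-row grid is coarser than the 260 synthetic points per spectrum, in line with the recommendation that interpolation should not create more rows than were measured.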

Principal Component Analysis
A flow chart of the main steps involved in the PCA approach is presented in Fig. 1. The aim of the first step (see Box 1 in Fig. 1—PCA) is to define the number of significant independent sources of variation, the components, which can regenerate the experimental data set (data matrix) when linearly combined. The PCA approach assumes that observed spectral mixtures can be expressed as a linear combination of components, each component being weighted differently. The experimental spectral data matrix (of r rows by c columns) can thus be expressed as the product of two matrices:

\[ \mathbf{D} = \mathbf{X}\,\mathbf{Y} \qquad [1] \]
in which X is the matrix of independent spectral features (absorbance of individual species) and Y is the matrix of the loadings or relative contributions of the independent features in the spectral mixtures. To ultimately identify X and Y, PCA initially decomposes the data matrix into an abstract eigenspectra matrix (R) and an abstract eigenvector matrix (C) according to

\[ \mathbf{D} = \mathbf{R}\,\mathbf{C} \qquad [2] \]
in which R and C are purely mathematical solutions and are devoid of physical or chemical meaning. Different mathematical techniques can be used to extract eigenvectors and eigenvalues from a data matrix.

We used the singular value decomposition (SVD) technique because it is more robust and accurate to pick up small differences between eigenvalues. A description of the technique can be found in Malinowski (1991) and Ressler et al. (2000).

Eigenvalues and eigenvectors are the two main outputs from PCA. Each eigenvector represents an independent abstract component or source of variation affecting the experimental spectra of the data matrix, while the associated eigenvalue gives the relative variance in the experimental data matrix explained by that component. In PCA, the eigenvector associated with the highest eigenvalue is extracted first, then the second eigenvector is orthogonally defined to capture the highest remaining variance, and so on. The maximum number of extractable eigenvectors is defined by the number of rows (r) or columns (c) in the data matrix, whichever is smaller. For XANES data, the number of columns, given by the number of experimental spectra in the data set, is usually less than the number of rows representing the number of energy data points measured. Therefore, as a multivariate tool aimed at reducing information from a large data set, PCA becomes useful when the data matrix contains several XANES spectra.
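As an illustration of this decomposition, the sketch below applies SVD to a synthetic 200 by 6 matrix. It is written in Python with NumPy rather than the SAS/IML used in the study, and it assumes no mean-centering of the data and the convention that the abstract eigenspectra carry the singular values; `pca_svd` and the Gaussian test data are hypothetical.

```python
import numpy as np

def pca_svd(D):
    """Decompose D (r x c) into abstract matrices R (r x c) and C (c x c)."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    R = U * s              # abstract eigenspectra, columns scaled by singular values
    C = Vt                 # abstract eigenvectors, orthonormal rows
    eigenvalues = s ** 2   # eigenvalues of D^T D, in decreasing order
    return R, C, eigenvalues

# Synthetic mixtures of three Gaussian "species" plus small random noise.
grid = np.linspace(-5.0, 15.0, 200)
X_true = np.column_stack(
    [np.exp(-0.5 * ((grid - mu) / 1.5) ** 2) for mu in (0.0, 5.0, 10.0)]
)
a = np.array([0.6, 0.5, 0.4, 0.3, 0.2, 0.1])
b = np.array([0.1, 0.3, 0.2, 0.4, 0.2, 0.5])
Y_true = np.vstack([a, b, 1.0 - a - b])          # fractions sum to 1 per sample
rng = np.random.default_rng(1)
D = X_true @ Y_true + 1e-3 * rng.standard_normal((200, 6))

R, C, eigenvalues = pca_svd(D)
explained = 100.0 * eigenvalues / eigenvalues.sum()   # percent of total variance
n = 3
D_bar = R[:, :n] @ C[:n, :]                           # reproduction with n components
print(np.round(explained, 4))
```

Because the number of columns (six spectra) is smaller than the number of rows, at most six eigenvectors can be extracted, as stated above.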

One of the major goals of PCA is to determine the minimum number of significant components required to satisfactorily regenerate the data matrix, using a reduced space. The most significant eigenvalues reveal eigenvectors that are considered as true sources of structural variation in the data matrix and constitute the primary set of n eigenvalues. The smallest eigenvalues and their associated eigenvectors are left out in the reproduction of the data matrix and are considered as experimental error (secondary set of c-n members, c being the maximum extractable number of eigenvectors in our case). The n primary eigenvectors reflect the presence of n distinct components in the original spectral mixtures. The determination of the number of significant components (n, Box 1 in Fig. 1) is a crucial step as the subsequent target analysis is based on the matrices reproduced using the reduced space:

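\[ \bar{\mathbf{D}} = \bar{\mathbf{R}}\,\bar{\mathbf{C}} \qquad [3] \]

The bar indicates that the construction of \(\bar{\mathbf{R}}\) and \(\bar{\mathbf{C}}\) and the subsequent reproduction of \(\bar{\mathbf{D}}\) rely only on the n significant eigenvectors retained.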

We used three different criteria proposed by Malinowski (1991) to define the number of significant components. The imbedded error (IE) and the indicator (IND) functions are empirical methods that rely on the secondary eigenvalues. Both functions should show a minimum value when the correct number of significant components is reached. The third criterion is a one-tailed F test based on the calculation of reduced eigenvalues (REV). The null hypothesis, \(H_0: \mathrm{REV}_n = \mathrm{REV}^{0}_{\mathrm{pool}}\), tests whether a given component belongs to the pool of components considered as experimental noise, whereas the alternative, \(H_1: \mathrm{REV}_n > \mathrm{REV}^{0}_{\mathrm{pool}}\), indicates that the given component has a greater contribution than the experimental error and that structural contribution is present at some chosen level of significance (α). Therefore, if the calculated F value is greater than a critical Fα (or, equivalently, the probability of the calculated F is less than the critical α adopted), then H0 is rejected and the associated component is accepted as a significant one.
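The three criteria can be sketched numerically as below. The expressions for the real error (RE), IE, IND, and REV follow the forms commonly attributed to Malinowski and should be treated as assumptions to be checked against that reference; the function `malinowski_criteria` and the synthetic data are hypothetical, and only the F statistic is computed, not its probability.

```python
import numpy as np

def malinowski_criteria(eigenvalues, r, c):
    """Return (n, IE, IND, REV_n, F_n) for n = 1 .. c-1, assuming r > c."""
    lam = np.asarray(eigenvalues, dtype=float)
    j = np.arange(1, c + 1)
    weights = (r - j + 1) * (c - j + 1)            # weights used to reduce eigenvalues
    rev = lam / weights                            # reduced eigenvalues (REV)
    rows = []
    for n in range(1, c):                          # n = c leaves no secondary set
        secondary = lam[n:].sum()                  # sum of secondary eigenvalues
        re = np.sqrt(secondary / (r * (c - n)))    # real error (RE)
        ie = re * np.sqrt(n / c)                   # imbedded error (IE)
        ind = re / (c - n) ** 2                    # indicator function (IND)
        rev_pool = secondary / weights[n:].sum()   # pooled secondary REV
        rows.append((n, ie, ind, rev[n - 1], rev[n - 1] / rev_pool))
    return rows

# Synthetic data: three Gaussian "species" mixed in six samples, plus noise.
grid = np.linspace(-5.0, 15.0, 200)
X_true = np.column_stack(
    [np.exp(-0.5 * ((grid - mu) / 1.5) ** 2) for mu in (0.0, 5.0, 10.0)]
)
a = np.array([0.6, 0.5, 0.4, 0.3, 0.2, 0.1])
b = np.array([0.1, 0.3, 0.2, 0.4, 0.2, 0.5])
D = X_true @ np.vstack([a, b, 1.0 - a - b])
D = D + 1e-3 * np.random.default_rng(2).standard_normal(D.shape)

s = np.linalg.svd(D, compute_uv=False)
table = malinowski_criteria(s ** 2, *D.shape)
n_ind = min(table, key=lambda row: row[2])[0]      # n at the IND minimum
for n, ie, ind, rev_n, f in table:
    print(f"n={n}  IE={ie:.2e}  IND={ind:.2e}  REV={rev_n:.2e}  F(1,{6 - n})={f:.1f}")
```

Each F value has 1 and c - n degrees of freedom; a probability could be obtained from an F distribution routine such as `scipy.stats.f.sf`, which is left out here to keep the sketch dependent on NumPy only.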

Target Transformation
The n significant components defined in the PCA step represent the main independent sources of variation in the experimental data, but have no chemical or physical meaning. Target transformation (Box 2 in Fig. 1) is a subsequent step that allows testing of suspected targets as being potentially part of the solution to explain the structural variation in the set of spectra for the unknown samples. In this study, the suspected targets are the chemical species used as standards. Target transformation will then redefine abstract eigenvectors and eigenspectra into chemically meaningful real matrices. Target transformation is powerful and unique because it allows each suspected target (standard) to be tested individually without a priori knowledge of or assumptions about other species present in the spectral mixture.

Target analysis is accomplished by transforming the eigenspectra matrix using an oblique rotation of the axes. The aim of the oblique rotation is to determine a vector \(t_l\) (Box 2, Fig. 1) that will define a predicted vector \(\hat{x}_l\) that matches as closely as possible, in the least-squares sense, the suspected vector tested (\(x_l\)). The target vector (or suspected vector) \(x_l\) is the lth column of the matrix containing spectra from all standard species to be tested (or the lth standard spectrum). The suspected vector (\(x_l\)) is accepted as part of the solution if its spectral signature can be reconstructed based on the eigenvectors retained in the PCA step.

Two different statistical criteria were used to judge if a given chemical standard (suspected vector) was an acceptable target. The SPOIL function proposed by Malinowski (1991) indicates whether the vector of the chemical standard tested fits well or instead increases the error in the matrix reproduced when that vector is included in the target transformation. The SPOIL values are calculated as the quotient of real error in the target divided by the error in the data matrix. A target is considered acceptable if its SPOIL value is <3, moderately acceptable if the value lies between 3 and 6, and unacceptable if the value is >6. A one-tailed F test proposed by Malinowski (1991) was also used, in which tabulated Fα values are compared with F values obtained from the ratio of variance associated with the fitted test vector divided by the variance associated with the data matrix. The null hypothesis is that the predicted vector is equal to the tested target vector (\(H_0: \hat{x}_l = x_l\)), versus the alternative that they significantly differ from each other (\(H_1: \hat{x}_l \neq x_l\)). If the calculated F value is less than a critical tabulated Fα value (or if the probability of the calculated F is greater than α), then H0 is accepted and the target vector is an acceptable solution, meaning that spectral features of this standard are acceptable for describing features in the spectra for the dataset of unknowns.
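A target test of this kind can be sketched as follows. The sketch assumes the SPOIL decomposition commonly attributed to Malinowski, in which the apparent error in the target (AET) combines the real error in the target (RET) and the error in the predicted vector (REP, taken here as RE times the norm of the transformation vector), with SPOIL = RET/REP. The function `target_test`, the helper `gaussian`, and the synthetic data are hypothetical, and the F test is not implemented.

```python
import numpy as np

def target_test(R_bar, x_target, re):
    """Least-squares target test of one suspected standard spectrum.

    R_bar:    (r x n) retained abstract eigenspectra (U * s convention).
    x_target: (r,) standard spectrum on the same energy grid as the data.
    re:       real error (RE) of the data matrix for the chosen n.
    """
    t, *_ = np.linalg.lstsq(R_bar, x_target, rcond=None)   # transformation vector
    x_hat = R_bar @ t                                      # predicted vector
    aet = np.sqrt(np.mean((x_hat - x_target) ** 2))        # apparent error in target
    rep = re * np.linalg.norm(t)                           # error in predicted vector
    ret = np.sqrt(max(aet ** 2 - rep ** 2, 0.0))           # real error in target
    return t, x_hat, ret / rep                             # last item is SPOIL

def gaussian(grid, mu, width=1.5):
    return np.exp(-0.5 * ((grid - mu) / width) ** 2)

# Synthetic data matrix built from Gaussian "species" at 0, 5, and 10 eV.
grid = np.linspace(-5.0, 15.0, 200)
X_true = np.column_stack([gaussian(grid, mu) for mu in (0.0, 5.0, 10.0)])
a = np.array([0.6, 0.5, 0.4, 0.3, 0.2, 0.1])
b = np.array([0.1, 0.3, 0.2, 0.4, 0.2, 0.5])
D = X_true @ np.vstack([a, b, 1.0 - a - b])
D = D + 1e-3 * np.random.default_rng(3).standard_normal(D.shape)

r, c = D.shape
n = 3
U, s, Vt = np.linalg.svd(D, full_matrices=False)
R_bar = U[:, :n] * s[:n]
re = np.sqrt(np.sum(s[n:] ** 2) / (r * (c - n)))

_, _, spoil_true = target_test(R_bar, gaussian(grid, 5.0), re)    # real component
_, _, spoil_false = target_test(R_bar, gaussian(grid, 7.5), re)   # absent species
print(f"SPOIL true target: {spoil_true:.2f}, false target: {spoil_false:.1f}")
```

In this synthetic case the target that is actually present yields a SPOIL value in the acceptable range, whereas the absent species is rejected; each target is tested on its own, without reference to the other standards.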

Estimation of the Real Matrices
When target transformation analysis is completed and the acceptable standard components have been identified, the real eigenvector matrix Y can be obtained using the complete transformation matrix T composed of the selected transformation vectors \(t_l\) (Box 3, Fig. 1). The Y matrix provides the relative contribution of each targeted standard in the spectrum of each sample. In this study, the scaling coefficients obtained for each mixture and provided by Y were subsequently renormalized so that their sum equals 1. The predicted data matrix can finally be calculated based on the matrix \(X_{\mathrm{basic}}\) containing the selected real spectra of known species and Y (Box 3, Fig. 1).
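The estimation of the real matrices can be illustrated with a short sketch. It assumes that the abstract matrices come from an SVD with the singular values carried by the eigenspectra, and that exactly n standards have been accepted so that T is square and invertible; `real_matrices` and the synthetic data are hypothetical.

```python
import numpy as np

def real_matrices(R_bar, C_bar, standards):
    """Combine n accepted targets into T and recover the real matrices.

    R_bar: (r x n) retained eigenspectra; C_bar: (n x c) retained eigenvectors;
    standards: (r x n) matrix whose columns are the accepted standard spectra.
    """
    T, *_ = np.linalg.lstsq(R_bar, standards, rcond=None)   # columns are t_l
    X_hat = R_bar @ T                                       # predicted standards
    Y = np.linalg.solve(T, C_bar)                           # loadings, T^-1 C_bar
    Y_norm = Y / Y.sum(axis=0, keepdims=True)               # each sample sums to 1
    return X_hat, Y, Y_norm

# Synthetic mixtures of three Gaussian "species" with known fractions.
grid = np.linspace(-5.0, 15.0, 200)
X_true = np.column_stack(
    [np.exp(-0.5 * ((grid - mu) / 1.5) ** 2) for mu in (0.0, 5.0, 10.0)]
)
a = np.array([0.6, 0.5, 0.4, 0.3, 0.2, 0.1])
b = np.array([0.1, 0.3, 0.2, 0.4, 0.2, 0.5])
Y_true = np.vstack([a, b, 1.0 - a - b])
D = X_true @ Y_true + 1e-3 * np.random.default_rng(4).standard_normal((200, 6))

n = 3
U, s, Vt = np.linalg.svd(D, full_matrices=False)
R_bar, C_bar = U[:, :n] * s[:n], Vt[:n, :]

X_hat, Y, Y_norm = real_matrices(R_bar, C_bar, X_true)
D_pred = X_true @ Y                      # predicted data from real spectra and Y
print(np.round(100.0 * Y_norm, 1))       # contributions in percent per sample
```

Note that nothing in this step constrains the coefficients to sum to one or allows an energy shift; the normalization is applied only afterwards.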

Least-Squares Linear Combination Fitting
Results from the PCA approach were compared with results of Hutchison et al. (2001) obtained through nonlinear, least-squares LCF of S K-XANES data for humic acid using more than 50 binary, ternary, and quaternary combinations of the selected standards. Linear combination fitting was done using the computer program KaleidaGraph (Hutchison et al., 2001), and an energy shift in the fitting of the spectrum was allowed (Vairavamurthy et al., 1994). For example, the general equation for the ternary fit was

\[ Y_{\mathrm{fit}}(E) = \sum_{i=1}^{3} m_i \, Y_i(X_i + a_i) \qquad [4] \]
where \(Y_{\mathrm{fit}}(E)\) is the fitted fluorescence yield for a humic acid sample of unknown composition, \(X_i\) and \(Y_i\) are the energy and fluorescence yield data for the spectrum of the chemical standard i (i = 1–3 for a ternary fit), \(m_i\) is the fitted parameter reflecting the relative contribution of each standard species to the overall fit (i.e., the proportion of total S present as that chemical species in the sample of unknown composition), and \(a_i\) is the fitted energy shift for each standard. Chi-squared values were adopted as a goodness-of-fit criterion. In addition, fits were considered unacceptable if \(m_i < 0\) or \(|a_i| > 0.5\) eV.
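A ternary fit with energy offsets can be sketched as below. This is not the KaleidaGraph nonlinear fit used in the study: the sketch swaps in a grid search over the shifts, with an ordinary linear least-squares solution for the proportions at each grid point, and uses the sum of squared residuals as a stand-in for chi-squared. The sign convention of the shift, the function `lcf_with_shifts`, and the synthetic spectra are assumptions.

```python
import itertools
import numpy as np

def lcf_with_shifts(energy, y_unknown, standards, shifts):
    """Grid search over energy shifts; linear least squares for proportions.

    standards: list of (X_i, Y_i) arrays for each standard spectrum.
    shifts:    candidate energy shifts a_i (eV), already limited to +/-0.5 eV.
    Returns (sse, m, a) for the best fit with all m_i >= 0, or None.
    """
    best = None
    for a in itertools.product(shifts, repeat=len(standards)):
        # Evaluate every standard on the sample grid after shifting by a_i.
        A = np.column_stack(
            [np.interp(energy + ai, Xi, Yi) for ai, (Xi, Yi) in zip(a, standards)]
        )
        m, *_ = np.linalg.lstsq(A, y_unknown, rcond=None)
        if np.any(m < 0):                      # negative proportions are rejected
            continue
        sse = float(np.sum((A @ m - y_unknown) ** 2))
        if best is None or sse < best[0]:
            best = (sse, m, np.array(a))
    return best

# Synthetic standards: Gaussian peaks on a grid wider than the fitting range.
X = np.linspace(-8.0, 18.0, 521)
standards = [(X, np.exp(-0.5 * ((X - mu) / 1.5) ** 2)) for mu in (0.0, 5.0, 10.0)]

# Synthetic unknown with proportions 0.5/0.3/0.2 and shifts 0.2/0.0/-0.1 eV.
E = np.linspace(-5.0, 15.0, 200)
y = (0.5 * np.interp(E + 0.2, *standards[0])
     + 0.3 * np.interp(E + 0.0, *standards[1])
     + 0.2 * np.interp(E - 0.1, *standards[2]))

shifts = np.arange(-5, 6) / 10.0               # -0.5 to 0.5 eV in 0.1 eV steps
sse, m, a = lcf_with_shifts(E, y, standards, shifts)
print(np.round(m, 3), a)
```

A gradient-based nonlinear solver would treat the shifts as continuous parameters and may depend on the initial guess, whereas the grid search is exhaustive but limited to the chosen step size.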


    RESULTS AND DISCUSSION
Principal Component Analysis
The first component extracted by PCA using the S K-XANES data for humic acids accounted for 99.47% of the total variance within the data, while the second and third components explained only 0.44 and 0.08% of the total variance (Table 1). X-ray absorption near edge structure spectroscopic data typically show a very high eigenvalue for the first component, with subsequently extracted eigenvalues being relatively less significant compared with the first one. Similar results were obtained with data sets of Cu and Zn K-XANES spectra that we evaluated (data not shown) as well as by Fay et al. (1992). This behavior makes the proportion of variance explained and the scree plot, two common criteria used in other research fields to define the number of significant components (Tabachnick and Fidell, 1989; Hatcher, 1994), of low utility when analyzing XANES data. Therefore, these evaluation criteria were not used here.


Table 1. Results from principal component analysis of S K-XANES spectra for six humic acid samples.†

 
The IND function and F test were better alternatives. The IND function reached a minimum value of 1.45 x 103 at n = 3 (Table 1). This result agreed with the F test showing only the first three components as significant at α = 0.05 (Table 1). The first three principal components explained 99.99% of the variance in the experimental data set and led to a very close reproduction of the data, with the reproduced curve overlapping the experimental data points (Fig. 2). Similar results were observed for the three other samples (data not shown). Unlike the IND function, the IE function failed to pass through a minimum (Table 1), and comparable behavior was observed with Cu and Zn K-XANES spectroscopic data (data not shown here). Malinowski (1991) reported that the IND function was more sensitive than the IE function in indicating the appropriate number of components, because the presence of systematic errors, sporadic errors, errors not randomly distributed, or errors not uniformly spread throughout the data will prevent the IE function from reaching a minimum.



Fig. 2. Examples of S K-XANES spectra of three humic acid samples compared with the reproduced data using three significant abstract components extracted by PCA. The energy scale is relative to S(0) K-edge at 2472 eV.

 
It is essential that more than one criterion be used in choosing the number of significant components because the determination of n is a crucial step that may influence the solution obtained in the subsequent target analysis. The IE function was not informative for our results on XANES data, while the F test and the IND function came out as the best criteria. Malinowski (1991) warned that the IND function was an empirical function that should be used with caution. Malinowski (1988) reported that using a critical threshold between 5 and 10% to evaluate the significance of the F test gave the best agreement with results from the IND function. Fay et al. (1992) showed that results from the F test may occasionally present atypical behavior, as observed in data from Table 1, where the probability of F at n = 2 was greater than the probability of F at n = 3.

Target Transformation
Based on the results obtained from PCA, target transformation was performed using three components. According to SPOIL and calculated F values, the elemental S, benzyl sulfoxide, and methionine standards were rejected as potential targets to explain the variation in our data (SPOIL values > 6 and prob. F ≤ 0.001; Table 2). Sodium sulfate, cysteic acid, and benzyl disulfide came out as marginal candidates in the SPOIL test, with SPOIL values ranging between 3 and 6. Similarly, the probabilities of the calculated F values for Na2SO4, cysteic acid, and benzyl disulfide were greater than those for the other three standards but still remained <0.05 (Table 2). Chitin sulfate was a very good target, with a SPOIL value <3 and a probability for the F value close to 0.05.


Table 2. Results from target transformation.

 
SPOIL and F tests provide a statistical basis to reject or accept a given standard. Visual inspection of the transformed targets can also be used as a graphical aid to decide on the best match with the original standard (Wasserman, 1997; Ressler et al., 2000). For each of the four possible targets, Fig. 3 compares spectra for the predicted standard vectors resulting from target transformation with spectra for the corresponding chemical standards. The predicted vectors roughly simulate the corresponding standards in terms of the main edge position for all of them, and the first postedge feature for the benzyl disulfide. However, except for the chitin sulfate, for which the predicted target gave a close fit to the data (Fig. 3D), the other three standards yielded XANES spectra with consistently higher peak intensity than the predicted targets generated on the basis of variance analysis from the experimental data (Fig. 3A, 3B, and 3C). This result may reflect either differences in error between the standards and samples, or standards that do not fully represent species in the samples. With regard to the first explanation, targets with SPOIL values between 3 and 6 can be true components, but with relatively large error compared with the error in the data matrix. Consequently, their inclusion in the reproduction of the data matrix leads to an increase in the error (they spoil the regenerated matrix; Malinowski, 1991). Although the S in the standards was of higher concentration than in the humic acid samples (650 vs. 23 mmol S kg⁻¹; Hutchison et al., 2001), all data were obtained from the same experimental beamline set-up. Also, the level of spectral noise in the normalized standard spectra was comparable with that of the humic acid samples (range of standard deviations calculated from the baseline between -20 and -10 eV was 0.001 to 0.007 for the standards compared with 0.001 to 0.006 for the samples).
It is thus unlikely that inclusion of these particular standards in the target transformation would spoil the reproduced data if they were true components of the sample. Hence, the marginal fits obtained between the predicted and real targets for Na2SO4, cysteic acid, and benzyl disulfide (Fig. 3) would instead suggest that the humic acid samples contain species that are very close to, yet different from, those represented by the standards tested here (Wasserman, 1997). Given the origin of humic acids, there is likely a variety of S species in closely related structures that are only approximated by the standards chosen for this study. A comparison of the chitin sulfate and Na2SO4 standards provides a good example. The ester sulfate species (chitin sulfate) is a better alternative than the Na2SO4 standard to account for the highly oxidized S species in our systems because essentially no inorganic sulfate should be present in the humic acid samples given the extraction and purification procedures used (Hutchison et al., 2001). Target transformation clearly shows that Na2SO4 is a marginal candidate compared with chitin sulfate (Fig. 3A vs. 3D). Therefore, Na2SO4 was not retained in the fitting analysis.



Fig. 3. Predicted targets obtained through target transformation of the S K-XANES spectra for (A) Na2SO4, (B) cysteic acid, (C) benzyl disulfide, and (D) chitin sulfate compared with the normalized K-XANES spectra for these species. The energy scale is relative to S(0) K-edge at 2472 eV.
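
The comparison in Fig. 3 can be illustrated with a minimal sketch of least-squares target transformation, in which a candidate standard is projected onto the space spanned by the retained principal components. The Gaussian "spectra", the mixing fractions, and the function names are hypothetical and chosen only for illustration. The data matrix is decomposed without mean-centering, which is an assumption of this sketch, and the SPOIL and F statistics are not computed.

```python
import numpy as np

def gaussian(energy, center, width):
    """Synthetic peak standing in for a normalized XANES spectrum."""
    return np.exp(-0.5 * ((energy - center) / width) ** 2)

def target_transform(data, n_components, target):
    """Project a candidate standard onto the first n principal components.

    data holds one spectrum per column (n_energies x n_samples).
    """
    u, _, _ = np.linalg.svd(data, full_matrices=False)
    basis = u[:, :n_components]            # orthonormal abstract components
    return basis @ (basis.T @ target)      # least-squares predicted target

energy = np.linspace(-5.0, 20.0, 251)      # relative energy scale (synthetic)
comp_a = gaussian(energy, 1.0, 1.0)
comp_b = gaussian(energy, 10.0, 1.2)

# Four synthetic samples, each a mixture of the two components.
fractions = np.array([[0.8, 0.6, 0.3, 0.1],
                      [0.2, 0.4, 0.7, 0.9]])
data = np.column_stack([comp_a, comp_b]) @ fractions

present = comp_a                           # a species that is in the mixtures
absent = gaussian(energy, 5.0, 1.0)        # a species that is not

for name, target in (("present", present), ("absent", absent)):
    predicted = target_transform(data, 2, target)
    relative_misfit = np.linalg.norm(target - predicted) / np.linalg.norm(target)
    print(name, relative_misfit)
```

A standard that lies in the component space is reproduced almost exactly, whereas one that does not is poorly reproduced, which is the visual criterion applied to the four candidate standards.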

 
Fitting Results for Humic Acids
Chitin sulfate, cysteic acid, and benzyl disulfide standards were used in the composition of the transformation matrix for calculating the real eigenvector matrix (Y; Box 3, Fig. 1) to determine the relative contribution of each standard in each of the six humic acid spectra. In contrast to the least-squares LCF used by Vairavamurthy et al. (1994) for XANES data fitting, the estimation of the scaling coefficients (Y matrix) from the PCA approach includes no energy offset parameters and no sum-to-one constraint on the coefficients during the fitting. After completing the PCA approach, the coefficients were normalized to sum to one by dividing each by the sum of all coefficients. The normalized coefficients obtained from PCA, expressed in mol%, are reported in Table 3 along with results obtained by least-squares LCF under different constraints. Fitting results from PCA were in good agreement with those from least-squares LCF obtained by Hutchison et al. (2001), which allowed energy shifts. Discrepancies of up to 6 mol% were observed for the chitin sulfate and cysteic acid standards between the two approaches, except for the humic acid 2 sample, which had differences <2.5 mol%. In LCF fitting, the humic acid 2 sample showed the lowest fitted energy shift (ai = -0.18 eV for chitin sulfate vs. ai > -0.45 eV for other samples). We also evaluated fitting results by replacing chitin sulfate with Na2SO4, which had lower energy offset parameters. A high coefficient of correlation (r = 0.98; data not presented) between energy shift parameters and deviations between PCA and LCF fitting results was found when considering all sulfate species (chitin sulfate and Na2SO4). Therefore, the discrepancies in fitting results between PCA and Hutchison et al. (2001) seemed mainly due to the lack of energy offset parameters in the PCA approach. According to Waldo et al. (1991), the exclusion of an energy offset in the fitting can introduce large uncertainties in the results, especially when quantifying species of low concentration. For the S K-XANES data evaluated here (Table 3), the PCA approach still gave good results. However, for data vulnerable to higher shifts in the energy scale (e.g., because of instabilities in the synchrotron beamline monochromator), the absence of an energy shift parameter in the PCA approach can be overcome by performing PCA and target transformation first, then performing nonlinear least-squares fitting using the selected standards to refine the fitting (see, e.g., Ressler et al., 2000). In this case, the PCA approach provides valuable insight into the number of components to include in the LCF, and it provides more robust, statistically based criteria for selecting specific standards to include in the LCF. Hence, the PCA approach can greatly reduce the time and effort needed for data analysis using least-squares LCF by reducing the fitting analysis to a smaller number of combinations of standards.
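
The normalization step described above can be sketched as follows. This is an illustration under stated assumptions, not the calculation of the Y matrix itself: a direct unconstrained least-squares fit of a sample on the selected standards stands in for the PCA route, no energy offset is fitted, and the Gaussian "standards", the function name, and the fractions are hypothetical.

```python
import numpy as np

def gaussian(energy, center, width):
    """Synthetic peak standing in for a normalized standard spectrum."""
    return np.exp(-0.5 * ((energy - center) / width) ** 2)

def fit_and_normalize(standards, sample):
    """Unconstrained least-squares coefficients, then rescaled to mol%."""
    coeffs, *_ = np.linalg.lstsq(standards, sample, rcond=None)
    mol_percent = 100.0 * coeffs / coeffs.sum()
    return coeffs, mol_percent

energy = np.linspace(-5.0, 20.0, 251)
# Three hypothetical standards, one spectrum per column.
standards = np.column_stack([gaussian(energy, c, 1.0) for c in (0.0, 5.0, 10.0)])

# A sample whose overall scale is deliberately off by 10%, so the raw
# coefficients do not sum to one before normalization.
sample = 0.9 * (standards @ np.array([0.5, 0.3, 0.2]))

coeffs, mol_percent = fit_and_normalize(standards, sample)
print("raw coefficients:", coeffs, "sum =", coeffs.sum())
print("normalized (mol%):", mol_percent)
```

Because the constraint is applied only after the fit, the raw coefficients absorb any overall scaling error in the sample spectrum, and the division by their sum recovers the relative proportions.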


Table 3. Fitting results for the six humic acid samples using the principal component analysis (PCA) approach compared with results from Hutchison et al. (2001) using least-squares linear combination fitting (LCF). Also presented are results from the constrained least-squares method described in Garcia-Haro et al. (1996).

 
In calculating the Y matrix, the sum-to-one constraint on the scaling coefficients is not imposed during the fitting, and the coefficients (proportions of standards) are normalized afterwards to a sum of one. The lack of constraint during fitting can lead to a slightly different solution compared with a constrained fitting (Settle and Drake, 1993). To evaluate the discrepancy between the PCA approach with no constraint on the fitting coefficients and a least-squares LCF using a constraint that coefficients sum to one (no energy shift in either case), the standard spectra selected from the target analysis were also used in fitting based on an alternative constrained, least-squares, linear combination method described in Garcia-Haro et al. (1996). The proportions obtained by the latter method were also close to those obtained with the PCA approach (Table 3). The largest discrepancy was 3.8 mol% for the chitin sulfate standard of the HA_OXpH13.0 sample, but typical discrepancies were <2.5 mol%. Settle and Drake (1993) pointed out that the simplest approach, which sets any negative estimates to zero and renormalizes the remaining ones to a sum of one, provides acceptable results compared with the most sophisticated constrained fitting methods, which are often computationally demanding.
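
The two ways of handling the sum-to-one condition can be compared in a short sketch. The constrained fit below is a generic equality-constrained least squares solved through its Lagrange-multiplier (KKT) system; it is an assumption of this sketch and is not claimed to be the method of Garcia-Haro et al. (1996). The synthetic standards, noise level, and function names are hypothetical.

```python
import numpy as np

def gaussian(energy, center, width):
    """Synthetic peak standing in for a normalized standard spectrum."""
    return np.exp(-0.5 * ((energy - center) / width) ** 2)

def sum_to_one_lstsq(standards, sample):
    """Least squares subject to sum(coefficients) == 1, via the KKT system."""
    n = standards.shape[1]
    kkt = np.zeros((n + 1, n + 1))
    kkt[:n, :n] = 2.0 * standards.T @ standards
    kkt[:n, n] = 1.0                      # column for the Lagrange multiplier
    kkt[n, :n] = 1.0                      # row enforcing the constraint
    rhs = np.append(2.0 * standards.T @ sample, 1.0)
    return np.linalg.solve(kkt, rhs)[:n]

def clip_and_renormalize(coeffs):
    """Set negative estimates to zero, then rescale the rest to sum to one."""
    clipped = np.clip(coeffs, 0.0, None)
    return clipped / clipped.sum()

energy = np.linspace(-5.0, 20.0, 251)
standards = np.column_stack([gaussian(energy, c, 1.0) for c in (0.0, 5.0, 10.0)])
rng = np.random.default_rng(0)
true_fractions = np.array([0.5, 0.3, 0.2])
sample = standards @ true_fractions + rng.normal(0.0, 0.005, energy.size)

# Route 1: fit without constraint, normalize afterwards.
unconstrained, *_ = np.linalg.lstsq(standards, sample, rcond=None)
normalized_after = unconstrained / unconstrained.sum()
# Route 2: impose the constraint during the fit.
constrained = sum_to_one_lstsq(standards, sample)

print("normalized afterwards (mol%):", 100 * normalized_after)
print("constrained during fit (mol%):", 100 * constrained)
print("clip-and-renormalize example:", clip_and_renormalize(np.array([0.6, -0.1, 0.5])))
```

For well-separated standards and low noise the two routes give nearly the same proportions; they can diverge more when standards overlap strongly or the spectra are noisy.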

Limitations of the PCA Approach
Because S occurs in several oxidation states, S K-XANES spectra show distinct spectral features that were easy to capture by the PCA approach. In our study, three significant components were defined, the target analysis accordingly identified the three best standards, and the estimation of real matrices immediately yielded positive scaling coefficients. However, other XANES data may not always be so straightforward to analyze. For elements showing one main oxidation state and subtle features around a single peak in the spectrum, target analysis may not distinguish between the spectral features of different standard species as clearly as it did with the S data. For example, the number of standards that come out as potential targets may be greater than the number of significant components retained. In such cases, the best combination can be selected using least-squares LCF, with chi-squared values as a criterion of goodness of fit, supported by sound knowledge of the chemistry of the studied system.
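
When more candidate standards pass target analysis than there are significant components, the selection among combinations can be sketched as an exhaustive search ranked by misfit. In this illustration an unweighted sum of squared residuals stands in for the chi-squared statistic, and the standard names (`std_a` to `std_d`), peak positions, and sample composition are hypothetical.

```python
from itertools import combinations

import numpy as np

def gaussian(energy, center, width):
    """Synthetic peak standing in for a normalized standard spectrum."""
    return np.exp(-0.5 * ((energy - center) / width) ** 2)

def rank_combinations(library, sample, size):
    """Fit every combination of `size` standards and sort by squared misfit."""
    results = []
    for combo in combinations(list(library), size):
        matrix = np.column_stack([library[name] for name in combo])
        coeffs, *_ = np.linalg.lstsq(matrix, sample, rcond=None)
        misfit = float(np.sum((sample - matrix @ coeffs) ** 2))
        results.append((misfit, combo, coeffs))
    return sorted(results, key=lambda item: item[0])

energy = np.linspace(-5.0, 20.0, 251)
library = {
    "std_a": gaussian(energy, 0.0, 1.0),
    "std_b": gaussian(energy, 3.0, 1.0),
    "std_c": gaussian(energy, 6.0, 1.0),
    "std_d": gaussian(energy, 10.0, 1.0),
}
# The synthetic sample contains only std_a and std_c.
sample = 0.7 * library["std_a"] + 0.3 * library["std_c"]

ranked = rank_combinations(library, sample, size=2)
for misfit, combo, coeffs in ranked[:3]:
    print(combo, misfit, coeffs)
```

The number of standards per combination would be set by the number of significant components, so that the search only ranks candidates within a size already justified by the PCA.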

Principal component analysis provides a statistical basis for defining the number of individual species to include in the spectral fitting. In this regard, it is a valuable complementary tool for band resolution or LCF as it helps to avoid an excessive number of bands or standards in the fit (Malinowski, 1991). This is true, however, as long as the species are independent (uncorrelated). Because target transformation is an oblique rotation, the targeted vectors may be correlated, increasing the difficulty in the interpretation of results (Tabachnick and Fidell, 1989). Again, target analysis and the interpretation of its outcome will be most successful if theoretical hypotheses on the chemistry involved (e.g., choice of the probable species in the system) are considered (Malinowski, 1991). Furthermore, as the PCA approach is based on the variance in the experimental dataset, its efficiency will be maximized when all data come from the same synchrotron x-ray beamline set-up and conditions, and share similar experimental noise (Wasserman, 1997). In reality, these conditions are difficult to meet as the collection of XANES data on a set of samples and standards may be spread over more than one beamtime period, possibly resulting in different experimental errors among samples (depending on beamline optimization and performance). In some cases, experimental data may even originate from a different beamline compared with standards, which may affect the performance of target analysis.
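
How PCA indicates the number of species can be illustrated with a synthetic data set. The sketch below builds six noisy mixtures of three hypothetical components and inspects the singular values of the data matrix; the largest gap between successive singular values is used as a simple heuristic. That heuristic, the noise level, and the mixing fractions are assumptions of this sketch and not the statistical criterion applied in the study.

```python
import numpy as np

def gaussian(energy, center, width):
    """Synthetic peak standing in for a pure-species spectrum."""
    return np.exp(-0.5 * ((energy - center) / width) ** 2)

def count_components(data):
    """Number of components before the largest drop in singular values."""
    singular_values = np.linalg.svd(data, compute_uv=False)
    ratios = singular_values[:-1] / singular_values[1:]
    return int(np.argmax(ratios)) + 1, singular_values

energy = np.linspace(-5.0, 20.0, 251)
components = np.column_stack([gaussian(energy, c, 1.0) for c in (0.0, 5.0, 10.0)])

# Six synthetic samples (columns); each column of fractions sums to one.
fractions = np.array([[0.6, 0.5, 0.3, 0.2, 0.1, 0.4],
                      [0.3, 0.2, 0.5, 0.3, 0.6, 0.1],
                      [0.1, 0.3, 0.2, 0.5, 0.3, 0.5]])
rng = np.random.default_rng(0)
data = components @ fractions + rng.normal(0.0, 0.002, (energy.size, 6))

n_components, singular_values = count_components(data)
print("singular values:", singular_values)
print("components retained:", n_components)
```

If two species always varied together across the samples, their contributions would collapse into a single component, which is why the independence condition stated above matters.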

As with any quantitative analysis of chemical species in a mixture, the PCA approach and other methods for analyzing XANES data are limited by how well the available set of standards actually represents species in the samples of unknown composition. Target transformation can at least demonstrate how closely the selected standards fit the vector space defined by the experimental spectra. However, if minor structural or compositional differences between chemical species in a sample cannot be resolved by XANES spectroscopy, then data analysis is inherently limited by the technique. Humic acid, for example, comprises chemically and structurally complex molecules that may contain various forms of ester, amine, and C-bonded S, disulfides, and polysulfides (Stevenson, 1994, p. 113–140; Vairavamurthy et al., 1997). Vairavamurthy et al. (1997) and Morra et al. (1997) found that reduced organic S species such as disulfides and polysulfides could not be easily distinguished because of similarities in their S K-XANES spectra. Nevertheless, a reasonable approach to quantitative XANES analysis is to compile a database of spectra for chemically meaningful standards, then use a combination of chemical insight and the PCA approach to choose the most appropriate subset of unique standards for quantitative fitting of spectra of unknown samples.


    CONCLUSIONS
 
In this study, PCA and target transformation were used (i) to determine the number of independent components in a data set of S K-XANES spectra, (ii) to identify which standards were most likely present in the humic acid samples, and (iii) to determine the proportion of each selected standard using the PCA approach. The selected standards and the associated scaling coefficients obtained from the PCA approach were comparable with best-fit results obtained with the same data using least-squares LCF on a large number of binary, ternary, and quaternary combinations. Target transformation further revealed that chitin sulfate was a better standard than Na2SO4 to explain our experimental data and that, although cysteic acid and benzyl disulfide were retained in the fitting, the nature of the real species in the spectra of humic acids may be slightly different.

The results showed that the PCA approach is a valuable tool to complement other modeling techniques commonly used in the interpretation of XANES spectra. It provides insight into the nature of the data while saving significant time in the modeling. Principal component analysis provides a statistical basis for choosing the number of standards to include in the fitting, resulting in a more objective approach. Target transformation ranks and identifies the most likely standards so that the fitting (e.g., LCF) can be done on a smaller set of standards. The latter aspect constitutes an important asset when dealing with heterogeneous, multicomponent materials such as soil or sediment. Depending on the nature of the spectral data, the scaling coefficients generated from the PCA approach can be used directly, or as a first approximation of the proportions of different chemical species present in different samples. Subsequent refined fitting obtained through least-squares methods may include energy offset parameters or constraints on the coefficients during the fitting analysis.


    ACKNOWLEDGMENTS
 
Research at NC State was conducted with funding from U.S. National Science Foundation Grant No. 9614920 and support from the North Carolina Agricultural Research Service (NC-ARS). The authors are grateful to Ms. Kimberley Hutchison for her assistance with linear combination fitting. We also thank Dr. Anne Légère (Agriculture and Agri-Food Canada, Quebec) and Dr. Stephen R. Wasserman (Argonne National Laboratory, Argonne, IL) for their constructive comments on an earlier version of the paper.


    NOTES
 
Contribution no. 698.

Received for publication January 23, 2001.






