Soil Science Society of America Journal 63:1748-1757 (1999)
© 1999 Soil Science Society of America

DIVISION S-5-PEDOLOGY

Accuracy and Reliability of Pedotransfer Functions as Affected by Grouping Soils

Ya. A. Pachepsky (a) and W.J. Rawls (b)

(a) USDA–ARS, Remote Sensing and Modeling Laboratory, Beltsville, MD 20705 and Duke University Phytotron, Duke University, Durham, NC 27708 USA
(b) USDA–ARS, Hydrology Laboratory, Beltsville, MD 20705 USA

ypachepsky{at}asrr.arsusda.gov


    ABSTRACT
 
Pedotransfer functions (PTFs; i.e., dependencies of soil water retention and soil hydraulic conductivity on basic soil parameters available from soil surveys) are widely used to predict soil functioning in agricultural and environmental systems. The reliability of PTFs needs to be assessed by examining the correspondence between measured and estimated data for data set(s) other than the one used to develop a PTF. Our objective was to see whether grouping according to taxonomic unit, soil moisture regime, soil temperature regime, and soil textural class would improve both PTF accuracy and reliability. We estimated soil water contents at matric potentials of -33 kPa and -1500 kPa for 447 soil samples from the Oklahoma Natural Resources Conservation Service (NRCS) database. Dry bulk density, the ratio of cation-exchange capacity (CEC) to clay content, and contents of clay, sand, coarse fragments, and organic matter were used as predictors. The Group Method of Data Handling (GMDH) was used to develop PTFs. To assess accuracy and reliability of the PTFs, we used cross-validation; i.e., repeated random splitting of the data set into subsets for development and validation. The PTF accuracy and reliability were quantified by the root mean square error in the development and validation data sets, respectively. Grouping improved the accuracy of PTFs in most cases. None of the grouping criteria proved to be clearly superior. Although PTFs developed from the groups were more accurate than the PTFs developed from the whole database, they were not more reliable. Improving PTF reliability may be an issue distinctly different from improving PTF accuracy.

Abbreviations: {theta}33, volumetric soil water content at matric potential of -33 kPa • {theta}1500, volumetric soil water content at matric potential of -1500 kPa • ANN, artificial neural network • CEC, cation-exchange capacity • GMDH, Group Method of Data Handling • PTF, pedotransfer function • RMSE, root mean square error


    INTRODUCTION
 
INDIRECT estimation of soil hydraulic parameters from readily available or easily measurable data has become an important part of data analysis for soil databases. The term pedotransfer function (PTF), introduced by Bouma (1989), is often used to describe equations that express dependencies of soil water retention and soil hydraulic conductivity on basic soil parameters available from soil surveys. Once established, PTFs can be used in hydrological models to predict soil functioning in agricultural and environmental systems. Pedotransfer functions have been successfully used to estimate regional crop yields (Haskett et al., 1996) and long-term crop production at a local scale (Timlin et al., 1996b), to assess watershed yields (Vertessy et al., 1993) or evapotranspiration losses (Famiglietti and Wood, 1994), to predict loss of chemicals from the soil root zone to ground water (Carsel et al., 1991), and to interpret data from passive microwave remote sensing (Gouweleeuw and van der Griend, 1996).

Pedotransfer functions derived to estimate volumetric soil water content at matric potentials of -33 kPa ({theta}33) and -1500 kPa ({theta}1500) are very popular for several reasons. First, adding {theta}1500 to the list of basic soil properties used as PTF inputs improves the accuracy of water content estimates at other matric potentials (Rawls et al., 1991). Secondly, relationships between log-scaled soil matric potentials and volumetric soil water contents are nearly linear between {theta}33 and {theta}1500 (Campbell, 1974). Thirdly, {theta}33 and {theta}1500 estimates provide data about soil water retention in the range of soil water potentials that is most important for the growth and development of plants. Finally, estimations of saturated hydraulic conductivity are most often based on the value of air-filled porosity, defined as the difference between total porosity and {theta}33 (Ahuja et al., 1984). Thus, estimates of {theta}33 can facilitate estimation of hydraulic conductivity.
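
The second and fourth reasons can be illustrated with a short calculation. The following is a minimal sketch, assuming that water content is exactly linear in log10 of the absolute matric potential between -33 and -1500 kPa (the text says only "nearly linear"); the function names and argument conventions are illustrative and are not taken from the study.

```python
import math

def theta_at(h_kpa, theta33, theta1500):
    """Interpolate volumetric water content (m3 m-3) at matric potential -h_kpa.

    Assumption: theta is linear in log10(h) between 33 and 1500 kPa.
    h_kpa is the absolute value of the matric potential in kPa.
    """
    if not 33.0 <= h_kpa <= 1500.0:
        raise ValueError("h_kpa must lie between 33 and 1500 kPa")
    span = math.log10(1500.0) - math.log10(33.0)
    frac = (math.log10(h_kpa) - math.log10(33.0)) / span
    return theta33 + frac * (theta1500 - theta33)

def air_filled_porosity(total_porosity, theta33):
    """Air-filled porosity as defined in the text: total porosity minus theta33."""
    return total_porosity - theta33
```

Under this assumption, the water content at the geometric mean of the two potentials, `math.sqrt(33 * 1500)` kPa, is the arithmetic mean of {theta}33 and {theta}1500.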

Development of PTFs is an ongoing effort, and earlier results have been summarized in review papers (Rawls et al., 1991; van Genuchten and Leij, 1992; Timlin et al., 1996a). Current research issues in PTF development include (i) developing better mathematical expressions of PTF equations, (ii) determining the most important basic soil parameters to be used as PTF inputs, and (iii) identifying soil groups that effectively improve PTF accuracy.

Regression equations are used to develop PTFs (Rawls et al., 1991). Linear and polynomial regressions were applied first, because stepwise regression can find the most essential input variables automatically. Preliminary transformation of PTF input variables, especially logarithmic transformation, was found useful (Williams et al., 1992a). Recently, artificial neural networks (ANNs) were found to produce regression equations that yielded more accurate PTFs than polynomial regressions (Schaap and Bouten, 1996). The drawback of ANNs is that they do not provide an explicit procedure to select the most essential PTF input variables (Pachepsky et al., 1996).

Another method called Group Method of Data Handling (GMDH) has a built-in algorithm to retain only essential input variables in a flexible net of regression equations, which can be used to relate inputs to outputs (Pachepsky et al., 1998). Group Method of Data Handling is a technique of finding an approximate relationship between a set of input variables and an output variable (Farlow, 1984). When the number of input variables is very large, or the relationship between inputs and output is very complex, GMDH successfully competes with statistical regression (Hecht-Nielsen, 1990). The GMDH employs preliminary estimates of the output variable obtained from quadratic or cubic regression equations that include small subsets of input variables (only two or three variables in each subset). Although the accuracy of these preliminary estimates is low, it appears that such estimates can be better predictors of the output variable than some of the input variables. The best of these estimates are then included in the set of input variables, and again small subsets of variables from this set are used to build new estimates. After several iterations, this process produces a hierarchical network of polynomial regressions that (i) describes the relationship between the original inputs and the output with good accuracy, (ii) includes only those original input variables that are related to the output, and (iii) has a relatively small number of coefficients compared with polynomial regressions that include all input variables. All three of these features are valuable in data analysis and forecasting. More details on GMDH and an example of its application to develop PTFs using soil penetration resistance as an additional input variable have been presented by Pachepsky et al. (1998).

Regression trees represent yet another technique of PTF building that can portray a nonlinear and conditional relationship between a soil hydraulic parameter and basic soil properties (Breiman et al., 1994; MacKenzie and Jacquier, 1997).

The PTF accuracy is assessed from the correspondence between measured and estimated data for the data set from which a PTF has been developed. Pedotransfer function accuracy has been characterized by various quantitative measures, such as the mean error, the standard deviation of the mean error, the mean squared error, determination coefficient R2, etc. (Leenhardt, 1995; Williams et al., 1992b; Tietje and Tarpenhinrichs, 1993; Kern, 1995). When soils are grouped by similarities in origin or properties, PTF accuracy improves. Examples of soil groups include lithomorphic classes (Franzmeier, 1991), hydraulic-functional horizons (Wösten et al., 1985), genetic classification (Leenhardt, 1995), texture (Clapp and Hornberger, 1978), and numerical soil classification (Williams et al., 1983).

In contrast to accuracy, the reliability of a PTF needs to be assessed from the correspondence between measured and estimated data for the data set(s) other than the one used to develop a PTF. The PTF reliability studies compared accuracies of several PTFs applied to a particular regional database. In an earlier study (Tietje and Tarpenhinrichs, 1996), PTFs developed from regional databases were quite reliable when applied in regions with similar soil and landscape history. Thus, taxonomy-based grouping has potential to improve both accuracy and reliability of PTFs.

Cross-validation, that is, splitting the available data set into development and validation subsets, can also be used to estimate the reliability of statistical regression models (Borowiak, 1987; Hjorth, 1994). Parameters of a model are found within the development subset, and the model is tested for accuracy within this subset. Then the model is tested separately against the data in the validation subset. This process is repeated several times after randomly resplitting the data to create additional development and validation subsets. Replicate tests then provide data for a statistical characterization of the reliability. To our knowledge, however, cross-validation reliability assessments have not been used in developing PTFs.

The objectives of this work were (i) to see whether grouping according to taxonomic unit, soil moisture regime, soil temperature regime, and soil textural class improved the accuracy of {theta}33 and {theta}1500 PTFs developed from the NRCS soil pedon database for Oklahoma, and (ii) to determine whether such grouping improves the reliability of PTFs as assessed by cross-validation.


    Materials and methods
 
Grouping Soils
We used the 447 pedons in the USDA–NRCS database from the state of Oklahoma. The basic soil properties chosen for the analysis were sand, clay, organic matter, and coarse fragment contents, bulk density at -33 kPa, and the CEC/clay ratio. These properties have been shown to be the most important in predicting {theta}33 and {theta}1500 (Rawls et al., 1991). The data were obtained in digital form from CD-ROM (Soil Survey Staff, 1994). Based on the NRCS soil descriptions, we used four criteria to group the soils: (i) soil great group, (ii) soil moisture regime, (iii) soil temperature regime, and (iv) soil textural class.

Developing Pedotransfer Function with the Group Method of Data Handling
The Group Method of Data Handling was used to develop the PTFs. The general functioning of the GMDH algorithm can be understood from the following example. Let the original data contain one column of observed values of y and N columns containing observed values of the independent variables x1, x2, ... , xN. The preliminary estimates are obtained using quadratic regressions

$$z = a_0 + a_1 u + a_2 v + a_3 u^2 + a_4 v^2 + a_5 u v \tag{1}$$

Each iteration consists of three steps. Step 1 consists of obtaining preliminary estimates of y using quadratic regressions. All independent variables x1, x2, ... , xN are taken two at a time to become u and v in Eq. [1], and regression polynomials of the form of Eq. [1] are constructed so that values of z best fit the dependent variable y. The resulting columns of zm values, m = 1, 2, ... , N(N - 1)/2, contain estimates of y from each polynomial, and are interpreted as new variables that may have better predictive capability than the original x1, x2, ... , xN. Step 2 consists of screening out the least effective new variables using a statistical selection criterion (Farlow, 1984). The list of input variables is modified at the end of Step 2. Step 3 consists of testing whether the set of equations can be further improved. The smallest value of the selection criterion obtained from this iteration is compared with the smallest value obtained from the previous iteration. If an improvement is achieved, Steps 1 and 2 are repeated; otherwise the iterations stop and the network is built.
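
The three steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the algorithm as implemented in any particular software: the selection criterion is taken to be the RMSE on a held-out validation subset (a stand-in for the criteria discussed by Farlow, 1984), the retained estimates replace the previous input list, and the names `gmdh_layer`, `gmdh`, and `keep` are hypothetical.

```python
import itertools
import numpy as np

def quad_features(u, v):
    # Design matrix of a full quadratic polynomial in two variables (6 terms).
    return np.column_stack([np.ones_like(u), u, v, u * u, v * v, u * v])

def gmdh_layer(X_dev, y_dev, X_val, y_val, keep=3):
    """One iteration: fit every pairwise quadratic (Step 1), keep the best (Step 2)."""
    candidates = []
    for i, j in itertools.combinations(range(X_dev.shape[1]), 2):
        A = quad_features(X_dev[:, i], X_dev[:, j])
        coef, *_ = np.linalg.lstsq(A, y_dev, rcond=None)
        z_dev = A @ coef
        z_val = quad_features(X_val[:, i], X_val[:, j]) @ coef
        # Assumed selection criterion: RMSE on the validation subset.
        score = float(np.sqrt(np.mean((z_val - y_val) ** 2)))
        candidates.append((score, (i, j), z_dev, z_val))
    candidates.sort(key=lambda c: c[0])
    best = candidates[:keep]
    Z_dev = np.column_stack([c[2] for c in best])
    Z_val = np.column_stack([c[3] for c in best])
    return best[0][0], [c[1] for c in best], Z_dev, Z_val

def gmdh(X_dev, y_dev, X_val, y_val, keep=3, max_layers=5):
    """Repeat layers while the smallest criterion value keeps improving (Step 3)."""
    best_score = np.inf
    history = []  # (criterion value, retained variable pairs) for each layer
    for _ in range(max_layers):
        score, pairs, Z_dev, Z_val = gmdh_layer(X_dev, y_dev, X_val, y_val, keep)
        if score >= best_score:
            break
        best_score = score
        history.append((score, pairs))
        X_dev, X_val = Z_dev, Z_val  # new estimates become the next inputs
    return best_score, history
```

With N inputs the first layer fits N(N - 1)/2 polynomials; `keep` must be at least 2 so that a further layer has a pair of variables to combine.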

The version of the GMDH algorithm used in this study is coded in the commercial software ModelQuest (AbTech, Charlottesville, VA, 1996). This software uses three input variables at a time to obtain preliminary estimates from either linear combinations of input variables or cubic polynomials of two or three independent variables. The number of variables to retain in the input list is limited. Barron's (1984) criterion is used to screen out new variables and to stop the iterations. The whole data set is divided into development and validation data sets in a proportion specified by the user. The number of samples in a development data set should be at least twice as large as the number of coefficients in an equation to model these data. Both original input variables and the output variable are normalized to have zero mean and unit variance, and the normalized variables participate in network building.
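
The normalization step mentioned above, and the corresponding denormalization of network outputs, can be written as follows. This is a minimal sketch that assumes the population standard deviation is used, so that the normalized variables have exactly unit variance; the software's own convention is not stated in the text, and the function names are illustrative.

```python
import numpy as np

def normalize(a):
    """Scale each column to zero mean and unit variance; return the scaling too."""
    mean = a.mean(axis=0)
    std = a.std(axis=0)  # population standard deviation (ddof=0), an assumption
    return (a - mean) / std, mean, std

def denormalize(z, mean, std):
    """Map normalized values back to the original units."""
    return z * std + mean
```

The mean and standard deviation of the output variable must be stored with the fitted network, because they are needed to convert its normalized output back to a water content.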

Assessing Pedotransfer Function Accuracy and Reliability
To quantify the PTF accuracy, we used root mean square error (RMSE) and R2. Both statistics were calculated for development and validation subsets to characterize the accuracy and the reliability of PTF, respectively. The random splitting of data into the development and validation subsets was repeated 10 times. RMSE values were calculated for each of these 10 replications; and estimates of average RMSE values, along with the estimates of variances, were derived from the replications for the development and validation data sets. We used the default ModelQuest ratio of 3:1 to split data into development and validation sets in each replication. Statistical significance of differences was tested at the 0.05 significance level.
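
The replicated splitting procedure can be sketched as follows. This is a hedged illustration rather than the ModelQuest procedure itself: an ordinary linear least-squares model (`fit_linear`) stands in for the GMDH network, and the function names, the `seed` argument, and the rounding of the development subset size are assumptions.

```python
import numpy as np

def rmse(y, y_hat):
    return float(np.sqrt(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2)))

def fit_linear(X, y):
    """Stand-in model (not GMDH): linear least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef

def repeated_split(X, y, fit, n_rep=10, dev_fraction=0.75, seed=0):
    """Return development and validation RMSE for each random 3:1 split."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_dev = int(round(dev_fraction * n))
    dev_rmse, val_rmse = [], []
    for _ in range(n_rep):
        idx = rng.permutation(n)
        dev, val = idx[:n_dev], idx[n_dev:]
        predict = fit(X[dev], y[dev])
        dev_rmse.append(rmse(y[dev], predict(X[dev])))  # accuracy
        val_rmse.append(rmse(y[val], predict(X[val])))  # reliability
    return np.array(dev_rmse), np.array(val_rmse)
```

The mean and variance of each returned array give the replicate averages and variances described above; an R2 statistic could be collected in the same loop.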


    Results
 
We found a wide variation in the total numbers of samples in groups that could be formed according to these criteria. Trials showed that even when the GMDH algorithm was stopped after the first iteration, we typically had about 20 coefficients in the polynomial regression. Therefore, we decided to have the number of samples in any group greater than (or at least close to) 50, and were able to find enough samples to form the following groups: (i) Haplustalfs, Argiustolls, Paleustolls, Udarents, and Ustochrepts by soil great groups, (ii) aridic and udic soils by moisture regime, (iii) thermic and mesic by temperature regime, and (iv) sandy loams, silt loams, and loams by texture class. The total numbers of samples and the ranges of {theta}33 and {theta}1500 are given in Table 1 for each group.
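
One possible reading of these numbers, offered as an assumption rather than as the authors' stated derivation, is that a full cubic polynomial in three variables has 20 coefficients, and that the rule of at least two development samples per coefficient, combined with a 3:1 development-to-validation split, leads to a minimum group size in the low fifties. The helper names below are illustrative.

```python
import math

def n_coefficients(n_vars, degree):
    """Number of terms in a full polynomial of the given degree in n_vars variables."""
    return math.comb(n_vars + degree, degree)

def min_group_size(n_coef, dev_fraction=0.75, samples_per_coef=2):
    """Smallest group whose development subset holds samples_per_coef * n_coef samples."""
    return math.ceil(samples_per_coef * n_coef / dev_fraction)

cubic_terms = n_coefficients(3, 3)      # full cubic in three variables: 20 terms
quadratic_terms = n_coefficients(2, 2)  # full quadratic in two variables: 6 terms
needed = min_group_size(cubic_terms)    # 40 development samples -> 54 in total
```

On this reading, groups of about 50 samples sit close to the limit at which the coefficients can still be estimated.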


Table 1 Number of samples and ranges of estimated soil water retention parameters in data groups selected in the NRCS soil pedon data base for Oklahoma

 
Examples of the GMDH application to build PTFs to estimate water content at -33 kPa ({theta}33) and -1500 kPa ({theta}1500) for all Oklahoma data sets are shown in Fig. 1a and 1b, respectively. The GMDH needed one iteration to build the PTF equation for {theta}33 and two iterations for {theta}1500. Clay content, dry bulk density, and the ratio of CEC to clay content were selected as the best predictors to estimate {theta}33. To estimate {theta}1500, the GMDH algorithm created an auxiliary variable z1 that combined clay, sand, and organic matter contents, a dry bulk density at -33 kPa, and the ratio of CEC to clay content, and then combined this variable (z1) with the ratio of CEC to clay content.




Fig. 1 GMDH networks to estimate water contents (a) at -33 kPa {theta}33 and (b) at -1500 kPa {theta}1500 developed for Oklahoma soils. Variables x1 to x6 are obtained by normalizing original soil parameters, and the outputs of the networks are denormalized to obtain actual values of {theta}33 and {theta}1500

 
Accuracy Testing
Soil Water Content at -33 kPa
The average value of RMSE was 0.034 m3 m-3 for the whole database. Among the groups, however, the average accuracy of PTFs was higher, and the lowest average RMSE values were about 0.018 m3 m-3 (Fig. 2a). The coefficients of variation of RMSE over replications varied among groups within a range of 2 to 20% and did not depend on the average RMSE.



Fig. 2 Accuracy of the GMDH networks built to estimate soil water contents at -33 kPa ({theta}33) for various groups of Oklahoma soils. Error bars show standard errors

 
Ranking grouping criteria by average RMSE in development data sets resulted in the sequence texture < moisture regime < temperature regime, where grouping by texture gave the lowest average RMSE of the {theta}33 estimates. Grouping by soil great group gave mixed results. Whereas Argiustolls, Paleustolls, Udarents, and Ustochrepts had the highest accuracy, the same as or better than that of grouping by texture, Haplustalfs had the lowest accuracy of all soil groups (Fig. 2a).

The average value of R2 was 0.881 for the whole database. PTFs developed for textural groups explained a lower percentage of variability than PTFs developed according to other grouping criteria (Fig. 2b). None of the grouping criteria could subdivide data into groups so that the R2 in every group would be higher than the R2 for the whole database.

Sets of variables selected to estimate {theta}33 were different among replications within the same data group. Combinations of variables that have appeared most frequently are shown in Table 2. There are marked differences among the data groups. Clay content is an essential predictor in all cases. Grouping by soil moisture regimes excludes dry bulk density from the list of essential PTF inputs. Whereas grouping by soil taxonomy or by soil regimes results in excluding soil organic matter content, grouping by soil texture includes soil organic matter in the list of PTF inputs. In sandy loams, the content of coarse fragments was included in the PTF along with predictors shown in Table 2.


Table 2 Variables typically selected to estimate soil water content at -33 kPa

 
Including {theta}1500 in the list of potential predictors for {theta}33 did not improve the {theta}33 estimates significantly (data not shown).

Soil Water Content at -1500 kPa
The average value of RMSE was 0.017 m3 m-3 for the whole database. In some data groups, the average accuracy of PTFs was higher than in the whole database, and in other data groups it was lower than in the whole database (Fig. 3a). The lowest average RMSE values (0.007 m3 m-3) were reached in Ustochrepts and in sandy loams. The worst accuracy (average RMSE close to 0.020 m3 m-3) was in silt loams and in udic soils.



Fig. 3 Accuracy of the GMDH networks built to estimate soil water contents at -1500 kPa ({theta}1500) for various groups of Oklahoma soils. Error bars show standard errors

 
The average value of R2 was 0.953 for the whole database. PTFs developed for textural groups explained a lower percentage of variability than PTFs developed according to other grouping criteria (Fig. 3b). Grouping by soil temperature regime explained a larger percentage of variability than using other grouping criteria or using the whole database.

Sets of variables selected to estimate {theta}1500 were different among replications within the same data group (Table 3). Clay content is an essential predictor for all of the groups. Grouping by soil moisture regimes excludes dry bulk density from the list of essential PTF inputs. Grouping according to soil taxonomy or by soil regimes results in excluding soil organic matter content from the list of predictors. Silt loams and loams have soil organic matter as a predictor. In loams, the content of coarse fragments was included in PTFs along with predictors shown in Table 3.


Table 3 Variables typically selected to estimate soil water content at -1500 kPa

 
Reliability Testing
Soil Water Content at -33 kPa
The average value of RMSE was 0.035 m3 m-3 for the whole database. The coefficients of variation of RMSE over replications varied among groups within a range from 6 to 37% and did not depend on the average RMSE. The average RMSE values in data groups were mostly higher than those for the whole database (Fig. 2a). Groups of Argiustolls and soils with udic moisture regime were exceptions. In terms of reliability, there was no advantage in grouping soils, since none of the average RMSE values in data groups differed significantly from RMSE for the whole database.

Average values of R2 in validation data sets were smaller than in development data sets for both the whole database and for the groups. Grouping by moisture and temperature regimes had the advantage of a small decrease in R2 in the validation data sets, compared with R2 in the development data sets. The average values of R2 in validation data sets were significantly smaller than those in the development data sets in all textural groups and in two of five groups selected according to soil taxonomy. The smallest average values of R2, 0.152 and 0.217, were found in validation data sets for sandy loam texture and Ustochrepts soil great group.

Soil Water Content at -1500 kPa
The reliability of the {theta}1500 estimates has been assessed from average RMSE and R2 values for the validation data sets. The average value of RMSE was 0.017 m3 m-3 for the whole database. The average RMSE values in data groups were mostly higher than those for the whole database (Fig. 3a). The group of Argiustolls samples, as well as sandy loams and loams, were exceptions. In terms of reliability, there was no advantage in grouping soils, since none of the average RMSE values in data groups differed significantly from RMSE for the whole database.

Average values of R2 in validation data sets were smaller than those in the development data sets for both the whole database and for the groups. Grouping by the temperature regime had the advantage of a small decrease in R2 in the validation data sets, compared with the R2 in the development data sets. The average values of R2 in the validation data sets were notably smaller than those in the development data sets in Ustochrepts and in sandy loams (Fig. 3b).

Relationships Between the Pedotransfer Function Accuracy, Pedotransfer Function Reliability, and the Number of Samples
The accuracy of the {theta}33 estimates seemed to depend on the number of samples in a group (Fig. 4a). No such dependence was seen for the {theta}1500 estimates. The ratio of the average RMSE in validation data sets to the average RMSE in development data sets was close to one when the number of samples in the group was >200 (Fig. 4b). With smaller numbers of samples, this ratio ranged from 1.1 to 4 for the {theta}33 estimates and from 1.1 to 2.5 for the {theta}1500 estimates. We did not find a dependence between RMSE values for the {theta}33 estimates in development and validation data sets (Fig. 4c). For the {theta}1500 estimates, the average RMSE values in validation data sets tended to increase as the corresponding average RMSE values in the development data sets increased. Values of R2 in development and validation data sets showed a dependence illustrated in Fig. 4d. When the R2 value in the development data set was 0.880 or more, the R2 value in validation data sets tended to be just slightly less. When the value of R2 in the development data set was less than 0.880, the R2 value in validation data sets tended to be much smaller than in the development data sets.



Fig. 4 Reliability of GMDH networks to estimate soil water contents at -33 kPa ({theta}33, circles) and at -1500 kPa ({theta}1500, inverted triangles) as related to the accuracy of the networks and to the total number of samples

 

    Discussion
 
The accuracy of the {theta}33 estimates (0.02–0.04 m3 m-3) and the {theta}1500 estimates (0.01–0.02 m3 m-3) was comparable to or better than the accuracy of published estimates from other databases. The RMSE values reported in the literature range from 0.02 to 0.07 m3 m-3 both for {theta}33 and {theta}1500 (e.g., Bruand et al., 1994; Leenhardt, 1995; Bell and van Keulen, 1996; Ahuja et al., 1985; Shuh et al., 1988). The ratio of the RMSE of the estimates to the observed range of values was between 0.08 and 0.10 for {theta}33 and between 0.04 and 0.08 for {theta}1500.

The PTF accuracy for the groups was better than that of the whole database in most cases. These results are similar to the results of other researchers who estimated effects of various grouping on PTF accuracy (Franzmeier, 1991; Wösten et al., 1985; Pachepsky et al., 1992; Leenhardt, 1995; Clapp and Hornberger, 1978; Williams et al., 1983). In our study, the improvements in accuracy after grouping could be caused by similarity in water retention relations within the same group, by the smaller numbers of samples within groups, or by both of these reasons. Argiustolls, represented by the group of 111 samples, had a significantly better accuracy of {theta}33 estimates than mesic and aridic soils, represented by almost the same number of samples. Accuracy of {theta}33 estimates within textural groups was practically the same, in spite of large differences among the total numbers of samples in these groups. Similar observations could be made for {theta}1500. For example, the accuracy of {theta}1500 estimates within sandy loam and loam groups was the best, although they had the number of samples similar to that in the Haplustalf group. In general, the results of this study seem to indicate that grouping results in more accurate PTFs because of the similarity in PTF relations within groups.

We did not find a unique best way of grouping. Textural grouping and grouping by soil moisture regimes yielded better average {theta}33 accuracy than grouping by soil great groups or grouping by temperature regimes (Fig. 2). At the same time, within soil great groups, Argiustolls and Ustochrepts had better {theta}33 accuracy than textural groups. The situation is similar to the one observed by Leenhardt (1995), who found that genetic grouping yielded better accuracy of field capacity estimates in swelling soils, whereas grouping by soil horizons had better PTF accuracy in non-swelling soils. For the {theta}1500 estimates, we did not see any single grouping criterion that consistently yielded better accuracy than that for the whole database. Besides, some groups with relatively high accuracy of {theta}33 estimates had relatively low accuracy of {theta}1500 estimates and vice versa, the Haplustalf group and silt loam group being examples (Fig. 2 and 3). Since the same soil can be included in several groups formed according to different criteria, we suggest using several grouping criteria and building PTFs for all the groups. When a PTF needs to be used for a particular soil, groups including this soil must be ranked by their PTF accuracy, and the best PTF must be used.
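
The selection rule suggested above can be sketched as a simple lookup. The function name, the group labels, and the RMSE values in the usage note are hypothetical placeholders chosen for illustration; they are not results of this study.

```python
def best_ptf_for_soil(soil_groups, group_rmse, fallback="whole database"):
    """Pick the group whose PTF has the lowest development RMSE for this soil.

    soil_groups: names of the groups the soil belongs to (one per criterion).
    group_rmse:  mapping from group name to the average RMSE of its PTF;
                 must contain an entry for the fallback PTF.
    """
    candidates = [g for g in soil_groups if g in group_rmse]
    candidates.append(fallback)  # the whole-database PTF is always available
    return min(candidates, key=lambda g: group_rmse[g])
```

For example, with hypothetical RMSE values `{"whole database": 0.030, "great group A": 0.025, "loam": 0.020, "thermic": 0.028}`, a soil belonging to the groups `["great group A", "loam", "thermic"]` would be assigned the PTF of the loam group. Ranking by development RMSE follows the text; ranking by validation RMSE instead would be a different design choice.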

Basic soil variables picked by the GMDH to be included in the PTFs varied by groups. Different sets of basic variables were selected in groups formed with the same selection criterion (Table 2). Note that dry bulk density and organic matter content are often missing from the variables selected as predictors of {theta}33 (Table 2). This is not uncommon for studies on PTF development and reliability assessment. Rawls et al. (1982) excluded dry bulk density from the list of variables to estimate {theta}33. Williams et al. (1983) ranked soil basic parameters by their importance as water-retention predictors and found soil organic matter to be the 12th in the list. Petersen et al. (1968) indicated that soil organic C showed no relation to {theta}33. Saxton et al. (1986) noted that estimates of {theta}33 showed a small sensitivity to organic matter content, and that at potentials lower than -10 kPa, dry bulk density did not affect these estimates. Bruand et al. (1994) compared predictors of {theta}33 and found clay content to be as good as dry bulk density. Shuh et al. (1988) found estimates based solely on texture to be sufficiently reliable. Kern (1995) did not find a significant difference in reliability of two PTFs that were developed from the same database with and without organic matter content as an input variable. Tietje and Tarpenhinrichs (1993) concluded that only high organic matter contents would preclude using PTFs that did not contain organic matter in the list of their inputs. Although organic matter content and dry bulk density are related to soil structure, they may have no close relation to soil particle arrangement defining water retention in some soils. Note that soil organic matter was picked to estimate {theta}33 for only the textural groups. One reason for this can be the relatively narrow range of sand and clay contents and textural component ratios in these groups defined according to the USDA texture classification. The textural parameters may then have become less informative, and all available information, including the organic matter content, is used to make up for this deficiency.

Several PTFs developed for {theta}1500 do not contain dry bulk density. The gravimetric soil water content at -1500 kPa often can be estimated very reliably (R2 > 0.900) for soils of known and uniform clay mineralogy. Our grouping criteria might provide the uniformity in clay mineralogy and this could cause the absence of dry bulk density in several PTFs in Table 3. Another reason for the dry bulk density absence may be that no significant improvements in PTFs can be achieved by converting textural components' contents to the volumetric basis by multiplying by bulk density values. This was observed, for example, by Van den Berg et al. (1997) in PTFs for {theta}1500.

The cross-validation of PTFs showed that RMSE of validation data sets in groups did not differ significantly from or even exceeded the RMSE of the whole database. The average validation RMSE of the {theta}33 estimates was markedly less than the validation RMSE of the whole database in Argiustolls and udic soils (Fig. 2). However, this difference was not significant, and no significant differences were found between the average validation RMSE in the whole database and the average validation RMSE in any of the groups. Although PTFs in groups may be more accurate, when tested on independent data sets, they yield errors that are the same or worse than those of the equations developed from the whole statewide database. The high reliability of PTFs developed from large databases was observed by Tietje and Tarpenhinrichs (1993) and by Kern (1995). These authors compared the accuracy of several PTFs applied to their databases and concluded that PTFs developed by Rawls et al. (1982) from large databases worked the best when compared with PTFs developed from smaller databases.

Cross-validation of PTFs underscored the significance of the total number of samples in groups. In {theta}33 estimation, several groups with fewer than a hundred samples in total had an average validation RMSE that was much larger than the average development RMSE (Fig. 4b). Ustochrepts and sandy loams are the best examples of this trend. A reason for the large difference between validation and development RMSE values can be the relatively large number of coefficients in GMDH equations. Seventy-five percent of 50 to 70 data sets were not enough to find reliable values of these coefficients. The coefficient values were greatly affected by the random selection of the development subset. The GMDH equations developed for data groups larger than a hundred samples were more reliable, and their coefficients were less affected by the random selection of development data sets. Grouping by texture yielded excellent accuracy but the lowest reliability of {theta}33 estimates.

Both accuracy and reliability assessments may depend on the measure used to characterize a PTF. Besides RMSE and R2, mean error and mean relative error are often used (Leenhardt, 1995; Williams et al., 1983; Tietje and Tarpenhinrichs, 1993; Kern, 1995). The slope and the intercept in 1:1 diagrams and statistics of residuals can also be used. Selection of the measure depends on the intended PTF use. When PTFs are used in hydrologic models, an evaluation of the uncertainty in PTFs with respect to the uncertainty in the results of their application (Wösten and van Genuchten, 1988) is the most appropriate. Our results indicate that different measures may result in different rankings of PTFs by their accuracy and reliability, and that estimating PTF reliability remains an important part of PTF development.
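To make the dependence on the chosen measure concrete, the sketch below computes RMSE, mean error, mean relative error, and the slope and intercept of the 1:1 diagram for paired measured and estimated water contents. The function name and the two sets of estimates are hypothetical illustrations, not results from this study; they were chosen so that one hypothetical PTF is biased but has little scatter, and the other is unbiased but scatters more.

```python
import math


def ptf_error_measures(measured, estimated):
    """Return several goodness-of-fit measures for paired water contents."""
    n = len(measured)
    if n != len(estimated) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    residuals = [e - m for m, e in zip(measured, estimated)]
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    mean_error = sum(residuals) / n
    mean_relative_error = sum(r / m for r, m in zip(residuals, measured)) / n
    # Least-squares line: estimated = intercept + slope * measured
    mean_m = sum(measured) / n
    mean_e = sum(estimated) / n
    sxx = sum((m - mean_m) ** 2 for m in measured)
    sxy = sum((m - mean_m) * (e - mean_e) for m, e in zip(measured, estimated))
    slope = sxy / sxx
    intercept = mean_e - slope * mean_m
    return {
        "rmse": rmse,
        "mean_error": mean_error,
        "mean_relative_error": mean_relative_error,
        "slope": slope,
        "intercept": intercept,
    }


# Hypothetical measured water contents and estimates from two PTFs.
measured = [0.10, 0.20, 0.30, 0.40]
ptf_a = [0.13, 0.23, 0.33, 0.43]  # constant bias, no scatter
ptf_b = [0.06, 0.24, 0.26, 0.44]  # no bias, larger scatter

stats_a = ptf_error_measures(measured, ptf_a)
stats_b = ptf_error_measures(measured, ptf_b)
print(stats_a["rmse"], stats_a["mean_error"])
print(stats_b["rmse"], stats_b["mean_error"])
```

In this constructed case the first PTF ranks better by RMSE and the second ranks better by absolute mean error, so the ranking changes with the measure.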


    Summary and conclusions
 
We developed PTFs to estimate water contents at -33 kPa and -1500 kPa from the 447 samples in the NRCS database of Oklahoma soil pedons. The PTFs were regression equations developed for the whole database and for data groups within the database selected according to soil taxonomy, soil water regime, soil temperature regime, and textural classes. Clay, sand, coarse fragments, organic matter contents, bulk density at -0.33 kPa, and the ratio of CEC to clay content were used as predictor variables. We proposed using the GMDH to develop the PTF regression equations. The GMDH algorithm selected the most significant predictor variables for the PTFs and produced explicit equations so that the relative importance of the selected input variables could be assessed.

We evaluated both PTF accuracy and reliability. To assess PTF reliability, we used cross-validation, or data-splitting. Each data set was split into development and validation subsets. PTF accuracy was quantified by the RMSE of the development data set, whereas PTF reliability was assessed from the RMSE of the validation data set. We repeated this data splitting 10 times for each data set to estimate both average values and standard errors of RMSE. Then we compared the accuracy and the reliability of PTFs in groups to those in the whole data set.
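The data-splitting procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: a one-variable least-squares line stands in for the GMDH equations, the data are synthetic, and the 75% development fraction, the function names, and the random seeds are assumptions made for the example.

```python
import math
import random
import statistics


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x (stand-in for GMDH)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def rmse(x, y, intercept, slope):
    """Root mean square error of the fitted line on the pairs (x, y)."""
    sq = sum((intercept + slope * xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sq / len(x))


def mean_and_se(values):
    """Average and standard error of a list of RMSE values."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


def repeated_split_rmse(x, y, n_splits=10, dev_fraction=0.75, seed=0):
    """Repeat random development/validation splits and summarize RMSE."""
    rng = random.Random(seed)
    indices = list(range(len(x)))
    n_dev = int(round(dev_fraction * len(x)))
    dev_rmse, val_rmse = [], []
    for _ in range(n_splits):
        rng.shuffle(indices)
        dev, val = indices[:n_dev], indices[n_dev:]
        xd, yd = [x[i] for i in dev], [y[i] for i in dev]
        xv, yv = [x[i] for i in val], [y[i] for i in val]
        intercept, slope = fit_line(xd, yd)
        dev_rmse.append(rmse(xd, yd, intercept, slope))  # accuracy
        val_rmse.append(rmse(xv, yv, intercept, slope))  # reliability
    return {"accuracy": mean_and_se(dev_rmse), "reliability": mean_and_se(val_rmse)}


# Synthetic data: water content loosely increasing with clay content (%).
data_rng = random.Random(42)
clay = [data_rng.uniform(5.0, 60.0) for _ in range(80)]
theta = [0.05 + 0.004 * c + data_rng.gauss(0.0, 0.02) for c in clay]

result = repeated_split_rmse(clay, theta)
print("development RMSE (mean, SE):", result["accuracy"])
print("validation RMSE (mean, SE):", result["reliability"])
```

Running the same function on a whole data set and on each group gives the paired averages and standard errors needed to compare accuracy and reliability between them.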

Grouping improved the accuracy of PTFs in most cases, probably because of similarities in PTF relations within groups. None of the grouping criteria could be considered to be the best. Textural grouping and grouping by soil moisture regimes yielded better average {theta}33 accuracy than grouping by soil great groups or by temperature regimes. At the same time, within soil great groups, Argiustolls and Ustochrepts had better average {theta}33 accuracy than textural groups. Some groups with comparatively high accuracy of {theta}33 estimates had comparatively low accuracy of {theta}1500 estimates and vice versa. Since the same soil can be included in several groups formed according to different criteria, we suggest using several grouping criteria and building PTFs for all the groups. When a PTF needs to be used for a particular soil, groups including this soil must be ranked by their PTF accuracy, and the best PTF must be used.
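The selection rule suggested above can be sketched in a few lines. The RMSE values below are hypothetical placeholders, not values from this study, and the function name is an assumption made for the example.

```python
def best_group_ptf(soil_groups, group_rmse):
    """Return the group, among those containing the soil, with the lowest RMSE."""
    candidates = [g for g in soil_groups if g in group_rmse]
    if not candidates:
        raise ValueError("no PTF is available for any group containing this soil")
    return min(candidates, key=lambda g: group_rmse[g])


# Hypothetical development RMSE (m3 m-3) of the PTF built for each group.
group_rmse = {
    "whole database": 0.045,
    "Argiustolls": 0.031,
    "udic": 0.036,
    "sandy loam": 0.028,
}

# A soil can belong to several groups formed by different criteria.
soil_groups = ["whole database", "Argiustolls", "udic"]

chosen = best_group_ptf(soil_groups, group_rmse)
print(chosen)
```

Ranking by development RMSE follows the accuracy-based rule in the text. Since accuracy was not closely related to reliability in this study, the same function could instead be given validation RMSE values when those are available.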

Although PTFs developed in groups were more accurate than PTFs developed from the whole database, they were not more reliable. The average RMSE in validation data sets for the groups did not differ significantly from the average RMSE in validation data sets for the whole database. Our results indicate that PTF accuracy does not have a close relation to PTF reliability, and that a separate investigation is needed to show that data grouping will produce an improvement in reliability of PTFs compared with that for the whole data set. Data splitting, or cross-validation, presents a viable technique for PTF reliability assessment.


    ACKNOWLEDGMENTS
 
This study was a part of the NASA/ARS Southern Great Plains Hydrology Experiment (SGP97), which was conducted in the summer of 1997 and encompassed an area of 10000 km2 in the state of Oklahoma. The work on PTFs was conducted in response to the SGP97 objective to examine the utility of PTFs in connection with large-scale soil moisture mapping. We appreciate the advice of Dr. Christine Evans.


    NOTES
 
1 Currently classified as Haplustepts.

Received for publication February 24, 1998.


    REFERENCES
 




