Document datum02.doc, Ver. 20020725.
Datumshift, errorestimation and grosserror detection when using leastsquares collocation for geoid determination.
By
C.C.Tscherning
Department of Geophysics,
University of Copenhagen
Juliane Maries Vej 30
DK2100 Copenhagen Ø., Denmark
cct@gfy.ku.dk
Prepared for the International School on the
Determination and use of the Geoid.
Draft, July 2002.
Abstract. The use of LeastSquares Collocation for datumshift determination as well as grosserror detection is described. Examples using GEOCOL with data from the New Mexico test area are included.
1. Introduction.
When using leastsquares collocation (LSC) for gravity field modeling, an initial hypothesis is that the model (the approximation to the anomalous potential, ) is associated with a geocentric reference system, and that the zero and first order spherical harmonic coefficients are all zero. The data to be used to determine the model must then also refer to this system, or have to be transformed to the system. A well known example is height anomalies determined as the difference between ellipsoidal heights from GPS and normal heights determined from levelling.
All levelling datum have an inherent error, due to the fact that the zero levels have been fixed by a convention and not through a physical measurement. We will denote the error for a particular system by N_{0}. In geodetic practice this does not cause problems, because height differences are the quantity of interest. (In oceanography, however, the absolute heights are important.)
The ellipsoidal heights may also suffer from errors due to the fact that most positions are determined differentially, i.e. with respect to a set of reference points. If these reference points are in a nongeocentric system such as NAD83 a conversion to a geocentric system must be done, see e.g. (Smith and Milbert, 1999). This conversion is generally given as a 7parameter similarity transformation.
The parameters of this kind of transformation are easily estimated if we have two sets of 3dimensional positions either the Cartesian coordinates or latitude, longitude and ellipsoidal height, using a leastsquares adjustment.
When a purely gravimetric quasigeoid is compared to a surface constructed from height anomalies derived from GPS and levelling one will often note that the surfaces disagree, see e.g. Jiang and Duquenne (1996). Frequently they are related through a bias or tilt. In this case the two surfaces may be brought to a close agreement by estimating and applying the bias and the tilt. But these 3 quantities (height bias, tilt in East and West) corresponds exactly to a 3parameter datum shift using a translation of the center of the reference ellipsoid (D x, D y, D z), see e.g. Torge (2001, p. 319).
A simple adjustment used for the determination of the parameters will in general be sufficient to obtain an agreement between the gravimetric geoid and the height anomalies. One can say that the gravimetric geoid has been used as an interpolator to construct a height reference surface.
However, this procedure does not take into account that the two data types are physically correlated, so that both the gravimetric geoid and the height reference surface may be improved.
Furthermore, a simple 3parameter adjustment does not take into account the spatially varying quality of the gravimetric geoid or the spatial distribution of GPS/leveling derived height anomalies. This is however possible using LSC and we will in the following section describe how LSC can be used to estimate a gravimetric geoid, a corresponding height reference surface and N_{0} or datumshift parameters. The reference surface may be used to convert GPS ellipsoidal heights to normal heights in the used height system. (A further conversion to orthometric heights should then be made if necessary using standard procedures).
In LSC the errors in the data must (should) be taken into account. But in many cases are the errors not known, and the data may include grosserrors. The error estimates of leveled heights are generally only known as the error of the height differences relative to set of higher order bench marks. (See http://www.ngs.noaa.gov/GEOID/GPSonBM99/format99.txt ). Furthermore, the errors are correlated, and the variancecovariance matrix is in general not available. For ellipsoidal heights determined using GPS the error estimates available are also relative errors. These heights are frequently in error due to erroneous identification of the reference point or the antenna height.
LSC filters the data when used for the determination of . We may then as done in geodetic network adjustment inspect the residuals (see e.g. Pope, 1976), by using the model for the computation of the estimates of the data used to determine the model. (We are here taking advantage of the property of LSC, which for errorfree data will give errorfree estimates of these data). This requires that we have unbiased estimates, which we will not have due to the N_{0}problem and the possibly nongeocentric datum. Consequently we have to estimate N_{0} and the datum shift parameters simultaneously with the determination of the model. Another requirement is that we have a statistically homogeneous data distribution. Unfortunately this is often not true, but there are anyway (as described in section 4) possibilities for using the residuals for grosserror detection.
In the following section the theory will be briefly reviewed without proofs. (The reader will find proofs in (Moritz, 1980)). Then the theory is exemplified using the New Mexico test data also used in (Tscherning, 1994), plus GPS/levelling height anomalies from the area obtained from http://www.ngs.noaa.gov/GEOID/GPSonBM99/gpsbm99.html .
2. Parameter and errorestimation using LSC.
We will suppose that our observations y_{i} may be expressed as
where L_{i} is a linear functional, e_{i} is the error, A_{i} a vector of partial derivatives and X the vector of parameters, see (Tscherning, 1994, section 2). If the unknown parameter is equal to N_{0} then A_{i} = 1 for all values of i which are associated with height anomalies and 0.3 mgal/m for gravity anomalies.
If no parameters are present we obtain a LSC approximation to T by requiring the square of the norm of the model plus the variance of the noise to be minimalized. When parameters are present we also require the square of the norm of the parameter vector to be minimalised.
The covariance between two quantities will be denoted COV( L_{i}, L_{j} )= C_{ij}. If one of the functionals are the evaluation of T in a point P, we write COV(P, L_{i} )=C_{Pi}. The norm of a functional is denoted , the variancecovariance of the noise and , a n x n matrix, where n is the number of observations.
Then an estimate of T and of the parameters X are obtained as
where W is the apriori weight matrix for the parameters (Generally the zero matrix). The associated error estimates are with
the mean square error of the parameter vector
and the mean square error of an estimated quantity
.
These quantities look as being very complicated to evaluate, but a simple modification of Choleskys algorithm where the matrices A and P are used as extensions
solves the problem. (If observations such as differences between coordinates in a local datum and in a geocentric datum are used, as slight modification of the equations are necessary).
3. Datum shift and height system bias (N_{0}) estimation.
The program GEOCOL (Tscherning, 1974, 1976) may be used to estimate a gravity field model and one or more of the components of a 7parameter similarity transformation datumshift or of a datum shift given as the change in geoid height (or height anomaly) and deflections of the vertical in a datum origin. In case the parameters are (D x, D y, D z) we have (in spherical approximation)
where j is the latitude and l the longitude. Also a simple height system bias, N_{0} may be determined.
Here we should keep in mind, that the determination of a datumshift requires data which covers a large area, see Fig. 1. However, if the area is not large, this will be reflected in large error estimates .
This will be illustrated using 2920 gravity data and 20 height anomalies from the New Mexico White Sands test area. The height anomalies were obtained from ellipsoidal heights, h, determined by GPS and orthometric heights, H, in the North Americal Vertical Datum 1988 (NAVD88) converted to normal heights, N^{*}. Ellipsoidal heights were available both in the continental North American Datum 1983 (NAD83) and the global ITRF94 datum.
Initially the contribution from the EGM96 spherical harmonic model (Lemoine et al., 1998) complete to degree 360 were subtracted from both datatypes. The RTM effect (Torge, 2001, p. 288) was also subtracted, see Table 1.
TABEL 1. Smooting obtained by subtracting EGM96 and the residual topographic effects.
Height anomalies, NAD83 (20 values) 
Height anomalies ITRF94 (20 values) 
Gravity anomalies (2920 values) 
Original 
minusEGM96 
minusEGMTC 
Original 
minusEGM96 
minusEGMTC 
Original 
minusEGM96 
minusEGMTC 

m 
m 
m 
m 
m 
m 
mgal 
mgal 
mgal 

Mean 
23.22 
1.09 
1.24 
24.27 
0.04 
0.20 
9.2 
2.9 
0.3 
S.dev. 
1.08 
0.16 
0.13 
1.08 
0.16 
0.13 
30.4 
21.3 
13.2 
Min 
23.99 
0.72 
1.00 
25.06 
0.33 
0.05 
58.7 
74.8 
41.0 
Max 
19.88 
1.36 
1.49 
20.92 
0.30 
0.45 
162.5 
78.0 
45.7 
(It is preferable to use residual data because the spatial statistical distribution of the data becomes more homogeneous. The physical correlation between the residual height anomalies and gravity anomalies will also increase.)
A model was estimated from the reduced gravity data and used for the estimation of the height anomalies. The model was used to predict the residual height anomalies, see Table 2.
TABLE 2. Results of predicting height anomalies in NAD83 and in ITRF94 without datum corrections.
Differences observedpredicted (m) 
Height anomalies in ITRF94 
Height anomalies in NAD83 
mean 
0.26 
1.30 
Standard deviation 
0.06 
0.06 
Min 
0.19 
1.23 
max 
0.37 
1.41 
We can see that there is an excellent agreement between the height anomaly differences in all cases. This led us to adopt a mean error of 0.02 m for the height anomalies converted to an absolute zero origin. But there are considerable biases.
The 20 residual height anomalies were then added to the gravity anomalies and a new model was determined together with a 3parameter translational datumshift or a height bias shift, respectively, see Table 3.
TABLE 3. Result of determination of datumshift and of height bias N_{0 } from 2992 gravity anomalies and 20 height anomalies in NAD83 and in ITRF94.
Data 
NAD83 
ITRF94 
From 

Units: m 
Value 
Error estimate 
Value 
Error estimate 
Smith, Milbert 
N_{0 }only 
1.29 
0.03 
0.26 
0.03 
0.31 
D x 
5.94 
1.82 
4.84 
1.82 
0.97 
D y 
1.08 
0.91 
0.81 
0.92 
1.94 
D z 
3.36 
1.17 
3.76 
1.18 
0.55 
It is comforting to see the large errorestimates. We can not expect better results using data from such a small area. Formally the error of the estimated quasigeoid was improved from 0.07 m to 0.02 m when only N_{0} was determined, see Fig. 2 and 3, but it should be interpreted as the improvement of the height anomaly differences. The value of N_{0} for ITRF94 is in good agreement with the value found by Smith and Milbert(1999, Table 3) for the NAVD88 bias.
4. Verification of the errorestimates of the data and grosserror search.
The detection of possible grosserrors using the differences between observed values and values calculated using LSC has been sucesfully applied on a number of data types, see Tscherning, (1990, 1991), Albertella et al.(2000).
Using GEOCOL we may compute the estimates of the data used as input as well as the errorestimates. These aposteriori errorestimates will generally be smaller than the apriori ("input") error estimates. The estimate of the error of the difference will then be the Gaussian sum of the apriori error and the aposteriori error.
LSC will perform a filtering, so that if closely correlated data (typically data located close to the point with which the value is associated) differ from a value of a specific measurement, a residual larger than the error estimate of the difference will occur. If this is the case, we have found a "suspected gross error". A threshold may be defined, equal to e.g. 3 times the error of the difference. Now different factors may play a role: we may have under or overestimated the apriori error.
If the original data (supposed to be distributed in a regular manner in space) follow a normal distribution (may be verified from a histogram), then the ratio between the absolute value of the difference observed minus predicted and the aposteriori error estimate should follow a tdistribution. If it does not do this, something is wrong. GEOCOL will as output bin the residuals divided by the aposteriori error in 50 % intervals. The percentages of values in each bin may then be used as an indicator of the quality of the apriori errorestimate.
Unfortunately the errorestimates does not take into regard that the error will be small in areas with a smooth gravity field and large where the gravity field varies much.
The inspection of residuals is therefore quite problematic, and should only be used with great caution. Only residuals 5 or more times larger than the error of the difference should be regarded as grosserrors. These values should not necessarily be deleted, but they should have assigned a much larger apriori error, and the model should be determined again.
The procedure described above was used on the New Mexico residual freeair gravity anomalies. The result is summarized in Table 4.
TABLE 4. Number of suspected grosserrors as a function of apriori data noise standard deviation and percentage of residuals divided by the aposteriori error in 0.5 bins. Suspected gross errors are defined as measurements where the residual is 4.0 times larger than the aposteriori error (mgal).
Interval of ratio of absolute value of residual divided by the aposteriori error estimate.
Error 
Numb. 
00.5 
0.51.0 
1.01.5 
1.52.0 
2.02.5 
2.53.0 
3.03.5 
3.5 
0.2 
649 
17 
15 
12 
10 
8 
6 
5 
27 
0.4 
321 
22 
19 
14 
10 
8 
6 
5 
15 
1.0 
56 
34 
24 
16 
10 
7 
3 
2 
3 
It seem like the residuals for apriorierror 1.0 mgal is most close to a tdistribution. The error estimates of the height anomalies did also increase from 0.02 m to 0.05 m, which seems realistic. There are however too many (56) suspected grosserrors. One would normally expect below 1 %. The values of the differences observed minus predicted which are more than 6 mgal are found in Table 5.
Table 5. Suspected grosserrors with residuals larger than 6 mgal.
Number 
Latitude 
Longitude 
Height 
Reduced value 
apriori error 
Obspred 
Error of differences 
deg. 
deg. 
m 
mgal 
mgal 
mgal 
mgal 
16493 33.19160 105.23660 1791.0 13.94 1.00 6.41 1.17
15448 32.96660 106.55830 1316.0 38.24 1.00 6.59 1.15
15446 32.93830 106.54830 1208.0 17.90 1.00 7.90 1.22
15759 32.89330 107.52490 1621.0 16.37 1.00 6.66 1.12
15584 32.81110 107.27280 1294.0 3.52 1.00 6.01 1.14
5153 32.78490 105.62160 2328.0 22.31 1.00 7.51 1.12
15291 32.75660 106.39660 1191.0 6.98 1.00 12.96 1.23
15290 32.73990 106.39490 1034.0 42.07 1.00 14.17 1.20
15275 32.69330 106.45660 1201.0 22.66 1.00 7.06 1.26
15274 32.68990 106.39490 1025.0 41.06 1.00 9.87 1.23
15257 32.65490 106.00160 1220.0 25.18 1.00 7.21 1.15
15699 32.62160 107.71660 1662.0 18.54 1.00 10.29 1.20
5681 32.57660 107.70160 1817.0 15.63 1.00 8.95 1.14
3777 32.37330 106.60490 1522.0 16.22 1.00 7.77 1.14
3751 32.32330 106.62990 1497.0 10.01 1.00 6.81 1.11
13616 32.01990 106.60160 1165.0 5.63 1.00 8.77 1.12
2395 31.94160 106.52330 1480.0 15.93 1.00 6.16 1.12
_________________________________________________________________________________
All the station numbers above 10000 (12 out of 15) correspond to stations which are located inside the masses when using the RTM method. It seems that the method to correct for this introduces an error, which is disturbingly large. This means that we have been able to identify grosserrors, and even find a likely explanation for their occurrence.
5. Conclusion.
LSC may be used for datum shift determination or height system bias estimation by combining gravity data and height anomalies. However, in order to determine reliable estimates of a datumshift a large area is needed. This limits the applicability of the method.
It is furthermore required that we know the error standard deviations of the data and that grosserrors have been deleted or downweighted. The validity of the apriori error estimates as well as the occurrence of outliers may be investigated by comparing the aposteriori determined differences between observed and predicted values with the error estimate of the difference obtained using LSC.
6. References.
Albertella A., F.Migliaccio, F.Sanso' and C.C.Tscherning: Scientific Data Production Quality Assessment Using Local Spacewise Preprocessing. H.Suenkel (ED.) Eoetvos to mGal, Final report, pp. 313329, April 2000.
Jiang, Z. and H.Duquenne: On the combined adjustment of a gravimetrically determined geoid and GPS leveling stations., J. of Geodesy, Vol. 70, pp. 505514, 1996.
Pope, A.J.: The statistics of residuals and the detection of outliers. NOAA TR NOS NGS 30, NOS/NOAA, Rockville, Md., 1976.
Lemoine, F.G., S.C. Kenyon, J.K. Factor, R.G. Trimmer, N.K. Pavlis, D.S. Chinn, C.M. Cox, S.M. Klosko, S.B. Luthcke, M.H. Torrence, Y.M. Wang, R.G. Williamson, E.C. Pavlis, R.H. Rapp, and T.R. Olson, The Development of the Joint NASA GSFC and the National Imagery and Mapping Agency (NIMA) Geopotential Model EGM96, NASA/TP1998206861, Goddard Space Flight Center, Greenbelt, MD, July, 1998.
Moritz, H.: Advanced Physical Geodesy. H.Wichmann Verlag, Karlsruhe, 1980.
Smith, D.A. and D.G.Milbert: The GEOID96 high resolution geoid height model for the United States. J. of Geodesy, Vol. 73, pp. 219  236, 1999.
Torge, W.: Geodesy, 3. Ed., 2001.
Tscherning, C.C.: A FORTRAN IV Program for the Determination of the Anomalous Potential Using Stepwise Least Squares Collocation. Reports of the Department of Geodetic Science No. 212, The Ohio State University, Columbus, Ohio, 1974.
Tscherning, C.C.: Determination of DatumShift Parameters using Least Squares Collocation. Bollettino di Geodesia e Scienze Affini, Vol. XXXV, No. 2, pp. 173183, 1976.
Tscherning, C.C.: Local Approximation of the Gravity Potential by Least Squares Collocation. In: K.P.Schwarz (Ed.): Proceedings of the International Summer School on Local Gravity Field Approximation, Beijing, China, Aug. 21  Sept. 4, 1984. Publ. 60003, Univ. of Calgary, Calgary, Canada, pp. 277362, 1985.
Tscherning, C.C.: A strategy for grosserror detection in satellite altimeter data applied in the BalticSea area for enhanced geoid and gravity determination. Proc. 11th General meeting Nordic Geodetic Commission, Copenhagen, May 1990, pp. 90106, Korto
g Matrikelstyrelsen, Copenhagen, 1990.
Tscherning, C.C.: The use of optimal estimation for grosserror detection in databases of spatially correlated data. Bulletin d'Information, no. 68, pp.7989, Bureau Gravimetrique International, 1991.
Tscherning, C.C.: Geoid determination by leastsquares collocation using GRAVSOFT. Lecture Notes "Int. School of the Determination and Use of the Geoid", Milano, Oct., 1994, pp. 135  164, published by International Geoid Service, 1994.