Large Sample Properties

In empirical work, researchers typically use estimators of parameters, test statistics, or predictors to learn about a given feature of an underlying model; these estimators are functions of random variables, and as such are themselves random variables. Data are used to obtain estimates, which are realizations of the corresponding estimators (that is, of random variables). Ordinarily, the researcher has available only a single sample of n observations and obtains a single estimate based on this sample; the researcher then wishes to make inferences about the underlying feature of interest. Inference involves the estimation of a confidence interval, a p-value, or a prediction interval, and it requires knowledge about the sampling distribution of the estimator that has been used.

In a small number of cases, exact distributions of estimators can be derived for a given sample size n. For example, in the classical linear regression model, if errors are assumed to be identically, independently, and normally distributed, ordinary least squares estimators of the intercept and slope parameters can be shown to be normally distributed with variance that depends on the variance of the error terms, which can be estimated by the sample variance of the estimated residuals. In most cases, however, exact results for the sampling distributions of estimators with a finite sample are unavailable; examples include maximum likelihood estimators and most nonparametric estimators.
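As a concrete illustration of such an exact finite-sample result, using standard notation that does not appear in the original entry, consider the simple regression model y_i = α + β x_i + ε_i with the errors ε_i independently N(0, σ²) and the x_i treated as fixed. The ordinary least squares slope estimator is then exactly normally distributed in every finite sample,

\hat{\beta} \sim N\!\left( \beta, \; \sigma^2 \Big/ \sum_{i=1}^{n} (x_i - \bar{x})^2 \right),

and replacing σ² with the sample variance of the estimated residuals, s² = (n − 2)^{-1} Σ_i ε̂_i², gives the familiar exact t-based confidence intervals.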

Large sample, or asymptotic, properties of estimators often provide useful approximations of sampling distributions of estimators that can be reliably used for inference-making purposes. Consider an estimator θ̂_n of some quantity θ. The subscript n denotes the fact that θ̂_n is a function of the n random variables Y_1, ..., Y_n; this suggests an infinite sequence of estimators for n = 1, 2, ..., each based on a different sample size. The large sample properties of an estimator θ̂_n determine the limiting behavior of the sequence {θ̂_n : n = 1, 2, ...} as n goes to infinity, denoted n → ∞. Although the distribution of θ̂_n may be unknown for finite n, it is often possible to derive the limiting distribution of θ̂_n as n → ∞. The limiting distribution can then be used as an approximation to the distribution of θ̂_n when n is finite in order to estimate, for example, confidence intervals. The practical usefulness of this approach depends on how closely the limiting, asymptotic distribution of θ̂_n approximates the finite-sample distribution of the estimator for a given, finite sample size n. This depends, in part, on the rate at which the distribution of θ̂_n converges to the limiting distribution, which is related to the rate at which θ̂_n converges to θ.

CONSISTENCY

The most fundamental property that an estimator might possess is that of consistency. If an estimator is consistent, then more data will be informative; but if an estimator is inconsistent, then in general even an arbitrarily large amount of data will offer no guarantee of obtaining an estimate close to the unknown θ. Lacking consistency, there is little reason to consider what other properties the estimator might have, nor is there typically any reason to use such an estimator.

An estimator θ̂_n of θ is said to be weakly consistent if the estimator converges in probability to θ, denoted

\hat{\theta}_n \rightarrow_p \theta.

This occurs whenever

\lim_{n \to \infty} P\left( |\hat{\theta}_n - \theta| < \varepsilon \right) = 1

for any ε > 0. Other, stronger types of consistency have also been defined, as outlined by Robert J. Serfling in Approximation Theorems of Mathematical Statistics (1980). Convergence in probability means that, for any arbitrarily small (but strictly positive) ε, the probability of obtaining an estimate different from θ by more than ε in either direction tends to 0 as n → ∞.
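The behavior described here can be checked directly by simulation. The following sketch is not from the original article; it assumes, purely for illustration, data drawn from an exponential population with mean 5 and uses the sample mean as the estimator, showing that the probability of landing within a fixed ε of the true mean rises toward 1 as n grows.

    import numpy as np

    rng = np.random.default_rng(0)
    mu, eps, reps = 5.0, 0.25, 2_000   # true mean, tolerance, Monte Carlo replications

    for n in (10, 100, 1_000, 10_000):
        # draw `reps` samples of size n from an exponential population with mean mu
        samples = rng.exponential(scale=mu, size=(reps, n))
        estimates = samples.mean(axis=1)                   # sample mean in each replication
        coverage = np.mean(np.abs(estimates - mu) < eps)   # fraction of estimates within eps of mu
        print(f"n = {n:6d}   P(|estimate - mu| < {eps}) is approximately {coverage:.3f}")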

Note that weak consistency does not mean that it is impossible to obtain an estimate very different from θ using a consistent estimator with a very large sample size. Rather, consistency is an asymptotic, large sample property; it only describes what happens in the limit. Although consistency is a fundamental property, it is also a minimal property in this sense. Depending on the rate, or speed, with which θ̂_n converges to θ, a particular sample size may or may not offer much hope of obtaining an accurate, useful estimate.

A sequence of random variables {θ̂_n : n = 1, 2, ...} with distribution functions F_n is said to converge in distribution to a random variable θ̂ with distribution function F if, for any ε > 0, there exists an integer n_0 = n_0(ε) such that, at every point of continuity t of F, |F_n(t) − F(t)| < ε for all n ≥ n_0. Convergence in probability implies convergence in distribution, denoted θ̂_n →_d θ̂.

Often, weakly consistent estimators that can be written as scaled sums of random variables have distributions that converge to a normal distribution. The Lindeberg-Lévy central limit theorem establishes such a result for the sample mean: If Y_1, Y_2, ..., Y_n are independent draws from a population with mean µ and finite variance σ², then the sample mean

\bar{Y}_n = n^{-1} \sum_{i=1}^{n} Y_i

may be used to estimate µ, and

n^{1/2} (\bar{Y}_n - \mu) / \sigma \rightarrow_d N(0, 1).

The factor n^{1/2} is the rate of convergence of the sample mean, and it serves to scale the left-hand side of the above expression so that its limiting distribution, as n → ∞, is stable; in this instance, the limiting distribution is the standard normal. This result allows one to make inferences about the population mean µ even when the distribution from which the data are drawn is unknown, by taking critical values from the standard normal distribution rather than from the often unknown, finite-sample distribution F_n.
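The practical use of this result can be illustrated with a short simulation, again a sketch rather than anything from the original entry, with an exponential population chosen arbitrarily: intervals built from standard normal critical values attain roughly their nominal coverage even though the data are strongly skewed.

    import numpy as np

    rng = np.random.default_rng(1)
    mu, n, reps = 5.0, 200, 10_000   # true mean, sample size, Monte Carlo replications
    z = 1.96                         # standard normal critical value for 95% coverage

    covered = 0
    for _ in range(reps):
        y = rng.exponential(scale=mu, size=n)    # skewed, non-normal population
        ybar, s = y.mean(), y.std(ddof=1)
        half_width = z * s / np.sqrt(n)          # normal approximation to the sampling distribution
        covered += (ybar - half_width <= mu <= ybar + half_width)

    print(f"empirical coverage of the nominal 95% interval: {covered / reps:.3f}")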

Standard, parametric estimation problems typically yield estimators that converge in probability at the rate n^{1/2}. This provides a familiar benchmark for gauging the convergence rates of other estimators. The fact that the sample mean converges at rate n^{1/2} means that fewer observations will typically be needed to obtain statistically meaningful results than would be the case if the convergence rate were slower. However, the quality of the approximation of the finite-sample distribution of a sample mean by the standard normal is determined by features such as the skewness or kurtosis of the distribution from which the data are drawn. In fact, the finite-sample distribution function F_n (or the corresponding density or characteristic function) of the sample mean can be written as an asymptotic expansion, revealing how features of the data distribution affect the quality of the normal approximation suggested by the central limit theorem. The best-known of these expansions is the Edgeworth expansion, which yields an expansion of F_n in terms of powers of n^{-1/2} and higher moments of the distribution of the data. Among those who explain these principles in detail are Harald Cramér in Biometrika (1972), Ole E. Barndorff-Nielsen and David Roxbee Cox in Inference and Asymptotics (1994), and Pranab K. Sen and Julio M. Singer in Large Sample Methods in Statistics: An Introduction with Applications (1993).
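For reference, the leading term of this expansion can be written explicitly; the display below uses standard notation that does not appear in the original entry. Under suitable regularity conditions, the distribution function of the standardized sample mean satisfies

F_n(x) = \Phi(x) - \phi(x) \, \frac{\gamma_1 (x^2 - 1)}{6 \, n^{1/2}} + O(n^{-1}),

where Φ and φ denote the standard normal distribution and density functions and γ₁ = E[(Y − µ)³]/σ³ is the skewness of the data distribution. The n^{-1/2} correction term vanishes for symmetric populations, which is one reason the normal approximation tends to be more accurate for symmetric data.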

Many nonparametric estimators converge at rates slower than n^{1/2}. For example, the Nadaraya-Watson kernel estimator (Nadaraya 1964; Watson 1964) and the local linear estimator (Fan and Gijbels 1996) of the conditional mean function converge at rate n^{1/(4+d)}, where d is the number of unique explanatory variables (not including interaction terms); hence, even with only one right-hand-side variable, these estimators converge at a much slower rate, n^{1/5}, than typical parametric estimators. Moreover, the rate of convergence becomes slower as the dimensionality increases, a phenomenon often called the curse of dimensionality. Another example is provided by data envelopment analysis (DEA) estimators of technical efficiency; under certain assumptions, including variable returns to scale, these estimators converge at rate n^{2/(1+d)}, where d is the number of inputs plus the number of outputs. Léopold Simar and Paul W. Wilson discuss this principle in the Journal of Productivity Analysis (2000).
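A minimal sketch of the Nadaraya-Watson estimator with a Gaussian kernel appears below; it is an illustration only, with the regression function, sample size, and bandwidth h chosen arbitrarily rather than taken from any source.

    import numpy as np

    def nadaraya_watson(x_grid, x, y, h):
        """Kernel regression estimate of E[y | x] at each point of x_grid."""
        # Gaussian kernel weights: one row per evaluation point, one column per observation
        u = (x_grid[:, None] - x[None, :]) / h
        w = np.exp(-0.5 * u ** 2)
        return (w * y).sum(axis=1) / w.sum(axis=1)

    rng = np.random.default_rng(2)
    n = 200
    x = rng.uniform(0.0, 2.0 * np.pi, size=n)
    y = np.sin(x) + rng.normal(scale=0.3, size=n)   # true regression function is sin(x)

    grid = np.linspace(0.5, 2.0 * np.pi - 0.5, 20)
    fitted = nadaraya_watson(grid, x, y, h=0.4)     # ad hoc bandwidth, for illustration only
    print(np.round(fitted - np.sin(grid), 2))       # estimation error at the grid points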

The practical implications of a convergence rate slower than n^{1/2} can be seen by asking how much data would be needed to achieve the same stochastic order of estimation error that a parametric estimator converging at rate n^{1/2} achieves with a given amount of data. For example, consider a bivariate regression problem with n = 20 observations. Using a nonparametric kernel estimator or a local linear estimator, one would need m observations to attain the same stochastic order of estimation error that would be achieved with parametric, ordinary least squares regression; setting m^{1/5} = 20^{1/2} yields m ≈ 1,789.
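The arithmetic behind this comparison can be reproduced directly; the snippet below simply solves the rate equation stated in the text.

    # equate orders of estimation error: m**(1/5) = n**(1/2), with n = 20 parametric observations
    n_parametric = 20
    m_nonparametric = n_parametric ** (0.5 / (1.0 / 5.0))   # equivalently 20**2.5
    print(round(m_nonparametric))                           # prints 1789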

The large sample properties of parametric and nonparametric estimators present an interesting trade-off. Parametric estimators offer fast convergence, so meaningful estimates can be obtained with smaller amounts of data than nonparametric estimators with slower convergence rates would require. But this advantage holds only if the parametric model being estimated is correctly specified; if it is not, there is specification error, raising the question of whether the parametric estimator is even consistent. Nonparametric estimators, on the other hand, largely avoid the risk of specification error, but often at the cost of slower convergence rates and hence larger data requirements. The convergence rate achieved by a particular estimator determines what might reasonably be considered a large sample, and whether meaningful estimates can be obtained from a given amount of data.

CENTRAL LIMIT THEOREM

Aris Spanos, in his book Probability Theory and Statistical Inference: Econometric Modeling with Observational Data (1999, pp. 464-465), lists several popular misconceptions concerning the large sample properties of estimators. It is sometimes claimed that the central limit theorem ensures that various distributions converge to a normal distribution in cases where they do not. The Lindeberg-Lévy central limit theorem concerns one particular scaled sum of random variables, and only under certain restrictions (e.g., finite variance). Other scaled summations may have different limiting distributions. Spanos notes that there is a central limit theorem for every member of the Lévy-Khintchine family of distributions, which includes not only the normal, Poisson, and Cauchy distributions but also a set of infinitely divisible distributions. In addition, continuous functions of scaled summations of random variables converge to several well-known distributions, including the chi-square distribution in the case of quadratic functions.
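One concrete case, sketched below under the assumption of standard Cauchy data, makes the point: the sample mean of independent Cauchy draws is itself Cauchy-distributed for every n, so the spread of the estimator never shrinks, and no normal limit emerges no matter how large the sample becomes.

    import numpy as np

    rng = np.random.default_rng(3)
    reps = 2_000   # Monte Carlo replications

    for n in (10, 100, 10_000):
        means = rng.standard_cauchy(size=(reps, n)).mean(axis=1)
        # for a finite-variance population this spread would shrink like n**(-1/2);
        # for Cauchy data the interquartile range of the sample mean stays roughly constant
        q25, q75 = np.percentile(means, [25, 75])
        print(f"n = {n:6d}   IQR of the sample mean = {q75 - q25:.2f}")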

SEE ALSO Central Limit Theorem; Demography; Maximum Likelihood Regression; Nonparametric Estimation; Sampling

BIBLIOGRAPHY

Barndorff-Nielsen, Ole E., and David Roxbee Cox. 1989. Asymptotic Techniques for Use in Statistics. London: Chapman and Hall.

Barndorff-Nielsen, Ole E., and David Roxbee Cox. 1994. Inference and Asymptotics. London: Chapman and Hall.

Cramér, Harald. 1972. On the History of Certain Expansions Used in Mathematical Statistics. Biometrika 59 (1): 205-207.

Fan, Jianqing, and Irène Gijbels. 1996. Local Polynomial Modelling and Its Applications. London: Chapman and Hall.

Nadaraya, E. A. 1964. On Estimating Regression. Theory of Probability and Its Applications 10: 186-190.

Sen, Pranab K., and Julio M. Singer. 1993. Large Sample Methods in Statistics: An Introduction with Applications. New York: Chapman and Hall.

Serfling, Robert J. 1980. Approximation Theorems of Mathematical Statistics. New York: Wiley.

Simar, Léopold, and Paul W. Wilson. 2000. Statistical Inference in Nonparametric Frontier Models: The State of the Art. Journal of Productivity Analysis 13 (1): 49-78.

Spanos, Aris. 1999. Probability Theory and Statistical Inference: Econometric Modeling with Observational Data. Cambridge, U.K.: Cambridge University Press.

Watson, G. S. 1964. Smooth Regression Analysis. Sankhya, series A, 26: 359-372.

Paul W. Wilson
