In probability theory and statistics, the central limit theorems, abbreviated as CLT,[1][2] are theorems about the limiting behavior of aggregated probability distributions. They say that, given a large number of independent random variables, their sum will follow a stable distribution. If the random variables have finite variance, the result is a Gaussian distribution; this is one reason why this distribution is also known as the normal distribution.
The best known and most important of these is simply called the central limit theorem. It concerns large numbers of random variables with the same distribution, each with the same finite expected value and variance.
More specifically, if X_1, X_2, …, X_n are n independent and identically distributed random variables with mean μ and standard deviation σ, then the distribution of their sample mean, X̄ = (X_1 + ⋯ + X_n)/n, becomes approximately normal as n gets large, with mean μ and standard deviation σ/√n.[3] Furthermore, the distribution of their sum, S_n = X_1 + ⋯ + X_n, also becomes approximately normal as n gets large, with mean nμ and standard deviation σ√n.[2]
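The statement about the sample mean can be checked by simulation. The sketch below (all names are mine, not from the article) draws many samples from a Uniform(0, 1) distribution, whose mean is 1/2 and whose standard deviation is √(1/12), and compares the spread of the resulting sample means with the σ/√n predicted by the theorem:

```python
import random
import statistics

# Illustrative simulation: the sample means of Uniform(0, 1) draws
# should cluster around mu = 0.5 with spread close to sigma / sqrt(n).
random.seed(42)

n = 100          # size of each sample
trials = 20000   # number of sample means to collect

sample_means = [
    statistics.fmean(random.random() for _ in range(n))
    for _ in range(trials)
]

mu = 0.5
sigma = (1 / 12) ** 0.5          # standard deviation of Uniform(0, 1)
predicted_sd = sigma / n ** 0.5  # CLT prediction for the sample mean

observed_mean = statistics.fmean(sample_means)
observed_sd = statistics.stdev(sample_means)

print(round(observed_mean, 3), round(predicted_sd, 4), round(observed_sd, 4))
```

With these settings the observed standard deviation of the sample means lands within a few percent of the predicted σ/√n ≈ 0.0289, and a histogram of `sample_means` would look bell-shaped even though each underlying draw is uniform.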
There are various generalisations of this theorem. Some of them no longer require all random variables to have the same distribution. In these generalisations, another precondition ensures that no single random variable has a larger influence on the outcome than the others; examples are the Lindeberg and Lyapunov conditions.
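The effect of such generalisations can also be illustrated by simulation. The sketch below (my own construction, not from the article) sums independent but not identically distributed variables, uniforms on intervals of different widths. Because every summand is bounded, conditions of the Lindeberg/Lyapunov type hold, and the standardized sum should still be close to normal, so about 68.3% of simulated sums should fall within one standard deviation of the mean:

```python
import random

# Independent but NOT identically distributed summands: uniforms on
# intervals of varying widths. Uniform(0, w) has mean w/2 and
# variance w^2 / 12, so the sum's mean and variance are exact.
random.seed(7)

widths = [1, 2, 3] * 50   # 150 uniform variables with varying widths
trials = 20000

mu = sum(w / 2 for w in widths)
var = sum(w * w / 12 for w in widths)
sd = var ** 0.5

def one_sum():
    return sum(random.uniform(0, w) for w in widths)

# Count how often the standardized sum falls within one standard
# deviation; for a normal distribution this happens about 68.3% of
# the time.
inside = sum(abs((one_sum() - mu) / sd) < 1 for _ in range(trials))
fraction = inside / trials
print(round(fraction, 3))
```

The observed fraction comes out very close to the normal value 0.683, even though no single summand is normally distributed.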
The name of the theorem goes back to a paper George Pólya wrote in 1920, About the Central Limit Theorem in Probability Theory and the Moment Problem.[4]
Related pages
References