Glossary of probability and statistics

C

causal study

A statistical study in which the objective is to measure the effect of some variable on the outcome of a different variable. For example, how will my headache feel if I take aspirin, versus if I do not take aspirin? Causal studies may be either experimental or observational.^[1]

central limit theorem

chi-squared distribution

chi-squared test

concomitants

In a statistical study, concomitants are any variables whose values are unaffected by treatments, such as a unit’s age, gender, and cholesterol level before starting a diet (treatment).^[1]

conditional distribution

Given two jointly distributed random variables X and Y, the conditional probability distribution of Y given X (written "Y | X") is the probability distribution of Y when X is known to be a particular value.

conditional probability

The probability of some event A, assuming that event B has occurred. Conditional probability is written P(A|B), and is read "the probability of A, given B".
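The following minimal sketch computes conditional probabilities by counting equally likely outcomes. It relies on the standard identity P(A|B) = P(A and B) / P(B), which is defined only when P(B) > 0. The card-drawing events (`is_king`, `is_face_card`) are illustrative choices and do not come from the glossary.

```python
from fractions import Fraction
from itertools import product

# A standard 52-card deck represented as (rank, suit) pairs.
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]
DECK = list(product(RANKS, SUITS))


def prob(event):
    """P(event) when one card is drawn uniformly at random; `event` is a predicate on cards."""
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))


def conditional_prob(a, b):
    """P(A | B) = P(A and B) / P(B); only defined when P(B) > 0."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("conditioning event has probability zero")
    return prob(lambda card: a(card) and b(card)) / p_b


def is_king(card):
    return card[0] == "K"


def is_face_card(card):
    return card[0] in ("J", "Q", "K")


print(prob(is_king))                            # 1/13
print(conditional_prob(is_king, is_face_card))  # 1/3
```

Knowing that the card is a face card raises the probability of a king from 1/13 to 1/3.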

confidence interval

In inferential statistics, a confidence interval (CI) is a range of plausible values for the population mean.^[2] For example, based on a study of sleep habits among 100 people, a researcher may estimate that the overall population sleeps somewhere between 5 and 9 hours per night. This is different from the sample mean, which can be measured directly.
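The following sketch computes an approximate confidence interval for a population mean from a single sample. It makes two assumptions. First, it uses the normal approximation, taking the critical value from `statistics.NormalDist`. Second, its data set `sleep_hours` is made up. For a sample as small as these ten values, a Student's t critical value would give a somewhat wider and more appropriate interval. The function name `mean_confidence_interval` is hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(data, level=0.95):
    """Approximate CI for the population mean using the normal approximation.

    Assumes a reasonably large sample; for small samples a Student's t
    critical value would be more appropriate than the normal one used here.
    """
    n = len(data)
    center = mean(data)
    std_error = stdev(data) / math.sqrt(n)     # estimated standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    return center - z * std_error, center + z * std_error


# Hypothetical hours of sleep reported by a small sample of people.
sleep_hours = [6.5, 7.0, 8.0, 5.5, 7.5, 6.0, 9.0, 7.0, 6.5, 8.5]
low, high = mean_confidence_interval(sleep_hours)
print(f"sample mean = {mean(sleep_hours):.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

Raising `level` to 0.99 widens the interval, because a larger share of repeated intervals must capture the mean.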

confidence level

Also known as a confidence coefficient, the confidence level indicates how often the method used to construct a confidence interval (range) captures the true population mean. Technically, a 95 percent confidence level means that, if the experiment were repeated many times, 95 percent of the resulting CIs would contain the true population mean; it does not mean that one particular computed interval has a 95 percent chance of containing it.^[2]
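The repeated-experiment interpretation can be checked by simulation. This sketch makes several assumptions, all chosen only for illustration:

- the population is normal;
- the true mean and standard deviation are hypothetical values;
- each interval uses the normal approximation.

The simulation builds one interval per simulated sample and reports the fraction of intervals that contain the true mean. For a 95 percent level, that fraction should come out near 0.95.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def coverage_rate(true_mean=7.0, true_sd=1.5, n=50, level=0.95, repeats=2000, seed=1):
    """Fraction of repeated experiments whose CI contains the true mean.

    Each experiment draws n values from a normal population (an assumption
    made for illustration) and builds a normal-approximation CI.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        half_width = z * stdev(sample) / math.sqrt(n)
        if abs(mean(sample) - true_mean) <= half_width:
            hits += 1
    return hits / repeats


print(coverage_rate())  # should be close to 0.95
```

Any single interval either contains the true mean or does not. The 95 percent describes the long-run hit rate of the procedure.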

continuous variable

correlation

Also called the correlation coefficient, a numeric measure of the strength of the linear relationship between two random variables (one can use it to quantify, for example, how shoe size and height are correlated in the population). An example is the Pearson product-moment correlation coefficient, which is found by dividing the covariance of the two variables by the product of their standard deviations. Independent variables have a correlation of 0, although a correlation of 0 does not by itself imply independence.
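This minimal sketch implements the Pearson coefficient as described: the covariance divided by the product of the standard deviations. The shoe-size and height values are made up for illustration. The last line shows a case where Y = X² depends entirely on X, yet the correlation is 0, because the relationship is not linear.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return cov / (sx * sy)


# Hypothetical shoe sizes and heights (cm) for five people.
shoe = [7, 8, 9, 10, 11]
height = [160, 168, 171, 180, 185]
print(round(pearson_r(shoe, height), 3))

# Dependent but uncorrelated: y = x**2 on a symmetric set of x values.
print(pearson_r([-1, 0, 1], [1, 0, 1]))  # 0.0
```

The `n - 1` divisors cancel in the ratio, so using `n` instead would give the same coefficient.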

count data

Data arising from counting that can take only non-negative integer values.

covariance

Given two random variables X and Y, with expected values $E(X) = \mu$ and $E(Y) = \nu$, covariance is defined as the expected value of the random variable $(X - \mu)(Y - \nu)$, and is written $\operatorname{cov}(X, Y)$. It is used for measuring correlation.
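For a discrete pair (X, Y) with finitely many values, the definition can be evaluated directly as the expected value of (X - μ)(Y - ν). The following sketch uses exact fractions and a hypothetical `{(x, y): probability}` mapping. The joint distribution is illustrative: X is one fair die and Y is X plus a second, independent fair die. The result equals the variance of a single die, 35/12.

```python
from fractions import Fraction
from itertools import product


def covariance(joint):
    """cov(X, Y) = E[(X - mu)(Y - nu)] for a finite joint pmf {(x, y): probability}."""
    mu = sum(p * x for (x, _), p in joint.items())  # E(X)
    nu = sum(p * y for (_, y), p in joint.items())  # E(Y)
    return sum(p * (x - mu) * (y - nu) for (x, y), p in joint.items())


# Illustrative joint distribution: X is one fair die, Y = X + a second, independent fair die.
sixth = Fraction(1, 6)
joint = {}
for a, b in product(range(1, 7), repeat=2):
    key = (a, a + b)
    joint[key] = joint.get(key, 0) + sixth * sixth

print(covariance(joint))  # 35/12, equal to var(X) because the second die is independent of X
```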

D

data

data analysis

data set

A sample and the associated data points.

data point

A typed measurement; it can be a Boolean value, a real number, a vector (in which case it is also called a data vector), etc.

degrees of freedom

dependent variable

descriptive statistics

E

elementary event

An event with only one element. For example, when pulling a card out of a deck, "getting the jack of spades" is an elementary event, while "getting a king or an ace" is not.

estimator

A function of the known data that is used to estimate an unknown parameter; an estimate is the result of actually applying the function to a particular set of data. For example, the sample mean can be used as an estimator of the population mean.
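The following sketch illustrates the distinction. The functions are estimators: rules that can be applied to any data set. The printed numbers are estimates. The data are simulated from a normal population whose mean, 10, is known only because it was chosen for the simulation. The sample median is included as a second estimator of the same center. The function names are hypothetical.

```python
import random
from statistics import mean, median


# Two estimators of a population's center: rules that can be applied to any data set.
def mean_estimator(data):
    return mean(data)


def median_estimator(data):
    return median(data)


# Simulated data; the true mean (10) is known only because we chose it for the simulation.
rng = random.Random(42)
data = [rng.gauss(10, 2) for _ in range(1000)]

# Applying an estimator to this particular data set yields an estimate.
print(f"mean estimate:   {mean_estimator(data):.3f}")
print(f"median estimate: {median_estimator(data):.3f}")
```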

expected value

The sum of the probability of each possible outcome of the experiment multiplied by its payoff ("value"). Thus, it represents the average amount one "expects" to win per bet if bets with identical odds are repeated many times. For example, the expected value of a six-sided die roll is 3.5. The concept is similar to the mean. The expected value of random variable X is typically written E(X) for the operator and $\mu$ (mu) for the parameter.
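This short sketch applies the definition to a finite distribution, written as a hypothetical `{value: probability}` mapping. Exact fractions confirm the 3.5 figure for a fair die. The second call evaluates an illustrative bet: win 10 with probability 1/4, lose 2 otherwise.

```python
from fractions import Fraction


def expected_value(distribution):
    """Sum of probability * value over all outcomes of a finite distribution {value: probability}."""
    return sum(p * value for value, p in distribution.items())


fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
print(expected_value(fair_die))  # 7/2, i.e. 3.5

bet = {10: Fraction(1, 4), -2: Fraction(3, 4)}
print(expected_value(bet))       # 1
```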

experiment

Any procedure that can be infinitely repeated and has a well-defined set of outcomes.

event

A subset of the sample space (a set of possible outcomes of an experiment), to which a probability can be assigned. For example, on rolling a die, "getting a five or a six" is an event (with a probability of one third if the die is fair).
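Assume all outcomes are equally likely, as with a fair die. Then the probability of an event is the number of outcomes in it divided by the size of the sample space. A minimal sketch, with a hypothetical `probability` helper:

```python
from fractions import Fraction

SAMPLE_SPACE = {1, 2, 3, 4, 5, 6}  # outcomes of one roll of a six-sided die


def probability(event, space=SAMPLE_SPACE):
    """Probability of an event (a subset of the sample space), assuming equally likely outcomes."""
    if not event <= space:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(space))


print(probability({5, 6}))  # 1/3
print(probability({4}))     # 1/6, an elementary event
```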

F

frequency distribution

G

grouped data

H

histogram

I

independent variable

J

joint distribution

Given two random variables X and Y, the joint distribution of X and Y is the probability distribution of X and Y together.

joint probability

The probability of two events occurring together. The joint probability of A and B is written $P(A \cap B)$ or $P(A, B)$.

K

kurtosis

A measure of the "tailedness" of the probability distribution of a real-valued random variable (it is also often described, less precisely, as "peakedness"). Higher kurtosis means more of the variance is due to infrequent extreme deviations, as opposed to frequent modestly sized deviations.
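The following sketch uses the simple moment-based estimator of excess kurtosis, m4 / m2² - 3, which is 0 for a normal distribution. Some software applies a small-sample bias correction, so its numbers may differ slightly. The two made-up data sets contrast frequent modest deviations with rare extreme ones.

```python
def excess_kurtosis(data):
    """Moment-based excess kurtosis m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    return m4 / m2 ** 2 - 3


modest = [-1, 1] * 50            # every value deviates a little from the mean
extreme = [0] * 98 + [-10, 10]   # most values sit at the mean, two are far out

print(excess_kurtosis(modest))   # -2.0
print(excess_kurtosis(extreme))  # 47.0
```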

L

likelihood function

A conditional probability function considered as a function of its second argument, with its first argument held fixed. For example, imagine pulling a ball numbered k from a bag of n balls, numbered 1 to n. Then you could describe a likelihood function for N (the number of balls in the bag) as the probability of getting k given that there are n balls: the likelihood will be 1/n for n greater than or equal to k, and 0 for n smaller than k. Unlike a probability distribution function, this likelihood function does not sum to 1 over the possible values of n.
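This minimal sketch encodes the ball-drawing example, with the observed number k fixed at an illustrative value of 3. It shows three properties: the likelihood is 0 below k, it is largest at n = k, and its values over n sum to more than 1. The name `likelihood` is hypothetical.

```python
from fractions import Fraction


def likelihood(n, k):
    """L(n) = P(draw ball k | bag holds balls 1..n), viewed as a function of n with k fixed."""
    return Fraction(1, n) if n >= k else Fraction(0)


k = 3  # the number observed on the drawn ball (illustrative value)
print([likelihood(n, k) for n in range(1, 7)])

total = sum(likelihood(n, k) for n in range(1, 11))
print(total > 1)  # True: the likelihood does not sum to 1 over n
```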

M

marginal distribution

Given two jointly distributed random variables X and Y, the marginal distribution of X is simply the probability distribution of X ignoring information about Y.
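"Ignoring information about Y" amounts to summing the joint probabilities over every value of Y. The following sketch shows this for a small, entirely hypothetical joint distribution of weather (X) and umbrella use (Y).

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint distribution of X (weather) and Y (whether a person carries an umbrella).
joint = {
    ("rain", "umbrella"): Fraction(3, 10),
    ("rain", "no umbrella"): Fraction(1, 10),
    ("dry", "umbrella"): Fraction(1, 10),
    ("dry", "no umbrella"): Fraction(5, 10),
}


def marginal_of_x(joint):
    """Marginal distribution of X: sum the joint probabilities over every value of Y."""
    marginal = defaultdict(Fraction)
    for (x, _y), p in joint.items():
        marginal[x] += p
    return dict(marginal)


print(marginal_of_x(joint))  # {'rain': Fraction(2, 5), 'dry': Fraction(3, 5)}
```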

marginal probability

The probability of an event, ignoring any information about other events. The marginal probability of A is written P(A). Contrast with conditional probability.

mean

1. The expected value of a random variable.

2. The arithmetic mean is the average of a set of numbers: the sum of the values divided by the number of values.

mode

multimodal distribution

multivariate random variable

A vector whose components are random variables on the same probability space.

mutual exclusivity

mutual independence

A collection of events is mutually independent if, for every finite subset of the collection, the probability that all events in the subset occur is equal to the product of the probabilities of the individual events. For example, think of the results of a series of fair coin flips: the events "heads on flip i" are mutually independent. This is a stronger condition than pairwise independence.
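A classic illustration of why mutual independence is stronger than pairwise independence uses two fair coin flips and three events. A means the first flip is heads, B means the second flip is heads, and C means both flips match. The following sketch checks both conditions by counting equally likely outcomes: every pair passes, but the triple fails.

```python
from fractions import Fraction
from itertools import combinations, product

# Two fair coin flips; every outcome is equally likely.
OUTCOMES = list(product("HT", repeat=2))


def p(event):
    return Fraction(sum(1 for o in OUTCOMES if event(o)), len(OUTCOMES))


def A(o):
    return o[0] == "H"  # first flip is heads


def B(o):
    return o[1] == "H"  # second flip is heads


def C(o):
    return o[0] == o[1]  # both flips match


events = [A, B, C]
pairwise = all(
    p(lambda o: e(o) and f(o)) == p(e) * p(f) for e, f in combinations(events, 2)
)
mutual = p(lambda o: A(o) and B(o) and C(o)) == p(A) * p(B) * p(C)
print(pairwise, mutual)  # True False
```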

N

non-sampling error

normal distribution

null hypothesis

"The statement being tested in a test of statistical significance. Usually the null hypothesis is a statement of 'no effect' or 'no difference'."^[3] For example, if one wanted to test whether light has an effect on sleep, the null hypothesis would be that there is no effect. It is often symbolized as H₀.

O

outlier

P

pairwise independence

A pairwise independent collection of random variables is a set of random variables any two of which are independent.

parameter

Can be a population parameter, a distribution parameter, or an unobserved parameter (with different shades of meaning). In statistics, this is often a quantity to be estimated.

population parameter

See parameter.

posterior probability

The result of a Bayesian analysis that encapsulates the combination of prior beliefs or information with observed data.

prior probability

In Bayesian inference, this represents prior beliefs or other information that is available before new data or observations are taken into account.

probability

probability density

Describes the probability in a continuous probability distribution. For example, one cannot say that the probability of a man being exactly six feet tall is 20% (the probability of any single exact value is zero), but one can say that he has a 20% chance of being between five and six feet tall. Probability density is given by a probability density function. Contrast with probability mass.
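The following sketch models height with `statistics.NormalDist`. The normal model and its parameters (mean 69 inches, standard deviation 3) are assumptions, so the probability it yields is not the 20% figure used in the prose. It illustrates three points:

- probabilities come from areas under the density, computed here as differences of the cumulative distribution function;
- an exact single value has probability zero;
- a density value can exceed 1, so it is not itself a probability.

```python
from statistics import NormalDist

# Hypothetical model: adult male height in inches ~ Normal(mean=69, sd=3).
height = NormalDist(mu=69, sigma=3)


def prob_between(dist, low, high):
    """P(low < X < high) for a continuous distribution: the area under its density."""
    return dist.cdf(high) - dist.cdf(low)


print(round(prob_between(height, 60, 72), 3))  # probability of five to six feet under this model
print(prob_between(height, 72, 72))            # 0.0: a single exact value has probability zero
print(NormalDist(0, 0.1).pdf(0) > 1)           # True: a density value is not itself a probability
```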

probability density function

Gives the probability distribution for a continuous random variable.

probability distribution

A function that gives the probability of all elements in a given space. See List of probability distributions.

probability measure

The function that assigns probabilities to events in a probability space.

probability plot

probability space

A sample space over which a probability measure has been defined.

Q

quantile

quartile

R

random variable

A measurable function on a probability space, often real-valued. The distribution function of a random variable gives the probability of different results. We can also derive the mean and variance of a random variable.

range

The length of the smallest interval that contains all the data (the largest value minus the smallest value).

responses

In a statistical study, any variables whose values may have been affected by the treatments, such as cholesterol levels after following a particular diet for six months.^[1]

S

sample

That part of a population which is actually observed.

sample mean

The arithmetic mean of a sample of values drawn from the population. It is denoted by $\overline{x}$. An example is the average test score of a subset of 10 students from a class. Sample mean is used as an estimator of the population mean, which in this example would be the average test score of all of the students in the class.

sample space

The set of possible outcomes of an experiment. For example, the sample space for rolling a six-sided die is {1, 2, 3, 4, 5, 6}.

sampling

A process of selecting observations to obtain knowledge about a population. There are many methods for choosing which sample to observe.

sampling distribution

The probability distribution, under repeated sampling of the population, of a given statistic.
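The following sketch approximates a sampling distribution by simulation. It draws many samples of size n from a hypothetical normal population (mean 10, standard deviation 2) and records the sample mean of each. The spread of those means should be close to the population standard deviation divided by √n, here 2 / 5 = 0.4.

```python
import random
import statistics


def sampling_distribution_of_mean(population_sd=2.0, n=25, repeats=5000, seed=0):
    """Draw many samples of size n and return the sample mean of each one."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(10.0, population_sd) for _ in range(n))
        for _ in range(repeats)
    ]


means = sampling_distribution_of_mean()
# The spread of the sample means should be close to population_sd / sqrt(n) = 2 / 5 = 0.4.
print(round(statistics.stdev(means), 3))
```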

skewness

A measure of the asymmetry of the probability distribution of a real-valued random variable. Roughly speaking, a distribution has positive skew (right-skewed) if the higher tail is longer and negative skew (left-skewed) if the lower tail is longer (confusing the two is a common error).
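This sketch uses the simple moment-based skewness estimator, m3 / m2^1.5. The sign convention is the point: a longer upper tail gives a positive value. The data sets are made up, and bias-corrected estimators found in some software give slightly different numbers.

```python
def sample_skewness(data):
    """Moment-based skewness m3 / m2**1.5; positive when the upper (right) tail is longer."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


right_tailed = [1, 1, 2, 2, 2, 3, 3, 4, 8, 15]  # a few large values stretch the upper tail
left_tailed = [-x for x in right_tailed]         # mirror image: longer lower tail
print(sample_skewness(right_tailed) > 0, sample_skewness(left_tailed) < 0)  # True True
```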

spaghetti plot

standard deviation

The most commonly used measure of statistical dispersion. It is the square root of the variance, and is generally written $\sigma$ (sigma).

standard error

standard score

statistic

The result of applying a statistical algorithm to a data set. It can also be described as an observable random variable.

statistical graphics

statistical hypothesis testing

statistical independence

Two events are independent if the outcome of one does not affect that of the other (for example, getting a 1 on one die roll does not affect the probability of getting a 1 on a second roll). Similarly, when we assert that two random variables are independent, we intuitively mean that knowing something about the value of one of them does not yield any information about the value of the other.

statistical inference

Inference about a population from a random sample drawn from it or, more generally, about a random process from its observed behavior during a finite period of time.

statistical model

statistical population

A set of entities about which statistical inferences are to be drawn, often based on random sampling. One can also talk about a population of measurements or values.

statistical dispersion

Statistical dispersion (or variability) is a measure of how spread out some data are. It can be expressed by the variance or the standard deviation.

statistical parameter

A parameter that indexes a family of probability distributions.

statistical significance

statistics

stem-and-leaf display

symmetric probability distribution

systematic sampling

T

treatments

Variables in a statistical study that are conceptually manipulable. For example, in a health study, following a certain diet is a treatment whereas age is not.^[1]

trial

Can refer to each individual repetition within an experiment that consists of a fixed number of repetitions. For example, an experiment might consist of n coin tosses, say n = 17. In this case, a single toss can be called a trial to avoid confusion, since the whole experiment is composed of 17 tosses.

U

units

In a statistical study, the objects to which treatments are assigned. For example, in a study examining the effects of smoking cigarettes, the units would be people.^[1]

V

variance

A measure of the statistical dispersion of a random variable, indicating how far from the expected value its values typically are. The variance of random variable X is typically designated as $\operatorname{var}(X)$, $\sigma_X^2$, or simply $\sigma^2$.
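This short sketch applies the definition var(X) = E[(X - μ)²] to a fair die, using a hypothetical `{value: probability}` mapping and exact fractions. It then takes the square root to obtain the standard deviation.

```python
import math
from fractions import Fraction


def variance(distribution):
    """var(X) = E[(X - mu)^2] for a finite distribution {value: probability}."""
    mu = sum(p * x for x, p in distribution.items())
    return sum(p * (x - mu) ** 2 for x, p in distribution.items())


fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
var = variance(fair_die)
print(var)             # 35/12
print(math.sqrt(var))  # standard deviation, about 1.708
```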

Z

References

1. Reiter, Jerome (January 24, 2000). "Using Statistics to Determine Causal Relationships". American Mathematical Monthly. doi:10.2307/2589374.
2. Kalinowski, Pav (April 10, 2010). "Understanding Confidence Intervals (CIs) and Effect Size Estimation". Association for Psychological Science Observer. http://www.psychologicalscience.org/index.php/publications/observer/2010/april-10/understanding-confidence-intervals-cis-and-effect-size-estimation.html
3. Moore, David; McCabe, George (2003). Introduction to the Practice of Statistics (4th ed.). New York: W.H. Freeman and Co. p. 438. ISBN 9780716796572.

External links

"A Glossary of DOE Terminology", NIST/SEMATECH e-Handbook of Statistical Methods, NIST, retrieved 28 February 2009
Statistical glossary, "statistics.com", retrieved 28 February 2009
Probability and Statistics on the Earliest Uses Pages (Univ. of Southampton)


This article is issued from Wikipedia - version of the 11/20/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.
