Variance of the mean and predicted responses


In regression, mean response (or expected response) and predicted response, also known as mean outcome (or expected outcome) and predicted outcome, are values of the dependent variable calculated from the regression parameters and a given value of the independent variable. The values of these two responses are the same, but their calculated variances are different. The concept is a generalization of the distinction between the standard error of the mean and the sample standard deviation.

Background: simple linear regression

In simple linear regression (i.e., straight line fitting with errors only in the y-coordinate), the model is

$$y_i = \alpha + \beta x_i + \varepsilon_i,$$

where $y_i$ is the response variable, $x_i$ is the explanatory variable, $\varepsilon_i$ is the random error, and $\alpha$ and $\beta$ are parameters. The mean (and predicted) response value for a given explanatory value $x_d$ is given by

$$\hat{y}_d = \hat{\alpha} + \hat{\beta} x_d,$$

while the actual response would be

$$y_d = \alpha + \beta x_d + \varepsilon_d.$$

Expressions for the values and variances of $\hat{\alpha}$ and $\hat{\beta}$ are given in linear regression.
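As a concrete illustration, the least-squares estimates $\hat{\alpha}$ and $\hat{\beta}$ and the fitted response at a new point $x_d$ can be computed as below. This is a minimal sketch: the data values and the point $x_d$ are made-up assumptions, not taken from the text.

```python
import numpy as np

# Toy data (illustrative values only, not from the article).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])
m = len(x)

# Ordinary least-squares estimates of the intercept alpha and slope beta.
x_bar, y_bar = x.mean(), y.mean()
S_xx = np.sum((x - x_bar) ** 2)
beta_hat = np.sum((x - x_bar) * (y - y_bar)) / S_xx
alpha_hat = y_bar - beta_hat * x_bar

# Mean/predicted response at a new explanatory value x_d.
x_d = 3.5
y_hat_d = alpha_hat + beta_hat * x_d
```

Note that $\hat{y}_d$ and the predicted response at $x_d$ are the same number; the sections below show that their variances differ.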

Variance

Variance of the mean response

Since the data in this context are defined to be $(x, y)$ pairs for every observation, the mean response at a given value of $x$, say $x_d$, is an estimate of the mean of the $y$ values in the population at the $x$ value of $x_d$, that is, $\hat{E}(y \mid x_d) \equiv \hat{y}_d$. The variance of the mean response is given by

$$\operatorname{Var}\left(\hat{\alpha} + \hat{\beta} x_d\right) = \operatorname{Var}\left(\hat{\alpha}\right) + \operatorname{Var}\left(\hat{\beta}\right) x_d^2 + 2 x_d \operatorname{Cov}\left(\hat{\alpha}, \hat{\beta}\right).$$

This expression can be simplified to

$$\operatorname{Var}\left(\hat{\alpha} + \hat{\beta} x_d\right) = \sigma^2\left(\frac{1}{m} + \frac{\left(x_d - \bar{x}\right)^2}{\sum (x_i - \bar{x})^2}\right),$$

where m is the number of data points.

To demonstrate this simplification, one can make use of the identity

$$\sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{1}{m}\left(\sum x_i\right)^2.$$
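A short numerical sketch of this formula and the identity follows. The data and the error variance $\sigma^2$ are assumed illustrative values, with $\sigma^2$ treated as known for simplicity.

```python
import numpy as np

# Assumed setup: toy data and a known error variance sigma^2 (illustrative).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
m = len(x)
sigma2 = 0.04
x_bar = x.mean()
S_xx = np.sum((x - x_bar) ** 2)

# The identity used in the simplification:
# sum (x_i - x_bar)^2 == sum x_i^2 - (sum x_i)^2 / m
identity_rhs = np.sum(x ** 2) - np.sum(x) ** 2 / m

def var_mean_response(x_d):
    """Var(alpha_hat + beta_hat * x_d) = sigma^2 * (1/m + (x_d - x_bar)^2 / S_xx)."""
    return sigma2 * (1.0 / m + (x_d - x_bar) ** 2 / S_xx)
```

The variance is smallest at $x_d = \bar{x}$, where it reduces to $\sigma^2/m$, and grows quadratically as $x_d$ moves away from $\bar{x}$.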

Variance of the predicted response

The predicted response distribution is the predicted distribution of the residuals at the given point $x_d$. So the variance is given by

$$\begin{aligned}
\operatorname{Var}\left(y_d - \left[\hat{\alpha} + \hat{\beta} x_d\right]\right) &= \operatorname{Var}(y_d) + \operatorname{Var}\left(\hat{\alpha} + \hat{\beta} x_d\right) - 2\operatorname{Cov}\left(y_d, \left[\hat{\alpha} + \hat{\beta} x_d\right]\right) \\
&= \operatorname{Var}(y_d) + \operatorname{Var}\left(\hat{\alpha} + \hat{\beta} x_d\right).
\end{aligned}$$

The second line follows from the fact that $\operatorname{Cov}\left(y_d, \left[\hat{\alpha} + \hat{\beta} x_d\right]\right)$ is zero because the new prediction point is independent of the data used to fit the model. Additionally, the term $\operatorname{Var}\left(\hat{\alpha} + \hat{\beta} x_d\right)$ was calculated earlier for the mean response.

Since $\operatorname{Var}(y_d) = \sigma^2$ (a fixed but unknown parameter that can be estimated), the variance of the predicted response is given by

$$\begin{aligned}
\operatorname{Var}\left(y_d - \left[\hat{\alpha} + \hat{\beta} x_d\right]\right) &= \sigma^2 + \sigma^2\left(\frac{1}{m} + \frac{\left(x_d - \bar{x}\right)^2}{\sum (x_i - \bar{x})^2}\right) \\
&= \sigma^2\left(1 + \frac{1}{m} + \frac{(x_d - \bar{x})^2}{\sum (x_i - \bar{x})^2}\right).
\end{aligned}$$
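The key consequence is that the predicted-response variance exceeds the mean-response variance by exactly $\sigma^2$ at every $x_d$, which the following sketch checks numerically (the data and $\sigma^2$ are assumed illustrative values):

```python
import numpy as np

# Assumed setup (illustrative values, with sigma^2 treated as known).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
m = len(x)
sigma2 = 0.04
x_bar = x.mean()
S_xx = np.sum((x - x_bar) ** 2)

def var_mean_response(x_d):
    # sigma^2 * (1/m + (x_d - x_bar)^2 / S_xx)
    return sigma2 * (1.0 / m + (x_d - x_bar) ** 2 / S_xx)

def var_predicted_response(x_d):
    # Adds the irreducible error variance sigma^2 of the new observation itself.
    return sigma2 * (1.0 + 1.0 / m + (x_d - x_bar) ** 2 / S_xx)
```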

Confidence intervals

The $100(1-\alpha)\%$ confidence intervals are computed as $y_d \pm t_{\frac{\alpha}{2},\, m-n-1}\sqrt{\operatorname{Var}}$, where $\operatorname{Var}$ is the appropriate variance from above. Thus, the confidence interval for the predicted response is wider than the interval for the mean response. This is expected intuitively: the variance of the population of $y$ values does not shrink when one samples from it, because the random error $\varepsilon_i$ does not decrease, but the variance of the mean of the $y$ values does shrink with increased sampling, because the variances of $\hat{\alpha}$ and $\hat{\beta}$ decrease. Hence the mean response (the predicted response value) becomes closer to $\alpha + \beta x_d$.

This is analogous to the difference between the variance of a population and the variance of the sample mean of a population: the variance of a population is a parameter and does not change, but the variance of the sample mean decreases with increased sample size.
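Both intervals can be sketched as follows. Everything here is an assumption for illustration: the data are made up, $\sigma^2$ is estimated from the residuals, and the critical value $t_{0.025,\,4} \approx 2.776$ is read from a standard t-table (with $m = 6$ points and $n = 1$ predictor, the degrees of freedom are $m - n - 1 = 4$; `scipy.stats.t.ppf(0.975, 4)` would give the exact value).

```python
import numpy as np

# Toy data (illustrative values only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])
m = len(x)
t_crit = 2.776  # t_{0.025, 4} from a t-table (assumed, not computed here)

# Least-squares fit.
x_bar, y_bar = x.mean(), y.mean()
S_xx = np.sum((x - x_bar) ** 2)
beta_hat = np.sum((x - x_bar) * (y - y_bar)) / S_xx
alpha_hat = y_bar - beta_hat * x_bar

# Estimate sigma^2 from the residual sum of squares.
resid = y - (alpha_hat + beta_hat * x)
s2 = np.sum(resid ** 2) / (m - 2)

# Interval half-widths at a new point x_d.
x_d = 5.0
y_hat_d = alpha_hat + beta_hat * x_d
half_mean = t_crit * np.sqrt(s2 * (1.0 / m + (x_d - x_bar) ** 2 / S_xx))
half_pred = t_crit * np.sqrt(s2 * (1.0 + 1.0 / m + (x_d - x_bar) ** 2 / S_xx))
# The prediction interval [y_hat_d - half_pred, y_hat_d + half_pred] strictly
# contains the mean-response interval [y_hat_d - half_mean, y_hat_d + half_mean].
```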

General case

The general case of linear regression can be written as

$$y_i = \sum_{j=1}^{n} X_{ij} \beta_j + \varepsilon_i.$$

Therefore, since $\hat{y}_d = \sum_{j=1}^{n} X_{dj} \hat{\beta}_j$, the general expression for the variance of the mean response is

$$\operatorname{Var}\left(\sum_{j=1}^{n} X_{dj} \hat{\beta}_j\right) = \sum_{i=1}^{n} \sum_{j=1}^{n} X_{di} S_{ij} X_{dj},$$

where S is the covariance matrix of the parameters, given by

$$\mathbf{S} = \sigma^2 \left(\mathbf{X}^{\mathsf{T}} \mathbf{X}\right)^{-1}.$$
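The matrix form can be sketched directly: the double sum above is the quadratic form $x_d^{\mathsf{T}} \mathbf{S} x_d$, where $x_d$ is the row of predictor values at the new point. The design matrix, the value of $\sigma^2$, and the new point below are all assumed illustrative values.

```python
import numpy as np

# Multiple-regression sketch (all data below are assumed illustrative values).
# Design matrix X: an intercept column plus two random predictors.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=20), rng.normal(size=20)])
sigma2 = 0.25  # error variance, treated as known for illustration

# Covariance matrix of the parameter estimates: S = sigma^2 (X^T X)^{-1}.
S = sigma2 * np.linalg.inv(X.T @ X)

# Variance of the mean response at a new row x_d (includes the intercept term).
x_d = np.array([1.0, 0.5, -1.0])
var_mean = x_d @ S @ x_d

# The explicit double sum from the article matches the quadratic form.
var_mean_double_sum = sum(
    x_d[i] * S[i, j] * x_d[j] for i in range(3) for j in range(3)
)
```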
