Simple Linear Regression
For simple linear regression we follow the conventions below.
Observations and Estimates
The Forecasting library takes as input the observations of the independent variable and the dependent variable, and provides estimates for the coefficients of the simple linear regression line.
\(N\) | number of observations
\(x_i, i \in \{1\ldots N\}\) | observations of the independent variable
\(y_i, i \in \{1\ldots N\}\) | observations of the dependent variable
\(\bar{x}=(1/N)\sum_{i=1}^{N}x_{i}\) | average of the independent observations
\(\bar{y}=(1/N)\sum_{i=1}^{N}y_{i}\) | average of the dependent observations
\(\hat{y}_i, i \in \{1\ldots N\}\) | predictions of the dependent variable
\(\beta_{0}, \beta_{1}\) | coefficients of the linear relationship (random)
\(\hat{\beta}_{0}, \hat{\beta}_{1}\) | coefficients of the linear regression line (estimates)
\(e_i, i \in \{1\ldots N\}\) | errors (residuals) at the observation data points
Linear Relationship
The linear relationship between \(x_i\) and \(y_i\) is modeled by the equation:
\[ y_i = \beta_0 + \beta_1 x_i + \epsilon_i \]
where \(\epsilon_i\) is an error term which averages out to 0 for every \(i\).
Linear Regression
The random \(\beta_{0}\) and \(\beta_{1}\) are estimated by \(\hat{\beta}_{0}\) and \(\hat{\beta}_{1}\), such that the prediction for \(y_i\) is given by the equation:
\[ \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i \]
So, the predictions based on simple linear regression corresponding to the observation data points \((x_i,y_i)\) are provided in \(\hat{y}_i, i \in \{1\ldots N\}\).
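Although the formulas are not stated explicitly above, under the standard ordinary least squares fit the estimates minimize the sum of squared residuals \(\sum_{i=1}^{N} e_i^2\) and have the closed form:
\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{N}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{N}(x_i-\bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]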
Residuals
The error (residual) \(e_i\) for the data point \(i\) is the difference between the observed \(y_i\) and the predicted \(\hat{y}_i\), so \(e_i = y_i - \hat{\beta}_0 - \hat{\beta}_1x_i\). In order to obtain the residuals, the user will need to provide a one-dimensional parameter declared over the set of observations.
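As a minimal sketch of such a declaration (the identifier names Observations, obs, and Residuals are illustrative, not part of the library):

Set Observations {
    Index: obs;
}
Parameter Residuals {
    IndexDomain: obs;
}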
Variation Components
Given the values of the observations, the estimates, and the residuals, several components of variation can be computed: the sum of squares total (SST), the sum of squares error (SSE), and the sum of squares regression (SSR), which are defined as follows:
\[ SST = \sum_{i=1}^{N}(y_i-\bar{y})^2, \qquad SSE = \sum_{i=1}^{N}(y_i-\hat{y}_i)^2 = \sum_{i=1}^{N}e_i^2, \qquad SSR = \sum_{i=1}^{N}(\hat{y}_i-\bar{y})^2 \]
These components of variation satisfy the relation \(SST = SSE + SSR\).
Furthermore, it is also possible to compute the coefficient of determination \(R^2\), the sample linear correlation \(r_{xy}\), and the standard error of the estimate \(s_e\), which are defined as follows:
\[ R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}, \qquad r_{xy} = \frac{\sum_{i=1}^{N}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{N}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{N}(y_i-\bar{y})^2}}, \qquad s_e = \sqrt{\frac{SSE}{N-2}} \]
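As an illustrative check of these definitions, consider the made-up observations \((x_i,y_i) = (1,2), (2,4), (3,5)\). Then \(\bar{x}=2\) and \(\bar{y}=11/3\), least squares gives \(\hat{\beta}_1=3/2\) and \(\hat{\beta}_0=2/3\), so the predictions are \(\hat{y}=(13/6,\,22/6,\,31/6)\) with residuals \(e=(-1/6,\,2/6,\,-1/6)\). This yields \(SSE=1/6\), \(SSR=9/2\), and \(SST=14/3\), which indeed satisfies \(SST=SSE+SSR\) since \(1/6+27/6=28/6=14/3\). Consequently \(R^2=(9/2)/(14/3)=27/28\approx 0.964\), \(r_{xy}=\sqrt{27/28}\approx 0.982\) (positive because \(\hat{\beta}_1>0\)), and \(s_e=\sqrt{(1/6)/(3-2)}\approx 0.408\).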
Predeclared Index co
The linear regression functions return the values of the line coefficients in a parameter declared over the index forecasting::co, which is declared as follows:
Set LRcoeffSet {
    Index: co;
    Definition: {
        data {
            0, ! Intercept Coefficient of Regression Line
            1  ! Slope Coefficient of Regression Line
        }
    }
}
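As a minimal sketch, a parameter to receive the coefficients could be declared as follows (the name LineCoefficients is illustrative, not part of the library):

Parameter LineCoefficients {
    IndexDomain: forecasting::co;
}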
Predeclared Index vcs
Whenever one of the linear regression functions communicates back components of variation, it uses identifiers declared over the index forecasting::vcs, which is declared as follows:
Set VariationCompSet {
    Index: vcs;
    Definition: {
        data {
            SST,       ! Sum of Squares Total
            SSE,       ! Sum of Squares Error
            SSR,       ! Sum of Squares Regression
            Rsquare,   ! Coefficient of Determination
            MultipleR, ! Sample Linear Correlation Rxy
            Se         ! Standard Error
        }
    }
}
In order to obtain the variation components, the
user will need to provide a parameter indexed over forecasting::vcs
to the linear regression functions.
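A minimal sketch of such a parameter (the name VariationComponents is illustrative, not part of the library):

Parameter VariationComponents {
    IndexDomain: forecasting::vcs;
}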