# Simple Linear Regression

For simple linear regression we follow the conventions below.

## Observations and Estimates

The Forecasting library takes observations of the independent variable and the dependent variable as input data, and provides estimates for the coefficients of the simple linear regression line.

| Symbol | Meaning |
|---|---|
| $$N$$ | number of observations |
| $$x_i, i \in \{1\ldots N\}$$ | observations of the independent variable |
| $$y_i, i \in \{1\ldots N\}$$ | observations of the dependent variable |
| $$\bar{x}=(1/N)\sum_{i=1}^{N}x_{i}$$ | average of the independent observations |
| $$\bar{y}=(1/N)\sum_{i=1}^{N}y_{i}$$ | average of the dependent observations |
| $$\hat{y}_i, i \in \{1\ldots N\}$$ | predictions of the dependent variable |
| $$\beta_{0}, \beta_{1}$$ | coefficients of the linear relationship (random) |
| $$\hat{\beta}_{0}, \hat{\beta}_{1}$$ | coefficients of the linear regression line (estimates) |
| $$e_i, i \in \{1\ldots N\}$$ | error (residual) for the observation data points |

## Linear Relationship

The linear relationship between $$x_i$$ and $$y_i$$ is modeled by the equation:

$y_i = \beta_{0} + \beta_{1}x_i + \epsilon_i$

where $$\epsilon_i$$ is an error term with expected value 0 for every $$i$$.

## Linear Regression

The random $$\beta_{0}$$ and $$\beta_{1}$$ are estimated by $$\hat{\beta}_{0}$$ and $$\hat{\beta}_{1}$$, such that the prediction for $$y_i$$ is given by the equation:

$\hat{y}_i = \hat{\beta}_{0} + \hat{\beta}_{1}x_i$

So, the predictions based on simple linear regression corresponding to the observation data points $$(x_i,y_i)$$ are provided in $$\hat{y}_i, i \in \{1\ldots N\}$$.
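The library computes these estimates internally; the standard closed-form least-squares solution can be sketched in Python as follows (the function name and data are illustrative, not part of the library's API):

```python
def ols_coefficients(x, y):
    """Closed-form least-squares estimates for simple linear regression:

    beta1_hat = sum((x_i - xbar) * (y_i - ybar)) / sum((x_i - xbar)^2)
    beta0_hat = ybar - beta1_hat * xbar
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    beta1_hat = sxy / sxx
    beta0_hat = ybar - beta1_hat * xbar
    return beta0_hat, beta1_hat

# Example: points lying exactly on y = 2 + 3x recover the coefficients.
b0, b1 = ols_coefficients([1, 2, 3, 4], [5, 8, 11, 14])
# b0 == 2.0, b1 == 3.0
```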

## Residuals

The error (residual) $$e_i$$ for the data point $$i$$ is the difference between the observed $$y_i$$ and the predicted $$\hat{y}_i$$, so $$e_i = y_i - \hat{\beta}_0 - \hat{\beta}_1x_i$$. In order to obtain the residuals, the user will need to provide a one-dimensional parameter declared over the set of observations.
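The residual computation itself is elementary; a minimal Python sketch (illustrative, not the library's interface) is:

```python
def residuals(x, y, beta0_hat, beta1_hat):
    """Residual e_i = y_i - (beta0_hat + beta1_hat * x_i) per observation."""
    return [yi - (beta0_hat + beta1_hat * xi) for xi, yi in zip(x, y)]

# For the line y-hat = 1 + 2x, only the third point lies off the line.
e = residuals([0, 1, 2], [1, 3, 6], 1.0, 2.0)
# e == [0.0, 0.0, 1.0]
```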

## Variation Components

Given the values of the observations, the estimates, and the residuals, several components of variation can be computed: the sum of squares total (SST), the sum of squares error (SSE), and the sum of squares regression (SSR), defined as follows:

$SST = \sum_{i=1}^{N}(y_i - \bar{y})^2$
$SSE = \sum_{i=1}^{N}(y_i - \hat{y}_i)^2 = \sum_{i=1}^{N}e_i^2$
$SSR = \sum_{i=1}^{N}(\hat{y}_i - \bar{y})^2$

These components of variation satisfy the relation $$SST = SSE + SSR$$.
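This decomposition can be checked numerically; the sketch below (illustrative data, not library code) fits the line with ordinary least squares, since the identity $$SST = SSE + SSR$$ holds for the least-squares fit:

```python
# Verify SST = SSE + SSR on a small illustrative data set.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.8]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
     / sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]           # predictions on the fitted line

sst = sum((yi - ybar) ** 2 for yi in y)      # total variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # unexplained variation
ssr = sum((yh - ybar) ** 2 for yh in yhat)   # explained variation
assert abs(sst - (sse + ssr)) < 1e-9         # identity holds for OLS fits
```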

Furthermore, it is possible to compute the coefficient of determination $$R^2$$, the sample linear correlation $$r_{xy}$$, and the standard error of the estimate $$s_e$$, defined as follows:

$R^2 = \frac{SSR}{SST}$
$\begin{split}r_{xy} = \left\{ \begin{array}{ll} +\sqrt{R^2} & \textrm{ if } \hat{\beta}_1 \geq 0 \\ -\sqrt{R^2} & \textrm{ if } \hat{\beta}_1 < 0 \end{array} \right.\end{split}$
$s_e = \sqrt{\frac{SSE}{N-2}}$
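These three statistics can be computed directly from the quantities above; the Python sketch below is illustrative (the function name is hypothetical, not the library's API):

```python
import math

def fit_statistics(x, y):
    """R^2, r_xy (sign follows the slope), and standard error s_e of an OLS fit."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
         / sum((xi - xbar) ** 2 for xi in x)
    b0 = ybar - b1 * xbar
    yhat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - ybar) ** 2 for yi in y)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
    ssr = sum((yh - ybar) ** 2 for yh in yhat)
    r2 = ssr / sst
    rxy = math.copysign(math.sqrt(r2), b1)  # negative when the slope is negative
    se = math.sqrt(sse / (n - 2))
    return r2, rxy, se

# A perfect downward-sloping line: R^2 = 1, r_xy = -1, s_e = 0.
r2, rxy, se = fit_statistics([1, 2, 3], [6, 4, 2])
```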

## Predeclared Indices `co` and `vcs`

The linear regression functions return the values of the line coefficients in a parameter declared over the index `forecasting::co`, which is defined as follows:

```aimms
Set LRcoeffSet {
    Index: co;
    Definition: {
        data {
            0,      ! Intercept Coefficient of Regression Line
            1       ! Slope Coefficient of Regression Line
        }
    }
}
```


Whenever one of the linear regression functions communicates back components of variation, it uses identifiers declared over the index `forecasting::vcs`, which is defined as follows:

```aimms
Set VariationCompSet {
    Index: vcs;
    Definition: {
        data {
            SST,       ! Sum of Squares Total
            SSE,       ! Sum of Squares Error
            SSR,       ! Sum of Squares Regression
            Rsquare,   ! Coefficient of Determination
            MultipleR, ! Sample Linear Correlation Rxy
            Se         ! Standard Error
        }
    }
}
```


In order to obtain the variation components, the user will need to provide a parameter indexed over `forecasting::vcs` to the linear regression functions.