Regression analysis is an instance of supervised learning. The goal is to estimate the dependent variable y from the independent variable x.

Variables

  • x: independent variable, features
  • y: dependent variable, response
  • z: (unknown) noise or error
  • h: a function of x
  • \theta: parameter or weight vector of the function h

Model

    \begin{align*} y = h(\mathbf{x}, \theta) + z \end{align*}
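To make the model concrete, here is a minimal Python sketch that simulates data from it with a linear h and Gaussian noise; theta_true, the noise scale 0.1, and the sizes n and d are illustrative assumptions, not values from the text.

    import numpy as np

    # Simulate y = h(x, theta) + z for a linear h(x, theta) = <x, theta>.
    # theta_true, the noise scale, and the sizes are illustrative assumptions.
    rng = np.random.default_rng(0)
    n, d = 100, 3                            # number of examples, feature dimension
    theta_true = np.array([2.0, -1.0, 0.5])  # hypothetical "true" parameter vector

    X = rng.standard_normal((n, d))          # features x_i in R^d
    z = 0.1 * rng.standard_normal(n)         # unknown noise z
    y = X @ theta_true + z                   # responses y_i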

Goal

Predict the value y corresponding to an input \mathbf{x} outside the training set. To do so, find a function h that fits the dataset \mathcal{D}=\{(\mathbf{x}_1, y_1), \dots, (\mathbf{x}_n,y_n)\}.

Approach

  • We have access to the dataset \mathcal{D}=\{(\mathbf{x}_1, y_1), \dots, (\mathbf{x}_n,y_n)\}, the training set, with (\mathbf{x}_i, y_i) \in \mathbb{R}^d \times \mathbb{R}. In theory we want to find the optimal function h^*; in practice we can only approximate it with an estimate \hat{h}.
  • We assume h lies in some hypothesis class \mathcal{H} of candidate functions.
  • We assume that the functions in \mathcal{H} are parameterized by a parameter vector \theta.
  • Example (all functions are linear): \mathcal{H} = \{h_\theta (\mathbf{x}) = \langle \mathbf{x}, \theta \rangle : \theta \in \mathbb{R}^d\}
  • We can now reduce the optimization problem to the problem of estimating \theta.
  • \theta can be learned from the examples in the dataset \mathcal{D}.
  • Find an estimated parameter \hat{\theta} that minimizes the cost built from a loss function \text{loss}: \mathbb{R} \times \mathbb{R} \rightarrow \mathbb{R} (see the sketch after this list).
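Continuing the linear example, a minimal sketch of this reduction: each \theta picks one function h_\theta out of \mathcal{H}, and the cost sums a per-example loss over \mathcal{D}. The squared loss below is one common choice, not one the text prescribes.

    import numpy as np

    def h(x, theta):
        # One member of the linear hypothesis class H: h_theta(x) = <x, theta>.
        return float(np.dot(x, theta))

    def loss(y_hat, y):
        # Per-example loss: R x R -> R; squared loss is one common choice (assumption).
        return (y_hat - y) ** 2

    def cost(theta, X, y):
        # Empirical cost over the training set D: sum_i loss(h(x_i, theta), y_i).
        return sum(loss(h(x_i, theta), y_i) for x_i, y_i in zip(X, y))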

Optimization
    \begin{align*}\hat{\theta}=\underset{\theta}{\text{arg min}} \sum_{i=1}^n \text{loss}(h(\mathbf{x}_i, \theta), y_i)\end{align*}
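Assuming the squared loss and the linear hypothesis class from the example above, this arg min is ordinary least squares and has a closed-form solution; the sketch below fits \hat{\theta} on simulated data with np.linalg.lstsq.

    import numpy as np

    # Fit theta on simulated data. With squared loss and a linear h, the
    # arg min is ordinary least squares; np.linalg.lstsq solves it directly.
    rng = np.random.default_rng(0)
    n, d = 100, 3
    theta_true = np.array([2.0, -1.0, 0.5])  # hypothetical ground truth
    X = rng.standard_normal((n, d))
    y = X @ theta_true + 0.1 * rng.standard_normal(n)

    theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(theta_hat)  # estimate; close to theta_true when the noise is small

For other losses there is generally no closed form, and the cost is instead minimized numerically, e.g. with gradient descent.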

Regression analysis models
