Regression analysis is an instance of supervised learning. The goal is to estimate the dependent variable y from the independent variable x.

Variables

  • x: independent variable, features
  • y: dependent variable, response
  • z: (unknown) noise or error
  • h: a function of x
  • \theta: parameter or weight vector of the function h

Model

    \begin{align*} y = h(\mathbf{x}, \theta) + z \end{align*}
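To make the model concrete, here is a minimal Python sketch that simulates data from it with a linear h and Gaussian noise; theta_true, the noise scale 0.1, and the sizes n and d are illustrative assumptions, not values from the text.

    import numpy as np

    # Simulate y = h(x, theta) + z for a linear h(x, theta) = <x, theta>.
    # theta_true, the noise scale, and the sizes are illustrative assumptions.
    rng = np.random.default_rng(0)
    n, d = 100, 3                            # number of examples, feature dimension
    theta_true = np.array([2.0, -1.0, 0.5])  # hypothetical "true" parameter vector

    X = rng.standard_normal((n, d))          # features x_i in R^d
    z = 0.1 * rng.standard_normal(n)         # unknown noise z
    y = X @ theta_true + z                   # responses y_i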

Goal

Predict the value y corresponding to an input \mathbf{x} outside the training set. To do so, find a function h that fits the dataset \mathcal{D}=\{(\mathbf{x}_1, y_1), \dots, (\mathbf{x}_n,y_n)\}.

Approach

  • We have access to the dataset \mathcal{D}=\{(\mathbf{x}_1, y_1), \dots, (\mathbf{x}_n,y_n)\}, the training set, with (\mathbf{x}_i, y_i) \in \mathbb{R}^d \times \mathbb{R}. In theory we want to find the optimal function h^*; in practice we can only approximate it with an estimate \hat{h}.
  • We assume h lies in some hypothesis class \mathcal{H} of candidate functions.
  • We assume that the functions in \mathcal{H} are parameterized by a parameter vector \theta.
  • Example (all functions are linear): \mathcal{H} = \{h_\theta (\mathbf{x}) = \langle \mathbf{x}, \theta \rangle : \theta \in \mathbb{R}^d\}
  • We can now reduce the optimization problem to the problem of estimating \theta.
  • \theta can be learned from the examples in the dataset \mathcal{D}.
  • Find an estimated parameter \hat{\theta} that minimizes the cost built from a loss function \text{loss}: \mathbb{R} \times \mathbb{R} \rightarrow \mathbb{R} (see the sketch after this list).
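Continuing the linear example, a minimal sketch of this reduction: each \theta picks one function h_\theta out of \mathcal{H}, and the cost sums a per-example loss over \mathcal{D}. The squared loss below is one common choice, not one the text prescribes.

    import numpy as np

    def h(x, theta):
        # One member of the linear hypothesis class H: h_theta(x) = <x, theta>.
        return float(np.dot(x, theta))

    def loss(y_hat, y):
        # Per-example loss: R x R -> R; squared loss is one common choice (assumption).
        return (y_hat - y) ** 2

    def cost(theta, X, y):
        # Empirical cost over the training set D: sum_i loss(h(x_i, theta), y_i).
        return sum(loss(h(x_i, theta), y_i) for x_i, y_i in zip(X, y))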

Optimization
    \begin{align*}\hat{\theta}=\underset{\theta}{\text{arg min}} \sum_{i=1}^n \text{loss}(h(\mathbf{x}_i, \theta), y_i)\end{align*}
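Assuming the squared loss and the linear hypothesis class from the example above, this arg min is ordinary least squares and has a closed-form solution; the sketch below fits \hat{\theta} on simulated data with np.linalg.lstsq.

    import numpy as np

    # Fit theta on simulated data. With squared loss and a linear h, the
    # arg min is ordinary least squares; np.linalg.lstsq solves it directly.
    rng = np.random.default_rng(0)
    n, d = 100, 3
    theta_true = np.array([2.0, -1.0, 0.5])  # hypothetical ground truth
    X = rng.standard_normal((n, d))
    y = X @ theta_true + 0.1 * rng.standard_normal(n)

    theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(theta_hat)  # estimate; close to theta_true when the noise is small

For other losses there is generally no closed form, and the cost is instead minimized numerically, e.g. with gradient descent.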

Regression analysis models
