
General


MAML (Model-Agnostic Meta-Learning) is a meta-learning algorithm that learns the model’s parameters \theta such that a small number of gradient updates leads to fast learning on a new task.

Properties

  • MAML does not expand the number of learned parameters
  • No constraints on the model architecture
  • Can be applied to any architecture trained with gradient descent, such as MLPs, CNNs, and RNNs

Problem Setup

Single Task


Model: f_{\theta}(\mathbf{y}|\mathbf{x})
Dataset: \mathcal{D} = \left\{ (\mathbf{x}, \mathbf{y}) \right\}_i
Goal: \min_{\theta} \mathcal{L}(\theta, \mathcal{D})
Task: \mathcal{T} \equiv \left\{ p_i(\mathbf{x}), p_i(\mathbf{y}| \mathbf{x}), \mathcal{L}_i\right\}
p_i denotes the data-generating distributions of the task; this formulation covers supervised learning (e.g. regression and classification) as well as reinforcement learning.
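As an illustration, here is a minimal single-task setup in PyTorch, using sine-regression tasks like those in the paper's experiments (the amplitude, phase, and network sizes below are illustrative choices):

```python
# A minimal single-task setup (a sketch): sine-regression tasks; amplitude,
# phase, and network sizes are illustrative.
import torch

def sample_task_data(amplitude=1.0, phase=0.0, n=10):
    x = torch.empty(n, 1).uniform_(-5.0, 5.0)   # x ~ p_i(x)
    y = amplitude * torch.sin(x + phase)        # y ~ p_i(y | x)
    return x, y

# f_theta(y | x): a small MLP regressor
model = torch.nn.Sequential(
    torch.nn.Linear(1, 40), torch.nn.ReLU(), torch.nn.Linear(40, 1))
loss_fn = torch.nn.MSELoss()                    # L_i

x, y = sample_task_data()
loss = loss_fn(model(x), y)                     # L(theta, D)
```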

Multi Task


Model: f_{\theta}(\mathbf{y}|\mathbf{x}, z_i)
where z_i is the task index.
Dataset: \mathcal{D}_i = \left\{ (\mathbf{x}, \mathbf{y}) \right\}_i for each task i
Goal: \min_{\theta}\sum_i \mathcal{L}_i(\theta, \mathcal{D}_i)
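For contrast, a joint multi-task baseline can be sketched as follows: the task index z_i enters through a (hypothetical) task embedding, and the summed loss over tasks is minimized directly, with no notion of fast adaptation:

```python
# A joint multi-task baseline (not yet MAML): the task index z_i enters via a
# task embedding, and the summed loss over tasks is minimised directly.
# All names and sizes here are illustrative.
import torch

num_tasks, emb_dim = 5, 8
task_emb = torch.nn.Embedding(num_tasks, emb_dim)          # encodes z_i
net = torch.nn.Sequential(torch.nn.Linear(1 + emb_dim, 40),
                          torch.nn.ReLU(), torch.nn.Linear(40, 1))
loss_fn = torch.nn.MSELoss()

def multi_task_loss(datasets):
    """datasets: list of (x_i, y_i) tensors, one pair per task."""
    total = 0.0
    for i, (x, y) in enumerate(datasets):
        z = task_emb(torch.full((x.shape[0],), i, dtype=torch.long))
        total = total + loss_fn(net(torch.cat([x, z], dim=1)), y)
    return total                                           # sum_i L_i(theta, D_i)
```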

Example


Assume we have a model f_{\theta} with parameters \theta. When adapting to a new task \mathcal{T}_i, the parameters \theta become \theta_i^{'}. The new parameters \theta_i^{'} are computed using one or more gradient descent updates on task \mathcal{T}_i. A single gradient update is thus


    \begin{align*}\theta_i^{'} = \theta - \alpha \nabla_\theta \mathcal{L}_{\mathcal{T}_i} (f_\theta)\end{align*}
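In code, this inner update can be written functionally so that \theta_i^{'} remains a differentiable function of \theta, which is needed later for the meta-gradient (a sketch; the helper name adapt and the value of \alpha are illustrative):

```python
# Inner update theta_i' = theta - alpha * grad_theta L_Ti(f_theta), written so
# that theta_i' stays differentiable with respect to theta.
import torch

alpha = 0.01

def adapt(params, task_loss):
    grads = torch.autograd.grad(task_loss, params, create_graph=True)
    return [p - alpha * g for p, g in zip(params, grads)]  # theta_i'
```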


The step size \alpha can be learned or set as a fixed hyperparameter. The model parameters are trained by optimizing the performance of f_{\theta_i^{'}} with respect to \theta across tasks sampled from p(\mathcal{T}). The meta-objective is thus:


    \begin{align*}\min_{\theta} \sum_{\mathcal{T}_{i} \sim p(\mathcal{T})} \mathcal{L}_{\mathcal{T}_i}(f_{{\theta}_i^{'}}) = \sum_{\mathcal{T}_{i} \sim p(\mathcal{T})} \mathcal{L}_{\mathcal{T}_i}(f_{\theta - \alpha \nabla_\theta \mathcal{L}_{\mathcal{T}_i} (f_\theta)}) \end{align*}
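Differentiating this objective with respect to \theta passes through the inner update, so the meta-gradient contains second-order terms. With \theta_i^{'} = \theta - \alpha \nabla_\theta \mathcal{L}_{\mathcal{T}_i}(f_\theta), the chain rule gives

    \begin{align*}\nabla_{\theta} \mathcal{L}_{\mathcal{T}_i}(f_{\theta_i^{'}}) = \left( I - \alpha \nabla^2_{\theta} \mathcal{L}_{\mathcal{T}_i}(f_\theta) \right) \nabla_{\theta_i^{'}} \mathcal{L}_{\mathcal{T}_i}(f_{\theta_i^{'}})\end{align*}

The first-order approximation discussed in the paper drops the Hessian term and backpropagates only \nabla_{\theta_i^{'}} \mathcal{L}_{\mathcal{T}_i}(f_{\theta_i^{'}}).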


In meta-optimization, we optimize the model parameters \theta, whereas the objective is computed using the updated model parameters \theta_i^{'}. MAML aims to optimize the model parameters such that one or a small number of gradient steps on a new task produces maximally effective behavior on that task. The meta-optimization across tasks is performed via stochastic gradient descent (SGD), so that the model parameters \theta are updated as follows:


    \begin{align*}\theta \leftarrow \theta - \beta \nabla_{\theta} \sum_{\mathcal{T}_{i} \sim  p(\mathcal{T})} \mathcal{L}_{\mathcal{T}_i}(f_{\theta_{i}^{'}})\end{align*}


where \beta is the meta step size.
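Putting the pieces together, a minimal MAML meta-training loop might look as follows (a sketch, assuming the sine-regression tasks from above, a single inner gradient step, and a small 2-layer MLP; all names and hyperparameter values are illustrative):

```python
# A minimal MAML meta-training loop (sketch): inner adaptation per task, then
# an outer SGD step on theta through the adapted parameters.
import torch

model = torch.nn.Sequential(torch.nn.Linear(1, 40),
                            torch.nn.ReLU(), torch.nn.Linear(40, 1))
loss_fn = torch.nn.MSELoss()
alpha, beta = 0.01, 0.001
meta_opt = torch.optim.SGD(model.parameters(), lr=beta)   # outer SGD, step size beta

def forward(params, x):
    # apply the 2-layer MLP with an explicit parameter list [W1, b1, W2, b2]
    h = torch.relu(x @ params[0].t() + params[1])
    return h @ params[2].t() + params[3]

for step in range(1000):
    meta_opt.zero_grad()
    meta_loss = 0.0
    for _ in range(4):                                     # sample tasks T_i ~ p(T)
        amp = 1.0 + 4.0 * torch.rand(1).item()
        phase = 3.14 * torch.rand(1).item()
        x_tr = torch.empty(10, 1).uniform_(-5.0, 5.0)      # adaptation data
        y_tr = amp * torch.sin(x_tr + phase)
        x_val = torch.empty(10, 1).uniform_(-5.0, 5.0)     # data for the meta-objective
        y_val = amp * torch.sin(x_val + phase)

        params = list(model.parameters())
        inner_loss = loss_fn(forward(params, x_tr), y_tr)
        grads = torch.autograd.grad(inner_loss, params, create_graph=True)
        adapted = [p - alpha * g for p, g in zip(params, grads)]   # theta_i'

        meta_loss = meta_loss + loss_fn(forward(adapted, x_val), y_val)
    meta_loss.backward()          # gradient w.r.t. theta, through the inner update
    meta_opt.step()               # theta <- theta - beta * grad
```

Because create_graph=True keeps the graph of the inner update, the backward pass includes the second-order terms noted above; passing create_graph=False instead would recover the first-order approximation.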



Sources: paperswithcode.com; Finn, Abbeel, Levine, "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks", https://arxiv.org/abs/1703.03400v3
