Combining Modular Skills in Multitask Learning (2202.13914v2)

Published 28 Feb 2022 in cs.LG and cs.CL

Abstract: A modular design encourages neural models to disentangle and recombine different facets of knowledge to generalise more systematically to new tasks. In this work, we assume that each task is associated with a subset of latent discrete skills from a (potentially small) inventory. In turn, skills correspond to parameter-efficient (sparse / low-rank) model parameterisations. By jointly learning these and a task-skill allocation matrix, the network for each task is instantiated as the average of the parameters of active skills. To favour non-trivial soft partitions of skills across tasks, we experiment with a series of inductive biases, such as an Indian Buffet Process prior and a two-speed learning rate. We evaluate our latent-skill model on two main settings: 1) multitask reinforcement learning for grounded instruction following on 8 levels of the BabyAI platform; and 2) few-shot adaptation of pre-trained text-to-text generative models on CrossFit, a benchmark comprising 160 NLP tasks. We find that the modular design of a network significantly increases sample efficiency in reinforcement learning and few-shot generalisation in supervised learning, compared to baselines with fully shared, task-specific, or conditionally generated parameters where knowledge is entangled across tasks. In addition, we show how discrete skills help interpretability, as they yield an explicit hierarchy of tasks.
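The abstract's core mechanism can be sketched in a few lines. A task-skill allocation matrix selects which skills are active for each task, and the task's parameters are formed by averaging the parameters of those skills. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes each skill is a LoRA-style low-rank pair (A, B). It also assumes a task's parameters come from averaging the A and B factors of its active skills, weighted by the task's allocation row after normalising it to sum to 1, so a binary row gives a plain average. All names and sizes (`task_update`, `weighted_average`, `D_IN`, `RANK`, and so on) are hypothetical. Learning the allocation matrix and the inductive biases (Indian Buffet Process prior, two-speed learning rate) are left out.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]
# Assumed LoRA-style skill: (A with shape d_in x r, B with shape r x d_out).
Skill = Tuple[Matrix, Matrix]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small Gaussian-initialised matrix (illustrative initialisation)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def weighted_average(mats: List[Matrix], weights: List[float]) -> Matrix:
    """Element-wise weighted sum of equally shaped matrices (weights sum to 1)."""
    rows, cols = len(mats[0]), len(mats[0][0])
    return [
        [sum(w * m[i][j] for m, w in zip(mats, weights)) for j in range(cols)]
        for i in range(rows)
    ]


def task_update(task: int, allocation: List[List[float]], skills: List[Skill]) -> Matrix:
    """Build one task's weight update from the skills its allocation row activates.

    Normalising the row by its sum turns a binary row into a plain average
    over the active skills; inactive skills get weight 0.
    """
    row = allocation[task]
    total = sum(row)
    if total <= 0:
        raise ValueError(f"task {task} has no active skills")
    weights = [z / total for z in row]
    a = weighted_average([s[0] for s in skills], weights)  # averaged A factors
    b = weighted_average([s[1] for s in skills], weights)  # averaged B factors
    return matmul(a, b)  # d_in x d_out update, rank at most r


# Hypothetical sizes: 3 skills, 4 x 4 layer, rank 2.
rng = random.Random(0)
D_IN, D_OUT, RANK, NUM_SKILLS = 4, 4, 2, 3
skills = [
    (random_matrix(D_IN, RANK, rng), random_matrix(RANK, D_OUT, rng))
    for _ in range(NUM_SKILLS)
]
# Hypothetical 2-task x 3-skill allocation: the two tasks share skill 1.
allocation = [[1, 1, 0], [0, 1, 1]]
delta0 = task_update(0, allocation, skills)
delta1 = task_update(1, allocation, skills)
print(len(delta0), len(delta0[0]))  # 4 4
```

This sketch averages the factors before multiplying them instead of averaging the products A_j B_j. That choice keeps each task's combined update at rank at most r. Averaging the products is an equally plausible reading and would allow a rank up to r times the number of active skills. Because the two tasks share skill 1, a gradient step on that skill's factors would change both tasks' parameters. Sharing parameters in this way is what lets knowledge be recombined across tasks.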

Authors (4)

Edoardo M. Ponti
Alessandro Sordoni
Yoshua Bengio
Siva Reddy

Citations (35)