Adaptive Compression-based Lifelong Learning

S. Srivastava, M. Berman, M.B. Blaschko, D. Tuia

Research output: Chapter in Book/Report/Conference proceeding › Conference paper › Academic › peer-review

Abstract

The loss of performance on a previously learned task when a deep learning model is fine-tuned on a new one is a phenomenon known as catastrophic forgetting. There are two major ways to mitigate this problem: either preserving the activations of the initial network while training on the new task, or restricting the new network's activations to remain close to the initial ones. The latter approach falls under the umbrella of lifelong learning, where the model is updated so that it performs well on both old and new tasks without having access to the old task's training samples anymore. Recently, approaches that prune networks to free capacity during sequential learning of tasks have been gaining popularity. Such approaches allow learning compact networks while making redundant parameters available for the next tasks. A common problem with these approaches is that the pruning percentage is hard-coded, irrespective of the number of samples, the complexity of the learning task, and the number of classes in the dataset. We propose a method based on Bayesian optimization that performs adaptive compression/pruning of the network and show its effectiveness for lifelong learning. Our method learns to prune heavily for small and/or simple datasets while applying milder compression rates to large and/or complex data. Experiments on classification and semantic segmentation demonstrate the applicability of learned network compression, where we effectively preserve performance along sequences of tasks of varying complexity.
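The abstract's central idea is to treat the per-task pruning fraction as a hyperparameter tuned by Bayesian optimization rather than a hard-coded constant. The Python sketch below illustrates that loop using scikit-optimize and PyTorch's pruning utilities; it is not the authors' implementation. The toy model, the random validation data, and the objective weighting that trades accuracy against freed capacity are all illustrative assumptions, and a real run would briefly retrain the surviving weights before evaluating.

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune
from skopt import gp_minimize
from skopt.space import Real


def model_factory():
    # Toy stand-in for the task network; the paper works with larger backbones.
    return nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))


def prune_by_magnitude(model, fraction):
    # Globally remove the smallest-magnitude weights across all linear layers,
    # freeing that fraction of the parameters for future tasks.
    params = [(m, "weight") for m in model.modules() if isinstance(m, nn.Linear)]
    prune.global_unstructured(params, pruning_method=prune.L1Unstructured, amount=fraction)
    return model


def validation_accuracy(model):
    # Placeholder: evaluate the pruned model on held-out data for the current
    # task. Random tensors keep the sketch self-contained and runnable.
    x = torch.randn(128, 32)
    y = torch.randint(0, 10, (128,))
    with torch.no_grad():
        pred = model(x).argmax(dim=1)
    return (pred == y).float().mean().item()


def objective(point):
    # Score a candidate pruning fraction: reward held-out accuracy after
    # pruning and, mildly, the capacity freed for later tasks (the 0.05
    # weighting is an assumption, not a value from the paper).
    fraction = point[0]
    model = prune_by_magnitude(model_factory(), fraction)
    acc = validation_accuracy(model)
    return -(acc + 0.05 * fraction)


# Gaussian-process surrogate searches pruning fractions in [0.1, 0.9].
result = gp_minimize(objective, dimensions=[Real(0.1, 0.9)], n_calls=15, random_state=0)
print("selected pruning fraction:", result.x[0])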
Original language: English
Title of host publication: Proceedings of the British Machine Vision Conference (BMVC)
Number of pages: 13
Publication status: Published - 2019
Event: 30th British Machine Vision Conference - Cardiff, United Kingdom
Duration: 9 Sep 2019 - 12 Sep 2019

Conference

Conference: 30th British Machine Vision Conference
Country: United Kingdom
City: Cardiff
Period: 9/09/19 - 12/09/19


Cite this

Srivastava, S., Berman, M., Blaschko, M. B., & Tuia, D. (2019). Adaptive Compression-based Lifelong Learning. In Proceedings of the British Machine Vision Conference (BMVC).
@inproceedings{59ce692a905d4d97b42fa89fed11ded4,
  title = "Adaptive Compression-based Lifelong Learning",
  author = "S. Srivastava and M. Berman and M.B. Blaschko and D. Tuia",
  year = "2019",
  language = "English",
  booktitle = "Proceedings of the British Machine Vision Conference (BMVC)",
}
