Transfer Learning for NLP with Sequence Representations

Effective Transfer Learning for NLP using Sequence Representations

Presented By Madison May
ML Architect, Cofounder, indico

Presentation Description

Transfer learning, the practice of applying knowledge gained on one machine learning task to aid the solution of a second task, has seen historic success in the field of computer vision. The output representations of generic image classification models trained on ImageNet have been leveraged to build models that detect the presence of custom objects in natural images. Image classification tasks that would typically require hundreds of thousands of images can be tackled with mere dozens of training examples per class thanks to these pretrained representations.

The field of natural language processing, however, has seen more modest gains from transfer learning, with most approaches relying only on pretrained word representations. Other approaches use the mean, max-pool, or last output of the sequence representations produced by RNN models as document representations, and learn lightweight models on top of these features in order to leverage knowledge from previously trained NLP models. Unfortunately, by distilling sequence information down to a single fixed-length vector per document via pooling, these methods sacrifice potentially useful information contained in the sequence representations.

In this talk, we explore parameter- and data-efficient mechanisms for transfer learning that use sequence representations, rather than fixed-length document vectors, as the medium of communication between models, and we show practical improvements on real-world tasks. In addition, we demo the use of Enso, a newly open-sourced library designed to simplify the benchmarking of transfer learning methods on a wide variety of target tasks. Enso provides tools for the fair comparison of varied feature representations and target-task models as the amount of training data made available to the target model is incrementally increased.
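To make the pooling distinction concrete, below is a minimal, hypothetical NumPy sketch (not taken from the talk or from the Enso library) contrasting fixed-length pooled document vectors with the full sequence representation; the array shapes and values are placeholders for illustration only.

```python
import numpy as np

# Toy sequence representation: one hidden-state vector per token,
# as an RNN encoder might produce (shape: [n_tokens, hidden_dim]).
# Random values stand in for real model outputs.
n_tokens, hidden_dim = 12, 8
sequence_repr = np.random.randn(n_tokens, hidden_dim)

# Common fixed-length document representations derived by pooling:
mean_pooled = sequence_repr.mean(axis=0)   # mean over tokens
max_pooled = sequence_repr.max(axis=0)     # element-wise max over tokens
last_output = sequence_repr[-1]            # final hidden state only

# Each pooled vector has shape (hidden_dim,), discarding per-token detail.
print(mean_pooled.shape, max_pooled.shape, last_output.shape)

# The alternative discussed in the talk passes the full
# [n_tokens, hidden_dim] sequence representation to the target-task
# model, preserving token-level information that pooling throws away.
print(sequence_repr.shape)
```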

Presentation Curriculum

Effective Transfer Learning for NLP using Sequence Representations
44:04