NLP Papers


These are, in my opinion, the most important transformer papers that anyone working with Transformers should know. There is also a nice summary, Efficient Transformers: A Survey, by folks at Google that I highly recommend.

AMBERT: A Pre-trained Language Model with Multi-Grained Tokenization

Authors: Xinsong Zhang, Hang Li

ByteDance AI Lab

Year: August 2020

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (T5)

Authors: Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu

Google

Year: July 2020

Pre-training via Paraphrasing

Authors: Mike Lewis, Marjan Ghazvininejad, Gargi Ghosh, Armen Aghajanyan, Sida Wang, Luke Zettlemoyer

Facebook

Year: June 2020

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators.

Authors: Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.

Google and Stanford

Year: March 2020

Generalization through Memorization: Nearest Neighbor Language Models

Authors: Urvashi Khandelwal, Omer Levy, Dan Jurafsky, Luke Zettlemoyer, Mike Lewis

Facebook and Stanford. Presented in the ACL 2020 talk “Beyond BERT” by Mike Lewis.

Year: Feb 2020

ConveRT: Efficient and Accurate Conversational Representations from Transformers

Authors: Matthew Henderson, Iñigo Casanueva, Nikola Mrkšić, Pei-Hao Su, Tsung-Hsien Wen, Ivan Vulić

PolyAI

Year: Nov 2019

BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

Authors: Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, Luke Zettlemoyer

Facebook

Year: October 2019

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Authors: Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut

Google and Toyota Technological Institute

Year: September 2019

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Authors: Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov

UW and Facebook

Year: July 2019

XLNet: Generalized Autoregressive Pretraining for Language Understanding

Authors: Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le

Carnegie Mellon and Google Research

Year: June 2019

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Authors: Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

Google

Year: May 2019

Cross-lingual Language Model Pretraining

Authors: Guillaume Lample, Alexis Conneau

Facebook

Year: January 2019

Improving Language Understanding by Generative Pre-Training (GPT)

Authors: Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever

OpenAI

Year: June 2018

Deep contextualized word representations (ELMo)

Authors: Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer

Allen Institute for Artificial Intelligence and UW

Year: March 2018

Attention Is All You Need

Authors: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin

Google

Year: Dec 2017