Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing

One of the biggest challenges in natural language processing (NLP) is the shortage of training data. Because NLP is a diverse field with many distinct tasks, most task-specific datasets contain only a few thousand or a few hundred thousand human-labeled training examples. However, modern deep learning-based NLP models see benefits from much larger amounts of data, improving when trained on millions, or billions, of annotated training examples. To help close this gap in data, researchers have developed a variety of techniques for training general purpose language representation models using the enormous amount of unannotated text on the web (known as pre-training). …
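To make the pre-train-then-fine-tune recipe concrete, here is a minimal sketch of adapting a pre-trained BERT checkpoint to a small labeled task. It uses the Hugging Face transformers library as a stand-in for the TensorFlow code released alongside this post, and the two-example sentiment dataset is hypothetical; treat it as an illustration of the paradigm, not the released implementation.

```python
# Minimal sketch (not the released code): fine-tune a pre-trained BERT
# checkpoint on a tiny, hypothetical labeled sentiment dataset.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Load weights already pre-trained on large unannotated corpora.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# A tiny labeled set stands in for a task-specific dataset (hypothetical).
texts = ["a delightful film", "a tedious, overlong mess"]
labels = torch.tensor([1, 0])  # 1 = positive, 0 = negative

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# One fine-tuning step: the small labeled set only nudges the
# pre-trained weights toward the downstream task.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

The key point the excerpt makes is visible here: every parameter arrives already trained on unannotated web text, so the scarce human-labeled examples are spent on adaptation rather than on learning language from scratch.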

Published on ai.googleblog.com by Jacob Devlin · November 2, 2018