Recent advances in large language models have pushed the boundaries of in-context learning and reasoning, yet bidirectional encoders like BERT have seen comparatively little innovation. In this talk, I introduce NeoBERT, a next-generation encoder that redefines bidirectional modeling through modern architectural improvements, updated data strategies, and optimized pre-training. I will highlight key enhancements, including an optimal depth-to-width ratio, an extended 4,096-token context length, and efficiency-driven modifications that enable NeoBERT to achieve state-of-the-art results with just 250M parameters.
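As a quick illustration of how the model can be used, here is a minimal sketch that loads NeoBERT through the Hugging Face `transformers` library and encodes text with the extended context window. The model identifier `chandar-lab/NeoBERT` and the `trust_remote_code` requirement are assumptions; check the paper and its accompanying repository for the exact name and loading instructions.

```python
# Minimal sketch: loading NeoBERT and encoding text (assumed model identifier).
from transformers import AutoTokenizer, AutoModel

model_id = "chandar-lab/NeoBERT"  # assumed Hub identifier; verify against the paper/repo

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

# Tokenize a passage; NeoBERT's extended context allows sequences up to 4,096 tokens.
inputs = tokenizer(
    "NeoBERT is a next-generation bidirectional encoder.",
    return_tensors="pt",
    truncation=True,
    max_length=4096,
)

outputs = model(**inputs)
embeddings = outputs.last_hidden_state  # shape: [batch, seq_len, hidden_dim]
```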
Additional resources:
- https://arxiv.org/abs/2502.19587 - NeoBERT paper
- https://www.analyticsvidhya.com/blog/2019/09/demystifying-bert-groundbreaking-nlp-framework/ - a primer on BERT terminology