Neural Machine Translation (NMT) models depend heavily on the size of the training dataset. However, it is not always feasible to obtain a large number of parallel sentences for a given pair of languages. This is where data augmentation techniques come to the rescue. In this talk, we discuss various data augmentation techniques used for NMT. We first briefly cover the formal definition and go through the history of these techniques in NMT. We then discuss back-translation, data diversification and cut-off, along with their algorithms and results.

Additional resources:

Data Augmentation for Neural Machine Translation (NMT)