This document summarizes a lecture on recent advances in deep learning for natural language processing. It discusses architectural improvements such as attention mechanisms and self-attention, which help models capture long-range dependencies and focus on the most relevant parts of the input. It also covers improved training methods that reduce exposure bias and the mismatch between the training loss and the evaluation metric. Newer models presented include the Transformer, which dispenses with recurrence and relies entirely on self-attention, and BERT, a pretrained bidirectional Transformer encoder that achieves state-of-the-art results on many NLP tasks.
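
As a rough illustration of the self-attention idea mentioned above, the sketch below computes single-head scaled dot-product self-attention with NumPy. The sequence length, embedding size, weight matrices, and random inputs are illustrative assumptions, not values from the lecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (seq_len, d_model) sequence of token embeddings.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv        # project inputs to queries, keys, values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)          # pairwise compatibility between positions
    weights = softmax(scores, axis=-1)       # each position attends over all positions
    return weights @ V                       # weighted sum of value vectors

# Toy example: 4 tokens with 8-dimensional embeddings (hypothetical sizes).
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one contextualized vector per input position
```

Because every position attends directly to every other position, information can flow between distant tokens in a single step, which is how self-attention addresses the long-range dependency problem noted in the summary.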