The Brain’s Guide to Dealing with Context in Language Understanding

Like the visual cortex, the regions of the brain involved in understanding language represent information hierarchically. But whereas the visual cortex organizes information into a spatial hierarchy, the language regions encode it in a hierarchy of timescales. This organization is key to our uniquely human ability to integrate semantic information across narratives. Increasingly, deep learning-based approaches to natural language understanding embrace models that incorporate contextual information at varying timescales. This has led not only to state-of-the-art performance on many difficult natural language tasks, but also to breakthroughs in our understanding of brain activity. In this talk, we will discuss the important connection between language understanding and context at different timescales. We will explore how different deep learning architectures capture timescales in language and how closely their encodings mimic the brain. Along the way, we will uncover some surprising discoveries about what depth does and doesn’t buy you in deep recurrent neural networks. We will also describe a new, more flexible way to think about these architectures that eases design space exploration. Finally, we will discuss some of the exciting applications made possible by these breakthroughs.