Deep Learning Study Notes - C2W1-15 - Andrew Ng's Interview with Yoshua Bengio

[Andrew] Hi, Yoshua, I'm really glad you could join us here today.

[Yoshua] I'm very glad, too.

[Andrew] Today you're not just a researcher or engineer in deep learning. You've become one of the institutions and one of the icons of deep learning, but I'd really like to hear the story of how it started. So how did you end up getting into deep learning, and then pursuing this journey?

[Yoshua] Right, well, actually, it started when I was a kid, adolescent, reading a lot of science fiction, like, I guess, many of us. And when I started my graduate studies in 1985, I started reading neural net papers, and that's where I got all excited, and it became really a passion.

[Andrew] And actually, what was that like in, what, mid 80s, right, 1985, reading these papers, do you remember?

[Yoshua] Yeah. Well, coming from the courses I had taken in classical AI with expert systems, and suddenly discovering that there was this whole world of thinking about how humans might be learning, and human intelligence, and how we might draw connections between that and artificial intelligence and computers. That was really exciting for me when I discovered this literature, and I started reading the connectionists, of course. So the papers from Geoff Hinton, [INAUDIBLE], and so on. And I worked on recurrent nets, I worked on speech recognition, I worked on HMMs, so graphical models. And then quickly, I moved to AT&T Bell Labs and MIT, where I did postdocs. And that's where I discovered some of the issues with long-term dependencies in training neural nets. And then shortly after, I got recruited at UdeM back in Montreal, where I had spent most of my adolescent years.

[Andrew] So as someone who's been there for the last several decades and seen it all, certainly seen a lot of it, tell me a bit about how your thinking about deep learning, about neural networks, has evolved over this time?

[Yoshua] We start with experiments, with intuitions, and theory sort of comes later. We now understand a lot better, for example, why backprop is working so well, why depth is so important. And these kinds of notions, we didn't have any solid justification for in those days. When we started working on deep nets in the early 2000s, we had the intuition that it made a lot of sense that a deeper network should be more powerful. But we didn't know how to take that and prove it, and of course, our experiments, initially, didn't work.

[Andrew] And actually, what were the most important things that you think turned out to be right? And what were the biggest surprises of what turned out to be wrong, compared to what we knew 30 years ago?

[Yoshua] Sure, so one of the biggest mistakes I made was to think, like everyone else in the 90s, that you needed smooth nonlinearities in order for backprop to work. Because I thought that if we had something like rectifying nonlinearities, where you have a flat part, it would be really hard to train, because the derivative would be zero in so many places. And when we started experimenting with ReLU, with deep nets around 2010, I was obsessed with the idea that we should be careful about whether neurons wouldn't saturate too much on the zero part. But in the end, it turned out that, actually, the ReLU was working a lot better than the sigmoid and tanh, and that was a big surprise. We did this, we explored this, because of the biological connection, actually, not because we thought it would be easier to optimize. But it turned out to work better, whereas I thought it would be harder to train.
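A minimal sketch of the point about derivatives, using NumPy and made-up example inputs (not part of the interview): sigmoid and tanh gradients shrink toward zero whenever a unit saturates on either side, while the ReLU's gradient is exactly 0 or 1, so gradients pass through active units unchanged during backprop.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Arbitrary pre-activation values, from strongly negative to strongly positive.
z = np.array([-5.0, -1.0, 0.5, 5.0])

# Derivatives that backprop multiplies into the gradient:
d_sigmoid = sigmoid(z) * (1.0 - sigmoid(z))   # at most 0.25, near 0 when |z| is large
d_tanh = 1.0 - np.tanh(z) ** 2                # near 0 when |z| is large
d_relu = (z > 0).astype(float)                # exactly 0 for z <= 0, exactly 1 for z > 0

print(d_sigmoid)  # roughly [0.0066 0.1966 0.2350 0.0066]
print(d_tanh)     # roughly [0.0002 0.4200 0.7865 0.0002]
print(d_relu)     # [0. 0. 1. 1.]
```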

[Andrew] So let me ask you, what is the relationship between deep learning and the brain? There's the obvious answer, but I'm curious what's your answer to that?

[Yoshua] Well, the initial insight that really got me excited with neural nets was this idea from the connectionists that information is distributed across the activation of many neurons. Rather than being represented by sort of the grandmother cell, as they were calling it, a symbolic representation. That was the traditional view in classical AI. And I still believe this is a really important thing, and I see people rediscovering the importance of that, even recently. So that was really a foundation. The depth thing is something that came later, in the early 2000s, but it wasn't something I was thinking about in the 90s, for example.

[Andrew] Right, right, and I remember you built a lot of relatively shallow, but very distributed representations for the word embeddings, right, very early on.

[Yoshua] Right, that's right, yeah, that's one of the things that I got really excited about in the late 90s. Actually, my brother, Samy, and I worked on the idea that we could use neural nets to tackle the curse of dimensionality, which was believed to be one of the central issues with statistical learning. The fact that we could have these distributed representations could be used to represent joint distributions over many random variables in a very efficient way. And it turned out to work quite well, and then I extended this to joint distributions over sequences of words, and this is how the word embeddings were born. Because I thought, this will allow generalization across words that have similar semantic meaning.
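A minimal sketch of the idea described above (hypothetical sizes and randomly initialized weights, not the original model): each word gets a learned embedding vector, the embeddings of the context words feed a hidden layer, and a softmax predicts the next word. Because semantically similar words end up with similar embeddings, probability mass generalizes across related word sequences.

```python
import numpy as np

# Hypothetical sizes for illustration only.
vocab_size, embed_dim, context, hidden = 10_000, 50, 3, 100
rng = np.random.default_rng(0)

E = rng.normal(scale=0.01, size=(vocab_size, embed_dim))         # word embedding table
W1 = rng.normal(scale=0.01, size=(context * embed_dim, hidden))  # hidden-layer weights
b1 = np.zeros(hidden)
W2 = rng.normal(scale=0.01, size=(hidden, vocab_size))           # output weights
b2 = np.zeros(vocab_size)

def next_word_probs(context_word_ids):
    """P(next word | previous `context` words) for one example."""
    x = E[context_word_ids].reshape(-1)   # look up and concatenate context embeddings
    h = np.tanh(x @ W1 + b1)              # hidden layer
    logits = h @ W2 + b2
    logits -= logits.max()                # for numerical stability
    p = np.exp(logits)
    return p / p.sum()

p = next_word_probs([12, 7, 345])         # arbitrary word indices
print(p.shape, p.sum())                   # (10000,) 1.0
```

In training, the embedding table E is learned jointly with the rest of the network, which is what pushes words that appear in similar contexts toward nearby points in the embedding space.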
