This lecture discusses techniques for visualizing and understanding convolutional neural networks (CNNs). It begins by visualizing the filters learned in the first layer of CNNs. It then discusses visualizing the activations and feature vectors from higher layers, including dimensionality reduction techniques. Methods are presented for visualizing which pixels or regions are important for classifications using saliency maps. Techniques are also described for generating images that maximally activate neurons using gradient ascent optimization. The goal is to gain insights into what CNNs have learned from images.
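As a rough illustration of the saliency-map idea mentioned above (not code from the lecture), the gradient of a class score with respect to the input pixels can be computed as follows; the choice of VGG-16 and the preprocessing assumptions are mine:

```python
# Hedged sketch: vanilla saliency map = gradient of the class score w.r.t. input pixels.
# Assumes a pretrained torchvision VGG-16 and an already-normalized image tensor.
import torch
import torchvision.models as models

model = models.vgg16(pretrained=True).eval()

def saliency_map(image, target_class):
    """image: (1, 3, H, W) float tensor, normalized for the model."""
    image = image.clone().requires_grad_(True)
    scores = model(image)                      # (1, num_classes) unnormalized class scores
    scores[0, target_class].backward()         # d(score) / d(pixels)
    # take the max absolute gradient over the colour channels -> (H, W) saliency
    return image.grad.detach().abs().max(dim=1)[0].squeeze(0)
```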
The document discusses generative models and summarizes three popular types: PixelRNN/CNN, variational autoencoders (VAEs), and generative adversarial networks (GANs). PixelRNN/CNN are fully visible belief networks that use a neural network to model the probability of each pixel given all previous pixels, explicitly defining the data distribution. VAEs define an explicit but intractable density with latent variables and are trained by maximizing a variational lower bound on the data likelihood. GANs are implicit density models that train a generator and a discriminator in an adversarial game to produce samples from the data distribution.
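For reference, the standard objectives behind these three families (written here from the usual formulations, not quoted from the summary above) are:

```latex
% PixelRNN/CNN: explicit, tractable likelihood via the chain rule over pixels
p_\theta(x) = \prod_{i=1}^{n} p_\theta(x_i \mid x_1, \dots, x_{i-1})

% VAE: explicit but intractable likelihood, trained by maximizing a variational lower bound
\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
  - D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right)

% GAN: no explicit density; generator G and discriminator D play a minimax game
\min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log D(x)]
  + \mathbb{E}_{z \sim p(z)}[\log (1 - D(G(z)))]
```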
This document provides an overview of deep learning, including definitions of AI, machine learning, and deep learning. It discusses neural network models like artificial neural networks, convolutional neural networks, and recurrent neural networks. The document explains key concepts in deep learning like activation functions, pooling techniques, and the inception model. It provides steps for fitting a deep learning model, including loading data, defining the model architecture, adding layers and functions, compiling, and fitting the model. Examples and visualizations are included to demonstrate how neural networks work.
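To make the fitting steps concrete (load data, define the model, add layers, compile, fit), here is a minimal Keras sketch; the dataset, architecture, and hyperparameters are illustrative choices, not taken from the document:

```python
# Minimal sketch of the Keras model-fitting workflow described above.
import tensorflow as tf
from tensorflow import keras

# 1. Load data
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0   # add a channel dimension and scale to [0, 1]
x_test = x_test[..., None] / 255.0

# 2-3. Define the model architecture and add layers
model = keras.Sequential([
    keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    keras.layers.MaxPooling2D(),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation="softmax"),
])

# 4. Compile
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# 5. Fit
model.fit(x_train, y_train, epochs=3, validation_data=(x_test, y_test))
```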
This document summarizes Melanie Swan's presentation on deep learning. It begins by defining key deep learning concepts and techniques, including neural networks, supervised vs. unsupervised learning, and convolutional neural networks. It then explains how deep learning works by using multiple processing layers to extract higher-level features from data and make predictions. Deep learning has various applications such as image recognition and speech recognition. The presentation concludes by discussing how deep learning is inspired by concepts from physics and statistical mechanics.
The document discusses advanced techniques of computational intelligence for biomedical image analysis. It provides an overview of computational intelligence, which involves adaptive mechanisms like artificial neural networks, evolutionary computation, fuzzy systems, and swarm intelligence. These techniques exhibit an ability to learn or adapt to new environments. The document also discusses deep learning techniques like convolutional neural networks and recurrent neural networks that are widely used for tasks like image classification.
https://mcv-m6-video.github.io/deepvideo-2020/
Self-supervised techniques define surrogate tasks to train machine learning algorithms without the need for human-generated labels. This lecture reviews the state of the art in the field for computer vision, including baseline techniques based on visual feature learning from ImageNet data.
Il deep learning ed una nuova generazione di AI (Deep learning and a new generation of AI) - Simone Scardapane, Data Driven Innovation
Deep learning represents a new family of data-driven techniques that open new horizons in what machines can be programmed to do. In just a few years we have seen self-driving cars, robots learning to move, Go champions defeated, and much more. What are the technical, social and scientific challenges of the near future? And, above all, are these technologies within everyone's reach? In this talk we will give a (very brief) overview of these questions and their possible answers.
Deep neural networks have revolutionized the data analytics scene by improving results on several diverse benchmarks with the same recipe: learning feature representations from data. These achievements have raised interest across multiple scientific fields, especially those where large amounts of data and computation are available. This change of paradigm in data analytics has ethical and economic implications that are driving large investments, political debates and resounding press coverage under the generic label of artificial intelligence (AI). This talk will present the fundamentals of deep learning through the classic example of image classification, and point out how the same principle has been adopted for several tasks. Finally, some of the forthcoming potentials and risks of AI will be pointed out.
This document discusses the history and recent developments in artificial intelligence and deep learning. It covers early work in neural networks from the 1950s through the 1990s, including perceptrons, autoencoders, and connectionism. More recent progress is attributed to greater computing power, larger datasets, and the development of automatic differentiation techniques. Applications discussed include computer vision, natural language processing using word embeddings, and recurrent neural networks for tasks like handwriting generation.
This document summarizes techniques for interpreting convolutional neural networks (CNNs). It discusses visualizing learned weights, feature maps, and using attribution methods like class activation maps and gradient-based approaches to identify important regions of input for predictions. Feature visualization techniques are also covered, which generate examples to understand what patterns CNNs recognize. The document provides examples and references to papers for each interpretability method.
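As one concrete example of the attribution methods listed, a class activation map (CAM) weights the last convolutional feature maps by the classifier weights of the target class; this is a generic sketch assuming a global-average-pooling architecture, with made-up shapes:

```python
# Sketch of a class activation map (CAM) for a GAP-based classifier.
import torch

def class_activation_map(feature_maps, fc_weights, target_class):
    # feature_maps: (C, H, W) activations of the last conv layer
    # fc_weights:   (num_classes, C) weights of the final linear layer after global average pooling
    w = fc_weights[target_class]                         # (C,)
    return (w[:, None, None] * feature_maps).sum(dim=0)  # (H, W) map of class evidence per location

cam = class_activation_map(torch.randn(512, 7, 7), torch.randn(1000, 512), target_class=281)
```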
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both algorithmic and computational perspectives.
Deep neural networks have boosted the convergence of multimedia data analytics in a unified framework shared by practitioners in natural language, vision and speech. Image captioning, lip reading or video sonorization are some of the first applications of a new and exciting field of research exploiting the generalization properties of deep neural representations. This tutorial will first review the basic neural architectures used to encode and decode vision, text and audio, and later review those models that have successfully translated information across modalities. The contents of this tutorial are available at: https://telecombcn-dl.github.io/2019-mmm-tutorial/.
https://imatge.upc.edu/web/publications/diving-deep-sentiment-understanding-fine-tuned-cnns-visual-sentiment-prediction
Campos V, Salvador A, Jou B, Giró-i-Nieto X. Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction. In: 1st International Workshop on Affect and Sentiment in Multimedia. Brisbane, Australia: ACM. 2015.
Visual media are powerful means of expressing emotions and sentiments. The constant generation of new content in social networks highlights the need of automated visual sentiment analysis tools. While Convolutional Neural Networks (CNNs) have established a new state-of-the-art in several vision problems, their application to the task of sentiment analysis is mostly unexplored and there are few studies regarding how to design CNNs for this purpose. In this work, we study the suitability of fine-tuning a CNN for visual sentiment prediction as well as explore performance boosting techniques within this deep learning setting. Finally, we provide a deep-dive analysis into a benchmark, state-of-the-art network architecture to gain insight about how to design patterns for CNNs on the task of visual sentiment prediction.
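The fine-tuning setup discussed here follows the standard recipe of reusing a pretrained backbone and retraining a task-specific head; the sketch below is an assumption-laden illustration (ResNet-50 backbone, binary sentiment head), not the paper's exact configuration:

```python
# Hedged sketch: fine-tune a pretrained CNN for binary visual sentiment prediction.
import torch.nn as nn
import torchvision.models as models

model = models.resnet50(pretrained=True)

# Optionally freeze the convolutional backbone and train only the new head.
for p in model.parameters():
    p.requires_grad = False

# Replace the 1000-way ImageNet classifier with a 2-way sentiment head (trains from scratch).
model.fc = nn.Linear(model.fc.in_features, 2)
```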
Deep neural networks have achieved outstanding results in various applications such as vision, language, audio, speech, or reinforcement learning. These powerful function approximators typically require large amounts of data to be trained, which poses a challenge in the usual case where little labeled data is available. During the last year, multiple solutions have been proposed to address this problem, based on the concept of self-supervised learning, which can be understood as a specific case of unsupervised learning. This talk will cover its basic principles and provide examples in the field of multimedia.
Deep neural networks have revolutionized the data analytics scene by improving results on several diverse benchmarks with the same recipe: learning feature representations from data. These achievements have raised interest across multiple scientific fields, especially those where large amounts of data and computation are available. This change of paradigm in data analytics has ethical and economic implications that are driving large investments, political debates and resounding press coverage under the generic label of artificial intelligence (AI). This talk will present the fundamentals of deep learning through the classic example of image classification, and point out how the same principle has been adopted for several tasks. Finally, some of the forthcoming potentials and risks of AI will be pointed out.
https://imatge-upc.github.io/activitynet-2016-cvprw/
This thesis explores different approaches using convolutional and recurrent neural networks to classify and temporally localize activities in videos, and an implementation to achieve this has been proposed. As a first step, features are extracted from video frames using a state-of-the-art 3D convolutional neural network. These features are fed into a recurrent neural network that solves the activity classification and temporal localization tasks in a simple and flexible way. Different architectures and configurations have been tested in order to achieve the best performance and learning on the provided video dataset. In addition, different kinds of post-processing over the trained network's output have been studied to achieve better results on the temporal localization of activities in the videos. The results produced by the neural network developed in this thesis have been submitted to the ActivityNet Challenge 2016 at CVPR, achieving competitive results with a simple and flexible architecture.
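A minimal sketch of the pipeline described in the abstract, with pre-extracted 3D-CNN clip features fed to a recurrent classifier; the feature dimensionality and class count below are assumptions (typical of ActivityNet-style setups), not the thesis's exact values:

```python
# Sketch: per-clip 3D-CNN features -> LSTM -> per-clip activity scores.
import torch
import torch.nn as nn

class ActivityRNN(nn.Module):
    def __init__(self, feat_dim=4096, hidden=512, num_classes=200):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, num_classes + 1)   # +1 for a background class

    def forward(self, clip_features):             # (batch, num_clips, feat_dim)
        hidden_states, _ = self.lstm(clip_features)
        return self.classifier(hidden_states)      # per-clip class scores

scores = ActivityRNN()(torch.randn(2, 16, 4096))   # e.g. 2 videos, 16 clips each
print(scores.shape)                                 # torch.Size([2, 16, 201])
```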
HML: Historical View and Trends of Deep Learning - Yan Xu
The document provides a historical view and trends of deep learning. It discusses how deep learning models have evolved in several waves since the 1940s, with key developments including the backpropagation algorithm in 1986 and deep belief networks with pretraining in 2006. Current trends include growing datasets, increasing numbers of neurons and connections per neuron, and higher accuracy on tasks involving vision, NLP and games. Research trends focus on generative models, domain alignment, meta-learning, using graphs as inputs, and program induction.
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both algorithmic and computational perspectives.
The document summarizes research on deep randomized neural networks. It provides an overview of the field, discussing key concepts such as accuracy, complexity of models, and comparing deep randomized neural networks to other approaches like linear models and SVMs. It also reviews several papers that study properties of randomized neural networks, such as their intrinsic dimension and generalization capabilities. Various applications of randomized networks are explored, such as in classification and time series prediction tasks.
These slides summarize the main trends in deep neural networks for video encoding, including single-frame models, spatiotemporal convolutions, long-term sequence modeling with RNNs, and their combination with optical flow.
Introduction to Generative Adversarial Networks (GANs) by Michał Maj
Full story: https://appsilon.com/satellite-imagery-generation-with-gans/
Deep learning for detecting anomalies and software vulnerabilities - Deakin University
This document provides an overview of deep learning and its applications in anomaly detection and software vulnerability detection. It discusses key deep learning architectures like feedforward networks, recurrent neural networks, and convolutional neural networks. It also covers unsupervised learning techniques such as word embedding, autoencoders, RBMs, and GANs. For anomaly detection, it describes approaches for multichannel anomaly detection, detecting unusual mixed-data co-occurrences, and modeling object lifetimes. It concludes by discussing applications in detecting malicious URLs, unusual source code, and software vulnerabilities.
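One common recipe behind autoencoder-based anomaly detection, of the kind referenced above, is to train on normal data and flag inputs whose reconstruction error is unusually high; the layer sizes and threshold in this sketch are arbitrary:

```python
# Sketch: reconstruction-error anomaly detection with a small dense autoencoder.
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, dim=64, bottleneck=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, bottleneck))
        self.decoder = nn.Sequential(nn.Linear(bottleneck, 32), nn.ReLU(), nn.Linear(32, dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

def is_anomaly(model, x, threshold=0.1):
    # High reconstruction error means the sample looks unlike the (normal) training data.
    with torch.no_grad():
        error = ((model(x) - x) ** 2).mean(dim=-1)
    return error > threshold
```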
The ever-increasing number of parameters in deep neural networks poses challenges for memory-limited applications. Regularize-and-prune methods aim at meeting these challenges by sparsifying the network weights. In this context we quantify the output sensitivity to the parameters (i.e. their relevance to the network output) and introduce a regularization term that gradually lowers the absolute value of parameters with low sensitivity. Thus, a very large fraction of the parameters approach zero and are eventually set to zero by simple thresholding. Our method surpasses most of the recent techniques both in terms of sparsity and error rates. In some cases, the method reaches twice the sparsity obtained by other techniques at equal error rates.
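The regularize-and-prune idea can be sketched as follows: estimate each weight's relevance to the output (here crudely proxied by the magnitude of its gradient), penalize the magnitude of low-sensitivity weights, and finally hard-threshold them to zero. This is a conceptual illustration only, not the paper's exact formulation:

```python
# Hedged sketch of sensitivity-driven regularize-and-prune.
import torch

def sensitivity_penalty(model, strength=1e-4):
    penalty = 0.0
    for p in model.parameters():
        if p.grad is None:
            continue
        sensitivity = p.grad.detach().abs()        # crude proxy for output sensitivity
        insensitivity = 1.0 / (1.0 + sensitivity)  # large where the weight barely matters
        penalty = penalty + strength * (insensitivity * p.abs()).sum()
    return penalty                                  # add this to the task loss before backprop

def prune(model, threshold=1e-3):
    with torch.no_grad():
        for p in model.parameters():
            p.mul_((p.abs() > threshold).float())   # set near-zero weights exactly to zero
    return model
```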
https://mcv-m6-video.github.io/deepvideo-2019/
This lecture provides an overview of how the temporal information encoded in video sequences can be exploited to learn visual features from a self-supervised perspective. Self-supervised learning is a type of unsupervised learning in which the data itself provides the necessary supervision to estimate the parameters of a machine learning algorithm.
Master in Computer Vision Barcelona 2019.
http://pagines.uab.cat/mcv/
https://mcv-m6-video.github.io/deepvideo-2019/
Overview of deep learning solutions for video processing. Part of a series of slides covering topics like action recognition, action detection, object tracking, object detection, scene segmentation, language and learning from videos.
Neuromorphic computing denotes a broad area of research that aims at achieving means of physical information processing inspired by biological brains. As such, these systems are envisaged as the ideal approach for implementing artificial neural network concepts. With the rapid pace of development in deep learning, the synergy between the development of neuromorphic hardware and neural network concepts is fundamental to obtain intelligent systems that can exploit the full potential of learning efficiently.
This talk aims at giving a broad overview of the possibilities of such synergy. First, we will quickly explore the fundamental differences between neuromorphic and traditional computing, and then we will focus on concepts, algorithms, and neural architectures that lend themselves to neuromorphic implementation.
https://telecombcn-dl.github.io/2018-dlai/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both algorithmic and computational perspectives.
Deep learning and applications in non-cognitive domains I - Deakin University
This document outlines an agenda for a presentation on deep learning and its applications in non-cognitive domains. The presentation is divided into three parts: an introduction to deep learning theory, applying deep learning to non-cognitive domains in practice, and advanced topics. The introduction covers neural network architectures like feedforward, recurrent, and convolutional networks. It also discusses techniques for improving training like rectified linear units and skip connections. The practice section will provide hands-on examples in domains like healthcare and software engineering. The advanced topics section will discuss unsupervised learning, structured outputs, and positioning techniques in deep learning.
Vectorland: Brief Notes from Using Text Embeddings for Search - Bhaskar Mitra
(Invited talk at Search Solutions 2015)
A lot of recent work in neural models and “Deep Learning” is focused on learning vector representations for text, image, speech, entities, and other nuggets of information. From word analogies to automatically generating human level descriptions of images, the use of text embeddings has become a key ingredient in many natural language processing (NLP) and information retrieval (IR) tasks.
In this talk, I will present some personal learnings from working on (neural and non-neural) text embeddings for IR, as well as highlight a few key recent insights from the broader academic community. I will talk about the affinity of certain embeddings for certain kinds of tasks, and how the notion of relatedness in an embedding space depends on how the vector representations are trained. The goal of this talk is to encourage everyone to start thinking about text embeddings beyond just as an output of a “black box” machine learning model, and to highlight that the relationships between different embedding spaces are about as interesting as the relationships between items within an embedding space.
This document summarizes the key topics covered on Day 3 of a DL Chatbot seminar, including Seq2Seq models with attention mechanisms, advanced Seq2Seq architectures, and advanced attention mechanisms. Topics covered include RNN encoder-decoder models, attention scoring methods, hierarchical models, personalized embeddings, copying mechanisms, bidirectional attention, self-attention models like the Transformer, and various Seq2Seq implementations in PyTorch. Example papers and concepts are referenced throughout, relating to sequence generation, machine translation, image captioning, and question answering.
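As a reference point for the attention scoring methods mentioned above, the scaled dot-product form used by the Transformer can be written in a few lines; the shapes here are illustrative:

```python
# Sketch of scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    d = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d)   # (..., q_len, k_len) similarity scores
    weights = F.softmax(scores, dim=-1)               # attention distribution over the keys
    return weights @ V                                # weighted sum of the values

out = scaled_dot_product_attention(torch.randn(2, 5, 64),
                                   torch.randn(2, 7, 64),
                                   torch.randn(2, 7, 64))
```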
The document discusses a neural model called Duet for ranking documents based on their relevance to a query. Duet uses both a local model that operates on exact term matches between queries and documents, and a distributed model that learns embeddings to match queries and documents in the embedding space. The two models are combined using a linear combination and trained jointly on labeled query-document pairs. Experimental results show Duet performs significantly better at document ranking and other IR tasks compared to using the local and distributed models individually. The amount of training data is also important, with larger datasets needed to learn better representations.
The document describes a system for estimating emotion intensity in tweets. It takes a lexicon-based and word vector-based approach to create sentence embeddings for tweets. Various regression models are trained and an ensemble is used to predict emotion intensity scores between 0-1 for anger, sadness, joy and fear. The system achieved third place in predicting emotion intensity and second place for intensities over 0.5. Future work involves using contextual sentence embeddings to improve predictions.
One weekend software hack called "Movie Hack Attack".
Video content is played and analyzed in real time for sentiment, emotions, and more.
Sentiment is shown in a chart below; emotions and objects of attention appear to the left, and persons/locations/organizations to the right, each paired with a random picture grabbed from a Google image search.
State of Blockchain 2017: Smartnetworks and the Blockchain Economy - Melanie Swan
Blockchain is a fundamental information technology for secure value transfer over networks. For any asset registered in a cryptographic ledger, the whole Internet becomes a VPN for its confirmation, assurance, and transfer. Blockchain reinvents economics and governance for the digital age. The long-tail structure of digital networks allows personalized economic and governance services. Smartnetworks are a new form of automated global infrastructure for large-scale next-generation projects.
The document describes improvements made to the KyotoEBMT machine translation system. It discusses using forest parsing of input sentences to handle parsing errors and syntactic divergences. It also describes using the Nile alignment tool along with constituent parsing to improve word alignments from the training corpus. New features were added and the reranking was improved by incorporating a neural machine translation-based bilingual language model.
Let's build tomorrow's banking chatbot together! - LINAGORA
Find the slides prepared for our collaborative Meetup of Thursday 9 November 2017: "Let's build tomorrow's banking chatbot together!"
Following the publication of its study on chatbots in the banking ecosystem, "Chatbots and artificial intelligence are coming to banks: are you prepared?", LinDA, the digital agency of the LINAGORA group, ran a co-design workshop on the banking chatbot of tomorrow.
This free ideation workshop was an opportunity to imagine, together with several participants from the banking world, the best conversational-agent solution for their bank.
Our facilitators, Christophe Clouzeau (UX Digital Strategist) and Jean-Philippe Mouton (Head of Digital Consulting), applied UX design methods used with our clients and by innovative startups.
Cs231n 2017 lecture10 Recurrent Neural Networks - Yanbin Kong
The document discusses recurrent neural networks (RNNs). It provides examples of applications of RNNs, such as image captioning, sentiment classification, machine translation, and video classification. RNNs can process sequential data as well as non-sequential data sequentially. The document outlines the basic structure of an RNN, including how it applies the same function and parameters at each time step. It provides illustrations of different RNN configurations, such as many-to-many, many-to-one, and one-to-many. Finally, it gives an example of applying an RNN for character-level language modeling.
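The point that an RNN "applies the same function and parameters at each time step" can be made concrete with the vanilla recurrence h_t = tanh(W_hh h_{t-1} + W_xh x_t + b); the dimensions in this sketch are arbitrary:

```python
# Minimal vanilla RNN step: identical weights are reused at every time step.
import torch

class VanillaRNN:
    def __init__(self, input_dim, hidden_dim):
        self.W_xh = torch.randn(input_dim, hidden_dim) * 0.01
        self.W_hh = torch.randn(hidden_dim, hidden_dim) * 0.01
        self.b = torch.zeros(hidden_dim)

    def step(self, x_t, h_prev):
        # h_t = tanh(W_hh h_{t-1} + W_xh x_t + b)
        return torch.tanh(h_prev @ self.W_hh + x_t @ self.W_xh + self.b)

rnn = VanillaRNN(input_dim=10, hidden_dim=20)
h = torch.zeros(20)
for x_t in torch.randn(5, 10):   # a length-5 input sequence
    h = rnn.step(x_t, h)
```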
Technological Unemployment and the Robo-Economy - Melanie Swan
Technological unemployment (jobs outsourced to technology) is coming, and the challenge is to steward an orderly and beneficial transition to more intense human-technology collaboration.
Exploring Session Context using Distributed Representations of Queries and Reformulations - Bhaskar Mitra
Search logs contain examples of frequently occurring patterns of user reformulations of queries. Intuitively, the reformulation "san francisco" → "san francisco 49ers" is semantically similar to "detroit" →"detroit lions". Likewise, "london"→"things to do in london" and "new york"→"new york tourist attractions" can also be considered similar transitions in intent. The reformulation "movies" → "new movies" and "york" → "new york", however, are clearly different despite the lexical similarities in the two reformulations. In this paper, we study the distributed representation of queries learnt by deep neural network models, such as the Convolutional Latent Semantic Model, and show that they can be used to represent query reformulations as vectors. These reformulation vectors exhibit favourable properties such as mapping semantically and syntactically similar query changes closer in the embedding space. Our work is motivated by the success of continuous space language models in capturing relationships between words and their meanings using offset vectors. We demonstrate a way to extend the same intuition to represent query reformulations.
Furthermore, we show that the distributed representations of queries and reformulations are both useful for modelling session context for query prediction tasks, such as query auto-completion (QAC) ranking. Our empirical study demonstrates that short-term (session) history context features based on these two representations improve the mean reciprocal rank (MRR) for the QAC ranking task by more than 10% over a supervised ranker baseline. Our results also show that by using features based on both of these representations together we achieve better performance than either of them individually.
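A toy illustration of the reformulation-vector idea: represent each reformulation as the offset between the two query embeddings and compare offsets with cosine similarity. The 2-d vectors below are invented purely for illustration and are not from the paper:

```python
# Toy example: query reformulations as offset vectors in an embedding space.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

vec = {
    "san francisco": np.array([1.0, 0.2]),
    "san francisco 49ers": np.array([1.3, 0.9]),
    "detroit": np.array([0.8, 0.1]),
    "detroit lions": np.array([1.1, 0.8]),
}

r1 = vec["san francisco 49ers"] - vec["san francisco"]   # reformulation offset 1
r2 = vec["detroit lions"] - vec["detroit"]                # reformulation offset 2
print(cosine(r1, r2))   # similar intents -> offsets point in similar directions
```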
Paper: http://research.microsoft.com/apps/pubs/default.aspx?id=244728
Deep Learning, an interactive introduction for NLP-ers - Roelof Pieters
Deep Learning intro for NLP Meetup Stockholm
22 January 2015
http://www.meetup.com/Stockholm-Natural-Language-Processing-Meetup/events/219787462/
The document summarizes a study that explored how people's strategies for giving commands to a robot change over time during a collaborative navigation task. Ten participants each directed a robot for one hour via dialogue. Initially, participants predominantly used metric units like distances in their commands, but over time their commands increasingly referred to environmental landmarks. The study collected audio, text, and robot data to analyze parameters in commands. Future work aims to automate dialogue response generation based on this data.
The document summarizes research conducted by NICT at the WAT 2015 workshop. They tested simple translation techniques like reverse pre-reordering for Japanese-to-English and character-based translation for Korean-to-Japanese. The techniques were found to work effectively and the researchers encourage wider use of these techniques if confirmed through human evaluation at the workshop.
Using Text Embeddings for Information Retrieval - Bhaskar Mitra
Neural text embeddings provide dense vector representations of words and documents that encode various notions of semantic relatedness. Word2vec models typical similarity by representing words based on neighboring context words, while models like latent semantic analysis encode topical similarity through co-occurrence in documents. Dual embedding spaces can separately model both typical and topical similarities. Recent work has applied text embeddings to tasks like query auto-completion, session modeling, and document ranking, demonstrating their ability to capture semantic relationships between text beyond just words.
Invited talk at the SIGNLL Conference on Computational Natural Language Learning 2017 (CoNLL 2017), Chris Dyer (DeepMind / CMU), 3 Aug 2017, Vancouver, Canada
Talk at KTH, 14 May 2014, about matrix factorization, different latent and neighborhood models, graphs and energy diffusion for recommender systems, as well as what makes good/bad recommendations.
The document discusses semantic segmentation, object detection, and instance segmentation in computer vision. It introduces semantic segmentation as assigning a category label to each pixel in an image without differentiating object instances. Fully convolutional networks are described as an effective approach for semantic segmentation, where a network of convolutional layers with downsampling and learnable upsampling such as transpose convolutions is used to make dense per-pixel predictions while efficiently reusing computations.
Cs231n 2017 lecture11 Detection and Segmentation - Yanbin Kong
The document discusses semantic segmentation using fully convolutional neural networks. It describes how semantic segmentation differs from classification by labeling each pixel rather than the whole image. Fully convolutional networks are proposed to perform semantic segmentation by using convolutional layers for downsampling and transpose convolutional layers for upsampling to make dense per-pixel predictions efficiently. Various methods for upsampling like unpooling and transpose convolutions are also discussed.
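The downsample-then-upsample pattern described in both summaries can be sketched with a couple of strided convolutions followed by a learnable transpose convolution that restores the input resolution; channel counts and kernel sizes here are illustrative:

```python
# Sketch of a tiny fully convolutional network for per-pixel class scores.
import torch
import torch.nn as nn

num_classes = 21
fcn = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1),    # H -> H/2 (downsampling)
    nn.ReLU(),
    nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),  # H/2 -> H/4
    nn.ReLU(),
    nn.ConvTranspose2d(128, num_classes, kernel_size=4, stride=4),  # H/4 -> H (learned upsampling)
)

scores = fcn(torch.randn(1, 3, 64, 64))
print(scores.shape)   # torch.Size([1, 21, 64, 64]): one score per class per pixel
```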
Recurrent Neural Network and natural languages processing.pdf - DayakerP
The document discusses recurrent neural networks (RNNs) which can process sequential data like text, audio, video. RNNs apply the same function to each element of a sequence using a recurrent hidden state to incorporate information about previous elements. This allows RNNs to model language and generate text. Examples discussed are character-level text generation, machine translation, and processing variable length sequences.
[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-Resolution - Taegyun Jeon
This document summarizes deep learning approaches for single image super-resolution (SISR). It begins with an overview of SISR, describing traditional interpolation-based methods and challenges. It then covers recent developments in using deep convolutional neural networks (CNNs) for SISR, summarizing influential models like SRCNN, VDSR, DRCN, and SRGAN. Various CNN architectures are discussed, including residual blocks and generative adversarial networks. The document also reviews SISR datasets, evaluation metrics, and losses like mean squared error and perceptual losses. In summary, it provides a comprehensive overview of the shift from traditional methods to modern deep learning techniques for single image super resolution.
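The SRCNN model referenced above is small enough to sketch directly: a bicubic-upscaled low-resolution image is refined by three convolutions trained with an MSE loss. The 9-1-5 filter sizes follow the commonly cited configuration; the other details here are assumptions:

```python
# Sketch of SRCNN: patch extraction -> non-linear mapping -> reconstruction.
import torch
import torch.nn as nn

class SRCNN(nn.Module):
    def __init__(self, channels=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, 64, kernel_size=9, padding=4), nn.ReLU(),  # patch extraction
            nn.Conv2d(64, 32, kernel_size=1),                  nn.ReLU(),  # non-linear mapping
            nn.Conv2d(32, channels, kernel_size=5, padding=2),             # reconstruction
        )

    def forward(self, upscaled_lr):   # input already resized (e.g. bicubic) to the target size
        return self.body(upscaled_lr)

loss_fn = nn.MSELoss()                     # pixel-wise loss typical for SRCNN
sr = SRCNN()(torch.randn(1, 1, 33, 33))
```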
(Research Note) Delving deeper into convolutional neural networks for camera relocalization - Jacky Liu
This document summarizes a research paper on improving camera relocalization using convolutional neural networks. The key contributions are: 1) Developing a new orientation representation called Euler6 to solve issues with quaternion representations, 2) Performing pose synthesis to augment training data and address overfitting on sparse poses, and 3) Proposing a branching multi-task CNN called BranchNet to separately regress orientation and translation while sharing lower level features. Experiments on a benchmark dataset show the techniques reduce relocalization error compared to prior methods.
Keeping Up with Recent Research Trends - Focusing on Deep Learning - Hiroshi Fukui
This document summarizes key developments in deep learning for object detection from 2012 onwards. It begins with a timeline showing that 2012 was a turning point, as deep learning achieved record-breaking results in image classification. The document then provides overviews of 250+ contributions relating to object detection frameworks, fundamental problems addressed, evaluation benchmarks and metrics, and state-of-the-art performance. Promising future research directions are also identified.
https://telecombcn-dl.github.io/2019-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.
This paper proposes an automatic attendance system using deep learning frameworks. The system has two phases: face detection and face recognition. For face detection, a deep learning model is used that combines scale, context and resolution to detect faces with high accuracy, even for tiny faces. For face recognition, deep features are extracted from detected faces and used for identification with 98.67% accuracy on LFW database. The system aims to develop an efficient face detection and recognition system to automate the attendance taking process in large classrooms.
This lecture was delivered at the Intelligent Systems and Data Mining workshop held at the Faculty of Computers and Information, Kafer Elshikh University, on Wednesday 6 December 2017.
This document discusses techniques for instance search using convolutional neural network features. It presents two papers by the author on this topic. The first paper uses bags-of-visual-words to encode convolutional features for scalable instance search. The second paper explores using region-level features from Faster R-CNN models for instance search and compares different fine-tuning strategies. The document outlines the methodology, experiments on standard datasets, and conclusions from both papers.
CNN Structure: From LeNet to ShuffleNet - Dalin Zhang
This document summarizes the evolution of CNN architectures from LeNet to more recent models like ShuffleNet. It traces the development of techniques such as skip connections in ResNet to reduce information loss, depthwise separable convolutions in Xception to decouple spatial and channel correlations, group convolutions in ResNeXt to introduce cardinality as a new dimension, and channel shuffling in ShuffleNet to improve information flow across feature maps. The document highlights how these newer models have achieved state-of-the-art accuracy on ImageNet with increasingly efficient structures.
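As an example of the decoupling of spatial and channel correlations attributed to Xception above, a depthwise separable convolution can be written as a per-channel spatial convolution followed by a 1x1 pointwise convolution; this is a generic sketch, not a full Xception block:

```python
# Sketch of a depthwise separable convolution.
import torch.nn as nn

def depthwise_separable(in_ch, out_ch, kernel_size=3):
    return nn.Sequential(
        # depthwise: one spatial filter per input channel (groups = in_ch)
        nn.Conv2d(in_ch, in_ch, kernel_size, padding=kernel_size // 2, groups=in_ch),
        # pointwise: 1x1 convolution mixes information across channels
        nn.Conv2d(in_ch, out_ch, kernel_size=1),
    )
```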
Deep residual learning for image recognition - Yoonho Shin
This document presents a deep residual learning framework for training very deep neural networks for image recognition tasks. The framework addresses the degradation problem that occurs when networks become substantially deeper by introducing identity shortcut connections that skip one or more layers. Experiments show that residual networks are able to train substantially deeper, over 100 layers deep, and achieve state-of-the-art results on image classification and other computer vision tasks compared to traditional networks. The residual learning framework allows networks to learn residual functions with reference to the layer inputs rather than unreferenced functions.
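The identity shortcut described above can be sketched as a basic residual block in which two convolutions learn a residual F(x) that is added back to the input; batch-norm placement and channel counts follow the usual convention rather than details from this summary:

```python
# Sketch of a basic residual block: output = ReLU(F(x) + x).
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))
        return self.relu(residual + x)   # identity shortcut skips the two conv layers
```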
This document discusses accelerating the learning of convolutional neural networks (CNNs) through parallelization using MPI (Message Passing Interface). Specifically, it discusses implementing data parallelism which distributes both the neural network and training data across multiple nodes. The goal is to develop utilities for MPI parallelization in the Torch7 framework to allow CNNs to be trained more quickly on larger datasets and network architectures. Future work involves acquiring training metrics for published CNN architectures and implementing model parallelism.
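Conceptually, data parallelism of the kind described boils down to each worker computing gradients on its own data shard and then averaging them across workers. The sketch below illustrates that step with mpi4py and NumPy; it is not the Torch7 utilities the document refers to:

```python
# Hedged sketch of gradient averaging for data-parallel training with MPI.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD

def average_gradients(local_grads):
    """local_grads: list of NumPy arrays computed on this rank's shard of the data."""
    averaged = []
    for g in local_grads:
        total = np.zeros_like(g)
        comm.Allreduce(g, total, op=MPI.SUM)      # sum this gradient across all ranks
        averaged.append(total / comm.Get_size())  # divide by the number of workers
    return averaged
```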
Modeling perceptual similarity and shift invariance in deep networks - NAVER Engineering
Abstract: While it is nearly effortless for humans to quickly assess the perceptual similarity between two images, the underlying processes are thought to be quite complex. Despite this, the most widely used perceptual metrics today, such as PSNR and SSIM, are simple, shallow functions, and fail to account for many nuances of human perception. Recently, the deep learning community has found that features of the VGG network trained on ImageNet classification have been remarkably useful as a training loss for image synthesis. But how perceptual are these so-called "perceptual losses"? What elements are critical for their success? To answer these questions, we introduce a new dataset of human perceptual similarity judgments. We systematically evaluate deep features across different architectures and tasks and compare them with classic metrics. We find that deep features outperform all previous metrics by large margins on our dataset. More surprisingly, this result is not restricted to ImageNet-trained VGG features, but holds across different deep architectures and levels of supervision (supervised, self-supervised, or even unsupervised). Our results suggest that perceptual similarity is an emergent property shared across deep visual representations.
Despite their strong transfer performance, deep convolutional representations surprisingly lack a basic low-level property -- shift-invariance, as small input shifts or translations can cause drastic changes in the output. Commonly used downsampling methods, such as max-pooling, strided-convolution, and average-pooling, ignore the sampling theorem. The well-known signal processing fix is anti-aliasing by low-pass filtering before downsampling. However, simply inserting this module into deep networks degrades performance; as a result, it is seldomly used today. We show that when integrated correctly, it is compatible with existing architectural components, such as max-pooling and strided-convolution. We observe increased accuracy in ImageNet classification, across several commonly-used architectures, such as ResNet, DenseNet, and MobileNet, indicating effective regularization. Furthermore, we observe better generalization, in terms of stability and robustness to input corruptions. Our results demonstrate that this classical signal processing technique has been undeservingly overlooked in modern deep networks.
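The anti-aliasing fix described above (low-pass filter, then subsample) can be sketched as a fixed per-channel blur applied with stride 2 in place of naive strided downsampling; the 3x3 binomial kernel is one common choice, not necessarily the paper's:

```python
# Sketch of anti-aliased downsampling: blur each channel, then subsample with stride 2.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlurDownsample(nn.Module):
    def __init__(self, channels):
        super().__init__()
        blur = torch.tensor([1.0, 2.0, 1.0])
        kernel = (blur[:, None] * blur[None, :]) / blur.sum() ** 2   # 3x3 binomial low-pass filter
        self.register_buffer("kernel", kernel[None, None].repeat(channels, 1, 1, 1))
        self.channels = channels

    def forward(self, x):
        # groups=channels applies the same fixed blur to every channel independently
        return F.conv2d(x, self.kernel, stride=2, padding=1, groups=self.channels)

out = BlurDownsample(64)(torch.randn(1, 64, 32, 32))   # -> (1, 64, 16, 16)
```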
This document provides an overview of Gerrit Code Review, including:
- Gerrit allows code review of commits before they are integrated into branches through a "push for code review" workflow. It creates changes and change refs for pushed commits.
- Changes can be reviewed through the Gerrit web interface by viewing diffs and leaving comments. Reviewers can vote on changes using configurable review labels like "Code-Review".
- The standard workflow involves pushing commits for review, reviewing and voting on changes, and then submitting approved changes to branches. Gerrit integrates with Git and provides access controls and permissions on repositories.
This document discusses emotional intelligence and emotion coaching. It defines emotional intelligence as the ability to identify and understand one's own emotions, use emotions during social interactions, use emotional awareness to solve problems, deal with frustration, control how emotions are expressed, and keep distress from overwhelming thinking. Emotion coaching is described as a parenting technique where parents accept children's emotions, use emotional moments to teach life lessons, build trust, and help children develop strategies to handle ups and downs. The benefits of emotion coaching include helping children regulate emotions, problem solve, focus attention, and have healthier relationships.
The document discusses the AlexNet convolutional neural network architecture that won the ImageNet challenge in 2012. It describes the layers of AlexNet in detail, including the input and output sizes of each layer. Some key details are that AlexNet had 5 convolutional layers and 3 fully connected layers, used ReLU activations, dropout, data augmentation, and other techniques. It discusses how AlexNet was implemented across two GPUs due to memory constraints. Finally, it briefly mentions subsequent winners of the ImageNet challenge that improved on AlexNet's architecture.
This document provides guesses and estimates about Apple's potential upcoming iPhone 5C smartphone. It speculates that the iPhone 5C will have a plastic shell, iPhone 5 cameras, and a lower-performance processor to reduce costs. Estimates suggest the bill of materials cost could be reduced to $120-$130 by using a Qualcomm Snapdragon processor. This could allow Apple to price the iPhone 5C at $269-$299 while still maintaining a healthy margin of over 50%. The document argues this price would allow Apple to regain lost market share in China from competitors like Xiaomi offering high-spec phones at very low prices.
apidays New York 2025 - How AI is Transforming Product Management by Shereen Moussa - apidays
From Data to Decisions: How AI is Transforming Product Management
Shereen Moussa, Digital Product Owner at PepsiCo
apidays New York 2025
API Management for Surfing the Next Innovation Waves: GenAI and Open Banking
Convene 360 Madison, New York
May 14 & 15, 2025
------
Check out our conferences at https://www.apidays.global/
Do you want to sponsor or talk at one of our conferences?
https://apidays.typeform.com/to/ILJeAaV8
Learn more on APIscene, the global media made by the community for the community:
https://www.apiscene.io
Explore the API ecosystem with the API Landscape:
https://apilandscape.apiscene.io/
Brain, Bytes & Bias: ML Interview Questions You Can’t Miss! - yashikanigam1
Preparing for a machine learning role? Get ready to tackle real-world problem-solving questions! From regression vs. classification to the ETL process, expect a deep dive into algorithms and data pipelines. Most live courses for professionals and best online professional certificates now include mock interviews and case studies to gear you up. Mastering these ML interview questions not only helps in cracking top tech interviews but also builds your confidence.
At Tutort Academy, we train you with real-time scenarios and curated interview prep for success.
How Data Annotation Services Drive Innovation in Autonomous Vehicles.docx - sofiawilliams5966
Autonomous vehicles represent the cutting edge of modern technology, promising to revolutionize transportation by improving safety, efficiency, and accessibility.
Mastering Data Science: Unlocking Insights and Opportunities at Yale IT Skill Hub - smrithimuralidas
The Data Science Course at Yale IT Skill Hub in Coimbatore provides in-depth training in data analysis, machine learning, and AI using Python, R, SQL, and tools like Tableau. Ideal for beginners and professionals, it covers data wrangling, visualization, and predictive modeling through hands-on projects and real-world case studies. With expert-led sessions, flexible schedules, and 100% placement support, this course equips learners with skills for Coimbatore’s booming tech industry. Earn a globally recognized certification to excel in data-driven roles. The Data Analytics Course at Yale IT Skill Hub in Coimbatore offers comprehensive training in data visualization, statistical analysis, and predictive modeling using tools like Power BI, Tableau, Python, and R. Designed for beginners and professionals, it features hands-on projects, expert-led sessions, and real-world case studies tailored to industries like IT and manufacturing. With flexible schedules, 100% placement support, and globally recognized certification, this course equips learners to excel in Coimbatore’s growing data-driven job market.
apidays New York 2025 - Turn API Chaos Into AI-Powered Growth by Jeremy Water...apidays
Turn API Chaos Into AI-Powered Growth
Jeremy Waterkotte, Solutions Consultant, Alliances at Boomi
apidays New York 2025
API Management for Surfing the Next Innovation Waves: GenAI and Open Banking
Convene 360 Madison, New York
May 14 & 15, 2025
------
Check out our conferences at https://www.apidays.global/
Do you want to sponsor or talk at one of our conferences?
https://apidays.typeform.com/to/ILJeAaV8
Learn more on APIscene, the global media made by the community for the community:
https://www.apiscene.io
Explore the API ecosystem with the API Landscape:
https://apilandscape.apiscene.io/
apidays New York 2025 - To tune or not to tune by Anamitra Dutta Majumdar (In...apidays
To tune or not to tune : Benefits and security pitfalls of fine-tuning
Anamitra Dutta Majumdar, Principal Engineer at Intuit
apidays New York 2025
API Management for Surfing the Next Innovation Waves: GenAI and Open Banking
Convene 360 Madison, New York
May 14 & 15, 2025
------
Check out our conferences at https://www.apidays.global/
Do you want to sponsor or talk at one of our conferences?
https://apidays.typeform.com/to/ILJeAaV8
Learn more on APIscene, the global media made by the community for the community:
https://www.apiscene.io
Explore the API ecosystem with the API Landscape:
https://apilandscape.apiscene.io/
Monterey College of Law’s mission is to zseoali2660
Monterey College of Law’s mission is to provide a quality legal education in a community law school setting with graduates who are dedicated to professional excellence, integrity, and community service.
apidays New York 2025 - From UX to AX by Karin Hendrikse (Netlify)apidays
From UX to AX: Designing for an AI Agent World
Karin Hendrikse, Senior Software Engineer at Netlify
apidays New York 2025
API Management for Surfing the Next Innovation Waves: GenAI and Open Banking
Convene 360 Madison, New York
May 14 & 15, 2025
------
Check out our conferences at https://www.apidays.global/
Do you want to sponsor or talk at one of our conferences?
https://apidays.typeform.com/to/ILJeAaV8
Learn more on APIscene, the global media made by the community for the community:
https://www.apiscene.io
Explore the API ecosystem with the API Landscape:
https://apilandscape.apiscene.io/
BADS-MBA-Unit 1 that what data science and Interpretationsrishtisingh1813
Cs231n 2017 lecture12 Visualizing and Understanding
1. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 12 - May 16, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 12 - May 16, 20171
Lecture 12:
Visualizing and Understanding
2. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 20172
Administrative
Milestones due tonight on Canvas, 11:59pm
Midterm grades released on Gradescope this week
A3 due next Friday, 5/26
HyperQuest deadline extended to Sunday 5/21, 11:59pm
Poster session is June 6
3. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 20173
Last Time: Lots of Computer Vision Tasks
Classification + Localization (single object): CAT
Semantic Segmentation (no objects, just pixels): GRASS, CAT, TREE, SKY
Object Detection (multiple objects): DOG, DOG, CAT
Instance Segmentation (multiple objects): DOG, DOG, CAT
(Example images are CC0 public domain)
4. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 20174
This image is CC0 public domain
Class Scores:
1000 numbers
What’s going on inside ConvNets?
Input Image:
3 x 224 x 224
What are the intermediate features looking for?
Krizhevsky et al, “ImageNet Classification with Deep Convolutional Neural Networks”, NIPS 2012.
Figure reproduced with permission.
5. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 20175
First Layer: Visualize Filters
AlexNet:
64 x 3 x 11 x 11
ResNet-18:
64 x 3 x 7 x 7
ResNet-101:
64 x 3 x 7 x 7
DenseNet-121:
64 x 3 x 7 x 7
Krizhevsky, “One weird trick for parallelizing convolutional neural networks”, arXiv 2014
He et al, “Deep Residual Learning for Image Recognition”, CVPR 2016
Huang et al, “Densely Connected Convolutional Networks”, CVPR 2017
6. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 20176
Visualize the
filters/kernels
(raw weights)
We can visualize
filters at higher
layers, but not
that interesting
(these are taken
from ConvNetJS
CIFAR-10
demo)
layer 1 weights
layer 2 weights
layer 3 weights
16 x 3 x 7 x 7
20 x 16 x 7 x 7
20 x 20 x 7 x 7
7. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 20177
Last Layer (FC7)
4096-dimensional feature vector for an image
(layer immediately before the classifier)
Run the network on many images, collect the feature vectors
8. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 20178
Last Layer: Nearest Neighbors
Test image L2 Nearest neighbors in feature space
4096-dim vector
Recall: Nearest neighbors
in pixel space
Krizhevsky et al, “ImageNet Classification with Deep Convolutional Neural Networks”, NIPS 2012.
Figures reproduced with permission.
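A minimal sketch of this idea in PyTorch: collect FC7 features from a pretrained AlexNet and find L2 nearest neighbors in feature space. The dataset path "image_folder" is a placeholder, and the `weights=` argument assumes a recent torchvision; this is an illustration, not the lecture's original code.

```python
# Sketch: collect FC7 features from a pretrained AlexNet and find L2 nearest
# neighbors in feature space. "image_folder" is a hypothetical dataset path.
import torch
import torchvision
from torchvision import transforms

model = torchvision.models.alexnet(weights="IMAGENET1K_V1").eval()
# Drop the final classifier layer so the forward pass ends at the 4096-d FC7 activations.
model.classifier = torch.nn.Sequential(*list(model.classifier.children())[:-1])

preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
dataset = torchvision.datasets.ImageFolder("image_folder", transform=preprocess)
loader = torch.utils.data.DataLoader(dataset, batch_size=64)

feats = []
with torch.no_grad():
    for images, _ in loader:
        feats.append(model(images))          # (B, 4096) FC7 features
feats = torch.cat(feats)                     # (N, 4096)

# Nearest neighbors of image 0 in feature space (index 0 is the image itself).
dists = torch.cdist(feats[0:1], feats)       # L2 distances, shape (1, N)
nearest = dists.argsort(dim=1)[0, 1:6]       # indices of the 5 closest images
```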
9. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 20179
Last Layer: Dimensionality Reduction
Van der Maaten and Hinton, “Visualizing Data using t-SNE”, JMLR 2008
Figure copyright Laurens van der Maaten and Geoff Hinton, 2008. Reproduced with permission.
Visualize the “space” of FC7
feature vectors by reducing
dimensionality of vectors from
4096 to 2 dimensions
Simple algorithm: Principal
Component Analysis (PCA)
More complex: t-SNE
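A small sketch of both options using scikit-learn, assuming the FC7 features are already collected (a random array stands in for them here). Reducing to ~50 dimensions with PCA before t-SNE is a common speed trick, not part of the original slides.

```python
# Sketch: reduce 4096-d FC7 features to 2-D with PCA or t-SNE for plotting.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

feats_np = np.random.randn(1000, 4096)   # placeholder for real FC7 features

coords_pca = PCA(n_components=2).fit_transform(feats_np)
# t-SNE is slow on raw 4096-d vectors; PCA to ~50 dims first is a common trick.
coords_tsne = TSNE(n_components=2, init="pca", perplexity=30).fit_transform(
    PCA(n_components=50).fit_transform(feats_np))

plt.scatter(coords_tsne[:, 0], coords_tsne[:, 1], s=3)
plt.title("t-SNE of FC7 features")
plt.show()
```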
10. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201710
Last Layer: Dimensionality Reduction
Van der Maaten and Hinton, “Visualizing Data using t-SNE”, JMLR 2008
Krizhevsky et al, “ImageNet Classification with Deep Convolutional Neural Networks”, NIPS 2012.
Figure reproduced with permission.
See high-resolution versions at
http://cs.stanford.edu/people/karpathy/cnnembed/
11. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201711
Visualizing Activations
Yosinski et al, “Understanding Neural Networks Through Deep Visualization”, ICML DL Workshop 2014.
Figure copyright Jason Yosinski, 2014. Reproduced with permission.
conv5 feature map is
128x13x13; visualize
as 128 13x13
grayscale images
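One way to reproduce this visualization, sketched with a forward hook on torchvision's AlexNet: its conv5 layer sits at `features[10]` and gives a 256 x 13 x 13 map (the 128 x 13 x 13 figure above is for a different network). The input image here is a random placeholder.

```python
# Sketch: grab a conv5 feature map with a forward hook and show each channel
# as a grayscale image.
import torch, torchvision
import matplotlib.pyplot as plt

model = torchvision.models.alexnet(weights="IMAGENET1K_V1").eval()
activations = {}
model.features[10].register_forward_hook(
    lambda module, inp, out: activations.__setitem__("conv5", out.detach()))

image = torch.randn(1, 3, 224, 224)       # placeholder for a preprocessed image
with torch.no_grad():
    model(image)

fmap = activations["conv5"][0]            # (C, H, W), e.g. 256 x 13 x 13
fig, axes = plt.subplots(16, fmap.shape[0] // 16, figsize=(12, 12))
for channel, ax in zip(fmap, axes.flat):
    ax.imshow(channel.numpy(), cmap="gray")
    ax.axis("off")
plt.show()
```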
12. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201712
Maximally Activating Patches
Pick a layer and a channel; e.g. conv5 is
128 x 13 x 13, pick channel 17/128
Run many images through the network,
record values of chosen channel
Visualize image patches that correspond
to maximal activations
Springenberg et al, “Striving for Simplicity: The All Convolutional Net”, ICLR Workshop 2015
Figure copyright Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, Martin Riedmiller, 2015;
reproduced with permission.
13. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201713
Occlusion Experiments
Mask part of the image before
feeding to CNN, draw heatmap of
probability at each mask location
Zeiler and Fergus, “Visualizing and Understanding Convolutional
Networks”, ECCV 2014
Boat image is CC0 public domain
Elephant image is CC0 public domain
Go-Karts image is CC0 public domain
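A minimal sketch of the occlusion experiment: slide a gray patch over the image and record the class probability at each location. Low values mean the occluded region was important. `model`, a preprocessed `image`, and `target_class` are assumed to exist; patch size and stride are illustrative.

```python
# Sketch: occlusion heatmap of class probability vs. mask location.
import torch

def occlusion_heatmap(model, image, target_class, patch=32, stride=16):
    model.eval()
    _, _, H, W = image.shape
    heatmap = torch.zeros((H - patch) // stride + 1, (W - patch) // stride + 1)
    with torch.no_grad():
        for i, y in enumerate(range(0, H - patch + 1, stride)):
            for j, x in enumerate(range(0, W - patch + 1, stride)):
                occluded = image.clone()
                occluded[:, :, y:y + patch, x:x + patch] = 0.5   # gray patch
                probs = torch.softmax(model(occluded), dim=1)
                heatmap[i, j] = probs[0, target_class]
    return heatmap   # low values = occluding here hurts the prediction
```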
14. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201714
Saliency Maps
Dog
How to tell which pixels matter for classification?
Simonyan, Vedaldi, and Zisserman, “Deep Inside Convolutional Networks: Visualising Image Classification Models
and Saliency Maps”, ICLR Workshop 2014.
Figures copyright Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman, 2014; reproduced with permission.
15. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201715
Saliency Maps
Dog
How to tell which pixels matter for classification?
Compute gradient of (unnormalized) class
score with respect to image pixels, take
absolute value and max over RGB channels
Simonyan, Vedaldi, and Zisserman, “Deep Inside Convolutional Networks: Visualising Image Classification Models
and Saliency Maps”, ICLR Workshop 2014.
Figures copyright Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman, 2014; reproduced with permission.
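A sketch of the vanilla saliency map described above: backprop the unnormalized class score to the pixels, take the absolute value, and max over RGB channels. `model`, `image`, and `target_class` are assumed inputs.

```python
# Sketch: saliency map via the gradient of the class score w.r.t. the pixels.
import torch

def saliency_map(model, image, target_class):
    model.eval()
    image = image.clone().requires_grad_(True)
    score = model(image)[0, target_class]    # unnormalized class score
    score.backward()
    # (1, 3, H, W) -> (H, W): absolute value, then max over channels
    return image.grad.abs().max(dim=1)[0].squeeze(0)
```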
16. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201716
Saliency Maps
Simonyan, Vedaldi, and Zisserman, “Deep Inside Convolutional Networks: Visualising Image Classification Models
and Saliency Maps”, ICLR Workshop 2014.
Figures copyright Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman, 2014; reproduced with permission.
17. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201717
Saliency Maps: Segmentation without supervision
Simonyan, Vedaldi, and Zisserman, “Deep Inside Convolutional Networks: Visualising Image Classification Models
and Saliency Maps”, ICLR Workshop 2014.
Figures copyright Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman, 2014; reproduced with permission.
Rother et al, “Grabcut: Interactive foreground extraction using iterated graph cuts”, ACM TOG 2004
Use GrabCut on
saliency map
18. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201718
Intermediate Features via (guided) backprop
Zeiler and Fergus, “Visualizing and Understanding Convolutional Networks”, ECCV 2014
Springenberg et al, “Striving for Simplicity: The All Convolutional Net”, ICLR Workshop 2015
Pick a single intermediate neuron, e.g. one
value in 128 x 13 x 13 conv5 feature map
Compute gradient of neuron value with respect
to image pixels
19. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201719
Intermediate features via (guided) backprop
Pick a single intermediate neuron, e.g. one
value in 128 x 13 x 13 conv5 feature map
Compute gradient of neuron value with respect
to image pixels
Images come out nicer if you only
backprop positive gradients through
each ReLU (guided backprop)
ReLU
Zeiler and Fergus, “Visualizing and Understanding Convolutional Networks”, ECCV 2014
Springenberg et al, “Striving for Simplicity: The All Convolutional Net”, ICLR Workshop 2015
Figure copyright Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas
Brox, Martin Riedmiller, 2015; reproduced with permission.
20. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201720
Intermediate features via (guided) backprop
Zeiler and Fergus, “Visualizing and Understanding Convolutional Networks”, ECCV 2014
Springenberg et al, “Striving for Simplicity: The All Convolutional Net”, ICLR Workshop 2015
Figure copyright Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, Martin Riedmiller, 2015; reproduced with permission.
21. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201721
Visualizing CNN features: Gradient Ascent
(Guided) backprop:
Find the part of an
image that a neuron
responds to
Gradient ascent:
Generate a synthetic
image that maximally
activates a neuron
I* = arg max_I f(I) + R(I), where f(I) is the neuron value and R(I) is a natural image regularizer
22. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201722
Visualizing CNN features: Gradient Ascent
Objective: I* = arg max_I S_c(I) − λ‖I‖₂², where S_c(I) is the score for class c (before softmax) and the L2 term regularizes the image; optimization starts from a zero image.
1. Initialize image to zeros
Repeat:
2. Forward image to compute current scores
3. Backprop to get gradient of neuron value with respect to image pixels
4. Make a small update to the image
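A minimal sketch of this loop under the stated objective: start from a zero image and repeatedly take ascent steps on S_c(I) − λ‖I‖². The hyperparameters are illustrative, not the values used in the lecture.

```python
# Sketch: gradient ascent on the input image to maximize a class score.
import torch

def class_visualization(model, target_class, steps=200, lr=1.0, l2_reg=1e-3):
    model.eval()
    img = torch.zeros(1, 3, 224, 224, requires_grad=True)   # 1. zero image
    for _ in range(steps):
        score = model(img)[0, target_class]                  # 2. forward pass
        objective = score - l2_reg * (img ** 2).sum()        #    S_c(I) - lambda*||I||^2
        objective.backward()                                  # 3. gradient on pixels
        with torch.no_grad():
            img += lr * img.grad                              # 4. small ascent step
            img.grad.zero_()
    return img.detach()
```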
23. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201723
Visualizing CNN features: Gradient Ascent
Simonyan, Vedaldi, and Zisserman, “Deep Inside Convolutional Networks: Visualising Image Classification
Models and Saliency Maps”, ICLR Workshop 2014.
Figures copyright Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman, 2014; reproduced with permission.
Simple regularizer: Penalize L2
norm of generated image
24. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201724
Visualizing CNN features: Gradient Ascent
Simonyan, Vedaldi, and Zisserman, “Deep Inside Convolutional Networks: Visualising Image Classification
Models and Saliency Maps”, ICLR Workshop 2014.
Figures copyright Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman, 2014; reproduced with permission.
Simple regularizer: Penalize L2
norm of generated image
25. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201725
Visualizing CNN features: Gradient Ascent
Simple regularizer: Penalize L2
norm of generated image
Yosinski et al, “Understanding Neural Networks Through Deep Visualization”, ICML DL Workshop 2014.
Figure copyright Jason Yosinski, Jeff Clune, Anh Nguyen, Thomas Fuchs, and Hod Lipson, 2014.
Reproduced with permission.
26. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201726
Visualizing CNN features: Gradient Ascent
Better regularizer: Penalize L2 norm of
image; also during optimization
periodically
(1) Gaussian blur image
(2) Clip pixels with small values to 0
(3) Clip pixels with small gradients to 0
Yosinski et al, “Understanding Neural Networks Through Deep Visualization”, ICML DL Workshop 2014.
27. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201727
Visualizing CNN features: Gradient Ascent
Better regularizer: Penalize L2 norm of
image; also during optimization
periodically
(1) Gaussian blur image
(2) Clip pixels with small values to 0
(3) Clip pixels with small gradients to 0
Yosinski et al, “Understanding Neural Networks Through Deep Visualization”, ICML DL Workshop 2014.
Figure copyright Jason Yosinski, Jeff Clune, Anh Nguyen, Thomas Fuchs, and Hod Lipson, 2014. Reproduced with permission.
28. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201728
Visualizing CNN features: Gradient Ascent
Better regularizer: Penalize L2 norm of
image; also during optimization
periodically
(1) Gaussian blur image
(2) Clip pixels with small values to 0
(3) Clip pixels with small gradients to 0
Yosinski et al, “Understanding Neural Networks Through Deep Visualization”, ICML DL Workshop 2014.
Figure copyright Jason Yosinski, Jeff Clune, Anh Nguyen, Thomas Fuchs, and Hod Lipson, 2014. Reproduced with permission.
29. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201729
Visualizing CNN features: Gradient Ascent
Use the same approach to visualize intermediate features
Yosinski et al, “Understanding Neural Networks Through Deep Visualization”, ICML DL Workshop 2014.
Figure copyright Jason Yosinski, Jeff Clune, Anh Nguyen, Thomas Fuchs, and Hod Lipson, 2014. Reproduced with permission.
30. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201730
Visualizing CNN features: Gradient Ascent
Use the same approach to visualize intermediate features
Yosinski et al, “Understanding Neural Networks Through Deep Visualization”, ICML DL Workshop 2014.
Figure copyright Jason Yosinski, Jeff Clune, Anh Nguyen, Thomas Fuchs, and Hod Lipson, 2014. Reproduced with permission.
31. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201731
Visualizing CNN features: Gradient Ascent
Adding “multi-faceted” visualization gives even nicer results:
(Plus more careful regularization, center-bias)
Nguyen et al, “Multifaceted Feature Visualization: Uncovering the Different Types of Features Learned By Each Neuron in Deep Neural Networks”, ICML Visualization for Deep Learning Workshop 2016.
Figures copyright Anh Nguyen, Jason Yosinski, and Jeff Clune, 2016; reproduced with permission.
32. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201732
Visualizing CNN features: Gradient Ascent
Nguyen et al, “Multifaceted Feature Visualization: Uncovering the Different Types of Features Learned By Each Neuron in Deep Neural Networks”, ICML Visualization for Deep Learning Workshop 2016.
Figures copyright Anh Nguyen, Jason Yosinski, and Jeff Clune, 2016; reproduced with permission.
33. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201733
Visualizing CNN features: Gradient Ascent
Nguyen et al, “Synthesizing the preferred inputs for neurons in neural networks via deep generator networks,” NIPS 2016
Figure copyright Nguyen et al, 2016; reproduced with permission.
Optimize in FC6 latent space instead of pixel space:
34. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201734
Fooling Images / Adversarial Examples
(1) Start from an arbitrary image
(2) Pick an arbitrary class
(3) Modify the image to maximize the class
(4) Repeat until network is fooled
35. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201735
Fooling Images / Adversarial Examples
Boat image is CC0 public domain
Elephant image is CC0 public domain
36. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201736
Fooling Images / Adversarial Examples
Boat image is CC0 public domain
Elephant image is CC0 public domain
What is going on? Ian Goodfellow will explain
37. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201737 37
DeepDream: Amplify existing features
Rather than synthesizing an image to maximize a specific neuron, instead
try to amplify the neuron activations at some layer in the network
Choose an image and a layer in a CNN; repeat:
1. Forward: compute activations at chosen layer
2. Set gradient of chosen layer equal to its activation
3. Backward: Compute gradient on image
4. Update image
Mordvintsev, Olah, and Tyka, “Inceptionism: Going Deeper into Neural
Networks”, Google Research Blog. Images are licensed under CC-BY
4.0
38. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201738 38
DeepDream: Amplify existing features
Rather than synthesizing an image to maximize a specific neuron, instead
try to amplify the neuron activations at some layer in the network
Equivalent to: I* = arg max_I Σ_i f_i(I)²
Mordvintsev, Olah, and Tyka, “Inceptionism: Going Deeper into Neural
Networks”, Google Research Blog. Images are licensed under CC-BY
4.0
Choose an image and a layer in a CNN; repeat:
1. Forward: compute activations at chosen layer
2. Set gradient of chosen layer equal to its activation
3. Backward: Compute gradient on image
4. Update image
42. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201742
DeepDream: Amplify existing features
Code is very simple but
it uses a couple tricks:
(Code is licensed under Apache 2.0)
Jitter image
L1 Normalize gradients
Clip pixel values
Also uses multiscale processing for a fractal effect (not shown)
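A sketch of one DeepDream update combining the procedure and the tricks listed above (jitter, gradient set to the activation, L1-normalized gradients, pixel clipping; multiscale processing omitted). `model`, a module `layer` inside it, and an ImageNet-normalized `img` tensor are assumed; the clipping range and step size are illustrative.

```python
# Sketch: one DeepDream step on an image tensor of shape (1, 3, H, W).
import torch

def deepdream_step(model, layer, img, lr=1.5, jitter=32):
    acts = {}
    handle = layer.register_forward_hook(
        lambda m, i, o: acts.__setitem__("out", o))
    # Jitter: randomly shift the image before the forward pass.
    ox, oy = torch.randint(-jitter, jitter + 1, (2,)).tolist()
    img = torch.roll(img, shifts=(ox, oy), dims=(2, 3)).requires_grad_(True)

    model(img)
    # Setting the layer's gradient equal to its activation is the same as
    # maximizing 0.5 * sum(activation^2).
    (0.5 * acts["out"].pow(2).sum()).backward()
    handle.remove()

    with torch.no_grad():
        g = img.grad
        g = g / (g.abs().mean() + 1e-8)                         # L1-normalize gradients
        img = img + lr * g                                      # ascent step
        img = torch.roll(img, shifts=(-ox, -oy), dims=(2, 3))   # undo jitter
        img = img.clamp(-1.5, 1.5)   # clip pixel values (assumes normalized pixels)
    return img
```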
43. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201743
Sky image is licensed under CC-BY SA 3.0
44. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201744
Image is licensed under CC-BY 4.0
45. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201745
Image is licensed under CC-BY 4.0
46. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201746
Image is licensed under CC-BY 3.0
47. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201747
Image is licensed under CC-BY 3.0
48. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201748
Image is licensed under CC-BY 4.0
49. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201749
Feature Inversion
Given a CNN feature vector for an image, find a new image that:
- Matches the given feature vector
- “looks natural” (image prior regularization)
Mahendran and Vedaldi, “Understanding Deep Image Representations by Inverting Them”, CVPR 2015
Objective: minimize ‖Φ(x) − Φ₀‖² + λ R_TV(x), where Φ₀ is the given feature vector, Φ(x) are the features of the new image, and R_TV is a total variation regularizer (encourages spatial smoothness)
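A sketch of feature inversion with this objective: gradient descent on a new image so its features match the target, plus a total-variation term for smoothness. `feature_extractor` (image → feature vector) and the target `phi0` are assumed inputs; hyperparameters are illustrative.

```python
# Sketch: invert a CNN feature vector by optimizing an image.
import torch

def total_variation(img):
    # Sum of squared differences between neighboring pixels.
    return ((img[:, :, 1:, :] - img[:, :, :-1, :]) ** 2).sum() + \
           ((img[:, :, :, 1:] - img[:, :, :, :-1]) ** 2).sum()

def invert_features(feature_extractor, phi0, steps=500, lr=0.05, tv_weight=1e-4):
    img = torch.randn(1, 3, 224, 224, requires_grad=True)
    optimizer = torch.optim.Adam([img], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        phi = feature_extractor(img)
        loss = (phi - phi0).pow(2).sum() / phi0.pow(2).sum() \
               + tv_weight * total_variation(img)
        loss.backward()
        optimizer.step()
    return img.detach()
```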
50. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201750
Feature Inversion
Reconstructing from different layers of VGG-16
Mahendran and Vedaldi, “Understanding Deep Image Representations by Inverting Them”, CVPR 2015
Figure from Johnson, Alahi, and Fei-Fei, “Perceptual Losses for Real-Time Style Transfer and Super-Resolution”, ECCV 2016. Copyright Springer, 2016.
Reproduced for educational purposes.
51. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201751
Texture Synthesis
Given a sample patch of some texture, can we
generate a bigger image of the same texture?
Input
Output
Output image is licensed under the MIT license
52. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201752
Texture Synthesis: Nearest Neighbor
Generate pixels one at a time in
scanline order; form neighborhood
of already generated pixels and
copy nearest neighbor from input
Wei and Levoy, “Fast Texture Synthesis using Tree-structured Vector Quantization”, SIGGRAPH 2000
Efros and Leung, “Texture Synthesis by Non-parametric Sampling”, ICCV 1999
53. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201753
Texture Synthesis: Nearest Neighbor
Images licensed under the MIT license
54. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201754
Neural Texture Synthesis: Gram Matrix
Each layer of CNN gives C x H x W tensor of
features; H x W grid of C-dimensional vectors
This image is in the public domain.
w
H
C
55. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201755
Neural Texture Synthesis: Gram Matrix
Each layer of CNN gives C x H x W tensor of
features; H x W grid of C-dimensional vectors
Outer product of two C-dimensional vectors
gives C x C matrix measuring co-occurrence
This image is in the public domain.
w
H
C
C
C
56. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201756
Neural Texture Synthesis: Gram Matrix
Each layer of CNN gives C x H x W tensor of
features; H x W grid of C-dimensional vectors
Outer product of two C-dimensional vectors
gives C x C matrix measuring co-occurrence
Average over all HW pairs of vectors, giving
Gram matrix of shape C x C
This image is in the public domain.
w
H
C
C
C
Gram Matrix
57. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201757
Neural Texture Synthesis: Gram Matrix
Each layer of CNN gives C x H x W tensor of
features; H x W grid of C-dimensional vectors
Outer product of two C-dimensional vectors
gives C x C matrix measuring co-occurrence
Average over all HW pairs of vectors, giving
Gram matrix of shape C x C
This image is in the public domain.
w
H
C
C
C
Efficient to compute: reshape features from C × H × W to a matrix F of shape C × HW, then compute G = FFᵀ
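That reshaping trick is a few lines in PyTorch; normalizing by the number of spatial positions (as below) is a common convention, not something the slide specifies.

```python
# Sketch: Gram matrix of a C x H x W feature map via F @ F^T.
import torch

def gram_matrix(features):                 # features: (C, H, W)
    C, H, W = features.shape
    F = features.reshape(C, H * W)         # C x HW
    return F @ F.t() / (H * W)             # C x C co-occurrence statistics
```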
58. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201758
Gatys, Ecker, and Bethge, “Texture Synthesis Using Convolutional Neural Networks”, NIPS 2015
Figure copyright Leon Gatys, Alexander S. Ecker, and Matthias Bethge, 2015. Reproduced with permission.
Neural Texture Synthesis
1. Pretrain a CNN on ImageNet (VGG-19)
2. Run input texture forward through CNN, record activations on every layer; layer i gives a feature map of shape Ci × Hi × Wi
3. At each layer compute the Gram matrix giving the outer product of features (shape Ci × Ci)
4. Initialize generated image from random noise
5. Pass generated image through CNN, compute Gram matrix on each layer
6. Compute loss: weighted sum of L2 distances between Gram matrices
7. Backprop to get gradient on image
8. Make gradient step on image
9. GOTO 5
(a minimal code sketch follows this list)
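A sketch of that loop. `vgg_layers` is assumed to be a list of feature extractors, one per chosen layer, each mapping an image to that layer's activations; `texture` is the input texture image. The uniform (unweighted) sum of Gram losses and the hyperparameters are simplifications.

```python
# Sketch: neural texture synthesis by matching Gram matrices.
import torch

def gram(feats):                                     # feats: (1, C, H, W)
    _, C, H, W = feats.shape
    F = feats.reshape(C, H * W)
    return F @ F.t() / (H * W)

def synthesize_texture(vgg_layers, texture, steps=500, lr=0.05):
    with torch.no_grad():                            # steps 2-3: target Grams
        targets = [gram(layer(texture)) for layer in vgg_layers]
    img = torch.randn_like(texture, requires_grad=True)   # step 4: random init
    optimizer = torch.optim.Adam([img], lr=lr)
    for _ in range(steps):                           # steps 5-9
        optimizer.zero_grad()
        loss = sum((gram(layer(img)) - t).pow(2).sum()
                   for layer, t in zip(vgg_layers, targets))
        loss.backward()
        optimizer.step()
    return img.detach()
```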
62. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201762
Neural Texture Synthesis
Gatys, Ecker, and Bethge, “Texture Synthesis Using Convolutional Neural Networks”, NIPS 2015
Figure copyright Leon Gatys, Alexander S. Ecker, and Matthias Bethge, 2015. Reproduced with permission.
Reconstructing texture from
higher layers recovers
larger features from the
input texture
63. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201763
Neural Texture Synthesis: Texture = Artwork
Texture synthesis
(Gram
reconstruction)
Figure from Johnson, Alahi, and Fei-Fei, “Perceptual
Losses for Real-Time Style Transfer and
Super-Resolution”, ECCV 2016. Copyright Springer, 2016.
Reproduced for educational purposes.
64. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201764
Neural Style Transfer: Feature + Gram
Reconstruction
Feature
reconstruction
Texture synthesis
(Gram
reconstruction)
Figure from Johnson, Alahi, and Fei-Fei, “Perceptual
Losses for Real-Time Style Transfer and
Super-Resolution”, ECCV 2016. Copyright Springer, 2016.
Reproduced for educational purposes.
65. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201765
Neural Style Transfer
Content Image Style Image
+
This image is licensed under CC-BY 3.0 Starry Night by Van Gogh is in the public domain
Gatys, Ecker, and Bethge, “Texture Synthesis Using Convolutional Neural Networks”, NIPS 2015
66. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201766
Neural Style Transfer
Content Image Style Image Style Transfer!
+ =
This image is licensed under CC-BY 3.0 Starry Night by Van Gogh is in the public domain This image copyright Justin Johnson, 2015. Reproduced with
permission.
Gatys, Ecker, and Bethge, “Image style transfer using convolutional neural networks”, CVPR 2016
67. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201767
Figure: a style image and a content image are combined into an output image (optimization starts from noise)
Gatys, Ecker, and Bethge, “Image style transfer using convolutional neural networks”, CVPR 2016
Figure adapted from Johnson, Alahi, and Fei-Fei, “Perceptual Losses for Real-Time Style Transfer and
Super-Resolution”, ECCV 2016. Copyright Springer, 2016. Reproduced for educational purposes.
68. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201768
Figure: style image, content image, and the resulting output image
Gatys, Ecker, and Bethge, “Image style transfer using convolutional neural networks”, CVPR 2016
Figure adapted from Johnson, Alahi, and Fei-Fei, “Perceptual Losses for Real-Time Style Transfer and
Super-Resolution”, ECCV 2016. Copyright Springer, 2016. Reproduced for educational purposes.
69. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201769
Neural Style Transfer
Gatys, Ecker, and Bethge, “Image style transfer using convolutional neural networks”, CVPR 2016
Figure copyright Justin Johnson, 2015.
Example outputs from
my implementation
(in Torch)
70. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201770
Neural Style Transfer
Outputs range from more weight on the content loss to more weight on the style loss
71. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201771
Neural Style Transfer
Resizing the style image before running the style transfer algorithm can transfer different types of features (compare a larger vs. a smaller style image)
Gatys, Ecker, and Bethge, “Image style transfer using convolutional neural networks”, CVPR 2016
Figure copyright Justin Johnson, 2015.
72. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201772
Neural Style Transfer: Multiple Style Images
Mix style from multiple images by taking a weighted average of Gram matrices
Gatys, Ecker, and Bethge, “Image style transfer using convolutional neural networks”, CVPR 2016
Figure copyright Justin Johnson, 2015.
73. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201773
74. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201774
75. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201775
76. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201776
Neural Style Transfer
Problem: Style transfer
requires many forward /
backward passes through
VGG; very slow!
77. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201777
Neural Style Transfer
Problem: Style transfer
requires many forward /
backward passes through
VGG; very slow!
Solution: Train another
neural network to perform
style transfer for us!
78. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201778
Fast Style Transfer (1) Train a feedforward network for each style
(2) Use pretrained CNN to compute same losses as before
(3) After training, stylize images using a single forward pass
Johnson, Alahi, and Fei-Fei, “Perceptual Losses for Real-Time Style Transfer and Super-Resolution”, ECCV 2016
Figure copyright Springer, 2016. Reproduced for educational purposes.
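A sketch of one training step under this recipe: a feedforward `transform_net` is trained against content and style (Gram) losses computed with a frozen pretrained `loss_net`. Here `loss_net` is assumed to return a list of feature maps from chosen VGG layers, index 2 is assumed to be the content layer, and `style_grams` are precomputed from the style image; all of these names and the loss weights are illustrative.

```python
# Sketch: one perceptual-loss training step for fast style transfer.
import torch

def gram(feats):
    B, C, H, W = feats.shape
    F = feats.reshape(B, C, H * W)
    return F @ F.transpose(1, 2) / (C * H * W)

def train_step(transform_net, loss_net, optimizer, content, style_grams,
               content_weight=1.0, style_weight=1e5):
    optimizer.zero_grad()
    output = transform_net(content)                  # single forward pass
    out_feats = loss_net(output)                     # list of feature maps
    content_feats = loss_net(content)
    # Content loss at one layer; style loss over all layers via Gram matrices.
    content_loss = (out_feats[2] - content_feats[2]).pow(2).mean()
    style_loss = sum((gram(f) - g).pow(2).sum()
                     for f, g in zip(out_feats, style_grams))
    loss = content_weight * content_loss + style_weight * style_loss
    loss.backward()
    optimizer.step()
    return loss.item()
```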
79. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201779
Fast Style Transfer
Comparison figure: slow (optimization-based) vs. fast (feedforward) outputs
Johnson, Alahi, and Fei-Fei, “Perceptual Losses for Real-Time Style Transfer and Super-Resolution”, ECCV 2016
Figure copyright Springer, 2016. Reproduced for educational purposes.
https://github.com/jcjohnson/fast-neural-style
80. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201780
Fast Style Transfer
Ulyanov et al, “Texture Networks: Feed-forward Synthesis of Textures and Stylized Images”, ICML 2016
Ulyanov et al, “Instance Normalization: The Missing Ingredient for Fast Stylization”, arXiv 2016
Figures copyright Dmitry Ulyanov, Vadim Lebedev, Andrea Vedaldi, and Victor Lempitsky, 2016. Reproduced with
permission.
Concurrent work from Ulyanov et al, comparable results
81. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201781
Fast Style Transfer
Ulyanov et al, “Texture Networks: Feed-forward Synthesis of Textures and Stylized Images”, ICML 2016
Ulyanov et al, “Instance Normalization: The Missing Ingredient for Fast Stylization”, arXiv 2016
Figures copyright Dmitry Ulyanov, Vadim Lebedev, Andrea Vedaldi, and Victor Lempitsky, 2016. Reproduced with
permission.
Replacing batch normalization with Instance Normalization improves results
82. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201782
One Network, Many Styles
Dumoulin, Shlens, and Kudlur, “A Learned Representation for Artistic Style”, ICLR 2017.
Figure copyright Vincent Dumoulin, Jonathon Shlens, and Manjunath Kudlur, 2016; reproduced with permission.
83. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201783
One Network, Many Styles
Dumoulin, Shlens, and Kudlur, “A Learned Representation for Artistic Style”, ICLR 2017.
Figure copyright Vincent Dumoulin, Jonathon Shlens, and Manjunath Kudlur, 2016; reproduced with permission.
Use the same network for multiple
styles using conditional instance
normalization: learn separate scale
and shift parameters per style
Single network can blend styles after training
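A sketch of conditional instance normalization as described above: shared conv weights, but a separate learned scale (gamma) and shift (beta) per style, which can be mixed at test time to blend styles. The class name and mixing interface are illustrative, not from the paper's code.

```python
# Sketch: conditional instance normalization with per-style scale and shift.
import torch
import torch.nn as nn

class ConditionalInstanceNorm2d(nn.Module):
    def __init__(self, channels, num_styles):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.gamma = nn.Parameter(torch.ones(num_styles, channels))
        self.beta = nn.Parameter(torch.zeros(num_styles, channels))

    def forward(self, x, style_weights):
        # style_weights: (num_styles,) mixing weights -- one-hot for a single
        # style, or a convex combination to blend styles after training.
        gamma = (style_weights[:, None] * self.gamma).sum(0)[None, :, None, None]
        beta = (style_weights[:, None] * self.beta).sum(0)[None, :, None, None]
        return gamma * self.norm(x) + beta
```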
84. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201784
Summary
Many methods for understanding CNN representations
Activations: Nearest neighbors, Dimensionality reduction,
maximal patches, occlusion
Gradients: Saliency maps, class visualization, fooling
images, feature inversion
Fun: DeepDream, Style Transfer.
85. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 10, 201785
Next time: Unsupervised Learning
Autoencoders
Variational Autoencoders
Generative Adversarial Networks