Understanding Deep Learning: DNN, RNN, LSTM, CNN and R-CNN
medium.com/@sprhlabs/understanding-deep-learning-dnn-rnn-lstm-cnn-and-r-cnn-6602ed94dbff
This can be demonstrated through a simple model. Consider an active shooter: an
object detection system identifies the weapon, tracks the criminal and deploys a
localized, depth-sensing drone. The drone first tries to de-escalate with pepper
spray and then escalates to force by dropping down 3 feet to the group and deploying
an electric shock weapon.
This figure shows how a simple model developed using deep learning can be used to
ensure public safety.
To build this model, we have to use machine learning. You may be wondering what
machine learning and deep learning actually are: most people enjoy the benefits of
technology, but very few are aware of, or interested in, the terms behind it and how
they work. Here we give you a concise, lucid idea of these terms.
The outputs are obtained through supervised learning: the network is trained by
backpropagation on datasets labelled with ‘what we want’. Think of going to a
restaurant where the chef tells you the ingredients of your meal. Feed-forward
neural networks (FFNNs) work in the same way: you taste those specific ingredients
while eating, but as soon as you finish the meal you forget what you have eaten. If
the chef serves you a meal with the same ingredients again, you can’t recognize them;
you have to start from scratch, because you have no memory of the earlier meal. But
the human brain doesn’t work like that.
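To make the restaurant analogy concrete, here is a minimal sketch of a feed-forward model in plain Python. It is an illustration only, not the model from this article: the single sigmoid neuron, the logical OR toy data and the names `forward` and `train` are all assumptions, and the update rule is plain gradient descent, which is what backpropagation reduces to when there is only one layer.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(weights, bias, inputs):
    """One feed-forward pass: the output depends only on the current inputs."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(total)

def train(samples, epochs=2000, lr=0.5):
    """Supervised learning: nudge the weights to reduce the error on labelled samples."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            output = forward(weights, bias, inputs)
            # Gradient of the cross-entropy loss with respect to the pre-activation.
            delta = output - target
            weights = [w - lr * delta * x for w, x in zip(weights, inputs)]
            bias -= lr * delta
    return weights, bias

# Toy labelled data ("what we want"): logical OR (assumed for illustration).
samples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias = train(samples)

first = forward(weights, bias, [1, 0])
forward(weights, bias, [0, 0])  # feed something else in between
second = forward(weights, bias, [1, 0])
print(first == second)  # True: no memory of earlier inputs is kept
```

Once training is finished, the network computes the same answer for the same input no matter what it saw before, which is the ‘forgotten meal’ in the analogy.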
In this way, RNNs can use their internal state (memory) to process sequences of
inputs. This makes them applicable to tasks such as unsegmented, connected
handwriting recognition or speech recognition. They work not only on the information
you feed them now but also on related information from the past, so the order in
which you feed and train the network matters: feeding it ‘chicken’ then ‘egg’ may
give a different output than ‘egg’ then ‘chicken’. RNNs also suffer from the
vanishing (or exploding) gradient problem, also called the long-term dependency
problem, where information rapidly gets lost over time. Strictly speaking, it is the
weights that are affected, not the neurons. That still matters, because the weights
are what store the information from the past: once a weight reaches a value of 0 or
1 000 000, the previous state is no longer very informative.
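Both points can be sketched with a one-unit recurrent network in plain Python. This is a toy under stated assumptions, not a real RNN library: the weights `w_in` and `w_rec`, the numeric encodings for ‘chicken’ and ‘egg’, and the helper `influence_after` are all made up for illustration. The helper ignores the activation function's derivative and only shows what repeated multiplication by the recurrent weight does.

```python
import math

def rnn_run(sequence, w_in=0.8, w_rec=0.5):
    """Run a one-unit recurrent network over a sequence and return the final state."""
    state = 0.0  # internal memory, carried from step to step
    for x in sequence:
        state = math.tanh(w_in * x + w_rec * state)
    return state

# Hypothetical numeric encodings for the two words.
CHICKEN, EGG = 1.0, -1.0

a = rnn_run([CHICKEN, EGG])
b = rnn_run([EGG, CHICKEN])
print(a, b)  # same inputs, different order, different final state

def influence_after(steps, w_rec):
    """Rough size of an old input's influence after `steps` recurrent multiplications."""
    return abs(w_rec) ** steps

print(influence_after(50, 0.5))  # shrinks toward 0 (vanishing)
print(influence_after(50, 1.5))  # grows very large (exploding)
```

With a recurrent weight below 1 the contribution of an input from 50 steps ago is practically zero, and with a weight above 1 it blows up, which is the intuition behind the long-term dependency problem.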
Convolutional Neural Networks (CNNs) improved automatic image captioning, like
the captions seen on Facebook. So an RNN mostly helps us with processing data and
predicting the next step, whereas a CNN helps us analyze visuals.
But CNNs are not flawless either. A typical CNN can tell the type of an object but
can’t specify its location. This is because a CNN can regress only one object at a
time, so when multiple objects share the same visual field, the CNN's bounding box
regression does not work well because of interference. For example, a CNN can detect
the bird shown in the model below, but if there are two birds of different species
within the same visual field, it can’t detect that.
An R-CNN (the R stands for region; it is used for object detection), by contrast,
forces the CNN to focus on a single region at a time, so that one specific object
dominates a given region. The regions are first found by the selective search
algorithm and then resized to an equal size before being fed into the CNN for
classification and bounding box regression. This is what lets the network pick out
each object of interest.
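The region-by-region flow described above can be sketched in plain Python. This is only a rough illustration: the `proposals` list is hard-coded where a real system would run selective search, `toy_classifier` is a hypothetical brightness threshold standing in for the CNN, and bounding box regression is left out entirely.

```python
def crop(image, box):
    """Cut the region (top, left, bottom, right) out of a 2D list image."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]

def resize(region, size):
    """Nearest-neighbour resize of a 2D list to size x size."""
    height, width = len(region), len(region[0])
    return [
        [region[r * height // size][c * width // size] for c in range(size)]
        for r in range(size)
    ]

def toy_classifier(patch):
    """Hypothetical stand-in for the CNN: label a patch by its mean brightness."""
    pixels = [p for row in patch for p in row]
    return "object" if sum(pixels) / len(pixels) > 0.5 else "background"

def detect(image, proposals, size=4):
    """Classify each proposed region independently, one region at a time."""
    results = []
    for box in proposals:
        patch = resize(crop(image, box), size)  # every region gets the same size
        results.append((box, toy_classifier(patch)))
    return results

# 6x8 toy image: two bright blobs on a dark background.
image = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
# Hypothetical proposals, standing in for the output of selective search.
proposals = [(1, 1, 3, 3), (2, 5, 5, 8), (0, 3, 2, 5)]
detections = detect(image, proposals)
print(detections)
```

Because each region is cropped and classified on its own, the two blobs no longer interfere with each other, even though they sit in the same image.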
Are there any techniques to go one step further and locate the exact pixels of each
object instead of just bounding boxes? Yes, there are. This is image segmentation,
which Kaiming He and a team of researchers, including Girshick, explored at Facebook
AI using an architecture known as Mask R-CNN, which can satisfy our intuition a bit.
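To see what a pixel mask adds over a bounding box, consider this small plain-Python sketch. The `mask` below is hand-written toy data, not Mask R-CNN output, and `mask_to_box` is a hypothetical helper: it shows that a box can always be recovered from a mask, while the box alone cannot tell you which pixels inside it belong to the object.

```python
def mask_to_box(mask):
    """Return the tight bounding box (top, left, bottom, right) around a binary mask."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1

# Toy binary mask: 1 marks a pixel that belongs to the object.
mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]
box = mask_to_box(mask)
top, left, bottom, right = box
box_area = (bottom - top) * (right - left)
object_pixels = sum(sum(row) for row in mask)
print(box, box_area, object_pixels)  # (1, 1, 4, 4) 9 5
```

Here the box covers 9 pixels but only 5 of them are the object; the mask carries that difference, the box does not.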
We found the A12 Bionic chip in the latest iPhone XS Max to be a great decentralized
edge neural network engine. The SoC has 6.9 billion transistors, a 6-core CPU and an
8-core Neural Engine, and can perform 5 trillion operations per second, which makes
it suitable for machine learning and AR depth sensing.