
Grid HTM: Hierarchical Temporal Memory for Anomaly Detection in Videos

Vladimir Monakhov (University of Oslo and SimulaMet, Norway)
Vajira Thambawita (SimulaMet, Norway)
Pål Halvorsen (SimulaMet and OsloMet, Norway)
Michael A. Riegler (SimulaMet and UiT, Norway)

arXiv:2205.15407v1 [cs.CV] 30 May 2022

ABSTRACT
The interest in video anomaly detection systems has gained traction over the past few years. Current approaches use deep learning to perform anomaly detection in videos, but this approach has multiple problems. For starters, deep learning in general has issues with noise, concept drift, explainability, and training data volumes. Additionally, anomaly detection in itself is a complex task and faces challenges such as unknownness, heterogeneity, and class imbalance. Anomaly detection using deep learning is therefore mainly constrained to generative models such as generative adversarial networks and autoencoders due to their unsupervised nature, but even they suffer from general deep learning issues and are hard to train properly. In this paper, we explore the capabilities of the Hierarchical Temporal Memory (HTM) algorithm to perform anomaly detection in videos, as it has favorable properties such as noise tolerance and online learning, which combats concept drift. We introduce a novel version of HTM, namely Grid HTM, an HTM-based architecture specifically for anomaly detection in complex videos such as surveillance footage.

CCS CONCEPTS
• Computing methodologies → Computer vision.

KEYWORDS
HTM, deep learning, surveillance, anomaly detection

ACM Reference Format:
Vladimir Monakhov, Vajira Thambawita, Pål Halvorsen, and Michael A. Riegler. 2018. Grid HTM: Hierarchical Temporal Memory for Anomaly Detection in Videos. In Proceedings of Conference acronym ’XX, June 03–05, 2018, Woodstock, NY. ACM, New York, NY, USA, 8 pages. https://doi.org/XXXXXXX.XXXXXXX

1 INTRODUCTION
As the global demand for security and automation increases, many seek to use video anomaly detection systems. In the US alone, the surveillance market is expected to reach $23.60 billion by 2027 [1]. Leveraging modern computer vision, modern anomaly detection systems play an important role in increasing monitoring efficiency and reducing the need for expensive live monitoring. Their use cases vary from detecting faulty products on an assembly line to detecting car accidents on a highway.

The most important component in video anomaly detection systems is the intelligence behind them. This intelligence ranges from simple on-board algorithms to advanced deep learning models, where the latter have experienced increased popularity in the past few years. Yet, despite the major progress within the field of deep learning, there are still many tasks where humans outperform models, especially in anomaly detection, where the anomalies are often undefined. Deep learning approaches also perform poorly when dealing with noise and concept drift.

The cause for this discrepancy lies in the difference between how humans and machine learning algorithms represent data and learn. Most machine learning algorithms use a dense representation of the data and apply back-propagation in order to learn. Human learning happens in the neocortex, where evidence points to the neocortex using a sparse representation and performing Hebbian-style learning. There is a growing field of machine learning dedicated to replicating the inner mechanics of the neocortex, namely Hierarchical Temporal Memory (HTM) theory [2]. This theory outlines its advantages over standard machine learning, such as noise tolerance and the ability to adapt to changing data.

With the advantages of HTM and the rise of video anomaly detection in mind, a natural question is whether it is possible to apply HTM for anomaly detection in videos. Combined with a lack of related works, it is this very question that motivates this paper. We therefore propose and evaluate Grid HTM, a novel expansion of the base HTM algorithm that allows for unsupervised anomaly detection in videos.

2 BACKGROUND
Anomaly detection is often defined as detecting data points that deviate from the general distribution [3]. Unlike most other problems in deep learning, anomaly detection deals with unpredictable and rare events, which makes it hard to apply traditional deep learning. A subset of anomaly detection is smart surveillance [4], which is the use of video analysis specifically in surveillance.
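As a toy illustration of this definition (deliberately non-HTM, and not part of Grid HTM), a streaming detector can flag points that deviate strongly from a running estimate of the distribution observed so far; the function name and thresholds below are made up for the example:

```python
import statistics

# Toy "deviation from the general distribution" detector: flag a point
# when it lies more than k standard deviations from the mean of all
# points seen before it. Purely illustrative; Grid HTM does not work
# this way.

def zscore_anomalies(stream, k=3.0, warmup=10):
    """Return the indices of points flagged as anomalous."""
    flagged, history = [], []
    for i, x in enumerate(stream):
        if len(history) >= warmup:
            mean = statistics.fmean(history)
            std = statistics.pstdev(history)
            if std > 0 and abs(x - mean) > k * std:
                flagged.append(i)
        history.append(x)
    return flagged

# A stream that alternates between 1 and 2, with a single outlier.
print(zscore_anomalies([1, 2] * 10 + [10] + [1, 2] * 3))  # [20]
```

Note how the outlier, once absorbed into the history, widens the estimated distribution; this already hints at the noise and concept-drift problems discussed next.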
An issue for deep learning models in general is that they are susceptible to noise in the dataset [5, 6], which leads to decreased model accuracy and poor prediction results. Due to the nature of training deep learning models, they are also in most cases not self-supervised and therefore require constant tuning in order to stay effective on changing data. In addition, they require a lot of data before they can be considered effective, and performance increases only logarithmically with the volume of training data [7]. Deep learning models also suffer from issues with out-of-distribution generalization [8], where a model might perform great on the dataset it is tested on, but performs poorly when deployed in real life. This could be caused by selection bias in the dataset or by differences in the causal structure between the training domain and the deployment domain [9]. Another challenge with deep learning models is that they generally suffer from a lack of explainability [10]. While it is known how the models make their decisions, their huge parametric spaces make it unfeasible to know why they make those predictions. Combined with the vast potential that deep learning offers in critical sectors such as medicine, this makes approaches that offer explainability highly attractive.

The HTM theory [2] introduces a machine learning algorithm which works on the same principles as the brain and therefore solves some of the issues that deep learning has. HTM is considered noise resistant and can perform online learning, meaning that it learns as it observes more data. HTM replicates the structure of the neocortex, which is made up of cortical regions, which in turn are made up of mini-columns and then neurons.

The data in an HTM model is represented using a Sparse Distributed Representation (SDR), which is a sparse bit array. An encoder converts real-world values into SDRs, and there are currently encoders for numbers, geospatial locations, categories, and dates. One of the difficulties with HTM is making it work on visual data, where creating a good encoder is still being researched [11, 12, 13]. The learning mechanism consists of two parts, the Spatial Pooler (SP) and the Temporal Memory (TM). The SP learns to extract semantically important information into output SDRs. The TM learns sequences of patterns of SDRs and forms a prediction in the form of a predictive SDR. A research study [14] has shown that HTM is very capable of performing anomaly detection on low-dimensional data and is able to outperform other anomaly detection methods. However, related works, such as Daylidyonok, Frolenkova, and Panov [13], show that HTM struggles with higher-dimensional data. Therefore, a natural conclusion is that HTM should be applied differently, and that a new type of architecture using HTM should be explored for the purpose of video anomaly detection and surveillance.

3 GRID HTM
This paper proposes and explores a new type of architecture, named Grid HTM, for anomaly detection in videos using HTM, and proposes to use segmentation techniques to simplify the data into an SDR-friendly format. These segmentation techniques could be anything from simple binary thresholding to deep learning instance segmentation. Even keypoint detectors such as Oriented FAST and Rotated BRIEF (ORB) [15] could in theory be applied. When explaining Grid HTM, the examples will be taken from deep learning instance segmentation of cars on a video from the VIRAT [16] dataset. An example segmentation is shown in Figure 1.

Figure 1: Segmentation result of cars, which is suited to be used as an SDR. Original frame taken from VIRAT [16].

The idea is that the SP will learn to find an optimal general representation of cars. How general this representation is can be configured using the various SP parameters, but ideally they should be set so that different cars will be represented similarly while trucks and motorcycles will be represented differently. An example representation by the SP is shown in Figure 2.

Figure 2: The SDR (left) and its corresponding SP representation (right). Note that the SP is untrained.

The task of the TM will then be to learn the common patterns that the cars exhibit; their speed, shape, and positioning will be taken into account. Finally, the learning will be set so that new patterns are learned quickly, but forgotten slowly. This will allow the model to quickly learn the norm, even if there is little activity, while still reacting to anomalies. This requires that the input is stationary, which in our example means that the camera is not moving.

It is possible to split different segmentation classes into their respective SDRs. This will give the SP and the TM the ability to learn different things for each of the classes. For instance, if there are two classes "person" and "car", then the TM will learn that it is normal for objects belonging to "person" to be on the sidewalk, while objects belonging to "car" will be marked as anomalous when on the sidewalk.

Ideally, the architecture will have a calibration period spanning several days or weeks, during which the architecture is not performing any anomaly detection, but is just learning the patterns.

4 IMPROVEMENTS
Daylidyonok, Frolenkova, and Panov [13] tested only the base HTM version and showed that the algorithm cannot handle subtle anomalies; therefore, multiple improvements needed to be introduced to increase effectiveness.

Invariance. One issue that becomes evident is the lack of invariance, due to the TM learning the global patterns. Using the example, it learns that it is normal for cars to drive along the road
but only in the context of there being cars parked in the parking lot. It is instead desired that the TM learns that it is normal for cars to drive along the road, regardless of whether there are cars in the parking lot. We propose a solution based on dividing the encoder output into a grid and having a separate SP and TM for each cell in the grid. The anomaly scores of all the cells are then aggregated into a single anomaly score using an aggregation function.

Aggregation Function. Selecting the correct aggregation function is important because it affects the final anomaly output. For instance, it might be tempting to use the mean of all the anomaly scores as the aggregation function:

X = {x ∈ ℝ : x ≥ 0}
Anomaly_Score = (∑_{x ∈ X} x) / |X|

where X denotes the set of anomaly scores x from the individual grid cells. However, this leads to problems with normalization, meaning that an overall anomaly score of 1 is hard to achieve due to many cells having a zero anomaly score. In fact, it becomes unclear what a high anomaly score even is. Using the mean also means that anomalies that take up a lot of space will be weighted higher than anomalies that take up little space, which might not be desirable. To solve the aforementioned problem, and if the data has little noise, a potential aggregation function could be the non-zero mean:

X = {x ∈ ℝ : x > 0}
Anomaly_Score = (∑_{x ∈ X} x) / |X| if |X| > 0, otherwise 0

This means that only the cells with a strictly positive anomaly score contribute to the overall anomaly score, which helps solve the aforementioned normalization and weighting problems. On the other hand, the non-zero mean will perform poorly when the architecture is exposed to noisy data, which could lead to there always being one or more cells with a high anomaly score. Figure 3 illustrates the effect of an aggregation function on noisy data, where the non-zero mean is rendered useless due to the noise. On the other hand, Figure 4 shows how the non-zero mean gives a clearer anomaly score when the data is clean.

Figure 3: Aggregation function performance on noisy data. (a) Mean. (b) Non-zero mean.

Figure 4: Aggregation function performance on clean data. (a) Mean. (b) Non-zero mean.

Explainability. Having the encoder output divided into a grid has the added benefit of introducing explainability into the model. By using Grid HTM it is now possible to determine where in the input an anomaly has occurred by simply observing which cell has a high anomaly score. It is also possible to estimate the number of predictions for each cell, which can be used as a measure of certainty, where fewer predictions mean higher certainty. Making it possible to measure certainty per cell creates a new source of information which can be used for explainability or robustness purposes.

Flexibility and Performance. In addition, it is also possible to configure the SP and the TM in each cell independently, giving the architecture increased flexibility, and to use a non-uniform grid, meaning that cells can have different sizes. Last but not least, dividing the frame into smaller cells makes it possible to run each cell in parallel for increased performance.

Reviewing Encoder Rules. A potential challenge with the grid approach is that the rules for creating a good encoder may not be respected and therefore should be reviewed:

• Semantically similar data should result in SDRs with overlapping active bits. In this example, a car at one position will produce an SDR with a high number of bits overlapping with those of another car at a similar position in the input image.
• The same input should always produce the same SDR. The segmentation model produces a deterministic output given the same input.
• The output must have the same dimensionality (total number of bits) for all inputs. The segmentation model output has a fixed dimensionality.
• The output should have similar sparsity (similar number of one-bits) for all inputs and have enough one-bits to handle noise and subsampling. The segmentation model does not respect this. An example is that there can be no cars (zero active bits), one car (n active bits), or two cars (2n active bits), and that this will fluctuate over time.

The solution for the last rule is two-fold, and consists of imposing a soft upper bound and a hard lower bound on the number of active pixels within a cell. The purpose is to lower the variation in the number of active pixels, while still retaining enough semantic information for the HTM to work:

• Pick a cell size so that the distribution of the number of active pixels is as tight as possible, while containing enough semantic information and also being small enough so that the desired invariance is achieved. The cell size acts as a soft upper bound for the possible number of active pixels.
• Create a pattern representing emptiness, where the number of active bits is similar to what can be expected on average when there are cars inside a cell. This acts as a hard lower bound for the number of active pixels.
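To make the grid mechanics concrete, the sketch below shows a frame being divided into cells and the two aggregation functions defined earlier, in plain Python. It is a minimal illustration (binary frames as nested lists, with per-cell anomaly scores assumed to already come from each cell's SP/TM pair), not code from the Grid HTM implementation:

```python
# Each cell has its own SP + TM and yields one anomaly score in [0, 1];
# the per-cell scores are then reduced to a single overall score.

def split_into_grid(frame, cell_h, cell_w):
    """Split a 2-D binary frame (list of lists) into cell-sized blocks."""
    cells = []
    for r in range(0, len(frame), cell_h):
        for c in range(0, len(frame[0]), cell_w):
            cells.append([row[c:c + cell_w] for row in frame[r:r + cell_h]])
    return cells

def mean_aggregation(scores):
    """Plain mean: large anomalies dominate, and the many zero-score
    cells dilute the output, so a score near 1 is almost never reached."""
    return sum(scores) / len(scores)

def nonzero_mean_aggregation(scores):
    """Mean over strictly positive scores only; 0 when every cell is 0."""
    active = [s for s in scores if s > 0]
    return sum(active) / len(active) if active else 0.0

# A small anomaly in 1 of 100 cells: the plain mean dilutes it to 0.005,
# while the non-zero mean keeps it visible at 0.5.
scores = [0.0] * 99 + [0.5]
print(mean_aggregation(scores), nonzero_mean_aggregation(scores))
```

The example makes the trade-off visible: the non-zero mean rescues small anomalies from dilution, but a single noisy cell with a spurious positive score would dominate it just as easily.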
There could be situations where a few pixels are active within a
cell, which could happen when a car has just entered a cell, but this
is acceptable as long as it does not affect the distribution too much.
If it does affect the distribution, which can be the case with noisy
data, then an improvement would be to add a minimum sparsity
requirement before a cell is considered not empty, e.g. less than 5
active pixels means that the cell is empty. In the following example,
the number of active pixels within a cell centered in the video was
used to build the distributions seen in Figure 5:

Figure 5: Distribution of the number of active pixels within a cell of size 12 × 12. (a) Without empty pattern (σ = 3.78). (b) With empty pattern and a minimum sparsity requirement of 5 (σ = 1.41). Both panels plot the number of frames (log scale) against the number of active pixels.

Figure 6: Example Grid HTM output and the corresponding input. The color represents the anomaly score for each of the cells, where red means high anomaly score and green means zero anomaly score. Two of the cars are marked as anomalous because they are moving, which is something Grid HTM has not seen before during its 300 frame (top right) long lifetime.
Figure 7: High anomaly score when an empty cell (represented with an empty pattern with a sparsity value of 5) changes to being not empty, as something enters the cell.

With a carefully selected empty pattern sparsity, the standard deviation of active pixels was lowered from 3.78 to 1.41. It is possible
to automate this process by developing an algorithm which finds
the optimal cell size and empty pattern sparsity which causes the
least variation of number of active pixels per cell. This algorithm
would run as a part of the calibration process.
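A minimal sketch of the hard lower bound described above (assumed values: a flattened 12 × 12 cell, a minimum sparsity of 5, and an arbitrary fixed empty pattern with 5 active bits — none of this is taken from the reference implementation):

```python
# Cells with fewer than MIN_SPARSITY active pixels are treated as empty
# and replaced by a fixed "empty pattern", so the TM always sees a
# stable, sufficiently sparse input even when nothing is in the cell.

MIN_SPARSITY = 5                                  # below this => "empty"
CELL_BITS = 12 * 12                               # flattened 12x12 cell
EMPTY_PATTERN = [1] * 5 + [0] * (CELL_BITS - 5)   # arbitrary fixed pattern

def encode_cell(cell_bits):
    """Return the cell's bits, or the empty pattern when the cell is empty."""
    if sum(cell_bits) < MIN_SPARSITY:
        return EMPTY_PATTERN
    return cell_bits

assert encode_cell([0] * CELL_BITS) == EMPTY_PATTERN   # empty cell
assert sum(encode_cell([1] * 10 + [0] * 134)) == 10    # car-sized blob kept
```

A few stray active pixels (for example a car just entering the cell) fall below the threshold and are mapped to the empty pattern, which is exactly the substitution that tightened the distribution in Figure 5.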
Figure 8: The anomaly score is ignored (set to 0) for the frame in which the cell changes state from empty to not empty.

The visual output resulting from these changes, which is an equally important output as the aggregated anomaly score, can be seen in Figure 6 (for each cell, red means a higher anomaly score and green a lower one). Since there are now cells that are observing an empty pattern for a lot of the time in sparse data, boosting is recommended to be turned off, otherwise the SP output
for the empty cells would change back and forth in order to adjust the active duty cycle.

Stabilizing Anomaly Output. Another issue with the grid-based approach arises when a car first comes into a cell. The TM in that cell has no way of knowing that a car is about to enter, since it does not see outside its own cell, and therefore the first frame in which a car enters a cell will cause a high anomaly output. This is illustrated in Figure 7, where it can be observed that this effect causes the anomaly output to needlessly fluctuate. The band-aid solution is to ignore the anomaly score for the frame during which the cell goes from being empty to being not empty, which is illustrated in Figure 8. A more proper solution could be to allow the TM to grow synapses to the TMs in the neighboring cells, but this is not documented in any research papers and might also hinder invariance.

Figure 9: Anomaly score output from Grid HTM. (Plot: anomaly score over frames; legend: Grid HTM, Segments, Frame Freeze.)

Multistep Temporal Patterns. Since the TM can only grow segments to cells that were active in the previous timestep, it will struggle to learn temporal patterns across multiple timesteps. This is especially evident in high-framerate videos, where an object in motion has a representation at timesteps 𝑡 and 𝑡 + 1 similar to that of an object standing still.

This could cause situations where an object that is supposed to be moving suddenly stands still, yet the TM will not mark it as an anomaly due to it being stuck in a contextual loop. A contextual loop is when one of the predictions at 𝑡 becomes true at 𝑡 + 1, and then one of the predictions at 𝑡 + 1 is almost identical to the state at 𝑡, which becomes true if the object is not moving, causing the TM to enter the same state that it was in at 𝑡. A solution is to concatenate the past 𝑛 SP outputs as input into the TM, which is made possible by keeping a buffer of past SP outputs and shifting its contents out as new SP outputs are inserted. This follows the core idea behind encoding time in addition to the data, which makes time act as a contextual anchor. However, in this case there are no timestamps that are suitable to be used as contextual anchors, so as a replacement, the past observations are encoded instead.

Concatenating past observations together forces the TM input to be unique for when an object is in motion versus when it is still. High-framerate videos benefit the most from this, and the effect will be more pronounced for higher values of 𝑛.

A potential side effect of introducing multistep temporal patterns is that, because the TM is now exposed to multiple frames at once, it will
be more tolerant to temporal noise. An example of temporal noise is
when an object disappears for a single frame due to falling below the
classification threshold of the deep learning segmentation model encoder. The reason for the noise tolerance is that instead of the temporal noise making up the entire input for the TM, it now only makes up 1/𝑛 of the TM input.
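The buffering scheme for multistep temporal patterns can be sketched with a fixed-length deque (a minimal illustration with assumed sizes; the actual implementation and its parameters live in the GridHTM repository):

```python
from collections import deque

# The TM input at each timestep is the concatenation of the past n SP
# outputs. The buffer is pre-filled with zero-SDRs so the TM input width
# stays constant from the very first frame.

class MultistepBuffer:
    def __init__(self, n, sp_width):
        self.buffer = deque([[0] * sp_width for _ in range(n)], maxlen=n)

    def push(self, sp_output):
        """Insert the newest SP output (dropping the oldest) and return
        the concatenated TM input. A moving object and a still object now
        produce different TM inputs, because their histories differ."""
        self.buffer.append(sp_output)
        return [bit for sdr in self.buffer for bit in sdr]

buf = MultistepBuffer(n=3, sp_width=4)
print(buf.push([1, 0, 0, 1]))  # [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1]
```

Since a single-frame dropout now only corrupts one of the n concatenated SDRs, this construction is also what provides the 1/𝑛 temporal-noise tolerance described above.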
Use Cases. The most intuitive use case is to use Grid HTM for semi-active surveillance, where personnel only have to look at segments containing anomalies, leading to drastically increased efficiency. One example is making it possible to have an entire city be monitored by a few people. This is made possible by making it so that people only have to look at segments that Grid HTM has found anomalous, which is what drastically lowers the manpower requirement for active monitoring of the entire city.

5 EXPERIMENTAL DETAILS AND RESULTS
As stated earlier, one of the use cases of Grid HTM is anomaly detection in surveillance. Using a video from the VIRAT [16] dataset with long duration and a stationary camera, we demonstrate our system. The video contains technical anomalies in the form of several segments with sudden frame skips in between. There is also a synthetic anomaly introduced in the form of a frame repeat lasting a couple of seconds, essentially "freezing" time, in order to test whether Grid HTM is able to understand how objects should be moving in time.

In this experiment, a segmentation model which can extract classes into their respective SDRs is employed, meaning that there could be an SDR for cars and an SDR for persons, which are then concatenated before being fed into the system. The segmentation model used is PointRend [17] with a ResNet101 [18] backbone, pretrained on ImageNet [19], and implemented using PixelLib [20]. For the sake of simplicity, this experiment focuses only on the segmentation of cars. While on the topic of segmentation, it is important to mention that the segmentation model is not perfect and that there are cases where objects are misclassified, as well as cases where cars repeatedly go above and below the confidence threshold.

Figure 10: The first segment anomaly, which is marked with red text, and the corresponding changes detected by Grid HTM. The numbers beneath each frame represent the relative frame number and the current anomaly score, respectively.

We can see in Figure 9 that Grid HTM is detecting when segments begin and end; however, it is not possible to use a threshold value to isolate them, and they also have vastly different anomaly scores compared to each other. This is due to the way the aggregation function works, which means that the anomaly output is dependent on the physical size of the anomaly. It should also be noted that a moving average (𝑛 = 200) was applied to smooth out the anomaly score output, otherwise the graph would be too noisy.

With the aggregation functions presented in this paper in mind, it is safe to conclude that looking at the anomaly score output is meaningless for complex data such as a surveillance video. This however does not mean that Grid HTM is completely useless, and this can be observed by looking at the visual output of Grid HTM. The visual output during which the first segment anomaly occurs can be seen in Figure 10. Here, it is observed that Grid HTM correctly marks the sudden change of cars when the current segment ends and a new segment begins.

In the original video, there is a road on which cars regularly drive. By observing the visual output, it becomes evident that after some time Grid HTM has mostly learned that behavior and does not report those moving cars as anomalies. This is shown in Figure 11. To prove that Grid HTM has learned that cars on the road should be moving, it is possible to look at the visual output during the period when the video is repeating the same frame and observe whether the architecture marks the cars standing still on the road as anomalies. It can be observed in Figure 12 that the cars along the main road are not marked as anomalies, but this could be attributed
to the fact that there is a crossing there and that cars periodically have to stop at that point to let pedestrians cross.

Figure 11: Visual output when a car is driving along a road.

Figure 12: Anomaly output during the repeating frame; the start of the frame repeat is marked with red text. The blue circle highlights the object of interest.

Figure 13: Anomaly output when there is no frame repeating; where it should have repeated is marked in red. The blue circle highlights the object of interest.

Figure 14: Anomaly output during the repeating frame; the start of the frame repeat is marked with red text. The blue circle highlights the object of interest. This time without multistep temporal patterns.

On the other hand, when looking at the anomaly marked with a blue circle, the car on the road in the parking lot is marked as an anomaly that increases in severity as time goes on during the frame repeat. The reason why that car causes an anomaly is that, unlike the cars on the main road, a car is rarely observed standing still at that position. To prove that the anomaly was actually directly caused by the repeating frame, and not just due to repeating the anomaly in time, it should be compared to the anomaly output if there was no repeating frame. It can be observed in Figure 13 that the anomaly output is minor compared to when there was a repeating frame, proving that the anomaly was indeed a product of the repeating frame and that Grid HTM was able to learn how objects should be moving in time.

Finally, it is interesting to look at how Grid HTM handles the repeating frames without multistep temporal patterns, which is shown in Figure 14. Unfortunately, simply disabling multistep temporal patterns without adjusting the other TM parameters causes the same car to be marked as an anomaly before and during the frame repeat. In fact, as previously mentioned, disabling multistep temporal patterns causes Grid HTM to be less noise tolerant, which causes a lot more anomalies to be wrongly detected. This is evident in Figure 14, where a higher number of severe anomalies can be observed compared to previous examples. This also highlights how sensitive HTM can be regarding parameters. The working code for Grid HTM and the parameters for the experiments conducted in this paper can be found on GitHub¹.

¹ https://github.com/vladim0105/GridHTM

6 CONCLUSION
We presented a novel method to perform anomaly detection in videos. Experiments showed that the proposed Grid HTM can be used for unsupervised anomaly detection in complex videos such as surveillance footage. One of the most important pieces of future work would be to create a dataset with videos that are several days long and contain anomalies such as car accidents, jaywalking, and other similar anomalous behaviors. For Grid HTM, more time can be spent exploring other aggregation functions so that the aggregated anomaly score can be used more efficiently. Additionally, it would be a big benefit to create an algorithm which can decide the parameters
for each cell during the calibration phase. It is also possible to improve explainability and robustness by implementing a measure of certainty for each cell.

Finally, experiments should be performed to validate the possibility of having the TM in each cell grow synapses to neighboring cells in order to solve the issue with unstable anomaly output.

REFERENCES
[1] Divyanshi Tewari. 2019. U.S. Video Surveillance Market by Component (Solution, Service, and Connectivity Technology), Application (Commercial, Military & Defense, Infrastructure, Residential, and Others), and Customer Type (B2B and B2C): Opportunity Analysis and Industry Forecast, 2020–2027. Online. (March 2019). https://www.alliedmarketresearch.com/us-video-surveillance-market-A06741.
[2] J. Hawkins, S. Ahmad, S. Purdy, and A. Lavin. 2016. Biological and Machine Intelligence (BAMI). Initial online release 0.4. https://numenta.com/resources/biological-and-machine-intelligence/.
[3] Guansong Pang, Chunhua Shen, Longbing Cao, and Anton Van Den Hengel. 2021. Deep Learning for Anomaly Detection. ACM Computing Surveys, 54, 2, (April 2021), 1–38. issn: 1557-7341. doi: 10.1145/3439950.
[4] Sijie Zhu, Chen Chen, and Waqas Sultani. 2020. Video Anomaly Detection for Smart Surveillance. Online. (2020). doi: 10.48550/ARXIV.2004.00222.
[5] Shivani Gupta and Atul Gupta. 2019. Dealing with Noise Problem in Machine Learning Data-sets: A Systematic Review. Procedia Computer Science, 161, 466–474. The Fifth Information Systems International Conference, 23-24 July
[9] […] Credibility in Modern Machine Learning. (2020). doi: 10.48550/ARXIV.2011.03395.
[10] Alejandro Barredo Arrieta, Natalia Díaz-Rodríguez, Javier Del Ser, Adrien Bennetot, Siham Tabik, Alberto Barbado, Salvador Garcia, Sergio Gil-Lopez, Daniel Molina, Richard Benjamins, Raja Chatila, and Francisco Herrera. 2020. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion, 58, 82–115. issn: 1566-2535. doi: 10.1016/j.inffus.2019.12.012.
[11] Y. Zou, Y. Shi, Y. Wang, Y. Shu, Q. Yuan, and Y. Tian. 2018. Hierarchical Temporal Memory Enhanced One-Shot Distance Learning for Action Recognition. In Proceedings of the 2018 IEEE International Conference on Multimedia and Expo (ICME), 1–6. doi: 10.1109/ICME.2018.8486447.
[12] David McDougall (ctrl-z 9000-times). 2019. Online. (September 2019). https://github.com/htm-community/htm.core/issues/259#issuecomment-533333336.
[13] Alexei V. Samsonovich, editor. 2019. Extended Hierarchical Temporal Memory for Motion Anomaly Detection. Biologically Inspired Cognitive Architectures 2018. Springer International Publishing, Cham, 69–81. isbn: 978-3-319-99316-4. doi: 10.1007/978-3-319-99316-4_10.
[14] Subutai Ahmad, Alexander Lavin, Scott Purdy, and Zuha Agha. 2017. Unsupervised real-time anomaly detection for streaming data. Neurocomputing, 262, 134–147. Online Real-Time Learning Strategies for Data Streams. issn: 0925-2312. doi: 10.1016/j.neucom.2017.04.070.
[15] Ethan Rublee, Vincent Rabaud, Kurt Konolige, and Gary Bradski. 2011. ORB: An efficient alternative to SIFT or SURF.
2019, Surabaya, Indonesia. issn: 1877-0509. doi: https://doi. In Proceedings of the 2011 International Conference on Com-
org/10.1016/j.procs.2019.11.146. puter Vision (ICCV), 2564–2571. doi: 10.1109/ICCV.2011.
[6] Dan Hendrycks and Thomas Dietterich. 2019. Benchmarking 6126544.
Neural Network Robustness to Common Corruptions and [16] Sangmin Oh, Anthony Hoogs, Amitha Perera, Naresh Cun-
Perturbations. Online. (2019). doi: 10.48550/ARXIV.1903. toor, Chia-Chih Chen, Jong Taek Lee, Saurajit Mukherjee,
12261. J. K. Aggarwal, Hyungtae Lee, Larry Davis, Eran Swears,
[7] Chen Sun, Abhinav Shrivastava, Saurabh Singh, and Abhinav Xioyang Wang, Qiang Ji, Kishore Reddy, Mubarak Shah,
Gupta. 2017. Revisiting Unreasonable Effectiveness of Data Carl Vondrick, Hamed Pirsiavash, Deva Ramanan, Jenny
in Deep Learning Era. Online. (2017). doi: 10.48550/ARXIV. Yuen, Antonio Torralba, Bi Song, Anesco Fong, Amit Roy-
1707.02968. Chowdhury, and Mita Desai. 2011. A large-scale benchmark
[8] Zheyan Shen, Jiashuo Liu, Yue He, Xingxuan Zhang, Renzhe dataset for event recognition in surveillance video. In Pro-
Xu, Han Yu, and Peng Cui. 2021. Towards Out-Of-Distribution ceedings of the 2013 IEEE Conference on Computer Vision and
Generalization: A Survey. (2021). doi: 10.48550/ARXIV.2108. Pattern Recognition (CVPR), 3153–3160. doi: 10.1109/CVPR.
13624. 2011.5995586.
[9] Alexander D’Amour, Katherine Heller, Dan Moldovan, Ben [17] Alexander Kirillov, Yuxin Wu, Kaiming He, and Ross Gir-
Adlam, Babak Alipanahi, Alex Beutel, Christina Chen, Jonathan shick. 2019. PointRend: Image Segmentation as Rendering.
Deaton, Jacob Eisenstein, Matthew D. Hoffman, Farhad Hor- Online. (2019). doi: 10.48550/ARXIV.1912.08193.
mozdiari, Neil Houlsby, Shaobo Hou, Ghassen Jerfel, Alan [18] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun.
Karthikesalingam, Mario Lucic, Yian Ma, Cory McLean, Di- 2016. Deep Residual Learning for Image Recognition. In Pro-
ana Mincu, Akinori Mitani, Andrea Montanari, Zachary ceedings of the 2016 IEEE Conference on Computer Vision and
Nado, Vivek Natarajan, Christopher Nielson, Thomas F. Os- Pattern Recognition (CVPR), 770–778. doi: 10.1109/CVPR.
borne, Rajiv Raman, Kim Ramasamy, Rory Sayres, Jessica 2016.90.
Schrouff, Martin Seneviratne, Shannon Sequeira, Harini Suresh, [19] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and
Victor Veitch, Max Vladymyrov, Xuezhi Wang, Kellie Web- Li Fei-Fei. 2009. ImageNet: A large-scale hierarchical image
ster, Steve Yadlowsky, Taedong Yun, Xiaohua Zhai, and D. database. In Proceedings of the 2009 IEEE Conference on Com-
Sculley. 2020. Underspecification Presents Challenges for puter Vision and Pattern Recognition (CVPR), 248–255. doi:
10.1109/CVPR.2009.5206848.
Conference acronym ’XX, June 03–05, 2018, Woodstock, NY Monakhov et al.

[20] Ayoola Olafenwa. 2021. Simplifying Object Segmentation


with PixelLib Library. Online. (2021). https://vixra.org/abs/
2101.0122.