Image Captionbot For Assistive Technology
Abstract:- Because an image can carry a variety of meanings, it is difficult to generate short descriptions of it automatically. Images contain many different types of information, which makes it hard to extract context from them and use that context to construct sentences. A system that can do so allows blind people to explore their surroundings independently. Deep learning can be used to build such a system. This project uses VGG16, a proven CNN architecture, for image classification and feature extraction; an LSTM together with an embedding layer handles the text description process. These two networks are combined to form a single image caption generation network, which is then trained on the Flickr8k dataset. The model's output is converted to audio for the benefit of the visually impaired.
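The final step of this pipeline, turning the generated caption text into speech, can be handled by any off-the-shelf text-to-speech engine. As a minimal sketch (the gTTS library and the example caption are our assumptions, not choices specified above):

```python
# Minimal text-to-speech step for a generated caption.
# gTTS is an illustrative choice; any TTS engine would serve.
from gtts import gTTS

caption = "a dog runs through the grass"  # hypothetical model output

tts = gTTS(text=caption, lang="en")  # synthesize the caption as speech
tts.save("caption.mp3")              # audio file for the visually impaired user
```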
Keywords:- Deep Learning; Recurrent Neural Network; Convolutional Neural Network; VGG16; LSTM.

I. INTRODUCTION
Many people with disabilities still find it difficult to participate fully in society, even though they are a valuable and important part of it. As a result, their social and economic advancement has been hampered, and they have little or no opportunity to contribute to our economic prosperity. Our goal is to assist in bridging this ever-widening gap, and technological advancements such as the one described here will help us achieve it.

A person without visual impairment can deduce the scene and content of an image, but blind members of our society cannot. The ability to provide descriptions of visual content in the form of naturally spoken sentences could therefore be extremely beneficial to the visually impaired: imagine a world in which no one is limited by their visual abilities, where everyone can access the visual medium without having to see the objects themselves. Our goal is to empower the visually impaired through an automated method that captures visual content and produces natural-language sentences describing it.

Before recent advances in computer vision, this ability was one of the most difficult for a computer to achieve on its own. Image description is harder than object recognition and classification because a caption must capture more than just the objects in the image. To produce a visual representation and an understanding of it, both a visual model and a linguistic model are needed, and the two must work together.
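To make this combination of visual and linguistic models concrete, the following is a minimal sketch, assuming Keras/TensorFlow, of how a VGG16 encoder and an LSTM decoder can be merged into one captioning network. The layer sizes, vocabulary size, and maximum caption length are illustrative assumptions, not this paper's exact settings.

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from tensorflow.keras.models import Model

vocab_size = 8000  # assumed vocabulary size for the Flickr8k captions
max_len = 34       # assumed maximum caption length in tokens

# Visual model: reuse VGG16 and take the 4096-d fc2 layer as image features.
base = VGG16(weights="imagenet")
feature_extractor = Model(base.input, base.layers[-2].output)

# Image branch: project the extracted features into a 256-d space.
img_in = Input(shape=(4096,))
img = Dropout(0.5)(img_in)
img = Dense(256, activation="relu")(img)

# Language branch: embed the partial caption and run an LSTM over it.
txt_in = Input(shape=(max_len,))
txt = Embedding(vocab_size, 256, mask_zero=True)(txt_in)
txt = Dropout(0.5)(txt)
txt = LSTM(256)(txt)

# Merge the two branches and predict the next word of the caption.
merged = Dense(256, activation="relu")(add([img, txt]))
out = Dense(vocab_size, activation="softmax")(merged)

caption_model = Model(inputs=[img_in, txt_in], outputs=out)
caption_model.compile(loss="categorical_crossentropy", optimizer="adam")
```

At inference time such a network is applied word by word: the caption starts from a start token, and the most probable next word is appended until an end token is produced or max_len is reached.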
II. RELATED WORKS

For the past few years, researchers have focused on the problem of translating visual content into natural-language descriptions. Because of certain constraints, existing systems are vulnerable to attack and offer only a limited set of capabilities.

One line of work proposes a "domain-specific image caption generator" (DSIG), which replaces general words in a generated caption with words specific to the target domain. The generator was evaluated both qualitatively and quantitatively; however, the model does not implement a semantic ontology from beginning to end.

In [2], Kurt Shuster and his colleagues proposed a model that understands an image's content and provides humans with engaging captions. Using recent advances in image and sentence encoding, they built generative and retrieval models that perform well on standard captioning tasks. A new retrieval architecture, TransResNet, was developed, along with a new state of the art for COCO caption generation. Controllable personality traits can be used to enhance the models' human appeal, and the models can be trained on a large amount of collected data. In terms of relevance and engagement, the system performs similarly to a human; efforts to improve the generative models, which have previously lagged behind, are ongoing.

Soheyla Amirian and other researchers describe the functions of automatic image annotation, tagging, and indexing. Automatically generating metadata in the form of captions, i.e. producing sentences that express the content of an image, is known as image captioning. Image captions enable many ways of searching for images, whether in databases, online, or on personal devices. Deep learning has had some success in image captioning in recent years, but the accuracy, diversity, and emotional impact of the captions still need to be addressed. The proposed generative adversarial models make it possible to generate new and combinatorial samples, and the authors aim to improve image captions by experimenting with various autoencoders, unsupervised neural networks that learn to encode data on their own. Further details are available on the study's website.

In [4], N. Komal Kumar, D. Vigneswari, A. Mohan, K. Laxman, and J. Yuvaraj proposed a deep learning method for generating image captions with neural networks, conducting their research on the Flickr8k dataset. The proposed method generated more accurate image captions than the caption generators available at the time, and the authors note that image caption generators could benefit from a hybrid model.

Using knowledge graphs, Yimin Zhou, Yiwei Sun, and Vasant Honavar proposed CNet-NIC, a new approach to image captioning. The performance of image captioning systems on several benchmark data sets, such as MS COCO, was compared using CIDEr-D, a performance measure for caption quality.
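As a concrete illustration of the autoencoder idea mentioned in the summary of Amirian et al. above, here is a minimal sketch; the layer sizes are our assumptions, and Keras is used for consistency with the rest of this paper's tooling.

```python
# Minimal autoencoder: an unsupervised network trained to reconstruct its
# own input through a narrow 256-d code. Sizes are illustrative assumptions.
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

x_in = Input(shape=(4096,))                      # e.g. CNN image features
code = Dense(256, activation="relu")(x_in)       # encoder: compress
x_out = Dense(4096, activation="linear")(code)   # decoder: reconstruct

autoencoder = Model(x_in, x_out)
autoencoder.compile(optimizer="adam", loss="mse")
# Training uses the input as its own target: autoencoder.fit(x, x, ...)
```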