
A MAJOR PROJECT REPORT

ON
AN EFFICIENT DEEP LEARNING BASED HYBRID MODEL FOR
IMAGE CAPTION GENERATION
Submitted in partial fulfilment of the requirement for the award of the Degree of
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING (AI & ML)
BY
P.SAI PRAKASH 20N71A6628
S.HARIKA 20N71A6634
N.SHRAVAN KUMAR 21N75A6604

Under the guidance of


Mr. G.SRINIVAS
(ASSISTANT PROFESSOR)

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING (AI & ML)

DRK INSTITUTE OF SCIENCE AND TECHNOLOGY

(Affiliated to JNTU University, Hyderabad)


Bowrampet (V), via Air Force Academy, Hyderabad-500043
2023-2024

DRK INSTITUTE OF SCIENCE AND TECHNOLOGY

(Affiliated to JNTU University, Hyderabad)


Bowrampet (V), via Air Force Academy, Hyderabad-500043

CERTIFICATE

This is to certify that the project report entitled “AN EFFICIENT DEEP LEARNING
BASED HYBRID MODEL FOR IMAGE CAPTION GENERATION” is submitted by
P.SAI PRAKASH (20N71A6628), S.HARIKA (20N71A6634) and N.SHRAVAN KUMAR
(21N75A6604) in partial fulfillment of the requirement for the award of the B.Tech. Degree
in COMPUTER SCIENCE AND ENGINEERING (AI & ML), JNTUH University,
Hyderabad, for the academic year 2023-2024.

INTERNAL GUIDE HEAD OF THE DEPARTMENT

Mr. G.SRINIVAS Mr. K.PRAVEEN


(Assistant Professor) (Associate Professor)

EXTERNAL EXAMINER

DECLARATION

We hereby declare that the project report entitled “AN EFFICIENT DEEP LEARNING
BASED HYBRID MODEL FOR IMAGE CAPTION GENERATION”, submitted to the
Department of COMPUTER SCIENCE AND ENGINEERING (AI & ML) in partial
fulfillment of the requirements for the award of the degree of BACHELOR OF
TECHNOLOGY, is the result of our own effort, and it has not been submitted to any other
university or institution for the award of any degree or diploma other than specified above.

P.SAI PRAKASH 20N71A6628

S.HARIKA 20N71A6634

N.SHRAVAN KUMAR 21N75A6604

ACKNOWLEDGMENT

The report would not be complete without mentioning certain individuals whose
guidance and encouragement have been of immense help in completing this project.
We express a deep sense of gratitude to our guide, Mr. G.SRINIVAS, Assistant
Professor, Department of CSE (AI & ML), for his able guidance and cooperation
throughout our project. We are highly grateful to him for providing all the facilities for
the completion of the project work.

We are very thankful to Mr. K.PRAVEEN, Head of the Department of CSE (AI & ML),
for providing the necessary resources for the successful completion of the project work.

We would also like to express our gratitude to Dr. VENKATA SUBBAIAH,
Principal of DRK Institute of Science and Technology, for the valuable guidance and
encouragement given to us throughout this project.

We render our sincere thanks to our respected sir, D.B. CHANDRA SEKHAR RAO,
Chairman of DRK Group of Institutions, for his initiative in creating a conducive
atmosphere in which we could complete the course work and project work successfully.

We would like to thank our parents and friends, who have made the greatest
contributions to all our achievements, for their great care and blessings in making us
successful in all our endeavors.

P.SAI PRAKASH 20N71A6628

S.HARIKA 20N71A6634

N.SHRAVAN KUMAR 21N75A6604

ABSTRACT

In recent years, with the increase in the use of different social media platforms, image
captioning approaches play a major role in automatically describing a whole image in a
natural language sentence. Image captioning plays a significant role in a computer-based
society. It is the process of automatically generating a natural language textual description
of an image using artificial intelligence techniques. Computer vision and natural language
processing are the key aspects of such an image processing system. The Convolutional
Neural Network (CNN) is a part of computer vision and is used for object detection and
feature extraction, while Natural Language Processing (NLP) techniques help in generating
the textual caption of the image. Generating a suitable image description by machine is a
challenging task, as it depends on detecting objects, their locations and their semantic
relationships, and expressing them in a human-understandable language such as English.
In this work, our aim is to develop an encoder-decoder based hybrid image captioning
approach using VGG16, ResNet50 and YOLO. VGG16 and ResNet50 are pre-trained
feature extraction models trained on millions of images, while YOLO is used for real-time
object detection. The model first extracts the image features using VGG16, ResNet50 and
YOLO and concatenates the results into a single file. Finally, LSTM and BiGRU are used
to generate the textual description of the image. The proposed model is evaluated using
the BLEU, METEOR and ROUGE scores.

TABLE OF CONTENTS

1. INTRODUCTION
2. SYSTEM SPECIFICATIONS
   2.1 HARDWARE REQUIREMENTS
   2.2 SOFTWARE REQUIREMENTS
3. SOFTWARE AND HARDWARE SPECIFICATIONS
   3.1 REQUIREMENT ANALYSIS
   3.2 REQUIREMENT SPECIFICATION
       3.2.1 Functional Requirements
       3.2.2 Software Requirements
       3.2.3 Hardware Requirements
4. LITERATURE SURVEY
5. SYSTEM ANALYSIS
   5.1 EXISTING SYSTEM
   5.2 PROPOSED SYSTEM
6. MODULES
   6.1 MODULES
   6.2 MODULES DESCRIPTION
7. SYSTEM DESIGN
   7.1 SYSTEM ARCHITECTURE
   7.2 DATA FLOW DIAGRAM
   7.3 UML DIAGRAMS
       7.3.1 USE CASE DIAGRAM
       7.3.2 CLASS DIAGRAM
       7.3.3 SEQUENCE DIAGRAM
       7.3.4 ACTIVITY DIAGRAM
8. SOURCE CODE
9. SYSTEM STUDY
   9.1 FEASIBILITY STUDY
       9.1.1 ECONOMICAL FEASIBILITY
       9.1.2 TECHNICAL FEASIBILITY
       9.1.3 SOCIAL FEASIBILITY
10. SYSTEM TEST
    10.1 TYPES OF TESTS
    10.2 TEST CASES
11. OUTPUT SCREENS
12. CONCLUSION
13. FUTURE ENHANCEMENTS
14. REFERENCES

LIST OF FIGURES

FIGURE 7.1.1 SYSTEM ARCHITECTURE
FIGURE 7.2.1 DATA FLOW DIAGRAM
FIGURE 7.3.1.1 USE CASE DIAGRAM
FIGURE 7.3.2.1 CLASS DIAGRAM
FIGURE 7.3.3.1 SEQUENCE DIAGRAM
FIGURE 7.3.4.1 ACTIVITY DIAGRAM

LIST OF OUTPUT FIGURES
OUTPUT SCREEN 11.1 HOME PAGE
OUTPUT SCREEN 11.2 USER REGISTRATION FORM
OUTPUT SCREEN 11.3 USER LOGIN FORM
OUTPUT SCREEN 11.4 ADMIN LOGIN FORM
OUTPUT SCREEN 11.5 USER DETAILS
OUTPUT SCREEN 11.6 IMAGE CAPTION DETAILS
OUTPUT SCREEN 11.7 DATA SET VIEW
OUTPUT SCREEN 11.8 PREDICTION RESULTS

CHAPTER - 1
INTRODUCTION

In this World Wide Web era, every day we all experience a huge number of images in the
real world, which are self-interpreted by individual human beings using their wisdom.
Humans are naturally programmed to convert a natural scene into text, but this is a complex
task for a machine, as machines are not as efficient as humans. Still, human-generated
captions are considered better, as machines need human intervention and must be programmed
accordingly for better results. Due to recent developments in deep learning-based techniques,
computers are capable of handling the challenges of image captioning, such as detection of
objects, attributes and their relationships, image feature extraction, and generating
syntactically and semantically correct image captions [1]. With the advancement of AI, many
new ideas have revolutionized the areas of image processing, and it has transformed the world
in a surprising way. The image captioning approach (Fig. 1) has wide application in the real
world, as it provides a better platform for human-computer interaction. Due to its emerging
applications in image processing, image captioning has become a topic of interest for
academicians and researchers. Looking at the picture in Fig. 2, someone may guess that two
dogs are playing with a toy, someone might say two dogs are hauling in a floating toy from
the ocean, and another might say two dogs run through the water with a rope in their mouths;
all of these captions are appropriate to describe this picture. Our brain is so well trained and
advanced that it can describe a picture almost accurately, but the same is not the case with
machines. Hence, the main aim of image captioning is first to identify the different objects
and their relationships present in the image using deep learning-based techniques, then to
generate the textual description using natural language processing, and finally to evaluate the
performance of the natural language description using different performance metrics. Object
detection and segmentation are part of computer vision and are done with the help of popular
CNNs and DNNs, while generating the image description (Fig. 3) is part of natural language
processing, done using RNNs and LSTMs. A CNN works to understand the objects of the
image or scene and provides answers to various questions about the objects in the image, like
what, where, how, etc.
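
The abstract outlines an encoder-decoder pipeline: a pre-trained CNN encodes the image into
a feature vector, and a recurrent decoder generates the caption word by word. The following
is a minimal sketch of such a pipeline, assuming Keras/TensorFlow is available; the layer
sizes, vocabulary size and variable names are illustrative assumptions, not the exact
configuration used in this project.

# Minimal encoder-decoder captioning sketch (illustrative assumptions:
# Keras/TensorFlow installed; sizes and names are hypothetical).
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from tensorflow.keras.models import Model

vocab_size = 8000   # assumed vocabulary size
max_len = 34        # assumed maximum caption length

# Encoder: pre-trained VGG16 without its classifier head; image features
# would be pre-extracted offline with cnn.predict(images).
cnn = VGG16(weights="imagenet", include_top=False, pooling="avg")  # 512-d output

# Decoder inputs: the 512-d image feature vector and the partial caption.
img_in = Input(shape=(512,))
img_emb = Dense(256, activation="relu")(Dropout(0.5)(img_in))

txt_in = Input(shape=(max_len,))
txt_seq = LSTM(256)(Embedding(vocab_size, 256, mask_zero=True)(txt_in))

# Merge the image and text representations and predict the next word.
merged = Dense(256, activation="relu")(add([img_emb, txt_seq]))
out = Dense(vocab_size, activation="softmax")(merged)

caption_model = Model(inputs=[img_in, txt_in], outputs=out)
caption_model.compile(loss="categorical_crossentropy", optimizer="adam")

At inference time, the caption is grown greedily: starting from a start token, the model
predicts the next word repeatedly until an end token is produced or max_len is reached.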

CHAPTER - 2
SYSTEM SPECIFICATIONS

2.1 HARDWARE REQUIREMENTS

❖ System : Intel i3

❖ Hard Disk : 1 TB

❖ Monitor : 14" Colour Monitor

❖ Mouse : Optical Mouse

❖ RAM : 4 GB

2.2 SOFTWARE REQUIREMENTS

❖ Operating System : Windows 10

❖ Coding Language : Python

❖ Front-End : HTML, CSS

❖ Designing : HTML, CSS, JavaScript

❖ Database : SQLite

CHAPTER - 3
SOFTWARE AND HARDWARE SPECIFICATIONS

3.1 REQUIREMENT ANALYSIS

The project involved analyzing the design of a few applications so as to make the
application more user friendly. To do so, it was important to keep the navigation from
one screen to the other well ordered and, at the same time, to reduce the amount of typing
the user needs to do. In order to make the application more accessible, the browser version
had to be chosen so that it is compatible with most browsers.

3.2 REQUIREMENT SPECIFICATION

3.2.1 Functional Requirements

▪ Graphical User interface with the User.


3.2.2 Software Requirements
For developing the application the following are the Software Requirements:

1. Python

2. Django

Operating Systems supported

1. Windows 10 64 bit OS

Technologies and Languages used to Develop

1. Python

Debugger and Emulator

▪ Any Browser (Particularly Chrome)


3.2.3 Hardware Requirements

For developing the application the following are the Hardware Requirements:

▪ Processor: Intel i3
▪ RAM: 4 GB
▪ Space on Hard Disk: minimum 1 TB

CHAPTER - 4
LITERATURE SURVEY

1) An Empirical Study of Language CNN for Image Captioning


Authors: Jiuxiang Gu, Gang Wang, Jianfei Cai and Tsuhan Chen

Language Models based on recurrent neural networks have dominated recent image caption
generation tasks. In this paper, we introduce a Language CNN model which is suitable for
statistical language modeling tasks and shows competitive performance in image captioning.
In contrast to previous models, which predict the next word based on one previous word and
a hidden state, our language CNN is fed with all the previous words and can model the long-
range dependencies of history words, which are critical for image captioning. The effectiveness
of our approach is validated on two datasets, MS COCO and Flickr30K. Our extensive
experimental results show that our method outperforms the vanilla recurrent neural network
based language models and is competitive with the state-of-the-art methods.

2) Convolutional Image Captioning


AUTHORS: Jyoti Aneja, Aditya Deshpande, and Alexander Schwing

Image captioning is an important but challenging task, applicable to virtual assistants, editing
tools, image indexing, and support of the disabled. Its challenges are due to the variability and
ambiguity of possible image descriptions. In recent years significant progress has been made
in image captioning, using Recurrent Neural Networks powered by long-short-term-memory
(LSTM) units. Despite mitigating the vanishing gradient problem, and despite their compelling
ability to memorize dependencies, LSTM units are complex and inherently sequential across
time. To address this issue, recent work has shown benefits of convolutional networks for
machine translation and conditional image generation. Inspired by their success, in this paper,
we develop a convolutional image captioning technique. We demonstrate its efficacy on the
challenging MSCOCO dataset and demonstrate performance on par with the baseline, while
having a faster training time per number of parameters. We also perform a detailed analysis,
providing compelling reasons in favor of convolutional language generation approaches.


3) Image captioning with deep LSTM based on sequential residual


AUTHORS: Hanli Wang, Pengjie Tang and Kaisheng Xu

Image captioning is a fundamental task which requires semantic understanding of images and
the ability of generating description sentences with proper and correct structure. In
consideration of the problem that language models are always shallow in modern image caption
frameworks, a deep residual recurrent neural network is proposed in this work with the
following two contributions. First, an easy-to-train deep stacked Long Short Term Memory
(LSTM) language model is designed to learn the residual function of output distributions by
adding identity mappings to multi-layer LSTMs. Second, in order to overcome the over-fitting
problem caused by larger-scale parameters in deeper LSTM networks, a novel temporal
Dropout method is proposed for the LSTM. The experimental results on the benchmark
MSCOCO and Flickr30K datasets demonstrate that the proposed model achieves state-of-
the-art performance, with a CIDEr of 101.1 on MSCOCO and a B-4 of 22.9 on Flickr30K.

CHAPTER - 5
SYSTEM ANALYSIS

5.1 EXISTING SYSTEM

The key existing systems that they compared against were:

• MFCC + Softmax Regression: Extract MFCC features, feed into softmax regression
model for genre classification.
• CQT + Softmax Regression: Use Constant Q Transform instead of STFT to get
spectrogram features, feed into softmax regression.
• FFT + Softmax Regression: Take FFT directly on audio, feed amplitude spectrum into
softmax regression.
• MFCC + MLP: Use MFCC as input, feed into a multilayer perceptron (MLP) model
with softmax output for classification.
• CQT + MLP: Use CQT spectrogram as input, feed into MLP model.
• FFT + MLP: Use FFT amplitude spectrum as input, feed into MLP.

The key existing systems used:

• Different input audio representations: MFCC, CQT, FFT


• Simple linear models like softmax regression
• Non-linear MLP models

But they did not use convolutional neural networks or other deep learning approaches. The
input features were hand-engineered rather than learned.
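
As an illustration of these baselines, the following is a minimal sketch of the MFCC +
softmax regression pipeline, assuming librosa and scikit-learn are available; the file names,
labels and parameter values are hypothetical.

# Minimal MFCC + softmax-regression baseline sketch (hypothetical file
# names and labels; assumes librosa and scikit-learn are installed).
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression

def mfcc_features(path, n_mfcc=13):
    # Load a 30-second clip and summarize its MFCCs by the frame-wise mean.
    y, sr = librosa.load(path, duration=30.0)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
    return mfcc.mean(axis=1)                                # fixed-length vector

# X: one MFCC summary vector per clip; y: integer genre labels (hypothetical).
X = np.array([mfcc_features(p) for p in ["clip1.wav", "clip2.wav"]])
y = np.array([0, 1])

# Softmax regression is multinomial logistic regression in scikit-learn.
clf = LogisticRegression(multi_class="multinomial", max_iter=1000).fit(X, y)
print(clf.predict(X))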

DISADVANTAGES OF EXISTING SYSTEM

Based on the typical audio feature extraction and classification approaches used in the existing
systems described in the paper, some potential disadvantages or limitations could be:

• Hand-crafted audio features like MFCC may not capture all the relevant information
for genre classification. They are engineered based on human assumptions rather than
learned from data.
• Features like MFCC are extracted from short frames independently, without
considering temporal context. This ignores useful temporal patterns in the audio.
• Simple linear models like softmax regression have limited modeling capacity to
capture complex patterns in audio features.


• Non-linear MLPs are able to model complex patterns, but their performance still
relies on the quality of input features.
• Most systems use a pipeline approach - feature engineering, feature selection, then
classifier training. This is not end-to-end learning.
• Lack of shift/translation invariance - small variations in pitch or tempo can degrade
accuracy of systems relying on fixed audio features.
• Unable to effectively learn from raw audio - most systems rely on engineered features
rather than learning directly from spectrograms/waveforms.
• Inability to scale up - unlike deep learning approaches, traditional methods can't
benefit from larger datasets.

The key limitations are reliance on engineered features rather than end-to-end feature
learning, lack of modeling temporal context, limited invariance properties, and disjoint
training of feature extraction and classifier components. Deep learning approaches can help
overcome some of these disadvantages.
Algorithm:

Here are some of the key existing algorithms and techniques that were used prior to this
work:

• Using hand-crafted audio features like MFCCs, chroma features, spectral contrast, etc
and feeding them into machine learning classifiers like SVM, KNN, Random Forests
etc.
• Using aggregation and statistics of low-level features, e.g. mean, variance, histograms
etc.
• Applying dimensionality reduction on hand-crafted features like PCA, ICA etc before
classification.
• Using mid-level representations like bag-of-words on audio features.
• Combining multiple features at feature-level or decision-level via techniques like
feature concatenation, early fusion, late fusion etc.
• Using deep neural networks like Deep Belief Networks (DBNs) and stacked
autoencoders for unsupervised pre-training before classification.
• Applying recurrent neural networks like LSTMs on top of pre-extracted features for
sequence modeling.
• Using 1D convolutional neural networks on raw waveform or spectrogram for feature
learning.

The key existing techniques relied heavily on hand-crafted audio features or 1D convolution,
rather than 2D convolutional feature learning directly from spectrograms as proposed in this
paper. The deep learning approaches focused more on unsupervised pre-training rather than
end-to-end feature learning.


5.2 PROPOSED SYSTEM

Here are the key points of the music genre classification paper:

• Motivation: Develop better feature representations directly from audio rather than
using hand-crafted features like MFCCs for music genre classification.
• Approach: Use 2D convolutional neural network applied on spectrograms to learn
features that capture timbral and temporal patterns.
• Input: 30-second audio clips converted to spectrograms using Short-time Fast Fourier
Transform (STFT).
• Feature Learning: Designed 4 filters to detect patterns related to percussion, harmony,
pitch slides etc. Convolved filters with spectrogram to obtain 4 feature maps.
• Subsampling: Applied 2x2 max pooling on feature maps for dimensionality reduction
and translation invariance.
• Classification: Flattened feature maps and fed them into a Multilayer Perceptron
(MLP) with softmax output for 10-way genre classification.
• Results: Achieved 72.4% accuracy on GTZAN dataset, outperforming MFCC+MLP
(46.8%) and other baseline systems relying on hand-crafted features.
• Conclusion: Learned features from spectrograms using 2D CNNs capture more
relevant information for genre classification than engineered MFCC features. End-to-
end feature learning shows promise over pipeline systems.

The key ideas are - using 2D CNN on spectrograms for feature learning, end-to-end training,
and demonstrating superior performance over traditional methods relying on MFCC and
other hand-crafted audio features for music classification.

ADVANTAGES OF PROPOSED SYSTEM

Some of the key problems this work is trying to address for music genre classification are:

1. Limitations of hand-crafted audio features like MFCCs:

• The paper mentions MFCCs lack dynamic analysis capability as they are extracted
from single frames.
• MFCCs may not capture all the relevant information for genre classification.

2. Finding better representations from raw audio:

• Rather than using hand-crafted features, learn features directly from the spectrogram
using convolutional neural nets.

3. Capturing temporal patterns:

• The 2D convolutional filters can capture patterns across both time and frequency
dimensions of the spectrogram, unlike MFCCs.

4. Translation invariance


• The max pooling provides some invariance to pitch shifting or tempo changes.

5. End-to-end learning:

• Compared to systems relying on engineered features, learn the feature extraction and
classification together end-to-end.

Some of the key limitations the paper tries to address are:

• Finding better features from raw audio data rather than relying on hand-crafted
features
• Learning features that capture temporal/spectral patterns
• Achieving some translation invariance
• End-to-end learning of features and classifier

The goal is to show convolutional neural networks can achieve better music genre
classification from raw audio compared to approaches using traditional audio features.

Algorithm:

The proposed algorithm for music genre classification can be summarized as follows (a
minimal code sketch follows the list):

Input:

• Take 30-second audio clips
• Compute the spectrogram using the Short-time Fast Fourier Transform (STFT)
• Retain only the magnitude values from the spectrogram

Feature Extraction:

• Define 4 different 2D convolutional filters designed to capture different patterns in the
spectrogram
• Convolve each filter with the input spectrogram to generate 4 feature maps
• This acts as a feature detector to extract useful representations

Subsampling:

• Apply 2x2 max pooling to each feature map
• Reduces dimensionality and provides translation invariance

Classification:

• Flatten the 4 subsampled feature maps into a vector
• Feed the feature vector into a Multilayer Perceptron (MLP)
• Use softmax activation in the output layer for predicting genre
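
The following is a minimal sketch of this pipeline, assuming Keras/TensorFlow; the input
spectrogram shape, filter size and hidden layer width are illustrative assumptions, with only
the elements named above (4 filters, 2x2 max pooling, MLP with softmax) taken from the
description.

# Minimal 2D-CNN-on-spectrogram sketch (assumes Keras/TensorFlow; the
# input shape and layer widths are hypothetical).
from tensorflow.keras import layers, models

model = models.Sequential([
    # Feature extraction: 4 learned 2D filters over the magnitude
    # spectrogram of a 30-second clip (freq bins x time frames x 1).
    layers.Conv2D(4, (3, 3), activation="relu", input_shape=(128, 1290, 1)),
    # Subsampling: 2x2 max pooling for dimensionality reduction and a
    # degree of translation invariance.
    layers.MaxPooling2D(pool_size=(2, 2)),
    # Classification: flatten the feature maps and feed an MLP with softmax.
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),  # 10-way genre prediction
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])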

CHAPTER - 6
MODULES

IMPLEMENTATION

6.1 MODULES

• User
• Admin
• Data Preprocessing
• Machine Learning

6.2 MODULES DESCRIPTION

User
The user can register first. While registering, a valid email address and mobile number are
required for further communication. Once the user registers, the admin can activate the
account; only then can the user log in to the system. The user can upload a dataset whose
columns match our dataset schema. For algorithm execution, the data must be in float format.
Here we used the Employment Scam Aegean Dataset (EMSCAD), containing about 18,000
samples. The user can also add new data to the existing dataset through our Django
application. The user can click Classification on the web page so that the accuracy, macro
average and weighted average are calculated for the algorithms. The user can display the
ML results as well as the prediction results.

Admin
The admin can log in with his login details and activate the registered users; only after
activation can a user log in to the system. The admin can view the overall data in the
browser. When the admin clicks Results on the web page, the calculated accuracy, macro
average and weighted average of the algorithms are displayed. After all algorithms have
finished executing, the admin can see the overall accuracy on the web page, along with the
classification results.


Data Preprocessing
They worked on this dataset in three steps: data pre-processing, feature selection and fraud
detection using a classifier. In the preprocessing step, they removed noise and HTML tags
from the data so that the general text pattern remained preserved. They applied a feature
selection technique to reduce the number of attributes effectively and efficiently. A Support
Vector Machine was used for feature selection, and an ensemble classifier using random
forest was used to detect fake job posts in the test data. The random forest classifier is a
tree-structured classifier which works as an ensemble classifier with the help of a majority
voting technique. This classifier showed 97.4% classification accuracy in detecting fake job
posts.

Machine learning
This paper proposed to use different data mining techniques and classification algorithms
like KNN, decision tree, support vector machine, naive Bayes classifier, random forest
classifier, multilayer perceptron and deep neural network to predict whether a job post is
real or fraudulent. The accuracy, macro average and weighted average of the classifiers
were calculated and displayed in the results. The classifier that achieves the highest accuracy
can be determined as the best classifier; a minimal sketch of such a comparison is given below.
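
As an illustration of this module, here is a minimal sketch, assuming scikit-learn is
available; the synthetic feature matrix and the two chosen classifiers are hypothetical
stand-ins for the preprocessed EMSCAD data and the full set of algorithms listed above.

# Minimal classifier-comparison sketch (hypothetical stand-in data;
# assumes scikit-learn is installed).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Stand-in for the preprocessed EMSCAD features (float format, as required).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Each report includes accuracy, macro avg and weighted avg, matching the
# values shown on the results page.
for name, clf in [("Random Forest", RandomForestClassifier(random_state=0)),
                  ("KNN", KNeighborsClassifier())]:
    clf.fit(X_tr, y_tr)
    print(name)
    print(classification_report(y_te, clf.predict(X_te)))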

CHAPTER - 7
SYSTEM DESIGN

7.1 SYSTEM ARCHITECTURE

FIGURE 7.1.1 SYSTEM ARCHITECTURE

7.2 DATA FLOW DIAGRAM

1. The DFD is also called a bubble chart. It is a simple graphical formalism that can be used
to represent a system in terms of the input data to the system, the various processing carried
out on this data, and the output data generated by the system.
2. The data flow diagram (DFD) is one of the most important modeling tools. It is used to
model the system components: the system processes, the data used by the processes, the
external entities that interact with the system, and the information flows in the system.
3. The DFD shows how information moves through the system and how it is modified by a
series of transformations. It is a graphical technique that depicts information flow and the
transformations that are applied as data moves from input to output.
4. A DFD may be used to represent a system at any level of abstraction and may be
partitioned into levels that represent increasing information flow and functional detail.

FIGURE 7.2.1 DATA FLOW DIAGRAM


7.3 UML DIAGRAMS

UML stands for Unified Modeling Language. UML is a standardized general-purpose
modeling language in the field of object-oriented software engineering. The standard is
managed, and was created, by the Object Management Group.
The goal is for UML to become a common language for creating models of object-
oriented computer software. In its current form, UML comprises two major components:
a meta-model and a notation. In the future, some form of method or process may also
be added to, or associated with, UML.
The Unified Modeling Language is a standard language for specifying, visualizing,
constructing and documenting the artifacts of a software system, as well as for business
modeling and other non-software systems.
The UML represents a collection of best engineering practices that have proven
successful in the modeling of large and complex systems.
The UML is a very important part of developing object-oriented software and the
software development process. The UML uses mostly graphical notations to express the
design of software projects.

GOALS

The primary goals in the design of the UML are as follows:


1. Provide users a ready-to-use, expressive visual modeling language so that they can
develop and exchange meaningful models.
2. Provide extendibility and specialization mechanisms to extend the core concepts.
3. Be independent of particular programming languages and development processes.
4. Provide a formal basis for understanding the modeling language.
5. Encourage the growth of the OO tools market.
6. Support higher-level development concepts such as collaborations, frameworks, patterns
and components.
7. Integrate best practices.


7.3.1 USE CASE DIAGRAM

A use case diagram in the Unified Modeling Language (UML) is a type of behavioral
diagram defined by and created from a Use-case analysis. Its purpose is to present a graphical
overview of the functionality provided by a system in terms of actors, their goals (represented
as use cases), and any dependencies between those use cases. The main purpose of a use case
diagram is to show what system functions are performed for which actor. Roles of the actors
in the system can be depicted.

FIGURE 7.3.1.1 USE CASE DIAGRAM


7.3.2 CLASS DIAGRAM

In software engineering, a class diagram in the Unified Modeling Language (UML) is a type
of static structure diagram that describes the structure of a system by showing the system's
classes, their attributes, operations (or methods), and the relationships among the classes. It
explains which class contains information.

FIGURE 7.3.2.1 CLASS DIAGRAM


7.3.3 SEQUENCE DIAGRAM

A sequence diagram in Unified Modeling Language (UML) is a kind of interaction diagram


that shows how processes operate with one another and in what order. It is a construct of a
Message Sequence Chart. Sequence diagrams are sometimes called event diagrams, event
scenarios, and timing diagrams.

FIGURE 7.3.3.1 SEQUENCE DIAGRAM


7.3.4 ACTIVITY DIAGRAM

Activity diagrams are graphical representations of workflows of stepwise activities and actions
with support for choice, iteration and concurrency. In the Unified Modeling Language, activity
diagrams can be used to describe the business and operational step-by-step workflows of
components in a system. An activity diagram shows the overall flow of control.

FIGURE 7.3.4.1 ACTIVITY DIAGRAM

CHAPTER - 8
SOURCE CODE

User side views:


# Create your views here.
from django.shortcuts import render, HttpResponse
from django.contrib import messages
from .forms import UserRegistrationForm
from .models import UserRegistrationModel
from django.conf import settings


def UserRegisterActions(request):
    if request.method == 'POST':
        form = UserRegistrationForm(request.POST)
        if form.is_valid():
            print('Data is Valid')
            # Save the new user and show a fresh registration form.
            form.save()
            messages.success(request, 'You have been successfully registered')
            form = UserRegistrationForm()
            return render(request, 'UserRegistrations.html', {'form': form})
        else:
            # Invalid form, typically a duplicate email or mobile number.
            messages.success(request, 'Email or mobile already existed')
            print("Invalid form")
    else:
        form = UserRegistrationForm()
    return render(request, 'UserRegistrations.html', {'form': form})


def UserLoginCheck(request):
    if request.method == "POST":
        loginid = request.POST.get('loginid')
        pswd = request.POST.get('pswd')
        print("Login ID = ", loginid, ' Password = ', pswd)
        try:
            check = UserRegistrationModel.objects.get(loginid=loginid, password=pswd)
            status = check.status
            print('Status is = ', status)
            if status == "activated":
                # Store the logged-in user's details in the session.
                request.session['id'] = check.id
                request.session['loggeduser'] = check.name
                request.session['loginid'] = loginid
                request.session['email'] = check.email
                print("User id At", check.id, status)
                return render(request, 'users/UserHomePage.html', {})
            else:
                # The admin has not yet activated this account.
                messages.success(request, 'Your account is not activated')
                return render(request, 'UserLogin.html')
        except Exception as e:
            print('Exception is ', str(e))
        messages.success(request, 'Invalid login id and password')
    return render(request, 'UserLogin.html', {})


def UserHome(request):
    return render(request, 'users/UserHomePage.html', {})


def DatasetView(request):
    import pandas as pd
    path = settings.MEDIA_ROOT + "//" + 'DataSet.csv'
    # Show only the first 100 rows of the dataset as an HTML table.
    df = pd.read_csv(path, nrows=100, index_col=False)
    df = df.to_html(index=False)  # to_html must be called, not referenced
    return render(request, 'users/viewdataset.html', {'data': df})


def preProcessData(request):
    from .utility.PreprocessedData import preProcessed_data_view
    data = preProcessed_data_view()
    return render(request, 'users/preproccessed_data.html', {'data': data})


def Model_Results(request):
    # Build and evaluate each classifier, then pass the reports to the template.
    from .utility import PreprocessedData
    nb_report = PreprocessedData.build_naive_bayes()
    knn_report = PreprocessedData.build_knn()
    dt_report = PreprocessedData.build_decsionTree()
    rf_report = PreprocessedData.build_randomForest()
    svm_report = PreprocessedData.build_svm()
    mlp_report = PreprocessedData.build_mlp()
    return render(request, 'users/ml_reports.html',
                  {'nb': nb_report, 'knn': knn_report, 'dt': dt_report,
                   'rf': rf_report, 'svm': svm_report, 'mlp': mlp_report})

def user_input_prediction(request):
    if request.method == 'POST':
        from .utility import PreprocessedData
        # Classify the job description text submitted by the user.
        joninfo = request.POST.get('joninfo')
        result = PreprocessedData.predict_userInput(joninfo)
        return render(request, 'users/testform.html', {'result': result})
    else:
        return render(request, 'users/testform.html', {})
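
The utility module imported above (.utility.PreprocessedData) is not included in this
listing. Purely as a hypothetical sketch of what one of its builder functions might look
like, assuming scikit-learn and TF-IDF text features, with made-up stand-in data:

# Hypothetical sketch of a builder function in utility/PreprocessedData.py
# (the real module is not shown in this report; the data is a stand-in).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report

def build_naive_bayes():
    # Stand-in job postings; the real module would load the EMSCAD data
    # prepared by the preprocessing step.
    texts = ["earn money fast from home", "software engineer role in Hyderabad"]
    labels = [1, 0]  # assumed encoding: 1 = fraudulent, 0 = real
    X = TfidfVectorizer().fit_transform(texts)
    model = MultinomialNB().fit(X, labels)
    # classification_report includes accuracy, macro avg and weighted avg,
    # matching the values displayed on the results page.
    return classification_report(labels, model.predict(X), output_dict=True)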

base.html:
{%load static%}
<!DOCTYPE html>
<html>
<head>

<!-- /.website title -->


<title>Clouds html5 Multipurpose Landing Page for Apps</title>
<meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1,
user-scalable=no">

<!-- CSS Files -->


<link href="{%static 'css/bootstrap.min.css'%}" rel="stylesheet" media="screen">
<link href="{%static 'css/font-awesome.min.css'%}" rel="stylesheet">
<link href="{%static 'fonts/icon-7-stroke/css/pe-icon-7-stroke.css'%}" rel="stylesheet">
<link href="{%static 'css/animate.css'%}" rel="stylesheet" media="screen">
<link href="{%static 'css/owl.theme.css'%}" rel="stylesheet">
<link href="{%static 'css/owl.carousel.css'%}" rel="stylesheet">

<link href="{%static 'css/styles.css'%}" rel="stylesheet" media="screen">

<!-- Google Fonts -->


<link href='http://fonts.googleapis.com/css?family=Open+Sans:400,300,600,700' rel='stylesheet' type='text/css'>


<link href='http://fonts.googleapis.com/css?family=Alegreya+Sans:100,300,400,700' rel='stylesheet' type='text/css'>

<!-- Font Awesome -->


<link href="http://maxcdn.bootstrapcdn.com/font-awesome/4.2.0/css/font-
awesome.min.css" rel="stylesheet">
</head>

<body data-spy="scroll" data-target="#navbar-scroll">

<div id="top"></div>

<!-- NAVIGATION -->


<div id="menu">
<nav class="navbar-wrapper navbar-default" role="navigation">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle" data-toggle="collapse" data-
target=".navbar-themers">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand site-name" href="#top" style="COlor:WHITE"><h2>Fake
Job Posting</h2></a>
</div>


<div id="navbar-scroll" class="collapse navbar-collapse navbar-themers navbar-right">


<ul class="nav navbar-nav">
<li><a href="{%url 'index'%}">Home</a></li>
<li><a href="{%url 'UserLogin'%}">ML Users</a></li>
<li><a href="{%url 'AdminLogin'%}">Admin</a></li>
<li><a href="{%url 'UserRegister'%}">Registrations</a></li>
</ul>
</div>
</div>
</nav>
</div>

{%block contents%}
{%endblock%}
<!-- /.footer -->
<footer id="footer">
<div class="container">
<div class="col-sm-4 col-sm-offset-4">
<!-- /.social links -->

<div class="text-center wow fadeInUp" style="font-size: 14px;">Copyright Alex


Corporations Template by <a
href="#">Alex Hales</a></div>
<a href="#" class="scrollToTop"><i class="fa fa-arrow-circle-o-up"></i></a>
</div>
</div>
</footer>


<!-- /.javascript files -->


<script src="{%static 'js/jquery.js'%}"></script>
<script src="{%static 'js/bootstrap.min.js'%}"></script>
<script src="{%static 'js/custom.js'%}"></script>
<script src="{%static 'js/jquery.sticky.js'%}"></script>
<script src="{%static 'js/wow.min.js'%}"></script>
<script src="{%static 'js/owl.carousel.min.js'%}"></script>
<script src="{%static 'js/ekko-lightbox-min.js'%}"></script>
<script type="text/javascript">
$(document).delegate('*[data-toggle="lightbox"]', 'click', function (event) {
event.preventDefault();
$(this).ekkoLightbox();
});
</script>
<script>
new WOW().init();
</script>
</body>
</html>
Index.html:
{%extends 'base.html'%}
{%load static%}
{%block contents%}
<!-- /.parallax full screen background image -->
<div class="fullscreen landing parallax banner" style="background-image:url('{%static
'images/bg.jpg'%}');"
data-img-width="2000" data-img-height="1325" data-diff="100">


<div class="overlay">
<div class="container">
<div class="row">

<div class="col-md-6">

<!-- /.main title -->


<h1 class="wow fadeInLeft">
Common types of Job Scam
</h1>

<!-- /.header paragraph -->


<div class="landing-text wow fadeInLeft">
<p>
  Fraudsters who want to gain other people's personal information like
  insurance details, bank details, income tax details, date of birth and
  national ID create fake job advertisements. Advance-fee scams occur when
  fraudsters ask for money citing reasons like admin charges, information
  security checking costs, management costs, etc. Sometimes fraudsters pose
  as employers and ask people for passport details, bank statements, driving
  licences, etc. as a pre-employment check. Illegal money-muling scams occur
  when they convince students to pay money into their accounts and then
  transfer it back.
</p>


</div>

</div>

<!-- /.phone image -->


<div class="col-md-6">
<img src="{%static 'images/header-phone.png'%}" alt="phone"
class="header-phone img-responsive wow fadeInRight">
</div>
</div>
</div>
</div>
</div>

<!-- /.feature section -->


<div id="feature">
<div class="container">
<div class="row">
<div class="col-md-10 col-md-offset-1 col-sm-12 text-center feature-title">

<!-- /.feature title -->


<h2>Related Works</h2>
<p>
  Much research has been done to predict whether a job post is real or
  fake, and a good number of works check online fraud job advertisers.
  Vidros et al. [1] identified job scammers as fake online job advertisers.
  They gathered statistics about many real and renowned companies and
  enterprises that produced fake job advertisements or vacancy posts with
  ill motive. They experimented on the EMSCAD dataset using several
  classification algorithms like the naive Bayes classifier, random forest
  classifier, ZeroR, OneR, etc. The random forest classifier showed the
  best performance on the dataset, with 89.5% classification accuracy.
</p>
</div>
</div>

</div>
</div>

{%endblock%}
Admin side views:
from django.shortcuts import render, HttpResponse
from django.contrib import messages
from users.models import UserRegistrationModel

# Create your views here.


def AdminLoginCheck(request):
    if request.method == 'POST':
        usrid = request.POST.get('loginid')

CHAPTER - 9
SYSTEM STUDY

9.1 FEASIBILITY STUDY

The feasibility of the project is analyzed in this phase, and a business proposal is
put forth with a very general plan for the project and some cost estimates. During system
analysis, the feasibility study of the proposed system is carried out. This is to ensure that
the proposed system is not a burden to the company. For feasibility analysis, some
understanding of the major requirements for the system is essential.
Three key considerations involved in the feasibility analysis are:

• ECONOMICAL FEASIBILITY
• TECHNICAL FEASIBILITY
• SOCIAL FEASIBILITY

9.1.1 ECONOMICAL FEASIBILITY

This study is carried out to check the economic impact that the system will have on
the organization. The amount of funds that the company can pour into the research and
development of the system is limited, and the expenditures must be justified. Thus, the
developed system is well within the budget, which was achieved because most of the
technologies used are freely available; only the customized products had to be purchased.

9.1.2 TECHNICAL FEASIBILITY

This study is carried out to check the technical feasibility, that is, the technical
requirements of the system. Any system developed must not place a high demand on the
available technical resources, as this would lead to high demands being placed on the
client. The developed system must have modest requirements, as only minimal or no
changes are required for implementing this system.


9.1.3 SOCIAL FEASIBILITY

This aspect of the study checks the level of acceptance of the system by the user. This
includes the process of training the user to use the system efficiently. The user must not
feel threatened by the system, but must instead accept it as a necessity. The level of
acceptance by the users solely depends on the methods employed to educate users about
the system and to make them familiar with it. Their level of confidence must be raised so
that they are also able to make constructive criticism, which is welcomed, as they are the
final users of the system.

CHAPTER - 10
SYSTEM TEST

The purpose of testing is to discover errors. Testing is the process of trying to discover every
conceivable fault or weakness in a work product. It provides a way to check the functionality
of components, subassemblies, assemblies and/or the finished product. It is the process of
exercising software with the intent of ensuring that the software system meets its requirements
and user expectations and does not fail in an unacceptable manner. There are various types
of tests, and each test type addresses a specific testing requirement.

10.1 TYPES OF TESTS

Unit testing
Unit testing involves the design of test cases that validate that the internal
program logic is functioning properly and that program inputs produce valid outputs. All
decision branches and internal code flow should be validated. It is the testing of individual
software units of the application, and it is done after the completion of an individual unit,
before integration. This is structural testing that relies on knowledge of the unit's
construction and is invasive. Unit tests perform basic tests at the component level and test
a specific business process, application, and/or system configuration. Unit tests ensure that
each unique path of a business process performs accurately to the documented specifications
and contains clearly defined inputs and expected results.
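
As an illustration, a unit test for the registration view shown in the source code chapter
might look like the following; this is a hedged sketch assuming Django's built-in test
framework, and the URL route is a hypothetical assumption.

# Hypothetical unit test sketch (assumes Django's test client and that the
# registration view is routed at /UserRegisterActions/).
from django.test import TestCase

class UserRegisterActionsTests(TestCase):
    def test_get_renders_registration_form(self):
        # A GET request should render the empty registration form.
        response = self.client.get('/UserRegisterActions/')
        self.assertEqual(response.status_code, 200)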

Integration testing
Integration tests are designed to test integrated software components to
determine if they actually run as one program. Testing is event driven and is more concerned
with the basic outcome of screens or fields. Integration tests demonstrate that although the
components were individually satisfactory, as shown by successful unit testing, the
combination of components is correct and consistent. Integration testing is specifically aimed
at exposing the problems that arise from the combination of components.

Functional testing
Functional testing provide systematic demonstrations that functions tested are
available as specified by the business and technical requirements, system documentation, and
user manuals.
Functional testing is centered on the following items:


Valid Input : identified classes of valid input must be accepted.

Invalid Input : identified classes of invalid input must be rejected.

Functions : identified functions must be exercised.

Output : identified classes of application outputs must be exercised.

Systems/Procedures : interfacing systems or procedures must be invoked.

Organization and preparation of functional tests are focused on requirements, key
functions, or special test cases. In addition, systematic coverage pertaining to identifying
business process flows, data fields, predefined processes, and successive processes must be
considered for testing. Before functional testing is complete, additional tests are identified
and the effective value of current tests is determined.

System Testing
System testing ensures that the entire integrated software system meets
requirements. It tests a configuration to ensure known and predictable results. An example of
system testing is the configuration oriented system integration test. System testing is based on
process descriptions and flows, emphasizing pre-driven process links and integration points.

White Box Testing


White box testing is testing in which the software tester has knowledge of the inner
workings, structure and language of the software, or at least its purpose. It is used to test
areas that cannot be reached from a black box level.

Black Box Testing


Black box testing is testing the software without any knowledge of the inner workings,
structure or language of the module being tested. Black box tests, like most other kinds of
tests, must be written from a definitive source document, such as a specification or
requirements document. It is testing in which the software under test is treated as a black
box: you cannot "see" into it. The test provides inputs and responds to outputs without
considering how the software works.
Unit Testing

Unit testing is usually conducted as part of a combined code and unit test phase
of the software lifecycle, although it is not uncommon for coding and unit testing to be
conducted as two distinct phases.


Test strategy and approach

Field testing will be performed manually and functional tests will be written in detail.

Test objectives

• All field entries must work properly.


• Pages must be activated from the identified link.
• The entry screen, messages and responses must not be delayed.

Features to be tested

• Verify that the entries are of the correct format


• No duplicate entries should be allowed
• All links should take the user to the correct page.

Integration Testing
Software integration testing is the incremental integration testing of two or more
integrated software components on a single platform to produce failures caused by interface
defects.

The task of the integration test is to check that components or software applications, e.g.
components in a software system or – one step up – software applications at the company level
– interact without error.

Test Results: All the test cases mentioned above passed successfully. No defects encountered.

Acceptance Testing
User Acceptance Testing is a critical phase of any project and requires significant participation
by the end user. It also ensures that the system meets the functional requirements.

Test Results: All the test cases mentioned above passed successfully. No defects encountered.


10.2 TEST CASES

S.No | Test Case | Expected Result | Result | Remarks (if fails)
-----|-----------|-----------------|--------|-------------------
1 | User Register | User registration completes successfully. | Pass | Fails if the user email already exists.
2 | User Login | If the username and password are correct, the user reaches the valid page. | Pass | Unregistered users will not be logged in.
3 | Random forest and SVM | The request is accepted by the random forest and SVM. | Pass | Otherwise it fails.
4 | Decision tree and multilayer perceptron | The request is accepted by the decision tree and multilayer perceptron. | Pass | Otherwise it fails.
5 | Naive Bayes and k-nearest neighbour | The request is accepted by the naive Bayes and k-nearest neighbour. | Pass | Otherwise it fails.
6 | View dataset by user | The dataset is displayed to the user. | Pass | Fails if the results are not true.
7 | User classification | Reviews are displayed with true results. | Pass | Fails if the results are not true.
8 | Calculate accuracy, macro avg and weighted avg | The macro avg and weighted avg are calculated. | Pass | Fails if the macro avg and weighted avg are not displayed.
9 | Admin login | The admin can log in with his login credentials; on success he reaches his home page. | Pass | Invalid login details are not allowed.
10 | Admin can activate the registered users | The admin can activate the registered user id. | Pass | If the user id is not found, the user won't log in.

CHAPTER - 11
OUTPUT SCREENS

OUTPUT SCREEN 11.1 Home page

OUTPUT SCREEN 11.2 USER REGISTRATION FORM


OUTPUT SCREEN 11.3 User Login Form

OUTPUT SCREEN 11.4 Admin Login Form


OUTPUT SCREEN 11.5 User Details

OUTPUT SCREEN 11.6 Image Caption Details


OUTPUT SCREEN 11.7 Data set View

OUTPUT SCREEN 11.8 Prediction Results

CHAPTER - 12
CONCLUSION

In this work, a hybrid encoder-decoder based model is proposed to generate effective captions
for images using the Flickr8k dataset. During the encoding phase, the proposed model uses
transfer learning-based models, VGG16 and ResNet50, together with YOLO, for extracting
the image features. A concatenate function is used to combine the features and remove
duplicates. For decoding, BiGRU and LSTM are used to obtain the complete caption of the
image. Further, the BLEU value is evaluated for the captions generated by both BiGRU and
LSTM, and the final caption is the one whose METEOR value is higher. The proposed model
is also evaluated using METEOR and ROUGE, achieving BLEU-1: 0.67, METEOR: 0.54 and
ROUGE: 0.31 on the Flickr8k dataset. The experimental results show better BLEU, METEOR
and ROUGE scores when compared to other state-of-the-art models. The model is also helpful
in generating captions in real time.
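
To illustrate how such a score is computed, the following is a minimal sketch of a BLEU-1
evaluation, assuming NLTK is installed; the reference and candidate captions are made-up
examples, not outputs of this model.

# Minimal BLEU-1 scoring sketch (assumes NLTK; captions are made-up).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

references = [["two", "dogs", "are", "playing", "with", "a", "toy"],
              ["two", "dogs", "run", "through", "the", "water"]]
candidate = ["two", "dogs", "are", "running", "with", "a", "toy"]

# weights=(1, 0, 0, 0) restricts scoring to unigram precision, i.e. BLEU-1.
score = sentence_bleu(references, candidate, weights=(1, 0, 0, 0),
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU-1: {score:.2f}")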

CHAPTER - 13
FUTURE ENHANCEMENTS

Potential future enhancements for image caption generation using efficient deep learning
based hybrid models include:

1. Improved Attention Mechanisms: Enhance the attention mechanisms within the model to
better focus on relevant regions of the image when generating captions. Exploring variants
of attention, such as self-attention or multi-head attention, could lead to more accurate and
contextually relevant captions.

2. Semantic Understanding: Incorporate techniques from the field of visual semantic
understanding, allowing the model to better grasp the relationships between objects, actions
and scenes in the image. This could lead to captions that are not only descriptive but also
capture the underlying semantics.

CHAPTER - 14
REFERENCES

[1] J. Gu, G. Wang, J. Cai, and T. Chen, “An Empirical Study of Language CNN for Image
Captioning,” Proc. IEEE Int. Conf. Comput. Vis., vol. 2017-October, pp. 1231–1240, 2017,
doi: 10.1109/ICCV.2017.138.
[2] J. Aneja, A. Deshpande, and A. G. Schwing, “Convolutional Image Captioning,” Proc.
IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 5561–5570, 2018, doi:
10.1109/CVPR.2018.00583.
[3] K. Xu et al., “Show, Attend and Tell: Neural Image Caption Generation with Visual
Attention.” Available: http://proceedings.mlr.press/v37/xuc15.
[4] H. Wang, P. Tang, and K. Xu, “Image captioning with deep LSTM based on sequential
residual,” pp. 361–366, Jul. 2017.
[5] S. Liu, L. Bai, Y. Hu, and H. Wang, “Image Captioning Based on Deep Neural Networks,”
MATEC Web Conf., vol. 232, pp. 1–7, 2018, doi: 10.1051/matecconf/201823201052.
[6] R. Subash, R. Jebakumar, Y. Kamdar, and N. Bhatt, “Automatic image captioning using
convolution neural networks and LSTM,” J. Phys. Conf. Ser., vol. 1362, no. 1, 2019, doi:
10.1088/1742-6596/1362/1/012096.
[7] C. Wang, H. Yang, and C. Meinel, “Image Captioning with Deep Bidirectional LSTMs and
Multi-Task Learning,” ACM Trans. Multimed. Comput. Commun. Appl., vol. 14, no. 2s, 2018,
doi: 10.1145/3115432.
[8] M. Han, W. Chen, and A. D. Moges, “Fast image captioning using LSTM,” Cluster
Comput., vol. 22, pp. 6143–6155, May 2019, doi: 10.1007/s10586-018-1885-9.
[9] H. Dong, J. Zhang, D. Mcilwraith, and Y. Guo, “I2T2I: Learning Text To Image Synthesis
With Textual Data Augmentation.”
[10] Y. Xian and Y. Tian, “Self-Guiding Multimodal LSTM - When We Do Not Have a Perfect
Training Dataset for Image Captioning,” IEEE Trans. Image Process., vol. 28, no. 11, pp. 5241–
5252, 2019, doi: 10.1109/TIP.2019.2917229.
