A Project Report: in Partial Fulfillment of The Requirement For The Award of The Degree of

The document discusses bag-of-discriminative-words (BODW) representation via topic modelling. It proposes a model called discriminatively objective-subjective LDA (dosLDA) that uses objective and subjective selection variables to encode the interplay between topics and words' discriminative power in documents. dosLDA represents each document as a BODW to discern each word's objective or subjective sense with respect to its topic. The experiments show dosLDA performs competitively in topic modeling and document classification while also identifying words' discriminative power.

Uploaded by

Venkataramanareddy Sura


BAG-OF-DISCRIMINATIVE-WORDS (BODW) REPRESENTATION VIA TOPIC MODELLING

A Project Report
In Partial Fulfillment of the requirement for the award of the Degree of
MASTER OF COMPUTER APPLICATIONS

Submitted By
PANDA ACHUTA
Regd.No: Y19MC20028

Under the Guidance of

Dr. G. Neelima
Assistant Professor
INTERNAL GUIDE

Shri. Dr. K. Gangadhara Rao (M.Tech., Ph.D.)
Sr. Associate Professor
HEAD OF THE DEPARTMENT

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

ACHARYA NAGARJUNA UNIVERSITY
NAGARJUNA NAGAR, GUNTUR-522510
2018 – 2021
ACHARYA NAGARJUNA UNIVERSITY
Department of Computer Science & Engineering

CERTIFICATE
This is to certify that this project entitled “BAG-OF-DISCRIMINATIVE-WORDS (BODW)
REPRESENTATION VIA TOPIC MODELLING” is a bona-fide record of the project work done and
submitted by PANDA ACHUTA Reg.No:Y19MC20028 during the year 2018-2021 in partial
fulfillment of the requirements for the award of degree of Master of Computer Applications
in the department of Computer Science & Engineering.

I certify that he carried out this project as an independent project under my guidance.

Head of the Department Project Guide

External Examiner
PROJECT COMPLETION CERTIFICATE

This is to certify that Mr. PANDA ACHUTA, bearing Regd. No. Y19MC20028,
of ACHARYA NAGARJUNA UNIVERSITY, GUNTUR, has successfully completed the
project titled “BAG-OF-DISCRIMINATIVE-WORDS (BODW) REPRESENTATION VIA
TOPIC MODELLING” in our organization.

He carried out the project using Python technologies during the period
May 2021 - July 2021 under the guidance and supervision of Mr. V. Ashok
(Project Guide) for “SEEBACK SOFTWARE SYSTEMS”.

He completed the assigned project well within the time frame.
He is sincere and hardworking, and his conduct during the period was commendable.
We wish him all the best in his endeavors.

FOR RSPS PVT LTD

V.Ashok

(Project Guide)

# 17-92, Road No. 8, Film Nagar, Jubilee Hills, Hyderabad - 500 033. Ph : +91 - 40 - 64621004
DECLARATION
I hereby declare that the results embodied in this dissertation entitled
“BAG-OF-DISCRIMINATIVE-WORDS (BODW) REPRESENTATION VIA TOPIC
MODELLING” were obtained by me during the period from MAY 2021 to JULY
2021 in partial fulfillment of the Degree of Master of Computer Applications
from Acharya Nagarjuna University, and that I have not submitted the same to any
other University/Institute for the award of any other degree.

PANDA ACHUTA
Regd.NO:Y19MC20028
ACKNOWLEDGEMENT

A successful task makes everyone happy. But that happiness is like gold
without glitter unless we acknowledge the people who supported us in making it
a success. Credit goes to those who made the project a reality, and first of all
to those whose constant guidance and encouragement made it possible. This
acknowledgement goes beyond mere formality: I would like to express my deep
gratitude and respect to all the people behind the scenes who guided, inspired,
and helped me complete this project. I consider myself lucky to have worked on
such a good project, which will be an asset to my academic profile.

I am extremely grateful to Dr. K. GANGADHARA RAO, M.Tech., Ph.D.,
Head of the DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING,
for his able guidance and useful suggestions. I sincerely thank him for his
motivation, encouragement and moral support throughout the course of the project
work.

I express my profound gratitude to my guide, Dr. G. NEELIMA,
Assistant Professor, for her support and guidance towards the successful completion
of this project.

I express my thanks to all the Teaching and Non-Teaching staff members
in the DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING,
ACHARYA NAGARJUNA UNIVERSITY COLLEGE OF SCIENCES,
GUNTUR, for their valuable support throughout this project.

Finally, I would like to thank my parents and friends for their
cooperation in completing this project.

PANDA ACHUTA
(Regd.No:Y19MC20028)
BAG-OF-DISCRIMINATIVE-WORDS (BODW)
REPRESENTATION VIA TOPIC MODELLING
ABSTRACT:
Many of the words in a given document either deliver facts (objective) or
express opinions (subjective), depending on the topics they are
involved in. For example, given a bunch of documents, the word “bug” assigned to
the topic “order Hemiptera” apparently refers to an object (i.e., one kind of
insect), while the same word assigned to the topic “software” probably conveys a
negative opinion. Motivated by the intuitive assumption that different words have
varying degrees of discriminative power in delivering the objective sense or the
subjective sense with respect to their assigned topics, a model named
discriminatively objective-subjective LDA (dosLDA) is proposed in this paper.
The essential idea underlying the proposed dosLDA is that a pair of objective and
subjective selection variables is explicitly employed to encode the interplay
between topics and discriminative power for the words in documents in a
supervised manner. As a result, each document is appropriately represented as a
“bag-of-discriminative-words” (BoDW). The experiments reported on documents
and images demonstrate that dosLDA not only performs competitively against
traditional approaches in terms of topic modeling and document classification, but
also has the ability to discern the discriminative power of each word in terms of its
objective or subjective sense with respect to its assigned topic.
INTRODUCTION
A vast amount of multimedia data, such as news articles and
images, can be easily obtained from the Internet,
which in turn poses the great challenge of automatically
clustering, analyzing, and summarizing the data. So far, plenty of
machine learning algorithms have been employed to address the
challenge. Among them, topic models, which are able to discover the
latent structures (i.e., the topics) and provide low-dimensional
representation in terms of the learned topics, have attracted great
attention in recent decades. Topic models adopt the “Bag-of-Words”
(BoW) representation, where each sample in the given data is
represented as an orderless collection of different elements. Therefore,
topic models are not only capable of analyzing texts, but can also work
with data of any modality that can be represented as “documents” and
“words” (such as images represented by a bag of visual words).
The two most successful and representative works in topic
modeling are probabilistic latent semantic analysis (pLSA) and latent
Dirichlet allocation (LDA). As the first topic model, pLSA evolves from
latent semantic analysis (LSA) and is able to capture the hidden
semantics conveyed by different words via a probabilistic generative
process of the documents. In pLSA, documents are projected into a low-
dimensional topic space by assigning each word with a latent topic,
where each topic is usually represented as a multinomial distribution
over a fixed vocabulary. The LDA model inherits the notion of pLSA,
but it employs an extra generative process on the topic proportion of
each document and models the whole corpus via a hierarchical Bayesian
framework. In fact, pLSA turns out to be a special case of LDA with a
uniform Dirichlet prior under maximum a posteriori estimation, while LDA
is better able to model large-scale document collections owing to its
well-defined prior. In the past decade, the LDA model has been intensively
studied and widely applied for many different tasks.
The BoW representation disregards the linguistic structures
between the words. Learned in such an unsupervised manner, the
representations of documents provided by LDA are often found to be not
strongly predictive. From a pure viewpoint of prediction, unsupervised
LDA unfortunately ignores the nature of the discriminative task of
interest, such as classification, and thus provides no guarantee that the
extracted information will be effectual. To alleviate this limitation,
many approaches attempt to exploit useful auxiliary information
(e.g., the category labels or the ratings provided by the authors) when
modeling the corresponding documents in a supervised manner. In
such variants of LDA, the auxiliary information is usually treated as
a response variable predicted from the latent representation of the
document (i.e., the proportion of topics), where the assignments of
topics to each word take effect instead of the words themselves. In other
words, the “Bag-of-Topics” (BoT) representation has taken the place of the
traditional BoW representation to better characterize massive documents
in predictive tasks such as regression and classification. The most
representative models proposed under the notion of BoT are
supervised LDA (sLDA), the scene-understanding model, multi-class
sLDA, and τLDA.
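The difference between the BoW and BoT representations can be illustrated with a minimal Python sketch. The toy tokens and their topic labels below are assumptions made up for illustration; in a real topic model the topic assignments are inferred rather than given.

```python
from collections import Counter

# Hypothetical toy document: each token is paired with a hand-assigned topic.
# In a real topic model the topic assignments would be inferred, not given.
tokens = [
    ("bug", "software"),
    ("crash", "software"),
    ("bug", "insects"),
    ("wing", "insects"),
    ("bug", "software"),
]

def bag_of_words(tagged_tokens):
    """Count how often each word occurs, ignoring order and topics."""
    return Counter(word for word, _ in tagged_tokens)

def bag_of_topics(tagged_tokens):
    """Count how often each topic is assigned, ignoring the words themselves."""
    return Counter(topic for _, topic in tagged_tokens)

def topic_proportions(tagged_tokens):
    """Normalize topic counts into the document's topic proportions."""
    counts = bag_of_topics(tagged_tokens)
    total = sum(counts.values())
    return {topic: n / total for topic, n in counts.items()}

bow = bag_of_words(tokens)
bot = bag_of_topics(tokens)
proportions = topic_proportions(tokens)
```

Note that in the BoT view the three occurrences of “bug” are no longer distinguishable from the other words that share their topics; only the topic counts survive.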
In the BoT representation, any two different words drawn from the
vocabulary are treated equally if they are assigned the same topic. In
reality, however, it is intuitive that many of the words in a given
document either deliver facts (objective) or express opinions (subjective),
depending on the topics they are involved in. For example, given a bunch
of documents, the word “bug” assigned to the topic “order Hemiptera”
apparently refers to an object (one kind of insect), while the same word
assigned to the topic “software” probably conveys a negative opinion. In
this paper, we argue that deliberately identifying the objectively
or subjectively discriminative power of the words with respect to their
involved topics helps construct a more predictive representation for each
document.
As a result, this paper proposes an approach named
discriminatively objective-subjective LDA (dosLDA). The essential idea
underlying it is that a pair of objective and subjective selection variables
is explicitly employed to encode the interplay between topics and
discriminative power with respect to the words in a supervised manner.
dosLDA is able to naturally select
those words that are discriminative in delivering either an objective or a
subjective sense in a given document, and it generates the novel
“bag-of-discriminative-words” (BoDW) representations for each document. It
is demonstrated via several experiments that the proposed BoDW is
more predictive for discriminative tasks than the traditional BoW and
BoT representations employed in current methods.
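As a rough illustration of what a BoDW representation might look like, the Python sketch below builds one objective and one subjective bag from tokens that carry two binary flags. The tokens, topics, and flag values are hypothetical and set by hand; in dosLDA the selection variables would be inferred in a supervised manner, which this sketch does not attempt.

```python
from collections import Counter

# Hypothetical per-token annotations: (word, topic, objective_flag, subjective_flag).
# The two flags stand in for the objective and subjective selection variables;
# here they are set by hand, whereas dosLDA would infer them from labeled data.
tokens = [
    ("bug", "insects", 1, 0),
    ("wing", "insects", 1, 0),
    ("the", "insects", 0, 0),
    ("bug", "software", 0, 1),
    ("crash", "software", 0, 1),
    ("release", "software", 1, 0),
]

def bag_of_discriminative_words(annotated, sense):
    """Count (topic, word) pairs whose flag for the given sense is switched on."""
    index = {"objective": 2, "subjective": 3}[sense]
    return Counter((t[1], t[0]) for t in annotated if t[index] == 1)

objective_bodw = bag_of_discriminative_words(tokens, "objective")
subjective_bodw = bag_of_discriminative_words(tokens, "subjective")
```

Here “bug” lands in the objective bag under the insects topic and in the subjective bag under the software topic, while a non-discriminative word such as “the” appears in neither.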
PYTHON

Python is a general-purpose, interpreted, interactive, object-oriented,
high-level programming language. Python has a design
philosophy that emphasizes code readability (notably
using whitespace indentation to delimit code blocks rather than curly brackets or
keywords), and a syntax that allows programmers to express concepts in
fewer lines of code than might be used in languages such as C++ or Java. It
provides constructs that enable clear programming on both small and large
scales. Python interpreters are available for many operating systems. CPython,
the reference implementation of Python, is open source software and has a
community-based development model, as do nearly all of its variant
implementations. CPython is managed by the non-profit Python Software
Foundation. Python features a dynamic type system and automatic memory
management. It supports multiple programming paradigms, including
object-oriented, imperative, functional and procedural, and has a large and
comprehensive standard library.

DJANGO

Django is a high-level Python Web framework that encourages rapid
development and clean, pragmatic design. Built by experienced developers, it
takes care of much of the hassle of Web development, so you can focus on
writing your app without needing to reinvent the wheel. It’s free and open
source.

Django's primary goal is to ease the creation of complex, database-driven
websites. Django emphasizes reusability and "pluggability" of components, rapid
development, and the principle of don't repeat yourself. Python is used
throughout, even for settings files and data models.

Django also provides an optional administrative create, read, update and
delete interface that is generated dynamically through introspection and
configured via admin models.
LDA ALGORITHM
In natural language processing, Latent Dirichlet allocation (LDA) is
a generative statistical model that allows sets of observations to be explained
by unobserved groups that explain why some parts of the data are similar. For
example, if observations are words collected into documents, it posits that each
document is a mixture of a small number of topics and that each word's creation is
attributable to one of the document's topics.

In LDA, each document may be viewed as a mixture of various topics, where
each document is considered to have a set of topics that are assigned to it via LDA.
This is identical to probabilistic latent semantic analysis (pLSA), except that in
LDA the topic distribution is assumed to have a sparse Dirichlet prior. The sparse
Dirichlet priors encode the intuition that documents cover only a small set of topics
and that topics use only a small set of words frequently. In practice, this results in a
better disambiguation of words and a more precise assignment of documents to
topics. LDA is a generalization of the pLSA model, which is equivalent to LDA
under a uniform Dirichlet prior distribution.

With plate notation, the dependencies among the many variables can be
captured concisely. The boxes are "plates" representing replicates. The outer plate
represents documents, while the inner plate represents the repeated choice of topics
and words within a document. M denotes the number of documents and N the number
of words in a document. The words are the only observable variables; the other
variables are latent. As proposed in the original paper, a sparse Dirichlet
prior can be put over the topic-word distribution. This encodes the intuition that the
probability mass of each topic is concentrated on a small set of words. The resulting model is the
most widely applied variant of LDA today. The plate notation for this model is
shown on the right, where K denotes the number of topics and φ_1, ..., φ_K are V-dimensional
vectors storing the parameters of the Dirichlet-distributed topic-word distributions
(V is the number of words in the vocabulary).
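The generative story described above can be sketched in a few lines of standard-library Python. This is a minimal illustration only: the vocabulary, the symmetric hyperparameter values `alpha` and `beta`, and the fixed document length are assumptions chosen for the example, and the sketch generates synthetic documents rather than fitting a model to data.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one sample from a Dirichlet distribution via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_categorical(probs, rng):
    """Return an index drawn according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_corpus(M, N, K, vocab, alpha=0.5, beta=0.5, seed=0):
    """Generate M documents of N words each from K topics (LDA generative story)."""
    rng = random.Random(seed)
    V = len(vocab)
    # One V-dimensional topic-word distribution per topic, drawn from a Dirichlet prior.
    phi = [sample_dirichlet([beta] * V, rng) for _ in range(K)]
    corpus = []
    for _ in range(M):
        # Per-document topic proportions, drawn from a Dirichlet prior.
        theta = sample_dirichlet([alpha] * K, rng)
        doc = []
        for _ in range(N):
            z = sample_categorical(theta, rng)   # latent topic of this word
            w = sample_categorical(phi[z], rng)  # observed word
            doc.append(vocab[w])
        corpus.append(doc)
    return corpus, phi

vocab = ["bug", "wing", "insect", "crash", "code", "release"]
corpus, phi = generate_corpus(M=3, N=8, K=2, vocab=vocab)
```

Hyperparameter values below 1 make the sampled distributions sparse, which matches the intuition that a document covers few topics and a topic favors few words. Only the words in `corpus` would be observed in practice; `theta`, `z`, and `phi` are the latent quantities that inference has to recover.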
CONCLUSION
In this paper, a supervised topic model named dosLDA is
proposed to discover the words having discriminative power to deliver
either an objective or a subjective sense with regard to their assigned
topics. The dosLDA model is able to obtain the BoDW representations
for documents, and each document is endowed with two different
BoDW representations in terms of objective and subjective senses,
respectively. The results obtained from several experiments suggest that:
(1) the BoDW representation is more predictive than the traditional BoT
representation for discriminative tasks; (2) dosLDA boosts the
performance of topic modeling via the joint discovery of latent semantic
structure of the whole dataset and the different objective and subjective
discrimination among the words; (3) dosLDA has lower computational
complexity than sLDA, especially under an increasing number of topics;
(4) the detected discriminative words or visual words are useful in topic
demonstration as well as objective and sentimental region localization.
ARCHITECTURE:

MODULES:
The project is divided into three modules, listed below:
• Document Analysis
• Image Analysis
• Graphical Representation
The project is implemented through these three modules, which together
produce the bag-of-discriminative-words representation.

MODULE DESCRIPTION:
The modules are implemented as follows.

• Document Analysis
Users upload a document. The uploaded document is analyzed and
its words are highlighted: every positive word is shown in green
and every negative word in red. The analysis of the document can
be viewed as a pie chart, which plots the document's total, neutral,
positive, and negative word counts.
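A minimal sketch of the word labelling and counting that such a module needs is given below. It is not the project's actual code: the two word lists are hypothetical stand-ins for whatever lexicon or model the system uses, and the HTML helper omits escaping and drops punctuation for brevity.

```python
import re
from collections import Counter

# Hypothetical mini-lexicons; a real system would load a much larger word list.
POSITIVE_WORDS = {"good", "great", "useful", "excellent"}
NEGATIVE_WORDS = {"bad", "bug", "slow", "poor"}

def classify_word(word):
    """Label a single word as positive, negative, or neutral."""
    lowered = word.lower()
    if lowered in POSITIVE_WORDS:
        return "positive"
    if lowered in NEGATIVE_WORDS:
        return "negative"
    return "neutral"

def analyze_document(text):
    """Return per-word labels and the counts needed for a pie chart."""
    words = re.findall(r"[A-Za-z']+", text)
    labels = [(w, classify_word(w)) for w in words]
    counts = Counter(label for _, label in labels)
    summary = {
        "total": len(words),
        "positive": counts["positive"],
        "negative": counts["negative"],
        "neutral": counts["neutral"],
    }
    return labels, summary

def highlight_html(labels):
    """Wrap positive words in green and negative words in red spans."""
    colors = {"positive": "green", "negative": "red"}
    parts = []
    for word, label in labels:
        if label in colors:
            parts.append(f'<span style="color:{colors[label]}">{word}</span>')
        else:
            parts.append(word)
    return " ".join(parts)

labels, summary = analyze_document("The release is great but the bug is bad")
html = highlight_html(labels)
```

The `summary` dictionary holds the four values a pie chart would plot, and `html` is the highlighted text that a template could render.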

• Graphical Representation
Both the admin and the user can view their respective analyses. Graphs
are plotted from factors such as the total number of words and the counts of
positive and negative words. A user gets a line chart and a bar chart for individual
documents. Only the admin gets the image analysis, shown as a doughnut chart.

EXISTING SYSTEM:

The two most successful and representative works in topic modeling are
probabilistic latent semantic analysis (pLSA) and latent Dirichlet allocation
(LDA). As the first topic model, pLSA evolves from latent semantic analysis
(LSA) and is able to capture the hidden semantics conveyed by different words via
a probabilistic generative process of the documents. In pLSA, documents are
projected into a low-dimensional topic space by assigning each word with a latent
topic, where each topic is usually represented as a multinomial distribution over a
fixed vocabulary. The LDA model inherits the notion of pLSA, but it employs an
extra generative process on the topic proportion of each document and models the
whole corpus via a hierarchical Bayesian framework. In fact, pLSA turns out to be
a special case of LDA with a uniform Dirichlet prior under maximum a posteriori
estimation, while LDA is better able to model large-scale document collections owing
to its well-defined prior. In the past decade, the LDA model has been intensively
studied and widely applied for many different tasks.
The BoW representation disregards the linguistic structures between the
words. Learned in such an unsupervised manner, the representations of documents
provided by LDA are often found to be not strongly predictive. From a pure
viewpoint of prediction, unsupervised LDA unfortunately ignores the nature of the
discriminative task of interest, such as classification, and thus provides no guarantee
that the extracted information will be effectual. To alleviate this limitation, many
approaches attempt to exploit useful auxiliary information (e.g., the category
labels or the ratings provided by the authors) when modeling the corresponding
documents in a supervised manner. In such variants of LDA, the auxiliary
information is usually treated as a response variable predicted from the
latent representation of the document (i.e., the proportion of topics), where the
assignments of topics to each word take effect instead of the words themselves. In
other words, the “Bag-of-Topics” (BoT) representation has taken the place of the
traditional BoW representation to better characterize massive documents in
predictive tasks such as regression and classification. The most representative
models proposed under the notion of BoT are supervised LDA (sLDA), the
scene-understanding model, multi-class sLDA, and τLDA.
DISADVANTAGES:

• From a pure viewpoint of prediction, unsupervised LDA unfortunately
ignores the nature of the discriminative task of interest such as classification,
thus provides no guarantee that the extracted information will be effectual.
• The assignments of topics to each word take effect instead of the words
themselves.
PROPOSED SYSTEM:

The proposed work is an approach named discriminatively objective-subjective
LDA (dosLDA). The essential idea underlying it is that a pair of
objective and subjective selection variables is explicitly employed to encode the
interplay between topics and discriminative power with respect to the words in a
supervised manner. dosLDA is able to naturally
select those words that are discriminative in delivering either an objective or
a subjective sense in a given document, and it generates the novel
“bag-of-discriminative-words” (BoDW) representations for each document, which is
illustrated in Figure. It is demonstrated via several experiments that the proposed
BoDW is more predictive for discriminative tasks than the traditional BoW and
BoT representations employed in current methods.

ADVANTAGES
• The bag of discriminative words is very effective when it comes to
analyzing the document or image itself.
• For images, the system collects comments from users in order to
involve them and obtain their views about an image; from these
comments, the sentiment of the image can be determined.

REQUIREMENT ANALYSIS

The project involved analyzing the design of a few applications so as to make
the application more user-friendly. To do so, it was important to keep the
navigation from one screen to the next well ordered while
reducing the amount of typing the user needs to do. In order to make the
application more accessible, the browser version had to be chosen so that it is
compatible with most browsers.

REQUIREMENT SPECIFICATION

Functional Requirements

▪ Graphical User interface with the User.

Software Requirements

For developing the application the following are the Software Requirements:

1. Python

2. Django

Operating Systems supported

1. Windows 7

2. Windows XP

3. Windows 8
Technologies and Languages used to Develop

1. Python

Debugger and Emulator

▪ Any Browser (Particularly Chrome)
Hardware Requirements

For developing the application the following are the Hardware Requirements:

▪ Processor: Pentium IV or higher

▪ RAM: 256 MB
▪ Space on Hard Disk: minimum 512MB
INPUT AND OUTPUT DESIGN

INPUT DESIGN
The input design is the link between the information system and the user. It
comprises developing specifications and procedures for data preparation, and the steps
necessary to put transaction data into a usable form for processing. This can be achieved by
having the computer read data from a written or printed document, or by having people
key the data directly into the system. The design of input focuses on controlling the amount of
input required, controlling errors, avoiding delay, avoiding extra steps, and keeping the
process simple. The input is designed so that it provides security and ease of use
while retaining privacy. Input design considers the following things:

➢ What data should be given as input?

➢ How the data should be arranged or coded?
➢ The dialog to guide the operating personnel in providing input.
➢ Methods for preparing input validations and steps to follow when errors occur.

OBJECTIVES

1. Input design is the process of converting a user-oriented description of the input
into a computer-based system. This design is important to avoid errors in the data input process
and to show the management the correct way to get correct information from the
computerized system.

2. It is achieved by creating user-friendly screens for data entry that can handle large
volumes of data. The goal of input design is to make data entry easier and free from
errors. The data entry screen is designed so that all data manipulations can be
performed. It also provides record viewing facilities.

3. When data is entered, it is checked for validity. Data can be entered with the
help of screens. Appropriate messages are provided as and when needed so that the user is
never left in a maze. Thus the objective of input design is to create an input layout that is easy to
follow.

OUTPUT DESIGN

A quality output is one that meets the requirements of the end user and presents
the information clearly. In any system, the results of processing are communicated to the users and to
other systems through outputs. Output design determines how the information is to be
displayed for immediate need, as well as the hard copy output. Output is the most important and direct
source of information for the user. Efficient and intelligent output design improves the system’s
relationship with the user and supports decision-making.

Designing computer output should proceed in an organized, well thought out
manner; the right output must be developed while ensuring that each output element is designed
so that people find the system easy to use effectively. When analysts design computer
output, they should:

1. Identify the specific output that is needed to meet the requirements.
2. Select methods for presenting information.
3. Create documents, reports, or other formats that contain information produced by the system.

The output form of an information system should accomplish one or more of the
following objectives.

• Convey information about past activities, current status, or projections of the
future.
• Signal important events, opportunities, problems, or warnings.
• Trigger an action.
• Confirm an action.
LITERATURE SURVEY

A literature survey is the most important step in the software development
process. Before developing the tool, it is necessary to determine the time factor,
economy, and company strength. Once these things are satisfied, the next step
is to determine which operating system and language can be used for
developing the tool. Once the programmers start building the tool, they
need a lot of external support. This support can be obtained from
senior programmers, from books, or from websites. The above considerations
were taken into account before building the proposed system.
MODULE DESCRIPTION:
The modules are implemented as follows.

• Image Analysis
Only the admin can upload a picture for analysis. Users can
view the picture, rate it according to their own perspective, and comment
on it. From these comments and ratings the admin can analyze the
sentiment of the image; the sentiment reported to the admin is based on the
comments given by users.
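One way such an aggregation could work is sketched below in Python. Everything specific here is an assumption for illustration: the word lists are hypothetical, the rating scale is taken to run from 1 to 5, and ratings and comments are simply added together, which may differ from how the project actually combines them.

```python
# Hypothetical mini-lexicons, as an assumption for illustration only.
POSITIVE_WORDS = {"beautiful", "nice", "love", "great"}
NEGATIVE_WORDS = {"ugly", "boring", "hate", "bad"}

def comment_score(comment):
    """Score one comment as (#positive words - #negative words)."""
    words = [w.strip(".,!?") for w in comment.lower().split()]
    positives = sum(1 for w in words if w in POSITIVE_WORDS)
    negatives = sum(1 for w in words if w in NEGATIVE_WORDS)
    return positives - negatives

def image_sentiment(feedback, max_rating=5):
    """Combine user ratings and comments into one sentiment label for an image.

    feedback is a list of (rating, comment) pairs; ratings run from 1 to max_rating.
    """
    if not feedback:
        return "neutral"
    midpoint = (1 + max_rating) / 2
    # Ratings above the midpoint count as positive evidence, below as negative.
    rating_signal = sum(rating - midpoint for rating, _ in feedback)
    comment_signal = sum(comment_score(comment) for _, comment in feedback)
    total = rating_signal + comment_signal
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"

feedback = [(5, "Beautiful picture, love it!"), (4, "Nice colours"), (2, "A bit boring")]
label = image_sentiment(feedback)
```

The resulting label, or the underlying positive and negative counts, is the kind of value the admin's chart could display.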
1. ORGANISATION PROFILE

Chennai Sunday Systems Pvt. Ltd has a rich background in software
development and continues to focus its entire attention on achieving
excellence in the development and maintenance of software projects and
products in many areas. Some of them are ERP, banking, manufacturing, and
insurance, with an emphasis on multimedia projects. These projects are
prevalent and have been distributed and implemented for clients the world over.
Its major software development park is at Kodambakkam, Chennai, India.

I. Genesis

To elucidate the origins of the company: Chennai Sunday Systems was
initially dot Com IT Jobs Consultancy.

After stabilizing the products, Mr. P. Siva kumar M.C.A started giving
counseling services in the name of skdot Com Agency and privately handled
several projects for leading companies like Alstom, MRL and Spencers. DSEL
has confronted challenges and rooted itself as a niche player in the
multimedia and business software segment. Its level of performance has
been exemplary, settling for nothing less than the best of benchmarks.

The indispensable factors that give DSEL a competitive advantage over
others in the market may be stated as:

• Performance
• Pioneering efforts

• Client satisfaction

• Innovative concepts

• Constant Evaluations

• Improvisation

• Cost Effectiveness

II. Company Mission Statement

“To help customers optimize their investments in information technology, to
help them gain a competitive edge in the market place.”

III. Quality policy

“ To help our stock holders by regularly reviewing and improving our process.”

IV. Infrastructure

Nested in an area with built-in area of 2,400 sq. ft. The park has encountered
itself with computing resources that include from IBM. Besides, it also houses
HP/9000, Sun Sparch, DEC Alpha System and over 500 IBM PS/VP nodes
over a Heterogeneous Fiber Optic Network. Operating system is used varied
from MVS and Aix through OS/400 and OS/2 to SOLARIS, UNIX and
Windows with range of RDBMS, Languages and Case tools.

Major Functions / Activities at Chennai Sunday Systems Pvt. Ltd

Only a few years ago, the World Wide Web was a very design-unfriendly place.
With the advent of images, web pages have become interactive, but this
interactivity is still limited. In its endeavor to make the Internet more
interactive and exciting, Chennai Sunday Systems Pvt. Ltd has set up the
Internet Team.
EXISTING SYSTEM:
The two most successful and representative works in topic modeling are
probabilistic latent semantic analysis (pLSA) and latent Dirichlet allocation
(LDA). As the first topic model, pLSA evolved from latent semantic analysis
(LSA) and captures the hidden semantics conveyed by different words via a
probabilistic generative process of the documents. In pLSA, documents are
projected into a low-dimensional topic space by assigning each word a latent
topic, where each topic is usually represented as a multinomial distribution over a
fixed vocabulary. The LDA model inherits this notion from pLSA, but it adds a
generative process for the topic proportions of each document and models the
whole corpus within a hierarchical Bayesian framework. In fact, pLSA turns out to be
a special case of LDA: it corresponds to maximum a posteriori estimation of LDA
under a uniform Dirichlet prior, while LDA is better suited to modeling large
document collections because of its well-defined prior. In the past decade, the
LDA model has been intensively studied and widely applied to many different tasks.
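The generative process described above can be sketched in a few lines of Python. This is only an illustrative toy, not the implementation used in this project: the vocabulary, the two topic distributions, the value of the Dirichlet hyperparameter and the helper names (sample_dirichlet, sample_categorical, generate_document) are all assumptions made for the example.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_categorical(probs, rng):
    """Draw an index according to the given probability vector."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_document(topics, alpha, length, rng):
    """Generate one document following the LDA generative story.

    topics: list of K word distributions over the same vocabulary
    alpha:  Dirichlet hyperparameter for the per-document topic proportions
    """
    theta = sample_dirichlet(alpha, rng)        # topic proportions of this document
    doc = []
    for _ in range(length):
        z = sample_categorical(theta, rng)      # latent topic assigned to this word
        w = sample_categorical(topics[z], rng)  # word index drawn from topic z
        doc.append((w, z))
    return theta, doc

# Hypothetical toy setup: 2 topics over a 4-word vocabulary.
vocab = ["goal", "match", "vote", "party"]
topics = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
rng = random.Random(0)
theta, doc = generate_document(topics, alpha=[0.5, 0.5], length=8, rng=rng)
print([(vocab[w], z) for w, z in doc])
```

Each word carries its own latent topic z, while the proportions theta are drawn once per document; that per-document Dirichlet draw is the extra layer LDA adds on top of pLSA.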
The bag-of-words (BoW) representation disregards the linguistic structure
between words. Learned in such an unsupervised manner, the document
representations provided by LDA are often not strongly predictive. From a pure
prediction viewpoint, unsupervised LDA ignores the nature of the
discriminative task of interest, such as classification, and thus provides no
guarantee that the extracted information will be useful. To alleviate this
limitation, many approaches exploit useful auxiliary information (e.g., category
labels or ratings provided by the authors) when modeling the corresponding
documents in a supervised manner. In such variants of LDA, the auxiliary
information is usually treated as a response variable predicted from the
latent representation of the document (i.e., the proportion of topics), so that the
assignments of topics to words take effect instead of the words themselves. In
other words, the “Bag-of-Topics” (BoT) representation has taken the place of the
traditional BoW representation to better characterize massive document
collections in predictive tasks such as regression and classification. The most
representative models proposed under the notion of BoT are supervised LDA
(sLDA), the scene-understanding model, multi-class sLDA, and τLDA.
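The difference between the two representations can be illustrated with a minimal sketch. The tokens and the per-word topic assignments below are made up for the example; in a real model the assignments would be inferred, not given by hand.

```python
from collections import Counter

def bag_of_words(tokens):
    """BoW: count how often each word occurs, ignoring word order."""
    return Counter(tokens)

def bag_of_topics(topic_assignments, num_topics):
    """BoT: proportion of words assigned to each topic in the document."""
    n = len(topic_assignments)
    if n == 0:
        return [0.0] * num_topics
    counts = Counter(topic_assignments)
    return [counts[k] / n for k in range(num_topics)]

# Hypothetical document with one latent topic per word.
tokens = ["goal", "match", "vote", "goal", "party"]
assignments = [0, 0, 1, 0, 1]

bow = bag_of_words(tokens)
bot = bag_of_topics(assignments, num_topics=2)
print(bow)
print(bot)
```

A supervised variant such as sLDA would feed a vector like bot, rather than the word counts in bow, to the predictor of the response variable.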
DISADVANTAGES:
• From a pure prediction viewpoint, unsupervised LDA ignores the nature of
the discriminative task of interest, such as classification, and thus provides
no guarantee that the extracted information will be useful.
• The assignments of topics to words take effect instead of the words
themselves.
PROPOSED SYSTEM:
The proposed work is an approach named discriminatively objective-subjective
LDA (dosLDA). The essential idea underlying it is that a pair of
objective and subjective selection variables is explicitly employed to encode the
interplay between topics and discriminative power with respect to the words in a
supervised manner. dosLDA naturally selects those words that are
discriminative in delivering either an objective or a subjective sense in a given
document, and generates the novel “bag-of-discriminative-words” (BoDW)
representation for each document, which is illustrated in Figure. Several
experiments demonstrate that the proposed BoDW is more predictive for
discriminative tasks than the traditional BoW and BoT representations employed
in current methods.
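To make the idea of a BoDW representation concrete, the following toy sketch keeps only the words that a selection variable marks as discriminative, split by objective and subjective sense. It is an assumption-laden illustration: in dosLDA the selection variables are latent and inferred by the model, whereas here they are supplied by hand, and the function name and example sentence are hypothetical.

```python
from collections import Counter

def bag_of_discriminative_words(tokens, selections):
    """Keep only tokens flagged as discriminative, grouped by sense.

    selections[i] is "objective", "subjective", or None (not selected).
    These flags are hand-written here; dosLDA would infer them.
    """
    bodw = {"objective": Counter(), "subjective": Counter()}
    for token, sense in zip(tokens, selections):
        if sense in bodw:
            bodw[sense][token] += 1
    return bodw

# Hypothetical review sentence with hand-assigned selection flags.
tokens = ["the", "camera", "is", "amazing", "and", "light"]
selections = [None, "objective", None, "subjective", None, "objective"]

bodw = bag_of_discriminative_words(tokens, selections)
print(bodw["objective"])
print(bodw["subjective"])
```

Unlike BoW, non-discriminative words such as "the" are dropped; unlike BoT, the words themselves are retained rather than only their topic assignments.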
ADVANTAGES:
• The bag of discriminative words is very effective when it comes to
analyzing the document or image itself.
• For images, the system collects comments from users in order to
involve them and obtain their views about an image; from these
comments, the sentiment of the image can be determined.
SYSTEM DESIGN

1. ARCHITECTURE DIAGRAM

2. COMPONENT DIAGRAM
a. User
b. Admin

3. ER DIAGRAM
a. User
b. Admin

4. USE CASE DIAGRAM
a. User
b. Admin

5. CLASS DIAGRAM

6. DATA FLOW DIAGRAM
a. User
b. Admin

7. ACTIVITY DIAGRAM
a. User
b. Admin

8. SEQUENCE DIAGRAM
a. User
b. Admin
SYSTEM SPECIFICATION:

HARDWARE REQUIREMENTS:

❖ System : Pentium IV 2.4 GHz.
❖ Hard Disk : 40 GB.
❖ Floppy Drive : 1.44 MB.
❖ Monitor : 14" Colour Monitor.
❖ Mouse : Optical Mouse.
❖ RAM : 512 MB.

SOFTWARE REQUIREMENTS:

❖ Operating System : Windows 7 Ultimate.
❖ Coding Language : Python.
❖ Front-End : Python.
❖ Designing : HTML, CSS, JavaScript.
❖ Database : MySQL (WAMP Server).

SYSTEM STUDY

FEASIBILITY STUDY

The feasibility of the project is analyzed in this phase, and a
business proposal is put forth with a very general plan for the project
and some cost estimates. During system analysis, the feasibility study
of the proposed system is carried out to ensure that the proposed
system is not a burden to the company. For feasibility analysis, some
understanding of the major requirements for the system is essential.
Three key considerations involved in the feasibility analysis are:

• ECONOMICAL FEASIBILITY
• TECHNICAL FEASIBILITY
• SOCIAL FEASIBILITY

ECONOMICAL FEASIBILITY

This study is carried out to check the economic impact that the system will
have on the organization. The amount of funds that the company can put into the
research and development of the system is limited, so expenditures must be
justified. The developed system is well within the budget, which was
achieved because most of the technologies used are freely available. Only the
customized products had to be purchased.
TECHNICAL FEASIBILITY

This study is carried out to check the technical feasibility, that
is, the technical requirements of the system. Any system developed
must not place a high demand on the available technical resources, as
this would lead to high demands being placed on the client. The
developed system must have modest requirements, as only minimal or no
changes are required for implementing this system.

SOCIAL FEASIBILITY

This aspect of the study checks the level of acceptance of the
system by the user. This includes the process of training the user to use
the system efficiently. The user must not feel threatened by the
system but must instead accept it as a necessity. The level of acceptance
by users depends solely on the methods employed to educate them about
the system and to make them familiar with it. Their level of confidence
must be raised so that they are also able to offer constructive
criticism, which is welcome, as they are the final users of the system.
SYSTEM TEST

The purpose of testing is to discover errors. Testing is the process of trying to discover every
conceivable fault or weakness in a work product. It provides a way to check the functionality of
components, sub-assemblies, assemblies and/or a finished product. It is the process of exercising
software with the intent of ensuring that the software system meets its requirements and user
expectations and does not fail in an unacceptable manner. There are various types of tests. Each
test type addresses a specific testing requirement.

TYPES OF TESTS
Unit testing
Unit testing involves the design of test cases that validate that the internal
program logic is functioning properly and that program inputs produce valid outputs. All
decision branches and internal code flow should be validated. It is the testing of individual
software units of the application, and it is done after the completion of an individual unit and
before integration. This is structural testing that relies on knowledge of the unit's construction
and is invasive. Unit tests perform basic tests at the component level and test a specific business
process, application, and/or system configuration. Unit tests ensure that each unique path of a
business process performs accurately to the documented specifications and contains clearly
defined inputs and expected results.

Integration testing
Integration tests are designed to test integrated software components to
determine whether they actually run as one program. Testing is event driven and is more concerned
with the basic outcome of screens or fields. Integration tests demonstrate that, although the
components were individually satisfactory, as shown by successful unit testing, the
combination of components is also correct and consistent. Integration testing is specifically
aimed at exposing the problems that arise from the combination of components.
Functional test
Functional tests provide systematic demonstrations that functions tested are
available as specified by the business and technical requirements, system documentation, and
user manuals.
Functional testing is centered on the following items:

Valid Input : identified classes of valid input must be accepted.

Invalid Input : identified classes of invalid input must be rejected.

Functions : identified functions must be exercised.

Output : identified classes of application outputs must be exercised.

Systems/Procedures : interfacing systems or procedures must be invoked.

Organization and preparation of functional tests is focused on requirements, key
functions, or special test cases. In addition, systematic coverage pertaining to identified business
process flows, data fields, predefined processes, and successive processes must be considered for
testing. Before functional testing is complete, additional tests are identified and the effective
value of current tests is determined.

System Test
System testing ensures that the entire integrated software system meets
requirements. It tests a configuration to ensure known and predictable results. An example of
system testing is the configuration-oriented system integration test. System testing is based on
process descriptions and flows, emphasizing pre-driven process links and integration points.

White Box Testing

White Box Testing is testing in which the software tester has
knowledge of the inner workings, structure and language of the software, or at least its purpose.
It is used to test areas that cannot be reached from a black box level.
Black Box Testing
Black Box Testing is testing the software without any knowledge of the inner
workings, structure or language of the module being tested. Black box tests, like most other kinds
of tests, must be written from a definitive source document, such as a specification or
requirements document. It is testing in which the software under test is treated as a black box:
you cannot “see” into it. The test provides inputs and responds to outputs without considering
how the software works.
Unit Testing

Unit testing is usually conducted as part of a combined code and unit test phase
of the software lifecycle, although it is not uncommon for coding and unit testing to be
conducted as two distinct phases.

Test strategy and approach

Field testing will be performed manually and functional tests will be written in
detail.
Test objectives
• All field entries must work properly.
• Pages must be activated from the identified link.
• The entry screen, messages and responses must not be delayed.

Features to be tested
• Verify that the entries are of the correct format
• No duplicate entries should be allowed
• All links should take the user to the correct page.
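The first two features above (correct entry format and no duplicate entries) could be checked with unit tests along the following lines. This is a hedged sketch only: the report does not show its field validators, so the e-mail field, the simplified pattern, and the names is_valid_email and EntryStore are hypothetical stand-ins.

```python
import re
import unittest

# Deliberately simple format check (an assumption, not a full e-mail validator).
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def is_valid_email(value):
    """Return True if the entry looks like name@domain.tld."""
    return bool(EMAIL_PATTERN.match(value))

class EntryStore:
    """Hypothetical in-memory store that rejects malformed and duplicate entries."""

    def __init__(self):
        self._entries = set()

    def add(self, email):
        if not is_valid_email(email):
            raise ValueError("entry is not in the correct format")
        key = email.lower()          # treat entries case-insensitively
        if key in self._entries:
            return False             # duplicate: not stored again
        self._entries.add(key)
        return True

class EntryTests(unittest.TestCase):
    def test_valid_format_is_accepted(self):
        self.assertTrue(is_valid_email("user@example.com"))

    def test_invalid_format_is_rejected(self):
        with self.assertRaises(ValueError):
            EntryStore().add("not-an-email")

    def test_duplicates_are_not_allowed(self):
        store = EntryStore()
        self.assertTrue(store.add("user@example.com"))
        self.assertFalse(store.add("USER@example.com"))

# Run the tests programmatically so the script does not exit afterwards.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(EntryTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Each test method covers one class of input, which mirrors the valid-input and invalid-input items listed under functional testing.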
Integration Testing
Software integration testing is the incremental integration testing of two or more
integrated software components on a single platform to produce failures caused by interface
defects.

The task of the integration test is to check that components or software applications, e.g.
components in a software system or – one step up – software applications at the company level –
interact without error.
Test Results: All the test cases mentioned above passed successfully. No defects encountered.

Acceptance Testing
User Acceptance Testing is a critical phase of any project and requires significant participation
by the end user. It also ensures that the system meets the functional requirements.

Test Results: All the test cases mentioned above passed successfully. No defects encountered.
