Implement Prediction Based Parallel Clustering Algorithm using Hybrid Optimization Technique
INTRODUCTION
This chapter outlines the research field explored in this doctoral study. In particular, it describes I-Commerce and recommendation systems, focusing on optimization techniques combined with parallel clustering algorithms. The main purposes and the organization of this document are also stated.
The rapid development of social networking has brought revolutionary changes to our daily lives and to global business, as recent research has noted. Recommendations play a vital role in the vast amount of information shared among internet users every day. Recommendations can take the form of reviews, surveys, ratings, spoken words, reference letters, reports from news media, travel guides, etc. From a large collection of available items, a recommender system (RS) can predict personalized items for users. The main goal of a recommender system is to select appropriate items based on user interests and requirements. Items can be classified as videos, books, office products, home appliances, mobile accessories, etc. The growth of e-commerce web portals, e.g. Amazon.com, Flipkart.com, Netflix.com and TripAdvisor.com, has created enormous interest in RS research aimed at providing effective recommendations to end-users. Recommendations are divided into two approaches. The first is content-based, where products are recommended to the user on the basis of user history and personal data. Currently Google uses similar techniques for recommendations. The second is collaborative filtering (CF), which takes advantage of the fact that people who had the same tastes in the past may also agree in their tastes in the future. Despite its success, the CF approach suffers from serious problems related to data sparsity that affect the accuracy of recommendations.
To enhance the quality of such RS, many researchers have implemented different techniques, but several problems remain. The first is the long tail problem, which concerns items with only a few ratings. The second is the cold-start problem, which has two variants: new-user, when an unknown user joins the system looking for recommendations, and new-item (also called the first-rater problem), in which the RS is unable to deliver new items without any rating to users. The new-item problem is a special case of the long tail problem. In recent years, research on the development of clustering techniques has attracted notable attention in numerous data mining applications such as text mining, pattern recognition, web analysis, information retrieval, visualization and image segmentation. In the user-based recommendation model, the clustering technique uses a similarity measure to cluster users based on the similar item ratings they have given. Items rated similarly within a cluster of users are recommended to a new user who has similar tastes to that user group. Many clustering techniques are used to produce personalized recommendations in the user-based model, such as fuzzy c-means, K-means, SOM, etc. With the development of real-time clustering-based recommender systems, information processing problems make the recommendation process more complex. Traditional clustering algorithms such as the nearest-neighbor K-means clustering algorithm have drawbacks in obtaining the best or optimal solutions for large-scale application problems. The primary objective is to propose a prediction-based parallel clustering algorithm using an optimization technique to identify users' needs, predict results as per the user's requirements and enhance the relevancy of results.
3 Electronic Commerce
4 Mobile Commerce
5 Internet of Things
6 Internet Commerce
7 Internet of Things Commerce
“Providing a software package to business owners which will help for a fast and affordable
way of selling the products and services online with no financial risk by Electronics or Mobile
means”.
It covers auctioning, collaborating with trading partners, making payments, buying and selling products, transferring funds, etc. The I-Commerce system can help us conduct research in various stages with respect to current requirements. As the architecture in figure 1.1 shows, large amounts of data can be processed using the I-Commerce architecture. The I-Commerce model was developed with the well-known slogan “By the people, for the people, to the people” in mind, to provide similar facilities to the consumer. I-Commerce relies on a recommendation system that helps the market grow and increases sellers' profit. The architecture of I-Commerce is divided into 4 parts: Consumer, Integration, Analytics and Data. It is driven by the consumer, with buying and selling of products taking place through the web, mobile, IoT and store channels. Various services are used to perform this operation, such as the Global shopping cart, the I-Commerce Front-End and the Services API. These are then integrated with payment gateways using real-time inventories and product catalogs. Once the process is completed, data is stored in the database, and analytical data is then generated from historical data and user reviews. This whole process describes the steps involved in the basic architecture of I-Commerce. The best example to validate the I-Commerce architecture stages is trivago.com, where the developer parses all the websites through the analysis phases and returns the most relevant hotel for consumers by identifying their tastes. Many other shopping websites use a similar concept to recommend and predict the most relevant products.
Figure 1.1 Architecture of I-Commerce
1) E-Commerce
2) M-Commerce
3) IoT Commerce
E-Commerce is the buying and selling of products through electronic means. Modern E-Commerce uses the World Wide Web (WWW) for transactions. E-Commerce transactions include buying electronic gadgets such as mobile phones, purchasing books (Amazon) and purchasing music (iTunes Store). It may involve online shopping through the internet or selling direct to customers via websites, and gathering data through websites and social media. Three main areas are involved in E-Commerce: electronic markets, online retailing and auctions.
M-Commerce hands the E-Commerce capabilities over into the customer's pocket. Many retailers, such as Amazon and Flipkart, have started using M-Commerce. It supports a wide range of services: mobile money transfer, where money can be transferred online (e.g. the Google Pay app); mobile ticketing, where tickets can be booked through a mobile phone (e.g. the UTS app); mobile vouchers, coupons and loyalty cards (e.g. PhonePe provides vouchers or coupons for online shopping or money transfers); content purchase and delivery, where ringtones, wallpapers and games can be bought through mobile phones; information services such as news, stock quotes and sports scores; and mobile marketing and advertising, where online consumers receive promotional messages, vouchers or coupons.
IoT Commerce is Internet of Things commerce, where automated buying and selling of products takes place via IoT. IoT is communication between any two sensors with the help of the internet. The world is currently moving towards IoT, and in future the buying and selling of products will take place automatically, according to users' needs, without the user being aware of it.
As shown in figure 1.3, these types are divided into 4 parts, i.e. Business to Business (B2B), Business to Consumer (B2C), Consumer to Business (C2B) and Consumer to Consumer (C2C).
Figure 1.3 Types of I-Commerce
B2B is a web-based, economical business model used for selling and buying products between two business users. Alibaba and Indiamart are famous examples of the B2B category.
B2C is a web-based business model used between business users and consumers. Target.com and Trivago are famous examples of the B2C type.
C2B is a consumer-to-business model, the mirror image of B2C. Here, consumers sell a product and business users buy it. Amazon.com is a good example of the C2B type. The last one is the C2C model, where buying and selling of products takes place between consumers. OLX.com is a famous example of the C2C model. Online buying and selling has many advantages and makes it easy to buy or sell products with trustworthy users. All these types apply across all types of I-Commerce and can be implemented in a real-time environment.
product. Organizations can also increase sales by applying various technologies such as digital marketing or recommendation techniques.
1.3.5 Rewards and Offers
Good service and offers are provided when buying a product online. Many commerce sites provide discounts, rewards and offers, which help customers buy products at a cheaper price.
As per the Internet World Stats figures, approximately 4380 million users are online, which means about 60% of the world population is using the internet. As the number of internet users across the globe increases, internet-based commerce is going to dominate traditional business.
As shown in figure 1.5, in 2018 a US I-Commerce company had total sales of $252.69 billion. Amazon had 49% of sales in the year 2018. These scenarios illustrate the growth of online commerce.
Figure 1.6 Retail Sales Worldwide [8]
Figure 1.6 shows the continuous growth of retail sales over the years. According to predictions by many researchers, I-Commerce will reach its peak by 2021, with the maximum amount of money invested in, spent on and earned from I-Commerce.
Collaborative filtering works on the assumption that users who agreed on buying a product in the past will agree in the future. Products are recommended to users on the basis of historical data shared between peer users and the ratings different users have given to products. Figure 1.7 demonstrates a simple example of collaborative filtering. The process involves asking a user to rate an item; later, when the user searches for a product, the system uses the past behavior of a few users to find correlations and similarities between different items and recommends the matching products to the user.
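The idea of predicting from the ratings of like-minded users can be sketched as follows. This is a minimal illustration only, not the method of this thesis: the toy `ratings` data, the choice of cosine similarity and the function names are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical toy ratings: user -> {item: rating on a 1-5 scale}.
ratings = {
    "alice": {"phone": 5, "case": 4, "charger": 1},
    "bob":   {"phone": 5, "case": 5, "headset": 4},
    "carol": {"phone": 1, "charger": 5, "headset": 2},
}

def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def predict(user, item):
    """Similarity-weighted average of the other users' ratings for `item`."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += sim
    return num / den if den else None

def recommend(user):
    """Items the user has not rated yet, ranked by predicted rating."""
    seen = ratings[user]
    candidates = {i for r in ratings.values() for i in r} - set(seen)
    scored = [(predict(user, i), i) for i in candidates]
    scored = [(s, i) for s, i in scored if s is not None]
    return [i for s, i in sorted(scored, reverse=True)]
```

Calling `recommend("alice")` ranks the one item Alice has not rated, using Bob's and Carol's ratings weighted by how similar each of them is to Alice.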
allows users to enter views and comments; this data can be used to recommend products to another user next time. Popular trends in opinion-based RS are sentiment analysis, text mining and information extraction. The main issue with this type is that the system is not capable of learning user preferences from users' actions. Here the system recommends the same products that users have already purchased. To solve this problem, many recommender systems use a hybrid RS. A hybrid RS combines two or more types of RS so that they share their properties. Generally, a content-based RS and a collaborative filtering system are combined, together with a few other approaches. This gives good results with high accuracy and relevancy. These methods are also used to solve common RS problems such as the cold-start problem and the sparsity problem. Netflix is the best example of a hybrid RS. The most important design parameters of a hybrid RS are combining the scores of different RS numerically, selecting the recommendation components, feature combination and augmentation. To recommend products according to users' priorities, an appropriate meta-level is selected to produce the final output.
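Combining the scores of different RS numerically can be illustrated with a weighted hybrid. The sketch below is an assumption-laden example: the function names, the `weight_cf` parameter and the fallback to the content-based score when no collaborative score exists (the cold-start case) are illustrative choices, not a description of any particular system.

```python
def hybrid_score(content_score, cf_score, weight_cf=0.5):
    """Weighted blend of a content-based score and a collaborative score.

    Falls back to the content-based score when the collaborative filter
    has no data for this user/item pair (cold start).
    """
    if cf_score is None:
        return content_score
    return weight_cf * cf_score + (1.0 - weight_cf) * content_score

def rank(items, weight_cf=0.5):
    """items: {name: (content_score, cf_score or None)} -> names, best first."""
    return sorted(items,
                  key=lambda name: hybrid_score(*items[name], weight_cf),
                  reverse=True)
```

With `weight_cf` close to 1 the ranking follows the collaborative filter; close to 0 it follows the content-based component.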
nest ants use a pheromone substance to keep track of the food source. The traveling salesman problem can be solved using Ant Colony Optimization (ACO).
Procedure ACO_MetaHeuristic
Begin
    while (not_termination)
        GenerateSolutions()
        DaemonActions()
        PheromoneUpdate()
    end while
End
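In edge selection, each ant moves from state x to state y. In each iteration, each ant k computes a set $A_k(x)$ of feasible expansions of its current state. For ant k, the probability of moving from state x to state y is $p_{xy}^{k}$. It generally depends on the attractiveness $\eta_{xy}$ of the move and on its trail level $\tau_{xy}$. Trails are updated when all ants have completed their solutions. The probability that the kth ant moves from state x to state y is

$$ p_{xy}^{k} = \frac{(\tau_{xy}^{\alpha})(\eta_{xy}^{\beta})}{\sum_{z \in \mathrm{allowed}_x} (\tau_{xz}^{\alpha})(\eta_{xz}^{\beta})} \qquad (1.1) $$

where $\tau_{xy}$ is the amount of pheromone deposited for the transition from state x to y, $\eta_{xy}$ is the attractiveness of that transition, α and β are parameters that control the influence of the trail level and the attractiveness respectively, $\mathrm{allowed}_x$ is the feasible set $A_k(x)$, and $\tau_{xz}$ and $\eta_{xz}$ are the trail level and attractiveness of all the possible transitions from x.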
Once all the ants have completed their solutions, the pheromone trails are updated as per equation 1.2. ρ is the pheromone evaporation coefficient and $\Delta\tau_{xy}^{k}$ is the amount of pheromone deposited by the kth ant. The same concept can be applied to solving the traveling salesman problem.
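Equation 1.2 is referenced above but is not reproduced in this text. As a hedged reconstruction, assuming the standard ant colony optimization update and using only the symbols defined here (ρ and the per-ant deposit), the pheromone update is usually written as follows, with the sum running over all ants k:

```latex
\tau_{xy} \leftarrow (1 - \rho)\,\tau_{xy} + \sum_{k} \Delta\tau_{xy}^{k} \qquad (1.2)
```

A common choice, again an assumption rather than something stated in this chapter, is to set the deposit of ant k on an edge it used to $Q / L_k$, where $L_k$ is the length of its tour and $Q$ is a constant, and to zero on edges it did not use.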
The steps involved in solving the traveling salesman problem are:
1. The first rule is to visit each city exactly once.
3. If more pheromone has been laid on the trail between any two cities, the probability of selecting that edge is higher.
4. Having completed its journey, the ant deposits more pheromone on all the edges it traversed if the journey is short.
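The edge-selection rule (1.1) and the steps above can be put together in a short sketch for the traveling salesman problem. This is an illustrative implementation under stated assumptions, not the thesis's code: attractiveness is taken as the inverse of the distance, each ant deposits the inverse of its tour length, and all parameter values and function names are chosen for the example.

```python
import random

def tour_length(tour, dist):
    """Length of the closed tour that returns to its starting city."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

def build_tour(dist, tau, alpha, beta, rng):
    """One ant visits every city exactly once, picking edges as in (1.1)."""
    n = len(dist)
    tour = [rng.randrange(n)]
    while len(tour) < n:
        x = tour[-1]
        allowed = [z for z in range(n) if z not in tour]
        # pheromone^alpha * attractiveness^beta, attractiveness = 1 / distance
        weights = [tau[x][z] ** alpha * (1.0 / dist[x][z]) ** beta
                   for z in allowed]
        tour.append(rng.choices(allowed, weights=weights)[0])
    return tour

def aco_tsp(dist, n_ants=10, n_iter=30, alpha=1.0, beta=2.0, rho=0.3, seed=0):
    """Return (best_tour, best_length) for a symmetric distance matrix."""
    rng = random.Random(seed)
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]
    best, best_len = None, float("inf")
    for _ in range(n_iter):
        tours = [build_tour(dist, tau, alpha, beta, rng) for _ in range(n_ants)]
        # Evaporation on every edge.
        for i in range(n):
            for j in range(n):
                tau[i][j] *= 1.0 - rho
        # Deposit: shorter tours leave more pheromone on their edges.
        for t in tours:
            length = tour_length(t, dist)
            for i in range(n):
                a, b = t[i], t[(i + 1) % n]
                tau[a][b] += 1.0 / length
                tau[b][a] += 1.0 / length
            if length < best_len:
                best, best_len = t, length
    return best, best_len
```

The distance matrix is assumed to have strictly positive entries between distinct cities; the `seed` argument only makes runs repeatable.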
Ant colony optimization can be used in I-Commerce, but the accuracy and relevancy of its product predictions are low. ACO focuses on the quantity of products, not on their quality. To address these issues, researchers have introduced a new concept called Adaptive ACO. In ACO, the search process is based on positive feedback reinforcement using pheromone information, but escaping from local optima is more difficult than in other meta-heuristics. To solve this problem, adaptive ACO uses the transition of the shortest path. In this technique, cranky ants are introduced into the system. Cranky ants select the shortest distance having a lot of pheromone. A path with a lot of pheromone indicates a frequently selected path. Using these recognition techniques achieves control of the tradeoff between intensification and diversification.
Start
for each particle j = 1, ..., S do
    Initialize the particle's position with a uniformly distributed random vector: xj ~ U(blo, bup)
    Initialize the particle's best known position to its initial position: pj ← xj
    if f(pj) < f(g) then
        Update the swarm's best known position: g ← pj
    Initialize the particle's velocity: vj ~ U(-|bup-blo|, |bup-blo|)
while a termination criterion is not met do
    for each particle j = 1, ..., S do
        for each dimension d = 1, ..., n do
            Pick random numbers: rp, rg ~ U(0,1)
            Update the particle's velocity: vj,d ← ω vj,d + φp rp (pj,d - xj,d) + φg rg (gd - xj,d)
        Update the particle's position: xj ← xj + vj
        if f(xj) < f(pj) then
            Update the particle's best known position: pj ← xj
            if f(pj) < f(g) then
                Update the swarm's best known position: g ← pj
Stop
Particle Swarm Optimization (PSO) uses two main principles, viz. communication and learning. To explore the search space and update its solution, each agent communicates with other agents. While communicating, it focuses on learning in order to find stochastic solutions.
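The particle swarm procedure listed above can be written as a short runnable sketch. The parameter values (`omega`, `phi_p`, `phi_g`, swarm size, iteration count) are illustrative assumptions, and a fixed number of iterations stands in for the unspecified termination criterion.

```python
import random

def pso(f, b_lo, b_up, n_dims, n_particles=20, n_iter=100,
        omega=0.7, phi_p=1.5, phi_g=1.5, seed=0):
    """Minimize f; returns (best_position, best_value)."""
    rng = random.Random(seed)
    span = abs(b_up - b_lo)
    # Positions x ~ U(b_lo, b_up), velocities v ~ U(-|b_up-b_lo|, |b_up-b_lo|).
    x = [[rng.uniform(b_lo, b_up) for _ in range(n_dims)]
         for _ in range(n_particles)]
    v = [[rng.uniform(-span, span) for _ in range(n_dims)]
         for _ in range(n_particles)]
    p = [xi[:] for xi in x]      # each particle's best known position
    g = min(p, key=f)[:]         # the swarm's best known position
    for _ in range(n_iter):
        for j in range(n_particles):
            for d in range(n_dims):
                r_p, r_g = rng.random(), rng.random()
                v[j][d] = (omega * v[j][d]
                           + phi_p * r_p * (p[j][d] - x[j][d])
                           + phi_g * r_g * (g[d] - x[j][d]))
                x[j][d] += v[j][d]
            if f(x[j]) < f(p[j]):
                p[j] = x[j][:]
                if f(p[j]) < f(g):
                    g = p[j][:]
    return g, f(g)

def sphere(point):
    """Simple test objective with its minimum of 0 at the origin."""
    return sum(c * c for c in point)
```

For example, `pso(sphere, -5.0, 5.0, n_dims=2)` searches the square [-5, 5] x [-5, 5] for the minimum of the sphere function.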
Meta-heuristic techniques have been very popular over the last three decades. The physical activities of animals and birds have encouraged researchers to apply the same concepts to real-life problems. These techniques are either individual-based or population-based. Grey Wolf Optimization (GWO) is a technique developed by Mirjalili (2014) that employs the leadership hierarchy of wolves. The main leader is called the alpha (α) and is responsible for decision making. The next dominant leader is called the beta (β); the beta is a subordinate wolf that helps the alpha in decision making. The lower-ranking wolves are called omega (ω) and delta (δ). These wolves are responsible for completing the tasks assigned by the alpha and beta, and report to them. The hunting techniques and wolf behavior are structured mathematically to solve real-time engineering problems.
Gs denotes the number of search agents and Gd the design variable size. The vectors A, a and C and the maximum number of iterations Iter_max are then initialized:

$$ \vec{A} = 2\,\vec{a} \cdot \mathrm{Rand}_1 - \vec{a}, \qquad \vec{C} = 2 \cdot \mathrm{Rand}_2 $$

where the value of $\vec{a}$ is linearly decreased from 2 to 0. The wolves are represented by the matrix

$$ \mathrm{Wolves} = \begin{bmatrix} g_1^1 & g_2^1 & \cdots & g_{G_d}^1 \\ g_1^2 & g_2^2 & \cdots & g_{G_d}^2 \\ \vdots & \vdots & \ddots & \vdots \\ g_1^{G_s} & g_2^{G_s} & \cdots & g_{G_d}^{G_s} \end{bmatrix} $$

and the position of each wolf is updated as

$$ \vec{g}(t+1) = \frac{\vec{G}_1 + \vec{G}_2 + \vec{G}_3}{3} \qquad (1.3) $$
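The update in equation (1.3) can be sketched in code as follows. The chapter gives only the coefficient vectors A and C and the averaging step; the intermediate distance step (`abs(C * leader - position)`) and the three guided positions follow the commonly published form of the algorithm and are assumptions here, as are the function name, parameter values and the clamping to the search bounds.

```python
import random

def gwo(f, b_lo, b_up, n_dims, n_wolves=20, iter_max=100, seed=0):
    """Minimize f over [b_lo, b_up]^n_dims; returns (best_position, best_value)."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(b_lo, b_up) for _ in range(n_dims)]
              for _ in range(n_wolves)]
    best, best_val = None, float("inf")
    for t in range(iter_max):
        ranked = sorted(wolves, key=f)
        if f(ranked[0]) < best_val:
            best, best_val = ranked[0][:], f(ranked[0])
        # The three fittest wolves lead the pack; copies keep the leaders
        # fixed while the pack is being moved.
        alpha, beta, delta = (w[:] for w in ranked[:3])
        a = 2.0 - 2.0 * t / iter_max     # decreases linearly from 2 towards 0
        for w in wolves:
            for d in range(n_dims):
                guided = []
                for leader in (alpha, beta, delta):
                    A = 2.0 * a * rng.random() - a   # A = 2 a Rand1 - a
                    C = 2.0 * rng.random()           # C = 2 Rand2
                    distance = abs(C * leader[d] - w[d])
                    guided.append(leader[d] - A * distance)
                # Equation (1.3): mean of the three leader-guided positions,
                # clamped to the search bounds.
                w[d] = min(max(sum(guided) / 3.0, b_lo), b_up)
    final = min(wolves, key=f)
    if f(final) < best_val:
        best, best_val = final[:], f(final)
    return best, best_val

def sphere(point):
    """Simple test objective with its minimum of 0 at the origin."""
    return sum(c * c for c in point)
```

At least three wolves are needed so that alpha, beta and delta can be selected.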
GWO can be used for many real-time applications such as E-Commerce and fuzzy systems. Most swarm intelligence techniques used to solve optimization problems lack a leadership hierarchy; the GWO algorithm overcomes this drawback. Optimization algorithms can help find the most optimized data. But if the data is large, we need a technique that gives the most optimized result in minimum time with high efficiency. To achieve this objective, the best option is to use either Hadoop or parallel clustering algorithms.
maintain large data sets. Big data involves generating, storing, analyzing, retrieving and transferring large amounts of data. The term “Big Data” refers not only to how much data you have but also to what you do with that data. It involves various techniques, frameworks and tools. Big data helps to analyze data and identify patterns and behavior, which supports better decisions and in turn the development of a business or organization. Big data is a large amount of data that is analyzed, visualized and summarized, and from which information is extracted, using application software. It is characterized by Volume, Variety, Veracity, Velocity and Value, as shown in figure 1.8. In most phases, big data works on the concept of MapReduce. Hadoop is a well-known open source framework for processing large amounts of data; it can be integrated with Java (Eclipse), Python (Anaconda) and R (RStudio).
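The MapReduce concept mentioned above can be illustrated in a few lines of plain Python, without Hadoop. The sketch computes the average rating per item from hypothetical (user, item, rating) records; the record layout and function names are assumptions for the example, and a real Hadoop job would distribute the map and reduce phases across machines.

```python
from collections import defaultdict
from functools import reduce

def map_phase(records):
    """Map: emit (item, (rating, 1)) for every (user, item, rating) record."""
    for _user, item, rating in records:
        yield item, (rating, 1)

def shuffle(pairs):
    """Shuffle: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum ratings and counts per item, then take the average."""
    averages = {}
    for item, values in groups.items():
        total, count = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), values)
        averages[item] = total / count
    return averages

def average_ratings(records):
    return reduce_phase(shuffle(map_phase(records)))
```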
Variety: Data is produced in various formats. It may be structured, unstructured or semi-structured. Traditional systems cannot manage such a large amount of data. Structured data is organized in one particular format, such as a table with rows and columns; unstructured data is a combination of text, audio, images and videos; and semi-structured data is data separated by a particular separator such as a comma or tab.
Velocity: In today's era, data has moved from batch processing to streaming. The speed of data generation and processing is very high and increases on a continuous basis. Compared to small data, big data is produced continuously.
Veracity: Veracity refers to the uncertainty of data, that is, to data quality and data value. The best example of veracity is the data available on social sites.
Value: Data generated from diverse sources is raw data. The value of the data can be discovered by analyzing it, recognizing patterns and predicting behavior.
The Fuzzy C-means algorithm allows a piece of data to belong to two or more clusters. The algorithm is used for pattern recognition and is based on minimization of

$$ J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \, \lVert X_i - C_j \rVert^{2} \qquad (1.4) $$

where m is greater than 1, $u_{ij}$ is the degree of membership of $X_i$ in cluster j, and $C_j$ is the d-dimensional center of the cluster.
In the first step we initialize the membership matrix U; then at step k we calculate the center vectors C(k). In each iteration U(k) is updated to U(k+1). If the difference between U(k) and U(k+1) is less than the termination threshold, then stop. Many challenges are involved in implementing all the above techniques for I-Commerce.
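The Fuzzy C-means iteration described above (initialize U, compute centers, update U, stop when U changes little) can be sketched in plain Python. The update formulas used here are the standard ones that minimize (1.4); they are not written out in this chapter, so treat them, along with the parameter names `eps` and `max_iter`, as assumptions of the example.

```python
import random

def fuzzy_c_means(points, n_clusters, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """Return (centers, memberships) for a list of equal-length points."""
    rng = random.Random(seed)
    n, dims = len(points), len(points[0])
    # Random membership matrix U; every row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([r / total for r in row])
    for _ in range(max_iter):
        # Centers: means of the points weighted by membership^m.
        centers = []
        for j in range(n_clusters):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / total
                            for d in range(dims)])
        # Memberships: inverse-distance ratios raised to 2 / (m - 1).
        new_u = []
        for i in range(n):
            dists = [max(sum((points[i][d] - c[d]) ** 2
                             for d in range(dims)) ** 0.5, 1e-12)
                     for c in centers]
            new_u.append([
                1.0 / sum((dists[j] / dists[k]) ** (2.0 / (m - 1.0))
                          for k in range(n_clusters))
                for j in range(n_clusters)])
        change = max(abs(new_u[i][j] - u[i][j])
                     for i in range(n) for j in range(n_clusters))
        u = new_u
        if change < eps:   # U(k+1) barely differs from U(k): stop
            break
    return centers, u
```

Each row of the returned membership matrix gives the degree to which one point belongs to every cluster, so a point lying between two groups receives noticeable membership in both.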
The biggest challenge in I-Commerce is security. Consumers lack confidence when purchasing products online because they need to enter bank or personal information, and they have a misconception that any data they enter must be unsecured. There are also many misconceptions about the warranty and guarantee of products: consumers are always doubtful about the warranty of the products they purchase and about whether customer service centers will respond to problems. Online literacy is also important when handling products online. Because of computer illiteracy, many people are not able to use I-Commerce services properly.
16 Parallel Clustering Algorithm
There is also a myth that online products are not delivered in proper condition and that, if the wrong product is delivered, the seller will not replace it with the correct one. Consumers are not aware of the importance of I-Commerce. Compared to physical stores, cost is reduced in I-Commerce because many supply chain phases are removed. On the other hand, consumers like to feel the product or select it in person, which is not possible in I-Commerce.
Another major challenge is that people in developing countries are not aware of the laws and regulations of I-Commerce. Because of this unawareness, people hesitate to buy products online. To address the above challenges, researchers have carried out several studies, and several approaches have been introduced in I-Commerce.
Due to the increase in internet shopping, there is a need for technological innovation so that customers can select the best product from the best vendor by using a recommendation system or search engine. Recommendation techniques still face great challenges in I-Commerce in selecting relevant products for the customer. By applying different mechanisms, product search and recommendation systems in I-Commerce take the customer's input, traverse the database and display products to the customer. Enormous choice is available, with a large number of items from various brands for each product. This creates a large database of information and places a cognitive overload on customers. The I-Commerce space is becoming very competitive, and vendors are finding various ways to attract customers and sell products online. To solve this problem, many vendors use dedicated techniques: Amazon uses A9, eBay uses the Best Match algorithm, and Flipkart uses Solr (Lucene). However, search engines are not able to solve this problem completely.
However, preserving privacy is regarded as a challenge in I-Commerce, as illegal intrusions occur widely. User information is at risk of being hacked easily through the I-Commerce service, resulting in apprehensive and unsatisfied customers. Likewise, numerous varieties and ranges of products are available in I-Commerce applications. Selecting the right product on the basis of the user's requirements is perplexing and tiresome, and overcoming these shortfalls requires methods that make I-Commerce more user-friendly. Prediction of I-Commerce sites and products can give customers the opportunity to choose providers as well as products. In accordance with these insights, this work aims at developing a prediction-based parallel clustering algorithm for recommending products using I-Commerce.
reviews, ratings, price, priority and extract the products with multiple vendors. In this thesis, we propose a model whose primary objective is a prediction-based parallel clustering algorithm using an optimization technique to identify users' needs, predict results as per the user's requirements and enhance the relevancy of results.
J. C. Bezdek (1981): "Pattern Recognition with Fuzzy Objective Function Algorithms", Plenum
Press, New York.
J. C. Dunn (1973): "A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact
Well-Separated Clusters", Journal of Cybernetics 3: 32-57.
Kandasamy K. and Kumar C. S. (2015), “Modified PSO Based Optimal Time Interval
Identification for Predicting Mobile User Behaviour (MPSO-OTI2-PMB) in Location-Based
Services”, Indian Journal of Science and Technology, Vol. 8 (s7), pp. 185–193.
Kennedy, J.; Eberhart, R. (1995). "Particle Swarm Optimization". Proceedings of IEEE
International Conference on Neural Networks. IV. pp. 1942–1948.
Kim W. (2009), “Parallel Clustering Algorithms: Survey”, retrieved on 9th March 2017 from
https://s3-us-west-2.amazonaws.com/mlsurveys/46.pdf
Lee S. and Ahn H. (2011), “The hybrid model of neural networks and genetic algorithms for the
design of controls for internet-based systems for business-to-consumer electronic commerce”,
Expert Systems with Applications, Vol. 38, pp. 4326–4338.
Maalini D. and Subashini S. (2014), “An Enhanced Prediction of Subsequent Mobile User
Behavior in Location Based Service Environments”, International Journal of Emerging Research
in Management & Technology, Vol. 3, Issue. 2, pp. 55-60.
Nayak M. and Prakash K. B. (2013), “A Framework for Personal Mobile commerce Pattern
Mining and Prediction”, International Journal of Engineering Trends and Technology (IJETT),
Vol. 4, Issue. 9, pp. 3921-3925.
Niknam T., Nayeripour M. and Firouzi B. B. (2009), “Application of a New Hybrid optimization
Algorithm on Cluster Analysis”, International Journal of Electrical and Computer Engineering,
Vol. 4, No. 4, pp. 238-243.
Argade D. and Chavan H. (2015), “Improve Accuracy of Prediction of User's Future MCommerce
Behaviour”, Procedia Computer Science, Vol. 49, pp. 111–117.
"Retail e-commerce sales CAGR forecast in selected countries from 2016 to 2021". Statista.
October 2016. Retrieved 1 January 2018.
Taherkhani, M., Safabakhsh, R. (2016). "A novel stability-based adaptive inertia weight for
particle swarm optimization". Applied Soft Computing. 38: 281–295.
Waldner, Jean-Baptiste (2008). Nanocomputers and Swarm Intelligence. London: ISTE John
Wiley & Sons. p. 225. ISBN 978-1-84704-002-2.
WEBSITES
[1] https://www.researchgate.net/publication/2955372_The_emergence_of_Mcommerce accessed on 18th September 2016
[2] http://www.mobify.com/wpcontent/uploads/eMarketer_Optimizing_for_Mobile_Commerce_05022016.pdf accessed on 26th October 2016
[3] www.sciencedirect.com/science/article/pii/0167819189900367 accessed on 29th December 2016
[4] www.icommerceteam.com accessed on 26th Jan 2017
[5] http://grigory.us/blog/mapreduce-clustering/ accessed on 2nd Feb 2017
[6] http://www.investopedia.com/terms/e/ecommerce.asp accessed on 6th Feb 2017
[7] Mcom.cs.cmu.edu accessed on 10th Feb 2017
[8] https://www.shopify.com/enterprise/global-ecommerce-statistics accessed on 9th Feb 2019
[9] https://ourworldindata.org/internet#growth-of-the-internet accessed on 9th April 2019
[10] https://www.internetworldstats.com/stats.htm accessed on 9th April 2019
[11] https://www.statista.com/statistics/273018/number-of-internet-users-worldwide/ accessed on 8th May 2019
[12] www.emarketer.com accessed on 9th May 2019