
Part II: Applications of GAs: GA and the Internet, Genetic Search Based on Multiple Mutation Approaches

This document discusses using genetic algorithms for intelligent internet search. It describes a system designed at a university that uses genetic algorithms with phases including input, spidering, agent, generator, topic, space, and time to iteratively evolve search results. The system begins with an input set, spiders links to generate the first generation, evaluates fitness, and performs crossover, mutation and reproduction over multiple iterations to obtain satisfactory results. The document also discusses applications of genetic algorithms, innovations needed at different levels, and simulation results showing combined topic, spatial and temporal mutation improves search quality.

Uploaded by

Srikar Chintala
Copyright
© Attribution Non-Commercial (BY-NC)

Part II: Applications of GAs

GA and the Internet Genetic search based on multiple mutation approaches

GAs are useful and efficient when

- The search space is large, complex, or poorly understood
- Domain knowledge is scarce, or expert knowledge is difficult to encode to narrow the search space
- No mathematical analysis is available
- Traditional search methods fail
- For problem solving and for modeling

Applications
GAs are applied to many scientific and engineering problems, as well as in business and entertainment, including:
1. Optimization: GAs are used in a wide variety of optimization tasks, including numerical optimization and combinatorial problems such as the Traveling Salesman Problem, the Job Scheduling Problem, and video and sound quality optimization.
2. Automatic programming: GAs are used to evolve or generate computer programs for specific tasks automatically.
3. Machine and robot learning
4. Models of social systems
5. Interactions between evolution and learning

Some Applications of GAs


- Control systems design
- Software-guided circuit design
- Optimization
- Internet search
- Path finding
- Mobile robots
- Data mining
- Trend spotting
- Stock price prediction

Algorithm Phases
1. Process the set of URLs given by the user
2. Select all links from the input set
3. Evaluate the fitness function for all genomes
4. Perform crossover, mutation, and reproduction
5. Satisfactory solution obtained? If not, return to step 3; if so, the search ends.

Introduction

GAs can be used for intelligent Internet search. A GA is suited to cases where the search space is relatively large. A GA is an adaptive, heuristic search method.

System for GA Internet Search

Designed at the Faculty of Electrical Engineering, University of Belgrade


[Figure: system architecture. A control program together with the Generator, Agent, Spider, Topic, Space, and Time components; data sets: input set, current set, output set; databases: top data, net data.]

Spider

A spider is a software package that picks up Internet documents starting from user-supplied input, to a depth specified by the user. The spider takes one URL and fetches all links, and the documents they contain, up to the predefined depth. The fetched documents are stored on the local hard disk with the same structure as at the original location. The spider's task is to produce the first generation. The spider is also used during crossover and mutation.
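The depth-limited fetching described for the spider can be sketched as a breadth-first traversal. This is only an illustration under stated assumptions, not the original package: the `LINKS` dictionary is a hypothetical in-memory stand-in for the live web, the function name `spider` and its parameters are invented here, and storing documents on disk is left out.

```python
from collections import deque

# Hypothetical link graph standing in for the live web: URL -> outgoing links.
LINKS = {
    "http://example.org/": ["http://example.org/a", "http://example.org/b"],
    "http://example.org/a": ["http://example.org/c"],
    "http://example.org/b": [],
    "http://example.org/c": ["http://example.org/d"],
    "http://example.org/d": [],
}

def spider(start_url, depth, links=LINKS):
    """Return every URL reachable from start_url within `depth` link hops."""
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, level = queue.popleft()
        if level == depth:
            continue  # depth limit reached: do not follow this page's links
        for link in links.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, level + 1))
    return seen
```

Calling `spider("http://example.org/", 1)` returns the start page plus the pages it links to directly, which matches how the agent is described as calling the spider with depth 1.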

Agent

The agent takes a set of URLs as input and calls the spider for each of them with depth 1. The agent then extracts keywords from each document and stores them on the local hard disk.
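The slides do not say how the agent extracts keywords. One simple possibility, shown here purely as an assumption, is to rank words by frequency after dropping common stop words. The stop-word list, the function name `extract_keywords`, and the `top_n` parameter are all hypothetical.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stop-word list; a real agent would need a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}

def extract_keywords(text, top_n=3):
    """Return the top_n most frequent non-stop words (ties broken alphabetically)."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]
```

The resulting keyword lists are what a fitness function can later compare against the user's topic.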

Generator

The generator produces a set of URLs from given keywords using a conventional search engine. It takes the desired topic as input, calls the Yahoo search engine, and submits a query for all documents covering that topic. The generator stores the URL and topic of each resulting web page in a database called topdata.

Topic

Topic uses the topdata database to insert random URLs from it into the current set. In this way, Topic performs mutation.
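Topic mutation as described above amounts to mixing random topdata entries into the current set. The sketch below assumes topdata is a list of `(url, topic)` pairs and that the mutation rate is a fraction of the current set size; both are assumptions made for illustration, as are the names `topic_mutation` and `rate`.

```python
import random

def topic_mutation(current_set, topdata, topic, rate, rng=None):
    """Return a copy of current_set with random topdata URLs for `topic` added.

    `rate` is an assumed knob: the number of inserted URLs is
    round(rate * len(current_set)), capped by what topdata can offer.
    """
    rng = rng or random.Random()
    candidates = sorted(
        url for url, t in topdata if t == topic and url not in current_set
    )
    n_new = min(len(candidates), round(rate * len(current_set)))
    return set(current_set) | set(rng.sample(candidates, n_new))
```

Passing a seeded `random.Random` makes a run repeatable, which helps when comparing mutation rates.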

Space

Space takes the current set from the agent as input and injects into it those URLs from the netdata database that appeared with the greatest frequency in the output sets of previous searches.

Time

Time takes a set of URLs from the agent and inserts those with the greatest frequency into the netdata database. The netdata database consists of three fields: URL, topic, and count. The database is updated in each iteration of the algorithm.
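The interplay of Time (which maintains netdata) and Space (which reads from it) can be sketched with a counter keyed by URL and topic. This is a simplified assumption: here Time counts every URL it is given rather than pre-selecting the most frequent ones, and the names `time_update`, `space_inject`, and the parameter `n` are hypothetical.

```python
from collections import Counter

def time_update(netdata, output_urls, topic):
    """Time: increase the count field for each (URL, topic) seen in an output set."""
    for url in output_urls:
        netdata[(url, topic)] += 1

def space_inject(current_set, netdata, topic, n):
    """Space: return a copy of current_set plus the n most frequent URLs for `topic`."""
    ranked = sorted(
        ((count, url) for (url, t), count in netdata.items() if t == topic),
        key=lambda pair: (-pair[0], pair[1]),  # highest count first, then URL
    )
    return set(current_set) | {url for _, url in ranked[:n]}

# netdata holds the three fields from the slides: URL, topic, and count.
netdata = Counter()
time_update(netdata, ["a", "b"], "ga")
time_update(netdata, ["a", "c"], "ga")
```

After these two updates, `space_inject({"x"}, netdata, "ga", 1)` adds `"a"`, the only URL that appeared in both output sets.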

How Does The System Work?


[Figure: the system architecture diagram (control program, Generator, Agent, Spider, Topic, Space, Time, input set, current set, output set, top data, net data), annotated with command flow and data flow.]

GA and the Internet: Conclusion

A GA for Internet search, in contrast to other GAs, is much faster and more efficient than conventional solutions such as standard Internet search engines.

INTERNET

Genetic Search Based on Multiple Mutation Approaches


Concept and its improvements adapted to specific applications in e-business, and a concrete software package.

Main problems in finding information on the Internet:
- How to quickly find and efficiently retrieve potentially useful information, given the fast growth in the quantity and variety of Internet sites
- Searching with indexing engines returns a huge number of documents, many of which are completely unrelated to what the user originally attempted to find
- Documents placed at the top of the result list are often less acceptable than those lower down
- The indexing process may take days, weeks, or even longer, because of the volume of new information being created daily

Links Based Approach


The question is: How to locate and retrieve the needed information before it gets indexed?

An efficient way to locate new, not-yet-indexed information is to use links-based approaches:
- genetic search
- simulated annealing

Best result: indexing-based approaches + links-based approaches

Genetic Search Algorithm


Genetic algorithm of zero order (no mutation):
1. Start from a model Web presentation that contains all the needed types of information (the fitness function is evaluated).
2. It is assumed that the model includes URL pointers to other similar Web presentations, and these are downloaded.
3. The Web presentations that survive the fitness function are assumed to include additional URL pointers, and their related Web presentations are downloaded next.
4. After the end-of-search condition is met, the Web presentations are ranked according to their fitness value.
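The zero-order search can be sketched on a toy link graph. Everything concrete here is an assumption for illustration: the `WEB` dictionary, the fitness definition (fraction of the model's keywords found on a page), the survival `threshold`, and the round limit used as the end-of-search condition.

```python
# Hypothetical toy web: URL -> (keywords on the page, outgoing URL pointers).
WEB = {
    "model": ({"ga", "search", "internet"}, ["p1", "p2"]),
    "p1": ({"ga", "search"}, ["p3"]),
    "p2": ({"cooking"}, ["p4"]),
    "p3": ({"ga", "internet", "search"}, []),
    "p4": ({"ga", "search", "internet"}, []),
}

def fitness(url, model_keywords):
    """Assumed fitness: fraction of the model's keywords found on the page."""
    return len(WEB[url][0] & model_keywords) / len(model_keywords)

def zero_order_search(model_url, threshold=0.5, max_rounds=10):
    model_keywords = WEB[model_url][0]
    scores = {}
    frontier = list(WEB[model_url][1])  # pages the model points to
    for _ in range(max_rounds):  # assumed end-of-search condition: round limit
        if not frontier:
            break  # nothing left to download
        next_frontier = []
        for url in frontier:
            if url == model_url or url in scores:
                continue  # already evaluated
            scores[url] = fitness(url, model_keywords)
            if scores[url] >= threshold:
                # Survivors contribute their URL pointers to the next round.
                next_frontier.extend(WEB[url][1])
        frontier = next_frontier
    survivors = [u for u, s in scores.items() if s >= threshold]
    return sorted(survivors, key=lambda u: (-scores[u], u))  # rank by fitness
```

In this toy graph, `p4` is fully relevant but reachable only through the unfit page `p2`, so the zero-order search never finds it. That is the kind of gap mutation is meant to close.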

Genetic Search Algorithm


Types of mutation:
- Topic-oriented database mutation
- Semantic mutations
  - based on the principles of spatial locality
  - based on the principles of temporal locality

Logical reasoning and semantic considerations are involved in picking out URLs for mutation.

Innovations Required by Domain Area


APPLICATION LEVEL

LEVEL OF THE GENERAL PROJECT APPROACH AND PRODUCT ARCHITECTURE


ALGORITHMIC LEVEL

IMPLEMENTATION LEVEL

Application Level

- Statistical analysis and data mining have to be performed in order to figure out the common and typical patterns of behavior and needs
- The state of the art of mutual referencing has to be determined
- The trends and asymptotic situations foreseen for the time of project finalization have to be determined

Level of the General Project Approach and Product Architecture


Decisions have to be made about the most important goals to be achieved:

Maximizing the speed of search

Maximizing the sophistication of search


Maximizing specific effects of interest for a given institution or a customer

Maximizing a combination of the above


Decisions at this level affect the applicability of the final product or tool.

Algorithmic Level
Develop an efficient mutation algorithm of interest for the application:
- in the direction of database architecture and design
- in introducing the elements of semantics-based mutation

Semantics-based mutations are especially of interest for chaotic markets, typical of new markets in developed countries or traditional markets in under-developed countries.

Semantics-based Mutation
Mutation based on spatial localities

After a fruitful Web presentation is reached (using a traditional algorithm with mutation), the site of the same Internet service provider is searched for other presentations on the same or a similar topic.

Explanation: In chaotic markets, it is very unlikely that service/product offers from the same small geographic area reference each other on their Web presentations. After a successful side trip based on spatial mutation, one continues with the traditional database mutation.
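A rough way to picture spatial mutation is to treat "same Internet service provider" as "same host name" and pick other known pages on that host. That equivalence is an assumption made only for this sketch, as are the function name `spatial_candidates` and the example URLs; it uses the standard library's `urllib.parse.urlparse`.

```python
from urllib.parse import urlparse

def spatial_candidates(fruitful_url, known_urls):
    """Return other known URLs hosted on the same host as fruitful_url."""
    host = urlparse(fruitful_url).netloc
    return sorted(
        url for url in known_urls
        if url != fruitful_url and urlparse(url).netloc == host
    )
```

The returned URLs would be evaluated with the fitness function like any other candidates before the search returns to database mutation.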

Semantics-based Mutation
Mutation based on temporal localities

- One comes back periodically to a Web presentation that was fruitful in the past
- One comes back periodically to other Web presentations developed by an author who created fruitful Web presentations in the past
- Temporal mutation can use direct revisits or a number of indirect forms of revisit

Implementation Level

Utilization of novel technologies, for maximal performance and minimal implementation complexity. Important for:
- good flexibility
- extendibility
- reliability
- availability

Utilization of mobile platforms and mobile agents.

Implementation Level

Static agents:
- one has to download megabytes of information
- treat that information with decision-making code whose size is measured in kilobytes
- derive the final business-related decision, which is binary (one bit: yes or no)

A huge amount of data is transferred through the network in vain, because only a small percentage of the fetched documents will turn out to be useful.

Mobile agents:
- they would browse through the network and perform the search locally, on the remote servers, transferring only the needed documents and data
- they load the network only with kilobytes and a single bit

Simulation Result

- Links-based approach in the static domain
- How various mutation strategies can affect the search efficiency
- A set of software packages has been developed to perform Internet search using genetic algorithms (by Veljko Milutinovic, Dragana Cvetkovic, and Jelena Mirkovic)
- As the fitness function, they measured the average Jaccard's score for the output documents while changing the type and rate of mutation
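Jaccard's score of two sets is the size of their intersection divided by the size of their union. The slides do not state what the sets were in the simulation; the sketch below assumes each document is represented by its keyword set and is compared with a reference keyword set. The names `jaccard` and `average_jaccard` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard's score |A & B| / |A | B| of two collections treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)

def average_jaccard(reference_keywords, documents):
    """Mean Jaccard's score of each document's keyword set against the reference."""
    if not documents:
        return 0.0
    return sum(jaccard(reference_keywords, d) for d in documents) / len(documents)
```

For example, `{"ga", "search"}` and `{"ga", "internet"}` share one of three distinct keywords, so their score is 1/3.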

Simulation Result

The simulation result for topic mutation

The simulation result for temporal and spatial mutation combined with topic mutation

Simulation Result
The simulation result for topic, spatial and temporal mutation combined.
Constant increase in the quality of pages found.

Conclusion: Evolution

Tutorial download: galeb.etf.bg.ac.yu/~vm Option:Tutorials
