Part II: Applications of GAs: GA and the Internet (Genetic Search Based on Multiple Mutation Approaches)
GAs are suited to problems where:
- the search space is large, complex, or poorly understood
- domain knowledge is scarce, or expert knowledge is difficult to encode to narrow the search space
- no mathematical analysis is available
- traditional search methods fail
GAs are used both for problem solving and for modeling.
Applications
GAs are applied to many scientific and engineering problems, as well as in business and entertainment, including:
1. Optimization: GAs are used in a wide variety of optimization tasks, including numerical and combinatorial problems such as the Traveling Salesman Problem and the Job Scheduling Problem, and video and sound quality optimization.
2. Automatic programming: GAs are used to evolve computer programs for specific tasks automatically.
3. Machine and robot learning.
4. Models of social systems.
5. Interactions between evolution and learning.
[Figure: GA applications: optimization, Internet search, path finding, mobile robots, data mining, trend spotting]
Algorithm Phases
1. Process the set of URLs given by the user.
2. Select all links from the input set.
3. Evaluate the fitness function for all genomes.
4. Perform crossover, mutation, and reproduction.
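The four phases can be sketched as a loop. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the in-memory link graph `links`, the `fitness` callback, and the `mutation_pool` are hypothetical stand-ins, each genome is taken to be a single URL, and crossover is approximated by following links out of the surviving pages.

```python
import random
from typing import Callable, Dict, List


def genetic_search(
    seed_urls: List[str],
    links: Dict[str, List[str]],      # hypothetical link graph: URL -> outgoing links
    fitness: Callable[[str], float],  # hypothetical fitness of one genome (URL)
    mutation_pool: List[str],         # hypothetical URLs available for mutation
    generations: int = 5,
    population_size: int = 10,
    mutation_rate: float = 0.2,
    seed: int = 0,
) -> List[str]:
    rng = random.Random(seed)
    # Phase 1: process the user's URLs (drop duplicates, keep order).
    population = list(dict.fromkeys(seed_urls))
    for _ in range(generations):
        # Phase 2: candidates are the current URLs plus every page they link to
        # (this link-following step stands in for crossover in this sketch).
        candidates = set(population)
        for url in population:
            candidates.update(links.get(url, []))
        # Phase 3: evaluate the fitness of every genome; ties broken by URL.
        ranked = sorted(candidates, key=lambda u: (-fitness(u), u))
        # Phase 4a: reproduction keeps the fittest URLs.
        population = ranked[:population_size]
        # Phase 4b: mutation replaces some URLs with random ones from the pool.
        for i in range(len(population)):
            if mutation_pool and rng.random() < mutation_rate:
                population[i] = rng.choice(mutation_pool)
        population = list(dict.fromkeys(population))
    return sorted(population, key=lambda u: (-fitness(u), u))
```

With `mutation_rate=0` the loop reduces to a greedy links-based search; mutation is what lets it reach pages that no link in the current set points to.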
Introduction
A GA can be used for intelligent Internet search. GAs are used when the search space is relatively large. A GA is an adaptive search and a heuristic search method.
[Figure: system architecture: control program with the topdata and netdata databases]
Spider
Spider is a software package that picks up Internet documents from user-supplied input, to a depth specified by the user. Spider takes one URL and fetches all of its links and the documents they contain, down to the predefined depth. The fetched documents are stored on the local hard disk with the same structure as at the original location. Spider's task is to produce the first generation. Spider is also used during crossover and mutation.
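A depth-limited crawl like Spider's can be sketched as a breadth-first traversal. In this hypothetical Python sketch the network is replaced by an in-memory `links` dictionary, so "fetching" a page means looking up its outgoing links. `local_path` shows one possible way to mirror a URL's host and directory structure on disk; the `mirror` root directory and the `index.html` default are assumptions.

```python
from collections import deque
from pathlib import PurePosixPath
from typing import Dict, List, Set
from urllib.parse import urlparse


def spider(start_url: str, links: Dict[str, List[str]], depth: int) -> List[str]:
    """Return start_url and every page reachable within `depth` link hops, in BFS order."""
    seen: Set[str] = {start_url}
    order: List[str] = [start_url]
    queue = deque([(start_url, 0)])
    while queue:
        url, d = queue.popleft()
        if d == depth:
            continue  # do not follow links beyond the requested depth
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append((nxt, d + 1))
    return order


def local_path(url: str, root: str = "mirror") -> str:
    """Map a URL onto a local path that mirrors its host and directory structure."""
    parts = urlparse(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"  # assumed file name for directory-style URLs
    return str(PurePosixPath(root, parts.netloc, path.lstrip("/")))
```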
Agent
Agent takes a set of URLs as input and calls Spider, with depth 1, for each of them. Agent then extracts keywords from each document and stores them on the local hard disk.
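The Agent step can be sketched as a depth-1 fetch followed by keyword extraction. This Python sketch assumes a simple term-frequency extractor with a small hypothetical stop-word list; the `pages` dictionary stands in for fetched documents, and the result is returned as a dictionary rather than written to disk.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical, deliberately tiny stop-word list (words of 1-2 letters are dropped anyway).
STOPWORDS = {"the", "and", "for", "with", "are", "from", "this", "that"}


def extract_keywords(text: str, k: int = 5) -> List[str]:
    """Return the k most frequent words of 3+ letters that are not stop words."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 2 and w not in STOPWORDS]
    return [w for w, _ in Counter(words).most_common(k)]


def agent(urls: List[str], links: Dict[str, List[str]], pages: Dict[str, str], k: int = 5) -> Dict[str, List[str]]:
    """Spider each URL with depth 1, then extract keywords from every fetched document."""
    keywords: Dict[str, List[str]] = {}
    for url in urls:
        # Depth 1: the page itself plus the pages it links to directly.
        for doc in [url, *links.get(url, [])]:
            if doc in pages and doc not in keywords:
                keywords[doc] = extract_keywords(pages[doc], k)
    return keywords
```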
Generator
Generator produces a set of URLs for given keywords using a conventional search engine. It takes the desired topic as input, calls the Yahoo search engine, and submits a query looking for all documents covering that topic. Generator stores the URL and topic of each returned Web page in a database called topdata.
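The Generator's bookkeeping can be sketched with an SQLite table. To keep the sketch offline, the Yahoo query is replaced by a caller-supplied `search` function (hypothetical), and an in-memory SQLite database stands in for topdata; the two columns follow the two fields named above.

```python
import sqlite3
from typing import Callable, List


def generator(topic: str, search: Callable[[str], List[str]], db: sqlite3.Connection) -> List[str]:
    """Query a (stubbed) search engine for a topic and record (url, topic) rows in topdata."""
    db.execute("CREATE TABLE IF NOT EXISTS topdata (url TEXT PRIMARY KEY, topic TEXT)")
    urls = search(topic)  # hypothetical stand-in for the real search-engine query
    # INSERT OR IGNORE keeps a URL's first recorded topic if it is returned again.
    db.executemany(
        "INSERT OR IGNORE INTO topdata (url, topic) VALUES (?, ?)",
        [(u, topic) for u in urls],
    )
    db.commit()
    return urls
```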
Topic
Topic uses the topdata database to insert random URLs from the database into the current set; that is, Topic performs mutation.
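Topic mutation can be sketched as sampling from topdata. This Python sketch assumes that only URLs stored under the search topic and not already in the current set are eligible, and that the mutation `rate` (a hypothetical parameter) is the number of inserted URLs relative to the size of the current set.

```python
import random
from typing import List, Tuple


def topic_mutation(
    current: List[str],
    topdata: List[Tuple[str, str]],  # rows of (url, topic), as stored by Generator
    topic: str,
    rate: float,
    rng: random.Random,
) -> List[str]:
    # Assumption: only on-topic topdata URLs not already present are eligible.
    pool = [url for url, t in topdata if t == topic and url not in current]
    n = min(len(pool), round(rate * len(current)))
    # Insert n randomly chosen URLs into the current set.
    return current + rng.sample(pool, n)
```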
Space
Space takes as input the current set from the Agent application and injects into it those URLs from the netdata database that appeared most frequently in the output sets of previous searches.
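The Space injection can be sketched as a top-k selection over netdata. In this Python sketch netdata is passed as a list of `(url, topic, count)` rows, and `k`, the number of URLs to inject, is a hypothetical parameter.

```python
from typing import List, Tuple


def space_injection(
    current: List[str],
    netdata: List[Tuple[str, str, int]],  # rows of (url, topic, count)
    k: int,                               # hypothetical number of URLs to inject
) -> List[str]:
    present = set(current)
    # Highest count first; ties broken alphabetically so the result is deterministic.
    ranked = sorted(netdata, key=lambda row: (-row[2], row[0]))
    injected = [url for url, _topic, _count in ranked if url not in present][:k]
    return current + injected
```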
Time
Time takes the set of URLs from Agent and inserts the ones with the greatest frequency into the netdata database. The netdata database consists of three fields: URL, topic, and count. The database is updated in each iteration of the algorithm.
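The Time update can be sketched against an SQLite table with the three netdata fields. This Python sketch assumes that "frequency" means how often a URL occurs in the current iteration's output and that `top_n` (hypothetical) limits how many URLs are recorded per iteration.

```python
import sqlite3
from collections import Counter
from typing import Iterable


def time_update(db: sqlite3.Connection, output_urls: Iterable[str], topic: str, top_n: int) -> None:
    # netdata has the three fields named in the text: URL, topic, and count.
    db.execute(
        "CREATE TABLE IF NOT EXISTS netdata ("
        "url TEXT PRIMARY KEY, topic TEXT, count INTEGER NOT NULL DEFAULT 0)"
    )
    # Assumption: frequency = occurrences of a URL in this iteration's output.
    freq = Counter(output_urls)
    for url, n in freq.most_common(top_n):
        # Create the row if it is new, then add this iteration's frequency.
        db.execute("INSERT OR IGNORE INTO netdata (url, topic, count) VALUES (?, ?, 0)", (url, topic))
        db.execute("UPDATE netdata SET count = count + ? WHERE url = ?", (n, url))
    db.commit()
```

Calling this once per iteration accumulates counts across searches, which is what the Space step later ranks on.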
[Figure: data flow between the components and the topdata and netdata databases]
In contrast to many other GAs, the GA for Internet search is much faster and more efficient than conventional solutions such as standard Internet search engines.
INTERNET
Efficient ways to locate new, not-yet-indexed information:
- links-based approaches: genetic search, simulated annealing
- best result: indexing-based approaches + links-based approaches
Mutation approaches:
- topic-oriented database mutation
- semantic mutations:
  - based on the principles of spatial locality
  - based on the principles of temporal locality
Logical reasoning and semantic considerations are involved in picking out URLs for mutation.
APPLICATION LEVEL
ALGORITHMIC LEVEL
IMPLEMENTATION LEVEL
Application Level
- Statistical analysis and data mining have to be performed to identify common and typical patterns of behavior and need.
- The state of the art of mutual referencing has to be determined.
- The trends and asymptotic situations foreseen for the time of project completion have to be determined.
Algorithmic Level
Develop an efficient mutation algorithm suited to the application:
- in the direction of database architecture and design
- in introducing the elements of semantics-based mutation
Semantics-based mutations are especially of interest for chaotic markets, typical of new markets in developed countries or traditional markets in under-developed countries.
Semantics-based Mutation
Mutation based on spatial localities
After a fruitful Web presentation is reached (using a traditional algorithm with mutation), the site of the same Internet service provider is searched for other presentations on the same or a similar topic.
Explanation: In chaotic markets, it is very unlikely that service/product offers from the same small geographic area reference each other on their Web presentations, so following links alone rarely leads from one to another. After a successful side trip based on spatial mutation, one continues with the traditional database mutation.
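Spatial mutation can be sketched as a host-based filter. This Python sketch approximates "the site of the same Internet service provider" by URLs sharing the fruitful page's host name, and treats "same or similar topic" as membership in a caller-supplied set of topics; both are simplifying assumptions.

```python
from typing import Iterable, List, Set, Tuple
from urllib.parse import urlparse


def spatial_mutation(
    fruitful_url: str,
    candidates: Iterable[Tuple[str, str]],  # (url, topic) rows, e.g. from topdata
    topics: Set[str],                       # topics treated as "same or similar"
    current: Iterable[str],
) -> List[str]:
    # Assumption: "same Internet service provider" ~ same host name.
    host = urlparse(fruitful_url).hostname
    seen = set(current) | {fruitful_url}
    return [
        url for url, topic in candidates
        if topic in topics and urlparse(url).hostname == host and url not in seen
    ]
```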
Mutation based on temporal localities
- One comes back periodically to a Web presentation that was fruitful in the past.
- One comes back periodically to other Web presentations developed by an author who created fruitful Web presentations in the past.
Temporal mutation can use direct revisits or a number of indirect forms of revisit.
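Both forms of temporal mutation can be sketched together. In this Python sketch the revisit `period`, the `author_of` mapping, and `pages_by_author` are hypothetical inputs; direct revisits are fruitful pages whose period has elapsed, and the indirect form shown is the author-based revisit described above.

```python
from typing import Dict, List


def temporal_mutation(
    now: float,
    last_visit: Dict[str, float],           # fruitful URL -> time of last visit
    period: float,                          # hypothetical revisit period
    author_of: Dict[str, str],              # hypothetical URL -> author mapping
    pages_by_author: Dict[str, List[str]],  # hypothetical author -> pages mapping
) -> List[str]:
    # Direct revisits: fruitful pages whose revisit period has elapsed.
    due = sorted(url for url, t in last_visit.items() if now - t >= period)
    # Indirect revisits: other pages by the authors of those fruitful pages.
    extra: List[str] = []
    for url in due:
        for other in pages_by_author.get(author_of.get(url, ""), []):
            if other not in due and other not in extra:
                extra.append(other)
    return due + extra
```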
Implementation Level
- Utilization of novel technologies for maximal performance and minimal implementation complexity, important for:
  - good flexibility
  - extendibility
  - reliability
  - availability
- Utilization of mobile platforms and mobile agents
Static agents:
- one has to download megabytes of information,
- process that information with decision-making code whose size is measured in kilobytes,
- and derive the final business-related decision, which is a single bit (yes or no).
A huge amount of data is transferred through the network in vain, because only a small percentage of the fetched documents turns out to be useful.
Mobile agents:
- browse through the network and perform the search locally, on the remote servers, transferring only the needed documents and data;
- load the network with only kilobytes and a single bit.
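The traffic difference can be made concrete with a back-of-the-envelope model. This Python sketch assumes a static agent downloads every candidate document while a mobile agent sends only its code out and a one-bit decision back; the document count and sizes in the checks are hypothetical illustration values, not measurements.

```python
def static_agent_bits(n_docs: int, avg_doc_bytes: int) -> int:
    """Static agent: every candidate document crosses the network."""
    return n_docs * avg_doc_bytes * 8


def mobile_agent_bits(agent_code_bytes: int) -> int:
    """Mobile agent: the code travels out and a one-bit decision travels back."""
    return agent_code_bytes * 8 + 1


def wasted_bits(n_docs: int, avg_doc_bytes: int, useful_fraction: float) -> int:
    """Static-agent traffic spent on documents that turn out not to be useful."""
    return round(static_agent_bits(n_docs, avg_doc_bytes) * (1 - useful_fraction))
```

For example, with a hypothetical 200 documents of 50 KB each and a 20 KB agent, the static agent moves 80,000,000 bits against the mobile agent's 160,001.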
Simulation Result
Links-based approach in the static domain: how various mutation strategies can affect search efficiency.
Veljko Milutinovic, Dragana Cvetkovic, and Jelena Mirkovic developed a set of software packages that perform Internet search using genetic algorithms. As the fitness function, they measured the average Jaccard score of the output documents while varying the type and rate of mutation.
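The Jaccard score of two sets is |A ∩ B| / |A ∪ B|. This Python sketch averages it over the output documents, assuming each document is represented by its keyword set and compared against a reference keyword set (for instance, keywords of the user's input documents); that choice of comparison, and returning 0.0 for empty inputs, are assumptions rather than the authors' stated setup.

```python
from typing import Iterable, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; defined here as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def average_jaccard(output_docs: Iterable[Set[str]], reference: Set[str]) -> float:
    """Mean Jaccard score of each output document's keyword set against the reference."""
    scores = [jaccard(set(doc), reference) for doc in output_docs]
    return sum(scores) / len(scores) if scores else 0.0
```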
The simulation result for temporal and spatial mutation combined with topic mutation
The simulation result for topic, spatial and temporal mutation combined.
Constant increase in the quality of pages found.
Conclusion: Evolution