
1 Introduction

… the returned documents written in HTML and laying out the text and graphics on the user's computer screen on the client side. The operation of the Web relies on the structure of its hypertext documents. Hypertext allows Web page authors to link their documents to other related documents residing on computers anywhere in the world. To view these documents, one simply follows the links (called hyperlinks). The idea of hypertext was invented by Ted Nelson in 1965 [403], who also created the well-known hypertext system Xanadu (http://xanadu.com/). Hypertext that also allows other media (e.g., image, audio and video files) is called hypermedia.

1.2 A Brief History of the Web and the Internet

Creation of the Web: The Web was invented in 1989 by Tim Berners-Lee, who, at that time, worked at CERN (Conseil Européen pour la Recherche Nucléaire, or European Laboratory for Particle Physics) in Switzerland. He coined the term World Wide Web, wrote the first World Wide Web server, httpd, and the first client program (a browser and editor), WorldWideWeb. It began in March 1989 when Tim Berners-Lee submitted a proposal titled "Information Management: A Proposal" to his superiors at CERN. In the proposal, he discussed the disadvantages of hierarchical information organization and outlined the advantages of a hypertext-based system. The proposal called for a simple protocol that could request information stored in remote systems through networks, and for a scheme by which information could be exchanged in a common format and documents of individuals could be linked by hyperlinks to other documents. It also proposed methods for reading text and graphics using the display technology at CERN at that time. The proposal essentially outlined a distributed hypertext system.

Mosaic and Netscape Browsers: The next significant event in the development of the Web was the arrival of Mosaic. In February of 1993, Marc Andreessen from the University of Illinois' NCSA (National Center for Supercomputing Applications) and his team released the first "Mosaic for X" graphical Web browser for UNIX. A few months later, different versions of Mosaic were released for the Macintosh and Windows operating systems. This was an important event. For the first time, a Web client, with a consistent and simple point-and-click graphical user interface, was implemented for the three most popular operating systems available at the time. It soon made big splashes outside the academic circle where it had begun. In mid-1994, Silicon Graphics founder Jim Clark collaborated with Marc Andreessen, and they founded the company Mosaic Communications (later renamed Netscape Communications). Within a few months, the Netscape browser was released to the public, which started the explosive growth of the Web. Internet Explorer from Microsoft entered the market in August 1995 and began to challenge Netscape. The creation of the World Wide Web by Tim Berners-Lee, followed by the release of the Mosaic browser, are often regarded as the two most significant contributing factors to the success and popularity of the Web.

Internet: The Web would not be possible without the Internet, which provides the communication network for the Web to function. The Internet started with the computer network ARPANET in the Cold War era. It was produced as the result of a project in the United States aimed at maintaining control over its missiles and bombers after a nuclear attack. It was supported by the Advanced Research Projects Agency (ARPA), which was part of the Department of Defense in the United States. The first ARPANET connections were made in 1969, and in 1972 it was demonstrated at the First International Conference on Computers and Communication, held in Washington D.C. At the conference, ARPA scientists linked computers together from 40 different locations.

In 1973, Vinton Cerf and Bob Kahn started to develop the protocol later to be called TCP/IP (Transmission Control Protocol/Internet Protocol). In the next year, they published the paper "Transmission Control Protocol," which marked the beginning of TCP/IP. This new protocol allowed diverse computer networks to interconnect and communicate with each other. In subsequent years, many networks were built, and many competing techniques and protocols were proposed and developed. However, ARPANET was still the backbone of the entire system. During this period, the network scene was chaotic. In 1982, TCP/IP was finally adopted, and the Internet, which is a connected set of networks using the TCP/IP protocol, was born.

Search Engines: With information being shared worldwide, there was a need for individuals to find information in an orderly and efficient manner. Thus began the development of search engines. The search system Excite was introduced in 1993 by six Stanford University students. EINet Galaxy was established in 1994 as part of the MCC Research Consortium at the University of Texas. Jerry Yang and David Filo created Yahoo! in 1994, which started out as a listing of their favorite Web sites and offered directory search. In subsequent years, many search systems emerged, e.g., Lycos, Infoseek, AltaVista, Inktomi, Ask Jeeves, Northern Light, etc. Google was launched in 1998 by Sergey Brin and Larry Page based on their research project at Stanford University. Microsoft started to commit to search in 2003 and launched the MSN search engine in spring 2005; it had used search engines from other companies before that. Yahoo! provided its own general search capability in 2004, after it purchased Inktomi in 2003.

W3C (The World Wide Web Consortium): W3C was formed in December 1994 by MIT and CERN as an international organization to lead the development of the Web. W3C's main objective was to promote standards for the evolution of the Web and interoperability between WWW products by producing specifications and reference software. The first International Conference on the World Wide Web (WWW) was also held in 1994 and has been a yearly event ever since. From 1995 to 2001, the growth of the Web boomed. Investors saw commercial opportunities and became involved. Numerous businesses started on the Web, which led to irrational developments. Finally, the bubble burst in 2001. However, the development of the Web did not stop; it has only become more rational since.

1.3 Web Data Mining

The rapid growth of the Web in the last decade makes it the largest publicly accessible data source in the world. The Web has many unique characteristics, which make mining useful information and knowledge a fascinating and challenging task. Let us review some of these characteristics.

1. The amount of data/information on the Web is huge and still growing. The coverage of the information is also very wide and diverse. One can find information on almost anything on the Web.

2. Data of all types exist on the Web, e.g., structured tables, semi-structured Web pages, unstructured texts, and multimedia files (images, audio, and video).

3. Information on the Web is heterogeneous. Due to the diverse authorship of Web pages, multiple pages may present the same or similar information using completely different words and/or formats. This makes integration of information from multiple pages a challenging problem.

4. A significant amount of information on the Web is linked. Hyperlinks exist among Web pages within a site and across different sites. Within a site, hyperlinks serve as information organization mechanisms. Across different sites, hyperlinks represent implicit conveyance of authority to the target pages. That is, pages that are linked (or pointed) to by many other pages are usually high-quality or authoritative pages, simply because many people trust them.

5. The information on the Web is noisy. The noise comes from two main sources. First, a typical Web page contains many pieces of information, e.g., the main content of the page, navigation links, advertisements, copyright notices, privacy policies, etc. For a particular application, only part of the information is useful; the rest is considered noise. To perform fine-grained Web information analysis and data mining, the noise should be removed. Second, because the Web does not have quality control of information, i.e., one can write almost anything that one likes, a large amount of information on the Web is of low quality, erroneous, or even misleading.

6. The Web is also about services. Most commercial Web sites allow people to perform useful operations at their sites, e.g., to purchase products, to pay bills, and to fill in forms.

7. The Web is dynamic. Information on the Web changes constantly. Keeping up with the change and monitoring the change are important issues for many applications.

8. The Web is a virtual society. The Web is not only about data, information and services, but also about interactions among people, organizations and automated systems. One can communicate with people anywhere in the world easily and instantly, and also express one's views on anything in Internet forums, blogs and review sites.

All these characteristics present both challenges and opportunities for mining and discovery of information and knowledge from the Web. In this book, we focus only on mining textual data. For mining of images, videos and audio, please refer to [143, 441].

To explore information mining on the Web, it is necessary to know data mining, which has been applied in many Web mining tasks. However, Web mining is not entirely an application of data mining. Due to the richness and diversity of information and other Web-specific characteristics discussed above, Web mining has developed many of its own algorithms.

1.3.1 What is Data Mining?

Data mining is also called knowledge discovery in databases (KDD). It is commonly defined as the process of discovering useful patterns or knowledge from data sources, e.g., databases, texts, images, the Web, etc. The patterns must be valid, potentially useful, and understandable. Data mining is a multi-disciplinary field involving machine learning, statistics, databases, artificial intelligence, information retrieval, and visualization.

There are many data mining tasks. Some of the common ones are supervised learning (or classification), unsupervised learning (or clustering), association rule mining, and sequential pattern mining. We will study all of them in this book.

A data mining application usually starts with an understanding of the application domain by data analysts (data miners), who then identify suitable data sources and the target data. With the data, data mining can be performed, which is usually carried out in three main steps:

Pre-processing: The raw data is usually not suitable for mining for various reasons. It may need to be cleaned in order to remove noise or abnormalities. The data may also be too large and/or involve many irrelevant attributes, which calls for data reduction through sampling and attribute selection. Details about data pre-processing can be found in any standard data mining textbook.

Data mining: The processed data is then fed to a data mining algorithm, which will produce patterns or knowledge.

Post-processing: In many applications, not all discovered patterns are useful. This step identifies the useful ones for the application. Various evaluation and visualization techniques are used to make the decision.

The whole process (also called the data mining process) is almost always iterative. It usually takes many rounds to achieve final satisfactory results, which are then incorporated into real-world operational tasks. Traditional data mining uses structured data stored in relational tables, spreadsheets, or flat files in tabular form. With the growth of the Web and text documents, Web mining and text mining are becoming increasingly important and popular. Web mining is the focus of this book.

1.3.2 What is Web Mining?

Web mining aims to discover useful information or knowledge from the Web hyperlink structure, page content, and usage data. Although Web mining uses many data mining techniques, as mentioned above it is not purely an application of traditional data mining due to the heterogeneity and semi-structured or unstructured nature of Web data. Many new mining tasks and algorithms were invented in the past decade. Based on the primary kinds of data used in the mining process, Web mining tasks can be categorized into three types: Web structure mining, Web content mining and Web usage mining.

Web structure mining: Web structure mining discovers useful knowledge from hyperlinks (or links for short), which represent the structure of the Web. For example, from the links, we can discover important Web pages, which, incidentally, is a key technology used in search engines. We can also discover communities of users who share common interests. Traditional data mining does not perform such tasks because there is usually no link structure in a relational table.

Web content mining: Web content mining extracts or mines useful information or knowledge from Web page contents. For example, we can automatically classify and cluster Web pages according to their topics. These tasks are similar to those in traditional data mining. However, we can also discover patterns in Web pages to extract useful data such as descriptions of products, postings of forums, etc., for many purposes. Furthermore, we can mine customer reviews and forum postings to discover consumer sentiments. These are not traditional data mining tasks.

Web usage mining: Web usage mining refers to the discovery of user access patterns from Web usage logs, which record every click made by each user. Web usage mining applies many data mining algorithms. One of the key issues in Web usage mining is the pre-processing of clickstream data in usage logs in order to produce the right data for mining (see the sketch at the end of this subsection).

In this book, we will study all three types of mining. However, due to the richness and diversity of information on the Web, there are a large number of Web mining tasks. We will not be able to cover them all. We will focus only on some important tasks and their algorithms.

The Web mining process is similar to the data mining process. The difference is usually in the data collection. In traditional data mining, the data is often already collected and stored in a data warehouse. For Web mining, data collection can be a substantial task, especially for Web structure and content mining, which involves crawling a large number of target Web pages. We will devote a whole chapter to crawling. Once the data is collected, we go through the same three-step process: data pre-processing, Web data mining and post-processing. However, the techniques used for each step can be quite different from those used in traditional data mining.
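To make the clickstream pre-processing step more concrete, below is a minimal sketch (not taken from the book) that groups raw clicks into per-user sessions using a simple idle-timeout rule. The record layout, field names and the 30-minute threshold are assumptions chosen for illustration only.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical minimal click record: (user_id, timestamp, url). A real server
# log would first need parsing, robot filtering and page-view identification.
def sessionize(clicks, gap=timedelta(minutes=30)):
    """Split each user's clicks into sessions whenever the idle time exceeds `gap`."""
    by_user = defaultdict(list)
    for user, ts, url in sorted(clicks, key=lambda c: (c[0], c[1])):
        by_user[user].append((ts, url))

    sessions = []
    for user, events in by_user.items():
        current = [events[0]]
        for prev, curr in zip(events, events[1:]):
            if curr[0] - prev[0] > gap:
                # Idle gap too long: close the current session and start a new one.
                sessions.append((user, [url for _, url in current]))
                current = []
            current.append(curr)
        sessions.append((user, [url for _, url in current]))
    return sessions

# Example: user "u1" produces two sessions because of the long idle gap.
log = [
    ("u1", datetime(2007, 1, 1, 10, 0), "/home"),
    ("u1", datetime(2007, 1, 1, 10, 5), "/products"),
    ("u1", datetime(2007, 1, 1, 14, 0), "/home"),
]
print(sessionize(log))  # [('u1', ['/home', '/products']), ('u1', ['/home'])]
```

The per-session URL sequences produced this way are the kind of input that usage-mining and sequential pattern algorithms actually consume.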

2 Association Rules and Sequential Patterns


Association rules are an important class of regularities in data. Mining of association rules is a fundamental data mining task. It is perhaps the most important model invented and extensively studied by the database and data mining community. Its objective is to find all co-occurrence relationships, called associations, among data items. Since it was first introduced in 1993 by Agrawal et al. [9], it has attracted a great deal of attention. Many efficient algorithms, extensions and applications have been reported. The classic application of association rule mining is market basket data analysis, which aims to discover how items purchased by customers in a supermarket (or a store) are associated. An example association rule is

Cheese → Beer  [support = 10%, confidence = 80%].

The rule says that 10% of customers buy Cheese and Beer together, and those who buy Cheese also buy Beer 80% of the time. Support and confidence are two measures of rule strength, which we will define later.

This mining model is in fact very general and can be used in many applications. For example, in the context of the Web and text documents, it can be used to find word co-occurrence relationships and Web usage patterns, as we will see in later chapters.

Association rule mining, however, does not consider the sequence in which the items are purchased. Sequential pattern mining takes care of that. An example of a sequential pattern is: 5% of customers buy bed first, then mattress, and then pillows. The items are not purchased at the same time, but one after another. Such patterns are useful in Web usage mining for analyzing clickstreams in server logs. They are also useful for finding language or linguistic patterns in natural language texts.
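As a rough illustration of how such rule-strength measures are computed from raw transactions, the sketch below evaluates the support and confidence of a Cheese → Beer rule on a tiny, made-up transaction set (the data and helper names are invented for this example, not taken from the book).

```python
# Toy transaction database: each transaction is the set of items bought together.
transactions = [
    {"Cheese", "Beer", "Bread"},
    {"Cheese", "Beer"},
    {"Cheese", "Milk"},
    {"Beer", "Diaper"},
    {"Bread", "Milk"},
    {"Cheese", "Beer", "Diaper"},
    {"Milk"},
    {"Cheese"},
    {"Beer", "Bread"},
    {"Cheese", "Beer", "Milk"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(lhs, rhs, transactions):
    """Of the transactions containing `lhs`, the fraction that also contain `rhs`."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)

print(support({"Cheese", "Beer"}, transactions))       # 0.4 in this toy data
print(confidence({"Cheese"}, {"Beer"}, transactions))  # 4/6 ≈ 0.67 here
```

In practice one does not evaluate candidate rules by hand like this; algorithms such as Apriori prune the space of itemsets using a minimum support threshold before rules are generated.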

2.1 Basic Concepts of Association Rules


The problem of mining association rules can be stated as follows: Let I = {i1, i2, …, im} be a set of items. Let T = (t1, t2, …, tn) be a set of transactions (the database), where each transaction ti is a set of items such that ti ⊆ I. An association rule is an implication of the form

X → Y, where X ⊂ I, Y ⊂ I, and X ∩ Y = ∅.
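The chapter goes on to define the strength of such a rule formally; as a forward reference, the standard definitions over the n transactions in T can be written as follows (standard notation, not a quotation from the book).

```latex
% Standard support and confidence of a rule X -> Y over the transaction set T, |T| = n.
\[
\mathrm{support}(X \rightarrow Y) =
  \frac{\left|\{\, t_i \in T : X \cup Y \subseteq t_i \,\}\right|}{n},
\qquad
\mathrm{confidence}(X \rightarrow Y) =
  \frac{\left|\{\, t_i \in T : X \cup Y \subseteq t_i \,\}\right|}
       {\left|\{\, t_i \in T : X \subseteq t_i \,\}\right|}.
\]
```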
