The document is an introduction to the book 'Understanding Search Engines' by Dirk Lewandowski, which provides a comprehensive overview of search engines from various perspectives including technology, usage, and societal significance. It aims to reflect the complexity of web search and is based on a previous German edition. The book is structured into multiple chapters that cover topics such as web searching methods, content processing, user interaction, and ranking of search results.

Understanding Search Engines

Dirk Lewandowski
Department of Information
Hamburg University of Applied Sciences
Hamburg, Germany

ISBN 978-3-031-22788-2
ISBN 978-3-031-22789-9 (eBook)
https://doi.org/10.1007/978-3-031-22789-9

Translation from the German language edition: “Suchmaschinen verstehen” by Dirk Lewandowski,
© Springer 2021. Published by Springer Vieweg, Berlin, Heidelberg. All Rights Reserved.

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland
AG 2023
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the
material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface

This book aims to give a comprehensive introduction to search engines. What sets
it apart is that it looks at the subject from different angles. These
are, in particular, technology, use, Internet-based research, economics, and societal
significance. In this way, I want to reflect the complexity of search engines and
Web search as a whole. I am convinced that only such a comprehensive view does
justice to the topic and enables a real understanding.
A German-language version of this book has been available for several years.
This English edition follows the third German edition of 2021. I am pleased that the
publisher has made this international edition possible.
In this translation, care has been taken to adapt to the international context where
necessary. For many examples, it does not matter in which country a
search was carried out or a screenshot was taken. The references cited in
the text, however, were adapted where English-language sources were available. The further
reading sections at the end of the chapters have also been adapted.
I would like to thank all those who have asked and encouraged me over the years
to produce an English edition. So, here it is, and I hope it will be as valuable to many
readers as the German editions have been.

Dirk Lewandowski
Hamburg, Germany, September 2022
Contents

1 Introduction
1.1 The Importance of Search Engines
1.2 A Book About Google?
1.3 Objective of This Book
1.4 Talking About Search Engines
1.5 Structure of This Book
1.6 Structure of the Chapters and Markings in the Text
1.7 Summary
References

2 Ways of Searching the Web
2.1 Searching for a Website vs. Searching for Information on a Topic
2.2 What Is a Document?
2.3 Where Do People Search?
2.4 Different Pathways to Information on the World Wide Web
2.4.1 Search Engines
2.4.2 Vertical Search Engines
2.4.3 Metasearch Engines
2.4.4 Web Directories
2.4.5 Social Bookmarking Sites
2.4.6 Question-Answering Sites
2.4.7 Social Networking Sites
2.5 Summary
References

3 How Search Engines Capture and Process Content from the Web
3.1 The World Wide Web and How Search Engines Acquire Its Contents
3.2 Content Acquisition
3.3 Web Crawling: Finding Documents on the Web
3.3.1 Guiding and Excluding Search Engines
3.3.2 Content Exclusion by Search Engine Providers
3.3.3 Building the Database and Crawling for Vertical Collections
3.4 The Indexer: Preprocessing Documents for Searching
3.4.1 Indexing Images, Audio, and Video Files
3.4.2 The Representation of Web Documents in Search Engines
3.5 The Searcher: Understanding Queries
3.6 Summary
References

4 User Interaction with Search Engines
4.1 The Search Process
4.2 Collecting Usage Data
4.3 Query Types
4.4 Sessions
4.5 Queries
4.5.1 Entering Queries
4.5.2 Autocomplete Suggestions
4.5.3 Query Formulation
4.5.4 Query Length
4.5.5 Distribution of Queries by Frequency
4.5.6 Query Trends
4.5.7 Using Operators and Commands for Specific Searches
4.6 Search Topics
4.7 Summary
References

5 Ranking Search Results
5.1 Groups of Ranking Factors
5.2 Text Statistics
5.2.1 Identifying Potentially Relevant Documents
5.2.2 Calculating Frequencies
5.2.3 Considering the Structural Elements of Documents
5.3 Popularity
5.3.1 Link-Based Rankings
5.3.2 Usage Statistics
5.4 Freshness
5.5 Locality
5.6 Personalization
5.7 Technical Ranking Factors
5.8 Ranking and Spam
5.9 Summary
References

6 Vertical Search
6.1 Vertical Search Engines as the Basis of Universal Search
6.2 Types of Vertical Search Engines
6.3 Collections
6.3.1 News
6.3.2 Scholarly Content
6.3.3 Images
6.3.4 Videos
6.4 Integrating Vertical Search Engines into Universal Search
6.5 Summary
References

7 Search Result Presentation
7.1 The Influence of Device Types and Screen Resolutions
7.2 The Structure of Search Engine Result Pages
7.3 Elements on Search Engine Result Pages
7.3.1 Organic Results
7.3.2 Advertising
7.3.3 Universal Search Results
7.3.4 Knowledge Graph Results
7.3.5 Direct Answers
7.3.6 Integration of Transactions
7.3.7 Navigation Elements
7.3.8 Support for Query Modification
7.3.9 Search Options on the Result Page
7.4 The Structure of Snippets
7.5 Options Related to Single Results
7.6 Selection of Suitable Results
7.7 Summary
References

8 The Search Engine Market
8.1 Search Engines’ Business Model
8.2 The Importance of Search Engines for Online Advertising
8.3 Search Engine Market Shares
8.4 Important Search Engines
8.5 Partnerships in the Search Engine Market
8.6 Summary
References

9 Search Engine Optimization (SEO)
9.1 The Importance of Search Engine Optimization
9.2 Fundamentals of Search Engine Optimization
9.2.1 Content
9.2.2 Architecture
9.2.3 HTML
9.2.4 Trust
9.2.5 Links
9.2.6 User-Related Factors
9.2.7 Toxins
9.2.8 Vertical Search Engines
9.3 Search Engine Optimization and Spam
9.4 The Role of Ranking Updates
9.5 Search Engine Optimization for Special Collections
9.6 The Position of Search Engine Providers
9.7 Summary
References

10 Search Engine Advertising (SEA)
10.1 Specifics of Search Engine Advertising
10.2 Functionality and Ranking
10.3 Distinguishing Between Ads and Organic Results
10.4 Advertising in Universal Search Results
10.5 Summary
References

11 Alternatives to Google
11.1 Overlap Between Results from Different Search Engines
11.2 Why Should One Use a Search Engine Other Than Google?
11.2.1 Obtaining a “Second Opinion”
11.2.2 More or Additional Results
11.2.3 Different Results
11.2.4 Better Results
11.2.5 Different Result Presentation
11.2.6 Different User Guidance
11.2.7 Avoiding the Creation of User Profiles
11.2.8 Alternative Search Options
11.3 When Should One Use a Search Engine Other Than Google?
11.4 Particularities of Google due to Its Market Dominance
11.5 Summary
References

12 Search Skills
12.1 Source Selection
12.2 Selecting the Right Keywords
12.3 Boolean Operators
12.4 Connecting Queries with Boolean Operators
12.5 Advanced Search Forms
12.6 Commands
12.7 Complex Searches
12.8 Summary
References

13 Search Result Quality
13.1 Criteria for Evaluating Texts on the Web
13.2 Human vs. Machine Inspection of Quality
13.3 Scientific Evaluation of Search Result Quality
13.3.1 Standard Test Design of Retrieval Effectiveness Studies
13.3.2 Measuring Retrieval Effectiveness Using Click-Through Data
13.3.3 Evaluation in Interactive Information Retrieval
13.4 Summary
References

14 The Deep Web
14.1 The Content of the Deep Web
14.2 Sources vs. Content from Sources, Accessibility of Content via the Web
14.3 The Size of the Deep Web
14.4 Areas of the Deep Web
14.5 Social Media as Deep Web Content
14.6 What Role Does the Deep Web Play Today?
14.7 Summary
References

15 Search Engines Between Bias and Neutrality
15.1 The Interests of Search Engine Providers
15.2 Search Engine Bias
15.3 The Effect of Search Engine Bias on Search Results
15.4 Interest-Driven Presentation of Search Results
15.5 What Would “Search Neutrality” Mean?
15.6 Summary
References

16 The Future of Search
16.1 Search as a Basic Technology
16.2 Changes in Queries and Documents
16.3 Better Understanding of Documents and Queries
16.4 The Economic Future of Search Engines
16.5 The Societal Future of Search Engines
16.6 Summary
References

Glossary
Index

1 Introduction

This book is about better understanding the search tools we use daily. Only when we
have a basic understanding of how search engines are constructed and how they
work can we use them effectively in our research.
However, not only the use of existing search engines is relevant here but also
what we can learn from well-known search engines like Google when we want to
build our own search systems. The starting point is that Web search engines are
currently the leading systems in terms of technology, setting the standards in terms
of both the search process and user behavior. Therefore, if we want to build our own
search systems, we must comply with the habits shaped by Web search engines,
whether we like it or not.
This book attempts to cover the subject of search engines comprehensively by
looking at it from different angles:

1. Technology: First of all, search engines are technical systems. This involves the
gathering of the Web’s content as well as ranking and presenting the search
results.
2. Use: Search engines are not only shaped by their developers but also by their
users. Since the data generated during use is incorporated into the ranking of the
search results and the design of the user interface, usage significantly influences
how search engines are designed.
3. Web-based research: Although, in most cases, search engines are used in a
relatively simple way – and often not much more is needed for a successful
search – search engines are also tools for professional information research. The
fact that search engines are easy to use for everyone does not mean that every
search task can be easily solved using them.
4. Economy: Search engines are of great importance for content producers who
want to get their content on the market. Because they are central nodes in the
Web, they also play an important economic role. Here, the main focus is on search
engine visibility, which can be achieved through various online marketing
measures (such as search engine optimization and placing advertisements).

5. Society: Since search engines are the preferred means of searching for information
and are used massively every day, they also have an enormous significance
for knowledge acquisition in society. Among other things, this raises the question
of whether search results are credible and whether search engines play a role in
spreading misinformation and disinformation, often treated under the label of
“fake news.”

My fundamental thesis is that one is impossible without the other: we cannot
understand search engines as technical systems if we do not know their social
significance. Nor can we understand their social impact if we do not know the
underlying technology. One does not need the same level of detailed
knowledge in all areas, but one should have a solid basis in each.
An introductory book cannot, of course, cover the topics mentioned comprehensively.
My aim is instead to introduce the concepts central to discussing search
engines and provide the basic knowledge that makes a well-founded discussion of
search engines possible in the first place.

1.1 The Importance of Search Engines

In this book, I argue that search engines have an enormous social significance. This
can be explained, on the one hand, by their mass use and, on the other hand, by the
ranking and presentation of search results.
Search engines (like other services on the Internet) are used en masse. Their
importance lies in the fact that we use them to actively search for information. Every
time we enter a query, we reveal our interests. With every search engine result page
(SERP) that a search engine returns to us, there is a (technically mediated) interpretation
of both the query and the potentially relevant results. By performing these
interpretations in a particular way, a search engine conveys a specific impression of
the world of information found on the Web.
For every query, there is a result page that displays the results in a specific order.
Although, in theory, we can select from all these results, we rely heavily on the order
given by the search engine. De facto, we do not select from the possibly millions of
results found but only from the few displayed first.
If we consider this, societal questions arise, such as how diverse the search engine
market is: Is it okay to use only one search engine and have only one of many
possible views of the information universe for each query?
The importance of search engines has already been put into punchy titles such as
“Search Engine Society” (Halavais, 2018), “Society of the Query” (the title of a
conference series and a book; König & Rasch, 2014), and “The Googlization of
Everything” (Vaidhyanathan, 2011). Perhaps it is not necessary to go so far as to
proclaim Google, search engines, or queries as the determining factor of our society;
however, the enormous importance of search engines for our knowledge acquisition
can no longer be denied.
If we look at the hard numbers, we see that search engines are the most popular
service on the Internet. We regard the Internet as a collection of protocols and
services, including e-mail, chat, and the File Transfer Protocol (FTP). It may seem
surprising that the use of search engines is at the top of the list when users are asked
about their activities on the Internet. Search engines are even more popular than
writing and reading e-mails. For instance, 76% of all Germans use a search engine at
least once a week, but “only” 65% read or write at least one e-mail during this time.
This data comes from the ARD/ZDF-Onlinestudie (Beisch & Schäfer, 2020), which
surveys the use of the Internet among the German population every year. Comparable
studies confirm the high frequency of search engine use: the Eurobarometer
study (European Commission, 2016) shows that 85% of all Internet users in
Germany use a search engine at least once a week; the figure for daily use is still
48%. Germany is below the averages of the EU countries (88% and 57%,
respectively).
Let’s look at the ARD/ZDF-Onlinestudie to see which other Internet services are
used particularly often. We find that, in addition to e-mail and search engines,
messengers (probably WhatsApp in particular) are the most popular. On the other
hand, social media services only reach 36%.
A second perspective is the list of the most popular websites (Alexa.com, 2021).
Google holds first and third place (google.com and google.de), YouTube is second,
and Amazon and eBay follow. It is striking not only that Google is in first place
but also that Amazon and eBay, two major e-commerce companies, not only offer
numerous opportunities for browsing but also play a major role in (product) searches.
The fact that search engines are a mass phenomenon can also be seen in the
number of daily queries. Market research companies estimate the number of queries
sent to Google alone at around 3.3 trillion in 2016 (Internet Live Stats & Statistic
Brain Research Institute, 2017) – that's an average of more than 100,000 queries per second!
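The per-second rate follows from the annual estimate by simple division. The short Python sketch below redoes that arithmetic. It assumes, purely for illustration, that queries are spread evenly over the 366 days of the leap year 2016, which real traffic is not; the constant and function names are made up for this example.

```python
# Back-of-the-envelope conversion of the annual query estimate cited above.
# Assumption: queries are spread evenly over the year (real traffic is not).

QUERIES_2016 = 3.3e12            # estimate cited in the text (3.3 trillion)
DAYS_IN_2016 = 366               # 2016 was a leap year
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400


def average_rates(total_queries, days=DAYS_IN_2016):
    """Return (queries per day, queries per second) as plain averages."""
    per_day = total_queries / days
    per_second = per_day / SECONDS_PER_DAY
    return per_day, per_second


per_day, per_second = average_rates(QUERIES_2016)
print(f"{per_day:,.0f} queries per day on average")
print(f"{per_second:,.0f} queries per second on average")
```

Running it gives the average daily and per-second volume implied by the cited annual estimate; peak load at busy times of day would be higher than this average.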
An additional level of consideration arises when we look at how users access
information on the World Wide Web. While there are theoretically many access
points to information on the Web, search engines are the most prevalent. Web
pages can, of course, be accessed directly by typing the address (Uniform
Resource Locator; URL) into the browser bar. There are also other services, such as
social media services, which lead us to websites. But none of these services has
achieved a level of importance comparable to that of search engines for accessing
information on the Web, nor is this situation likely to change in the foreseeable
future.
Last but not least, search engines are also significant because of the online
advertising market. The sale of ads in search engines (ads in response to a query)
accounts for 40% of the market (Zenith, 2021); in Germany alone, search engine
advertising generated sales of 4.1 billion euros in 2019 (Statista, 2021).
This form of advertising is particularly attractive because, with each search query,
users reveal what they want to find and thus also whether and what they might want
to buy. This makes it easy for advertisers to decide when they want to offer their
product to a user. Scatter losses, i.e., the proportion of users who see an
advertisement but have no interest in it at that moment, can be significantly reduced
or even avoided altogether in this way.
Search engine providers, like other companies, have to earn money. So far, the
only model search engines have used to make money is the insertion of advertising
in the form of text ads around the search results. Other revenue models have not
caught on. This also means that search engine providers do not, as is often claimed,
align their search engines solely with the demands and needs of users but also with
their own profit intentions and those of their advertising customers.
For companies, however, search engines matter not only as an advertising
platform but also because users can find them in the organic search results. The
procedures that serve to increase the probability of being found are subsumed
under the title of search engine optimization.
Already at this point, we see that if we consider search engines not only as
technical systems but also as socially relevant, we are dealing with at least four
stakeholder groups or actor groups (see Röhle, 2010, p. 14):

1. Search engine providers: On the one hand, search engine providers are interested
in satisfying their users. This involves both the quality of the search results and
the user experience. On the other hand, search engine providers’ second major
(or even more significant?) interest is to offer their advertisers an attractive
environment and earn as much money as possible from advertising.
2. Users: The users’ interest is to obtain satisfactory search results with little effort
and not to be disturbed too much in their search process, for example, by intrusive
advertising.
3. Content producers: Anyone who offers content on the Web also wants to be found
by (potential) users. However, another interest of many content producers is to
earn money with their content. This, in turn, means that it is not necessarily in
their interest to make their content fully available to search engines.
4. Search engine optimizers: Search engine optimizers work on behalf of content
producers to ensure that their offerings can be found on the Web, primarily in
search engines. Their knowledge of the search engines’ ranking procedures and
their exploitation of these procedures to place “their” websites influence the
search engine providers, who attempt to protect themselves against manipulation.

This brief explanation of the stakeholders already shows that this interplay can
lead to conflicts. Search engine providers have to balance the interests of their users
and their advertisers; search engine optimizers have to ensure the maximum
visibility of their clients’ offerings but must not exploit their knowledge of how search
engines work to such an extent that they are penalized by search engine providers for
manipulation.
Clearly, we are dealing with complex interactions in the search engine market.
Only if we look at search engines from different perspectives are we able to classify
these interactions and understand why search engines are designed the way they are.
Search engines have to meet the needs of different user groups; it is not enough for
them to restrict their services to one of these groups.
When we talk about search engines and their importance for information access,
we usually only consider the content initially produced for the Web. However,
search engines have been trying to include content from the “real,” i.e., the physical
world, in their search systems for years. Vaidhyanathan (2011) distinguishes three
types of content that search engines like Google capture:

1. Scan and link: External content is captured, aggregated, and made available for
search (e.g., Web search).
2. Host and serve: Users’ content is collected and hosted on their own platform (e.g.,
YouTube).
3. Scan and serve: Things from the real world are transferred into the digital world
by the search engine provider (e.g., Google Books, Google Street View).

Vaidhyanathan (2011) summarizes this under “The Googlization of Everything”
(which is also the title of his book) and thus illustrates not only that search engine
content goes beyond the content of the Web (even if this continues to form the basis)
but also that we are still at the beginning when it comes to the development of search
engines: So far, only a small part of all the information that is of interest to search
engines has been digitized and thus made available for search.
Furthermore, there is a second, largely taken-for-granted assumption, namely,
that a search process must necessarily contain a query entered by the user. However,
we see that search engines can increasingly generate queries by themselves by
observing the behavior of a user and then offering information that is very likely
to be useful to them. For example, suppose a user is walking through a city with their
smartphone in their pocket. In that case, it is easy to predict their desire for a meal
option at lunchtime and suggest a restaurant based on that user’s known past
preferences and current location. To do this, a query (made up of the above
information) is required, but the user does not have to enter it themselves. We will
return to this in Chap. 4.
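The lunchtime example can be sketched in a few lines of Python. This is only an illustration of the idea that a query is assembled from context rather than typed. The `UserContext` fields, the lunchtime window, and the shape of the returned query are all assumptions made for this sketch and do not describe how any real search engine works.

```python
from dataclasses import dataclass
from datetime import time
from typing import List, Optional


@dataclass
class UserContext:
    """Hypothetical signals a system might hold about a user."""
    local_time: time
    latitude: float
    longitude: float
    liked_cuisines: List[str]


# Assumed lunchtime window; chosen only for this illustration.
LUNCH_START = time(11, 30)
LUNCH_END = time(14, 0)


def implicit_query(ctx: UserContext) -> Optional[dict]:
    """Build a query from context alone, or return None if nothing applies."""
    if not (LUNCH_START <= ctx.local_time <= LUNCH_END):
        return None
    return {
        "terms": ["restaurant"] + ctx.liked_cuisines,  # known past preferences
        "near": (ctx.latitude, ctx.longitude),         # current location
        "intent": "lunch",
    }


# The user never types anything; the query is derived from the context.
walker = UserContext(time(12, 15), 53.55, 10.0, ["ramen"])
print(implicit_query(walker))
```

The point of the sketch is that a query still exists in such a system; it is simply composed of time, location, and past preferences instead of typed keywords.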

1.2 A Book About Google?

When we think of search engines, we primarily think of Google. We all use this
search engine almost daily, usually for all kinds of search purposes. Here, again, the
figures speak for themselves: in Germany, well over 90% of all queries to general
search engines are directed to Google, while other search engines play only a minor
role (Statcounter, 2021).
Therefore, this book is based on everyday experience with Google and tries to
explain the structure and use of search engines using this well-known example.
Nevertheless, this book aims to go further: to show which alternatives to Google
there are and when it is worthwhile to use them. But this book will not describe all
possible search engines; it is rather about introducing other search engines, utilizing
membrane, which are pressed together when the parts are at rest, are stiffened
by chitinous thickenings.
roach. U, uterus; s, spermatheca. The nerve-cord is introduced into both
figures.
Fig. 97. Hinder end of abdomen of female Cockroach. In the upper figure the
halves of the 7th sternum are closed; in the lower figure they are open.
If the succeeding sterna retained their proper place, as they do in some
Orthoptera (e.g., the Mole Cricket), the 8th and 9th sterna would project
beyond the 7th, while the rectum would open beneath the last tergum, and the
uterus between the 8th and 9th sterna. In the adult female Cockroach, however,
the 8th and 9th somites are telescoped into the 7th, and completely hidden by
it. Their terga are reduced to narrow bands. The 8th sternum forms a
semi-transparent plate which slopes downwards and backwards, and is pierced by
a vertical slit, the outlet of the uterus. The upper edge of this sternum is
hinged upon the projecting basis of the anterior gonapophyses (to be described
immediately), and the parts form a kind of spring joint, ordinarily
closed, but capable of being opened wide upon occasion. The 9th
sternum is a small median crescentic plate, distinct from the 8th; it
supports the spermatheca, whose duct traverses an oval plate which
projects from the fore-edge of the sternum.
By the telescoping of the 8th and 9th somites the sterna take the
position shown in fig. 96B, and a new cavity, the genital pouch, is
formed by invagination. This receives the extremity of the body of
the male during copulation, while it
serves as a mould in which the egg-
capsule is cast during oviposition. Its
chitinous lining resembles that of the
outer integument. The uterus opens
into its anterior end, which is
bounded by the 8th sternum; the
spermatheca opens into its roof,
which is supported by the 9th
sternum and the gonapophyses;
while its floor is completed by the
7th sternum and the infolded
chitinous membrane.
Fig. 98. External Reproductive Organs of Female. T8, &c., terga; S7, &c.,
sterna; G, anterior gonapophysis; G′, its base; g, posterior gonapophyses; Od,
oviduct; sp, spermatheca; R, rectum. The upper figure shows the parts in
oblique profile; the left lower figure is an oblique view from before of the
outlet of the uterus, the anterior gonapophyses being cut short; the right
lower figure shows the gonapophyses. Arrows indicate the outlet of the oviduct
and uterus.
A pair of appendages (anterior gonapophyses) are shown by the development of
the parts to belong to the 8th somite. They are slender, irregularly bent, and
curved inwards at the tips. A small, forked, chitinous slip connects them with
both the 8th and 9th terga, but their principal attachment is to the upper
(properly, posterior) edge of the 8th sternum. The anterior gonapophyses expand
at their bases into broad horizontal plates, which form part of the roof of the
genital pouch.
Two pairs of appendages, belonging to the 9th somite, form the posterior
gonapophyses. The outer pair are relatively large, soft, and curved: the inner
narrow, hard, and straight. 167 The anterior gonapophyses form the lower, and
the posterior the upper
jaw of a forceps, which in many
Insects can be protruded beyond the body. Some of the parts are
often armed with teeth, and the primary use of the apparatus is to
bore holes in earth or wood for the reception of the eggs. Hence the
apparatus is often called the ovipositor. It forms a prominent
appendage of the abdomen in such Insects as Crickets, Saw-flies,
Sirex, and Ichneumons. The sting of the Bee is a peculiar adaptation
of the same organ to a very different purpose. In the Cockroach the
ovipositor is used to grasp the egg-capsule, while it is being formed,
filled with eggs, and hardened; and the notched edge (fig. 5, p. 23)
is the imprint of the inner posterior gonapophyses, made while the
capsule is still soft. The shape of the parts in the male and female
indicates that the ovipositor is passive in copulation, and is then
raised to allow access to the spermatheca.

Male Reproductive Organs.


The male reproductive organs of Insects, in spite of very great
superficial diversity, are reducible to a common type, which is
exemplified by certain Coleoptera. The essential parts are (1) the
testes, which in their simplest form are paired, convoluted tubes;
more commonly they branch into many tubules or vesiculæ, while
they may become consolidated into a single organ; (2) long coiled
vasa deferentia, opening into or close to (3) paired vesiculæ
seminales, which discharge into (4) the ejaculatory duct, a muscular
tube, with chitinous lining, by which the spermatozoa are forcibly
expelled. Opening into the vesiculæ seminales, the ejaculatory duct,
or by a distinct external orifice, may be found (5) accessory glands,
very variable in form, size, and number. More than one set may
occur in the same Insect. To these parts, which are rarely deficient,
are very often appended an external armature of hooks or claspers.
Fig. 99. 1. Male Organs, ventral view. Ts, testis; VD, vas deferens; DE, ductus
ejaculatorius; U, utriculi majores; u, utriculi breviores. 2. Do., dorsal view,
showing termination of vasa deferentia. 3. Conglobate gland, and its duct. × 8.
The male Cockroach will be found to agree with this description. It presents,
however, two peculiarities which are uncommon, though not unparalleled. In the
first place the testes are functional only in the young male. They subsequently
shrivel, and are functionally replaced by the vesiculæ seminales and their
appendages, where the later transformations of the sperm-cells are effected.
The atrophied testes are nevertheless sufficiently large in the adult to be
easily made out. Secondly, the accessory glands are numerous, and differ both
in function and insertion. Two sets are attached to the vesiculæ seminales, and
the fore end of the ejaculatory duct (utriculi majores and breviores); another
large conglobate gland opens separately to the exterior. We shall now describe
the
structure of these parts in more detail. 168
The testes may be found in older larvæ or adults beneath the fifth
and sixth terga of the abdomen. They lie in the fat-body, from which
they are not very readily distinguished. Each testis consists of 30–40
rounded vesicles attached by short tubes to the vas deferens. 169 The
wall of the testis consists of a peritoneal layer and an epithelium,
which is folded inwards along transverse lines. The cells of the
epithelium give rise to spermatocysts, 170 which enclose sperm cells.
By division of the nuclei of the sperm cells spermatozoa are formed,
which have at first nucleated heads
and long tails. Subsequently the
enlarged heads disappear. The
spermatozoa move actively. In adult
males the testes undergo atrophy,
but can with care be discovered in
the enveloping fat-body.
Fig. 100. Male Organs, side view. T7, seventh tergum; S7, seventh sternum; Ts,
DE, as before. A, B, see fig. 102. × 8.
The vasa deferentia are about ·25 inch in length. They pass backwards from the
testes, then turn downwards on each side of the large intestine, and finally
curve upwards and forwards, entering the vesiculæ seminales on their dorsal
side. Each
vas deferens divides once or twice
into branches, which immediately reunite; in the last larval stage the
termination of the passage dilates into a rounded, transparent
vesicle.
The vesiculæ seminales are simple, rounded lobes in the pupa
(fig. 101), but their appearance is greatly altered in the adult by the
development of two sets of utricles (modified accessory glands). The
longer utricles (utriculi majores) open separately into the sides of the
vesiculæ; nearer to the middle line are the shorter and more
numerous utriculi breviores, which open into the fore part of the
vesiculæ.
The utricles form the “mushroom-shaped gland” of Huxley, which
was long described as the testis. In the adult male the utricles are
usually distended with spermatozoa, and of a brilliant opaque white.
The ejaculatory duct is about ·15 inch long, and overlies the 6th-9th
sterna. It is wide in front, where it receives the paired outlets of the
vesiculæ seminales. Further back it narrows, and widens again near
to its outlet, which we find to be between the external chitinous
parts, and not into the penis, as described by Brehm. The duct
possesses a muscular wall for the forcible ejection of its contents,
and in accordance with its origin as a folding-in of the outer surface,
it is provided with a chitinous lining. In the adult the fore part of the
duct may be distended with spermatozoa.
The ejaculatory duct is originally double (p. 194), and its internal
cavity is still subdivided in the last larval stage or so-called “pupa.”
Upon the ventral surface of the ejaculatory duct lies an accessory
gland of unknown function; it is “composed of dichotomous,
monilated tubes, lined by a columnar epithelium, all bound together
by a common investment into a flattened, elongated mass.” 171 The
duct of this gland does not enter the penis, as described by Brehm,
but opens upon a double hook, which forms part of the external
genital armature (fig. 99, 3). It may be convenient to distinguish this
as the “conglobate gland.” 172
Fig. 101. Vesiculæ Seminales and Ductus Ejaculatorius of Pupa. VD, vas
deferens. × 28.
Fig. 102. External Male Organs, separated. The lettering agrees with Brehm’s
figures. A, titillator; B, penis; C-F, hooks and plates. × 8.

The external reproductive organs of the male Cockroach are
concealed within the 9th sternum. The so-called penis (fig. 102) is
long, slender, and dilated at the end. It is not perforated, and we do
not understand its use, though it probably conveys the semen.
The “titillator” (Brunner von Wattenwyl) is a solid curved hook with a
hollow base. Besides these, are several odd-shaped, unsymmetrical
pieces (fig. 102, C, D, E, F), moved by special muscles. A pair of
styles (see figs. 32–3 and 103) project from the hinder edge of the
9th sternum. These paired and unpaired appendages are believed to
open the genital pouch of the female, but we do not understand
their action in detail. 173
Fig. 103. The Tenth Tergum reflected to show the external male organs in situ.
T10, tenth tergum; p, podical plates; A-F, as in fig. 102; S, sub-anal styles.
× 8.
Brehm observes that the male reproductive organs of the Cockroach are most
nearly paralleled by those of the Mantidæ. A free penis occurs in all
Orthoptera, except Acridiidæ and Phasmidæ.
The male organs of the House Cricket will be found much easier to understand
than those of the Cockroach. The testes are of irregular, oval figure, the vasa
deferentia very long, tortuous, and enlarged towards the middle of their
length. The vesiculæ seminales bear many utriculi majores et breviores. The
penis is of simple form, and dilated at the end. The titillator is broad, but
produced into a slender
prong, which projects beyond the penis. A pair of subanal styles is
found, but the unpaired hooklets are wanting or very inconspicuous.
Very little is known about the act of copulation among Cockroaches,
and the opportunities of observation are few. The following account
is given by Cornelius (loc. cit., p. 22):—
“The male and female Cockroaches associate in pairs, the
females being generally quiet. The male, on the contrary,
bustles about the female, runs round her, trailing his
extended abdomen on the ground, and now and then raises
his wings. If the female moves away, the male stops the road.
At last, when the female has become perfectly still, the male
goes in front of her, brings the end of his abdomen towards
her, then moves backwards, and pushes his whole length
under the female. The operation is so rapid that it is
impossible to give an exact account of the circumstances.
Then the male creeps out from beneath the female, raises
high both pairs of wings, depresses them again, and goes off,
while the female usually remains quiet for some time.”
CHAPTER X.
Development.

SPECIAL REFERENCES.
Rathke. Zur Entwickelungsgesch. der Blatta germanica. Meckel’s Arch. of
Anat. u. Phys., Bd. VI. (1832).
Balfour. Comparative Embryology, 2 vols. (1880–1).
Graber. Insekten, Vol. II. (1879).
Lubbock. Origin and Metamorphoses of Insects (1874).
Kowalewsky. Embryol. Studien an Würmern u. Arthropoden. Mém. Ac.
Petersb. Sér. VII., Vol. XVI. (1871).
Weismann. Entw. der Dipteren. Zeits. f. wiss. Zool., Bde. XIII., XIV. (1863–
4).
Metschnikoff. Embryol. Studien. an Insecten. Ib., Bd. XVI. (1866).
Bütschli. Entwicklungsgeschichte der Biene. Ib., Bd. XX. (1870).
Bobretzky. Bildung d. Blastoderms u. d. Keimblätter bei den Insecten. Ib.,
Bd. XXXI. (1878).
Nusbaum. Rozwój przewodów organów pteiowych u owadów (Polish).
Kosmos. (1884). [Development of Sexual Outlets in Insects.]
---- Struna i struna Leydig’a u owadów (Polish). Kosmos (1886). [Chorda
and Leydig’s chorda in Insects.]

The Embryonic Development of the Cockroach. 174
By JOSEPH NUSBAUM, Magister of Zoology, Warsaw.
The development of the Cockroach is by no means an easy study. It
costs some pains to find an accessible place in which the females
regularly lay their eggs, and the opaque capsule renders it hard to
tell in what stage of growth the contained embryos will be found.
Accordingly, though the development of the Cockroach has lately
attracted some observers, the inexperienced embryologist will find it
more profitable to examine the eggs of Bees, of Aphides, or of such
Diptera as lay their eggs in water.
The Cockroach is developed, like most animals, from fertilised
eggs. 175 The eggs of various animals differ much in size and form,
but always contain a formative plasma or egg-protoplasm, a
germinal vesicle (nucleus), and a germinal spot (nucleolus). Besides
these essential parts, eggs also always contain a greater or less
quantity of food-yolk, which serves for the supply of the developing
embryo. The quantity of this yolk may be small, and its granules are
then uniformly dispersed through the egg-protoplasm; or very
considerable, in which case the protoplasm and yolk become more
or less sharply defined. Eggs of the first kind are known as
holoblastic, those of the second kind as meroblastic, names
suggested by the complete or partial segmentation which these
kinds of eggs respectively undergo. When the food-yolk is very
abundant it does not at first (and in some cases does not at any
time) exhibit the phenomena of growth, such as cell-division. If, on
the other hand, the yolk is scanty and evenly dispersed through the
egg-protoplasm, the segmentation proceeds regularly and
completely. The eggs of Arthropoda, including those of the
Cockroach, are meroblastic.
The eggs of the Cockroach (P. orientalis) are enclosed (see p. 23)
sixteen together in stout capsules of horny consistence. They are
adapted to the form of the capsule, laterally compressed, convex on
the outer, and concave on the inner side. The ventral surface of the
embryo lies towards the inner, concave surface of the egg. Each egg
is provided with a very thin brownish shell (chorion), whose surface
is ornamented with small six-sided projections. In young eggs, still
enclosed within the ovary, the nucleus (germinal vesicle) and
nucleolus (germinal spot) can be plainly seen, but by the time they
are ready for deposition within the capsule, so large a quantity of
food-yolk, at first finely—afterwards coarsely—granular, accumulates
within them, that the germinal vesicle and spot cease to be visible.
Since the yolk of the newly-laid egg of the Cockroach is of a
consistence extremely unfavourable to hardening and microscopic
investigation, I have not been able to obtain transverse sections of
the germinal vesicle, nor to study the mode of its division
(segmentation). If, however, we may judge from what other
observers have found in the eggs of Insects more suitable for
investigation than those of the Cockroach, we shall be led to
conclude that a germinal vesicle, with a germinal spot surrounded by
a thin layer of protoplasm, lies within the nutritive yolk of the
Cockroach egg. From this protoplasm all the cells of the embryo are
derived.
The germinal vesicle, together with the
surrounding protoplasm, undergoes a
process of division or segmentation. Some of
the cells thus formed travel towards the
surface of the egg to form a thin layer of
flattened cells investing the yolk, the so-
called blastoderm, while others remain
scattered through the yolk, and constitute
the yolk-cells (fig. 107).
Fig. 104. Ventral Plate of Blatta germanica, with developing appendages, seen
from below. × 20.
Fig. 105. Ventral Plate of B. germanica, side view. × 20.
On the future ventral side of the embryo (and therefore on the concave surface
of the egg) the cells of the blastoderm become columnar, and
here is formed the so-called ventral plate, the first indication of the
embryo. This is a long narrow flattened structure (fig. 104). It is
wider in front where the head segment is situated; further back it
becomes divided by many transverse lines into the primitive
segments. The total number of segments in the ventral plate of
Insects is usually seventeen. 176 Indications of the appendages
appear very early. They give rise to an unpaired labrum, paired
antennæ, mandibles, and maxillæ (two pairs). The first and second
pair of maxillæ have originally, according to Patten, 177 two and three
branches respectively. Behind the mouth-parts are found three
rudimentary legs. Upon all the abdominal segments, according to
Patten, rudimentary limbs are formed; but these soon disappear,
except one pair, which persists for a time in the form of a knobbed
stalk; subsequently this, too, completely disappears. Three or four of
the hindmost segments curve under the ventral surface of the
embryo, and apparently (?) give rise to the modified segments and
appendages of the extremity of the abdomen (fig. 105). The ventral
plate lies at first directly beneath the egg membrane (chorion), but
afterwards becomes sunk in the yolk, so that a portion of the yolk
makes its way between the ventral plate and the chorion. Whilst this
portion of the yolk is perfectly homogeneous, the remainder, placed
internally to it, becomes coarsely granular, and encloses many
roundish cavities and yolk-cells. The middle region of the body is
more deeply sunk in the yolk than the two ends, and the embryo
thus assumes a curved position (fig. 105).
Fig. 106. Diagram to illustrate the formations of the Embryonic Membranes. A,
amnion; S, serous envelope; B, blastoderm.
Fig. 107. Transverse section through young Embryo of B. germanica. E, epiblast;
M, mesoblast; Y, yolk-cells.

This curvature of the embryo is closely connected with the formation
of the embryonic membranes. On either side of the ventral plate a
fold of the blastoderm arises, and these folds grow towards each
other beneath the chorion. Ultimately they meet along the middle
line of the ventral plate (fig. 106), and thus form a double
investment, the outer layer being the serous envelope, the inner the
amnion. Between the two the yolk passes in, as has been explained
above (fig. 107).
At the same time that the embryonic membranes are forming, the
embryonic layers make their appearance. The ventral plate, which
was originally one-layered, forms the epiblast or outer layer of the
embryo, and from this are subsequently derived the middle layer
(mesoblast) and the deep layer (hypoblast).

Fig. 108. Diagram to illustrate the formation of the Germinal Layers. E,
epiblast; M, mesoblast.

As to the origin of the mesoblast most observers have found 178 that
a long groove (the germinal groove) appears in the middle line of
the ventral plate (fig. 108), which bulges into the yolk, gradually
detaches itself from the epiblast, and completes itself into a tube.
The lumen of this tube soon becomes filled with cells, and the solid
cellular mass thus formed divides into two longitudinal tracts, which
lie right and left of the middle line of the ventral plate beneath the
epiblast, and are known as the mesoblastic bands. In the Cockroach
I was able to satisfy myself that in this Insect also, the mesoblast, in
all probability, arises by the formation and closure of a similar groove
of the epiblast. M (fig. 108) represents the stage in which the lumen
of the groove has disappeared, and the mesoblast forms a solid
cellular mass.
The origin of the hypoblast in Insects has not as yet been clearly
determined. Two quite different views on this subject have found
support. Some observers (Bobretzky, Graber, and others) maintain
that the hypoblast originates in the yolk-cells, which form a
superficial layer investing the rest of the yolk. Others (especially
Kowalewsky 179) believe that the process is altogether different.
According to the latest observations of the eminent embryologist just
named, upon the development of the Muscidæ, the germinal groove
gives rise, not only to the two mesoblastic bands, but also, in its
central region, to the hypoblast. This makes its appearance,
however, not as a continuous layer, but as two hourglass-shaped
rudiments, one at the anterior, the other at the posterior end of the
ventral plate. These rudiments have their convex ends directed away
from each other, while their edges are approximated and gradually
meet so as to form a continuous hypoblast beneath the mesoblast.
Although I have not been able completely to satisfy myself as to the
mode of formation of the hypoblast in the Cockroach, I have
observed stages of development which lead me to suppose that it
proceeds in this Insect in a manner similar to that observed by
Kowalewsky in Muscidæ. The hourglass-shaped rudiments of the
hypoblast become pushed upwards by those foldings-in of the
epiblast which form towards the anterior and posterior ends of the
embryo, and give rise to the stomodæum and proctodæum. 180
The stage of development in which the germinal groove appears, by
the folding inwards of the epiblast, has been observed in many other
animals, and is known as the Gastræa-stage. In all higher types
(Vertebrates, the higher Worms, Arthropoda, Echinodermata) the
mesoblast and hypoblast are formed in the folded-in part of the
Gastræa in a manner similar to that observed in Insects.
The yolk-cells, which some observers have supposed to form the
hypoblast, are believed by Kowalewsky to have no other function
except that of the disintegration and solution of the yolk. I can,
however, with confidence affirm that in the Cockroach these cells
take part in the formation of permanent tissues (see below).
Each of the two mesoblastic bands which lie right and left of the
germinal groove divides into many successive somites, and each of
these becomes hollow. Every such somite consists of an inner
(dorsal) one-layered and an outer (ventral) many-layered wall, the
latter being in contact with the epiblast. The cavities of all the
somites unite to form a common cavity, the cœlom or perivisceral
space of the Cockroach. The cœlom, like the cavities in which it
originates, is bounded by two layers of mesoblast—an inner, the so-
called splanchnic or visceral layer, which lies on the outer side of the
hypoblast, and an outer somatic or parietal layer, beneath the
epiblast. There are accordingly four layers in the Cockroach-embryo
—viz., (1) epiblast, from which the integument and nervous system
are developed; (2) somatic layer of mesoblast, mainly converted into
the muscles of the body-wall; (3) splanchnic layer of mesoblast,
yielding the muscular coat of the alimentary canal; and (4)
hypoblast, yielding the epithelium of the mesenteron.
Fig. 109. Transverse sections of Embryo of B. germanica, with rudimentary
nervous system (Oc. 4, Obj. D.D. Zeiss). N, nervous system; M, mesoblastic
somites.

Scattered yolk-cells associate themselves with the mesoblast cells, so
that the constituents of the mesoblast have a two-fold origin.
Fig. 109 shows that the yolk-cells are large, finely granular, and
provided with many (3–6) nuclei and nucleoli. They send out many
branching protoplasmic threads, which connect the different cells
together, and thus form a cellular network. Certain cells separate
themselves from the rest, apply themselves to the walls of the
somites, and form a provisional diaphragm (fig. 110, D) consisting of
a layer of flattened cells; 181 other cells (fig. 109) pass into and
through the walls of the somites, and reach their central cavity,
where they increase in number and blend with the mesoblast cells.
What finally becomes of them I cannot say; perhaps they form the
fat-body.
Fig. 110.—Transverse section through ventral region of Embryo of
B. germanica. The nerve-cord has by this time detached itself from
the epiblast, E. D is the temporary diaphragm; Ch, temporary cellular
band, from which the neurilemma proceeds; Ap, appendages in section;
M, mesoblast; N, nerve-cord. (Oc. 4, Obj. BB. Zeiss).
Fig. 111.—Transverse section of older Embryo of B. germanica
(abdomen). E, Epiblast; H, hypoblast; Ht, heart; G, reproductive
organs; S, spherical granules.

The ventral plate occupies, as I have explained, the future ventral
surface of the Insect, and here only at first both the embryonic
membranes are to be met with. On the sides and above, the yolk is
invested by the serous envelope alone. The ventral plate, however,
gradually extends upwards upon the sides of the egg, in the
directions of the arrows (fig. 107), and finally closes upon the dorsal
surface of the embryo, so as completely to invest the whole yolk.
Every segment of the embryo shows at a certain stage numerous
clusters of spherical granules, which according to Patten (loc. cit.)
are composed of urates (fig. 111, S).
We shall now proceed to consider the development of the several
organs of the Cockroach.
Fig. 112.—Transverse section of Nerve-cord of Embryo of B. germanica
(Oc. 4, Obj. D.D. Zeiss). C, cellular layer; F, fibrillar substance
(punkt-substance of Leydig); Ch, cellular band; N′ N″ inner and outer
neurilemma.

Nervous System.—Along the middle line of the whole ventral surface
there is formed a somewhat deep groove-like infolding of the epiblast,
bounded on either side by paired solid thickenings, which detach
themselves from the epiblast (fig. 110, N) and constitute the double
nervous chain. In many other Insects a median cord (from which are
derived the transverse interganglionic commissures) forms along the
bottom of the nervous fold. This secondary median fold is very
inconspicuous and slightly developed in the Cockroach, so that
the transverse commissures between the developing ganglia are
mainly contributed by the cellular substance of the lateral nervous
band. The brain is formed out of two epiblastic thickenings which
occupy shallow depressions. The so-called inner neurilemma, which
surrounds the ventral nerve-cord, is developed as follows:—Along
the ventral nerve-cord, and between its lateral halves, a small solid
cellular band (fig. 110, Ch) is developed out of the mesoblastic
diaphragm described above. This grows round the ventral nerve-cord
on all sides (fig. 112, N′), passing also inwards between the central
fibrillar tract and the outer cellular layer, and thus forming the thin
membrane which invests the central nervous mass (fig. 112, N″).
The above-mentioned solid mesoblastic band, which exists for a very
short time only, may perhaps be homologised with the chorda
dorsalis of Vertebrates, and the chorda of the higher Worms, since in
these types also the chorda forms a solid cellular band of meso-
hypoblastic origin, lying between the nervous system and the
hypoblast. The peripheral nerves arise as direct prolongations of the
fibrillar substance of the nerve-cord.
Alimentary Canal.—The epithelium of the mesenteron is formed out
of the hypoblast, whose cells assume a cubical form and gradually
absorb the yolk. The epithelium of the stomodæum and
proctodæum is derived, however, from two epiblastic involutions at
the fore and hind ends of the embryo. The muscular coat of the
alimentary canal is contributed by the splanchnic layer of the
mesoblast. The mesenteron in an early stage of development
appears as an oval sac of greenish colour (fig. 113), faintly seen
through the body-wall. The cæcal tubes are extensions of the
mesenteron, the Malpighian tubules of the proctodæum. The
epiblastic invaginations may be recognised in all stages of growth by
their chitinous lining and layer of chitinogenous cells, continuous
with the similar layers in the external integument.
Tracheal System.—Tubular infoldings of the epiblast, forming at
regular intervals along the sides of the embryo and projecting into
the somatic mesoblast, give rise to the paired tracheal tubes, which
are at first simple and distinct from one another. 182
Heart.—The wall of the heart in Insects is of mesoblastic origin, and
develops from paired rudiments derived from that peripheral part of
each mesoblastic band which unites the somatic to the splanchnic
layer. In this layer two lateral semi-cylindrical rudiments appear,
which, as the mesoblastic bands meet on the dorsal surface of the
embryo, are brought into contact and unite to form the heart
(fig. 111). The heart is therefore hollow from the first, its cavity not
being constricted off from the permanent perivisceral space enclosed
by the mesoblast, but being a vestige of the primitive embryonic
blastocœl, which is bounded by the epiblast, as well as by the two
other embryonic layers. Such a mode of the development of the heart
was observed by Bütschli in the Bee, and by Korotneff in the Mole
Cricket. I am convinced, from my own observations, that the heart of
the Cockroach originates in this way, though it is to be observed
that, in consequence of Patten’s results, 183 the question requires
further investigation. According to Patten the mesoblastic layers of
the embryo pulsate rhythmically long before the formation of the
heart. Patten also states that the blood-corpuscles are partially
derived from the wall of the heart.
Fig. 113.—Alimentary Canal of Embryo of B. germanica. Copied from
Rathke, loc. cit., but differently lettered. st, stomodæum, already
divided into œsophagus, crop, and gizzard; m, mesenteron; pr,
proctodæum, with Malpighian tubules (removed on the right side). × 12.
