
CHAPTER 4

SYSTEM DESIGN AND DEVELOPMENT

4.1 Architectural Design

Fig. 4.1.1 Block diagram

Fig. 4.1.1 represents the workflow of an online web scraper built using Python. The
system extracts data from multiple websites and stores it either in a database or in a file.
The components involved in this process include:

● Web Sites (Web Site 1, Web Site 2, Web Site 3): These are the target websites
from which data is to be scraped. The scraper will access these websites to gather
the necessary information.
● Scraping Script: The core component of the system, written in Python. It sends HTTP requests to the target websites, parses the HTML content of the pages to extract the required data, handles any errors or exceptions that occur during scraping, and transforms the extracted data into a structured format (a minimal sketch of such a script follows this list).

● Storage Options:

1. Database: The extracted data can be stored in a database for easy retrieval and querying. This can be implemented with database management systems such as MySQL, PostgreSQL, or MongoDB.
2. File: Alternatively, the data can be saved in a file, such as a CSV or JSON file.
This is useful for smaller datasets or when a database is not necessary.
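
The following is a minimal sketch of such a scraping script, assuming the requests and BeautifulSoup libraries are available; the URLs and the h2.title selector are placeholders for this example, since the real selectors depend on each site's markup.

import requests
from bs4 import BeautifulSoup

# Hypothetical URLs standing in for Web Site 1, 2 and 3.
URLS = [
    "https://example.com/site1",
    "https://example.com/site2",
    "https://example.com/site3",
]

def scrape(url):
    """Download one page and return the extracted records as dictionaries."""
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # treat HTTP error codes as exceptions
    except requests.RequestException as exc:
        print(f"Failed to fetch {url}: {exc}")
        return []
    soup = BeautifulSoup(response.text, "html.parser")
    # 'h2.title' is a placeholder selector; each real site needs its own.
    return [{"url": url, "title": tag.get_text(strip=True)}
            for tag in soup.select("h2.title")]

if __name__ == "__main__":
    records = [row for url in URLS for row in scrape(url)]
    print(f"Extracted {len(records)} records")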

The workflow starts with the scraping script sending requests to the specified websites (Web Site 1, Web Site 2, Web Site 3). The HTML content of these websites is downloaded and parsed by the scraping script, and the relevant data is extracted and processed. The processed data is then either saved to a database, which suits complex queries and large datasets, or saved to a file for simpler use cases and smaller datasets; a sketch of both storage options follows. The system can be expanded or modified as requirements change, for example by adding more target websites or by implementing more complex data processing or storage mechanisms.
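
As an illustration of the two storage options, the sketch below writes the extracted records to a CSV file and to a SQLite database; SQLite stands in here for the heavier database systems mentioned above, and the file, table, and column names are assumptions made for this example.

import csv
import sqlite3

# Placeholder records in the url/title layout produced by the scraping script.
records = [
    {"url": "https://example.com/site1", "title": "Sample item"},
]

# Option 1: save to a CSV file (handy for small datasets).
with open("scraped.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["url", "title"])
    writer.writeheader()
    writer.writerows(records)

# Option 2: save to a SQLite database (supports later querying).
with sqlite3.connect("scraped.db") as conn:
    conn.execute("CREATE TABLE IF NOT EXISTS items (url TEXT, title TEXT)")
    conn.executemany("INSERT INTO items (url, title) VALUES (:url, :title)", records)
    conn.commit()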

Fig. 4.1.2 Flowchart


This flowchart (Fig. 4.1.2) illustrates the sequential steps involved in the operation of
an online web scraper using Python. The process includes downloading the contents of
web pages, extracting the necessary data, storing the data, and analyzing the data. The
detailed steps are as follows:

● Downloading the Contents: The first step involves sending HTTP requests to the
target websites and downloading the HTML content of the web pages. This is the
initial step where the scraper accesses the web pages to gather the required data.
● Extracting the Data: Once the HTML content is downloaded, the next step is to
parse this content and extract the relevant data. This involves identifying and
extracting specific pieces of information from the web pages based on the defined
requirements.
● Storing the Data: Once the data has been extracted, it needs to be stored in a structured format. The data can be saved in a database for easy retrieval and complex queries, or in a file (such as CSV or JSON) for simpler use cases and smaller datasets.
● Analyzing the Data: The final step involves analyzing the stored data to derive meaningful insights (a brief sketch follows this list). This could involve:

o Data Cleaning: Removing duplicates, handling missing values, and correcting data inconsistencies.
o Data Visualization: Creating charts and graphs using libraries like
matplotlib or seaborn to visualize trends and patterns.
o Statistical Analysis: Performing statistical tests or calculations using
libraries like numpy or scipy.
o Machine Learning: Applying machine learning algorithms using libraries
like scikit-learn or tensorflow to make predictions or classify data.
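
A minimal sketch of the cleaning, analysis, and visualization steps is shown below, assuming the records were stored in the scraped.csv file with the url/title columns used earlier; a machine learning step is omitted here because it would also require labelled training data.

import pandas as pd
import matplotlib.pyplot as plt

# Load the stored data; 'scraped.csv' follows the url/title layout assumed above.
df = pd.read_csv("scraped.csv")

# Data cleaning: drop duplicate rows and rows with missing values.
df = df.drop_duplicates().dropna()

# Simple statistics: number of extracted items per source website.
counts = df["url"].value_counts()
print(counts)

# Data visualization: bar chart of items scraped from each site.
counts.plot(kind="bar", title="Items scraped per website")
plt.ylabel("Number of items")
plt.tight_layout()
plt.savefig("items_per_site.png")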

This flowchart provides a clear visual representation of the entire web scraping
process, from initial data acquisition to final data analysis. Each step is crucial to ensure
the accuracy and usefulness of the extracted data.
