
Web Scraping

The document discusses web scraping, which is the process of automatically extracting data from websites. It describes some common web scraping tools like Import.io, ParseHub, BeautifulSoup, Scrapy, Selenium, and Pandas. It also outlines some uses of web scraping for data scientists such as data collection, data preparation, competitive intelligence, market research, and social media analysis. The document then explains the basic process a web scraper follows which involves making HTTP requests, extracting and parsing website code, and saving relevant data.

Uploaded by

zahraadokmak

LEBANESE UNIVERSITY

FACULTY OF INFORMATION
DATA SCIENCE

Web Scraping
DR. LINDA MAHMOUDI
BY: ZAHRAA DOKMAK
SARA DOKMAK
What is Web Scraping?
o Web scraping is the process of automatically extracting data from websites. It involves writing
code or using specialized tools to retrieve and parse the HTML or XML content of web pages. By
examining the structure and elements of the web pages, web scraping allows you to extract
specific information such as text, images, links, prices, or any other data present on the website.

o Web scraping can be useful for various purposes, including data analysis, research, data mining,
monitoring prices or product information, gathering contact information, aggregating news or
social media data, and much more. It provides a means to gather large amounts of data from
multiple websites efficiently and automate repetitive tasks.
Web Scraping Tools
There are several tools available for data scraping, catering to different needs and preferences. Here are some
commonly used tools for data scraping:

1. Import.io: Import.io is a cloud-based data extraction tool that enables users to scrape data from websites
using a visual interface or by writing custom code. It offers features like scheduling, data integration, and
data export.

2. ParseHub: ParseHub is a user-friendly web scraping tool that provides both a visual interface and advanced
scraping options. It allows users to build scraping projects by selecting the data elements they want to
extract and training the tool to recognize them.

3. BeautifulSoup: BeautifulSoup is a Python library that simplifies the process of scraping data from HTML
and XML documents. It provides methods to parse and navigate the document structure, making it easier
to extract specific data elements.
Web Scraping Tools (Continued)
4. Scrapy: Scrapy is a powerful Python framework for web scraping. It offers a complete set of tools for
handling requests, managing cookies, and parsing HTML/XML responses. Scrapy is highly customizable
and suitable for complex scraping projects.

5. Selenium: Selenium is primarily used for browser automation but can also be used for web scraping. It
allows users to interact with web pages, fill out forms, and extract data by simulating browser behavior.

6. Pandas: Pandas is a general-purpose Python library for data manipulation and analysis.
It can be used alongside BeautifulSoup in a scraping workflow: BeautifulSoup extracts the data, and
Pandas cleans and analyzes it. The main benefit is that analysts can carry out the entire data
analytics process in one language (avoiding the need to switch to other languages, such as R).
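
As a minimal sketch of how BeautifulSoup and Pandas can work together, the example below parses a small HTML snippet and loads the result into a DataFrame. The HTML, the class names (`product`, `name`, `price`), and the helper `parse_products` are hypothetical and only serve as an illustration; a real page would need its own selectors.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical HTML standing in for a downloaded product page.
HTML = """
<ul id="products">
  <li class="product"><span class="name">Laptop</span><span class="price">$999.00</span></li>
  <li class="product"><span class="name">Mouse</span><span class="price">$19.50</span></li>
</ul>
"""


def parse_products(html):
    """Return one dict per product found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.select("li.product"):
        name = item.select_one("span.name").get_text(strip=True)
        price_text = item.select_one("span.price").get_text(strip=True)
        # Drop the currency symbol so the price can be stored as a number.
        rows.append({"name": name, "price": float(price_text.lstrip("$"))})
    return rows


products = parse_products(HTML)
df = pd.DataFrame(products)  # hand the scraped rows to Pandas for analysis
print(df)
```

Once the data is in a DataFrame, the usual Pandas operations (filtering, grouping, exporting with `to_csv`) apply to it.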
Uses of Data Scraping for Data Scientists
1. Data Collection: Data scientists often need large amounts of data for analysis and model training. Data scraping
allows them to collect data from various sources, including websites, social media platforms, forums, or any other
online repositories. This enables data scientists to gather diverse and relevant data for their projects without relying
solely on pre-existing datasets.

2. Data Preparation: Data scraping can assist in the data preparation phase. By scraping data from different sources,
data scientists can curate and preprocess the collected data to create a unified and structured dataset. This may
involve cleaning and transforming the scraped data, handling missing values, removing duplicates, and performing
other necessary data preprocessing tasks.

3. Competitive Intelligence: Data scientists can use data scraping to gather information about competitors in the
industry. They can scrape competitor websites, social media profiles, product listings, and pricing
pages. This data can provide valuable insights into market trends, customer behavior, and competitor
strategies for competitive analysis.
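
The data preparation step described in item 2 can be sketched as follows. This is an illustration under assumptions, not a prescribed method: the record layout (`name`, `price`) and the helper `clean_records` are hypothetical, and a real project would choose its own rules for missing values and duplicates.

```python
def clean_records(records):
    """Trim text, convert prices to numbers, and drop duplicate rows."""
    seen = set()
    cleaned = []
    for rec in records:
        name = (rec.get("name") or "").strip()
        if not name:
            continue  # a row without a name cannot be identified, so skip it
        raw_price = (rec.get("price") or "").strip().replace("$", "").replace(",", "")
        try:
            price = float(raw_price)
        except ValueError:
            price = None  # keep the row, but mark the price as missing
        key = (name.lower(), price)
        if key in seen:
            continue  # same product and price already kept
        seen.add(key)
        cleaned.append({"name": name, "price": price})
    return cleaned


# Hypothetical scraped rows with typical problems.
raw = [
    {"name": " Laptop ", "price": "$1,299.00"},
    {"name": "Laptop", "price": "$1,299.00"},  # duplicate once whitespace is trimmed
    {"name": "Mouse", "price": ""},            # missing price
    {"name": "", "price": "$5.00"},            # missing name
]
cleaned = clean_records(raw)
print(cleaned)
```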
Uses of Data Scraping for Data Scientists (Continued)
4. Market Research: Data scraping enables data scientists to gather data related to market trends, consumer
reviews, product features, pricing information, or any other relevant market data. By scraping data from
various sources, they can analyze market dynamics and consumer preferences and make data-driven
decisions for market research and business strategies.

5. Social Media Analysis: Data scientists can scrape data from social media platforms like Twitter, Facebook,
Instagram, or LinkedIn to analyze trends and user behavior, perform sentiment analysis, or build
recommendation systems. By collecting and analyzing social media data, data scientists can gain insights
into customer opinions, brand perception, user engagement, and other social media-related factors.
How does a Web Scraper Function?
All web scraping bots follow the same three basic steps:

•Step 1: Making an HTTP request to a server

•Step 2: Extracting and parsing (or breaking down) the website’s code

•Step 3: Saving the relevant data locally
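
The three steps above can be sketched with the Python standard library alone. This is a minimal illustration, not the scraper used in the presentation: the function names (`fetch_html`, `extract_links`, `save_links`), the `User-Agent` string, and the sample HTML are all assumptions. The demo runs on an inline page, so no network access is needed.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def fetch_html(url, timeout=10):
    """Step 1: make an HTTP GET request and return the page as text."""
    request = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class LinkParser(HTMLParser):
    """Step 2: walk the HTML and collect (text, href) for each link."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html):
    parser = LinkParser()
    parser.feed(html)
    return parser.links


def save_links(links, out):
    """Step 3: write the extracted rows to a CSV file-like object."""
    writer = csv.writer(out)
    writer.writerow(["text", "href"])
    writer.writerows(links)


# Demo on an inline page; with a real site, start from fetch_html(url).
sample = '<p><a href="/a">First</a> and <a href="/b"> Second </a></p>'
links = extract_links(sample)
buffer = io.StringIO()
save_links(links, buffer)
print(buffer.getvalue())
```

To save the data locally as a file, pass an opened file instead of the in-memory buffer, for example `open("links.csv", "w", newline="")`.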


Demonstration (screenshots not reproduced): the web page to be scraped, inspecting the page, the HTML code, scraping the web page, and the resulting CSV file.
THANK YOU!
