B_2 CIE Web Scraping

The document provides an overview of web scraping tools, detailing their components, processes, and coding examples for extracting data from websites, specifically eBay. It outlines the steps involved in web scraping, including identifying data, developing scripts, and storing results. Additionally, it includes Python code snippets for scraping product details and analyzing data using pandas.


Essentials of Data and Text Processing

Submitted By,
Jivani Dhairya (202203100110120),
Sanjana Kotadiya (202203100110175),
Tirthkumar Thummar (202203100110190),
Archie Koradia (202203100110197)

Guided By,
Ms. Jenisha Tailor

Uka Tarsadia University


January, 2025
Chapter 1: Web Scraping Tool Introduction

Web scraping tools are software applications designed to extract data from
websites automatically. These tools enable users to retrieve data from web
pages and save it in a usable format for analysis or other purposes.

Chapter 2: Web Scraping System & Data Gathered

The web scraping system typically consists of the following components:


1. Data Sources: Websites from which data will be scraped. These can include e-
commerce sites, social media platforms, news websites, government databases,
and more.
2. Web Scraping Tool: The software application used to automate the data
extraction process. This could be a custom script developed in-house or a third-
party tool like those mentioned in Chapter 1.
3. Data Storage: The destination where scraped data is stored. This could be a
local file, database, cloud storage service, or data warehouse.
4. Data Processing: An optional step in which the scraped data is cleaned and
transformed before analysis.
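The last two components, Data Processing and Data Storage, can be sketched with a short, hypothetical example: a few already-scraped rows are cleaned with pandas and written to a local CSV file. The file name, titles, and prices here are illustrative assumptions, not values from the report's actual dataset.

```python
import pandas as pd

# Hypothetical rows as a scraping tool might return them (raw tool output)
scraped_rows = [
    {'Title': 'iPhone 13', 'Price': '$399.99'},
    {'Title': 'iPhone 12', 'Price': '$299.00'},
    {'Title': 'iPhone SE', 'Price': 'N/A'},
]

# Data Processing: clean the price strings into numbers
df = pd.DataFrame(scraped_rows)
df['Price'] = pd.to_numeric(df['Price'].str.replace('$', '', regex=False),
                            errors='coerce')  # 'N/A' becomes NaN

# Data Storage: persist the cleaned data to a local CSV file
df.to_csv('cleaned_products.csv', index=False)
print(df)
```

The same DataFrame could just as easily be written to a database or cloud store; CSV is used here only because it needs no extra setup.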

Chapter 3: Web Scraping Steps

The web scraping process typically involves the following steps:


1. Identify Data
2. Inspect Web Page
3. Select Scraping Tool
4. Develop Scraping Script
5. Execute Scraping Script
6. Handle Errors
7. Store Scraped Data
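Steps 2 through 6 can be sketched in miniature with the snippet below, which parses a small hard-coded HTML fragment (standing in for a downloaded page, so no network request is needed) and extracts listing titles and prices with graceful handling of missing fields. The class names mirror eBay-style selectors but the fragment itself is an invented example.

```python
from bs4 import BeautifulSoup

# A hard-coded HTML fragment standing in for a fetched search-results page
html = """
<ul>
  <li class="s-item"><h3 class="s-item__title">iPhone 13</h3>
      <span class="s-item__price">$399.99</span></li>
  <li class="s-item"><h3 class="s-item__title">iPhone 12</h3>
      <span class="s-item__price">$299.00</span></li>
</ul>
"""

# Step 2: inspect/parse the page structure
soup = BeautifulSoup(html, 'html.parser')

# Steps 4-5: run the extraction; step 6: handle missing fields
results = []
for item in soup.find_all('li', class_='s-item'):
    try:
        title = item.find('h3', class_='s-item__title').text.strip()
    except AttributeError:
        title = 'N/A'
    try:
        price = item.find('span', class_='s-item__price').text.strip()
    except AttributeError:
        price = 'N/A'
    results.append({'Title': title, 'Price': price})

# Step 7: store the scraped data (here, kept as an in-memory list)
print(results)
```

On a real site, the `html` string would instead come from `requests.get(url).content`, and the selectors would be chosen by inspecting the live page.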
Chapter 4: Codes

Doing Web Scraping

import requests
from bs4 import BeautifulSoup
import pandas as pd
import os

# Function to extract product details from an individual product page
def extract_product_details(product_url):
    if product_url == 'N/A':
        return 'N/A', 'N/A', 'N/A'
    response = requests.get(product_url)
    if response.status_code == 200:
        soup = BeautifulSoup(response.content, 'html.parser')

        # Try to extract specific details (these selectors will need to be
        # updated based on the actual page structure)
        try:
            # Example selector for display size
            display_size = soup.find('li', {'class': 'd-item__attr-value'}).text.strip()
        except AttributeError:
            display_size = 'N/A'

        try:
            # Example selector for battery
            battery_capacity = soup.find('li', {'class': 'd-item__attr-value'}).text.strip()
        except AttributeError:
            battery_capacity = 'N/A'

        try:
            # Example selector for status
            status = soup.find('span', class_='d-item__cond').text.strip()
        except AttributeError:
            status = 'N/A'

        return display_size, battery_capacity, status
    else:
        return 'N/A', 'N/A', 'N/A'

# Define the eBay search URL (modify the search query to suit your needs)
url = "https://www.ebay.com/sch/i.html?_nkw=iphone&_sop=12"  # Example: search for iPhones

# Step 1: Send a request to the eBay search results page
response = requests.get(url)

# Check if the request was successful
if response.status_code == 200:
    # Step 2: Parse the page content using BeautifulSoup
    soup = BeautifulSoup(response.content, 'html.parser')

    # Step 3: Prepare a list to store product data
    product_data = []

    # Step 4: Find all product listings (based on the structure of the page)
    listings = soup.find_all('li', class_='s-item')  # This class might change; inspect the actual structure

    for item in listings:
        try:
            title = item.find('h3', class_='s-item__title').text.strip()  # Product title
        except AttributeError:
            title = 'N/A'

        try:
            price = item.find('span', class_='s-item__price').text.strip()  # Price
        except AttributeError:
            price = 'N/A'

        try:
            shipping = item.find('span', class_='s-item__shipping').text.strip()  # Shipping info
        except AttributeError:
            shipping = 'N/A'

        try:
            condition = item.find('span', class_='s-item__condition').text.strip()  # Product condition
        except AttributeError:
            condition = 'N/A'

        try:
            link = item.find('a', class_='s-item__link')['href']  # Product URL
        except (AttributeError, TypeError):
            link = 'N/A'

        # Fetch additional details from the product page
        display_size, battery_capacity, status = extract_product_details(link)

        # Add the extracted data to the list
        product_data.append({
            'Title': title,
            'Price': price,
            'Shipping': shipping,
            'Condition': condition,
            'Display Size': display_size,
            'Battery Capacity': battery_capacity,
            'Status': status,
            'Product URL': link
        })

    # Step 5: Save the data to an Excel file on the Desktop
    if product_data:
        # For Windows (change the path if you're on Mac/Linux)
        desktop_path = os.path.join(os.path.expanduser('~'), 'Desktop',
                                    'ebay_iphone_data.xlsx')

        df = pd.DataFrame(product_data)
        df.to_excel(desktop_path, index=False, engine='openpyxl')
        print(f"Data saved to '{desktop_path}'")
    else:
        print("No product data found.")
else:
    print(f"Failed to retrieve the page. Status Code: {response.status_code}")

Importing Data in Python:


import pandas as pd

# Replace with your actual file path
# (if the scraped data was saved as .xlsx, use pd.read_excel instead)
df = pd.read_csv(r'C:\Users\Admin\Downloads\ebay_iphone_data.csv')

# Display all rows of the dataframe
print(df)

# Optional: Check the total number of rows and columns
print(f"Dataset shape: {df.shape}")
Finding Mean, Median, Mode:

import pandas as pd

# Replace with your actual file path
df = pd.read_csv(r'C:\Users\Admin\Downloads\ebay_iphone_data.csv')

# Filter out rows where Shipping is "Free International Shipping"
filtered_df = df[df['Shipping'] != "Free International Shipping"]

# Select the specific columns (.copy() avoids SettingWithCopyWarning)
selected_columns = filtered_df[['Price', 'Shipping']].copy()

# Convert Price and Shipping to numeric, stripping currency symbols and
# other text (e.g., "$10", "+$15.00 shipping") before computing statistics
for col in ['Price', 'Shipping']:
    selected_columns[col] = pd.to_numeric(
        selected_columns[col].astype(str).str.replace(r'[^\d.]', '', regex=True),
        errors='coerce'
    )

# Mean
mean_values = selected_columns.mean()
print("Mean:\n", mean_values)

# Median
median_values = selected_columns.median()
print("\nMedian:\n", median_values)

# Mode
mode_values = selected_columns.mode()
print("\nMode:\n", mode_values)

Chapter 5: Screenshots of Data Scraped


1) Screenshot of the scraped eBay iPhone data (image not reproduced in this text version)

Chapter 6: References
https://www.ebay.com/
