Python has become one of the most popular languages for web scraping for many
reasons. These include its flexibility, ease of coding, dynamic typing, a
large collection of libraries for manipulating data, and support for the most
common scraping tools, such as Scrapy, Beautiful Soup, and Selenium.
What is Web Scraping?
Web scraping is a software technique for extracting data from websites.
It focuses on transforming unstructured data on the web (typically HTML)
into structured data that can be stored and analyzed.
Why Do We Scrape?
• Web pages contain a wealth of data designed mostly for human consumption
• Static websites
• Interfacing with third parties that offer no API access
• Websites are more important than APIs
• The data is already available
• No rate limiting
• Anonymous access
Fetch The Data
• Involves finding the endpoint: the URL or URLs
• Sending an HTTP request to the server
• Using the Requests library:
    import requests
    # Send a GET request; .content holds the raw response body as bytes
    data = requests.get('http://google.com/')
    html = data.content
Processing
• Avoid using regular expressions (regex)
• Reasons not to use them:
1. They are fragile
2. They are really hard to maintain
3. Improper HTML and encoding handling
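The fragility point can be illustrated with a minimal sketch that uses only the Python standard library. The two HTML snippets, the deliberately naive pattern, and the `LinkCollector` class are all invented for this example; the aim is only to show how a small markup change can defeat a regex that a parser handles without trouble.

```python
import re
from html.parser import HTMLParser

# Two hypothetical snippets with the same link. The second uses single
# quotes and places another attribute before href.
HTML_A = '<a href="/page1">One</a>'
HTML_B = "<a class='nav' href='/page1'>One</a>"

# A naive pattern that assumes href comes first and uses double quotes.
NAIVE_HREF = re.compile(r'<a href="([^"]*)"')


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags using a real HTML parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


def links_via_regex(html):
    return NAIVE_HREF.findall(html)


def links_via_parser(html):
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


if __name__ == "__main__":
    for snippet in (HTML_A, HTML_B):
        print(links_via_regex(snippet), links_via_parser(snippet))
```

The regex finds the link only in the first snippet, while the parser finds it in both. A more elaborate pattern could cover this case, but each new markup variation needs another patch, which is where the maintenance cost comes from.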
Use Beautiful Soup For Parsing
• Provides simple methods to search, navigate, and select
• Deals with broken web pages really well
• Auto-detects encoding
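As a rough sketch of the search, navigate, and select methods, the example below parses a small, deliberately broken page. It assumes the `beautifulsoup4` package is installed; the markup, IDs, and class names are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical, deliberately broken markup: the <li> and <p> tags are
# never closed.
BROKEN_HTML = """
<html><body>
<h1>Products</h1>
<ul id="items">
  <li class="item"><a href="/a">Apple</a>
  <li class="item"><a href="/b">Banana</a>
</ul>
<p>Footer text
</body></html>
"""

# "html.parser" is the parser bundled with Python; lxml or html5lib
# could be used instead if installed.
soup = BeautifulSoup(BROKEN_HTML, "html.parser")

# Search: find the first matching tag.
title = soup.find("h1").get_text(strip=True)

# Select: use CSS selectors via select().
names = [a.get_text(strip=True) for a in soup.select("li.item a")]
links = [a["href"] for a in soup.select("li.item a")]

# Navigate: move from a tag up to an enclosing tag.
first_link = soup.find("a")
parent_list_id = first_link.find_parent("ul")["id"]

if __name__ == "__main__":
    print(title, names, links, parent_list_id)
```

Despite the unclosed tags, both links are still found. The exact tree built from broken markup can differ between parsers, so results on real pages are worth checking against the parser actually in use.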
Export The Data
• Database (relational or non-relational)
• File (XML, YAML, CSV, JSON, etc.)
• APIs
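The export options above can be sketched with the standard library alone. The records, field names, and the `items` table below are hypothetical; SQLite stands in for a relational database, and an in-memory buffer stands in for a CSV file.

```python
import csv
import io
import json
import sqlite3

# Hypothetical scraped records; the field names are illustrative only.
records = [
    {"name": "Apple", "price": 1.20},
    {"name": "Banana", "price": 0.50},
]


def to_json(rows):
    """Serialize the records as a JSON string."""
    return json.dumps(rows, indent=2)


def to_csv(rows):
    """Serialize the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "price"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_sqlite(rows, connection):
    """Insert the records into a table named items."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS items (name TEXT, price REAL)"
    )
    connection.executemany(
        "INSERT INTO items (name, price) VALUES (:name, :price)", rows
    )
    connection.commit()


if __name__ == "__main__":
    print(to_json(records))
    print(to_csv(records))
    conn = sqlite3.connect(":memory:")
    to_sqlite(records, conn)
    print(conn.execute("SELECT name, price FROM items").fetchall())
```

To write to disk instead, the same writer calls can be pointed at a file opened with `open(path, "w", newline="")`, and `sqlite3.connect` can be given a file path.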
Challenges
• External sites can change without warning
• Figuring out how frequently a site changes is difficult
• Changes can break scrapers easily
• Bad HTTP status codes
• Example: using 200 OK to signal an error
• Cannot always trust your HTTP library's default behavior
• Messy HTML markup
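One way to guard against misleading status codes is to check the body as well as the code. The sketch below is an assumption-laden illustration: `FetchResult`, `is_usable`, and the `ERROR_MARKERS` phrases are invented here, and real sites need their own site-specific checks.

```python
from dataclasses import dataclass

# Phrases that, in this hypothetical example, mark an error page that was
# served with a 200 status.
ERROR_MARKERS = ("page not found", "access denied", "something went wrong")


@dataclass
class FetchResult:
    status_code: int
    body: str


def is_usable(result: FetchResult) -> bool:
    """Return True only if both the status code and the body look healthy."""
    # Reject anything outside the 2xx range.
    if not 200 <= result.status_code < 300:
        return False
    # Reject empty or whitespace-only bodies.
    if not result.body.strip():
        return False
    # Reject 200 responses whose body looks like an error page.
    lowered = result.body.lower()
    return not any(marker in lowered for marker in ERROR_MARKERS)


if __name__ == "__main__":
    print(is_usable(FetchResult(200, "<html><h1>Products</h1></html>")))
    print(is_usable(FetchResult(200, "<html>Page Not Found</html>")))
```

With Requests, the inputs would come from `response.status_code` and `response.text`. On the point about library defaults: Requests does not raise an exception for 4xx or 5xx responses unless `raise_for_status()` is called, and it applies no timeout unless one is passed explicitly.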
Scrapy – A Framework For Web Scraping
• Uses XPath to select elements
• Interactive shell
• Using Scrapy:
1. Define a model to store items
2. Create your spider to extract items
3. Write a pipeline to store them
Web Scraping using Python | Web Screen Scraping