Scraping HTML Chapter2
WEB SCRAPING IN PYTHON
Thomas Laetsch
Data Scientist, NYU
Slashes and Brackets
Single forward slash / looks forward one generation
xpath = '/html[1]/body[1]'
(At)tribute
@ represents "attribute"
@class
@id
@href
Setting up a Selector
from scrapy import Selector
html = '''
<html>
<body>
<div class="hello datacamp">
<p>Hello World!</p>
</div>
<p>Enjoy DataCamp!</p>
</body>
</html>
'''
sel = Selector(text=html)
This creates a scrapy Selector object from a string containing the HTML code
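sel.xpath("//p") returns a SelectorList of Selector objects, one for each matching p element
>>> sel.xpath("//p")
>>> sel.xpath("//p").extract()  # all matches, as a list of strings
>>> sel.xpath("//p").extract_first()  # only the first match, as a string
>>> ps = sel.xpath("//p")
>>> second_p = ps[1]
>>> second_p.extract()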
import requests
url = 'https://www.datacamp.com/courses/all'