How To Scrape Product Data From Amazon - A Complete Guide - Oxylabs
Maryia Stsiopkina
2023-02-28 8 min read
Amazon is packed with useful e-commerce data, such as product information, reviews, and prices.
Extracting this data efficiently and putting it to use is imperative for any modern business. Whether
you intend to monitor the performance of your products sold by third-party resellers or track your
competitors, you need reliable web scraping services, like Amazon Scraper, to gather this data for
market analytics.
Amazon scraping, however, has its peculiarities. In this step-by-step guide, we’ll go over every stage
needed to create an Amazon web scraper.
The following commands work on macOS and Linux. These commands will create a virtual
environment and activate it:
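A minimal sketch of those commands, assuming python3 is available on your PATH:

```shell
# Create a virtual environment in the current directory
python3 -m venv env

# Activate it (macOS/Linux)
source env/bin/activate
```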
You will need packages for two broad steps—getting the HTML and parsing the HTML to query
relevant data.
Requests is a popular third-party Python library for making HTTP requests. It provides a simple and
intuitive interface to send HTTP requests to web servers and receive responses. It is perhaps the
best-known Python library for web scraping.
The limitation of the Requests library is that it returns the HTML response as a string, which is not easy
to query for specific elements such as listing prices while working with web scraping code.
This is where Beautiful Soup steps in. Beautiful Soup is a Python library used for web scraping to pull
the data out of HTML and XML files. It allows you to extract information from the page by searching
for tags, attributes, or specific text.
To install these two libraries, you can use the following command:
If you are on Windows, use python instead of python3. The rest of the command remains unchanged:
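The install commands can be sketched as:

```shell
# macOS / Linux
python3 -m pip install requests beautifulsoup4

# Windows
python -m pip install requests beautifulsoup4
```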
It's time to try out the Requests scraping library. Create a new file with the name amazon.py and enter
the following code:
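A sketch of that first version. The product URL is a placeholder (any Amazon product page works), and the request is wrapped in a function so it can be reused later:

```python
import requests


def get_page_html(url):
    # A bare GET with no headers -- Amazon will usually reject this
    response = requests.get(url)
    print(response.status_code)
    return response.text
```

Calling `get_page_html("https://www.amazon.com/dp/B098FKXT8L")` prints the status code and returns the response body.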
In most cases, you cannot view the desired HTML. Amazon will block this request and return an
error page instead of the product markup.
If you print response.status_code, you will see that instead of getting 200, which means success,
you get 503, which means an error.
Amazon knows this request was not using a browser and thus blocks it.
It is a common practice employed by many websites. Amazon will block your requests and return an
error code in the 5xx range, or sometimes the 4xx range.
The solution is simple. You can send the headers along with your request that a browser would.
Sometimes, sending only the User-Agent header is enough. At other times, you may need to send more
headers. A good example is the Accept-Language header.
To identify the user-agent sent by your browser, press F12 and open the Network tab. Reload the
page. Select the first request and examine Request Headers.
You can copy this and create a dictionary for the headers.
The following example shows a dictionary with the User-Agent and Accept-Language headers:
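The dictionary might look like this. The exact values are assumptions — copy the ones your own browser sends from the Network tab:

```python
# Example header values from a real browser session; yours will differ
custom_headers = {
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36",
    "accept-language": "en-GB,en;q=0.9",
}
```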
You can send this dictionary to the optional headers parameter of the get method as follows:
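A self-contained sketch of the updated request (header values are again browser-session assumptions):

```python
import requests

# Browser-like headers copied from a real session (values are assumptions)
custom_headers = {
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36",
    "accept-language": "en-GB,en;q=0.9",
}


def get_page_html(url):
    # The headers keyword makes the request look like it came from a browser
    response = requests.get(url, headers=custom_headers)
    return response.text
```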
Executing the code with these changes will show the expected HTML with the product details.
Note that if you send as many browser-like headers as possible, you may not need JavaScript
rendering at all. If rendering is required, you will need tools like Playwright or Selenium.
The category page displays the product title, product image, product rating, product price, and,
most importantly, the product URLs. If you want more details, such as product descriptions, you
will get them only from the product details page.
If you right-click the product title and select Inspect, you will see that it is a span tag with its id attribute set to "productTitle".
Similarly, if you right-click the price and select Inspect, you will see the HTML markup of the price.
You can see that the dollar component of the price is in a span tag with the class "a-price-whole",
and the cents component is in another span tag with the class set to "a-price-fraction".
Once you have this information, add the following lines to the code we have written so far:
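The added lines create the Beautiful Soup object from the downloaded HTML. A static snippet stands in for the real response here so the example is self-contained:

```python
from bs4 import BeautifulSoup

# In the real script, `response.text` from the earlier request goes here
html = '<html><body><span id="productTitle"> Example Product </span></body></html>'
soup = BeautifulSoup(html, "html.parser")
```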
Beautiful Soup supports selecting tags with its find methods. Alternatively, it also supports CSS
selectors. You can use either of these to get the same results. In this guide, we will use CSS
selectors, which are a universal way to select elements and work with almost all web scraping
tools that can be used for scraping Amazon product data.
We are now ready to use the Soup object to query for specific information.
The product name or the product title is located in a span element with the id productTitle. Since
ids are unique on a page, it's easy to select this element.
We send the CSS selector to the select_one method, which returns an element instance.
We can then extract the text using its text attribute.
Upon printing it, you will see that there are a few white spaces. To fix that, add a strip()
function call as follows:
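For example, with a simplified stand-in for the title markup:

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the real product page HTML
html = '<span id="productTitle"> Bose QuietComfort 45 </span>'
soup = BeautifulSoup(html, "html.parser")

# strip() removes the surrounding white space
title = soup.select_one("#productTitle").text.strip()
print(title)  # → Bose QuietComfort 45
```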
Now, the following statement can select the element that contains the rating.
Note that the rating value is actually in the title attribute:
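Both steps can be sketched as follows. The id acrPopover is an assumption based on typical Amazon product pages — verify it in your own browser, as the markup changes:

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the rating markup on a product page
html = '<span id="acrPopover" title="4.8 out of 5 stars"></span>'
soup = BeautifulSoup(html, "html.parser")

rating_element = soup.select_one("#acrPopover")
# The numeric rating lives in the title attribute, not the element text
rating = rating_element.attrs.get("title").replace("out of 5 stars", "").strip()
print(rating)  # → 4.8
```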
The product price is located in two places — below the product title and also on the Buy Now box.
This CSS selector can be passed to the select_one method of BeautifulSoup as follows:
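A sketch that combines the two span classes mentioned above into one price string (the markup is a simplified stand-in for the real page):

```python
from bs4 import BeautifulSoup

# Simplified stand-in for Amazon's price markup
html = (
    '<span class="a-price">'
    '<span class="a-price-whole">329<span class="a-price-decimal">.</span></span>'
    '<span class="a-price-fraction">00</span>'
    '</span>'
)
soup = BeautifulSoup(html, "html.parser")

whole = soup.select_one("span.a-price-whole")        # dollars, incl. the "."
fraction = soup.select_one("span.a-price-fraction")  # cents
price = f"{whole.text}{fraction.text}"
print(price)  # → 329.00
```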
5. Locate and scrape product image
Let's scrape the default image. This image has the CSS selector as #landingImage. With this
information, we can write the following lines of code to get the image URL from the src attribute:
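A sketch with a placeholder image URL in place of the real page markup:

```python
from bs4 import BeautifulSoup

# Simplified stand-in; the src value is a placeholder
html = '<img id="landingImage" src="https://m.media-amazon.com/images/I/example.jpg">'
soup = BeautifulSoup(html, "html.parser")

image_element = soup.select_one("#landingImage")
image_url = image_element.attrs.get("src")
print(image_url)
```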
The next step in scraping Amazon product information is scraping the product description.
The methodology remains the same — create a CSS selector and use the select_one method.
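A sketch, assuming the description lives in an element with id productDescription (an assumption to verify in your browser):

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the description block
html = '<div id="productDescription"><p>Noise-cancelling headphones.</p></div>'
soup = BeautifulSoup(html, "html.parser")

description = soup.select_one("#productDescription").text.strip()
print(description)
```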
However, to reach the product information, you will begin with product listing or category pages.
If you examine this page, you will notice that all the products are contained in a div that has a
special attribute [data-asin]. In that div, all the product links are in an h2 tag.
With this in mind, the CSS Selector would be as follows:
We can read the href attribute of this selector and run a loop. However, note that the links will be
relative. You would need to use the urljoin method to parse these links.
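The loop can be sketched against a simplified two-product stand-in for a listing page:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Simplified stand-in for a search-results page
listing_html = (
    '<div data-asin="B098FKXT8L"><h2><a href="/dp/B098FKXT8L">Bose QC 45</a></h2></div>'
    '<div data-asin="B0863TXGM3"><h2><a href="/dp/B0863TXGM3">Sony WH-1000XM4</a></h2></div>'
)
soup = BeautifulSoup(listing_html, "html.parser")

product_urls = []
for link in soup.select("[data-asin] h2 a"):
    # The href values are relative, so resolve them against the site root
    full_url = urljoin("https://www.amazon.com", link.attrs.get("href"))
    product_urls.append(full_url)

print(product_urls)
```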
Handling pagination
The link to the next page is in a link that contains the text Next. We can look for this link using the
contains operator of CSS as follows:
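Beautiful Soup's selector engine (soupsieve) exposes this contains check as :-soup-contains(). A sketch with stand-in markup:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Simplified stand-in for the pagination link
html = '<a class="s-pagination-next" href="/s?k=bose&page=2">Next</a>'
soup = BeautifulSoup(html, "html.parser")

# Match an anchor whose text contains "Next"
next_link = soup.select_one('a:-soup-contains("Next")')
if next_link:
    next_url = urljoin("https://www.amazon.com", next_link.attrs.get("href"))
    print(next_url)
```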
The data we are scraping is being returned as a dictionary. This is intentional. We can create a list
that contains all the scraped products.
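A sketch of the export step using the standard library's csv module; the two sample records are placeholders standing in for real scraped dictionaries:

```python
import csv

# Each scraped product is a dictionary; collect them all in one list
scraped_data = [
    {"title": "Bose QC 45", "price": "329.00", "rating": "4.8 out of 5 stars"},
    {"title": "Sony WH-1000XM4", "price": "348.00", "rating": "4.7 out of 5 stars"},
]

# Write the list of dictionaries to a CSV file with a header row
with open("amazon_products.csv", "w", newline="") as file:
    writer = csv.DictWriter(file, fieldnames=["title", "price", "rating"])
    writer.writeheader()
    writer.writerows(scraped_data)
```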
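Putting everything together, a hedged sketch of the complete scraper. The header values and the selector ids acrPopover and productDescription are assumptions based on typical product pages — verify them in your browser before relying on them:

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# Header values are assumptions -- copy the ones your own browser sends
custom_headers = {
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36",
    "accept-language": "en-GB,en;q=0.9",
}


def get_soup(url):
    """Download a page with browser-like headers and parse it."""
    response = requests.get(url, headers=custom_headers)
    if response.status_code != 200:
        print(f"Error fetching {url}: {response.status_code}")
        return None
    return BeautifulSoup(response.text, "html.parser")


def extract_product_info(soup, url):
    """Collect the fields discussed above from a product details page."""

    def text_of(selector):
        element = soup.select_one(selector)
        return element.text.strip() if element else None

    whole = soup.select_one("span.a-price-whole")
    fraction = soup.select_one("span.a-price-fraction")
    rating_element = soup.select_one("#acrPopover")
    image_element = soup.select_one("#landingImage")

    return {
        "title": text_of("#productTitle"),
        "price": f"{whole.text}{fraction.text}" if whole and fraction else None,
        "rating": rating_element.attrs.get("title") if rating_element else None,
        "image": image_element.attrs.get("src") if image_element else None,
        "description": text_of("#productDescription"),
        "url": url,
    }


def scrape_listing(listing_url, visited=None, data=None):
    """Visit every product on a listing page, then follow the Next link."""
    visited = set() if visited is None else visited
    data = [] if data is None else data

    soup = get_soup(listing_url)
    if soup is None:
        return data

    for link in soup.select("[data-asin] h2 a"):
        product_url = urljoin(listing_url, link.attrs.get("href"))
        if product_url in visited:
            continue
        visited.add(product_url)
        product_soup = get_soup(product_url)
        if product_soup is not None:
            data.append(extract_product_info(product_soup, product_url))

    next_link = soup.select_one('a:-soup-contains("Next")')
    if next_link:
        next_url = urljoin(listing_url, next_link.attrs.get("href"))
        return scrape_listing(next_url, visited, data)
    return data
```

Calling `scrape_listing("https://www.amazon.com/s?k=bose")` would crawl the search results, visit each product page, and return a list of dictionaries.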
Best practices

Scraping Amazon without proxies or dedicated scraping tools is full of obstacles. Just like many other
popular scraping targets, Amazon has rate-limiting in place, meaning it can block your IP address if
you exceed the established limit. Apart from that, Amazon uses bot-detection algorithms that can
check your HTTP headers for any suspicious details. Also, you should be ready to constantly adapt to
the different page layouts and various HTML structures.

Considering these factors, it's recommended to follow some common practices to prevent getting
detected and blocked by Amazon. Some of the most useful tips are:
1. Use a real User-Agent. It's important to make your User-Agent look as plausible as possible. Here's
the list of the most common user agents.

2. Set your fingerprint. Many websites use Transmission Control Protocol (TCP) and IP fingerprinting to
detect bots. To avoid getting spotted, you need to make sure your fingerprint parameters are always
consistent.

3. Change the crawling pattern. To develop a successful crawling pattern, you should think about
how a regular user would behave while exploring a page and add clicks, scrolls, and mouse
movements accordingly.
Easier solution to extract Amazon data

And this is only a small portion of the requirements you should keep in mind when scraping Amazon.
Alternatively, you can turn to a ready-made scraping solution designed specifically for scraping
Amazon - Amazon Scraper API. With this scraper, you can:

Scrape and parse various Amazon page types, including Search, Product, Offer listing, Questions
& Answers, Reviews, Best Sellers, and Sellers;
Target localized product data in 195 locations worldwide;
Retrieve accurate parsed results in JSON format without installing any other library;
Enjoy multiple handy features, such as bulk scraping and automated jobs.
All you need is the product URL — irrespective of the country of the Amazon store. For example, the
following code extracts details for the Bose QC 45 from Amazon.com:
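A sketch of that request, assuming the Scraper API's real-time endpoint and payload format — the credentials and product URL are placeholders, and the exact field names should be checked against the current API documentation:

```python
import requests


def scrape_amazon_product(url, username, password):
    # "source": "amazon" takes a full product URL; "parse": True asks
    # the API to return structured JSON instead of raw HTML
    payload = {
        "source": "amazon",
        "url": url,
        "parse": True,
    }
    response = requests.post(
        "https://realtime.oxylabs.io/v1/queries",
        auth=(username, password),
        json=payload,
    )
    return response.json()
```

For example: `results = scrape_amazon_product("https://www.amazon.com/dp/B098FKXT8L", "USERNAME", "PASSWORD")`.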
You will get the complete product data returned in JSON format.
Another way to get the information is by ASIN of the product. The only line you need to modify is the
payload:
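The ASIN-based payload might look like this — the ASIN is a placeholder, and the field names are assumptions to verify against the API docs:

```python
# ASIN-based lookup; "domain" selects the marketplace
payload = {
    "source": "amazon_product",
    "query": "B098FKXT8L",   # the product ASIN (placeholder)
    "domain": "com",         # e.g. "co.uk" for amazon.co.uk
    "parse": True,
}
```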
Note the optional parameter domain. You can use this parameter to get Amazon data from any
domain, such as amazon.co.uk.
Searching products
Again, the only code that changes is the payload. Here is the payload for the search for "bose":
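A sketch of the search payload, matching the page range and category described below (field names are assumptions to verify against the API docs):

```python
# Search for "bose", 10 pages starting at page 1, limited to one category
payload = {
    "source": "amazon_search",
    "query": "bose",
    "start_page": 1,
    "pages": 10,
    "context": [
        {"key": "category_id", "value": 12097479011},  # headphones
    ],
    "parse": True,
}
```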
Notice how it requests 10 pages beginning with page 1. Also, we limit the search to category id
12097479011, which is Amazon's category id for headphones.
Conclusion
You can write code to scrape Amazon products using the Requests and Beautiful Soup libraries. It
may need some effort, but it works. Sending custom headers, rotating user-agents, and proxy rotation
can help bypass bans or rate limiting.
However, the easiest solution to scrape Amazon products is using the Amazon Scraper API. Oxylabs
also allows you to gather data from 50 other marketplaces using its E-Commerce Scraper API.
If you have any questions, do not hesitate to contact us.