Python + Selenium: Simulating the Taobao Slider and Scraping Product Data

License: CC 4.0 BY-SA

Tags:

#Python #selenium #crawler #slider-verification #Taobao

Note: if this post infringes on Alibaba's rights, please contact me and I will take it down.

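The previous post covered simulating the Taobao login. This post records how to scrape the data on Taobao's product list pages, and how to imitate human input in order to pass the slider verification.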
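To make "imitating human input" concrete, here is a minimal sketch of one common approach: split the drag into many small steps that first speed up and then slow down, instead of moving the slider in a single jump. This is an illustration under stated assumptions, not the author's code. The names `build_drag_track` and `drag_slider`, the `accel_ratio` parameter, and the step sizes are all hypothetical, and nothing here guarantees that Taobao's detection will accept the resulting movement.

```python
import random
import time


def build_drag_track(distance, accel_ratio=0.7, seed=None):
    """Split a drag of `distance` pixels into small positive integer steps.

    Steps grow over the first `accel_ratio` of the distance and shrink
    afterwards, roughly imitating a hand that accelerates and then brakes.
    (Hypothetical helper; the ratio and step sizes are assumptions.)
    """
    rng = random.Random(seed)
    track = []
    moved = 0
    step = 2
    while moved < distance:
        if moved < distance * accel_ratio:
            step += rng.randint(1, 3)                # speed up
        else:
            step = max(1, step - rng.randint(1, 3))  # slow down
        step_now = min(step, distance - moved)       # never overshoot
        track.append(step_now)
        moved += step_now
    return track


def drag_slider(browser, slider, distance):
    """Drag the `slider` element to the right by `distance` pixels.

    Hypothetical helper: `browser` is a Selenium WebDriver and `slider`
    a WebElement located by the caller.
    """
    # Imported here so the track logic above can be tried without Selenium.
    from selenium.webdriver import ActionChains

    ActionChains(browser).click_and_hold(slider).perform()
    for step in build_drag_track(distance):
        ActionChains(browser).move_by_offset(step, 0).perform()
        time.sleep(random.uniform(0.01, 0.03))       # small pause between steps
    ActionChains(browser).release().perform()
```

The track is kept separate from the browser calls so that the step pattern can be inspected or tuned on its own. The steps always add up to exactly the requested distance.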

The code is as follows:

#encoding=utf-8
# The line above looks like a comment, but it does real work: it declares the character encoding of this script
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
import time
from pyquery import PyQuery as pq
from selenium.webdriver import ActionChains



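class taobao_clawer:
    def __init__(self, url):
        # Page to open once the browser has started
        self.url = url
        self.options = webdriver.ChromeOptions()

        # Do not load images, which speeds up page loads
        self.options.add_experimental_option("prefs", {"profile.managed_default_content_settings.images": 2})

        # This step matters: exclude the enable-automation switch (the author calls this
        # developer mode) so that sites are less likely to detect that Selenium is in use
        self.options.add_experimental_option('excludeSwitches', ['enable-automation'])

        #self.options.add_argument('--proxy-server=127.0.0.1')

        # Note: newer Selenium 4 releases no longer accept executable_path;
        # there the driver path is passed through a Service object instead
        self.browser = webdriver.Chrome(executable_path='F:\\Software\\anaconda\\chromedriver', options=self.options)
        self.wait = WebDriverWait(self.browser, 20)
        self.browser.get(url)

    def login(self):
        # Wait for the password-login option to appear
        password_login = self.wait.until(
            EC.presence_of_element_located((By.CSS_SELECTOR, '.qrcode-login > .login-links > .forget-pw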