Solving JA3/TLS fingerprint anti-scraping checks with Scrapy

Method 1:

pip install curl_cffi==0.7.4
pip install scrapy-fingerprint==0.1.3

Enable the middleware in settings.py:

DOWNLOADER_MIDDLEWARES = {
"scrapy_fingerprint.fingerprintmiddlewares.FingerprintMiddleware": 100
}

Replace yield scrapy.Request(url=url, callback=self.parse) with the following:

import scrapy
from scrapy_fingerprint.request import FingerprintRequest

class AaSpider(scrapy.Spider):
    def start_requests(self):
        url = 'https://www.Aa.com/'
        # Use FingerprintRequest instead of scrapy.Request
        yield FingerprintRequest(url=url, callback=self.parse)

Method 2:

Install tls_client:

pip install tls-client==1.0.1

Configure the middleware:

DOWNLOADER_MIDDLEWARES = {
   "reel_rush_daily.middlewares.PassJa3TlsMiddleware": 100
}

Add the following code to middlewares.py:

from scrapy.http import HtmlResponse
from tls_client import Session

class PassJa3TlsMiddleware(object):

    def __init__(self):
        self.session: Session = Session(
            client_identifier="chrome_104"
        )

    def process_request(self, request, spider):
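        # Only requests whose URL contains this domain go through tls_client;
        # all other requests fall through to Scrapy's default downloader
        if '.agoramt.com' in request.url:
            spider.logger.debug(f'request.url:{request.url}')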
            proxies = request.meta.get("proxies") or None
            headers = request.headers.to_unicode_dict()
            if request.method == "GET":
                response = self.session.get(
                    url=request.url,
                    headers=headers,
                    proxy=proxies,
                    timeout_seconds=60,
                )
            else:
                response = self.session.post(
                    url=request.url,
                    headers=headers,
                    data=request.body,  # forward the POST body
                    proxy=proxies,
                    timeout_seconds=60,
                )
            return HtmlResponse(
                url=request.url,
                status=response.status_code,
                body=response.content,
                encoding="utf-8",
                request=request,
            )
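
Note that the check '.agoramt.com' in request.url is a substring match on the whole URL, so it can also fire when the domain only appears in a path or query string. A minimal sketch of a stricter check on the hostname is shown here; host_matches is a hypothetical helper name, not part of Scrapy or tls_client, and it uses only the standard library.

```python
from urllib.parse import urlsplit


def host_matches(url: str, domain: str) -> bool:
    """Return True if the URL's host is `domain` or one of its subdomains.

    `host_matches` is a hypothetical helper for illustration only.
    """
    host = (urlsplit(url).hostname or "").lower()
    domain = domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


if __name__ == "__main__":
    print(host_matches("https://www.agoramt.com/list", ".agoramt.com"))
    print(host_matches("https://example.com/?next=.agoramt.com", ".agoramt.com"))
```

Inside process_request, the condition could then read if host_matches(request.url, '.agoramt.com'):, assuming the helper lives in the same middlewares.py.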

Method 3 (recommended):

Install curl-cffi:

pip install curl-cffi==0.2.4

Configure the middleware:

DOWNLOADER_MIDDLEWARES = {
   "reel_rush_daily.middlewares.PassJa3TlsMiddleware": 100
}

Add the following code to middlewares.py:

from scrapy.http import HtmlResponse
from curl_cffi import requests

class PassJa3TlsMiddleware(object):

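    def __init__(self, settings):
        # DOWNLOAD_TIMEOUT is Scrapy's built-in download timeout, in seconds
        self.timeout = settings.get('DOWNLOAD_TIMEOUT')
        # REQUESTS_PROXIES is a custom setting; it is None unless set in settings.py
        self.proxies = settings.get('REQUESTS_PROXIES')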

    @classmethod
    def from_crawler(cls, crawler):
        s = cls(crawler.settings)
        return s

    def process_request(self, request, spider):
        headers = request.headers.to_unicode_dict()
        body = request.body
        if request.method == "GET":
            response = requests.get(
                url=request.url,
                headers=headers,
                proxies=self.proxies,
                timeout=self.timeout,
                impersonate="chrome101"
            )
        else:
            response = requests.post(
                url=request.url,
                headers=headers,
                data=body,
                proxies=self.proxies,
                timeout=self.timeout,
                impersonate="chrome101"
            )

        return HtmlResponse(
            url=request.url,
            status=response.status_code,
            body=response.content,
            encoding="utf-8",
            request=request,
        )
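
The middleware above reads DOWNLOAD_TIMEOUT and REQUESTS_PROXIES from the project settings, but REQUESTS_PROXIES is a custom key that Scrapy does not define. A minimal settings.py sketch might look like the following; the proxy address 127.0.0.1:7890 and the 30-second timeout are placeholder assumptions, so substitute your own values or leave REQUESTS_PROXIES unset to connect directly.

```python
# settings.py (fragment)

# Scrapy's built-in download timeout in seconds; the middleware passes it
# to curl_cffi as `timeout`. 30 is an assumed example value.
DOWNLOAD_TIMEOUT = 30

# Custom setting read by PassJa3TlsMiddleware and passed to curl_cffi as
# `proxies`. The address below is a placeholder for a local proxy.
REQUESTS_PROXIES = {
    "http": "http://127.0.0.1:7890",
    "https": "http://127.0.0.1:7890",
}
```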