eBay Page Parsing and Dynamic Loading: Data Scraping in Practice

Author: 亿牛云爬虫专家

Published 2025-06-19 11:06:36

License: CC 4.0 BY-SA

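I. From Stage Direction to Page Behavior: Where Does the Inspiration Come From?

I have always felt that web pages and theater stages are somewhat alike.

Picture a theater performance: when each actor enters, how the lights are set, who stands in the foreground and who waits backstage are all decided by a director working behind the scenes. That direction looks natural, but it follows a clear logic.

Web pages work the same way. On an e-commerce platform, for example, the page does not hand you all of its content up front. It loads more step by step in response to what you do: scrolling, clicking, searching. The "direction system" behind this is the combination of JavaScript and API design.

Take eBay: its product pages are like a theater stage. The audience (the user) sees only the final rendered result, while behind it are data loaded in batches, structured markup, and behavior checks tied to anti-abuse measures.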

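II. Technology from Another Angle: "Translating" the Logic of Information Loading

If stage direction is a kind of "visual script orchestration", then a page's data-loading logic is a programmatic scheduling system.

On a global e-commerce platform like eBay, a single page of product listings may go through the following steps behind the scenes:

The browser sends a search request to the platform
The page loads content incrementally and asynchronously (you do not see the underlying API calls, but the browser makes them "backstage")
User behavior (such as frequent refreshing) may be flagged as "abnormal"

This also means that extracting useful information from the page takes some "stage experience": knowing how to follow the page's rhythm and pass as an "ordinary audience member".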
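One way to picture "following the page's rhythm" is to space requests out with irregular pauses instead of firing them back to back. The sketch below is only an illustration: the names `jittered_delays` and `fetch_paced` are hypothetical, and the default values (a 2-second base plus up to 1.5 seconds of jitter) are arbitrary assumptions, not thresholds published by eBay.

```python
import random
import time


def jittered_delays(count, base=2.0, jitter=1.5, rng=None):
    """Return `count` wait times, each `base` plus a random extra in [0, jitter]."""
    if rng is None:
        rng = random.Random()
    return [base + rng.uniform(0, jitter) for _ in range(count)]


def fetch_paced(fetch, pages, base=2.0, jitter=1.5, sleep=time.sleep, rng=None):
    """Call fetch(page) for each page, pausing a jittered interval between calls.

    `fetch` is any callable that takes a page number; `sleep` is injectable
    so the pacing can be checked without actually waiting.
    """
    pages = list(pages)
    delays = jittered_delays(max(len(pages) - 1, 0), base, jitter, rng)
    results = []
    for index, page in enumerate(pages):
        if index > 0:
            sleep(delays[index - 1])  # no pause before the very first request
        results.append(fetch(page))
    return results


if __name__ == "__main__":
    # Stand-in fetcher for demonstration; a real one would request a results page.
    pages_seen = fetch_paced(lambda page: f"page-{page}", [1, 2, 3], base=0.1, jitter=0.1)
    print(pages_seen)
```

Passing the fetch function in as a parameter keeps the pacing logic separate from the HTTP details, so the same helper can wrap whatever request function is in use.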

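III. Hands-On: Playing the Role of an "Audience Member"

Below is a script I actually use, written in Python. After searching for a keyword, it extracts several fields from the results list page: item title, price, shipping location, listing time, and so on.

To fit in with the platform's behavior, I set an "audience disguise" (User-Agent) and "behavior tracking" (Cookie), and route requests through a proxy network so they are not flagged as abnormal access.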
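Before the full script, a side note on the proxy setup. A proxy URL of the form `http://user:pass@host:port` breaks if the username or password contains characters such as `@` or `:`, so it can help to percent-encode the credentials first. The helper below is a minimal sketch of that idea; `build_proxies` is a hypothetical name, and `proxy.example.com` with its credentials are placeholders rather than a real service.

```python
from urllib.parse import quote


def build_proxies(host, port, user, password, scheme="http"):
    """Build a requests-style proxies dict with percent-encoded credentials."""
    # safe="" makes quote() also encode "/" so nothing in the credentials
    # can be mistaken for part of the URL structure.
    auth = f"{quote(user, safe='')}:{quote(password, safe='')}"
    proxy_url = f"{scheme}://{auth}@{host}:{port}"
    # The same proxy endpoint is used for both plain and TLS traffic.
    return {"http": proxy_url, "https": proxy_url}


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(build_proxies("proxy.example.com", 8100, "user", "p@ss:word"))
```

The returned dict has the same shape as the `proxies` argument accepted by `requests.get`.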

import requests
from bs4 import BeautifulSoup
from urllib.parse import quote

# Crawler proxy (based on the 亿牛云 example at www.16yun.cn)
# Requests are routed through an intermediary service for IP isolation.
proxy_host = "proxy.16yun.cn"
proxy_port = "8100"
proxy_user = "16YUN"
proxy_pass = "16IP"

proxies = {
    "http": f"http://{proxy_user}:{proxy_pass}@{proxy_host}:{proxy_port}",
    "https": f"http://{proxy_user}:{proxy_pass}@{proxy_host}:{proxy_port}"
}

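# Browser disguise
headers = {
    # Adjacent string literals are concatenated, so the User-Agent
    # contains no stray whitespace from a line continuation.
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36"
    ),
    "Cookie": "nonsession=abc123; srtp=xyz456;"  # sample Cookie
}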

def get_page(keyword, page=1):
    search_url = f"https://www.ebay.com/sch/i.html?_nkw={quote(keyword)}&_pgn={page}"
    try:
        response = requests.get(search_url, headers=headers, proxies=proxies, timeout=10)
        if response.status_code != 200:
            print(f"Request failed: {response.status_code}")
            return []
    except requests.RequestException as e:
        # Covers timeouts, proxy failures, and other network-level errors
        print(f"Connection error: {e}")
        return []

    soup = BeautifulSoup(response.text, "html.parser")
    items = []

    for item in soup.select(".s-item"):
        title = item.select_one(".s-item__title")
        price = item.select_one(".s-item__price")
        location = item.select_one(".s-item__location")
        time_info = item.select_one(".s-item__listingDate span")

        if title and price:
            items.append({
                "title": title.get_text(strip=True),
                "price": price.get_text(strip=True),
                "location": location.get_text(strip=True) if location else "-",
                "time": time_info.get_text(strip=True) if time_info else "-"
            })
    
    return items

# Example call: fetch the first page of results for the keyword "iphone 14"
result_list = get_page("iphone 14", page=1)
for r in result_list:
    print(r)
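
Printing the rows is enough for a quick look, but the extracted fields are raw text. As a possible next step, the sketch below turns a price string into numbers and serializes rows to CSV using only the standard library. The function names `parse_price_range` and `items_to_csv` are hypothetical, the sample rows are invented for illustration, and the price parsing assumes US-style formatting (comma as thousands separator, dot as decimal point).

```python
import csv
import io
import re

# Matches numbers such as 799, 799.00, or 1,299.99 (US-style formatting assumed).
_PRICE_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_price_range(text):
    """Return (low, high) as floats for a price string, or None if it has no number.

    A single price yields the same value twice; a range such as
    "$10.00 to $25.00" yields its two ends.
    """
    numbers = [float(match.replace(",", "")) for match in _PRICE_RE.findall(text)]
    if not numbers:
        return None
    return min(numbers), max(numbers)


def items_to_csv(items, fieldnames):
    """Serialize a list of dicts to CSV text, keeping only the given fields."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()


if __name__ == "__main__":
    # Invented sample rows; real rows would come from the scraping function.
    sample_rows = [
        {"title": "Sample phone A", "price": "$1,299.99"},
        {"title": "Sample phone B", "price": "$10.00 to $25.00"},
    ]
    for row in sample_rows:
        print(row["title"], parse_price_range(row["price"]))
    print(items_to_csv(sample_rows, ["title", "price"]))
```

Because `items_to_csv` takes the field names as an argument, it works with whatever keys the scraped dictionaries use.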