Since the web version of Baidu Stock has been taken offline, the site can no longer be scraped, but the code is still worth walking through to understand the approach.
Feature description
Goal: fetch the names and trading information of all stocks listed on the Shanghai and Shenzhen stock exchanges
Output: save the results to a file
Choosing a candidate data site
Sina Finance: http://finance.sina.com.cn/stock/
Baidu Stock: https://gupiao.baidu.com/stock/
Selection criteria: the stock information should exist statically in the HTML page (not generated by JavaScript), and scraping should not be restricted by the site's robots.txt
Selection method: the browser's F12 developer tools, viewing the page source, etc.
Selection mindset: don't fixate on one site; try multiple information sources
Since Sina's stock data is generated by JavaScript, Baidu Stock is used this time
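A quick way to apply the criterion above is to search the raw HTML for a value you can see in the rendered page: if it is missing from the raw source, the data is filled in by JavaScript. A minimal sketch, using hypothetical page snippets rather than the real sites:

```python
# Hypothetical samples: one page with the price baked into the HTML,
# one where the price is rendered later by JavaScript.
static_html = '<div class="price">10.50</div>'
js_html = '<div class="price"></div><script>renderPrice()</script>'

def data_is_static(html, needle):
    """True if the target value already appears in the raw HTML."""
    return needle in html

print(data_is_static(static_html, "10.50"))  # True  -> scrapable with requests
print(data_is_static(js_html, "10.50"))      # False -> JS-generated, look elsewhere
```

In practice you would fetch the page with requests and run the same check on `r.text`.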
Program structure
Step 1: fetch the stock list from East Money (东方财富网)
Step 2: for each stock in the list, fetch that stock's information from Baidu Stock
Step 3: write the results to a file
Code:
- Fetching a web page
The code is:
import requests

# Encoding defaults to utf-8; pass another codec for GB-encoded pages
def getHTMLText(url, code='utf-8'):
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = code
        return r.text
    except:
        return ""
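The `code` parameter matters because the East Money listing page is GB-encoded: decoding its bytes with the wrong codec produces mojibake. A small illustration (sample bytes, not the real page):

```python
# Bytes as a GB2312-encoded page would send them.
raw = "股票".encode("gb2312")

print(raw.decode("gb2312"))                    # correct: 股票
print(raw.decode("utf-8", errors="replace"))   # mojibake: replacement characters
```

This is why `getStockList` below passes 'GB2312' instead of the utf-8 default.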
- Getting the codes of all listed companies
Each listed company's information can be found on East Money,
at http://quote.eastmoney.com/center/gridlist.html#hs_a_board
The company codes appear in the href attributes, so after collecting the a tags we match each href against a pattern and append the matched codes to the stock list
The code is:
import re
from bs4 import BeautifulSoup

def getStockList(lst, stockUrl):
    html = getHTMLText(stockUrl, 'GB2312')
    soup = BeautifulSoup(html, "html.parser")
    a = soup.find_all('a')
    for i in a:
        try:
            href = i.attrs['href']
            lst.append(re.findall(r"[s][hz]\d{6}", href)[0])
        except:
            continue
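The pattern `[s][hz]\d{6}` matches a stock code: the letter s, then h or z (Shanghai or Shenzhen), then six digits. A small demo on made-up hrefs shaped like the listing page's links:

```python
import re

# Hypothetical hrefs like those on the East Money listing page.
hrefs = [
    "http://quote.eastmoney.com/sh600000.html",
    "http://quote.eastmoney.com/sz000001.html",
    "http://quote.eastmoney.com/center/",  # no stock code: skipped
]

codes = []
for href in hrefs:
    m = re.findall(r"[s][hz]\d{6}", href)
    if m:
        codes.append(m[0])

print(codes)  # ['sh600000', 'sz000001']
```

Links without a code produce an empty match list, which is why the function above wraps the lookup in try/except and continues.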
- For each stock code, fetch the stock's page from Baidu Stock and store its fields in a dictionary
Inspecting the Baidu Stock page source shows that each field's name sits in a <dt> tag and its value in the paired <dd> tag, all inside a div with class "stock-bets".
The extraction code is:
import traceback

def getStockInfo(lst, stockUrl, fpath):
    count = 0
    for stock in lst:
        url = stockUrl + stock + ".html"
        html = getHTMLText(url)
        try:
            if html == "":
                continue
            infoDict = {}
            soup = BeautifulSoup(html, "html.parser")
            stockInfo = soup.find('div', attrs={'class': 'stock-bets'})
            name = stockInfo.find_all(attrs={'class': 'bets-name'})[0]
            infoDict.update({'股票名称': name.text.split()[0]})
            keyList = stockInfo.find_all('dt')
            valueList = stockInfo.find_all('dd')
            for i in range(len(keyList)):  # was range(keyList), which raises TypeError
                key = keyList[i].text
                val = valueList[i].text
                infoDict[key] = val
            # 'a' opens the file in append mode
            with open(fpath, 'a', encoding='utf-8') as f:
                f.write(str(infoDict) + '\n')
            count = count + 1
            print("\rProgress: {:.2f}%".format(count * 100 / len(lst)), end='')
        except:
            count = count + 1
            print("\rProgress: {:.2f}%".format(count * 100 / len(lst)), end='')
            traceback.print_exc()
            continue
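Since the real pages are gone, the dt/dd extraction can be exercised on a minimal stand-in for the old Baidu Stock markup (the HTML below is invented for illustration):

```python
from bs4 import BeautifulSoup

# A made-up fragment shaped like the old Baidu Stock page:
# field names in <dt>, values in the paired <dd>.
sample = """
<div class="stock-bets">
  <a class="bets-name">PingAn Bank (000001)</a>
  <dl><dt>Open</dt><dd>10.50</dd></dl>
  <dl><dt>High</dt><dd>10.80</dd></dl>
</div>
"""

soup = BeautifulSoup(sample, "html.parser")
info = soup.find('div', attrs={'class': 'stock-bets'})
record = {'name': info.find(attrs={'class': 'bets-name'}).text.split()[0]}
keys = info.find_all('dt')
vals = info.find_all('dd')
for k, v in zip(keys, vals):
    record[k.text] = v.text

print(record)  # {'name': 'PingAn', 'Open': '10.50', 'High': '10.80'}
```

Using zip here sidesteps the index bookkeeping; the function above does the same thing with an explicit range loop.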
The main function is:
def main():
    stock_list_url = 'http://quote.eastmoney.com/stocklist.html'
    stock_info_url = 'https://gupiao.baidu.com/stock/'
    output_file = "E://BaiduStockInfo.txt"
    slist = []
    getStockList(slist, stock_list_url)
    getStockInfo(slist, stock_info_url, output_file)

main()