TweetScraper: a simple crawler for Twitter Search that does not use the API

14 files in total

py: 7

yml: 1

sh: 1

Tags: twitter, tweets, scrapy

Uploaded 2021-02-06 07:52:40; 15KB ZIP archive.


TweetScraper-master.zip (14 files)

    TweetScraper-master/
        scrapy.cfg            266B
        .github/
            FUNDING.yml       675B
        LICENSE               18KB
        install.sh            600B
        requirements.txt      35B
        .gitignore            37B
        TweetScraper/
            pipelines.py      2KB
            spiders/
                __init__.py       161B
                TweetCrawler.py   5KB
            utils.py          127B
            items.py          153B
            __init__.py       0B
            settings.py       837B
        README.md             2KB

# Introduction #

`TweetScraper` can get tweets from [Twitter Search](https://twitter.com/explore). It is built on [Scrapy](http://scrapy.org/) without using [Twitter's APIs](https://dev.twitter.com/rest/public). The crawled data is not as *clean* as the data obtained through the APIs, but the benefit is that you are not bound by the API's rate limits and restrictions. Ideally, you can get all the data from Twitter Search.

**WARNING:** please be polite and follow the [crawler's politeness policy](https://en.wikipedia.org/wiki/Web_crawler#Politeness_policy).

# Installation #

1. Install `conda`; you can get it from [miniconda](https://docs.conda.io/en/latest/miniconda.html). The tested Python version is `3.7`.

2. Install the selenium Python bindings: https://selenium-python.readthedocs.io/installation.html. (Note: the `KeyError: 'driver'` error is caused by an incorrect setup.)

3. For Ubuntu or Debian users, run:

    ```
    $ bash install.sh
    $ conda activate tweetscraper
    $ scrapy list
    $ #If the output is 'TweetScraper', then you are ready to go.
    ```

    The `install.sh` script creates a new environment named `tweetscraper` and installs all the dependencies (e.g., `firefox-geckodriver` and `firefox`).

# Usage #

1. Change the `USER_AGENT` in `TweetScraper/settings.py` to identify who you are:

        USER_AGENT = 'your website/e-mail'

2. In the root folder of this project, run a command like:

        scrapy crawl TweetScraper -a query="foo,#bar"

    where `query` is a list of keywords separated by commas and enclosed in `"`. The query can be anything (keyword, hashtag, etc.) you want to search for in [Twitter Search](https://twitter.com/search-home). `TweetScraper` will crawl the search results of the query and save the tweet content and user information.

3. By default, the tweets are saved to disk in `./Data/tweet/` and the user data in `./Data/user/`. The file format is JSON. Change `SAVE_TWEET_PATH` and `SAVE_USER_PATH` in `TweetScraper/settings.py` if you want another location.

# Acknowledgement #

Keeping the crawler up to date requires continuous effort; please support our work via [opencollective.com/tweetscraper](https://opencollective.com/tweetscraper).

# License #

TweetScraper is released under the [GNU GENERAL PUBLIC LICENSE, Version 2](https://github.com/jonbakerfish/TweetScraper/blob/master/LICENSE)
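
The JSON output described under Usage can be read back with a few lines of Python. The following is a minimal sketch, not part of the project: it assumes that each file in `./Data/tweet/` holds exactly one JSON document, which the README does not state explicitly. The helper name `load_saved_items` is hypothetical.

```python
import json
from pathlib import Path


def load_saved_items(directory):
    """Load every regular file in `directory` as a single JSON document.

    Assumption (not confirmed by the README): one JSON document per file.
    Files that cannot be parsed are skipped rather than raising an error.
    """
    root = Path(directory)
    if not root.is_dir():
        # Nothing has been crawled yet, or the path is wrong.
        return []

    items = []
    for path in sorted(root.iterdir()):
        if not path.is_file():
            continue
        try:
            with path.open(encoding="utf-8") as handle:
                items.append(json.load(handle))
        except (json.JSONDecodeError, UnicodeDecodeError):
            # Skip files that are not valid UTF-8 JSON.
            continue
    return items


if __name__ == "__main__":
    # Default locations taken from the README; adjust them if
    # SAVE_TWEET_PATH or SAVE_USER_PATH were changed in settings.py.
    tweets = load_saved_items("./Data/tweet/")
    users = load_saved_items("./Data/user/")
    print(f"loaded {len(tweets)} tweet files and {len(users)} user files")
```

Skipping unparsable files keeps a partially written or interrupted crawl from aborting the whole load; if strictness matters more, the `except` branch could re-raise instead.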
