# Introduction #
`TweetScraper` can get tweets from [Twitter Search](https://twitter.com/explore).
It is built on [Scrapy](http://scrapy.org/) without using [Twitter's APIs](https://dev.twitter.com/rest/public).
The crawled data is not as *clean* as the data obtained through the APIs, but the benefit is that you are not subject to the APIs' rate limits and restrictions. Ideally, you can get all the data from Twitter Search.
**WARNING:** please be polite and follow the [crawler's politeness policy](https://en.wikipedia.org/wiki/Web_crawler#Politeness_policy).
# Installation #
1. Install `conda`. You can get it from [miniconda](https://docs.conda.io/en/latest/miniconda.html). The tested Python version is `3.7`.
2. Install the Selenium Python bindings: https://selenium-python.readthedocs.io/installation.html. (Note: a `KeyError: 'driver'` error is caused by an incorrect setup.)
3. For Ubuntu or Debian users, run:
```
$ bash install.sh
$ conda activate tweetscraper
$ scrapy list
$ #If the output is 'TweetScraper', then you are ready to go.
```
The `install.sh` script creates a new environment named `tweetscraper` and installs all the dependencies (e.g., `firefox-geckodriver` and `firefox`).
# Usage #
1. Change the `USER_AGENT` in `TweetScraper/settings.py` to identify who you are:

        USER_AGENT = 'your website/e-mail'

2. In the root folder of this project, run a command like:

        scrapy crawl TweetScraper -a query="foo,#bar"

   where `query` is a list of keywords separated by commas and enclosed in double quotes (`"`). A keyword can be anything (a word, a hashtag, etc.) that you want to search for in [Twitter Search](https://twitter.com/search-home). `TweetScraper` will crawl the search results of the query and save the tweet content and user information.
3. With the default settings, tweets are saved to `./Data/tweet/` and user data to `./Data/user/`. The file format is JSON. To use another location, change `SAVE_TWEET_PATH` and `SAVE_USER_PATH` in `TweetScraper/settings.py`.
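
The steps above can be sketched in Python as follows. This is a minimal illustration, not part of `TweetScraper` itself: the helper names `build_crawl_command` and `load_saved_items` are hypothetical, and the loader assumes that each file in the output directory holds exactly one JSON object. The field names inside each object are not documented here, so inspect a saved file before relying on any of them.

```python
import json
from pathlib import Path


def build_crawl_command(keywords):
    """Return the argument list for the crawl command shown in step 2.

    The keywords are joined with commas, matching the `query` format.
    """
    query = ",".join(keywords)
    return ["scrapy", "crawl", "TweetScraper", "-a", f"query={query}"]


def load_saved_items(directory):
    """Parse every JSON file in `directory` and return the parsed objects.

    Assumption: each saved file contains a single JSON document.
    Files that cannot be decoded or parsed are skipped.
    """
    items = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        try:
            with path.open(encoding="utf-8") as handle:
                items.append(json.load(handle))
        except (json.JSONDecodeError, UnicodeDecodeError):
            # Skip files that are not valid JSON rather than aborting.
            continue
    return items
```

For example, `build_crawl_command(["foo", "#bar"])` produces an argument list that could be passed to `subprocess.run` from the project root, and `load_saved_items("./Data/tweet/")` would read the tweets saved under the default settings.
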
# Acknowledgement #
Keeping the crawler up to date requires continuous effort. Please support our work via [opencollective.com/tweetscraper](https://opencollective.com/tweetscraper).
# License #
TweetScraper is released under the [GNU GENERAL PUBLIC LICENSE, Version 2](https://github.com/jonbakerfish/TweetScraper/blob/master/LICENSE).