[Crawler] Building a Distributed Crawler with Scrapy: 1. Environment Setup
For version and system reasons I had never used Scrapy. A while ago I saw that Scrapy now supports Python 3, so I decided to experiment with a distributed crawler on a server. These are my notes. The OS is CentOS 7.
1. Install Python 3
Build dependencies:
yum install zlib-devel bzip2-devel openssl-devel ncurses-devel
Download and install Python 3:
wget https://www.python.org/ftp/python/3.6.0/Python-3.6.0.tgz
tar -xzvf Python-3.6.0.tgz
cd Python-3.6.0
./configure --prefix=/usr/local/python3
make && make install
sudo ln -s /usr/local/python3/bin/python3 /usr/bin/python3
Download and install pip3:
wget --no-check-certificate https://github.com/pypa/pip/archive/9.0.1.tar.gz
tar -zvxf 9.0.1.tar.gz
cd pip-9.0.1
python3 setup.py install
sudo ln -s /usr/local/python3/bin/pip /usr/bin/pip3
Upgrade pip:
pip3 install --upgrade pip
2. virtualenv
pip3 install -U virtualenv
mkdir workSpace
cd workSpace/
Create a virtual environment:
python3 -m venv MS  # MS is the environment's name; change it to whatever you like
Activate the virtual environment:
source MS/bin/activate
Exit the virtual environment:
deactivate
3. Redis
Strictly speaking, Redis does not have to be installed here, since Scrapy has its own companion Redis package (scrapy-redis); see the settings sketch at the end of this section.
wget http://labfile.oss.aliyuncs.com/files0422/redis-2.8.9.tar.gz
tar xvfz redis-2.8.9.tar.gz
cd redis-2.8.9
make
cd src
make install
Configure Redis to start with the system:
./utils/install_server.sh
Start:
/etc/init.d/redis_6379 start
Stop:
/etc/init.d/redis_6379 stop
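For the distributed part, the crawler nodes share their request queue through this Redis instance. A minimal settings sketch, assuming the scrapy-redis package (pip install scrapy-redis); the host and port values are illustrative:

# settings.py — wire Scrapy's scheduler and dupe filter to Redis via scrapy-redis
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"
SCHEDULER_PERSIST = True  # keep the shared queue between runs
REDIS_HOST = "127.0.0.1"  # the Redis server installed above
REDIS_PORT = 6379

Spiders that should pull their start URLs from Redis can subclass scrapy_redis.spiders.RedisSpider instead of scrapy.Spider.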
4. MySQL
Installation:
wget http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm
sudo rpm -ivh mysql-community-release-el7-5.noarch.rpm
yum update
sudo yum install mysql-server
sudo systemctl start mysqld
sudo mysql_secure_installation
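Once MySQL is up, crawled items can be written to it from Python. A minimal sketch, assuming the pymysql driver (pip install pymysql); the database, table, and credentials below are placeholders:

# store a crawled item into MySQL (database/table/credentials are placeholders)
import pymysql

conn = pymysql.connect(host="127.0.0.1", user="root",
                       password="your_password", database="crawler",
                       charset="utf8mb4")
try:
    with conn.cursor() as cur:
        cur.execute("INSERT INTO pages (url, title) VALUES (%s, %s)",
                    ("https://example.com", "Example"))
    conn.commit()
finally:
    conn.close()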
5. Django
Install Django:
pip install django
Common Django commands:
django-admin startproject
python manage.py startapp
python manage.py makemigrations
python manage.py migrate
python manage.py runserver
python manage.py createsuperuser
python manage.py collectstatic
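If the Django app is meant to expose the crawled data, a model along these lines could hold it; the model name and fields are hypothetical, not from the original post:

# models.py — a minimal model for storing crawled pages
from django.db import models

class CrawledPage(models.Model):
    url = models.URLField(max_length=500, unique=True)
    title = models.CharField(max_length=200, blank=True)
    body = models.TextField(blank=True)
    fetched_at = models.DateTimeField(auto_now_add=True)

After adding it, python manage.py makemigrations and python manage.py migrate (from the list above) create the table.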
6. Scrapy
First, prepare the environment:
yum install gcc libffi-devel openssl-devel libxml2 libxslt-devel libxml2-devel python-devel -y
Install easy_install (via python-setuptools), then lxml and Scrapy:
yum install python-setuptools
easy_install lxml
pip install scrapy
Common commands:
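The ones used day to day, mirroring the Django list above (project and spider names are placeholders):

scrapy startproject myproject
scrapy genspider myspider example.com
scrapy crawl myspider
scrapy shell "https://example.com"
scrapy list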
7. uWSGI
pip install uwsgi
Configure uwsgi.ini inside the Django project (a minimal sample is sketched below), then start and stop with:
uwsgi --ini uwsgi.ini
uwsgi --stop uwsgi.pid
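A minimal uwsgi.ini sketch; the project path and module name are placeholders, and pidfile is included so that the --stop command above works:

[uwsgi]
# project root and WSGI entry point (placeholder path/name)
chdir = /path/to/project
module = project.wsgi:application
# socket Nginx will connect to
socket = 127.0.0.1:8001
master = true
processes = 4
# pidfile makes "uwsgi --stop uwsgi.pid" work
pidfile = uwsgi.pid
# run in the background, logging to this file
daemonize = uwsgi.log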
8. Nginx
sudo rpm -Uvh http://nginx.org/packages/centos/7/noarch/RPMS/nginx-release-centos-7-0.el7.ngx.noarch.rpm
sudo yum install nginx
sudo systemctl start nginx.service
# start/stop commands
sudo nginx -s stop
sudo nginx
sudo /sbin/nginx -s stop
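To put Nginx in front of uWSGI, a server block along these lines goes into the Nginx config (the domain is a placeholder; the uwsgi_pass address must match the socket line in uwsgi.ini):

server {
    listen 80;
    server_name example.com;
    location / {
        # uwsgi_params ships with Nginx
        include uwsgi_params;
        # must match the socket in uwsgi.ini
        uwsgi_pass 127.0.0.1:8001;
    }
}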