[Crawler] Distributed Crawling with Scrapy: Part 1, Environment Setup

For version and OS reasons I had never used Scrapy. A while back I saw that Scrapy now supports Python 3, so I decided to experiment with a distributed crawler on a server; these are my notes. The system is CentOS 7.

1. Install Python 3

Install the build dependencies:

yum install zlib-devel bzip2-devel openssl-devel ncurses-devel

Download and install Python 3:

wget https://www.python.org/ftp/python/3.6.0/Python-3.6.0.tgz
tar -xzvf Python-3.6.0.tgz
cd Python-3.6.0
./configure --prefix=/usr/local/python3
make && make install
sudo ln -s /usr/local/python3/bin/python3 /usr/bin/python3
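
A quick sanity check that the interpreter works and that the build picked up the -devel headers installed above (these imports fail if a library was missing at compile time):

python3 --version
python3 -c "import ssl, bz2, zlib"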

Download and install pip3:

wget --no-check-certificate https://github.com/pypa/pip/archive/9.0.1.tar.gz
tar -zvxf 9.0.1.tar.gz
cd pip-9.0.1
python3 setup.py install
sudo ln -s /usr/local/python3/bin/pip /usr/bin/pip3

Upgrade pip:

pip3 install --upgrade pip

2. virtualenv

pip3 install -U virtualenv
mkdir workSpace
cd workSpace/

Create a virtual environment (note that python3 -m venv uses the standard-library venv module; the virtualenv package installed above is an equivalent alternative):

python3 -m venv MS # MS is the environment's name; pick any name you like

Activate the virtual environment:

source MS/bin/activate

Exit the virtual environment:

deactivate
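
A typical round trip, as a sketch (scrapy-redis here is only an example package):

source MS/bin/activate
pip install scrapy-redis # installs into MS/, not the system Python
pip list # shows only this environment's packages
deactivate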

3. Redis

You do not need to install a Redis client by hand, because the scrapy-redis package installed later brings its own Python bindings; distributed crawling still needs a running Redis server, though, which is set up here.

wget http://labfile.oss.aliyuncs.com/files0422/redis-2.8.9.tar.gz
tar xvfz redis-2.8.9.tar.gz
cd redis-2.8.9
make
cd src
make install

Configure Redis to start with the system (the script is interactive; the defaults set up an init script for port 6379):

./utils/install_server.sh

Start:

/etc/init.d/redis_6379 start
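
To confirm the server is up (redis-cli was installed by make install above):

redis-cli ping # a healthy server replies PONG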

Stop:

/etc/init.d/redis_6379 stop
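
Since this instance will coordinate the distributed spiders, the worker machines must be able to reach it. A quick check from a worker, with a placeholder IP standing in for the Redis host (if it fails, check the bind setting in the config file that install_server.sh created, typically /etc/redis/6379.conf):

redis-cli -h 192.168.1.100 -p 6379 ping # placeholder address; expect PONG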

4. MySQL

Installation:

wget http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm
sudo rpm -ivh mysql-community-release-el7-5.noarch.rpm
yum update
sudo yum install mysql-server
sudo systemctl start mysqld
sudo mysql_secure_installation
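
With the server running, it is convenient to create a database and user for the crawl data now; the names and password below are placeholders:

mysql -u root -p <<'EOF'
CREATE DATABASE crawler DEFAULT CHARACTER SET utf8mb4;
CREATE USER 'crawler'@'localhost' IDENTIFIED BY 'change-me';
GRANT ALL PRIVILEGES ON crawler.* TO 'crawler'@'localhost';
FLUSH PRIVILEGES;
EOF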

5. Django

Install Django:

pip install django

Common Django commands:

django-admin startproject
python manage.py startapp
python manage.py makemigrations
python manage.py migrate
python manage.py runserver
python manage.py createsuperuser
python manage.py collectstatic
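
Put together, a minimal session might look like this (crawler_admin and jobs are placeholder names):

django-admin startproject crawler_admin
cd crawler_admin
python manage.py startapp jobs
python manage.py migrate # creates the built-in tables (SQLite by default)
python manage.py createsuperuser # prompts for admin credentials
python manage.py runserver 0.0.0.0:8000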

6. Scrapy

First, prepare the environment:

yum install gcc libffi-devel openssl-devel libxml2 libxslt-devel libxml2-devel python-devel -y

Install easy_install, then lxml and Scrapy:

yum install python-setuptools
easy_install lxml
pip install scrapy
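
Scrapy by itself runs on a single node; the distributed part comes from the scrapy-redis package, which moves the scheduler and the duplicate filter into the Redis instance set up earlier. A minimal sketch, assuming a project named myproject (the settings keys are the ones scrapy-redis documents):

pip install scrapy-redis
cat >> myproject/myproject/settings.py <<'EOF'
# scrapy-redis: share the request queue and dedup set across machines
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"
REDIS_URL = "redis://127.0.0.1:6379"
EOF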

Related commands
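
The everyday ones (scrapy -h prints the authoritative list):

scrapy startproject # create a new project
scrapy genspider # generate a spider skeleton in a project
scrapy crawl # run a spider by name
scrapy shell # interactive scraping console
scrapy list # list the project's spiders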

7. uwsgi

pip install uwsgi

Configure uwsgi.ini inside the Django project.
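
A minimal sketch of such a file, assuming the project layout from the Django section (paths and module name are placeholders to adapt):

cat > uwsgi.ini <<'EOF'
[uwsgi]
; directory that contains manage.py
chdir = /path/to/crawler_admin
; the project's WSGI entry point
module = crawler_admin.wsgi:application
; nginx will proxy to this socket
socket = 127.0.0.1:8001
master = true
processes = 4
; used by "uwsgi --stop" below
pidfile = uwsgi.pid
; run in the background, logging here
daemonize = uwsgi.log
EOF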

uwsgi --ini uwsgi.ini
uwsgi --stop uwsgi.pid

8. nginx

sudo rpm -Uvh http://nginx.org/packages/centos/7/noarch/RPMS/nginx-release-centos-7-0.el7.ngx.noarch.rpm
sudo yum install nginx
sudo systemctl start nginx.service
# start/stop commands
sudo nginx -s stop
sudo nginx
sudo /sbin/nginx -s stop
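
To hand web requests to the uwsgi socket above, a minimal server block can be dropped into /etc/nginx/conf.d/ (server_name and the paths are placeholders):

sudo tee /etc/nginx/conf.d/django.conf > /dev/null <<'EOF'
server {
    listen 80;
    server_name example.com;  # replace with your host
    # static files collected by "manage.py collectstatic"
    location /static/ {
        alias /path/to/crawler_admin/static/;
    }
    # everything else goes to uwsgi
    location / {
        include uwsgi_params;  # shipped with the nginx package
        uwsgi_pass 127.0.0.1:8001;  # matches "socket" in uwsgi.ini
    }
}
EOF
sudo nginx -s reload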
