Is Google's crawler mainly written in C++?

12-28

I've heard it is mostly C++, with Python in a supporting role. Is that right? Python seems more convenient to write.

The Google founders' paper "The Anatomy of a Large-Scale Hypertextual Web Search Engine" shows that Google's original crawler was written in Python.
http://infolab.stanford.edu/pub/papers/google.pdf

4.3 Crawling the Web
In order to scale to hundreds of millions of web pages, Google has a fast distributed crawling system. A single URLserver serves lists of URLs to a number of crawlers (we typically ran about 3). Both the URLserver and the crawlers are implemented in Python. Each crawler keeps roughly 300 connections open at once. This is necessary to retrieve web pages at a fast enough pace. At peak speeds, the system can crawl over 100 web pages per second using four crawlers. This amounts to roughly 600K per second of data. A major performance stress is DNS lookup. Each crawler maintains a its own DNS cache so it does not need to do a DNS lookup before crawling each document. Each of the hundreds of connections can be in a number of different states: looking up DNS, connecting to host, sending request, and receiving response. These factors make the crawler a complex component of the system. It uses asynchronous IO to manage events, and a number of queues to move page fetches from state to state.
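To make the quoted design a bit more concrete, here is a minimal sketch in modern Python of the same ideas: one URL server handing out URLs, a few crawlers each with its own DNS cache, and asynchronous IO keeping many fetches in flight. This is not Google's code. The names `Crawler`, `crawl`, `fake_resolve` and `fake_fetch` are made up for illustration, the URL server is just an `asyncio.Queue`, and the network is replaced by stand-in coroutines so the example runs offline.

```python
import asyncio
from urllib.parse import urlsplit


class Crawler:
    """One crawler with its own DNS cache (hypothetical class, for illustration)."""

    def __init__(self, resolve):
        self.resolve = resolve   # async function: host -> IP address
        self.dns_cache = {}      # host -> Task that resolves to the IP
        self.dns_lookups = 0     # how many real lookups this crawler made

    async def lookup(self, host):
        # Cache the in-flight Task, so concurrent fetches for the same
        # host share a single lookup instead of each starting their own.
        if host not in self.dns_cache:
            self.dns_lookups += 1
            self.dns_cache[host] = asyncio.create_task(self.resolve(host))
        return await self.dns_cache[host]


async def crawl(urls, resolve, fetch, n_crawlers=3, max_connections=300):
    """Fetch all urls with n_crawlers crawlers, each holding up to
    max_connections fetches in flight. Returns (results, crawlers)."""
    url_server = asyncio.Queue()  # stands in for the paper's URLserver
    for url in urls:
        url_server.put_nowait(url)
    results = {}

    async def connection(crawler):
        # Each "connection" moves one page at a time through the states:
        # look up DNS, then connect/send/receive (all inside fetch here).
        while True:
            try:
                url = url_server.get_nowait()
            except asyncio.QueueEmpty:
                return
            ip = await crawler.lookup(urlsplit(url).hostname)
            results[url] = await fetch(ip, url)

    crawlers = [Crawler(resolve) for _ in range(n_crawlers)]
    await asyncio.gather(
        *(connection(c) for c in crawlers for _ in range(max_connections))
    )
    return results, crawlers


# Stand-ins for the network (assumptions, so the sketch runs offline).
async def fake_resolve(host):
    await asyncio.sleep(0)
    return "10.0.0.1"


async def fake_fetch(ip, url):
    await asyncio.sleep(0)
    return f"<html>{url}</html>"
```

It can be tried with something like `asyncio.run(crawl(urls, fake_resolve, fake_fetch))`. A real crawler would swap the two stand-ins for an actual resolver and HTTP client, and would also need politeness limits, retries and error handling, none of which are shown here.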

It is said to have since been rewritten in C++, but I have not found a definitive source.

Currently, yes.
Why did Google move from Python to C++ for use in its crawler?