WSJ Article Translation: Wall Street's Insatiable Lust: Data, Data, Data

Wall Street's Insatiable Lust: Data, Data, Data

華爾街的無限慾望:數據,數據,數據

'The opportunity we are chasing is that in all this huge data there are little nuggets of alpha gold'

我們所追尋的是在數據的海洋中的那些含金量極高的金塊

By Bradley Hope

A new species is prowling America's most obscure industry conferences: the data hunter.

一個新的物種在美國各種晦澀的行業論壇上徘徊:數據獵人。

Erik Haines, head of data and analytics at New York-based Guidepoint Global LLC, trawls the globe for meaningful data to sell to hedge-fund clients. One of his best strategies is to attend the most seemingly mundane gatherings, such as the Association for Healthcare Resource & Materials Management conference in San Diego last year, and the National Industrial Transportation League event in New Orleans.

Erik Haines是 Guidepoint Global LLC(一家設立於紐約的公司)數據與分析部門負責人,他的主要工作是篩選有價值的數據賣給對沖基金。一個行之有效的策略是參加看來很細分領域的聚會,比如去年在聖迭戈的健康資源與材料管理協會的論壇,和在新奧爾良的全國工業運輸聯盟的會議。

"I walk the floor, try to talk to companies and get a sense within an industry of who collects data that could provide a unique insight into that industry," he said.

我在會場穿行與企業們交談,試圖了解在這個行業中誰收集了能提供行業洞察的數據。

Hedge funds and other sophisticated investors are increasingly relying on intermediaries like Mr. Haines, 35 years old, as they seek insights into a company's sales and health that aren't readily available from conventional sources.

對沖基金和其他成熟的投資人越來越依賴像35歲的Haines先生這樣的數據中間商,以獲取從傳統渠道不易得到的、關於企業銷售與經營狀況的洞察。

The information ranges from crop yields calculated by satellite images and linguistic analyses of speeches by CEOs to credit-card transactions and monitoring of sentiment about companies on social media.

這些信息包羅萬象,從利用衛星圖像計算的農作物產量、對CEO演講的語言學分析,到信用卡交易數據,以及對社交媒體上公司相關情緒的監測。

The accuracy of this data is a subject of debate within the investment world, with some arguing the information is based around samples riddled with biases and errors.

這些數據的準確度在投資界還存在爭論,一些人聲稱這些信息是基於充斥著偏見與錯誤的樣本生成的。

Data hunters scour the business world for companies that have data useful for predicting the stock prices of other companies. For instance, a company that processes transactions at stores could have market-moving information on how certain products or brands are selling or a company that provides software to hospitals could give insights into how specific medical devices are being used.

數據獵人在商業世界四處搜尋,尋找那些擁有可用於預測其他公司股價的數據的公司。例如,一家為門店處理交易的公司可能掌握某些產品或品牌銷售情況的信息,足以影響市場;一家為醫院提供軟體的公司則可以洞察特定醫療器械的使用情況。

Gone are the days when a hedge fund would call up a random sampling of Aéropostale stores to ask managers about sales or simply visit big-box retailers to get a feel for the traffic.

對沖基金隨機致電幾家Aéropostale(美國青少年服飾品牌)門店向店長詢問銷售情況,或者乾脆去大賣場感受客流的日子,已經一去不復返了。

In one recent example, Mr. Haines discovered a mobile advertising company that also collected data on the type of device someone was using when displaying an ad to them. The data helped estimate iPhone sales ahead of Apple Inc.'s announcements in 2011 and 2012, and it was lucrative for Mr. Haines's old company, Quanton Data.

最近的一個例子是,Haines先生發現一家移動廣告公司在向用戶展示廣告的同時,也收集了用戶所用設備類型的數據。這些數據幫助他們在蘋果公司2011年和2012年發布業績之前估算iPhone的銷量,也讓Haines的老東家Quanton Data獲利頗豐。

He and the team from Quanton joined Guidepoint, a company that has traditionally provided experts and survey data to customers, earlier this summer.

今年夏初,他和Quanton的團隊加入了Guidepoint,這家公司傳統上為客戶提供專家資源和調研數據。

Some hedge funds have built data-hunting teams internally, especially so-called quants whose strategies rely entirely on finding patterns in large sets of data. Quants typically analyze market data (prices and volume over time) but increasingly are taking those skills to this type of data, which is called "exhaust," because it is a secondary result of a company's main business.

一些對沖基金在內部組建了數據搜尋團隊,尤其是所謂的「寬客」(量化投資者),他們的策略完全依賴於在大型數據集中尋找規律。寬客通常分析市場數據,即隨時間變化的價格和成交量,但他們正越來越多地把這些技能用於這類被稱為「尾氣」(exhaust)的數據,之所以這樣叫,是因為它是企業主營業務的副產品。

WorldQuant LLC, a quantitative hedge fund based in Connecticut, has a team that reviews hundreds of data sets a year and works to bring online as many as possible that provide some value, according to a person familiar with its strategy. Its staff of scientists and mathematicians then go to work on the data to see if it helps predict revenues at companies or other market phenomena.

據一位了解其策略的人士透露,位於康涅狄格州的量化對沖基金WorldQuant LLC有一個團隊,每年評估數百個數據集,並盡可能多地把其中有價值的數據集接入使用。隨後,公司的科學家和數學家對這些數據展開研究,看它們是否有助於預測公司營收或其他市場現象。

A host of startups also are trying to make it easier for funds without high-powered data-science staffers to get the same insights. One, called Quandl Inc., based in Toronto, offers a platform that includes traditional market data alongside several "alternative" data.

一批創業企業都致力於使那些沒有高大上的數據科學家團隊的基金更容易獲得對市場的洞見。一個位於多倫多的企業,Quandl Inc提供了一個數據平台,包括傳統的市場數據,同時提供多種「替代性」數據。

"The opportunity we are chasing is that in all this huge data there are little nuggets of alpha gold," said Tammer Kamel, its founder and CEO.

Tammer Kamel,這家公司的創始人和CEO說:我們所追尋的是在數據的海洋中的那些含金量極高的金塊

The firm struck a deal with a large insurance company to find out every day what kinds of cars received insurance policies, a possible indicator of how sales are going for automobile manufacturers.

該公司與一家大型保險公司達成協議,每天獲知哪些類型的汽車投了保,這可能是反映各汽車廠商銷售情況的一項指標。

Another deal is with a company that surveys construction permits across county municipal offices, which is a "proxy for construction activity," he said. While there are indexes that compile official construction numbers from the same data, the company's goal is to be ahead of these indexes and take advantage of the government's infrequent updates.

另一項協議是與一家在各縣市政機構調查建築許可證的公司達成的,他說,這些數據是「建築活動的代理指標」。雖然已有指數根據同樣的數據匯編官方建築統計,但該公司的目標是搶在這些指數之前,利用政府更新頻率低的特點。

Most data in the world is fairly useless for predicting the prices of stocks and other securities, which makes data hunting all the more difficult, he said. Some cite social media as a poor predictor of company behavior.

世界上大多數的數據對預測股票和其他證券的價格是無用的,這使得獵取數據變得更加困難,有的人認為用社交媒體數據預測公司行為準確度不高。

There are also companies set up to create exhaust. In those cases, often a person's data is the price of a free phone application or service.

也有一些公司就是為了產生這類作為副產品的數據(exhaust)而設立的。在這種情況下,個人數據往往就是使用免費手機應用或服務的代價。

For example, app provider Slice Technologies Inc. lets users track the arrival of packages to their homes in its signature Slice app or block spam through another service it owns called Unroll.me without charge.

例如,應用提供商Slice Technologies Inc.讓用戶免費使用其招牌應用Slice跟蹤包裹的送達情況,或通過旗下另一項名為Unroll.me的服務攔截垃圾郵件。

But in exchange for those services, about four million users allow the company to read their emails. Slice, in turn, also analyzes receipts and other data in a person's email which it packages into anonymized data for advertisers and hedge funds. It might show Amazon.com Inc. selling more of a particularly profitable item or an increase in Netflix subscriptions, which investors can use as a factor in their trades.

但是,作為獲得這些服務的交換,約400萬用戶允許該公司閱讀他們的電子郵件。Slice則分析用戶郵件中的收據和其他數據,打包成匿名化數據提供給廣告主和對沖基金。這些數據可能顯示亞馬遜某種利潤特別高的商品銷量上升,或者Netflix的訂閱量增加,投資人可以把它們作為交易決策的一個因素。

Slice users agree to let the company use their data for other purposes, provided it is anonymized, when they sign up.

在用戶註冊時,Slice的用戶同意公司可以使用脫敏後的數據用於其他用途。

The data can also be useful in private-equity transactions by giving investors information about sales at private companies such as Uber Technologies Inc. and Airbnb Inc., said Jaimee Minney, vice president of communications at Slice. "We can see average fares, number of customers and demographics," she said.

Slice傳播副總裁Jaimee Minney表示,這些數據還能讓投資者了解Uber Technologies Inc.和Airbnb Inc.等非上市公司的銷售情況,因而在私募股權交易中也很有用。她說:「我們可以看到平均車費、客戶數量和人群特徵。」

Uber declined to comment and Airbnb didn't respond to a request for comment.

Uber拒絕對此發表評論,Airbnb沒有對此進行回應。

(The article was originally published on The Wall Street Journal's website on September 13; author: Bradley Hope.)

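Translator's Afterword

I first learned of this article through Wang Yuquan's subscription channel on the Dedao (得到) app. Out of curiosity I later found the original online, and used translating it as a chance to read it carefully and think it over.

Big-data startups in China are on the rise. In terms of data sources, many investors and founders are currently focused on data resources opened up by the government, data obtained by purchasing API access, and applications built on those resources. There are still few cases of accumulating unique data resources, combining them with industry knowledge, and applying the data across industry boundaries.

Data applications and data services are a peculiar industry. On the internet, the application comes first and accumulation follows; with data, accumulation comes first and application follows. Without a unique data source and without accumulated know-how for raising the value density of data, many applications look dazzling but do not stand up to close scrutiny, still less win paying customers and users.

Traditional enterprises and organizations actually hold a great deal of data. Collecting, cleaning, and processing that data, fusing it with other data resources, and combining it with industry knowledge to create new applications is the future direction of big-data entrepreneurship.

This process of collecting, cleaning, and processing is long and painful. The field may not produce fast-growing companies, but one can start by setting an achievable small goal: becoming a small but excellent company.

PS: My English is limited, so corrections and criticism are welcome.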