Kafka Summit 2017 Talk Summaries (Pipeline Track)

To download all of the videos and slides, follow the WeChat official account (bigdata_summit) and tap the 「視頻下載」 (Video Download) menu item.

Billions of Messages a Day – Yelp's Real-time Data Pipeline

by Justin Cunningham, Technical Lead, Software Engineering, Yelp

video, slide

Yelp moved quickly into building out a comprehensive service-oriented architecture, and before long had over 100 data-owning production services. Distributing data across an organization creates a number of issues, particularly around the cost of joining disparate data sources, dramatically increasing the complexity of bulk data applications. Straightforward solutions like bulk data APIs and sharing data snapshots have significant drawbacks. Yelp's Data Pipeline makes it easier for these services to communicate with each other, provides a framework for real-time data processing, and facilitates high-performance bulk data applications – making large SOAs easier to work with. The Data Pipeline provides a series of guarantees that make it easy to create universal data producers and consumers that can be mashed up into interesting real-time data flows. We'll show how a few simple services at Yelp lay the foundation that powers everything from search to our experimentation framework.
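The abstract does not include code, but the "universal producer and consumer" idea is easy to picture. Below is a minimal sketch using the open-source confluent-kafka Python client; the topic, field names, and consumer group are invented for illustration and are not Yelp's actual pipeline API.

```python
# Minimal sketch of a "universal" producer/consumer pair, assuming the
# confluent-kafka Python client; names are illustrative only.
import json
from confluent_kafka import Producer, Consumer

producer = Producer({"bootstrap.servers": "localhost:9092"})

def publish_review_event(business_id, rating):
    # Any service publishes to a well-known topic in a shared format,
    # so downstream consumers can be written once and reused.
    event = {"business_id": business_id, "rating": rating}
    producer.produce("review-events", key=business_id, value=json.dumps(event))
    producer.flush()

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "search-indexer",        # each application uses its own group
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["review-events"])

publish_review_event("biz-123", 5)
msg = consumer.poll(5.0)
if msg is not None and msg.error() is None:
    print(json.loads(msg.value()))
consumer.close()
```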


Body Armor for Distributed Systems

by Michael Egorov, Co-founder and CTO, NuCypher

video, slide

We show a way to make Kafka end-to-end encrypted: data is only ever decrypted on the producer and consumer side, and is never decrypted broker-side. Importantly, every Kafka client has its own encryption keys; there is no pre-shared encryption key. Our approach can be compared to TLS implemented for more than two parties connected together.
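As a rough illustration of the broker-blind property only, here is a sketch in which each client holds its own key pair and the payload is encrypted producer-side and decrypted consumer-side, using the PyNaCl library. This is a simple two-party analogy and does not reproduce NuCypher's actual multi-party scheme.

```python
# Sketch of client-side payload encryption so the broker never sees plaintext.
# Assumes the PyNaCl library; this is an illustration, not NuCypher's scheme.
from nacl.public import PrivateKey, Box

# Each client generates its own key pair; no symmetric key is pre-shared.
producer_key = PrivateKey.generate()
consumer_key = PrivateKey.generate()

# Producer side: encrypt to the consumer's public key before calling produce().
sealing_box = Box(producer_key, consumer_key.public_key)
ciphertext = sealing_box.encrypt(b'{"user_id": 42, "action": "login"}')
# producer.produce("events", value=ciphertext)   # broker stores ciphertext only

# Consumer side: decrypt after polling the record off the topic.
opening_box = Box(consumer_key, producer_key.public_key)
plaintext = opening_box.decrypt(ciphertext)
print(plaintext)
```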


DNS for Data: The Need for a Stream Registry

by Praveen Hirsave, Director Cloud Engineering, HomeAway

video, slide

As organizations increasingly adopt streaming platforms such as Kafka, the need for visibility and discovery has become paramount. With the advent of self-service streaming and analytics, the need to increase overall speed, not only time-to-signal but also time-to-production, is increasingly becoming the difference between winners and losers. Beyond Kafka being at the core of successful streaming platforms, there is a need for a stream registry. Come to this session to find out how HomeAway is solving this with a "just right" approach to governance.
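The abstract stays at the concept level. As a purely hypothetical sketch of what "DNS for data" could look like, the following resolves a logical stream name to its physical cluster, topic, schema subject, and owning team; the data model and names are invented and are not HomeAway's implementation.

```python
# Hypothetical stream-registry lookup ("DNS for data"); names are invented.
from dataclasses import dataclass

@dataclass
class StreamRecord:
    cluster: str          # which Kafka cluster hosts the stream
    topic: str            # physical topic name
    schema_subject: str   # where to find the schema, e.g. in a schema registry
    owner: str            # team responsible for the stream

REGISTRY = {
    "bookings.created": StreamRecord(
        cluster="kafka-prod-us-east.example.com:9092",
        topic="bookings.created.v1",
        schema_subject="bookings.created-value",
        owner="bookings-team",
    ),
}

def resolve(stream_name: str) -> StreamRecord:
    """Resolve a logical stream name to its physical location, like a DNS lookup."""
    return REGISTRY[stream_name]

record = resolve("bookings.created")
print(record.cluster, record.topic)
```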


Efficient Schemas in Motion with Kafka and Schema Registry

by Pat Patterson, Community Champion, StreamSets Inc.

video, slide

Apache Avro allows data to be self-describing, but carries an overhead when used with message queues such as Apache Kafka. Confluent's open source Schema Registry integrates with Kafka to allow Avro schemas to be passed 'by reference', minimizing overhead, and can be used with any application that uses Avro. Learn about Schema Registry, using it with Kafka, and leveraging it in your application.
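To make "passing schemas by reference" concrete, the sketch below hand-builds Confluent's documented wire format: a magic byte, the 4-byte schema ID obtained from Schema Registry's REST API, then the Avro binary body. It assumes the fastavro and requests libraries; the registry URL and schema are examples only.

```python
# Sketch of passing an Avro schema "by reference": register the schema once,
# then put only its numeric ID on each message (magic byte 0x00 + 4-byte
# schema ID + Avro binary payload). URL and schema are examples only.
import io, json, struct
import requests
from fastavro import parse_schema, schemaless_writer

schema = {
    "type": "record", "name": "Click",
    "fields": [{"name": "user_id", "type": "long"},
               {"name": "url", "type": "string"}],
}

# Register the schema once; Schema Registry returns a small integer ID.
resp = requests.post(
    "http://localhost:8081/subjects/clicks-value/versions",
    headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    data=json.dumps({"schema": json.dumps(schema)}),
)
schema_id = resp.json()["id"]

# Each message carries the 5-byte reference instead of the full schema text.
buf = io.BytesIO()
buf.write(struct.pack(">bI", 0, schema_id))             # magic byte + schema ID
schemaless_writer(buf, parse_schema(schema), {"user_id": 42, "url": "/home"})
payload = buf.getvalue()
# producer.produce("clicks", value=payload)
```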


From Scaling Nightmare to Stream Dream: Real-time Stream Processing at Scale

by Amy Boyle, Software Engineer, New Relic

video, slide

On the events pipeline team at New Relic, Kafka is the thread that stitches our micro-service architecture together. We receive billions of monitoring events an hour, which customers rely on us to alert on in real time. Facing more than tenfold growth in the system, learn how we avoided a costly scaling nightmare by switching to a streaming system based on Kafka. We follow a DevOps philosophy at New Relic, so I have a personal stake in how well our systems perform: if evaluation deadlines are missed, I lose sleep and customers lose trust. Without necessarily setting out to do so from the start, we've gone all in, using Kafka as the backbone of an event-driven pipeline, as a datastore, and for streaming updates to the system. Hear about what worked for us, what challenges we faced, and how we continue to scale our applications.


How Blizzard Used Kafka to Save Our Pipeline (and Azeroth)

by Jeff Field, Systems Engineer, Blizzard

video, slide

When Blizzard started sending gameplay data to Hadoop in 2013, we went through several iterations before settling on Flume agents in many data centers around the world reading from RabbitMQ and writing to central Flume agents in our Los Angeles data center. While this worked at first, by 2015 we were hitting problems scaling to the number of events required. This is how we used Kafka to save our pipeline.


Kafka Connect Best Practices – Advice from the Field

by Randall Hauch, Engineer, Confluent

video, slide

This talk will review the Kafka Connect Framework and discuss building data pipelines using the library of available Connectors. We'll deploy several data integration pipelines and demonstrate:

best practices for configuring, managing, and tuning the connectors

tools to monitor data flow through the pipeline

using Kafka Streams applications to transform or enhance the data in flight
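As a small illustration of the first point, a connector can be created and monitored through the Kafka Connect REST API. The sketch below uses the FileStreamSource connector that ships with Apache Kafka; the connector name, topic, and file path are placeholders, not the configuration used in the talk.

```python
# Sketch of creating and monitoring a connector via the Kafka Connect REST API;
# connector name, topic, and file path are placeholders.
import requests

connector = {
    "name": "demo-file-source",
    "config": {
        # FileStreamSource ships with Apache Kafka and is handy for demos.
        "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
        "tasks.max": "1",
        "file": "/tmp/demo-input.txt",
        "topic": "demo-lines",
    },
}

resp = requests.post("http://localhost:8083/connectors", json=connector)
resp.raise_for_status()

# The same REST API exposes status, one simple way to monitor the data flow.
status = requests.get("http://localhost:8083/connectors/demo-file-source/status")
print(status.json())
```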


One Data Center is Not Enough: Scaling Apache Kafka Across Multiple Data Centers

by Gwen Shapira, Product Manager, Confluent

video, slide

You have made the transition from single machines and one-off solutions to distributed infrastructure in your data center powered by Apache Kafka. But what if one data center is not enough? In this session, we review resilient data pipelines with Apache Kafka that span multiple data centers. We provide an overview of best practices and common patterns including key areas such as architecture and data replication as well as disaster scenarios and failure handling.
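The abstract does not prescribe tooling, but conceptually, cross-datacenter replication boils down to consuming from one cluster and producing to another, which is what tools such as MirrorMaker automate. A toy sketch with the confluent-kafka Python client follows; the cluster addresses and topic are placeholders.

```python
# Toy sketch of cross-datacenter replication: consume from the source cluster,
# produce to the destination cluster. Real deployments use tools like
# MirrorMaker; addresses and topic here are placeholders.
from confluent_kafka import Consumer, Producer

source = Consumer({
    "bootstrap.servers": "kafka.dc1.example.com:9092",
    "group.id": "dc2-replicator",
    "auto.offset.reset": "earliest",
})
source.subscribe(["orders"])

destination = Producer({"bootstrap.servers": "kafka.dc2.example.com:9092"})

try:
    while True:
        msg = source.poll(1.0)
        if msg is None or msg.error():
            continue
        # Preserve key and payload so consumers in DC2 see the same data.
        destination.produce(msg.topic(), key=msg.key(), value=msg.value())
        destination.poll(0)   # serve delivery callbacks, free queue space
finally:
    destination.flush()
    source.close()
```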


Recommended reading:

Spark Summit Europe 2017 Talk Summaries (Streaming Track)
Apache Beam: The Next Generation Standard for Big Data Processing

TAG: Kafka | Hadoop | Stream Processing