Kafka Summit 2017 Talk Summaries (Stream Processing Category)
To download all videos and slides, follow the WeChat account (bigdata_summit) and tap the "Video Download" menu.
Building Event-Driven Services with Stateful Streams
by Benjamin Stopford, Engineer, Confluent
video, slide Event-driven services come in many shapes and sizes, from tiny event-driven functions that dip into an event stream right through to heavy, stateful services that can facilitate request-response. This practical talk makes the case for building this style of system with stream processing tools, and walks through a number of patterns for how to actually put these things together.
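The spectrum the abstract describes — from stateless event-driven functions to stateful services that can also answer requests — can be sketched in a few lines of plain Python. This is an illustrative sketch only, not code from the talk; the event shape and service names are made up:

```python
# A tiny stateless event-driven function: reacts to each event, keeps nothing.
def notify_on_large_order(event):
    if event["amount"] > 1000:
        return f"alert: large order {event['order_id']}"
    return None

# A heavier stateful service: folds the same event stream into state
# that it can later serve over request-response.
class OrderTotalsService:
    def __init__(self):
        self.totals = {}  # customer -> running total

    def on_event(self, event):
        c = event["customer"]
        self.totals[c] = self.totals.get(c, 0) + event["amount"]

    def handle_request(self, customer):
        return self.totals.get(customer, 0)

svc = OrderTotalsService()
svc.on_event({"order_id": 1, "customer": "acme", "amount": 700})
svc.on_event({"order_id": 2, "customer": "acme", "amount": 500})
svc.handle_request("acme")  # -> 1200
```

In a real deployment both consumers would read the same Kafka topic; only the stateful one materializes a view it can query.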
Building Stateful Financial Applications with Kafka Streams
by Charles Reese, Senior Software Engineer, Funding Circle
video, slide At Funding Circle, we are building a global lending platform with Apache Kafka and Kafka Streams to handle high-volume, real-time processing with rapid clearing times similar to a stock exchange. In this talk, we will provide an overview of our system architecture and summarize key results in edge-service connectivity, idempotent processing, and migration strategies.
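Idempotent processing — one of the results the talk summarizes — is commonly achieved by tracking already-applied event IDs so that redelivered events have no effect. A minimal pure-Python sketch of the pattern (not Funding Circle's actual code; the event shape and ledger class are illustrative):

```python
class IdempotentLedger:
    """Applies each payment event at most once, keyed by event_id."""

    def __init__(self):
        self.balance = 0
        self._seen = set()  # IDs of events already applied

    def apply(self, event):
        # Redelivered events (same event_id) are silently skipped,
        # so reprocessing a stream never double-counts a payment.
        if event["event_id"] in self._seen:
            return self.balance
        self._seen.add(event["event_id"])
        self.balance += event["amount"]
        return self.balance

ledger = IdempotentLedger()
ledger.apply({"event_id": "e1", "amount": 100})
ledger.apply({"event_id": "e2", "amount": 50})
ledger.apply({"event_id": "e1", "amount": 100})  # redelivery: no effect
```

In a Kafka deployment the seen-ID set would itself live in a fault-tolerant state store so the guarantee survives restarts.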
Fast Data in Supply Chain Planning
by Jeroen Soeters, Lead Developer, ThoughtWorks
video, slide We are migrating one of the top three consumer packaged goods companies from a batch-oriented systems architecture to a streaming microservices platform. In this talk I'll explain how we leverage the Lightbend reactive stack and Kafka to achieve this, and how the four Kafka APIs fit into our architecture. I'll also explain why Kafka Streams <3 Enterprise Integration Patterns.
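One reason stream processing pairs well with Enterprise Integration Patterns is that classic patterns such as the content-based router map directly onto routing records between topics. An illustrative pure-Python sketch (the topic names and predicates are invented, not from the talk):

```python
def route(event, routes, default="dead-letter"):
    """Content-based router (a classic EIP): send each event to the
    first topic whose predicate matches, else to a default topic."""
    for topic, predicate in routes:
        if predicate(event):
            return topic
    return default

routes = [
    ("orders-eu", lambda e: e.get("region") == "EU"),
    ("orders-us", lambda e: e.get("region") == "US"),
]

route({"region": "EU", "sku": "A-1"}, routes)  # -> "orders-eu"
```

In Kafka Streams the same shape appears as branching a KStream on predicates, with each branch written to its own topic.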
Kafka Stream Processing for Everyone with KSQL
by Nick Dearden, Director of Engineering, Confluent
video, slide The rapidly expanding world of stream processing can be daunting, with new concepts (various types of time semantics, windowed aggregates, changelogs, and so on) and programming frameworks to master. KSQL is a new open-source project which aims to simplify all this and make stream processing available to everyone.
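Of the concepts listed, windowed aggregates are a good example of what KSQL hides behind a single statement. A pure-Python sketch of a tumbling-window count — the window size and event shape are illustrative, and this is the underlying idea rather than KSQL's implementation:

```python
from collections import defaultdict

def tumbling_window_counts(events, window_ms=60_000):
    """Count events per key per fixed, non-overlapping time window.
    Each event is (key, timestamp_ms); the window is identified by
    its start time, found by rounding the timestamp down."""
    counts = defaultdict(int)
    for key, timestamp_ms in events:
        window_start = (timestamp_ms // window_ms) * window_ms
        counts[(key, window_start)] += 1
    return dict(counts)

events = [("page_a", 5_000), ("page_a", 59_000), ("page_a", 61_000)]
tumbling_window_counts(events)
# first two events land in the window starting at 0, the third at 60_000
```

In KSQL this whole computation collapses to roughly a `SELECT ... GROUP BY` with a tumbling-window clause over a stream.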
Portable Streaming Pipelines with Apache Beam
by Frances Perry, Software Engineer, Google
video, slide Much as SQL stands as a lingua franca for declarative data analysis, Apache Beam aims to provide a portable standard for expressing robust, out-of-order data processing pipelines in a variety of languages across a variety of platforms. By cleanly separating the user's processing logic from details of the underlying execution engine, the same pipelines will run on any Apache Beam runtime environment, whether it's on-premise or in the cloud, on open-source frameworks like Apache Spark or Apache Flink, or on managed services like Google Cloud Dataflow. In this talk, I will: briefly introduce the capabilities of the Beam model for data processing and its integration with IO connectors like Apache Kafka; discuss the benefits Beam provides regarding portability and ease of use; demo the same Beam pipeline running on multiple runners in multiple deployment scenarios (e.g. Apache Flink on Google Cloud, Apache Spark on AWS, Apache Apex on-premise); and give a glimpse at some of the challenges Beam aims to address in the future.
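The core of Beam's portability story is that a pipeline is just data — a graph of transforms — which different runners can then execute. A toy pure-Python illustration of that separation (this is deliberately not the Beam SDK's actual API; `Pipeline` and `LocalRunner` here are invented for the sketch):

```python
class Pipeline:
    """A pipeline is a recorded list of transforms; it does not
    execute anything itself."""

    def __init__(self):
        self.transforms = []

    def map(self, fn):
        self.transforms.append(("map", fn))
        return self

    def filter(self, fn):
        self.transforms.append(("filter", fn))
        return self

class LocalRunner:
    """One possible execution engine. A distributed runner (Flink,
    Spark, Dataflow in Beam's case) would interpret the identical
    pipeline definition on its own infrastructure."""

    def run(self, pipeline, data):
        for kind, fn in pipeline.transforms:
            data = map(fn, data) if kind == "map" else filter(fn, data)
        return list(data)

p = Pipeline().map(lambda x: x * 2).filter(lambda x: x > 4)
LocalRunner().run(p, [1, 2, 3])  # -> [6]
```

The point of the sketch: `p` is defined once and never mentions the runner, which is exactly the property Beam standardizes.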
Query the Application, Not a Database: "Interactive Queries" in Kafka's Streams API
by Matthias Sax, Engineer, Confluent
video, slide Kafka Streams allows you to build scalable streaming apps without a cluster. This "cluster-to-go" approach is extended by a "DB-to-go" feature: Interactive Queries let you directly query an app's internal state, eliminating the need for an external database to access this data. This avoids redundantly stored data and database update latency, and simplifies the overall architecture, e.g. for microservices.
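The "DB-to-go" idea is that the app's own state store doubles as the queryable view. A toy pure-Python sketch of the pattern — this is not the Kafka Streams Interactive Queries API, and the in-process dict stands in for a fault-tolerant state store:

```python
class WordCountApp:
    """Consumes a stream of words, materializes counts in a local
    state store, and answers point queries from that store directly,
    with no external database in the picture."""

    def __init__(self):
        self._store = {}

    def process(self, word):
        # Stream-processing side: fold each record into local state.
        self._store[word] = self._store.get(word, 0) + 1

    def query(self, word):
        # "Interactive query" side: read the app's internal state.
        return self._store.get(word, 0)

app = WordCountApp()
for w in ["kafka", "streams", "kafka"]:
    app.process(w)
app.query("kafka")  # -> 2
```

In the real API the store is replicated via a changelog topic and queries may be routed to whichever instance owns the key.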
Real-Time Document Rankings with Kafka Streams
by Hunter Kelly, Senior Software/Data Engineer, Zalando
video, slide The HITS algorithm creates two scores for each document: one for "hubbiness", the other for "authority". Usually this is done as a batch operation, working on all the data at once. However, with careful consideration, it can be implemented in a streaming architecture using KStreams and KTables, allowing efficient real-time sampling of rankings at a frequency appropriate to the specific use case.
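For reference, the batch form of HITS that the talk reworks into a streaming topology is a short power iteration over the link graph. A minimal pure-Python version (the example graph and iteration count are illustrative):

```python
def hits(edges, iterations=20):
    """Compute hub and authority scores by power iteration.
    edges: list of (source, target) links between documents."""
    nodes = {n for e in edges for n in e}
    hubs = {n: 1.0 for n in nodes}
    auths = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of hub scores of documents linking to you.
        auths = {n: sum(hubs[s] for s, t in edges if t == n) for n in nodes}
        norm = sum(v * v for v in auths.values()) ** 0.5
        auths = {n: v / norm for n, v in auths.items()}
        # Hubbiness: sum of authority scores of documents you link to.
        hubs = {n: sum(auths[t] for s, t in edges if s == n) for n in nodes}
        norm = sum(v * v for v in hubs.values()) ** 0.5
        hubs = {n: v / norm for n, v in hubs.items()}
    return hubs, auths

edges = [("a", "b"), ("a", "c"), ("d", "b")]
hubs, auths = hits(edges)
# "a" is the strongest hub (it links to both authorities);
# "b" is the strongest authority (two hubs link to it)
```

The streaming variant keeps the link graph and the two score tables as KTables and re-runs the update incrementally as edges arrive.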
Streaming Processing in Python – 10 ways to avoid summoning Cuthulu
by Holden Karau, Principal Software Engineer, IBM
video, slide <3 Python & want to process data from Kafka? This talk will look at how to make this awesome. In many systems the traditional approach involves first reading the data into the JVM and then passing it to Python, which can be a little slow and, on a bad day, results in something almost impossible to debug. This talk will look at how to be more awesome in Spark and how to do this in Kafka Streams.