Debezium Blog

I’m happy to announce that Debezium 0.2.1 is now available. The MySQL connector has been significantly improved and is now able to monitor and produce change events for HA MySQL clusters using GTIDs, perform a consistent snapshot when starting up the first time, and has a completely redesigned event message structure that provides a ton more information with every event. Our change log has all the details about bugs, enhancements, new features, and backward compatibility notices. We’ve also updated our tutorial.

Update (Oct. 11 2019): An alternative, and much simpler, approach for running Debezium (and Apache Kafka and Kafka Connect in general) on Kubernetes is to use a K8s operator such as Strimzi. You can find instructions for the set-up of Debezium on OpenShift here, and similar steps apply for plain Kubernetes.
Our Debezium Tutorial walks you step by step through using Debezium by installing, starting, and linking together all of the Docker containers running on a single host machine. Of course, you can use things like Docker Compose or your own scripts to make this easier, although that would just automating running all the containers on a single machine. What you really want is to run the containers on a cluster of machines. In this blog, we’ll run Debezium using a container cluster manager from Red Hat and Google called Kubernetes.
Kubernetes is a container (Docker/Rocket/Hyper.sh) cluster management tool. Like many other popular cluster management and compute resource scheduling platforms, Kubernetes' roots are in Google, who is no stranger to running containers at scale. They start, stop, and cluster 2 billion containers per week and they contributed a lot of the Linux kernel underpinnings that make containers possible. One of their famous papers talks about an internal cluster manager named Borg. With Kubernetes, Google got tired of everyone implementing their papers in Java so they decided to implement this one themselves :)
Kubernetes is written in Go-lang and is quickly becoming the de-facto API for scheduling, managing, and clustering containers at scale. This blog isn’t intended to be a primer on Kubernetes, so we recommend heading over to the Getting Started docs to learn more about Kubernetes.

When our MySQL connector is reading the binlog of a MySQL server or cluster, it parses the DDL statements in the log and builds an in-memory model of each table’s schema as it evolves over time. This process is important because the connector generates events for each table using the definition of the table at the time of each event. We can’t use the database’s current schema, since it may have changed since the point in time (or position in the log) where the connector is reading.
Parsing DDL of MySQL or any other major relational database can seem to be a daunting task. Usually each DBMS has a highly-customized SQL grammar, and although the data manipulation language (DML) statements are often fairly close the standards, the data definition language (DDL) statements are usually less so and involve more DBMS-specific features.
So given this, why did we write our own DDL parser for MySQL? Let’s first look at what Debezium needs a DDL parser to do.

As you may have noticed, we have a new website with documentation, a blog, and information about the Debezium community and how you can contribute. Let us know what you think, and contribute improvements.

Debezium is a distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.
Now the good news — Debezium 0.1 is now available and includes several significant features:
-
A connector for MySQL to monitor MySQL databases. It’s a Kafka Connect source connector, so simply install it into a Kafka Connect service (see below) and use the service’s REST API to configure and manage connectors to each DBMS server. The connector reads the MySQL binlog and generates data change events for every committed row-level modification in the monitored databases. The MySQL connector generates events based upon the tables' structure at the time the row is changed, and it automatically handles changes to the table structures.
-
A small library so applications can embed any Kafka Connect connector and consume data change events read directly from the source system. This provides a much lighter weight system (since Zookeeper, Kafka, and Kafka Connect services are not needed), but as a consequence is not as fault tolerant or reliable since the application must maintain state normally kept inside Kafka’s distributed and replicated logs. Thus the application becomes completely responsible for managing all state.