It’s with great excitement that I’m announcing the release of Debezium 0.8.0.Beta1!

This release brings many exciting new features as well as bug fixes, e.g. the first drop of our new Oracle connector, a brand new DDL parser for the MySQL connector, support for MySQL default values and the update to Apache Kafka 1.1.

Due to the large number of changes (the release contains exactly 42 issues overall), we decided to alter our versioning scheme a little bit: going forward we may do one or more Beta and CR ("candidate release") releases before doing a final one. This will allow us to get feedback from the community early on, while still completing and polishing specific features. Final (stable) releases will be named like 0.8.0.Final etc.

This release would not have been possible without our outstanding community; a huge "thank you" goes out to the following open source enthusiasts who all contributed to the new version: Echo Xu, Ivan Vucina, Listman Gamboa, Omar Al-Safi, Peter Goransson, Roman Kuchar (who did a tremendous job with the new DDL parser implementation!), Sagar Rao, Saulius Valatka, Sairam Polavarapu, Stephen Powis and WenZe Hu.

Thank you all very much for your help!

Now let’s take a closer look at some of the features new in Debezium 0.8.0.Beta1; as always, you can find the complete list of changes of this release in the change log. Please take a special look at the breaking changes and the upgrade notes.

XStream-based Oracle Connector (Tech Preview)

Support for a Debezium Oracle connector has been one of the most asked for features for a long time (its original issue number is DBZ-20!). So we are very happy that we can finally release a first work-in-progress version of that connector. At this point the code is still very much evolving, so it should be considered a first tech preview. This means it’s not feature complete (most notably, there’s no support for initial snapshots yet), the emitted message format may still change etc. So while we don’t recommend using it in production quite yet, you should definitely give it a try and report back about your experiences.

One challenge for the Oracle connector is how to get the actual change events out of the database. Unlike with MySQL and Postgres, there’s unfortunately no free-to-use and easy-to-work-with API for doing this with Oracle. After some exploration we decided to base this first version of the connector on the Oracle XStream API. While this (kinda) checks the box for "easy-to-work-with", it doesn’t do so for "free-to-use": using this API requires you to have a license for Oracle’s separate GoldenGate product. We’re fully aware that this is not ideal, but we decided to still go this route as a first step, allowing us to gain some experience with Oracle and also get a connector into the hands of those who have the required license. Going forward, we are going to explore alternative approaches. We already have some ideas and discussions around this, so please stay tuned (the issue to track is DBZ-137).

The Oracle connector is going to evolve within the next 0.8.x releases. To learn more about it, please check its connector documentation page.

Antlr-based MySQL DDL Parser

In order to build up an internal meta-model of the captured database’s structure, the Debezium MySQL connector needs to parse all issued DDL statements (CREATE TABLE etc.). This used to be done with a hand-written DDL parser which worked reasonably well, but over time it also revealed some shortcomings; as the DDL language is quite extensive, we repeatedly saw bug reports caused by specific DDL constructs not being parseable.

So we decided to go back to the drawing board and came up with a brand new parser design. Thanks to the great work of Roman Kuchar, we now have a completely new DDL parser which is based on the proven and very mature Antlr parser generator (luckily, the Antlr project provides a complete MySQL grammar). So we should see far fewer issue reports related to DDL parsing going forward.

For the time being, the old parser is still in place and remains the default parser for Debezium 0.8.x. You are strongly encouraged, though, to test the new implementation by setting the connector option ddl.parser.mode to antlr, and to report back if you run into any issues doing so. We plan to improve and polish the Antlr parser during the 0.8.x release line (specifically, we’re going to measure its performance and optimize as needed) and switch to it by default as of Debezium 0.9. The old parser will then be removed in a later release.
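
As a rough sketch of what opting in could look like, the following Python snippet builds a connector registration document with ddl.parser.mode set to antlr. Only that option comes from this post; the connector name, host names, credentials and topic names are placeholder assumptions for a typical setup and need to be adapted to your environment.

```python
import json

# All values below are placeholders (assumptions), except "ddl.parser.mode".
connector_config = {
    "name": "inventory-connector",  # hypothetical connector name
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "mysql",
        "database.port": "3306",
        "database.user": "debezium",
        "database.password": "dbz",
        "database.server.id": "184054",
        "database.server.name": "dbserver1",
        "database.history.kafka.bootstrap.servers": "kafka:9092",
        "database.history.kafka.topic": "schema-changes.inventory",
        # Opt in to the new Antlr-based DDL parser instead of the default one.
        "ddl.parser.mode": "antlr",
    },
}

# JSON body that could be posted to the Kafka Connect REST API (POST /connectors).
request_body = json.dumps(connector_config, indent=2)
print(request_body)
```

Removing the ddl.parser.mode entry again leaves the connector on the default (old) parser, which makes it easy to compare both implementations against the same database.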

Further MySQL Connector Changes

The MySQL Connector propagates column default values to corresponding Kafka Connect schemas now (DBZ-191). That’s beneficial when using Avro as serialization format and the schema registry with compatibility checking enabled.
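
To illustrate why defaults matter here: under Avro's schema resolution rules, a reader using a newer schema can only read older records if every newly added field has a default value. The following is a deliberately simplified sketch of that rule, not the schema registry's actual compatibility check; the function name and the sample field lists are made up for illustration.

```python
from typing import Any, Dict, List

def added_fields_without_default(
    old_fields: List[Dict[str, Any]], new_fields: List[Dict[str, Any]]
) -> List[str]:
    """Return names of fields that exist only in the new schema and have no default."""
    old_names = {field["name"] for field in old_fields}
    return [
        field["name"]
        for field in new_fields
        if field["name"] not in old_names and "default" not in field
    ]

# Hypothetical record fields before and after adding a "status" column.
old = [{"name": "id", "type": "int"}]
new_with_default = old + [{"name": "status", "type": "string", "default": "NEW"}]
new_without_default = old + [{"name": "status", "type": "string"}]

print(added_fields_without_default(old, new_with_default))     # []
print(added_fields_without_default(old, new_without_default))  # ['status']
```

An empty result means old records can still be read with the new schema as far as added fields are concerned, since the reader can fall back to the declared default.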

By setting the include.query connector option to true, you can add the original query that caused a data change to the corresponding CDC events (DBZ-706). While disabled by default, this feature can be a useful tool for analyzing and interpreting data changes captured with Debezium.
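
A consumer could then pick the statement out of each event roughly as follows. This is a minimal sketch which assumes that the query is exposed as a query field in the event's source block and that events are serialized as JSON with a payload envelope; the sample event is hand-written and far smaller than a real one.

```python
import json
from typing import Optional

# Hand-written sample for illustration only; real change events carry more fields.
sample_event = json.dumps({
    "payload": {
        "op": "u",
        "before": {"id": 1004, "email": "old@example.com"},
        "after": {"id": 1004, "email": "new@example.com"},
        "source": {
            "name": "dbserver1",
            # Assumed location of the original statement.
            "query": "UPDATE customers SET email = 'new@example.com' WHERE id = 1004",
        },
    }
})

def original_query(raw_event: str) -> Optional[str]:
    """Return the SQL statement attached to a change event, or None if absent."""
    payload = json.loads(raw_event).get("payload") or {}
    source = payload.get("source") or {}
    return source.get("query")

print(original_query(sample_event))
```

Returning None for events without the field keeps the consumer working when the option is disabled, which is the default.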

Some other changes in the MySQL connector include configurability of the heartbeat topic name (DBZ-668), fixes around timezone handling for TIMESTAMP (DBZ-578) and DATETIME columns (DBZ-741) and correct handling of NUMERIC columns without an explicit scale value (DBZ-727).

Postgres Connector

The Debezium Connector for Postgres has seen quite a number of bugfixes, including the following ones:

  • wal2json can handle transactions now that are bigger than 1 GB (DBZ-638)

  • the transaction ID is consistently handled as long now (DBZ-673)

  • multiple fixes related to temporal column types (DBZ-681, DBZ-696)

  • OIDs are handled correctly as unsigned int now (DBZ-697, DBZ-701)
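
To see why the signedness of OIDs matters: PostgreSQL OIDs are unsigned 4-byte integers, so large values do not fit into a signed 32-bit int. The small sketch below shows what happens when such a value is reinterpreted as signed; the helper name and the sample OID are made up for illustration and this is not the connector's code.

```python
import struct

def as_signed_int32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    return struct.unpack(">i", struct.pack(">I", value & 0xFFFFFFFF))[0]

small_oid = 16384          # fits into a signed int unchanged
large_oid = 3_000_000_000  # above 2**31 - 1, so it wraps when treated as signed

print(as_signed_int32(small_oid))  # 16384
print(as_signed_int32(large_oid))  # -1294967296
```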

MongoDB Connector

A number of small features and bug fixes have also been implemented for the MongoDB Connector:

  • Tested against MongoDB 3.6 (DBZ-529)

  • Nested documents can be flattened using a provided SMT now (DBZ-561), which is useful when sinking changes from MongoDB into a relational database

  • The unwrapping SMT can be used together with Avro now (DBZ-650)

  • The unwrapping SMT can handle arrays with mixed element types (DBZ-649)

  • When interrupted during snapshotting before completion, the connector will redo the snapshot after restarting (DBZ-712)
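
To give an idea of what flattening nested documents means for a relational sink, here is a minimal Python sketch of the general technique. It is not the SMT's implementation; the function name, the underscore delimiter and the sample document are assumptions chosen for illustration.

```python
from typing import Any, Dict

def flatten(document: Dict[str, Any], delimiter: str = "_", prefix: str = "") -> Dict[str, Any]:
    """Turn nested dictionaries into a single level with joined key names."""
    flat: Dict[str, Any] = {}
    for key, value in document.items():
        name = f"{prefix}{delimiter}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-documents, carrying the joined name as prefix.
            flat.update(flatten(value, delimiter, name))
        else:
            flat[name] = value
    return flat

# Hypothetical nested document as it might be stored in MongoDB.
customer = {"id": 1, "address": {"city": "Hamburg", "geo": {"lat": 53.55, "lon": 9.99}}}

print(flatten(customer))
# {'id': 1, 'address_city': 'Hamburg', 'address_geo_lat': 53.55, 'address_geo_lon': 9.99}
```

Each key of the flattened result can then map directly to one column of a table, which a nested structure cannot do.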

What’s next?

As per the new Beta/CR/Final release scheme, we hope to get some feedback from the community (i.e. you :) on this Beta release. Depending on the number of issues reported, we’ll either release another Beta or go to CR1 with the next version. The 0.8.0.Final version will be released within a few weeks. Note that the Oracle connector will remain a "tech preview" component also in the final version.

After that, we plan to do a few 0.8.x releases, mostly with bug fixes, while work on Debezium 0.9 will commence in parallel. For 0.9 we plan to work on a connector for SQL Server (see DBZ-40). We’d also like to explore means of creating consistent materializations of joins from multiple tables' CDC streams, based on the ids of originating transactions. There’s also the idea and a first prototype of exposing Debezium change events as a reactive event stream (DBZ-566), which might be shipped eventually.

Please take a look at the roadmap for some more long-term ideas and get in touch with us if you have thoughts on that.

Gunnar Morling

Gunnar is a software engineer at Red Hat and open-source enthusiast at heart. A long-time Hibernate core team member, he’s now the project lead of Debezium. Gunnar is the spec lead for Bean Validation 2.0 (JSR 380). He’s based in Hamburg, Germany.

   


About Debezium

Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.

Get involved

We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Zulip, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve our existing connectors and add even more connectors. If you find problems or have ideas for how we can improve Debezium, please let us know or log an issue.