Debezium Blog
I’m pleased to announce the release of Debezium 1.6.0.Beta1!
This release introduces incremental snapshot support for SQL Server and Db2, performance improvements for SQL Server, support for BLOB/CLOB for Oracle, and much more. Let's take a few moments to explore some of these new features.
It’s my pleasure to announce the first release of the Debezium 1.6 series, 1.6.0.Alpha1!
This release brings a brand-new feature called incremental snapshots for the MySQL and PostgreSQL connectors, a Kafka sink for Debezium Server, as well as a wide range of bug fixes and other small feature additions.
I’m thrilled to announce the release of Debezium 1.5.0.Final!
With Debezium 1.5, the LogMiner-based CDC implementation for Oracle moves from Incubating to Stable state,
and there’s a brand-new implementation of the MySQL connector,
which brings features like transaction metadata support.
Other key features include support for a new "signalling table", which can be used, for instance, to implement schema changes with the Oracle connector,
and support for TRUNCATE events with Postgres.
There are also many improvements to the community-led connectors for Vitess and Apache Cassandra,
as well as a wide range of bug fixes and other smaller improvements.
It’s my pleasure to announce the release of Debezium 1.5.0.CR1!
As we begin moving toward finalizing the Debezium 1.5 release stream, the Oracle connector has been promoted to stable and there were some TLS improvements for the Cassandra connector, as well as numerous bug fixes. Overall, 50 issues have been addressed for this release.
Kafka Streams is a library for developing stream processing applications based on Apache Kafka. Quoting its docs, "a Kafka Streams application processes record streams through a topology in real-time, processing data continuously, concurrently, and in a record-by-record manner". The Kafka Streams DSL provides a range of stream processing operations such as map, filter, join, and aggregate.
Non-Key Joins in Kafka Streams
Debezium’s CDC source connectors make it easy to capture data changes in databases and push them towards sink systems such as Elasticsearch in near real-time. By default, this results in a 1:1 relationship between tables in the source database, the corresponding Kafka topics, and a representation of the data at the sink side, such as a search index in Elasticsearch.
In the case of 1:n relationships, say between a table of customers and a table of addresses, consumers are often interested in a view of the data that is a single, nested data structure, e.g. a single Elasticsearch document representing a customer and all their addresses.
This is where KIP-213 ("Kafka Improvement Proposal") and its foreign key joining capabilities come in: it was introduced in Apache Kafka 2.4 "to close the gap between the semantics of KTables in streams and tables in relational databases". Before KIP-213, in order to join messages from two Debezium change event topics, you’d typically have to manually re-key at least one of the topics, to make sure the same key is used on both sides of the join.
Thanks to KIP-213, this isn’t needed any longer, as it makes it possible to join two Kafka topics on fields extracted from the Kafka message value, taking care of the required re-keying automatically, in a fully transparent way. Compared to previous approaches, this drastically reduces the effort for creating aggregated events from Debezium’s CDC events.
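To make the idea of joining on a field extracted from the message value more concrete, here is a minimal sketch in plain Java. It does not use Kafka Streams at all: two in-memory maps stand in for the two tables, and the class, the `Address` type, the sample data, and the `joinOnForeignKey` helper are hypothetical names chosen for illustration only. In Kafka Streams itself, the corresponding operation is the `KTable#join` overload that takes the other table, a foreign key extractor function, and a value joiner.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

public class ForeignKeyJoinSketch {

    /** Hypothetical value of an address record; not the actual Debezium event schema. */
    static final class Address {
        final int customerId; // foreign key, carried in the value rather than the key
        final String city;

        Address(int customerId, String city) {
            this.customerId = customerId;
            this.city = city;
        }
    }

    /**
     * Joins every entry of 'left' with the 'right' entry that its value points to.
     * The result stays keyed by the left key, so no manual re-keying is needed.
     * Entries without a matching right-hand entry are dropped (inner join).
     */
    static <K, V, KO, VO, R> Map<K, R> joinOnForeignKey(
            Map<K, V> left,
            Map<KO, VO> right,
            Function<V, KO> foreignKeyExtractor,
            BiFunction<V, VO, R> joiner) {
        Map<K, R> result = new LinkedHashMap<>();
        for (Map.Entry<K, V> entry : left.entrySet()) {
            VO other = right.get(foreignKeyExtractor.apply(entry.getValue()));
            if (other != null) {
                result.put(entry.getKey(), joiner.apply(entry.getValue(), other));
            }
        }
        return result;
    }

    /** Builds sample data: addresses keyed by address id, customer names keyed by customer id. */
    static Map<Integer, String> example() {
        Map<Integer, Address> addresses = new LinkedHashMap<>();
        addresses.put(100, new Address(1, "Hamburg"));
        addresses.put(101, new Address(1, "Berlin"));
        addresses.put(102, new Address(2, "Lyon"));
        addresses.put(103, new Address(9, "Oslo")); // points to a customer that does not exist

        Map<Integer, String> customers = new LinkedHashMap<>();
        customers.put(1, "Sally");
        customers.put(2, "George");

        return joinOnForeignKey(
                addresses,
                customers,
                address -> address.customerId,
                (address, name) -> name + " @ " + address.city);
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```

The sketch only covers the join semantics on a static snapshot. A real Kafka Streams topology would additionally keep the result up to date as change events arrive on either topic, and a further grouping step would be needed to collect all addresses of a customer into one nested document.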