Debezium Features
Debezium is a set of source connectors for Apache Kafka Connect. Each connector ingests changes from a different database by using that database’s features for change data capture (CDC). Unlike other approaches, such as polling or dual writes, log-based CDC as implemented by Debezium:
- Ensures that all data changes are captured.
- Produces change events with very low delay while avoiding the increased CPU usage that frequent polling requires. For example, for MySQL or PostgreSQL, the delay is in the millisecond range.
- Requires no changes to your data model, such as a "Last Updated" column.
- Can capture deletes.
- Can capture old record state and additional metadata such as transaction ID and causing query, depending on the database’s capabilities and configuration.
For more details, see the blog post "Five Advantages of Log-Based Change Data Capture".
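To make the points about deletes and old record state concrete, the following minimal Python sketch reads the payload of a change event. The `op`, `before`, `after`, and `source` fields and the operation codes follow the Debezium event envelope. The sample rows, the table name, and the `describe_change` helper are hypothetical, and the exact contents of `source` (and whether `before` is populated) depend on the connector and the database configuration.

```python
from typing import Any, Dict

# Operation codes carried in the "op" field of a Debezium change event.
OPERATIONS = {"c": "create", "u": "update", "d": "delete", "r": "read"}


def describe_change(event: Dict[str, Any]) -> str:
    """Summarize one change event payload as a single line (hypothetical helper)."""
    op = OPERATIONS.get(event["op"], "unknown")
    before = event.get("before")
    after = event.get("after")
    table = event.get("source", {}).get("table", "?")

    if event["op"] == "d":
        # A delete has no new state; the old row, if captured, is in "before".
        return f"{op} on {table}: removed {before}"
    if event["op"] == "u" and before is not None:
        # With the old state available, the changed columns can be derived.
        changed = {k: (before.get(k), v) for k, v in after.items() if before.get(k) != v}
        return f"{op} on {table}: changed {changed}"
    return f"{op} on {table}: new state {after}"


# Illustrative payloads only; real events carry more fields.
sample_delete = {
    "op": "d",
    "before": {"id": 1004, "email": "old@example.com"},
    "after": None,
    "source": {"table": "customers"},
}
sample_update = {
    "op": "u",
    "before": {"id": 1004, "email": "old@example.com"},
    "after": {"id": 1004, "email": "new@example.com"},
    "source": {"table": "customers"},
}

print(describe_change(sample_delete))
print(describe_change(sample_update))
```

A polling query against the table could not produce the first event at all, because the deleted row is no longer there to be read.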
Debezium connectors capture data changes with a range of related capabilities and options:
- Snapshots: optionally, an initial snapshot of a database’s current state can be taken when a connector starts and some of the logs no longer exist. Typically, this is the case when the database has been running for some time and has discarded transaction logs that are no longer needed for transaction recovery or replication. There are different modes for performing snapshots, including support for incremental snapshots, which can be triggered at connector runtime. For more details, see the documentation for the connector that you are using.
- Filters: you can configure the set of captured schemas, tables, and columns with include/exclude list filters.
- Masking: the values from specific columns can be masked, for example, when they contain sensitive data.
- Monitoring: most connectors can be monitored by using JMX.
- Ready-to-use single message transformations (SMTs) for message routing, filtering, event flattening, and more; see Transformations for an overview of all the SMTs that come with Debezium.
See the connector documentation for a list of all supported databases and detailed information about the features and configuration options of each connector.
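As a rough illustration of how the snapshot, filter, masking, and transformation options fit together, the sketch below builds a connector definition in Python and serializes it to JSON. The property names are taken from the Debezium PostgreSQL connector (`topic.prefix` applies to Debezium 2.x and later). The host, credentials, schema, table, and column names are placeholders, and the `check_filters` and `to_request_body` helpers are hypothetical conveniences, not part of Debezium.

```python
import json
from typing import Any, Dict

# Placeholder connection details and object names; replace with your own.
connector: Dict[str, Any] = {
    "name": "inventory-connector",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "localhost",
        "database.port": "5432",
        "database.user": "debezium",
        "database.password": "change-me",
        "database.dbname": "inventory",
        "topic.prefix": "inventory",
        # Snapshots: take an initial snapshot when the connector first starts.
        "snapshot.mode": "initial",
        # Filters: capture only these schemas and tables, and drop one column.
        "schema.include.list": "public",
        "table.include.list": "public.customers,public.orders",
        "column.exclude.list": "public.customers.internal_notes",
        # Masking: the number in the property name is the mask length.
        "column.mask.with.10.chars": "public.customers.email",
        # Transformations: flatten the event envelope to the new row state.
        "transforms": "unwrap",
        "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    },
}


def check_filters(config: Dict[str, str]) -> None:
    """Reject configs that set both an include and an exclude list for one scope."""
    for scope in ("schema", "table", "column"):
        if f"{scope}.include.list" in config and f"{scope}.exclude.list" in config:
            raise ValueError(f"set either {scope}.include.list or {scope}.exclude.list, not both")


def to_request_body(definition: Dict[str, Any]) -> str:
    """Serialize the definition as JSON, the format Kafka Connect accepts."""
    return json.dumps(definition, indent=2)


check_filters(connector["config"])
print(to_request_body(connector))
```

The resulting JSON document can be submitted to the Kafka Connect REST API with a `POST` request to the `/connectors` endpoint. Other connectors use different property sets, so check the documentation for your connector before reusing these names.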
Debezium can also be used as a library embedded in your JVM-based applications. With Debezium Server, you can also emit change events to other messaging infrastructure, such as Amazon Kinesis, Google Cloud Pub/Sub, or Apache Pulsar.