What is change data capture? Does it have any importance or bearing on the work that you do? We’ll answer the first question shortly, but the answer to the second question is most certainly “yes.”
What Is Change Data Capture?
Change data capture (CDC) solves data integration problems by monitoring, inserting, updating, and deleting changes to records or activity on table data.
The first step in this process is to identify that a change has been made to a data source. To do so, a CDC procedure will inspect the relevant table to see if there is a delta between the old values and new values, based on what’s stored in the source systems, then update the pertinent record accordingly.
There are several good methods for doing this — high watermark, diff, and database triggers, for example. However, one of the superior approaches to track data changes is log-based CDC.
While interpreting the changes in the transaction log is difficult, the biggest benefit of log-based change data capture is the asynchronous nature of the CDC: changes are captured independent of the source application performing the changes. With log-based CDC, new database transactions – including inserts, updates, and deletes – are read from source databases’ transaction or redo logs. The changes are captured without making application level changes and without having to scan operational tables, both of which add additional workload and reduce source systems’ performance.
By allowing you to capture and extract only changed data, change data capture eliminates or reduces the need for batch windows, replacing bulk load updates with continuous streaming or incremental loading of data. Change data capture helps you improve efficiency by reducing redundant data replication and delivery, reducing data warehousing costs and facilitating real-time data integration across your data stores.
Why Change Data Capture Matters
There are many advantages of using transaction logs for CDC:
- This solution is transparent to databases and has minimal impact on the database.
- Near real-time publishing of the changes to the target system. This means the business can take accurate decisions based on the most current data.
- No impact on the transactions at the source application, no additional SQL load on the system.
- Because CDC transfers only incremental changes it reduced the cost of transferring data.
- This approach maintains the order in which original transactions were committed. This is important when the target application depends on the order of transactions in the source system. The ordering guarantees are most often desired by target applications.
- The target system can take time to process the messages.
Together these advantages enable the building of streaming data pipelines that help to share application data across a business. This means that businesses are getting fed insights that are up to date and accurate based on the latest data being fed from across many systems. The decisions made from these insights help businesses to remain competitive in their respective markets.
How Etlworks can help
Etlworks is a high-performance data integration platform with the next-generation change data capture technology.
Etlworks supports native log-based change data capture for PostgreSQL, SQL Server, MySQL, Oracle, DB2 and MongoDB databases. Our intuitive visual interface makes it easy to set up, monitor, and manage your data pipelines, eliminating the need for scripting and ensuring quick time-to-value. Unlike other tools that support CDC, there are only two moving parts – your database and Etlworks. You will be up and running with CDC in a matter of minutes!