These are chat archives for debezium/user

Mar 2021
Mar 30 2021 00:35 UTC
@Naros I am using the ojdbc8- version.
I saw this and will perform this test. Thank you very much.
There are multiple database instances, and each database instance has its own connector. After Kafka Connect is started in distributed mode, creating multiple connectors works normally. However, if I restart Connect, it reports a missing log file.
Mar 30 2021 01:47 UTC

@Naros I was able to connect to the database, but now I am facing another problem: the connector is running a 'setSessionToPdb' function and I'm getting the error below:

Caused by: Error: 2248, Position: 18, Sql = alter session set container = DBZUSER, OriginalSql = alter session set container = DBZUSER, Error Msg = ORA-02248: invalid option for ALTER SESSION

Chris Cranford
Mar 30 2021 05:42 UTC
@eltonmesquita87 Oracle 11 doesn't support CDB mode, so you should not provide the configuration option.
Hi, I'd recommend enabling trace-level logging to see what is coming from the binlog.
Also please check the database history topic and verify that the DDL in there corresponds to the current DDL.
Well, it is more complicated. Upon start, Debezium needs to iterate through the first transaction to find its end, but it does not store it anywhere. Once it is done with the initial iteration and has safely identified the restart point, it resumes streaming from there, so there is no need to worry about memory consumption.
@smiklosovic You don't need to. We call it cut and paste error ;-)
sure, will do
Ok, that makes sense. One thing I found really odd though is that the instance's CPU utilization was really low, with spikes only as high as 20% while this was happening. We'd like to find a way to speed this process up but it doesn't seem like more CPU is going to help. Possibly this iteration is strictly serial so it can only effectively use a limited number of CPU cores? Any thoughts on how to scale against large transactions? (This one was 6 GB and took nearly an hour to start replicating.)
Not sure we can do much about that. But could you please try to experiment with to see if there is any impact on the runtime
Oh right, I now recall that Postgres cpu was very high and most time was taken by walsender. So possibly Postgres was the bottleneck trying to deliver the wal to Debezium and providing Postgres more resources would have sped up delivery and lowered the time to get past this iteration phase. Does this sound reasonable to you?
Yes, definitely! Although I can imagine we could be I/O constrained after that too. But please keep up the experiments, as that would be great to document as a hint in the FAQ.

Great. Thank you for your input.

In terms of I'm not sure what the expected impact might be. Should I increase it or decrease it and what for?

Yes, decrease it but please don't make both changes in parallel so we can evaluate the impact
Ok, thank you, and noted.
Alexander Ryzhenko
Mar 30 2021 07:58 UTC

Hi there. I saw the same issue several times here, but still no solution.
My debezium mysql connector v1.0.0.Final "loses" binlog position after every restart.
I always receive an exception on each restart:

The connector is trying to read binlog starting at GTIDs 62b708e1-4916-11ea-af5e-42010a4ca04c:5196111034-5227084359 and binlog file 'mysql-bin.088385', pos=46414474, skipping 2 events plus 1 rows, but this is no longer available on the server. Reconfigure the connector to use a snapshot when needed.
... Stack trace here

But we always fix it by editing the offsets topic. We take the last record in this topic and republish it, CHANGING gtids to null. Then it works, with no gaps in the data (we have never seen any).

i.e. record from offsets topic: {"ts_sec":1617071154,"file":"mysql-bin.088421","pos":59304882,"gtids":"62b708e1-4916-11ea-af5e-42010a4ca04c:5228299190-5228458870","row":9,"server_id":953753795,"event":98685}
After edit (restarts fine): {"ts_sec":1617071154,"file":"mysql-bin.088421","pos":59304882,"gtids":null,"row":9,"server_id":953753795,"event":98685}
Sometimes it's not enough; then we additionally set event to 0.
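
The manual edit described above can be sketched in Python. This only illustrates the change to the record value; the helper name `reset_gtids` is hypothetical, and republishing the record to the offsets topic (with the same key and partition as the original) is left out.

```python
import json


def reset_gtids(offset_json, reset_event=False):
    """Return a copy of an offsets-topic value with "gtids" cleared.

    reset_event=True additionally sets "event" to 0, the extra step
    mentioned above for cases where clearing gtids alone is not enough.
    """
    offset = json.loads(offset_json)
    offset["gtids"] = None  # serialized as JSON null
    if reset_event:
        offset["event"] = 0
    # Compact separators keep the same layout as the records quoted above.
    return json.dumps(offset, separators=(",", ":"))


# The record quoted in the message above.
original = (
    '{"ts_sec":1617071154,"file":"mysql-bin.088421","pos":59304882,'
    '"gtids":"62b708e1-4916-11ea-af5e-42010a4ca04c:5228299190-5228458870",'
    '"row":9,"server_id":953753795,"event":98685}'
)

print(reset_gtids(original))
print(reset_gtids(original, reset_event=True))
```

All other fields (file, pos, row, server_id) are passed through unchanged, which matches the before/after records shown in the message.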

Could you please take a look at the earlier log messages? There should be details about GTID set calculations
[2021-03-29 21:29:08,404] INFO [Consumer clientId=mysql_connector_google_stage-dbhistory, groupId=mysql_connector_google_stage-dbhistory] Member mysql_connector_google_stage-dbhistory-ee0bbb50-fad0-49e0-9552-38924857d23a sending LeaveGroup request to coordinator (id: 2147483647 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:879)
[2021-03-29 21:29:08,624] INFO MySQL current GTID set 3b16c742-1183-11e8-8cb4-3497f65a102f:1-9556115497,443731d5-cf5e-11e7-9479-2c4d54466ca9:1-282778390,51aa0127-3381-11ea-8a7d-e4434b9771b8:1-1520234595,62b708e1-4916-11ea-af5e-42010a4ca04c:1-5227227153,7ffd66b7-3701-11ea-a965-e4434b96a6c8:1-49776,97ab08ad-2487-11ea-971f-42010a9c008f:1-5271:5273-196197,a206b385-291a-11ea-9eb7-42010a9c0060:1-179:181-391420 does contain the GTID set required by the connector 62b708e1-4916-11ea-af5e-42010a4ca04c:5196111034-5227084359 (io.debezium.connector.mysql.MySqlConnectorTask:512)
[2021-03-29 21:29:08,629] INFO GTIDs known by the server but not processed yet 3b16c742-1183-11e8-8cb4-3497f65a102f:1-9556115497,443731d5-cf5e-11e7-9479-2c4d54466ca9:1-282778390,51aa0127-3381-11ea-8a7d-e4434b9771b8:1-1520234595,62b708e1-4916-11ea-af5e-42010a4ca04c:1-5196111033:5227084360-5227227153,7ffd66b7-3701-11ea-a965-e4434b96a6c8:1-49776,97ab08ad-2487-11ea-971f-42010a9c008f:1-5271:5273-196197,a206b385-291a-11ea-9eb7-42010a9c0060:1-179:181-391420, for replication are available only 62b708e1-4916-11ea-af5e-42010a4ca04c:5139615322-5196111033:5227084360-5227227153 (io.debezium.connector.mysql.MySqlConnectorTask:517)
[2021-03-29 21:29:08,630] INFO Some of the GTIDs needed to replicate have been already purged (io.debezium.connector.mysql.MySqlConnectorTask:519)
[2021-03-29 21:29:08,630] INFO Stopping MySQL connector task (io.debezium.connector.mysql.MySqlConnectorTask:446)
[2021-03-29 21:29:08,630] INFO WorkerSourceTask{id=mysql_connector_google_stage-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:398)
[2021-03-29 21:29:08,630] INFO WorkerSourceTask{id=mysql_connector_google_stage-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:415)
[2021-03-29 21:29:08,630] ERROR WorkerSourceTask{id=mysql_connector_google_stage-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:179)
org.apache.kafka.connect.errors.ConnectException: The connector is trying to read binlog starting at GTIDs 62b708e1-4916-11ea-af5e-42010a4ca04c:5196111034-5227084359 and binlog file 'mysql-bin.088385', pos=46414474, skipping 2 events plus 1 rows, but this is no longer available on the server. Reconfigure the connector to use a snapshot when needed.
at io.debezium.connector.mysql.MySqlConnectorTask.start(
at io.debezium.connector.common.BaseSourceTask.start(
at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(
at org.apache.kafka.connect.runtime.WorkerTask.doRun(
at
at java.base/java.util.concurrent.Executors$
at java.base/
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.base/java.util.concurrent.ThreadPoolExecutor$
at java.base/
[2021-03-29 21:29:08,631] ERROR WorkerSourceTask{id=mysql_connector_google_stage-0} Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:180)
[2021-03-29 21:29:08,631] INFO Stopping MySQL connector task (io.debezium.connector.mysql.MySqlConnectorTask:446)
[2021-03-29 21:29:08,631] INFO [Producer clientId=connector-producer-mysql_connector_google_stage-0] Closing the Kafka producer with timeoutMillis = 30000 ms. (org.apache.kafka.clients.producer.KafkaProducer:1153)
It's the complete log from the failed startup.
Is it possible that part of the 62b708e1-4916-11ea-af5e-42010a4ca04c GTIDs was purged before the connector started? It seems to me there is a gap in the GTID range.

It says:
MySQL has offsets 1-5227227153 for this GTID.
The last committed offsets range I have for this GTID: 5196111034-5227084359.

And it fits entirely within the available offsets.
So why can't it read the binlog from 5227084359 (the end of the last committed range)?
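
To make the numbers in the log easier to follow, here is a small Python sketch of the interval arithmetic that the log messages appear to describe. It is not Debezium's implementation; `subtract` is a hypothetical helper, and only the ranges for the 62b708e1-... server UUID are taken from the log.

```python
def subtract(a, b):
    """Return the parts of the intervals in `a` not covered by `b`.

    Intervals are inclusive (start, end) tuples of transaction numbers.
    """
    result = []
    for start, end in a:
        cur = start
        for b_start, b_end in sorted(b):
            if b_end < cur or b_start > end:
                continue  # no overlap with the remaining part
            if b_start > cur:
                result.append((cur, b_start - 1))
            cur = max(cur, b_end + 1)
            if cur > end:
                break
        if cur <= end:
            result.append((cur, end))
    return result


# Ranges for server UUID 62b708e1-... as printed in the log.
server_set = [(1, 5227227153)]                 # "MySQL current GTID set"
connector_set = [(5196111034, 5227084359)]     # stored in the connector offsets
available = [(5139615322, 5196111033), (5227084360, 5227227153)]

# "GTIDs known by the server but not processed yet"
unprocessed = subtract(server_set, connector_set)
# Unprocessed transactions that are no longer available for replication
missing = subtract(unprocessed, available)

print(unprocessed)  # [(1, 5196111033), (5227084360, 5227227153)]
print(missing)      # [(1, 5139615321)]
```

Read this way, the committed range itself does fit inside what the server has executed, but the range below it (1-5196111033) is also treated as not yet processed, and only part of that range is still available. That may be why the connector reports purged GTIDs even though the committed range looks fine.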

thanks for reference.
Avi Mualem
Mar 30 2021 12:31 UTC
Any ETA for MySQL connector support for an initial_only mode, where the connector only does the initial snapshot and won't stream the binlog afterwards?
Mar 30 2021 12:51 UTC
@Naros One thing I am observing in my tests with version 11 XE, following the Debezium documentation: some errors occur when I capture events for the Kafka topic. I am using the VM from the link debezium/oracle-vagrant-box and adding the properties as per the documentation. Very strange.
I am not using the property, this problem has already been overcome
I haven't looked at Jira to see if there is a feature request for such a mode, but if one doesn't exist could you create it? It's not a complex change and could easily be contributed by the community if you're able.
Hi @eltonmesquita87, what errors? Oracle 11 support is a best-effort use case for us as we currently have no way to test the connector on that version, so it's driven mostly by the community's help.
Speaking of Vagrant box, I would highly suggest you set the CPUs used by the VM to at least a minimum of 2, preferably 3 or 4 if possible. The Vagrantfile doesn't do this because we didn't need it for XStream but it definitely helps performance with LogMiner.
Will check; if not, I'll seriously consider contributing it.
Just to be sure: currently there isn't a workaround to achieve this behaviour in the MySQL connector, right?
The only time the connector snapshots the data right now is if there are no offsets or if we detect that the binlog position is no longer available.

"initial - the connector runs a snapshot only when no offsets have been recorded for the logical server name."

can you elaborate a bit on - no offsets have been recorded for the logical server name.

So every connector has this initial default mode which is where the connector upon startup checks to see if we've recorded any offsets in Kafka. If no offsets exist, we then proceed with Snapshot -> Streaming. If offsets are found, then we simply skip Snapshot and go right into Streaming.
Yeah, got it. So no offsets will lead to a snapshot, and currently we don't have the option to stop after it and not continue with streaming.
i guess as a hacky workaround i can let my consumer check the payload and check if the message origin was snapshot.
Right so if you look at SQL Server's implementation, there is an extra entry in the SnapshotMode enum called initial_only. You could follow that as a guide for what to do for MySQL.
can you send me a link for contributing instructions you have for the project?
Let us know if you have any questions.
The last snapshot message is marked with the keyword "last" am i right ?
Yes, it should be.
It also has possible values of true/false.
the snapshot value is "true" / "false" / "last" from what i saw. (using version 1.3.1)
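
The consumer-side workaround discussed above could look roughly like the following Python sketch. It assumes the event has already been deserialized into a dict and that the marker sits at `source.snapshot` inside the event payload; the function name `snapshot_state` is hypothetical.

```python
def snapshot_state(event):
    """Return "true", "last" or "false" for a Debezium change event dict.

    Handles events with or without the schema/payload wrapper, and treats
    a missing marker as "false".
    """
    payload = event.get("payload", event)
    marker = (payload.get("source") or {}).get("snapshot")
    if marker is None:
        return "false"
    # Normalize booleans (True/False) and strings to lowercase text.
    return str(marker).lower()


def is_snapshot_event(event):
    """True for records produced by the snapshot, including the last one."""
    return snapshot_state(event) in ("true", "last")


# Hypothetical sample events, reduced to the fields used here.
events = [
    {"payload": {"op": "c", "source": {"snapshot": "true"}}},
    {"payload": {"op": "c", "source": {"snapshot": "last"}}},
    {"payload": {"op": "u", "source": {"snapshot": "false"}}},
]

for e in events:
    print(snapshot_state(e), is_snapshot_event(e))
```

A consumer could stop processing once it sees the "last" marker, though that only filters on the consumer side; the connector would still continue streaming.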
assuming ill contribute the change, i assume it will be included only in the upcoming debezium version right ?
Since it's technically a new feature, it'll be merged into 1.6 alpha1 in a couple weeks.
makes sense.
1.5.0.Final goes out in 2 days, it's really frozen except for major bugs.
in 1.5.0 final parallel snapshot for mysql will be still experimental ?
Yes, and it's only available by using internal.implementation=legacy as a connector option and it's still a beta feature. We're hoping to have more time to dedicate to better snapshot options across all connectors in a release later this year.
I am using LogMiner because version 11 does not support XStream. I get some errors when capturing the events and sending them to the Kafka topic. But looking closely, I believe the problem is in the configuration that is being sent to Kafka Connect. Anyway, I will analyze it better.
Christian Dorner
Mar 30 2021 15:25 UTC
Hey guys, did any of you experience a delay when Debezium connects to the database? After starting Debezium, it can take up to 10 minutes before I can see the slot, Debezium gets assigned a pid, and it starts consuming from the WAL (I'm using Postgres).
Chris Cranford
Mar 30 2021 15:27 UTC
@christiandorner_twitter I wonder if it could be an issue with priming the TypeRegistry, particularly if you have lots of custom data types.
If you enable trace logging, where does it seem to slow down?
Arun Prasadh
Mar 30 2021 15:28 UTC


I set up a MySQL source connector for a couple of our tables, and it seems as though the topics aren't created. I do see the history topic created but no table topics.

On checking WorkerSourceTask logging, I got the below message.

WorkerSourceTask{id=<connector_name>-0} flushing 0 outstanding messages for offset commit

Here is my config for this connector.

What version of MySQL connector?
We are using Debezium 0.9 Final
Nothing stands out to me at first glance to the configuration. You have several SMTs at play here so have you tried with the tutorial or with a less complex configuration first? Version 0.9 is quite old at this point fwiw, so upgrading may help if you're encountering some type of bug.
Christian Dorner
Mar 30 2021 15:37 UTC
@Naros When I enabled debug mode I could only see Debezium reading from the Kafka offset topics like crazy, and after some time the logs show the producer being created.

Unfortunately, this is in production so this connector follows others in terms of configuration. But I'll see if I can just use a simple config for the time being.

I should also mention that we are missing a few tables in other connectors, so it's either the transformations or the older version, as you suggest.

I'll report back on what I find.

    "name": "shipment-order-connector",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "tasks.max": "1",
        "database.hostname": "{host}",
        "database.port": "5432",
        "database.user": "{user}",
        "database.password": "{password}",
        "database.dbname": "esprinter",
        "": "shipment_order",
        "database.sslmode": "require",
        "database.tcpKeepAlive": "true",
        "": "shipment_order",
        "": "wal2json_streaming",
        "snapshot.mode": "never",
        "schema.include.list": "{schema}",
        "table.include.list": "{tables}",
        "transforms": "replaceField,routeRecords,SetSchemaMetadata",
        "transforms.replaceField.type": "org.apache.kafka.connect.transforms.ReplaceField$Key",
        "transforms.replaceField.blacklist": "id",
        "transforms.SetSchemaMetadata.type": "org.apache.kafka.connect.transforms.SetSchemaMetadata$Key",
        "": "shipment_order.esprinter_data.shipment_order.Key",
        "transforms.SetSchemaMetadata.schema.version": "1",
        "transforms.routeRecords.type": "org.apache.kafka.connect.transforms.RegexRouter",
        "transforms.routeRecords.regex": "(.*)",
        "transforms.routeRecords.replacement": "debezium.esprinter_data.shipment_order_all"
Chris Cranford
Mar 30 2021 16:20 UTC
@christiandorner_twitter Can you share the logs with debug/trace so we can take a look?
Well, I have to share something :) I was about to contribute the feature and then I saw it's already implemented (unless I'm missing something). I also checked in my integration env: "snapshot.mode":"initial_only" works in 1.3.1 for MySQL connectors.
I can even see it in the log:
2021-03-30 17:04:41,337 WARN MySQL|db|task This connector will only perform a snapshot, and will stop after that completes. [io.debezium.connector.mysql.MySqlConnectorTask]
So it looks like the docs are lacking.
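
For reference, a connector registration payload using this mode might look like the Python sketch below. Only `connector.class` and `snapshot.mode` reflect what was reported above; the connector name, host, credentials, server id and topic names are placeholders, and the remaining property names are the usual MySQL connector options for the 1.x releases, which should be checked against the documentation for the version in use.

```python
import json

# Hypothetical registration payload for the Kafka Connect REST API.
# Every value except "connector.class" and "snapshot.mode" is a placeholder.
connector = {
    "name": "mysql-snapshot-only",
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "mysql.example.internal",
        "database.port": "3306",
        "database.user": "debezium",
        "database.password": "change-me",
        "database.server.id": "184054",
        "database.server.name": "dbserver1",
        "database.history.kafka.bootstrap.servers": "kafka:9092",
        "database.history.kafka.topic": "dbhistory.dbserver1",
        # Snapshot once, then stop instead of streaming the binlog.
        "snapshot.mode": "initial_only",
    },
}

# Print the JSON body that would be POSTed to the /connectors endpoint.
print(json.dumps(connector, indent=2))
```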
Cem Nura
Mar 30 2021 18:17 UTC

Hello debezium devs

We were considering building a Quarkus microservice with the outbox pattern and found out that there is a dedicated Quarkus extension to support the outbox pattern.

Unfortunately, the extension documentation states that this extension is in an incubating state. Are there any near-future plans for releasing this extension?

Hi @cemnura, the main reason the extension is marked incubating is that the SMT it's based upon is also still incubating. This doesn't mean that the code isn't mature; it's more that we may alter the API behavior at any point and we don't guarantee backward compatibility when doing so.
While we do try to be mindful of being backward compatible, it's just our way to have some flexibility moving forward, both with being compatible and with adding new features based on user feedback.
@avimualem I'm glad you spotted that. Could you open a jira so we can adjust the documentation accordingly?
Ruslan Danilin
Mar 30 2021 20:08 UTC
Hello Guys!
How can I initiate a snapshot operation for the same database again? I've removed the connector configuration and posted it again. I also tried to change the "" option. But it looks like it just continues to read changes and does not start the snapshot operation from scratch.
@Naros thanks for your response. I will notify if we will use this extension and give feedback if we find anything :)
You can check the connector's configuration options you're using to see if snapshot.mode has something like always to always generate a snapshot. If it doesn't have such a setting, then you'd need to either change the connector's name or remove the offsets from Kafka so that the snapshot happens again.
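
For the "remove the offsets from Kafka" option, one commonly described approach is to publish a tombstone (a record with a null value) for the connector's key in the Kafka Connect offsets topic. The Python sketch below only builds that key and value and does not talk to Kafka. It assumes the key has the form `["<connector name>", {"server": "<logical server name>"}]`, which should be verified against the actual records in your offsets topic before producing anything; the connector and server names are hypothetical.

```python
import json


def offset_tombstone(connector_name, server_name):
    """Build the (key, value) pair for an offsets-topic tombstone.

    Assumption: the offsets key is a JSON array holding the connector
    name and a {"server": ...} object. Compare with a real record from
    the offsets topic before using it.
    """
    key = json.dumps([connector_name, {"server": server_name}],
                     separators=(",", ":"))
    value = None  # a null value marks the offset as deleted
    return key, value


key, value = offset_tombstone("inventory-connector", "dbserver1")
print(key)
print(value)
```

The tombstone would need to go to the same partition as the existing offset record, and the connector should be stopped while the offsets are changed.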
Thanks, looking forward to it @cemnura.

@Naros -
Hi Chris, so I created a new connector using the very basic config from the tutorial on DZM documentation, and I'm getting the below error from the task.

Caused by: org.apache.kafka.connect.errors.DataException: Failed to serialize Avro data from topic

Schema for this table has been unchanged since inception (this is a lkp table with just 2 columns - I wanted to test this with the lkp table first).

Any pointers as to what could cause this?

This is the stripped down config for this test.
Can you share the full stack trace @bimmerN62_twitter?
org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(
at org.apache.kafka.connect.runtime.WorkerSourceTask.convertTransformedRecord(
at org.apache.kafka.connect.runtime.WorkerSourceTask.sendRecords(
at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(
at org.apache.kafka.connect.runtime.WorkerTask.doRun(
at java.util.concurrent.Executors$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
Caused by: org.apache.kafka.connect.errors.DataException: Failed to serialize Avro data from topic prod_analytics :
at io.confluent.connect.avro.AvroConverter.fromConnectData(
at org.apache.kafka.connect.runtime.WorkerSourceTask.lambda$convertTransformedRecord$1(
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(
... 11 more
Caused by: org.apache.kafka.common.errors.SerializationException: Error registering Avro schema: {"type":"record","name":"SchemaChangeKey","namespace":"io.debezium.connector.mysql","fields":[{"name":"databaseName","type":"string"}],"":"io.debezium.connector.mysql.SchemaChangeKey"}
Caused by: Register operation timed out; error code: 50002; error code: 50002
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.registerAndGetId(
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.register(
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.register(
at io.confluent.kafka.serializers.AbstractKafkaAvroSerializer.serializeImpl(
at io.confluent.connect.avro.AvroConverter$Serializer.serialize(
at io.confluent.connect.avro.AvroConverter.fromConnectData(
at org.apache.kafka.connect.runtime.WorkerSourceTask.lambda$convertTransformedRecord$1(
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(
at org.apache.kafka.connect.runtime.WorkerSourceTask.convertTransformedRecord(
at org.apache.kafka.connect.runtime.WorkerSourceTask.sendRecords(
at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(
at org.apache.kafka.connect.runtime.WorkerTask.doR
So you're reusing the same history topic between 2 connectors if I understand?
It's fine to use the same database with a different, but ideally you should make sure that all other settings are unique and aren't sharing existing topics.

I think it's the new history_topic, I might have masked it here. I changed and history.topic as well. But let me try new history topic, just to confirm.

it's ok to update the existing connector's history topic, right?

Sure, I'll try to generate one.
Mar 30 2021 21:10 UTC

Lazy internet question: I'm setting up a Dockerfile for Debezium Connect using Oracle (logminer) and Azure EventHubs (just to give the overall picture).

I know I need to put the ojdbc.jar and instantclient in the image, but since I'm using LogMiner, do I need to copy the xstreams.jar over to libs? I expect not but wanted to double check.

My goal is to have this setup in our GitHub so people can just clone/build/run with minimal setup, so I expect I'll have several other questions as I go.
If you're using the latest versions of Debezium, then no, xstream.jar is not required to run the Oracle connector when using the LogMiner adapter.
Assuming we can come to a consensus on, then I would expect the debezium/connect image to eventually have Oracle baked in like other connectors with no need to add any jars.
Oh, nice.
Is there a planned release date for 1.5.0? I'm using the latest of that for now.
We're planning to do 1.5.0.Final on Thursday this week.

Chris, so I ended up creating a brand new connector and a new history topic, etc., but I'm still receiving the same error as above.

I checked the schema registry as well (using curl -X GET http://localhost:8081/subjects ) and confirmed that the table I'm trying to add has no subject in there.
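
The same check can be scripted. The Python sketch below lists the subjects from the registry URL used in the curl command and reports which subjects are missing for a topic. It assumes the default subject naming, where a topic's subjects are `<topic>-key` and `<topic>-value`; the topic name in the example is hypothetical.

```python
import json
import urllib.request


def list_subjects(registry_url):
    """Fetch all subject names from the schema registry (GET /subjects)."""
    with urllib.request.urlopen(f"{registry_url}/subjects") as resp:
        return json.load(resp)


def missing_subjects(subjects, topic):
    """Return the expected subjects for `topic` that are not registered.

    Assumes the default naming: "<topic>-key" and "<topic>-value".
    """
    expected = [f"{topic}-key", f"{topic}-value"]
    return [s for s in expected if s not in subjects]


# Against a live registry:
#   subjects = list_subjects("http://localhost:8081")
# Offline example with a hypothetical subject list and topic name:
subjects = ["other_topic-key", "other_topic-value", "my_topic-key"]
print(missing_subjects(subjects, "my_topic"))
```

Since the stack trace shows the failure happening while a schema is being registered, an absent subject is the expected state here rather than a cause by itself.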

Ruslan Danilin
Mar 30 2021 22:34 UTC
What is the correct way to change "characterEncoding" and other properties in JDBC connection string?
I tried to set it via database.characterEncoding=latin as described in
but I still see UTF-8 value in logs: Starting snapshot for jdbc:mysql://my_host:3306/?useInformationSchema=true&nullCatalogMeansCurrent=false&useSSL=false&useUnicode=true&characterEncoding=UTF-8&characterSetResults=UTF-8&
Thank you