These are chat archives for arnaud-lb/php-rdkafka

27th
Oct 2018
abduraappf
@abduraoof_gitlab
Oct 27 2018 05:03
Is there an option to delete messages older than one hour from a topic? I only need the last hour of messages in a topic.
Is there an alternative way to do this?
chabior
@chabior
Oct 27 2018 09:07
@abduraoof_gitlab You can set up retention in Kafka with the log.retention.hours configuration option: https://kafka.apache.org/documentation/#configuration
abduraappf
@abduraoof_gitlab
Oct 27 2018 10:22

@chabior It will remove all the messages in a topic, right?
E.g.: first message time in a topic: 1pm
Last message time in a topic: 4pm

When I consume the topic at 4pm, I need data only from 3pm to 4pm.

1. Need to delete the data from 1pm to 3pm

2. Or consume only data from 3pm to 4pm

Is it possible?

chabior
@chabior
Oct 27 2018 11:34
@abduraoof_gitlab So if you set up retention for 1 hour, when you start consuming at 4pm you will get only messages starting from 3pm. Everything older will be deleted, whether or not it was consumed.
karavzeka
@karavzeka
Oct 27 2018 21:15

@chabior honestly, it's not quite that. Kafka doesn't delete individual messages; it deletes log files (segments), which contain the messages. You should consider log.retention.hours together with the segment size, log.segment.bytes (log.retention.bytes only caps the total size of a partition's log).

Let's look at an example. You set log.retention.hours=1 (1 hour) and log.segment.bytes=100000000 (100 MB). Say the producer puts a new message every minute and 1 message weighs 1 MB. So in an hour there will be 60 messages in the log. It seems the 1st message should be deleted now, which means Kafka would have to delete the log file, but it can't, because the rest of the messages in that file are less than 1 hour old, so the 1st message lives on. Let's look at 2 cases:
1) The producer stops putting new messages into Kafka. In that case Kafka waits 1 hour. When the freshest message becomes older than 1 hour, Kafka deletes the log file. The result: the 1st message has lived 2 hours.
2) The producer continues to put a message every minute. After the 100th message we reach log.segment.bytes, and Kafka starts writing the next messages to a new log file. Kafka can't delete the previous log file immediately; it guarantees that every message lives at least 1 hour, so the previous log file lives 1 hour more. The result: 100 messages * 1 min = 1 hour 40 min, + 1 hour (TTL of the last message). As a result the 1st message lives 2h 40m.
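The arithmetic in the two cases can be checked with a small simulation. This is only a sketch of a simplified model, not how the broker is implemented: it assumes one message per minute (the rate the figures above imply), a log file that rolls after a fixed number of messages, and deletion of a log file only once its newest message is older than the retention time. The function name and parameters are made up for illustration, and the broker's periodic retention check is ignored.

```python
def first_message_lifetime(write_times, retention_min, msgs_per_file):
    """Return how many minutes the first message survives.

    Simplified model (assumption): the first log file holds the first
    `msgs_per_file` messages and is deleted as a whole once its newest
    message is older than `retention_min` minutes.
    """
    first_file = write_times[:msgs_per_file]
    newest_in_first_file = max(first_file)
    deleted_at = newest_in_first_file + retention_min
    return deleted_at - first_file[0]


# Case 1: the producer writes for one hour (minutes 0..59) and then stops.
case_1 = first_message_lifetime(list(range(60)), retention_min=60, msgs_per_file=100)

# Case 2: the producer keeps writing; the first file rolls after 100 messages.
case_2 = first_message_lifetime(list(range(300)), retention_min=60, msgs_per_file=100)

print(case_1)  # 119 minutes, i.e. about 2 hours
print(case_2)  # 159 minutes, i.e. about 2 h 40 min
```

In both cases the first message outlives the 1 hour retention setting, because it can only be removed together with the newest message in the same file.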

karavzeka
@karavzeka
Oct 27 2018 21:42
@abduraoof_gitlab Kafka is not the best tool for working with time periods. If I were you, I would choose another tool, for example Redis.
Or, as another option, you can store each offset in Redis for an hour. Then you take the smallest offset that is still alive in Redis and use it to consume the right messages from Kafka.
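The offset idea can be sketched without a real Redis server. The class below is a hypothetical in-memory stand-in for Redis keys with a one hour expiry; `ExpiringOffsets`, `remember` and `smallest_alive` are illustrative names, not part of Redis or php-rdkafka. In a real setup the producer side would presumably write each offset as a key with a TTL, and the consumer would start reading from the smallest offset that has not expired.

```python
import time


class ExpiringOffsets:
    """In-memory stand-in for Redis keys with a TTL (hypothetical helper)."""

    def __init__(self, ttl_seconds, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock        # injectable so the example can fake time
        self._expires_at = {}     # offset -> time at which it expires

    def remember(self, offset):
        """Record an offset when its message is produced."""
        self._expires_at[offset] = self.clock() + self.ttl

    def smallest_alive(self):
        """Return the smallest offset that has not expired, or None."""
        now = self.clock()
        alive = [o for o, exp in self._expires_at.items() if exp > now]
        return min(alive) if alive else None


# Fake clock (seconds) so the example runs instantly.
now = [0]
store = ExpiringOffsets(ttl_seconds=3600, clock=lambda: now[0])
for offset, produced_at in [(0, 0), (1, 1800), (2, 3000)]:
    now[0] = produced_at
    store.remember(offset)

now[0] = 4000                  # offset 0 expired at 3600; offsets 1 and 2 are alive
print(store.smallest_alive())  # 1: start consuming from this offset
```

One caveat of this approach is that it needs one entry per message (or per sampled message), so for busy topics it may be worth recording only one offset per minute.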