    Rap70r
    @Rap70r
    Hello
do you have any plans for re-writing the streaming class using DStream instead of Structured Streaming?
    Ron
    @RonBarabash
Not at the moment - DStream is the older way of doing stream processing, as it uses the underlying RDDs. I suggest you use DataFrames as they are more optimized and performant
    Rap70r
    @Rap70r
    Hello Ron, thank you. the reason I was asking is because I wasn't able to make dynamic allocation work but I finally figured it out.
One question. Can we set spark.dynamicAllocation.cachedExecutorIdleTimeout on the Spark dynamic allocation, or does the app use memory to cache data?
    Ron
    @RonBarabash
    all spark configurations are compatible with metorikku
like u set dynamic allocation, u can set cachedExecutorIdleTimeout as well
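For reference, since metorikku passes Spark configuration straight through, the timeout can be set alongside the other dynamic-allocation properties at submit time. A minimal sketch using standard Spark property names (the values are illustrative, not recommendations):

```
# Standard Spark dynamic-allocation properties (values are examples only).
spark.dynamicAllocation.enabled                    true
spark.dynamicAllocation.executorIdleTimeout        60s
# By default, executors holding cached blocks are exempt from the idle
# timeout above; this property makes them removable too:
spark.dynamicAllocation.cachedExecutorIdleTimeout  120s
```

The same properties can be passed as `--conf` flags to spark-submit.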
    Rap70r
    @Rap70r
Thanks, Ron. I was just wondering if the app caches data in memory or only in S3. Because if I don't set cachedExecutorIdleTimeout, the executor doesn't get removed. Does the app cache any data?
    Ron
    @RonBarabash
    not that i know of
    Rap70r
    @Rap70r
    Hello, I'm getting an error using the latest release. ClassNotFoundException: org.apache.hudi.common.model.OverwriteWithLatestAvroPayload
    Can you please help?
    Liran Yogev
    @lyogev
    Hi @Rap70r , you are running with Hudi?
    Rap70r
    @Rap70r
    Hi, yes. I have hudi as output
    Rap70r
    @Rap70r
    all good. I got it to work
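Rap70r doesn't say what the fix was. For the record, a `ClassNotFoundException` on a Hudi class such as `OverwriteWithLatestAvroPayload` is commonly caused by the Hudi bundle jar missing from the Spark classpath; a hedged sketch (the artifact coordinates and version below are assumptions — match them to your Spark and Scala versions):

```
spark-submit \
  --packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.3 \
  ...
```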
    Rap70r
    @Rap70r
Hi, is it possible to query a source parquet file sitting in S3?
    unakaha
    @unakaha
Hi, I am new to metorikku. Can we write data in the ORC file format?
    Ron
    @RonBarabash
    @unakaha u can use ORC like u would with parquet, the FileOutputWriter class supports all native formats Spark is using
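As a sketch of what that might look like in a metorikku metric file (the field names here follow my reading of the metorikku README examples and should be checked against your version; `format: orc` is the assumption being illustrated):

```yaml
output:
  - dataFrameName: moviesWithRatings
    outputType: File
    format: orc
    outputOptions:
      saveMode: Overwrite
      path: movies_with_ratings.orc
```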
    Rap70r
    @Rap70r
Hi, how can I enable schema evolution when the output is Hudi? I have set the schema registry compatibility mode to FULL, but when adding a new column to the topic, it is not being picked up by Hudi.
    Ron
    @RonBarabash
    @Rap70r hudi schema compatibility and schema registry compatibility are not the same. check out the manual
    under What's Hudi's schema evolution story
Hudi does not allow field deletes, though Avro and the schema registry are OK with it
    Rap70r
    @Rap70r
@RonBarabash Thanks for getting back to me. I came across that article. It does say it should be fine as long as it's backward compatible. However, Hudi is not adding new columns to the parquet files. I just want to point out that I don't have any Hive running.
    Rap70r
    @Rap70r
@RonBarabash I got it to work. I had to enable mergeSchema in Spark. Is that a bad idea? They say it is an expensive operation and they have it disabled by default.
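For reference, the setting in question is a standard Spark SQL property. It is off by default because merging schemas requires inspecting the footer of every Parquet part-file. A sketch as a Spark conf entry (it can equally be passed per-read as a reader option):

```
# Merge Parquet schemas across part-files on read (off by default because
# it has to inspect every file's footer):
spark.sql.parquet.mergeSchema  true
```

Whether the cost matters depends mainly on how many part-files each read touches.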
    Rap70r
    @Rap70r
Hello, I have a very simple config file where the output is parquet and the dir is S3, but it's saving to HDFS. Do you know why?
    Rap70r
    @Rap70r
    all good. it was a config issue.
    Rap70r
    @Rap70r
    Hello, can metorikku work with debezium-embedded?
    Liran Yogev
    @lyogev
    @Rap70r I'm not sure what debezium-embedded does, can you explain how it can interact with metorikku?
    Rap70r
    @Rap70r
    Hi @lyogev, debezium-embedded is a standalone database Change Data Capture (CDC) connector that sends data directly to application level without the use of Kafka Connect. It can be used to build a simple Java app that utilizes debezium-api to extract data from CDC and send it directly to the app level: https://debezium.io/documentation/reference/1.2/development/engine.html
Can metorikku work with a custom input source? Thank you.
    forestlzj
    @forestlzj
Hi, can someone point me to the correct approach to compile metorikku? I have been struggling with YotpoLtd/metorikku#367 for quite a while.
    Liran Yogev
    @lyogev
    @Rap70r I'm not sure I understand what would be the benefit of using it with metorikku, it's basically running debezium at the application level, so no proper way to scale or use any of Spark's features with this
    @forestlzj we are basically using sbt assembly to build, what is the current status?
    forestlzj
    @forestlzj
@lyogev it was fixed after using sbt assembly and the correct sbt version.
    forestlzj
    @forestlzj
Hello, can metorikku (Spark) jobs run on a Mesos cluster instead of YARN?
    Liran Yogev
    @lyogev
    I guess it can