Metorikku community chat (Gitter)
    Rap70r
    @Rap70r
    Thanks, Ron. I was just wondering if the app caches data in memory or only in S3, because if I don't set cachedExecutorIdleTimeout, the executor doesn't get removed. Does the app cache any data?
    Ron
    @RonBarabash
    not that i know of
    Rap70r
    @Rap70r
    Hello, I'm getting an error using the latest release. ClassNotFoundException: org.apache.hudi.common.model.OverwriteWithLatestAvroPayload
    Can you please help?
    Liran Yogev
    @lyogev
    Hi @Rap70r , you are running with Hudi?
    Rap70r
    @Rap70r
    Hi, yes. I have hudi as output
    Rap70r
    @Rap70r
    all good. I got it to work
    Rap70r
    @Rap70r
    Hi, is it possible to query source parquet file sitting in S3?
    unakaha
    @unakaha
    Hi, am new to this metorikku. can we write ORC file format data.?
    Ron
    @RonBarabash
    @unakaha u can use ORC like u would with parquet, the FileOutputWriter class supports all native formats Spark is using
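    [Editor's note] Ron's point can be sketched as a metric-output entry: since Metorikku's file writer hands the format string to Spark's native DataFrameWriter, ORC should work anywhere Parquet does. A hedged sketch (the key names follow Metorikku's File output convention; the dataframe name and path are made-up examples, not from this chat):

    ```yaml
    # Hypothetical metric output fragment: writing ORC instead of Parquet.
    output:
      - dataFrameName: result
        outputType: File
        outputOptions:
          path: result.orc
          format: orc   # any format Spark's DataFrameWriter supports (parquet, orc, json, csv...)
    ```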
    Rap70r
    @Rap70r
    Hi, how can I enable schema evolution when the output is Hudi? I have set the schema registry compatibility mode to FULL, but when a new column is added to the topic, it is not picked up by Hudi.
    Ron
    @RonBarabash
    @Rap70r Hudi schema compatibility and schema registry compatibility are not the same. Check out the manual under "What's Hudi's schema evolution story".
    Hudi does not allow field deletes, even though Avro and the schema registry are fine with them.
    Rap70r
    @Rap70r
    @RonBarabash Thanks for getting back to me. I came across that article. It does say it should be fine as long as it's backward compatible. However, Hudi is not adding the new columns to the parquet files. I should point out that I don't have Hive running.
    Rap70r
    @Rap70r
    @RonBarabash I got it to work. I had to enable mergeSchema in Spark. Is that a bad idea? They say it's an expensive operation, and it's disabled by default.
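    [Editor's note] For readers landing here: mergeSchema is a standard Spark setting, not something Metorikku-specific. A sketch of the two usual ways to enable it (the path is a placeholder):

    ```scala
    // Per-read schema merging (standard Spark option, disabled by default):
    val df = spark.read.option("mergeSchema", "true").parquet("s3://bucket/path")

    // Or globally, for every parquet read in the session:
    spark.conf.set("spark.sql.parquet.mergeSchema", "true")
    ```

    It is disabled by default because merging requires Spark to read the schema from the footer of every part file, which can be slow on large datasets; enabling it only on the reads that need it is the cheaper option.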
    Rap70r
    @Rap70r
    Hello, I have a very simple config file where the output is parquet and the dir is S3, but it's saving to HDFS. Do you know why?
    Rap70r
    @Rap70r
    all good. it was a config issue.
    Rap70r
    @Rap70r
    Hello, can metorikku work with debezium-embedded?
    Liran Yogev
    @lyogev
    @Rap70r I'm not sure what debezium-embedded does, can you explain how it can interact with metorikku?
    Rap70r
    @Rap70r
    Hi @lyogev, debezium-embedded is a standalone database Change Data Capture (CDC) connector that sends data directly to application level without the use of Kafka Connect. It can be used to build a simple Java app that utilizes debezium-api to extract data from CDC and send it directly to the app level: https://debezium.io/documentation/reference/1.2/development/engine.html
    Can metorikku work with custom Input source? Thank you.
    forestlzj
    @forestlzj
    Hi, can someone point me to the correct approach to compile metorikku? I've been struggling with YotpoLtd/metorikku#367 for quite a while..
    Liran Yogev
    @lyogev
    @Rap70r I'm not sure I understand what would be the benefit of using it with metorikku, it's basically running debezium at the application level, so no proper way to scale or use any of Spark's features with this
    @forestlzj we are basically using sbt assembly to build, what is the current status?
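    [Editor's note] For anyone else hitting build issues, the flow Liran describes is a plain sbt assembly build. A sketch (the exact output path depends on the Scala version the project is configured with):

    ```shell
    # Use the sbt version pinned in project/build.properties, then:
    sbt clean assembly
    # The fat jar lands under target/scala-<version>/
    ```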
    forestlzj
    @forestlzj
    @lyogev it was fixed after using sbt assembly and the correct sbt version.
    forestlzj
    @forestlzj
    Hello, can metorikku (Spark) jobs run on a Mesos cluster instead of YARN?
    Liran Yogev
    @lyogev
    I guess it can
    Rap70r
    @Rap70r
    Hello, if a streaming job is running and someone modifies the YAML file, does the streaming app consume the changes while it's running, or does it need a restart? For example, if I'm consuming from a Kafka topic with a streaming job and an additional column has been added, and I modify the YAML Spark SQL script to consume that column, will the streaming job update its query according to the new YAML without a restart? Thank you.
    Ron
    @RonBarabash
    u need to restart
    Rap70r
    @Rap70r
    Hi Ron, thank you for getting back to me. When running the app using spark on an emr cluster, what's the best way to restart? I usually just kill the app manually. Is there a way to gracefully restart a streaming job?
    Rap70r
    @Rap70r
    Question: is it possible to have a non-streaming job consume from Kafka using maxOffsetsPerTrigger, and have it terminate when all data is consumed?
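    [Editor's note] In plain Structured Streaming (outside any Metorikku config), the usual answer to this question is a one-shot trigger: the query processes everything available and then stops. A hedged Scala sketch of the underlying Spark API from that era; broker, topic, and paths are placeholders:

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.streaming.Trigger

    val spark = SparkSession.builder.appName("one-shot-kafka").getOrCreate()

    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // placeholder
      .option("subscribe", "my_topic")                  // placeholder
      .load()

    // Trigger.Once processes all available data in a single batch and stops.
    // Caveat: with Trigger.Once, maxOffsetsPerTrigger is effectively ignored,
    // since everything lands in that one batch.
    val query = df.writeStream
      .format("parquet")
      .option("path", "s3://bucket/out")                // placeholder
      .option("checkpointLocation", "s3://bucket/ckpt") // placeholder
      .trigger(Trigger.Once())
      .start()

    query.awaitTermination()
    ```

    (Later Spark versions add Trigger.AvailableNow, which does respect maxOffsetsPerTrigger while still terminating when caught up.)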
    Rap70r
    @Rap70r
    Hi, we upgraded to Spark 3.0.0 but we got this error: "java.lang.NoSuchMethodError: java.lang.Object org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.fromRow"
    Spz.Takumi
    @SpeedxPz
    I'm new to big data.
    I'm just curious how to manage updates/deletes to the data if I don't use Hudi?
    Or do I just ingest a fresh copy of the source every time?
    K.I. (Dennis) Jung
    @djKooks
    Hello~ I'm new to this project...
    What would be a good way to start?
    • I'm trying to build the project, but cannot find a way to make a jar from source
    • Also tried running it in IntelliJ, but it failed with:
      SLF4J: Found binding in [jar:file:/Users/kwangin.jung/Library/Caches/Coursier/v1/https/repo1.maven.org/maven2/org/slf4j/slf4j-jdk14/1.7.30/slf4j-jdk14-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
      SLF4J: Found binding in [jar:file:/Users/kwangin.jung/Library/Caches/Coursier/v1/https/repo1.maven.org/maven2/org/slf4j/slf4j-log4j12/1.7.30/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
      SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
      SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
      ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal initialization logging. See https://logging.apache.org/log4j/2.x/manual/configuration.html for instructions on how to configure Log4j 2
      Exception in thread "main" java.lang.NoSuchFieldError: JAVA_9
        at org.apache.spark.storage.StorageUtils$.<init>(StorageUtils.scala:207)
        at org.apache.spark.storage.StorageUtils$.<clinit>(StorageUtils.scala)
        at org.apache.spark.storage.BlockManagerMasterEndpoint.<init>(BlockManagerMasterEndpoint.scala:93)
        at org.apache.spark.SparkEnv$.$anonfun$create$9(SparkEnv.scala:370)
        at org.apache.spark.SparkEnv$.registerOrLookupEndpoint$1(SparkEnv.scala:311)
        at org.apache.spark.SparkEnv$.create(SparkEnv.scala:359)
        at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:189)
        at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:267)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:442)
        at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2555)
        at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$1(SparkSession.scala:930)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
        at com.yotpo.metorikku.utils.FileUtils$.getHadoopPath(FileUtils.scala:64)
        at com.yotpo.metorikku.utils.FileUtils$.readFileWithHadoop(FileUtils.scala:73)
        at com.yotpo.metorikku.utils.FileUtils$.readConfigurationFile(FileUtils.scala:57)
    Liran Yogev
    @lyogev
    Which java version are u using? u need jdk 8
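    [Editor's note] The NoSuchFieldError: JAVA_9 above is a common symptom of running this Spark version on a newer JDK; per the answer here, JDK 8 is the supported runtime. A sketch of pinning the JDK before building, assuming a typical Linux OpenJDK 8 path (adjust to your machine):

    ```shell
    # Example path; point JAVA_HOME at wherever JDK 8 lives locally.
    export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
    export PATH="$JAVA_HOME/bin:$PATH"
    java -version   # should report 1.8.x
    sbt assembly
    ```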
    K.I. (Dennis) Jung
    @djKooks
    @lyogev Thanks~
    Could you also let me know how to build the jar file?
    Serwan Gupta
    @Serwan91
    Hi Metorikku team
    Serwan Gupta
    @Serwan91
    I'm trying to use the Metorikku jar as a lib dependency. As specified in the README, I added the spark-core 2.12:3.0.1 jar and Scala 2.11.8 to the dependencies in my Gradle build, but I'm getting the error below while running the jar:
    error:
    java.lang.NoSuchMethodError: scala.Product.$init$(Lscala/Product;)V
    Can you please suggest what I'm missing?
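    [Editor's note] That NoSuchMethodError on scala.Product.$init$ is the classic signature of mixing Scala binary versions: spark-core_2.12 artifacts alongside a 2.11 scala-library. A hedged Gradle sketch with the versions aligned (version numbers are examples):

    ```groovy
    dependencies {
        // The _2.12 suffix on the Spark artifact must match the scala-library major version.
        implementation 'org.scala-lang:scala-library:2.12.12'
        implementation 'org.apache.spark:spark-core_2.12:3.0.1'
    }
    ```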
    K.I. (Dennis) Jung
    @djKooks
    @lyogev hello~ I've used sbt assembly to build the jar, but could you let me know how to build a 'standalone' jar?
    Rap70r
    @Rap70r
    Hi @lyogev, any plans on upgrading Abris to 4.0.1?
    Pham Nguyen
    @akizminet
    Hello everyone, how can I automatically pass today's date to date_range?
    Rap70r
    @Rap70r
    Can the postQuery property be used to trigger a SQL Server stored procedure?
    Yurio Windiatmoko
    @Yuriowindiatmoko2401
    Hi @lyogev, would you mind taking a look at https://github.com/YotpoLtd/metorikku/pull/418/files ? I've just fixed a typo in the main file.
    Hi everyone, can I get the Dockerfile for metorikku/metorikku:latest? Thanks in advance.