    Hubert Fan Chiang
    @hubertfc
    Also, we don't necessarily have to talk about Spark; anything related to big data is very welcome~~
    sayuan
    @sayuan
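    Thanks for the invitation, but I've been a bit busy lately. For now I'll keep watching here to see whether there's a topic people are interested in that I could share.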
    JamJam
    @jaminglam
    @cleaton Hi, I only use logInfo in foreachPartition, and it's the logInfo("test") call that caused the exception.
    Vito Jeng
    @vitojeng
    @sayuan & @cleaton you are both very welcome... XD
    Jesper Lundgren
    @cleaton
    @jaminglam I assume logInfo is a member function on some class/object.
    If it's a member function on a class, the whole class instance will be serialized and sent to each node.
    If it's an object (Scala object), the object will already be instantiated on each node (as a singleton), and each node can use the local object's member function without serializing the whole object.
    This is one of the traps with the simplicity of the Spark programming model. It's simple until it's not.
    You have to consider in which scope the function will run (on the driver or on an executor) and which objects have been initialized where.
    Jesper Lundgren
    @cleaton
    The recommendation is to structure your program using objects as much as possible (objects and lambda functions),
    not OO with classes.
    The OO approach can easily pull a lot of dependencies into each serialized task.
    I can't really tell if that is your issue in this case, though. The example is too small to tell from.
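
The class-versus-object point above can be tried without a cluster. The following is a minimal sketch, assuming Scala 2.12 or later compiled as an ordinary source file, and using plain Java serialization as a stand-in for what Spark does when it ships a closure. `ObjectLogger`, `ClassJob`, and `ClosureCheck` are hypothetical names made up for illustration; none of them come from Spark or from the code being discussed.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical logger that lives in a singleton object.
object ObjectLogger {
  def logInfo(msg: String): Unit = println(s"[object] $msg")
}

// Hypothetical job class that is NOT Serializable.
class ClassJob {
  def logInfo(msg: String): Unit = println(s"[class] $msg")

  // Calls a member method, so the closure has to capture `this`.
  def memberClosure: String => Unit = s => logInfo(s)

  // Calls only the singleton object, so no instance state is captured.
  def objectClosure: String => Unit = s => ObjectLogger.logInfo(s)
}

object ClosureCheck {
  // Returns true if plain Java serialization accepts the value.
  def isSerializable(value: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(value)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val job = new ClassJob
    println(s"member closure serializable: ${isSerializable(job.memberClosure)}")
    println(s"object closure serializable: ${isSerializable(job.objectClosure)}")
  }
}
```

Under those assumptions, the closure that calls the member method should be rejected because it drags the non-serializable `ClassJob` instance along with it, while the closure that only touches the singleton object should serialize fine. Whether this is the actual cause of the exception in the foreachPartition case above cannot be told from this sketch.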
    Max Huang
    @sakanamax
    Seats are saved.
    Today it's on the third floor, room 345.
    Max Huang
    @sakanamax
    Thanks to the speaker for yesterday's talk and for the bonus Mesos talk.
    Please get the slides so I can commit them.
    :)
    JamJam
    @jaminglam
    @cleaton Thanks for your advice. I agree with your point about avoiding too many classes and using objects instead. logInfo() is a member function of the trait org.apache.spark.Logging. What is weird is that when I read the Apache Spark 1.6 source code, I found code such as JdbcUtils (org.apache.spark.sql.execution.datasources.jdbc) that also uses Logging member functions inside foreachPartition. In my previous experience, logInfo works well in other functions (map, reduceByKey); the exception only occurs in foreachPartition. Besides, do you have any advice on how to log for debugging while developing a Spark application?
    Jesper Lundgren
    @cleaton
    I mean, is it a class or an object that you extend with the spark.Logging trait?
    JdbcUtils is an object.
    If you only change foreachPartition to foreach, does it work?
    JamJam
    @jaminglam
    Yes. Do you mean that if I have a class that includes code calling foreachPartition, Spark will serialize the whole class?
    Jesper Lundgren
    @cleaton
    Yes
    The function will be a member of the class instance, and thus it will only exist on the driver at first.
    When you try to call that function inside foreach/foreachPartition, the driver will need to send the whole class instance to the executor machine.
    If it's an object, it is initialized separately
    for each JVM,
    and thus Spark only needs to give the method reference
    (and any parameter you give on the driver side).
    You should be able to see more information in the log from the Spark driver.
    This error happens before it reaches any of the executor nodes.
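
To make "send the whole class instance" concrete, here is a rough sketch that measures how many bytes plain Java serialization produces for a closure that calls a member method versus one that calls a method on an object. It assumes Scala 2.12 or later compiled as an ordinary source file; `Helpers`, `HeavyJob`, and `ClosureSize` are hypothetical names invented for this illustration, and the 1 MiB array just stands in for whatever unrelated state a real class might carry.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical singleton holding the helper method.
object Helpers {
  def describe(n: Int): String = s"value $n"
}

// Hypothetical Serializable class with a large field the closures never use.
class HeavyJob extends Serializable {
  val lookup: Array[Byte] = new Array[Byte](1 << 20) // 1 MiB of unrelated state

  def describe(n: Int): String = s"value $n"

  // Captures `this`, so the whole instance (including `lookup`) travels with it.
  def memberClosure: Int => String = n => describe(n)

  // Only refers to the singleton object, so nothing from the instance is captured.
  def objectClosure: Int => String = n => Helpers.describe(n)
}

object ClosureSize {
  // Number of bytes plain Java serialization writes for the value.
  def serializedSize(value: AnyRef): Int = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    bytes.size()
  }

  def main(args: Array[String]): Unit = {
    val job = new HeavyJob
    println(s"member closure bytes: ${serializedSize(job.memberClosure)}")
    println(s"object closure bytes: ${serializedSize(job.objectClosure)}")
  }
}
```

Making the class Serializable removes the exception but not the cost: the member closure should come out larger than the 1 MiB array it carries, while the object closure should stay small.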
    JamJam
    @jaminglam
    I think I got it. Thanks a lot. @cleaton
    Jesper Lundgren
    @cleaton
    Hope it helps. Good luck :)
    Max Huang
    @sakanamax
    Good morning
    JamJam
    @jaminglam
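    Has anyone run into this with Spark Streaming: after running for a while, one or two batches take exceptionally long to process, far exceeding the batch interval, even though the number of events has not suddenly spiked? Once that batch finishes, the processing time of the following batches goes back to being stable.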
    Stana
    @mathsigit
    Is this Spark Streaming job doing anything special? For example, reading HDFS files or something like that?
    Max Huang
    @sakanamax
    @mathsigit please help get the slides :)
    Stana
    @mathsigit
    @sakanamax Sorry for the late reply. I'll notify the speaker and upload the slides as soon as I get them.
    Stana
    @mathsigit
    @sakanamax The materials from the meetup study group have been uploaded!
    Max Huang
    @sakanamax
    Thanks, I've also uploaded them to nctu330
    iGene
    @iGene
    @hubertfc Has the date for November been decided?
    330 is available for use now.
    Hubert Fan Chiang
    @hubertfc
    @iGene Then is the classroom free on the evening of 11/23? Thanks for your help~
    iGene
    @iGene
    @hubertfc I'll ask~
    I'll report back on Friday.
    Hubert Fan Chiang
    @hubertfc
    Thanks!! @iGene
    Max Huang
    @sakanamax
    QQ 11/23 clashes with an event.
    I've talked it over with our new team member.
    We'll see whether he has time.
    Vito Jeng
    @vitojeng
    @sakanamax What event is it?