    iGene
    @iGene
    @hubertfc Phone number of the person opening the door next Tuesday: 0929851012
    Max Huang
    @sakanamax
    Thanks, @iGene
    Wei Hao Lin
    @LinNeil7758_twitter
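    @vitojeng, I finally found the problem. Once I converted the search condition to uppercase first, it worked and the row counts matched. It's probably an issue with how the names are stored in SQL Server. :)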
    Vito Jeng
    @vitojeng
    Glad it's solved :smile:
    JamJam
    @jaminglam
    Hello everyone, I'd like to ask a Spark Streaming question
    inStream.foreachRDD( rdd => {
        rdd.foreachPartition( iterator => {
            logInfo("test")
        })
    })
    This throws a "Task not serializable" error. But I don't see any unserializable reference inside the foreach. Has anyone run into this?
    JamJam
    @jaminglam
    inStream is a DStream[((String, Int, String), Long)]
    sayuan
    @sayuan
    How is logInfo written?
    JamJam
    @jaminglam
    logInfo is just the one from org.apache.spark.Logging
    I'm simply calling the Logging function directly
    sayuan
    @sayuan
    Try import org.apache.spark.Logging.* inside foreachPartition?
    JamJam
    @jaminglam
    Thanks, I'll give it a try
    Jesper Lundgren
    @cleaton
    @jaminglam Are you using a class or an object as the parent?
    Jesper Lundgren
    @cleaton
    If it's a class, you might have other class members that are not serializable but are pulled in when you try to use the member function.
    Vito Jeng
    @vitojeng
    Thanks @cleaton & @sayuan
    @jaminglam Maybe you could post more complete code & logs, so it's clearer to everyone how to advise.
    Max Huang
    @sakanamax
    Study group tonight
    just a reminder
    Vito Jeng
    @vitojeng
    Thanks for the reminder
    Vito Jeng
    @vitojeng
    @sakanamax A question about docker:
    when I ssh from a client into a server, is there a simple command to tell whether it's a docker container?
    Max Huang
    @sakanamax
    hostname? XD
    echo $HOSTNAME
    If it's a hash-like name?
    XD
    Vito Jeng
    @vitojeng
    It's someone else's server, so it's out of my hands. I was just curious about this... XD
    Max Huang
    @sakanamax
    I think the simplest way is to look at the hostname
    and also whether there are systemd or SystemV services
    hehe
    Vito Jeng
    @vitojeng
    You mean a docker container won't have systemd?
    Nor SystemV?
    Vito Jeng
    @vitojeng
    Interesting. I didn't expect there to be so many different solutions.
    Thanks @sayuan
    jackyoh
    @jackyoh
    It looks like you really can just run cat /proc/1/cgroup inside a docker container and see paths with docker in their names
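The /proc/1/cgroup check mentioned above can be sketched in Scala as follows. This is only a heuristic under stated assumptions: the object name ContainerCheck and the helper mentionsDocker are hypothetical, and the check looks for nothing more than the substring "docker" in the cgroup paths. On hosts using cgroup v2, the file inside a container may contain only `0::/`, in which case this sketch reports nothing.

```scala
import scala.io.Source
import scala.util.Try

// Hypothetical helper, not part of any library.
object ContainerCheck {
  // Pure check: does any cgroup line contain the substring "docker"?
  def mentionsDocker(lines: Seq[String]): Boolean =
    lines.exists(_.contains("docker"))

  // Read /proc/1/cgroup; fall back to an empty list if it cannot be read
  // (for example on a non-Linux system).
  def readCgroupLines(path: String = "/proc/1/cgroup"): List[String] =
    Try {
      val src = Source.fromFile(path)
      try src.getLines().toList finally src.close()
    }.getOrElse(Nil)

  def main(args: Array[String]): Unit = {
    val verdict =
      if (mentionsDocker(readCgroupLines())) "cgroup paths mention docker"
      else "no docker cgroup path found"
    println(verdict)
  }
}
```

A negative result does not prove the machine is not a container; it only means this particular marker was not found.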
    Max Huang
    @sakanamax
    @sayuan is the real expert, someone get him to give a talk
    :)
    Thanks @sayuan for the tip, it's really useful
    sayuan
    @sayuan
    You're welcome, I just found the question interesting so I looked into it a bit
    Hubert Fan Chiang
    @hubertfc
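    @sayuan Seriously, would you like to come and give a talk? We can offer pickup from the high-speed rail station and a souvenir USB flash drive~ XD
    And we don't necessarily have to talk about Spark, anything related to big data is very welcome~~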
    sayuan
    @sayuan
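    Thanks for the invitation, but I've been a bit busy lately. I'll keep watching here to see which topics people are interested in that I could share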
    JamJam
    @jaminglam
    @cleaton Hi, I only use logInfo in foreachPartition, and it's logInfo("test") that caused the exception
    Vito Jeng
    @vitojeng
    @sayuan & @cleaton You are both very welcome... XD
    Jesper Lundgren
    @cleaton
    @jaminglam I assume logInfo is a member function of some class or object.
    If it's a member function of a class, the whole class instance will be serialized and sent to each node.
    If it's an object (a Scala object), the object will already be instantiated on each node (as a singleton), and each node can use the local object's member function without serializing the whole object.
    This is one of the traps behind the simplicity of the Spark programming model. It's simple until it's not.
    You have to consider in which scope the function will run (on the driver or on an executor) and which objects have been initialized where.
    Jesper Lundgren
    @cleaton
    The recommendation is to structure your program using objects as much as possible (objects and lambda functions),
    not OO with classes.
    The OO approach can easily pull a lot of dependencies into each serialized task.
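A minimal sketch of the capture behavior described above, using plain Java serialization instead of Spark so that it runs without a cluster. The names SerializationCheck, ObjectLogger, ClassJob, and ObjectJob are hypothetical, and the sketch assumes Scala 2.12 or later, where function literals compile to serializable lambdas. Spark also cleans closures before serializing them, so this only illustrates why calling a member function drags in the enclosing instance.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical helper: true if Java serialization accepts the value.
object SerializationCheck {
  def isSerializable(value: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      try out.writeObject(value) finally out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }
}

// A singleton logger: closures refer to it by name and capture nothing.
object ObjectLogger {
  def logInfo(msg: String): Unit = println(s"INFO $msg")
}

// A class that is NOT Serializable. The closure calls this.logInfo,
// so it captures `this`, and the whole instance must be serialized with it.
class ClassJob {
  def logInfo(msg: String): Unit = println(s"INFO $msg")
  val task: String => Unit = msg => logInfo(msg)
}

// The same closure written against the singleton logger.
object ObjectJob {
  val task: String => Unit = msg => ObjectLogger.logInfo(msg)
}

object Demo {
  def main(args: Array[String]): Unit = {
    println(s"class member closure serializable: ${SerializationCheck.isSerializable(new ClassJob().task)}")
    println(s"object closure serializable: ${SerializationCheck.isSerializable(ObjectJob.task)}")
  }
}
```

Under these assumptions the ClassJob closure should be rejected because it captures `this`, while the ObjectJob closure should serialize, since it refers only to a singleton that each JVM initializes for itself.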