@cleaton Hi, I only use logInfo inside foreachPartition, and it's the logInfo("test") call that caused the exception.
Vito Jeng
@vitojeng
@sayuan & @cleaton You're both very welcome... XD
Jesper Lundgren
@cleaton
@jaminglam I assume logInfo is a member function on some class or object.
If it's a member function on a class, the whole class instance will be serialized and sent to each node.
If it's on an object (a Scala object), each node instantiates its own copy (as a singleton), so each node can call the local object's member function without serializing the whole object.
This is one of the traps with the simplicity of the Spark programming model. It's simple until it's not.
You have to consider in which scope the function will run (on the driver or on an executor) and which objects have been initialized where.
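To make the class vs. object difference concrete, here is a minimal sketch without Spark, assuming Scala 2.12 or later. It uses plain Java serialization as a rough stand-in for what Spark does before shipping a closure to an executor. `Log`, `Job`, `ClosureDemo`, `viaMember`, `viaObject` and `canSerialize` are hypothetical names made up for this illustration, not taken from your code.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical logging helper kept in a Scala object.
// Every JVM initializes its own singleton, so nothing has to be shipped.
object Log {
  def logInfo(msg: String): Unit = println(s"INFO $msg")
}

// Hypothetical job class that is deliberately NOT Serializable.
class Job {
  def logInfo(msg: String): Unit = println(s"INFO $msg")

  // Calls this.logInfo, so the closure captures `this` (the whole Job instance).
  def viaMember: String => Unit = (s: String) => logInfo(s)

  // Calls Log.logInfo, so the closure captures nothing from the Job instance.
  def viaObject: String => Unit = (s: String) => Log.logInfo(s)
}

object ClosureDemo {
  // Tries to Java-serialize a closure; returns false if something it
  // captured is not Serializable.
  def canSerialize(f: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(f)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val job = new Job
    println(s"member function closure serializable: ${canSerialize(job.viaMember)}")
    println(s"object function closure serializable: ${canSerialize(job.viaObject)}")
  }
}
```

The first check should fail because serializing the closure drags in the non-serializable `Job` instance, while the second should pass because the closure only refers to the `Log` singleton. In Spark the first case typically surfaces as a "Task not serializable" error, or, if the class happens to be Serializable, as the whole instance being shipped with every task.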
Jesper Lundgren
@cleaton
The recommendation is to try to structure your program using objects as much as possible (objects and lambda functions),
not OO with classes.
The OO approach can easily pull a lot of dependencies into each serialized task.
I can't really tell if that is your issue in this case, though. The example is too small to tell from.