Grigory
@pomadchin
ha, it looks like it fails in a UNION?
Frank Dekervel
@kervel
problem is that i don't know if the out of memory came from there... an out of memory error is not propagated to the Spark UI
Grigory
@pomadchin
How I am debugging such problems - trying to locate the part of the code that is inefficient;
like ~ remove any .update calls; and replace them with layers_reduced.count()
^ after doing this, you'll 1. probably speed up the job 2. remove the unused (for now) code 3. there would be less code in the jar that could cause that
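A minimal sketch of that isolation step, assuming layersReduced stands in for the reduced RDD and writer/layerId for the suspected layer-writer call (all of these names are placeholders, not code from the project):

// hypothetical sketch: comment out the suspect write and force the lineage
// to materialize instead, so an OOM can be blamed on the computation itself
// rather than on the writer
// writer.update(layerId, layersReduced)     // temporarily disabled
val n = layersReduced.count()                // forces the whole RDD lineage to run
println(s"layersReduced has $n records")     // if this already fails, the writer is not the culprit
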
Frank Dekervel
@kervel
the out-of-memory workers get restarted and Spark retries (and i can't go back to the failed tries). the error that is then fatal is an HDFS file-not-found for a part/0/data file of a layer. i guess that's only a symptom
Grigory
@pomadchin
+
probably smth happens during the union of the reduced-by-key RDDs, or in a reduce by key done after the union
or smth like that
Frank Dekervel
@kervel
i find the spark ui to be very confusing here .. you see in the screenshot ("retry 1") but you can't go back to retry 0
i would think that either the collection of layers doesn't fit in the memory or that my partitions are very very skewed
would it make sense to reduce not with .union() but with .union().repartition(20) or smth ?
argh markdown
every "mini-layer" contains a small geographical area (one railway line).
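A hedged sketch of that repartition-after-union idea, assuming the per-railway-line mini-layers are available as a Seq of keyed RDDs (the function name and the partition count of 20 are just for illustration):

import org.apache.spark.rdd.RDD

// hypothetical sketch: fold the mini-layers together with union, then
// repartition so the combined RDD is spread evenly instead of inheriting
// many tiny, possibly skewed input partitions
def combineMiniLayers[K, V](miniLayers: Seq[RDD[(K, V)]], numPartitions: Int = 20): RDD[(K, V)] =
  miniLayers.reduce(_ union _).repartition(numPartitions)
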
Grigory
@pomadchin
^ before doing any of those steps I recommend finding the evil line first; ~ by commenting out code and forcing computation of the RDDs via a .count() call
after we find the line that causes the issues we can look into the Spark UI ~ how many partitions there were; why it happened; etc etc
Frank Dekervel
@kervel
ok
sjiyoo
@sjiyoo
hi, I have two RDDs (one is within the extent of the other). I was wondering what the best way to add together the overlapping portions of the RDDs is?
Grigory
@pomadchin
hey @sjiyoo what does largerRDD.localAdd(smallerRDD) produce?
sjiyoo
@sjiyoo
@pomadchin The area the RDDs cover includes a river with a distinctive shape. When doing the localAdd the river appears twice in the resulting RDD, but viewed individually both RDDs have the river in the proper location, so it seems like one of the RDDs is shifted when doing the localAdd
Grigory
@pomadchin
@sjiyoo can you describe what is going on in more detail?
like what is the left rdd; what is the right rdd; what happens and what is the expected output?
also some of the code that you use would help as well
but hm I see; it looks like localAdd does not care about keys?

you can do smth like

rdd1.leftOuterJoin(rdd2).mapValues { 
  case (tl, Some(tr)) => tl + tr 
  case (tl, _) => tl
}

or you can try an approach with union + groupByKey;

I am not sure which variant will work faster actually
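A rough sketch of the union-based variant, assuming the values are GeoTrellis Tiles so localAdd is available via import geotrellis.raster._; it uses reduceByKey rather than a literal groupByKey, since that combines overlapping tiles before the shuffle (the function name is made up):

import scala.reflect.ClassTag
import geotrellis.raster._
import org.apache.spark.rdd.RDD

// hypothetical sketch: put both RDDs together and add the tiles that share a key;
// keys present in only one RDD keep their tile unchanged
def addOverlapping[K: ClassTag](left: RDD[(K, Tile)], right: RDD[(K, Tile)]): RDD[(K, Tile)] =
  (left union right).reduceByKey(_ localAdd _)
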

Frank Dekervel
@kervel
@pomadchin wrt your suggestion and my out of memory, i think i pinpointed the "offending" code. it's actually the "HadoopLayerWriter.update" that does a groupBy and causes a very skewed partitioning (based on ranges)
if you look at all the tasks in this stage, every task ends very quickly but one keeps running and eventually runs out of memory: https://pasteboard.co/IQZ6SMN.png
and the stage in which this task is has a rather simple DAG: https://pasteboard.co/IQZ7l9G.png
Frank Dekervel
@kervel
going to debug-print the number of objects per partition
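A small sketch of what that per-partition count could look like (the helper name is made up; it works on any RDD):

import org.apache.spark.rdd.RDD

// hypothetical sketch: count how many objects land in each partition, to make skew visible
def partitionSizes[T](rdd: RDD[T]): Array[(Int, Long)] =
  rdd.mapPartitionsWithIndex { (idx, it) => Iterator((idx, it.size.toLong)) }.collect()

// e.g. partitionSizes(layersReduced).foreach { case (i, n) => println(s"partition $i: $n objects") }
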
Frank Dekervel
@kervel
this shows the partitioning of the RDD that failed to .update() https://pasteboard.co/IQZwk5M.png and this one https://pasteboard.co/IQZwDyo.png succeeded to .update() (the failing one was twice as big)
Frank Dekervel
@kervel
i am running with 4 cores per executor and 8GB of ram per executor, which means that i don't have a lot of ram. that's because the preceding process (rasterizing) is cpu bound and i need to reduce ram in order to be able to take advantage of all cpu cores in the cluster
but still, i would expect that since materializing the RDD works (eg printing the partition distribution) it would also work for saving
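For reference, a sketch of the executor sizing being described, written as SparkConf settings (these are standard Spark configuration keys; in practice they are usually passed to spark-submit instead):

import org.apache.spark.SparkConf

// 4 cores and 8 GB of memory per executor, as described above
val conf = new SparkConf()
  .set("spark.executor.cores", "4")
  .set("spark.executor.memory", "8g")
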
Grigory
@pomadchin
@kervel this story about the groupBy troubles makes a lot of sense to me ):
Frank Dekervel
@kervel
a lot ? or no ?
Grigory
@pomadchin
a lot*
Frank Dekervel
@kervel
btw, there are some remarks on groupBy here https://spark.apache.org/docs/latest/tuning.html but i don't really get them
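The gist of those remarks: groupByKey ships every value across the shuffle, while reduceByKey/aggregateByKey combine values on the map side first, so far less data moves. A toy sketch of the two spellings (placeholder types, not the layer RDDs from this job):

import org.apache.spark.rdd.RDD

// same result, different shuffle cost
def sumWithGroupBy(rdd: RDD[(String, Long)]): RDD[(String, Long)] =
  rdd.groupByKey().mapValues(_.sum)   // every value crosses the network

def sumWithReduce(rdd: RDD[(String, Long)]): RDD[(String, Long)] =
  rdd.reduceByKey(_ + _)              // values are pre-combined per partition
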
Grigory
@pomadchin
how many partitions do you have? // sry these screenshots for some reason do not load
the story behind the hadoop backend is that Spark parallelism relies on the files' HDFS partitioning; maybe that is why you have large partitions?
Frank Dekervel
@kervel
450 partitions with only 6 tiles each on average and no skew
hmm the image paste service is down indeed
Grigory
@pomadchin
lets wait for the pictures of the spark UI
btw you can paste them directly here (drag and drop works with gitter)
Frank Dekervel
@kervel
image_1.png
image_2.png
Jean-Denis Giguère
@jdenisgiguere
Hi! When I store data using a LayerWriter of a given GeoTrellis version, is it expected that I can read the data with another GeoTrellis version?
Eugene Cheipesh
@echeipesh
@jdenisgiguere Yes, the layers have been backwards and forwards compatible since version 0.9.something, but certainly since 1.0
Grigory
@pomadchin
@kervel what are the shuffle sizes (read and write) on the group by stage?
Frank Dekervel
@kervel
will have to look it up tomorrow. you mean for all stages here? (not only the failing one?)
Grigory
@pomadchin
yep
I think it is a separate Spark UI tab called stages; so i’d like to see the entire page, if possible