@johnynek We added a guard into the Fill function to fail on (at least some) mutations: https://github.com/twitter/scalding/pull/1894/files . Wdyt?
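For readers following along, here is a toy sketch of what such a mutation guard could look like — `GuardedPlus` and the size-based check are assumptions for illustration, not the actual code in the PR. The idea: run a combine step and fail fast if it mutated either input queue, approximating "mutated" by a size change, which catches at least some mutations.

```scala
import java.util.PriorityQueue

object GuardedPlus {
  // Hypothetical illustration: wrap a combine over two PriorityQueues and
  // verify afterwards that neither input changed size. A pure combine must
  // build its result without touching its inputs.
  def plus[T](l: PriorityQueue[T], r: PriorityQueue[T])
             (combine: (PriorityQueue[T], PriorityQueue[T]) => PriorityQueue[T]): PriorityQueue[T] = {
    val lSize = l.size
    val rSize = r.size
    val result = combine(l, r)
    require(l.size == lSize, "combine mutated its left input")
    require(r.size == rSize, "combine mutated its right input")
    result
  }
}
```

A size check is obviously incomplete (an in-place replace of an element would slip through), but it is cheap and catches the common "merge into the left queue" mistake.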
As a follow-up item, we're thinking about replacing the PriorityQueue monoid with an ImmutablePriorityQueue abstraction which just holds a sorted array under the hood. I'm pretty sure it's going to be even more performant in most of our use cases, wdyt?
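A rough sketch of what such an abstraction might look like — the name `ImmutablePriorityQueue` and its API here are assumptions, not an actual scalding class. The point is that every operation returns a new instance over a sorted collection, so there is nothing for downstream code to mutate:

```scala
// Hypothetical sketch: an immutable bounded "priority queue" that is really
// a sorted Vector keeping the `max` smallest elements in ascending order.
// `add` and `++` return fresh instances, so shared state cannot be mutated.
final case class ImmutablePriorityQueue[T](max: Int, items: Vector[T])(implicit ord: Ordering[T]) {
  def add(t: T): ImmutablePriorityQueue[T] =
    copy(items = (items :+ t).sorted.take(max))

  def ++(that: ImmutablePriorityQueue[T]): ImmutablePriorityQueue[T] =
    copy(items = (items ++ that.items).sorted.take(max))
}

object ImmutablePriorityQueue {
  def empty[T: Ordering](max: Int): ImmutablePriorityQueue[T] =
    ImmutablePriorityQueue(max, Vector.empty)
}
```

Re-sorting on every `add` is O(n log n) for simplicity; a real version would merge two already-sorted arrays in O(n), which is presumably where the performance win over `java.util.PriorityQueue` would come from.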
cascading.flow.planner.PlannerException: could not build flow from assembly: [[_pipe_1-0*IterableSour...][com.twitter.scalding.typed.cascading_backend.CascadingBackend$.com$twitter$scalding$typed$cascading_backend$CascadingBackend$$planHashJoin(CascadingBackend.scala:662)] found duplicate field names in joined tuple stream: ['key', 'value', 'key1', 'value1']['key1', 'value1']]
at cascading.flow.planner.FlowPlanner.handleExceptionDuringPlanning(FlowPlanner.java:578)
at cascading.flow.local.planner.LocalPlanner.buildFlow(LocalPlanner.java:108)
at cascading.flow.local.planner.LocalPlanner.buildFlow(LocalPlanner.java:40)
at cascading.flow.FlowConnector.connect(FlowConnector.java:459)
at com.twitter.scalding.ExecutionContext$class.buildFlow(ExecutionContext.scala:95)
at com.twitter.scalding.ExecutionContext$$anon$1.buildFlow(ExecutionContext.scala:210)
at com.twitter.scalding.typed.cascading_backend.AsyncFlowDefRunner$$anon$2.go$1(AsyncFlowDefRunner.scala:172)
at com.twitter.scalding.typed.cascading_backend.AsyncFlowDefRunner$$anon$2.run(AsyncFlowDefRunner.scala:201)
at java.lang.Thread.run(Thread.java:745)
Caused by: cascading.pipe.OperatorException: [_pipe_1-0*IterableSour...][com.twitter.scalding.typed.cascading_backend.CascadingBackend$.com$twitter$scalding$typed$cascading_backend$CascadingBackend$$planHashJoin(CascadingBackend.scala:662)] found duplicate field names in joined tuple stream: ['key', 'value', 'key1', 'value1']['key1', 'value1']
at cascading.pipe.Splice.resolveDeclared(Splice.java:1299)
at cascading.pipe.Splice.outgoingScopeFor(Splice.java:992)
at cascading.flow.planner.ElementGraph.resolveFields(ElementGraph.java:628)
at cascading.flow.planner.ElementGraph.resolveFields(ElementGraph.java:610)
at cascading.flow.local.planner.LocalPlanner.buildFlow(LocalPlanner.java:95)
... 7 more
Caused by: cascading.tuple.TupleException: field name already exists: key1
at cascading.tuple.Fields.copyRetain(Fields.java:1397)
at cascading.tuple.Fields.appendInternal(Fields.java:1266)
at cascading.tuple.Fields.append(Fields.java:1215)
at cascading.pipe.Splice.resolveDeclared(Splice.java:1290)
... 11 more
Hello,
I'm a little stuck here and not sure if there's an easy answer to this question, but I've been trying to get Hadoop 3's s3a output committers working with Scalding: https://hadoop.apache.org/docs/r3.1.1/hadoop-aws/tools/hadoop-aws/committers.html
I've tried several approaches, such as setting the output committer manually (can't, because the new s3a output committers are based on the newer mapreduce APIs).
I also tried to set a custom output format (using the mapreduce APIs with the new output committer) in sinkConfInit using setClass:
conf.setUseNewMapper(true)
conf.setUseNewReducer(true) // i.e. opting into the new mapreduce APIs
conf.setClass("mapreduce.job.outputformat.class", classOf[CustomOutputFormat[_]], classOf[OutputFormat[_, _]])
but I'm stuck at a point where Cascading is looking to use the older mapred API's "mapred.output.format.class".
Are there any workarounds aside from creating a custom output committer that uses the older mapred APIs?
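For reference, the route described in the linked Hadoop docs is purely configuration-driven on the mapreduce side; roughly (keys taken from that committers doc):

```properties
# Use the S3A committer factory for s3a:// destinations (mapreduce API path)
mapreduce.outputcommitter.factory.scheme.s3a=org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory
# Pick a committer: directory, partitioned, or magic
fs.s3a.committer.name=directory
```

Note this factory mechanism is only consulted on the newer mapreduce `FileOutputFormat` path, so by itself it doesn't get around Cascading resolving the job through "mapred.output.format.class".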