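Hi everyone, a newbie question. I deployed Doris on three machines: one runs FE and BE together, and the other two run BE only. The Doris version is 0.15.0.
When I load data into Doris from a Spark DataFrame, the load works at first, but later it fails with the following error: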
2022-01-26 14:29:41,867 ERROR Executor:94 Executor task launch worker for task 11714 718847 - Exception in task 9.0 in stage 8206.0 (TID 11714)
org.apache.doris.spark.exception.StreamLoadException: stream load error: too many filtered rows
at org.apache.doris.spark.DorisStreamLoad.load(DorisStreamLoad.java:162)
at org.apache.doris.spark.DorisStreamLoad.load(DorisStreamLoad.java:149)
at org.apache.doris.spark.sql.DorisSourceProvider$$anonfun$createRelation$1$$anonfun$org$apache$doris$spark$sql$DorisSourceProvider$$anonfun$$flush$1$1$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(DorisSourceProvider.scala:96)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
at org.apache.doris.spark.sql.DorisSourceProvider$$anonfun$createRelation$1$$anonfun$org$apache$doris$spark$sql$DorisSourceProvider$$anonfun$$flush$1$1.apply$mcV$sp(DorisSourceProvider.scala:86)
at scala.util.control.Breaks.breakable(Breaks.scala:38)
at org.apache.doris.spark.sql.DorisSourceProvider$$anonfun$createRelation$1.org$apache$doris$spark$sql$DorisSourceProvider$$anonfun$$flush$1(DorisSourceProvider.scala:84)
at org.apache.doris.spark.sql.DorisSourceProvider$$anonfun$createRelation$1$$anonfun$apply$2.apply(DorisSourceProvider.scala:70)
at org.apache.doris.spark.sql.DorisSourceProvider$$anonfun$createRelation$1$$anonfun$apply$2.apply(DorisSourceProvider.scala:62)
at org.apache.doris.spark.sql.DorisSourceProvider$$anonfun$createRelation$1.apply(DorisSourceProvider.scala:62)
at org.apache.doris.spark.sql.DorisSourceProvider$$anonfun$createRelation$1.apply(DorisSourceProvider.scala:60)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2121)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2121)
What is planned for Doris in the first half of this year?
Roadmap 2022: apache/incubator-doris#7502
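On the earlier `too many filtered rows` error: Stream Load answers with a JSON body, and its `NumberTotalRows`, `NumberFilteredRows`, and `ErrorURL` fields usually tell you how many rows were rejected and where to read the reasons. Below is a minimal Python sketch of reading those fields, assuming you can get hold of the raw response text. The helper name `summarize_stream_load` and the sample body are made up for illustration and are not taken from the failing job.

```python
import json


def summarize_stream_load(response_text: str) -> dict:
    """Extract the fields that explain a filtered-row failure from a Stream Load response."""
    body = json.loads(response_text)
    total = int(body.get("NumberTotalRows", 0))
    filtered = int(body.get("NumberFilteredRows", 0))
    return {
        "status": body.get("Status"),
        "message": body.get("Message"),
        # Share of rows rejected for data-quality reasons.
        "filtered_ratio": filtered / total if total else 0.0,
        # Link to the per-row error details, when the response carries one.
        "error_url": body.get("ErrorURL"),
    }


# Hypothetical response body, for illustration only.
sample = json.dumps({
    "Status": "Fail",
    "Message": "too many filtered rows",
    "NumberTotalRows": 100,
    "NumberLoadedRows": 75,
    "NumberFilteredRows": 25,
    "ErrorURL": "http://be-host:8040/api/_load_error_log?file=example",
})

summary = summarize_stream_load(sample)
print(summary["filtered_ratio"], summary["error_url"])
```

Opening the `ErrorURL` normally shows which rows were filtered and why, typically a type or column mismatch between the DataFrame and the table. The `max_filter_ratio` Stream Load header (default 0) sets how large a share of filtered rows is tolerated.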
java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.util.StringUtils
receive a ping request, request detail: TBrokerPingBrokerRequest(version:VERSION_ONE, clientId:172.31.3.146)
[INFO ] 2022-02-08 08:07:10,947 method:org.apache.doris.broker.hdfs.HDFSBrokerServiceImpl.listPath(HDFSBrokerServiceImpl.java:67)
received a list path request, request detail: TBrokerListPathRequest(version:VERSION_ONE, path:hdfs://test.internal:8020/tmp/data.csv, isRecursive:false, properties:{_DORIS_STORAGE_TYPE_=BROKER})
[INFO ] 2022-02-08 08:07:10,948 method:org.apache.doris.broker.hdfs.FileSystemManager.getDistributedFileSystem(FileSystemManager.java:244)
create file system for new path: hdfs://test.internal:8020/tmp/data.csv
Exception in thread "pool-2-thread-13" java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.util.StringUtils
at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1437)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:369)
at org.apache.doris.broker.hdfs.FileSystemManager.getDistributedFileSystem(FileSystemManager.java:360)
at org.apache.doris.broker.hdfs.FileSystemManager.getFileSystem(FileSystemManager.java:152)
at org.apache.doris.broker.hdfs.FileSystemManager.listPath(FileSystemManager.java:427)
at org.apache.doris.broker.hdfs.HDFSBrokerServiceImpl.listPath(HDFSBrokerServiceImpl.java:74)
at org.apache.doris.thrift.TPaloBrokerService$Processor$listPath.getResult(TPaloBrokerService.java:815)
at org.apache.doris.thrift.TPaloBrokerService$Processor$listPath.getResult(TPaloBrokerService.java:795)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:313)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
[INFO ] 2022-02-08 08:07:10,963 method:org.apache.doris.broker.hdfs.HDFSBrokerServiceImpl.listPath(HDFSBrokerServiceImpl.java:67)
received a list path request, request detail: TBrokerListPathRequest(version:VERSION_ONE, path:hdfs://test.internal:8020/tmp/data.csv, isRecursive:false, properties:{_DORIS_STORAGE_TYPE_=BROKER})
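We use Doris for tag-based audience selection and aggregate with the tag as the key, so a single tag ends up holding a very large number of userIds. Is there a good way to transpose the userIds (rows to columns) for export?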
The next version will support Lateral View.
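To make the idea concrete: a lateral view expands one row that holds many values into one row per value. The sketch below shows that expansion in plain Python rather than in Doris syntax. The function name `explode_tags`, the tag names, and the user ids are all made up for illustration.

```python
from typing import Dict, Iterable, Iterator, Tuple


def explode_tags(tag_to_users: Dict[str, Iterable[int]]) -> Iterator[Tuple[str, int]]:
    """Yield one (tag, user_id) row for every user id stored under each tag."""
    for tag, user_ids in tag_to_users.items():
        # Sort so the exported rows come out in a stable order.
        for user_id in sorted(user_ids):
            yield tag, user_id


# Hypothetical aggregated data: one entry per tag, many user ids per entry.
aggregated = {"vip": {3, 1, 2}, "new_user": {7}}

rows = list(explode_tags(aggregated))
for tag, user_id in rows:
    print(tag, user_id)
```

Each output row carries exactly one userId, which is the shape an export file usually needs.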
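Has anyone used Doris on ES with an external table? My requirement is to join a wide user-profile table with other small tables to preview generated audiences. I originally planned to sync the data into Doris every day, but the sync takes a long time, and if it runs into the morning peak the preview feature becomes unavailable. So I would like to use an ES external table instead: the wide table data already lives in ES, so I would only sync the small tables into Doris, then join them with the ES external table and run a count. After reading the official documentation, I can't tell whether predicates can still be pushed down to the ES query in this kind of join.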
Yes, that is supported.