These are chat archives for trueaccord/ScalaPB

25 Feb 2018
Nadav Samet
@thesamet
Feb 25 00:14
@codeexplorer, it turns out that to create a Dataset, we will have to generate custom Encoders, which is going to take some time, though I'd like to have it done. In the meantime, is using a DataFrame sufficient for your use case?
codeexplorer
@codeexplorer
Feb 25 16:51
Yes, I can work with a DataFrame, but if I change spark.sqlContext.createDataset to spark.sqlContext.createDataFrame, I still get the exception.
Nadav Samet
@thesamet
Feb 25 17:59
You should use protoToDataFrame: https://scalapb.github.io/sparksql.html
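For reference, a minimal sketch of the pattern the linked docs describe, assuming a hypothetical ScalaPB-generated Person message and the trueaccord-era import path:

import com.trueaccord.scalapb.spark._  // sparksql-scalapb import (package name as of early 2018)
import org.apache.spark.rdd.RDD

val personsRdd: RDD[Person] = ???  // Person is a hypothetical ScalaPB-generated message
// protoToDataFrame builds a DataFrame whose schema mirrors the protobuf definition
val personsDf = spark.sqlContext.protoToDataFrame(personsRdd)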
codeexplorer
@codeexplorer
Feb 25 22:44
I am trying to understand how I can use protoToDataFrame with structured streaming where I am reading from Kafka. Structured streaming already provides a Kafka DataFrame where I can read the value as an array of bytes; I convert that to an RDD and then use toDataFrame on that to interpret the protobuf. I.e., will something like this work:

val ds1 = spark.readStream.format("kafka")
  .option("kafka.bootstrap.servers", kafkaBrokers.get)
  .option("subscribe", kafkaTopic.get)
  .option("startingOffsets", "latest")
  .load()
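A rough sketch of the step in question, assuming spark.implicits._ is in scope and MyEvent is a hypothetical ScalaPB-generated message; note that the final map to the message type needs an Encoder, which is exactly the missing piece discussed above:

import spark.implicits._

// the Kafka source exposes the payload in the binary "value" column
val valueBytes = ds1.select("value").as[Array[Byte]]
// parsing each payload would look like this, but it requires an
// Encoder[MyEvent], which ScalaPB does not provide yet (see above):
// val events = valueBytes.map(bytes => MyEvent.parseFrom(bytes))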
codeexplorer
@codeexplorer
Feb 25 23:37
The issue is: will Spark be able to convert the RDD[Array[Byte]] to an RDD of the generated case class xxx, where the case class contains enums, by doing rdd2 = rdd1.map(el => xxx.parseFrom(el))?
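For the batch/RDD path, a minimal sketch of that conversion, assuming Xxx stands for a hypothetical ScalaPB-generated message type (ScalaPB companion objects provide parseFrom for byte arrays):

import com.trueaccord.scalapb.spark._
import org.apache.spark.rdd.RDD

val rdd1: RDD[Array[Byte]] = ???  // raw protobuf payloads
// parseFrom deserializes the whole message, enum fields included
val rdd2: RDD[Xxx] = rdd1.map(bytes => Xxx.parseFrom(bytes))
// then convert to a DataFrame as in the docs above
val df = spark.sqlContext.protoToDataFrame(rdd2)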