These are chat archives for nrinaudo/kantan.csv
Is kantan.csv able to work with Spark's RDD? For example, given an HDFS file URL and a row definition case class MyCsvRow(cell0: String, cell1: Int), is kantan.csv able to load the HDFS file into an RDD of MyCsvRows?
If you can get a URL, for instance, you're good to go: kantan.csv has CsvInput, which allows you to turn any type into something that can be read as CSV data.
An RDD is not an in-memory collection, though. It is a handle to distributed, typed data.
A URL doesn't contain the data either, just the path to it.
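For what it's worth, here's a minimal sketch of that CsvInput mechanism with kantan.csv's current API (the URL is a placeholder, and the hand-written decoder with column indices 0 and 1 is my assumption for MyCsvRow):

import kantan.csv._     // RowDecoder, ReadResult, rfc
import kantan.csv.ops._ // enriches types that have a CsvInput instance

case class MyCsvRow(cell0: String, cell1: Int)

// map columns 0 and 1 to the case class fields
implicit val myCsvRowDecoder: RowDecoder[MyCsvRow] =
  RowDecoder.decoder(0, 1)(MyCsvRow.apply)

// placeholder URL, for illustration only
val url = new java.net.URL("http://example.com/data.csv")

// java.net.URL can be read from directly, each row decoded as a MyCsvRow
val rows: List[ReadResult[MyCsvRow]] = url.readCsv[List, MyCsvRow](rfc)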
I'll read up on RDD and I'll get back to you later.
To be clear, I'm not asking about turning an RDD into CSV, but CSV into an RDD. At the same time, I also wonder how to write an RDD as a CSV.
If you can collect the RDD into a regular collection, you can just pass that to, say, asCsv. That means building the whole String in memory and then saving that, but that will probably end up being prohibitively expensive.
A better option is to transform the RDD[A] into an RDD[String] and then write the RDD into HDFS.
For that, you need a RowEncoder[A]. That's essentially an A => Seq[String].
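As a concrete sketch, hand-writing one for the MyCsvRow type from earlier (the 0 and 1 are column indices; this uses RowEncoder.encoder from the current API):

import kantan.csv._

case class MyCsvRow(cell0: String, cell1: Int)

// essentially MyCsvRow => Seq[String]: each field goes to the given column index
implicit val myCsvRowEncoder: RowEncoder[MyCsvRow] =
  RowEncoder.encoder(0, 1)((r: MyCsvRow) => (r.cell0, r.cell1))

// RowEncoder[MyCsvRow].encode(MyCsvRow("foo", 1)) == Seq("foo", "1")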
mkString(","), if you want your column separator to be
def transform[A: RowEncoder](input: RDD[A]): RDD[String] = input.map(a => RowEncoder[A].encode(a).mkString(","))
seq.mkString(",")work? Is there any escaping issue about the comma?
Good catch: mkString won't escape anything. Better to let kantan.csv do the serialisation, since asCsv quotes cells that contain the separator:
def transform[A: RowEncoder](input: RDD[A]): RDD[String] = input.map(a => List(a).asCsv(','))
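To round off the write side, a sketch of the whole pipeline against the current API (the helper name and output path are placeholders, and rfc replaces the older asCsv(',') form above):

import kantan.csv._
import kantan.csv.ops._
import org.apache.spark.rdd.RDD

// each row is serialised independently, so quoting is handled per record;
// note that Spark must serialise the captured RowEncoder with the closure
def toCsvLines[A: RowEncoder](input: RDD[A]): RDD[String] =
  input.map(a => List(a).asCsv(rfc).trim) // trim the trailing row separator

// toCsvLines(myRdd).saveAsTextFile("hdfs:///path/to/output") // placeholder path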
As for reading CSV into an RDD[A], frankly, I've no idea. I'd need to learn quite a bit more about Spark than I know right now.
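For the record, one plausible approach (my sketch, not something from this chat): split the file into lines with sc.textFile and decode each line independently. This assumes no quoted field contains an embedded newline, since textFile splits records blindly on line endings:

import kantan.csv._
import kantan.csv.ops._
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// hypothetical helper; the RowDecoder captured in the closures must be
// serialisable for Spark to ship it to executors
def loadCsv[A: RowDecoder](sc: SparkContext, path: String): RDD[A] =
  sc.textFile(path)                   // one CSV record per line, by assumption
    .flatMap(_.readCsv[List, A](rfc)) // decode each line on its own
    .flatMap(_.toOption)              // silently drop malformed rows

// val rows = loadCsv[MyCsvRow](sc, "hdfs:///path/to/data.csv") // placeholder path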