These are chat archives for nrinaudo/kantan.csv

6th
Dec 2018
Jules Ivanic
@guizmaii
Dec 06 2018 14:08

Hello,

I have the following error:

could not find implicit value for evidence parameter of type kantan.csv.HeaderEncoder[com.guizmaii.easy.excel.jruby.stream.ConstantRAMSpaceExcelJRuby.Row]

I don’t understand how to get this HeaderEncoder implicit

Row is defined like this type Row = Array[Cell] and Cell is just case class Cell(data: String, cellType: CellType /* simple sum type */)

I have these imports in scope:
  import kantan.csv._
  import kantan.csv.ops._
  import kantan.csv.generic._
Nicolas Rinaudo
@nrinaudo
Dec 06 2018 14:10
I don't remember for sure, but I'm pretty sure I didn't implement generic derivation of header values
Jules Ivanic
@guizmaii
Dec 06 2018 14:10
I don’t have any header
I just have values
Nicolas Rinaudo
@nrinaudo
Dec 06 2018 14:10
mmm... hang on... your types don't make sense
right. A cell is a single value
Jules Ivanic
@guizmaii
Dec 06 2018 14:11
  sealed abstract class CellType
  case object BlankCell   extends CellType
  case object StringCell  extends CellType
  case object NumericCell extends CellType

  object CellType {
    private[this] final val BLANK_CELL   = "b"
    private[this] final val STRING_CELL  = "s"
    private[this] final val NUMERIC_CELL = "n"

    implicit final val encoder: CellEncoder[CellType] = {
      case BlankCell   => BLANK_CELL
      case StringCell  => STRING_CELL
      case NumericCell => NUMERIC_CELL
    }
    implicit final val decoder: CellDecoder[CellType] = CellDecoder.fromUnsafe({
      case BLANK_CELL   => BlankCell
      case STRING_CELL  => StringCell
      case NUMERIC_CELL => NumericCell
    })
  }

  final case class Cell(data: String, cellType: CellType)
Nicolas Rinaudo
@nrinaudo
Dec 06 2018 14:11
your case class has two values, so unless you manually specify a CellDecoder[Cell], you won't be able to get what you want
Jules Ivanic
@guizmaii
Dec 06 2018 14:11
ok
thanks
Nicolas Rinaudo
@nrinaudo
Dec 06 2018 14:12
also, I'm not 100% sure you'll get a RowDecoder or HeaderDecoder for Array
it depends on whether Array has a CanBuildFrom, which might or might not be the case. It's Array. It's weird.
Jules Ivanic
@guizmaii
Dec 06 2018 14:13
weird but performant :/
Nicolas Rinaudo
@nrinaudo
Dec 06 2018 14:15
I honestly don't understand your CellType thing though. What's this about? Wouldn't it be more idiomatic to have:
sealed abstract class Cell extends Product with Serializable

object Cell {
  final case class StringCell(value: String) extends Cell
  final case class IntCell(value: Int) extends Cell

  // Couldn't this be a `StringCell` with an empty string?
  final case object BlankCell extends Cell
}
And then you (probably) get a CellDecoder[Cell] derived generically with the right imports
Jules Ivanic
@guizmaii
Dec 06 2018 14:17
yes good point
Nicolas Rinaudo
@nrinaudo
Dec 06 2018 14:17
you might want to have a type parameter on Cell and have StringCell(value: String) extends Cell[String], but I don't know your use case so this might be overkill
or you could just have type Cell = Either[String, Int]
Jules Ivanic
@guizmaii
Dec 06 2018 14:19
  final case class Cell(data: String, cellType: CellType)

  object Cell {
    final val stupidlyUniqSeparator = "--||--//--"

    implicit final val encoder: CellEncoder[Cell] = {
      case Cell(data, cellType) => s"${data}${stupidlyUniqSeparator}${CellType.encoder(cellType)}"
    }

    implicit final val decoder: CellDecoder[Cell] = CellDecoder.fromUnsafe({ s: String =>
      // String.split takes a regex, so the separator has to be quoted
      val Array(data, cellType) = s.split(java.util.regex.Pattern.quote(stupidlyUniqSeparator))
      Cell(data, CellType.decoder(cellType))
    })
  }

or you could just have type Cell = Either[String, Int]

That's a bit simplistic and not extensible :D

Nicolas Rinaudo
@nrinaudo
Dec 06 2018 14:20
I still find it preferable to encoding type information at the value level. It's usually better to do it the other way around
Jules Ivanic
@guizmaii
Dec 06 2018 14:20
but the CellType is stupid. I should use your code :)
thanks :)
Nicolas Rinaudo
@nrinaudo
Dec 06 2018 14:21
but yeah, an ADT is better
if only because it gets a unique name at the type level, rather than just another Either
so. Do you control the data or not?
if you do, might I suggest that the cell format is potentially not great? If it's an int, stick an int, if it's a string, don't, if it's blank, just leave the cell empty
I prefer 1,foo, to 1:n,foo:s,:b, say
(where : is the stupidlyUniqSeparator)
Jules Ivanic
@guizmaii
Dec 06 2018 14:23
I need to serialize and then to deserialize
so I need the type information serialized
Nicolas Rinaudo
@nrinaudo
Dec 06 2018 14:23
I often do too and still write ints for ints and strings for strings
it's perfectly possible to write sane encoders and decoders for that
the one reason to do what you're doing is if CSV parsing is your bottleneck
if it is, this is a valid optimisation. If it's not, it's, in my opinion, a massive premature optimisation
Jules Ivanic
@guizmaii
Dec 06 2018 14:24
it’s not but it should be fast
and I don’t really see how to write a sane decoder without the type information previously encoded 🤔
Nicolas Rinaudo
@nrinaudo
Dec 06 2018 14:25
it's already implemented, let me show you
Jules Ivanic
@guizmaii
Dec 06 2018 14:26
the thing is that I don’t control how the Row is constructed
Nicolas Rinaudo
@nrinaudo
Dec 06 2018 14:26
a row is a sequence of cells, basically
Jules Ivanic
@guizmaii
Dec 06 2018 14:26
it could contain any sequence of Cell
Nicolas Rinaudo
@nrinaudo
Dec 06 2018 14:26
it could, yes
or a case class
the idea is that you try to parse it as Int. If that works, good! You've got an IntCell. If it doesn't, it must be a valid string (because you actually get a string), so it's a StringCell
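A minimal sketch of the "try Int first, fall back to String" idea described above. It uses only the standard library rather than kantan.csv's own `CellDecoder`, and the `parse` helper and the blank-cell rule are assumptions made for illustration:

```scala
import scala.util.Try

sealed abstract class Cell extends Product with Serializable
object Cell {
  final case class IntCell(value: Int)       extends Cell
  final case class StringCell(value: String) extends Cell
  case object BlankCell                      extends Cell

  // Hypothetical helper: empty text is a blank cell, text that parses as an
  // Int becomes an IntCell, and anything else is kept as a StringCell.
  def parse(raw: String): Cell =
    if (raw.isEmpty) BlankCell
    else Try(raw.toInt).toOption.fold[Cell](StringCell(raw))(IntCell(_))
}
```

One consequence of this design: a `StringCell("1")` written as `1` reads back as `IntCell(1)`, so the plain format only round-trips exactly when that distinction doesn't matter.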
Jules Ivanic
@guizmaii
Dec 06 2018 14:28

you try to parse it as Int

Will it fail with an exception?

Nicolas Rinaudo
@nrinaudo
Dec 06 2018 14:29
depends on how the CellDecoder[Int] is implemented
it could be with an exception (it is in the case of Int), or it could not
Jules Ivanic
@guizmaii
Dec 06 2018 14:29
the thing is that maybe later, there’ll be more Cell types
My runtime is JRuby, exceptions are immensely costly. Can’t use them
Nicolas Rinaudo
@nrinaudo
Dec 06 2018 14:29
not a problem, although it probably means you want an ADT rather than an Either as I suggested
Jules Ivanic
@guizmaii
Dec 06 2018 14:30
Either is too small. How to handle n Cell types ?
Nicolas Rinaudo
@nrinaudo
Dec 06 2018 14:30
a sum type
exactly what I suggested above, but you add more members to your ADT
Jules Ivanic
@guizmaii
Dec 06 2018 14:31
yep
Nicolas Rinaudo
@nrinaudo
Dec 06 2018 14:31
regarding exceptions: honestly, that's usually a misconception that comes from too much Java. Exceptions can be costly, but they're almost never your bottleneck and trying to optimise around them is usually premature optimisation
that being said, I don't know JRuby, so maybe it's problematic there
Jules Ivanic
@guizmaii
Dec 06 2018 14:32
It is on JRuby
Nicolas Rinaudo
@nrinaudo
Dec 06 2018 14:33
to me, that reeks of "everybody knows exceptions are slow, so never use them ever"
if it was my project, I'd benchmark and then if exceptions are problematic, I'd work around them
but I wouldn't start with an unverified assumption that makes my code (and serialisation format) much nastier than it could be
Jules Ivanic
@guizmaii
Dec 06 2018 14:41

If you want proof of how badly exceptions perform on JRuby, you can read this issue: DataDog/dd-trace-rb#640

;)

Nicolas Rinaudo
@nrinaudo
Dec 06 2018 14:42
sorry, I don't mean that I doubt exceptions are slow in ruby. I mean that unless measured, I wouldn't consider them a bottleneck in any software I write and wouldn't make my problem more complex by building in optimisations that might not be necessary
Jules Ivanic
@guizmaii
Dec 06 2018 14:42
where there’s a table describing some performance differences
Nicolas Rinaudo
@nrinaudo
Dec 06 2018 14:42
it's very possible that they would be a bottleneck in your case. It's just that neither you nor I know, because it's not been measured.
Jules Ivanic
@guizmaii
Dec 06 2018 14:42
it’s been measured
Nicolas Rinaudo
@nrinaudo
Dec 06 2018 14:43
in your use case?
Jules Ivanic
@guizmaii
Dec 06 2018 14:43
see the github links ;)
in general
Nicolas Rinaudo
@nrinaudo
Dec 06 2018 14:43
that's precisely it. I'm sure they're slow in general. I'm not sure they're a bottleneck in your case
it's your project though, do exactly as you see fit. It's just that I find this "everybody knows this will be a bottleneck, so I'll make my code more complex by working around it" attitude highly suspicious
Jules Ivanic
@guizmaii
Dec 06 2018 14:44
why use exceptions when you can avoid them? The “trick” here is simple enough to justify itself compared to exception-based code
Nicolas Rinaudo
@nrinaudo
Dec 06 2018 14:44
it's not. It makes your serialisation format nasty
Jules Ivanic
@guizmaii
Dec 06 2018 14:44
I disagree
Nicolas Rinaudo
@nrinaudo
Dec 06 2018 14:44
and unsafe, although the odds of a normal string containing --||--//-- are probably low
Jules Ivanic
@guizmaii
Dec 06 2018 14:45
plus this serialization format is only used for temp files
new version, thanks to you :)
  sealed abstract class Cell extends Product with Serializable
  object Cell {
    final case object BlankCell                 extends Cell
    final case class StringCell(value: String)  extends Cell
    final case class NumericCell(value: Double) extends Cell

    private[this] final val BLANK_CELL   = "b"
    private[this] final val STRING_CELL  = "s"
    private[this] final val NUMERIC_CELL = "n"

    implicit final val encoder: CellEncoder[Cell] = {
      case BlankCell          => s"$BLANK_CELL:"
      case StringCell(value)  => s"$STRING_CELL:$value"
      case NumericCell(value) => s"$NUMERIC_CELL:$value"
    }

    implicit final val decoder: CellDecoder[Cell] = CellDecoder.fromUnsafe({ s =>
      val Array(cellType, data) = s.split(":", 1)
      cellType match {
        case BLANK_CELL   => BlankCell
        case STRING_CELL  => StringCell(data)
        case NUMERIC_CELL => NumericCell(data.toDouble)
      }
    })
  }
now it’s only s.split(":", 1)
which should be pretty fast
and safe
Nicolas Rinaudo
@nrinaudo
Dec 06 2018 14:46
not really safe, no: s:s:foo
what's the content of your cell?
Jules Ivanic
@guizmaii
Dec 06 2018 14:46
don’t know
could be any string
Nicolas Rinaudo
@nrinaudo
Dec 06 2018 14:46
probably s, instead of s:foo as you'd expect
alright, I think maybe I didn't ask my question right. What if the string you want to store is s:foo?
encoded: s:s:foo
decoded: s
Jules Ivanic
@guizmaii
Dec 06 2018 14:47
s.split(":", 1) stops on the first match
Nicolas Rinaudo
@nrinaudo
Dec 06 2018 14:47
(I think)
Jules Ivanic
@guizmaii
Dec 06 2018 14:47
so "s:s:foo".split(":", 1) will be Array("s", "s:foo"), which is correct
Nicolas Rinaudo
@nrinaudo
Dec 06 2018 14:48
ah, didn't know that one. Then yes, it does seem perfectly safe
Jules Ivanic
@guizmaii
Dec 06 2018 14:48
:)
Nicolas Rinaudo
@nrinaudo
Dec 06 2018 14:48
erm...
λ> "s:s:foo".split(":", 1)
res0: Array[String] = Array("s:s:foo")
what am I doing wrong?
Jules Ivanic
@guizmaii
Dec 06 2018 14:49
ahah
don’t know
Nicolas Rinaudo
@nrinaudo
Dec 06 2018 14:49
ah right, you want 2, not 1, but same difference
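A quick check of the `limit` argument discussed above, using only `String.split` from the standard library (the value names are just for illustration):

```scala
// The second argument to split caps the number of pieces that are returned.
val limit1 = "s:s:foo".split(":", 1) // a single piece: the input, left unsplit
val limit2 = "s:s:foo".split(":", 2) // two pieces: split at the first ':' only
val blank  = "b:".split(":", 2)      // the empty payload after ':' is kept
```

With a limit of 2, everything after the first `:` stays in the second element, so a payload that itself contains `:` survives the round trip.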
Jules Ivanic
@guizmaii
Dec 06 2018 14:49
I’ll have a look
Nicolas Rinaudo
@nrinaudo
Dec 06 2018 14:50
ok. So this looks safe. I still stand by my other point that s:foo,i:1 is not as nice as foo,1, but it's not my data
Jules Ivanic
@guizmaii
Dec 06 2018 14:51
I still have: could not find implicit value for evidence parameter of type kantan.csv.HeaderEncoder[com.guizmaii.easy.excel.jruby.stream.ConstantRAMSpaceExcelJRuby.Row]
:/
Nicolas Rinaudo
@nrinaudo
Dec 06 2018 14:51
Is Row still an Array?
Jules Ivanic
@guizmaii
Dec 06 2018 14:52
type Row = Array[Cell]
Nicolas Rinaudo
@nrinaudo
Dec 06 2018 14:52
try with a List
I don't mean that you should keep List afterwards, just to see whether there's a decoder for Array[A: CellDecoder]
I'm not sure there is
(which, again, I wouldn't use Array here either unless performance was really, really critical and this was a bottleneck, because mutability)
Jules Ivanic
@guizmaii
Dec 06 2018 14:53
with List, it compiles
Nicolas Rinaudo
@nrinaudo
Dec 06 2018 14:53
right. There's no RowDecoder[Array[A: CellDecoder]]
Jules Ivanic
@guizmaii
Dec 06 2018 14:53
ok thanks
I’ll try to implement it
Nicolas Rinaudo
@nrinaudo
Dec 06 2018 14:53
it's fairly simple to write if you need it, but it'll hardly be an optimisation
since your RowDecoder[Array[A]] is basically a List[String] => Array[A]
the list of cells already exists as a List
Jules Ivanic
@guizmaii
Dec 06 2018 15:08
seems not that easy
  private implicit def arrayDecoder[A](
      implicit CellDecoder: CellDecoder[A]
  ): Decoder[Array[String], Array[A], DecodeError, codecs.type] = RowDecoder.fromUnsafe(_.map(CellDecoder.unsafeDecode))
does not compile
Nicolas Rinaudo
@nrinaudo
Dec 06 2018 15:08
well, no, the types don't line up
Jules Ivanic
@guizmaii
Dec 06 2018 15:08
the compiler always wants me to go back to Seq[A] instead of Array
Nicolas Rinaudo
@nrinaudo
Dec 06 2018 15:09
there's that, but you can always use an array builder and fold on that
I mean if you're going to use mutability you might as well go all the way
your main problem is that you're returning a Seq[Either[Error, A]] where the compiler expects an Either[Error, Seq[A]]
Jules Ivanic
@guizmaii
Dec 06 2018 15:10
the problem is that you define RowEncoder like that: type RowEncoder[A] = Encoder[Seq[String], A, codecs.type] but Array is not a Seq
Nicolas Rinaudo
@nrinaudo
Dec 06 2018 15:10
who cares? Seq[String] is the input, A is the output
Jules Ivanic
@guizmaii
Dec 06 2018 15:11
the input is also an Array ?
Nicolas Rinaudo
@nrinaudo
Dec 06 2018 15:11
no it's not, it's a Seq[String]. I'm pretty sure. I wrote the library.
its runtime class might be Array, in some circumstances
or, well, WrappedArray, I guess
but you're writing a Seq[String] => Either[Error, A]
Jules Ivanic
@guizmaii
Dec 06 2018 15:12
Array is not a Seq, so how could “its runtime class might be Array” be possible?
Nicolas Rinaudo
@nrinaudo
Dec 06 2018 15:12
because it's a WrappedArray which is a Seq.
but, again, who cares? This is not the part you have to implement. You have to turn that Seq[String] into an Array[A]
Jules Ivanic
@guizmaii
Dec 06 2018 15:15
my bad
  private implicit def arrayDecoder[A: ClassTag](
      implicit CellDecoder: CellDecoder[A]
  ): RowDecoder[Array[A]] =
    RowDecoder.fromUnsafe { array =>
      // Decode each cell, then copy the results into an Array (hence the ClassTag)
      array.map(CellDecoder.unsafeDecode).toArray
    }

  private implicit def arrayEncoder[A](implicit CellEncoder: CellEncoder[A]): RowEncoder[Array[A]] =
    (array: Array[A]) => array.map(CellEncoder.encode)
Nicolas Rinaudo
@nrinaudo
Dec 06 2018 15:16
for someone who doesn't like exceptions you sure rely on them a lot
Jules Ivanic
@guizmaii
Dec 06 2018 15:16
😄
Here I’m sure that it’ll be safe
Nicolas Rinaudo
@nrinaudo
Dec 06 2018 15:17
sure, it's just that the usual "clean" implementation of this problem is traverse, not throw an exception and catch it :)
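A minimal sketch of the traverse approach mentioned above, folding over an array builder so that no exception is thrown or caught. It is written against plain `Either` rather than kantan.csv's types, and `traverseToArray` and `decodeInt` are hypothetical names, not part of the library:

```scala
import scala.collection.mutable
import scala.reflect.ClassTag

// Hypothetical stand-in for a cell decoder: reports failure as a Left
// instead of throwing.
def decodeInt(raw: String): Either[String, Int] =
  if (raw.matches("-?\\d{1,9}")) Right(raw.toInt)
  else Left(s"not an int: $raw")

// Turns Seq[String] into Either[String, Array[A]]: the first Left wins,
// and later cells are not decoded once a failure has been recorded.
def traverseToArray[A: ClassTag](cells: Seq[String])(
    decode: String => Either[String, A]
): Either[String, Array[A]] =
  cells
    .foldLeft[Either[String, mutable.ArrayBuilder[A]]](Right(Array.newBuilder[A])) {
      (acc, raw) => acc.flatMap(builder => decode(raw).map { a => builder += a; builder })
    }
    .map(_.result())
```

For example, `traverseToArray[Int](Seq("1", "x"))(decodeInt)` yields a `Left` describing the bad cell rather than a partially filled array.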
Jules Ivanic
@guizmaii
Dec 06 2018 17:17

Result here: guizmaii/easy_excel_jruby#1

:)

The code may look a bit strange. That’s because it's intended to be used in a JRuby app… so there are constraints. Also, it replaces a Ruby lib, so to avoid rewriting everything I try to keep the same interface as the previous lib.

Thanks for your help @nrinaudo ! :)

Nicolas Rinaudo
@nrinaudo
Dec 06 2018 18:04
I get the impression that you don’t want people to modify your instances of Cell - is this why you’re hiding the class and its companion object?
If so, you might have forgotten about the copy method
This is where I introduce you to the wonders of sealed abstract case classes
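A minimal sketch of the sealed abstract case class pattern mentioned above. The `NumericCell` name is borrowed from earlier in the conversation, while the validation rule is an assumption made purely for illustration:

```scala
// Because the case class is abstract, the compiler generates neither `apply`
// nor `copy`; the private constructor keeps `new` out of reach as well, so
// the companion's `apply` is the only way to obtain an instance.
sealed abstract case class NumericCell private (value: Double)

object NumericCell {
  // Hypothetical rule: refuse NaN. The anonymous subclass is allowed here
  // because it lives in the same file as the sealed class.
  def apply(value: Double): Option[NumericCell] =
    if (value.isNaN) None else Some(new NumericCell(value) {})
}
```

Pattern matching and structural equality still work as for any case class, but callers can no longer bypass the companion's checks through `copy`.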
Jules Ivanic
@guizmaii
Dec 06 2018 18:32
You're right !
I know sealed abstract case classes :)
There's a famous Tpolecat gist about it
Nicolas Rinaudo
@nrinaudo
Dec 06 2018 18:33
There you go then, this seems like a good place to use them
Jules Ivanic
@guizmaii
Dec 06 2018 18:35
I'll take a look tomorrow ! Thanks again 🙂
Nicolas Rinaudo
@nrinaudo
Dec 06 2018 19:00
Quite welcome!