Adelbert Chang
@adelbertc
sounds good
dwhitney
@dwhitney
it seems that I am just following @adelbertc around since he's in all of the same gitter channels I am :)
Adelbert Chang
@adelbertc
;)
dwhitney
@dwhitney
anyway @travisbrown I just took the advice you gave me at SBTB and read up on your iteratee lib. Looks really cool!
RomanIakovlev
@RomanIakovlev

I figured I better bring this topic to iteratee channel. As mentioned in Circe channel, I want to produce json with Circe in a streaming fashion. I’ve tried to start and faced one question. Basically I need a function like this (imports omitted):

def writeJson[T: Encoder](enum: Enumerator[Task, T], file: File): Task[Unit] = {
    val printer = Printer.noSpaces.copy(dropNullKeys = true)
    val opener = Enumerator.enumOne[Task, String]("[")
    val closer = Enumerator.enumOne[Task, String]("]")
    val entries = enum.map(_.asJson.pretty(printer) + ",")
    opener.append(entries).append(closer).into(writeLines(file))
  }

The problem is the comma after the last entry, which makes the resulting json invalid. Is there a way to somehow introspect the Enumerator and to know if that’s the last entry, to handle it differently?

RomanIakovlev
@RomanIakovlev
Okay, I’ve found Enumeratee.intersperse, it solves this particular problem.
Travis Brown
@travisbrown
@RomanIakovlev right, intersperse is the way to go.
you could wrap all of that work up in an enumeratee, I guess—I haven't yet just because it's more straightforward than the decoding side.
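To make the fix concrete: `intersperse` puts the separator *between* elements only, so the trailing comma in `writeJson` disappears. Here is a minimal plain-Scala sketch of the same idea (ordinary collections, not the io.iteratee API):

```scala
// Stdlib sketch of what an intersperse step does: the separator goes
// between elements only, so no trailing separator is ever produced.
def intersperse[A](sep: A)(xs: List[A]): List[A] =
  xs match {
    case Nil    => Nil
    case h :: t => h :: t.flatMap(x => List(sep, x))
  }

// Building a JSON array the way the enumerator pipeline above would:
val entries = List("1", "2", "3")
val json = ("[" :: intersperse(",")(entries) ::: List("]")).mkString
// json == "[1,2,3]" -- valid JSON, no trailing comma
```

In the actual `writeJson`, the equivalent change is to drop the `+ ","` from the `map` and run the entries through `Enumeratee.intersperse(",")` instead.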
RomanIakovlev
@RomanIakovlev

It would be nice to have streaming parser also for the non-trivial json structures. By non-trivial I mean something like this:

{
  "my_objects": [… a huge list of objects …]
}

as opposed to just [… a huge list of objects …]. Have no idea how to approach it though.

Travis Brown
@travisbrown
@RomanIakovlev agreed, but figuring out the API for that is hard.
@RomanIakovlev I could imagine some navigation methods on the decoding enumeratees that allow you to navigate into the structure before streaming starts…
RomanIakovlev
@RomanIakovlev
It surprises me how efficient the streaming reading/writing is. A modest 150MB of serialized minified json required 8+ gigs of RAM to write and just under 8 gigs to read without streaming. Streaming makes it work in constant memory, under 4 gigs, for both reading and writing.
But it forces the rigid structure of array of objects though. It’s tolerable for now, because I own both producer and consumer side.
Travis Brown
@travisbrown
@RomanIakovlev I'm more surprised about how bad that is on the non-streaming side…
RomanIakovlev
@RomanIakovlev
I didn’t dig deep into it, so I can’t provide much info now. I first had a non-trivial structure, an object with 2 huge arrays in it, and it required a lot of RAM. When I switched to 2 separate arrays and streaming, it worked with the default SBT memory size.
RomanIakovlev
@RomanIakovlev
To be clear, the memory size I’m describing is not exactly how much the json processing took, but how much I had to give to SBT to make my applications run (in non-forked mode). The json part took somewhat less, but still most of that memory.
RomanIakovlev
@RomanIakovlev
@travisbrown I’m going to give an internal tech talk for my team at work about my experience using Circe streaming with Iteratee. I’m trying to find good and concise definitions of Iteratee’s main classes: Iteratee, Enumeratee and Enumerator, what they do on their own and how they interact with each other. IIRC you’ve promised a blog post about Iteratee architecture some time ago (no pressure! :smile: ), but in its absence, could you please explain here what they are and what they do, in general?
Travis Brown
@travisbrown
@RomanIakovlev cool! have you seen the descriptions in my original blog post?
@RomanIakovlev depending on the audience enumerator = stream, enumerate = transformer, iteratee = fold (or sink) might work.
s/enumerate/enumeratee
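The stream / transformer / fold analogy can be made concrete with ordinary collections. This is an illustrative plain-Scala sketch of how the three roles line up, not io.iteratee code:

```scala
// Plain-Scala analogy for the three abstractions (illustrative only):
//   enumerator ≈ a source of elements (here, a List)
//   enumeratee ≈ a transformer applied between source and sink (here, map)
//   iteratee   ≈ a fold/sink that consumes elements into a result
val source: List[Int]       = List(1, 2, 3) // "enumerator"
val transform: Int => Int   = _ * 2         // "enumeratee"
val fold: (Int, Int) => Int = _ + _         // "iteratee"

val result = source.map(transform).foldLeft(0)(fold)
// result == 12
```

The real library keeps the same shape (`enumerator.through(enumeratee).into(iteratee)`), but wraps each stage in an effect type `F[_]` so the stream can be effectful and incremental.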
RomanIakovlev
@RomanIakovlev
I don't think I've seen any blog post about Iteratee.
RomanIakovlev
@RomanIakovlev
This blog post definitely helps! I guess my questions are pretty much covered there. I had somehow missed it before.
Teodor Dimov
@teodimoff
i tried the library and i liked it ... 10x
Teodor Dimov
@teodimoff
@travisbrown i was looking at fs2 , but i am a fan of your work :D
Travis Brown
@travisbrown
@teodimoff thanks! there are plenty of things fs2 can do that are out of scope for this project, but I find this model works for a lot of the stuff I need, and it's simpler / faster.
Teodor Dimov
@teodimoff
Agreed. The three abstractions are composable enough and flexible enough to make things happen... quickly
Teodor Dimov
@teodimoff
@travisbrown closest thing to Iteratee.sortBy(w => w.length -> w)? i want the whole file sorted by length and by alphabetic order.
Travis Brown
@travisbrown
@teodimoff there's not a super nice way, since none of the off-the-shelf enumeratees gather all elements in memory, which sorting requires…
@teodimoff this isn't too terrible, though:
scala> import cats.Monad
import cats.Monad

scala> import io.iteratee.{ Enumeratee, Enumerator, Iteratee }
import io.iteratee.{Enumeratee, Enumerator, Iteratee}

scala> def sortBy[F[_]: Monad, A, B: Ordering](f: A => B): Enumeratee[F, A, A] =
     |   Enumeratee.sequenceI(Iteratee.consume[F, A]).map(_.sortBy(f)).andThen(Enumeratee.flatMap(Enumerator.enumVector[F, A]))
sortBy: [F[_], A, B](f: A => B)(implicit evidence$1: cats.Monad[F], implicit evidence$2: Ordering[B])io.iteratee.Enumeratee[F,A,A]

scala> import cats.instances.option._
import cats.instances.option._

scala> import io.iteratee.modules.option._
import io.iteratee.modules.option._

scala> enumVector(Vector("a", "aaa", "aa")).through(sortBy((_: String).length)).toVector
res0: Option[Vector[String]] = Some(Vector(a, aa, aaa))
Teodor Dimov
@teodimoff
@travisbrown nope it's not bad at all... thx
Srepfler Srdan
@schrepfler
:clap:
circe coming in 3… 2… 1…
Travis Brown
@travisbrown
@schrepfler yep :smile:
John Sullivan
@sullivan-
so i wrote this method to convert a scala.collection.Iterator into an io.iteratee.Enumerator. it seems to be working, but i wonder if ppl would be willing to look it over to see what you think? I am still pretty shaky with this cats stuff...
   def toEnumerator[F[_], E](iterator: => Iterator[E])(implicit F: Monad[F]): Enumerator[F, E] = {
     new Enumerator[F, E] {
       final def apply[A](step: Step[F, E, A]): F[Step[F, E, A]] = {
         if (iterator.hasNext) {
           F.flatMap(step.feedEl(iterator.next))(s => apply[A](s))
         } else {
           F.pure(step)
         }
       }
     }
   }
John Sullivan
@sullivan-
@travisbrown i would especially appreciate your opinion
Travis Brown
@travisbrown
@sullivan- looks reasonable to me (you could add chunking, etc., but that's just an optimization). the only reason we don't provide something exactly like that is because the resulting enumerator inherits the mutability of the iterator.
John Sullivan
@sullivan-
thanks @travisbrown ! when you say "inherits the mutability" do you mean something like other holders of the Iterator might call methods on the iterator, which would cause the enumerator to behave differently?
Travis Brown
@travisbrown

@sullivan- that's part of it, but it's more about the mutability breaking referential transparency even when you're working just with the enumerator itself—e.g. even if you're reading a file like this:

val e = io.iteratee.monix.task.readLines(new java.io.File("build.sbt"))

you can reuse e as many times as you like and never have to worry about the internal state of the enumerator, etc.

@sullivan- that wouldn't be the case for an enumerator you'd get from toEnumerator.
John Sullivan
@sullivan-

Got it! Thanks, that makes sense. So if I changed the signature from

def toCatsEnumerator[F[_], E](iterator: => Iterator[E])(implicit F: Monad[F])

to

def toCatsEnumerator[F[_], E](iteratorGen: () => Iterator[E])(implicit F: Monad[F])

i could theoretically get around that problem by calling iteratorGen in the right place inside the Enumerator, right?

Ghost
@ghost~55118a7f15522ed4b3ddbe95
That's still not enough since iteratorGen could be returning the same iterator each time.
John Sullivan
@sullivan-
so my assumption is that the caller of this toCatsEnumerator method knows they have to provide a function that is going to produce a new, equivalent iterator each time. e.g., i could put that in the scaladoc comment
John Sullivan
@sullivan-
@travisbrown you're right, chunking would help. it's taking like 20 seconds to process 5M ints..
John Sullivan
@sullivan-
(for the record, that 20 seconds was due to a bad fold somewhere else)
Ghost
@ghost~55118a7f15522ed4b3ddbe95
@sullivan- Then it doesn't really matter if you take your Iterator by => or by () =>.
They are almost entirely equivalent.
The former is syntactically easier to use.
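A small self-contained sketch of the point being made here: a by-name parameter (`=> Iterator[E]`) re-evaluates its argument expression on every use, so it behaves like a thunk. Both forms are only safe when the caller passes an expression that builds a fresh iterator, as the sharing case below shows:

```scala
// A by-name Iterator parameter re-evaluates the argument expression
// each time 'it' is referenced inside the body.
def drain[A](it: => Iterator[A]): (List[A], List[A]) =
  (it.toList, it.toList) // 'it' is evaluated twice

// Passing a fresh-iterator *expression*: re-evaluated, both drains see all elements.
val (a, b) = drain(Iterator(1, 2, 3))
// a == List(1, 2, 3), b == List(1, 2, 3)

// Passing an already-constructed iterator *value*: the same mutable
// instance is reused, so the second drain finds it exhausted.
val shared = Iterator(1, 2, 3)
val (c, d) = drain(shared)
// c == List(1, 2, 3), d == Nil
```

This is why the scaladoc caveat above matters regardless of whether the signature uses `=> Iterator[E]` or `() => Iterator[E]`: the contract that each evaluation yields a new, equivalent iterator has to be upheld by the caller.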