Jonas Fonseca
@jonas
Didn't work either. I'll try Monday
Jonas Fonseca
@jonas
FYI, I've made a request to publish to Maven Central.
Jonas Fonseca
@jonas
OK, apparently I'd misconfigured the spark package plugin. Opened #21 and published 0.1.1 releases: http://spark-packages.org/package/ypg-data/sparrow
Jonas Fonseca
@jonas
@gpoirier Did you ever look at Spark's DataFrameReader API?
val df = sqlContext.read.format("com.databricks.spark.csv").schema(csvSchema).option("header", "true").load(csvPath)
Options only support String values, but the ability to plug readers and provide schemas looks interesting.
Guillaume Poirier
@gpoirier
I haven't looked at it.
What does it do exactly, go from CSV file to a DataFrame/Row with a better schema than the JSON reader we were using?
Jonas Fonseca
@jonas
It plugs in https://github.com/databricks/spark-csv for parsing a CSV file, so it goes from CSV to DataFrame.
Guillaume Poirier
@gpoirier
I hadn't looked at it, but I knew we could load CSV to a DataFrame.
Jonas Fonseca
@jonas
I wrote the schema by hand since it had 30+ fields and I wanted something quick and dirty, but the @schema macro should be able to generate that from a case class
OK
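For readers following along, the contrast between a hand-written schema and one derived from a case class can be sketched as below. This is a minimal sketch and not Sparrow's `@schema` macro: it swaps in Spark's own `Encoders.product` (available from Spark 1.6) for the derivation, and the `Listing` case class and its three fields are hypothetical stand-ins for the 30+ field CSV mentioned above.

```scala
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.types._

// Hypothetical record type; the real CSV had 30+ fields.
case class Listing(id: Long, name: String, rating: Option[Double])

object SchemaSketch {
  // Schema written by hand, field by field.
  val byHand: StructType = StructType(Seq(
    StructField("id", LongType, nullable = false),
    StructField("name", StringType, nullable = true),
    StructField("rating", DoubleType, nullable = true)
  ))

  // Schema derived from the case class by Spark's built-in product encoder
  // (an assumption standing in for the @schema macro discussed in the chat).
  val derived: StructType = Encoders.product[Listing].schema
}
```

Either value could then be passed to `.schema(...)` in the `sqlContext.read` call quoted earlier, so the field list only has to be maintained in one place.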
Guillaume Poirier
@gpoirier
With the prototype I'm working on, you (will?) get a StructType out of a RowConverter.
^ I think it might turn out OK. In my previous attempts I had issues that sent me back to a different API.
Jonas Fonseca
@jonas
That sounds good. I'll take a look later.
Jonas Fonseca
@jonas
@gpoirier I really like the way it's going and I'm seeing a lot of potential use cases. When do you think it will be in a mergeable state?
Guillaume Poirier
@gpoirier
@jonas I'm not sure, having some issues with the Row abstraction. Then I'll have to work on the schema validation and Safe/Error handling support.
Jonas Fonseca
@jonas
OK
Guillaume Poirier
@gpoirier
@jonas I don't know if you wanted to be involved in what I'm doing, but I'm not really good at splitting up code like that in the design phase. Anyway, if you want to be involved and have a suggestion on how to split things, or if you want to propose API improvements, let me know. We can do a hangout too if you want more details.
Jonas Fonseca
@jonas
Yes, I'd like to get involved. Let me know your availability.
@gpoirier ... For a hangout.
Guillaume Poirier
@gpoirier
@jonas could be this morning around 8am. Or tonight when I'm back home.
Jonas Fonseca
@jonas
@gpoirier Tonight works best for me.
Guillaume Poirier
@gpoirier
@jonas that works
Guillaume Poirier
@gpoirier
@suhailshergill To answer your question on the PR:
:point_up: July 27, 2015 8:49 PM
Suhail Shergill
@suhailshergill
@gpoirier off topic, but in case you're interested, i've started a meetup group on probabilistic programming. the first meetup is next week: http://www.meetup.com/Toronto-Probabilistic-Programming-Meetup/events/226746558/
Guillaume Poirier
@gpoirier
Sounds interesting. :) Although, you picked a day with already 2 other big data meetups:
Suhail Shergill
@suhailshergill
hm i had thought about it, but wasn't sure how much crossover there would be. i should open up a poll to the group members though
Guillaume Poirier
@gpoirier
@suhailshergill I ended up going to the Spark meetup because we are looking at using Spark very soon and we had some questions we wanted to ask. But hopefully I'll be able to make the next one.
But more on topic, have you looked at Spark 1.6 Dataset? I'm wondering if it does part of what Sparrow is meant to do.
Suhail Shergill
@suhailshergill
@gpoirier i assumed that's what happened. the first session was mostly overview, and i'm hoping we dig into the actual tutorial content next meetup. would love to have you there
Suhail Shergill
@suhailshergill
@gpoirier regarding dataset and sparrow: we've been wondering the same. or well, i've been thinking about it. my current understanding (it might be mistaken, i haven't spent enough time on this) is that dataset doesn't have all the capabilities that RDDs afford. this reduced expressivity is traded off for better optimization, but datasets retain the typesafe aspect. as such, i would imagine there being a need to switch between RDD and Dataset, but it's unclear to what extent that need would have to be aided by additional code (say an extension to sparrow). it's also unclear (i haven't read https://issues.apache.org/jira/browse/SPARK-9999 to completion) if there would be a need to convert between dataframe and dataset, and if so, whether that would be done in a manner which isn't lossy wrt type information
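The conversions being asked about can be tried out with a small round trip. This is a hedged sketch under assumptions, not something from the chat: it uses the `SparkSession` entry point from Spark 2.x and later (the 1.6 API under discussion went through `SQLContext`), and the `Person` case class is hypothetical. `toDF()` drops the static element type down to `Row`, and `as[Person]` reattaches it, with the column check happening when the query is analyzed rather than at compile time.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical record type used only for this illustration.
case class Person(name: String, age: Int)

object ConversionSketch {
  // RDD -> Dataset -> DataFrame -> Dataset -> RDD, returning the collected result.
  def roundTrip(spark: SparkSession, people: Seq[Person]): Seq[Person] = {
    import spark.implicits._

    val rdd: RDD[Person] = spark.sparkContext.parallelize(people)
    val ds: Dataset[Person] = rdd.toDS()       // typed view over the RDD
    val df: DataFrame = ds.toDF()              // untyped: rows with a schema
    val typedAgain: Dataset[Person] = df.as[Person] // type reattached by column name
    typedAgain.rdd.collect().toSeq             // back to an RDD of Person
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("conversion-sketch")
      .getOrCreate()
    try {
      roundTrip(spark, Seq(Person("ada", 36), Person("alan", 41))).foreach(println)
    } finally {
      spark.stop()
    }
  }
}
```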
Suhail Shergill
@suhailshergill
@gpoirier @yawaramin the slack channel for probabilistic programming https://cscabal.slack.com/messages/propl/ you guys are the alpha testers (i want to understand how easy or not it is for you to register, before i unleash it on the other members)
Guillaume Poirier
@gpoirier
It asks me for an email @cscabal.com
Guillaume Poirier
@gpoirier
@suhailshergill ^
Suhail Shergill
@suhailshergill
@gpoirier it doesn't ask you to request signup?
or rather allow you to do it
Guillaume Poirier
@gpoirier
It allows me to request sign up, but I have to provide an email @cscabal.com
@suhailshergill I tried to sign up providing guillaume.poirier[@cscabal.com] as the email, then it says it will send me an email to complete sign up. And of course I couldn't receive such an email.
Suhail Shergill
@suhailshergill
huh, interesting. lemme tinker
Suhail Shergill
@suhailshergill
@gpoirier i needed to get admin access. ok so what's your email?
Guillaume Poirier
@gpoirier
Suhail Shergill
@suhailshergill
invitation sent
Guillaume Poirier
@gpoirier
@suhailshergill that worked
Suhail Shergill
@suhailshergill
yay