    Matthew Powers
    @MrPowers
    @manuzhang @nvander1 - Yo!
    Let's start chatting
    @nvander1 - Do you think I should try to transition spark-daria to Mill or is it not worth the effort... I'm not really interested in making the switch unless Mill is way better in certain ways. Let me know what you think.
    Nik Vanderhoof
    @nvander1
    Well there's not currently a SparkPackageModule for mill, so that's a big disadvantage since you publish on spark-packages repo.
    Matthew Powers
    @MrPowers
    @nvander1 - I'm fine getting away from spark-packages. spark-packages has had bugs for me for the last two years and they haven't been fixed: databricks/sbt-spark-package#31
    Nik Vanderhoof
    @nvander1
    I think mill is easier to deal with if we start doing cross-builds for different versions of Scala + Spark. There's probably a way to do it in SBT if we tried to figure it out. But I've already done it in mill
    Matthew Powers
    @MrPowers
    It'd be best for me to just publish directly to Maven, I think. Some users need stuff published directly on Maven Central: MrPowers/spark-fast-tests#50
    @nvander1 - Have you figured out how to publish Mill projects to Maven?
    Nik Vanderhoof
    @nvander1
    I've gotten it into the snapshot repos on sonatype, but haven't tried promoting artifacts to maven yet. I think it is as simple as changing a flag on my publish command though
    Matthew Powers
    @MrPowers
    Yep, so sounds like Mill might be the path forward for spark-daria. Sounds exciting :)
    Not sure if I'll be able to use it for my other projects that need shading though. Mill doesn't support shading, right?
    Nik Vanderhoof
    @nvander1
    Not sure, I'll find out
    fyi: Mill's Gitter is always very active: https://gitter.im/lihaoyi/mill
    Matthew Powers
    @MrPowers
    So @nvander1 are you cool with this plan:
    1. You confirm that you can get projects published to Maven via Mill with no problems
    2. We transition spark-daria to Mill on a branch and gather as many benchmarks as possible (e.g. SBT test runtime vs Mill test runtime)
    3. If Mill is working properly, we shift spark-daria to Mill, cut out spark-packages completely, and start publishing JAR files directly to Maven
    4. If we can't get Mill working properly for whatever reason, we still cut out spark-packages and just start publishing JAR files to Maven via SBT
    @nvander1 Sweet, just joined the Mill Gitter channel
    Nik Vanderhoof
    @nvander1
    We'll also need to figure out how to do build matrices with different spark/scala versions. To backport the higher order functions api, I think we need slightly differing implementations depending on the spark version people use
    Sbt has support for cross-scala builds out of the box, but I'm trying to find out if people also build against different versions of dependencies
    Hoping mill turns out to be at least as speedy, because I know how to do those types of matrices in mill
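    A minimal sketch of the kind of Scala/Spark build matrix mentioned above, assuming the Mill Cross syntax from the 0.x releases before 0.11 (later releases changed it). The module name `daria`, the version pairs, and the `src-spark-<version>` directory layout are hypothetical placeholders, not the project's actual build:

    ```scala
    // build.sc (sketch only; pre-0.11 Mill Cross syntax, versions are placeholders)
    import mill._, scalalib._

    // Each (Scala version, Spark version) pair becomes its own module instance.
    val crossMatrix = Seq(
      ("2.11.12", "2.3.0"),
      ("2.11.12", "2.4.0"),
      ("2.12.8", "2.4.0")
    )

    object daria extends Cross[DariaModule](crossMatrix: _*)

    class DariaModule(val crossScalaVersion: String, sparkVersion: String)
        extends CrossScalaModule {
      // Spark is expected to be provided at runtime, so it is a compile-only dependency.
      def compileIvyDeps = Agg(ivy"org.apache.spark::spark-sql:$sparkVersion")

      // Hypothetical layout: Spark-version-specific code lives in e.g. src-spark-2.4.0/
      def sources = T.sources(
        super.sources() ++ Seq(PathRef(millSourcePath / s"src-spark-$sparkVersion"))
      )
    }
    ```

    With a layout like this, a single combination would be addressed as something like `mill "daria[2.12.8,2.4.0].compile"`, and slightly differing implementations per Spark version could sit in the version-specific source directories.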
    Matthew Powers
    @MrPowers
    Yea, might not be worth the complexity in that case, haha. I'm all about having a spark-daria philosophy of "upgrade your Spark to the latest version if you want the latest features"
    Nik Vanderhoof
    @nvander1
    Yeah that's another option
    Matthew Powers
    @MrPowers
    @nvander1 - I need to sign off now. Can we jump on a Hangout at some point to brainstorm next steps?
    Nik Vanderhoof
    @nvander1
    Yeah, not today though. Sometime during the week will work for me.
    Matthew Powers
    @MrPowers
    Cool, sounds good.
    Nik Vanderhoof
    @nvander1
    @manuzhang I probably won't have the example of the higher order functions pushed until we get the Mill / Maven stuff sorted
    Manu Zhang
    @manuzhang
    @nvander1 Take your time. It's not a must-have since users can always leverage UDAFs as in MrPowers/spark-daria#79. Also, following @MrPowers's philosophy, we can merge in your Spark PR apache/spark#24232 for the latest Spark version if it doesn't get into the main tree.
    Manu Zhang
    @manuzhang
    I've also done some experiments with Mill in https://github.com/gearpump/gearpump-externals. My impression is that Mill makes it easier for developers to dig in and figure things out, while SBT always feels like a mystery.
    On the other hand, it has surprised me as in lihaoyi/mill#385
    Manu Zhang
    @manuzhang
    My concern is that Mill is more like a one-man project, while SBT has a larger community, a company behind it, and many plugins.
    Matthew Powers
    @MrPowers
    It’s a good thing spark-daria is such a simple project. It’s probably easy to use either build tool with spark-daria.
    Joaquín Chemile
    @jchemile
    Hello! Greetings from Buenos Aires!!
    Matthew Powers
    @MrPowers
    Welcome @jchemile :)
    Matthew Powers
    @MrPowers
    I released v0.32.0 and scoverage was causing my downstream CIs to throw this error (when spark-daria was included in other projects): scoverage/sbt-scoverage#228. Super annoying. I set coverageEnabled := false in the spark-daria build.sbt file to fix this and did a v0.32.1 release.
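    For reference, the setting described above is a single line. A minimal sketch of the relevant build.sbt fragment, assuming the sbt-scoverage plugin is already enabled in the project (the rest of the build is omitted):

    ```scala
    // build.sbt (fragment)
    // Disable scoverage so released artifacts are built without coverage instrumentation.
    coverageEnabled := false
    ```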
    Nik Vanderhoof
    @nvander1
    I don't think we can switch to mill soon. Not until it supports shading. I was taking a look down into https://github.com/shevek/jarjar and into Mill. If I find the time, I'll try to get a PR on mill for it.
    Nik Vanderhoof
    @nvander1
    @MrPowers Do you manually update the docs for spark-daria?
    Maybe we could add a hook in Travis to build the docs for each tagged commit and push them to the appropriate GitHub Pages site?
    ie
    Nik Vanderhoof
    @nvander1
    @MrPowers @manuzhang RE long name, I think daria._ is an acceptable name, only thing to be cautious of is forcing users to change existing imports, although it should just be a simple find and replace for them
    Nik Vanderhoof
    @nvander1
    mill mill.scalalib.PublishModule/publishAll --sonatypeCreds "$SONATYPE_USER:$SONATYPE_PASS" --publishArtifacts __.publishArtifacts --release false
    Matthew Powers
    @MrPowers
    mill mill.scalalib.PublishModule/publishAll --sonatypeCreds X:Y --publishArtifacts __.publishArtifacts --release false
    [218/218] mill.scalalib.PublishModule.publishAll
    1 targets failed
    mill.scalalib.PublishModule.publishAll os.SubprocessException: CommandResult 2
    os.proc.call(ProcessOps.scala:74)
    mill.scalalib.publish.SonatypePublisher.poorMansSign(SonatypePublisher.scala:146)
    mill.scalalib.publish.SonatypePublisher.$anonfun$publishAll$4(SonatypePublisher.scala:33)
    scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
    scala.collection.Iterator.foreach(Iterator.scala:941)
    scala.collection.Iterator.foreach$(Iterator.scala:941)
    scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
    scala.collection.IterableLike.foreach(IterableLike.scala:74)
    scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    scala.collection.TraversableLike.map(TraversableLike.scala:237)
    scala.collection.TraversableLike.map$(TraversableLike.scala:230)
    scala.collection.AbstractTraversable.map(Traversable.scala:108)
    mill.scalalib.publish.SonatypePublisher.$anonfun$publishAll$2(SonatypePublisher.scala:32)
    scala.collection.TraversableLike$WithFilter.$anonfun$map$2(TraversableLike.scala:742)
    scala.collection.Iterator.foreach(Iterator.scala:941)
    scala.collection.Iterator.foreach$(Iterator.scala:941)
    scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
    scala.collection.IterableLike.foreach(IterableLike.scala:74)
    scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    scala.collection.TraversableLike$WithFilter.map(TraversableLike.scala:741)
    mill.scalalib.publish.SonatypePublisher.publishAll(SonatypePublisher.scala:24)
    mill.scalalib.PublishModule$.$anonfun$publishAll$2(PublishModule.scala:117)
    scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    Manu Zhang
    @manuzhang
    @nvander1 :thumbsup: hopefully this could get into the main repo
    Matthew Powers
    @MrPowers
    @nvander1 @manuzhang - I wrote a blog post on dependency injection: https://medium.com/@mrpowers/dependency-injection-with-spark-8367b6956343 Let me know what you think!
    @nvander1 - I created a Giter8 template to easily create Spark SBT projects: https://github.com/MrPowers/spark-sbt.g8 Do you think we should make a Giter8 template project for Mill? That’d hopefully make it easier for other developers to start using Mill.
    Manu Zhang
    @manuzhang
    @MrPowers
    Considering the following code, where do we get that Spark session? Also, I'm not sure it's good style to have such a long default parameter.
    def withStateFullNameInjectDF(
      stateMappingsDF: DataFrame = spark
        .read
        .option("header", true)
        .csv(Config.get("stateMappingsPath"))
    )(df: DataFrame): DataFrame = {
      df
        .join(
          broadcast(stateMappingsDF),
          df("state") <=> stateMappingsDF("state_abbreviation"),
          "left_outer"
        )
        .drop("state_abbreviation")
    }