@nvander1 - Do you think I should try to transition spark-daria to Mill or is it not worth the effort... I'm not really interested in making the switch unless Mill is way better in certain ways. Let me know what you think.
Nik Vanderhoof
@nvander1
Well there's not currently a SparkPackageModule for mill, so that's a big disadvantage since you publish on spark-packages repo.
Matthew Powers
@MrPowers
@nvander1 - I'm fine getting away from spark-packages. spark-packages has had bugs for me for the last two years and they haven't been fixed: databricks/sbt-spark-package#31
Nik Vanderhoof
@nvander1
I think mill is easier to deal with if we start doing cross-builds for different versions of Scala + Spark. There's probably a way to do it in SBT if we tried to figure it out. But I've already done it in mill
Matthew Powers
@MrPowers
It'd be best for me to just publish directly to Maven, I think. Some users need stuff published directly on Maven Central: MrPowers/spark-fast-tests#50
@nvander1 - Have you figured out how to publish Mill projects to Maven?
Nik Vanderhoof
@nvander1
I've gotten it into the snapshot repos on sonatype, but haven't tried promoting artifacts to maven yet. I think it is as simple as changing a flag on my publish command though
Matthew Powers
@MrPowers
Yep, so sounds like Mill might be the path forward for spark-daria. Sounds exciting :)
Not sure if I'll be able to use it for my other projects that need shading though. Mill doesn't support shading, right?
You confirm that you can get projects published to Maven via Mill with no problems
We transition spark-daria to Mill on a branch and gather as many benchmarks as possible (e.g. SBT test runtime vs Mill test runtime)
If Mill is working properly, we shift spark-daria to Mill, cut out spark-packages completely, and start publishing JAR files directly to Maven
If we can't get Mill working properly for whatever reason, we still cut out spark-packages and just start publishing JAR files to Maven via SBT
@nvander1 Sweet, just joined the Mill Gitter channel
Nik Vanderhoof
@nvander1
We'll also need to figure out how to do build matrices with different spark/scala versions. To backport the higher order functions api, I think we need slightly differing implementations depending on the spark version people use
Sbt has support for cross-scala builds out of the box, but I'm trying to find out if people also build against different versions of dependencies
Hoping mill turns out to be at least as speedy, because I know how to do those types of matrices in mill
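
For reference, the sbt side of such a matrix can be sketched roughly as below. This is a minimal sketch, not spark-daria's actual build: the `spark.version` property name and all version numbers are assumptions picked for illustration.

```scala
// build.sbt (sketch; versions and the property name are assumptions)

// Scala versions to cross-build against; `sbt +test` runs the tests once per entry.
crossScalaVersions := Seq("2.11.12", "2.12.8")
scalaVersion := crossScalaVersions.value.head

// Hypothetical JVM system property that selects the Spark version,
// e.g. `sbt -Dspark.version=2.3.0 test`. Falls back to 2.4.0 when unset.
val sparkVersion = sys.props.getOrElse("spark.version", "2.4.0")

// "provided" because the Spark runtime supplies these classes on the cluster.
libraryDependencies += "org.apache.spark" %% "spark-sql" % sparkVersion % "provided"
```

Not every pair in the matrix exists (Spark 2.3.x artifacts are only published for Scala 2.11), so a CI job would have to list the valid combinations explicitly. Slightly differing implementations per Spark version would also need extra source directories wired in, which this sketch leaves out.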
Matthew Powers
@MrPowers
Yea, might not be worth the complexity in that case, haha. I'm all about having a spark-daria philosophy of "upgrade your Spark to the latest version if you want the latest features"
Nik Vanderhoof
@nvander1
Yeah that's another option
Matthew Powers
@MrPowers
@nvander1 - I need to sign off now. Can we jump on a Hangout at some point to brainstorm next steps?
Nik Vanderhoof
@nvander1
Yeah, not today though. Sometime during the week will work for me.
Matthew Powers
@MrPowers
Cool, sounds good.
Nik Vanderhoof
@nvander1
@manuzhang I probably won't have the example of the higher order functions pushed until we get the Mill / Maven stuff sorted
Manu Zhang
@manuzhang
@nvander1 Take your time. It's not a must-have since users can always leverage UDAFs as in MrPowers/spark-daria#79. Also, following @MrPowers's philosophy, we can merge in your Spark PR apache/spark#24232 for the latest Spark version if it doesn't get into the main tree.
Manu Zhang
@manuzhang
I've also done some experimenting with Mill in https://github.com/gearpump/gearpump-externals. My impression is that Mill makes it easier for developers to dig in and figure things out, while SBT always feels like a mystery.
My concern is that Mill is more of a one-man project, while SBT has a larger community, a company behind it and many plugins.
Matthew Powers
@MrPowers
It’s a good thing spark-daria is such a simple project. It’s probably easy to use either build tool with spark-daria.
Joaquín Chemile
@jchemile
Hello! Greetings from Buenos Aires!!
Matthew Powers
@MrPowers
Welcome @jchemile :)
Matthew Powers
@MrPowers
I released v0.32.0 and scoverage was causing my downstream CIs to throw this error (when spark-daria was included in other projects): scoverage/sbt-scoverage#228. Super annoying. I set coverageEnabled := false in the spark-daria build.sbt file to fix this and did a v0.32.1 release.
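
The fix described above amounts to a one-line setting. A minimal sketch, assuming the sbt-scoverage plugin is already declared in project/plugins.sbt; the rest of spark-daria's build.sbt is not shown here:

```scala
// build.sbt
// Keep coverage instrumentation off by default for normal builds and releases.
coverageEnabled := false
```

With that default, coverage can still be collected on demand by running `sbt clean coverage test coverageReport`, which switches instrumentation on for that invocation only. Whether spark-daria's CI does this is an assumption.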
Nik Vanderhoof
@nvander1
I don't think we can switch to Mill soon, not until it supports shading. I was taking a look into https://github.com/shevek/jarjar and into Mill. If I find the time, I'll try to get a PR on Mill for it.
Nik Vanderhoof
@nvander1
@MrPowers Do you manually update the docs for spark-daria?
Maybe we could add a hook in Travis to build the docs for each tagged commit and push them to the appropriate GitHub Pages site?
@MrPowers @manuzhang RE long name, I think daria._ is an acceptable name. The only thing to be cautious of is forcing users to change existing imports, although it should just be a simple find and replace for them.