Steven Fines
@sfines-clgx
so I will go with my implementation
Neville Li
@nevillelyh
no, the combiner optimization seems to work as expected, at least for me, and under the hood it's using TopCombineFn backed by a BoundedHeap, which should also be the most reasonable way
Steven Fines
@sfines-clgx
hrm.
BoundedHeap, is it public?
Neville Li
@nevillelyh
so algo-wise it shouldn't be any different from aggregateByKey with a bounded PQ
static class BoundedHeap<T, ComparatorT extends Comparator<T> & Serializable> in Top.java
you can see it by cmd-clicking a few times in IntelliJ
Steven Fines
@sfines-clgx
yeah, didn't dig that deep
cool, thanks
Neville Li
@nevillelyh
my 2nd run with aggregateByKey(Aggregator.sortedReverseTake) is actually slower; I don't think there's a significant difference in the actual algorithm between the two
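Both approaches Neville describes keep a size-bounded heap per key, which is why their performance should be similar. A minimal, dependency-free Scala sketch of that shared idea, assuming nothing beyond the standard library: it is not Beam's `BoundedHeap` or Scio's code, `BoundedMinHeap` and `topKPerKey` are hypothetical names, and the nested `Seq`s only simulate per-worker bundles that are combined locally and then merged.

```scala
import scala.collection.mutable

// Local sketch of "top K per key" as a combiner (hypothetical names,
// not Beam/Scio API). Nested Seqs stand in for per-worker bundles.
object TopKPerKey {

  // Keeps at most k elements. It is a min-heap on `ord`, so the head is the
  // smallest retained element and a new element only enters if it beats it.
  final class BoundedMinHeap[T](k: Int)(implicit ord: Ordering[T]) {
    require(k > 0, "k must be positive")
    // Scala's PriorityQueue puts the max of its Ordering at the head;
    // reversing the Ordering makes the head the smallest element instead.
    private val heap = mutable.PriorityQueue.empty[T](ord.reverse)

    // Plays the role of aggregateByKey's seqOp: fold one value in.
    def add(x: T): BoundedMinHeap[T] = {
      if (heap.size < k) heap.enqueue(x)
      else if (ord.gt(x, heap.head)) { heap.dequeue(); heap.enqueue(x) }
      this
    }

    // Plays the role of combOp: merge a partial accumulator from another worker.
    def merge(that: BoundedMinHeap[T]): BoundedMinHeap[T] = {
      that.heap.foreach(add)
      this
    }

    // Retained values, largest first.
    def result: List[T] = heap.toList.sorted(ord.reverse)
  }

  // Each inner Seq simulates one worker's bundle of (key, value) pairs.
  def topKPerKey[K, V: Ordering](shards: Seq[Seq[(K, V)]], k: Int): Map[K, List[V]] = {
    // Phase 1 (combiner lifting): build one partial heap per key per shard.
    val partials: Seq[(K, BoundedMinHeap[V])] = shards.flatMap { shard =>
      shard.groupBy(_._1).toSeq.map { case (key, kvs) =>
        key -> kvs.foldLeft(new BoundedMinHeap[V](k))((h, kv) => h.add(kv._2))
      }
    }
    // Phase 2: merge the partial heaps that share a key.
    partials.groupBy(_._1).map { case (key, hs) =>
      key -> hs.map(_._2).reduce(_ merge _).result
    }
  }
}
```

Since each accumulator holds at most k values, roughly k values per key per bundle cross the shuffle instead of every value, whichever API builds the heap.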
Steven Fines
@sfines-clgx
hrmmm
Deepak Telkar
@d66pak

Hi, I'm new to Scio and dataflow. What is the recommended way to run Scio pipelines in production?
Do you guys recommend the sbt "runMain ...." approach,
or
the bash script generated by the sbt-pack plugin?
target/pack/bin/word-count --project=...

Thanks!

Neville Li
@nevillelyh
we use sbt-pack & sbt-docker to build deployable containers, with luigi (https://github.com/spotify/luigi), & schedule them with styx (https://github.com/spotify/styx)
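For reference, the `target/pack/bin/word-count` launcher mentioned in the question is what sbt-pack generates from a launcher-name-to-main-class mapping. A minimal build.sbt sketch, assuming the sbt-pack plugin has already been added in project/plugins.sbt as its README describes; `com.example.WordCount` is a hypothetical main class, and the sbt-docker, luigi and styx parts of the setup are not shown:

```scala
// build.sbt
// Enables sbt-pack (the plugin itself must be declared in project/plugins.sbt).
enablePlugins(PackPlugin)

// Launcher script name -> main class (hypothetical class name).
// `sbt pack` then writes target/pack/bin/word-count, with the jars it needs
// under target/pack/lib.
packMain := Map("word-count" -> "com.example.WordCount")
```

After `sbt pack`, running `target/pack/bin/word-count --project=...` starts the job without sbt at runtime, which is what makes the packed output easier to ship in a container than `sbt runMain`.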
[REMINDER]: take this 1-question (y/n) survey if you care about which chatroom service we use. So far Slack is leading; the poll closes in 2 weeks.
https://www.surveymonkey.com/r/MT6LQPG
Deepak Telkar
@d66pak
thanks, I've already voted for Slack
Deepak Telkar
@d66pak
I've posted the same question in Slack now
Steven Fines
@sfines-clgx
I don't have access to the Slack
:(
Neville Li
@nevillelyh
@sfines-clgx https://slackin.spotify.com/ you should be able to get an invite here? what doesn't work?
Steven Fines
@sfines-clgx
didn't know where to go ;)
is the channel #scio?
Neville Li
@nevillelyh
yep looks like u found it :D
Neville Li
@nevillelyh
Reminder: the Slack/Gitter poll closes next Tue 5PM ET, please vote if you care: https://www.surveymonkey.com/r/MT6LQPG
Manish A.Shetty
@ManishShetty1

Hi Guys, I had a question. Is there a way to branch transformations based on IF statements?

For example, I am trying to execute different transformations based on the value of a variable. Is this possible?

Neville Li
@nevillelyh
not possible if the value is inside an SCollection. See this comparison with Spark: https://spotify.github.io/scio/Scio,-Scalding-and-Spark.html#scio-and-spark
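To make the distinction concrete: the pipeline graph is built before any data flows, so a plain if/else can only choose transforms based on values known at construction time (for example, command-line args), while a choice driven by values inside an SCollection has to be made per element inside a transform (in Scio, typically with something like `filter` on each branch). A minimal sketch using plain Scala lists in place of SCollections; it is not Scio API, and `buildStage` and `routeByValue` are hypothetical names:

```scala
// Plain Scala lists stand in for SCollections; this is not Scio API.
object BranchingSketch {
  // A "pipeline stage" here is just a function over a whole dataset.
  type Stage = List[Int] => List[Int]

  // Construction time: `mode` is an ordinary value (e.g. parsed from
  // command-line args) known before any data flows, so a plain if/else
  // can pick which stage goes into the graph.
  def buildStage(mode: String): Stage =
    if (mode == "double") ((xs: List[Int]) => xs.map(_ * 2))
    else ((xs: List[Int]) => xs.filter(_ % 2 == 0))

  // Execution time: the decision depends on each element, so it has to
  // happen inside the processing itself: split by a predicate and apply
  // a different transform to each side.
  def routeByValue(xs: List[Int]): (List[Int], List[Int]) = {
    val (big, small) = xs.partition(_ > 10)
    (big.map(_ - 10), small.map(_ + 100))
  }
}
```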
Last reminder: the Slack/Gitter survey closes tomorrow at 5PM ET. Slack is still leading; if the result stays the same we'll start migrating afterwards.
https://www.surveymonkey.com/r/MT6LQPG
Neville Li
@nevillelyh
thanks to all who took the time to respond to the survey. We have ~64% in favor of the move, so I'll update the relevant links in the next few days.
https://www.surveymonkey.com/results/SM-YGJFHJB37/
shimmy
@lilshim
Hey y'all, I'm trying to run tests on Dataflow with PubSubIO and the Pub/Sub emulator. Any chance Scio has an integration test like this? I'm having a really hard time figuring out why PubSubIO cannot connect to my emulator.
William Byrne
@wbyrnetx
Hello! I have been searching for a single example of how com.spotify.scio.values.SCollection readFilesAsString is used. Any example would be greatly appreciated
Neville Li
@nevillelyh
we've moved over to our Spotify-FOSS slack, please get invite from https://slackin.spotify.com/ and find us in #scio
William Byrne
@wbyrnetx
thanks!
kali786516
@kali786516
Hello,
I have a quick question about schema evolution. Let's say that on day 1 my GCS file has 5 columns, and I load it into BigQuery by creating a five-column table. Then on day 5 my files have 7 columns. Do I need to manually add the 2 extra columns to my BigQuery table, or is there a way in Scala/Scio to add the two extra columns to the target and finish the process?