    Wei Cheng
    @cpwc
    Ah! Thanks! Is there any reference to figure out the format of the records in gzip? Is it custom? Or a standardized format that can be read in easily?
    Ben Fradet
    @BenFradet
    however, the event recovery project lets you not care about the format
    and once the payloads are fixed, if you want to just forward stuff from s3 into kinesis, you shouldn't have to care about what they contain
    Wei Cheng
    @cpwc
    Simply put, I just need to push the contents into Kinesis?
    Ben Fradet
    @BenFradet
    yup
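    A minimal sketch of that forwarding step, assuming the payloads sit in gzipped objects in S3 (this is not the official event recovery tooling, and the bucket, key, and stream names are placeholders):

```python
import gzip

import boto3  # assumes AWS credentials are configured in the environment

s3 = boto3.client("s3")
kinesis = boto3.client("kinesis")

# Hypothetical bucket/key/stream names - substitute your own.
obj = s3.get_object(Bucket="my-recovered-events", Key="2019/part-00000.gz")
payload = gzip.decompress(obj["Body"].read())

# Assumes one record per line; the actual framing of your dump may differ.
for record in payload.splitlines():
    kinesis.put_record(
        StreamName="snowplow-raw-good",
        Data=record,
        PartitionKey="recovery",
    )
```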
    Wei Cheng
    @cpwc
    Nice! Thank you! @BenFradet hopefully I can get them working.
    Ben Fradet
    @BenFradet
    nw :+1:
    Patrick Oberherr
    @poberherr
    I have a question: If we are updating the snowplow js - do we need to clear the localstorage cache of our clients?
    Ben Fradet
    @BenFradet
    you might want to move that question to the discourse @poberherr
    Patrick Oberherr
    @poberherr

    we changed it already; I could move it there.
    @BenFradet I just read your article https://discourse.snowplowanalytics.com/t/a-new-bad-row-format/2558 -> I am not sure if it's too late, but I'd like to have one more field in
    iglu:com.snowplowanalytics.snowplow.badrows/schema_violations/jsonschema/1-0-0

    I have schemas which are fired from the front end and the back end respectively, or more generally from different emitters; it would be super useful to also have the app_id in there;
    With this I could finally have a reasonable escalation path to approach teams in my company.

    Patrick Oberherr
    @poberherr
    The general sorting I'd be looking for would be something like events per app_id > event_type [counts]
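    A rough sketch of that rollup, assuming the bad rows are dumped as JSON lines and that app_id and the event name end up in the payload field as discussed just below (the exact field paths are hypothetical until the RFC settles):

```python
import json
from collections import Counter

counts = Counter()

# bad_rows.json: one bad row per line (a hypothetical dump from S3).
with open("bad_rows.json") as f:
    for line in f:
        payload = json.loads(line).get("data", {}).get("payload", {})
        # Hypothetical field locations - adjust to the final bad row format.
        app_id = payload.get("app_id", "unknown")
        event_type = payload.get("event_name", "unknown")
        counts[(app_id, event_type)] += 1

# events per app_id > event_type [counts]
for (app_id, event_type), n in counts.most_common():
    print(f"{app_id} > {event_type}: {n}")
```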
    Ben Fradet
    @BenFradet
    hey Patrick, feel free to leave this feedback directly on the RFC as you'll reach more people
    however yes, we plan on changing what is in the payload field after the tracker protocol validation stage such that it won't be an http payload but a tracker payload
    which will contain such fields as the app id
    we just haven't had time to update the RFC yet
    Patrick Oberherr
    @poberherr
    I think this would be awesome - that is something I have been waiting 2 years for :D
    Ben Fradet
    @BenFradet
    yes
    Patrick Oberherr
    @poberherr
    awesome - thanks - will leave my comments there then, thanks a lot!
    Ben Fradet
    @BenFradet
    nw :+1:
    ofaz
    @ofaz
    Hey all, I'm attempting to process records that have been dumped to S3 via Kinesis Firehose - they have been read from the output Kinesis stream of the Snowplow Scala collector. What's the best way to go about this? We seem to be running into issues with working out the delimiter of the records / where to start reading them from (it seems the Kinesis Firehose was configured to write records without inserting its own delimiter).
    Ben Fradet
    @BenFradet
    hey @ofaz, the raw stream is Thrift-encoded
    is the issue in writing them to s3, or reading them from s3 once they've been dumped?
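    For the decoding step, a minimal sketch assuming each record is one Thrift-serialized CollectorPayload and that you have the collector-payload .thrift schema from the snowplow repo on disk (splitting the concatenated Firehose records apart is a separate problem this does not solve):

```python
import thriftpy2
from thriftpy2.protocol import TBinaryProtocolFactory
from thriftpy2.utils import deserialize

# Load the CollectorPayload struct from Snowplow's Thrift schema file.
# The path is a placeholder; thriftpy2 module names must end in "_thrift".
collector = thriftpy2.load("collector-payload.thrift",
                           module_name="collector_thrift")

def decode_record(raw_bytes):
    """Deserialize one raw collector record into a CollectorPayload."""
    return deserialize(collector.CollectorPayload(), raw_bytes,
                       TBinaryProtocolFactory())
```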
    Patrick Oberherr
    @poberherr

    :wave: Question: just curious whether you have considered this moving forward: I often use events with contexts, so the main event gives the event_name in events;
    I am just curious whether you are considering denormalizing the event / contexts sent somewhere into the event?

    Reason: Sometimes I am not sure which context was set and would like to have a programmatic way of finding that out in Redshift;
    Could also be that I am missing one piece / best practice here ...
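    One programmatic way to answer "which contexts were set", sketched under the assumption that each shredded context table in Redshift has a root_id column pointing back at atomic.events.event_id (the table names and connection details are hypothetical):

```python
import psycopg2  # assumes network access to the Redshift cluster

# Hypothetical shredded context tables - list the ones you ship.
CONTEXT_TABLES = [
    "atomic.com_acme_frontend_context_1",
    "atomic.com_acme_backend_context_1",
]

conn = psycopg2.connect(host="redshift.example.com", port=5439,
                        dbname="snowplow", user="analyst", password="...")

def contexts_for_event(event_id):
    """Return the context tables holding at least one row for this event."""
    found = []
    with conn.cursor() as cur:
        for table in CONTEXT_TABLES:
            cur.execute(f"SELECT 1 FROM {table} WHERE root_id = %s LIMIT 1",
                        (event_id,))
            if cur.fetchone():
                found.append(table)
    return found
```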

    Ben Fradet
    @BenFradet
    mmh, I think this question would be best asked on the discourse @poberherr
    Patrick Oberherr
    @poberherr
    Ok sure
    Ben Fradet
    @BenFradet
    for anything around best practices, you'll reach more people on discourse I think
    Patrick Oberherr
    @poberherr
    Ok, maybe I missed a best practice - it was more about whether this is being considered as a feature in the future;
    But I posted my question here: https://discourse.snowplowanalytics.com/t/denormalized-context-event-name-versions-in-events-table/2748
    Ben Fradet
    @BenFradet
    we plan on refactoring atomic events at some point in the future, maybe that could be part of this refactor
    Patrick Oberherr
    @poberherr
    hehe that was the idea :) - awesome
    Steffi
    @teffi
    Hi! Can anyone share the Snowplow documentation URL? It's the one with the detailed API and file structures. I can't find it again :(
    Josh
    @jbeemster
    Hey @teffi are you talking about https://docs.snowplowanalytics.com ?
    Steffi
    @teffi
    @jbeemster Sadly not that. There were a few times where I got to this page that looks very old school ✌️ and has a tree structure and definitions of the classes, properties etc.
    Liwen S
    @sunliwen
    @BenFradet Hey Ben, I PR-ed some domains to https://github.com/snowplow-referer-parser/referer-parser/ Please let me know how to get them accepted, or where I should go. This repo seems to have been quite quiet in recent months.
    Steve Coppin-Smith
    @coppin_smith_twitter
    Hi @sunliwen, we're in the process of rolling out a release for the referer-parser. I'll give the team a nudge on your PRs and provide updates!
    Liwen S
    @sunliwen
    @coppin_smith_twitter I see, thanks!
    Amir H Movahed
    @amirhmd
    hi all
    are there any notes on how to build the project?
    Felix Palludan Hargreaves
    @hejfelix
    Hey everyone, I'm trying to modify the spark-enricher to add some stateful computation, but it's unclear how to run the application without using the EMR-ETL-runner
    is there an example of supplying the arguments for the spark enricher job?
    Anton Parkhomenko
    @chuwy

    @hejfelix, @amirhmd sorry for delay guys.

    Felix, you can have a look at the CLI arguments in the EMR console. EER just submits the steps and moves data around. But please bear in mind that we're deprecating the batch pipeline along with Spark Enrich and EER

    Amir, it depends on which subproject you're interested in. There's no way to build the whole snowplow/snowplow repo, but if you go into a specific subfolder, they're almost all typical Scala projects with an SBT configuration

    Mohit Gupta
    @thedeveloperr_gitlab
    Hi, I am new to Snowplow and want to contribute. I looked at the issues tagged "good first issue", but they are old, from around 2018 - are those issues still a priority? Can anyone suggest a good first issue for me?
    Piyush Rana
    @piyushknoldus
    Hi @team, how do I submit a PR on the Snowplow GitHub? I am trying to push a branch from my GitHub account but am getting this error:
    remote: Permission to snowplow/snowplow.git denied to piyushknoldus. fatal: unable to access 'https://github.com/snowplow/snowplow.git/': The requested URL returned an error: 403
    Edoardo Zan
    @ZanonEdoardo
    Hi, I'm working with the Scala enrich library. Is there a place where I can find examples of how to use it to read events from an S3 bucket in an ETL job?
    Sean Halliburton
    @seanhall
    Hi, I also have a PR to submit for a bugfix, but I believe contributors need to be added to the project to gain permissions to push. Is signing a release also still necessary?
    Paul Boocock
    @paulboocock

    Hi all, we don't actively monitor this gitter room any more - our new primary communication channel is our Discourse: https://discourse.snowplowanalytics.com/

    As for pushing fixes, you need to fork the repository into your own account first, make the changes in your own fork of the snowplow repository, and then open a PR from your fork into snowplow/snowplow. This applies to all repositories on GitHub, no matter the original author :-)

    And for reading from S3 you're going to want to use Athena; this blog post should point you in the right direction: https://snowplowanalytics.com/blog/2019/04/04/use-glue-and-athena-with-snowplow-data/
    Ashwini Kumar Padhy
    @Akpadhy
    Hi Guys,