    Patrick Oberherr

    we changed it already; I could move it there.
    @BenFradet I just read your article https://discourse.snowplowanalytics.com/t/a-new-bad-row-format/2558 -> I am not sure if it's too late, but I'd like to have one more row in

    I have schemas which are fired from the front end and the back end respectively, or more generally from different emitters; it would be super useful for me to also have the app_id in there;
    With this I could finally have a reasonable escalation path to approach teams in my company.

    Patrick Oberherr
    The general sorting I'd be looking for would be something like events per app_id > event_type [counts]
    Ben Fradet
    hey Patrick, feel free to leave this feedback directly on the RFC as you'll reach more people
    however yes, we plan on changing what is in the payload field after the tracker protocol validation stage such that it won't be an http payload but a tracker payload
    which will contain such fields as the app id
    we just haven't had time to update the RFC yet
    Patrick Oberherr
    I think this would be awesome - that is something I have been waiting two years for :D
    Ben Fradet
    Patrick Oberherr
    awesome - thanks - I'll leave my comments there then, thanks a lot!
    Ben Fradet
    nw :+1:
    Hey all, I'm attempting to process records that have been dumped to S3 via Kinesis Firehose - they have been read from the output Kinesis stream of the Snowplow Scala collector. What's the best way to go about this? We're running into issues working out the delimiter of the records / where to start reading them from (it seems Kinesis Firehose was configured to write records without inserting its own delimiter).
    Ben Fradet
    hey @ofaz, the raw stream is thrift encoded
    is the issue in writing them to s3, or reading them from s3 once they've been dumped?
    Patrick Oberherr

    :wave: Question: Just curious whether you've considered this moving forward: I often use events with contexts, so the main event gives the event_name in events;
    I'm just curious whether you'd consider denormalizing the event / contexts sent into the event table somewhere?

    Reason: Sometimes I am not sure which context was set and would like a programmatic way of finding that out in Redshift;
    It could also be that I am missing a piece / best practice here ...

    Ben Fradet
    mmh, I think this question would be best asked on the discourse @poberherr
    Patrick Oberherr
    Ok sure
    Ben Fradet
    for anything around best practices, you'll reach more people on discourse I think
    Patrick Oberherr
    Ok, maybe I missed a best practice - it was more about whether this is being considered as a feature for the future;
    But I posted my question here: https://discourse.snowplowanalytics.com/t/denormalized-context-event-name-versions-in-events-table/2748
    Ben Fradet
    we plan on refactoring atomic events at some point in the future, maybe that could be part of this refactor
    Patrick Oberherr
    hehe that was the idea :) - awesome
    Hi! Can anyone share the link to the Snowplow documentation? It's the one with the detailed API and file structures. I can't find it again :(
    Hey @teffi, are you talking about https://docs.snowplowanalytics.com ?
    @jbeemster Sadly not that. There were a few times where I got to this page that looks very old school ✌️ and has a tree structure and definitions of the classes, properties, etc.
    Liwen S
    @BenFradet Hey Ben, I opened PRs adding some domains to https://github.com/snowplow-referer-parser/referer-parser/ Please let me know how to get them accepted, or where I should go. This repo seems quite quiet in recent months.
    Steve Coppin-Smith
    Hi @sunliwen, we're in the process of rolling out a release for the referer parser. I'll give the team a nudge on your PRs and provide updates!
    Liwen S
    @coppin_smith_twitter I see, thanks!
    Amir H Movahed
    hi all
    are there any notes on how to build the project?
    Felix Bjært Hargreaves
    Hey everyone, I'm trying to modify the spark-enricher to add some stateful computation, but it's unclear how to run the application without using the EMR-ETL-runner
    is there an example of supplying the arguments for the spark enricher job?
    Anton Parkhomenko

    @hejfelix, @amirhmd sorry for the delay, guys.

    Felix, you can have a look at the CLI arguments in the EMR console. EER just submits the steps and moves data around. But please bear in mind that we're deprecating the batch pipeline along with Spark Enrich and EER.

    Amir, it depends on which subproject you're interested in. There's no way to build the whole snowplow/snowplow repo, but if you go into a specific subfolder, they're almost all typical Scala projects with an SBT configuration.

    Mohit Gupta
    Hi, I am new to Snowplow and want to contribute. I looked at issues tagged with "good first issue", but they're old, from around 2018 - are those issues still a priority? Can anyone suggest a good first issue for me?
    Piyush Rana
    Hi @team, how do I submit a PR on the Snowplow GitHub? I am trying to push a branch from my GitHub account but am getting this error:
    remote: Permission to snowplow/snowplow.git denied to piyushknoldus. fatal: unable to access 'https://github.com/snowplow/snowplow.git/': The requested URL returned an error: 403
    Edoardo Zan
    Hi, I'm working with the Scala enrich library. Is there a place where I can find examples of how to use it to read events from an S3 bucket in an ETL job?
    Sean Halliburton
    Hi, I also have a PR to submit for a bugfix, but I believe contributors need to be added to the project to gain permissions to push. Is signing a release also still necessary?
    Paul Boocock

    Hi all, we don't actively monitor this Gitter room anymore - our new primary communication channel is our Discourse: https://discourse.snowplowanalytics.com/

    As for pushing fixes, you need to fork the repository into your own account first, make the changes in your own version of the snowplow repository, and then you can open a PR from your version of snowplow into snowplow/snowplow. This applies to all repositories on GitHub, no matter the original author :-)

    And for reading from S3 you're going to want to use Athena; this blog post should point you in the right direction: https://snowplowanalytics.com/blog/2019/04/04/use-glue-and-athena-with-snowplow-data/
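    The fork-based workflow Paul describes can be sketched end to end with plain git. The commands below simulate it with local bare repositories as stand-ins (the repository names, branch name, and committer identities are all made up for illustration; on GitHub the "fork" step is a button rather than a clone):

    ```shell
    # "upstream.git" plays the role of snowplow/snowplow; "fork.git"
    # plays the copy under your own GitHub account.
    set -e
    tmp=$(mktemp -d); cd "$tmp"

    # Seed an "upstream" repository with a single commit on main
    git init --bare -b main upstream.git
    git clone upstream.git seed
    (cd seed \
      && git checkout -B main \
      && git -c user.email=a@b -c user.name=seed commit --allow-empty -m "initial" \
      && git push origin main)

    # 1. "Fork": make a full copy of upstream under your own account
    git clone --bare upstream.git fork.git

    # 2. Clone YOUR fork, create a branch, commit, and push to the fork
    #    (you have write access there, unlike on upstream - pushing a
    #    branch straight to upstream is what produces the 403 above)
    git clone fork.git work
    (cd work \
      && git checkout -b my-bugfix \
      && git -c user.email=a@b -c user.name=me commit --allow-empty -m "bugfix" \
      && git push origin my-bugfix)

    # 3. On GitHub you would now open a PR from fork:my-bugfix into upstream:main
    ```

    The key point is that your branch only ever lands on the fork; the PR is the mechanism that proposes merging it into the original repository.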
    Ashwini Kumar Padhy
    Hi Guys,
    Souhail Hanfi
    hello guys, I have an issue with deploying enrich-kinesis and I hope someone here can help me fix it
    I deploy it inside a Kubernetes cluster
    to authenticate to AWS it uses the web identity token capability on Kubernetes
    connecting to Kinesis works fine
    but DynamoDB fails
    it tries to connect using the instance profile assigned to the EKS node
    software.amazon.kinesis.leases.exceptions.DependencyException: software.amazon.awssdk.services.dynamodb.model.DynamoDbException: User: arn:aws:sts::xxxxxxxx:assumed-role/node/i-xxxxxx is not authorized to perform: dynamodb:DescribeTable on resource: arn:aws:dynamodb:xxxxxxx:xxxxx:table/snowplow-enrich-kinesis because no identity-based policy allows the dynamodb:DescribeTable action (Service: DynamoDb, Status Code: 400, Request ID:xxxxxxxx)
    it's supposed to use the web identity token role to authenticate
    I forked the project, but since I'm like level 0 at Scala :'( it's really complicated for me to debug
    does anyone have any idea why this happens? I think one of the AWS service clients in the code is not configured correctly :'(
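    For context on the error itself: enrich-kinesis consumes via the Kinesis Client Library, which keeps its lease state in a DynamoDB table, so whichever IAM role the DynamoDB client actually resolves (here, the node's instance profile rather than the intended web identity role) must allow the standard KCL table actions. A minimal policy sketch is below; the table name is taken from the error message, and attaching this to the node role is a workaround, not a fix for the credential-chain fallback itself:

    ```json
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "dynamodb:CreateTable",
            "dynamodb:DescribeTable",
            "dynamodb:GetItem",
            "dynamodb:PutItem",
            "dynamodb:UpdateItem",
            "dynamodb:DeleteItem",
            "dynamodb:Scan"
          ],
          "Resource": "arn:aws:dynamodb:*:*:table/snowplow-enrich-kinesis"
        }
      ]
    }
    ```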