    Tomasz Gawęda
    @TomaszGaweda
    in this repro I'm working on I've removed one filter. Key is that all transforms use .setName() with some longer description. The same code but without .setName() works fine
    Can Gencer
    @cangencer
    does it work with shorter names?
    maybe some escaping error..
    or if you remove the commas?
    ah no, the fused is auto-generated
    Tomasz Gawęda
    @TomaszGaweda
    Commas, dots, etc. are ok. Shortening the names helps
    Can Gencer
    @cangencer
    I could reproduce it with a long stage name
    very strange
    Tomasz Gawęda
    @TomaszGaweda
    When the stage name is longer than 128 characters, it fails
    Tomasz Gawęda
    @TomaszGaweda
    Thanks for creating the issue, @cangencer :)
    Tomasz Gawęda
    @TomaszGaweda
    Hello! When is 4.1 planned? I have one pending PR here, but won't be able to do it till mid-next week. Hazelcast's 4.1 is planned for fall this year, but Jet may have other cadence.
    HazelcastGitter
    @HazelcastGitter
    [Emin Demirci, Hazelcast] Hi Tomasz, Jet 4.1 is planned to be released end of this month
    Tomasz Gawęda
    @TomaszGaweda
    ok, great
    Happy Easter to whole Team :)
    HazelcastGitter
    @HazelcastGitter
    [Emin Demirci, Hazelcast] Happy Easter!
    Caio Guedes
    @caioguedes
    Hey @cangencer, I am trying to figure out how to resolve the conflicting option names in #2045, but I can't find a solution that lets us use the same option on subcommands... So, we could rename the top-level options (-a for addresses and -n for cluster name) to something else and then add mixins to the subcommands... or vice versa. The cluster name option could be -cn --cluster-name, and I don't have a suggestion for -a --addresses (yep, we have a conflict in the jobList subcommand)...
    Can Gencer
    @cangencer
    @caioguedes are -n and -a used for any other command?
    Caio Guedes
    @caioguedes
    not both together, -n in submit and -a in list-jobs
    Can Gencer
    @cangencer
    we should keep backwards compatibility for job submit. For now we can have the -cn and -a for the mixins.
    Viliam Durina
    @viliam-durina
    @cangencer isn't -cn equivalent to -c -n? It has to be a single letter i think.
    Can Gencer
    @cangencer
    yes, you're right, -cn probably isn't even valid, since it means "-c" + "-n"
    Caio Guedes
    @caioguedes
    it seems that -cn is valid: POSIX short-option clustering only applies to options without a value (booleans), optionally followed by one option that takes a value...
    submit -n is an option with a value, so we could use -cn in this case
    also, -a is used for --all in list-jobs, so we need another name for addresses in this case
    Can Gencer
    @cangencer
    We can use -c for cluster name and -a for "addresses". It will break list-jobs, but that's probably OK, we can only have --all in list-jobs. Another option is to use something like -t ("targets")
    441061753
    @441061753
    Why can't IMap be resolved to a type in 4.0?
    com.hazelcast.core.IMap to com.hazelcast.map.IMap
    Can Gencer
    @cangencer
    are you using an older version? the package was moved to com.hazelcast.map.IMap
    Caio Guedes
    @caioguedes

    We can use -c for cluster name and -a for "addresses". It will break list-jobs, but that's probably OK, we can only have --all in list-jobs. Another option is to use something like -t ("targets")

    unfortunately -c is also used for --class in submit ahahah. I like -t for the addresses, and also removing -a from list-jobs. Any other suggestion for the cluster name?

    Can Gencer
    @cangencer
    we can use -k like kluster :) another option is to do something like jet -t jet@127.0.0.1:5701
    Caio Guedes
    @caioguedes
    ohh jet -t jet@127.0.0.1:5701 is really good! I will stick with that: we don't use -t anywhere, and it seems logical because both parameters only work if provided together. It's also cool with multiple addresses: jet -t jet@127.0.0.1:5701,127.0.0.1:5702 \o/
    Can Gencer
    @cangencer
    yeah, and you could default to jet if @ part is missing
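The `-t` format discussed above can be sketched in plain Java. This is only an illustration of the idea from the chat, not the parser that ended up in the Jet CLI: the class name `ClusterTarget`, its fields, and the choice to reject an empty name before `@` are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical holder for a parsed "-t" value; not part of the Jet CLI. */
final class ClusterTarget {
    /** Assumed default, following the suggestion to fall back to "jet". */
    static final String DEFAULT_CLUSTER_NAME = "jet";

    final String clusterName;
    final List<String> addresses;

    private ClusterTarget(String clusterName, List<String> addresses) {
        this.clusterName = clusterName;
        this.addresses = Collections.unmodifiableList(addresses);
    }

    /** Parses "[name@]host:port[,host:port...]". */
    static ClusterTarget parse(String value) {
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("target must not be empty");
        }
        String trimmed = value.trim();
        int at = trimmed.indexOf('@');
        // No '@' at all: use the default cluster name.
        String name = at < 0 ? DEFAULT_CLUSTER_NAME : trimmed.substring(0, at).trim();
        if (name.isEmpty()) {
            throw new IllegalArgumentException("cluster name before '@' must not be empty");
        }
        List<String> addresses = new ArrayList<>();
        // When at == -1, substring(0) is the whole string.
        for (String part : trimmed.substring(at + 1).split(",")) {
            String address = part.trim();
            if (!address.isEmpty()) {
                addresses.add(address);
            }
        }
        if (addresses.isEmpty()) {
            throw new IllegalArgumentException("at least one address is required");
        }
        return new ClusterTarget(name, addresses);
    }
}
```

With this sketch, `ClusterTarget.parse("jet@127.0.0.1:5701,127.0.0.1:5702")` yields the cluster name `jet` with two addresses, and `ClusterTarget.parse("127.0.0.1:5701")` falls back to the default name.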
    Sunil Jain
    @Sunil-Jain
    Hi @Holmistr, regarding #2085, I will move the constant to the JetConfig class, keeping the KISS and YAGNI principles in mind. Let me know if you think there is a better solution.
    Tomasz Gawęda
    @TomaszGaweda
    Hi! I've got a question for you. Our customer wants one node to be only a "front-end" node (well, it's with UI for controlling the app), rest of the nodes are nodes that hold data and run processing. Frontend node is a lite member. We want to do the same for Jet module, but there's a problem: we cannot drop files for "front-end" node, because it's a lite member; reading processor will be sent only to other nodes. Is there a way to allow sources-only to be run on lite Jet members?
    Marko Topolnik
    @mtopolnik
    that would have to be a new feature
    Can Gencer
    @cangencer
    maybe you can setup some kind of shared folder/ file system?
    or rsync
    Tomasz Gawęda
    @TomaszGaweda
    yes, that's one option we've thought about, I just wanted to know if there's some built-in feature I'm not aware of. Thank you :)
    Lucas Kinne
    @DeveloperPad

    Hey guys,

    I am facing a problem with event disorder and max lag.

    My jet job is supposed to:

    • consume data from a kafka topic (temperature values from different devices/sensors)
    • use native timestamps (timestamp of temperature measurement)
    • group by key (device/sensor from which the measurement is from)
    • do window aggregation (averaging double values with window size = 60 (1 min), window slide = 60 (1 min))
    • write window results to another kafka topic

    My problem is that the devices/sensors upload their measurements to the kafka topic (source for jet job) in parallel and only in 5-minute intervals.
    This results in event disorder. I thought that I could fix this by setting withNativeTimestamps(300_000) (5 min) to counteract the upload interval.
    No events are skipped now, but I am not getting any window result either.

    What am I doing wrong? Am I using the wrong time unit for the conversion, or is my understanding of the max lag concept wrong?

    Thanks in advance!

    Sincerely,
    Pad

    Viliam Durina
    @viliam-durina
    @DeveloperPad So you have 1-minute tumbling windows, but your measurements are every 5 minutes? That means that 4 windows have no measurement and every 5th window has 1 measurement to aggregate?
    With such sparse events you probably also hit the idle timeout, which is 60 seconds by default. You can set it by calling StreamSource.setPartitionIdleTimeout()
    Marko Topolnik
    @mtopolnik

    use native timestamps (timestamp of temperature measurement)

    These will actually be the timestamps determined by Kafka. You may want to use the timestamps from the original events for better precision.

    if they upload in 5-min intervals but aren't aligned, i suggest using the sliding window instead. You can have it emit results every minute, but aggregate over 5 or 6 minutes so that it always contains the most recent upload from each device
    you'll still have to set the event disorder to at least 5 minutes (but probably a bit more)
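The difference between the two window shapes can be checked with a few lines of plain Java. This is a simplified model that does not use the Jet API: the class `SlidingWindows`, its methods, and the sample timestamps are made up for illustration, and a window is modelled as the half-open range `[windowEnd - size, windowEnd)`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of window membership; not Jet's implementation. */
final class SlidingWindows {

    /** Returns the timestamps that fall into [windowEnd - size, windowEnd). */
    static List<Long> eventsInWindow(List<Long> timestamps, long windowEnd, long size) {
        List<Long> result = new ArrayList<>();
        for (long ts : timestamps) {
            if (ts >= windowEnd - size && ts < windowEnd) {
                result.add(ts);
            }
        }
        return result;
    }

    /**
     * Counts the windows that contain no event. Windows end at
     * firstEnd, firstEnd + slide, ... up to and including lastEnd.
     */
    static int emptyWindows(List<Long> timestamps, long size, long slide,
                            long firstEnd, long lastEnd) {
        int empty = 0;
        for (long end = firstEnd; end <= lastEnd; end += slide) {
            if (eventsInWindow(timestamps, end, size).isEmpty()) {
                empty++;
            }
        }
        return empty;
    }
}
```

For one device measuring every 5 minutes at an unaligned offset (90 s, 390 s, 690 s, ...), windows ending between 600 s and 1500 s give: with size 300 000 ms and slide 60 000 ms, `emptyWindows` returns 0; with size and slide both 60 000 ms (a tumbling window), it returns 13 out of 16. Real uploads will have some jitter, which is why a window slightly longer than the upload interval (6 minutes rather than 5) is the safer choice.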
    Can Gencer
    @cangencer
    event lag represents how much out of order you have within the same kafka topic and partition. If within the same partition, you have items out of order then you need to set the lag larger.
    how many partitions do you use in Kafka and do your messages have a key?
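As a rough mental model of the lag described above: within one partition, the watermark trails the highest timestamp seen so far by the allowed lag, and an event whose timestamp is behind the watermark counts as late. The plain-Java sketch below only illustrates that rule; the class `LagWatermark` and its methods are hypothetical and are not Jet's implementation.

```java
/** Hypothetical single-partition watermark model; not Jet's implementation. */
final class LagWatermark {
    private final long allowedLagMillis;
    private long maxTimestamp = Long.MIN_VALUE;

    LagWatermark(long allowedLagMillis) {
        this.allowedLagMillis = allowedLagMillis;
    }

    /** Current watermark: the highest timestamp seen so far minus the allowed lag. */
    long watermark() {
        return maxTimestamp == Long.MIN_VALUE
                ? Long.MIN_VALUE
                : maxTimestamp - allowedLagMillis;
    }

    /**
     * Observes one event. Returns true if it is accepted and false if it
     * is behind the current watermark, i.e. it would be treated as late.
     */
    boolean observe(long timestampMillis) {
        if (timestampMillis < watermark()) {
            return false;
        }
        maxTimestamp = Math.max(maxTimestamp, timestampMillis);
        return true;
    }
}
```

Suppose one device's batch has already pushed the highest timestamp in the partition to 600 000 ms and a second device then delivers a measurement stamped 360 000 ms. With a lag of 0 the second event is late; with a lag of 300 000 ms the watermark sits at 300 000 ms and the event is still accepted.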
    Caio Guedes
    @caioguedes
    @cangencer at https://github.com/hazelcast/hazelcast-jet/pull/2276#discussion_r431671600, could you suggest how to mark the options as deprecated? I did not see a way to do it via picocli. I am thinking of just prepending "[DEPRECATED]" to the descriptions.
    Can Gencer
    @cangencer
    it should be fine just to add a note to the description
    Lucas Kinne
    @DeveloperPad

    @viliam-durina @mtopolnik @cangencer
    We have around 100 (and rising) Raspberry Pis collecting measurements from various sensors in mostly 5 minute intervals.
    The measurements are stored in a redis database on the Pis first and uploaded in 5-minute intervals to a Spring backend.
    The backend parses the measurements (sent via HTTP Request) and sends them to Kafka.
    We thereby specify the timestamp of the measurement as the Kafka timestamp, so that timestamp is what Jet uses as the "native timestamp", isn't it?

    We have 4 kafka partitions and use a Raspberry Pi unique ID as the key, so we probably have like 25 Pis in each partition.
    Given the fact that we have multiple Pis per partition, which upload their data independently of each other, we necessarily have event disorder on partition level.

    The 1-minute tumbling window was probably an unfortunate choice in this case, I agree.
    It is just for testing though, and from my understanding I should then get at least one window result for every 5 windows.

    So tomorrow I will try to:

    • adjust the window so that I will always have at least one measurement in each window
    • increase the partition idle timeout to more than 5 minutes (This was probably the main problem here, right?)
    • maybe increase the max lag a bit more (but I think it should be fine, otherwise Jet would have printed a "skipping event" message)