    Becca P.
    @bpownow_twitter
            kvConfig:
              namespace: /placement
              environment: m3db-system/m3db-cluster-pv
              zone: embedded
            placementWatcher:
              key: m3aggregator
              initWatchTimeout: 10s
          hashType: murmur32
          bufferDurationBeforeShardCutover: 10m
          bufferDurationAfterShardCutoff: 10m
          resignTimeout: 1m
          flushTimesManager:
            kvConfig:
              environment: m3db-system/m3db-cluster-pv
              zone: embedded
            flushTimesKeyFmt: shardset/%d/flush
            flushTimesPersistRetrier:
              initialBackoff: 100ms
              backoffFactor: 2.0
              maxBackoff: 2s
              maxRetries: 3
          electionManager:
            election:
              leaderTimeout: 10s
              resignTimeout: 10s
              ttlSeconds: 10
            serviceID:
              name: m3aggregator
              environment: m3db-system/m3db-cluster-pv
              zone: embedded
            electionKeyFmt: shardset/%d/lock
            campaignRetrier:
              initialBackoff: 100ms
              backoffFactor: 2.0
              maxBackoff: 2s
              forever: true
              jitter: true
            changeRetrier:
              initialBackoff: 100ms
              backoffFactor: 2.0
              maxBackoff: 5s
              forever: true
              jitter: true
            resignRetrier:
              initialBackoff: 100ms
              backoffFactor: 2.0
              maxBackoff: 5s
              forever: true
              jitter: true
            campaignStateCheckInterval: 1s
            shardCutoffCheckOffset: 30s
            checkEvery: 1s
            jitterEnabled: true
            maxJitters:
              - flushInterval: 5s
                maxJitterPercent: 1.0
              - flushInterval: 10s
                maxJitterPercent: 0.5
              - flushInterval: 1m
                maxJitterPercent: 0.5
              - flushInterval: 10m
                maxJitterPercent: 0.5
              - flushInterval: 1h
                maxJitterPercent: 0.25
            numWorkersPerCPU: 0.5
            flushTimesPersistEvery: 10s
            maxBufferSize: 5m
            forcedFlushWindowSize: 10s
          flush:
            handlers:
              - dynamicBackend:
                  name: m3msg
                  hashType: murmur32
                  producer:
                    buffer:
                      maxBufferSize: 1000000000 # max buffer before m3msg start dropping data.
                    writer:
                      topicName: aggregated_metrics
                      topicServiceOverride:
                        zone: embedded
                        environment: m3db-system/m3db-cluster-pv
                      messageRetry:
                        initialBackoff: 1m
                        maxBackoff: 2m
                      messageQueueNewWritesScanInterval: 1s
                      ackErrorRetry:
                        initialBackoff: 2s
                        maxBackoff: 10s
                      connection:
                        dialTimeout: 5s
                        writeTimeout: 5s
                        retry:
                          initialBackoff: 1s
                          maxBackoff: 10s
                        flushInterval: 1s
                        writeBufferSize: 16384
                        readBufferSize: 256
          forwarding:
            maxSingleDelay: 5s
          entryTTL: 6h
          entryCheckInterval: 10m
          maxTimerBatchSizePerWrite: 140
          defaultStoragePolicies:
            - 10s:2d
          maxNumCachedSourceSets: 2
          discardNaNAggregatedValues: true
          entryPool:
            size: 4096
          counterElemPool:
            size: 4096
          timerElemPool:
            size: 4096
          gaugeElemPool:
            size: 4096
    (I know this is a lot but I figure it'll be more searchable for other folks - been digging through other people's links and haven't gotten too much traction)
    arnikola
    @arnikola
    And you only see data in unaggregated?
    Becca P.
    @bpownow_twitter
    yes
    arnikola
    @arnikola
    Try adding
        downsample:
          bufferPastLimits:
            - bufferPast: 5m
              resolution: 10s
            - bufferPast: 5m
              resolution: 15s
            - bufferPast: 30m
              resolution: 2m
    to your coordinator
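    (for placement: downsample is a top-level key in the m3coordinator YAML, alongside things like clusters - a minimal sketch with everything else omitted:)

    ```
    # m3coordinator.yml (sketch; other settings omitted)
    clusters:
      # ... your existing cluster/namespace config ...
    downsample:
      bufferPastLimits:
        - bufferPast: 5m
          resolution: 10s
        - bufferPast: 5m
          resolution: 15s
        - bufferPast: 30m
          resolution: 2m
    ```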
    Becca P.
    @bpownow_twitter
    i'll give this a go. also, is there a good explanation somewhere of all these configs or no?
    arnikola
    @arnikola
    lol no
    aggregator configs are pretty arcane unfortunately
    had to read through code to find anything that may be relevant haha
    Becca P.
    @bpownow_twitter

    hmmm still seeing

    ```
    {"level":"debug","ts":1576789067.5656402,"msg":"could not connect, failed health check","host":"m3coordinator:7508","error":"dial tcp 10.43.80.108:7508: connect: connection refused"}
    {"level":"error","ts":1576789068.3865516,"msg":"could not read message from consumer","error":"decoded message size 10420480 is larger than maximum supported size 4194304"}
    ```

    even though i did the following much earlier

    ```
    curl -vvvsSf -H "Cluster-Environment-Name: m3db-system/m3db-cluster-pv" -X POST http://localhost:7201/api/v1/services/m3coordinator/placement/init -d '{
      "instances": [
        {
          "id": "m3coordinator-0",
          "zone": "embedded",
          "endpoint": "m3coordinator:7507",
          "hostname": "m3coordinator",
          "port": 7507
        },
        {
          "id": "m3coordinator-1",
          "zone": "embedded",
          "endpoint": "m3coordinator:7508",
          "hostname": "m3coordinator",
          "port": 7508
        },
        {
          "id": "m3coordinator-2",
          "zone": "embedded",
          "endpoint": "m3coordinator:7509",
          "hostname": "m3coordinator",
          "port": 7509
        }
      ]
    }' | jq .
    ```

    still only seeing unaggregated in the m3query logs
    Becca P.
    @bpownow_twitter
    @arnikola i'm also seeing 2019-12-19T21:18:39.603Z ERROR write error {"remoteAddr": "10.43.100.118:42088", "httpResponseStatusCode": 400, "numRegularErrors": 0, "numBadRequestErrors": 100, "lastRegularError": "", "lastBadRequestErr": "tag name cannot be empty"} on the m3db nodes
    xmcqueen
    @xmcqueen
    i saw that "tag name cannot be empty" in one of these chat rooms just the other day
    Becca P.
    @bpownow_twitter
    yeah still don't have a grasp on why that happens
    n4mine
    @n4mine_twitter
    hi, how many series can 1 m3db node handle? millions? tens of millions?
    @n4mine_twitter for a Xeon E5-2630 * 2, 128G mem, and a 3T NVMe SSD.
    martin-mao
    @martin-mao
    @n4mine_twitter it really depends on your resolution, retention, writes per second and metrics churn. I can give you a better estimate if you provide more details, but generally with those specs I'd expect somewhere in the low millions of unique series stored on that node.
    n4mine
    @n4mine_twitter
    @martin-mao 25 million points written per sec, or 7.5k messages per sec. most series are at 10 second resolution. most series are active, with very little churn. we need to store all data for at least 3 months, and at least 8 days at the original resolution without downsampling.
    @martin-mao We handle about 800-1110 million series on one node with the open-falcon solution (rrdtool), but it eats too much memory (80G+), so I am looking for a more economical and efficient TSDB for our situation
    martin-mao
    @martin-mao
    @n4mine_twitter are you currently using OpenTSDB as the persisted storage for open-falcon or do you just send a copy to Graph/Alert component?
    n4mine
    @n4mine_twitter
    @martin-mao we just send messages to the graph and alert components, we don’t use opentsdb.
    sayf eddine hammemi
    @piratos
    Hello, I am not able to get metrics from the m3dbnodes (the coordinator is running fine on a separate server); ports 9000-9004 do not expose /metrics. Any clue? thanks
    Benjamin Raskin
    @benraskin92
    @piratos any errors in the logs? also if you could post your configs and namespaces that’d be helpful
    sayf eddine hammemi
    @piratos

    @benraskin92 I was getting

    [root@hostname~]# curl 127.0.0.1:9004/metrics
    curl: (52) Empty reply from server

    but then I restarted the m3dbnode services and the metrics are there; everything is working smoothly (although I had tried restarting them before and it didn't work). Nothing in the logs, only the normal info logs

    my config

    db:
      logging:
        level: info
      metrics:
        prometheus:
          handlerPath: /metrics
        sanitization: prometheus
        samplingRate: 1.0
        extended: detailed
      hostID:
        resolver: hostname
      listenAddress: 0.0.0.0:9000
      clusterListenAddress: 0.0.0.0:9001
      httpNodeListenAddress: 0.0.0.0:9002
      httpClusterListenAddress: 0.0.0.0:9003
      debugListenAddress: 0.0.0.0:9004
      client:
        writeConsistencyLevel: majority
        readConsistencyLevel: unstrict_majority
      gcPercentage: 100
      writeNewSeriesAsync: true
      writeNewSeriesLimitPerSecond: 1048576
      writeNewSeriesBackoffDuration: 2ms
      bootstrap:
        bootstrappers:
            - filesystem
            - commitlog
            - peers
            - uninitialized_topology
        commitlog:
          returnUnfulfilledForCorruptCommitLogFiles: false
      cache:
        series:
          policy: lru
        postingsList:
          size: 262144
      commitlog:
        flushMaxBytes: 524288
        flushEvery: 1s
        queue:
          calculationType: fixed
          size: 2097152
      fs:
        filePathPrefix: /var/lib/m3db
      config:
        service:
          env: default_env
          zone: embedded
          service: m3db
          cacheDir: /var/lib/m3kv
          etcdClusters:
            - zone: embedded
              endpoints:
                - 10.200.3.237:2379
                - 10.200.5.117:2379
                - 10.200.5.104:2379
    great work! and sorry for the noise
    Benjamin Raskin
    @benraskin92
    ah okay, well if it happens again, let us know!
    maybe it wasn't bootstrapped before or something?
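    (if it does happen again, one quick check is the health endpoint on the node's httpNodeListenAddress - 9002 in that config - which as far as I know reports a bootstrapped flag:)

    ```
    curl http://127.0.0.1:9002/health
    # expected output is JSON along the lines of {"ok":true,"status":"up","bootstrapped":true}
    ```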
    sayf eddine hammemi
    @piratos
    Nope, I was able to push data through the coordinator. The thing is, the m3dbnode conf included the coordinator section, so I removed it and restarted
    I will try to reproduce it and let you know
    Becca P.
    @bpownow_twitter
    getting panics on m3coordinators:
    {"level":"error","ts":1578361157.6484852,"msg":"error initializing namespaces values, retrying in the background","key":"/namespaces","error":"initializing value error:init watch timeout"}
    {"level":"info","ts":1578361157.6486154,"msg":"successfully created new cache dir","path":"/var/lib/m3kv","mode":493}
    {"level":"warn","ts":1578361157.6486604,"msg":"could not load cache from file","file":"/var/lib/m3kv/_placement_m3db-system_m3db-cluster-pv_m3coordinator_embedded.json","error":"error opening cache file /var/lib/m3kv/_placement_m3db-system_m3db-cluster-pv_m3coordinator_embedded.json: open /var/lib/m3kv/_placement_m3db-system_m3db-cluster-pv_m3coordinator_embedded.json: no such file or directory"}
    {"level":"info","ts":1578361157.6541867,"msg":"received kv update","version":1,"key":"m3aggregator"}
    {"level":"info","ts":1578361157.6613328,"msg":"starting m3msg server","address":"0.0.0.0:7507"}
    {"level":"info","ts":1578361157.6614075,"msg":"starting API server","address":"0.0.0.0:7201"}
    panic: runtime error: integer divide by zero
    
    goroutine 9784 [running]:
    github.com/m3db/m3/src/dbnode/sharding.NewHashFn.func1(0x1afeb40, 0xc06e93c2c0, 0xc0004c1d40)
        /go/src/github.com/m3db/m3/src/dbnode/sharding/shardset.go:164 +0x93
    github.com/m3db/m3/src/dbnode/sharding.(*shardSet).Lookup(0xc002b12340, 0x1afeb40, 0xc06e93c2c0, 0x0)
        /go/src/github.com/m3db/m3/src/dbnode/sharding/shardset.go:77 +0x3e
    github.com/m3db/m3/src/dbnode/client.(*session).writeAttemptWithRLock(0xc0011fe800, 0xed5a59902, 0x1afeb40, 0xc002934000, 0x1afeb40, 0xc06e93c2c0, 0x1b048a0, 0xc06e1f0120, 0x16f7cb45d61, 0x0, ...)
        /go/src/github.com/m3db/m3/src/dbnode/client/session.go:1072 +0x1c5
    github.com/m3db/m3/src/dbnode/client.(*session).writeAttempt(0xc0011fe800, 0x2, 0x1afeb40, 0xc002934000, 0x1afeb40, 0xc06e93c2c0, 0x1b048a0, 0xc06e1f0120, 0x2fbe4a40, 0xed5a599e8, ...)
        /go/src/github.com/m3db/m3/src/dbnode/client/session.go:997 +0x1b8
    github.com/m3db/m3/src/dbnode/client.(*writeAttempt).perform(0xc002b9e180, 0xc027a8d747, 0x10929789f59c5)
        /go/src/github.com/m3db/m3/src/dbnode/client/write_attempt.go:68 +0xdc
    github.com/m3db/m3/src/x/retry.(*retrier).attempt(0xc000127550, 0x0, 0xc002b11780, 0x160d900, 0xc002b9e180)
        /go/src/github.com/m3db/m3/src/x/retry/retry.go:113 +0x7f
    github.com/m3db/m3/src/x/retry.(*retrier).Attempt(0xc000127550, 0xc002b11780, 0xc002b9e180, 0x40c3f8)
        /go/src/github.com/m3db/m3/src/x/retry/retry.go:98 +0x3e
    github.com/m3db/m3/src/dbnode/client.(*session).WriteTagged(0xc0011fe800, 0x1afeb40, 0xc002934000, 0x1afeb40, 0xc06e93c2c0, 0x1b048a0, 0xc06e1f0120, 0x2fbe4a40, 0xed5a599e8, 0x28e4180, ...)
        /go/src/github.com/m3db/m3/src/dbnode/client/session.go:965 +0x162
    github.com/m3db/m3/src/dbnode/client.replicatedSession.replicate(0x1b14d80, 0xc0011fe800, 0x2906048, 0x0, 0x0, 0x18d1bd8, 0x0, 0x0,
    kind of started out of the blue
    Rob Skillington
    @robskillington
    hey @bpownow_twitter - sounds like you have a cluster with a placement that has zero shards as part of the cluster definition
    the database create APIs are resilient to that, but the placement create API might not be - if you were using the older placement-based APIs for placement creation
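    (for reference, the m3db placement init body can set the shard count and replication factor up front - a rough sketch with illustrative values, field names as in the placement docs:)

    ```
    curl -X POST http://localhost:7201/api/v1/services/m3db/placement/init -d '{
      "num_shards": 64,
      "replication_factor": 3,
      "instances": [
        {
          "id": "m3db-node-0",
          "isolation_group": "group1",
          "zone": "embedded",
          "weight": 100,
          "endpoint": "m3db-node-0:9000",
          "hostname": "m3db-node-0",
          "port": 9000
        }
      ]
    }'
    ```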
    Becca P.
    @bpownow_twitter
    @robskillington thanks for looking. i have {"topic":{"name":"aggregated_metrics","numberOfShards":1024,"consumerServices":[{"serviceId":{"name":"m3coordinator","environment":"m3db-system/m3db-cluster-pv","zone":"embedded"},"consumptionType":"SHARED","messageTtlNanos":"300000000000"}]},"version":2}% when i hit localhost:7201/api/v1/topic
    i used localhost:7201/api/v1/services/{m3coordinator, m3aggregator}/placement/init
    the log line with "msg":"error initializing namespaces values, retrying in the background","key":"/namespaces","error":"initializing value error:init watch timeout" seems pretty troubling.
    Becca P.
    @bpownow_twitter
    also i do specify numberOfShards in the cluster spec
    xmcqueen
    @xmcqueen
    i don't know what that topic api call is. What's that topic thing? Is there some queue in m3db somewhere I've not heard of yet?
    I'm familiar with the collector and aggregators and all that, but where does topic fit into the picture?
    Becca P.
    @bpownow_twitter
    @xmcqueen you can set the number of shards/consumer services by posting to the topic api: https://github.com/m3db/m3/blob/e701ef5d44c6b51bd11c7d28f9e92e8e295a4cba/src/msg/topic/topic.go#L72-L76
    for instance if you carve out m3coordinators from the m3db nodes into their own designated nodes, you need to POST to /api/v1/topic. see https://github.com/m3db/m3/blob/71f731854ade9bbf4358c01f07cc22aac29911dd/scripts/docker-integration-tests/aggregator/test.sh#L53-L88
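    (roughly, the topic calls in that script look like this - a sketch; the Topic-Name header, environment and shard count are illustrative, see the linked test.sh for the authoritative version:)

    ```
    # initialize the topic with a shard count
    curl -X POST -H "Topic-Name: aggregated_metrics" http://localhost:7201/api/v1/topic/init -d '{
      "numberOfShards": 64
    }'

    # add m3coordinator as a consumer service on that topic
    curl -X POST -H "Topic-Name: aggregated_metrics" http://localhost:7201/api/v1/topic -d '{
      "consumerService": {
        "serviceId": {
          "name": "m3coordinator",
          "environment": "m3db-system/m3db-cluster-pv",
          "zone": "embedded"
        },
        "consumptionType": "SHARED",
        "messageTtlNanos": "300000000000"
      }
    }'
    ```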
    xmcqueen
    @xmcqueen
    cool. that's good to know. thanks
    YY Wan
    @yywandb

    Hi! I have a question about the placement configuration. How is the endpoint here configured?

    {
      "placement": {
        "instances": {
          "{\"name\":\"m3db-rep0-0\"}": {
            "id": "{\"name\":\"m3db-rep0-0\"}",
            "isolationGroup": "group1",
            "zone": "embedded",
            "weight": 100,
            "endpoint": "m3db-rep0-0.m3dbnode-m3db:9000",
            "shards": [
              {
                "id": 1,
                "state": "AVAILABLE",
                "sourceId": "",
                "cutoverNanos": "0",
                "cutoffNanos": "0"

    I’m deploying m3db on kubernetes using the operator, and I’m trying to have the coordinator and the storage nodes in different k8s namespaces.
    It seems like the coordinator works (i.e. is able to query the storage nodes) when it is in the same namespace as my storage nodes, but with exactly the same configuration it does not work when it is in a different k8s namespace.
    I think it should work if I change the endpoint to be m3db-rep0-0.m3dbnode-m3db.<STORAGE_NODE_NAMESPACE>:9000, since pinging that from inside the coordinator node works while the other doesn’t.

    ❯ k exec m3coord-write-deployment-698f8bc445-p76vc sh -ti
    / # ping m3db-rep0-0.m3dbnode-m3db
    ping: bad address 'm3db-rep0-0.m3dbnode-m3db'
    / # ping m3db-rep0-0.m3dbnode-m3db.m3
    PING m3db-rep0-0.m3dbnode-m3db.m3 (10.2.24.21): 56 data bytes
    64 bytes from 10.2.24.21: seq=0 ttl=60 time=0.444 ms
    64 bytes from 10.2.24.21: seq=1 ttl=60 time=0.427 ms
    ^C
    --- m3db-rep0-0.m3dbnode-m3db.m3 ping statistics ---
    2 packets transmitted, 2 packets received, 0% packet loss
    round-trip min/avg/max = 0.427/0.435/0.444 ms

    Is there a way to change the placement config so that it appends the k8s namespace? I followed this (https://operator.m3db.io/configuration/configuring_m3db/) about configuring the env config value to be $NAMESPACE/$NAME. This is what the last bit of my m3db config looks like (m3 is the namespace of the m3db nodes and m3db is the m3dbcluster name). I’m not really sure if this is the relevant config, though, or whether I’m missing somewhere else where I need to set the k8s namespace.

    config:
      service:
        env: "m3/m3db"
        zone: embedded
        service: m3db
        cacheDir: /var/lib/m3kv
        etcdClusters:
          - zone: embedded
            endpoints:
              - "http://etcd-0.etcd:2379"
              - "http://etcd-1.etcd:2379"
              - "http://etcd-2.etcd:2379"
    dshowing
    @dshowing
    If I connect m3db to an external etcd cluster, can I configure it via basic auth or HTTPS authentication?
    darraghjones
    @darraghjones

    I'm struggling to understand how m3db handles data aggregation. For one thing, it's unclear to me what the difference is between a namespace with

    type: unaggregated

    , and one with

    type: aggregated
    downsample:
      all: false

    and how do these correspond to the rule below for the graphite/carbon ingester, where aggregation is once again disabled?

    carbon:
      ingester:
        listenAddress: "0.0.0.0:7204"
        rules:
          - pattern: .*
            aggregation:
              enabled: false
    martin-mao
    @martin-mao
    @darraghjones M3DB itself doesn't really aggregate data; you create namespaces there to store data that has been aggregated at different resolutions. By default, all data is written raw into an unaggregated namespace in M3DB. You can choose to aggregate using the coordinator (or m3aggregator), which will generate aggregated metrics on the ingest path and then write them to the different namespaces in M3DB. Hopefully that makes more sense.
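    (for illustration, a coordinator clusters block with both kinds of namespaces side by side - namespace names and retentions here are made up:)

    ```
    clusters:
      - namespaces:
          # raw writes land here by default
          - namespace: default
            type: unaggregated
            retention: 48h
          # receives aggregated metrics from the coordinator/m3aggregator
          - namespace: metrics_10s_30d
            type: aggregated
            retention: 720h
            resolution: 10s
            downsample:
              all: false  # not auto-downsampled; only receives explicitly aggregated writes
        # client/etcd settings for the cluster omitted from this sketch
    ```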
    martin-mao
    @martin-mao
    @dshowing Yes HTTPS and TLS certs are supported. See this doc for external cluster connectivity: http://m3db.github.io/m3/operational_guide/etcd/
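    (the per-cluster TLS settings in that guide look roughly like this - field names from memory, so double-check them against the doc:)

    ```
    etcdClusters:
      - zone: embedded
        endpoints:
          - https://etcd-0.example.com:2379
        tls:
          crtPath: /path/to/client.crt
          keyPath: /path/to/client.key
          caCrtPath: /path/to/ca.crt
    ```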