    Jim Dickinson
    @dmsolow I don't think that could be the issue you're hitting unless you're using code straight off master
    more info would help me reason about it - how many racks?
    @jimdickinson I think my issue would be better rephrased as a question: "how do I add a volume/volumeMount to the cassandra container?"
    Jim Dickinson
    aha - you can't until you're running code from master / release 1.5.0 (coming out within a few days)
    we originally opted not to let users reconfigure the cassandra container because of the potential to break things, but we reversed that position when we saw the variety of customizations that folks wanted to do
    gotcha -- I'll wait until upgrading before I try anything
    Geoff Bourne

    so if the JVM is 18 GB and you have 64 GB, the rest of the default calculations would give you 23 GB max direct memory

    Sorry @jimdickinson I missed your follow up. I'm curious how you did the math to arrive at 23 GB?

    Geoff Bourne
    @Smita8081 in case it helps, I pushed a sanitized version of the composite helm chart I use to wrap the cass-operator helm chart. Some of the customization is already possible and other bits I see in your snippet probably aren't applicable with cass-operator usage:
    Geoff Bourne
    Was able to confirm that with disk_access_mode changed to standard, accounting for the table cache limits (about 2.2GB in my case) and table off heap usage, setting a container resource memory limit of 24Gi was honored by the cassandra process and topped out just below that limit.
    Jim Dickinson
    very nice @itzg
    23 = (64-18)/2
    there is a bash script that does this math
    MaxDirectMemory = (AllMemory - JVM Heap)/2
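    A minimal sketch of that calculation with the 64 GB / 18 GB figures from above (variable names are illustrative, not taken from the actual script):

    ```shell
    # Default max-direct-memory heuristic: half of what's left after the JVM heap.
    ALL_MEMORY_GB=64
    JVM_HEAP_GB=18
    MAX_DIRECT_GB=$(( (ALL_MEMORY_GB - JVM_HEAP_GB) / 2 ))
    echo "MaxDirectMemory: ${MAX_DIRECT_GB} GB"
    ```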
    Hello - I am having some issues getting cass-operator up and running on my cluster. I was able to create the datacenter, but it will not create any pods.
    I have already confirmed that the nodepool has ample resources to spin up new pods
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: server-storage
    provisioner: kubernetes.io/gce-pd
    parameters:
      type: pd-ssd
      replication-type: none
    volumeBindingMode: WaitForFirstConsumer
    reclaimPolicy: Delete

    apiVersion: cassandra.datastax.com/v1beta1
    kind: CassandraDatacenter
    metadata:
      name: teamwork-dc1
    spec:
      clusterName: cluster1
      size: 3
      racks:
        - name: rack1
          zone: us-central1-a
      resources:
        requests:
          memory: 4Gi
          cpu: 2000m
        limits:
          memory: 4Gi
          cpu: 2000m
      storageConfig:
        cassandraDataVolumeClaimSpec:
          storageClassName: server-storage
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 8Gi
      allowMultipleNodesPerWorker: false
      stopped: false
      rollingRestartRequested: false
      canaryUpgrade: false
      serverType: "dse"
      serverVersion: "6.8.0"
      managementApiAuth:
        insecure: {}
      serviceAccount: "default"
      replaceNodes: []
      config:
        dse-yaml:
          authorization_options:
            enabled: true
          authentication_options:
            enabled: true
            default_scheme: internal
        10-write-prom-conf:
          enabled: true
          port: 9103
          staleness-delta: 300
        # (one more "enabled: true" entry followed here; its parent key was lost in the paste)
        jvm-server-options:
          initial_heap_size: "200M"
          max_heap_size: "1G"
          additional-jvm-opts:
            - "-Ddse.system_distributed_replication_dc_names=teamwork-dc1"
            - "-Ddse.system_distributed_replication_per_dc=3"
    Is there something that I am missing here?
    @AG-Guardian is the cass-operator pod in the same namespace as the cass DC?
    I couldn't get it to work with them in different namespaces
    Smita Srivastava
    can someone help me with connecting an application using cass-operator? what endpoints should the application use to connect to cass-operator's cassandra?
    Tomer Eliyahu
    Hi guys, I am trying to understand the failure domain support in cass-operator - I know I can specify multiple racks using the operator, which will create different pods that are matched to nodes via labels.
    However, it doesn't really provide any fault tolerance for remote storage - say we have a storage provider which supports different fault domains, the operator needs to sync the configuration to the storage backend somehow. Is this supported? Or is it planned?
    Jim Dickinson
    @AG-Guardian the default way things are installed, the operator only watches things in its own namespace
    @tomereli can you be much more specific? if we supported node affinity labels per rack would that be useful? that's planned and there's a PR up for it
    @Smita8081 there's a <clustername>-<dcname>-service ClusterIP service created by the operator
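    As a quick illustration of the naming convention Jim describes (the cluster and DC names here are hypothetical, substitute your own):

    ```shell
    # The operator creates a ClusterIP service named <clusterName>-<dcName>-service.
    CLUSTER_NAME=cluster1
    DC_NAME=dc1
    SERVICE="${CLUSTER_NAME}-${DC_NAME}-service"
    echo "$SERVICE"
    # An application in the same Kubernetes cluster would use that service's DNS name
    # (e.g. <service>.<namespace>.svc.cluster.local) as its contact point on CQL port 9042.
    ```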
    Tomer Eliyahu
    @jimdickinson Consider a cassandra cluster with DAS (direct-attached storage) running on bare metal (not Kubernetes), as it was intended. Having different nodes on different racks creates separate failure domains - it makes sense to replicate data across nodes on different racks, so that if one rack loses connection/power, the others may not, and the data stays available.
    In kubernetes, this still works with cass-operator but only with DAS (hostpath or local volumes), obviously.
    When you are using any other storage class backed by remote storage, it doesn't make sense to use racks anymore. Actually, the whole benefit of replicating data across nodes becomes questionable - how can we know where the data is stored when using gcs / ebs / any other SP (storage provider)?
    I am working on a disaggregated storage system which supports separation into different failure domains. I brought up k8ssandra with our CSI plugin and it works, but I came to realize replication means nothing now that the data is not stored on the cassandra node itself.
    What I am missing is some way to propagate the racks information (for starters) from the cass-operator to the CSI plugin (through the PVC / storageclass params) and ensure the racks separation is kept in the storage backend.
    I hope it is clearer now :pray:
    Hey guys. I have a DSE DC up and running in K8s, and it seems to be working fine. I am trying to import data from my old Cassandra DB using dsbulk, however, I am having some issues. One of the tables is fairly large (23GB) and has text data upwards of 10K characters in some columns. My import process keeps getting stuck and crashing on this table. I can see from the export process, 2 CSV files were created to hold the data for the table.
    Any ideas on what I could try out? Can I somehow cut those files down into smaller chunks? 2 files for 23GB of data seems like it could be part of the issue, though I really have no idea. At some point I just start seeing client failed to connect errors.
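    One low-tech way to cut the exported CSVs into smaller pieces is the standard `split` tool; a sketch with a toy file (file names are made up, and note that a naive line-based split drops the header row from every chunk but the first, and will also break quoted fields that contain embedded newlines):

    ```shell
    # Create a toy CSV standing in for one of the large export files.
    printf 'id,body\n1,aaa\n2,bbb\n3,ccc\n4,ddd\n' > export.csv
    # Split into chunks of at most 3 lines each, named part_aa, part_ab, ...
    split -l 3 export.csv part_
    ls part_*
    ```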
    com.datastax.oss.dsbulk.executor.api.exception.BulkExecutionException: Statement execution failed (All 1 node(s) tried for the query failed (showing first 1 nodes, use getAllErrors() for more):
    at java.lang.Thread.run(Thread.java:748) [16 skipped]
    Suppressed: com.datastax.oss.driver.api.core.connection.ClosedConnectionException: Unexpected error on channel
    Caused by: java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
    at java.lang.Thread.run(Thread.java:748) [11 skipped]
    Here is the error message I see on the dsbulk pod. I do not see any errors in system.log on the cassandra pod.
    Jim Dickinson
    hi @AG-Guardian - could you post a CassandraDatacenter yaml?
    @jimdickinson I was able to figure out the issue. I needed to add "-XX:MaxDirectMemorySize=8G" to additional JVM options. It was autodetecting 22GB for some reason when the pod only had 8GB max. After that things worked perfectly.
    Jim Dickinson
    yes, I think we'll be making that more prominent in the docs
    Smita Srivastava

    Here's the traceback of the error I am getting while trying to integrate cass-operator with the application
    Traceback (most recent call last):
    File "/usr/bin/contrail-api", line 9, in <module>
    load_entry_point('contrail-api-server==0.1dev', 'console_scripts', 'contrail-api')()
    File "/usr/lib/python2.7/site-packages/vnc_cfg_api_server/api_server.py", line 5112, in server_main
    main(args_str, VncApiServer(args_str))
    File "/usr/lib/python2.7/site-packages/vnc_cfg_api_server/api_server.py", line 2059, in init
    File "/usr/lib/python2.7/site-packages/vnc_cfg_api_server/api_server.py", line 3363, in _db_connect
    File "/usr/lib/python2.7/site-packages/vnc_cfg_api_server/vnc_db.py", line 971, in init
    self._zk_db.master_election("/api-server-election", db_client_init)
    File "/usr/lib/python2.7/site-packages/vnc_cfg_api_server/vnc_db.py", line 522, in master_election
    func, args)
    File "/usr/lib/python2.7/site-packages/cfgm_common/zkclient.py", line 522, in master_election
    args, kwargs)
    File "/usr/lib/python2.7/site-packages/kazoo/recipe/election.py", line 54, in run
    File "/usr/lib/python2.7/site-packages/vnc_cfg_api_server/vnc_db.py", line 969, in db_client_init
    ssl_enabled=cassandra_use_ssl, ca_certs=cassandra_ca_certs)
    File "/usr/lib/python2.7/site-packages/vnc_cfg_api_server/vnc_db.py", line 102, in init
    File "/usr/lib/python2.7/site-packages/cfgm_common/vnc_cassandra.py", line 155, in init
    File "/usr/lib/python2.7/site-packages/cfgm_common/vnc_cassandra.py", line 573, in _cassandra_init
    self.existing_keyspaces = self.sys_mgr.list_keyspaces()
    File "/usr/lib/python2.7/site-packages/pycassa/system_manager.py", line 121, in list_keyspaces
    return [ks.name for ks in self._conn.describe_keyspaces()]
    File "/usr/lib/python2.7/site-packages/pycassa/cassandra/Cassandra.py", line 1209, in describe_keyspaces
    return self.recv_describe_keyspaces()
    File "/usr/lib/python2.7/site-packages/pycassa/cassandra/Cassandra.py", line 1219, in recv_describe_keyspaces
    (fname, mtype, rseqid) = self._iprot.readMessageBegin()
    File "/usr/lib64/python2.7/site-packages/thrift/protocol/TBinaryProtocol.py", line 126, in readMessageBegin
    sz = self.readI32()
    File "/usr/lib64/python2.7/site-packages/thrift/protocol/TBinaryProtocol.py", line 206, in readI32
    buff = self.trans.readAll(4)
    File "/usr/lib64/python2.7/site-packages/thrift/transport/TTransport.py", line 58, in readAll
    chunk = self.read(sz - have)
    File "/usr/lib64/python2.7/site-packages/thrift/transport/TTransport.py", line 271, in read
    File "/usr/lib64/python2.7/site-packages/thrift/transport/TTransport.py", line 275, in readFrame
    buff = self.__trans.readAll(4)
    File "/usr/lib64/python2.7/site-packages/thrift/transport/TTransport.py", line 58, in readAll
    chunk = self.read(sz - have)
    File "/usr/lib64/python2.7/site-packages/thrift/transport/TSocket.py", line 103, in read
    buff = self.handle.recv(sz)
    File "/usr/lib64/python2.7/site-packages/gevent/_socket2.py", line 280, in recv
    File "/usr/lib64/python2.7/site-packages/gevent/_socket2.py", line 179, in _wait
    File "/usr/lib64/python2.7/site-packages/gevent/hub.py", line 630, in wait
    result = waiter.get()
    File "/usr/lib64/python2.7/site-packages/gevent/hub.py", line 878, in get
    return self.hub.switch()
    File "/usr/lib64/python2.7/site-packages/gevent/hub.py", line 609, in switch
    return greenlet.switch(self)
    timeout: timed out

    Need help here identifying whether the application is even able to initialize a connection with cass-operator's cassandra, or what's the cause of the error.

    Smita Srivastava
    also how do I pass rpc_address and broadcast_address in the config file? I mean, how can we set these parameters to the pod IP in cassandra.yaml in the deployment file?
    Jim Dickinson
    @Smita8081 you shouldn't need to change those settings - what are you trying to do?


    I have recently been working on backups for our K8S DSE deployment; it looks like the creation of the backup store and the backup configuration need to be passed in as a CQL script. Does anybody have any examples of how to pass in a script from outside of the container?

    Jim Dickinson
    @amarbarot the backup and restore feature in the DB is used with plain CQL statements, you can run those however you like
    @jimdickinson I've just added a node to my cluster, and it started/joined okay, but it doesn't appear to be bootstrapping (no data added to disk)
    is there a guide for adding nodes? am I expected to do anything proactive besides editing the cassdc?
    I see the following message in the logs:
    INFO [main] 2021-01-28 21:14:36,388 StorageService.java:933 - This node will not auto bootstrap because it is configured to be a seed node.
    guessing that's the issue -- should I run nodetool rebuild?
    Jim Dickinson
    @dmsolow what version of the operator are you running?
    sorry for the quite laggy reply :(
    @jimdickinson no worries -- I've figured the issue out, it was a problem on my end (I had modified the seed service manually and forgot to switch it back)
    Cannot connect to Cassandra from JetBrains DataGrip. The cassandra servers are up and running fine.
    Shardendu Gautam
    Is it possible in cass-operator to downscale the size of the data center? We have scaled up for a use case and now we want to get back to our previous set of nodes.
    Rodrigo Mageste
    Hey @jimdickinson! My Cassandra cluster is in an unbalanced state. Is there anything I can do to balance it?
    Datacenter: dc1
    ===============
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address  Load      Tokens  Owns (effective)  Host ID                               Rack
    UN           1.44 TiB  8       47.9%             deeca093-cbd6-4860-882c-4142e857a7ea  rack1
    UN           1.53 TiB  8       50.9%             bb17749e-6437-4621-aefc-dd8a3227a2a1  rack1
    UN           1.66 TiB  8       55.0%             96d8572d-0e10-40e4-bb85-2104dc724a0f  rack2
    UN           1.51 TiB  8       49.8%             a4aa723d-cbb4-4032-980a-5bbc9a4c5692  rack3
    UN           1.02 TiB  8       34.0%             b855d4a7-258e-49d7-b775-d30200001743  rack2
    UN           1.88 TiB  8       62.4%             7345dbb3-0b51-41ed-b041-263d3bcaf445  rack3