    Jim Dickinson
    @jimdickinson
    @dmsolow I don't think that could be the issue you're hitting unless you're using code straight off master
    more info would help me reason about it - how many racks?
    dmsolow
    @dmsolow
    @jimdickinson I think my issue would be better rephrased as a question: "how do I add a volume/volumeMount to the cassandra container?"
    Jim Dickinson
    @jimdickinson
    aha - you can't until you're using code in master / release 1.5.0 (coming out within a few days)
    we originally opted for not letting users reconfigure the cassandra container because of the potential to break things, but we flipped on that position when we saw the variety of different customizations that folks wanted to do
    dmsolow
    @dmsolow
    gotcha -- I'll wait until upgrading before I try anything
    Geoff Bourne
    @itzg

    so if the JVM is 18 GB and you have 64 GB, the rest of the default calculations would give you 23 GB max direct memory

    Sorry @jimdickinson I missed your follow up. I'm curious how you did the math to arrive at 23 GB?

    Geoff Bourne
    @itzg
    @Smita8081 in case it helps, I pushed a sanitized version of the composite helm chart I use to wrap the cass-operator helm chart. Some of the customization is already possible and other bits I see in your snippet probably aren't applicable with cass-operator usage:
    https://github.com/itzg/try-helm-cass-operator
    Geoff Bourne
    @itzg
    Was able to confirm that with disk_access_mode changed to standard, accounting for the table cache limits (about 2.2GB in my case) and table off heap usage, setting a container resource memory limit of 24Gi was honored by the cassandra process and topped out just below that limit.
    Jim Dickinson
    @jimdickinson
    very nice @itzg
    !!!
    23 = (64-18)/2
    there is a bash script that does this math
    MaxDirectMemory = (AllMemory - JVM Heap)/2
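A minimal sketch of that formula in shell (illustrative only, not the actual script shipped in the operator image; variable names are made up):

```shell
#!/bin/sh
# MaxDirectMemory = (AllMemory - JVM Heap) / 2, using the numbers from the thread.
all_memory_gb=64
jvm_heap_gb=18
max_direct_gb=$(( (all_memory_gb - jvm_heap_gb) / 2 ))
echo "$max_direct_gb"   # 23
```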
    AG-Guardian
    @AG-Guardian
    Hello - I am having some issues getting cass-operator up and running on my cluster. I was able to create the datacenter, but it will not create any pods.
    I have already confirmed that the nodepool has ample resources to spin up new pods
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: server-storage
    provisioner: kubernetes.io/gce-pd
    parameters:
      type: pd-ssd
      replication-type: none
    volumeBindingMode: WaitForFirstConsumer
    reclaimPolicy: Delete
    ---
    apiVersion: cassandra.datastax.com/v1beta1
    kind: CassandraDatacenter
    metadata:
      name: teamwork-dc1
    
    spec:
      clusterName: cluster1
    
      size: 3
    
      racks:
        - name: rack1
          zone: us-central1-a
    
      resources:
        requests:
          memory: 4Gi
          cpu: 2000m
        limits:
          memory: 4Gi
          cpu: 2000m
    
      storageConfig:
        cassandraDataVolumeClaimSpec:
          storageClassName: server-storage
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 8Gi
    
      allowMultipleNodesPerWorker: false
    
      stopped: false
    
      rollingRestartRequested: false
    
      canaryUpgrade: false
    
      serverType: "dse"
    
      serverVersion: "6.8.0"
    
      managementApiAuth:
        insecure: {}
    
      serviceAccount: "default"
    
      replaceNodes: []
    
      config:
        dse-yaml:
          authorization_options:
            enabled: true
          authentication_options:
            enabled: true
            default_scheme: internal
    
        10-write-prom-conf:
          enabled: true
          port: 9103
          staleness-delta: 300
    
        cassandra-yaml:
          backup_service:
            enabled: true
    
        jvm-server-options:
          initial_heap_size: "200M"
          max_heap_size: "1G"
    
          additional-jvm-opts:
            - "-Ddse.system_distributed_replication_dc_names=teamwork-dc1"
            - "-Ddse.system_distributed_replication_per_dc=3"
    AG-Guardian
    @AG-Guardian
    Is there something that I am missing here?
    dmsolow
    @dmsolow
    @AG-Guardian is the cass-operator pod in the same namespace as the cass DC?
    I couldn't get it to work with them in different namespaces
    Smita Srivastava
    @Smita8081
    can someone help me with how to connect an application using cass-operator? what endpoints should the application use to connect to cass-operator's cassandra?
    Tomer Eliyahu
    @tomereli
    Hi guys, I am trying to understand the failure domain support in cass-operator - I know I can specify multiple racks using the operator, which will create different pods that are matched to nodes via labels.
    However, it doesn't really provide any fault tolerance for remote storage - say we have a storage provider which supports different fault domains, the operator needs to sync the configuration to the storage backend somehow. Is this supported? Or is it planned?
    Jim Dickinson
    @jimdickinson
    @AG-Guardian the default way things are installed, the operator only watches things in its own namespace
    @tomereli can you be much more specific? if we supported node affinity labels per rack would that be useful? that's planned and there's a PR up for it
    @Smita8081 there's a <clustername>-<dcname>-service ClusterIP service created by the operator
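A sketch of building the in-cluster contact point from that naming pattern (the cluster name, DC name, and namespace below are examples, not values from this thread):

```shell
#!/bin/sh
# Pattern from the message above: <clustername>-<dcname>-service
# (illustrative values; substitute your own cluster/DC/namespace)
cluster_name="cluster1"
dc_name="dc1"
namespace="cass-operator"
svc="${cluster_name}-${dc_name}-service"
# Standard Kubernetes service DNS plus the default CQL port:
echo "${svc}.${namespace}.svc.cluster.local:9042"
# cluster1-dc1-service.cass-operator.svc.cluster.local:9042
```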
    Tomer Eliyahu
    @tomereli
    @jimdickinson Consider a cassandra cluster with DAS (direct-attached storage) running on bare metal (not Kubernetes), as it was intended. Having different nodes on different racks creates separate failure domains - it makes sense to replicate data across nodes on different racks, so if one rack loses connection/power, the others may not, and the data stays available.
    In kubernetes, this still works with cass-operator but only with DAS (hostpath or local volumes), obviously.
    When you are using any other storage class which uses remote storage, it doesn't make sense to use racks anymore. Actually, the whole benefit of data replication across nodes becomes questionable - how can we know where the data is stored when using gcs / ebs / any other SP (storage provider)?
    I am working on a disaggregated storage system which supports separation into different failure domains. I brought up k8ssandra with our CSI plugin and it works, but I came to realize replication means nothing now that the data is not stored on the cassandra node itself.
    What I am missing is some way to propagate the racks information (for starters) from the cass-operator to the CSI plugin (through the PVC / storageclass params) and ensure the racks separation is kept in the storage backend.
    I hope it is more clear now :pray:
    AG-Guardian
    @AG-Guardian
    Hey guys. I have a DSE DC up and running in K8s, and it seems to be working fine. I am trying to import data from my old Cassandra DB using dsbulk, however, I am having some issues. One of the tables is fairly large (23GB) and has text data upwards of 10K characters in some columns. My import process keeps getting stuck and crashing on this table. I can see from the export process, 2 CSV files were created to hold the data for the table.
    Any ideas on what I could try out? Can I somehow cut those files down into smaller chunks? 2 files for 23GB of data seems like it could be part of the issue, though I really have no idea. At some point I just start seeing client failed to connect errors.
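One generic way to cut a large headered CSV into smaller chunks before loading (not dsbulk-specific; file names and the chunk size are illustrative, and this assumes the export has a single header row):

```shell
#!/bin/sh
# Keep the header, split the body into ~1M-row pieces,
# then re-attach the header to each piece.
head -n 1 big_table.csv > header.csv
tail -n +2 big_table.csv | split -l 1000000 - body_
for f in body_*; do
  cat header.csv "$f" > "chunk_${f#body_}.csv"
  rm "$f"
done
```

Each resulting `chunk_*.csv` can then be loaded independently, which also makes it easier to retry just the chunk that fails.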
    AG-Guardian
    @AG-Guardian
    com.datastax.oss.dsbulk.executor.api.exception.BulkExecutionException: Statement execution failed (All 1 node(s) tried for the query failed (showing first 1 nodes, use getAllErrors() for more):
    ...
    com.datastax.oss.driver.internal.core.channel.InFlightHandler.exceptionCaught(InFlightHandler.java:297)
    at java.lang.Thread.run(Thread.java:748) [16 skipped]
    Suppressed: com.datastax.oss.driver.api.core.connection.ClosedConnectionException: Unexpected error on channel
    Caused by: java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
    at java.lang.Thread.run(Thread.java:748) [11 skipped]
    Here is the error message I see on the dsbulk pod. I do not see any errors in system.log on the cassandra pod.
    Jim Dickinson
    @jimdickinson
    hi @AG-Guardian - could you post a CassandraDatacenter yaml?
    AG-Guardian
    @AG-Guardian
    @jimdickinson I was able to figure out the issue. I needed to add "-XX:MaxDirectMemorySize=8G" to additional JVM options. It was autodetecting 22GB for some reason when the pod only had 8GB max. After that things worked perfectly.
    Jim Dickinson
    @jimdickinson
    yes, I think we'll be making that more prominent in the docs
    Smita Srivastava
    @Smita8081

    Here's traceback of error that I am getting while trying to integrate cass-operator with application
    Traceback (most recent call last):
    File "/usr/bin/contrail-api", line 9, in <module>
    load_entry_point('contrail-api-server==0.1dev', 'console_scripts', 'contrail-api')()
    File "/usr/lib/python2.7/site-packages/vnc_cfg_api_server/api_server.py", line 5112, in server_main
    main(args_str, VncApiServer(args_str))
    File "/usr/lib/python2.7/site-packages/vnc_cfg_api_server/api_server.py", line 2059, in init
    self._db_connect(self._args.reset_config)
    File "/usr/lib/python2.7/site-packages/vnc_cfg_api_server/api_server.py", line 3363, in _db_connect
    cassandra_ca_certs=self._args.cassandra_ca_certs)
    File "/usr/lib/python2.7/site-packages/vnc_cfg_api_server/vnc_db.py", line 971, in init
    self._zk_db.master_election("/api-server-election", db_client_init)
    File "/usr/lib/python2.7/site-packages/vnc_cfg_api_server/vnc_db.py", line 522, in master_election
    func, args)
    File "/usr/lib/python2.7/site-packages/cfgm_common/zkclient.py", line 522, in master_election
    self._election.run(func, *args, **kwargs)
    File "/usr/lib/python2.7/site-packages/kazoo/recipe/election.py", line 54, in run
    func(*args, **kwargs)
    File "/usr/lib/python2.7/site-packages/vnc_cfg_api_server/vnc_db.py", line 969, in db_client_init
    ssl_enabled=cassandra_use_ssl, ca_certs=cassandra_ca_certs)
    File "/usr/lib/python2.7/site-packages/vnc_cfg_api_server/vnc_db.py", line 102, in init
    ca_certs=ca_certs)
    File "/usr/lib/python2.7/site-packages/cfgm_common/vnc_cassandra.py", line 155, in init
    self._cassandra_init(server_list)
    File "/usr/lib/python2.7/site-packages/cfgm_common/vnc_cassandra.py", line 573, in _cassandra_init
    self.existing_keyspaces = self.sys_mgr.list_keyspaces()
    File "/usr/lib/python2.7/site-packages/pycassa/system_manager.py", line 121, in list_keyspaces
    return [ks.name for ks in self._conn.describe_keyspaces()]
    File "/usr/lib/python2.7/site-packages/pycassa/cassandra/Cassandra.py", line 1209, in describe_keyspaces
    return self.recv_describe_keyspaces()
    File "/usr/lib/python2.7/site-packages/pycassa/cassandra/Cassandra.py", line 1219, in recv_describe_keyspaces
    (fname, mtype, rseqid) = self._iprot.readMessageBegin()
    File "/usr/lib64/python2.7/site-packages/thrift/protocol/TBinaryProtocol.py", line 126, in readMessageBegin
    sz = self.readI32()
    File "/usr/lib64/python2.7/site-packages/thrift/protocol/TBinaryProtocol.py", line 206, in readI32
    buff = self.trans.readAll(4)
    File "/usr/lib64/python2.7/site-packages/thrift/transport/TTransport.py", line 58, in readAll
    chunk = self.read(sz - have)
    File "/usr/lib64/python2.7/site-packages/thrift/transport/TTransport.py", line 271, in read
    self.readFrame()
    File "/usr/lib64/python2.7/site-packages/thrift/transport/TTransport.py", line 275, in readFrame
    buff = self.__trans.readAll(4)
    File "/usr/lib64/python2.7/site-packages/thrift/transport/TTransport.py", line 58, in readAll
    chunk = self.read(sz - have)
    File "/usr/lib64/python2.7/site-packages/thrift/transport/TSocket.py", line 103, in read
    buff = self.handle.recv(sz)
    File "/usr/lib64/python2.7/site-packages/gevent/_socket2.py", line 280, in recv
    self._wait(self._read_event)
    File "/usr/lib64/python2.7/site-packages/gevent/_socket2.py", line 179, in _wait
    self.hub.wait(watcher)
    File "/usr/lib64/python2.7/site-packages/gevent/hub.py", line 630, in wait
    result = waiter.get()
    File "/usr/lib64/python2.7/site-packages/gevent/hub.py", line 878, in get
    return self.hub.switch()
    File "/usr/lib64/python2.7/site-packages/gevent/hub.py", line 609, in switch
    return greenlet.switch(self)
    timeout: timed out

    Need help here to identify if the application is even able to initialize a connection with cass-operator's cassandra, or what's the cause of the error.

    Smita Srivastava
    @Smita8081
    also how do I pass rpc_address and broadcast_address in the config file? I mean how can we set these parameters to the pod IP in the cassandra.yaml config in the deployment file?
    Jim Dickinson
    @jimdickinson
    @Smita8081 you shouldn't need to change those settings - what are you trying to do?
    amarbarot
    @amarbarot

    Hi

    I have recently been working on backups for our K8S DSE deployment; it looks like the creation of the backup store and the backup configuration need to be passed in as a CQL script. Does anybody have any examples of how to pass in a script from outside of the container?

    Jim Dickinson
    @jimdickinson
    @amarbarot the backup and restore feature in the DB is used with plain CQL statements, you can run those however you like
    dmsolow
    @dmsolow
    @jimdickinson I've just added a node to my cluster, and it started/joined okay, but it doesn't appear to be bootstrapping (no data added to disk)
    is there a guide for adding nodes? am I expected to do anything proactive besides editing the cassdc?
    dmsolow
    @dmsolow
    I see the following message in the logs:
    INFO [main] 2021-01-28 21:14:36,388 StorageService.java:933 - This node will not auto bootstrap because it is configured to be a seed node.
    guessing that's the issue -- should I run nodetool rebuild?
    Jim Dickinson
    @jimdickinson
    @dmsolow what version of the operator are you running?
    sorry for the quite laggy reply :(
    dmsolow
    @dmsolow
    @jimdickinson no worries -- I've figured the issue out, it was a problem on my end (I had modified the seed service manually and forgot to switch it back)
    Debjit
    @debjitk
    Cannot connect to Cassandra from JetBrains DataGrip. The cassandra servers are up and running fine.
    Shardendu Gautam
    @ShardenduGautam_twitter
    Is it possible in cass-operator to downscale the size of the data center? We have scaled up for a use case and now we want to get back to our previous set of nodes.
    Rodrigo Mageste
    @rodrigomageste
    Hey @jimdickinson! My Cassandra cluster is in an unbalanced state. Is there anything I can do to balance it?
    Datacenter: dc1
    ===============
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address        Load       Tokens       Owns (effective)  Host ID                               Rack
    UN  10.42.248.136  1.44 TiB   8            47.9%             deeca093-cbd6-4860-882c-4142e857a7ea  rack1
    UN  10.42.14.78    1.53 TiB   8            50.9%             bb17749e-6437-4621-aefc-dd8a3227a2a1  rack1
    UN  10.42.29.31    1.66 TiB   8            55.0%             96d8572d-0e10-40e4-bb85-2104dc724a0f  rack2
    UN  10.42.14.79    1.51 TiB   8            49.8%             a4aa723d-cbb4-4032-980a-5bbc9a4c5692  rack3
    UN  10.42.248.140  1.02 TiB   8            34.0%             b855d4a7-258e-49d7-b775-d30200001743  rack2
    UN  10.42.29.20    1.88 TiB   8            62.4%             7345dbb3-0b51-41ed-b041-263d3bcaf445  rack3