Unable to specify Cassandra BroadcastAddress #2825

Open
tobiasbhn opened this issue Oct 27, 2023 · 4 comments

@tobiasbhn

Currently it is not possible to configure the broadcast address for Cassandra. If it is, I couldn't find anything in the documentation.

We are currently using Stargate in a Docker environment, so some servers in the cluster run both a Cassandra node and a Stargate node. Standalone Cassandra also runs in Docker and is configured so that the listen_address gets the Docker internal IP when the Docker container starts. The broadcast address is then set to the public IP of the server. The Cassandra cluster boots up as expected and the nodes are able to communicate with each other.

As soon as the Stargate Docker Container is additionally started, the following error occurs:

cassandra  | WARN  [Messaging-EventLoop-3-3] 2023-10-27 15:04:20,016 NoSpamLogger.java:108 - /192.168.0.238:7000->/192.168.80.3:7000-URGENT_MESSAGES-[no-channel] dropping message of type ECHO_REQ whose timeout expired before reaching the network
cassandra  | INFO  [Messaging-EventLoop-3-3] 2023-10-27 15:04:21,961 NoSpamLogger.java:105 - /192.168.0.238:7000->/192.168.80.3:7000-URGENT_MESSAGES-[no-channel] failed to connect
cassandra  | io.netty.channel.ConnectTimeoutException: connection timed out: /192.168.80.3:7000
cassandra  |    at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$2.run(AbstractEpollChannel.java:576)
cassandra  |    at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
cassandra  |    at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)
cassandra  |    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
cassandra  |    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
cassandra  |    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
cassandra  |    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
cassandra  |    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
cassandra  |    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
cassandra  |    at java.base/java.lang.Thread.run(Thread.java:829)

nodetool gossipinfo displays the following information for the cluster:

root@cassandra:/# nodetool gossipinfo
/192.168.0.18
  generation:1698412990
  heartbeat:1224
  STATUS:77:NORMAL,-1023730530279313827
  LOAD:1211:118057.0
  SCHEMA:71:54e17321-3f2e-37ca-9b08-d91ba7bdd369
  DC:12:datacenter02
  RACK:14:rack02
  RELEASE_VERSION:5:4.1.3
  INTERNAL_IP:10:192.168.80.2
  RPC_ADDRESS:4:192.168.80.2
  NET_VERSION:1:12
  HOST_ID:2:c7ba2faa-43d0-456c-8731-b21be587361a
  RPC_READY:87:true
  INTERNAL_ADDRESS_AND_PORT:8:192.168.80.2:7000
  NATIVE_ADDRESS_AND_PORT:3:192.168.80.2:9042
  STATUS_WITH_PORT:76:NORMAL,-1023730530279313827
  SSTABLE_VERSIONS:6:big-nb
  TOKENS:75:<hidden>
/192.168.80.3
  generation:1698414128
  heartbeat:40
  LOAD:21:82867.0
  SCHEMA:27:0df99921-330b-3c46-b749-b7e4a3b8c119
  DC:12:rack02
  RACK:14:datacenter02
  RELEASE_VERSION:5:4.0.10
  NET_VERSION:1:12
  HOST_ID:2:1da3459c-df6a-4858-8ded-a9ba70d81552
  INTERNAL_ADDRESS_AND_PORT:8:192.168.80.3:7000
  NATIVE_ADDRESS_AND_PORT:3:192.168.80.3:9042
  SSTABLE_VERSIONS:6:big-nb
  X9:33:stargate
  TOKENS: not present
/192.168.0.238
  generation:1698413418
  heartbeat:783
  LOAD:779:81617.0
  SCHEMA:22:54e17321-3f2e-37ca-9b08-d91ba7bdd369
  DC:12:datacenter01
  RACK:14:rack01
  RELEASE_VERSION:5:4.1.3
  NET_VERSION:1:12
  HOST_ID:2:ba705ea7-401c-4729-9db8-ca35d5882e39
  RPC_READY:102:true
  INTERNAL_ADDRESS_AND_PORT:8:192.168.128.2:7000
  NATIVE_ADDRESS_AND_PORT:3:192.168.128.2:9042
  STATUS_WITH_PORT:88:NORMAL,-1024700059177903998
  SSTABLE_VERSIONS:6:big-nb
  TOKENS:87:<hidden>

It can be seen that the standalone Cassandra nodes (192.168.0.18 and 192.168.0.238) are recognized correctly, using the internal IP as the listen address and the public IP as the broadcast address, but the Stargate node (192.168.80.3) is publishing its internal IP. As a result, all other nodes get a timeout when they try to reach this internal IP.
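
As a rough illustration of that reading of the gossip output, the check below compares each gossip endpoint with the host part of its INTERNAL_ADDRESS_AND_PORT value. This is only a sketch: the class and method names are hypothetical, the three address pairs are copied by hand from the output above, and it assumes IPv4 addresses in `host:port` form.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Cassandra or Stargate.
final class GossipAddressCheck {

    // True when the endpoint gossiped to the cluster is the same as the node's
    // internal address, i.e. no separate broadcast address is in effect.
    static boolean broadcastsInternalAddress(String endpoint, String internalAddressAndPort) {
        int colon = internalAddressAndPort.lastIndexOf(':');
        String internalHost = internalAddressAndPort.substring(0, colon);
        return endpoint.equals(internalHost);
    }

    public static void main(String[] args) {
        // endpoint -> INTERNAL_ADDRESS_AND_PORT, taken from the gossipinfo output
        Map<String, String> nodes = new LinkedHashMap<>();
        nodes.put("192.168.0.18", "192.168.80.2:7000");   // Cassandra node
        nodes.put("192.168.80.3", "192.168.80.3:7000");   // Stargate node
        nodes.put("192.168.0.238", "192.168.128.2:7000"); // Cassandra node

        nodes.forEach((endpoint, internal) ->
            System.out.println(endpoint + " broadcasts internal address: "
                + broadcastsInternalAddress(endpoint, internal)));
    }
}
```

Only the Stargate entry (192.168.80.3) reports `true`, which matches the connection timeouts in the log.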

@tobiasbhn
Author

The broadcast_address setting is set in the following line, but the variable defaults to the listen_address because the system property stargate.broadcast_address is not set:

String broadcastAddress = System.getProperty("stargate.broadcast_address", listenAddress);
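
The fallback in that line can be reproduced in isolation. The sketch below is an assumption-laden stand-in, not Stargate code: it uses a plain `java.util.Properties` object instead of the JVM's system properties so that both cases can be shown side by side, and the class name and sample addresses are hypothetical (the addresses are taken from the gossip output above).

```java
import java.util.Properties;

// Hypothetical stand-in for the quoted lookup; not Stargate code.
final class BroadcastAddressDefault {

    // Same shape as System.getProperty(key, default): return the property
    // when it is present, otherwise fall back to the listen address.
    static String resolve(Properties props, String listenAddress) {
        return props.getProperty("stargate.broadcast_address", listenAddress);
    }

    public static void main(String[] args) {
        String listenAddress = "192.168.80.3"; // Docker-internal IP of the Stargate container

        // Property not set: the listen address is used as the broadcast address.
        Properties unset = new Properties();
        System.out.println(resolve(unset, listenAddress)); // prints 192.168.80.3

        // Property set: the configured value wins.
        Properties set = new Properties();
        set.setProperty("stargate.broadcast_address", "192.168.0.18"); // public IP of the server
        System.out.println(resolve(set, listenAddress)); // prints 192.168.0.18
    }
}
```

In a real JVM the property would have to arrive as a system property, for example through a `-Dstargate.broadcast_address=...` flag. Whether the coordinator Docker image offers a way to pass such a flag is exactly what remains unclear here.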

@jeffreyscarpenter
Collaborator

Thank you for raising the issue. I wanted to check and see if you had looked at the docker compose configurations we provide in the repo. For example this configuration for running Stargate with Cassandra 4: https://github.com/stargate/stargate/blob/main/docker-compose/cassandra-4.0/docker-compose.yml. In this configuration with everything on the same Docker network there is no need to configure listen or broadcast address. If you have a similar deployment perhaps there is no need to configure these properties?

In the code that you cite, Stargate does default the broadcast address to the listen address if no broadcast address is provided. This is the expected behavior and emulates what Cassandra itself does. Are you saying that you are specifying the stargate.broadcast_address property and it is not being used? If you are still having issues, it might be good to have a look at your configuration. If you don't want to put code here you can also reach out on our discord server (invite available at stargate.io).

@tobiasbhn
Author

tobiasbhn commented Nov 1, 2023

@jeffreyscarpenter Thank you for the rapid feedback. I looked at the Docker Compose settings on the website, but as you said, they are only for a cluster on the same Docker network. None of the examples (at least as far as I have found) demonstrate the use case across a cluster of multiple physically separated servers, which is the actual use case for us.

I agree that the automatic fallback to the listen address is consistent with Cassandra's behavior. However, I don't know how to set the broadcast address, because I haven't found a corresponding environment variable.

Our current setting looks like this:

services:
  #
  # CASSANDRA NODE
  #
  cassandra:
    image: bitnami/cassandra:4.1.3
    container_name: cassandra
    hostname: cassandra
    user: root
    # 7000 for inter cluster communication of cassandra
    # 9042 for native cql requests. as this node should not get native access (stargate handles this) this should not be opened
    ports: [7000:7000,9042:9042]
    profiles: [CAS,CAS+API,CAS+KAF,CAS+KAF+API]
    volumes:
      # data volume
      - ./Data/Cassandra:/bitnami/cassandra
    environment:
      # the name of the cluster
      CASSANDRA_CLUSTER_NAME: ${CASSANDRA_CLUSTER_NAME}
      # the address other nodes should use to contact this node
      CASSANDRA_BROADCAST_ADDRESS: ${CASSANDRA_BROADCAST_ADDRESS}
      # the transport port number for inter-node communication
      CASSANDRA_TRANSPORT_PORT_NUMBER: 7000
      # where to fetch the data from. leave empty if this is the first node
      CASSANDRA_SEEDS: ${CASSANDRA_SEEDS}
      # credentials
      CASSANDRA_USER: ${CASSANDRA_USERNAME}
      CASSANDRA_PASSWORD: ${CASSANDRA_PASSWORD}
      CASSANDRA_PASSWORD_SEEDER: ${CASSANDRA_PASSWORD_SEEDER}
      # specify a production ready snitch
      CASSANDRA_ENDPOINT_SNITCH: GossipingPropertyFileSnitch
      # datacenter and rack specification
      CASSANDRA_CFG_RACKDC_DC: ${CASSANDRA_DC}
      CASSANDRA_CFG_RACKDC_RACK: ${CASSANDRA_RACK}
      # use the local ip for communication inside the same datacenter
      CASSANDRA_CFG_RACKDC_PREFER_LOCAL: true
      # performance limitations
      MAX_HEAP_SIZE: 2G
      HEAP_NEWSIZE: 200M
    healthcheck:
      # check if the cassandra shell is available
      test: cqlsh -u cassandra -p cassandra -e 'describe keyspaces'
      # after what time should the first healthcheck be run
      start_interval: 30s
      # which time to tolerate failed responses
      start_period: 5m
      # check again after
      interval: 10s
      # timeout after
      timeout: 8s
      # the number of retries before fail
      retries: 3
    # restart on failed container
    restart: always

  #
  # CASSANDRA API
  #
  stargate-coordinator:
    image: stargateio/coordinator-4_0:v2.1
    container_name: stargate-coordinator
    hostname: stargate-coordinator
    # 8080 is graphQL interface
    # 8081 is auth service
    # 8082 is rest interface for crud
    # 8084 is health check
    # 8180 is document api
    # 8090 is gRPC interface
    # 7001 is cassandra native transport
    # 9043 is cassandra cql service
    ports: [8081:8081,8084:8084,8090:8090,7001:7001]
    profiles: [CAS+API,CAS+KAF+API]
    environment:
      # the host (listen_address) will be set to the internal docker ip
      # CASSANDRA_HOST: 
      # the cassandra cluster name
      CLUSTER_NAME: ${CASSANDRA_CLUSTER_NAME}
      # use local cassandra node
      SEED: ${CASSANDRA_BROADCAST_ADDRESS}
      # the port for cassandra native transport
      SEED_PORT: 7001
      # the port for the cql service
      CQL_PORT: 9043
      # datacenter and rack, as this node
      DATACENTER_NAME: ${CASSANDRA_RACK}
      RACK_NAME: ${CASSANDRA_DC}
      # the major cassandra version
      CLUSTER_VERSION: 4.0
      ENABLE_AUTH: true
    depends_on:
      cassandra:
        condition: service_healthy
        restart: true
    healthcheck:
      # check if coordinator is available
      test: curl -f http://localhost:8084/checker/readiness
      # after what time should the first healthcheck be run
      start_interval: 30s
      # which time to tolerate failed responses
      start_period: 5m
      # check again after
      interval: 10s
      # timeout after
      timeout: 8s
      # the number of retries before fail
      retries: 3
    # restart on failed container
    restart: always

In addition, this is an example .env file. Of course, these settings change from server to server.

# settings for cassandra
# which ip or address does this server listen to
CASSANDRA_BROADCAST_ADDRESS=192.168.0.18
CASSANDRA_CLUSTER_NAME=test-cassandra-cluster
# datacenter and rack specification
CASSANDRA_RACK=rack02
CASSANDRA_DC=datacenter02
# credentials
CASSANDRA_USERNAME=cassandra
CASSANDRA_PASSWORD=cassandra
# if this is the first cassandra node (new cluster) set this to yes, otherwise no
CASSANDRA_PASSWORD_SEEDER=no
# where should the data be fetched from. comma separated list of other ACTIVE nodes
# if this is only the own ip, this node will automatically start a new cluster
CASSANDRA_SEEDS=192.168.0.238

I tried to resolve this issue in the Stargate Discord beforehand, but sadly there was no response.

@tobiasbhn
Author

@tatu-at-datastax @jeffreyscarpenter Is there any new information here?

@sync-by-unito sync-by-unito bot closed this as completed Feb 16, 2024
@sync-by-unito sync-by-unito bot reopened this Feb 16, 2024