How to set up Cassandra on Debian 9.4 and Ubuntu 16.04

Created by Jordy Leffers at 06-12-2017 11:05:35 +0100

Apache Cassandra is a database management system focused on high scalability and availability. It has proven to be fault-tolerant on both commodity hardware and cloud infrastructure. Cassandra's ability to spread its clusters over multiple datacenters makes it especially attractive, as it keeps regional outages from taking your system down with them. In this guide we'll go over how to set up your Linux server(s) to run the Cassandra database.

The installation is the same on every node in the system. So if you want to build a cluster, I advise opening multiple terminals and running the installation commands on all of them. If you just want to use one node, that's fine too.

Before we start, we need a Linux installation. We'll skip that step in this tutorial, since you can easily get a default Linux installation on one of your containers on the www.cloudcontainers.net website. This tutorial is based on the cloud containers created on the my.cloudcontainers.net page. This means that you are the root user by default, so none of the commands below use sudo. If, however, you are not the root user on your system, you'll have to add "sudo" in front of the commands in the guide below.

 

The first thing we're going to do is update our current packages using the following command:

apt update && apt upgrade -y

 

Requirements

To run Cassandra we're going to need the latest version of Java 8. We've got another tutorial for that, but it only takes one command to install (on Debian 9 and Ubuntu 16.04, the default JDK package is OpenJDK 8):

apt install default-jdk-headless -y

You can verify the installation with the java -version command.
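Cassandra 3.11 runs on Java 8 only, so it can be worth confirming that java -version really reports version 8 (newer distributions ship a newer default JDK). The sketch below is a hypothetical helper, not part of Cassandra or Debian: java_major_version is a name made up for this example, and it only parses the version string that java -version prints on stderr.

```shell
# Hypothetical helper: print the Java major version found in the text
# that `java -version` writes (e.g. 'openjdk version "1.8.0_171"').
java_major_version() {
    # Pull out the first quoted version string, e.g. 1.8.0_171 or 11.0.2
    v=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    case "$v" in
        1.*) printf '%s\n' "$v" | cut -d. -f2 ;;  # old scheme: 1.8.0_171 -> 8
        *)   printf '%s\n' "$v" | cut -d. -f1 ;;  # new scheme: 11.0.2 -> 11
    esac
}

# java -version prints to stderr, so redirect it into the substitution
if command -v java >/dev/null 2>&1; then
    major=$(java_major_version "$(java -version 2>&1)")
    if [ "$major" = "8" ]; then
        echo "Java 8 found"
    else
        echo "Found Java $major, but Cassandra 3.11 needs Java 8" >&2
    fi
fi
```

The check only warns; it does not change anything on the system.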

 

Next, you'll need the latest version of Python 2.7, which Cassandra's CQL shell (cqlsh) uses:

apt install python -y

You can verify the installation with the python --version command.
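If you want to script this check, a minimal sketch could look like the following. The function name is_python27 is hypothetical; it only inspects the text printed by python --version, which Python 2 writes to stderr.

```shell
# Hypothetical helper: succeed only if the text printed by
# `python --version` reports a 2.7.x interpreter.
is_python27() {
    case "$1" in
        "Python 2.7."*) return 0 ;;
        *)              return 1 ;;
    esac
}

# Python 2 prints its version on stderr, so redirect it
if command -v python >/dev/null 2>&1; then
    if is_python27 "$(python --version 2>&1)"; then
        echo "Python 2.7 found"
    else
        echo "python is not Python 2.7; cqlsh in Cassandra 3.11 expects 2.7" >&2
    fi
fi
```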

 

Installation from Apache repository

Now we add the latest 3.11 Apache Cassandra repository to our sources.list:

echo "deb http://www.apache.org/dist/cassandra/debian 311x main" | tee -a /etc/apt/sources.list.d/cassandra.sources.list
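Because tee -a appends, running that command twice adds a duplicate entry, which makes apt warn that the source is configured multiple times. A minimal idempotent alternative is sketched below; add_line_once is a hypothetical helper, not a Debian tool.

```shell
# Hypothetical helper: append LINE to FILE only if FILE does not already
# contain exactly that line, so re-running the step adds no duplicate entry.
add_line_once() {
    file=$1
    line=$2
    # -q: quiet, -x: match the whole line, -F: treat LINE as a fixed string
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Usage (as root), equivalent to the echo | tee -a command above:
# add_line_once /etc/apt/sources.list.d/cassandra.sources.list \
#     "deb http://www.apache.org/dist/cassandra/debian 311x main"
```

If the file does not exist yet, the helper creates it with the single line.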

 

Download the repository's key and add it to your system's accepted keys:

wget https://www.apache.org/dist/cassandra/KEYS -O cassandra_key && apt-key add cassandra_key

 

Update your repositories and install Cassandra:

apt update && apt install cassandra -y

 

After installation, Cassandra will automatically be running. If you want to change its configuration, you'll need to stop it first. You can start it again later, or check its status, with the following commands:

service cassandra stop
service cassandra start
service cassandra status
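Cassandra needs some time to come up after a start, and during that window nodetool typically fails to connect. If you script the restart, a small retry loop avoids racing it. The sketch below assumes a 5-second polling interval; wait_until is a hypothetical helper, not a Cassandra command.

```shell
# Hypothetical helper: run a command every 5 seconds until it succeeds,
# giving up (exit status 1) once TIMEOUT seconds have passed.
wait_until() {
    timeout=$1
    shift
    elapsed=0
    until "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 5
        elapsed=$((elapsed + 5))
    done
    return 0
}
```

For example: service cassandra start && wait_until 120 nodetool status && echo "Cassandra is up". The 120-second limit is an arbitrary choice; slower machines may need more.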

 

Congratulations, you've now set up a single-node Cassandra cluster!

 

 

(Optional) How to setup a multi-node cluster

A Cassandra multi-node cluster is useful when, for example, you want to keep your system from being vulnerable to regional outages (or worse) at datacenters. It allows you to spread your system over not just multiple servers, but multiple datacenters. This makes it very suitable for applications that can't afford to lose any data, ever.

 

In this tutorial, I'll be using the Nano text editor, but you can use whichever one you want. If you don't have it installed already, you can install it with this command:

apt install nano


 

You can find Cassandra's configuration files in the /etc/cassandra directory. The file we'll use to set up the cluster is called cassandra.yaml. It contains a lot of comments, but these are the settings that need editing:

  1. cluster_name: The name of your cluster. Pick a name that's appropriate for your project; it must be identical on every node.

  2. authenticator: By default, everyone is allowed to log in without any password (AllowAllAuthenticator). Set this to "org.apache.cassandra.auth.PasswordAuthenticator" to require a username and password.

  3. authorizer: By default, every user is allowed to do anything on every keyspace and table (AllowAllAuthorizer). Set this to "org.apache.cassandra.auth.CassandraAuthorizer" so that permissions are actually enforced.

  4. seeds (under seed_provider): A comma-separated list of the IP addresses of the cluster's seed nodes, which nodes contact to learn about the rest of the cluster. In this guide we simply list every node, including the current one.

  5. listen_address: The IP address other nodes in the cluster will use to connect to this one. By default it's set to localhost; change it to this node's actual IP address.

  6. endpoint_snitch: This tells Cassandra what our network looks like. By default it's set to SimpleSnitch. For our cluster, we'll set it to GossipingPropertyFileSnitch.

  7. auto_bootstrap: This directive does not exist in the cassandra.yaml file yet. We'll add it at the bottom of the file and set it to false.

 

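Because Cassandra starts running right after installation, it has already stored system data for the default cluster (named 'Test Cluster'). We need to stop it and remove its system data directory; otherwise the node will refuse to start after we change cluster_name, because the saved cluster name no longer matches the configured one.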

service cassandra stop && rm -rf /var/lib/cassandra/data/system/*

 

Open and edit the configuration file:

nano /etc/cassandra/cassandra.yaml

 

When you're done, your settings should look like this (current_node_IP, second_node_IP and third_node_IP stand for your nodes' actual IP addresses, and "# ..." marks the parts of the file left unchanged):

# ...
cluster_name: 'My_First_Cluster'
# ...
authenticator: org.apache.cassandra.auth.PasswordAuthenticator
# ...
authorizer: org.apache.cassandra.auth.CassandraAuthorizer
# ...
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "current_node_IP,second_node_IP,third_node_IP"
# ...
listen_address: current_node_IP
# ...
endpoint_snitch: GossipingPropertyFileSnitch
# ...
auto_bootstrap: false

Save and exit the file, then repeat this step on every node in the cluster, setting listen_address to that node's own IP address.
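If you're configuring several nodes, you may prefer to apply these edits non-interactively instead of in Nano. The sketch below is a hypothetical helper (configure_node is not part of Cassandra) that uses GNU sed to write the same settings shown above. It assumes the default 3.11 cassandra.yaml layout, where each of these settings starts at the beginning of its line, and values that contain no /, & or ' characters. Check the file afterwards.

```shell
# Hypothetical helper: apply this guide's settings to a cassandra.yaml with GNU sed.
# Arguments: FILE CLUSTER_NAME NODE_IP SEEDS (SEEDS is comma-separated).
configure_node() {
    file=$1 cluster=$2 ip=$3 seeds=$4
    sed -i \
        -e "s/^cluster_name:.*/cluster_name: '$cluster'/" \
        -e "s/^authenticator:.*/authenticator: org.apache.cassandra.auth.PasswordAuthenticator/" \
        -e "s/^authorizer:.*/authorizer: org.apache.cassandra.auth.CassandraAuthorizer/" \
        -e "s/\(- seeds:\).*/\1 \"$seeds\"/" \
        -e "s/^listen_address:.*/listen_address: $ip/" \
        -e "s/^endpoint_snitch:.*/endpoint_snitch: GossipingPropertyFileSnitch/" \
        "$file"
    # auto_bootstrap is not in the default file, so append it only once
    grep -q '^auto_bootstrap:' "$file" || printf 'auto_bootstrap: false\n' >> "$file"
}
```

For example, on a node with the (example) address 10.0.0.1, back up the file with cp /etc/cassandra/cassandra.yaml /etc/cassandra/cassandra.yaml.bak and then run configure_node /etc/cassandra/cassandra.yaml My_First_Cluster 10.0.0.1 "10.0.0.1,10.0.0.2,10.0.0.3". The seeds substitution keeps the line's original indentation, so the YAML nesting under seed_provider stays intact.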

 

When you're done configuring each node, we can start them back up again.

service cassandra start

 

We can check if our cluster was configured correctly by checking the status of our nodes:

nodetool status

If everything works correctly, the output should list every node in the cluster with the status UN (Up/Normal).
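To check this from a script, you can count the UN lines in the nodetool status output. The sketch below is a hypothetical helper (count_up_normal is not a nodetool option); it relies only on node lines starting with their two-letter status/state code.

```shell
# Hypothetical helper: read `nodetool status` output on stdin and print how many
# nodes are Up/Normal; node lines start with their two-letter code, e.g. UN or DN.
count_up_normal() {
    awk '$1 == "UN" { n++ } END { print n + 0 }'
}

# Usage: compare the result with the number of nodes you configured
# nodetool status | count_up_normal
```

With the three-node example configuration above, a healthy cluster should report 3.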

 

Logging in to the CQL shell for Cassandra

To log in to the CQL shell for Cassandra, use the default superuser account (both the username and the password are cassandra):

cqlsh -u cassandra -p cassandra
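Since the cassandra/cassandra login is the same on every installation, you will probably want your own superuser. The sketch below builds a CREATE ROLE statement for that; superuser_cql, the role name and the password are hypothetical placeholders, and the statement is sent with cqlsh's -e (execute) option.

```shell
# Hypothetical helper: print a CQL statement that creates a superuser role.
# Arguments: ROLE_NAME PASSWORD
superuser_cql() {
    # CQL escapes a single quote inside a string literal by doubling it
    pw=$(printf '%s' "$2" | sed "s/'/''/g")
    printf "CREATE ROLE %s WITH PASSWORD = '%s' AND SUPERUSER = true AND LOGIN = true;\n" "$1" "$pw"
}

# Usage (with Cassandra running):
# cqlsh -u cassandra -p cassandra -e "$(superuser_cql admin 'choose_a_strong_password')"
```

After logging in as the new role, you can change the default account's password with ALTER ROLE cassandra WITH PASSWORD = '...'.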

Note that, since PasswordAuthenticator is now enabled, opening the shell without specifying a user will result in either a connection error or an authentication error.

 

Congratulations, you've just configured a multi-node Cassandra cluster!
