From Fedora Project Wiki

Latest revision as of 13:42, 16 May 2018

Introduction

Apache Cassandra is a free and open-source distributed NoSQL database system designed to handle large amounts of data across multiple servers, providing high availability with no single point of failure. One of its main features is the ability to run in a multi-node setup, which provides the following benefits:

  • Fault tolerance: Data is automatically replicated to multiple nodes for fault-tolerance. Also, replication across multiple data centers is supported. Failed nodes can be replaced with no downtime.
  • Decentralization: There are no single points of failure, no network bottlenecks and every node in the cluster is identical.
  • Scalability & Elasticity: Clusters can run on tens of thousands of nodes holding petabytes of data. Read and write throughput both increase linearly as new machines are added, with no downtime or interruption to applications.

Installation

The database has been available since Fedora 26, and the Fedora repositories contain several packages:

Package            Description
cassandra          Client tools
cassandra-server   Server part, mainly the database daemon
cassandra-javadoc  Documentation

More packages can be listed with the command:

dnf list cassandra\*

The command

dnf install cassandra cassandra-server

installs the database server and the tools for working with it.

Basic setup

Initialization and startup

Start the database daemon:

systemctl start cassandra

Enable the database daemon to start at boot:

systemctl enable cassandra

To test whether server initialization was successful, try the Cassandra client. See Usage example.

Users authentication

Authentication is disabled by default. To enable it, take the following steps:

  1. Change the authenticator option in the /etc/cassandra/cassandra.yaml file to PasswordAuthenticator:
    authenticator: PasswordAuthenticator
  2. Restart cassandra:
    systemctl restart cassandra
  3. Start cqlsh using the default superuser name and password:
    cqlsh -u cassandra -p cassandra
  4. Create a new superuser:
    cqlsh> CREATE ROLE <new_super_user> WITH PASSWORD = '<some_secure_password>' AND SUPERUSER = true AND LOGIN = true;
  5. Log in as the newly created superuser:
    cqlsh -u <new_super_user> -p <some_secure_password>
  6. The default cassandra superuser cannot be deleted from Cassandra, so to neutralize the account, change its password to something long and hard to guess, and revoke its superuser status:
    cqlsh> ALTER ROLE cassandra WITH PASSWORD='SomeNonsenseThatNoOneWillThinkOf' AND SUPERUSER=false;
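
Step 1 can also be scripted. The following is a minimal sketch, assuming the configuration file contains a top-level authenticator line (upstream ships "authenticator: AllowAllAuthenticator"). The function name set_authenticator is hypothetical, and the demonstration edits a scratch copy rather than the live /etc/cassandra/cassandra.yaml.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the top-level "authenticator:" line of a
# cassandra.yaml-style file in place (relies on GNU sed's -i option).
set_authenticator() {
    conf_file=$1
    value=$2
    sed -i "s/^authenticator:.*/authenticator: ${value}/" "$conf_file"
}

# Demonstration on a scratch file; the real file is /etc/cassandra/cassandra.yaml.
scratch=$(mktemp)
printf 'cluster_name: Test Cluster\nauthenticator: AllowAllAuthenticator\n' > "$scratch"
set_authenticator "$scratch" PasswordAuthenticator
grep '^authenticator:' "$scratch"
```

On the real file the same call would need root privileges, and the service still has to be restarted as in step 2.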

Ports and remote access

By default, the Cassandra Java process binds to these ports after start:

Port number  Description
TCP / 7000   Cassandra inter-node cluster communication
TCP / 7199   Cassandra JMX monitoring port
TCP / 9042   Cassandra client port

Encrypted communication
SSL/TLS can be configured in Apache Cassandra. By default it uses TCP / 7001 for inter-node communication and TCP / 9142 as the client port.

Thrift API
The Thrift API was removed in Apache Cassandra 4.0 and is also stripped from the Fedora build of Cassandra 3, so nothing listens on TCP / 9160.
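
To confirm which of the ports listed above are actually reachable, a quick probe can help. This is a minimal sketch, assuming bash and the coreutils timeout command are installed; the function name port_open is hypothetical, and 127.0.0.1 is only an example host.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a TCP connection to host:port can be opened.
# Uses bash's /dev/tcp pseudo-device, with timeout limiting the wait to 2 seconds.
port_open() {
    timeout 2 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Probe the default Cassandra ports on the local machine.
for port in 7000 7199 9042; do
    if port_open 127.0.0.1 "$port"; then
        echo "port $port: open"
    else
        echo "port $port: closed"
    fi
done
```

A port reported as closed on a running server usually means the daemon is bound to a different address than the one probed.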

To allow remote access to the database, change the following parameters in /etc/cassandra/cassandra.yaml and restart the service:

listen_address: <external_ip>
rpc_address: <external_ip>
seed_provider/seeds: "<external_ip>"
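
The three parameters can be changed in one pass. The following is a minimal sketch, assuming the file uses the upstream layout in which listen_address and rpc_address are top-level keys and the seed list is a nested "- seeds:" entry. The function name allow_remote_access is hypothetical, and 192.0.2.10 is a documentation-only placeholder address.

```shell
#!/bin/sh
# Hypothetical helper: point listen_address, rpc_address and the seed list of
# a cassandra.yaml-style file at one IP address (GNU sed, edits in place).
allow_remote_access() {
    conf_file=$1
    ip=$2
    sed -i \
        -e "s/^listen_address:.*/listen_address: ${ip}/" \
        -e "s/^rpc_address:.*/rpc_address: ${ip}/" \
        -e "s/^\([[:space:]]*- seeds:\).*/\1 \"${ip}\"/" \
        "$conf_file"
}

# Demonstration on a scratch file that mimics the relevant upstream defaults.
scratch=$(mktemp)
cat > "$scratch" <<'EOF'
listen_address: localhost
rpc_address: localhost
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "127.0.0.1"
EOF
allow_remote_access "$scratch" 192.0.2.10
cat "$scratch"
```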

Also open the ports in the firewall.

firewalld:

firewall-cmd --add-port=7000/tcp
firewall-cmd --add-port=9042/tcp
# you probably do not want to expose the JMX port on an external network
# firewall-cmd --add-port=7199/tcp
# save configuration
firewall-cmd --runtime-to-permanent

iptables:

iptables -A INPUT -p tcp --dport 7000 -j ACCEPT
iptables -A INPUT -p tcp --dport 9042 -j ACCEPT
# you probably do not want to expose the JMX port on an external network
# iptables -A INPUT -p tcp --dport 7199 -j ACCEPT
Warning:
By default authentication is disabled and data are unprotected. See Users authentication.

More about how to configure Apache Cassandra

To configure the server, edit the file /etc/cassandra/cassandra.yaml. For more information about the available settings, see the upstream configuration.

Cluster setup

For an example, refer to the Apache Cassandra Cluster page.

Apache Cassandra in the container

An Apache Cassandra container image can be found on Docker Hub as centos/cassandra-3-centos7. Starting a container for serving Cassandra is simple.

Install and start docker.

Prepare a directory for the database data:

mkdir data
chown 143:143 data
Note:
You have to change the ownership of the data directory to match the cassandra user in the container, so that the container can read and write it.

Start the container:

docker run --name cassandra -d -p 9042:9042 \
    -e CASSANDRA_ADMIN_PASSWORD=secret \
    -v "`pwd`/data":/var/opt/rh/sclo-cassandra3/lib/cassandra:Z \
    centos/cassandra-3-centos7

The container stores its data in the prepared directory and creates a user and a database. The second line of the command is important: it sets the password for the admin user.

Note:
In the container, Apache Cassandra has an admin user instead of the cassandra user; the cassandra user is deleted during the initialization phase.

Now you can try the Cassandra client. See Usage example. If you don't have the client tools installed, you can use the client provided by the container:

docker exec -it cassandra 'bash' -c 'cqlsh '`docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' cassandra`' -u admin -p secret'
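
Right after the container starts, the server may need some time before it accepts connections, so the first connection attempt can fail. The following is a minimal sketch of a generic retry loop; the function name retry and the chosen counts are hypothetical. The commented usage line assumes the client tools are installed on the host and that the port is published as in the docker run command above.

```shell
#!/bin/sh
# Hypothetical helper: run a command until it succeeds, at most $1 times,
# sleeping $2 seconds between attempts. Returns 1 if every attempt fails.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 0
}

# Example: try for up to about a minute (12 attempts, 5 seconds apart).
# Commented out because it needs the running container and a local cqlsh.
# retry 12 5 cqlsh 127.0.0.1 9042 -u admin -p secret -e 'SHOW VERSION'
```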

More options are available; see the container README.

Usage example

$ cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.1 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh> CREATE KEYSPACE k1 WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
cqlsh> USE k1;
cqlsh:k1> CREATE TABLE users (user_name varchar, password varchar, gender varchar, PRIMARY KEY (user_name));
cqlsh:k1> INSERT INTO users (user_name, password, gender) VALUES ('John', 'test123', 'male');
cqlsh:k1> SELECT * from users;

 user_name | gender | password
-----------+--------+----------
      John |   male |  test123

(1 rows)
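
The same statements can be replayed without an interactive session, because cqlsh accepts a file of statements through its -f option. This is a minimal sketch; the function name write_demo_cql is hypothetical, and IF NOT EXISTS has been added to the CREATE statements (it is not in the session above) so that the file can be run more than once.

```shell
#!/bin/sh
# Hypothetical helper: write the statements from the session above to a file
# that can be replayed non-interactively with "cqlsh -f <file>".
write_demo_cql() {
    cat > "$1" <<'EOF'
CREATE KEYSPACE IF NOT EXISTS k1 WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
USE k1;
CREATE TABLE IF NOT EXISTS users (user_name varchar, password varchar, gender varchar, PRIMARY KEY (user_name));
INSERT INTO users (user_name, password, gender) VALUES ('John', 'test123', 'male');
SELECT * FROM users;
EOF
}

cql_file=$(mktemp)
write_demo_cql "$cql_file"

# Replay only when the client tools are installed; this needs a running server.
if command -v cqlsh >/dev/null 2>&1; then
    cqlsh -f "$cql_file" || echo "cqlsh could not run the file (is the server up?)" >&2
fi
```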

Feedback

We will be glad to receive any feedback from you.

We are also looking for help with maintaining Apache Cassandra in Fedora, so if you would like to help, just contact us.