Install Apache Flink on a Multi-node Cluster: RHEL 8

Preparation:

  • Set up passwordless SSH between the nodes so that they can reach each other without password prompts

Setting up the cluster nodes:

Install Java on all nodes in the cluster. The command below installs OpenJDK 8, which Flink 1.16 supports:

sudo yum install java-1.8.0-openjdk-devel

Install Apache ZooKeeper on all nodes in the cluster.

ZooKeeper is used for coordination between the nodes; Flink relies on it for JobManager leader election in a high-availability setup.

Install ZooKeeper 3.6.2:

wget https://downloads.apache.org/zookeeper/zookeeper-3.6.2/apache-zookeeper-3.6.2-bin.tar.gz 

tar -xvf apache-zookeeper-3.6.2-bin.tar.gz 

sudo mv apache-zookeeper-3.6.2-bin /usr/local/zookeeper

Installing Apache Flink:

Download Apache Flink 1.16.1 (the latest version at the time of writing):

wget https://mirrors.ocf.berkeley.edu/apache/flink/flink-1.16.1/flink-1.16.1-bin-scala_2.12.tgz

Unpack the archive to a directory on all nodes in the cluster:

tar -xvf flink-1.16.1-bin-scala_2.12.tgz
sudo mv flink-1.16.1 /usr/local/flink

Configuring the Apache Flink cluster:

  • Back up the flink-conf.yaml configuration file, then customize the original:
cd /usr/local/flink/conf 
cp flink-conf.yaml flink-conf.yaml.orig
  • Configure the jobmanager.rpc.address setting to the hostname or IP address of the master node.
  • Configure the taskmanager.numberOfTaskSlots setting to the number of parallel tasks that each task manager should run.
  • Configure the taskmanager.memory.process.size setting to the amount of memory that each task manager should use.

For example:

taskmanager.memory.process.size: 4GB
taskmanager.numberOfTaskSlots: 30
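
These settings can also be applied from a script, which helps keep the configuration identical on every node. The following is a minimal sketch, assuming a POSIX shell; set_conf is a hypothetical helper (not part of Flink) and master-node is a placeholder hostname:

```shell
# set_conf KEY VALUE FILE
# Hypothetical helper: replaces an existing uncommented "KEY: ..." line in FILE,
# or appends "KEY: VALUE" if no such line exists.
set_conf() {
  sc_key="$1"; sc_value="$2"; sc_file="$3"
  sc_tmp="$(mktemp)" || return 1
  awk -v k="$sc_key" -v v="$sc_value" '
    index($0, k ":") == 1 { print k ": " v; found = 1; next }
    { print }
    END { if (!found) print k ": " v }
  ' "$sc_file" > "$sc_tmp" && cat "$sc_tmp" > "$sc_file"
  sc_status=$?
  rm -f "$sc_tmp"
  return "$sc_status"
}

# Example calls (all values are placeholders, adjust for your cluster):
#   conf=/usr/local/flink/conf/flink-conf.yaml
#   set_conf jobmanager.rpc.address master-node "$conf"
#   set_conf taskmanager.numberOfTaskSlots 30 "$conf"
#   set_conf taskmanager.memory.process.size 4GB "$conf"
```

The helper writes back with cat rather than mv so that the ownership and permissions of the existing file are kept.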

Configure high availability using ZooKeeper:

ZooKeeper reads its settings from the zoo.cfg file in its conf directory (/usr/local/zookeeper/conf for the installation above). The high-availability mode of Flink relies on this ZooKeeper ensemble.

The following details need to be added to this file:

  1. Data Directory: Specify the directory where ZooKeeper will store its data.
  2. Client Port: Specify the port that ZooKeeper will listen on for client connections.
  3. Server List: Specify a list of servers in the ZooKeeper ensemble, including the hostname, the peer (quorum) port, and the leader-election port for each server.
  4. Tick Time: Specify the length of a single tick, which is the basic time unit used by ZooKeeper.
  5. Init Limit: Specify the number of ticks that followers may take to connect to the leader and complete the initial synchronization.
  6. Sync Limit: Specify the number of ticks that a follower can be behind a leader.
  7. Snapshot Counter: Specify the number of transactions that can be processed before a snapshot of the ZooKeeper state is taken.

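Here’s an example of a basic zoo.cfg file. Its server entries all use localhost with different ports, which suits a single-machine test ensemble; on a multi-node cluster, use the hostname of each node instead: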

dataDir=/tmp/zookeeper 
clientPort=2181 

server.1=localhost:2888:3888 
server.2=localhost:2889:3889 
server.3=localhost:2890:3890 

tickTime=2000 
initLimit=10 
syncLimit=5 
snapCount=1000
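
On a multi-node ensemble, each server.N entry names a different host, and each node also needs a myid file in its dataDir containing its own N. The following is a rough sketch that writes both files, assuming three placeholder hostnames (zk1.example.com to zk3.example.com); write_zk_config is a hypothetical helper name, not a ZooKeeper tool:

```shell
# Placeholder hostnames; replace with the ZooKeeper nodes in your cluster.
ZK_HOSTS="zk1.example.com zk2.example.com zk3.example.com"

# write_zk_config CONF_DIR DATA_DIR MY_ID
# Writes CONF_DIR/zoo.cfg and DATA_DIR/myid for the node whose id is MY_ID.
write_zk_config() {
  zk_conf_dir="$1"; zk_data_dir="$2"; zk_my_id="$3"
  mkdir -p "$zk_conf_dir" "$zk_data_dir" || return 1
  {
    echo "tickTime=2000"
    echo "initLimit=10"
    echo "syncLimit=5"
    echo "snapCount=1000"
    echo "dataDir=$zk_data_dir"
    echo "clientPort=2181"
    zk_n=1
    for zk_host in $ZK_HOSTS; do
      # Format: server.N=host:peerPort:electionPort
      echo "server.$zk_n=$zk_host:2888:3888"
      zk_n=$((zk_n + 1))
    done
  } > "$zk_conf_dir/zoo.cfg" || return 1
  # The id in myid must match this node's server.N entry.
  echo "$zk_my_id" > "$zk_data_dir/myid"
}

# Example, run on the second node (paths are assumptions):
#   write_zk_config /usr/local/zookeeper/conf /var/lib/zookeeper 2
```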

Add the following lines to the flink-conf.yaml file:

high-availability: zookeeper 
high-availability.zookeeper.quorum: host1:port,host2:port,host3:port 
high-availability.zookeeper.path.root: /flink

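Note: Replace host1:port, host2:port, and host3:port with the hostnames and client ports of the ZooKeeper nodes in your cluster. The ZooKeeper high-availability mode also requires high-availability.storageDir to point to a directory on storage that all nodes can reach (for example HDFS or another shared file system).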
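
The quorum value is a comma-separated list of host:port pairs, where the port is the clientPort of each ZooKeeper node. The following is a small sketch for building it, assuming all nodes share one client port; zk_quorum and the hostnames are illustrative only:

```shell
# zk_quorum PORT HOST...
# Prints "HOST1:PORT,HOST2:PORT,..." for use as high-availability.zookeeper.quorum.
zk_quorum() {
  zq_port="$1"; shift
  zq_out=""
  for zq_host in "$@"; do
    # Prepend a comma only when the list already has an entry.
    zq_out="${zq_out:+$zq_out,}$zq_host:$zq_port"
  done
  printf '%s\n' "$zq_out"
}

# Example (placeholder hostnames):
#   zk_quorum 2181 zk1.example.com zk2.example.com zk3.example.com
```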

Starting the cluster:

Start ZooKeeper on all nodes in the cluster:

cd /usr/local/zookeeper/bin
./zkServer.sh start

Start the JobManager on the master node by running the following command:

cd /usr/local/flink/bin
./jobmanager.sh start

Start the TaskManagers on all other nodes by running the following command on each node:

cd /usr/local/flink/bin
./taskmanager.sh start
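
With passwordless SSH in place, the TaskManagers can be started from the master node instead of logging in to each worker. The following is a minimal sketch, assuming Flink is installed at /usr/local/flink on every worker; the function names, the DRY_RUN variable, and the hostnames are hypothetical:

```shell
# run CMD...: execute CMD, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# start_taskmanagers HOST...
# Starts a TaskManager on each HOST over SSH.
start_taskmanagers() {
  for st_host in "$@"; do
    run ssh "$st_host" /usr/local/flink/bin/taskmanager.sh start || return 1
  done
}

# Example (placeholder hostnames); set DRY_RUN=1 first to preview the commands:
#   start_taskmanagers worker1 worker2 worker3
```

Flink also ships bin/start-cluster.sh, which starts the whole cluster over SSH using the hosts listed in conf/masters and conf/workers.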

Here are the additional steps for setting up TLS/SSL/HTTPS:

  1. Obtain a certificate
  2. Install the certificate: Copy the certificate and private key files to a location on each node in the cluster. The location should be accessible to the user that runs the Flink process.
  3. Install OpenSSL: If it’s not already installed, install the OpenSSL package on each node in the cluster. You can do this by running the following command:
sudo yum install openssl

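  4. Configure Flink: Modify the flink-conf.yaml file on each node to enable SSL/TLS. Flink reads keys and certificates from Java keystores and truststores rather than from PEM files directly, so first import the certificate and private key into a keystore (for example with openssl and keytool). Here is an example configuration for the REST endpoint, which serves the web UI:

security.ssl.rest.enabled: true
security.ssl.rest.keystore: /path/to/flink.keystore
security.ssl.rest.keystore-password: <keystore_password>
security.ssl.rest.key-password: <key_password>
security.ssl.rest.truststore: /path/to/flink.truststore
security.ssl.rest.truststore-password: <truststore_password>

To also encrypt traffic between the JobManager and the TaskManagers, set the corresponding security.ssl.internal.* options.

  5. Restart the nodes: After making the changes to the configuration file, restart the JobManager and TaskManager processes.

  6. Verify the configuration: You can verify that the configuration is working by accessing the Flink web UI using an HTTPS URL (e.g. https://<jobmanager_host>:8081). The browser should show that the connection is secure and that the certificate was issued by a trusted CA.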
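
The SSL options in Flink take Java keystores, so a PEM certificate and private key typically need converting before Flink can use them. The following is a rough sketch of that conversion using openssl and keytool, assuming a single password for both stores; make_keystore, DRY_RUN, the flink alias, and the output file names are hypothetical choices:

```shell
# run CMD...: execute CMD, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# make_keystore CERT KEY OUT_DIR PASSWORD
# Builds OUT_DIR/flink.keystore (JKS) from a PEM certificate and private key.
make_keystore() {
  mk_cert="$1"; mk_key="$2"; mk_dir="$3"; mk_pass="$4"
  # Step 1: bundle the PEM certificate and key into a PKCS#12 file.
  run openssl pkcs12 -export -in "$mk_cert" -inkey "$mk_key" \
    -name flink -out "$mk_dir/flink.p12" -passout "pass:$mk_pass" || return 1
  # Step 2: convert the PKCS#12 bundle into a JKS keystore.
  run keytool -importkeystore -noprompt \
    -srckeystore "$mk_dir/flink.p12" -srcstoretype PKCS12 -srcstorepass "$mk_pass" \
    -destkeystore "$mk_dir/flink.keystore" -deststoretype JKS -deststorepass "$mk_pass"
}

# Example (paths and password are placeholders):
#   make_keystore /path/to/cert.pem /path/to/key.pem /path/to <keystore_password>
```

Passing passwords on the command line exposes them to other users through the process list, so a production setup may prefer the interactive prompts of both tools.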
