Elasticsearch out of the box does a good job of locating nodes and building a cluster. Just assign everyone the same cluster name and ES does the rest. But running a cluster on Amazon’s EC2 presents some additional challenges specific to that environment. I recently set up a Docker-ized Elasticsearch cluster on two EC2 medium instances, and thought I would put down in one place the main things I ran into while setting it up.
The most fundamental constraint in EC2 is that multicast discovery (unsurprisingly) will not work. EC2 doesn’t allow multicast traffic. This gives you two choices: use unicast discovery, or set up EC2 dynamic discovery. Unicast discovery requires that all nodes have the host IPs for the other nodes in the cluster. I don’t like that idea, so I chose to set up dynamic discovery using the EC2 APIs. Getting EC2 dynamic discovery to work requires changes in three places: the Elasticsearch installation, the configuration file at /etc/elasticsearch/elasticsearch.yml, and finally the instance and security configuration on Amazon. I’ll take these in order.
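(As an aside, the unicast route would mean hard-coding every node’s address into elasticsearch.yml and keeping that list in sync by hand on every node; roughly the lines below, with placeholder IPs.)

discovery.zen.ping.multicast.enabled: false
# Every node needs the full list of its peers' transport addresses
discovery.zen.ping.unicast.hosts: ["10.0.0.10:9300", "10.0.0.11:9300"]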
The best way to integrate EC2 into the Elasticsearch discovery workflow is to use the cloud-aws plugin module. You can install this at any time with the following command:
sudo /usr/share/elasticsearch/bin/plugin -install \
    elasticsearch/elasticsearch-cloud-aws/2.0.0.RC1
This will pull the plugin from the download site and install it, which is basically just extracting files. Note that the version in the command was the latest at the time of writing; you can check here to see if it is still current. And that’s all there is to that. Adding the cloud-aws plugin enables the discovery.ec2 settings in elasticsearch.yml, which is where we’ll head next.
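Before heading there, one quick sanity check: the plugin script can also list what is installed, which is an easy way to confirm the cloud-aws plugin actually landed (the path below assumes the same layout as the install command above):

sudo /usr/share/elasticsearch/bin/plugin --list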
The Elasticsearch config file is located in /etc/elasticsearch/elasticsearch.yml, and we’ll want to change/add a few things there. First, obviously, give everyone the same cluster name:
cluster.name: my_cluster
Another setting that makes sense in production, at least, is to require the cloud-aws plugin to be present on start:
plugin.mandatory: cloud-aws
The next two settings are required for the cloud-aws plugin to communicate on your behalf with the great AWS brain in the sky:
cloud.aws.access_key: NOTMYACCESSKEYATALL
cloud.aws.secret_key: ITw0UlDnTB3seCRetiFiPuTItHeR3
Not to digress too much into AWS access management, but if you’ve set things up the right way then the access key and secret key used above will be for an IAM sub-account that grants just the specific permissions needed.
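As a concrete sketch of what that could mean: the discovery piece only needs to be able to describe instances, so a policy along the lines below, attached to the IAM user whose keys go in the config, should cover it. The user name and policy name here are made up for illustration.

# Grant the discovery user permission to list EC2 instances (names are hypothetical)
aws iam put-user-policy \
  --user-name es-discovery \
  --policy-name es-ec2-discovery \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
      { "Effect": "Allow", "Action": "ec2:DescribeInstances", "Resource": "*" }
    ]
  }'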
The next two settings deal with discovery directly:

discovery.type: ec2
discovery.zen.ping.multicast.enabled: false
The first one just sets the discovery type to use the EC2 plugin. Pretty self-explanatory. The second one disables multicast discovery, on the principle of “it doesn’t work, so don’t try it.” The last two settings we’ll discuss can be seen as alternatives to one another, but they are not mutually exclusive, which requires a bit of explanation.
Basically, we may need to filter the instances that get returned to the discovery process. When the cloud-aws plugin queries EC2 and gets a list of addresses back, it assumes they are all Elasticsearch nodes. During discovery it will try to contact them, and if some are not actually nodes it will just keep trying. This behavior makes sense with multicast discovery, because if you are not listening for multicast traffic then you simply don’t respond to it. But the EC2 API will return all the instances you are running in the region, so we need some way to tell Elasticsearch discovery which ones are really nodes.
One way to do this is by using a security group. You can create a security group and assign all the instances that you want in a particular cluster to that group, then make the following addition to the config:
discovery.ec2.groups: my_security_group
This setting tells the plugin to return only those instances that are members of the security group ‘my_security_group.’ Since you will need a security group anyway, as explained below, this is a convenient way to separate them from the crowd. But there can be cases where you don’t want to partition on the security group name. You might, for example, want to have one security group to control the access rules for a set of instances representing more than one cluster. In that case you can use tags:
discovery.ec2.tag.my_tag: my_tag_value
This setting tells the plugin to return only those instances that have the tag ‘my_tag’ with the value ‘my_tag_value.’ This is even more convenient, since it doesn’t involve mucking about with security groups, and setting the tag on an instance is easily done. Finally, as mentioned before these aren’t mutually exclusive. You can use the groups option to select a set of instances, and then partition them into clusters using tags.
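Setting the tag really is a one-liner with the AWS CLI (the instance ID below is a placeholder), or a couple of clicks in the console:

# Tag an instance so EC2 discovery will include it in the cluster
aws ec2 create-tags --resources i-0123456789abcdef0 --tags Key=my_tag,Value=my_tag_value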
And that’s it for the elasticsearch.yml settings, or at least the ones I had to change to make this work on EC2. There are a lot of other options if your specific case requires tweaking, and you can find an explanation of them here. The last thing I want to go into is the configuration needed on the Amazon EC2 side, which falls into two areas: security groups and instance configuration. I don’t want to digress far into the specific AWS console steps, but I’ll outline in general what needs to happen.
Security groups first. You’re going to need all the nodes in your cluster to be able to talk to each other, and you’re going to want some outside clients to be able to talk to the nodes in the cluster as well. There are lots of different cases for connecting clients for querying purposes, so I’m going to ignore that topic and focus on communications between the nodes.
The nodes in a cluster need to use the Elasticsearch transport protocol on port 9300, so you’ll need to create a security group, or modify an existing one that applies to your instances, and create a rule allowing this traffic. For my money the easiest and most durable way to do this is to have a single security group covering just the nodes, and to add a rule allowing inbound TCP traffic on port 9300 from any instance in the group. If you are using the discovery.ec2.groups method discussed above, make sure to give your group the same name you used in the settings.
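With the AWS CLI that rule looks roughly like this, assuming a group referenced by name (in a non-default VPC you would use the group ID instead, but the idea is the same):

# Allow node-to-node transport traffic on 9300 between members of the group
aws ec2 authorize-security-group-ingress \
  --group-name my_security_group \
  --protocol tcp --port 9300 \
  --source-group my_security_group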
The last point is instance configuration, and for my purposes here it’s really just setting the tag or security group membership appropriately so that the instance gets included in the list returned from discovery. There are lots of other specifics regarding how to set up an instance to run Elasticsearch efficiently, but those are topics for another time (after I figure them out myself!).
The very last thing I want to mention is a tip for Docker users. If you’re running Elasticsearch inside a container on EC2, discovery is going to fail. The first node that starts will elect itself as master, and if you query cluster health on that node it will succeed and tell you there is one node in the cluster: itself. The other nodes will hang when you try the same query, because they are busy failing discovery. If you look in /var/log/elasticsearch/cluster_name.log you’re going to see exceptions that look like this:
[2014-03-18 18:20:04,425][INFO ][discovery.ec2 ] [MacPherran] failed to send join request to master [[Amina][Pi6vJ470SYy4fEQhGXiwEA][38d8cffbdcd5][inet[/172.17.0.2:9300]]], reason [org.elasticsearch.transport.RemoteTransportException: [MacPherran][inet[/172.17.0.2:9300]][discovery/zen/join]; org.elasticsearch.ElasticsearchIllegalStateException: Node [[MacPherran][aHWxgYAhSpaWlPEXNhs7RA][4fb92271cce6][inet[/172.17.0.2:9300]]] not master for join request from [[MacPherran][aHWxgYAhSpaWlPEXNhs7RA][4fb92271cce6][inet[/172.17.0.2:9300]]]]
I cut out a lot of detail, but basically what is happening is that, in this example, node MacPherran is talking to itself and doesn’t know it. The problem is that a running Docker container has a different IP address than the host instance it is running on. So when the node does discovery it first finds itself at the container IP, something like:
[MacPherran][inet[/172.17.0.2:9300]]
And then finds itself at the instance IP returned from EC2:
[MacPherran][inet[/10.147.39.21:9300]]
From there things do not go well for the discovery process. Fortunately this is easily fixed with another addition to elasticsearch.yml:
network.publish_host: 255.255.255.255
This setting tells Elasticsearch that this node should advertise itself as being at the specified host IP address. Set this to the host address of the instance and Elasticsearch will now tell everyone that is its address, which it is, assuming you are mapping the container ports over to the host ports. Now, of course, you have the problem of how to inject the host IP address into the container, but hopefully that is a simpler problem, and is left as an exercise for the reader.
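For what it’s worth, here is one way to do that exercise, assuming you start the container yourself with docker run and that your image (my_es_image below is a placeholder) runs the stock elasticsearch script: pull the instance’s private IP from the EC2 metadata service on the host and hand it to Elasticsearch as a command-line setting.

# On the host: ask the EC2 metadata service for this instance's private IP
HOST_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)

# Map the ports 1:1 and advertise the host address as publish_host
docker run -d -p 9200:9200 -p 9300:9300 my_es_image \
  /usr/share/elasticsearch/bin/elasticsearch \
  -Des.network.publish_host=$HOST_IP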