Elasticsearch discovery in EC2

Elasticsearch out of the box does a good job of locating nodes and building a cluster. Just assign everyone the same cluster name and ES does the rest. But running a cluster on Amazon’s EC2 presents some additional challenges specific to that environment. I recently set up a Dockerized Elasticsearch cluster on two EC2 medium instances, and thought I would put down in one place the main things I ran into while setting it up.

The most fundamental constraint in EC2 is that multicast discovery (unsurprisingly) will not work. EC2 doesn’t allow multicast traffic. This gives you two choices: use unicast discovery, or set up EC2 dynamic discovery. Unicast discovery requires that all nodes have the host IPs for the other nodes in the cluster. I don’t like that idea, so I chose to set up dynamic discovery using the EC2 APIs. Getting EC2 dynamic discovery to work requires changes in three places: the Elasticsearch installation, the configuration file at /etc/elasticsearch/elasticsearch.yml, and finally the instance and security configuration on Amazon. I’ll take these in order.
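
For comparison, unicast discovery just means listing the other nodes’ addresses explicitly in elasticsearch.yml, something like this (the IPs here are placeholders):

discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["10.0.0.11", "10.0.0.12"]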

The best way to integrate EC2 into the Elasticsearch discovery workflow is to use the cloud-aws plugin module. You can install this at any time with the following command:

sudo /usr/share/elasticsearch/bin/plugin -install \
elasticsearch/elasticsearch-cloud-aws/2.0.0.RC1

This will pull the plugin from the download site and install it, which is basically just extracting files. Note that the version in the command was the latest at the time of writing; you can check here to see if it is still current. And that’s all there is to that. Adding the cloud-aws plugin enables the discovery.ec2 settings in elasticsearch.yml, which is where we’ll head next.
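
If you want to confirm that the plugin actually landed, the same script can list what’s installed (the exact flag spelling can vary a little between Elasticsearch versions):

sudo /usr/share/elasticsearch/bin/plugin --list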

The Elasticsearch config file is located in /etc/elasticsearch/elasticsearch.yml, and we’ll want to change/add a few things there. First, obviously, give everyone the same cluster name:

cluster.name: my_cluster

Another setting that makes sense in production, at least, is to require the cloud-aws plugin to be present on start:

plugin.mandatory: cloud-aws

The next two settings are required for the cloud-aws plugin to communicate on your behalf with the great AWS brain in the sky:

cloud.aws.access_key: NOTMYACCESSKEYATALL
cloud.aws.secret_key: ITw0UlDnTB3seCRetiFiPuTItHeR3

Not to digress too much into AWS access management, but if you’ve set things up the right way then the access key and secret key used above will be for an IAM sub-account that grants just the specific permissions needed. The next two settings deal with discovery directly:
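
As a rough sketch of what that sub-account needs: the discovery side of the plugin mostly just describes instances, so a minimal IAM policy might look something like this (treat the action list as a starting point rather than gospel):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["ec2:DescribeInstances"],
      "Resource": ["*"]
    }
  ]
}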

discovery.type: ec2
discovery.zen.ping.multicast.enabled: false

The first one just sets the discovery type to use the EC2 plugin. Pretty self-explanatory. The second one disables multicast discovery, on the principle of “it doesn’t work, so don’t try it.” The last two settings we’ll discuss can be seen as alternatives to one another, but they are not mutually exclusive, which requires a bit of explanation.

Basically, we may need to filter the instances that get returned to the discovery process. When the cloud-aws plugin queries EC2 and gets a list of addresses back, it is going to assume they are all Elasticsearch nodes. During discovery it will try to contact them, and if some are not actually nodes it will just keep trying. This behavior makes sense with the multicast discovery process, because if you are not listening for multicast traffic then you don’t respond to it. But the EC2 discovery APIs will return all the instances in an availability zone, so we need some way to identify to Elasticsearch discovery which ones are really nodes.

One way to do this is by using a security group. You can create a security group and assign all the instances that you want in a particular cluster to that group, then make the following addition to the config:

discovery.ec2.groups: my_security_group

This setting tells the plugin to return only those instances that are members of the security group ‘my_security_group.’ Since you will need a security group anyway, as explained below, this is a convenient way to separate them from the crowd. But there can be cases where you don’t want to partition on the security group name. You might, for example, want to have one security group to control the access rules for a set of instances representing more than one cluster. In that case you can use tags:

discovery.ec2.tag.my_tag: my_tag_value

This setting tells the plugin to return only those instances that have the tag ‘my_tag’ with the value ‘my_tag_value.’ This is even more convenient, since it doesn’t involve mucking about with security groups, and setting a tag on an instance is easily done. Finally, as mentioned before, these aren’t mutually exclusive: you can use the groups option to select a set of instances, and then partition them into clusters using tags, as in the sketch below.
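
A combined configuration might look something like this, with placeholder names for the group and the tag:

discovery.ec2.groups: my_security_group
discovery.ec2.tag.cluster: production_search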

And that’s it for the elasticsearch.yml settings, or at least the ones I had to change to make this work on EC2. There are a lot of other options if your specific case requires tweaking, and you can find an explanation of them here. The last thing I want to cover is the configuration needed on the Amazon side, which falls into two areas: security groups and instance configuration. I don’t want to digress far into the specific AWS console steps, but I’ll outline in general what needs to happen.

Security groups first. You’re going to need all the nodes in your cluster to be able to talk to each other, and you’re going to want some outside clients to be able to talk to the nodes in the cluster as well. There are lots of different cases for connecting clients for querying purposes, so I’m going to ignore that topic and focus on communications between the nodes.

The nodes in a cluster need to use the Elasticsearch transport protocol on port 9300, so you’ll need to create a security group, or modify an existing one that applies to your instances, and create a rule allowing this traffic. For my money the easiest and most durable way to do this is to have a single security group covering just the nodes, and to add a rule allowing inbound TCP traffic on port 9300 from any instance in the group. If you are using the discovery.ec2.groups method discussed above, make sure to give your group the same name you used in the settings.
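
If you prefer the command line to the console, something like the following should do it with the AWS CLI (the group name and description are just examples):

aws ec2 create-security-group --group-name my_security_group \
    --description "Elasticsearch cluster nodes"
aws ec2 authorize-security-group-ingress --group-name my_security_group \
    --protocol tcp --port 9300 --source-group my_security_group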

The last point is instance configuration, and for my purposes here it’s really just setting the tag or security group membership appropriately so that the instance gets included in the list returned from discovery. There are lots of other specifics regarding how to set up an instance to run Elasticsearch efficiently, but those are topics for another time (after I figure them out myself!)
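
Tagging an instance is a one-liner with the AWS CLI, something like this (the instance id is a placeholder):

aws ec2 create-tags --resources i-0a1b2c3d \
    --tags Key=my_tag,Value=my_tag_value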

The very last thing I want to mention is a tip for Docker users. If you’re running Elasticsearch inside a container on EC2, discovery is going to fail. The first node that starts will elect itself as master, and if you query cluster health on that node it will succeed and tell you there is one node in the cluster: itself. The other nodes will hang when you try the same query, because they are busy failing discovery. If you look in /var/log/elasticsearch/cluster_name.log you’re going to see exceptions that look like this:

[2014-03-18 18:20:04,425][INFO ][discovery.ec2            ]
[MacPherran] failed to send join request to master 
[[Amina][Pi6vJ470SYy4fEQhGXiwEA][38d8cffbdcd5][inet[/172.17.0.2:9300]]],
reason [org.elasticsearch.transport.RemoteTransportException:
[MacPherran][inet[/172.17.0.2:9300]][discovery/zen/join]; 
org.elasticsearch.ElasticsearchIllegalStateException: 
Node [[MacPherran][aHWxgYAhSpaWlPEXNhs7RA][4fb92271cce6][inet[/172.17.0.2:9300]]] 
not master for join request from 
[[MacPherran][aHWxgYAhSpaWlPEXNhs7RA][4fb92271cce6][inet[/172.17.0.2:9300]]]]

I cut out a lot of detail, but basically the reason this is happening is that, in this example, node MacPherran is talking to itself and doesn’t know it. The problem arises because a running Docker container has a different IP address than the host instance it is running on. So when the node does discovery it first finds itself at the container IP, something like:

[MacPherran][inet[/172.17.0.2:9300]]

And then finds itself at the instance IP returned from EC2:

[MacPherran][inet[/10.147.39.21:9300]]

From there things do not go well for the discovery process. Fortunately this is easily fixed with another addition to elasticsearch.yml:

network.publish_host: 255.255.255.255

This setting tells Elasticsearch that this node should advertise itself as being at the specified host IP address (the value above is just a placeholder). Set it to the host address of the instance and Elasticsearch will tell everyone that is its address, which it is, assuming you are mapping the container ports over to the host ports. Now, of course, you have the problem of how to inject the host IP address into the container, but that is a simpler problem, and one way to handle it is sketched below.
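
One possibility, with a placeholder environment variable name and image name, is to read the instance’s private IP from the EC2 metadata service on the host and pass it into the container, where your start script can write it into network.publish_host before starting Elasticsearch:

# grab the instance's private IP from the EC2 metadata service
HOST_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)

# hand it to the container; ES_PUBLISH_HOST is an arbitrary name
sudo docker run -i -t -p 9200:9200 -p 9300:9300 \
    -e ES_PUBLISH_HOST=$HOST_IP my/elasticsearch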

Docker: run startup scripts then exit to a shell

As I’ve mentioned before, Docker expects to run one command at container startup, and when that command exits the container stops. There are all sorts of creative ways to get around that if, for example, you want to mimic init behavior by launching some daemons before opening a shell to work in the container. Here’s a little snippet from a Docker build file that illustrates one simple approach:

CMD bash -c '/path/to/start.sh; bash'

In the ‘start.sh’ script (which I usually copy to /usr/local/bin during container build) you can put any start up commands you want. For example the one in the container I’m working on now just has:

service elasticsearch start

The result when launching the container is that bash runs the script, which spools up Elasticsearch, and then opens another shell; when you exit that shell the container stops:

mark:~$ sudo docker run -i -t mark/example
 * Starting Elasticsearch Server                                         [ OK ] 
root@835600d4d0b2:/home/root# curl http://localhost:9200
{
  "status" : 200,
  "name" : "Box IV",
  "version" : {
    "number" : "1.0.1",
    "build_hash" : "5c03844e1978e5cc924dab2a423dc63ce881c42b",
    "build_timestamp" : "2014-02-25T15:52:53Z",
    "build_snapshot" : false,
    "lucene_version" : "4.6"
  },
  "tagline" : "You Know, for Search"
}
root@835600d4d0b2:/home/root# exit
exit
mark:~$

Docker Builds and -no-cache

I was building out a search server container with Elasticsearch 1.0.1 today, and I ran into one of those irritating little problems that I could have solved a lot faster if I had just observed more carefully what was actually going on. One of the steps in the build is to clone some stuff from our git repo that includes config files that get copied to various places. In the process of testing I added a new file and pushed it, then re-ran the build. Halfway through I got a stat error from a cp command that couldn’t find the file.

But, but, I had pushed it, and pulled the repo, so where the hell was it? Yesterday something similar had happened when building a logstash/redis container. One of the nice things about a Docker build is that it leaves the interim images from each step in place (and the interim containers too, unless you use the -rm=true option), so you can start a container from the last successful build step and look around inside it. In yesterday’s case it turned out I was pushing to one branch and cloning from another.
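
For reference, poking around in one of those interim images is just a matter of running a shell in it, using the image id the build prints after each successful step (the id below is a placeholder):

sudo docker run -i -t 1a2b3c4d5e6f /bin/bash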

But that problem had been solved yesterday. Today’s problem was different, because I was definitely cloning the right branch. I took a closer look at the output from the Docker build, and where I expected to see…

Step 4 : RUN git clone blahblahblah.git
 ---> Running in 51c842191693

I instead saw…

Step 4 : RUN git clone blahblahblah.git
 ---> Using cache

Docker was assuming the effect of the RUN command was deterministic and was reusing the interim image from the last time I ran the build. Interestingly it did the same thing with a later wget command that downloaded an external package. I’m not sure how those commands could ever be considered deterministic, since they pull data from outside sources, but whatever. The important thing is you can add the -no-cache option to the build command to get Docker to ignore the cache.

sudo docker build -no-cache -rm=true - < Dockerfile

Note that this applies to the whole build, so if you do have some other commands that are in fact deterministic they are not going to use the cache either. It would be nice to have an argument to the RUN command to do this on a per-step basis, but at least -no-cache will make sure all your RUN steps get evaluated every time you build.
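
One workaround, if you only want to bust the cache from a particular step onward, is to change a throwaway instruction just above it, since Docker invalidates the cache for that step and everything after it. A sketch (the ENV name and value are arbitrary):

# bump this value to force the steps below it to re-run
ENV REFRESHED_AT 2014-03-18
RUN git clone blahblahblah.git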