In twenty-five years as a professional software person I’ve done quite a few things, but they have all focused on, or at least started with, writing code. I am basically a programmer. Somewhere along the way, around the 2000s if I recall, the term “software engineer” became the fashionable title. I always felt a little silly using it because I don’t have a degree in software anything, and my dad is an actual engineer with stuff hanging on his walls to prove it. I didn’t go to family parties and talk about what an engineer I was. In fact I’m not sure I ever actually had the word “engineer” in my role until now. In this post I’m going to talk a little bit about how that changed.
Back in 2015 I had just finished up a gig writing a specialty search engine from the ground up, working on a two-man team with friend and repeat colleague Joey Espinosa. With just the two of us working for a somewhat tech-savvy business person, that project was hands-on full-stack everything. We did the data layer, scraping engine, customized spiders for horrible ancient broken sites, web layer, networking, admin, everything. We deployed the app in Docker containers using custom scaffolding on AWS instances. It was a ton of fun almost all the time, but business-wise it went nowhere.
The general theory of pod scheduling in kubernetes is to let the scheduler handle it. You tell the cluster to start a pod, the cluster looks at all the available nodes and decides where to put the new thing, based on comparing available resources with what the pod declares it needs. That’s scheduling in a nutshell. Sometimes, however, you need a little more input into the process. For example you may have been asked to run a thing that requires more resources than any single node in your cluster offers. You can add a new node with enough juice, maybe using a nodepool if you’re running on GKE, but how do you make sure the right pods run on it? How do you make sure the wrong pods don’t run on it?
You can often nudge the scheduler in the right direction simply by setting resource requests appropriately. If your new pod needs 5 GB of RAM and the only node big enough is the one you added for it to run on, then setting the memory request for that pod to 5 GB will force the scheduler to put it there. This is a fairly fragile approach, however, and while it will get your pod onto a node with sufficient resources it won’t keep the scheduler from putting other things there as well, as long as they will fit. Maybe that’s not important, but if it is, or if for some other reason you need positive control over which nodes your pod schedules to, then you need the finer level of scheduling control that kubernetes offers through the use of taints, tolerations and affinity.
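To sketch how those pieces fit together (node names, labels and taint keys here are all hypothetical), you might taint and label the big node, then give the pod a matching toleration and node selector. The toleration lets the pod onto the tainted node, and the node selector keeps it off every other node:

```yaml
# First, taint and label the node from the command line, e.g.:
#   kubectl taint nodes big-node-1 dedicated=bigjob:NoSchedule
#   kubectl label nodes big-node-1 dedicated=bigjob
apiVersion: v1
kind: Pod
metadata:
  name: bigjob
spec:
  containers:
  - name: bigjob
    image: example/bigjob:latest   # hypothetical image
    resources:
      requests:
        memory: "5Gi"              # still worth declaring for the scheduler
  # The toleration allows this pod to schedule onto the tainted node...
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "bigjob"
    effect: "NoSchedule"
  # ...and the node selector ensures it schedules nowhere else.
  nodeSelector:
    dedicated: bigjob
```

The taint keeps the wrong pods off the node; the selector keeps the right pod on it. You need both halves to get positive control in each direction.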
As discussed in my recent post on kubernetes ingress there is really only one way for traffic from outside your cluster to reach services running in it. You can read that article for more detail but the tl;dr is that all outside traffic gets into the cluster by way of a nodeport, which is a port opened on every host/node. Nodes are ephemeral things and clusters are designed to scale up and down, and because of this you will always need some sort of load balancer between clients and the nodeports. If you’re running on a cloud platform like GKE then the usual way to get there is to use a type LoadBalancer service or an ingress, either of which will build out a load balancer to handle the external traffic.
This isn’t always, or even most often, what you want. Your case may vary, but at Olark we deploy a lot more internal services than we do external ones. Up until recently the load balancers created by kubernetes on GKE were always externally visible, i.e. they were allocated a non-private IP that is reachable from outside the project. Maintaining firewall rules to sandbox lots of internal services is not a tradeoff we want to make, so for these use cases we created our services as type NodePort, and then provisioned an internal TCP load balancer for them using terraform.
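As a rough sketch of the kubernetes side of that setup (the service name, labels and port numbers are illustrative), a NodePort service creates no cloud load balancer at all; it just opens a known port on every node, which an externally provisioned load balancer, terraform-managed in our case, can then target:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: internal-api          # hypothetical service name
spec:
  type: NodePort
  selector:
    app: internal-api         # pods with this label back the service
  ports:
  - port: 80                  # cluster IP port for in-cluster clients
    targetPort: 8080          # port the containers listen on
    nodePort: 30080           # fixed port opened on every node (30000-32767)
```

Pinning the nodePort to a fixed value, rather than letting the cluster assign one, makes it straightforward for infrastructure defined outside kubernetes to point at it.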
There’s a reason why the kubernetes project is the current crown jewel of the cloud native community, with attendance at Kubecon 2017 in Austin nearly four times that of last year’s conference in Seattle and seemingly every major enterprise vendor perched behind a booth in the exhibit hall eager to help attendees take advantage of the platform. The reason is that the advantages are significant, especially in those areas that matter most to developers and system engineers: application reliability, observability, controllability and lifecycle management. If Docker built the engine of the container revolution then it was kubernetes that supplied the chassis and got it up to highway speed.
But driving at highway speed means keeping your hands on the wheel and obeying the rules of the road. Kubernetes has its own rules, and applications that adhere to best practices with respect to certain key touch points are much less likely to wipe out and take a few neighboring lanes of traffic with them. In this post I am going to briefly discuss five important design features that will affect how well your application behaves when running on kubernetes: configuration, logging, signal handling, health checks, and resource limits. The treatment of each topic will be necessarily high level, but I will provide links to more detailed information where it will be useful.
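Two of those five touch points, health checks and resource limits, are declared right in the pod spec. A minimal sketch (the image name, probe paths and numbers are all hypothetical) might look like this:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: webapp
spec:
  containers:
  - name: webapp
    image: example/webapp:1.0      # hypothetical image
    # Health checks: the liveness probe restarts a wedged container,
    # while the readiness probe gates whether services send it traffic.
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
    # Resource requests inform scheduling decisions; limits cap what
    # the container can actually consume at runtime.
    resources:
      requests:
        cpu: "250m"
        memory: "256Mi"
      limits:
        cpu: "500m"
        memory: "512Mi"
```

The other touch points, configuration, logging and signal handling, live mostly in application code rather than the manifest, but they follow the same theme: meet the platform’s expectations and it will manage your lifecycle gracefully.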
In the first post of this series I described the network that enables pods to connect to each other across nodes in a kubernetes cluster. The second focused on how the service network provides load balancing for pods so that clients inside the cluster can communicate with them reliably. For this third and final installment I want to build on those concepts to show how clients outside the cluster can connect to pods using the same service network. For various reasons this will likely be the most involved of the three, and the concepts introduced in parts one and two are prerequisites to getting much value out of what follows.
First, having just returned from Kubecon 2017 in Austin I’m reminded of something I should have made clear earlier in the series. Kubernetes is a rapidly maturing platform. Much of the architecture is pluggable, and this includes networking. What I have been describing here is the default implementation on Google Kubernetes Engine. I haven’t seen Amazon’s Elastic Kubernetes Service yet, but I expect the default implementation there will be similar. To the extent that kubernetes has a “standard” way of handling networking I think these posts describe it in its fundamental aspects. You have to start somewhere, and getting these concepts well in hand will help when you start to think about alternatives like unified service meshes, etc. With that said, let’s talk ingress.
In the first post of this series I looked at how kubernetes employs a combination of virtual network devices and routing rules to allow a pod running on one cluster node to communicate with a pod running on another, as long as the sender knows the receiver’s pod network IP address. If you aren’t already familiar with how pods communicate then it’s worth a read before continuing. Pod networking in a cluster is neat stuff, but by itself it is insufficient to enable the creation of durable systems. That’s because pods in kubernetes are ephemeral. You can use a pod IP address as an endpoint but there is no guarantee that the address won’t change the next time the pod is recreated, which might happen for any number of reasons.
You will probably have recognized this as an old problem, and it has a standard solution: run the traffic through a reverse-proxy/load balancer. Clients connect to the proxy and the proxy is responsible for maintaining a list of healthy servers to forward requests to. This implies a few requirements for the proxy: it must itself be durable and resistant to failure; it must have a list of servers it can forward to; and it must have some way of knowing if a particular server is healthy and able to respond to requests. The kubernetes designers solved this problem in an elegant way that builds on the basic capabilities of the platform to deliver on all three of those requirements, and it starts with a resource type called a service.
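To make that concrete, here is a minimal sketch of a service definition (the names are illustrative). The selector tells kubernetes which pods back the service, the service’s stable cluster IP gives clients a durable endpoint, and readiness is tracked per pod so that unhealthy endpoints drop out of the rotation, satisfying all three proxy requirements:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: server-svc            # hypothetical service name
spec:
  selector:
    app: server               # pods labeled app=server become endpoints
  ports:
  - port: 80                  # stable port on the service's cluster IP
    targetPort: 8080          # port the server pods actually listen on
```

Clients connect to `server-svc` on port 80 and never need to know the individual pod IPs behind it.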
This post is going to attempt to demystify the several layers of networking operating in a kubernetes cluster. Kubernetes is a powerful platform embodying many intelligent design choices, but discussing the way things interact can get confusing: pod networks, service networks, cluster IPs, container ports, host ports, node ports… I’ve seen a few eyes glaze over. We mostly talk about these things at work, cutting across all layers at once because something is broken and someone wants it fixed. If you take it a piece at a time and get clear on how each layer works it all makes sense in a rather elegant way.
In order to keep things focused I’m going to split the post into three parts. This first part will look at containers and pods. The second will examine services, which are the abstraction layer that allows pods to be ephemeral. The last post will look at ingress and getting traffic to your pods from outside the cluster. A few disclaimers first. This post isn’t intended to be a basic intro to containers, kubernetes or pods. To learn more about how containers work see this overview from Docker. A high level overview of kubernetes can be found here, and an overview of pods specifically is here. Lastly a basic familiarity with networking and IP address spaces will be helpful.
HashiCorp’s terraform is a powerful and extensible tool for defining and creating cloud infrastructure in a repeatable way. At Olark we use it to manage a number of different environments on Google Cloud Platform. On the journey from imperative to declarative infrastructure we’ve learned a few things. Here are five that I feel are particularly important. What follows are entirely my own opinions.
Workloads running in kubernetes pods commonly need access to services outside the cluster. In heterogeneous architectures where some services run in kubernetes and others are implemented on cloud VMs this often means resolving private DNS names that point to either specific hosts or to internal load balancers that provide ingress to groups of hosts.
In kubernetes the standard DNS resolver is kube-dns, which is a pod in the kube-system namespace that runs a dnsmasq container as well as a container with some custom golang glue that interfaces between the DNS server and the rest of the cluster control plane. The kube-dns service cluster IP is injected into pods via /etc/resolv.conf as we can see here:
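The exact values vary by cluster, but a typical pod’s /etc/resolv.conf looks something like the following, where the nameserver address is the kube-dns service cluster IP and the search list lets short service names resolve within the pod’s namespace:

```
nameserver 10.3.240.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
```

The `ndots:5` option is what causes most unqualified names to be tried against each search domain before being sent upstream, which matters for how external lookups behave.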
Chatbots are the hot topic of conversation among developers and investors alike right now. Advances in natural language processing (NLP) and wide adoption of a few messaging platforms like Facebook Messenger, Slack, Snapchat and Whatsapp have presented developers with a unique opportunity. Would you rather navigate to, say, Nordstrom’s website, find the shoe section, figure out the categories, search, choose a pair, add them to a cart, etc., or would you rather message the Nordstrom’s bot: “I need a pair of brown dress shoes, size 10 1/2, leather, and I like Johnston & Murphy” and have it respond with a selection which you can order from simply by tapping? Yeah, thought so. Me too.
Chatbots have a number of advantages. The chat platform (FB for example) knows who you are so that whole create account/auth business can be dispensed with. You can save your personal details like shipping address and payment methods once, and have them shared with companies whose bots you interact with. You don’t have to learn the organization of a new website with a unique information flow. You don’t even have to remember their domain name, and there are still plenty of opportunities for branding and positive customer interactions.