Ingress load balancing issues on Google’s GKE

Originally published on the Google Cloud Community blog at https://medium.com/google-cloud/ingress-load-balancing-issues-on-googles-gke-f54c7e194dd5

Usually my posts here are about something I think I might have figured out and want to share. Today’s post is about a thing I’m pretty sure I haven’t figured out and want to share. I want to talk about a problem we’ve been wrestling with over the last couple of weeks: one for which we can suggest a potential fix but do not yet know the root cause. In short, if you are running certain types of services behind a GCE-class ingress on GKE, you might be getting traffic even when your pods are unready, for example during a deployment. Before I get into the details, here is the discovery story. If you just want the tl;dr and recommendations, jump to the end.

[Update 4/17/2019 — Google’s internal testing confirmed that this is a problem with the load balancer front end holding open connections to kube-proxy and reusing them. Because the netfilter NAT table rules only apply to new connections, this effectively short-circuited Kubernetes’ internal service load balancing and directed all traffic arriving at a given node/NodePort to the same pod. Google also confirmed that removing the keep-alive header from the server response is a workaround, and we’ve confirmed this internally. If you need the keep-alive header, then the next best choice is to move to container-native load balancing with a VPC-native cluster, since this takes the NodePort hop right out of the equation. Unfortunately that means building a new cluster if yours is not already VPC-native. So that is the solution… if you’re still interested in the story, read on!]
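If dropping keep-alive is an option for your service, the change itself is small. Here is a minimal sketch of the idea for a Go HTTP server; the path, port, and response body are illustrative stand-ins, not our actual daemon, and your framework will have its own equivalent. The point is simply that the server closes the connection after each response instead of advertising keep-alive, so the front end cannot pin a connection to one pod.

```go
package main

import (
	"fmt"
	"net/http"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/poll", func(w http.ResponseWriter, r *http.Request) {
		// Tell the client (and anything in front of us) not to reuse this
		// connection. With each request arriving on a fresh connection, the
		// netfilter NAT rules get a chance to pick a pod every time instead
		// of all traffic riding one long-lived connection to the same pod.
		w.Header().Set("Connection", "close")
		fmt.Fprintln(w, "ok")
	})

	srv := &http.Server{Addr: ":8080", Handler: mux}
	// Also disable keep-alives at the server level so Go closes the
	// connection after every response.
	srv.SetKeepAlivesEnabled(false)
	srv.ListenAndServe()
}
```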

Over the last couple of months we’ve been prepping one of our most critical services for migration to GKE. This service consists partly of an HTTP daemon that handles long-poll requests from our JavaScript client, and runs on 90 GCE instances. These instances handle approximately 15k requests per second at peak load. Because many of these requests are long polls with a timeout of 30 seconds, we need the ability to gracefully shut down instances of this service. To accomplish this we have a command we can send the service that causes it to take itself out of rotation, wait 60 seconds for all existing long polls to complete, and then exit.
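For context, that drain behavior looks roughly like the following. This is a hypothetical Go sketch rather than our actual daemon: the /healthz and /drain paths, the port, and the exact timings are stand-ins. The shape is what matters: flip a readiness flag so the instance drops out of rotation, give in-flight long polls up to 60 seconds to finish, then exit.

```go
package main

import (
	"context"
	"log"
	"net/http"
	"sync/atomic"
	"time"
)

func main() {
	var draining atomic.Bool

	mux := http.NewServeMux()

	// Readiness endpoint: once draining, report unready so the instance is
	// taken out of rotation and stops receiving new long polls.
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		if draining.Load() {
			http.Error(w, "draining", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})

	srv := &http.Server{Addr: ":8080", Handler: mux}

	// Admin endpoint standing in for the "take yourself out of rotation"
	// command described above.
	mux.HandleFunc("/drain", func(w http.ResponseWriter, r *http.Request) {
		draining.Store(true)
		w.WriteHeader(http.StatusAccepted)
		go func() {
			// Wait up to 60s for in-flight long polls (30s timeout) to
			// complete, then shut the server down.
			ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
			defer cancel()
			if err := srv.Shutdown(ctx); err != nil {
				log.Printf("shutdown: %v", err)
			}
		}()
	})

	if err := srv.ListenAndServe(); err != http.ErrServerClosed {
		log.Fatal(err)
	}
}
```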
