Originally published on the Google Cloud Community blog at https://medium.com/google-cloud/vpc-native-clusters-on-google-kubernetes-engine-b7c022c07510
If you’re a GKE user and you’ve created a cluster within the last six months or so, you might have noticed a new option: a checkbox to enable VPC-native networking using alias IPs.
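If you create clusters from the command line instead, the equivalent is the --enable-ip-alias flag. Here’s a minimal sketch (the cluster name and zone are placeholders, and GKE is left to pick the pod and services ranges itself):

```
# Minimal sketch of creating a VPC-native cluster with gcloud. Cluster name and
# zone are placeholders. With no ranges specified GKE chooses the pod and services
# secondary ranges for you; they can be pinned with --cluster-ipv4-cidr and
# --services-ipv4-cidr, or pointed at pre-created secondary ranges with
# --cluster-secondary-range-name and --services-secondary-range-name.
gcloud container clusters create my-cluster \
    --zone us-central1-a \
    --enable-ip-alias
```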
You may also have caught the press release announcing this feature back in May, or the announcement last October of container-native load balancing for GKE pods, a related feature. VPC-native, container-native, alias IP: these all seem like fairly intimidating terms, and since this networking architecture is set to become the default for new clusters “soon,” I thought it would be useful to relate what we’ve learned about it, based on creating and running both types of clusters in production and comparing the way they work.
First, the anxiety-mitigation portion of the post: running a cluster as VPC-native changes almost nothing inside the cluster itself. Nothing about the way your workloads are deployed, discovered or connected to by other workloads inside the cluster is affected. In fact, if you compare two clusters, one using VPC-native and the other using the legacy approach, now inexplicably called “advanced routing,” you’ll find they’re pretty much identical from the inside, right down to the command-line arguments passed to the kubelet, kube-dns and kube-proxy on startup. So you’re not going to break anything by switching your workloads to a VPC-native cluster, unless you’re doing something stranger than I can currently imagine as I write this.
So what does it change? It alters the way routes are established to handle pod traffic between nodes. A little background: if you’re familiar with Kubernetes networking, or perhaps a reader of my post on the topic from last year, then you know that the default network assigned to pods in a cluster is a /14 that is divided into one /24 per node. In order for a pod in one /24 to communicate with a pod in another, routes need to be set up at the project level. If you’re running a cluster with the old routing setup, then what you’ll see when you look at the routes in the project containing your cluster is something like this:
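Something like this, that is (a sketch rather than real output: the cluster name, zone and CIDRs below are made up):

```
# Sketch: list the per-node routes GKE maintains for a routes-based cluster.
# Names, CIDRs and instances shown in the comments are illustrative placeholders.
gcloud compute routes list --filter="name~gke"
#   gke-my-cluster-abc123de-...  default  10.40.0.0/24  us-central1-a/instances/gke-my-cluster-pool-...  1000
#   gke-my-cluster-abc123de-...  default  10.40.1.0/24  us-central1-a/instances/gke-my-cluster-pool-...  1000
#   ... one route per node ...
```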
In Google Cloud parlance these are called “custom routes,” and specifically they are custom static routes: the control plane creates and removes them as nodes are added to or removed from the cluster, but they aren’t “dynamic routes” in the Cloud Router sense. Each node has a /24 assigned for its pods, and each /24 gets a route created for it so traffic will flow to the right instances. The key points are 1) that the pod network in this case has no relation to any subnet in the VPC, and only the addition of these custom routes allows pod traffic to flow; and 2) that these custom routes count against the routes quota for the project.
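If you want to see how much of that quota a cluster is consuming, the ROUTES metric appears in the project’s compute quotas; one way to pull it out (a sketch):

```
# Sketch: show the ROUTES quota limit and current usage for the project.
gcloud compute project-info describe --format=yaml | grep -B1 -A1 "metric: ROUTES"
```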
In a VPC-native cluster the pod network is no longer an abstraction implemented by custom routes. Instead the pod network, as well as the Kubernetes services network, become part of the VPC itself, configured as secondary ranges on the cluster’s subnet (this is where the “alias IP” terminology comes from).
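You can see this by describing the cluster’s subnet: the pod and services ranges sit alongside the node range. A sketch, with an illustrative subnet name, region, CIDRs and generated range names:

```
# Sketch: inspect the cluster's subnet; pods and services appear as secondary ranges.
# Subnet name, region, CIDRs and range names are placeholders.
gcloud compute networks subnets describe my-subnet --region us-central1 \
    --format="yaml(ipCidrRange,secondaryIpRanges)"
# ipCidrRange: 10.128.0.0/20        <- nodes
# secondaryIpRanges:
# - ipCidrRange: 10.40.0.0/14       <- pods
#   rangeName: gke-my-cluster-pods-abc123de
# - ipCidrRange: 10.0.32.0/20       <- services
#   rangeName: gke-my-cluster-services-abc123de
```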
And if you examine the routes in your project’s VPC you’ll find there is just one that matters for pod traffic:
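Concretely (again a sketch with a made-up CIDR), that one route is the subnet route GCP creates automatically for the pods secondary range, covering the whole /14, with the VPC network itself as the next hop rather than any particular instance:

```
# Sketch: with alias IPs there is a single, automatically created subnet route for the
# whole pod range (the CIDR is a placeholder); its next hop is the VPC network, not a node.
gcloud compute routes list --filter="destRange='10.40.0.0/14'"
```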
So now, instead of a pile of custom routes covering all the nodes in the cluster, the entire /14 is natively routable from anywhere else in the VPC. I’m not enough of an expert on Google’s networking backend to dive into what “natively routable” entails behind the scenes, but from a user perspective it simply means that we don’t need to use up route quota as we add nodes, and as we’ll see below it enables some cool features that I think will become pretty important over time.
Why do they also add the Kubernetes services network to the VPC as a secondary range? One of the cool features I mentioned above is that the pod network of a VPC-native cluster is integrated into the VPC’s IP address management, which among other things prevents double allocation of address space. So although no project-level routing is established for the services network, reserving it as a secondary range protects that space from being allocated to anything else. Why not add routes for the services network too? Packets to service IPs are currently DNATed onto the pod network on the originating node, so those routes would serve no purpose: the way to get into a cluster from the outside is still to use an Ingress.
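You can see that DNAT at work from a shell on any node: kube-proxy, in its iptables mode, programs NAT rules that rewrite a service VIP to a pod IP before the packet ever leaves the node. A quick way to peek at them:

```
# Sketch: kube-proxy's service DNAT rules, viewed from a node running in iptables mode.
sudo iptables -t nat -L KUBE-SERVICES -n | head -20
```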
However, the way an ingress works will change if you leverage VPC-native pod networking to make use of the container-native load balancing feature mentioned above. I have not used this feature yet, so I am not going into any detail here. The promised capability is interesting for several reasons and I would encourage you to read the announcement to get a sense of it. In brief, container-native load balancing means that external GCP load balancers, such as those satisfying a request for ingress to a cluster, can balance requests directly to a set of running pods, grouped into network endpoint groups, rather than to the entire set of instances in the node instance group. This saves a hop, at least, and should improve performance. I’ll have more to say on this when I’ve had a chance to put it to work.
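For the curious (I haven’t exercised this myself yet), the opt-in is an annotation on the Service backing your Ingress; with a hypothetical service name it looks something like this:

```
# Sketch: opt a Service into network endpoint groups so an Ingress-created load balancer
# targets pods directly. The service name is a placeholder; this requires a VPC-native cluster.
kubectl annotate service my-service cloud.google.com/neg='{"ingress": true}'
```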
There are a handful of other useful things that fall out of establishing the pod network as part of the VPC. Clusters can scale with less networking impact, because nodes no longer consume route quota. The pod IP range can be advertised over Cloud Router, Cloud VPN and Cloud Interconnect to support hybrid connectivity use cases. VPC-native pods can connect directly to googleapis.com services like BigQuery without NAT. VPC subnet addresses pass Google’s anti-spoofing checks on instances by default, so you gain an additional layer of security. And lastly, if you use VPC peering to connect projects, Google will add routes in the peered projects to handle traffic on the pod network. In fact if you want to connect to a cluster in a peered project then you must make that cluster VPC-native, because as far as we’ve been able to determine you can’t use static custom routes across a VPC peering connection.
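On the hybrid connectivity point: in a Cloud Router’s default advertisement mode the subnet ranges, including the pod secondary range, are advertised automatically; if you run the router in custom mode you can name the pod range explicitly. A sketch, with placeholder router name, region and CIDR:

```
# Sketch: explicitly advertising the pod range to on-prem peers from a Cloud Router.
# Router name, region and CIDR are placeholders; in default advertisement mode the
# pod range is already advertised along with the other subnet ranges.
gcloud compute routers update my-router --region us-central1 \
    --advertisement-mode custom \
    --set-advertisement-groups all_subnets \
    --set-advertisement-ranges 10.40.0.0/14
```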
That last point should be sufficient motivation to switch over to using VPC-native pod networking as you create new clusters (you can’t convert an existing cluster). Even if the examples I’ve given are not use cases you expect to deal with, the other features and performance improvements make it worth checking out. Hopefully this post serves both to outline the changes and possibilities that accompany this new architecture, and to allay any anxiety about the effects it will have on your current development, deployment and operations practices.