Kubernetes Eats the Network

Dec 5, 2022

Kubernetes eats infrastructure

Kubernetes is one of the most important platform transitions in the history of computing. It is how we deploy, schedule, and operate distributed cloud-native applications, and it is quickly becoming the de-facto way modern applications consume cloud infrastructure: compute, storage, networking, GPU, and DPU resources. In the public cloud, at the edge, or in private environments (private data centers or colocation centers), Kubernetes is everywhere. Unlike VMs, Kubernetes abstracts and quantifies infrastructure resources in a way that is friendly and intuitive for the people deploying cloud-native applications: it views infrastructure from a practical, application-component-centric point of view. By abstracting infrastructure this way, Kubernetes enables application portability across clouds and environments, private or public – provided, of course, that the application has no direct "hard" dependencies on platform services of a particular cloud.
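
To make that abstraction tangible, here is a minimal sketch in Go using the upstream Kubernetes API types (the image name and resource sizes are arbitrary): the application declares quantified, cloud-agnostic resource needs, and the platform finds infrastructure to satisfy them.

    package main

    import (
        "fmt"

        corev1 "k8s.io/api/core/v1"
        "k8s.io/apimachinery/pkg/api/resource"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )

    func main() {
        // The application asks for abstract, quantified resources;
        // Kubernetes finds infrastructure to satisfy the request, whatever the cloud.
        pod := corev1.Pod{
            ObjectMeta: metav1.ObjectMeta{Name: "web"},
            Spec: corev1.PodSpec{
                Containers: []corev1.Container{{
                    Name:  "app",
                    Image: "nginx:1.25", // arbitrary example image
                    Resources: corev1.ResourceRequirements{
                        Requests: corev1.ResourceList{
                            corev1.ResourceCPU:    resource.MustParse("500m"),  // half a core
                            corev1.ResourceMemory: resource.MustParse("256Mi"), // 256 MiB
                        },
                    },
                }},
            },
        }
        fmt.Println(pod.Name, "requests", pod.Spec.Containers[0].Resources.Requests.Cpu(), "CPU")
    }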

The Cloud Native Landscape

As Kubernetes' popularity grows, an already sizable ecosystem of operational tools is rapidly expanding around it. Like Kubernetes itself, these tools are cloud-native and can be adapted to any environment, in any cloud, private or public, leveraging the same infrastructure abstraction. Observability, deployment, and debugging across multiple clouds are performed in standard, non-bespoke ways using de-facto open toolchains. Hundreds of thousands of developers, SREs, and DevOps teams adopt these tools for daily use without fear of being locked into a proprietary dead end. Because the tools are shared across organizations, companies, and clouds, proficiency with them is a portable skill set applicable anywhere.

Networking should become more cloud

Almost anywhere, that is – because none of this applies to modern networking. Standard cloud-native ops tools are commonplace for managing and monitoring cloud networking, but we can't say the same about physical networking outside the public cloud. Physical networking in private environments remains stuck in the pre-cloud age. We still rely on bespoke, vendor-specific tools that primarily serve the vendors' interests rather than the practitioner's. Management tools are quickly turning into a platform play for the major networking infrastructure vendors, who use these tools – often hosted in the cloud and offered as a service – as a means of locking customers into their broader offering. One of the main inspirations for this model was Cisco's successful Meraki platform. Meraki consumerized enterprise and campus networking around a cloud-hosted management platform, greatly simplifying network operations – but at the cost of reducing the choice of hardware systems and vendors the platform supports. A similar product-line transition is happening with data center platforms: Cisco's Nexus Cloud, Arista's CloudVision, and, to a lesser degree, Juniper's Apstra. Network management tools are now a mechanism for operationally locking out competition.

Historically, the hardware-appliance form factor defined everything – a network device was just a "box," and we managed networks accordingly: as a bunch of boxes, one box at a time. Yet networking was the original distributed application, and the emergence of software network functions makes its distributed nature even more apparent. New networking applications and services can run anywhere – on switches, smartNICs, regular servers, or special service nodes (regular servers, just dedicated to networking). Networking is becoming more software, and network operations are morphing into software ops.

Networking is transforming into a cloud.

This "box approach" does not work for software – there's a better way. Operationally, we need to bring networking closer to the compute and application platforms. We must deliver the same experience so many enjoy in the cloud today, where NetOps is just part of the normal DevOps process. As younger infrastructure specialists grow up in the cloud, we need to make physical networking feel, act, and operate in ways these new infrastructure folks find familiar.

Networking should become more Kubernetes

Despite what the large networking gear vendors may think, network management tools aren't the platform. The physical network is a small but essential part of the infrastructure picture. It's responsible for connecting applications to each other, to the internet, and to other clouds and environments. In any environment, physical networking is an integral part of the application – yet it's the only part managed in a specialized, bespoke way that is often incompatible with how the rest of the infrastructure and application stack is operated. And that stack will be dominated by Kubernetes, just as in the public cloud. Kubernetes is the rapidly emerging de-facto infrastructure consumption platform and an excellent way of managing modern distributed apps – just what networking needs. It integrates with a wide array of de-facto operational tools and works well with modern operational approaches like GitOps, infrastructure as code/data, and continuous deployment. By utilizing Kubernetes, physical networking becomes just another component of the standard open infrastructure stack.

How do we get there?

A Network OS

First, we need an open Network Operating System (NOS) to deliver networking as a modern distributed app. The NOS should be modularized and containerized, so that each of its components or services can be treated as just another application. With a little (well, maybe a lot) of cleanup, SONiC can deliver on these properties: it already has a large community following and the support of most white-box and branded system vendors. Some work is required, though. A good number of initialization and housekeeping scripts must be decoupled, untangled, and modularized, with the goal of making each service independently managed and updatable without reboots that affect other services. The container runtime might also need to migrate from Docker to containerd to ensure smoother interoperability with Kubernetes and other modern cloud-native tools.
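
As a sketch of where this leads, here is what deploying one such NOS service could look like once switches join a Kubernetes cluster, written in Go against the standard client-go library. The namespace, node label, and image are hypothetical; the point is that a routing daemon becomes an independently versioned, independently restartable workload.

    package main

    import (
        "context"
        "fmt"

        appsv1 "k8s.io/api/apps/v1"
        corev1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/clientcmd"
    )

    func main() {
        // Hypothetical convention: every switch is a Kubernetes node labeled
        // node-role=switch, and the BGP service ships as its own container image.
        ds := &appsv1.DaemonSet{
            ObjectMeta: metav1.ObjectMeta{Name: "bgp", Namespace: "network-system"},
            Spec: appsv1.DaemonSetSpec{
                Selector: &metav1.LabelSelector{MatchLabels: map[string]string{"app": "bgp"}},
                Template: corev1.PodTemplateSpec{
                    ObjectMeta: metav1.ObjectMeta{Labels: map[string]string{"app": "bgp"}},
                    Spec: corev1.PodSpec{
                        NodeSelector: map[string]string{"node-role": "switch"},
                        HostNetwork:  true, // a routing daemon needs the switch's network namespace
                        Containers: []corev1.Container{{
                            Name:  "bgp",
                            Image: "example.org/nos/bgp:2.1.0", // hypothetical image
                        }},
                    },
                },
            },
        }

        config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
        if err != nil {
            panic(err)
        }
        client := kubernetes.NewForConfigOrDie(config)
        if _, err := client.AppsV1().DaemonSets("network-system").Create(context.TODO(), ds, metav1.CreateOptions{}); err != nil {
            panic(err)
        }
        fmt.Println("bgp service rolled out to every switch")
    }

Upgrading the routing stack then becomes an ordinary rolling update of one DaemonSet, not a reboot of every box.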

Kubernetes Control Plane

Kubernetes is an excellent tool for deploying software and distributing configuration. It's highly modular and extensible (via CRDs, controllers, and operators), making it an ideal foundation for infrastructure controllers. Turning a network of switches, smartNICs, and service nodes into a Kubernetes cluster lets us treat the network as a set of distributed applications: routing protocols, utility functions, observability probes, proxies, API gateways, policy enforcers, log-collection agents, or debug bots. Today's hardware is incredible; what we need now is better software – commercial, open-source, or custom-developed experiments.
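
To illustrate how small such an infrastructure controller can start, here is a minimal sketch using the controller-runtime library: a reconciler that watches Node objects (imagine each switch registering as a node) and would converge their network state. The switch label convention is an assumption, not an existing API.

    package main

    import (
        "context"

        corev1 "k8s.io/api/core/v1"
        ctrl "sigs.k8s.io/controller-runtime"
        "sigs.k8s.io/controller-runtime/pkg/client"
        "sigs.k8s.io/controller-runtime/pkg/log"
    )

    // SwitchReconciler converges the desired state of switch nodes.
    type SwitchReconciler struct{ client.Client }

    func (r *SwitchReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
        var node corev1.Node
        if err := r.Get(ctx, req.NamespacedName, &node); err != nil {
            return ctrl.Result{}, client.IgnoreNotFound(err)
        }
        // Hypothetical convention: switches register as nodes labeled node-role=switch.
        if node.Labels["node-role"] == "switch" {
            // A real controller would reconcile ports, VLANs, BGP sessions, etc.
            log.FromContext(ctx).Info("reconciling switch", "node", node.Name)
        }
        return ctrl.Result{}, nil
    }

    func main() {
        mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
        if err != nil {
            panic(err)
        }
        if err := ctrl.NewControllerManagedBy(mgr).
            For(&corev1.Node{}).
            Complete(&SwitchReconciler{mgr.GetClient()}); err != nil {
            panic(err)
        }
        if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
            panic(err)
        }
    }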

Cloud-Native Ops and Observability

On the operational side, Kubernetes will simplify integration with commonly used cloud-native tools. It allows native use of observability and telemetry tools like the ELK stack, Prometheus, Jaeger, and Grafana, while automation tools like Terraform and ArgoCD enable standard DevOps practices such as GitOps and infrastructure as code. Integration with these tools brings the same operational approaches enjoyed by DevOps and SRE teams in the public cloud.
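
For example, once switches run standard containers, exposing fabric telemetry to Prometheus is the same exercise as for any other application. Below is a minimal sketch of a hypothetical metrics agent; the metric name is made up, and a fake sample keeps the example self-contained where a real agent would read counters from the ASIC or the SONiC databases.

    package main

    import (
        "log"
        "math/rand"
        "net/http"

        "github.com/prometheus/client_golang/prometheus"
        "github.com/prometheus/client_golang/prometheus/promhttp"
    )

    // Hypothetical metric: bytes received per switch interface.
    var rxBytes = prometheus.NewGaugeVec(
        prometheus.GaugeOpts{
            Name: "switch_interface_rx_bytes", // made-up metric name
            Help: "Bytes received per switch interface.",
        },
        []string{"interface"},
    )

    func main() {
        prometheus.MustRegister(rxBytes)
        // A real agent would poll hardware counters; a random sample
        // keeps this sketch runnable on its own.
        rxBytes.WithLabelValues("Ethernet0").Set(float64(rand.Intn(1_000_000)))
        http.Handle("/metrics", promhttp.Handler())
        log.Fatal(http.ListenAndServe(":9090", nil))
    }

From there, an off-the-shelf Grafana dashboard scrapes the fabric exactly as it scrapes any microservice.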

Fabric Abstraction

We need to open up networking infrastructure to people who grew up in the cloud. Everyone who deals with cloud networking is intimately familiar with the concept of a VPC – the basic unit of network consumption and isolation in AWS. It gives you your own "slice" of the cloud's network in which to run your applications. The concept is highly abstract and doesn't leak the underlying network implementation: to the user, it doesn't matter how the network is implemented under the hood, and the user has no direct control over the underlying networking concepts and protocols. The only way the user can interact with the network is through the abstraction.

A similar approach should be employed for the off-cloud physical network. The abstraction should not leak any instrumentation or protocol detail to the user, and it must be adaptable to multiple networking technologies via corresponding instrumentation policies. For example, based on the org, the user, or the properties of the VPC object, isolation can be implemented with either VLANs or BGP EVPN. Instrumentation policies are owned and controlled by the network architecture team and aren't changeable by the user allocating a VPC.
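
A minimal sketch of what that could look like, assuming a hypothetical VPC resource and a toy instrumentation policy (the field names and the policy rule are illustrative, not a real API):

    package main

    import "fmt"

    // Hypothetical VPC custom resource: the user asks for subnets, nothing more.
    type VPCSpec struct {
        Subnets []string // e.g. "10.0.1.0/24"
        Org     string   // tenant/org requesting the VPC
    }

    // Isolation technologies the underlay can implement.
    type Isolation string

    const (
        IsolationVLAN Isolation = "vlan"
        IsolationEVPN Isolation = "bgp-evpn"
    )

    // pickIsolation is the instrumentation policy: it maps a VPC request to an
    // underlay technology. It is owned by the network architecture team.
    func pickIsolation(spec VPCSpec) Isolation {
        // Toy rule: a small single-subnet VPC gets a VLAN, everything else EVPN.
        if len(spec.Subnets) == 1 {
            return IsolationVLAN
        }
        return IsolationEVPN
    }

    func main() {
        vpc := VPCSpec{Subnets: []string{"10.0.1.0/24", "10.0.2.0/24"}, Org: "team-a"}
        fmt.Printf("vpc for %s instrumented as %s\n", vpc.Org, pickIsolation(vpc))
    }

The user sees only the VPC object; whether it lands on a VLAN or an EVPN underlay is decided by policy code the network architecture team owns.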

The future will be far more futuristic than originally anticipated.

Let's bring the new open physical networking to the masses, simplify physical infrastructure by making it more cloud, and finally get physical networking out of the way. We're hard at work on this – will you join our revolution?