Software-defined networking with BGP and Quagga

What we needed

Most of our infrastructure runs on Docker, on bare-metal clusters. At some point we needed to let our apps talk to each other over an overlay network instead of the usual Docker port binding. We could have used service discovery and port binding, but it would have been more complicated than we needed: we would have had to bind several ports for a single app on a single host if we wanted several replicas per host, and change our load-balancing configuration on every topology change. Besides, setting up an overlay network is a nice preamble to a migration to a Kubernetes cluster.

Why did we choose BGP?

Our choice comes from a simple observation: the Internet relies on a protocol most people have never heard of, BGP. The core principle of this protocol is to provide dynamic routing based on a few basic rules. Many software-defined networking tools rely on either OSPF or BGP. We chose BGP for its simplicity and proven robustness; a lot of open-source projects have made similar design choices.

Why did we choose Quagga?

Quagga is quite reliable and often found in networking stacks alongside proprietary hardware such as Cisco or Juniper. It is still under active development and has received many updates over time, even though Zebra (the routing daemon behind Quagga) can look a bit dated from an outside perspective.

A bit of tech

Since Kubernetes requires overlay networking, the documentation suggests a variety of tools to achieve it. Many of those tools are either black boxes or impractical for our use case. We figured this network abstraction would also be nice for our "not yet Kubernetes-compliant" applications, so we did some testing and got a satisfying result.

A cluster, a lot of networks

The core principle of the overlay network is to give each node in the cluster its own network, reachable from every other node. Our stack looked like this from a logical point of view:
[Figure: network_vs_overlay-1]
As you can see, each Docker container needs an individual port binding to work properly. Those ports then have to be mapped in our load balancers to receive traffic.
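
To illustrate the constraint, here is roughly what two replicas of the same app on one host look like with plain port binding; the image name and ports are purely illustrative:

# Two replicas of the same app on the same host: each one needs its own
# published host port, and both ports must then be declared on the
# load balancers (illustrative names and ports).
$ docker run -d --name app-1 -p 8080:80 example/app
$ docker run -d --name app-2 -p 8081:80 example/app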

With the overlay network, we then created a basic abstraction of this bare-metal entanglement:
[Figure: network_vs_overlay2]
Now each container has its own routable IP address and can use the same port even on the same host.
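
In practice, each host runs its Docker bridge on its own subnet; a minimal sketch of the daemon.json on one host (the subnets follow the configuration and routing table shown below):

$ cat /etc/docker/daemon.json
{
  "bip": "10.0.0.1/24"
}
# A neighbouring host gets a different range (e.g. "bip": "10.0.2.1/24"),
# so container addresses never overlap across the cluster. Docker has to
# be restarted for the bridge change to take effect.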

This topology is then propagated by bgpd, Quagga's BGP daemon; a sample configuration snippet could look like this:

! Ansible managed
log file /var/log/quagga/bgpd.log
!debug bgp events
!debug bgp filters
!debug bgp fsm
!debug bgp keepalives
!debug bgp updates
router bgp 65500
  bgp router-id 172.16.200.1
! # This is the IP address of the host being configured
  timers bgp 30 90
  redistribute static
! # We want to advertise our static routes
  network 10.0.0.0 mask 255.255.255.0
! # This is the docker0 network, so we need to add
! # the "bip": "10.0.0.1/24" setting to Docker's daemon.json
!
! # Next: our neighbors in the same AS
  neighbor 172.16.200.10 remote-as 65500
  neighbor 172.16.200.10 route-map foo out
  neighbor 172.16.200.10 route-map foo in
  neighbor 172.16.200.10 activate
  neighbor 172.16.200.200 remote-as 65500
  neighbor 172.16.200.200 route-map bar out
  neighbor 172.16.200.200 route-map bar in
  neighbor 172.16.200.200 activate
!
!
!
! # We set the same local preference for every neighbor
route-map foo permit 10
  set local-preference 222
!
route-map bar permit 10
  set local-preference 222

The resulting routing table is quite straightforward:

$ ip r|grep -i zebra
10.0.2.0/24 via 172.16.200.10 dev eth1  proto zebra 
10.0.3.0/24 via 172.16.200.200 dev eth1  proto zebra 
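
Peering and route propagation can also be checked directly from Quagga's vtysh shell; these are standard Quagga show commands (output omitted here):

$ vtysh -c 'show ip bgp summary'   # state of each neighbor session
$ vtysh -c 'show ip bgp'           # prefixes learned and advertised
$ vtysh -c 'show ip route bgp'     # BGP routes installed by zebra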

We now have a fully operational overlay network. At this point, you may think that a classic SDN tool like Calico would be easier to manage. Beyond a certain scale that may be true, but we also have to take into account the main constraint of our environment: we are not on a public cloud, so some things have to be managed by hand. Fortunately for us, Ansible appeared in our world a long time ago to make our lives easier. Today, rolling out a topology change in our overlay stack is a matter of seconds, with no service interruption whatsoever.
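
As a sketch of what such a rollout looks like on our side (the inventory and playbook names are hypothetical), regenerating the bgpd.conf templates and replaying the playbook is all it takes:

$ ansible-playbook -i inventory quagga.yml --check --diff   # preview the rendered bgpd.conf changes
$ ansible-playbook -i inventory quagga.yml                  # apply them across the cluster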

Since Quagga is not self-sufficient, we added a home-brewed service discovery tool to ensure that all of our live apps can communicate with each other and receive traffic from our load balancers. This networking feature subsequently allowed us to enable automated gossiping between our apps and to do a lot of other fun stuff.

For Kubernetes: Load balancers, ingresses and stuff

Since there is already a lot to read on the Internet about these, I think it's better to point to the good reads rather than paraphrase them poorly:

Our load balancers are aware of every app in every container and are able to send packets to each application based on our ACLs. Ingresses will be covered in another article, yet to be written.

Paving the way to multihomed infrastructure

Since we have a reproducible network model, why not apply it to an L2/L3 interconnection? You can check why and how here.

Acknowledgments

Even though the setup is quite simple once it is running, getting it started from scratch was quite a ride. We want to acknowledge the help provided by Paul Jakma with some steps of debugging our BGP setup.
