Redundant ISPs from a Linux router.
I have a need to survive ISP outages, but am not large enough to have real things like BGP and serious internet connections, so I am using a telco and a cable company with a few static IPs on each.
There are various tutorials on the internet for how to cope with this, but they seem to rely mostly on using iptables MARK to stain packets and then using iproute2 to route them. I dislike conjoining these two tools. I use source routing to keep everything straight, though there is a gotcha involving SNAT that needs attention when a link goes down.
Requirements:
- Support more than one ISP.
- Use the right source IP on each link to not trip packet spoof detection.
- Survive a link going down and back up.
- Lots of IPv4 private address machines on the inside need to be NATed on the way out.
- IPv6 is mandatory. I am using 6rd while my ISPs recover from being blindsided by that 20-year-old RFC for IPv6.
- Some servers live outside the firewall/router, I won’t speak any more of them, but they are there.
Non-requirements:
- Load balancing. This can be addressed with your outgoing rules, but given the disparity in quality, there is little point in using the U-verse link for outbound connections if the Charter one is up. The inbound connections will still use it.
- A single-point-of-failure router is fine with me.
Strategy Overview:
- Use a VLAN switch so I don’t need a flock of switches and multiple ethernet ports on the router and “outside the firewall” boxes. Not required, but when you see decimal points in my ethernet device names, those are the VLAN ids.
- Use source routing so that all packets go out the interface that matches their source IP address.
- Use iptables SNAT to let the local machines out. Choose their SNAT address based on the outgoing interface. Let the routing rules do the routing.
- Use conntrack to forget cached SNAT mappings when a link goes down or comes up. This is important!
VLAN Switch, 802.1Q Is Your Friend
Go read about 802.1Q if you are not familiar with it. With it you need only one switch. You can have as many virtual LANs as you like and configure, port by port, which LANs appear on each port. If you have gear that doesn’t do 802.1Q, you can set a single VLAN to show up on its port, and it will work fine without any changes to that device. You will pay more for a “smart switch” with 802.1Q support, but you are going to save on the number of switches, cabling, and ethernet cards. (e.g. in January 2013 I paid $220 for a 24 port gigabit 802.1Q switch.)
You will have to configure your switch. My Netgear switch is configured through a web interface apparently written by a maniacal sociopath, but it can be made to do the job.
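On the router, each VLAN shows up as a subinterface named PARENT.VLANID, like the eth2.4 used throughout. As a minimal sketch (not the author's setup), the helper below derives the parent device and VLAN ID from such a name and creates the device with iproute2's `ip link add ... type vlan`. The names `add_vlan_dev` and `run` and the `DRY_RUN` switch are hypothetical; with `DRY_RUN=1` it only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper: print commands when DRY_RUN=1, otherwise execute them.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# add_vlan_dev NAME: create and bring up an 802.1Q subinterface named
# PARENT.VLANID (for example eth2.4 -> parent eth2, VLAN ID 4).
add_vlan_dev() {
    name="$1"
    parent="${name%.*}"   # everything before the last dot
    vid="${name##*.}"     # everything after the last dot
    # VLAN IDs are 12 bits; 0 and 4095 are reserved, so accept 1..4094 only.
    # A name without a dot yields a non-numeric vid and fails this test.
    [ "$vid" -ge 1 ] 2>/dev/null && [ "$vid" -le 4094 ] || return 1
    run ip link add link "$parent" name "$name" type vlan id "$vid"
    run ip link set dev "$name" up
}
```

For example, `DRY_RUN=1 add_vlan_dev eth2.4` prints `ip link add link eth2 name eth2.4 type vlan id 4` followed by `ip link set dev eth2.4 up`.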
The Source Routing
We need two auxiliary routing tables to hold the routes for packets we know carry a U-verse address or a Charter address. They get names, which means adding lines to /etc/iproute2/rt_tables (which is just a file mapping numbers to names)…
echo "200 att" >> /etc/iproute2/rt_tables
echo "201 charter" >> /etc/iproute2/rt_tables
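Those `echo` lines append every time they run, so if they end up in a setup script you will collect duplicate entries. The following is a minimal guard that assumes you only want to skip names already present. `add_rt_table` is a hypothetical helper, and `RT_TABLES` lets you point it at a scratch file instead of the real /etc/iproute2/rt_tables.

```shell
#!/bin/sh
# Default to the real file; override RT_TABLES to test against a scratch copy.
RT_TABLES="${RT_TABLES:-/etc/iproute2/rt_tables}"

# add_rt_table NUMBER NAME: append "NUMBER NAME" unless NAME is already listed.
add_rt_table() {
    num="$1"
    name="$2"
    # rt_tables lines look like "<number> <name>"; compare the second field.
    # awk exits 0 when the name is found, 1 when it is not (or the file is missing).
    if awk -v n="$name" '$2 == n { found = 1 } END { exit !found }' "$RT_TABLES" 2>/dev/null; then
        return 0
    fi
    printf '%s %s\n' "$num" "$name" >> "$RT_TABLES"
}
```

Running `add_rt_table 200 att` and `add_rt_table 201 charter` twice each leaves exactly two new lines in the file.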
When an interface comes up, we add an ip routing rule that forces packets with a Charter source address to look in the charter routing table and go out the right interface; likewise for AT&T, which is the example here… (Notice the “throw” rules. Some people duplicate their main table here, but I’d never keep that in sync, so I defer to the main table instead.)
# This is what makes source routing happen
ip rule add from 99.178.257.57/29 table att
# get a fresh start on the routing table
ip route flush table att
ip route add default via 99.178.257.62 dev eth2.4 table att
# get the RFC 1918 private networks out, they don't want to go out this interface
# the "throw" will make them go back to your regular routing tables.
ip route add throw 10.0.0.0/8 table att
ip route add throw 172.16.0.0/12 table att
ip route add throw 192.168.0.0/16 table att
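Since the same commands repeat for each ISP, they can be wrapped in a function. This is a sketch, not the author's script: `setup_isp_table`, `run` and `DRY_RUN` are hypothetical names, the addresses in the usage example are RFC 5737 documentation addresses, and the leading `ip rule del` is my own addition so that bringing an interface up twice does not stack duplicate rules.

```shell
#!/bin/sh
# Hypothetical helper: print commands when DRY_RUN=1, otherwise execute them.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# setup_isp_table TABLE SOURCE_PREFIX GATEWAY DEVICE
setup_isp_table() {
    table="$1"; src="$2"; gw="$3"; dev="$4"
    # Remove any stale copy of the rule first; "|| true" ignores the error
    # when no such rule exists yet.
    run ip rule del from "$src" table "$table" 2>/dev/null || true
    # Packets with a source address in this ISP's block consult this table.
    run ip rule add from "$src" table "$table"
    run ip route flush table "$table"
    run ip route add default via "$gw" dev "$dev" table "$table"
    # Throw RFC 1918 destinations back to the regular routing tables.
    for net in 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16; do
        run ip route add throw "$net" table "$table"
    done
}
```

`DRY_RUN=1 setup_isp_table att 192.0.2.0/29 192.0.2.6 eth2.4` prints the commands; without `DRY_RUN` (and as root) it applies them.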
The SNAT For Our Private Addresses
Nothing new here, yet…
iptables -t nat -A POSTROUTING -o eth2.4 -s 172.16.0.0/12 -j SNAT --to-source 99.178.257.57
iptables -t nat -A POSTROUTING -o eth2.4 -s 192.168.0.0/16 -j SNAT --to-source 99.178.257.57
iptables -t nat -A POSTROUTING -o eth2.4 -s 10.0.0.0/8 -j SNAT --to-source 99.178.257.57
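The three SNAT lines differ only in the source range, and the Charter interface needs the same trio with its own address. Here is a sketch with hypothetical names (`add_snat`, `run`, `DRY_RUN`) that uses only the iptables options already shown above.

```shell
#!/bin/sh
# Hypothetical helper: print commands when DRY_RUN=1, otherwise execute them.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# add_snat DEVICE PUBLIC_ADDRESS: SNAT every private range leaving DEVICE
# to that link's public address; routing has already picked DEVICE.
add_snat() {
    dev="$1"; pub="$2"
    for net in 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16; do
        run iptables -t nat -A POSTROUTING -o "$dev" -s "$net" -j SNAT --to-source "$pub"
    done
}
```

For example, `DRY_RUN=1 add_snat eth2.4 192.0.2.1` prints three rules. Changing `-A` to `-D` in the function would give the matching cleanup.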
But wait! Now we have a problem. iptables connection tracking is going to remember these SNAT mappings, and, for instance, if you have a ping running, it will happily keep trying the dead interface after you take one down.
The fix I’m using is to clear the SNAT connection tracking information when an interface goes up or down. I use this in my /etc/network/interfaces stanzas (install conntrack first)…
# We need to make NAT'd addresses choose a new path
# e.g. ICMP echo will be stuck on a dead interface if it was using this one
up conntrack -D --src-nat
down conntrack -D --src-nat
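Instead of repeating those lines in every stanza, you could use an ifupdown hook. Debian's ifupdown runs the scripts in /etc/network/if-up.d/ and /etc/network/if-post-down.d/ with IFACE set to the interface name. The sketch below assumes that. The file name, `WAN_IFACES`, and eth2.5 as the Charter VLAN name are placeholders of mine. The `|| true` keeps a failing conntrack call (for example, when the tool is not installed) from disturbing ifup/ifdown.

```shell
#!/bin/sh
# Sketch of a hook, e.g. /etc/network/if-up.d/flush-snat, also linked into
# /etc/network/if-post-down.d/ (file name is a placeholder).
# WAN_IFACES lists the uplinks; eth2.5 is a hypothetical Charter VLAN name.
WAN_IFACES="${WAN_IFACES:-eth2.4 eth2.5}"

# is_wan NAME: succeed if NAME is one of the uplinks.
is_wan() {
    for w in $WAN_IFACES; do
        [ "$1" = "$w" ] && return 0
    done
    return 1
}

# Drop all source-NATed conntrack entries so new packets get a fresh
# SNAT decision based on the current routes.
flush_snat() {
    conntrack -D --src-nat >/dev/null 2>&1 || true
}

# ifupdown exports IFACE; do nothing for non-uplink interfaces.
if [ -n "${IFACE:-}" ] && is_wan "$IFACE"; then
    flush_snat
fi
```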
Choosing the Best Interface for Outgoing Traffic
You will want to use a metric on your default routes in order to choose the best one. (Alternatively you can get into load balancing, but my asymmetry is too high to care about that.)
I do this by not using the gateway declaration in my iface stanzas, and instead adding the route with an up command…
#gateway 99.178.257.62 --- but we want an explicit metric, so we do it this way
up ip route add default via 99.178.257.62 dev eth2.4 metric 1 || true
… that is my shunned AT&T connection. I use a metric of zero on the Charter line so traffic prefers it, but will use AT&T if Charter goes down.
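To see which default route wins, remember that among otherwise equal routes the lowest metric is preferred, and a route listed without a `metric` keyword has metric 0. The sketch below only mimics that tie-break over `ip route show default` output. `preferred_default` is a hypothetical helper, and it cannot tell whether the upstream is actually reachable.

```shell
#!/bin/sh
# preferred_default: read "ip route show default" output on stdin and print
# the route with the lowest metric. A route without a "metric" keyword
# counts as metric 0. On a tie this keeps the first line seen.
preferred_default() {
    awk '
        $1 == "default" {
            gw = ""; dev = ""; metric = 0
            for (i = 2; i < NF; i++) {
                if ($i == "via")    gw = $(i + 1)
                if ($i == "dev")    dev = $(i + 1)
                if ($i == "metric") metric = $(i + 1) + 0
            }
            if (!found || metric < best) {
                found = 1; best = metric
                out = dev " via " gw " metric " metric
            }
        }
        END { if (found) print out }
    '
}
```

Usage: `ip route show default | preferred_default`. With an AT&T route at metric 1 and a Charter route with no metric, it reports the Charter one.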
Now IPv6
IPv6 gets the same treatment, except you don’t have to screw with SNAT and conntrack, unless you really want to. Also, you will need some “-6” keystrokes. It helps to remember that those routing tables for att and charter are really four tables, two for IPv4 and two for IPv6.
I’ll just show you my Charter 6rd stanza, you can work it out from there.
iface charter6rd inet6 v4tunnel
# Force 6rd gateway to be on the Charter interface
pre-up ip route add 68.114.165.1 via 96.35.289.49 || true
# 2nd 32bits of this is my IPv4 address
address 2602:0100:6023:gd32::1
netmask 32
remote 68.114.165.1
endpoint 68.114.165.1
local 96.35.289.50
ttl 64
up ip -6 rule add from 2602:100:6023:gd32::/59 table charter || true
down ip -6 rule del from 2602:100:6023:gd32::/59 table charter || true
up ip -6 route add default dev charter6rd table charter
post-down ip route del 68.114.165.1 via 96.35.289.49
up ip -6 route add 2000::/3 dev charter6rd metric 5
down ip -6 route flush dev charter6rd
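The comment about the second 32 bits can be checked with a little arithmetic, because each IPv4 octet becomes two hex digits. This sketch assumes, as the stanza's comment and `netmask 32` suggest, a 32-bit 6rd prefix followed by the whole IPv4 address, which yields a /64. `sixrd_prefix` is a hypothetical helper, and 192.0.2.10 is a documentation address.

```shell
#!/bin/sh
# sixrd_prefix PREFIX32 IPV4: print the /64 formed by a 32-bit 6rd prefix
# (given as two hex groups, e.g. 2602:0100) followed by the IPv4 address.
sixrd_prefix() {
    prefix="$1"
    old_ifs="$IFS"
    IFS=.
    set -- $2            # split the dotted quad into $1..$4
    IFS="$old_ifs"
    [ $# -eq 4 ] || return 1
    for octet in "$@"; do
        # reject anything that is not an integer from 0 to 255
        [ "$octet" -ge 0 ] 2>/dev/null && [ "$octet" -le 255 ] || return 1
    done
    # each octet becomes two hex digits; two octets form one 16-bit group
    printf '%s:%02x%02x:%02x%02x::/64\n' "$prefix" "$1" "$2" "$3" "$4"
}
```

`sixrd_prefix 2602:0100 192.0.2.10` prints `2602:0100:c000:020a::/64`. In the same way, the leading 96.35 of the tunnel's local address becomes 6023, the third group of the address in the stanza.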
What Is Wrong With This Strategy
When one of the ISPs is broken, I need to bring down its interface; otherwise traffic will happily keep trying to use it. There may be automated ways to do this, but I’m a simple barbarian, and given the rarity of the events I just use a little cron job that, if it can’t see some portion of the internet out a particular interface, brings that interface down for a little while. I suppose playing with the default route metrics would be nicer, but like I said, simple barbarian. (I do have a nagging suspicion that if I were smarter about the load balancing it would “just work”. But I’m not.)
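The cron job itself isn't shown, so here is a rough sketch of the idea under my own assumptions. A probe host is pinged out of the interface with `ping -I`. On failure, the interface is taken down with `ifdown`, left down for a while, and brought back with `ifup`. `check_link`, `link_ok`, the probe host and the hold-down time are all placeholders. For the 6rd tunnel you would use an IPv6 probe host.

```shell
#!/bin/sh
# link_ok IFACE HOST: succeed if HOST answers at least one ping sent out of IFACE.
link_ok() {
    # -I binds the probes to the interface, -c 3 sends three probes,
    # -W 2 waits at most two seconds for each reply (iputils ping options)
    ping -I "$1" -c 3 -W 2 "$2" >/dev/null 2>&1
}

# check_link IFACE HOST HOLD_SECONDS: if the probe fails, take IFACE down,
# wait HOLD_SECONDS, then bring it back up for the next check.
check_link() {
    iface="$1"; host="$2"; hold="$3"
    if link_ok "$iface" "$host"; then
        return 0
    fi
    logger -t linkcheck "$iface cannot reach $host; down for ${hold}s"
    ifdown "$iface"
    sleep "$hold"
    ifup "$iface"
}
```

Appending a line such as `check_link eth2.4 192.0.2.1 300` (with a host that actually answers pings) and running the script from cron every few minutes approximates the behavior described. Because the `sleep` keeps the script running for the hold-down period, space the cron runs further apart than that or add locking.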