OpenWrt Upgrading
31/May 2021
Managing your home network is educational and lets you control a small data center of your own. When it comes to WiFi, vendor management consoles are consumer oriented, have limited functionality, and eventually lose support. The OpenWrt project can replace the firmware on many vendor devices with an open-source, embedded Linux system focused on network management. I tried it out in 2015 and upgraded it a few times, but for years now I have been unsuccessful in upgrading it to the latest release. I will chronicle the successive automation steps and a framework leading to benchmarking OpenWrt!
WiFi Access Point Devices
Back in the late 90’s, a housemate who worked at Juniper? provided a WiFi B device, and we experienced wireless networking, which was an amazing, easy, and liberating experience!
Later, we moved homes, and I purchased a D-Link device for WiFi G? access. It was a much faster network and easy to configure through a web console. I could finally experiment and learn about NAT, DHCP, and firewall port-forwarding on our home network. I had a Linux iptables-based firewall with a web GUI (the project is now discontinued) on a repurposed i386 Dell Optiplex with extra NICs protecting the WiFi access point. Again, the D-Link consumer experience was great, so I never thought about long-term network management; it was a pet. When that device started to fail in 2015, I researched open source alternatives because vendor firmware could have security bugs, suffer a slow patch cadence, and become unsupported. I had encountered the same situation with cable and DSL modems.
I found the OpenWrt project, purchased a Linksys AC1900 device, and installed OpenWrt 15. Again, it was easy to run the B-G and N-AC networks, so I never thought about long-term management; it remained a pet. Some of the house devices couldn’t support the latest PSK authentication, so we settled for the lowest common denominator for wireless authentication. When we moved in 2016, I decommissioned many devices, including the firewall. I was happy enough with OpenWrt on Linux to use it for our entire home network. One feature that distinguished it was dual firmware install partitions: if an upgrade or configuration change failed, rebooting to the previous partition or performing a factory reset could return you to production quickly. This facilitated experimentation and learning!
Upgrading OpenWrt
The OpenWrt project community split and LEDE seemed to be a path forward for upgrading, but upon upgrading and rebooting, the LEDE firmware never seemed to work. Fortunately, a reboot would return OpenWrt to a working configuration. My family’s demands for WiFi uptime at home made it onerous to run any experiment that involved much downtime.
LEDE would come up with a web console and I could configure it, but the upstream WAN network to the ISP didn’t work. I asked for help in the forums, but it was hard to troubleshoot because I couldn’t figure out any problems from the web console logs or error messages. Uptime demands limited my attempts, so it never occurred to me to start troubleshooting from SSH/CLI, which eventually helped reveal the root cause. I was stuck without security upgrades for a long time: a hand-maintained pet that was end of life! This was no better than a dead-end commercial device, but open source allows the story to continue.
Fortunately, the LEDE project and OpenWrt reconciled over the years. When I learned that OpenWrt wouldn’t continue to support my Linksys device because it lacked enough memory for the modern releases, I looked for a second, modern WiFi device to replace it: I wanted the latest standard and highest speed, of course! My research showed that WiFi 6 vendors hadn’t supported open source, so drivers for OpenWrt would be a long time coming, probably through reverse engineering. So I looked through the forums for a decent, cheap, well-resourced modern WiFi device and landed on a Netgear AC2000/WiFi 5 device.
I grabbed a used one off eBay for $50, upgraded the vendor firmware, and then installed OpenWrt, but I couldn’t get it to work because I couldn’t access the web console even though it did come up on the network. I tried another used Netgear device off eBay for another $50; it was mislabeled and turned out to be an AC1750 device. After upgrading the vendor firmware, I installed OpenWrt, but had the same results. After a few tries at reading the OpenWrt documentation, I finally learned about the default SSH experience. My failure was because the default installation does not enable the web console and does not provide default WiFi networking! One must SSH into the device, set a password, and enable the web GUI before that consumer-like experience returns.
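The first-boot steps above can be sketched as a small Python helper that only builds the SSH command lines rather than running them. It assumes OpenWrt's defaults (root login, LAN address 192.168.1.1), an image where the LuCI web console is not preinstalled, and opkg as the package manager; the helper names are hypothetical and the exact steps may differ by release.

```python
import shlex

# Assumptions (not from the text above): default LAN address and root login,
# opkg as the package manager, and LuCI served by uhttpd.
DEFAULT_HOST = "192.168.1.1"

BOOTSTRAP_STEPS = [
    "passwd",                     # set the root password (interactive)
    "opkg update",                # refresh the package lists
    "opkg install luci",          # install the web console
    "/etc/init.d/uhttpd enable",  # start the web server on every boot
    "/etc/init.d/uhttpd start",   # start it now
]


def ssh_command(step, host=DEFAULT_HOST, user="root"):
    """Return the argv list that would run one step over SSH."""
    # -t allocates a terminal so the interactive passwd prompt works.
    return ["ssh", "-t", f"{user}@{host}", step]


def bootstrap_script(host=DEFAULT_HOST):
    """Render every step as a copy-pasteable shell line."""
    return [shlex.join(ssh_command(step, host)) for step in BOOTSTRAP_STEPS]


if __name__ == "__main__":
    print("\n".join(bootstrap_script()))
```

Printing the commands instead of executing them keeps the sketch safe to run on a desktop; the lines can be reviewed and pasted one at a time.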
Known Unknowns
After configuration through the web console, I cut over to the Netgear and turned off the Linksys. Much like LEDE on the Linksys, the Netgear couldn’t connect to the WAN via the cable modem, so I turned off the Linksys WiFi NAC radios and connected the Netgear behind the Linksys, which served as its WAN uplink and continued to handle WiFi B, and that worked.
I found that the PlayStation still couldn’t support the latest PSK2 authentication, but that wasn’t the real problem. After a day, my family complained about disconnects and problems getting on WiFi. Was it a placement/signal issue? Was it a driver issue? While I did see a lower signal using an Android WiFi app, it wasn’t much lower and my usage seemed fine. What was the best way to test?
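One hedged answer to that question is to measure packet loss from a client over time instead of relying on impressions. The sketch below parses the summary line printed by `ping`, assuming the summary formats used by BusyBox and iputils; the function name and the sample line are illustrative, not measurements from my network.

```python
import re

# Matches "10 packets transmitted, 8 packets received" (BusyBox style)
# and "10 packets transmitted, 8 received" (iputils style).
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def packet_loss(ping_output):
    """Return the fraction of lost packets from ping's summary, or None."""
    match = SUMMARY.search(ping_output)
    if match is None:
        return None
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        return None
    return (sent - received) / sent


if __name__ == "__main__":
    # Made-up sample line, for illustration only.
    sample = "10 packets transmitted, 8 received, 20% packet loss, time 9012ms"
    print(packet_loss(sample))
```

Running this against periodic `ping` output to the access point, from the spots where family members sit, would separate a placement problem from a device that stops responding altogether.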
After a few days, the device seemed unresponsive overnight, so I rebooted it. I wasn’t happy with its stability and with the AC1750 being slightly slower than my original Linksys AC1900, so I reverted to the Linksys. I had more questions than answers and limited time!
Earlier in 2021, as AlpineLinux.org became a popular, minimal Linux OS for container base images, I learned about BusyBox and became comfortable with it. Alpine used a different runlevel and packaging system, but they were easy to learn. OpenWrt leveraged BusyBox and used its own packaging system. I tried more experiments on the Netgear and found I could install Python3 and perform an Ansible inventory, so that was progress!
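For illustration, an Ansible inventory for such a device can be produced by a small dynamic inventory script. This is a minimal sketch, not the inventory I actually used: the host name `netgear-test` and the group name `openwrt` are hypothetical, 192.168.1.1 is simply OpenWrt's default LAN address, and the interpreter path assumes Python 3 was installed on the router through the package manager.

```python
#!/usr/bin/env python3
import json
import sys

# Hypothetical host name mapped to OpenWrt's default LAN address.
ROUTERS = {"netgear-test": "192.168.1.1"}


def build_inventory(routers):
    """Return a dict in Ansible's dynamic inventory JSON layout."""
    hostvars = {
        name: {
            "ansible_host": address,
            "ansible_user": "root",
            # Python 3 has to be installed on the router first.
            "ansible_python_interpreter": "/usr/bin/python3",
        }
        for name, address in routers.items()
    }
    return {
        "openwrt": {"hosts": sorted(routers)},
        "_meta": {"hostvars": hostvars},
    }


if __name__ == "__main__":
    # Ansible calls inventory scripts with --list; supplying _meta means
    # it does not need to call --host for each device.
    if "--list" in sys.argv:
        print(json.dumps(build_inventory(ROUTERS), indent=2))
    else:
        print(json.dumps({}))
```

Saved as an executable file (say `inventory.py`, a name chosen here for the example), it could be passed to Ansible with `-i ./inventory.py` to gather facts from the `openwrt` group.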
A New Plan: Cattle
I searched for the latest OpenWrt-supported Linksys and purchased a new one, which turned out to be refurbished, so I got a 33% discount, which made me happy. I put it on the LAN, updated the vendor firmware, installed OpenWrt, configured it over SSH, and enabled the web console: all good! Configured and tested WiFi, all good! Then I tried to migrate from the old Linksys on the WAN, but the new Linksys device couldn’t reach the WAN; this was the same problem as before.
We had a lightning storm that damaged my ethernet switch and a home server NIC, which stopped progress until I replaced the switch. When I returned to experimentation, I found that the DHCP IPv4 client on the new Linksys didn’t work with the cable modem.
I SSH’d into the older Linksys and started to diagnose DHCP from the CLI. DHCP worked, so I captured the MAC address of the NIC.
Now that I had more than one pet WiFi device, it was time to cattle the WAN behind the cable modem with a switch rather than directly connecting the WiFi device. I could speed up cycle time by switching my desktop ethernet into each WiFi device to test its configuration without disrupting the entire network’s connection to the cable modem.
classDiagram
    ISP <|-- CableModem
    CableModem <|-- Switch
    Switch <|-- WiFiRouter1
    Switch <|-- WiFiRouter2
    ISP: DHCP(DNS,IP)
    class CableModem{
        IPv4 FromISP
        Firmware Proprietary
        +DHCP()
        +SNMP()
        +EthernetPort1(Switch)
    }
    class Switch{
        Unmanaged
        +EthernetPort1(WiFiRouter1)
        +EthernetPort2(WiFiRouter2)
        +EthernetPort3-4(idle)
        +EthernetPort5(CableModem)
    }
    class WiFiRouter1{
        +Manufacturer Linksys
        +Model WRT1900AC
        +Firmware OpenWrt_15
        -EthernetPort5(Switch)
        -WANIPv4(FromCableModemDHCP)
        +DHCP()
        +SNMP?()
        +EthernetPort1-4(idle)
        +WiFi(BGN)
        +WiFi(NAC)
    }
    class WiFiRouter2{
        +Manufacturer Linksys
        +Model WRT32X
        +Firmware OpenWrt_19
        -EthernetPort5(Switch)
        -WANIPv4(FromCableModemDHCP)
        +DHCP()
        +SNMP?()
        +EthernetPort1-4(idle)
        +WiFi(BGN)
        +WiFi(NAC)
    }
What I found was that getting a DHCP lease from the cable modem was always unsuccessful for the new device and always successful for the original device. If I cloned the original device’s MAC address to the new device, it worked. I was puzzled: why should a MAC address matter? I found an OpenWrt forum post describing a hidden “feature” of cable modems: they only allow a new DHCP lease within a short period (two minutes for my modem) after booting.
A cable modem reboot allows one pet DHCP lease, and the new device worked fine after this operation. While I can understand how this stabilizes the network for most consumers, it is counter-intuitive to the notion of a dynamic lease. When you have access to a cable modem management web GUI, you can learn and troubleshoot the network. To save on monthly rental costs, I had bought a Zoom modem and replaced it with an Arris when it failed after many years (it was out of warranty and the manufacturer had left the market). Even though I could see the logs, the modem never reported rejecting a new DHCP lease request. Arris tech support wasn’t helpful and I have never found this in any documentation! That is evil and had an opportunity cost of many troubleshooting hours over months while the new device capabilities were idle!
Since solving this issue, the Arris device has failed (out of warranty) and my internet provider now provides the first cable modem without fees. I replaced it for free, but lost management and troubleshooting access; it has the same DHCP lease operational behavior.
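The lease behavior described in this section can be captured in a toy model. This is only a sketch of what I observed from the outside, under the assumption that the modem binds to the first MAC address that asks within the window after boot; it is not based on any modem firmware or vendor documentation, and the class and MAC addresses are made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CableModemModel:
    """Toy model of the observed behavior, not of real modem firmware."""

    lease_window_s: float = 120.0  # two minutes, per the text above
    booted_at: float = 0.0
    bound_mac: Optional[str] = None

    def reboot(self, now):
        """A reboot forgets the bound MAC and reopens the lease window."""
        self.booted_at = now
        self.bound_mac = None

    def request_lease(self, mac, now):
        """Return True if a DHCP request from `mac` at time `now` succeeds."""
        mac = mac.lower()
        if self.bound_mac == mac:
            return True  # the known device, or a clone of its MAC, renews
        window_open = now - self.booted_at <= self.lease_window_s
        if self.bound_mac is None and window_open:
            self.bound_mac = mac  # first device after boot wins
            return True
        return False  # request is ignored and nothing shows up in the logs


if __name__ == "__main__":
    old_router = "02:00:00:00:00:01"  # made-up, locally administered MACs
    new_router = "02:00:00:00:00:02"
    modem = CableModemModel()
    modem.reboot(now=0)
    print(modem.request_lease(old_router, now=10))
    print(modem.request_lease(new_router, now=20))
    print(modem.request_lease(old_router, now=30))  # new router with cloned MAC
```

The model makes both workarounds visible: cloning the old MAC address succeeds at any time, while a new MAC address only succeeds right after a modem reboot.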
Automation with a Virtual Pet
Creating a VirtualBox OpenWrt VM allowed me to experiment with Ansible automation and a Galaxy role. I automate my split DNS configuration, the provisioning of an API key for Home-Assistant device presence automation and a HomePage widget, and full device configuration backup blobs. I haven’t been able to recreate any network configuration; I’m unsure how to mock WiFi radios, and virtual NICs aren’t a suitable substitute.
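As a rough illustration of what such automation has to emit for split DNS, the sketch below renders `uci` commands that add local `address` overrides to dnsmasq. The host name and address are hypothetical, the function name is invented for the example, and this is not the Galaxy role's actual implementation.

```python
def split_dns_commands(overrides):
    """Render uci commands that map local names to LAN addresses in dnsmasq."""
    commands = []
    for name, address in sorted(overrides.items()):
        # dnsmasq's address=/name/ip override, stored as a uci list entry.
        commands.append(
            f"uci add_list dhcp.@dnsmasq[0].address='/{name}/{address}'"
        )
    commands.append("uci commit dhcp")              # persist the change
    commands.append("/etc/init.d/dnsmasq restart")  # apply it
    return commands


if __name__ == "__main__":
    # Hypothetical internal name that should resolve to a LAN address.
    for line in split_dns_commands({"ha.example.lan": "192.168.1.20"}):
        print(line)
```

Generating the commands as plain strings makes them easy to try against the VM over SSH before pointing anything at the production device.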
I was able to experiment with the new OpenWrt 24.10 series virtually, giving me the confidence to update my pet device.
The Future
I haven’t started proper experiments to learn and tune all of the new network parameters, and I won’t until the entire OpenWrt configuration is automated. The network configuration remains a pet; fortunately it is very simple to recreate by hand if necessary, and I also have backups. I’ve conducted manual operations for tweaks with small experimental observations, but I minimize that effort to keep stability and uptime.
Automating VM creation and running with KVM is another task in progress with Ansible (versus Packer and OpenTofu).
Proper network parameter tuning requires automation to perform, record, regress, and analyze bandwidth and latency measurements. I plan to fully automate before getting a second hardware device, hopefully with a new generation of WiFi support. I want to isolate my living room and IoT devices with a second access point/dumb repeater, moving them to ethernet when possible to reduce WiFi congestion.
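A minimal sketch of the record-and-regress part of that loop, assuming measurements are already collected into CSV with a header row: it compares the median of a candidate run against a baseline run. The column names, the 10% tolerance, and the function names are assumptions made for the example, and how the measurements are gathered is left open.

```python
import csv
import io
import statistics


def load_samples(csv_text, column):
    """Read one numeric column from CSV text that has a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [float(row[column]) for row in reader]


def regressed(baseline, candidate, tolerance=0.10, higher_is_better=True):
    """Return True if the candidate median is worse by more than tolerance."""
    base = statistics.median(baseline)
    cand = statistics.median(candidate)
    if higher_is_better:  # e.g. throughput
        return cand < base * (1 - tolerance)
    return cand > base * (1 + tolerance)  # e.g. latency


if __name__ == "__main__":
    # Made-up numbers for illustration, not measurements from my network.
    before = "mbps,latency_ms\n100,10\n102,11\n98,12\n"
    after = "mbps,latency_ms\n80,13\n82,14\n79,15\n"
    print(regressed(load_samples(before, "mbps"), load_samples(after, "mbps")))
```

Using the median rather than the mean keeps a single bad sample, such as a microwave running during the test, from flagging a regression on its own.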