Fortinet AP Failure: Control Message Maximal Retransmission Limit Reached

I was troubleshooting the wireless network for a remote office because a lot of users were complaining that the WiFi was unstable and their devices were being randomly disconnected. I checked the log and found that some APs had been disconnected from the controller due to “Control message maximal retransmission limit reached”. Then I came across this document from Fortinet: https://kb.fortinet.com/kb/documentLink.do?externalID=FD40970

These messages imply that the keep alive packets ‘ECHO REQ (FGT)’ and ‘ECHO RESPONSE (FAP)’ were not successful or complete.

So basically, the APs are having trouble talking to the controller, and the controller thinks the APs are dead because it doesn’t get any response from them.

The solution from the Fortinet document is to raise max-retransmit from its default of 3 to 25 (valid range 0-64) and to raise echo-interval to 100 (valid range 1-255):

config wireless-controller global
    set max-retransmit 25
end
config wireless-controller timers
    set echo-interval 100
end

This makes the controller more tolerant: a longer echo interval spaces the keep alive packets further apart, and a higher retransmit limit allows more retries before the AP is considered dead. Great! Perhaps this will solve my problems. In retrospect, this solution only scratched the surface and didn’t address why the controller and the APs were having trouble talking to each other in the first place.

After the configuration changes, users complained that their devices were still being disconnected from the WiFi frequently. So, I looked at the log again and found that the AP failures persisted. The relaxed keep alive settings weren’t helping. Therefore, I opened a ticket with Fortinet support, and they promptly scheduled a remote session to investigate the controller and AP configuration. They made some optimizations, such as reducing the transmit power (there was a lot of interference in the 2.4 GHz band) and setting darrp-optimize to 0. But most importantly, they mentioned that this might be caused by latency on the wired network between the APs and the FortiGate.

That made a lot of sense. All this time I had suspected that this was merely a configuration glitch and never the physical network, because the same setup is used in all the other offices. So, when the APs were dropped by the controller again, I decided to look at the physical topology.

It is a very simple topology. The APs are connected to a Linksys unmanaged switch, which has an uplink to the Fortinet firewall. Could the problem be caused by the unmanaged switch in the middle, I wondered? As the remote office didn’t have a spare switch lying around and there were enough ports on the firewall for the APs, I decided to connect all the APs directly to the firewall, ruling out any issues that might have been caused by the middleman. By this time, most users had fallen back to the previous WiFi network we were trying to retire. The local IT team monitored the FortiAPs for a few days and concluded that the failures had stopped.

Looking back, the error messages “Control message maximal retransmission limit reached” and “Echo request timeout” indicate that this was possibly a Layer 2 issue, and that the solution lay in the Layer 2 switch and the Layer 1 physical wiring.
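If I had to do it again, I would check the wired path before touching the wireless settings. Here is a minimal Python sketch of that idea, assuming a Linux or macOS host on the same network as the APs and the system ping command. The function names are my own, not anything from Fortinet. It pings an AP and pulls the packet loss and average round-trip time out of ping's summary:

```python
import re
import subprocess


def parse_ping_summary(output: str) -> dict:
    """Extract packet loss (%) and average RTT (ms) from ping's summary text."""
    loss = re.search(r"([\d.]+)% packet loss", output)
    # Linux prints "rtt min/avg/max/mdev = a/b/c/d ms";
    # macOS/BSD prints "round-trip min/avg/max/stddev = a/b/c/d ms".
    rtt = re.search(r"= [\d.]+/([\d.]+)/[\d.]+/[\d.]+ ms", output)
    return {
        "loss_pct": float(loss.group(1)) if loss else None,
        # No RTT line is printed when every packet is lost.
        "avg_ms": float(rtt.group(1)) if rtt else None,
    }


def ping_ap(host: str, count: int = 20) -> dict:
    """Ping one AP (Linux/macOS 'ping -c' syntax) and return the parsed summary."""
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True,
        text=True,
    )
    return parse_ping_summary(result.stdout)
```

Calling ping_ap() with each AP's management address (for example a made-up one like 192.168.1.11) once through the switch and once with the AP plugged straight into the firewall gives a rough before/after comparison. Any packet loss on a small wired LAN is a hint that the switch or cabling deserves a closer look.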

In the next post, I will write a simple tutorial on how to connect multiple APs to a FortiGate firewall.