So last week I attempted an upgrade of our ACI environment from version 1.2 to version 1.3. I know 2.0 is already available, but it does not offer anything we need at this point, and we only upgraded to 1.3 because of an annoying bug.
A minor upgrade shouldn’t be a big issue, but apparently it was.
We started the upgrade normally. We uploaded the new software and started the APIC upgrade. Easy. Just follow the upgrade instructions and you’ll be fine. The APICs will all install the new software and reboot when required. ACI is even smart enough to wait for a rebooting controller to come back online and pass all the health checks. You can’t do anything wrong.
At least, that’s what we thought. After installing the new version we couldn’t reach the APICs using HTTPS anymore. After some troubleshooting we had the following information:
- Ping doesn’t work
- SSH does work
- HTTP(S) doesn’t work
We started looking further. A colleague of mine tried to access the APIC from a server in the same network as the APICs (we use the out-of-band addresses on the APIC). That worked. We were baffled. Because of this behaviour we knew it had to be a policy on the APIC itself. Fortunately, the server in the same subnet as the APIC gave us HTTPS access to ACI again, which made it possible to troubleshoot. However, since we’re both fairly new at this and weren’t the guys who implemented the network, we didn’t know where to look.
Fortunately the supplier did know where to look and helped us fix the problem. It was indeed a policy. I’ll come back to this topic in a bit.
Unfortunately this issue was my own fault. It is documented in the release notes for version 1.2(2), which I only glanced over when preparing for the change. The actual text from Cisco is:
When upgrading to the 1.2(2) release, a non-default out-of-band contract applied to the out-of-band node management endpoint group can cause unexpected connectivity issues to the APICs. This is because prior to the 1.2(2) release, the default out-of-band contract that was associated with the out-of-band endpoint group would allow all default port access from any address. In 1.2(2), when a contract is provided on the out-of-band node management endpoint group, the default APIC out-of-band contract source address changes from any source address to only the local subnet that is configured on the out-of-band node management address. Thus, if an incorrectly configured out-of-band contract is present that had no impact in 1.2(1) and prior releases, upgrading to the 1.2(2) release can cause a loss of access to the APICs from the non-local subnets.
These release notes can be found here.
For all of you preparing to upgrade from 1.2(1) to a higher version, please remember this one, as it will bite you.
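To make the rule from the release notes a bit more concrete, here is a small Python sketch of how I understand it. This is a simplified mental model, not how the APIC actually evaluates contracts: the function name `is_allowed`, the `contract_ports` parameter and the example addresses (taken from the documentation ranges) are all made up for illustration, and it ignores things like ICMP and external management subnets.

```python
import ipaddress


def is_allowed(source_ip, port, oob_address, contract_ports):
    """Simplified model of out-of-band access after upgrading to 1.2(2) or later.

    contract_ports is None when no non-default contract is provided on the
    out-of-band EPG, otherwise the set of TCP ports that contract permits
    (a hypothetical simplification of a real contract).
    """
    # No custom contract: the default contract still allows any source address.
    if contract_ports is None:
        return True

    # Custom contract present: the default access shrinks to the local subnet
    # configured on the out-of-band management address.
    local_subnet = ipaddress.ip_network(oob_address, strict=False)
    if ipaddress.ip_address(source_ip) in local_subnet:
        return True

    # Anything from a non-local subnet must be permitted by the custom contract.
    return port in contract_ports


# Hypothetical setup resembling ours: the custom contract only permits SSH.
apic_oob = "192.0.2.10/24"
print(is_allowed("198.51.100.7", 22, apic_oob, {22}))   # SSH from a remote subnet
print(is_allowed("198.51.100.7", 443, apic_oob, {22}))  # HTTPS from a remote subnet
print(is_allowed("192.0.2.50", 443, apic_oob, {22}))    # HTTPS from the local subnet
```

With a contract that only permits SSH, this model reproduces what we saw: SSH works from anywhere, while HTTPS only works from a server in the same subnet as the APIC.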
To check whether you will encounter this you can go to Tenants > mgmt > Node Management EPGs > Out of Band EPG – default.
Here you can view whether you use the default contract. In our case a non-default contract was specified here. You can look up this contract at: Tenants > mgmt > Out of Band Contracts > Name of your contract
You need to specify HTTPS access in this contract to be able to reach the APIC.
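If you prefer to check this outside the GUI, the same information should be available through the APIC REST API. The sketch below only parses a JSON response and lists the contracts provided by the out-of-band EPG. It assumes the EPG is returned as a `mgmtOoB` object and that each provided contract appears as a `mgmtRsOoBProv` child with a `tnVzOOBBrCPName` attribute; please verify those names against your own APIC before relying on them. The sample response and the contract name `oob-custom` are hypothetical.

```python
import json


def provided_oob_contracts(response_text):
    """Return the names of the contracts provided by the out-of-band EPG.

    Assumes the response has the usual {"imdata": [...]} shape with the
    out-of-band EPG (mgmtOoB) and its children nested inside it.
    """
    payload = json.loads(response_text)
    names = []
    for item in payload.get("imdata", []):
        epg = item.get("mgmtOoB", {})
        for child in epg.get("children", []):
            relation = child.get("mgmtRsOoBProv")
            if relation is not None:
                names.append(relation["attributes"]["tnVzOOBBrCPName"])
    return names


# Hypothetical sample response, not taken from a real APIC.
sample = """
{"imdata": [{"mgmtOoB": {
    "attributes": {"name": "default"},
    "children": [
        {"mgmtRsOoBProv": {"attributes": {"tnVzOOBBrCPName": "oob-custom"}}}
    ]}}]}
"""
print(provided_oob_contracts(sample))
```

Any contract name that shows up here and is not the default one is worth opening under Out of Band Contracts to see which ports it actually permits.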
Unfortunately I can’t post any screenshots here, as the environment in question is a production environment which I’m not allowed to show, but if you have any questions or need clarification, please let me know.