Lessons captured after the project closed -- the kind of detail that doesn't make it into the Cisco docs.
We upgraded 13 Cat9300X / 9500 switches in install-mode and came out with a list of things we wish someone had told us up front. In rough order of "most likely to bite you":
If you don't issue install commit within 360 minutes of install activate, the switch reverts on its own. Sounds great until your debug session crosses the line.
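A minimal sketch for keeping an eye on that window, assuming the 360-minute figure above holds on your platform (check it before relying on it). The function names and the timestamps are made up for illustration.

```python
from datetime import datetime, timedelta

# Window quoted in the note above; treat it as an assumption to verify per platform.
ROLLBACK_WINDOW = timedelta(minutes=360)


def commit_deadline(activated_at):
    """Latest moment install commit can be issued before the switch reverts."""
    return activated_at + ROLLBACK_WINDOW


def minutes_left(activated_at, now):
    """Whole minutes remaining; a negative value means the window has passed."""
    remaining = commit_deadline(activated_at) - now
    return int(remaining.total_seconds() // 60)


# Hypothetical times: activate at 22:00, check again at 03:30 the next day.
activated = datetime(2024, 1, 10, 22, 0)
print(commit_deadline(activated))                              # 2024-01-11 04:00:00
print(minutes_left(activated, datetime(2024, 1, 11, 3, 30)))   # 30
```

Logging the deadline at activate time makes it obvious when a long debug session is about to cross it.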
install commit needs an interactive shell. You can't fire-and-forget it from a single SSH command. Use paramiko invoke_shell(), enable, then send the commit -- otherwise AAA exec authorization may quietly refuse the elevation.
install activate, not install activate file ... -- once you've added the image, the bare form is correct. The longer form fails silently in some 17.x trains.
write memory before install add. If you skip this and the add fails part-way, you can lose unsaved config across the reload. Save first.
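The interactive-shell approach can be sketched roughly as follows. This is an illustration, not the script we ran: open_shell, read_until and commit_in_shell are hypothetical helper names, the prompt matching is deliberately naive (it assumes prompts end in ">" or "#" and that enable asks with "Password:"), and the AutoAddPolicy host-key handling is only reasonable in a lab.

```python
import time


def open_shell(host, username, password):
    """Open an interactive shell channel (requires the paramiko package)."""
    import paramiko  # imported lazily so the helpers below work without it

    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())  # lab-only policy
    client.connect(host, username=username, password=password,
                   look_for_keys=False, allow_agent=False)
    return client, client.invoke_shell()


def read_until(chan, endings, timeout=30.0):
    """Collect output until it ends with one of `endings`, or raise on timeout."""
    buf = ""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if chan.recv_ready():
            buf += chan.recv(65535).decode("utf-8", errors="replace")
            if buf.rstrip().endswith(tuple(endings)):
                return buf
        else:
            time.sleep(0.1)
    raise TimeoutError(f"no prompt matching {endings!r}; tail: {buf[-200:]!r}")


def commit_in_shell(chan, enable_secret, commit_timeout=600.0):
    """Elevate if needed, send install commit, and return the raw output."""
    banner = read_until(chan, (">", "#"))
    if banner.rstrip().endswith(">"):
        # Still in user EXEC: elevate inside the same interactive session.
        chan.send("enable\n")
        read_until(chan, ("Password:",))
        chan.send(enable_secret + "\n")
        read_until(chan, ("#",))
    chan.send("install commit\n")
    # The caller should inspect this text rather than assume success.
    return read_until(chan, ("#",), timeout=commit_timeout)
```

Usage would look something like `client, chan = open_shell(host, user, password)` followed by `commit_in_shell(chan, secret)` and `client.close()`. Splitting the channel handling from the connection setup also makes the prompt logic testable against a fake channel.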
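Put together, the ordering these notes argue for can be sketched as a simple command plan. The function name and the image path passed to it are hypothetical, and each step still needs its own verification before moving to the next one.

```python
def upgrade_steps(image_path):
    """Ordered CLI steps for one switch; image_path is a placeholder flash path."""
    return [
        "write memory",                    # save config before touching the image
        f"install add file {image_path}",  # stage the new image
        "install activate",                # bare form once the image has been added
        "install commit",                  # only after the new image has been verified
    ]


for step in upgrade_steps("flash:example-image.bin"):
    print(step)
```

Cleanup of inactive images is deliberately not part of this plan.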
show install summary distinguishes "U" (uncommitted / active) from "C" (committed). They are not the same. Don't claim "done" on U.
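One way to enforce that in automation is to parse the state column instead of eyeballing it. The sketch below assumes the usual "Type  St  Filename/Version" row layout. The sample text and its version string are illustrative rather than captured from our switches, and image_states / is_committed are hypothetical names.

```python
import re

# A data row looks like: "IMG   C    <version>" (type, one-letter state, name).
ROW = re.compile(r"(\w+)[ \t]+([IUCD])[ \t]+(\S+)")


def image_states(summary_text):
    """Return (type, state, version) tuples found in show install summary output."""
    rows = []
    for line in summary_text.splitlines():
        m = ROW.fullmatch(line.strip())
        if m:
            rows.append(m.groups())
    return rows


def is_committed(summary_text):
    """True only if an IMG row is in state C and no IMG row is still in state U."""
    states = [st for typ, st, _ in image_states(summary_text) if typ == "IMG"]
    return "C" in states and "U" not in states


SAMPLE = """\
State (St): I - Inactive, U - Activated & Uncommitted,
            C - Activated & Committed, D - Deactivated & Uncommitted
Type  St   Filename/Version
IMG   U    17.09.04a.0.6
"""
print(is_committed(SAMPLE))  # False: activated but not yet committed
```

Matching whole lines keeps the legend and header rows from being mistaken for data.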
An aaa authorization exec default group radius local if-authenticated chain stops on explicit AUTHZ-REJECT from RADIUS rather than falling through. We locked ourselves out of one switch this way; recovery required serial console.
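The behaviour is easier to reason about as a toy model: the chain only moves to the next method when the current one gives no answer at all, and any explicit answer, including a reject, ends the evaluation. The code below models what we observed, not IOS internals, and the outcome labels are our own.

```python
def authorize(methods):
    """Evaluate (name, outcome) pairs in configured order.

    outcome is "accept", "reject", or "error" (server unreachable / no reply).
    Returns (deciding_method, allowed).
    """
    for name, outcome in methods:
        if outcome == "error":
            continue  # only a non-answer falls through to the next method
        return name, outcome == "accept"
    return None, False  # every method errored out


# RADIUS reachable but rejecting: local is never consulted.
print(authorize([("radius", "reject"), ("local", "accept")]))  # ('radius', False)
# RADIUS unreachable: the chain falls through to local.
print(authorize([("radius", "error"), ("local", "accept")]))   # ('local', True)
```

In other words, the trailing methods are a fallback for an unreachable server, not a second opinion on a reject.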
SCP push is slow (~140 KB/s) and shares the AAA path. Switch-initiated copy http://... hit ~5.8 MB/s and didn't get killed by ANSIBLE_PERSISTENT_COMMAND_TIMEOUT on transient drops.
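To put those rates in perspective, here is the arithmetic for a single image. The 1 GB image size is an assumption for illustration (the notes don't record the actual size), and the rates are read as decimal KB/s and MB/s.

```python
def transfer_minutes(size_bytes, rate_bytes_per_s):
    """Ideal transfer time in minutes, ignoring retries and protocol overhead."""
    return size_bytes / rate_bytes_per_s / 60


IMAGE_BYTES = 1_000_000_000   # hypothetical image size (1 GB)
SCP_RATE = 140_000            # ~140 KB/s, as observed for SCP push
HTTP_RATE = 5_800_000         # ~5.8 MB/s, as observed for switch-initiated HTTP copy

print(f"SCP:  {transfer_minutes(IMAGE_BYTES, SCP_RATE):.1f} min")   # SCP:  119.0 min
print(f"HTTP: {transfer_minutes(IMAGE_BYTES, HTTP_RATE):.1f} min")  # HTTP: 2.9 min
```

At roughly two hours per image, a transfer over SCP also has far more time to hit a transient drop.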
Don't fold install remove inactive into the upgrade playbook. Run it as a separate, deliberate pass once you've confirmed the new image is stable.
Several older 3750s in the same fleet were stuck on SNMPv3 auth/noauth only. Audit the image with show version | inc Cisco IOS Software before planning an SNMPv3 push.