VMware ‘Disable DelayedAck’ Does Not Work?

KB ID 0001525

Problem

I’ve got a client that’s been having some performance issues with their VMs. Their storage vendor, (EMC) said that as a result of finding this in the logs;

[box]

B       02/28/19 09:50:53.953 scsitarg          117000e [INFO] System: iSCSI Logout Initiator Data: IP=192.168.200.161 Name=...-ec-21 Target Data: Port=2 Flags=0x00002002 Info=0x01200801
B       02/28/19 09:50:53.969 scsitarg          117000e [INFO] System: iSCSI Logout Initiator Data: IP=192.168.201.161 Name=...-ec-21 Target Data: Port=3 Flags=0x00002002 Info=0x01200801
B       02/28/19 09:51:16.413 Health              608fe [WARN] User: Host ESXi-01.petenetlive.com does not have any initiators logged into the storage system.
A       02/28/19 10:04:25.968 scsitarg          117000d [INFO] System: iSCSI Login Initiator Data: IP=192.168.200.161 Name=...-ec-21 Target Data: Port=2 Flags=0x00002002 Info=0x00000000 [Target]
B       02/28/19 10:04:26.034 scsitarg          117000d [INFO] System: iSCSI Login Initiator Data: IP=192.168.200.161 Name=...-ec-21 Target Data: Port=2 Flags=0x00002002 Info=0x00000000
A       02/28/19 10:04:31.996 scsitarg          117000d [INFO] System: iSCSI Login Initiator Data: IP=192.168.201.161 Name=...-ec-21 Target Data: Port=3 Flags=0x00002002 Info=0x00000000 [Target]
B       02/28/19 10:04:32.055 scsitarg          117000d [INFO] System: iSCSI Login Initiator Data: IP=192.168.201.161 Name=...-ec-21 Target Data: Port=3 Flags=0x00002002 Info=0x00000000
B       02/28/19 10:04:57.438 Health              608fc [INFO] User: Host ESXi-01.petenetlive.com is operating normally.
Host Host ESXi-01.petenetlive.com is accessing lun Datastore_3 as HLU 3, After the initiators for this host start logging in/logging,  unit attention update events will be logged as the paths to the luns have changed this is expected
2019/02/28-09:50:41.607527 ~~~~     7F3C92369703      std:TCD:   Unit Attention update from 0000001A to 0001030D for LUN 0x3.
2019/02/28-10:02:55.860669 ~~~~     7FE476E61702      std:TCD:   Unit Attention update from 00010149 to 00010157 for LUN 0x3.

[/box]

We should disable DelayedAck and they kindly gave me the VMware KB that outlined the procedure.

Solution

The procedure outlined (for VMware 6.x) is to put the host in maintenance mode, then edit the properties of the iSCSI controller(s), untick the DelayedAck options, reboot the Host, and everything will be peachy. However, even though (post reboot) everything looks good in the the vSphere Web console. If you look on the host you may find something like this;

[box]

vmkiscsid --dump-db | grep Delayed

[/box]

DelayedAck = ‘1’ means ENABLED, DelayedAck = ‘0’ means DISABLED

So half my iSCSI entries in the iSCSI database still have DelayedAck ENABLED?

Some Internet searching told me this was quite common, and that the best way to ‘fix‘ it was to, disable the iSCSI initiator, remove the iSCSI database, reboot and then setup iSCSI again;

[box]

cd /etc/vmware/vmkiscsid
esxcfg-swiscsi -d
rm -f vmkiscsid.db
reboot

[/box]

Which is fine IF YOU ARE USING A SOFTWARE iSCSI INITIATOR, I however was not, I had 2x dedicated hardware iSCSI HBAs on each host!

After many hours of messing about and trial and error, it became clear, I had to do things in a certain order, or DelayedAck would simply just be enabled whether I liked it or not. 🙁

Disable DelayedAck With Hardware iSCSI NICs / HBAs

MAKE SURE THE HOST IS IN MAINTENANCE MODE FIRST

Then take a note of your iSCSI setup, Port Groups, VMKernel Ports, and Physical NICs, you are going to delete the iSCSI database in a minute, and you will need to ‘rebind’ the VMKernel Ports and add the iSCSI targets back in again.

Manually remove your iSCSI target(s) for ALL the iSCSI NIC/HBA’s

Below if you re-run the command, vmkiscsid –dump-db | grep Delayed you will see there’s still some entries in the database with DelayedAck enabled! So unlike above (see example for software iSCSI) we are going to remove the iSCSI database, only here we don’t need to disable the software iSCSI initiator (because we are not using one!) Finally reboot the host.

[box]

cd /etc/vmware/vmkiscsid
rm -f vmkiscsid.db
reboot

[/box]

When the host is back online ADD in the Network Port Binding for the appropriate VMkernel adaptor.

Like so;

DON’T RESCAN THE CONTROLLER AS PROMPTED TO DO SO!

On the Advanced Settings of EACH hardware iSCSI NIC/HBA > Edit > UNTICK ‘DelayedAck’.

Double check they are both still unticked (I’ve seen them re-tick themselves for no discernible reason!) Then rescan the controller(s).

Target > Add.

Re-add the iSCSI target back in, (that you took note of above).

Select the Target > Advanced > Untick the DelayedAck option (Note: This time it’s not inherited). Repeat for any additional iSCSI targets.

When they are all added, rescan the storage controllers again.

Finally recheck all the database entries are set to DISABLED.

[box]

vmkiscsid --dump-db | grep Delayed

[/box]

Related Articles, References, Credits, or External Links

Thanks to Russell and Iain for their patience while I worked all that out!

Cisco IOS – Configuring Switch to Switch MACSEC

KB ID 0001000 

Problem

My colleague had to set this up on the test bench today, and it looked infinitely more interesting that what I was doing, so I grabbed my console cable, and offered to ‘help’.

This was done on two Cisco Catalyst 3560-X switches, each with a 10G Service Module (C3KX-SM-10G), and 1Gb SFP modules (Note: Not 10Gb ones, this will become important later).

Solution

1. First hurdle was, when we tried to add the first command to the interface ‘cts man’ it would not accept the command, you need to make sure you are running either IP Base, or the IP Services feature set.

Note: We are running the universal IOS image this allows us to do the following;

[box]

Switch(config)#license boot level ipbase
PLEASE READ THE FOLLOWING TERMS CAREFULLY. INSTALLING THE LICENSE OR

LICENSE KEY PROVIDED FOR ANY CISCO PRODUCT FEATURE OR USING SUCH
PRODUCT FEATURE CONSTITUTES YOUR FULL ACCEPTANCE OF THE FOLLOWING
TERMS. YOU MUST NOT PROCEED FURTHER IF YOU ARE NOT WILLING TO BE BOUND
BY ALL THE TERMS SET FORTH HEREIN.

You hereby acknowledge and agree that the product feature license
is terminable and that the product feature enabled by such license
may be shut down or terminated by Cisco after expiration of the
applicable term of the license (e.g., 30-day trial period). Cisco
reserves the right to terminate or shut down any such product feature
electronically or by any other means available. While alerts or such
messages may be provided, it is your sole responsibility to monitor
your terminable usage of any product feature enabled by the license
and to ensure that your systems and networks are prepared for the shut
down of the product feature. You acknowledge and agree that Cisco will
not have any liability whatsoever for any damages, including, but not
limited to, direct, indirect, special, or consequential damages related
to any product feature being shutdown or terminated. By clicking the
"accept" button or typing "yes" you are indicating you have read and
agree to be bound by all the terms provided herein.

ACCEPT? (yes/[no]): yes
Switch(config)#
Mar 30 01:43:18.513: %IOS_LICENSE_IMAGE_APPLICATION-6-LICENSE_LEVEL: Module name
= c3560x Next reboot level = ipbase and License = ipbase

[/box]

Then reload the switch.

2. Then this jumped up and bit us;

[box]

Mar 30 01:32:07.400: %CTS-6-PORT_UNAUTHORIZED: Port unauthorized for int(Te1/1)
Mar 30 01:32:19.379: %PLATFORM_SM10G-3-SW_VERSION_MISMATCH: The FRULink 10G Service Module (C3KX-SM-10G) in switch 1 has a software version that is incompatible
with the IOS software version. Please update the software. Module is in pass-thru mode.

[/box]

3. If you issue the following command, you can see the difference (highlighted).

[box]

Switch#show switch service-modules
Switch/Stack supports service module CPU version: 03.00.76
                          Temperature                     CPU
Switch#  H/W Status       (CPU/FPGA)      CPU Link      Version
-----------------------------------------------------------------
 1       OK               41C/47C         ver-mismatch  03.00.41


Switch#

[/box]

4. So a quick download from Cisco later, with the file on a FAT32 formatted USB drive.

[box]

Switch#archive download-sw usbflash0:/c3kx-sm10g-tar.150-2.SE6.tar
examining image...
extracting info (100 bytes)
extracting c3kx-sm10g-mz.150-2.SE6/info (499 bytes)
extracting info (100 bytes)

System Type: 0x00010002
Ios Image File Size: 0x017BDA00
Total Image File Size: 0x017BDA00
Minimum Dram required: 0x08000000
Image Suffix: sm10g-150-2.SE6
Image Directory: c3kx-sm10g-mz.150-2.SE6
Image Name: c3kx-sm10g-mz.150-2.SE6.bin
Image Feature: IP|LAYER_3|MIN_DRAM_MEG=128
FRU Module Version: 03.00.76

Updating FRU Module on switch 1...
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!
!!!!!!!!!!!!!!!!!!!

Updating FRU FPGA image...

FPGA image update complete.
All software images installed.
Switch#

[/box]

Configuring One Switch Uplink for MACSEC

1. Notice I’m configuring GigabitEthernet 1/2 NOT TenGigabitEthernet 1/1, this is because I’m using 1Gb SFP’s, both interfaces are listed in the config! (This confused us for about twenty minutes). We are not using dot1x authentication, we are simply using a shared secret password (abc123). Note: This has to be a hexedecimal password i.e numbers 0-9 and letters a-f.

[box]

Switch(config)#interface GigabitEthernet 1/2
Switch(config-if)#cts man
% Enabling macsec on Gi1/2 (may take a few seconds)...

Switch(config-if-cts-manual)#no propagate sgt
Switch(config-if-cts-manual)#sap pmk abc123 mode-list gcm-encrypt
Switch(config-if-cts-manual)#no shut
Switch(config-if)#
Mar 30 01:59:03.800: %CTS-6-PORT_UNAUTHORIZED: Port unauthorized for int(Gi1/2)
Mar 30 01:59:04.799: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthe
rnet1/2, changed state to down
Mar 30 01:59:05.805: %LINK-3-UPDOWN: Interface GigabitEthernet1/2, changed state
to down
Mar 30 01:59:08.339: %LINK-3-UPDOWN: Interface GigabitEthernet1/2, changed state
to up
Mar 30 01:59:09.329: %CTS-6-PORT_UNAUTHORIZED: Port unauthorized for int(Gi1/2)
Mar 30 01:59:10.016: %CTS-6-PORT_AUTHORIZED_SUCCESS: Port authorized for int(Gi1/2)
Mar 30 01:59:11.023: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthe
rnet1/2, changed state to up

[/box]

Configuring A Port-Channel Switch Uplink for MACSEC

1. Configure MACSEC on both physical interfaces, before you ‘port-channel’ them. The second interface (when using 1GB SFP’s), is GigabitEthernet 1/4.

[box]

!
interface Port-channel1
switchport trunk encapsulation dot1q
switchport mode trunk
!
!
interface GigabitEthernet1/2
switchport trunk encapsulation dot1q
switchport mode trunk
cts manual
no propagate sgt
sap pmk abc123 mode-list gcm-encrypt
channel-group 1 mode on
!

interface GigabitEthernet1/4
switchport trunk encapsulation dot1q
switchport mode trunk
cts manual
no propagate sgt
sap pmk abc123 mode-list gcm-encrypt
channel-group 1 mode on
!

[/box]

 

Related Articles, References, Credits, or External Links

Thanks to Steve Housego (www.linevty.com) for doing 97% of the hard work, whilst being slowed down by my ‘help’.