VMware ‘Disable DelayedAck’ Does Not Work?

KB ID 0001525

Problem

I’ve got a client that’s been having some performance issues with their VMs. Their storage vendor (EMC) found the following in the logs;

[box]

B       02/28/19 09:50:53.953 scsitarg          117000e [INFO] System: iSCSI Logout Initiator Data: IP=192.168.200.161 Name=...-ec-21 Target Data: Port=2 Flags=0x00002002 Info=0x01200801
B       02/28/19 09:50:53.969 scsitarg          117000e [INFO] System: iSCSI Logout Initiator Data: IP=192.168.201.161 Name=...-ec-21 Target Data: Port=3 Flags=0x00002002 Info=0x01200801
B       02/28/19 09:51:16.413 Health              608fe [WARN] User: Host ESXi-01.petenetlive.com does not have any initiators logged into the storage system.
A       02/28/19 10:04:25.968 scsitarg          117000d [INFO] System: iSCSI Login Initiator Data: IP=192.168.200.161 Name=...-ec-21 Target Data: Port=2 Flags=0x00002002 Info=0x00000000 [Target]
B       02/28/19 10:04:26.034 scsitarg          117000d [INFO] System: iSCSI Login Initiator Data: IP=192.168.200.161 Name=...-ec-21 Target Data: Port=2 Flags=0x00002002 Info=0x00000000
A       02/28/19 10:04:31.996 scsitarg          117000d [INFO] System: iSCSI Login Initiator Data: IP=192.168.201.161 Name=...-ec-21 Target Data: Port=3 Flags=0x00002002 Info=0x00000000 [Target]
B       02/28/19 10:04:32.055 scsitarg          117000d [INFO] System: iSCSI Login Initiator Data: IP=192.168.201.161 Name=...-ec-21 Target Data: Port=3 Flags=0x00002002 Info=0x00000000
B       02/28/19 10:04:57.438 Health              608fc [INFO] User: Host ESXi-01.petenetlive.com is operating normally.
Host Host ESXi-01.petenetlive.com is accessing lun Datastore_3 as HLU 3, After the initiators for this host start logging in/logging,  unit attention update events will be logged as the paths to the luns have changed this is expected
2019/02/28-09:50:41.607527 ~~~~     7F3C92369703      std:TCD:   Unit Attention update from 0000001A to 0001030D for LUN 0x3.
2019/02/28-10:02:55.860669 ~~~~     7FE476E61702      std:TCD:   Unit Attention update from 00010149 to 00010157 for LUN 0x3.

[/box]

Their advice was that we should disable DelayedAck, and they kindly gave me the VMware KB that outlines the procedure.

Solution

The procedure outlined (for VMware 6.x) is to put the host in maintenance mode, edit the properties of the iSCSI controller(s), untick the DelayedAck options, and reboot the host, after which everything should be peachy. However, even though everything looks good in the vSphere Web console after the reboot, if you check on the host itself you may find something different;

[box]

vmkiscsid --dump-db | grep Delayed

[/box]

DelayedAck = ‘1’ means ENABLED, DelayedAck = ‘0’ means DISABLED

So half of the entries in my iSCSI database still have DelayedAck ENABLED?
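
If you’d rather not count the entries by eye, a small helper can tally them for you. This is only a sketch: the function name delayedack_summary is my own invention, and it assumes each matching line contains DelayedAck, then an equals sign, then a 0 or 1 (quoted or not), so check it against what your host actually prints.

```shell
# Tally DelayedAck values read from stdin.
# Assumption: each relevant line looks like  ...DelayedAck...=<optional quote>0|1
# Lines that do not match that shape are ignored.
delayedack_summary() {
    # Keep only the 0 or 1 that follows "DelayedAck ... ="
    sed -n 's/.*DelayedAck[^=]*=[^01]*\([01]\).*/\1/p' |
    awk '
        $1 == "1" { enabled++ }
        $1 == "0" { disabled++ }
        END { printf "enabled=%d disabled=%d\n", enabled, disabled }
    '
}
```

On the host you would pipe the dump into it, for example: vmkiscsid --dump-db | grep Delayed | delayedack_summary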

Some Internet searching told me this was quite common, and that the best way to ‘fix’ it was to disable the iSCSI initiator, remove the iSCSI database, reboot, and then set up iSCSI again;

[box]

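# The iSCSI database lives in this directory
cd /etc/vmware/vmkiscsid
# Disable the software iSCSI initiator
esxcfg-swiscsi -d
# Delete the iSCSI database, then reboot
rm -f vmkiscsid.db
reboot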

[/box]

Which is fine IF YOU ARE USING A SOFTWARE iSCSI INITIATOR. I, however, was not; I had 2x dedicated hardware iSCSI HBAs on each host!

After many hours of messing about with trial and error, it became clear that I had to do things in a certain order, or DelayedAck would simply be enabled whether I liked it or not. 🙁

Disable DelayedAck With Hardware iSCSI NICs / HBAs

MAKE SURE THE HOST IS IN MAINTENANCE MODE FIRST

Then take a note of your iSCSI setup (Port Groups, VMkernel Ports, and Physical NICs). You are going to delete the iSCSI database in a minute, and you will need to ‘rebind’ the VMkernel Ports and add the iSCSI targets back in again.
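
If you prefer a text record over screenshots, something like the following could capture the layout from the ESXi shell. Treat it as a sketch under assumptions: the function name save_iscsi_layout and the output path are hypothetical, and I’m assuming the esxcli list commands below behave the same on your build (some may want an --adapter argument). Write the file somewhere that survives the reboot.

```shell
# Dump the current iSCSI and VMkernel networking layout to a text file.
# $1 = output file (hypothetical; choose a location that survives a reboot)
save_iscsi_layout() {
    out="$1"
    {
        for cmd in \
            "iscsi adapter list" \
            "iscsi networkportal list" \
            "iscsi adapter discovery sendtarget list" \
            "iscsi adapter discovery statictarget list" \
            "network ip interface list" \
            "network vswitch standard portgroup list"
        do
            echo "### esxcli $cmd"
            # Unquoted on purpose so the words split into separate arguments
            esxcli $cmd
            echo
        done
    } > "$out"
}
```

For example: save_iscsi_layout /vmfs/volumes/<a-local-datastore>/iscsi-layout.txt (the datastore name is a placeholder).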

Manually remove your iSCSI target(s) for ALL the iSCSI NICs/HBAs.

If you re-run the command vmkiscsid --dump-db | grep Delayed, you will see there are still some entries in the database with DelayedAck enabled! So, as in the software iSCSI example above, we are going to remove the iSCSI database, only here we don’t need to disable the software iSCSI initiator (because we are not using one!). Finally, reboot the host.

[box]

# The iSCSI database lives in this directory
cd /etc/vmware/vmkiscsid
# Delete the iSCSI database, then reboot
rm -f vmkiscsid.db
reboot

[/box]

When the host is back online, ADD in the Network Port Binding for the appropriate VMkernel adapter.

DON’T RESCAN THE CONTROLLER WHEN PROMPTED TO DO SO!

On the Advanced Settings of EACH hardware iSCSI NIC/HBA > Edit > UNTICK ‘DelayedAck’.

Double check they are both still unticked (I’ve seen them re-tick themselves for no discernible reason!). Then rescan the controller(s).

Target > Add.

Add the iSCSI target (that you took note of above) back in.

Select the Target > Advanced > Untick the DelayedAck option (Note: This time it’s not inherited). Repeat for any additional iSCSI targets.

When they are all added, rescan the storage controllers again.

Finally, recheck that all the database entries are set to DISABLED.

[box]

vmkiscsid --dump-db | grep Delayed

[/box]
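
If you are repeating this on several hosts, the final check can be turned into a pass/fail test. Again, just a sketch: delayedack_all_disabled is a made-up name, and it assumes the same line shape as before (DelayedAck, an equals sign, then a 0 or 1, quoted or not).

```shell
# Succeed only if stdin holds at least one DelayedAck entry and none is set to 1.
# Return codes: 0 = all disabled, 1 = at least one enabled, 2 = no entries found.
delayedack_all_disabled() {
    values=$(sed -n 's/.*DelayedAck[^=]*=[^01]*\([01]\).*/\1/p')
    if [ -z "$values" ]; then
        echo "no DelayedAck entries found" >&2
        return 2
    fi
    if echo "$values" | grep -q 1; then
        echo "DelayedAck is still ENABLED on at least one entry" >&2
        return 1
    fi
    echo "all DelayedAck entries are DISABLED"
}
```

For example: vmkiscsid --dump-db | grep Delayed | delayedack_all_disabled, and only take the host out of maintenance mode if it succeeds.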

Related Articles, References, Credits, or External Links

Thanks to Russell and Iain for their patience while I worked all that out!