I finally got round to replacing the SAN on my test network. I set up the new one via a direct cable connection (10Gbps iSCSI DAC), created vDisks and volumes, and presented those volumes. I then set up the iSCSI bindings in vSphere; all vanilla stuff.
The ESX hosts could see the SAN but not the storage LUNs; ‘Add Datastore’ showed me no available storage.
Solution: Cannot See LUN
Two days! That’s what this cost me. I’ve spent over 20 years deploying storage, mostly HPE, but also an assortment of Dell, IBM, NetApp, and a score of cheap alternatives. I manually changed the IQN names in VMware, and I proved connectivity from the VMkernel ports to the storage array with vmkping. I updated the controller and card firmware – nothing.
I got a trusted colleague on the gear remotely to check I’d not done anything stupid. He made some suggestions, but still no progress. I opened a question on Experts Exchange: lots of good advice, but nothing worked.
Then, after trawling through old HPE and VMware forum posts, I found a link to a video. It was an Indian chap deploying some iSCSI volumes to a Windows server. Even though I don’t speak Hindi, I thought “What the hell, I’ll watch it” and make sure (once again) that I had not done anything stupid.
Then, while mapping the new volume, he did something so simple, and so mind-bogglingly easy to miss, that everyone I’d spoken to had missed it as well. When mapping a volume you create a LUN (in this example LUN 10), set the rights to ‘read-write’, and apply.
See those green ticks over the iSCSI ports? They DO NOT MEAN ‘present the storage through those ports’. They simply mean there’s a working cable in those ports.
You must manually go to each port, and make sure the PORT IS TICKED so it looks like this.
Whoever designed that GUI needs a massive punch in the face.
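Once the ports are ticked, a rescan on the host should make the LUNs appear. Here is a minimal sketch for checking that from the ESXi shell rather than the GUI. The helper name list_naa_devices is my own invention, and it assumes that each device in the output of 'esxcli storage core device list' starts with its identifier in the first column, with the properties indented underneath.

```shell
# Hypothetical helper: print the identifier of every naa.* device found in
# the output of `esxcli storage core device list` (read from stdin).
# Assumption: device identifiers start in column one, and the property
# lines that follow each one are indented.
list_naa_devices() {
    grep -E '^naa\.[0-9a-fA-F]+' | awk '{print $1}'
}

# On the host itself you would rescan first, then list:
#   esxcli storage core adapter rescan --all
#   esxcli storage core device list | list_naa_devices
```

If the new volume's identifier is not in the list after a rescan, go back and check the port ticks on the mapping.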
Related Articles, References, Credits, or External Links
I was asked to do this by a client this week. HP had requested that this be set for connections to their StoreVirtual VSA, which had been having some problems.
Solution
I followed the instructions and was at first confused because I could not see the settings that needed changing. That’s because this only applies if you have MULTIPATHING enabled and set to ‘Round Robin’. So if your storage does NOT look like the example below (all paths showing ‘Active (I/O)’), then this procedure is not applicable.
So, assuming you are using round robin multipathing and, <ahem!>, the storage vendor hasn’t just pulled a solution from a list of things that might work rather than actually diagnosing the problem, you can see the current setting with the following command;
[box]
esxcli storage nmp device list
[/box]
Take note of the iSCSI storage names. Below you can see they all start with naa.6000, and you can also see the IOPS value is set to 1000.
To change the value use the following command (change the naa.6000 prefix in the grep to match yours);
[box]
for i in `esxcfg-scsidevs -c |awk '{print $1}' | grep naa.6000`; do esxcli storage nmp psp roundrobin deviceconfig set --type=iops --iops=1 --device=$i; done
[/box]
Then recheck, the new value should be ‘1’.
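The full device list is quite noisy, so here is a rough sketch that boils it down to one line per device. The function name show_rr_iops is hypothetical, and it assumes the output contains a 'Path Selection Policy Device Config' line with an iops=N entry for each round robin device, which is what I would expect to see but may differ between ESXi versions.

```shell
# Hypothetical helper: print "<device> <iops>" for each device in the
# output of `esxcli storage nmp device list` (read from stdin).
# Assumption: the device identifier starts in column one, and the PSP
# device config line contains an "iops=<number>" entry.
show_rr_iops() {
    awk '
        /^naa\./ { dev = $1 }                        # remember the current device
        /Path Selection Policy Device Config:/ {
            if (match($0, /iops=[0-9]+/))            # e.g. iops=1000
                print dev, substr($0, RSTART + 5, RLENGTH - 5)
        }
    '
}

# On the host:
#   esxcli storage nmp device list | show_rr_iops
```

Every line should end in 1 after the change; anything still showing 1000 was missed by the grep pattern.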
After ESX 5.5 Update 2, VMware added ATS Heartbeat. Some vendors (like HPE SureStore and VSA) recommend that this is disabled. I can’t find any info about whether it’s safe to do this in production, so to be on the safe side I placed the hosts in maintenance mode first.
Enter Maintenance Mode
Use the following command;
[box]
vim-cmd /hostsvc/maintenance_mode_enter
[/box]
Disable ATS Heartbeat
Use the following command to disable;
[box]
esxcli system settings advanced set -i 0 -o /VMFS3/UseATSForHBOnVMFS5
[/box]
Then confirm it worked with the following command;
[box]
esxcli system settings advanced list -o /VMFS3/UseATSForHBOnVMFS5
[/box]
Confirm that INT Value is set to 0 (zero).
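If you have a few hosts to do, eyeballing the output gets tedious. This is a small sketch that pulls out just the current value. The function name ats_hb_value is made up, and it assumes the listing contains an 'Int Value:' line alongside a 'Default Int Value:' line.

```shell
# Hypothetical helper: print the current integer value from the output of
# `esxcli system settings advanced list -o <option>` (read from stdin).
# Anchoring the match at the start of the line skips "Default Int Value:".
ats_hb_value() {
    awk -F': *' '/^[[:space:]]*Int Value:/ { print $2; exit }'
}

# On the host:
#   esxcli system settings advanced list -o /VMFS3/UseATSForHBOnVMFS5 | ats_hb_value
```

A result of 0 means ATS Heartbeat is disabled on that host.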
Exit Maintenance Mode
Use the following command;
[box]
vim-cmd /hostsvc/maintenance_mode_exit
[/box]
I was recently involved in deploying an HPE Synergy 12000 Frame, and the network connections from it were ‘a little unusual’, so I thought I’d document that here to save anyone else the problems I had.
I was connecting to an HP/Aruba 5412 switch so my cables were all HP/Aruba (to be on the safe side).
What you can see (above) is the MPIO Cable (K2Q46A P/N 800867-001). Fixed onto the left (and boxed above) there is a QSFP (P/N 817040-B21). Note: this can be used either as 4 x 10GbE or 4 x 8Gb FC. On the right you can see the cable ends in 4 x standard LC fibre connectors, so you will also need 4 x 10Gb SR SFP+ modules (Aruba P/N J1950D), shown bottom right.
So what does it do? (Apart from cost a fortune!) Well, the QSFP connects at 40Gb and splits the traffic down into 4 x 10Gb.
Cabling and Configuring MPIO QSFP
Connecting up is pretty straightforward. REMEMBER: when you connect the 40Gb QSFP to the Synergy it will light purple if it’s connected, and flash purple when it sees activity.
Connecting to the switch is also easy enough. WARNING: all the ports need to be Trunked (HP) or Ether Channelled (Cisco), with LACP enabled. You don’t need to worry about configuring LACP on the Synergy; that’s handled automatically by the ‘Uplink Set’.
So the back of the ‘Frame’ has two interconnect links (if you are from a switch background, think of these like stacking cables) and two MPIO uplink cables.
HPE/Aruba Switch Config For MPIO
As previously stated, the switch I’m using is an Aruba 5412, with two 8 Port 1Gb/10Gb modules (J9993-A). Here’s the relevant switch config;
[box]
Firstly give the interfaces a sensible name;
!
interface A2
name "Trunk Link to Synergy VC1 Port Q1"
exit
interface A3
name "Trunk Link to Synergy VC1 Port Q1"
exit
interface A4
name "Trunk Link to Synergy VC1 Port Q1"
exit
interface A5
name "Trunk Link to Synergy VC1 Port Q1"
exit
!
interface B2
name "Trunk Link to Synergy VC2 Port Q1"
exit
interface B3
name "Trunk Link to Synergy VC2 Port Q1"
exit
interface B4
name "Trunk Link to Synergy VC2 Port Q1"
exit
interface B5
name "Trunk Link to Synergy VC2 Port Q1"
exit
!
Show any 'already configured' Trunk links with a 'show trunk' command. In my case two existed (Trk1 and Trk2), so I used Trk3;
!
trunk A2-A5,B2-B5 Trk3 LACP
!
Now UNTAG VLAN 1 (assuming that's your default VLAN) and TAG any VLANs that need to be used in the Synergy deployment. (Note: on an HP switch simply add Trk3 to the existing settings, like so);
!
vlan 1
untagged A6-A8,B6-B8,E1-E24,F1-F24,G3-G12,H3-H12,Trk1-Trk3
exit
!
vlan 100
tagged Trk1-Trk3
exit
vlan 101
tagged Trk1-Trk3
exit
vlan 102
tagged Trk1-Trk3
exit
vlan 103
tagged Trk1-Trk3
exit
etc.
[/box]
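The interface naming above is very repetitive, so if you have more than one frame to cable it may be easier to generate the stanzas and paste them into the switch. This is only a sketch, run on your own machine rather than on the switch. The function name name_ports is hypothetical, and the module letters and port numbers are simply the ones from my config.

```shell
# Hypothetical helper: print ProCurve/Aruba interface naming stanzas.
# Usage: name_ports <module letter> <label> <port number>...
name_ports() {
    module=$1
    label=$2
    shift 2
    for port in "$@"; do
        printf 'interface %s%s\n   name "%s"\nexit\n' "$module" "$port" "$label"
    done
}

# The eight ports used in this post:
name_ports A "Trunk Link to Synergy VC1 Port Q1" 2 3 4 5
name_ports B "Trunk Link to Synergy VC2 Port Q1" 2 3 4 5
```

Check the output against your own port layout before pasting it in; the trunk and VLAN commands still need entering by hand.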
Cisco Switch Config For MPIO
If you have a Cisco switch then instead of ‘Trunking’ you will be ‘Ether Channelling’. For a more detailed explanation see the following post.
The process is: you add Networks, then collect Networks together in Network Sets, then you create Logical Interconnect Groups. Part of creating a Logical Interconnect Group involves creating an Uplink Set, which consists of both your Networks and the uplink ports.
Note: A Network Set is used by a Server Profile, (or a Server Profile Template).
Create Networks
One View > Networking > Networks > Create Network
Create Network Sets
One View > Networking > Network Sets > Create Network Set > Give it a name > Add Networks > Create.
Create Logical Interconnect Group
One View > Networking > Logical Interconnect Group > Create Logical Interconnect Group > Give it a name > Select the correct Interconnect Bay Set (see diagram above) > Select Interconnects > Add Uplink Set.
Give the set a name > Select the Type > Add in the Networks > Add in the Uplinks > Create.
Note: You only need to add the LOGICAL interfaces (all of them), i.e. Q1:1, Q1:2, Q1:3, Q1:4 for EACH interconnect module.
After a few minutes, if you look under One View > Networking > Logical Interconnects, you will see one listed that has the name of your Logical Interconnect Group (with a divide symbol on the end!). Make sure ALL the logical uplinks are connected (if not, you will see LACP errors on the switch).
You attempt to upgrade the firmware on this unit. It applies to the first controller, which restarts, and then it constantly tries to update the other controller.
Solution
If you’re reading this then you’re probably already in an upgrade loop. The first step is to stop it looping, then sort the firmware out.
1. Remove controller B (bottom one) from the SAN, (Slacken the 2 thumb screws on the end of the controller, then press the release catches DOWN to eject the controller).
2. Connect to the web management console of controller A > Select the SAN > Configuration > Advanced Settings > Firmware > Untick the “Partner Firmware Update” option > Apply.
3. Check the firmware version > If controller A needs updating > Tools > Update Firmware > Browse to the downloaded firmware update.
4. Restart the controller, this can take a while, what I tend to do is connect the serial cable to the controller, open a HyperTerminal Session (Settings 115200-8-N-1-Off) then issue a “show controllers” command and repeat till it is back up again.
5. Shut down the SAN, replace controller B, and remove controller A > Power on the SAN > Then repeat the firmware update on controller B.
6. When done restart controller B.
Note: You can manually downgrade the firmware on controller B if you DON’T want to disable Partner Firmware Update.
CLI Note: You can restart the controllers from a console session with the following commands;
restart mc both {enter} Restarts controllers A and B.
or
restart mc a {enter} Only restarts controller A.
restart mc b {enter} Only restarts controller B.
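If you would rather not sit repeating a command until the controller comes back, a simple polling loop does the job. This is a minimal sketch and NOT what I did above (I used the serial console). It assumes the controller's management IP answers ping once it is back up; the function name wait_for and the example address are placeholders of mine.

```shell
# Hypothetical helper: run a command repeatedly until it succeeds.
# Usage: wait_for <attempts> <delay in seconds> <command> [args...]
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
wait_for() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Example (address is a placeholder for your controller's management IP):
#   wait_for 60 10 ping -c 1 10.0.0.2 && echo "Controller is back"
```

With 60 attempts at 10 second intervals that gives the controller about ten minutes before giving up.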
Normally I simply connect a new MSA to a client’s network, and it gets its address from DHCP. Then I can get the address from the DHCP scope, and point my web browser at it.
Yesterday I was starting with new virtual infrastructure and had no DHCP. With the G1 and G2 models, you got a console/serial cable and could just terminal in. With the G3 they have replaced the serial socket with a mini USB socket. Each time I put in a new P2000, I think “I wonder how that USBCLI socket works?” Yesterday I had to find out.
Solution
The Quickest Solution – is to connect the MSA to the network, and if it cannot get a DHCP address it automatically gives itself 10.0.0.2/24 on controller A and 10.0.0.3/24 on controller B.
1. If you do have DHCP running, connect your MSA and run the MSA Device Discovery Tool, (On the CD that came with the device).
2. Once you know the IP address, you can connect with your web browser.
Connect to and Manage your MSA via the USB/CLI Cable
1. For your machine to see the MSA as a device, you need to install a driver, there is a copy of the drivers on the CD that came with the device.
Note: Windows 7 users, use the Windows 2008 Drivers or use this one.
2. Install the driver.
3. Connect the USB lead from the MSA controller to your machine, TAKE NOTE of the COM port number it’s using.
4. Now you can use whatever terminal emulation program you prefer to connect to that COM port. (I prefer HyperTerminal, or you can use PuTTY if you want something a bit lighter).
5. Set the following, Bits per second = 115200, Data bits = 8, Parity = None, Stop bits = 1, and Flow control = None.
6. You will need to press {enter} to connect, then login.
Seen on a G3 P2000 SAN. The client also had an MSA70 shelf, which contained a failed array. I was removing the MSA, and after the job this error was getting logged.
Unwritable write-back cache data exists for a volume (vdisk: unknown name, volume: unknown name, SN {Serial Number} it comprises {number}% of cache.
Solution
Essentially, there was data in the cache that needed writing to the array/vdisk, when it failed. If the volume was going to get repaired and brought back online the data would have got written back. However this volume was never coming back.
1. Connect to a controller via Telnet.
2. Issue the following command;
[box] clear cache [/box]
The call came in this morning, a client had replaced a failed drive in his SAN, (an MSA P2000 2324sa). He was asking if there was anything he needed to do. I said “Just mark it as a global spare and that should be it”. He rang back some time later to say he was still having problems.
When I dialled on I could see his ‘new’ drive was marked as LEFTOVR and was flagged with the following warning;
The disk may contain stale metadata. Recommended action: Clear the metadata to rescue the disk.
Solution
You see this error because this is a ‘recycled’ disk, and it still has data on it that refers to the array and/or vdisk that it was in originally. This is the ‘metadata’ that it is referring to.
Note: You can see, (from the image above) the disk I’m dealing with is disk 8 (make a mental note of that).
1. With the MSA Selected > Tools > Clear Disk Metadata.
2. Remember our disk is in slot 8, so tick it and click ‘clear metadata’.
3. You can now treat this disk as if it were a new disk and add it as a global spare, (your failed vdisk will then claim the disk and the RAID should rebuild it without further intervention).