Life in University Information Technology: 03.08

Monday, March 31, 2008

The session setup from the computer %computername% failed to authenticate.

I've been seeing this error come through in my Domain Controller's System Event log for some time now and finally figured out what the cause is. The error is System Event ID 5805:

Event Type: Error
Event Source: NETLOGON
Event Category: None
Event ID: 5805
Date: 3/30/2008
Time: 10:16:24 PM
User: N/A
Computer: DC1
Description:
The session setup from the computer %computername% failed to authenticate.
The following error occurred:
Access is denied.

The computers connecting to the domain are lab computers in our student computer labs. We use a product called DeepFreeze, which restores the computers to the last frozen state on every reboot. The one problem with using DeepFreeze on domain-joined computers is that when a computer does a machine account password change, it forgets the new password after the next reboot. The domain still expects the new password, so the computer can no longer log in to the domain or set up a session. I fixed this by applying a group policy to all the frozen lab computers that disables machine account password changes.

It's under Computer Configuration -> Windows Settings -> Security Settings -> Local Policies -> Security Options -> Domain member: Disable machine account password changes
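
Before rolling out the policy it helps to know which lab machines are actually tripping the error. Here is a minimal Python sketch, assuming you have exported the System log to plain text so that each 5805 description line appears as shown above. The computer names LAB-PC01 and LAB-PC02 and the helper name failing_computers are made up for illustration.

```python
import re
from collections import Counter

# Matches the first description line of a NETLOGON 5805 event.
PATTERN = re.compile(
    r"The session setup from the computer (\S+) failed to authenticate\."
)

def failing_computers(log_text):
    """Count 5805 description lines per computer name."""
    return Counter(PATTERN.findall(log_text))

# Hypothetical excerpt of an exported System log.
sample = """\
The session setup from the computer LAB-PC01 failed to authenticate.
The following error occurred:
Access is denied.
The session setup from the computer LAB-PC02 failed to authenticate.
The session setup from the computer LAB-PC01 failed to authenticate.
"""

for name, count in failing_computers(sample).most_common():
    print(name, count)
```

Machines that keep showing up in the count after the policy has applied are probably ones that were frozen with an already stale password and need to be rejoined to the domain once.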

VMWare ESX: Host %esxhost% has no management network redundancy

After my upgrade to ESX 3.5 and Virtual Center 2.5 I started seeing a Cluster Configuration Issue of "Host ESXHost currently has no management network redundancy". Apparently VMWare HA reports a configuration warning when it detects that there is no redundancy for the Service Console. After a little research I discovered that adding another Service Console Port on a different NIC would solve the problem. My ESX boxes have 4 NICs in them so this was a pretty easy fix. I have my host networking set up so vmnic0 and vmnic2 go into switch A and vmnic1 and vmnic3 go into switch B on my network.

Here's what my Network config looked like before the update:

And here's what it looks like after the update:

As you can see... I simply added another Service Console Port on vSwitch2. After adding the additional Service Console Port I also had to reconfigure HA on each ESX host I made the change on to get the error to go away. I'm still trying to decide if I should create a second VMkernel Port as well.
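
The idea behind the warning can be sketched in a few lines of Python. This is only a rough model of "the Service Console needs more than one path", not VMWare HA's actual check. The NIC-to-switch cabling is from my setup above, but the vSwitch-to-NIC assignments and the port names are assumptions made up for the example.

```python
# Physical cabling described above: which upstream switch each NIC plugs into.
NIC_TO_SWITCH = {"vmnic0": "A", "vmnic2": "A", "vmnic1": "B", "vmnic3": "B"}

def console_paths(console_ports, vswitch_uplinks):
    """Return the (nic, upstream switch) pairs usable by any Service Console port."""
    paths = set()
    for vswitch in console_ports.values():
        for nic in vswitch_uplinks.get(vswitch, []):
            paths.add((nic, NIC_TO_SWITCH[nic]))
    return paths

def has_redundancy(console_ports, vswitch_uplinks):
    """True if the Service Console can be reached over at least two NICs."""
    return len(console_paths(console_ports, vswitch_uplinks)) >= 2

# Assumed layout: one uplink per vSwitch (hypothetical, not read from a real host).
uplinks = {"vSwitch0": ["vmnic0"], "vSwitch2": ["vmnic1"]}
before = {"Service Console": "vSwitch0"}
after = {"Service Console": "vSwitch0", "Service Console 2": "vSwitch2"}

print("before:", has_redundancy(before, uplinks))
print("after:", has_redundancy(after, uplinks))
```

With this layout the second port also lands on the other physical switch, so losing switch A no longer takes the management network with it.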

Friday, March 28, 2008

VMWare Update Manager - ESX Host upgrade

I recently finished using the VMWare Update Manager to update my ESX host systems. The process seems to work well but I did run into a few glitches. The first thing I noticed is that if it takes too long to migrate your VMs from the ESX host you are updating to one of your other hosts, the remediate process will time out. (The task cannot be canceled and it ends up just hanging at that point.) It would be nice to have more information on what's going on with the task, along with cancellation points. The first time I ran the update I hit the task hang mentioned above. At that point I really only had two options: restart the Update Manager service or restart the Virtual Center server. I first tried restarting the Update Manager service. In my case this did not work, so I ended up restarting the Virtual Center server. After the reboot the tasks were showing as failed and no longer in progress.

So I decided to manually migrate all my VMs off the host before starting the process. On my first system I also manually put the host in maintenance mode. It took about 25 minutes to complete the process on each of my ESX hosts. One thing that was odd is that my versions didn't seem to change after the update process, even though it showed 12 updates installed that were released on 3/12... VC still shows VMWare ESX Server 3.5.0, 64607. (This is what it showed before the updates.)

Overall this is still much easier than having to manually download each of the updates and run them from the console.

Thursday, March 27, 2008

NetApp 3040c Direct Attach VMWare ESX

This isn't documented... but you can actually direct attach your VMWare ESX Servers to the NetApp 3050c / 3040c Filer Systems. Here's my setup: (2) Dell 2950 systems with 2 fiber channel ports each. I have the Dell 2950s directly attached to my NetApp FAS 3040c systems, cross-connected so that each server reaches both heads for failover of the NetApp Cluster. See image below. The green and blue lines represent fiber cables from the ESX Servers to the NetApp Filers. The red and dark blue lines are the fiber cables from the NetApp heads to the DS14Mk4 / DS14Mk2 disk shelves.

What you do is set up the Fiber Channel target ports so they essentially contain a virtual switch. Both adapters 0c and 0d show 3 virtual adapters: 1 local, 1 standby and 1 partner. This allows failover in VMWare when you use the cluster failover for maintenance and such.

Slot: 0c
Description: Fibre Channel Target Adapter 0c
Adapter Name: 0c_0
Adapter Type: Local
Status: ONLINE

Adapter Name: 0c_1
Adapter Type: Standby
Status: OFFLINE

Adapter Name: 0c_2
Adapter Type: Partner
Status: ONLINE

Slot: 0d
Description: Fibre Channel Target Adapter 0d
Adapter Name: 0d_0
Adapter Type: Local
Status: ONLINE

Adapter Name: 0d_1
Adapter Type: Standby
Status: OFFLINE

Adapter Name: 0d_2
Adapter Type: Partner
Status: ONLINE
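
If you want to check that listing from a script, here is a minimal Python sketch that parses the "Key: Value" layout shown above and confirms each slot has both its Local and Partner adapters online. The function names are made up, and treating "Local and Partner both ONLINE" as the healthy state is my assumption based on the output above, not something from NetApp documentation.

```python
def parse_adapters(text):
    """Parse 'Key: Value' lines into {slot: [{'name', 'type', 'status'}, ...]}."""
    slots = {}
    slot = None
    current = None
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank lines
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "Slot":
            slot = value
            slots[slot] = []
        elif key == "Adapter Name":
            current = {"name": value}
            slots[slot].append(current)
        elif key == "Adapter Type":
            current["type"] = value
        elif key == "Status":
            current["status"] = value
    return slots

def ready_for_failover(adapters):
    """Assumed healthy state: Local and Partner virtual adapters are both ONLINE."""
    status = {a["type"]: a["status"] for a in adapters}
    return status.get("Local") == "ONLINE" and status.get("Partner") == "ONLINE"

listing = """\
Slot: 0c
Description: Fibre Channel Target Adapter 0c
Adapter Name: 0c_0
Adapter Type: Local
Status: ONLINE

Adapter Name: 0c_1
Adapter Type: Standby
Status: OFFLINE

Adapter Name: 0c_2
Adapter Type: Partner
Status: ONLINE
"""

for slot_name, adapters in parse_adapters(listing).items():
    print(slot_name, ready_for_failover(adapters))
```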

Although this setup will only work for 2 ESX Servers, it can be very useful when you have a limited budget (no need for fiber switches) but need the bandwidth from Fiber Channel. If you are on a serious budget you might want to consider VMWare over NFS with the NetApp.

Tuesday, March 25, 2008

Blackboard on VMWare ESX with Network Appliance

I've been working with our E-Learning Team on designing a new platform for Blackboard (Blackboard is a learning / course management tool). The design we settled on runs Blackboard Enterprise on (4) Dell R900 Quad Proc / Quad Core systems with 32 GB RAM under VMWare ESX 3.5, connected to a Network Appliance 3040c SAN. We will be running (28) 300 GB fiber channel drives over 4 Gb fiber for the back end storage. This should provide plenty of disk IO. The high IO apps like our SQL Database VMs will run over Fiber Channel and the smaller systems like QuestionMark will run via VMWare over NFS.
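
For a sense of scale, here is the back-of-the-envelope arithmetic as a short Python sketch. It assumes the 32 GB of RAM is per host, and it only computes raw disk capacity; usable space will be lower once RAID parity and spares are taken out, which I haven't tried to estimate here.

```python
# Figures from the design described above.
hosts = 4
sockets_per_host = 4      # "Quad Proc"
cores_per_socket = 4      # "Quad Core"
ram_gb_per_host = 32      # assumption: 32 GB is per host
drives = 28
drive_gb = 300

total_cores = hosts * sockets_per_host * cores_per_socket
total_ram_gb = hosts * ram_gb_per_host
raw_disk_gb = drives * drive_gb   # raw only, before RAID parity and spares

print("cores:", total_cores)
print("RAM (GB):", total_ram_gb)
print("raw disk (GB):", raw_disk_gb)
```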

This design, while very new, will provide a very reliable, highly available learning management system. All systems, both Hardware and Apps are configured in some kind of cluster to minimize all single points of failure.

Performance really should be no issue, as the horsepower we can give the Virtual Machines is higher than we were able to dedicate on the old standalone physical systems.

Wednesday, March 5, 2008

Filer Panic - NetApp 3050c - Data ONTAP 7.2.3

I was surprised to find an email this morning from my Filers and NetApp support telling me one of my filers had a little panic overnight.

RPANIC:Saving 674M to /etc/crash/core.101185944.2008-03-05.06_16_18.nz ("Protection Fault accessing address 0x0000008c from EIP 0x1426f54 in process FTPPool03 on release NetApp Release 7.2.3") via sparecore

According to the logs and the AutoSupport message, it appears that an FTP error from a poorly written FTP client caused the filer to PANIC. This is disturbing. It's saying I can write crappy FTP software and cause the NetApp Filers to panic. I run my filers in a cluster, so during the panic the filer that didn't panic took over for the one that did and life went on normally. Now I need to wait until after hours to perform the giveback, because already this morning there are over 100 connections via CIFS to the filer, and bringing it out of cluster takeover mode requires the CIFS service to be shut down and restarted on the partner system.

The Command is easy though: > cf giveback

Panic Message:Protection Fault accessing address 0x0000008c from EIP 0x1426f54 in process FTPPool03 on release NetApp Release 7.2.3

Bug: 264711

Title: disconnecting FTP session during the "LIST" command may panic filer

Description: During a session from an external FTP client to the FTP service running on the storage appliance itself (a service of the Data ONTAP kernel), if the FTP control connection unexpectedly disconnects at the same time the appliance is processing an Passive FTP "LIST" command (or equivalent operation), the appliance may suffer an interruption of service.

Workaround: Correctly written FTP clients on a healthy network are less likely to provoke an abrupt disconnection.

Monday, March 3, 2008

VMWare over NFS vs FiberChannel (FC)

VMWare over NFS has been quite the buzz over the past few months, especially when it comes to NetApp filers like the 3070c. I personally run VMWare over FC connected into a NetApp 3050c on high-speed fiber channel drives. What I like about it is that it's fast and it works well. It's really not that hard to set up, and a denial of service attack on my IP network won't hurt my VMs' access to the SAN.

With Fiber Channel you have to create LUNs for the VMWare box to access. There are 2 schools of thought here... either create 1 big LUN and run all your VMs from it - or - create a new LUN for each VM.

With NFS, things change. Apparently you can take advantage of de-duplication, no single disk I/O, and you can use VMDK thin provisioning. What I'd like to see is a good white paper on NFS vs Fiber Channel including setup instructions and why you would choose one over the other.