Thursday, August 4, 2011

Troubleshooting SCVMM crashes (collect traces)

I am currently testing the SCVMM 2012 Beta, and I want to share how to collect SCVMM traces.
(This applies to both SCVMM 2008 R2 and SCVMM 2012.)

I had an issue where the VMM service crashed when I added a WSUS server to the Fabric. It only crashed when the connection was configured correctly. If I entered the wrong TCP port for the WSUS connection, I got a proper error message, but as soon as I used the right settings, the service crashed with no mercy.

Here is the recipe that Carmen Summers (the Program Manager of SCVMM) has made available:

From which computer should I collect the traces?

If it's a console crash issue,
·         Please collect the traces from both the computer where you run the admin console and your VMM server.
If it's an "Add Hosts" issue,
·         Please collect the traces from the VMM server.
If it's a host status (Needs Attention) or VM issue,
·         Please collect the traces from both the VMM server and the host in question.
If it's a self-service portal issue,
·         Please collect the traces from the Web server and the VMM server.

What are the steps to collect traces?

·         Install DebugView from http://www.microsoft.com/technet/sysinternals/utilities/debugview.mspx on your VMM server, the host in question and your Web server (if it's a self-service portal issue).
·         Save the following code into a text file and name it "odsflags.cmd":

@echo off
rem Sets the ODSFLAGS registry value that selects which trace levels go to ODS.
echo ODS control flags - only traces with set flags will go to ODS
rem Show the usage text when no argument, -? or /? is given.
if (%1)==() goto :HELP
if (%1)==(-?) goto :HELP
if (%1)==(/?) goto :HELP
echo Setting flag to %1...
rem Write the flag value as a DWORD, overwriting any existing value (/f).
reg ADD "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Tracing\Microsoft\Carmine" /v ODSFLAGS /t REG_DWORD /d %1 /f
echo Done.
goto :EXIT

:HELP
echo Usage: odsflags [flag], where flag is
echo TRACE_ERROR = 0x2,
echo TRACE_DBG_NORMAL = 0x4,
echo TRACE_DBG_VERBOSE = 0x8,
echo TRACE_PERF = 0x10,
echo TRACE_TEST_INFO = 0x20,
echo TRACE_TEST_WARNING = 0x40,
echo TRACE_TEST_ERROR = 0x80

:EXIT

·         Save the following code into a text file and name it "odson.reg":

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Tracing\Microsoft\Carmine]
"ODS"=dword:00000001

·         Save the following code into a text file and name it "odsoff.reg":

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Tracing\Microsoft\Carmine]
"ODS"=dword:00000000

·         Copy the above three files onto your VMM server, your host in question and your Web server (if it's a self-service portal issue).
·         In a command window (run as administrator, since both files write to HKLM) on the machine where you want to capture VMM tracing, run “odson.reg” and then “odsflags.cmd 255”. (If you need to collect traces from both the VMM server and the host or the Web server, make sure to run these commands on all of those computers.)
·         Run DebugView as administrator and make sure that both "Capture Win32" and "Capture Global Win32" are checked in its Capture menu. You should see tracing from the VMM components showing up in DebugView. (If you need to collect traces from both the VMM server and the host, make sure to do these steps on all computers.)
·         Restart vmmservice on the VMM server with “net stop vmmservice” and “net start vmmservice”.
·         Restart the agent service on the host with “net stop vmmagent” and “net start vmmagent”.
·         Restart the IIS service on the Web server with "iisreset".
·         Reproduce the issue that you found.
·         Save the output from the DebugView to a text file and email it to the people who can help you diagnose the issue.
·         Don't forget to turn off the tracing after you are done collecting by running "odsoff.reg" on each machine where you turned it on.
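
The value 255 passed to odsflags.cmd in the steps above is a bit mask. As a rough sketch in Python (not part of the original recipe), this is how the flag values from the script's help text combine. The flag names and values are copied from that help text; the helper names `combine` and `describe` are made up for illustration. Note that the documented flags add up to 254 (0xFE), so 255 also sets bit 0x1, which the help text does not name.

```python
# Flag values copied from the help text of odsflags.cmd.
ODS_FLAGS = {
    "TRACE_ERROR": 0x2,
    "TRACE_DBG_NORMAL": 0x4,
    "TRACE_DBG_VERBOSE": 0x8,
    "TRACE_PERF": 0x10,
    "TRACE_TEST_INFO": 0x20,
    "TRACE_TEST_WARNING": 0x40,
    "TRACE_TEST_ERROR": 0x80,
}


def combine(*names):
    """OR the named flags together into a single ODSFLAGS value (hypothetical helper)."""
    value = 0
    for name in names:
        value |= ODS_FLAGS[name]
    return value


def describe(value):
    """Return the documented flag names whose bit is set in value (hypothetical helper)."""
    return [name for name, bit in ODS_FLAGS.items() if value & bit]


if __name__ == "__main__":
    all_documented = combine(*ODS_FLAGS)
    print(all_documented, hex(all_documented))
    print(describe(255))
    print(combine("TRACE_ERROR", "TRACE_DBG_NORMAL"))
```

If the full trace is too noisy, a smaller value such as the one computed by `combine("TRACE_ERROR", "TRACE_DBG_NORMAL")` could presumably be passed to odsflags.cmd instead of 255. I have not verified how the VMM components treat partial masks.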

EXAMPLE:

In my case, where the VMMservice crashed when I added the WSUS server, I was able to locate the following in the VMM.LOG afterwards:

00004729             77.38172150       [5092] 13E4.0868::07/27-21:26:31.413#04:UpdateServer.cs(265): Adding Update Server to Pangaea, ServerName - VMM.lab.local, Port - 8530, SSLEnabled - False             
00004730             77.38208008       [3668] 0E54.0AD4::07/27-21:26:31.411#21:Callback.cs(53): Client uuid:a3258eb4-18cf-4f58-811f-0692c049677e;id=1 - events processed       
00004731             77.48566437       [3668] 0E54.0AD0::07/27-21:26:31.523#24:ConsoleViewModel.cs(294): UI Load: ConsoleViewModel completed AddPage for Jobs - 00:00:00.1359864    
00004732             77.48571014       [3668] 0E54.0AD0::07/27-21:26:31.523#24:ConsoleViewModel.cs(303): ConsoleViewModel begin OnClientCacheInitialized
00004733             77.66319275       [3668] 0E54.0AD0::07/27-21:26:31.700#24:ConsoleViewModel.cs(329): UI Load: ConsoleViewModel completed OnClientCacheInitialized - 00:00:00.1769823      
00004734             77.94854736       [432]    
00004735             77.94854736       [432] *** HR originated: -2147024774  
00004736             77.94854736       [432] ***   Source File: d:\iso_whid\amd64fre\base\isolation\com\copyout.cpp, line 1302               
00004737             77.94854736       [432]    
00004738             77.94861603       [432]    
00004739             77.94861603       [432] *** HR propagated: -2147024774              
00004740             77.94861603       [432] ***   Source File: d:\iso_whid\amd64fre\base\isolation\com\enumidentityattribute.cpp, line 144           
00004741             77.94861603       [432]    
00004742             77.94880676       [432]    
00004743             77.94880676       [432] *** HR originated: -2147024774  
00004744             77.94880676       [432] ***   Source File: d:\iso_whid\amd64fre\base\isolation\com\copyout.cpp, line 1302               
00004745             77.94880676       [432]    
00004746             77.94882202       [432]    
00004747             77.94882202       [432] *** HR propagated: -2147024774              
00004748             77.94882202       [432] ***   Source File: d:\iso_whid\amd64fre\base\isolation\com\enumidentityattribute.cpp, line 144           
00004749             77.94882202       [432]    
00004750             77.94924927       [432]    
00004751             77.94924927       [432] *** HR originated: -2147024774  
00004752             77.94924927       [432] ***   Source File: d:\iso_whid\amd64fre\base\isolation\com\copyout.cpp, line 1302               
00004753             77.94924927       [432]    
00004754             77.94928741       [432]    
00004755             77.94928741       [432] *** HR propagated: -2147024774              
00004756             77.94928741       [432] ***   Source File: d:\iso_whid\amd64fre\base\isolation\com\enumidentityattribute.cpp, line 144           
00004757             77.94928741       [432]    
00004758             82.27567291       [5092] 13E4.0868::07/27-21:26:36.311#04:WatsonExceptionReport.cs(756): Unhandled exception caught.          
00004759             82.27619934       [5092] 13E4.0868::07/27-21:26:36.312#04:WatsonExceptionReport.cs(757): Unhandled exception.         
00004760             82.27857208       [5092] 13E4.0868::07/27-21:26:36.314#04:WatsonExceptionReport.cs(757): System.ArgumentOutOfRangeException: An attempt was made to access an invalid or unsupported language.                

The last line indicates that the issue is caused by the regional settings on my servers. Since this is a Beta, there is no support for non-US regional settings.
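
A saved DebugView log can run to thousands of lines, so it helps to filter it. The following is a minimal Python sketch (my own, not part of the recipe) that pulls out the distinct "HR originated/propagated" codes and any lines mentioning an exception, and converts the signed decimal HRESULT into the more familiar hex form. The function names are hypothetical, and the sample lines are taken from the trace above with the column whitespace collapsed.

```python
import re

# Matches lines such as "*** HR originated: -2147024774" from the trace.
HR_LINE = re.compile(r"\*\*\* HR (originated|propagated): (-?\d+)")


def hresult_to_hex(value):
    """Convert a signed 32-bit HRESULT to its usual 0x........ form."""
    return "0x{:08X}".format(value & 0xFFFFFFFF)


def win32_code(value):
    """Return the low 16 bits if the facility is FACILITY_WIN32 (7), else None."""
    unsigned = value & 0xFFFFFFFF
    facility = (unsigned >> 16) & 0x7FF
    return unsigned & 0xFFFF if facility == 7 else None


def summarize(lines):
    """Collect distinct HRESULT values and the lines that mention an exception."""
    hresults, exceptions = [], []
    for line in lines:
        match = HR_LINE.search(line)
        if match:
            hr = int(match.group(2))
            if hr not in hresults:
                hresults.append(hr)
        elif "exception" in line.lower():
            exceptions.append(line.strip())
    return hresults, exceptions


if __name__ == "__main__":
    # Sample lines from the trace in this post (whitespace collapsed).
    sample = [
        "00004735 77.94854736 [432] *** HR originated: -2147024774",
        "00004739 77.94861603 [432] *** HR propagated: -2147024774",
        "00004760 82.27857208 [5092] 13E4.0868::07/27-21:26:36.314#04:"
        "WatsonExceptionReport.cs(757): System.ArgumentOutOfRangeException: "
        "An attempt was made to access an invalid or unsupported language.",
    ]
    hresults, exceptions = summarize(sample)
    for hr in hresults:
        print(hr, hresult_to_hex(hr), win32_code(hr))
    for line in exceptions:
        print(line)
```

For the value in my trace, this gives 0x8007007A, which is Win32 error 122 (ERROR_INSUFFICIENT_BUFFER). Those HR lines come from process [432] rather than from the process that logged the unhandled exception ([5092]), so they may well be unrelated to the crash. The exception line is the one that matters here.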

