Ok, today I’ll show you how to install, setup and use the Windows Debugger to get information about hanging MaxDB kernel tasks.
The first thing we need is obviously a hanging MaxDB session, so let’s produce one!
Setting up the test case
My MaxDB instance is a 7.6.03 Build 9 instance with only one 50 MB data volume. The goal of the test setup is to create a DB FULL situation, since this is a rather common MaxDB hanging situation when DATA AREA AUTOEXTEND is not active.
Therefore I create a simple table in SQL Studio:
create table test (col1 int primary key)
and fill it with LOTS of data (notice the Cartesian product of the three-way join with the files table):
insert into test (select rownum from files a, files b, files c)
After a short amount of time my DB will be filled up. This is something the DBMGUI really tells you:
For our analysis of this situation we need some more information, especially we need the Windows thread ID of our hanging session. This information is available via the x_cons tool (you may as well use the dbmcli-command ‘db_cons’ or the DBMGUI function CHECK -> Database Server -> ACTIVE:
|x_cons NDB show active:|
ID UKT Win TASK APPL Current Timeout Region Wait tid type pid state priority cnt try item T17 7 0x4EC User 3004 DB FULL (197) 0 0 791(s) T33 4 0xD1C Savepnt Vsleep 0 0 262(s)
As we see only two tasks are currently ACTIVE. One (T33) is the SAVEPOINT that is triggered automatically whenever a LOG FULL/DB FULL situation occurs.
The second one (T17) is our hanging session and we see it’s waiting on the DB FULL (197) suspend reason. Note the ‘Win tid’ column; it’s what we’re looking for right now. Our user task has the Windows thread ID 0x4EC in Hex-Format or 1260 in decimal format.
Installing the Windows Debugger
Now we’ve two options to get a call stack. The first one is to use the MS Windows Debugger. This tool is available on the Microsoft Website for free download.
Debugging Tools and Symbols: Getting Started (http://www.microsoft.com/whdc/devtools/debugging/debugstart.mspx)
Depending on your Platform either choose “Install Debugging Tools for Windows 32-bit Version” or “Install Debugging Tools for Windows 64-bit Versions” and install the tools on the machine where our database runs.
I installed mine into the standard path ‘C:\\Program Files\\Debugging Tools for Windows’.
The next things we need are the symbol files.
These file contain the information that make it possible that the debugger can tell us, what function is at what memory address of any windows program.
These symbols can either be downloaded completely via the ‘Download Windows Symbol Packages’ or we can tell the debugger to load them on-demand while we debug. As my machine is connected to the Internet.
I go for this option and show you how to configure in the next few paragraphs.
Debugging our user task
Now, we just start the WinDbg.exe (after the installation there is a new entry in the START-menu).
In order to analyze anything, we’ve to tell the Debugger where to find all the symbols. Via the Menu ‘File’-> ‘Symbol File Path’ we get a dialogue like this:
We’ve to enter a PATH Variable here, so we can put in several paths limited by the semicolon sign ‘;’. Since we want to have the Windows symbols automatically downloaded from the MS website the first path we enter is this one:
The bold printed part denotes the local folder where the debugger should store the downloaded symbol files. Next, we need to supply the paths to the MaxDB symbols. Since these are delivered by default in each installation I just have to put in these paths on my machine:
As you see there is a ‘symbols’-Folder in each DEPENDENT folder and one in the INDEPENDENT-folder.
Now the debugging can start.
To do so we need to attach the debugger to the process we want to debug. This can be done via ‘File’ -> ‘Attach to process [F6]‘.
We get a list of running processes and choose our ‘KERNEL.EXE’:
Make sure you check the box ‘Noninvasive’ since we want our process to run further after our debugging and all we want to do now is ‘just looking’!
If you’re unsure which one of the possibly several KERNEL.EXE-processes on your machine would be the right one, just look up the process ID in the knldiag-File in the startup part:
INFO Starting SERVERDB: 'NDB' INFO SERVERNODE: 'xxxxxxxxxnotvisibletopublicxxxxxx' INFO Process ID: 3028 INFO Date: 2007-11-14 INFO Owner: 'sdbUser' INFO Machine: 'Intel IA32 level 6 revision d06' INFO Processors: 1 INFO Fiber: 'YES' INFO Max virtual memory: 2047 MB INFO Total physical memory: 1022 MB INFO Available physical memory: 419 MB
Ok, once the debugger is attached the rest is really easy.
We click on ‘Display Threads’  first and choose a thread in the list. Now we need to remember our Windows thread ID from above: 0x4EC.
After we select our thread we click on ‘show Callstack’  et voilà:
That’s the call stack of our hanging session.
To read it more easily or to hand it over to support for analysis we want a text-version of it. Nothing easier than that: right-click on the Windows-Title bar and click ‘Copy stack to clipboard’ in the context menu. Now paste it into notepad and we’re done.
To stop our debugging session we just select ‘Debug’ -> ‘DETACH DEBUGEE’ from the menu.
Process Explorer Approach
What we’ve right now can also be easily archived with the Sysinternals tool Process Explorer. All I can say about the Sysinternals tool suite for Windows is: get it onto your servers!
Whenever there is anything weird to analyze these tools have been reliable sources of in-depth system information.
So double recommendation: install them! Today!
The tools are also available for free at the MS website:
Windows Sysinternals (http://www.microsoft.com/technet/sysinternals/default.mspx)
To be able to actually use symbols and produce readable call stacks we need to setup the process explorer a little bit.
It also needs to have the Windows Debugger tools installed, since it makes use of the special Dbghelp.dll that comes with them.
All we need to do now is to tell process explorer where this DLL is stored and enter our symbols path (see above):
Now we just browse through the process overview and double-click on our KERNEL.EXE.
The process ID from the knldiag-file can again help us here to find ‘our’ process:
Now we browse through the threads until we find thread ID 1260 (= Hex 0x4EC) and click on [Stack]:
Again we can use ‘copy’ to save this information to a text file or a support message.
Interpreting the call stack
The call stack we just created looks like this on my test machine:
00 ntdll!KiFastSystemCallRet 01 ntdll!ZwWaitForSingleObject+0xc 02 kernel32!WaitForSingleObjectEx+0xa8 03 kernel32!WaitForSingleObject+0x12 04 kernel!sql41c_wait_event_sem+0x11 05 kernel!__os74WaitForWakeup+0xa4 06 kernel!RTETask_TaskScheduler::NextScheduledTask+0x19f 07 kernel!os74GotoDispatcher+0x1f 08 kernel!vsuspend+0x153 09 kernel!Kernel_SynchronizedTaskWaitList::Wait+0x12 0a kernel!Converter_Converter::RequestNewPageNo+0x18a 0b kernel!Converter_Converter::RequestNewPermPageNo+0x22 0c kernel!b13new_node+0x92 0d kernel!Data_PageAccessManager::NewPage+0x1e1 0e kernel!Data_ChainPage::Append+0xbd 0f kernel!Data_ChainSplitSpaceBackwardRead<Log_UndoPage>::ReserveSpace+0x761 10 kernel!Log_UndoFile::WriteUndoEntry+0x74 11 kernel!Log_Transaction::WriteUndoEntry+0x1d2 12 kernel!Log_ReadWriteActionImage::WriteBeforeImage+0x9d 13 kernel!kb611ins_WriteBeforeImage+0x16 14 kernel!k61bd_ins+0xde 15 kernel!bd30AddToTreeWithBeforeImage+0x153 16 kernel!b02kb_ins_record+0x17e 17 kernel!kb61insert_rec+0xa7 18 kernel!k61ins_select+0x13a 19 kernel!kb721result_handle+0x271 1a kernel!kb721select_gets+0x14f 1b kernel!kb721seq_search+0x2b8 1c kernel!k721strat_distribution+0x177 1d kernel!k720_select+0x402 1e kernel!k720_single_select+0x24 1f kernel!k61table_ins_del_upd+0x34 20 kernel!k05functions+0x60c 21 kernel!a06lsend_mess_buf+0x349 22 kernel!a06dml_send_mess_buf+0x43 23 kernel!a505most_execute+0xbf4 24 kernel!a505loop_most_execute+0x332 25 kernel!a501exec_with_change_rec+0x3a3 26 kernel!a502complex_execution+0x407 27 kernel!a501exec_with_change_rec+0x299 28 kernel!a501execute+0x2af 29 kernel!ak92analyze_messagetype+0x464 2a kernel!a92_mode_analyzer+0x97 2b kernel!ak93one_command+0x585 2c kernel!a93_user_commands+0x40e 2d kernel!SQLTask+0x7b 2e kernel!Kernel_Main+0x223 2f kernel!RTETask_TaskMain+0x4c 30 kernel!sql88k_kernel_fiber+0xd4 31 kernel32!BaseThreadStart+0x37
To ‘read’ the call stack we start from the button and move upwards (that’s how a stack works). I will just comment on a few of these lines now.
We see at line 2d that this is a user task ‘kernel!SQLTask+0x7b’ that currently handles a statement that changes the DB content ’27 kernel!a501exec_with_change_rec+0x299′.
Since the statement is not a single-table statement but involves a sub-select we see ’26 kernel!a502complex_execution+0x407′ before that.
A bit further up we see that the table-content will be changed ‘1f kernel!k61table_ins_del_upd+0x34′ and also we see that the select of the subquery is handled ‘1e kernel!k720_single_select+0x24’ and later it’s inserted into our target table ’17 kernel!kb61insert_rec+0xa7′.
An insert need, of course, some work on the b*tree in which the table is stored so we see ’15 kernel!bd30AddToTreeWithBeforeImage+0x153‘ and know by now that this transaction will be rollback-able.
After all Undo- and Log-Information has been written (yes, we follow the WAL Write-Ahead-Log approach) a new page needs to be allocated, to hold our data ‘0d kernel!Data_PageAccessManager::NewPage+0x1e1′.
As this page has to be provided by the Converter we find ‘0a kernel!Converter_Converter::RequestNewPageNo+0x18a’.
The Converter than realizes that there is no space available anymore, so it suspends our task and lets it wait for the notification that the DB FULL situation is gone ’05 kernel!__os74WaitForWakeup+0xa4‘.
As we see it’s not too difficult to understand what tasks are waiting for once we have the necessary information (the call stack).
This was just a demo situation but in other hanging situation this can
help to find the cause.
So better install the Windows Debugger and the Sysinternals tools (plus the Symbols if no Internet is available) on your Windows server today!
But one last thing: how to solve our Data Full Situation?
In MaxDB you HAVE (sorry no more ways to stress this via formatting…) to add a data volume now. There’s no way to cancel or kill the user task (that is the user session) that causes the big data load even if it’s not wanted beforehand.
So what to do when we don’t really want the data or cannot provide much storage space right now?
Well, we kill the user task anyhow!
<Local>:NDB> db_cons kill T17 OK
Now, we’ll see the following with ‘show active’
<Local>:NDB> db_cons show active OK
ID UKT Win TASK APPL Current Timeout Region Wait tid type pid state priority cnt try item T17 7 0x4EC User 3004 DB FULL (197) ! 0 0 793(s) T33 4 0xD1C Savepnt Vsleep 0 0 2358(s)
The ‘!’ sign denotes that this user task has been marked to be killed.
Due to the cooperative multitasking model on user task level, the user task is not killed by some authority but has to commit ‘suicide’.
It first has to be able to read the KILL flag and react on this.
But to be able to do so, it must get out of the DB FULL situation.
So we really need to provide a new data volume here to get the task actually killed.
Therefore I add just a very small volume. Since the task will be killed the transaction that loads the data will be rolled back. In effect, the database will have much more free space again afterwards.
After the rollback is done I can simply drop the data volume again – this is the point where we are happy that this database is already on 7.6 😉
MaxDB Versions <7.5 are not able to drop volumes!