As promised, here’s the second part of my short excursion into the shoals of lock management with MaxDB databases. (The first part can be found here: MaxDB – On locking mechanisms and how we get to know our MaxDB a bit better each day… Part I)
2) When deadlocks go unnoticed…
Ok, big buzz word in the title – I’m sure that just the word ‘deadlock’ will get me the audience this time 😉
Before we start with the actual point, let’s get clear about what is meant by ‘deadlock’.
Among the huge variety of possible locking and hanging situations parallel working systems can get into, deadlocks are very specific.
The point of deadlocks is not that the system is hanging for a long time, but that it is impossible for the processes involved in the deadlock to resolve it by themselves.
Since locking situations can be thought of as (mathematical) wait-for graphs, a deadlock can be defined as a closed cycle in such a graph; the minimal case involves just two processes and two resources.
The simplest example would look like this:
Process A [PA]    Process B [PB]
Resource A [RA]   Resource B [RB]

([PA],[RA]) <--- lock request --- PB
PA --- lock request ---> ([PB],[RB])
In this case, PA and PB need to wait for each other for the release of the requested resource. But since they both wait, no process can actually release a lock – this is a deadlock.
Of course, deadlocks can be way more complex, including many resources, more processes and sometimes even multiple application layers (these are really nasty since usually there is no coherent view to these cross-layer locks).
One advantage of this rather abstract view to deadlocks is that this makes it easier to recognize them.
That is what’s behind the deadlock detection feature of current DBMSs.
Whenever a process has been waiting for a resource for some time (say one second or so), the DBMS searches for such a deadlock cycle and, if it finds one, ‘resolves’ the situation by telling one of the waiting processes that it won’t get the lock.
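This check can be pictured as a simple cycle search in the wait-for graph. Here is a minimal sketch in Python – an illustration of the idea only, not MaxDB’s actual kernel code:

```python
# Deadlock detection as cycle search in a wait-for graph.
# waits_for maps each session to the session holding the lock it waits on.

def find_deadlock(waits_for):
    """Return a list of sessions forming a wait cycle, or None."""
    for start in waits_for:
        seen = []
        current = start
        while current in waits_for:
            if current in seen:
                return seen[seen.index(current):]  # the cycle itself
            seen.append(current)
            current = waits_for[current]
    return None

# PA waits for PB, PB waits for PA -> the minimal deadlock from above
print(find_deadlock({"PA": "PB", "PB": "PA"}))  # ['PA', 'PB']
# PA waits for PB, PB waits for nobody -> no deadlock
print(find_deadlock({"PA": "PB"}))              # None
```

Once a cycle is found, the DBMS picks one of its members as the victim to roll back.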
The general idea behind the feature is of course not to prevent deadlocks.
Deadlocks are usually design errors – bugs in the application program – and these cannot be fixed automatically.
However, it is important for heavy-duty databases to keep running as long as possible.
To make this possible, the deadlock detection and resolution helps a great deal.
Once a deadlock is resolved, the whole system can continue its work, while only one transaction gets an error.
So far the story is rather nice, isn’t it?
The DBMS checks for deadlocks and makes sure that the system will stay responsive even if the application designers made a mistake.
Unfortunately, nothing is perfect – and neither is the deadlock detection in MaxDB.
As you may know (or learn now), MaxDB distinguishes several kinds of SQL locks:
- Table locks
- Row locks
- Dictionary/Catalog locks
As long as the deadlock is just between table/row-locks everything works just as expected:
#### Session 1 (Isolation level 1, Autocommit off)

select * from locka

COLA  COLB
1     X
2     X

select * from lockb

COLA  COLB
1     Y
2     Y

update lockb set colb='YX' where cola=1

#### Session 2 (Isolation level 1, Autocommit off)

update locka set colb='XY' where cola=1

#### Monitoring session

select session, tablename, lockmode, lockstate, rowidhex from locks

SESSION  TABLENAME  LOCKMODE       LOCKSTATE  ROWIDHEX
8459     LOCKB      row_exclusive  write      00C1100000000...
8460     LOCKA      row_exclusive  write      00C1100000000...
Nothing special up to here – let’s create a deadlock:
#### Session 1

update locka set cola='XY' where cola=1

#### Session 2

update lockb set colb='YX' where cola=1

Auto Commit: Off, SQL Mode: Internal, Isolation Level: Committed
General error;600 POS(1) Work rolled back
update lockb set colb='YX' where cola=1
*** corrected the update statements 20.10.09 22:02 ***
As we can see, the crosswise row-lock requests (an update requires an exclusive lock) are recognized and one session is rolled back.
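Since the victim’s transaction is rolled back completely, the usual application-side reaction is to catch the error and retry the whole transaction. A hedged sketch of such a retry wrapper – `DeadlockError` and the workload function are illustrative stand-ins for whatever your database interface raises on MaxDB error 600 (“Work rolled back”):

```python
# Sketch: retrying a transaction that was chosen as deadlock victim.

class DeadlockError(Exception):
    """Stand-in for the driver's exception on error 600."""

def with_deadlock_retry(run_transaction, max_attempts=3):
    """Run the transaction, retrying it if it becomes a deadlock victim."""
    for attempt in range(1, max_attempts + 1):
        try:
            return run_transaction()
        except DeadlockError:
            if attempt == max_attempts:
                raise  # give up after the last attempt

# Simulated workload: fails once as deadlock victim, then succeeds.
calls = {"n": 0}
def workload():
    calls["n"] += 1
    if calls["n"] == 1:
        raise DeadlockError("General error;600 POS(1) Work rolled back")
    return "committed"

print(with_deadlock_retry(workload))  # committed
```

The important point is to retry the *whole* transaction, not just the failing statement – all of the victim’s work was rolled back.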
Now let’s do this again, but let’s use shared (catalogue) locks as well…
#### Session 1

update lockb set colb='YX' where cola=1

#### Session 2

update locka set colb='XY' where cola=1

#### Session 1

alter table locka add (colc varchar(10))
--> hangs !

#### Monitoring session

select session, tablename, lockmode, lockstate, rowidhex from locks

SESSION  TABLENAME      LOCKMODE       LOCKSTATE  ROWIDHEX
8459     LOCKA          row_exclusive  write      00C110000000000...
8460     SYS%CAT2       row_share      ?          FFFF00000000000...
8460     SYSDDLHISTORY  row_exclusive  write      00FFFE000000000...
8460     LOCKB          row_exclusive  write      00C110000000000...
Wow!
Besides our two already known row_exclusive locks on tables LOCKA and LOCKB we also find one for SYSDDLHISTORY and a row_share lock for SYS%CAT2.
What are those about?
Well, the lock on SYSDDLHISTORY belongs to an insert statement that is performed automatically in MaxDB >= 7.7 whenever a DDL statement is issued.
By that, the SYSDDLHISTORY table keeps a record of all committed DDL statements – a neat feature, but one that has nothing to do with what we want to do here.
The SYS%CAT2, in turn, is the mentioned catalogue lock.
Now let’s create the deadlock:
#### Session 2

alter table lockb add (colc varchar(10))
--> hangs !

#### Monitoring session

select session, tablename, lockmode, lockstate, rowidhex from locks

SESSION  TABLENAME      LOCKMODE       LOCKSTATE  ROWIDHEX
8459     SYS%CAT2       row_share      ?          FFFF00000000000...
8459     SYSDDLHISTORY  row_exclusive  write      00FFFE000000000...
8459     LOCKA          row_exclusive  write      00C110000000000...
8460     SYS%CAT2       row_share      ?          FFFF00000000000...
8460     SYSDDLHISTORY  row_exclusive  write      00FFFE000000000...
8460     LOCKB          row_exclusive  write      00C110000000000...

select tablename, h_applprocess as holder, h_lockmode, r_applprocess as requestor, r_reqmode from lock_waits

TABLENAME  HOLDER  H_LOCKMODE     REQUESTOR  R_REQMODE
LOCKB      4132    row_exclusive  1904       sys_exclusive
LOCKA      1904    row_exclusive  4132       sys_exclusive
Now, this is, in fact, a deadlock but MaxDB does not do anything about it.
The reason for that is simple:
The deadlock detection does not include the share locks!
To be precise, for share locks the kernel does not maintain a list of session IDs, but only a single counter.
Based on this counter it’s not possible to find out which session is holding/waiting for a specific share lock and in consequence, the kernel cannot tell which tasks to roll back.
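The difference can be made concrete with a tiny sketch: with a set of holder session IDs per lock, the wait-for graph edges can be built; with a bare counter, they cannot. (An illustration only – these are not MaxDB’s actual kernel data structures.)

```python
# Why a bare counter defeats deadlock detection:
# a set of holder IDs answers "who blocks session S?"; a counter cannot.

share_lock_with_ids = {"holders": {8459, 8460}}  # per-session bookkeeping
share_lock_counter  = {"holders": 2}             # counter only, as for share locks

def blockers(lock, requestor):
    """Sessions an exclusive requestor waits for - only known with IDs."""
    holders = lock["holders"]
    if isinstance(holders, set):
        return holders - {requestor}
    return None  # a counter only says *that* someone holds the lock

print(blockers(share_lock_with_ids, 8460))  # {8459} -> edge for the wait graph
print(blockers(share_lock_counter, 8460))   # None   -> no edge, no detection
```

Without those edges, the cycle search described above simply has nothing to traverse, so the deadlock stays invisible to the kernel.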
In this case, either one user task has to be cancelled manually, or the lock timeout will eventually reject one of the requests.
Although this is an ugly limitation of the deadlock detection, it’s not really that bad in day-to-day DB usage.
The reason is simply that there are usually only a few DDL commands running in parallel – especially when it’s not the upgrade weekend.
3) The dead walk – how deleted rows reappear
Ok, one last thing 🙂
It’s a simple effect that I found to be surprising while I was playing around with locks during the ‘research’ phase for this blog.
#### Session 1

select * from locktest

THE_ROW  THE_ROW2
1        ?
10       ?
2        ?
3        ?
4        ?
5        x
6        x
7        x
8        x
9        x

delete from locktest where the_row >= '5'

More than one row updated or deleted. Affected Rows: 5

-> SEE: no commit here!

#### Session 2

select * from locktest

THE_ROW  THE_ROW2
1        ?
10       ?
2        ?
3        ?
4        ?
Where is the data?
#### Session 1

rollback

#### Session 2

select * from locktest

THE_ROW  THE_ROW2
1        ?
10       ?
2        ?
3        ?
4        ?
5        x
6        x
7        x
8        x
9        x
There it is!
This is a really nasty feature if you come from other DBMS like Oracle.
MaxDB currently (!) does not support consistent-view concurrency and it does not reconstruct deleted rows.
Since deletions are done in-place during the statement execution (and not at commit time) the deleted rows are really just gone when the second session looks into the table.
There’s nothing there to tell the second session to look for old data, the data is just gone.
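The effect can be mimicked with a toy model: the delete removes the rows from the shared table immediately and only the deleting session remembers them for a possible rollback, so a concurrent reader simply finds them gone. (A conceptual sketch, not MaxDB’s storage layer.)

```python
# Toy model of an in-place delete: rows leave the shared table at
# statement time; only the deleting session keeps them for rollback.

table = {1: "?", 2: "?", 3: "?", 5: "x", 6: "x"}   # shared data

def delete_where(table, predicate):
    """Delete matching rows in place, return them for a possible rollback."""
    undo = {k: v for k, v in table.items() if predicate(k)}
    for k in undo:
        del table[k]
    return undo

undo = delete_where(table, lambda k: k >= 5)  # session 1 deletes, no commit yet
print(sorted(table))   # [1, 2, 3] -- session 2 already sees the rows gone
table.update(undo)     # session 1 rolls back
print(sorted(table))   # [1, 2, 3, 5, 6] -- the rows reappear
```

A DBMS with multi-version concurrency would instead keep the old row versions visible to other readers until the delete is committed.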
If your application really relies on a consistent view of the data, without data access phenomena like ‘dirty reads’ or ‘non-repeatable reads’, then you either need to use a higher transaction isolation level (and lose scalability by that) or make your application aware of this behaviour.
Looking back
As we’ve seen locking is not really something that is ‘just there’.
It can become pretty important to be able to differentiate between what locking can do for you and what it wouldn’t do.
One important thing I haven’t mentioned explicitly yet: I’ve only been writing about SQL locks here. But MaxDB (like other DBMSs) relies on multiple different shared resources that need to be protected/serialized as well.
For that task, MaxDB uses B*Tree-locks, critical regions, semaphores & mutexes, filesystem locks and the like.
So there’s plenty of topics to write about …
Resources
For more information on this area of MaxDB please check these resources:
MaxDB Internals Course – Locking
SAP Note #1243937 – FAQ: MaxDB SQL-Locks
Marketing
If you’re not already booked for October 27-29 this year, happen to be in Vienna and keep asking yourself what to do … then get your ticket for SAP TechED 2009 and make sure to attend my MaxDB session! In addition to the presentation there will be an expert session in the afternoon, where I’ll await your questions – which I hopefully can answer.