A common requirement in SQL data modeling is something that could be called a conditional unique constraint.
The idea is that certain combinations of column values should only occur once, if at all, while other combinations may occur any number of times.
That sounds a bit theoretical, so let's look at an example.
Say we have a table STUFF in which we track the status of, well, stuff.
Each individual item of stuff can be identified by its ID value.
For every entry we store a STATUS value and a transaction number TX_SEQ.
Now, the rule is that any item can have any status value multiple times but status 100 should only ever occur once per item. Let’s pretend status 100 represents the initial creation of the item or something similarly unique.
The common unique constraint cannot be a solution for this, because it would require all combinations of ID and STATUS to be unique.
Several RDBMSs offer check constraints for this kind of model, but SAP HANA currently (HANA 2 SPS03) only supports simple search conditions for those, and no sub-queries.
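To illustrate, using the STUFF table defined below (a sketch; the constraint names are made up): a simple search condition is accepted, while the sub-query our rule would need is not.

```sql
-- A simple search condition like this is fine:
alter table stuff add constraint status_range
    check (status between 0 and 200);

-- But the rule "status 100 at most once per ID" would need a
-- sub-query, which HANA's check constraints do not accept:
--
-- alter table stuff add constraint unique_final_status
--     check (status <> 100
--            or (select count(*) from stuff s
--                where s.id = id and s.status = 100) = 0);
```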
Trigger action
If you’ve read any of my comments and posts involving triggers, you have probably noticed that I try to avoid using them whenever possible.
Tom Kyte of Oracle made really good arguments against triggers (https://blogs.oracle.com/oraclemagazine/the-trouble-with-triggers) and these hold true for other DBMSs as well.
For the conditional unique constraint on SAP HANA, however, using triggers is one possible solution.
Let’s have a table
create column table stuff
(id integer, status integer, tx_seq integer);
Insert some data
create sequence tx_seq;
insert into stuff values (1, 10, tx_seq.nextval);
insert into stuff values (1, 100, tx_seq.nextval); -- <- final state = 100
insert into stuff values (2, 10, tx_seq.nextval);
insert into stuff values (2, 10, tx_seq.nextval); -- <- duplicate state = 10
insert into stuff values (1, 80, tx_seq.nextval);
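With this data in place, it is easy to see why the plain unique constraint is too strict (a sketch; the constraint name is made up):

```sql
-- A unique constraint on (ID, STATUS) covers every status value,
-- not just 100:
alter table stuff add constraint stuff_id_status_uq unique (id, status);
-- -> fails on the existing data, because item 2 legitimately has
--    status 10 twice. The rule only forbids a second status 100
--    per item, which this constraint cannot express.
```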
Create the trigger
The trigger should ensure that the data in the table cannot be changed in a way that breaks the “conditional unique” rule. That means it has to run before anything is written into the table. It also needs to run for each and every record, for both UPDATE and INSERT commands.
When the trigger finds that the table already contains an entry for the current ID with status 100, we want a proper error message.
The trigger for that could look like the following:
create trigger stuff_unique_status
before update or insert
on stuff
referencing new row as new
for each row
begin
    declare cnt integer;
    declare non_unique_final_status condition for SQL_ERROR_CODE 10001;
    declare exit handler for non_unique_final_status resignal;

    -- only check if status is 100
    if :new.status = 100 then
        select count(*) into cnt
        from stuff
        where status = 100
          and id = :new.id;

        if :cnt > 0 then
            SIGNAL non_unique_final_status
                set message_text = 'Final status 100 must be unique per ID. Affected ID: ' || :new.id;
        end if;
    end if;
end;
Does the trigger work?
Let’s insert data that would violate the condition:
insert into stuff values (1, 100, tx_seq.nextval);
/*
Could not execute 'insert into stuff values (1, 100, tx_seq.nextval)' in 53 ms 755 µs .
SAP DBTech JDBC: [10001]: user-defined error: "DEVDUDE"."STUFF_UNIQUE_STATUS": line 5 col 50 (at pos 274):
[10001] (range 3) user-defined error exception: Final status 100 must be unique per ID. Affected ID:1
*/
The same happens if we try to update data so that a duplicate would occur:
update stuff
set status = 100
where id = 1
and tx_seq = 1;
/*
Could not execute 'update stuff set status = 100 where id =1 and tx_seq=1' in 37 ms 66 µs .
SAP DBTech JDBC: [10001]: user-defined error: "DEVDUDE"."STUFF_UNIQUE_STATUS": line 5 col 50 (at pos 274):
[10001] (range 3) user-defined error exception: Final status 100 must be unique per ID. Affected ID:1
*/
We see that it paid off to create a custom condition and to include the ID value for which the exception was raised: it makes it easy to go back and review what caused the faulty data insert.
What about performance?
Having the DBMS run code for every record that gets modified in the table of course leads to longer processing times.
To get a feeling for the difference, let's insert some more data into our table and see how long it takes. Then we repeat the insert on a table without a trigger.
-- some data
insert into stuff
(select top 100000
row_number() over()
, to_integer(floor (rand()*100) )
, tx_seq.nextval
from objects cross join objects );
/*
Statement 'insert into stuff (select top 100000 row_number() over() , to_integer(floor (rand()*100) ) , ...'
successfully executed in 29.261 seconds (server processing time: 29.259 seconds) - Rows Affected: 100000
*/
compared to
create column table stuff2
(id integer, status integer, tx_seq integer);
insert into stuff2
(select top 100000
row_number() over()
, to_integer(floor (rand()*100) )
, tx_seq.nextval
from objects cross join objects );
/*
Statement 'insert into stuff2 (select top 100000 row_number() over() , to_integer(floor (rand()*100) ) , ...'
successfully executed in 6.071 seconds (server processing time: 6.070 seconds) - Rows Affected: 100000
*/
That’s 6 seconds versus 29 seconds with the trigger; quite a difference. If you’re doing mass data loading or high-speed transaction capturing, it might not be viable to keep the trigger active while loading the data.
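One pragmatic approach for bulk loads (a sketch; whether this fits depends on whether other sessions can write to the table in the meantime) is to drop the trigger for the load, validate the loaded data, and then re-create the trigger:

```sql
-- 1. remove the trigger for the duration of the load
drop trigger stuff_unique_status;

-- 2. ... run the mass insert here ...

-- 3. check that the loaded data did not break the rule:
--    any ID with more than one status-100 row is a violation
select id, count(*) as final_status_count
from stuff
where status = 100
group by id
having count(*) > 1;

-- 4. if this returns no rows, re-create the trigger as shown above
```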
What about adding an index?
Having a trigger active on a table adds runtime in itself and there does not seem to be a way around this. Even if I only load records with a status different from 100, the trigger is executed (to check whether or not the status is 100).
We can, however, allow for a potential speedup of the COUNT query by adding an index.
Since the query always targets one specific combination of ID and STATUS with an equality comparison, I use the inverted hash index type here to save memory.
create inverted hash index unique_final_status
on stuff (id, status);
Running the aggregate query in PlanViz confirms that the index is used for this query:
Name: Basic Predicate
ID: cs_plan2_hxehost_39003_qoTableSearch1_qoTableSearchNode1
Summary: (ID,STATUS) = (2500,100)
Schema: DEVDUDE
Execution Time (Inclusive): 5.925 ms
Execution Time (Exclusive): 5.925 ms
Execution Start Time: 7,127.854 ms
Execution End Time: 7,133.791 ms
Estimated Result Size: 1
Estimation Time: 0.011 ms
Evaluation Time: 5.914 ms
Index Usage:: using INDEX ON (ID,STATUS)
Inverted Index: used
That’s it.
There you go, now you know!