Paul picked a very ABAP-y problem to solve:
Finding records with fallback to a default value.
I won’t go into too much detail about where and how that is done in SAP ABAP-based solutions; just this:
For a group of records (like an order made up of order items), there are matching records in another table.
The rule to get those matching records is to check whether there is a full match for an item in the table.
If there is, this match should be used.
If there isn’t, a default entry (that is guaranteed to be there) should be used.
Paul’s example is about orders with order items that can have different payment terms. Each item can have its own payment terms; if it doesn’t, it uses the default payment terms defined by the “header” order item (which is item '00000000' in the order-items table).
It’s not hard to think of other examples for something like that.
Machine configurations could be such a thing.
Or variations of pizza toppings.
Olden days and olden ways
Paul explains that the typical ABAP approach was to simply try and read the specific payment terms for each order item and do a second read to get the default value if the first read did not get anything back.
That’s two ABAP <=> DB roundtrips for just one record.
Worse, the application does not add any logic here; the database (not just HANA) can very well determine on its own whether a SELECT returns something. No confirmation or double-check from the application is required.
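For readers without an ABAP system at hand, the two-roundtrip pattern can be sketched in plain application code. This is a minimal stand-in using Python and SQLite with made-up table and column names (not the actual SAP tables), assuming the guaranteed default row '00000000' per order:

```python
import sqlite3

# Minimal stand-in schema: payment terms per item, with a default row
# (item '00000000') per order. Names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item_data (order_number TEXT, item_number TEXT, payment_terms TEXT,
                            PRIMARY KEY (order_number, item_number));
    INSERT INTO item_data VALUES ('0000000001', '00000000', 'DEFAULT1'),
                                 ('0000000001', '00000013', 'SPECIAL');
""")

def payment_terms_two_reads(order_number, item_number):
    """The 'olden ways' pattern: one read per item, plus a fallback read."""
    row = conn.execute(
        "SELECT payment_terms FROM item_data "
        "WHERE order_number = ? AND item_number = ?",
        (order_number, item_number)).fetchone()
    if row is None:
        # second roundtrip, just to fetch the guaranteed default entry
        row = conn.execute(
            "SELECT payment_terms FROM item_data "
            "WHERE order_number = ? AND item_number = '00000000'",
            (order_number,)).fetchone()
    return row[0]

print(payment_terms_two_reads('0000000001', '00000013'))  # SPECIAL
print(payment_terms_two_reads('0000000001', '00000003'))  # DEFAULT1
```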
Newer ABAP language versions provide some syntactic sugar for this scenario:
they allow for specifying how a DEFAULT record should be computed.
That’s an improvement but only for making the ABAP code better express the intention of the developer.
Paul is solving the problem
Like probably many ABAPers would do, Paul created an AMDP (ABAP Managed Database Procedure) that mimics the ABAP logic 1:1.
As admitted in the post, this was his first SQLScript ever written.
Considering this, it’s a success as it delivers the correct results… sometimes.
The code reads the data for an order item into an SQLScript table variable, like so:

```sql
bdata = select ... from <table> where <conditions>;
```

Then, assuming there will be at most one record in the result set, a column (part of a table) is “extracted” into a variable (a scalar value):

```sql
ed_payment_terms = :bdata.paymentterms[1];

IF :ed_payment_terms <> '' THEN
    RETURN;
END IF;
```
The problem here is that if there is no record in bdata, accessing the table variable on index 1 will fail.
The style faux pas of using = instead of := can certainly be attributed to the lack of good programming guides/manuals for SQLScript developers.
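The failure mode is easy to reproduce outside of SQLScript. A small Python/SQLite analogue (hypothetical table names) shows what happens when you index into an empty result set:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item_data (item_number TEXT, payment_terms TEXT)")

# An empty result set, like bdata when there is no matching record.
rows = conn.execute(
    "SELECT payment_terms FROM item_data WHERE item_number = '00000042'"
).fetchall()

# Indexing the first entry of an empty result fails, just like
# :bdata.paymentterms[1] fails on an empty SQLScript table variable.
try:
    terms = rows[0][0]
except IndexError:
    terms = ''  # the guard the original procedure lacks
print(repr(terms))  # ''
```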
I agree with Paul’s sentiment that the SAP documentation is not really helpful here (and it’s not really supposed to be: SAP’s documentation is a legal document outlining the features and expected behaviours of the systems they sell).
The more SQL-y way of doing all this would be to use

```sql
select … INTO ed_payment_terms from <table> where <conditions>;
```

This would also fail with a NO DATA FOUND error if there is no matching record.
One way to deal with this is to implement some error handling in the procedure, which is verbose and tedious (but required in pre-HANA 2.3).
With recent HANA versions one can use the DEFAULT VALUE clause to safeguard against this error (see what SAP did here?):

```sql
select … INTO ed_payment_terms DEFAULT 'some default value' from <table> where <conditions>;
```
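Outside of SQLScript, the same fallback can be expressed in one statement with standard SQL. A sketch in Python/SQLite (made-up table names), using COALESCE over two scalar subqueries instead of HANA's DEFAULT clause:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item_data (order_number TEXT, item_number TEXT, payment_terms TEXT,
                            PRIMARY KEY (order_number, item_number));
    INSERT INTO item_data VALUES ('0000000001', '00000000', 'DEFAULT1'),
                                 ('0000000001', '00000013', 'SPECIAL');
""")

def payment_terms_one_read(order_number, item_number):
    # COALESCE falls back to the guaranteed default row when the
    # specific-item subquery yields NULL: one statement, one roundtrip.
    return conn.execute("""
        SELECT COALESCE(
                 (SELECT payment_terms FROM item_data
                   WHERE order_number = ? AND item_number = ?),
                 (SELECT payment_terms FROM item_data
                   WHERE order_number = ? AND item_number = '00000000'))
    """, (order_number, item_number, order_number)).fetchone()[0]

print(payment_terms_one_read('0000000001', '00000013'))  # SPECIAL
print(payment_terms_one_read('0000000001', '00000007'))  # DEFAULT1
```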
Don’t stop here though
The problem with this very common approach of duplicating ABAP/application logic in SQLScript is that it often only reduces data transfers between application and database.
It does not take advantage of what the SQL environment could do here to further increase performance.
One early commenter on the post provided a solution in ABAP CDS that performs multiple outer joins between the items table and the payment-terms table (the commenter uses the actual ABAP tables, but that doesn’t matter here) to first figure out whether an item has a “special” payment term or uses the default, and then provide the correct join key to a second join that actually computes the result set.
This also solves the problem and provides an easy-to-use database view that shows the correct match-ups for all items. It can be easily consumed in ABAP, does not require creating a stored procedure (which always bears the overhead of invoking the SQLScript engine in HANA, meaning there is a performance/resource penalty compared to “straight” SQL) and fits easily into the modern ABAP/CDS development flow.
So, that’s fine, isn’t it?
For the scenario Paul described, the CDS view approach is probably fine, can be thought up and implemented in an hour or so and is an improvement over the old-style ABAP approach.
There are of course alternative options and I looked into some of them, mostly a modelling exercise for myself.
I thought of:

- SQLScript SELECT INTO with error handling in case no data is found
  - only suitable for single-record lookups
  - added SQLScript overhead
- SQLScript SELECT INTO with DEFAULT clause
  - only suitable for single-record lookups
  - added SQLScript overhead
- SELECT with LEFT OUTER JOIN and a check for NULLs
  - potentially big overhead
- UNION with best-record select (i.e. MAX + JOIN or RANK/ROW_NUMBER() + FILTER)
  - potentially big overhead
  - also, UNION is not necessary, as all DEFAULT/special entries are in a single table anyhow
- keeping track of non-default entries in table b, i.e. having a flag there to indicate that there is a non-default setting
  - would require additional updates, maybe via triggers (referential constraints won’t do that; alternatively, the application code could)
  - would address the technical issue of not knowing whether there should be a match before doing the join/select on table b
  - would keep the storage requirements in table a and the number of records in table b low
- “unfolding” the default cases to ensure all records in table a have a distinct matching record in table b
  - leads to potentially many more records in table b (especially if records in table a most often use the default)
  - but column-store compression could reduce this again
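The LEFT OUTER JOIN plus NULL-check option from the list above can be sketched in standard SQL. A minimal Python/SQLite illustration with made-up table names, where COALESCE does the NULL check against a second join to the guaranteed default row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items     (order_number TEXT, item_number TEXT,
                            PRIMARY KEY (order_number, item_number));
    CREATE TABLE item_data (order_number TEXT, item_number TEXT, payment_terms TEXT,
                            PRIMARY KEY (order_number, item_number));
    INSERT INTO items     VALUES ('0000000001', '00000000'),
                                 ('0000000001', '00000001'),
                                 ('0000000001', '00000013');
    INSERT INTO item_data VALUES ('0000000001', '00000000', 'DEFAULT1'),
                                 ('0000000001', '00000013', 'SPECIAL');
""")

rows = conn.execute("""
    SELECT bi.item_number,
           COALESCE(spec.payment_terms, dflt.payment_terms) AS payment_terms
      FROM items bi
      LEFT JOIN item_data spec        -- NULL when the item has no own terms
             ON spec.order_number = bi.order_number
            AND spec.item_number  = bi.item_number
      JOIN item_data dflt             -- the guaranteed '00000000' default row
             ON dflt.order_number = bi.order_number
            AND dflt.item_number  = '00000000'
     ORDER BY bi.item_number
""").fetchall()
print(rows)
# [('00000000', 'DEFAULT1'), ('00000001', 'DEFAULT1'), ('00000013', 'SPECIAL')]
```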
Thinking good, testing better
Then I went and did a test implementation for the ones that I found promising.
This step – implementing a version and testing it out – is something I can only recommend.
It is so easy to fall in love with a clever solution in your head.
If it does not survive a test, it shouldn’t get into production.
Like Paul, I wanted some data to play with.
I went, however, for some more data than he did (the MONSTER table naming is inspired by Paul’s post):
```sql
create column table MONSTER_BUSINESS_ITEMS
    ( client       NVARCHAR( 3) NOT NULL
    , order_number NVARCHAR(10) NOT NULL
    , item_number  NVARCHAR( 8) NOT NULL
    , item_info    NVARCHAR(80) NOT NULL DEFAULT ''
    , primary key (client, order_number, item_number));

create column table MONSTER_BUSINESS_ITEM_DATA
    ( client        NVARCHAR( 3) NOT NULL
    , order_number  NVARCHAR(10) NOT NULL
    , item_number   NVARCHAR( 8) NOT NULL
    , payment_terms NVARCHAR(10) NOT NULL
    , primary key (client, order_number, item_number));

--- generate some test data
--- 10000 orders with 20 items each
insert into MONSTER_BUSINESS_ITEMS (
    select
          '007' as client
        , ABAP_NUMC(orders.generated_period_start, 10) as order_number
        , ABAP_NUMC(items.generated_period_start, 8)   as item_number
        , 'info for item ' || orders.generated_period_start
              || '/' || items.generated_period_start   as item_info
    from
               series_generate_integer (1, 1, 10001) orders
    cross join series_generate_integer (1, 0, 20)    items
);

select count(*) from monster_business_items;
-- 200.000

-- create DEFAULT entries for all orders (=> item_number = '00000000')
-- as different orders may have different DEFAULTs, just add variety
insert into MONSTER_BUSINESS_ITEM_DATA (
    select
          client
        , order_number
        , item_number
        , map(mod(order_number, 3)
              , 0, 'DEFAULT0'
              , 1, 'DEFAULT1'
              , 2, 'DEFAULT2'
              ,    'DEFAULTX') as payment_terms
    from monster_business_items
    where item_number = ABAP_NUMC(0, 8));

-- create NON-DEFAULT entries for 5% of all items with item_number != '00000000'
-- remember to insert NON-DEFAULT values!
insert into MONSTER_BUSINESS_ITEM_DATA (
    select
          client
        , order_number
        , item_number
        , 'SPECIAL'
    from monster_business_items TABLESAMPLE BERNOULLI (5)
    where item_number != ABAP_NUMC(0, 8)
    order by client, order_number, item_number
);
-- Rows Affected: 9443
```
Note: I deliberately made use of TABLESAMPLE, as it belongs to the functions that have been available in HANA for some time but barely get used. Even worse, developers build their own versions of such functions because they are not aware of the existing ones.
Ready, steady, go
Now, we have 200.000 order_items and about 5% of them have non-default payment terms.
The data looks like this:
```
CLIENT  ORDER_NUMBER  ITEM_NUMBER  ITEM_INFO
007     0000000001    00000019     info for item 1/19
007     0000000001    00000018     info for item 1/18
007     0000000001    00000017     info for item 1/17
007     0000000001    00000016     info for item 1/16
007     0000000001    00000015     info for item 1/15
007     0000000001    00000014     info for item 1/14
007     0000000001    00000012     info for item 1/12
007     0000000001    00000011     info for item 1/11
007     0000000001    00000010     info for item 1/10
007     0000000001    00000009     info for item 1/9
007     0000000001    00000007     info for item 1/7
007     0000000001    00000006     info for item 1/6
007     0000000001    00000005     info for item 1/5
007     0000000001    00000004     info for item 1/4
007     0000000001    00000003     info for item 1/3
007     0000000001    00000002     info for item 1/2
007     0000000001    00000001     info for item 1/1
007     0000000001    00000000     info for item 1/0
007     0000000001    00000008     info for item 1/8
007     0000000001    00000013     info for item 1/13
[...]

CLIENT  ORDER_NUMBER  ITEM_NUMBER  PAYMENT_TERMS
007     0000000001    00000000     DEFAULT1
007     0000000001    00000013     SPECIAL
007     0000000001    00000008     SPECIAL
007     0000000002    00000000     DEFAULT2
007     0000000002    00000017     SPECIAL
007     0000000003    00000000     DEFAULT0
007     0000000004    00000000     DEFAULT1
007     0000000005    00000000     DEFAULT2
007     0000000006    00000000     DEFAULT0
007     0000000007    00000000     DEFAULT1
[...]
```
For ORDER_NUMBER 1, all items but ITEM_NUMBER 8 and 13 have the DEFAULT1 payment terms.
OPTION: BEST MATCH
```sql
-- best record select (i.e. MAX + JOIN or RANK/ROW_NUMBER() + FILTER)
-- requires that there is an entry for the item_number to work
-- this utilizes the ordering of item_numbers and '00000000' indicating the default item
-- basically, it sorts all item_numbers of a specific order number and picks the "first" one
-- which means the "smallest" value as we order in ascending order and put NULLS last
select
      bi.client, bi.order_number, bi.item_number
    , bid.payment_terms as pay_terms_by_simple_join
    , MAP (bid.payment_terms
           , NULL , FIRST_VALUE(bid.payment_terms)
                    OVER (PARTITION BY bi.client, bi.order_number
                          ORDER BY bi.client, bi.order_number, bi.item_number ASC
                          NULLS LAST)
           , bid.payment_terms
          ) as pay_terms_with_default_fallback
from
                monster_business_items bi
left outer join monster_business_item_data bid
    on  (bi.client, bi.order_number, bi.item_number)
      = (bid.client, bid.order_number, bid.item_number)
order by bi.client, bi.order_number, bi.item_number;
```

```
CLIENT  ORDER_NUMBER  ITEM_NUMBER  PAY_TERMS_BY_SIMPLE_JOIN  PAY_TERMS_WITH_DEFAULT_FALLBACK
007     0000000001    00000000     DEFAULT1                  DEFAULT1
007     0000000001    00000001     ?                         DEFAULT1
007     0000000001    00000002     ?                         DEFAULT1
007     0000000001    00000003     ?                         DEFAULT1
007     0000000001    00000004     ?                         DEFAULT1
007     0000000001    00000005     ?                         DEFAULT1
007     0000000001    00000006     ?                         DEFAULT1
007     0000000001    00000007     ?                         DEFAULT1
007     0000000001    00000008     SPECIAL                   SPECIAL
007     0000000001    00000009     ?                         DEFAULT1
007     0000000001    00000010     ?                         DEFAULT1
007     0000000001    00000011     ?                         DEFAULT1
007     0000000001    00000012     ?                         DEFAULT1
007     0000000001    00000013     SPECIAL                   SPECIAL
007     0000000001    00000014     ?                         DEFAULT1
007     0000000001    00000015     ?                         DEFAULT1
007     0000000001    00000016     ?                         DEFAULT1
007     0000000001    00000017     ?                         DEFAULT1
007     0000000001    00000018     ?                         DEFAULT1
007     0000000001    00000019     ?                         DEFAULT1
[...]
```
This approach performs only a single LEFT OUTER JOIN and then uses MAP (the same thing as CASE) to handle the empty matches.
For those records where there was no specific entry in the ITEM_DATA table, we want to use whatever value is used for the “header item” with ITEM_NUMBER 0.
This is done with the WINDOW clause and the FIRST_VALUE analytic function.
This function creates a “window” for the current row that includes all rows with the same CLIENT and ORDER_NUMBER (PARTITION BY bi.client, bi.order_number).
This data is then sorted by CLIENT, ORDER_NUMBER and ITEM_NUMBER in ascending order, with all NULL values at the end of the sorted data (ORDER BY … NULLS LAST).
The FIRST_VALUE function picks, well, the first value, which will be the default payment term for this ORDER_NUMBER.
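This fallback logic can be reproduced in any database with window-function support. A minimal Python/SQLite sketch (requires SQLite 3.25+; COALESCE stands in for the MAP NULL check, and NULLS LAST is omitted since the '00000000' row always sorts first and always has a value):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items     (order_number TEXT, item_number TEXT,
                            PRIMARY KEY (order_number, item_number));
    CREATE TABLE item_data (order_number TEXT, item_number TEXT, payment_terms TEXT,
                            PRIMARY KEY (order_number, item_number));
    INSERT INTO items     VALUES ('0000000001', '00000000'),
                                 ('0000000001', '00000001'),
                                 ('0000000001', '00000013');
    INSERT INTO item_data VALUES ('0000000001', '00000000', 'DEFAULT1'),
                                 ('0000000001', '00000013', 'SPECIAL');
""")

rows = conn.execute("""
    SELECT bi.item_number,
           COALESCE(bid.payment_terms,          -- own terms win when present
                    FIRST_VALUE(bid.payment_terms) OVER (
                        PARTITION BY bi.order_number
                        ORDER BY bi.item_number)) AS payment_terms
      FROM items bi
      LEFT JOIN item_data bid
             ON bid.order_number = bi.order_number
            AND bid.item_number  = bi.item_number
     ORDER BY bi.item_number
""").fetchall()
print(rows)
# [('00000000', 'DEFAULT1'), ('00000001', 'DEFAULT1'), ('00000013', 'SPECIAL')]
```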
When packaged up into a view this can be used like a normal table:
```sql
create view monster_business_items_pay_terms_WINDOW as (
    select
          bi.client, bi.order_number, bi.item_number
        , bid.payment_terms as pay_terms_by_simple_join
        , MAP (bid.payment_terms
               , NULL , FIRST_VALUE(bid.payment_terms)
                        OVER (PARTITION BY bi.client, bi.order_number
                              ORDER BY bi.client, bi.order_number, bi.item_number ASC
                              NULLS LAST)
               , bid.payment_terms
              ) as pay_terms_with_default_fallback
    from
                    monster_business_items bi
    left outer join monster_business_item_data bid
        on  (bi.client, bi.order_number, bi.item_number)
          = (bid.client, bid.order_number, bid.item_number)
);

select
      CLIENT, ORDER_NUMBER, ITEM_NUMBER
    , PAY_TERMS_BY_SIMPLE_JOIN
    , PAY_TERMS_WITH_DEFAULT_FALLBACK
from monster_business_items_pay_terms_WINDOW
where client       = '007'
  and order_number = ABAP_NUMC(1, 10)
  and item_number  = ABAP_NUMC(3, 8);  -- (ABAP_NUMC(13, 8))
```

```
CLIENT  ORDER_NUMBER  ITEM_NUMBER  PAY_TERMS_BY_SIMPLE_JOIN  PAY_TERMS_WITH_DEFAULT_FALLBACK
007     0000000100    00000003     ?                         DEFAULT1

CLIENT  ORDER_NUMBER  ITEM_NUMBER  PAY_TERMS_BY_SIMPLE_JOIN  PAY_TERMS_WITH_DEFAULT_FALLBACK
007     0000000001    00000013     SPECIAL                   SPECIAL
```
Attacking the core problem – Loss of knowledge
Taking a step back from fancy SQL implementation tricks, the core of this problem is in fact not the “double-dipping” of the application code.
That really is only the symptom of the actual issue.
What we have here is a situation of knowledge loss:
When the user of the application decided that ITEM_NUMBER 13 should have its own payment terms, the system and the user “knew” explicitly that this was the case.
The user would not have to look up again if there is a special payment term for item 13 to recall this fact.
Yet, the application (and data-) designers decided to throw away this information and rely on looking it up.
What if – for some reason – that entry for ITEM_NUMBER 13 got removed from the MONSTER_BUSINESS_ITEM_DATA table?
Nobody would ever know that there was supposed to be an entry.
And all our implementation efforts up to here basically tried to work out this little piece of information (whether the default or the special terms apply) and then fetch the data accordingly.
What if we never lost that information?
Could we do better?
Of course, we can!
All it takes is a sort of indicator or flag for every ITEM_NUMBER that represents this bit of information.
This indicator could easily be maintained by the application that works on this data anyhow; when inserting data into the ITEM_DATA table it could just update the flag.
Alternatively, one could retrofit database triggers (no, no, no!) that would change the flag whenever there are inserts/deletes on the ITEM_DATA table.
Let’s assume the maintenance has been taken care of and we can use the flag. What would that look like?
Add the flag – “default” for using the default conditions:

```sql
alter table monster_business_items
    add (DEFAULT_CONDITIONS_APPLY NVARCHAR(1) NOT NULL DEFAULT 'X');
```
Delete the flag for those with special conditions:

```sql
update monster_business_items bi
   set bi.default_conditions_apply = ''
 where exists (select *
                 from monster_business_item_data bid
                where (bi.client, bi.order_number, bi.item_number)
                    = (bid.client, bid.order_number, bid.item_number));
```
Finally, joining data with confidence
```sql
-- now we know by the flag if we need to join on '00000000' or the item_number
select
      bi.client, bi.order_number, bi.item_number
    , bi.default_conditions_apply
    , bid.payment_terms as pay_terms_by_simple_join
from
           monster_business_items bi
inner join monster_business_item_data bid
    on  (bi.client, bi.order_number) = (bid.client, bid.order_number)
    and bid.item_number = MAP (bi.default_conditions_apply
                               , 'X', ABAP_NUMC(0, 8)
                               , '' , bi.item_number)
order by bi.client, bi.order_number, bi.item_number;
```
Again, I put this into a view:
```sql
create view monster_business_items_pay_terms_FLAG as (
    select
          bi.client, bi.order_number, bi.item_number
        , bi.default_conditions_apply
        , bid.payment_terms as pay_terms_by_simple_join
    from
               monster_business_items bi
    inner join monster_business_item_data bid
        on  (bi.client, bi.order_number) = (bid.client, bid.order_number)
        and bid.item_number = MAP (bi.default_conditions_apply
                                   , 'X', ABAP_NUMC(0, 8)
                                   , '' , bi.item_number)
);
```
and check the result:
```
CLIENT  ORDER_NUMBER  ITEM_NUMBER  PAY_TERMS_BY_SIMPLE_JOIN  DEFAULT_CONDITIONS_APPLY
007     0000000001    00000003     DEFAULT1                  X

CLIENT  ORDER_NUMBER  ITEM_NUMBER  PAY_TERMS_BY_SIMPLE_JOIN  DEFAULT_CONDITIONS_APPLY
007     0000000001    00000013     SPECIAL
```
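A compact end-to-end sketch of the flag approach, again in Python/SQLite with made-up table names (CASE stands in for HANA's MAP):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items     (order_number TEXT, item_number TEXT,
                            default_conditions_apply TEXT NOT NULL DEFAULT 'X',
                            PRIMARY KEY (order_number, item_number));
    CREATE TABLE item_data (order_number TEXT, item_number TEXT, payment_terms TEXT,
                            PRIMARY KEY (order_number, item_number));
    INSERT INTO items (order_number, item_number)
         VALUES ('0000000001', '00000000'),
                ('0000000001', '00000001'),
                ('0000000001', '00000013');
    INSERT INTO item_data VALUES ('0000000001', '00000000', 'DEFAULT1'),
                                 ('0000000001', '00000013', 'SPECIAL');

    -- clear the flag wherever an own entry exists (mirrors the UPDATE ... EXISTS)
    UPDATE items SET default_conditions_apply = ''
     WHERE EXISTS (SELECT 1 FROM item_data d
                    WHERE d.order_number = items.order_number
                      AND d.item_number  = items.item_number);
""")

# the flag picks the join key up front, so a plain INNER JOIN suffices
rows = conn.execute("""
    SELECT bi.item_number, bid.payment_terms
      FROM items bi
      JOIN item_data bid
        ON bid.order_number = bi.order_number
       AND bid.item_number  = CASE bi.default_conditions_apply
                                   WHEN 'X' THEN '00000000'
                                   ELSE bi.item_number
                              END
     ORDER BY bi.item_number
""").fetchall()
print(rows)
# [('00000000', 'DEFAULT1'), ('00000001', 'DEFAULT1'), ('00000013', 'SPECIAL')]
```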
Comparing the approaches
With PlanViz, this “FLAG” approach used ~1.8 ms/1.3 MB.
The former approach with FIRST_VALUE clocked in at ~3.0 ms/1.5 MB.
(all values are reduced to a simple average – do not mistake that for proper performance testing!)
That is quite a difference for a statement that just returns a single record.
I have not measured the approaches presented in the blog post but I dare say they would probably be less efficient.
Now, I hear you asking: what does the DEFAULT_CONDITIONS_APPLY column cost us?
HANA memory is expensive and we cannot waste it…
Since the column only has two distinct values ('X' and '') it is highly compressible.
For the 200.000 entries in my MONSTER_BUSINESS_ITEMS table (and 5% special conditions used), this column takes a whopping 55KB.
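A quick back-of-envelope check makes that figure plausible: with dictionary encoding, a two-valued column needs about one bit per row, plus dictionary and metadata overhead. Rough numbers (simplified, ignoring NVARCHAR width and the delta store):

```python
rows = 200_000

# one single-character value per row; assume 1 byte per stored character
uncompressed_kb = rows * 1 / 1024

# dictionary encoding: two distinct values need only 1 bit per row
bit_packed_kb = rows / 8 / 1024

print(round(uncompressed_kb), round(bit_packed_kb))  # 195 24
```

The observed 55KB sits between the bit-packed minimum and the uncompressed size, which is what one would expect once column metadata is included.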
That, of course, is outrageous if you plan to manage the data on your C64, but you would already be OK if your parents got you that awesome AMIGA 500+ 🙂
There you go, now you know.