Find the ONE … or the other

In his recent blog post “Code Pushdown”, Paul Hardy, a long-time SAP community contributor, described his first foray into HANA SQLScript programming.

Paul picked a very ABAP-y problem to solve:
Finding records with fallback to a default value.

I won’t go into too much detail about where and how that is done in SAP ABAP-based solutions; just this:
For a group of records (like an order made up of order items), there are matching records in another table.
The rule to get those matching records is to check whether there is a full match for an item in the table.
If there is, this match should be used.
If there isn’t, a default entry (that is guaranteed to be there) should be used instead.

Paul’s example is about orders with order items that can have different payment terms. Each item can have its own payment terms but will otherwise use the default payment terms defined by the “header” order item (item '00000000' in the order items table).

It’s not hard to think of other examples for something like that.
Machine configurations could be such a thing.
Or variations of pizza toppings.

Olden days and olden ways

Paul explains that the typical ABAP approach was to simply try and read the specific payment terms for each order item and do a second read to get the default value if the first read did not get anything back.

That’s two ABAP <=> DB roundtrips for just one record.
Worse, the application does not add any logic here; the database (not just HANA) can very well determine whether a SELECT returns something or not on its own. No confirmation or double-check from the application required.

Newer ABAP language versions provide some syntactic sugar for this scenario:
they allow for specifying how a DEFAULT record should be computed.
That’s an improvement, but only in making the ABAP code better express the developer’s intention.

Paul is solving the problem

Like probably many ABAPers would do, Paul created an AMDP (ABAP Managed Database Procedure) that mimics the ABAP logic 1:1.
As admitted in the post, this was his first SQLScript ever written.
Considering this, it’s a success as it delivers the correct results… sometimes.

The code reads the data for an order item into an SQLScript table variable, like so:

bdata = select ... from <table> where <conditions>;

Then, assuming there will be at most one record in the result set, a column (part of a table) is “extracted” into a variable (a scalar value).

ed_payment_terms = :bdata.paymentterms[1];

IF :ed_payment_terms <> '' THEN 
    RETURN;
END IF;

The problem here is that if there is no record in bdata, accessing the table variable on index 1 will fail.
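A defensive variant of the snippet could guard the index access instead of failing on an empty result. This is a sketch only, assuming the SQLScript IS_EMPTY predicate for table variables (available in newer HANA revisions):

```sql
-- sketch: only touch index [1] when the table variable actually has rows
bdata = select paymentterms from <table> where <conditions>;

IF IS_EMPTY(:bdata) THEN
    ed_payment_terms := '';   -- signal "no specific entry found"
ELSE
    ed_payment_terms := :bdata.paymentterms[1];
END IF;
```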

The style faux-pas of using = instead of := can certainly be attributed to the lack of good programming guides/manuals for SQLScript developers.

I agree with Paul’s sentiment that the SAP documentation is not really helpful here (and it’s not really supposed to be: SAP’s documentation is a legal document outlining the features and expected behaviors of the systems they sell).

The more SQL-y way of doing all this would be to use

select
     …
INTO ed_payment_terms
from <table>
where <conditions>;

This would also fail with a NO DATA FOUND error if there is no matching record.
One way to deal with this is to implement some error handling in the procedure, which is verbose and tedious (but required on HANA versions that predate the DEFAULT clause).
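The error-handling variant looks roughly like this. A sketch only: 1299 is, as far as I recall, HANA’s “no data found” error code, so verify it for your release, and remember that handler declarations belong at the top of the procedure block:

```sql
-- sketch: catch the NO DATA FOUND error and fall back to the default
-- (handler declarations must come before the statements of the block)
DECLARE EXIT HANDLER FOR SQL_ERROR_CODE 1299
    ed_payment_terms := 'some default value';

select payment_terms
  INTO ed_payment_terms
  from <table>
 where <conditions>;
```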
With recent HANA versions one can use the DEFAULT clause to safeguard against this error (see what SAP did here?):

select
     …
INTO ed_payment_terms DEFAULT 'some default value'
from <table>
where <conditions>;

Don’t stop here though

The problem with this very common approach of duplicating ABAP/application logic in SQLScript is that it often only reduces data transfers between application and database.
It does not take advantage of what the SQL environment could do to further increase performance.

One early commenter on the post provided a solution in ABAP CDS that performs multiple outer joins between the items table and the payment terms table (the commenter uses the actual ABAP tables, but that doesn’t matter here) to
first figure out whether an item has a “special” payment term or uses the default and
then provide the correct join key to a second join that actually computes the result set.

This also solves the problem and provides an easy-to-use database view that shows the correct match-ups for all items. It can be easily consumed in ABAP, does not require creating a stored procedure (which always bears the overhead of invoking the SQLScript engine in HANA, i.e. a performance/resource penalty compared to “straight” SQL), and fits easily into the modern ABAP/CDS development flow.
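The commenter’s actual CDS source is not reproduced here, but the underlying idea expressed in plain SQL looks roughly like this (a sketch with hypothetical table names items / item_data):

```sql
SELECT i.client, i.order_number, i.item_number
     , COALESCE(special.payment_terms, dflt.payment_terms) AS payment_terms
FROM items i
-- first join: exact match on the item
LEFT OUTER JOIN item_data special
  ON  (i.client, i.order_number, i.item_number)
    = (special.client, special.order_number, special.item_number)
-- second join: the guaranteed "header" default entry
LEFT OUTER JOIN item_data dflt
  ON  (i.client, i.order_number) = (dflt.client, dflt.order_number)
  AND dflt.item_number = '00000000';
```

COALESCE picks the special entry when the first join matched and falls back to the default entry otherwise.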

So, that’s fine, isn’t it?

For the scenario Paul described, the CDS view approach is probably fine, can be thought up and implemented in an hour or so and is an improvement over the old-style ABAP approach.

There are of course alternative options and I looked into some of them, mostly a modeling exercise for myself.
I thought of

  • SQLScript SELECT INTO with error handling in case no data is found
    • only suitable for single record lookups
    • added SQLScript overhead
  • SQLScript SELECT INTO with DEFAULT clause
    • only suitable for single record lookups
    • added SQLScript overhead
  • SELECT with LEFT OUTER JOIN and check for NULLs
    • potentially big overhead
  • UNION with best record select (i.e. MAX + JOIN or RANK/ROW_NUMBER() + FILTER)
    • potentially big overhead
    • also, UNION not necessary as all DEFAULT/special entries are in a single table anyhow
  • keep track of non-default entries in table b, i.e. have a flag there to indicate that there is a non-default setting
    • would require additional updates, maybe via triggers (referential constraints won’t do that; alternatively, the application code could do it)
    • would address the technical issue of not knowing if there should be a match before doing the join/select on table b
    • would keep storage requirements in table a and the number of records in table b low
  • “unfold” the default cases to ensure all records in table a have a distinct matching record in table b
    • leads to potentially many more records in table b (especially if records in table a most often use the default)
    • but column store compression could reduce this again
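To make one of these concrete: the “best record select” option could be sketched roughly like this (hypothetical table names items / item_data; the specific entry wins over the '00000000' default simply because it sorts higher):

```sql
-- rank candidate matches per item: the exact item_number entry (if any)
-- sorts above the '00000000' header default, so rn = 1 is the best match
SELECT client, order_number, item_number, payment_terms
FROM (
    SELECT i.client, i.order_number, i.item_number, d.payment_terms
         , ROW_NUMBER() OVER (PARTITION BY i.client, i.order_number, i.item_number
                              ORDER BY d.item_number DESC) AS rn
    FROM items i
    INNER JOIN item_data d
      ON  d.client = i.client
      AND d.order_number = i.order_number
      AND d.item_number IN (i.item_number, '00000000')
) ranked
WHERE rn = 1;
```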

Thinking good, testing better

Then I went and did a test implementation for the ones that I found promising.
This step – implementing a version and testing it out – is something I can only recommend.
It’s so easy to fall in love with a clever solution in your head.
If it does not survive a test, it shouldn’t get into production.

Test setup

Like Paul, I want some data to play with.
I went, however, for some more data than he did (MONSTER table naming inspired by Paul’s post):

create column table MONSTER_BUSINESS_ITEMS
	( client 	    NVARCHAR( 3) NOT NULL
	, order_number 	NVARCHAR(10) NOT NULL
	, item_number	NVARCHAR( 8) NOT NULL
	, item_info 	NVARCHAR(80) NOT NULL DEFAULT ''
	, primary key  (client, order_number, item_number));

create column table MONSTER_BUSINESS_ITEM_DATA
	( client 	    NVARCHAR( 3) NOT NULL
	, order_number 	NVARCHAR(10) NOT NULL
	, item_number	NVARCHAR( 8) NOT NULL
	, payment_terms NVARCHAR(10) NOT NULL
	, primary key  (client, order_number, item_number));

--- generate some test data
--- 10000 orders with 20 items each
insert into MONSTER_BUSINESS_ITEMS (
	select
		  '007' as client
		, ABAP_NUMC(orders.generated_period_start, 10) as order_number 
		, ABAP_NUMC(items.generated_period_start, 8) as item_number
		, 'info for item ' 
			|| orders.generated_period_start 
			||'/'|| items.generated_period_start as item_info
	from
		series_generate_integer (1, 1, 10001) orders
		cross join
		series_generate_integer (1, 0, 20) items
		);
	
-- create DEFAULT entries for all orders (=> item_number = '00000000')
-- as different orders may have different DEFAULTs, just add variety

insert into MONSTER_BUSINESS_ITEM_DATA (
	select 
		   client
		 , order_number
		 , item_number
		 , map(mod(order_number, 3)
		 	   , 0, 'DEFAULT0'
		 	   , 1, 'DEFAULT1'
		 	   , 2, 'DEFAULT2'
		 	      , 'DEFAULTX') as payment_terms
	from
		monster_business_items	
	where
		item_number = ABAP_NUMC(0, 8));

select count(*) from  monster_business_items; -- 200.000

-- create NON-DEFAULT entries for 5% of all items with item_number != '00000000'
-- remember to insert NON-DEFAULT values!

insert into MONSTER_BUSINESS_ITEM_DATA (
	select 
		   client
		 , order_number
		 , item_number
		 , 'SPECIAL'
	from
		monster_business_items TABLESAMPLE BERNOULLI ( 5 ) 
	where
		item_number != ABAP_NUMC(0, 8)
	order by
		client, order_number, item_number
);		
--  - Rows Affected: 9443 

Note: I deliberately made use of SERIES_GENERATE_INTEGER, ABAP_NUMC and TABLESAMPLE, as these belong to the features that have been available in HANA for some time but barely get used. Even worse, developers build their own versions of such functions because they are not aware of the existing ones.

Ready, steady, go

Now, we have 200.000 order_items and about 5% of them have non-default payment terms.

The data looks like this:

CLIENT	ORDER_NUMBER	ITEM_NUMBER	ITEM_INFO         
007   	0000000001  	00000019   	info for item 1/19
007   	0000000001  	00000018   	info for item 1/18
007   	0000000001  	00000017   	info for item 1/17
007   	0000000001  	00000016   	info for item 1/16
007   	0000000001  	00000015   	info for item 1/15
007   	0000000001  	00000014   	info for item 1/14
007   	0000000001  	00000012   	info for item 1/12
007   	0000000001  	00000011   	info for item 1/11
007   	0000000001  	00000010   	info for item 1/10
007   	0000000001  	00000009   	info for item 1/9 
007   	0000000001  	00000007   	info for item 1/7 
007   	0000000001  	00000006   	info for item 1/6 
007   	0000000001  	00000005   	info for item 1/5 
007   	0000000001  	00000004   	info for item 1/4 
007   	0000000001  	00000003   	info for item 1/3 
007   	0000000001  	00000002   	info for item 1/2 
007   	0000000001  	00000001   	info for item 1/1 
007   	0000000001  	00000000   	info for item 1/0 
007   	0000000001  	00000008   	info for item 1/8 
007   	0000000001  	00000013   	info for item 1/13
[...]

CLIENT	ORDER_NUMBER	ITEM_NUMBER	PAYMENT_TERMS
007   	0000000001  	00000000   	DEFAULT1     
007   	0000000001  	00000013   	SPECIAL      
007   	0000000001  	00000008   	SPECIAL      

007   	0000000002  	00000000   	DEFAULT2     
007   	0000000002  	00000017   	SPECIAL      
007   	0000000003  	00000000   	DEFAULT0     
007   	0000000004  	00000000   	DEFAULT1     
007   	0000000005  	00000000   	DEFAULT2     
007   	0000000006  	00000000   	DEFAULT0     
007   	0000000007  	00000000   	DEFAULT1     
[...]

E.g. for ORDER_NUMBER 1 all but ITEM_NUMBER 8 and 13 have the DEFAULT1 payment terms.

OPTION: BEST MATCH

-- best record select (i.e. MAX + JOIN or RANK/ROW_NUMBER() + FILTER)
-- requires that there is an entry for the item_number to work
-- this utilizes the ordering of item_numbers and '00000000' indicating the default item
-- basically it sorts all item_numbers of a specific order number and picks the "first" one
-- which means the "smallest" value as we order in ascending order and put NULLS last

select
	    bi.client, bi.order_number, bi.item_number
	  , bid.payment_terms 			as pay_terms_by_simple_join

	, MAP (bid.payment_terms
			, NULL
			, FIRST_VALUE(bid.payment_terms)
			  OVER (PARTITION BY
						bi.client, bi.order_number
			        ORDER BY
		 		  		bi.client, bi.order_number, bi.item_number ASC
				  		NULLS LAST)
			, bid.payment_terms ) as pay_terms_with_default_fallback

from
					monster_business_items 	   bi
	left outer join monster_business_item_data bid
	on (bi.client, bi.order_number, bi.item_number) =
		(bid.client, bid.order_number, bid.item_number)
order by
	bi.client, bi.order_number, bi.item_number;

CLIENT	ORDER_NUMBER	ITEM_NUMBER	PAY_TERMS_BY_SIMPLE_JOIN	PAY_TERMS_WITH_DEFAULT_FALLBACK
007   	0000000001  	00000000   	DEFAULT1                	DEFAULT1
007   	0000000001  	00000001   	?                       	DEFAULT1
007   	0000000001  	00000002   	?                       	DEFAULT1
007   	0000000001  	00000003   	?                       	DEFAULT1
007   	0000000001  	00000004   	?                       	DEFAULT1
007   	0000000001  	00000005   	?                       	DEFAULT1
007   	0000000001  	00000006   	?                       	DEFAULT1
007   	0000000001  	00000007   	?                       	DEFAULT1
007   	0000000001  	00000008   	SPECIAL                 	SPECIAL
007   	0000000001  	00000009   	?                       	DEFAULT1
007   	0000000001  	00000010   	?                       	DEFAULT1
007   	0000000001  	00000011   	?                       	DEFAULT1
007   	0000000001  	00000012   	?                       	DEFAULT1
007   	0000000001  	00000013   	SPECIAL                 	SPECIAL
007   	0000000001  	00000014   	?                       	DEFAULT1
007   	0000000001  	00000015   	?                       	DEFAULT1
007   	0000000001  	00000016   	?                       	DEFAULT1
007   	0000000001  	00000017   	?                       	DEFAULT1
007   	0000000001  	00000018   	?                       	DEFAULT1
007   	0000000001  	00000019   	?                       	DEFAULT1
[...]

This approach performs only a single LEFT OUTER JOIN and then uses MAP (which works like a compact CASE) to handle the empty matches.
For those records where there was no specific entry in the ITEM_DATA table, we want to use whatever value is set for the “header item” with ITEM_NUMBER '00000000'.
This is done here with a window specification and the FIRST_VALUE analytic function.

This function creates a “window” for the current row that includes all rows with the same CLIENT and ORDER_NUMBER (PARTITION BY).

Then this data is sorted by CLIENT, ORDER_NUMBER and ITEM_NUMBER in ascending order, with all NULL values at the end of the sorted data (ORDER BY … NULLS LAST).

Finally, the FIRST_VALUE function picks, well, the first value, which will be the default payment term for this ORDER_NUMBER.
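As an aside, MAP with a NULL branch is effectively a null-fallback, so the same expression could presumably be written more compactly with COALESCE (a sketch; same semantics assumed):

```sql
-- equivalent to the MAP construct: take the joined value if present,
-- otherwise the first (= header/default) value of the order's window
COALESCE( bid.payment_terms
        , FIRST_VALUE(bid.payment_terms)
          OVER (PARTITION BY bi.client, bi.order_number
                ORDER BY bi.client, bi.order_number, bi.item_number ASC
                NULLS LAST) ) AS pay_terms_with_default_fallback
```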

When packaged up into a view this can be used like a normal table:

create view monster_business_items_pay_terms_WINDOW as
(select
	  bi.client, bi.order_number, bi.item_number
	  , bid.payment_terms 			as pay_terms_by_simple_join

	, MAP (bid.payment_terms
			, NULL
			, FIRST_VALUE(bid.payment_terms)
			  OVER (PARTITION BY
						bi.client, bi.order_number
			        ORDER BY
		 		  		bi.client, bi.order_number, bi.item_number ASC
				  		NULLS LAST)
			, bid.payment_terms ) as pay_terms_with_default_fallback

from
					monster_business_items 	   bi
	left outer join monster_business_item_data bid
	on (bi.client, bi.order_number, bi.item_number) =
		(bid.client, bid.order_number, bid.item_number)
);


select
	   CLIENT, ORDER_NUMBER, ITEM_NUMBER
     , PAY_TERMS_BY_SIMPLE_JOIN
     , PAY_TERMS_WITH_DEFAULT_FALLBACK
from
	monster_business_items_pay_terms_WINDOW
where
		client = '007'
	and order_number = ABAP_NUMC(1, 10)
	and item_number = ABAP_NUMC(3, 8); -- (ABAP_NUMC (13, 8))

CLIENT	ORDER_NUMBER	ITEM_NUMBER	PAY_TERMS_BY_SIMPLE_JOIN	PAY_TERMS_WITH_DEFAULT_FALLBACK
007   	0000000001  	00000003   	?                       	DEFAULT1

CLIENT	ORDER_NUMBER	ITEM_NUMBER	PAY_TERMS_BY_SIMPLE_JOIN	PAY_TERMS_WITH_DEFAULT_FALLBACK
007   	0000000001  	00000013   	SPECIAL                 	SPECIAL

Attacking the core problem – Loss of knowledge

Taking a step back from fancy SQL implementation tricks, the core of this problem is in fact not the “double-dipping” of the application code.
That really is only the symptom of the actual issue.

What we have here is a situation of knowledge-loss:
When the user of the application decided that ITEM_NUMBER 13 should have its own payment terms, the system and the user “knew” explicitly that this was the case.
The user would not have to look up again if there is a special payment term for item 13 to recall this fact.
Yet, the application (and data-) designers decided to throw away this information and rely on looking it up.
What if – for some reason – that entry for ITEM_NUMBER 13 got removed from the MONSTER_BUSINESS_ITEM_DATA table?
Nobody would ever know that there was supposed to be an entry.

And all our implementation efforts up to here basically tried to work out this little piece of information (whether the default or the special terms apply) and then fetch the data accordingly.

What if we never lost that information?
Could we do better?

Of course we can!

All it takes is a sort of indicator or flag for every ITEM_NUMBER that represents this bit of information.
This indicator could easily be maintained by the application that works on this data anyhow; when inserting data into the ITEM_DATA table it could just update the flag.
Alternatively, one could retro-fit database triggers (no, no, no!) that would change the flag whenever there are inserts/deletes on the ITEM_DATA table.
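For completeness, and despite the warning above, such a trigger might look roughly like this. A sketch only: verify the exact CREATE TRIGGER syntax for your HANA release, and note it assumes a DEFAULT_CONDITIONS_APPLY flag column on the items table:

```sql
-- sketch only (the post advises against retro-fitted triggers):
-- clear the flag whenever a specific entry is inserted for an item
CREATE TRIGGER monster_item_data_after_ins
AFTER INSERT ON monster_business_item_data
REFERENCING NEW ROW new_row
FOR EACH ROW
BEGIN
    UPDATE monster_business_items
       SET default_conditions_apply = ''
     WHERE client       = :new_row.client
       AND order_number = :new_row.order_number
       AND item_number  = :new_row.item_number;
END;
```

A matching AFTER DELETE trigger would have to set the flag back to 'X' – one more reason the application-managed variant is simpler.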

Let’s assume the maintenance has been taken care of and we can use the flag. What would that look like?

Add the flag – “default” is to use the default conditions

alter table monster_business_items
	add (DEFAULT_CONDITIONS_APPLY NVARCHAR(1) NOT NULL DEFAULT 'X');

Unset the flag for those with special conditions

update monster_business_items bi
	set
		bi.default_conditions_apply = ''
	where
		exists (select *
				from
					monster_business_item_data bid
				where
					(bi.client , bi.order_number , bi.item_number)
				  = (bid.client, bid.order_number, bid.item_number));

Finally, joining data with confidence

-- now we know by the flag if we need to join on '00000000' or the item_number
select
	    bi.client, bi.order_number, bi.item_number
	   , bi.default_conditions_apply
	   , bid.payment_terms 			as pay_terms_by_simple_join
from
			   monster_business_items 	   bi
	inner join monster_business_item_data bid
		on (bi.client, bi.order_number)
		 = (bid.client, bid.order_number)
		and bid.item_number = MAP (bi.default_conditions_apply
								  , 'X', ABAP_NUMC(0, 8)
								  , '', bi.item_number)
order by
	bi.client, bi.order_number, bi.item_number;

Again, I put this into a view:

create view monster_business_items_pay_terms_FLAG as
(select
	    bi.client, bi.order_number, bi.item_number
	    , bi.default_conditions_apply
	    , bid.payment_terms 			as pay_terms_by_simple_join
from
			   monster_business_items 	   bi
	inner join monster_business_item_data bid
		on (bi.client, bi.order_number)
		 = (bid.client, bid.order_number)
		and bid.item_number = MAP (bi.default_conditions_apply
								  , 'X', ABAP_NUMC(0, 8)
								  , '', bi.item_number)
);

and check the result:

CLIENT	ORDER_NUMBER	ITEM_NUMBER	PAY_TERMS_BY_SIMPLE_JOIN	DEFAULT_CONDITIONS_APPLY
007   	0000000001  	00000003   	DEFAULT1                	X

CLIENT	ORDER_NUMBER	ITEM_NUMBER	PAY_TERMS_BY_SIMPLE_JOIN	DEFAULT_CONDITIONS_APPLY
007   	0000000001  	00000013   	SPECIAL

Comparing the approaches

With PlanViz this “FLAG” approach used ~1.8 ms/1.3 MB.

The former approach with FIRST_VALUE clocked in at ~3.0 ms/1.5 MB.
(all values reduced to a simple average – don’t mistake that for proper performance testing!)

That’s quite a difference for a statement that just returns a single record.

I haven’t measured the approaches presented in the blog post but I dare say they would probably be less efficient.

Closing

Now, I hear you asking, what does the DEFAULT_CONDITIONS_APPLY column cost us?
HANA memory is expensive and we cannot waste it…
Since the column only has two distinct values ('X' and '') it is highly compressible.
For the 200.000 entries in my MONSTER_BUSINESS_ITEMS table (with 5% special conditions used), this column takes a whopping 55 KB.
That, of course, is outrageous if you plan to manage the data on your C64, but you would already be OK if your parents get you that awesome AMIGA 500+ 🙂
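For the curious, the figure can be checked against HANA’s column-store monitoring view (a sketch; view and column names as I recall them from recent HANA versions):

```sql
-- per-column memory usage of the flag column in the column store
SELECT column_name, memory_size_in_total
  FROM m_cs_all_columns
 WHERE table_name  = 'MONSTER_BUSINESS_ITEMS'
   AND column_name = 'DEFAULT_CONDITIONS_APPLY';
```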

There you go, now you know.

10 Comments

  • Michelle Crapo says:

    Very nice. I have bookmarked this one for my next project. The description is detailed and well explained. I need both right about now.

  • Sebastian Gesiarz says:

    Not only a great comparison that a developer is usually too lazy to do but also a great showcase of relatively new functions! Looking forward to using them. Thank you so much!

  • Ged Hurst says:

    Thanks Lars. I looked at your blog for some time, after coming here from Paul Hardy’s Code Pushdown blog. As a simple ABAP developer, I understand about 10%, and I’m utterly intrigued by the remaining 90%. I hope some of that 90% will slowly sink in…

    • Well, it’s all here 🙂
      Very happy to see that others find bits and pieces of my posts interesting.
      In case some of it is difficult to understand, it’s probably due to my way of presenting topics.
      Shoot me a question in the comments, when you come across a post that does not compute for you.

      Cheers,
      Lars

  • Paul Hardy says:

    This of course suggests that the standard SAP table VBAP should really have a flag like the one described above added, to say whether the default VBKD entry should be used or not, even in an S/4HANA system. I will do an experiment in my sandbox system.

  • Jens Nissen says:

    Hi Lars. Thanks for the blog.

    You are describing how to go from one table to another table with two lookups (item then header) in order to get correct value.

    In the example it is how to go from sales order item table (VBAP) to business data table (VBKD) to get terms of payment.

    Solution does not look nice whatever way is chosen due to how data are organized.

    However, I think it starts getting ugly when business comes back and say they want to see all sales order items for a certain terms of payment like 0001 ‘Payable Immediately due net’.

    Here selection goes the other way from ‘another table’ to ‘one table’.

    When all business data records (VBKD) have been found for terms of payment 0001, there will be a mix of item and header records with that value.

    Item records match 100% to the sales order item table (VBAP). This is fine.

    Header records (‘00000000’) are more problematic. They do not mean that all sales order items (VBAP) for these records should be counted as belonging to terms of payment 0001. There could be sales order items with terms of payment <> 0001. So, for all sales order items whose header record has terms of payment 0001, there needs to be a lookup to see whether each sales order item has its own terms of payment <> 0001, in order to eliminate those sales order items.

    It can happen that the header record has terms of payment 0001 and that a sales order item also has terms of payment 0001. This can lead to double counting and should be avoided.

    The created view/SQL contributes to the bad side of the performance. For documents that will not be changed, like billing documents, exactly the same calculation will be done again and again with this type of issue.

    I think this kind of example creates the need to stage data within HANA, like a shadow table to the sales order item table VBAP containing business data like payment terms. Yes, I know staging is not popular to mention. Pros and cons need to be investigated.

    In SAP BW the issue does not exist, as data is normalized: all sales order items exist with payment terms either from item or header level. There is one table (DSO) representing tables VBAP and VBKD.

    I do not know what SAP S/4HANA looks like. It might solve the issue, as I have heard that data is normalized there.

    Until then: the solution is ugly or the data is staged.

    • Hi Jens,
      thanks a lot for the extensive comment.
      I think you make a valid point here: the “space-saving”-design of the tables leads to a complex query logic when trying to use the data differently from what the application does.
      Considering designs I’ve seen that are way worse, I would not go as far as calling this ugly, but it is not pretty either.

      Now about the staging: you may be right and there are cases where you want to trade query execution time for the complexity of handling a staged data copy.
      I would, however, test this assumption thoroughly with the actual use-case it should support before implementing this.
      For the example that you’ve mentioned “Payment immediately due net” it’s rather likely that a report about those sales items won’t need to cover all of transaction history.
      This reduces the scale of the data processing problem usually quite a bit.
      To be clear, I don’t dismiss what you wrote. There definitely are cases where pre-computation (like staging normalized data) is an efficient option to solve the problem at hand.
      And I’ve seen cases where developers tried to make everything “dynamic” and computed-on-the-fly to a degree that made the solution mostly complicated and thus hard to “tune” performance-wise.

      What’s missing, in my view, is the actual experiment that shows whether the assumptions are correct.
      And running a “data-preview” in HANA studio with no actual grouping/filter conditions does not cut it for this – and sadly, I’ve seen quite a few complaints about data model/HANA performance based on just such a “test”.

      So, back to your point: I agree that some of the data-encodings are complex and the benefits they had when they were built might not weigh up for the complexity anymore.
      This is where a re-design needs to take place (but that is usually a very long shot, especially with heavy infrastructure systems like SAP).
      No idea (yet) if the VBAP / VBKD situation has changed in S/4 either.

      Cheers,
      Lars
