Separate business logic from tables and avoid dynamic SQL

Real programming

One of the aspects of doing “actual” programming is to automate computations. Making the computer do things not just once, say by running a calculation on the October sales data, but to do it over and over again on any months’ sales data.

Being able to do it once is OK – but building a machine that can do it again and again, well that is the real deal of programming, isn’t it? This is what it means to give the computer a new ability, to make it smarter.

I believe, this perception is partly behind the urge to separate “the business logic” from “the data”. I used quotation marks here because after some experience with the real-world of business computer users one realises that there is neither “the” business logic nor “the” data per se. Instead, both concepts tend to rely heavily on context and situation. The famous Domain-Driven Design by Eric Evans is a great read about that. So, I leave this train of thought here and come back to database programming.

The SQL data type mental block

For SQL programming the urge to separate data from computation commonly hits a roadblock that is caused by the strict typing of SQL. Tables/views are the data types that developers work with, while records (rows) are the instances of the data types. Ok, they are a form of collection of those instances, but who wants to nit-pick now?

This leads to a very common frustration when trying to build a “parameterised” FROM clause, that is, the option to change one or more table references in the FROM clause of a statement. The idea is that one could run the “same” statement against different tables. (see here, here, or here if you want to).

Seeing that the specific tables are the data types it is understandable that a query against TABLE_X in schema A cannot simply be performed against TABLE_Y in schema B – even though the tables “look” similar.

Dynamic SQL: the failed solution

One of the ways out of this problem is to use “Dynamic SQL” which is part of the SQL standard, which is why one can find it in all relevant SQL DBMS.

This idea is as simple as it is stupid dumb unwise: just give the SQL processor a text string and make it process it as a SQL command. Such a feature is stupidly easy to implement when writing a SQL parser for your DB, but it removes any option for compile-time checking and optimisation.

In short: using dynamic SQL drops all the benefits from a statically typed language for the ability to mishandle your data model.

An alternative

So what are the options here?

One, very straightforward option, is to generate the SQL commands in your application code. This where you have complete control over any parts of it, so SQL injection can easily be avoided. SQL is notoriously bad at string processing, so chances are whatever language you are using for your application code, it is better suited to do that anyhow.

Even Kevlin Henney agrees…

But what if you really, really want to put the database code into the database? That is not too hard either. Let’s look at an example:
In “How to use a field from an input parameter of a HANA stored procedure to generate table name dynamically for execute_immediate statement?” the developer wants to build a query/procedure/function that allows to run a specific query against different – but very similar – tables.

The similarity between those tables is that they have columns that contain data with the same meaning: a CUSTOMER NAME, a LOCATION NAME, and a DATE information.
The query should filter on CUSTOMER and LOCATION and compute the minimum value for DATE.

In the linked example, the developer wants to point this query directly at SAP BW on HANA ADSO tables, instead of the generated calculation views, which is problematic in its own right. So we leave this aspect aside for now.

some data to play with

First, let’s create a test schema with some tables and data:

CREATE SCHEMA multi_tab;
SET SCHEMA multi_tab;

CREATE COLUMN TABLE CUSTOMERS_A 
    ( CUSTOMER_NAME NVARCHAR (200) NOT NULL
    , LOCATION NVARCHAR (200) NOT NULL
    , TX_DATE DATE NOT NULL
    , A_MEMO NVARCHAR(4000));

CREATE COLUMN TABLE CUSTOMERS_B 
    ( CUSTOMER_NAME NVARCHAR (200) NOT NULL
    , LOCATION NVARCHAR (200) NOT NULL
    , TX_DATE DATE NOT NULL
    , CUSTOMER_RATING INTEGER);
    
CREATE COLUMN TABLE CUSTOMERS_C 
    ( CUSTOMER_NAME NVARCHAR (200) NOT NULL
    , LOCATION NVARCHAR (200) NOT NULL
    , TX_DATE DATE NOT NULL
    , TX_ID NVARCHAR(32) );

-- data A
INSERT INTO customers_A VALUES 
    ('WILLY WONKA INC.', 'SYDNEY', '2020/08/20', 'none.');
INSERT INTO customers_A VALUES 
    ('WILLY WONKA INC.', 'SYDNEY', '2020/08/21', '-');    
INSERT INTO customers_A VALUES 
    ('ACME', 'NEW YORK', '2020/02/21', '');    
INSERT INTO customers_A VALUES 
    ('ACME', 'NEW YORK', '2020/04/01', 'N/A');
INSERT INTO customers_A VALUES 
    ('TUFFI MILK', 'WUPPERTAL', '1983/07/21', 'wuppdika!');    
INSERT INTO customers_A VALUES 
    ('TUFFI MILK', 'WUPPERTAL', '1983/02/21', 'wuppdika!');

SELECT * FROM customers_A;    

--CUSTOMER_NAME   |LOCATION |TX_DATE   |A_MEMO   |
------------------|---------|----------|---------|
--WILLY WONKA INC.|SYDNEY   |2020-08-20|none.    |
--WILLY WONKA INC.|SYDNEY   |2020-08-21|-        |
--ACME            |NEW YORK |2020-02-21|         |
--ACME            |NEW YORK |2020-04-01|N/A      |
--TUFFI MILK      |WUPPERTAL|1983-07-21|wuppdika!|
--TUFFI MILK      |WUPPERTAL|1983-02-21|wuppdika!|

-- data B
INSERT INTO customers_B VALUES 
    ('WILLY WONKA INC.', 'SYDNEY', '2020/08/20', 0);
INSERT INTO customers_B VALUES 
    ('WILLY WONKA INC.', 'SYDNEY', '2020/08/21', 3);    
INSERT INTO customers_B VALUES 
    ('ACME EAST', 'SAN FRANCISCO', '2020/02/21', 0);    
INSERT INTO customers_B VALUES 
    ('ACME EAST', 'SAN FRANCISCO', '2020/04/01', 2);
INSERT INTO customers_B VALUES 
    ('OCEANIC AIR', 'HAWAII', '1983/07/21', 8);    
INSERT INTO customers_B VALUES 
    ('OCEANIC AIR', 'HAWAII', '1983/02/21', 7);

SELECT * FROM customers_B;
--
--CUSTOMER_NAME   |LOCATION     |TX_DATE   |CUTOMER_RATING|
------------------|-------------|----------|--------------|
--WILLY WONKA INC.|SYDNEY       |2020-08-20|             0|
--WILLY WONKA INC.|SYDNEY       |2020-08-21|             3|
--ACME EAST       |SAN FRANCISCO|2020-02-21|             0|
--ACME EAST       |SAN FRANCISCO|2020-04-01|             2|
--OCEANIC AIR     |HAWAII       |1983-07-21|             8|
--OCEANIC AIR     |HAWAII       |1983-02-21|             7|

-- data C
INSERT INTO customers_C VALUES 
    ('WILLY WONKA INC.', 'SYDNEY', '2020/08/20', SYSUUID);
INSERT INTO customers_C VALUES 
    ('WILLY WONKA INC.', 'SYDNEY', '2020/08/21',  SYSUUID);    
INSERT INTO customers_C VALUES 
    ('ACME EAST', 'SAN FRANCISCO', '2020/02/21', SYSUUID);    
INSERT INTO customers_C VALUES 
    ('ACME EAST', 'SAN FRANCISCO', '2020/04/01', SYSUUID);
INSERT INTO customers_C VALUES 
    ('OCEANIC AIR', 'HAWAII', '1983/07/21',  SYSUUID);    
INSERT INTO customers_C VALUES 
    ('OCEANIC AIR', 'HAWAII', '1983/02/21', SYSUUID);

SELECT * FROM customers_C;
--
--CUSTOMER_NAME   |LOCATION     |TX_DATE   |TX_ID                           |
------------------|-------------|----------|--------------------------------|
--WILLY WONKA INC.|SYDNEY       |2020-08-20|23000050CBE89A5C1700EA0878204890|
--WILLY WONKA INC.|SYDNEY       |2020-08-21|24000050CBE89A5C1700EA0878204890|
--ACME EAST       |SAN FRANCISCO|2020-02-21|25000050CBE89A5C1700EA0878204890|
--ACME EAST       |SAN FRANCISCO|2020-04-01|26000050CBE89A5C1700EA0878204890|
--OCEANIC AIR     |HAWAII       |1983-07-21|27000050CBE89A5C1700EA0878204890|
--OCEANIC AIR     |HAWAII       |1983-02-21|28000050CBE89A5C1700EA0878204890|

An example query against one of the tables would look like this:

-- the query
 select
    customer_name,
    location,
    min(tx_date)
  from customers_a
  where customer_name = 'ACME'
    and location = 'NEW YORK'
  group by
    customer_name,
    location;
--
--CUSTOMER_NAME|LOCATION|MIN(TX_DATE)|
---------------|--------|------------|
--ACME         |NEW YORK|  2020-02-21|

a general function that works on CUSTOMER data

What we want now is an option to take what we consider to be relevant CUSTOMER data and execute the query against that.

You may have noticed that the three tables are similar but not exactly the same. They all have three columns CUSTOMER_NAME, LOCATION, and TX_DATE. But they also have additional columns, that have no relevance for our query.

To make this work, we can create a user-defined table function (table udf) that takes in tables with just the three relevant columns.

Looking at the result set of our test query, we also know what the output/result table should look like: it will be a table that also has CUSTOMER_NAME, LOCATION, and TX_DATE columns.

simple start – define an identity-function

An easy way to start building such a function is to start off with a function that just returns the input data.

CREATE OR REPLACE FUNCTION earliest_customer_transaction 
       (IN customers TABLE (CUSTOMER_NAME NVARCHAR (200)
                          , LOCATION NVARCHAR (200)
                          , TX_DATE DATE))
       
      RETURNS TABLE (CUSTOMER_NAME NVARCHAR (200)
                   , LOCATION NVARCHAR (200)
                   , TX_DATE DATE)
                   
AS 
BEGIN

    RETURN :customers;
END;

With this we have the table “signature” defined and can start to play around with it:

DO BEGIN

    my_customers = SELECT customer_name, location, tx_date
                   from customers_a
                   where customer_name = 'ACME'
                     and location = 'NEW YORK';
                        
    SELECT  
         customer_name, location, tx_date
    FROM
        earliest_customer_transaction (:my_customers);
END;
--
--CUSTOMER_NAME|LOCATION|TX_DATE   |
---------------|--------|----------|
--ACME         |NEW YORK|2020-02-21|
--ACME         |NEW YORK|2020-04-01|

Neat!
We can feed a result set from our query into the table function and get the same data back.

add the “business logic”

Now that we have the formalities of setting up the function and calling it under control, let’s add the aggregation logic.

-- add the aggregation to the function
CREATE OR REPLACE FUNCTION earliest_customer_transaction 
       (IN customers TABLE (CUSTOMER_NAME NVARCHAR (200)
                          , LOCATION NVARCHAR (200)
                          , TX_DATE DATE))
       
      RETURNS TABLE (CUSTOMER_NAME NVARCHAR (200)
                   , LOCATION NVARCHAR (200)
                   , TX_DATE DATE)
                   
AS 
BEGIN

    _result = SELECT customer_name, location, min(TX_DATE) AS TX_DATE
             FROM :customers
             GROUP BY customer_name, location;

     RETURN :_result;
END;

Checking with our test code again:

DO BEGIN
    my_customers = SELECT customer_name, location, tx_date
                   from customers_a
                   where customer_name = 'ACME'
                     and location = 'NEW YORK';
                        
    SELECT  
         customer_name, location, tx_date
    FROM
        earliest_customer_transaction (:my_customers);
END;
--
--CUSTOMER_NAME|LOCATION|TX_DATE   |
---------------|--------|----------|
--ACME         |NEW YORK|2020-02-21|

This works fine, too. Now we only get one record with the earliest TX_DATE back.

It should be clear that we could do any form of computation within our function, not just aggregation. In fact, the function now contains the business logic that we want to apply to the data and is de-coupled from any actual table in the system. As long as we feed it data in the correct “CUSTOMERS” form, the function will correctly apply the computation logic.

And all that, without having to use dynamic SQL or string concatenation etc.

But does it work with other tables?

As the main goal was to use the computation against several tables, let’s try this:

-- let's query some data from other tables 
DO BEGIN
-- customers_a
    my_customers_a = SELECT customer_name, location, tx_date
                   from customers_a
                   where customer_name = 'ACME'
                     and location = 'NEW YORK'
                      OR a_memo = 'wuppdika!';
                        
    res_a = SELECT  
         customer_name, location, tx_date
    FROM
        earliest_customer_transaction (:my_customers_a);
    
-- customers_b
    my_customers_b = SELECT customer_name, location, tx_date
                   from customers_B
                   where customer_name = 'OCEANIC AIR'
                     and location = 'HAWAII'
                     and customer_rating >= 4;
                 
    res_b = SELECT  
         customer_name, location, tx_date
    FROM
        earliest_customer_transaction (:my_customers_b);
    
-- customers_c
    my_customers_c = SELECT customer_name, location, tx_date
                   from customers_c
                   where 
                        location = 'SYDNEY';
                        
    res_c = SELECT  
         customer_name, location, tx_date
    FROM
        earliest_customer_transaction (:my_customers_C);
    
-- get the results
    SELECT 'A' AS QUERY, * FROM :res_a 
    UNION ALL
    SELECT 'B' AS QUERY, * FROM :res_b 
    UNION ALL
    SELECT 'C' AS QUERY, * FROM :res_C;
END;

--
--QUERY|CUSTOMER_NAME   |LOCATION |TX_DATE   |
-------|----------------|---------|----------|
--A    |TUFFI MILK      |WUPPERTAL|1983-02-21|
--A    |ACME            |NEW YORK |2020-02-21|

--B    |OCEANIC AIR     |HAWAII   |1983-02-21|

--C    |WILLY WONKA INC.|SYDNEY   |2020-08-20|

In the example above, the data for our “CUSTOMERS” table is different for each table. For CUSTOMERS_A the base query has an additional OR-branch in the WHERE clause that allows for the “TUFFI MILK” entry to be part of the result set.

For CUSTOMERS_B and additional filter on CUSTOMER_RATING is part of the query. And for CUSTOMERS_C we filter on LOCATION only.

Yet, in all three cases, the function correctly applied the query logic and returned the records that we are interested in.

How do use this is real code?

The approach shown here works because we can use table variables (probably better referred to as “result set labels”) in SQL Script to assign parameter values. Upon execution time, HANA replaces the table variables with the queries “behind” them and optimizes the resulting overall query. For our example function, this looks like the following example (simplified).

A call like this:

-- customers_c
    my_customers_c = SELECT customer_name, location, tx_date
                   from customers_c
                   where 
                        location = 'SYDNEY';
                        
    res_c = SELECT  
         customer_name, location, tx_date
    FROM
        earliest_customer_transaction (:my_customers_C);

would lead to an internal query like this:

    _result = SELECT customer_name, location, min(TX_DATE) AS TX_DATE
             FROM 
                 (SELECT customer_name, location, tx_date
                   from customers_c
                   where 
                  location = 'SYDNEY') as customers
             GROUP BY customer_name, location;

     RETURN :_result;

Here, we have the “replacing of the table in the FROM” clause, that we wanted.

If you want to use this in, say ABAP, one option would be to create small stored procedures for each physical table that you want to use the table function with.

This might seem like we are in the same position as before since now, we again have DB procedure code that has to have the table names hardcoded in it.

The "before" situation - data selection and business rules are intertwined and depend directly on the accessed physical tables.
The “before” situation – data selection and business rules are intertwined and depend directly on the accessed physical tables.

But, we are, in fact, better off: the remaining “hardcoded” procedures only contain the boilerplate code to call the table function, but not the actual “business logic”.

The "after" situation - data selection and business logic are separated and only the boilerplate code to call the business logic depends on the physical tables.
The “after” situation – data selection and business logic are separated and only the boilerplate code to call the business logic depends on the physical tables.

When any of the source tables changes, then only the calling procedure will need to be adjusted/recompiled, while all other procedures as well as the “business logic” function can stay as they are.

In other languages, e.g. JAVA or JavaScript we could just use an anonymous SQLScript block (DO BEGIN … END) as I have done in the examples above.

Summary

By thinking about tables and result sets as data types (table types) we can isolate computation/business logic into table functions (or procedures) from any physical tables. That way, we can have a “dynamic” application of those functions without all the negative effects of dynamic SQL.

There you go, now you know.


Update 22.11.2020

Jörg Brandeis wrote a piece based on the technique I described here: Reusing Business Logic in AMDP Transformation Routines. Might be worthwhile to check this out for SAP BW/ABAP developers.

2 Comments

  • Abhishek Hazra says:

    Hi Lars,

    Very informative blog post that you have responded with regards to my requirement. It really helped me to think of an alternative approch to solve my requirement. But still, the question remains open if it is possible to generate a table name dynamically if we know the object name & use in the select in the calling AMDP. May be I was not super clear with my requirement in stack overflow, I wish I could attach a snippet to explain what it really stands for. Basically we wanted a generic stored procedure which could be reused in any solution across the system regardless of the fact whether the multiple source tables have same fields or not. So, the idea was to have the business logic centered in the AMDP in transformation between 2 infosources itself, but to be able to have the selectable table name generated at run time with help of a generic stored procedure. Let’s say, the AMDP has a business logic : select ……. from :dyn_tab.
    And the dyn_tab is the table name (derived by concatenating “‘/BIC/A'” as prefix & ‘2’ as suffix to the provider name) returned by a stored procedure with the parameter passing the provider name as input to it. So, regardless of the business logic, we have one stored procedure which can return the table name to be used in the AMDP based on the source of the loads. But I can understand the risks & performance penalty from this from the stack over flow response. So, I am thinking of an alternative with the inputs from the responses.
    Thanks a lot & have a good day!

    Best Regards,
    Abhi

    • Hi Abhi,

      good to see that you found your way to this blog post.

      Look, I do believe that I understand what you are aiming for. Dynamically resolving BW datastructures on DB level – and the table name(s) for a given is just part of this – is definitively possible. After all, BW does it itself and all the information about all objects are stored in the DB, too.
      This would re-build the same standard function that is already available in BW (in ABAP).
      So why re-build it?
      To me it comes down to where the knowledge about the internal structure of the objects is kept; and that’s in the ABAP context of BW.
      I’d rather generate object specific procedures and views on the fly with an ABAP program than trying to re-build already available features in SQLScript. Really, for what?
      The main motivation to implement logic in SQLScript is to do data processing near the data – not to generate SQL-code. But that is exactly what you try to do here.
      So, your experience that this is complicated and comes with many drawbacks seems to be a straight consequence of that.

      On another level: there are no concepts on DB level that are in fact independent of the data structure. If you want to, say, get the “most recent transaction” than you have to have a data structure that contains the relevant information to compute “the most recent transaction”. This can never be “generic”. What can be generic is a program that creates a specific – non-generic – DB program. And that is exactly what BW does itself: it creates DB programs based on the object definitions and computation logic provided by the BW designer.

      It’s clear now that this question covers a lot of conceptual grounds and probably more than we can discuss here.
      I hope this helps anyhow.

      Cheers
      Lars

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.