Types of NULL

This is about an error I ran into while fiddling with an SQL query. Since it was not immediately obvious to me what was the cause this makes a good topic for a new post.

Setting up

I used a table like this:

create column table base_table 
         (product     NVARCHAR(20) NOT NULL
        , location    NVARCHAR(4) NOT NULL
        , p_version   NVARCHAR(3) 
        , qty         INTEGER 
        , total_qty   INTEGER
        , p_date date NOT NULL);

And I put some data into it:

set 'DATE_FORMAT' = 'DD/MM/YYYY';

truncate table base_table;

insert into base_table  
    values ('70012345', '1020', NULL, NULL, NULL,  '01/01/2019');
insert into base_table  
    values ('70012345', '1020', '001', 10, 30 , '01/02/2019');
insert into base_table  
    values ('70012345', '1020', '002', 20, 30, '01/02/2019');
insert into base_table  
    values ('70012345', '1020', NULL, NULL, NULL,   '01/03/2019');
insert into base_table  
    values ('70012345', '1020', NULL, NULL, NULL,   '01/04/2019');
insert into base_table  
    values ('70012345', '1020', NULL, NULL, NULL,   '01/05/2019');
insert into base_table  
    values ('70012345', '1020', NULL, NULL, NULL, '01/06/2019');
insert into base_table  
    values ('70012345', '1020', '003', 30, 30, '01/07/2019');
insert into base_table  
    values ('70012345', '1020', NULL, NULL, NULL, '01/08/2019');
insert into base_table  
    values ('70012345', '1020', NULL, NULL, NULL, '01/09/2019');

insert into base_table  
    values ('70012345', '1020', '002', 15, 55, '01/10/2019');

insert into base_table  
    values ('70012345', '1020', '004', 40, 55, '01/10/2019');

insert into base_table  
    values ('70012345', '1020', NULL, NULL, NULL, '01/11/2019');

insert into base_table  
    values ('70012345', '1020', '001', 25, 25, '01/12/2019');

Which of course yields this result:

select * from base_table order by p_date asc;

PRODUCT     LOCATION    P_VERSION   QTY TOTAL_QTY   P_DATE    
70012345    1020        ?           ?   ?           01/01/2019
70012345    1020        001         10  30          01/02/2019
70012345    1020        002         20  30          01/02/2019
70012345    1020        ?           ?   ?           01/03/2019
70012345    1020        ?           ?   ?           01/04/2019
70012345    1020        ?           ?   ?           01/05/2019
70012345    1020        ?           ?   ?           01/06/2019
70012345    1020        003         30  30          01/07/2019
70012345    1020        ?           ?   ?           01/08/2019
70012345    1020        ?           ?   ?           01/09/2019
70012345    1020        002         15  55          01/10/2019
70012345    1020        004         40  55          01/10/2019
70012345    1020        ?           ?   ?           01/11/2019
70012345    1020        001         25  25          01/12/2019

The query

One of the things I want to do with this data is to find the records where the columns P_VERSION, QTY and TOTAL_QTY are all NULL.
I figured that writing three IS NULL conditions would look inelegant, so I came up with this construct:

select
     product, location, p_date
from base_table
where 
   coalesce(p_version, qty, total_qty) IS NULL;

This worked without any issue, but I was interested in how the execution plan for this would look like.

OPERATOR_NAME   OPERATOR_DETAILS 
COLUMN SEARCH   BASE_TABLE.PRODUCT, BASE_TABLE.LOCATION, BASE_TABLE.P_DATE 
  COLUMN TABLE  FILTER CONDITION: 
                COALESCE(TO_INT(BASE_TABLE.P_VERSION)
                        , BASE_TABLE.QTY
                        , BASE_TABLE.TOTAL_QTY
                        ) IS NULL

In the edited EXPLAIN PLAN output above, line number 4 stands out because it does not only contain the COALESCE function but also the TO_INT() data type conversion function for the P_VERSION column.
Checking back the SQL query text ensures: I did not put it there.

So… what is going on here?

SQL DBMS provide a statically typed environment.
That means, that every expression, table, view, function, and procedure has a predefined fixed data type.
In our case, the condition COALESCE(p_version, qty, total_qty) IS NULL is made up of two expressions: the COALESCE function and the IS NULL comparison.

The COALESCE function returns the first argument that is NOT NULL starting from the left.
So, if P_VERSION has a value, then P_VERSION gets returned.
Otherwise, if QTY has a value, then QTY gets returned.
Otherwise, if TOTAL_QTY has a value, then TOTAL_QTY gets returned.
Finally, if all arguments are NULL, then NULL gets returned.

Looking at this, what is is the data type of the coalesce expression?

Looking back at the table definition we find

create column table base_table 
         (product     NVARCHAR(20) NOT NULL
        , location    NVARCHAR(4) NOT NULL
        , p_version   NVARCHAR(3) 
        , qty         INTEGER 
        , total_qty   INTEGER
        , p_date date NOT NULL);

P_VERSION is an NVARCHAR type, while QTY and TOTAL_QTY are INTEGER types.
As the EXPLAIN PLAN has shown SAP HANA reconciles this by casting P_VERSION to INTEGER during execution.

Ok, just a type-cast, all good. Or is it?

While this works with the data currently in the table, this easily can lead to runtime errors, while the SQL compiled without any issues. After inserting a record like this:

insert into base_table  
    values ('70012345', '1020'
           , 'V1'
           , NULL, NULL,  '01/01/2019');

the SQL from above fails with

Could not execute 
'select product, location, p_date 
from base_table 
where coalesce(p_version, qty, total_qty) is NULL' 
in 114 ms 494 ┬Ás . 

SAP DBTech JDBC: 
[339]: invalid number:  
[6930] attribute value is not a number;

isnull(if(isnull([here]int("P_VERSION"))
        , if(isnull("QTY")
        , "TOTAL_QTY"
        , "QTY")
        , int("P_VERSION"))) = 1,
 
 (P_VERSION = 'V1'[string]
, QTY = -1664583280[int] null
, TOTAL_QTY = 32605[int] null)

,docid 1,HXE::SYSTEM:BASE_TABLE$delta_1$ (t 38048) 

Note, how the error message of this HANA version (2.00.040) not only returns the error number and text but also what the failing record looks like and where it is stored in the column table structure (here, the record was still in the delta store and has been the only record in there).
A bit surprising is that the NULL values for QTY and TOTAL_QTY seem to be encoded differently, but overall this error message provides enough information to understand what caused the problem.

Moral of the story?

Finally, I went back to revert my “clever” statement to the more verbose:

select
     product, location, p_date
from base_table
where 
   (    p_version IS NULL
    and qty IS NULL
    and total_qty IS NULL);

This example is a good reminder that every expression in SQL has precisely one datatype and that this datatype is determined when the statement ist prepared.
In addition, unless the statement uses explicit type casting the database will “take care” of things on its own behalf and pick which types get converted into which other types based on a type precedence ranking.

The important take from this example is that there can be data type-related errors, that are not discovered during SQL preparation and that may only flare-up, when the “right” kind of data is present.

Wait a minute… these are all NULLs, why the error message?

The important bit here is to see that COALESCE() needs to be evaluated in order to perform the filtering. So, COALESCE() needs to touch every single record and only where it yields NULL, the record gets included in the result set.

And there you go, now you know.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.