This is about an error I ran into while fiddling with an SQL query. Since it was not immediately obvious to me what was the cause this makes a good topic for a new post.
Setting up
I used a table like this:
create column table base_table
(product NVARCHAR(20) NOT NULL
, location NVARCHAR(4) NOT NULL
, p_version NVARCHAR(3)
, qty INTEGER
, total_qty INTEGER
, p_date date NOT NULL);
And I put some data into it:
set 'DATE_FORMAT' = 'DD/MM/YYYY';
truncate table base_table;
insert into base_table
values ('70012345', '1020', NULL, NULL, NULL, '01/01/2019');
insert into base_table
values ('70012345', '1020', '001', 10, 30 , '01/02/2019');
insert into base_table
values ('70012345', '1020', '002', 20, 30, '01/02/2019');
insert into base_table
values ('70012345', '1020', NULL, NULL, NULL, '01/03/2019');
insert into base_table
values ('70012345', '1020', NULL, NULL, NULL, '01/04/2019');
insert into base_table
values ('70012345', '1020', NULL, NULL, NULL, '01/05/2019');
insert into base_table
values ('70012345', '1020', NULL, NULL, NULL, '01/06/2019');
insert into base_table
values ('70012345', '1020', '003', 30, 30, '01/07/2019');
insert into base_table
values ('70012345', '1020', NULL, NULL, NULL, '01/08/2019');
insert into base_table
values ('70012345', '1020', NULL, NULL, NULL, '01/09/2019');
insert into base_table
values ('70012345', '1020', '002', 15, 55, '01/10/2019');
insert into base_table
values ('70012345', '1020', '004', 40, 55, '01/10/2019');
insert into base_table
values ('70012345', '1020', NULL, NULL, NULL, '01/11/2019');
insert into base_table
values ('70012345', '1020', '001', 25, 25, '01/12/2019');
Which of course yields this result:
select * from base_table order by p_date asc;
PRODUCT LOCATION P_VERSION QTY TOTAL_QTY P_DATE
70012345 1020 ? ? ? 01/01/2019
70012345 1020 001 10 30 01/02/2019
70012345 1020 002 20 30 01/02/2019
70012345 1020 ? ? ? 01/03/2019
70012345 1020 ? ? ? 01/04/2019
70012345 1020 ? ? ? 01/05/2019
70012345 1020 ? ? ? 01/06/2019
70012345 1020 003 30 30 01/07/2019
70012345 1020 ? ? ? 01/08/2019
70012345 1020 ? ? ? 01/09/2019
70012345 1020 002 15 55 01/10/2019
70012345 1020 004 40 55 01/10/2019
70012345 1020 ? ? ? 01/11/2019
70012345 1020 001 25 25 01/12/2019
The query
One of the things I want to do with this data is to find the records where the columns P_VERSION
, QTY
and TOTAL_QTY
are all NULL
.
I figured that writing three IS NULL
conditions would look inelegant, so I came up with this construct:
select
product, location, p_date
from base_table
where
coalesce(p_version, qty, total_qty) IS NULL;
This worked without any issue, but I was interested in how the execution plan for this would look like.
OPERATOR_NAME OPERATOR_DETAILS
COLUMN SEARCH BASE_TABLE.PRODUCT, BASE_TABLE.LOCATION, BASE_TABLE.P_DATE
COLUMN TABLE FILTER CONDITION:
COALESCE(TO_INT(BASE_TABLE.P_VERSION)
, BASE_TABLE.QTY
, BASE_TABLE.TOTAL_QTY
) IS NULL
In the edited EXPLAIN PLAN output above, line number 4 stands out because it does not only contain the COALESCE function but also the TO_INT()
data type conversion function for the P_VERSION
column.
Checking back the SQL query text ensures: I did not put it there.
So… what is going on here?
SQL DBMS provide a statically typed environment.
That means, that every expression, table, view, function, and procedure has a predefined fixed data type.
In our case, the condition coalesce(p_version, qty, total_qty) IS NULL
is made up of two expressions: the coalesce
function and the IS NULL
comparison.
The COALESCE function returns the first argument that is NOT NULL
starting from the left.
So, if P_VERSION
has a value, then P_VERSION
gets returned.
Otherwise, if QTY
has a value, then QTY
gets returned.
Otherwise, if TOTAL_QTY
has a value, then TOTAL_QTY
gets returned.
Finally, if all arguments are NULL
, then NULL
gets returned.
Looking at this, what is is the data type of the coalesce expression?
Looking back at the table definition we find
create column table base_table
(product NVARCHAR(20) NOT NULL
, location NVARCHAR(4) NOT NULL
, p_version NVARCHAR(3)
, qty INTEGER
, total_qty INTEGER
, p_date date NOT NULL);
P_VERSION
is an NVARCHAR
type, while QTY
and TOTAL_QTY
are INTEGER
types.
As the EXPLAIN PLAN has shown SAP HANA reconciles this by casting P_VERSION
to INTEGER
during execution.
Ok, just a type-cast, all good. Or is it?
While this works with the data currently in the table, this easily can lead to runtime errors, while the SQL compiled without any issues. After inserting a record like this:
insert into base_table
values ('70012345', '1020'
, 'V1'
, NULL, NULL, '01/01/2019');
the SQL from above fails with
Could not execute
'select product, location, p_date
from base_table
where coalesce(p_version, qty, total_qty) is NULL'
in 114 ms 494 µs .
SAP DBTech JDBC:
[339]: invalid number:
[6930] attribute value is not a number;
isnull(if(isnull([here]int("P_VERSION"))
, if(isnull("QTY")
, "TOTAL_QTY"
, "QTY")
, int("P_VERSION"))) = 1,
(P_VERSION = 'V1'[string]
, QTY = -1664583280[int] null
, TOTAL_QTY = 32605[int] null)
,docid 1,HXE::SYSTEM:BASE_TABLE$delta_1$ (t 38048)
Note, how the error message of this HANA version (2.00.040) not only returns the error number and text but also what the failing record looks like and where it is stored in the column table structure (here, the record was still in the delta store and has been the only record in there).
A bit surprising is that the NULL
values for QTY
and TOTAL_QTY
seem to be encoded differently, but overall this error message provides enough information to understand what caused the problem.
Moral of the story?
Finally, I went back to revert my “clever” statement to the more verbose:
select
product, location, p_date
from base_table
where
( p_version IS NULL
and qty IS NULL
and total_qty IS NULL);
This example is a good reminder that every expression in SQL has precisely one datatype and that this datatype is determined when the statement ist prepared.
In addition, unless the statement uses explicit type casting the database will “take care” of things on its own behalf and pick which types get converted into which other types based on a type precedence ranking.
The important take from this example is that there can be data type-related errors, that are not discovered during SQL preparation and that may only flare-up, when the “right” kind of data is present.
Wait a minute… these are all NULLs, why the error message?
The important bit here is to see that COALESCE()
needs to be evaluated in order to perform the filtering. So, COALESCE()
needs to touch every single record and only where it yields NULL, the record gets included in the result set.
And there you go, now you know.