Be careful with dates and timestamps

A timeline of dates, and overlapping range bars showing tht the selection with DATEs includes more data than the selection with TIMESTAMPs.

A recent question on StackOverflow about how to “format” date values in WHERE-conditions in HANA brought up an aspect of date-handling that is easily overlooked and that can lead to unexpected results.

Dates and formats

Of course, a date value does not have a specific format – to HANA it is a number that represents a certain calendar day. Whatever way you like to have it printed to the screen is another matter.

However, in order to provide a date value in SQL, all we can use is text – so there is at least the need to convert this text into HANA’s date value.
This conversion can be done implicitly/automatically, e.g. when you type things like

SELECT 
   ... 
FROM 
   ... 
WHERE 
   "some_date" = '2020|08|20';

In this case, HANA tries to convert the provided date (here in YYYY|MM|DD format) into a date value.
The operative word here is “try” because HANA does not know what format was used to describe the date. All HANA does is trying to apply its default formats to convert the text. If this conversion fails, it stops with an error:

[303]: invalid DATE, TIME or TIMESTAMP value: 
Error while parsing 2020|08|20 as DATE (at pos 104) 

To avoid this error, there are two options available:

  1. using the TO_DATE(' value', 'date-format') function
    or
  2. as of HANA 2, setting the DateTime session variables.

Both of these options tell HANA what the format for date strings should be.
Which one you use is a matter of preference and scenario.

For example, I used option 2 successfully in order to run SQL script files with many INSERT statements and date values, where the date values were formatted in DD/MM/YYYY.
A single line of

SET 'DATE_FORMAT' = 'DD/MM/YYYY';

allowed me to run the script without tedious re-formatting of date values.

Date format solved – but there is another issue

With this knowledge, it is straightforward to answer the SO question and Sandra Rossi did just that and provided a very comprehensive answer.

This answer got accepted, but the OP provided another bit of information: the column that the date value should be compared to is a TIMESTAMP data type.

The OP’s query would avoid the conversion error by using the TO_DATE function with the appropriate format:

SELECT 
     * 
FROM 
    OPDN A 
WHERE
    A."DocDate" BETWEEN to_date('01/01/2020', 'MM/DD/YYYY')
                    and to_date('01/31/2020', 'MM/DD/YYYY')

However, this would not yield the result the OP was (likely) looking for.

Make sure to write what you mean

With column "DocDate" being a TIMESTAMP another data type conversion is necessary to compare it to the dates in the BETWEEN-clause.

HANA could either convert all values of "DocDate" to dates by dropping the time component or it could simply convert the two constant values from the BETWEEN-clause to TIMESTAMP data types by adding time components.
While there is no formal definition of what any DBMS has to do in such cases, the heuristic here is that converting two values is usually quicker than converting potentially thousands of different values in the table.

So, there is a very good chance that HANA will convert the DATE values from the BETWEEN-clause.
And there lies the cause for problems.

When the query was written, the author meant to include all data with the timestamp between the two given dates, including the last day.
Records with timestamps on the last day, say at 14:05 should be included, just as records with timestamps late at night, say 23:48.
But with the range condition converted to TIMESTAMP, it excludes any records past midnight, due to the added time-component:

SELECT 
     * 
FROM 
    OPDN A 
WHERE
    A."DocDate" BETWEEN timestamp'2020/01/01 00:00:00'
                    and timestamp'2020/01/31 00:00:00'
A timeline of dates, and overlapping range bars showing tht the selection with DATEs includes more data than the selection with TIMESTAMPs.
Diagram of what data gets included when selecting with DATEs vs. TIMESTAMPs

To fix this, the query needs to ensure that the comparison is done in DATE-semantics by explicitly converting DocDate to DATE as well.

SELECT 
     * 
FROM 
    OPDN A 
WHERE
    to_date(A."DocDate") BETWEEN to_date('01/01/2020', 'MM/DD/YYYY')
                             and to_date('01/31/2020', 'MM/DD/YYYY')

While this adds another conversion to the query processing, dropping the time-component of the timestamp data is not nearly as resource hungry as parsing text into a date format.
And given that this change ensures that the data returned is what was expected, it is well worth to include it.

There you go, now you know.

Update 24.03.2020

The behavior of BETWEEN described in this post is also reflected in the DAYS_BETWEEN and SECONDS_BETWEEN functions.
As of SAP HANA 2.0 SPS03 HANA considers the full timestamp components when computing the result. Using the TO_DATE function to normalise the calculation to full days is the workaround here, too.

SAP documents this in SAP notes 2573900 – Changed Behavior of the SQL function DAYS_BETWEEN(<d1>,<d2>) in HANA 2.0 SPS03 and 2905198 – What’s HANA Behavior when Calculation Days Between?.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.