One of the funny things one can learn when living with South Africans is their concept of time.
“Now now“, “Just now” or “now” do mean very different things and can confuse the rest of the world terribly.
For an in-depth explanation don’t hesitate to contact your local Springbok or check this nice online article: South African Time explained: Just Now versus Now Now.
While I do live with lovely South Africans, this blog is mostly about databases and HANA and so is this post.
Cheat mode
Jump directly to the TL;DR.
NOW
One of the more popular functions with HANA early on was the now function. It returns the current local timestamp of the HANA server.
A big downside of this function is that its name is very vague about what it does and what do expect as a result.
Granted, this function can also be found in the very popular MySQL and PostgreSQL DBMS’s, so it is not entirely surprising that many developers felt right at home with this.
CURRENT_TIMESTAMP
With HANA 1 SPS 03 (late 2011) functions with more explicit names were introduced:
current_date, current_time, and current_timestamp make it rather clear, what to expect.
Timezones?
All could have been good with this, but the globalized life is more complicated than that, so one has to consider timezones when dealing with datetime data.
The fact that I seem to keep on writing posts relating to this topic either indicates that I have trouble getting to the point or, maybe, the topic is rather complex.
Personally, I go with the later and point you to Trouble with time? , You got the time?, and Finding answers on workdays for some spare-time reading.
Anyhow, HANA’s datetime datatypes do not carry any timezone information with them, which means, the person writing a query has to know what timezone the data is in. Good advice is to keep data stored in UTC on DB level. But if this is done, this data can only be compared to UTC times and dates.
For that, HANA comes with the UTC-variety of the mentioned functions: current_utcdate, current_utctime, current_utctimestamp.
Periods, deadlines, terms
The main motivator of having all these datetime functions in a DBMS is of course that the majority of databases are used in business contexts. And businesses revolve around periods, deadlines and terms. That means, it is important to get “time” right when developing applications or designing queries.
Example-time
A query I have been working on, a classic “customer aging” report, includes this expression:
DAYS_BETWEEN("DueDate", NOW())
I did a very informal survey in my home-office and we came to the conclusion that this means:
the number of days between now and whatever is in the column “DueDate”.
To make this expression more … expressive (there’s a dollar in the bad-pun jar), I changed it to
DAYS_BETWEEN("DueDate", current_timestamp)
since current_timestamp
and now()
are the exact same functions in HANA. Checking the result of the expression confirmed this understanding.
Now, looking at it again, I figured that mixing a date with a timestamp is not quite “clean”.
Thus, I changed the expression to
DAYS_BETWEEN("DueDate", current_date)
To my surprise, now the results of my query changed. My report now showed customers having a lot less outstanding balances. Did I just lose money?
DAYS_BETWEEN
In my mind, the expression represents a payment term in calendar day resolution. If the payment is due on Monday and I get it on Thursday that’s three days overdue. It does not matter whether the payment gets in on Thursday early morning or late afternoon.
Unfortunately, that is not what DAYS_BETWEEN
does.
Looking very sharply at the documentation one finds that the parameters are timestamps.
To work out how many days are between two inputs DAYS_BETWEEN
first converts dates to timestamps and then computes how many times a whole day worth of seconds (60*60*24 = 86400) goes into the difference.
Let’s say the value for “DueDate” is Monday, 27.01.2020 and current_date returns Wednesday, 29.01.2020.
Then
DAYS_BETWEEN("DueDate", current_date)
=> DAYS_BETWEEN('2020-01-27', '2020-01-29')
==> 2
becomes
DAYS_BETWEEN('2020-01-27 00:00:00', '2020-01-29 00:00:00')
When using current_timestamp
, this happens:
DAYS_BETWEEN("DueDate", current_date)
=> DAYS_BETWEEN('2020-01-27 00:00:00', '2020-01-29 00:00:00')
==> 2
“The result is the same! What is your problem?” I hear you think, dear reader. And, indeed, in this case, there is no difference between those approaches.
What happens if the “DueDate” is in the future though?
DAYS_BETWEEN("DueDate", current_date)
=> DAYS_BETWEEN('2020-01-30', '2020-01-29')
==> -1
That looks right to me: there’s is one day to go or -1 day overdue.
Checking with current_timestamp
results in
DAYS_BETWEEN("DueDate", current_timestamp)
=> DAYS_BETWEEN('2020-01-30', '2020-01-29 11:43:15')
==> 0
See what I mean?
With the timestamp not at midnight ( 00:00:00
) there are not enough seconds between those two points in time to make up a full day.
While this may be a subtle difference, it can become rather important when summing up how much money a customer owes at what point in time.
Which one is right then?
To me, the most important aspect with this sort of thing is that the user of the query correctly understands what the expression means at the level of abstraction s/he is working on.
In the case of the “customer aging” report, the expectation would probably be that the “number of days due” is in calendar days and that the time of the day does not matter.
One-up this problem
Maybe now you are wondering why I brought up the timezone topic before. Well, while we have settled on a mode to calculate the “days due” we have no idea whether the data in the table is in the same timezone as current_date
/ current_timestamp
.
This can only be figured out by checking the HANA system setup and the application/data source that created the data in the table. No shortcut there for that.
One thing you can do, however, to make your life and that of everyone that uses your query, view or data model is to make this important piece of information visible:
DAYS_BETWEEN("DueDate_UTC", current_utcdate)
or
DAYS_BETWEEN("DueDate_AEST", current_date)
(assuming the HANA server is in AEST-timezone)
TL;DR
DAYS_BETWEEN
works on timestamps (not dates) and counts how often a day’s worth of seconds goes into the difference between the inputs.
This can lead to counterintuitive results when calendar days are expected to be used.
There you go, now you know.