Apart from writing blog posts, I have answered many questions on several Q&A sites like Stack Overflow and the SAP Community. Some of those answers received good feedback, and some surprised even me once I had worked them out.
Since I keep coming back to these answers every now and then, I figured it might be worthwhile to collate them here too.
SAP HANA or SAP-specific questions
Questions listed here are about HANA specifics or SAP “special” cases and may not apply to other SQL databases.
This answer sits at the top of this list because it is the most upvoted answer I have written on Stack Overflow. The question was simply “How to show / list all tables in SAP HANA via SQL?”, and the fact that this has been such a popular topic gives a good indication of the kinds of hurdles beginners face with HANA.
With HANA, SAP went into full “innovation” mode: everything about a database/data processing platform/business platform was open for discussion. Mixed with SAP’s chronic struggle to take users “with them” and to explain in plain language what the new stuff is for, those efforts led to understandable confusion. In this answer, I provide a bit of context to “Calculation Views”, “CDS Views” and “Table Functions”. The comments by other users also add some clarifications and are worth reading.
I really liked this question, as it highlights the user-hostile choice of names and terms in the SAP world. Who could guess what a database view with such a name actually does?
This is another question that demonstrates the gaping holes in the SAP HANA documentation. As the situation was not much better for people working for SAP, I found this particularly frustrating. Concepts, such as the CPBTree, might not be overly important in the day-to-day usage of HANA, but the complete absence of a reasonable explanation of a fundamental building block makes a black box of it. This kind of “magic” may be useful for showmanship or sales, but it does not instill trust in developers and engineers.
In my answer, I tried to explain the data structure as well as I was able to and provided links to more detailed resources.
“History Tables” were one of the very first ideas that came out of the insert-only data structure behind HANA’s column store tables. The idea is that changed records do not overwrite existing ones. Instead, the data structure keeps track of when a record was inserted, updated, or deleted. Whenever a SQL command uses the data in the table, HANA automatically reads only the current “version” of each record, which produces the same result as if the original data had been overwritten. But since all record versions are still present in the data structure, “time travel” queries become possible, e.g. showing the table as it was yesterday.
At first sight, this sounds like any data auditor’s dream. When it came to the actual implementation of the feature and what one could reasonably do with it, it turned out to be not all that much. Resource consumption for history tables was huge; partitioning was initially not available, and when it became available, it did not allow separating current from outdated data. SAP BW tried to use the feature to implement its DataStore objects, but turned away from it again rather quickly.
I have answered questions about this elusive HANA feature – again, the documentation was scant – many times over in various forums. My Stack Overflow answer explains what the corresponding SQL standard feature, “system-versioned tables”, does and how it is implemented in SAP HANA. Do not use SAP HANA history tables!
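To make the insert-only idea concrete, here is a minimal sketch (plain Python and sqlite3, not HANA code, with made-up table and column names): instead of overwriting a record, each change closes the old version and inserts a new one, and an “as of” query filters on the validity interval.

```python
import sqlite3

# Hypothetical sketch: simulating insert-only record versioning.
# Each change closes the old version (valid_to) and inserts a new one;
# an "as of" query then filters on the validity interval -- the idea
# behind history / system-versioned tables.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts_versioned (
        id INTEGER, balance INTEGER,
        valid_from INTEGER, valid_to INTEGER
    )""")
# version 1 of record 1, superseded at time 5; version 2 is current
conn.executemany(
    "INSERT INTO accounts_versioned VALUES (?, ?, ?, ?)",
    [(1, 100, 0, 5), (1, 250, 5, None)])

def balance_as_of(t):
    # "time travel": pick the version that was valid at time t
    row = conn.execute("""
        SELECT balance FROM accounts_versioned
        WHERE id = 1 AND valid_from <= ?
          AND (valid_to IS NULL OR valid_to > ?)""", (t, t)).fetchone()
    return row[0]

print(balance_as_of(3))   # old version: 100
print(balance_as_of(7))   # current version: 250
```

The price of this convenience is visible even in the toy example: every change adds a row, which is exactly the resource-consumption problem described above.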
There is a specific type of question about programming HANA that seems to come mostly from developers with an MS SQL Server background. Using common table expressions (aka CTEs, the WITH clause) and creating temporary tables as placeholders for intermediate result sets must be among the most frequently asked about.
This specific question ventures into a slightly different direction, though: the need for an auxiliary table that contains monthly dates starting from a specific anchor date.
The answer explains two options for solving the problem efficiently with built-in SAP HANA features.
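One generic way to build such a series of monthly dates is a recursive CTE. The sketch below uses sqlite3 purely for illustration (HANA additionally offers dedicated series-generating functions); the anchor and end dates are arbitrary examples.

```python
import sqlite3

# Illustrative sketch: generate monthly dates from an anchor date
# with a recursive CTE -- plain SQL, shown here via sqlite3.
conn = sqlite3.connect(":memory:")
rows = conn.execute("""
    WITH RECURSIVE months(d) AS (
        SELECT DATE('2024-01-01')                 -- anchor date
        UNION ALL
        SELECT DATE(d, '+1 month') FROM months
        WHERE d < DATE('2024-06-01')              -- stop condition
    )
    SELECT d FROM months
""").fetchall()
print([r[0] for r in rows])
# first of each month, January through June 2024
```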
This was a rather surprising question, and I learned quite a bit while working out the answer. When using a scalar user-defined function (that is, an SQLScript function that returns only scalar values, no table variables), it takes longer to run the function the more return values are fetched. This is counterintuitive, as the code in the function computes all the values when it is executed.
It turned out that the problem was not in the function itself but in the way it got called for multiple return variables in a SELECT statement: once for every variable. This answer even sparked a follow-up question.
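The effect is easy to reproduce in any language. The following plain-Python sketch (not HANA code; the function and counter are made up) shows how invoking a multi-value function once per requested output repeats the expensive body, even though a single call already computes everything:

```python
# Illustrative sketch: one call per output variable vs one call total.
calls = 0

def expensive_stats(x):
    global calls
    calls += 1                      # stand-in for the costly computation
    return min(x), max(x), sum(x)

data = [3, 1, 4, 1, 5]

# naive pattern: the function body runs once per fetched value
lo = expensive_stats(data)[0]
hi = expensive_stats(data)[1]
total = expensive_stats(data)[2]
print(calls)                        # 3 calls for 3 outputs

calls = 0
lo, hi, total = expensive_stats(data)   # one call, all outputs
print(calls)                        # 1
```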
In Hana query of a CTE, why does type conversion fail in WHERE clause (though it succeeds in SELECT clause)
Like any other piece of software, SAP HANA has bugs, and sometimes the best one can do is to understand the issue well enough to report it and to find ways to work around it.
In this answer, I explained how I analysed the issue, found the actual cause of it (not the use of a CTE but a HANA internal query execution engine bug), and explained how HANA does text-to-decimal type conversion (as a side-quest).
SAP applications are business applications and that implies that the databases tend to store information about things like sales, revenue, dates, quarters, locations, etc. Now, a very common request is to take information from a particular level of granularity, say yearly revenue data, and distribute it evenly over the next smaller grain of information, say quarterly. This operation is also known as disaggregation and HANA developers can use built-in functions for that. See my answer for the details on that.
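Independent of the HANA built-ins my answer covers, the core of even disaggregation fits in a few lines. Here is a hedged plain-Python sketch (function name and figures are made up) that splits a yearly amount over four quarters while keeping the total exact by assigning the rounding remainder to the last quarter:

```python
# Illustrative sketch of even disaggregation (no HANA built-ins assumed):
# distribute a yearly figure over its quarters, keeping the sum exact.
def disaggregate_evenly(amount_cents, parts=4):
    base = amount_cents // parts
    shares = [base] * parts
    shares[-1] += amount_cents - base * parts   # rounding remainder
    return shares

yearly_revenue = 100_001  # in cents, hypothetical figure
quarters = disaggregate_evenly(yearly_revenue)
print(quarters, sum(quarters))
# [25000, 25000, 25000, 25001] 100001
```

Working in the smallest currency unit (cents) is the usual trick to avoid floating-point drift in such allocations.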
SAP HANA was the first commercial DBMS that used column store tables for large-scale transactional use cases. The technology components existed before, no doubt, but SAP’s choice to put it together in a single system and run its applications on top of it put it into a different league of usage than anything previously done with the technology. And while HANA is made up of so many technical improvements over “classic” DBMS (compression, in-memory, SIMD, multi-core/NUMA design, heavy query parallelization, adaptive variable length encoding, late materialisation algorithms, cache conscious data structures, …) the column store was the piece that developers and DBAs “latched” on to quickly. As memory usage was/is the main constraint for a HANA instance, understanding what amount of memory is allocated by data is an essential part of the HANA DBA role.
But focussing on single/simple numbers in a very complex system typically does not help all that much. This answer addresses a concern about how HANA deals with the “delta merge” and if there may be a memory leak.
This is a collection of multiple questions about how HANA deals with INSERTs into partitioned tables. The common mental mix-up that happened for the author of the question was that there are multiple levels of abstraction between the “SQL table”, i.e., what the SQL user interacts with, and the data structures in HANA that make up the “SQL table”.
The same is true for transactions (SQL transactions != changes of the data structures), which regularly led to confusion since HANA was advertised as using “lock-free” data structures for higher parallel performance. In my answer, I tried to address the different confusions one by one.
This question is what I think of as a “transition question”: when starting to work with new technology people take what they know as true in one domain, say ABAP programming, and attach it to similar terms and concepts in the new environment. The difficulty with this is knowing which aspects can be transferred and which are different.
Here it is about the common ABAP tool, the “internal table”, and how that translates into writing SQLScript procedures in HANA.
I see this question as a direct consequence of the confusion about how to use HANA created by SAP themselves. Calculation views, stored procedures, table functions, and SQL are just some of the many options to define data processing in HANA. Naturally, developers wonder: what is the best/right/most performant/less-likely-to-make-me-look-stupid way to do the job in HANA? This is my answer to this question.
The following entries are not HANA specific, but rather how certain data modelling problems can be approached in SQL databases, including HANA.
Finding an answer to this question took me a few hours of thinking and trial and error. I distinctly remember being immensely proud of myself when I could finally present a working solution. The problem itself sounds simple enough: given a number of columns, some of which may or may not contain a NULL, return only those columns that do not contain NULL. Of course, which columns are NULL changes with every record, so a simple projection list does not help here. If you are looking for a SQL brain teaser, read the question without the answer first and try to come up with your own solution.
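To state the problem in plain terms (this Python sketch is only a restatement, not the SQL solution from my answer): per record, keep exactly the columns whose value is not NULL.

```python
# Restating the puzzle outside SQL: per row, keep only non-NULL columns.
# In SQL this needs real ingenuity, because a projection list is fixed
# for all rows of a result set.
rows = [
    {"a": 1, "b": None, "c": "x"},
    {"a": None, "b": 2, "c": None},
]
non_null = [{k: v for k, v in row.items() if v is not None} for row in rows]
print(non_null)
# [{'a': 1, 'c': 'x'}, {'b': 2}]
```

The contrast between how trivial this is procedurally and how tricky it is in set-oriented SQL is exactly what makes the question a good brain teaser.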
Imagine a table that contains information about, say, VEHICLES. Now, for different kinds of VEHICLES (e.g. cars, boats, helicopters, etc.) you have additional information that is specific to the vehicle kind. For example, you have km-per-litre figures for cars, water displacement for boats, and maximum lift-off weight for helicopters. This kind-specific detail information is stored in separate tables, while the VEHICLES table contains the information common to all vehicles (including the fact that the vehicle entry exists in the first place and what kind of vehicle the entry represents).
Now, after this long introduction to an example of data modelling fit for an SQL primer course, how do you go about writing a query that fetches all vehicles and their respective details? Read this for a reasonably obvious approach.
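One common shape such a query can take (a sketch via sqlite3 with illustrative table and column names, not necessarily the exact query from my answer) is to LEFT JOIN each kind-specific detail table, so that only the matching detail row contributes values:

```python
import sqlite3

# Illustrative supertype/subtype join: common attributes in VEHICLES,
# kind-specific details in separate tables, combined via LEFT JOINs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vehicles (id INTEGER PRIMARY KEY, kind TEXT);
    CREATE TABLE cars  (vehicle_id INTEGER, km_per_litre REAL);
    CREATE TABLE boats (vehicle_id INTEGER, displacement_t REAL);
    INSERT INTO vehicles VALUES (1, 'car'), (2, 'boat');
    INSERT INTO cars  VALUES (1, 14.2);
    INSERT INTO boats VALUES (2, 3.5);
""")
rows = conn.execute("""
    SELECT v.id, v.kind, c.km_per_litre, b.displacement_t
    FROM vehicles v
    LEFT JOIN cars  c ON c.vehicle_id = v.id
    LEFT JOIN boats b ON b.vehicle_id = v.id
    ORDER BY v.id
""").fetchall()
print(rows)
# non-matching detail columns come back as NULL (None)
```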
This question’s title is unfortunately a bit too generic. I guess that the author wanted to generalise from the specific problem (dealing with gaps in transaction date entries) and came up with this title. The solution I presented is based on the specific property of the problem that it is about dates of sales transactions. For this sort of question, using a calendar-auxiliary table that holds entries for all calendar dates (in the timeframe of interest, that is) is a common tool that every SQL developer should know.
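The calendar-table technique can be sketched like this (sqlite3 for illustration, with made-up sales data and a tiny date window): generate every date of interest, then LEFT JOIN the transactions so that dap-free reporting shows the gap days too.

```python
import sqlite3

# Illustrative calendar-auxiliary-table technique: a recursive CTE
# generates every date in the window; a LEFT JOIN keeps dates that
# have no sales, so gaps become visible rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sale_date TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('2024-03-01', 10), ('2024-03-03', 7);
""")
rows = conn.execute("""
    WITH RECURSIVE calendar(d) AS (
        SELECT DATE('2024-03-01')
        UNION ALL
        SELECT DATE(d, '+1 day') FROM calendar
        WHERE d < DATE('2024-03-04')
    )
    SELECT c.d, COALESCE(SUM(s.amount), 0)
    FROM calendar c
    LEFT JOIN sales s ON s.sale_date = c.d
    GROUP BY c.d ORDER BY c.d
""").fetchall()
print(rows)   # gap dates appear with a total of 0
```

In practice the calendar is usually a persistent table (often with extra columns for month, quarter, holiday flags, etc.) rather than a CTE, but the join pattern is the same.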
If you wonder what the answer to this question is and why – please, do make sure to read this right now. If you believe the answer has to do with storage efficiency or query performance, then you are so, so wrong.
Ever wanted to just show the most recent entries for a specific “logical” primary key? Then these answers should give you the pointers to solve this problem. The bottom line here (at least for me) is that the solution to this should absolutely not depend on DB-specific special features like ROWID or TXID, but it should be based on the information available in your data model.
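A model-based way to do this (sketched via sqlite3 with illustrative names; it relies only on a logical key plus a change timestamp, not on ROWID or similar) is to rank the versions per key with a window function and keep rank 1:

```python
import sqlite3

# Illustrative "latest entry per logical key": rank versions per key
# by change timestamp and keep the newest -- no DB-specific features.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_no INTEGER, changed_at TEXT, status TEXT);
    INSERT INTO orders VALUES
        (1, '2024-01-01', 'new'), (1, '2024-02-01', 'shipped'),
        (2, '2024-01-15', 'new');
""")
rows = conn.execute("""
    SELECT order_no, status FROM (
        SELECT order_no, status,
               ROW_NUMBER() OVER (PARTITION BY order_no
                                  ORDER BY changed_at DESC) AS rn
        FROM orders
    ) WHERE rn = 1 ORDER BY order_no
""").fetchall()
print(rows)
# one row per order_no, carrying its most recent status
```

(Window functions need sqlite 3.25 or later, which ships with current Python builds.)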
A common experience of SQL developers is that real-life requirements for queries are vastly more complicated than whatever was presented in any of the SQL/database courses available. The more complicated those requirements are, the more important it is to structure the SQL query into pieces that deal with only one or two aspects at a time. “Decomposition” of the problem is the name of the game here. The SQL language provides common table expressions (CTEs) that can be very useful for splitting a problem into smaller, easier ones.
I have even given a presentation showcasing this approach – see SAP Inside Track #sitMEL – Full Day Event in Melbourne for the slide deck.
This question is an example of converting data with units of measure at query time. The problem is neither HANA- nor SAP-specific, but something that comes up when modelling data about real-world objects.
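The basic pattern can be sketched in a few lines (plain Python; the units, factors, and readings are made-up examples): store the measured value together with its unit of measure and normalise to one common base unit via a conversion table when querying.

```python
# Illustrative unit normalisation at query time: convert every reading
# to a common base unit (millimetres) via a conversion table, so that
# values become comparable. Integer factors keep the arithmetic exact.
TO_MM = {"mm": 1, "cm": 10, "m": 1000, "km": 1_000_000}

readings = [("bridge span", 2, "km"), ("beam", 250, "cm")]
in_mm = [(name, value * TO_MM[unit]) for name, value, unit in readings]
print(in_mm)
# [('bridge span', 2000000), ('beam', 2500)]
```

In a database, the conversion table would typically be a real table joined at query time, so new units can be added without changing the query.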
And these are the questions and answers I have found worthwhile to revisit so far. I might add to this list when I stumble across one I missed. If you know of one that should be included here, feel free to leave a comment.