For many BW performance relevant DB-features, it’s really all about the details of usage and implementation when it’s about the effect of those features.
The following are two examples of rather small things that went wrong which had a big impact on the system performance.
Two examples for BW on MaxDB
The first two examples are both good examples for the inherent assumptions that developers make during development.
Number 1 goes like this:
A customer has chosen MaxDB as the database platform for its SAP BW instance.
In addition to that, the customer decided to go for a BIA, which is quite a clever choice, if you ask me.
Instead of having a super-expensive and maintenance intensive BW main database that maybe still would require the setup of a BIA, this customer now runs a low-cost low-maintenance main database and the performance intensive reporting out of the expensive but also low-maintenance BIA.
Unfortunately, it looks like nobody anticipated this combination to become popular.
Otherwise, I assume report RSDDTREX_MEMORY_ESTIMATE would have been tested with MaxDB as well.
This report is used to get an estimation of the required memory for the BIA usage.
It’s not too complicated and merely consists of taking the number of rows in an InfoCube and multiply this with the InfoObjects data lengths and some “magic” constants.
So far nothing special.
What’s “special” is that this report still makes use of the nowadays abandoned fact-views from BW3.x-times.
Fact-views make it possible to access the data in both E- and F-fact table at once, by concatenating the sets with a UNION ALL.
That means, fact-views basically look like this:
CREATE VIEW "/BIC/V..." AS
( SELECT col1, col2, ... FROM "/BIC/F...." UNION ALL SELECT col1, col2, ... FROM "/BIC/F...." )
From the ABAP side, this eases the access since you now just have to run one query to get access to all data in an InfoCube.
Our report does the same and runs this statement:
SELECT count(*) FROM "/BIC/VMYCUBE"
The readers with some MaxDB experience might think now:
“That’s great! MaxDB has its file counter statistics and a special COUNT(*) optimization that avoids table/index access for counting!”
And those readers are correct!
Unfortunately the COUNT(*) optimization has a severe limitation: it only works for simple statements.
- no WHERE condition (!),
- no JOINs (!),
- no UNIONS/SET OPERATIONS (!),
- no GROUP BY/ORDER BY (!)
In reality, it means NO NOTHING, just the COUNT(*).
The fact-view used here, therefore, couldn’t take advantage of this optimization and had to do the counting via the brute-force-traditional approach: read the whole first table, read the second whole table, combine the results and count the number of rows.
The execution plan for such an IO/CPU burning process looks like this:
OWNER TABLENAME STRATEGY PAGECOUNT SAPXXX /BIC/FMYCUBE TABLE SCAN 1 SAPXXX /BIC/EMYCUBE TABLE SCAN 1194819 INTERNAL TEMPORARY RESULT TABLE SCAN 1 SHOW RESULT IS COPIED, COSTVALUE IS 10653812 SHOW QUERYREWRITE - APPLIED RULES: SHOW DistinctPullUp 1
The runtime of this little monster was 3 days and running … until the database couldn’t keep the huge temporary result set of approx. 10 Mio. pages (ca. 76 GB) anymore. The report finally dumped with the infamous
“POS(1) Space for result tables exhausted”
Fortunately, the report already was prepared to handle the request without a fact view, but it wasn’t enabled for MaxDB yet.
This was quickly done after a short discussion with the responsible IMS colleague and correction note
#1533676 – Long runtime of program RSDDTREX_MEMORY_ESTIMATE
The execution plans afterwards looked like this:
OWNER TABLENAME STRATEGY PAGECOUNT SAPXXX /BIC/FMYCUBE TABLE SCAN 1 COUNT OPTIMIZATION SHOW RESULT IS COPIED, COSTVALUE IS 2 SHOW QUERYREWRITE - APPLIED RULES: SHOW DistinctPullUp 1
OWNER TABLENAME STRATEGY PAGECOUNT SAPXXX /BIC/EMYCUBE TABLE SCAN 1194819 COUNT OPTIMIZATION SHOW RESULT IS COPIED, COSTVALUE IS 2 SHOW QUERYREWRITE - APPLIED RULES: SHOW DistinctPullUp 1
And the total runtime of the report went down to a few hours (there is other stuff in there that just takes some time).
(Remark: important to understand for MaxDB execution plans is that only the COSTVALUE represents an optimizer estimation. All other PAGECOUNT values refer to the TOTAL number of pages the table or index of this specific line allocates in the database!)
If you look at the sap note with the correction, you’ll find that it was a very small change that made the difference:
IF sy-dbsys <> 'DB400'. APPEND g_v_tablnm_v TO g_t_tablnm. ELSE. APPEND g_v_tablnm_e TO g_t_tablnm. APPEND g_v_tablnm_f TO g_t_tablnm.
IF sy-dbsys = 'DB400' OR sy-dbsys = 'ADABAS D'. APPEND g_v_tablnm_e TO g_t_tablnm. APPEND g_v_tablnm_f TO g_t_tablnm. ELSE. APPEND g_v_tablnm_v TO g_t_tablnm.
Knock, knock, any data in there?
The second example is not only ‘special’ on the MaxDB port but on all databases.
However, for MaxDB, the effect was the worst, due to certain limitations of SQL optimization.
SAP BW is a data warehouse and therefore a lot of the functionality is there to handle data, to store and move data and to get rid of data.
These tasks bring with them the necessity to sometimes drop a table and rebuild it, e.g. when you change an InfoObject-definition.
But before merely dropping tables, BW is cautious and asks“Hey, any data in this table?”.
And indeed, there is a function module called RSDU_DATA_EXISTS_TABLE that answers this question.
Now, before proceeding, ask yourself: how would YOU try to answer this question in SQL?
A common first approach would be: count the number of rows in the table and if it’s larger then 0 then there is some data in the table.
But given the fact that counting the actual number of rows in a table really can take ages (see the example above), this is the second worst idea to approach the issue (and I admit that it was also the first I thought up).
The worst idea I’ve seen so far is what was actually implemented in the function module:
SELECT bname FROM usr01 CLIENT SPECIFIED UP TO 1 ROWS INTO :test WHERE EXISTS ( SELECT * FROM (i_tablnm) CLIENT SPECIFIED ). ENDSELECT.
Let’s see if we can figure out, what this statement should do.
In English it means:
- Give me the column BNAME
- from the table USR01 for at most one row
- for which the set of all rows in table I_TABLNM (this is the one we want to know whether it’s empty or not) contains something.
This is just amazing!
As you can imagine, MaxDB will first create a temporary result set for the exists clause (that is full table copy) and then returns just one row.
If the I_TABLNM table is not empty, this can easily become a similar problem as the example above.
Now, of course, there is a much better way to do this.
If you think about it, all we want is a YES (there’s data in there) or a NO (nope, all empty) and this can be done as well as SAP note #1542839 – “Performance Optimization in RSDU_DATA_EXISTS_TABLE_ADA” nicely demonstrates:
SELECT 'X' FROM (i_tablnm) WHERE ROWNUM <= 1
This means: “Database, go and get me an X for the first row that you hit on the table and stop afterwards!”
Regardless how you process this statement, in the worst case it will end after a few (1-4) page visits.
The database may even use an index-only strategy since NO data from the table needs to be fetched (just a constant).
There are of course similar examples for other DBMS as well, but for the sake of a digestible blog post size, I’ll save them for later posts.