Ok, MaxDB is most often used with a NetWeaver on top of it, so this blog is about a niche topic of a niche product.
Wow – that should be enough understatement and un-buzzing for now.
The question of how and when to collect new optimizer statistics pops up every now and then.
Most people accept that a cost-model-based query optimizer depends on statistics that fit the data and storage usage of the tables and indexes involved in a query to come up with the best possible query execution plan.
But how can we know when the statistics are not fitting good enough anymore?
The reactive strategy would be to monitor the execution runtimes of every query in the system, waiting for runtime increases to show up and then check whether the execution plan had changed compared to the times the query ran quick enough.
Obviously, this strategy is pretty labour and time intensive.
Another way would be to say: “ok, maybe I’m doing some update statistics too often, but at least this does not make execution plans worse”.
This approach (and yes, sometimes the execution plan can become worse, but that’s a different story) is the one employed by recommendations like “Update statistics at least once a week”.
One improvement to this approach is to carefully choose the tables for which new statistics should be collected.
A possibly reasonable criteria for that is the change of data volume since the last update of statistics.
That means we need to compare the current size of a table (in pages) against the size it had when the statistics were last collected.
Fortunately, MaxDB provides two system tables containing this information:
Shows the current size of the table.
This is even true for uncommitted transactions.
So if you load data into your table in session A you’ll be able to monitor the table growth via session B even before session A commits.
Contains the stored optimizer statistics.
Doing this comparison for all tables in your database manually would be a bunch of monkey work, so MaxDB development decided to deliver a built-in monkey in form of a stored procedure:
SUPERDBA.SYSCHECKSTATISTICS (IN CHANGETHRESHOLD INTEGER)
This procedure does the comparison for us.
Via the CHANGETRESHOLD parameter, we can specify the percentage of data volume change that should lead to new statistics.
The procedure then loops over all tables of the current user and the ‘SYSDBA’ schema and performs the check.
Once a table qualifies for new statistics (another reason may be that a table does not have any optimizer statistics at all) the table name is denoted into a system table:
If you’re familiar with the automatic statistics update feature of MaxDB than this table is already known to you.
It’s the same table where the MaxDB Kernel puts table names in when it realizes during a join, that the optimizer statistics were wrong and more data than expected.
Anyhow, apart from the automatic statistics update, there is a command for manual processing of the denoted tables present:
UPDATE STATISTICS AS PER SYSTEM TABLE
This command will read the entries from SYSUPDSTATSWANTED and run a parallelized non-blocking update statistics without sampling.
You may, of course, choose to use the sampling size stored for each table in the database catalogue via
UPDATE STATISTICS AS PER SYSTEM TABLE ESTIMATE
but this will lead to table locks, so it’s not exactly what we want to see in production systems.
Once the statistics collection is finished, you can check the result in the table
SELECT * FROM SYSDBA.SYSUPDSTATLOG \ \ SCHEMANAME|TABLENAME|TABLEID |COLUMNNAME|INDEXNAME|EXECUTED_AT |IMPLICIT|SAMPLE_PCT|SAMPLE_ROW|EXECUTION_START |EXECUTION_END |SESSION|TERMID |SEQNO| \ ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- \ LARS |T1 |000000000001A560 | | |2010-05-27 13:19:23|NO |100 |0 |2010-05-27 13:19:35|2010-05-27 13:19:35|14692 |7200@VIENXXXXXXXXA|0 | \ LARS |T1 |000000000001A560 |N1 | |2010-05-27 13:19:23|NO |100 |0 |2010-05-27 13:19:35|2010-05-27 13:19:35|14692 |7200@VIENXXXXXXXXA|1 | \ LARS |T1 |000000000001A560 |N2 | |2010-05-27 13:19:23|NO |100 |0 |2010-05-27 13:19:35|2010-05-27 13:19:35|14692 |7200@VIENXXXXXXXXA|2 | \ LARS |T2 |000000000001A561 | | |2010-05-10 15:05:36|NO |100 |0 |2010-05-10 15:05:36|2010-05-10 15:05:36|14299 |3532@VIENXXXXXXXXA|0 | \ LARS |T2 |000000000001A561 |N1 | |2010-05-10 15:05:36|NO |100 |0 |2010-05-10 15:05:36|2010-05-10 15:05:36|14299 |3532@VIENXXXXXXXXA|1 | \ LARS |ZTEST6 |000000000001A52E | | |2010-05-27 13:19:23|NO |100 |0 |2010-05-27 13:19:23|2010-05-27 13:19:32|14686 |7200@VIENXXXXXXXXA|0 | \ LARS |ZTEST6 |000000000001A52E |MANDT | |2010-05-27 13:19:23|NO |100 |0 |2010-05-27 13:19:23|2010-05-27 13:19:32|14686 |7200@VIENXXXXXXXXA|1 | \ LARS |ZTEST6 |000000000001A52E |OTHID | |2010-05-27 13:19:23|NO |100 |0 |2010-05-27 13:19:23|2010-05-27 13:19:32|14686 |7200@VIENXXXXXXXXA|3 | \ LARS |ZTEST6 |000000000001A52E |UNIID | |2010-05-27 13:19:23|NO |100 |0 |2010-05-27 13:19:23|2010-05-27 13:19:32|14686 |7200@VIENXXXXXXXXA|2 | \ ... \ \
As we saw above, this procedure depends on having the SYSINFO.FILES information at hand.
Unfortunately, for databases that had been upgraded from an SAP DB/MaxDB version <=7.5 this information might not yet be available.
For which tables this information is missing, you can figure out by checking table
As long as the file counters are not present, the SYSCHECKSTATISTICS procedure consults the SYSDBM.ESTIMATED_PAGES table to get an estimation of the current table size.
This might take much longer and would not deliver precise results, but rather an estimation on the total table size.
Summing this up:
Given a MaxDB =>7.6 at a recent patch level you can easily implement a statistics maintenance strategy by running these two commands, say once a week:
–> as your SQL Schema owner:
CALL SYSCHECKSTATISTICS (40)
–> as the SYSDBA (SUPERDBA) of the database:
UPDATE STATISTICS AS PER SYSTEM TABLE
So there would be two commands to be scheduled:
dbmcli -U DB770W -USQL DB770LARS sql_execute “call SUPERDBA.SYSCHECKSTATISTICS (40)”
dbmcli -U DB770W -USQL DB770W sql_execute “UPDATE STATISTICS AS PER SYSTEM TABLE”
Note: DB770W is my XUSER entry for SUPERDBA and DB770LARS is for my SQL-User.
Make sure to remember that XUSER entries are case sensitive!
db770w would not work here!
I hope you like this blog and maybe this technique can be an alternative to the one shown in SDN thread doubt about UPDATE STAT COLUMN .http://forums.sdn.sap.com/thread.jspa?threadID=1668683&tstart=0