After the tremendous (and very unlikely) success of the first MaxDB: Space oddities blog, I immediately decided: this must be the voice of the market asking for more.
So here you get it.
While Oracle DBAs are used to be exposed to features like segment fragmentation (that is, although data is removed from tables/indexes these segments still allocate the same amount of space which won’t be automatically released to the freespace) MaxDB users don’t have to think about this.
In fact, MaxDB users cannot do a single thing to change the way the data is stored. Or could they?
They could. Just by sorting the data before inserting them into the table, quite some space can be saved.
Don’t believe me? Behold and believe!
I’ve created two sets of identical data. One of them will get loaded with data in a sorted order (SORTLOADED) the other one will get loaded in a random order (UNSORTLOADED). Anyhow, both tables contain the very same data afterwards.
The table structure is simple and the same for both tables:
create table "LARS"."SORTLOADED"( "KCOL1" INTEGER not null, "COL2" VARCHAR (10) ASCII, "COL3" VARCHAR (30) ASCII, primary key ("KCOL1"))
create table "LARS"."UNSORTLOADED"( "KCOL1" INTEGER not null, "COL2" VARCHAR (10) ASCII, "COL3" VARCHAR (30) ASCII, primary key ("KCOL1"))
Next I generate and load the test data.
Now this is actually a part that is not that easy with MaxDB. In Oracle you can do things like “CREATE TABLE AS SELECT … ORDER BY …” and thereby create arbitrarily sorted data sets.
MaxDB does not support sorts for subselects (as they are usually meaningless in relational contexts). Therefore I needed to create the data sets somewhere else and I went for MS EXCEL. Since creating the test data is not that interesting in itself, I’ll skip this for now – if you’re interested in this, let me know.
The unequal twin tables
So, now each table contains the exact very same rows, right?
Of course we can doublecheck this:
select * from sortloaded sl where not exists (select 'X' from unsortloaded usl where sl.kcol1= usl.kcol1) Statement 'select * from sortloaded...' successfully executed in 1.242 seconds. - No result
Now let’s check how equal the two tables really are in terms of storage. Let’s use what we’ve learned in “MaxDB space oddities” and run:
select tablename, entrycount, trunc(treeindexsize/8) TREE_PAGES, trunc(treeleavessize/8) LEAF_PAGES from tables , files where schemaname=user and fileid=tableid and tablename in ('SORTLOADED', 'UNSORTLOADED')
TABLENAME ENTRYCOUNT TREE_PAGES LEAF_PAGES ---------------------------------------------------------- SORTLOADED 10000 1 40 UNSORTLOADED 10000 1 48
Hmm… the unsorted load takes eight pages more in this example, that’s 16% more space usage.
What is going on here?
Let’s check the storage of these tables a bit more detailed.
I use a systemtable called ‘tablestatistics’ here, but I’ve to give a warning on it: whenever you select data from this table, the data is gathered from the tables you look up. For big tables this can run very very long and may block other accesses meanwhile.
Therefore it’s better to use the ‘tablestoragedetails’ resp. ‘indexstoragedetails’ instead as these both tables deliver their data based on samples.
For the easier comparison I reformatted the result a bit:
|Min separator length|
|Max separator length|
|Avg separator length|
|Min key length|
|Max key length|
|Avg key length|
|Min row length|
|Max row length|
|Avg row length|
|Min rows per page|
|Max rows per page|
|Avg rows per page|
|Space used in all pages (%)|
|Space used in index pages (%)|
|Space used in index pages (%) max|
|Space used in index pages (%) min|
|Space used in leaf pages (%)|
|Space used in leaf pages (%) max|
|Space used in leaf pages (%) min|
|Space used in root page (%)|
Ok, so what do we see here?
Obviously the difference in the total number of pages is due to the difference in the number of ‘Leaf pages’ – those pages that contain the records completely.
Looking to the ‘Space used in leaf pages (%)’ row we immediately spot that for UNSORTLOADED there is 16% less space usage per block compared to the SORTLOADED table.
Of course if you don’t fill up the pages completely you will need more of them to save the same amount of data. Therefore we find 42 rows less on average in the pages of UNSORTLOADED than we do in the pages of SORTLOADED.
But why is it like this?
To answer this question, we’ve to remember how B*-Trees are build up.
I’m not going to reinterate on B*-Trees in general, but just like to point out the important facts about them, that are necessary here to explain the effect:
- B*Trees are always balanced, that is: all leaf pages have the same distance to the root page.
- The filling level of all pages is approximately equal
- On leaf level, the pages are connected in ascending key order, e.g. the lowest value is the far leftmost, while the highest value is the far rightmost.
To fullfill these requirements, an insert to a B*-Tree usually is done like this:
- Find the page to insert the record to
- Check whether the filling of the current page will fit the filling of its neigbours after the INSERT has been done.
If not, rebalance this part of the B*-Tree so that the equal filling condition is met.
This rebalancing works by shifting rows from one page to the other and adapting index page above accordingly.
Obviously this can only work, if there are any rows that could be moved, e.g. because the new value is ‘in-between’ the two pages.
Due to the mentioned features of B*-Trees there is one case where no rebalancing is done, because it’s never necessary: inserts in ascending key order.
When the key value increases with each insert, than a rebalancing with the right neighbour page is not possible – there simply is no further right neighbour. And since the data is correctly sorted anyhow, all what’s left to do is to update the index pages.
That way a lot of time can be saved and the pages get filled up much denser than it would be the case with random inserts.
Therefore MaxDB developers decided to implement a kind of ‘shortcut’ – whenever you insert a row with a ‘higher’ or ‘more right’ key than the currently ‘highest’ or ‘rightmost’ key in the table, simple add the row and don’t do balancing.
This feature is also present in other DBMS, e.g. Oracle. For Oracle’s B*-Tree-Indexes there are two types of page/block-splits: the ‘usual’ 50:50 block split, which is just the standard index balancing and the 90:10 block split.
By the way: inserting the sorted data in descending order also leads to the ‘bigger’ table, since there is no special handling of sequential descending inserts.
What’s this knowledge useful for?
If you ever did a system copy with R3Load and checked the table sizes in the target system against that in the source system, then you might have asked yourself: did I loose any data? The tables are smaller than in the source system! If you then are aware of the fact that R3load reads and writes it’s data sorted by the primary key, then you now know the answer: all data is there, just saved in a denser way.
It’s also worth noting that such dense tables need more balancing operations when you start inserting data into the middle of the table. Since there is not much freespace in the pages that could be used for the rebalancing, new pages need be added to the B-Tree and that causes the rebalancing of whole subtrees. For that reason I would not say, that having such dense B-Trees is actually something to wish for in a OLTP system.
Hope you liked this blog as the first “Space oddities” blog from myself.
p.s. While researching for this I came across another blog entry called “A space oddity”: http://richardfoote.wordpress.com/2008/01/18/introduction-to-reverse-key-indexes-part-iii-a-space-oddity/ by Richard Foote, who seems to be really deep into Oracle index internals.
So don’t leave his blog unvisited.
Seems that this title is somehow popular for B*-Tree space management blogs.