Machine learnings of hana, odbc, and macos

One of the benefits of answering questions from users is that there usually is a pretty good chance that I will learn something along the way.
Today’s case was no different. In fact, I learned quite a bit.

The question was this…

“Your article https://blogs.sap.com/2013/01/23/hana-quick-note-checking-my-connections-and-using-them-securely/ … is amazing.
Can you share the syntax to use the hdbuserstore key from RStudio? At this time I am looking to use the key to open a ODBC connection using ConnectionContext from hana.ml.r.”

unnamed LinkedIn user

TL;DR
If you don’t want to read through the whole sleuth story but jump right to the solution, see the Quick solution recipe-box below

Let me unpack this:

  • the referenced blog post is about an SAP HANA client feature called User Store (hdbuserstore). This feature allows us to store login credentials for HANA databases in the local users’ profile (it’s a set of files in the file system of the user SSFS_HDB.KEY and SSFS_HDB_DAT).
    With the stored credentials users can logon to HANA databases without ever having to type in the host, port, username or password.
    It’s a bit like 1Password for HANA.
  • this HANA specific credential store can also be used with ODBC and JDBC connections.
    To use it with an ODBC connection, I showed in the old blog post, that all one has to do is to create the hdbuserstore entry and then put it into the ODBC DSN (Data Source Name) definition where the ServerName goes.
    To indicate that the HANA ODBC driver should use the hdbuserstore key, a @ has to be put in front of the KEYname.
  • RStudio is a graphical UI and development environment for the R programming language and probably one of the most important tools for data scientists today.
    It allows accessing many different databases and supports both ODBC and JDBC.
    That means the ODBC topic from above applies here.
  • the mentioned R library hana.ml.r is a library that is provided with the SAP client software and works as an interface to the machine learning features of HANA for R.
    There is a blog post that explains the use of the library. Unfortunately, this blog post only shows how to generally set up an ODBC connection in RStudio to use with HANA, but explicitly not how the hdbuserstore gets used.

This last bit is what the question was about.

Now, it has been a while for me that I last played with R/RStudio and HANA and that was based on my old SAP laptop running MS Windows.
Currently, I use my trusty old 2015 MacBook Pro 13″, which means that some things look and feel different from what I showed in the old blog post.

I don’t know what set up the person who send me the question uses, but before I write “it works just the same as in the blog post” I rather try it out myself.

unixODBC on macOS

In order to use ODBC on my Mac I use the unixODBC package that is available via brew:

> brew install unixodbc 

installs the current release (2.3.7).

To make things work, there are two main configuration files that need to be edited.
The brew-formula installs unixODBC without any GUI, so we’re left with text-config files:

  • /usr/local/etc/.odbcinst.ini -> this file contains information about installed ODBC driver software.
  • ~/.odbc.ini -> this file contains the DSNs configured for the user

As the file locations indicate, .odbcinst.ini is used by all users of the computer, while .odbc.ini is user-specific.

unixODBC documentation

Detail documentation about these files is not that easy to find.
A man odbc.ini only reports what these files are but not how entries have to look like.
The page “unixODBC without the GUI” contains a lot more information.
For a general overview, “Linux/UNIX ODBC” is pretty comprehensive.

My /usr/local/etc/odbcinst.ini looks like this:

[ODBC]
TraceFile   = /tmp/sql.log
Trace       = No
 
[HDBODBC]
Driver      = /Applications/sap/hdbclient/libodbcHDB.dylib
Description = SAP HANA ODBC Driver
FileUsage   = 1

The [HDBODBC] section contains all information about the SAP HANA ODBC driver, namely the dynamic library file name on my computer.

The ~/.odbc.ini contains this:

[DSN_HXE]
ServerNode  = hxehost:39015
Driver      = HDBODBC
Description = HXE Tenant DB

[DSN_SYSTEMDB]
ServerNode  = hxehost:39013
Driver      = HDBODBC
Description = SYSTEM Database

Hidden files in macOS file dialogs

Both of these configuration files are “hidden” files as their names start with an ‘.’ (UNIX convention).
While it’s easy with the command line to show those files in a directory (ls -la) and to open them (e.g. vim .odbcinst.ini) as a Mac user I like working with a GUI-editor.
Unfortunately, when using the standard macOS file open dialog, hidden files stay hidden.
To temporarily show hidden files in this dialog one can press [CTRL]+[SHIFT]+[.].
This switches between showing and hiding “hidden” files and folder.

Setting up the hdbuserstore entry

As we want to use hdbuserstore entry for the ODBC connection, we first have to create the entry:

> cd /Applications/sap/hdbclient

> hdbuserstore SET HXE_ML hxehost:[email protected] ML Welcome123!

> hdbuserstore list
DATA FILE       : /Users/larsbreddemann/.hdb/Silverfold.local/SSFS_HDB.DAT
KEY FILE        : /Users/larsbreddemann/.hdb/Silverfold.local/SSFS_HDB.KEYKEY HXE_ML
   ENV : hxehost:39015
   USER: ML
   DATABASE: HXE

Test the connection “on HANA level”

> hdbsql -U HXE_ML -C "select current_user, current_time from dummy"

CURRENT_USER,CURRENT_TIME
ML,14:40:07
1 row selected (overall time 8467.022 msec; server time 569 usec)

This shows, that the connection to the HANA DB works when using the HANA client software.
The next step is to configure the ODBC connection for this hdbuserstore entry.

Adding a new DSN

Since the ODBC driver settings are already in my .odbcinst.ini all that’s missing is the matching entry in .odbc.ini.

I add

[HXE_SECURE]
ServerNode  = @HANA_HXE
Driver      = HDBODBC
Description = HXE Tenant DB via hdbuserstore
CHAR_AS_UTF8 = TRUE

as the documentation tells us, the old @+ trick works here, too.
The mentioned blog post about the ML package also referenced an old StackOverflow answer I provided, to make sure that the ODBC-type conversion does not truncate or garble UTF data that is stored in HANA CHAR/VARCHAR columns.
That’s what the CHAR_AS_UTF8 is for.

Property or parameter?

Interestingly enough, the documentation states that HANA specific connection parameter names are case sensitive while connection property names are case insensitive.
I tried out different spellings and could not find a difference, just as I am not quite sure what makes an entry a “connection parameter” and what a “connection property“.

Also, note how the Driver entry ties back to the .odbcinst.ini entry. This is how both files are connected.

Test the connection “on ODBC level”

Testing the new DSN entry is easy but different from the MS Windows version.
On macOS (and Linux) the command is odbcreg <DSN>.

> odbcreg HXE_SECURE

ODBC Driver test.

SQLConnect with DSN: 'HXE_SECURE'.
 Driver version 02.04.0162 (2019-09-17).
 Select now(): 2020-01-15 14:53:11.068000000 (29)

Ok, so both the connection via HANA client software as well as via ODBC connect work.

Using the connection in RStudio

Following the ML blog post, I installed the hana.ml.r-package in RStudio:

> install.packages("/Applications/sap/hdbclient/hana.ml.r_1.0.4.tar.gz", repos = NULL, type = "source")
   ERROR: dependencies ‘futile.logger’, ‘sets’, ‘uuid’ are not available for package ‘hana.ml.r’
removing ‘/Library/Frameworks/R.framework/Versions/3.6/Resources/library/hana.ml.r’
   Warning in install.packages :
   installation of package ‘/Applications/sap/hdbclient/hana.ml.r_1.0.4.tar.gz’ had non-zero exit status

The first try failed because some dependent libraries were not yet installed. So I quickly added those:

> install.packages(c("futile.logger", "sets", "uuid"))      

and after that, the installation worked fine:

> install.packages ("/Applications/sap/hdbclient/hana.ml.r_1.0.4.tar.gz", repos = NULL, type = "source")
      installing source package ‘hana.ml.r’ …
   ** using staged installation
   ** R
   ** byte-compile and prepare package for lazy loading
   ** help
   *** installing help indices
   ** building package indices
   ** testing if installed package can be loaded from temporary location
   ** testing if installed package can be loaded from final location
   ** testing if installed package keeps a record of temporary installation path
   DONE (hana.ml.r)    

Another dependency is the RODBC-package, which was already installed.

To set up a new DB connection the current RStudio has a handy UI and with a few clicks, the connection works:

RStudio DB connection dialog showing ODBC DSNs including the new DSN
RStudio automatically picked up the new DSN
RStudio DB connection test dialog showing a succesful test.
This connection works!

RStudio is even that nice to show the code for creating the connection:

> library(DBI)
> con <- dbConnect(odbc::odbc(), "HXE_SECURE", timeout = 10)

Too easy!

Now, let’s quickly try out the connection via the hana.ml.r-package as shown in the tutorial post and we’re done here:

> library (hana.ml.r)
   Loading required package: R6
   Loading required package: futile.logger
   Loading required package: sets
   Loading required package: RODBC
   Loading required package: uuid

> conn <- hanaml.ConnectionContext(dsn="HXE_SECURE")
   Error in ConnectionContext$new(dsn, username, password) : 
     argument "username" is missing, with no default

> conn <- hanaml.ConnectionContext(dsn="HXE_SECURE",username=, password = )
   Error in ConnectionContext$new(dsn, username, password) : 
     argument "username" is missing, with no default

> conn <- hanaml.ConnectionContext(dsn="HXE_SECURE",username='', password='' )
   WARN [2020-01-15 15:04:29] Failed: simpleWarning in RODBC::odbcDriverConnect("DSN=HXE_SECURE"): [RODBC] ERROR: state IM002, code 55767936, message [iODBC][Driver Manager]Data source name not found and no default driver specified. Driver could not be loaded

ERROR [2020-01-15 15:04:29] Error: Error in value(cond): 

Oops!

This is rather surprising, given that we were able to connect with this setting just seconds before.

It took me a while digging and find that the hana.ml.r-package uses RODBC which in turn uses the iODBC driver manager.

Remember when I mentioned that I use unixODBC at the beginning?
That’s another, different ODBC driver manager on macOS.
As it is, unixODBC is what SAP HANA uses on non-Windows platforms. That’s what is in the documentation, that’s what most examples show.

iODBC has a hard requirement that the ODBC driver comes as a shared library (.so-files), while the SAP HANA client for macOS only provides dynamically loaded modules (.dylib-files).
Put simply, iODBC cannot load the SAP HANA driver library.

There are also other aspects like that iODBC uses slightly different named configuration files (~/odbc.ini instead of ~/.odbc.ini) that also follow slightly different rules about the content of the files.
These configuration files are similar enough to be easily confused and different enough to be incompatible.

So, it does look like the hana.ml.r-package really does not work on macOS.

Because reasons…

From what I can tell, the dependency on RODBC is only present in the dataframe.R file of the package. Maybe the problem is easy to fix.
I don’t know why SAP chose to use RODBC here, but there is another ODBC package for R available: odbc.
And that package makes use of the DBI-package and, as shown above, connection via DBI work like a charm:

> con <- odbc::dbConnect(odbc::odbc(), "HXE_SECURE") 
> str(con) Formal class 'HDB' [package ".GlobalEnv"] with 4 slots   [email protected] ptr     : 
     [email protected] quote   : chr "\""
     [email protected] info    :List of 13
     .. ..$ dbname               : chr ""
     .. ..$ dbms.name            : chr "HDB"
     .. ..$ db.version           : chr ""
     .. ..$ username             : chr ""
     .. ..$ host                 : chr ""
     .. ..$ port                 : chr ""
     .. ..$ sourcename           : chr "HXE_SECURE"
     .. ..$ servername           : chr "@HANA_HXE"
     .. ..$ drivername           : chr "libodbcHDB.so"
     .. ..$ odbc.version         : chr "03.52"
     .. ..$ driver.version       : chr "02.04.0162 (2019-09-17)"
     .. ..$ odbcdriver.version   : chr "03.51"
     .. ..$ supports.transactions: logi TRUE
     .. ..- attr(*, "class")= chr [1:3] "HDB" "driver_info" "list"
     [email protected] encoding: chr ""

Checking the rsession process

Don’t let yourself be fooled by drivername: libodbcHDB.so – there is no such file.
Checking with the macOS Activity monitor we can check what libraries are actually loaded by the rsession process:

macOS activity monitor showing currently running processes in a hierarchy. The rsession process is highlighted.
macOS activity monitor detail page showing the Open Files and Ports tab. The search bar is open and "libodbc" is the search term. The text area shows the two found hits.
Very neat that the search function + works nicely in the “Open Files and Ports” textbox!

Some learnings and yet no good solution

So, I learned a lot about macOS, R, the different ODBC managers and where to look and check.
But if SAP wants to make the hana.ml.r-package work on macOS then there needs to be some work done on their end.
For MS Windows users, this should not be a problem as there the ODBC connect works just fine as shown in the ML blog post.

The RStudio way of handling DB connections

This post was all about how R can connect to HANA and how to use the HANA specific way to securely store login credentials. Since HANA is just one of many DB platforms used by data scientists, RStudio provides its own methods to deal with login data.
See the very well written “Securing Credentials” and “Setting up ODBC Drivers” for details.
The latter document also recommends using unixODBC on macOS…

A new hope (Update 16.01.2020)

Glenn Neuber, a former colleague of mine, saw this post and took the time to point to a potential solution.

Followed his pointer to this StackOverflow question and learned that the RODBC package actually provides configure option flags that allow selecting the odbc manager it should use.

And, of course, all this has been properly documented, in the “vignette” for the RODBC-package. Chapter A – Installation explains that with the configure option --with-odbc-manager=odbc one can select unixODBC as the odbc manager to be used.

And here we go again

In the search for closure, which is probably a driving force for many problem solvers, I tried this out right away and … it did work!

First, I removed the old package installation:

> remove.packages("RODBC") 
Removing package from ‘/Library/Frameworks/R.framework/Versions/3.6/Resources/library’
 (as ‘lib’ is unspecified)

Next, I reinstalled the package, this time with the config parameter:

> install.packages("RODBC", 
                   type = "source", 
INSTALL_opts="--configure-args='--with-odbc-manager=odbc'")

trying URL 'https://cran.rstudio.com/src/contrib/RODBC_1.3-16.tar.gz'
 Content type 'application/x-gzip' length 1172291 bytes (1.1 MB)
 downloaded 1.1 MB
 installing source package ‘RODBC’ …
 ** package ‘RODBC’ successfully unpacked and MD5 sums checked
 ** using staged installation
 checking for odbc_config… /usr/local/bin/odbc_config
 checking for gcc… clang
 checking whether the C compiler works… yes
 checking for C compiler default output file name… a.out
 checking for suffix of executables… 
 checking whether we are cross compiling… no
 checking for suffix of object files… o
 checking whether we are using the GNU C compiler… yes
 checking whether clang accepts -g… yes
 checking for clang option to accept ISO C89… none needed
 checking how to run the C preprocessor… clang -E
 checking for grep that handles long lines and -e… /usr/bin/grep
 checking for egrep… /usr/bin/grep -E
 checking for ANSI C header files… rm: conftest.dSYM: is a directory
 rm: conftest.dSYM: is a directory
 yes
 checking for sys/types.h… yes
 checking for sys/stat.h… yes
 checking for stdlib.h… yes
 checking for string.h… yes
 checking for memory.h… yes
 checking for strings.h… yes
 checking for inttypes.h… yes
 checking for stdint.h… yes
 checking for unistd.h… yes
 checking sql.h usability… yes
 checking sql.h presence… yes
 checking for sql.h… yes
 checking sqlext.h usability… yes
 checking sqlext.h presence… yes
 checking for sqlext.h… yes
 checking for library containing SQLTables… -lodbc
 checking for SQLLEN… yes
 checking for SQLULEN… yes
 checking size of long… 8
 configure: creating ./config.status
 config.status: creating src/Makevars
 config.status: creating src/config.h
 ** libs
 clang -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I. -DHAVE_UNISTD_H -DHAVE_PWD_H -DHAVE_SYS_TYPES_H -DHAVE_LONG_LONG -DSIZEOF_LONG_INT=8 -I/usr/local/Cellar/unixodbc/2.3.7/include   -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -I/usr/local/include  -fPIC  -Wall -g -O2  -c RODBC.c -o RODBC.o
 clang -dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup -single_module -multiply_defined suppress -L/Library/Frameworks/R.framework/Resources/lib -L/usr/local/lib -o RODBC.so RODBC.o -lodbc -L/usr/local/Cellar/unixodbc/2.3.7/lib -F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework -Wl,CoreFoundation
 installing to /Library/Frameworks/R.framework/Versions/3.6/Resources/library/00LOCK-RODBC/00new/RODBC/libs
 ** R
 ** inst
 ** byte-compile and prepare package for lazy loading
 ** help
 *** installing help indices
 ** building package indices
 ** installing vignettes
 ** testing if installed package can be loaded from temporary location
 ** checking absolute paths in shared objects and dynamic libraries
 ** testing if installed package can be loaded from final location
 ** testing if installed package keeps a record of temporary installation path
 DONE (RODBC)  

Besides the additional parameter (marked in bold) for the clang call there is nothing that indicates that this time, unixODBC will be used. But let’s try and see if it is:

> library("RODBC")

> odbcDataSources()
  DSN_HXE   HXE_SECURE DSN_SYSTEMDB 
  "HDBODBC"    "HDBODBC"    "HDBODBC"  

Yes!

RODBC now uses unixODBC and picks up the correct DSN definitions.
Would a database connection work now? Let’s see:

> conn <- odbcConnect("HXE_SECURE")

> res <- sqlQuery(conn, "SELECT current_user, current_date FROM dummy") 

> res
   CURRENT_USER CURRENT_DATE
 1           ML   2020-01-16 

DSNs get picked up, connections using the hdbuserstore work and SQL statements get executed.
This thing works.

The final test

To round it all off, the final step is to see, if now the hana.ml.r package works.

>library ("hana.ml.r")
Loading required package: R6
Loading required package: futile.logger
Loading required package: sets
Loading required package: uuid 

> conn <- hanaml.ConnectionContext("HXE_SECURE")
Error in ConnectionContext$new(dsn, username, password) : 
     argument "username" is missing, with no default 

It does not like to have parameters with no values. That’s ok, let’s provide those:

> conn <- hanaml.ConnectionContext("HXE_SECURE",username='', password='' )

> check_df <- conn$sql("SELECT current_user, current_date FROM dummy")

> check_df
  CURRENT_USER CURRENT_DATE
1           ML   2020-01-16 

Finally, it works. Case closed!

What started as a straight-forward question turned into long-winded research into the many different technologies that many of our tools are built upon.

Quick solution recipe

If you’ve ever watched an episode of “Gourmet Makes” on the Bon Appétit youtube channel, then you know that after re-engineering famous food, chef Claire Saffitz summarizes the steps to take.
This section is just that.

In order to use the hdbuserstore secure credentials with R and the hana.ml.r package you need to:

  1. configure the hdbuserstore entry for the computer user that should use it.
    The command is hdbuserstore set .... Check above or the documentation for details
  2. With the user store key set up correctly, a DSN using this key needs to be created.
    On macOS, you need to install unixODBC and the SAP HANA client software first.
    Once both pieces of software are installed, two text files need to be created and correctly configured: /usr/local/etc/odbcinst.ini and .odbc.ini.
  3. At this point, an ODBC connection without providing login credentials should work.
    Now, the odbc access in R needs to be setup.
    For this, the RODBC-package needs to be correctly installed and configured. See details above.
  4. Finally, the hana.ml.r package needs to be installed and loaded.
    Now, a DB connection can be created using the configured DSN by providing empty strings for the username and password parameters.

Without writing this blog post and the helpful comment from Glenn, I would have probably stopped looking for a solution, since I don’t really need to use the hana.ml.r package.
Instead, my audience spoke back to me and provided the exact right push to find an actual solution.

To me, this is immensely satisfying.
Not only, do I now know that there are people that actually read my posts (come on, every author likes to have this confirmed every now and then). And not only did the problem come to full closure since the original question was answered.
But in the course of writing this, I learned many useful bits and pieces.

There you go, now you (we) know!

4 Comments

  • Hi Lars,

    Happy New 2020! Saw your blog article on LinkedIn. Using mac and linux, I often found myself installing libraries from source to make them work. If I understood correctly, making RODBC work with unixODBC could potentially solve your iODBC issue: I found this SO article for you: https://stackoverflow.com/questions/47077493/how-do-i-install-rodbc-so-that-is-uses-unixodbc.

    The last answer details it “Turns out the default RODBC package install won’t work with unixODBC. Here is a process for recompiling and installing RODBC.” and how to “Set the environment variable DYLD_LIBRARY_PATH to the location where your unixODBC library is located. I used homebrew to install unixODBC so my my library is located in /usr/local/lib.”

    Hope this works 🙂

    • Hi Glenn,
      thanks so much for this pointer.

      I read through it and found that there even is an installation flag for RODBC to indicate which odbc manager to reference.

      I will update the blog post with this additional learning 🙂

      Cheers,
      Lars

  • ryanhunt says:

    Just wanted to thank you for this post – I’ve been trying to the SAP HANA ODBC client to work on MacOS with Excel I was bashing my head against the wall until I came to your line about iODBC not loading dylib files, which is the only version that SAP supplies. I found that CData provide a .so version, but it’s $499/yr (!)

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.