Getting data into HANA – the easy way

In a recent blog post SAP presented several ways to load data into a HANA instance. The main approach shown was to use a combination of Python, SQLAlchemy, and Pandas (a Python data analysis library) as well as the Python-specific version of the SAP HANA client driver.

Nine reading minutes and some reasonably straight-forward Python code later, the demo data file chosen (from Kaggle, a website that hosts free data sets for data science and machine learning) could be loaded into HANA.
The author listed the time for the data load as around 25 minutes, which seemed like a long time for me.
Mind you, the file in question contains 22.489.348 records, which translates into a data loading speed of around 15.000 records/second.

Isn’t that a bit complicated?

Now, I thought this can be improved upon, but at the same time, I prefer easy and convenient over setting up a whole Python environment with dependencies, etc.

Like other bloggers in the SAP HANA space (e.g. here and here) I have grown fond of DBeaver, a great free database tool, which happens to have a very convenient data import tool.

Another big plus of this tool is that it is Eclipse/JAVA based, which means it works the same on macOS (which is what I use), Windows, and Linux.

The Click-through

Let’s walk through this, shall we?

Find the “Import Data” in the context menu

DBeaver Import Data menu
DBeaver Import Data menu

Choose what to import

There’s not too much to do for us here, just continue by clicking on “Next >”.

DBeaver data import source dialogue
DBeaver data import source dialogue

Pick the source data file

We select the source file via a standard file dialogue box and get the following Tables Mapping window:

DBeaver tables mapping window
DBeaver tables mapping window

This window is rather central to the whole data import process. Here we can specify which fields from the source file are mapped to what columns of the target table.

Speaking of which, this window also allows us to define the target table (or select an existing one). DBeaver conveniently generates “safe” default target column types (really, those are just huge character columns that we should adjust if we know more about the data).

Target schema

Unlike the tools that SAP provides for dealing with HANA databases (HANA Studio & WEB IDE) DBeaver’s functionality was designed by somebody that actually wants to work with it. So, it’s not surprising that sensible features like checking the generated DDL for the target table are just a click away (button “Schema …” in the tables mapping window). Here we could make further adjustments if we wanted to.

DBeaver target DDL dialogue
DBeaver target DDL dialogue

Summary of our setup

Once we finished the mapping definitions, we get an overview window. Here we can use “Save task” to save this definition for later reuse.
Ever wanted to reuse the same mapping rules on multiple files and got surprised that HANA Studio just “forgot” how you did it the last time? Then this feature is for you.

DBeaver data transfer summary window
DBeaver data transfer summary window

“Make it so!” – start the data import

Now, about 5 dialog windows into it, we can finally kick off the actual data import. The final settings allow to truncate the target table before loading the data (or not) and to set after how many records a COMMIT will be sent to the database.

I chose to truncate before loading and committing after 500.000 rows.

DBeaver data transfer final settings
DBeaver data transfer final settings

And that’s it

After around 12 mins. the import was finished and the data loaded into my HANA Express VM instance. With no coding, minimal typing, and no setting up a new environment.

I’ve got no idea whether the improved speed (2x as fast as in the SAP blog post) was due to my setup of using a local VM vs. the instance in the SAP blog post was on HANA service on SAP Cloud Platform or due to something else. But unless I need to absolutely automate the data loading I prefer the easy UI option.

There you go, now you know!

2 Comments

  • Hello Lars,

    thank you for the well described article. Next time I need to import a few million records I will compare universal DBeaver tools with HANA Studio and HANA Cockpit.
    I noticed in the past the “do commit” value matters in SAP HANA database most.
    Did you import column store or row store tables?

    Best Regards from Switzerland

    Stefan

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.