Till on SAP Hana

Nine years ago I started my private validation project of the then highly hyped SAP Netweaver Stack. It became a rough trip with several highlights - NWDI is much better than its reputation - and some disappointments - WebDynpro/Java never left beta. The results were a book and greying hair.

This time - in may 2013 - I decided to take a softer approach to the next SAP hype: SAP Hana. I will spend 1 hour per week examining Hana with no predefined result. To force myself to follow this rule I will write a blog about my insights.

Day 1

SAP offers a free 30 day trial version of SAP Hana. I started with this link and got one. It is hosted by CloudShare. It takes a couple of minutes to initialize, then you get access to a Linux Hana server and a preinstalled client Windows PC with Hana Studio. The server has 24 GB memory, 250 GB disk space and 6 CPUs. The web interface to both machines is flimsy due to network latency and also since it emulates a shell / OS screen in a browser. Anyhow this is not the core product but just the access enabling software. I got stuck twice and received valuable hints from SAP's support mail hotline devcenter@sap.com. These people are really supportive. Thanks, Rocky!

Day 2

I found this document on the web. It explains briefly what Hana is all about. SAP Hana is an in-memory database. It is column based and allows for parallel execution of sub-queries. Fortunately it supports standard SQL as well as its own scripting language called SQL Script. Both are accessible via JDBC. These features combined promise crazy query performance. SAP will not only deliver Hana as a raw software building block but will make it accessible for BI systems and core SAP ERP systems as well.

Day 3

Following the help texts in Hana Studio I managed to establish my first connection from (virtual) client PC to Hana server. In the "Navigator" view I could browse the database catalogue and create my own schema and table. This feels very much like a standard SQL client. In expectation of its sheer execution speed I am already impressed with the whole concept. SAP seems to have found its way into the cloud era and offers an alternative to Oracle databases. This makes much sense in a world with increasing demand for big data analysis.

I picture SAP selling cloud access to millions of Hana instances at cloud prices and think wow, what a highly competitive product! No one is going to stick to his fat old self-owned and self-administered database server any more.

Day 4

Now I wanted to see the query monster in action. So I took some posts from web discussion boards that I had crawled earlier and tried to load them into my Hana instance. I chose to use the Hana Studio SQL command line. I reached it from the "SAP Hana Development" view's "Navigator" tab, right-clicked on the system name and chose "SQL Editor" from the context menu to get there. Then I pasted SQL insert statements into the editor and executed them.

It turned out to be not so fast on inserts. But the selects were fast. Hana studio shows the execution duration in microseconds, not milliseconds. Unfortunately the license conditions of the Hana developer package do not allow me to publish benchmarks.

Day 5

Since the welcome message on the remote client PC encouraged me to change the initial passwords I did it on the server's UNIX command line. Afterwards CloudShare's web frontend showed a message that the server was misconfigured. It turned out that there are several different levels of credentials.

A the Linux Hana server credentials
B the Hana database credentials
C the credentials CloudShare assumes are valid for the Linux Hana server
D the credentials CloudShare assumes are valid for the Hana database
E the CloudShare question/passphrase based mechanism that wraps all of the above.

I used E to reconnect Hana Studio to the server and did not have to supply A. This means the CloudShare framework has a special relationship to the Hana server that allows to bypass password entry. Probably based on certificate exchange.

Day 6

The 30 days of trial are over. Now I must pay for my Hana instance. There are three different providers that offer SAP Hana hosting: CloudShare, Amazon and some asian company. CloudShare has the cheapest monthly rate. Amazon charges 51 cents per hour but it does not become clear what an hour means. Is is the time I actually use the Hana server or is it the time during which the server is available for me independent of me accessing it?

Day 7

Finally I rented the Cloudshare version of Hana. It costs 137 dollars per month plus tax. This amount goes completely to Cloudshare. SAP charges nothing since this is a developer server as part of a promotion program.

I did not rent the remote desktop offered by Cloudshare since I am not sure if it is covered by the above rate. Therefore I'll have to connect directly from my laptop.

Day 8

I bought a book called SAP HANA: An Introduction. Flipping through the pages I found an interesting paragraph on pricing on page 114. It says

The average SAP HANA license sale is $400.000... As of this writing SAP is not discounting the SAP HANA licenses.

This means the business case for Hana shrinks to bundling with core ERP sales. No broad market penetration with a cheap cloud based approach as I guessed earlier.

Another funny paragraph on page 115 is titled

"Is SAP HANA Just Another SAP Fad That Will Not Have a Long Lifespan - Like mySAP?"

Keep in mind this is just a quote from a book published by SAP Press! The conclusion in the following lines is: No, SAP is betting its future on Hana.

Day 9

I downloaded the Hana Studio from this link and installed it directly on my PC. The connection to the Hana server worked soon using the connection data provided by Cloudshare.

Day 10

I was lucky to spend some hours on a train in the last week. So I could countinue reading SAP HANA - an introduction. One chapter deals with data modeling in Hana Studio.

If you use Hana for reporting purposes then data is modeled in Hana in three different view types

A attribute views
B analytic views
C calculation views

To understand what an attribute view is I must explain how the word 'attribute' is used in this context. Business Intelligence terminology distinguishes between facts and attributes. Facts are measured values such as revenue or costs. Attributes partition the realm of your measurement e.g. in respect of location, time or organisational structures. So an attribute view (A) is a partition of your observation realm defined by one or few database fields.

Analytic views (B) join one fact table with one or several attribute views.

Calculation views (C) usually combine several attribute and analytic views and perform calculations on top of them. I think of revenue purged from seasonal and inflational effects.

Anyways this type of modelling seems to be relevant only if you plan to use Hana for reporting purposes. But I can picture more exciting uses.

Day 11

After all the configuration issues of the last sessions I could finally get back to my initial goal - data insertion. Whenever I explore a new database layout or database product I try to get a million records to play around with. As a rule of thumb a million records is enough to get an impression of the runtime behavior but little enough to be handy.

I tried again to insert records from the SQL Editor as explained earlier. This does not work for larger amounts of data since the editor content is very limited.

Day 12

Since some Hana hosters offer hourly rates instead of monthly rates I was wondering what that means. Do they only charge for the hours during which you actually consume CPU time - meaning few? Or do they charge for the hours during which you have access rights to the server - meaning many? I do not know the answer yet but I found out that Hana Studio cannot connect to the server before I log into my ClouseShare account. Thus the Hana server must be located on a virtual machine that is activated only on demand.

Anyways this distracts me from my initial aim to find out how to bulk load data into Hana. There are several web posts that describe how to do that. The main approaches seem to be these

A Load data from a client side CSV file using Hana Studio navigating File / Import / Sap Hana Content / Data from local file
B Load data from a CSV file plus control file directly from the Hana server's file system
C BO DataServices (I did not dig into this one)

A is quite straight forward but I expect it to be slower than B as it will probably involve parsing on the client side and record by record transmission to the server.

I have high expecations in aproach B. It reminds me of Oracle's SQL Loader mechanism that is very effective.

Approach C sounds as If I'd need additional software to try it.

Day 13

I learned that it is important to first start the Hana Server from the Cloushare page before starting the Hana Studio. If you do it vice versa the Hana Studio makes endless turns trying to connect to the server. Even if you restart the studio you have to close the old SQL Editor which has earlier become stale and must open a new one.

My next insight was that table and Schema names are halfway case sensitive. I create a table called "POST" from the studio and another one called "Post". The SQL Editor is not case sensitive and let me access only one of both. So I deleted them both and created the upper case table again. This is the output

Statement 'create column table "MYSCHEMA"."POST"( "POSTID" INTEGER not null, "AUTHORID" INTEGER null, "DATE" ...' successfully executed Affected Rows: 0

Then I started importing records using method A from day 14. The import stopped. In the "Job Log" tab on the right side I saw a message that the date format that I used is not correct.

java.lang.IllegalArgumentException at java.sql.Date.valueOf(Unknown Source)

It tried a few differnt date formats including YYYY-MM-DD or DD/MM/YYYY but no success.

My attempts got stuck because of connection problems. It took some time till I figure out that the Hana server had passivated itself. On the Cloudshare page you get the possibility to prolong the server uptime. From now on I'll extend the uptime to 180 minutes right at the start of the session.

Day 14

Still not content with the import functionality I tried to figure out how to do mass import from the raw Hana command line on the server side. There seem to be two building blocks to go this way.

The first is the access to the Hana database command line HDBSQL. This video shows how to start.

The second one is the import statement described in the SAP Hana SQL Reference Guide starting on page 120.

No results yet.

Day 15

Today I managed to configure my Putty SSH client to connect to the HANA server via SSH. Strangely this is done using the root user. Next I changed user to the HANA admin OS user hdbadm in my case.

Then I was able to log into the database command line tool

hdbsql -S <SID> -n localhost:3<system no>15 -u system -p <password>

This way I could execute these commands

\h - show command help

select table_name, schema_name from m_tables where schema_name like 'MY%'

So this is a SQL command line tool that supports other commands, too.

I tried a million ways to spell an import statement like this one but I did no succeed yet. Still I am not giving up.

import "myschema"."post" as CSV from '/usr/sap/HDB/HDB00/import' with REPLACE

Day 16

Today I tried the IMPORT FROM statement that is similar to the IMPORT statement but uses a control file. The control file test.ctl looks like this

IMPORT DATA
INTO TABLE "MYSCHEMA"."POST"
FROM '/usr/sap/HDB/HDB00/import/test.csv'
RECORD DELIMITED BY '\n'
FIELD DELIMITED BY ','
OPTIONALLY ENCLOSED BY '"'
ERROR LOG '/usr/sap/HDB/HDB00/error_file_name.err'

The data file test.csv looks like this

1,7295,1970-01-01,"blub"
2,7296,1970-01-01,"blubsdlfkj"

Both are located on the Hana server in the folder /usr/sap/HDB/HDB00/import/. I logged into the Hana server command line and executed this statement

hdbsql HDB=> IMPORT FROM '/usr/sap/HDB/HDB00/import/test.ctl'
0 rows affected (overall time xxx msec; server time xxx msec)

And BAM! the two records were inserted into the existing table myschema.post. Do not let the statment "0 rows affected..." discourage you. Make sure to spell the table and schema names in the control file in upper case and have a close look at the error file specified in the control file and success will almost be unevitable.

Day 17

I spent the session copying a large data file onto my Hana server. I first copied it onto a web server I own and then loaded it via wget.

Day 18

While trying to convert a million text based records to a suitable import format I found out that this format works, too.

1,7295,1970-01-01,blub

Thus all I need to to in my case is remove ticks from my data file.

cat in01.csv | sed s/\'//g > in02.csv

As a result 3800 records (out of 64.000) were inserted rather fast. In the error file I found messages such as this one

[ROLLBACK] - inserted value too large for column:

Day 19

I examined one of the lines that were too long. It had 1009 characters in the CSV field instead of 1000. I am quite sure that the additional length appeared during export. The easiest way to fix this would be to adjust the field length in the database, not the content. I tried to do this in the Navigator tab of the Hana Studio. You can double-click on the table name or execute "Open Definition" from the context menue. Then right-click on a row and chose "Edit". There is no option to modify a column but you can "remove column" and "add column". So I added a new column with length 1020.

Statement 'alter table "MYSCHEMA"."POST" drop ("TEXT")' successfully executed Affected Rows: 0
Statement 'alter table "MYSCHEMA"."POST" add ("TEXT" VARCHAR (1020) null)' successfully executed Affected Rows: 0

The result is that the field length is changed but the data in that field is lost. Probably there is a command to modify the field length without data loss but it is currently not supported by the UI.
I chose "Open Content" from the context menue of the table name in the Navigator tab. The following SQL statement is started implicitely (good to know the syntax) and the first 1000 lines of my table are displayed fast.

SELECT TOP 1000 * FROM "MYSCHEMA"."POST"

Next I deleted - not dropped - my table's content from the HDB command line. This happened very fast. Then I ran my insert statement again. The 64.000 records of my first data bulk were inserted in very little time.

I have dealt with databases for many years but I have never seen such an insertion speed. I am not sure if a hard disk can technically cope with this speed. The cruicible question is: what happens if the server is switched off immediately after the insert statement has returned? Will the data persist? And if not, when can I be sure the data is persisted?

I wrote a mail to devcenter to find out about the durability issue.

Day 20

I actually received a response from SAP devcenter. They confirm that inserted data is already stored durably when the command returns. If you pull the server's plug at the same moment it will probably reside on SD RAM and will later on be moved to hard disk.

Then I prepared 1 million records and loaded them into the database. The insertion was still fast for this amount of data but did non scale linearely. One reason might be that the amout of data (roughly 1 GB) exceeded the available SD space. The other possible reason is that I found about 120.000 entries in the import's error log. Usually error handling is much slower than regular functionality.

I ran a first query on the new data. It is a full text search on the complete table.

select count(*) from myschema.post where text like '%brot%'

The duration absolutely excited me! I received 1800 results plus this message. I assumed this is no error but rather an end of stream message.

System call error at fputs: "Broken pipe" * 0: SQLSTATE: 0000

Then I ran a Unix grep command on the import file to compare the execution speed. time grep -c brot test.csv It turned out to be 14 times slower even on the second attempt.

Day 21

Back to theory. I purchased the book ABAP Development for SAP HANA. The introduction gives a clear picture of Hana's technological concept. The main innovations that allow high speed are

A Data is mainly stored in columns instead of rows.
B Stored data is compressed wherever possible.
C Hana hardware carries much RAM.
D A special query language allows for parallel query execution.

Each of these items alone does not seem really stunning. But combined they make a lot of sense. Think of (A). If you store data in columns you'll need to access much less data for the everage query that is based on conditions on a few fields. You'll only have to access the data regions where the respective attributes are stored and don't have to care about the irrelevant other fields. Let's assume that this way one can reduce the amount of data to be accessed on an average query by one order of magnitude.

Compression of stored data (B) might save another order of magnitude. Among others Hana uses a dictionary based copression algorithm that replaces long recurring string fields with numeric equivalents.

Now add some more RAM (C) to your hardware and eventually your whole database or at least the hot spots will fit into RAM. A hundredfold decrease in storage demand results in a 100.000 fold increase in speed. (1:100.000 is the factor between RAM latency and hard disk latency). The book claims that Hana's bottleneck is not the hard disk but the RAM - as compared to the even faster L1-3 caches.

Finally the items C and D both reflect a shift in hardware improvements. Today processors do not get faster as quickly as in the past. Instead RAM becomes cheaper fast and processors allow for higher parallelism. So Hana provides a specialy query language called SQL Script that enhances SQL by functional parallelization elements.

Day 22

I spent another day on a train and went on reading ABAP Development for SAP HANA. By the way this is one of the best books from the SAP Press series. Aside from writing quite readably the authors know their subject well and have a deep unterstanding of the design considerations that led to the final Hana product.

I reached a description how to program Hana from the ABAP stack. You'll need an ABAP 7.4 system. To run the combination of ABAP and Hana you should code from the Eclipse environment, not the SAP GUI. For those unfamiliar with Eclipse: it is a plugin framework for development tools which was originally developed by IBM but soon after made open source. SAP considers Eclipse a strategic platform that will allow them to bundle and release development environments for different SAP technologies. In our case we'll need a combination of "ABAP Development Tools for SAP Netweaver" and "SAP Hana Studio".

Unfortunately I do not have access to a ABAP 7.4 server yet but this will be one of the hottest topics on my wish list for future sessions.

The most relevant change for ABAP database programmers is an approach called code pushdown. In the past you could treat the database as a black box that was hidden from you by the Open SQL database abstraction layer. In most cases you didn't have to worry whether your data was located in the database or if it was being processed in the ABAP engine. It was usual to first load several rows into an ABAP working area and then manually execute filters or modifications on these data.

If you plan to fully exploit Hana's abilities you must become location aware for both data and code execution. And you will lose the portability of Open SQL ABAP code. Fast Hana code is as importable as native SQL code. Much of the functionality that used to be performed in ABAP on the application server will have to be moved to SQL Script to minimize the number of remote database calls and take full advantage of in memory data access. Since schematic diagrams tend to show app server on top of their database this modification resembles of pushing the code downward into the database.

Day 23

Hana is rumored to have great full text retrieval abilities. I found them surprisingly easy to use. All you need is this statement.

hdbsql HDB=> create fulltext index i_text on myschema.post(text)

It creates a full text index based on the content of the text column. According to the documentation i_text is the name of a hidden column that represents the index. Make sure to use this function-style syntax in your queries

hdbsql HDB=> select count(*) from myschema.post where contains( text , 'brot' )

This statement returned crazy fast. It searched 1 million lines of 1K of text considerably faster than Google returns your search results.

The result differed from the result of the corresponding query from day 20. This one returned 500 lines. Probably the above syntax means that a search for a full word is performed and not for any appearance of the string. The function 'contains' allows to use a third parameter that describes the type of search. I'll examine that in another ession.

Inserts into the indexed table will probably be slower than before creating the index. Since the duration for the creation of the whole index was much smaller than the insertion duration the speed loss on inserts won't be too big.

Day 24

SAP proposes to use the Hana database in a software stack that contains also a lightweight application server that is branded Hana XS. It consists of a JavaScript and SQLScript container that runs within the Hana database. These are the main implications and features.

o application code runs within the database
o SQLScript and JavaScript are supported
o SAPUI5 frontends are supported
o REST access to database contents works easily
o short access paths from application logic to database contents
o no need to specify connection credentials or adjust firewall rules
o homogenous deployment and versioning of application and database

This architecture sounds very clean and fast. Still I do not understand what the intended use case looks like. It is not very likely that any large company might migrate its existing app server landscape to such a special purpose niche product. Furthermore companies do have serious security concerns if externally reachable web applications run on the same server instance as the ERP system. This was already the case with WebDynpro/ABAP or fifteen years earlier with Oracle's 8i database app server.

So I guess that Hana XS applications will best be used for super fast in-house reporting. Or as separately served tiles within a portal that runs on one of the big name app servers.

Day 25

The good thing about a private evaluation project is that I can define my own schedule. No business need keeps me away from the interesting subjects. And I am free to switch subjects whenever I like. So I had a quick look at SQL Script.

As discussed earlier SQL Script is Hana's counterpart of Oracle's PL/SQL plus a parallel execution booster. The syntax of a simple procedure must look like this. But caution, this is not proven yet!

CREATE PROCEDURE count_posts ( out count integer )
   LANGUAGE SQLSCRIPT
   SQL SECURITY INVOKER
   READS SQL DATA AS
BEGIN
   count = select count(*) from myschema.post;
END;

This procedure executes the count(*) statement that you can see between BEGIN and END. The output parameter "count" is declared in the first line. I chose a scalar type for the first try. This way I did not have to define a table type. You'll find the full SQL Script language definition here.

To create a SQL Script procedure you can use the Project Explorer in SAP Hana Studio. I tried this click path:

File / New / Project / Sap Hana Development / XS Project
New / Folder
New / File "*.procedure"

If you create a file with the extension "procedure" the IDE fills it with a predefined procedure template. You have to add the parameters and the procedure body.

Then I tried to active the procedure. This is a compilation and publication step that is similar to activation in the ABAP workbench. It is supposed to work by clicking "Team" / "Activate" in the procedure's context menu. Unforunately I received an error

Error getting repository client version. Repository client not properly configured.

The activation is obviously tied to repository which I missed to configure. I'll try to clarify this till next week. No skipping of subjects this time I promise.

Day 26

If you want to use the Hana repository you need to install a package called the "Hana client", too. I just installed the "Hana Studio". You can download the Hana client from this same location as the studio. Then you unzip it and double-click on "hdbinstall". If you choose the default path you will obtain a client installation in the folder

C:\Program Files\sap\hdbclient

right next to your hdbstudio installation. It contains a file called "regi.exe".

After installation of the Hana client I opened the Hana studio, right-clicked on the project I created on day 25. There I could execute "Team" / "Share Project". I first received an error message but it contained a link to a form where I could specify the path to "regi.exe" and define a workspace location.

Next I right-clicked on the folder from day 25 that contains my SQL Script file. The option "Team" / "Activate" was accessible now together with several other option such as "Commit". I pressed "Activate". A message told me that I have insufficient privileges.

SQL error: insufficient privilege: Not authorized ... Repository Activation Problem

Anyways I think I got a step further and I'm quite confident that I'll manage to run my SQL Script in the next session.

Day 27

Today I managed to activate my SQL Script procedure.

To fix the authorization problem from day 26 I had to run this statement in the SQL editor of Hana Studio.

GRANT SELECT ON SCHEMA "MYSCHEMA" TO _SYS_REPO WITH GRANT OPTION;

The schema name "MYSCHEMA" results from the schema from which I started the SQL editor.

Then I got into the good old development loop of code modification and error message. I'm starting to feel at home in Hana Studio! I'd run Team / Check and draw my conclusions from the message in the "Problems" tab. My take home messages are these two:

You cannot return scalar values from a SQL Script procedure therefore you must define a table type.

SQL Script is case sensitive in the sense that it expects the field to be spelled the same way as in the table definition. Plain Hana SQL is only halfways case sensitive, see above.

Below is the checked version of the procedure. It serves a different purpose than before just to get the my first SQL Script running.

The procedure definition is in the tab "Script".

CREATE PROCEDURE first_five ( out first_ids tt_postid )
   LANGUAGE SQLSCRIPT
   SQL SECURITY INVOKER
   READS SQL DATA AS
BEGIN
   first_ids = select top 5 POSTID, AUTHORID, DATE from MYSCHEMA.POST;
END;

The table type definition is in the tab "Local Table Types"

CREATE TYPE tt_postid AS TABLE (
   POSTID INTEGER,
   AUTHORID INTEGER,
   DATE DATE
)

With this code I coud also execute Team / Activiate. I also tried to call it from the SQL editor but I did not find the right syntax. It must follow this pattern

call <procedure name> ();

but I always got

SAP DBTech JDBC: [328]: invalid name of function or procedure:...

Day 28

My procedure ran sucessfully during the fourth SQL Script session. In the Hana SQL Script Reference you find a definition that the procedure name consists of the schema name plus the procedure identifier. So I tried to first determine the schema my procedure is located in. I did not specify the schema when I defined the procedure or the surrounding project and package. Therefore there must be a default rule that probably derives the schema name from the object type.

I used the navigator tab in Hana Studio to click through all schemas. In schema _SYS_BIC I found a procedure called "MyXSProject.procedures/first_five" in the "Procedures" folder. So I tried this call which came quite close to the correct one.

call "_SYS_BIC"."MyXSProject.procedures/first_five" ( );

I received a message that made clear I had to deal somehow with the return parameter.

Could not execute 'call "_SYS_BIC"."MyXSProject.procedures/first_five"()'
SAP DBTech JDBC: [1281]: wrong number or types of parameters in call: FIRST_IDS is not bound

You can probably write another SQL Script program that defines a table variable to hold the call's return value. But if you just want the out table to be displayed in Hana studio you can call the procedure with a question mark.

call "_SYS_BIC"."MyXSProject.procedures/first_five" ( ? );

The result table is displayed as if it was the result of a query.

I'll go on examining SQL Script in future sessions in depth. Naturally performance will be my main focus then.

Day 29

The majority of people who use Hana will do that from an ABAP stack. There is no way I can bypass this constellation in my blog. I'll start with some theory while I figure out offline how to get a Netweaver 7.4 server running (this is a requirement to access Hana from ABAP).

Basically there are two by two different ways how to access Hana from ABAP: as primary or as secondary database and via Open SQL or natively.

A primary database is the one that is accessed by default whenever you run an Open SQL statement such as

SELECT SINGLE * FROM tab INTO ls_rec .

If you are using a Netweaver 7.4 server that uses Hana as sole database the above statement already accesses Hana. If you are using a 7.4 server that uses a database from any other vendor then you may choose to use an additional Hana database for performance critical operations, typically from reporting purposes. The secondary database can be accessed with such a statement

SELECT SINGLE * FROM tab CONNECTION ( 'CONN2' ) INTO ls_rec .

The difference is the CONNECTION clause that specifies the database to be accessed. It can be defined in transaction ST04 / "Database Connections" and may but does not have to be of type Hana.

Independently of your decision to use Hana as primary or secondary database you can either access it from Open SQL as shown before or natively. In a nutshell native database access trades portability of your database code for the ability to take advantage of database vendor specific features. You'll definitely want to do the latter if you are using Hana for an extra portion of speed! Native database access is a prerequisite for the code pushdown technique mentioned in earlier posts.

Native database access can be done with the EXECUTE SQL command. Alternatively it can be performed with a newer API called ADBC. ADBC (ABAP data base connecivity) is an ABAP version of Java's first database API which was called JDBC. ADBC code looks like this

stmt = cl_conn=>get_connection( 'CONN2' )->create_statement( ).
res = stmt->execute_query( 'SELECT ...' ).

Throughout the next sessions I will try to use Hana as a secondary database from a SAP server that will probably be installed on my laptop.

Day 30

I must admit I am almost stuck on my way to a Netweaver 7.4 ABAP / Hana scenario. My internet research led to two different approaches.

A Use a hosted pre-installed NW 7.4 + Hana scenario following this link.
B Install a Linux based NW 7.4 + Hana on own hardware. This link indirectly hints that it may be available.

Neither one is exactly what I was looking for. Both approaches are based on a 90 day license. The first one is said to be slow on startup (30 to 40 minutes). For the second one I need to get a Linux PC.

Day 31

I decided to try the hosted NW 7.4 + Hana and began with the registration. It will take a couple of seesions till everything is up and running again.

Day 32

I am still struggling with the setup and alignment of my Amazon AWS account and my SAP Hana Demo account. It is not that easy since AWS is a multiple purpose environment that offers several degrees of freedom. I will not describe the details. They are related to the hosting model and not to Hana itself. So they are not very likely to help anyone in a Hana business scenario.

Days 33, 34, 35

Success is lurking a few clicks from here. Since I have spent so much time with the hosted Hana access I made up my mind to write a bit about it. Although it is not too clever to store relevant business data on american servers the hosted Hana setup is still one of SAP's official distribution models.

Here are some links that have proven to be helpful.

Step by step How-to secure version.

Step by step How-to less secure version.

How to create a billing alarm. Hint: if you try this before the AWS instance was created for the first time no billing metrics are available.

This link describes the permissions you need to set on AWS side to be able to access it from the SAP Cloud Library. It could save you a lot of time.

How to use the ABAP + Hana stack.

Day 36

Today I managed to access the ABAP Workbench of my new Hana.

The description ABAP + Hana stack I mentioned in the last session lacks a bit of clarity at the beginning when it referrs to the login URL

https://caltdc.netweaver.ondemand.com/console/tenant_<tenant_name>

At first I could not figure out what my tenant_name was. Then I saw that there is a HTTP parameter called tenant_name in the URL that is shown in your browser when you start with the link for approach A from day 30, press "AS ABAP incl. BW 7.4 SP5 on SAP HANA SP7 [Developer Edition]" / "Demo Now" and log in/register. That very same tenant_name worked fine for me as part of the login URL above.

Day 37

In this session I learned the difference between terminating and passivating a hosted Hana instance.

Terminating means that all entries made to the Hana system are lost. Start up of a new Hana (which is called "creation") takes 40 minutes again. Then you must enter the license key, etc. from scratch. But I believe a terminated Hana creates no costs at all.

On the other hand a passivated Hana starts up (which is called "activiation") in about 5 minutes. All changes you made in an earlier session are still there. But I guess storage will cost a bit. Anyways my current overall costs are about 11 dollars. So this is affordable.

Day 38

On some days it is very hard to make a small progress. On other days progress happens almost by itself. This was the case during my first ADBC session.

I explained the basic idea of ADBC in session 29. Today I adapted an ADBC code snippet from the book "ABAP Development for SAP HANA". It worked immediatley on my Hana ABAP stack.

This is the complete code.

REPORT ZADBC01.
data: lv_statement type string,
   lo_conn type ref to cl_sql_connection,
   lo_statement type ref to cl_sql_statement,
   lo_result_set type ref to cl_sql_result_set.
types: begin of ty_addr_result,
   addrnumber type AD_ADDRNUM,
   name1 type AD_NAME1,
   end of ty_addr_result.
data: lt_addr_result type table of ty_addr_result,
   ls_addr_result type ty_addr_result,
   lr_addr_result type ref to data.
get reference of lt_addr_result into lr_addr_result.
lv_statement = | select addrnumber, name1 from ADRC |.
try.
   lo_conn = cl_sql_connection=>GET_CONNECTION( ).
   lo_statement = lo_conn->CREATE_STATEMENT( ).
   lo_result_set = lo_statement->execute_query( lv_statement ).
   lo_result_set->set_param_table( lr_addr_result ).
   lo_result_set->NEXT_PACKAGE( ).
   lo_result_set->close( ).
   catch cx_sql_exception.
endtry.
loop at lt_addr_result into ls_addr_result.
   write: / ls_addr_result-ADDRNUMBER, ls_addr_result-name1.
endloop.

This program is supposed to read all entries from the address table ADRC. I was surprised that the result was not limited to the single entry I could see from transaction SE16. In fact the output comprised three entries!

When I changed the report so that the field "client" was shown, too, everything became clear. ADBC does not care about the client restriction that applies to Open SQL database access and the SE16. The two additional results came from a different client than the one I used to log in.

I do not yet fully understand every single line of the report. I'll deepen my understanding in the next sessions.

Day 39

During this short session I tried to find out how to pass a parameter to an ADBC query.

The principle is simple. You replace the parameter's value in the SQL statement with a question mark (?). Then you can set the parameter's value from ABAP with the SET_PARAM method.

Here are the code fragments that you need to blend with the report from day 38 in order to set the client parameter programmatically.

data ...
   lv_client type mandt,
   lr_client type ref to data.
   ...
get reference of lv_client into lr_client.
lv_client = '001'.
...
lv_statement = | select client, addrnumber, name1 from ADRC where client = ? |.
...
   lo_statement->SET_PARAM( pos = 0 DATA_REF = lr_client ).
   lo_result_set = lo_statement->execute_query( lv_statement ).
...

Day 40

To gain some more realistic experience with ADBC I want to implement a small database scenario that can also serve as starting point to investigate Hana's data warehouse capabilities.

I'll start with a data insertion programm that can populate the tables for a fictitious training application. There are four master data tables for tutors, courses, locations and attendees. Then there is one table that contains transactional data for attendances.

These tables cover most data constellations of interest for a data warehouse. There are time and location fields, prices and plain text fields.

Today I defined only the tables:

table ZTUTOR
MANDTMANDT
TUTORIDNUM10
FIRSTNAMETEXT20
LASTNAMETEXT20
RATE_HOURNUM10

table ZCOURSE
MANDTMANDT
COURSEID NUM10
TITLE TEXT50
DATESTART DATS
DATEEND DATS
LOCATIONID NUM10
TUTORID NUM10
PRICENUM10

table ZLOCATION
MANDT MANDT
LOCATIONID NUM10
NAME TEXT20
TOWN TEXT20
COUNTRY INTCA

table ZATTENDANCE
MANDT MANDT
ATTENDANCEID NUM10
COURSEID NUM10
ATTENDEEID NUM10

table ZATTENDEE
MANDT MANDT
ATTENDEEID NUM10
FIRSTNAME TEXT20
LASTNAME TEXT20

I did define them in the ABAP Workbench. Later on the challenge will be to access them from the Hana Studio. Both development environments access the same Hana but I do not have an idea yet how the latter can address the data from the first. Anyways it is quite usual that business data originates from the ERP system and is later transformed in the data warehouse.

Day 41

I started to write a program that populates the tables I defined in session 40.

I also investigated how to access them from the Hana Studio. With the help of some blog posts I found out that you can see them in the Project Explorer. For a known combination of SAP System ID (SID), SAP client ID, SAP user name and language you will find a Hana Studio project following the naming convention

<SID>_<clientID>_<user_name>_<language>.

In my case there was a project called A4H_001_developer_en . I opened it but did not find my tables such as ZTUTOR. I already assumed it to be hard to find since it is located in $TMP.

Then I tried the "Search" option from the top tool bar. With "Search"/"Search"/"ABAP Object Search" I could enter my table name ZTUTOR. It took a while till the result was shown in the lower right corner of Hana Studio in a window called "Search". My table was found and when I double-clicked on it an ABAP-Workbench-like table description came up in the main tab.

This feels real strange since it is a mixture of well known Eclipse behavior and well known ABAP Workbench behavior. The context menues I clicked through earlier also showed entries like "New... ABAP Class". It will take some time till I understand this whole thing but it is quite intrigueing.

Day 42

The data creation report is almost finished. I'm looking forward to building Hana views on top of the generated table contents.

Day 43

Data creation report finished.

Day 44

Timidly I began to create an attribute view based on the tables from session 40. If you recall session 10 you will remember that attribute views are the first of three layers that are used to build data warehouse functionality in Hana. The others are analytic views and calculation views.

I expected to get stuck somewhere soon. The "SAP Hana Modeler" perspective looks quite cumbersome and the propagation delay of the hosted desktop makes the setup feel flimsy. But I was wrong with my pessimism.

I could easily create a new package in the content tab. This is used to separate the development of different teams or vendors. Then I addad a new attribute view called COURSE_PLAIN. On view creation one must choose between "session client" and "cross client" type. This is similar to client restricted and cross client tables in the ABAP workbench.

The attribute view consists of a "data foundation" part and a "semantics" part. You specify the data foundation first. This is just the list of tables and fields you want to base the view on. I was happy to find the table ZCOURSE by name with no necessity to type in a package or project name. For my first attribute view a single table is enough. Then I double clicked on the fields I wanted to be visible in the attribute view. If you mark no fields at all you'll get an error later on when activating.

You can switch to the "semantics" part by clicking on the respective box in the window on the left. Here you define the primary keys.

Until now I just created a respository object. In order to use it you have to transform it into a runtime object by activating it from the object's context menu (right click + "activate"). If the object cannot be activated due to errors you see them in the "Job Log" window on the right side which happened when I missed to mark any visible fields in the data foundation.

Then there's a button labeled "Data Preview" on top of the main window. It allowed me to see the course data I had inserted from ABAP on day 43.

Day 45

Improved the data insertion program. Implemented bulk size for commits, added time measurement, fixed random date bug, implemented deletion.

Strangely the deletion of even 100.000 records takes no measurable time. I implemented this with "delete from...", no table dropping.

Day 46

Attribute views can consist of several different tables that are joined. To explore this feature I created an attribute view called FULL_BLOWN_ATTENDANCE. It is meant to cover basically all human readable fields in my data model that deal with a course attendance. The data foundation is formed by the tables ZCOURSE, ZATTENDANCE, ZATTENDEE. But only these fields are marked as visible: title (from zcourse), datestart, firstname, lastname, attendanceid. The attendanceid is necessary since the semantics part of the attribute view requires a primary key.

Then I defined a similar multi table view called FULL_BLOWN_COURSE. It covers the tables ZCOURSE, ZLOCATION and ZTUTOR. The fields are: firstname, lastname, rate_hour, title, datestart, price, name, town and courseid.

After activation both views were visible in the "data preview editor" and also in the "SQL editor" which I did not mention before. The SQL editor is intuitively comprehensible. It shows just the first 200 rows of the view. The output of the data preview editor is a bit harder to grasp. It shows the charted distribution of a single attribute. For instance the attendees' first names are distributed equally over the four possible values that my random data generator returns.

When I defined the joins between the table triples I had the choice to use the join types referential join, inner join, left/right outer join, text join. While I knew inner and outer join from relational databases I was not sure about referential and text joins.

With the help of some online research I found out that referential joins are roughly the same as inner joins with the prerequisite that the key field in the master table be unique. Under these circumstances referential joins are faster than inner joins. Text joins are there to allow internationalization of text fields. They link an attribute to the language specific text representation.

I just wanted to finish the session when I discovered that I could see all ABAP objects I had created from the ABAP workbench earlier also from the Hana studio - also the programs and classes. As mentioned much earlier you can use Eclipse to write ABAP code but I did not have the opportunity to try this before. I added a field called "price" to the ZCOURSE table.

Day 47

I wanted to query data from one of the new attribute views using the SQL console of Hana Studio. To do so I needed to know the exact name of the table. I found a folder called "Public Synonyms". It is located on the "Systems" tab of the SAP Hana Modeller perspective. This tab contains fully qualified names for many tables and views. This way I found out that I can query from FULL_BLOWN_ATTENDANCE with this syntax:

select count(*) from "evaluation01::FULL_BLOWN_ATTENDANCE"

evaluation01 is the name of the package I created when I defined the attribute views.

Next I began hastily to build my first analytic view. Analytic views are the next layer of Hana's data warehouse view stack, see day 10. But soon I discovered that I had misconceived the concept.

Analytic views are not fully built on top of attribute views. Instead the data foundation of an analytic view consists of a facts table(!) plus one or more attribute views. The attribute views are there to provide the dimension data. Thus I'll have to design my view from scratch.

Day 48

Money is the core of all things. Even in my Hana demo scenario. I want to visualize the revenue and profit that can be calculated from the fields ZCOURSE-PRICE and ZTUTOR-RATE_HOUR.

First I defined two attribute views called PLAIN_TUTOR and PLAIN_LOCATION. They contain the fields tutorid+rate_hour and locationid+town+country, respectively. Primary keys are tutorid and locationid.

Then I started defining an analytic view called REVENUE. Analytic views consist of three parts:

A data foundation
B logical joins
C semantics

The "data foundation" is assembled of the facts tables ZCOURSE and ZATTENDANCE joined on courseid (there can be more than one). Then I marked the fields datestart, attendanceid and price as visible. Datestart will be used for the time dimensions.

In the "logical joins" part I added the attribute views PLAIN_TUTOR and PLAIN_LOCATION with joins on locationid and tutorid. The latter will provide the geographical dimensions.

In the "semantics" part I could mark attributes as "measures". The only measure for the revenue is the price field.

Then I hit the "Data Preview" button and drews quick graphs for price/country und price/datestart as y/x axis. The graphs looked nice but I discovered that it used "price (count)" in the vertical direction. What I intended was a sum of prices not the number of recoreds. When I switched to "price (sum)" I received the error

inconsistent datatype: only numeric type is available for aggregation function

On first sight this is surprising since the price field is explicitely labeled "numeric". The first hits on google did not seem to be helpful, either. We'll see.

Day 49

I found the solution for the type mismatch problem. In ABAP the field price has the data type NUM10. This type is a text field that is restricted to numeric characters. I adjusted it to a self-defined type ZDECIMAL with a domain that is defined with raw type DEC and two decimal places. There seems to be a tiny glitch in Hana Studio here since it showed a numeric type at first and offered numeric aggregation functions which couldn't be executed later on.

Anyways the diagrams in the data preview for avg(price)/country look good.

But this is only the data preview! When do we get to the real data display functionality?

Well, we don't. It took me some time to figure out that Hana is only the data source and aggregation engine. If you want beautiful charts you need to apply SAP Business Objects Business Intelligence Platform or SAP BW.

Another interesting detail is that you cannot query analytic views without using aggregates. If you run this query

SELECT "PRICE" FROM "_SYS_BIC"."evaluation01/REVENUE"'

then you get

SAP DBTech JDBC: [7]: feature not supported: not allowed over OLAP VIEW : search without aggregation or grouping by expression other than view column

If you use an aggregation function like SUM, MIN, MAX or AVG then it works fine.

SELECT AVG("PRICE") FROM "_SYS_BIC"."evaluation01/REVENUE"

Day 50

It's may 2014. One year and 50 sessions are over and it is time for a summary.

First of all Hana is a crazy fast database. Check.
Second it is tightly coupled with the SAP Business Suite. Speed is never wrong. Check.
Then there is that assumption that everyone will be doing big data analysis in the close future and therefore will need Hana. I do not fully agree. I think the number of users doing data analysis will be relatively small compared to the number of plain ERP users. And only few of these big data analysts will choose Hana. It is too expansive and not attractive as stand alone tool outside the SAP ecosystem.
So yee SAP specialists out there - you're gonna need this stuff. But the rest of you - not really.

After this year I'll go on working with Hana. But I'll post only sporadically whenever there is anything remarkable to write about.

Day 51

My 90 day developer license ran out. I could still log into the client desktop but whenever I tried to connect to the Hana server I received this message.

SAP DBTech JDBC: [437] only commands for license handling are allowed in current state.

Fortunately the developer license can be prolonged. This link describes how.

Day 52

The description how to re-activated your Hana instance with a new license key does not appear really clear to me. In step 3 "Install License Key" I had to use the "SAP Hana Development" view and the "Systems" tab. There I used the "Properties" entry of context menu of my system. It was only available after I logged in (doubleclick plus password). I also had to delete the old license before I could install the new one.

Day 53

I was looking for a meaningful and concise example to demonstrate the power of code pushdown from ABAP to SQLScript. I wanted to write a single line SQLScript program that does something you cannot do in a SQL select statement. An optimal outcome would have looked like this

r_val = SUM( func( field_x ) ) from table;

where func stands for a SQLScript function that performs and returns some kind of non trivial calculation on the input field. A straightfoward ABAP counterpart would have involved a large number of database accesses. And even a more sophisticated implementation would require at least to transfer the whole table into the ABAP runtime where the function's logic would be applied.

My first idea was to look for built-in SQLScript functions. I thought there might be some type of date conversion but I found nothing helpful. Then I stumbled upon a blog post from Rich Heilman on Table User Defined Functions. This looked reasonably useful. So there is a way to define functions in SQLScript. The big difference of functions as compared to procedures is the call style. Procedures require an own CALL statement while functions can be used as operands or parameters in another statement. The downside of the functions Rich Heilman describes in that blog is that they can only return table type results.

Then I found another blog, also from Rich Heilman: Scalar User Defined Functions. This is what I was really looking for and what makes SQLScript by far more useable.

The next problem was to persuade my Hana Studio to define a UDF. Starting with the "Systems" tab of the SAP Hana Development perspective I tried File / new / SAP Hana / Database Development / Scalar Functions. This click path led me to a text editor prefilled with a UDF template (FUNCTION ...) but with no visible means to activate the code. So I just added "CREATE" to the template and pasted everything into Hana Studio's SQL command line tool.

This is my function that converts dollars to euros.

CREATE FUNCTION _SYS_BIC."CONVERT_EUR" ( dollars decimal(15,2) )
   RETURNS result decimal(15,2)
   LANGUAGE SQLSCRIPT
   SQL SECURITY INVOKER AS
BEGIN
   result := :dollars * 1.15;
END;

It can be used this way for queries

SELECT SUM( _SYS_BIC."CONVERT_EUR"( salary_usd )) FROM saphanaabap.zemployee;

During development I had to throw in one of these statements from time to time.

DROP FUNCTION _SYS_BIC."CONVERT_EUR";

Today I've learned that there is a way to bypass that bulky sixties COBOL inspired coding style that SQLScript procedures promote. But since we are living in the age of functional programming there is still much room for improvement here. I hope that Rich will one day be able to blog about SQLScript closures.