SAP HANA Fast-Restart

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Tech Community.

Objective

In recent years, SAP HANA databases have grown quickly in size, which dramatically stretches the time it takes to load massive amounts of data into server memory every time either the SAP HANA services or the server itself needs to be restarted. Persistent memory (PMEM) is a solution that reduces data load time on servers capable of supporting the new memory technology. For those who haven’t yet adopted PMEM, SAP HANA Fast Restart (FR) is a software-only compromise that avoids the data load time in cases where the HANA services are the only component that needs restarting.

 

Starting with HANA 2.0 SPS 04, SAP introduced the FR feature, which preserves memory content in temporary file systems (tmpfs) that are backed directly by Linux memory. A Linux tmpfs is like a RAM drive: its content disappears if the server is rebooted. If the host server (or VM) stays online while the SAP HANA services restart and recover, the services can reattach to these memory segments, which still hold the content from before the restart, and thereby completely skip reloading the HANA data.
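Because FR depends on the tmpfs mounts surviving the service restart, it can be useful to confirm that they are present on the host. The following Python sketch lists tmpfs mount points under a given directory by parsing text in the /proc/mounts format. The mount paths and device names in the sample are hypothetical and not taken from the test system; the actual layout comes from the FR setup documentation.

```python
def tmpfs_mounts(mounts_text, prefix):
    """Return tmpfs mount points under `prefix` from /proc/mounts-style text."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[2] == "tmpfs" and fields[1].startswith(prefix):
            found.append(fields[1])
    return found


# Hypothetical sample; on a real host, read the text from /proc/mounts instead.
SAMPLE = """\
/dev/sda2 / xfs rw,relatime 0 0
tmpfsB320 /hana/tmpfs0/B32 tmpfs rw,relatime 0 0
tmpfsB321 /hana/tmpfs1/B32 tmpfs rw,relatime 0 0
tmpfs /run tmpfs rw,nosuid,nodev 0 0
"""

print(tmpfs_mounts(SAMPLE, "/hana/"))
```

If the list comes back empty after a VM reboot, the tmpfs content is gone and HANA has to load the data from disk again.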

Test environment
We conducted a test to observe the shutdown and startup behavior of a 3+ TB SAP HANA database on an Azure Mv2 VM. The VM is the M208ms-v2 SKU, which has 208 vCPUs and 5700 GiB RAM. It runs SLES 12 SP4 and SAP BW/4HANA on HANA 2.0 SPS 04 rev 40. The data content is the data set used for the BW/4 benchmarks. The data size on disk and its memory footprint are shown below.

 

[Figure: 1.database_footprint.png, database size on disk and memory footprint]

HANA Startup Process overview

Describing the HANA startup phases helps explain the markers used in this test to gauge HANA’s state of readiness during the boot process. Below is a synopsis of the HANA startup process; this blog provides more details.

  1. Opening the Volumes - services with persistence validate the filesystem mounts and open the volumes
  2. Loading and Initializing Persistence Structures - the persistence structures are initialized and statistics are loaded
  3. Loading or Reattaching the Row Store - this time is saved by the hdbrsutil OS processes, which preserve process memory across HANA restarts (not VM reboots)
  4. Garbage-Collecting Versions - the garbage collector cleans up all versions except the most recent one for every column store table
  5. Replaying the Logs - redo logs are replayed for both the row and column stores. This action requires the row store to be in memory, while the required column tables can be loaded on the fly
  6. Transaction Management - all services of a database synchronize with each other to ensure transactional consistency
  7. Savepoint - all changes performed in steps 3 to 5 are now persisted to the DATA volumes by a savepoint
  8. Checking the Row Store Consistency - a row store consistency check is performed as the last step during startup. This step can be skipped via configuration to save 10 minutes or so if one is certain of row store consistency at run time
  9. Open SQL Port - as soon as the SQL port is open, the application can access the HANA DB

The opening of the SQL port is the key timestamp marking HANA service availability, while the rest of the column store data continues loading. If an incoming query requests data from a column table that is not yet in memory, the table is loaded on demand, so query performance may be suboptimal compared to normal HANA operation. The column table load duration varies depending on the data size and the storage system performance.
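Since an open SQL port is the availability marker, a restart can also be timed from the outside by polling the port. Below is a minimal Python sketch of that idea; the function name, host, timeout, and polling interval are illustrative assumptions, and port 30013 is the SQL port that appears in the name server trace in this post.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=60.0, interval_s=0.5):
    """Return seconds waited until a TCP connect succeeds, or None on timeout."""
    start = time.monotonic()
    while time.monotonic() - start < timeout_s:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return time.monotonic() - start
        except OSError:
            time.sleep(interval_s)
    return None


# Example call (assumed host name): wait_for_port("localhost", 30013, timeout_s=600)
```

A successful connection only shows that the service accepts connections; it says nothing about whether the column tables a query needs are already in memory.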

SAP HANA DB trace

To capture the activities related to SAP HANA DB startup and shutdown, the database traces need to be set to the ‘debug’ level so that all actions are recorded in the various trace files for investigation. To learn about the different trace levels available, please see this SAP help article.

Test read-outs

The SAP HANA FR option uses memory-mapped storage structures in the file system to preserve and reuse MAIN data fragments, which speeds up SAP HANA restarts. This is effective in cases where the operating system is not restarted. For FR setup details, see SAP help.

Before FR was configured, the SAP HANA DB startup timing was recorded as a baseline. The same metrics were observed again on another restart after FR was enabled, for comparison. This report includes the timestamps of significant events as the basis for the test conclusion.

Non-FR Startup load time

Meaningful records were extracted from the SYSTEMDB trace file nameserver_<hostname>.30001.000.trc. The first record of the trace file registers the start of the name server:

[57074]{-1}[-1/-1] 2019-08-27 22:25:48.228382 i Basis TraceStream.cpp(00708) : ==== Starting hdbnameserver, version 2.00.041.00.1560320256 (fa/hana2sp04), build linuxx86_64 b178e03892acbdd031bc1a7824a3cd17c7db3ae3 2019-06-12 08:26:49 ld4550 gcc (SAP release 20181205, based on SUSE gcc7-7.3.1+r258812-2.15) 7.3.1 20180323 [gcc-7-branch revision 258812]

[57213]{-1}[-1/-1] 2019-08-27 22:25:59.489436 i Service_Startup tcp_listener_callback.cc(00074) : start the SQL listening port: 30013 with backlog size 128

Observation: It took 11 seconds from the start of the name server until the SQL port opened.
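The elapsed time can be recomputed from the timestamps of the two trace records. The Python sketch below assumes that the first date-time with microseconds in a record is its timestamp; the helper names are illustrative, and the first record is abbreviated here.

```python
import re
from datetime import datetime

# Date and time with microseconds, as found after the "[pid]{..}[../..]" prefix.
TS_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6}")


def trace_timestamp(line):
    """Extract the first timestamp of a trace record as a datetime."""
    match = TS_RE.search(line)
    if match is None:
        raise ValueError("no timestamp found in trace line")
    return datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S.%f")


def seconds_between(first_line, second_line):
    """Seconds elapsed between the timestamps of two trace records."""
    return (trace_timestamp(second_line) - trace_timestamp(first_line)).total_seconds()


start_line = "[57074]{-1}[-1/-1] 2019-08-27 22:25:48.228382 i Basis TraceStream.cpp(00708) : ==== Starting hdbnameserver"
port_line = "[57213]{-1}[-1/-1] 2019-08-27 22:25:59.489436 i Service_Startup tcp_listener_callback.cc(00074) : start the SQL listening port: 30013 with backlog size 128"

print(round(seconds_between(start_line, port_line), 3))  # 11.261
```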

 

Right after this point, the xsengine starts and the indexserver begins to unload, then load, tables. The load duration spans the following two records:

  • The beginning of the indexserver load trace file: 0;3;2019-08-27T23:33:16.521000+00:00;SAPBHB;0;ESENDCONTROLT;8916;0;0;en;;3;300;0;transaction_id=30;
  • The last statement of the trace file: 0;3;2019-08-28T01:41:10.482070+00:00;SAPBHB;0;ENHSPOTCOMPSPOT;6973;0;0;en;$trexexternalkey$;3;3;0;statement_id=1290621387667374, statement_hash=6a188027dffefeee5a0fafa8b24552db, transaction_id=240, statement_execution_id=1125912791771611, connection_id=300496, db_user=MUELLERCARS, application_name=ABAP:BHB, app_user=SAPBHB;

Observation: It took 2 hours and 8 minutes to complete loading the column store.
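The same kind of check works for the load trace, whose records are semicolon-separated and carry an ISO 8601 timestamp in the third field. This is a small Python sketch with an illustrative helper name; the last record is abbreviated by dropping its trailing statement details.

```python
from datetime import datetime


def load_trace_timestamp(record):
    """Return the third semicolon-separated field of a load trace record as a datetime."""
    return datetime.fromisoformat(record.split(";")[2])


first = "0;3;2019-08-27T23:33:16.521000+00:00;SAPBHB;0;ESENDCONTROLT;8916;0;0;en;;3;300;0;transaction_id=30;"
last = "0;3;2019-08-28T01:41:10.482070+00:00;SAPBHB;0;ENHSPOTCOMPSPOT;6973;0;0;en;$trexexternalkey$;3;3;0;"

elapsed = load_trace_timestamp(last) - load_trace_timestamp(first)
print(elapsed)  # 2:07:53.961070
```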

FR enabled startup load time

Name server trace

[131307]{-1}[-1/-1] 2019-08-29 04:15:10.304153 i Basis   TraceStream.cpp(00708) : ==== Starting hdbnameserver,

[131340]{-1}[-1/-1] 2019-08-29 04:15:16.590652 i Service_Startup tcp_listener_callback.cc(00074) : start the SQL listening port: 30013 with backlog size 128   

Observation: It took 6 seconds from the start of the name server until the SQL port opened.

Index server trace

First entry of the load trace file

0;3;2019-08-30T18:19:32.760000+00:00;SAPBHB;0;0BW:BIA:BI0_0C00014222;450069;0;0;en;;3;79;0;statement_id=1291262830569130, statement_hash=5a9012b2349c8e356c328bd696bbe9e9, transaction_id=109, statement_execution_id=1125912791746560, connection_id=300645, db_user=_SYS_STATISTICS, application_name=Embedded Statistics Server;

Last entry of the load trace file

0;3;2019-08-30T18:19:36.753000+00:00;SAPBHB;0;/IWBEP/L_ST;26660;0;0;en;;3;100;0;statement_id=1291262830569130, statement_hash=5a9012b2349c8e356c328bd696bbe9e9, transaction_id=95, statement_execution_id=1125912791746560, connection_id=300645, db_user=_SYS_STATISTICS, application_name=Embedded Statistics Server;

Observation: the first and last entries of the load trace file are only about 4 seconds apart.

 

Based simply on the duration covered by the indexserver load trace file, multiple trials of HDB restart with FR enabled showed significantly faster startups than HANA restarts without FR. In this test configuration, FR reduced the load time from 2 hours and 8 minutes to merely a few minutes.

 

A view from hdbsql

The above conclusion was drawn from the database activity traces, but what does it look like when reading a big table with hdbsql during the early stage of the SAP HANA data load? We ran a small experiment to satisfy that curiosity. Admittedly, this is not the most precise measurement, because we can’t manually trigger the hdbsql connection request at an exact point in the startup process, but it is good enough to establish a ballpark reference.

Test run with FR configuration active

For each of these test runs, the B32 database was restarted and the SQL command below was executed as soon as an hdbsql connection to HANA was possible. In this scenario, the target table is supposedly already in memory. In the next case, without FR, the same approach is used to query the table before it gets a chance to load.

 

For the test, we added up a key figure column of the largest table in this system containing more than 10 billion rows.

hdbsql B32=> select sum("/BA7/S_CURKYF01") from "SAPBHB"."/BA7/AB4CORPM1"

1 row selected (overall time 22.164005 sec; server time 14.166588 sec)

Test run with FR undone

hdbsql B32=> select count(*) from "SAPBHB"."/BA7/AB4CORPM1"

1 row selected (overall time 92.801180 sec; server time 84.679599 sec)

 

Observation: without the FR setup, the query took roughly 4.2 times longer in overall time (92.8 sec versus 22.2 sec) and about 6 times longer in server time (84.7 sec versus 14.2 sec).
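The comparison can be recomputed from the two hdbsql timing lines above. The Python sketch below parses the "overall time" and "server time" values and prints the ratios; the regular expression and helper name are illustrative and assume the timing line format shown in this post.

```python
import re

TIMING_RE = re.compile(r"overall time ([\d.]+) sec; server time ([\d.]+) sec")


def parse_timing(line):
    """Return (overall_seconds, server_seconds) from an hdbsql timing line."""
    match = TIMING_RE.search(line)
    if match is None:
        raise ValueError("not an hdbsql timing line")
    return float(match.group(1)), float(match.group(2))


with_fr = parse_timing("1 row selected (overall time 22.164005 sec; server time 14.166588 sec)")
without_fr = parse_timing("1 row selected (overall time 92.801180 sec; server time 84.679599 sec)")

# Ratio of the run without FR to the run with FR, for each timing.
overall_ratio = without_fr[0] / with_fr[0]
server_ratio = without_fr[1] / with_fr[1]
print(f"overall: {overall_ratio:.2f}x, server: {server_ratio:.2f}x")
```

Overall time includes client-side and network overhead, so the server time ratio is the closer reflection of the work done inside HANA.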

 

To sum up, the SAP HANA Fast Restart configuration eliminates the data load time associated with a restart of the database services, thereby improving data availability for SAP business applications. To preserve data in memory across VM or host reboots, persistent memory is required.

 

 
