This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Tech Community.
In several SAP deployment projects on Azure, multiple SAP on HANA customers have reported poor Savepoint performance for Row-Store LoB objects when running HANA 2.0 SPS04. We had not seen such problems in the previous three years of customers deploying SAP HANA on Azure. SAP HANA development and Microsoft therefore worked together to analyze the recently reported issues.
In short, the root cause of this problem is a sub-optimal I/O pattern for Row-Store LoB writes introduced in HANA 2.0 SPS04, in which I/O chunking regressed, with a pronounced effect on the Azure platform. A “LoB” is a Large Binary Object, often represented, for example, by the ABAP datatype “RAW”.
One of the reasons the change to the ‘chunking’ code in HANA 2.0 SPS04 became detectable on the Azure platform is that Azure storage and VMs have IOPS quotas. Azure VM and Azure Storage quotas are configurable and can be provisioned to allow very large HANA workloads to run. The code change in SPS04 led to a significant increase in IOPS and a decrease in throughput, which became critical in several customer projects because the IOPS requirement exceeded the quotas in place. This problem does not exist in HANA 2.0 SPS03, where the 4 KB blocks representing Row-Store LoBs are “chunked” into 64 KB I/O operations. This type of chunking improves I/O throughput and speeds up HANA Savepoints.
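To see why the I/O size matters so much under an IOPS quota, the arithmetic can be sketched as follows. The 1 GiB data volume and the 5,000 IOPS quota are hypothetical numbers chosen only for illustration, and the model deliberately ignores throughput (MB/s) limits and latency; it only counts requests.

```python
KIB = 1024

def io_count(bytes_to_write, io_size):
    """Number of I/O requests needed to write bytes_to_write in io_size chunks."""
    return -(-bytes_to_write // io_size)  # ceiling division

def seconds_at_quota(bytes_to_write, io_size, iops_quota):
    """Lower bound on write time if the IOPS quota is the only limit (hypothetical model)."""
    return io_count(bytes_to_write, io_size) / iops_quota

data = 1024 * 1024 * KIB  # hypothetical: 1 GiB of changed Row-Store LoB pages
quota = 5000              # hypothetical IOPS quota for the VM/disk combination

for io_size in (4 * KIB, 64 * KIB):
    n = io_count(data, io_size)
    print(f"{io_size // KIB:>2} KiB I/Os: {n:>7} requests, "
          f"~{seconds_at_quota(data, io_size, quota):.1f} s at {quota} IOPS")
```

With these assumptions, writing in 4 KiB requests needs 262144 I/Os (about 52.4 s at the quota), while 64 KiB chunks need only 16384 I/Os (about 3.3 s): a 16x difference in request count for the same amount of data.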
1. What Causes the Performance Regression?
SAP Hana and many other RDBMS use something approximating this process whenever an INSERT/UPDATE/DELETE operation occurs:
- When the INSERT/UPDATE/DELETE occurs the RDBMS persistency datafiles are not updated immediately as this is a time-consuming operation
- The INSERT/UPDATE/DELETE is hardened to the transaction log, thereby satisfying the ACID requirements for an enterprise RDBMS
- The contents of the table(s) are simultaneously changed in the DBMS data cache
- Any subsequent SELECT statements will read the new changed values from data cache and not the old outdated values from DBMS persistency datafiles
- Periodically the changed values are written back to persistent datafiles. In SAP Hana this process is called a Savepoint. The Savepoint process is required to avoid long DB recovery times and excessive DB log growth
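The steps above can be modeled in a few lines. This is a toy sketch of the generic log-then-cache-then-savepoint pattern under simplifying assumptions (single thread, key/value "rows", DELETE omitted); the `ToyDatabase` class and its methods are hypothetical and do not reflect SAP HANA's actual implementation.

```python
class ToyDatabase:
    """Toy model of the write path described above (not SAP HANA's implementation)."""

    def __init__(self):
        self.datafile = {}  # persistent data volume, written only at savepoint
        self.cache = {}     # in-memory data cache
        self.log = []       # transaction log, hardened on every change
        self.dirty = set()  # keys changed since the last savepoint

    def write(self, key, value):
        # 1. Harden the change to the transaction log first (durability).
        self.log.append((key, value))
        # 2. Apply the change in the in-memory data cache.
        self.cache[key] = value
        self.dirty.add(key)
        # The datafile is deliberately NOT touched here.

    def read(self, key):
        # Reads are served from the cache, which always holds the newest value.
        if key in self.cache:
            return self.cache[key]
        return self.datafile.get(key)

    def savepoint(self):
        # Periodically write all changed values back to the datafile.
        for key in sorted(self.dirty):
            self.datafile[key] = self.cache[key]
        flushed = len(self.dirty)
        self.dirty.clear()
        # Simplification: the toy model truncates the log once changes are
        # persisted; a real database keeps log segments longer (e.g. for backup).
        self.log.clear()
        return flushed
```

Every change costs one log write immediately, but the datafile writes are deferred and batched into the savepoint, which is why the I/O pattern of the savepoint itself becomes the bottleneck when many small LoB pages are dirty.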
The Savepoint process writes data to the HANA datafiles in certain I/O block sizes. Various mechanisms are used to aggregate very small operations (such as a single INSERT into a LoB table) into a larger “batch” of updates. In HANA 2.0 SPS04 the I/O block size may become much smaller, leading to many more I/O operations per second and, as experienced especially on Azure infrastructure, lower Savepoint throughput. This can extend Savepoint runtime and, in extreme cases, lead to blocking conditions.
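Aggregating small writes into larger I/O requests can be illustrated with a hypothetical `coalesce` helper that merges contiguous 4 KB blocks into requests of at most 64 KB (the block and chunk sizes mentioned for SPS03). This is a generic write-coalescing sketch under that assumption, not SAP HANA's actual Savepoint code.

```python
def coalesce(block_ids, block_size=4096, max_io_size=64 * 1024):
    """Merge contiguous fixed-size blocks into larger I/O requests.

    block_ids: indices of dirty blocks in the datafile (hypothetical layout).
    Returns a list of (byte_offset, length) tuples, one per I/O request.
    """
    max_blocks = max_io_size // block_size
    requests = []
    run_start, run_len = None, 0
    for block in sorted(set(block_ids)):
        # Extend the current run if the block is adjacent and the run is not full.
        if run_start is not None and block == run_start + run_len and run_len < max_blocks:
            run_len += 1
        else:
            # Close the previous run (if any) and start a new one.
            if run_start is not None:
                requests.append((run_start * block_size, run_len * block_size))
            run_start, run_len = block, 1
    if run_start is not None:
        requests.append((run_start * block_size, run_len * block_size))
    return requests
```

For 32 contiguous dirty blocks, `coalesce(range(32))` issues two 64 KB requests, whereas `coalesce(range(32), max_io_size=4096)`, which models the absence of chunking, issues 32 separate 4 KB requests for the same data.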
This problem is most noticeable during R3load imports and mass data uploads. The problem described in this blog affects Row-Store LoB operations, not Column-Store operations. DMO/SUM runtimes increase drastically if a customer has a large number of Row-Store LoB objects.
2. Which Hana Releases Are Impacted?
The impacted releases are HANA 2.0 SPS04 revisions up to and including Revision 044.00. HANA 2.0 SPS03 is not impacted, and neither is HANA 1.0.
3. Which Hana Revision Contains the Fix?
Hana 2.0 SPS 4 Revision 045.00 will include a correction to resolve this problem.
Customers planning to upgrade from earlier HANA Support Packs to HANA SPS04 should deploy Revision 045.00 or higher. Revision 45 is scheduled to be released in November 2019.
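A quick way to apply the revision ranges from the two sections above is to parse the dotted HANA version string. The sketch below assumes the format `2.00.044.00` (major version first, three-digit revision in the third field) and that SPS04 revisions start at 040; the function names are hypothetical.

```python
def parse_hana_version(version):
    """Split a dotted version string such as '2.00.044.00' into (major, revision).

    Assumes the third dot-separated field is the revision number.
    """
    parts = version.split(".")
    return int(parts[0]), int(parts[2])

def is_affected_by_rowstore_lob_regression(version):
    major, revision = parse_hana_version(version)
    # HANA 1.0 and HANA 2.0 SPS03 (revisions below 040) are not impacted.
    # SPS04 revisions up to and including 044 are impacted; 045 contains the fix.
    return major == 2 and 40 <= revision <= 44
```

For example, `2.00.044.00` is reported as affected, while `2.00.045.00` and `2.00.037.00` are not.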
4. Which Azure VM Types & Storage Types Are Impacted?
All Azure VM types and storage types can be impacted by the initial change in SAP HANA 2.0 SPS04 that caused higher IOPS rates. Customers running on Azure HANA Large Instances, or on on-premises SAP HANA certified appliances with SAP HANA certified storage infrastructure, may not be impacted by the initial change because the storage technology used has large, high-performance write caches. These caches reduce the impact of the increase in I/O calls.
5. Which Other Updates Are Recommended?
It is also highly recommended to address the issue discussed in SAP Note 2814271 - SAP HANA Backup fails on Azure with Checksum Error (also covered in a related blog post). The following minimum kernel versions are recommended:
- SLES 12 SP4 - kernel version 4.12.14-95.37.1 (or higher)
- SLES 15 - kernel version 4.12.14-150.38.1 (or higher)
- SLES 15 SP1 - kernel version 4.12.14-197.21.1 (or higher)
Please note that SLES 15 SP1 is not currently supported for SAP HANA on Azure (for the current release status, see the SAP HANA Hardware Directory).
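To check an installed kernel against the minimum versions listed above, a simple numeric comparison can be sketched as below. It assumes the installed value is the package version-release string in the same form as the list (e.g. `4.12.14-95.37.1`, as printed by `rpm -q --queryformat '%{VERSION}-%{RELEASE}\n' kernel-default`). The tuple comparison is a simplification of RPM's own version ordering, and `MIN_KERNEL` and `kernel_ok` are hypothetical names.

```python
import re

# Minimum kernel versions from the list above.
MIN_KERNEL = {
    "SLES 12 SP4": "4.12.14-95.37.1",
    "SLES 15": "4.12.14-150.38.1",
    "SLES 15 SP1": "4.12.14-197.21.1",
}

def version_key(version):
    """Turn '4.12.14-95.37.1' into (4, 12, 14, 95, 37, 1) for numeric comparison."""
    return tuple(int(part) for part in re.findall(r"\d+", version))

def kernel_ok(release, installed):
    """True if the installed version-release is at or above the listed minimum."""
    return version_key(installed) >= version_key(MIN_KERNEL[release])
```

Comparing numeric tuples avoids the pitfall of plain string comparison, where `"4.12.14-95.9.1"` would incorrectly sort after `"4.12.14-95.37.1"`.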