Hana Smart Data Access (SDA) blends remote systems into the Hana database as virtual tables. The same idea is also known as Data Federation, Virtual Data Models, or the older Enterprise Information Integration (EII) concept. SAP now seems to favor the term “Hana Data Fabric”.
The goal is to make data from remote systems visible in S/4 Hana as if it were part of Hana. Whenever a user selects from such a virtual table, the select statement is forwarded to the source system and executed there, and Hana simply returns the results. The user does not notice any difference, apart, probably, from the performance.
This approach seems to be the foundation of how data will be integrated between SAP systems in the future. At least, Hasso Plattner called it the future of SAP data integration at the last Sapphire conference.
For simple cases, as commonly found in ERP queries, this is indeed a useful technology with multiple advantages: the remote system data does not have to be copied into Hana, hence the data is always up to date, and creating a new virtual table takes only a few seconds.
A perfect use case might look like the following: The S/4 Hana UI shows the customer master data. Additional data from a remote cloud CRM system should be shown. Therefore, the screen executes an additional select statement against the virtual table for the customer. Because this returns a single record only, the statement will return the requested data almost instantly. Copying the entire CRM data into Hana, just to avoid a single row lookup, does not make sense for such a scenario. It would simply be a waste of resources.
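The single-row lookup described above can be sketched as follows. This is only an illustration: an in-memory sqlite3 database stands in for the remote CRM system, and the table `crm_customer`, its columns and the function `lookup_crm_attributes` are made-up names. In a real landscape the statement would run against a Hana virtual table instead.

```python
import sqlite3

# Stand-in for the remote cloud CRM system (assumption for this sketch).
# A real setup would reach it through a Hana virtual table, not sqlite3.
crm = sqlite3.connect(":memory:")
crm.execute(
    "CREATE TABLE crm_customer (customer_id TEXT PRIMARY KEY, segment TEXT)"
)
crm.executemany(
    "INSERT INTO crm_customer VALUES (?, ?)",
    [("C001", "Enterprise"), ("C002", "SMB"), ("C003", "Public Sector")],
)


def lookup_crm_attributes(customer_id):
    """Fetch the CRM attributes of one customer; the filter runs at the source."""
    row = crm.execute(
        "SELECT customer_id, segment FROM crm_customer WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    if row is None:
        return None
    return {"customer_id": row[0], "segment": row[1]}


# The customer screen asks for exactly one record.
print(lookup_crm_attributes("C002"))
```

Because the filter on the primary key is executed by the source, only one row travels over the network, which is why this pattern stays fast without copying any data.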
Result set caching
Virtual access is also suggested for analytics. In such a use case, the query might look like “Who are my top customers by revenue, grouped by a CRM attribute?” In many cases, the database optimizer’s best approach will be to read all rows of the CRM data and join them in Hana to produce the desired result.
For every single user, for every query execution, the CRM data is copied again. In such a case, the opposite approach would be better: to copy the CRM data once and make the data available locally in Hana. Some sort of data caching…
Smart Data Access supports this through a cache flag. The first query using the data still has to query the CRM system, but it can, under the covers, store the data set in an invisible cache table. Subsequent queries use the cached data until it is considered too old. And that is the key problem: how to set this timeframe? If it is too short, the cache will never be used. If it is too long, the CRM data is outdated and inaccurate. Whatever timeout value is chosen, it is wrong.
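The timeout dilemma can be made concrete with a small simulation. It is a conceptual sketch, not the way Hana implements its cache: the class `TtlResultCache`, the 60-second timeout and the fake clock are all assumptions chosen for illustration.

```python
class TtlResultCache:
    """Result set cache that trusts its copy until a timeout has elapsed."""

    def __init__(self, fetch_remote, ttl_seconds, clock):
        self.fetch_remote = fetch_remote  # callable that reads the source
        self.ttl_seconds = ttl_seconds
        self.clock = clock                # injected so time can be simulated
        self.rows = None
        self.loaded_at = None
        self.remote_reads = 0

    def select(self):
        now = self.clock()
        expired = (
            self.loaded_at is None
            or now - self.loaded_at >= self.ttl_seconds
        )
        if expired:
            # Snapshot the remote data; later source changes are not seen
            # until the next refresh.
            self.rows = list(self.fetch_remote())
            self.loaded_at = now
            self.remote_reads += 1
        return self.rows


crm_table = [("C001", "SMB")]   # hypothetical remote table
now = [0.0]                     # simulated time in seconds
cache = TtlResultCache(lambda: crm_table, ttl_seconds=60, clock=lambda: now[0])

first = cache.select()                  # t=0: must read the source
crm_table[0] = ("C001", "Enterprise")   # the source changes
now[0] = 30.0
stale = cache.select()                  # t=30: served from cache, outdated
now[0] = 61.0
fresh = cache.select()                  # t=61: timeout passed, re-read

print(stale, fresh, cache.remote_reads)
```

Between the change and the refresh, every query gets the outdated row. Shortening the timeout narrows that window but increases the number of full reads from the source, which is exactly the trade-off that has no right answer.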
The perfect solution would be for the source to feed all data changes into that cache to keep it current. Then the cache is up to date and complete, and response times are at Hana speed. In fact, this is possible: it is one of the great features provided by Hana Smart Data Integration (SDI).
SDI is based on SDA and works with the same virtual table concept but also supports realtime replication. Hence it is easy to create a Hana table that has the same contents as the source table and is kept up to date with a latency of seconds. Nothing else is cached. I hope SAP will simplify that process by allowing a virtual table to be flagged as “cached (realtime)”, with Hana executing the necessary commands itself.
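Conceptually, a realtime cache is an initial copy plus a stream of change records applied to it. The sketch below shows only that principle; it does not use any SDI API, and the change record layout `(operation, key, row)` is an assumption made for this example.

```python
def apply_change(replica, change):
    """Apply one change record from the source to the local copy."""
    operation, key, row = change
    if operation in ("insert", "update"):
        replica[key] = row
    elif operation == "delete":
        replica.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {operation}")


# Initial load: the local table starts as a 1:1 copy of the source.
replica = {"C001": {"segment": "SMB"}}

# Changes arriving from the source, in the order they happened.
changes = [
    ("insert", "C002", {"segment": "Enterprise"}),
    ("update", "C001", {"segment": "Enterprise"}),
    ("delete", "C002", None),
]
for change in changes:
    apply_change(replica, change)

print(replica)
```

Queries read the local copy only, so they never wait for the source, and the copy lags behind by no more than the time it takes a change record to arrive.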
Using the above technologies, Hana can support the best-matching approach for each use case. If data is accessed from a virtual table only occasionally and only for a handful of records, caching is deactivated and pure virtual data access is used. If the data does not change in the source system, temporary result set caching is a good fit. If the remote table changes constantly, realtime caching is the way to go.
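These three rules of thumb can be written down as a small decision function. The names `AccessMode` and `choose_access_mode` and the two boolean inputs are assumptions for this sketch; in practice the decision would rest on measured query patterns and change rates.

```python
from enum import Enum


class AccessMode(Enum):
    VIRTUAL = "pure virtual access, no caching"
    RESULT_CACHE = "temporary result set caching"
    REALTIME_CACHE = "realtime caching"


def choose_access_mode(occasional_small_lookups, source_changes):
    """Map the rules of thumb from the text to an access mode."""
    if occasional_small_lookups:
        # A handful of records, read now and then: just forward the query.
        return AccessMode.VIRTUAL
    if not source_changes:
        # Static source data: a cached result set cannot become outdated.
        return AccessMode.RESULT_CACHE
    # Frequently read and constantly changing: keep a replicated copy.
    return AccessMode.REALTIME_CACHE


print(choose_access_mode(occasional_small_lookups=True, source_changes=True))
print(choose_access_mode(occasional_small_lookups=False, source_changes=False))
print(choose_access_mode(occasional_small_lookups=False, source_changes=True))
```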
Data integration is more than accessing remote data, however. The source data also needs to be integrated into the SAP data model. In fact, this is considered the most expensive task in any project. Interestingly, SAP does not talk much about that. There is a project to harmonize the data models of the various SAP cloud solutions with S/4. If all applications use the exact same data model, then yes, integration will be simple. But only if the harmonization is perfect, there are no user customizations, and the system to be integrated is an SAP-provided system.
The ETL (Extract-Transform-Load) tools on the market provide excellent support for all possible transformation tasks. SAP Data Services is one of the market-leading solutions. The approach of these ETL tools is to run a job once a day, reading the changed records and performing the complex task of transforming the data into the target data model. Hence the expensive and time-consuming transformation happens on the changed data only.
In a virtual data model, these transformations must be implemented in calculation views. Whenever data is selected from the view, the data undergoes the same transformations again and again. For simple cases this is not a problem, but for typical cases, this can take seconds or even minutes – which is unacceptable.
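The difference in effort between the two approaches can be illustrated by counting how often a transformation runs. Everything here is made up for the illustration: the `transform` function stands in for an expensive mapping, and the figures (1,000 source rows, 10 queries, 1 changed row) are arbitrary.

```python
calls = {"count": 0}


def transform(row):
    """Stand-in for an expensive mapping into the target data model."""
    calls["count"] += 1
    return {
        "customer": row["id"].upper(),
        "revenue_eur": row["revenue_cents"] / 100,
    }


source = [{"id": f"c{i}", "revenue_cents": i * 100} for i in range(1000)]


def query_view():
    """Virtual data model: every query transforms every row again."""
    return [transform(row) for row in source]


for _ in range(10):
    query_view()
view_calls = calls["count"]

# ETL style: transform everything once, afterwards only the changed rows.
calls["count"] = 0
target = {row["id"]: transform(row) for row in source}
changed_rows = [{"id": "c7", "revenue_cents": 123400}]
for row in changed_rows:
    target[row["id"]] = transform(row)
etl_calls = calls["count"]

print(view_calls, etl_calls)
```

In the view variant the work grows with the number of queries times the number of rows; in the ETL variant it grows only with the number of changes, and queries on `target` need no transformation at all.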
The vision I had for Hana Smart Data Integration included realtime transformations as well. Instead of copying the source table 1:1, why not apply some transformations along the way? These could then be removed from the calculation view because their results would already be part of the cached data.
There are some traces of that visible in the FlowGraph editor. In reality, only a tiny subset of the transformations required in real-life projects supports that functionality.
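The idea of a realtime transformation can be sketched as follows: each changed row is transformed once when it arrives, and the finished result is stored. This is not FlowGraph or SDI code; the mapping `to_target_model` and the source field names are assumptions used for illustration only.

```python
def to_target_model(source_row):
    """Hypothetical mapping from the source layout to the target model."""
    return {
        "customer_id": source_row["KUNNR"].lstrip("0"),
        "name": source_row["NAME1"].strip().title(),
    }


def on_change(cache, source_row):
    """Transform one changed row as it arrives and store the result."""
    target_row = to_target_model(source_row)
    cache[target_row["customer_id"]] = target_row


cache = {}
on_change(cache, {"KUNNR": "0000004711", "NAME1": "  ACME CORP "})
on_change(cache, {"KUNNR": "0000004711", "NAME1": "ACME CORPORATION"})

print(cache)
```

A query against `cache` then reads rows that are already in the target shape, so the calculation view no longer has to repeat the mapping on every select.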
IT operation concerns
SAP’s approach of using virtual data models for everything raises another issue to be considered upfront: will the IT department even allow access to the source systems? Its responsibility is to minimize downtime of the main ERP systems, and it is therefore very protective.
Now, we ask for access to the systems to execute lots of queries at random and with uncontrollable resource consumption. It is very likely they will not even provide the required access! A point worth checking upfront…
Overall, the Hana Data Fabric is nothing new. It has been supported via Smart Data Access and Smart Data Integration for many years. Having said that, simplifying the handling is always a good idea. And for certain use cases, namely single row lookups, it is the technically optimal solution.
For all other use cases, I have extended that architecture with Apache Kafka to store all ERP changes. Everybody can consume that data at their convenience, either in batch or in realtime. In my opinion, the combination of these two technologies unlocks the full potential, including the additional requirements of Cloud, Intelligent Enterprise, Data Lakes, Big Data, etc., without downsides.
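Why a stored change log lets every consumer read at its own pace can be shown with a toy model. This is plain Python and deliberately not the Kafka client API: the class `ChangeLog`, its `poll` method and the consumer names are assumptions that only mimic the principle of an append-only log with one read offset per consumer.

```python
class ChangeLog:
    """Append-only log; each consumer keeps its own read position."""

    def __init__(self):
        self.records = []
        self.offsets = {}

    def append(self, record):
        self.records.append(record)

    def poll(self, consumer):
        """Return all records this consumer has not seen yet."""
        start = self.offsets.get(consumer, 0)
        end = len(self.records)
        self.offsets[consumer] = end
        return self.records[start:end]


log = ChangeLog()
realtime_seen = []

for change in ["order 1 created", "order 1 changed", "order 2 created"]:
    log.append(change)
    # The realtime consumer picks up each change right after it is written.
    realtime_seen.extend(log.poll("realtime-dashboard"))

# The batch consumer comes by once and receives everything it missed.
batch_seen = log.poll("nightly-batch")

print(realtime_seen == batch_seen)
```

Both consumers end up with the same complete history although they read at different times, and neither of them puts any query load on the ERP system itself.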