The first three are the same product but differ in which transforms are enabled. Data Integrator has all the base transforms plus typical data integration transforms like History Preserving. The Data Quality bundle consists of the base transforms plus Data Quality transforms like address cleansing. The SAP Data Services bundle contains all transforms.
Over the years Data Services grew into a very powerful tool that lets you implement every requirement efficiently and quickly. When I was part of the team, the development guideline was to enable the customer to perform even the most complex transformations with a combination of a few transforms. The tool also supports all available SAP APIs, so you can pick the best-suited one: connecting to the database, generating ABAP code for the extraction, calling BAPIs/RFCs, sending and receiving IDocs, using the modern ODP/ODQ API, web services, RESTful,… you name it.
Unfortunately, the tool has not received much tender loving care in recent years, and this shows, most prominently in the old-fashioned UI. As the tool is aimed more at individual specialists, a modern UI is not that important, though.
At one point, our team was tasked with building a cloud version of an ETL tool: Cloud Platform Integration – Data Services, or CPI-DS for short. Its backend is still a normal Data Services, just easier to install. Building a proper Web UI is expensive and, in some areas, not even possible.
Also, the goal was to simplify things. The easiest way to simplify a UI and keep development cost down is to remove functionality. CPI-DS can therefore be used for some specialized cases only.
The concepts and architecture of Data Services are still state of the art, as can be seen in the Gartner Magic Quadrant for Data Integration, where SAP has been named one of the few leaders for ages, and for good reasons.
In a few areas the development of Data Services took a wrong turn. For example, the Business Objects Enterprise Portal integration for unified administration and monitoring was a bad idea that was never undone. As a result, the installation is unnecessarily error-prone and time-consuming.
The unique selling point of Data Services is how well it supports the human approach to data integration. We love to think logically and step by step: First I need to read the data from the following objects. Next I want to rename the columns so everything becomes readable, and do some simple data conversions to prepare for the hard work. Then join the data with another table, pivot the result and load that into the target. This is exactly how the dataflow is created. It makes the flow quick to build initially, and when it needs to be tweaked a couple of months later to produce more data, it is still easy to understand.
The downside of such a dataflow design would be the execution performance. But Data Services has an optimizer that converts the dataflow into a process that creates the same result with the best achievable throughput. Renaming columns? Not needed for the execution, so why waste CPU power on that? Convert the data first, then join? Better to join first to reduce the amount of data and do the conversion later.
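To make the idea concrete, here is a small sketch in plain Python. It is not how the Data Services optimizer is implemented; the sample data, column names and function names are all made up for illustration. It only shows that the step-by-step flow and a reordered flow produce the same rows, while the reordered one converts fewer values.

```python
from decimal import Decimal

# Hypothetical sample data; table and column names are invented.
ORDERS = [
    {"ORD_ID": 1, "CUST": "C1", "AMT": "10.50"},
    {"ORD_ID": 2, "CUST": "C9", "AMT": "99.00"},  # no matching customer
    {"ORD_ID": 3, "CUST": "C2", "AMT": "7.25"},
    {"ORD_ID": 4, "CUST": "C9", "AMT": "1.00"},   # no matching customer
]
CUSTOMERS = {"C1": "Alice", "C2": "Bob"}


def run_as_designed():
    """Rename, then convert, then join: the order a human thinks in."""
    conversions = 0
    # Step 1: rename the columns so they are readable.
    renamed = [
        {"order_id": r["ORD_ID"], "customer_id": r["CUST"], "amount": r["AMT"]}
        for r in ORDERS
    ]
    # Step 2: convert the amount of every row.
    converted = []
    for r in renamed:
        converted.append({**r, "amount": Decimal(r["amount"])})
        conversions += 1
    # Step 3: inner join with the customer table.
    joined = [
        {**r, "customer_name": CUSTOMERS[r["customer_id"]]}
        for r in converted
        if r["customer_id"] in CUSTOMERS
    ]
    return joined, conversions


def run_optimized():
    """Join first, convert only surviving rows, fold the rename into the output."""
    conversions = 0
    result = []
    for r in ORDERS:
        name = CUSTOMERS.get(r["CUST"])
        if name is None:
            continue  # row is dropped by the join, so it is never converted
        result.append({
            "order_id": r["ORD_ID"],
            "customer_id": r["CUST"],
            "amount": Decimal(r["AMT"]),
            "customer_name": name,
        })
        conversions += 1
    return result, conversions
```

Both functions return the same two rows, but the designed plan converts all four amounts while the reordered plan converts only the two that survive the join.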
Data Services can even do extreme optimizations and push down the entire processing into the source or target system, if possible. Even partitioning the data and processing the source data in parallel, fully independent streams is supported by the optimizer.
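A rough way to picture a full pushdown, assuming source and target tables live in the same database: instead of pulling rows into the engine and joining them there, the whole flow collapses into one SQL statement the database executes itself. The sketch below uses Python's sqlite3 module purely as a stand-in; the table names and both loader functions are hypothetical and say nothing about the SQL Data Services actually generates.

```python
import sqlite3


def setup():
    """Create an in-memory database with invented source and target tables."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE orders (order_id INTEGER, customer_id TEXT, amount REAL);
        CREATE TABLE customers (customer_id TEXT, name TEXT);
        CREATE TABLE target (order_id INTEGER, name TEXT, amount REAL);
        INSERT INTO orders VALUES (1, 'C1', 10.5), (2, 'C9', 99.0), (3, 'C2', 7.25);
        INSERT INTO customers VALUES ('C1', 'Alice'), ('C2', 'Bob');
    """)
    return con


def load_via_engine(con):
    """No pushdown: read both tables, join in the client, write the result back."""
    orders = con.execute(
        "SELECT order_id, customer_id, amount FROM orders").fetchall()
    customers = dict(con.execute(
        "SELECT customer_id, name FROM customers").fetchall())
    joined = [
        (order_id, customers[customer_id], amount)
        for order_id, customer_id, amount in orders
        if customer_id in customers
    ]
    con.executemany("INSERT INTO target VALUES (?, ?, ?)", joined)


def load_via_pushdown(con):
    """Full pushdown: the database does read, join and load in one statement."""
    con.execute("""
        INSERT INTO target (order_id, name, amount)
        SELECT o.order_id, c.name, o.amount
        FROM orders o
        JOIN customers c ON c.customer_id = o.customer_id
    """)


def read_target(con):
    return con.execute(
        "SELECT order_id, name, amount FROM target ORDER BY order_id").fetchall()
```

Both loaders fill the target with the same rows; in the pushdown variant no row ever leaves the database.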
The result is perfect. Dataflows are designed within minutes, and their throughput is so high that the engine is mostly busy waiting for the sources to provide the data fast enough.
SAP Data Services delivers on promises
All the marketing statements I heard recently have been supported by Data Services since the beginning. “Move the transformation to the data, not the data to the transformation” (SAP Data Hub) is called a pushdown in Data Services. “ELT instead of ETL” (SAP HANA) means to extract (E) the data first, then load (L) it into the target system and do the transformation (T) inside the database as a follow-up step.
Depending on the case, this is a good idea and therefore Data Services supports both approaches and the optimizer picks what makes the most sense.
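The difference between the two approaches can be sketched in a few lines. This is only an illustration with Python's sqlite3 module standing in for the target database; the sample records, the staging table and the function names are assumptions, not anything Data Services generates.

```python
import sqlite3

# Hypothetical records as they arrive from the extraction (E): all text.
EXTRACTED = [("1", " alice ", "10.50"), ("2", "BOB", "7.25")]


def new_target():
    """Create an in-memory stand-in for the target database."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (id INTEGER, customer TEXT, amount REAL)")
    return con


def etl(con):
    """ETL: transform in the engine (here: Python), then load the result."""
    transformed = [
        (int(i), name.strip().upper(), float(amount))
        for i, name, amount in EXTRACTED
    ]
    con.executemany("INSERT INTO sales VALUES (?, ?, ?)", transformed)


def elt(con):
    """ELT: load the raw data first, then transform inside the database."""
    con.execute("CREATE TABLE staging (id TEXT, customer TEXT, amount TEXT)")
    con.executemany("INSERT INTO staging VALUES (?, ?, ?)", EXTRACTED)
    con.execute("""
        INSERT INTO sales
        SELECT CAST(id AS INTEGER), UPPER(TRIM(customer)), CAST(amount AS REAL)
        FROM staging
    """)


def rows(con):
    return con.execute(
        "SELECT id, customer, amount FROM sales ORDER BY id").fetchall()
```

Both variants end up with identical rows in the sales table; what differs is where the transformation work is done, which is exactly the choice the optimizer makes.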
Realtime Data Integration is also supported, but not implemented very well. It starts with the question “What data has been changed?”, which must be answered before changes can be streamed in realtime. The philosophy of Data Services is to provide all techniques and APIs for any given source system and let the user choose which one to use. This makes sense, as each has pros and cons.
Other tools, SAP SLT for example, support a single method only: they reconfigure the source system to produce the changes, which makes getting changes in realtime simpler. Both approaches have their merits.
The focus of Data Services on batch performance comes at the expense of transactional consistency and of the stability of dataflows that run for multiple days, as would be required for realtime.
My main area of concern, because it is unexpected at first, is the connectivity to cloud systems and the big data world. Yes, Data Services can read and write Hadoop file systems, but the optimizer can apply almost none of its usual tricks there. Even the connectivity itself could perform better. Here SAP's lack of interest in Data Services shows. My hope is that Data Services will get more attention, but all indications say otherwise.
Overall, Data Services is still a very good data integration tool, especially because of its low price point. Currently, SAP tries to push Data Hub for each and every use case, but frankly, for a pure Data Integration requirement, Data Services is easier to use, faster, and costs 1% of what Data Hub does (TCO). If this situation comes up, ask SAP sales to compare the two in a PoC.