Adopting new technologies follows the same pattern in all companies. A group of evangelists sees the new technology as the ultimate answer to every problem, while conservatives see some potential but many downsides. The developers have to live with the unintended consequences.
For a few years now, new SAP solutions have mostly been built on container technology. The goal of this article is to look at use cases and what could go wrong.
A container feels like a small-scale Linux operating system with an application installed. Such a container is delivered as an image file and runs on a host computer like a virtual server. In contrast to VMware-like virtual machines, where even the operating system itself is part of the image, a container does not bring its own kernel: its system calls are passed straight to the host computer’s operating system.
As a result, application performance inside a container is practically identical to that of the same application installed directly on the host computer, which is not true for virtual machines.
Rule #1: Do not install software, deploy containers
One view of containers could be that they are a very convenient way of installing software. It is a matter of seconds. The command “docker pull imagename:latest” installs the software, the command “docker run imagename” starts it. That is far simpler than running a custom installer, where you wait half an hour for it to complete and then wonder why the installation worked on one server but not on the other. Everybody who has installed a larger software package can relate, I assume.
Example: To install Hana, a command line tool exists, but also a docker image. With the command line tool, the server has to meet the prerequisites, the installation offers many different options, and it takes 30 minutes. In contrast, the docker image just works, at least in theory.
In practice, this example shows what not to do. The image requires certain system limits on the host computer to be raised, and the docker run command requires a whole set of parameters. Unfortunately, there is not much that can be done about that, as Hana is a large service as well as a database with data persistence.
Rule #2: No configuration information in the container
One consequence of deploying instead of installing is that all configuration has to be supplied at container start. The container image itself is user and customer agnostic. The concept of containers goes even further than that: a container always starts from the same image files. In other words, all changes made inside the running container are lost as soon as the container is shut down and discarded.
Example: The container is a webserver and the webserver writes logs. When the container is shut down, all log files it created cease to exist. That is the reason why the Hana image asks for a directory to be created on the host server, which is then mounted into the container at start. When Hana creates the database and the tables, it writes into the same directory inside the container as before, but because that directory is mounted, the changes are actually made in the host file system.
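A minimal sketch of what "configuration at container start" can look like for the webserver example. The Python helper below only assembles a docker run command line and does not execute anything. The host path, the container path, the LOG_LEVEL setting and the port numbers are hypothetical placeholders, and these are not the parameters the Hana image actually expects.

```python
import shlex


def docker_run_command(image, host_dir, container_dir, env=None, ports=None):
    """Assemble a `docker run` command line; nothing is executed here."""
    cmd = ["docker", "run", "--detach"]
    # Bind-mount a host directory so files written there outlive the container.
    cmd += ["--volume", f"{host_dir}:{container_dir}"]
    # Customer-specific settings are passed at start, not baked into the image.
    for key, value in sorted((env or {}).items()):
        cmd += ["--env", f"{key}={value}"]
    # Publish only the ports the service really needs.
    for host_port, container_port in (ports or []):
        cmd += ["--publish", f"{host_port}:{container_port}"]
    cmd.append(image)
    return cmd


command = docker_run_command(
    image="imagename:latest",
    host_dir="/data/webserver/logs",     # hypothetical directory on the host
    container_dir="/var/log/webserver",  # hypothetical directory in the container
    env={"LOG_LEVEL": "info"},           # hypothetical setting
    ports=[(8080, 80)],                  # hypothetical port mapping
)
print(shlex.join(command))
```

The image stays identical for every customer. Everything that differs between installations lives in the arguments, and everything that must survive a restart lives in the mounted host directory.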
Rule #3: Isolation
An important aspect of containers is the intrinsic security this model provides. Everything a container does should happen either inside the container itself or by calling other services over the network.
All containers work 100 percent independently from each other; they do not even know of each other. An SQL application container can execute a select statement against a Hana database and this database might be another container. It should never use low-level calls to bypass the container isolation.
A (negative) example: SAP Data Hub breaks the container isolation. One application container is the vFlow editor to design the flow of data. The resulting vFlow is assembled as a set of new container images. In other words, one container creates (and has control of) another container. That requires disabling cross-container security. Not a good architecture in my opinion, as it defeats the purpose.
Rule #4: Container size
One important thing to watch is the container size. A growing container has multiple negative side effects.
First, there is security: The more services a single container exposes, the larger its attack surface becomes. A container should expose a single service only, if possible.
Examples: A webserver container exposes one network port. The Hana container should only expose the SQL port.
Another reason to keep the size small is the fact that containers start and stop frequently. Not so much because of failures but for load balancing. If a service is used heavily, another instance can be started to take some of the load. If the container start takes 10 minutes, however, speed and efficiency suffer.
Example: The Hana 2.0 container start takes quite a while. But that does not matter much as sharing the workload by starting more containers is not possible, anyway.
Most important is the software lifecycle, however. When developing software in containers, the initial version provided by development has only one service – plus ten dependent software libraries with their respective dependencies. Over time, the container grows in capabilities, and therefore the number of dependent libraries increases as well. After a year of development, it is as agile as the worst monolithic application.
Rule #5: Network hops
The other extreme with regard to container size is making the services too small, so that a single user action requires a whole chain of services to work in harmony. The probability of failure is not the problem here, as containers are hopefully used with fault tolerance and load balancers. The problems are the latency and the throughput.
A simple calculation: The services were designed in an atomic fashion. The sales overview service asks the authentication service if the request is valid. Once that gate is passed, it calls the oData service for sales data. For the returned data the finance status is checked, and the master data system is queried for the customer record. Even if every service hop takes just a short time, the wait time on dependent services alone will cause a significant delay.
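To put numbers on this chain, here is a small sketch in Python. The latencies per service are invented for illustration and were not measured on any real system. The point is only that dependent calls run one after another, so their delays add up.

```python
# Hypothetical latencies per call in milliseconds; none come from a real system.
HOPS_MS = {
    "authentication service": 20,
    "oData sales service": 80,
    "finance status service": 60,
    "master data service": 50,
}


def sequential_latency_ms(hops):
    """Calls that depend on each other run one after another, so delays add up."""
    return sum(hops.values())


total_ms = sequential_latency_ms(HOPS_MS)
slowest_ms = max(HOPS_MS.values())
print(f"user waits {total_ms} ms, the slowest single hop takes {slowest_ms} ms")
```

With these assumed values the user waits 210 ms although no single service takes longer than 80 ms. Every additional hop in the chain adds its full latency on top.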
Things go from bad to worse when the amount of data increases. The bandwidth of an idle (!) network is 0.1 GByte/s, roughly what a 1 Gbit/s link delivers. Copying data in memory is done with a bandwidth of 20 GByte/s, and within the CPU the bandwidth is 4,000 GByte/s. These are quite significant differences in throughput! If many small services are using the network, it is no longer idle but saturated, and the wait time rises steeply.
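The arithmetic behind these figures, as a short Python sketch. The three bandwidth values are the ones quoted above, while the payload of 1 GByte is an assumption chosen to keep the numbers readable.

```python
# Bandwidth figures from the text, in GByte per second.
BANDWIDTH = {
    "network (idle)": 0.1,
    "memory copy": 20.0,
    "inside the CPU": 4000.0,
}


def transfer_seconds(size_gbyte, gbyte_per_s):
    """Time needed to move a payload at a given sustained bandwidth."""
    return size_gbyte / gbyte_per_s


PAYLOAD_GBYTE = 1.0  # assumed payload size, purely illustrative
times = {name: transfer_seconds(PAYLOAD_GBYTE, bw) for name, bw in BANDWIDTH.items()}

for name, seconds in times.items():
    print(f"{name:>15}: {seconds:.5f} s")

# How many times slower the network is than the two local paths.
network_vs_memory = times["network (idle)"] / times["memory copy"]
network_vs_cpu = times["network (idle)"] / times["inside the CPU"]
```

For the assumed payload, the network needs about 10 seconds, the memory copy 0.05 seconds and the CPU 0.00025 seconds. That is a factor of about 200 between network and memory, and about 40,000 between network and CPU.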
That’s not the end of the story, however. Data in thread #1 is available as a memory structure. This structure needs to be turned into network packets and on the receiving end is unwrapped into the original memory structure for thread #2 to process it. In other words, for service-to-service communication a lot of memory copies need to be performed.
In contrast, when the data is processed locally within one container, not only do the two CPU threads exchange information via the fast CPU interconnects, the data size goes down as well. Why? Because they exchange only the memory pointer to the structure. Instead of copying the entire row with 10 KB of data, they hand over just the 8-byte pointer to where the memory structure is found.
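The difference can be made visible with a few lines of Python. This is only an illustration under assumptions: the row layout is invented, and pickle stands in for whatever wire format two real services would use.

```python
import pickle
import queue

# Hypothetical row carrying roughly 10 KB of data.
row = {"customer": "C-1", "payload": "x" * 10_240}

# Same process: the queue hands over a reference, the row itself is not copied.
handoff = queue.Queue()
handoff.put(row)
received_locally = handoff.get()

# Across services: the row is serialized to bytes, sent, and rebuilt on arrival.
wire_bytes = pickle.dumps(row)
received_remotely = pickle.loads(wire_bytes)

print(received_locally is row)   # True: both names point to the same object
print(received_remotely is row)  # False: a second copy was built
print(received_remotely == row)  # True: same content, but in different memory
print(len(wire_bytes) > 10_240)  # True: the whole payload went over the wire
```

Within one process, the second thread sees the very same object. Between services, every byte of the row is written out once and read in once, even before the network itself is involved.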
In effect, the throughput numbers between the two designs are worlds apart.
Example: The SAP Data Services jobserver partitions the data horizontally. One task performs all operations on one partition of the data, and each partition can be processed on a different server. Thus the performance combines the best of both worlds: Multiple servers are utilized for linear scaling, and all transformations within one partition happen on a single server to achieve maximum throughput.
Data Hub does the opposite. One container performs one sequence of operations, streams the intermediate result to the next container and the next and the next.
Rule #6: Autoscaling
Finally, there are unique opportunities when using containers, primarily targeting cloud operations. Getting cloud qualities for free can be an enormous advantage.
One example is autoscaling: the ability to start and stop container instances automatically to match the current demand in processing power.
Example: In a horizontally partitioned system, each container instance of the first set is assigned one partition of data to process. As soon as an instance has completed its task, it gets the next partition of data to work on. If all servers are utilized at 100 percent, the cluster can use other idle physical servers by starting the same container image there as well, so that the new instances join the group of available data processors. Once all processing is done and the containers have been idle for some time, one container instance after the other is shut down again. All of that happens fully automatically.
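The pull pattern described in this example can be sketched in Python. Threads stand in for container instances here, which is a simplification. The function names, the partition layout and the placeholder workload are all assumptions, and a real cluster would leave starting and stopping instances to its orchestrator.

```python
import queue
import threading


def process_partition(partition):
    """Placeholder for the real work: all transformations on one partition."""
    return sum(partition)


def worker(name, todo, results, lock):
    """One 'container instance': keep pulling partitions until none are left."""
    while True:
        try:
            index, partition = todo.get_nowait()
        except queue.Empty:
            return  # nothing left to do: this instance can be shut down
        value = process_partition(partition)
        with lock:
            results[index] = (name, value)


def run(partitions, instances):
    todo = queue.Queue()
    for item in enumerate(partitions):
        todo.put(item)
    results, lock = {}, threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(f"instance-{i}", todo, results, lock))
        for i in range(instances)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


# Ten hypothetical partitions with 100 rows each.
partitions = [list(range(start, start + 100)) for start in range(0, 1000, 100)]
results = run(partitions, instances=3)
grand_total = sum(value for _, value in results.values())
print(len(results), grand_total)
```

Because the instances pull work instead of having it assigned up front, adding or removing instances changes only how fast the queue drains, not the result. That property is what makes the design friendly to autoscaling.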
A properly built container should support that. However, not every SAP container-based product supports autoscaling.
Concerning containers, I seem to be more on the evangelist side. Yes, containers should be used for every service because, if nothing else, the installation is much easier. They have no performance downside and are easier to manage and upgrade. However, as shown, there are some things that can go wrong during the implementation.