It is vital to perform ‘discovery’ work for any GDPR project. Without a clear understanding of where personal data is located in each of the systems in an enterprise, it will not be a straightforward task to carry out any of the steps to reach GDPR compliance.
The research reveals that the task facing organizations in the coming few months is significant. In SAP alone, over 900,000 fields may (or may not) contain personal information and therefore require data discovery and risk assessment. The size and complexity of the databases mean that businesses that are not well advanced in data discovery, or that are undertaking manual discovery processes, may not be ready in time for GDPR.
Silwood Technology’s founder Nick Porter takes us through the company’s research:
Mr Porter, how was your research conducted?
We want customers and partners to be able to undertake fast, accurate and cost-effective software-based data discovery or source data analysis of large, complex ERP and CRM systems. This helps to accelerate delivery of information and data management projects, including Data Governance and the specific requirements to locate personal data for GDPR compliance.
To do this, we used our product Safyr. Our solution gives data analysts and architects control over when and how they access and use the metadata in their SAP, Oracle, Salesforce and Microsoft packages as implemented. This eliminates or drastically reduces the need to go through lengthy manual search and analysis processes or use template-based approaches. There is no need to have specialist technical or application knowledge of the package under scrutiny. Safyr works by extracting the rich metadata from the source system as implemented and storing it in a repository. Users then have access to a broad range of functions and features, which they use to search, navigate and analyse the metadata to find the groupings of tables they need for their project. Results are shared with a variety of other software platforms and tools.
Our goal is to enable data professionals to understand large ERP and CRM applications without having to be specialists in those environments. To facilitate this, Safyr pulls metadata from the application layer of each of the packages that we address and makes the data descriptions easy to search and subset. The team wanted to research the frequency with which certain personal data categories occurred in the chosen applications. For test purposes, the terms “date of birth” and “social security number” were selected, and searches were performed to see how often they appeared. Several instances of each package were examined, and the statistics presented give an indication of how many occurrences of each field will be found in a typical system.
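To make the idea of a term search over field descriptions concrete, here is a minimal Python sketch. It is not Safyr's API or the method used in the research; the table names, field names and descriptions are invented for illustration, and only the two search terms come from the interview.

```python
from collections import Counter

# Hypothetical metadata rows: (table, field, description).
# Invented for illustration; not taken from a real system or from Safyr.
FIELDS = [
    ("TAB_A", "F001", "Date of Birth"),
    ("TAB_A", "F002", "Social Security Number"),
    ("TAB_B", "F010", "Employee date of birth"),
    ("TAB_C", "F020", "Posting date"),
]

# The two test terms mentioned in the interview.
SEARCH_TERMS = ["date of birth", "social security number"]


def count_term_hits(fields, terms):
    """Count fields whose description contains each term (case-insensitive)."""
    hits = Counter({term: 0 for term in terms})
    for _table, _field, description in fields:
        text = description.lower()
        for term in terms:
            if term in text:
                hits[term] += 1
    return hits


counts = count_term_hits(FIELDS, SEARCH_TERMS)
for term in SEARCH_TERMS:
    print(f"{term}: {counts[term]} field(s)")
```

A plain substring match like this only works when readable descriptions are available, which is why the search runs over descriptions rather than raw column names.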
Your research covers a lot of ground across multiple platforms from vendors such as SAP, Microsoft and Oracle. Can you give us an overview of your results?
As a brief summary of common issues, all the packages we investigated had thousands or tens of thousands of tables and fields which might or might not contain personal data. Because of the size of these packages and the way they have been developed, little in the way the tables or fields are named in the database itself makes it easy to identify, using standard database tools, which ones might be relevant for GDPR.
This means that companies will struggle to incorporate this personal data metadata into the solutions they are building to support the information audit phase of their GDPR compliance programme. The bottom line is the sheer volume of data fields that need to be considered for GDPR – thousands of tables and tens of thousands, sometimes even hundreds of thousands of fields.
Let us explore what that means for SAP offerings a little more. Did your findings differ from what you would see e.g. in Oracle’s solutions?
SAP is the extreme case of the general problem. It has nearly five times as many tables as the nearest competitor, with about 90,000 before customisations. This is understandable because it delivers a very wide range of functionality; however, it means that personal data can reside in a very wide range of physical locations in the database. In order to locate these accurately, it is necessary to search over 900,000 fields, which is not a task that could be achieved quickly or accurately using manual methods or tools that are not designed for global searches.
If you were an SAP customer looking to get ready for GDPR, what would you recommend based on your research?
Medium-sized businesses may attempt manual data discovery. On average, locating personal data in an SAP implementation takes more than 20 times longer with traditional approaches than with an automated solution. Simply put: you may not be ready for GDPR in May by going that way. Less than 1% of a typical SAP system contains the personal data that could cause GDPR breaches, but a breach may cost your organization up to 4% of its annual turnover come May.
I would advise sticking to the following three steps:
- Decide what categories of Personal Data you need to consider.
- Find out where these are using discovery automation software like Safyr, not forgetting any of the custom fields added to the SAP system(s).
- Transfer this catalog of personal data fields to an environment where you can analyse the data content, assign governance responsibility, etc. (this might be a repository or data catalog from vendors such as ASG, Adaptive, Collibra or Alation).
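The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Safyr or any of the named catalog products work: the category phrases, the metadata rows and the CSV layout are all hypothetical, and the Z-prefixed names merely stand in for custom additions to a system.

```python
import csv
import io

# Step 1: decide which categories of personal data to consider.
# The phrases per category are hypothetical examples.
CATEGORIES = {
    "date_of_birth": ["date of birth", "birth date"],
    "national_id": ["social security number"],
}

# Hypothetical extracted metadata, including one custom (Z-prefixed) table.
METADATA = [
    {"table": "TAB_A", "field": "F001", "description": "Date of Birth"},
    {"table": "TAB_A", "field": "F002", "description": "Posting date"},
    {"table": "ZTAB_CUSTOM", "field": "ZZ_SSN", "description": "Social security number"},
]


def build_catalog(metadata, categories):
    """Step 2: find fields whose description matches a category phrase."""
    catalog = []
    for row in metadata:
        text = row["description"].lower()
        for category, phrases in categories.items():
            if any(phrase in text for phrase in phrases):
                catalog.append({**row, "category": category})
                break  # assign at most one category per field
    return catalog


def to_csv(catalog):
    """Step 3: serialise the catalog so another tool could import it."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["table", "field", "description", "category"]
    )
    writer.writeheader()
    writer.writerows(catalog)
    return buffer.getvalue()


catalog = build_catalog(METADATA, CATEGORIES)
catalog_csv = to_csv(catalog)
print(catalog_csv)
```

CSV is used here only because most tools can read it; the actual exchange format would depend on the receiving repository or data catalog.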
Are there any requirements or limitations for SAP customers regarding infrastructure, existing SAP solutions or interfaces when looking to get Safyr?
Safyr’s interface to SAP is certified by SAP themselves. We deliver an ABAP function that is installed on each SAP instance that the customer wants to work with. Because we are attaching to SAP at the metadata level, and the structure of SAP’s metadata tables rarely changes, we can work with a wide range of versions. Safyr can work with SAP ECC, BW, CRM, SRM and S/4HANA.
Lastly, how long would you say it takes for an SAP customer to see results when starting to use Safyr?
For SAP customers we provide a GDPR Starter Pack that picks out the personal data fields in an SAP system. This means that a great deal of that personal data scoping work has already been done. It’s not an exaggeration to say they will start to see real results within days of implementing Safyr.