Today, I’d like to talk to you about the table comparison in BODS. As the name already indicates, it’s about comparing tables. Usually one of them is the target one and the other one is containing the transformed data so far. The purpose of this article is not to examine the methods of comparing the data (row-by-row select, cached comparison table and sorted input), but to give insights of the procedural method of BODS while comparing them. This means, that this article is about what happens in the background of table comparison and how do we get the results we get. So, let’s get started.
Let’s assume that we have the following data flow, consisting of flat file as a source, a query transformation, a table comparison and a target table.
The flat file consists of two columns: CustomerID (which is the BusinessKey) and Product. In order to keep it simple, the flat file has only five rows.
The query transformation simply transfers the data to table comparison.
As far as the table comparison is concerned, let us differ between two cases.
1) The option Input contains duplicate keys is activated
2) The option Input contains duplicate keys is deactivated
Let’s start with the first one. By activating this field, we ensure that our input will be filtered before loading it to the target table. We can set the rest of the filed like this.
Since there are only five data records, we can use the row-by-row select method, which is suitable for short amount of data. As an input primary key column, we add the CustomerID, since this is the BusinessKey and Product is the compare column. This means, that every single record will be examined based on its CustomerID. If there is no such an ID in the target table, the record will be added. But if there’s already this particular ID in the target table, then the Product will determine the next action. If the target table consists of the same CustomerID and Product, then this data record will be ignored. But if there is a different Product, the original record will be updated. Let’s just execute the job.
This is the output we get. Only two records. For better understanding, let’s just have a closer look to the procedural method of table comparison, by examining what actually happens after trying to insert each row.
Record 1: The first row will be automatically added, since the target table is still empty. The System compares the CustomerID to nothing and inserts the record. So, the content of the target table at this moment looks like this.
Record 2: The second record has a different CustomerID (101) than the first one (100), so this record will also be inserted. So, at this moment, the target table has the following content.
Record 3: This record has the same CustomerID as the second one (101), but a different product. Hence, the second record will be updated. Its product is from now on A. The target table content looks like this now.
Record 4: This record has the same CustomerID as record 1, but a different product. It’s the same as when inserting record 3. Hence, the first record will be updated. Its product is from now on B. The target table so far.
Record 5: This record has the same content as record three. This means no further updates for CustomerID 101.
And that’s how this output is generated.
In Part 2, I will examine the second case (the option Input contains duplicate keys is deactivated) and how table comparison works under this assumption.
For more information about SAP Data Services, please visit https://bimanu.de/leistungen-business-intelligence/sap-data-services-beratung/.
If you wish to find out more about SAP Analytics products we invite you to take advantage of our free Workshop.
We look forward to exchange with you.
16.01.2019, Anastasios Koukothakis
We accompany you from the beginning to the end – see for yourselves – bimanu references.