By Paul Hausser, Envisn, Inc.
This blog focuses on Part I of the subject of Data Lineage, covering what it is and why it’s important. A Part II blog will follow this and focus on creative possibilities for using data lineage in multiple ways once you’ve broken the code.
What is it?
Simply stated, data lineage describes the data’s origins, where it moves and what happens to it over time. In some environments there are specific requirements for the data lineage and its provenance. This is often the case with financial institutions that have to satisfy regulatory requirements relating to data used for investment purposes.
At one level data lineage may seem rather mundane, but the reality is that without a means to fully document it there is no way to be able to validate it. What appears in a field of a report can be nothing more than an item that’s in a database table such as Order Date or Unit Price. But it could also be something slightly more complex such as a revenue calculation.
Cognos provides a capability within Cognos Connection for looking at data lineage but it’s limited. While you can see it for a report, it shows it for the model items not the report items so it’s not a complete picture. And even if that were done it’s likely you would miss data items used in calculations or calculations embedded in other calculations. There has to be a better way.
Three things are needed:
- 1. The ability to capture the data lineage completely for every item that appears in a report or query. If only for documentation purposes, but there are other reasons for doing this as well.
- 2. The ability to see the complete data chain back to and including the database itself.
- 3. A means of persisting this data lineage so it can be referenced if needed.
In Figure 1 we see the data lineage from a report labeled Order Summary. There are a number of items in this simple report but we’re only showing two of them here; Total Price and Total Quantity. Total Quantity is simply the sum of all unit quantities. Total Price is Unit Price X Total Quantity X (1 – Discount).
In this example we see the levels of the data lineage:
- Level 0 represents what is in the report. This could be a model item, another report item or a combination of these.
- Level 1 is the next reference and here we see the same three items as in level 0 except we learn that Tot Price is a calculation. This is at the Package level.
- Level 2 reference represents the FM Model level.
- Level 3 is the database level. This tells us that this calculation is actually at the database level.
The goal is have all references resolved in the data lineage at the appropriate level of detail.
In the real world there are often more levels shown between the report/query level and the database itself and they can far more complex. Consider something like foreign currency trading for a large New York City bank. Figure 2 shows one data item from real world example report. It’s called SwapPoint and is just one of many items on this report which has the business rules used in trading foreign currencies. Many of the other report data items are even more complex than the algorithms used here.
The key point is that for an activity that can likely make or lose tens of millions of dollars for the bank, you really need to have good documentation of your data and the rules for using the data. And if something needs to be changed quickly you better know where those rules are being used. Other equally important examples no doubt exist in many other industries.
The Holy Grail
As mentioned above, persisting or storing this data is also important for multiple reasons many of which will be covered in Part II of this blog. The important thing to know is that the data lineage really is the Holy Grail. Why? Because once you’ve captured it correctly, it’s the key to unlocking multiple opportunities that can make your life much easier as a Cognos administrator.
Data lineage within Cognos should be looked at as a two-step process. The first step is to capture the lineage in a complete and dissembled way. The second part is to store or persist this in a way that it can be referenced for other uses. This is the tricky part but it’s where the big payoff lies. It’s tricky because how it’s dissembled and stored has a lot to do with maximizing its value.
Read on to see Part II of the data lineage blog where we’ll cover some of the things that data lineage enables: Cognos Data Lineage – Infinite Possibilities – Part II
© Envisn, Inc. – 2017 – All rights reserved.