By Paul Hausser, Envisn, Inc.
In Part I of this blog we covered the basics of data lineage: its definition and why it’s important. Here in Part II our focus will be on some of the things that are possible when Cognos data lineage is captured and stored for use in multiple creative ways.
Part I covered the capture of data lineage back to and including the database itself. Having this data is essential to understanding data used in your Cognos environment. And if you have it for every object, by definition you have it for all the objects in your environment; and thus, you have your data universe. While this sounds like a truism, the point is that it’s the key for unlocking many additional capabilities beyond simply documenting the data being used. Done correctly, this becomes the Rosetta Stone for your Cognos environment.
The data lineage is not in the Cognos Content Store, at least not in a way that makes it readily usable. So how do you get it? Extracting it from the reports, queries, and published packages and models in the Content Store takes a lot of work and expertise. It requires breaking these objects down to get all of the details needed about the data, where it’s used and its dependencies. Difficult but not impossible. We’ll cover this in more detail in a future blog.
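To give a feel for what "breaking these down" can mean, here is a minimal Python sketch that pulls model references out of a report specification. The XML fragment is made up and heavily simplified; the element names, the three-part `[Namespace].[Query Subject].[Item]` reference pattern and the `extract_references` helper are all assumptions for illustration, not a description of any particular tool, and real specifications are far richer.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified report specification fragment.
SPEC = """
<report>
  <query name="Query1">
    <dataItem name="Customer">
      <expression>[Sales].[Customer].[Customer Name]</expression>
    </dataItem>
    <dataItem name="Margin">
      <expression>[Sales].[Orders].[Revenue] - [Sales].[Orders].[Cost]</expression>
    </dataItem>
  </query>
</report>
"""

# Matches three-part references such as [Namespace].[Query Subject].[Item].
REF = re.compile(r"\[([^\]]+)\]\.\[([^\]]+)\]\.\[([^\]]+)\]")


def extract_references(spec_xml):
    """Return (query, data item, model reference) tuples found in the spec."""
    root = ET.fromstring(spec_xml)
    rows = []
    for query in root.iter("query"):
        for item in query.iter("dataItem"):
            expr = item.findtext("expression", default="")
            # One data item can reference several model items.
            for parts in REF.findall(expr):
                rows.append((query.get("name"), item.get("name"), ".".join(parts)))
    return rows


if __name__ == "__main__":
    for row in extract_references(SPEC):
        print(row)
```

Each tuple records which query and data item uses which model item, which is the raw material for everything that follows.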
The data lineage makes a number of useful things possible, including the following:
- Database dependencies – This is the ability to identify all objects that are dependent on a given database, its tables and columns. There are a number of reasons why this is useful. If you know that changes are being made to a database, you can simply let report owners know what changes will be made and when, so they can make the changes to the objects at the right point in time. Ideally this change would result in a newly published model and package into the environment for use by developers and authors.
If, for example, you have multiple databases with the data item “Customer” in them, it’s often important to know where these are and if they differ from what the official standard is. (See Figure 1) The results may surprise you. You can try this with virtually any data item that is common across databases used in your environment. Getting to one version of the truth is a key goal of virtually every reporting environment.
- Impact analysis – If we make these changes to the Finance Package, which objects will be impacted and how? This is useful information because it lets you plan the changes so that there is no impact on your user base. The database dependency above is one example, but others might be a change to the package used, data source, etc.
- Find and replace – Specification changes within reports or queries are a challenge unless you know where all examples of a given string appear. And it also helps to have the right tool to make the change. This is especially useful if you have a number of objects that need to be changed, since it enables you to make multiple updates at once. (See Figure 2) In this example we are making a minor change to a string that appears in some reports.
- Data source dependency – Since an FM model can contain multiple databases, having a full profile of data sources enables you to see data dependencies across multiple databases. This information isn’t visible within Cognos since it only shows data lineage for one item at a time. But if you have it for
- Model item dependency – Breaking the data code allows you to see which data items relate to which FM model. Understanding this makes it possible to do a dependency analysis if there are planned changes being made to a model in active use. And with a tool that has a model item change capability, it’s a relatively simple task to do mass updates and make changes to multiple report objects at once. (See Figure 3) Having to do this to a large number of objects manually would be tedious and prone to error. This same capability will work equally well with packages within Cognos.
- Filters and calculations – Some environments need to document all filters or calculations used in reports and where they are used. Doing this manually on a consistent basis, especially in an environment of change, is virtually impossible. If you have all of the Cognos data used available in a database it’s a simple task to have this type of documentation always available and up to date.
- Broken lineage (See Figure 4) – This typically occurs when data items in a report no longer exist in the model, or the data item has changed. It’s an automated check made across all objects in the Content Store to validate that what’s in the object for data items is still a valid reference in the model itself. The administrator can simply run a report every day that identifies broken lineage and let the report owners know which reports need to be fixed. In this example the item in red no longer exists in the model, and thus, this report won’t run. It’s not uncommon to see a production Content Store with 15 percent or more of the objects unusable at any given time because of broken data lineage.
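Once the lineage sits in an ordinary database, several of the items above reduce to short queries. The sketch below uses an in-memory SQLite database with two made-up tables, `model_items` and `report_items`; the schema, the sample rows and the function names are assumptions for illustration only. It shows a database dependency lookup and a broken lineage check.

```python
import sqlite3


def build_demo_store():
    """Create a tiny, hypothetical lineage store in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE model_items (
            item_ref TEXT PRIMARY KEY, db_name TEXT, tbl TEXT, col TEXT);
        CREATE TABLE report_items (report TEXT, item_ref TEXT);
        INSERT INTO model_items VALUES
            ('Sales.Customer.Customer Name', 'SALESDB', 'CUSTOMER', 'CUST_NAME'),
            ('Sales.Orders.Revenue', 'SALESDB', 'ORDERS', 'REVENUE');
        INSERT INTO report_items VALUES
            ('Customer List', 'Sales.Customer.Customer Name'),
            ('Revenue Summary', 'Sales.Orders.Revenue'),
            ('Revenue Summary', 'Sales.Orders.Discount');
    """)
    return conn


def reports_depending_on(conn, db_name, tbl, col):
    """Database dependency: reports that use a given database column."""
    rows = conn.execute(
        """SELECT DISTINCT r.report
           FROM report_items r
           JOIN model_items m ON m.item_ref = r.item_ref
           WHERE m.db_name = ? AND m.tbl = ? AND m.col = ?
           ORDER BY r.report""",
        (db_name, tbl, col),
    )
    return [row[0] for row in rows]


def broken_lineage(conn):
    """Broken lineage: report items whose reference is missing from the model."""
    rows = conn.execute(
        """SELECT r.report, r.item_ref
           FROM report_items r
           LEFT JOIN model_items m ON m.item_ref = r.item_ref
           WHERE m.item_ref IS NULL
           ORDER BY r.report, r.item_ref"""
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = build_demo_store()
    print(reports_depending_on(conn, "SALESDB", "ORDERS", "REVENUE"))
    print(broken_lineage(conn))
```

In this sample data the "Revenue Summary" report references a `Discount` item that is not in the model, so it is the one the broken lineage check flags. Scheduling a query like this daily is one way to produce the report described above.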
The Secret Sauce
What really makes this work, as shown in these examples, is that while we have all the pieces of data you would ever need, we’ve captured and stored them in a way that retains all the “interconnectedness” of the data and its dependencies. So we know which pieces of data came from which object and how that object relates to other objects.
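One way to picture this interconnectedness is as a set of "feeds into" links running from database columns through model items and packages to reports. The sketch below is a minimal Python illustration under that assumption; the `EDGES` data and the `impacted_by` function are hypothetical and are not a description of how any particular product stores its data. Answering an impact analysis question then becomes a walk over those links.

```python
from collections import deque

# Hypothetical "feeds into" links: (upstream object, downstream object).
EDGES = [
    ("SALESDB.ORDERS.REVENUE", "Sales.Orders.Revenue"),
    ("Sales.Orders.Revenue", "Finance Package"),
    ("Finance Package", "Revenue Summary"),
    ("Finance Package", "Margin Dashboard"),
    ("SALESDB.CUSTOMER.CUST_NAME", "Sales.Customer.Customer Name"),
]


def impacted_by(start, edges=EDGES):
    """Return every object downstream of `start`, using breadth-first search."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)

    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:  # guard against revisiting shared objects
                seen.add(child)
                queue.append(child)
    return sorted(seen)


if __name__ == "__main__":
    print(impacted_by("SALESDB.ORDERS.REVENUE"))
    print(impacted_by("Finance Package"))
```

Starting from the package gives the two reports that depend on it; starting from the database column also picks up the model item and the package along the way.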
This makes it possible to answer virtually any question you may ever have about your Content Store and to be able to perform all of these tasks and more that Cognos administrators would encounter on a daily basis. The secret sauce is really getting both the data capture and design right.
© Envisn, Inc. – 2017 – All rights reserved.