To start data governance from scratch, you need to create a basic framework in the form of a minimum viable product (MVP). This will contain your key components, which can be extended as the scope grows.
Delivering an MVP adds value to the business quickly, bringing business benefits and credibility. At the same time, we begin to build out data governance artefacts with high-quality content.
The MVP does not include some activities needed for sustainable data governance, such as setting up a Data Committee, creating policies and supporting documentation, or putting in place a controls and assessment environment. We do need them, but not just yet.
We start the MVP with four components based around four key questions.
Let’s take a look at the four deliverables in turn.
Data Dictionary/Business Glossary
First, we should note that the terms “Data Dictionary” and “Business Glossary” are sometimes used interchangeably, but they do differ.
- A Data Dictionary usually holds the technical definitions of all data items in a system (e.g. is it a text or a numeric field, and is there a maximum length?).
- A Business Glossary is a set of terms used by the business, defined in clear business-English.
Whatever you call it, we require discrete definitions which are clearly written and accepted by the business.
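To make the distinction concrete, here is a minimal Python sketch of one Data Dictionary entry and one Business Glossary term. The class names, fields and the `conforms` helper are hypothetical, chosen for illustration rather than taken from any governance tool, and the example definition is illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DictionaryEntry:
    """Technical definition of one data item in one system (hypothetical fields)."""
    system: str
    field_name: str
    data_type: str                    # "text" or "numeric" in this sketch
    max_length: Optional[int] = None  # only meaningful for text fields


@dataclass
class GlossaryTerm:
    """Business definition of a term, written in plain business English."""
    term: str
    definition: str
    approved_by: Optional[str] = None  # who in the business accepted the wording


def conforms(entry: DictionaryEntry, value: object) -> bool:
    """Check a stored value against its technical definition."""
    if entry.data_type == "numeric":
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if entry.data_type == "text":
        return isinstance(value, str) and (
            entry.max_length is None or len(value) <= entry.max_length
        )
    return False  # unknown data types are treated as non-conforming


# Illustrative entries only
postcode = DictionaryEntry("CRM", "delivery_postcode", "text", max_length=8)
delivery_address = GlossaryTerm(
    "Delivery address",
    "The address to which a customer's order is sent.",
)

print(conforms(postcode, "SW1A 1AA"))        # True: 8 characters
print(conforms(postcode, "NOT A POSTCODE"))  # False: longer than 8
```

The dictionary entry can be checked mechanically against stored data, whereas the glossary term only becomes useful once the business has agreed its wording.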
One way to populate it is by reusing content from existing documentation and design materials. For example, when building a Data Warehouse, the developers will have needed to know the required fields, their names and any constraints. Even if this information differs from how the business uses those fields, it is still important because it reflects how the data is stored in reality. If the business does things differently, this activity will reveal the differences, which can be tracked through the Issue Log. We’ll come to that later.
This approach will be attractive to the data professional, as it takes existing content and makes it more valuable. To the business leader, however, it is unlikely to give a sufficient return on the resources invested, so a project to do this is unlikely to receive funding.
Taking the MVP approach instead, we would create a business glossary for a limited set of reports. Start with widely used, non-controversial terms so you can build some momentum.
Under no circumstances start with the term “customer”!
On the other hand, “customer address” is a much better bet, as it can trigger a clarifying conversation about “address”, “billing address”, “delivery address”, previous addresses, multiple addresses and more.
Data Lineage
Understanding the flows of data around the business enables us to:
- Identify the sources of data
- Understand how to bring data into the platform
- Reduce the prevalence of Shadow IT and Shadow MI
Mapping out and labelling the data sources lets you cross-reference them to your data dictionary and data ownership. Data controls can be overlaid onto the lineage to show where they take place, and where they might be added. Points where data moves between different teams are of particular interest.
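As a sketch of cross-referencing lineage with ownership, the Python below records data flows as simple source-to-target pairs, maps each system to an owning team, and lists the flows where data crosses a team boundary without a control. The systems, teams and the `Flow`, `cross_team_handoffs` and `uncontrolled_handoffs` names are all hypothetical; a real lineage map would come from stakeholder interviews or a scanning tool.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Flow:
    source: str
    target: str
    has_control: bool = False  # is a data control applied on this flow?


# Hypothetical ownership: which team owns each system or data asset
owner: Dict[str, str] = {
    "CRM": "Sales",
    "Billing": "Finance",
    "Data Warehouse": "Data",
    "Monthly MI spreadsheet": "Finance",
}

# Hypothetical lineage, e.g. gathered from stakeholder interviews
flows: List[Flow] = [
    Flow("CRM", "Data Warehouse", has_control=True),
    Flow("Billing", "Data Warehouse"),
    Flow("Billing", "Monthly MI spreadsheet"),
    Flow("Data Warehouse", "Monthly MI spreadsheet"),
]


def cross_team_handoffs(flows: List[Flow], owner: Dict[str, str]) -> List[Flow]:
    """Flows where data moves from one team's system to another's."""
    return [f for f in flows if owner[f.source] != owner[f.target]]


def uncontrolled_handoffs(flows: List[Flow], owner: Dict[str, str]) -> List[Flow]:
    """Cross-team flows with no control: candidates for adding one."""
    return [f for f in cross_team_handoffs(flows, owner) if not f.has_control]


for f in uncontrolled_handoffs(flows, owner):
    print(f"{f.source} ({owner[f.source]}) -> {f.target} ({owner[f.target]})")
```

With this illustrative data, the Billing to Data Warehouse and Data Warehouse to Monthly MI spreadsheet flows are reported, while the Finance-only flow is ignored because it stays within one team.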
Business Analysts can interview stakeholders to understand the data flows. A benefit of this approach is that it identifies manual workarounds which users have set up. Other elicitation techniques, such as those in this article, will suit different situations.
Tools also exist which scan for data lineage. They can uncover undocumented or unknown data flows, which is valuable when carrying out root cause analysis. Their limitation is that not all data flows are automated: think of spreadsheets that are manually emailed between colleagues, or even left on network drives for anyone to use.
Data Ownership
Data ownership is a powerful concept with a lousy naming convention, so I’m going to use the term “Data Community” to cover Data Owners, Data Stewards and any other terms which might be in use in your organisation.
- Data Leaders are senior colleagues who sponsor the data governance activity but have less involvement day to day.
- Data Owners are responsible for ensuring that data governance activity takes place.
- Data Stewards are typically subject matter experts and are operationally responsible for the data.
All the members of your data community must know:
- What their role means, why they have it and what responsibilities they have.
- What activities they need to carry out under business as usual.
- When and how to escalate data issues.
These colleagues will all be carrying out their roles alongside their day jobs, so they will need support from a central data team to co-ordinate their data activities. This will be a small team relative to the size of the organisation - one or two data governance and data quality professionals can manage a data community of several dozen.
Some data governance tools include functionality to manage the data community. I would view this as a value-adding benefit beyond the standard data dictionary and data lineage features.
A data committee will comprise select members of your data community. I will deliberately leave that for another article because it must not be one of the first things you set up. The committee needs to have engaged members and an agenda with real decisions to make. You won’t have that until you’ve delivered the initial elements of the framework.
Data Issue Management
This is possibly my favourite area of data governance, because it’s where problems are collected, managed and resolved.
To get started in this area, you can use a simple Excel-based tracker. I'll even send you one if you ask nicely. The tracker acts as a base to record each issue, its characteristics, its impacts and the actions taken.
Commercial tools are available with advanced functionality, but you don't need that yet.
Without managing data issues in this way, you have:
- Data issues being identified and fixed on an ad hoc basis.
- An absence of ongoing data quality testing – work is reactive and not proactive.
- No follow-up activity to prevent recurrence.
- No idea of the impacts of the issues; or the impacts of the solutions!
But with a Data Issue Log:
- Issues are tracked, and repeat occurrences can be identified
- Long-standing issues are tracked, and can be escalated if necessary
- We can turn the “unknown unknowns” into “known knowns”.
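A spreadsheet is all you need to start, but the logic behind these benefits is simple enough to sketch. In the hypothetical Python below, each row of the log is an `Issue`; several issues against the same data item flag possible repeat occurrences, and open issues older than an assumed 90-day threshold are flagged for escalation. The field names, example issues and threshold are illustrative assumptions, not a prescribed log format.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Issue:
    issue_id: int
    description: str
    data_item: str       # the term or field affected, ideally from the glossary
    impact: str
    raised: date
    actions: List[str] = field(default_factory=list)
    closed: Optional[date] = None  # None while the issue is still open


def repeat_items(log: List[Issue]) -> List[str]:
    """Data items with more than one logged issue: possible repeat occurrences."""
    counts = Counter(issue.data_item for issue in log)
    return sorted(item for item, n in counts.items() if n > 1)


def to_escalate(log: List[Issue], today: date, max_open_days: int = 90) -> List[Issue]:
    """Open issues older than the (assumed) threshold: candidates for escalation."""
    return [
        issue for issue in log
        if issue.closed is None and (today - issue.raised).days > max_open_days
    ]


# Illustrative log entries only
log = [
    Issue(1, "Blank delivery addresses", "delivery address", "Orders delayed",
          date(2024, 1, 10), actions=["Backfilled from CRM"], closed=date(2024, 1, 20)),
    Issue(2, "Blank delivery addresses again", "delivery address", "Orders delayed",
          date(2024, 3, 5)),
    Issue(3, "Duplicate customer IDs", "customer ID", "Double counting in MI",
          date(2024, 6, 1)),
]

print(repeat_items(log))                                          # ['delivery address']
print([i.issue_id for i in to_escalate(log, date(2024, 6, 30))])  # [2]
```

Issue 1 was closed, yet the same data item reappears in issue 2, which is exactly the kind of recurrence the log should prompt you to prevent.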
Significant issues and those requiring decisions from senior stakeholders are taken from the log and shared with the Data Committee.
They can authorise spend to resolve issues, enable or approve a mitigation approach, or accept the risk of not addressing the issue. Accepting the risk may feel painful, but it is a completely valid response where resources are better used elsewhere.
Implementing data governance involves several elements, which together make up a data governance framework.
- The order in which you go about the implementation is critical to its success.
- Start small. Gain advocates. Land and expand.
Datazed can help you implement data governance in your organisation.
- We can deliver an MVP bespoke to your needs and pain points.
- We can upskill your colleagues so they can continue delivering your data governance and data quality activities.
- We can provide training to your team, including training specific to data owners and data stewards.
- We can provide you with data governance and data quality resource on an interim or part-time basis.
- We can guide you through the tool selection process.
All these services can be provided remotely.