There was a lot of amusement this week when Liam Thorp, political editor of the Liverpool Echo newspaper, found himself on the priority list for the COVID-19 vaccine due to his very high BMI. (BMI is a figure calculated from height and weight which gives a rough guide to being underweight or overweight.) His height had been recorded on the system as 6.2 cm rather than the correct 6 feet 2 inches (approx. 190 cm).
Aside from the humour, there was a general assumption in the responses that this type of error was ridiculous and ought to have been spotted much earlier.
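To see why one mistyped height has such a dramatic effect, here is a minimal sketch of the standard BMI formula (weight in kilograms divided by the square of height in metres). The weight is a made-up figure used purely for illustration, and the function name `bmi` is my own.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index: weight in kg divided by height in metres, squared."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


# Hypothetical weight, chosen only for illustration (not taken from the story).
ASSUMED_WEIGHT_KG = 100.0

correct = bmi(ASSUMED_WEIGHT_KG, 190)   # height recorded correctly in cm
mistyped = bmi(ASSUMED_WEIGHT_KG, 6.2)  # feet-and-inches figure typed into a cm field

print(f"BMI at 190 cm: {correct:.1f}")
print(f"BMI at 6.2 cm: {mistyped:.0f}")
print(f"Inflation factor: {mistyped / correct:.0f}x")
```

Because height is squared, the inflation factor depends only on the two heights: (190 / 6.2) squared is roughly 939, whatever weight you plug in.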
From a data quality perspective, is that a reasonable assumption?
The first place to point the blame would be at whoever entered the data into the system. I don't think that's fair, given that medical administrators have large amounts of work to do at great speed, and it’s reasonable that they make occasional data input errors.
I would work on the basis that there will always be some incorrect data, and that someone like me should be designing validation tests to ensure that data errors are caught early and dealt with - before they trigger incorrect decisions, such as who is prioritised for the vaccine.
In designing a validation rule, we could get clever and consider typical heights for a given age and gender to create a narrow band of valid values, but this is quite hard to do and invariably there will be some exceptions.
Let’s think about what errors are most likely.
- The error in our story, where the height in feet was entered but the expected value was in cm.
- A missing digit, e.g. the user entered 19 cm instead of 190 cm.
- A transposed digit, e.g. the user entered 910 cm instead of 190 cm.
Now let’s consider that:
- A newborn baby is around 18 inches or 45 cm.
- The world’s tallest man measured 8 ft 11 inches or 272 cm.
So, we can create a really simple validation test: a person's height should be no less than 30 cm. This would accommodate newborn babies, and rule out typos such as the unit mix-up and the missing digit described above.
But what if I’m wrong, and some premature babies are less than 30 cm in length?
That’s where a subject matter expert comes in. I propose these rules, and a medical professional (in this scenario) can very quickly assess whether they are sensible. If the answer is that I need to refine the minimum to 20 cm, or 25 cm, or 40 cm, then so be it.
Similarly, a maximum height of 300 cm is going to capture every human, and we could take it down a little as well. The current tallest living male is 251 cm.
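As a rough sketch, the minimum and maximum proposed here could be combined into a single range check. The 30 cm and 300 cm limits are the proposals from this article and would still need sign-off from a subject matter expert; the function and constant names are my own.

```python
# Proposed limits from the article; to be confirmed by a medical professional.
MIN_HEIGHT_CM = 30.0
MAX_HEIGHT_CM = 300.0


def is_plausible_height(height_cm: float) -> bool:
    """Return True if the height falls inside the broad human range."""
    return MIN_HEIGHT_CM <= height_cm <= MAX_HEIGHT_CM


# The typical errors and the real-world extremes discussed above.
samples = {
    "feet typed into a cm field": 6.2,
    "missing digit": 19,
    "transposed digit": 910,
    "newborn baby": 45,
    "typical adult": 190,
    "world's tallest man": 272,
}

for label, height_cm in samples.items():
    status = "ok" if is_plausible_height(height_cm) else "REJECTED"
    print(f"{label:<28} {height_cm:>6} cm  {status}")
```

All three typos are rejected, while the newborn, the typical adult and the tallest recorded man all pass.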
When the data gets transferred to another system, where the need is more specific, we can put further validation in place. For example, if we are only considering adults, then we can make the valid height range much narrower.
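One way to picture this layering is a broad rule applied everywhere, plus a narrower rule applied only by the system that needs it. This is a sketch, not a recommendation: the adult limits below are placeholders I have invented for illustration, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeightRule:
    name: str
    min_cm: float
    max_cm: float

    def accepts(self, height_cm: float) -> bool:
        return self.min_cm <= height_cm <= self.max_cm


# Broad rule applied wherever height is captured.
GENERAL_RULE = HeightRule("general", 30.0, 300.0)
# Placeholder adult limits, for illustration only; a subject matter
# expert would need to set the real values.
ADULT_RULE = HeightRule("adult", 120.0, 230.0)


def failed_rules(height_cm: float, rules: list[HeightRule]) -> list[str]:
    """Return the names of the rules that this height does not satisfy."""
    return [rule.name for rule in rules if not rule.accepts(height_cm)]


# A newborn's height passes in the source system...
print(failed_rules(45.0, [GENERAL_RULE]))
# ...but is flagged in a downstream system that only handles adults.
print(failed_rules(45.0, [GENERAL_RULE, ADULT_RULE]))
print(failed_rules(190.0, [GENERAL_RULE, ADULT_RULE]))
```

Keeping the narrow rule in the downstream system means the source system stays simple, and values that fail only the narrow rule can be flagged for review rather than rejected outright.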
This is a good example of applying extra resources to data quality only when needed and only on the subset of data which you require.
- We can improve data quality through better UX design
- We can improve data quality with simple validation rules
- We do not need to set up complex rules as doing so can be an unnecessary use of resources
Datazed can help you implement data governance in your organisation.
- We can deliver an MVP bespoke to your needs and pain points.
- We can upskill your colleagues so they can continue delivering your data governance and data quality activities.
- We can provide training to your team, including training specific to data owners and data stewards.
- We can provide you with data governance and data quality resource on an interim or part-time basis.
- We can guide you through the tool selection process.
All these services can be provided remotely.
Take the next step - get in touch!