Friday, 20 April 2018

Data Quality


Most people have a story to tell about what can go wrong with data quality and data management.

The following are just a few examples:

·       Being assigned the incorrect gender on your ID document;
·       Being denied credit due to a mix-up with someone else’s blacklisted information;
·       Money paid to the wrong creditor due to incorrect capturing of an account number;
·       Misquoted youth unemployment statistics due to incorrect capturing of birth dates;
·       Impossible transformation statistics due to incorrect capturing of “population groups”;
·       Colleges not being able to verify that students qualified there because of incorrectly stated National ID numbers (either at the college or by the enquirer); and
·       Lecturers sorting only one column of a spreadsheet, so that students receive each other’s results.

In none of these cases was fraud or identity theft intended, yet the consequences are just as problematic as if it had been.

Most people also believe that they are not capable of making such mistakes, yet mistakes are being made all the time.  All it takes is a momentary lapse in concentration.  However, simple methods can and should be put in place to prevent mistakes.  The type of method depends on the type of data[1] being captured.

The most common methods are as follows:

1.   For data elements that must always take one of the same fixed set of values, such as gender, population group (sometimes referred to as “equity”), province, qualification type, NQF Level, achievement status: lookup tables must be used.  In databases and spreadsheets, these appear as drop-down menus, and only the values already present in each menu are accepted for each data field.[2]

2.   For data elements that are not always the same but only have certain allowed values, totals or combinations: validations must be used.  For example, it must not be possible to assign times that add up to more than 24 hours in a day; all dates must be captured in the same format; it must not be possible to capture a Bachelor’s degree with NQF Level 4.

3.   For data elements that are truly free-form, such as names, addresses, qualification titles: it is not possible to put in complete validations, although even here there can be some validations, for instance e-mail addresses must contain “@” and must have no spaces; certain characters are allowed or disallowed; it may be necessary to convert special characters to plain characters, e.g. é to e.  In general, for free-form data elements, an additional check must be used, the most effective being “eyeballing” by someone other than the person who has done the capturing. 
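The three methods above can be sketched in a few lines of Python.  This is a minimal illustration only, not a prescribed implementation.  The rules come from the examples in the list, but the function names, the entries in the lookup tables, the choice of YYYY-MM-DD as the single date format, and the levels allowed for a Bachelor’s degree are all assumptions made for the sake of the example.

```python
import unicodedata
from datetime import datetime

# Method 1: lookup tables. Only values already in the table are accepted.
# The entries below are illustrative assumptions, not an official code list.
LOOKUPS = {
    "achievement_status": {"Achieved", "Enrolled", "Withdrawn"},
    "qualification_type": {"National Certificate", "Diploma", "Bachelor's Degree"},
}

def check_lookup(field, value):
    """Return True only if value is already present in the field's lookup table."""
    return value in LOOKUPS[field]

# Method 2: validations on allowed values, totals and combinations.
def check_hours(hours_per_activity):
    """Times captured for one day must not add up to more than 24 hours."""
    return all(h >= 0 for h in hours_per_activity) and sum(hours_per_activity) <= 24

def check_date(text):
    """Accept only dates that parse as year-month-day (assumed single format)."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False

# Assumption for this sketch: a Bachelor's degree sits at NQF Level 7 or 8.
ALLOWED_NQF_LEVELS = {"Bachelor's Degree": {7, 8}}

def check_qualification_level(qualification_type, nqf_level):
    """Reject impossible combinations, e.g. a Bachelor's degree at NQF Level 4."""
    allowed = ALLOWED_NQF_LEVELS.get(qualification_type)
    # Types with no rule defined here are not restricted by this check.
    return allowed is None or nqf_level in allowed

# Method 3: partial checks on free-form data.
def check_email(email):
    """Basic checks only: must contain "@" and must have no spaces."""
    return "@" in email and " " not in email

def to_plain_characters(text):
    """Convert accented characters to plain ones, e.g. é to e."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

For example, check_qualification_level("Bachelor's Degree", 4) is rejected, and to_plain_characters("André") gives "Andre".  Checks like these catch only mechanical errors; a correctly formatted but wrong name or address still needs the second pair of eyes.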

Items 1 and 2 should be provided by whoever develops and maintains the database or spreadsheet.  It is helpful if the lookup tables and validations incorporate whatever rules are laid down by the load specifications of any other system that the data must feed into.  (If not, then a more complex mapping process has to be utilised when transmitting data from one system to another.)
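Where the local lookup tables do not match the receiving system’s load specification, the mapping step might look something like the sketch below.  The province codes and the function name are hypothetical, chosen only to show the idea; a real load specification would define its own codes.

```python
# Hypothetical mapping from locally captured values to the codes that a
# receiving system's load specification expects. Both sides are invented.
PROVINCE_MAP = {
    "Gauteng": "GP",
    "Western Cape": "WC",
    "KwaZulu-Natal": "KZN",
}

def map_for_transmission(records, field, mapping):
    """Translate one field in every record; set aside values with no mapping."""
    mapped, unmapped = [], []
    for record in records:
        value = record[field]
        if value in mapping:
            # Copy the record so that the original capture is left untouched.
            mapped.append({**record, field: mapping[value]})
        else:
            unmapped.append(record)
    return mapped, unmapped
```

Records that end up in the unmapped list have to be corrected by hand before they can be sent, which is the extra work that is avoided when the lookup tables follow the load specification from the start.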

Item 3 is the responsibility of each individual who captures the data.  A data standards document should be made available to all data capturers.  It should go into more detail about the “how” of data quality.



[1] Data are raw facts, such as one person’s test score.  Information is the product after data have been organised (aggregated or analysed).  Information assists with understanding or deciding something, such as the class average of the test scores assisting with decisions concerning the moderation of results.

[2] A data field is a place where one stores data, such as a column in a database / spreadsheet, or a specific place (field) on a data entry form or web form.
