Most people have a story to tell about what can go
wrong with data quality and data management.
The following are just a few examples:
· Being assigned the incorrect gender on your ID document;
· Being denied credit due to a mix-up with someone else’s blacklisted information;
· Money being paid to the wrong creditor due to incorrect capturing of an account number;
· Misquoted youth unemployment statistics due to incorrect capturing of birth dates;
· Impossible transformation statistics due to incorrect capturing of “population groups”;
· Colleges not being able to verify that students qualified there because of incorrectly stated National ID numbers (either at the college or by the enquirer); and
· Lecturers sorting only one column of a spreadsheet and students receiving each other’s results.
These are all cases where no fraud or identity theft was intended, yet the
consequences are just as problematic as if it had been.
Most people also believe that they are not capable of making such mistakes, yet
mistakes are being made all the time. All it takes is a momentary lapse in
concentration. However, simple methods can and should be put in place to
prevent mistakes. The type of method depends on the type of data[1] being
captured.
The most common methods are as follows:
1. For data elements that must always take one of a fixed set of values, such
as gender, population group (sometimes referred to as “equity”), province,
qualification type, NQF Level, achievement status: lookup tables must be used.
In databases and spreadsheets, these appear as drop-down menus, and only the
values already present in each menu are accepted for each data field.[2]
2. For data elements that do not come from a fixed list but only have certain
allowed values, totals or combinations: validations must be used. For example,
it must not be possible to assign times that add up to more than 24 hours in a
day; all dates must use the same format; it must not be possible to capture a
Bachelor’s degree with NQF Level 4.
3. For data elements that are truly free-form, such as names, addresses,
qualification titles: complete validation is not possible, but partial
validation still is. For instance, e-mail addresses must contain “@” and must
have no spaces; certain characters can be allowed or disallowed; it may be
necessary to convert special characters to plain characters, e.g. é to e. In
general, free-form data elements need an additional check, the most effective
being “eyeballing” by someone other than the person who did the capturing.
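The three methods above can be sketched in a few lines of Python. This is only
an illustration, not a complete validation layer: the function names, the
achievement status values and the NQF Level rule table are all assumptions made
up for the example, and a real system would take them from its own lookup
tables and load specifications.

```python
import unicodedata

# Hypothetical lookup table: the only values this field accepts.
ACHIEVEMENT_STATUS = {"Achieved", "Enrolled", "Withdrawn"}

# Illustrative rule table (not an official list): NQF Levels allowed per
# qualification type.
ALLOWED_NQF_LEVELS = {"Bachelor's degree": {7, 8}}


def check_lookup(value, allowed):
    """Item 1: accept a value only if it is already in the lookup table."""
    return value in allowed


def check_hours(hours):
    """Item 2: times captured for one day may not total more than 24 hours."""
    return all(h >= 0 for h in hours) and sum(hours) <= 24


def check_nqf_level(qualification_type, level):
    """Item 2: reject disallowed combinations of qualification type and level.

    Qualification types missing from the rule table are rejected as well.
    """
    return level in ALLOWED_NQF_LEVELS.get(qualification_type, set())


def check_email(address):
    """Item 3: a partial check only; needs an '@' and no whitespace."""
    return "@" in address and not any(ch.isspace() for ch in address)


def to_plain(text):
    """Item 3: convert accented characters to plain ones, e.g. é to e."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(check_lookup("achieved", ACHIEVEMENT_STATUS))  # False: not the exact lookup value
print(check_nqf_level("Bachelor's degree", 4))       # False
print(to_plain("José"))                              # Jose
```

Note that none of these checks can tell whether a free-form value is actually
correct, only whether it is well formed, which is why the second pair of eyes
is still needed for item 3.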
Items 1 and 2 should be provided by whoever develops
and maintains the database or spreadsheet. It is helpful if the lookup tables and
validations incorporate whatever rules are laid down by the load specifications
of any other system that the data must feed into. (If not, then a more complex mapping process
has to be utilised when transmitting data from one system to another.)
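As a rough sketch of what such a mapping process involves, the Python below
translates one field from local codes to the codes a receiving system expects.
The status codes, the field name and the `map_records` helper are hypothetical;
the real values would come from the load specification of the system being fed.

```python
# Hypothetical mapping from local codes to the receiving system's codes.
STATUS_MAP = {"Pass": "Achieved", "Fail": "Not achieved", "Busy": "Enrolled"}


def map_records(records, field, mapping):
    """Translate one field per record; set aside records that cannot be mapped."""
    mapped, rejected = [], []
    for record in records:
        value = record.get(field)
        if value in mapping:
            # Copy the record so the original capture is left untouched.
            mapped.append({**record, field: mapping[value]})
        else:
            rejected.append(record)
    return mapped, rejected


records = [{"id": 1, "status": "Pass"}, {"id": 2, "status": "Passed"}]
mapped, rejected = map_records(records, "status", STATUS_MAP)
print(mapped)    # [{'id': 1, 'status': 'Achieved'}]
print(rejected)  # [{'id': 2, 'status': 'Passed'}]
```

Records that cannot be mapped are returned separately so that they can be
corrected at the source instead of being silently dropped or guessed at.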
Item 3 is the responsibility of each individual who
captures the data. A data standards
document should be made available to all data capturers. It should go into more detail about the “how”
of data quality.
[1] Data are raw facts, such as one person’s test score. Information is the product after data have
been organised (aggregated or analysed).
Information assists with understanding or deciding something, such as
the class average of the test scores assisting with decisions concerning the
moderation of results.
[2] A data field is a place where one stores data, such as a column in a
database / spreadsheet, or a specific place (field) on a data entry form or web
form.