Alexandra Collins, Product Owner at Datactics, recently spoke on an A-Team webinar on the subject of how to build a data quality control framework. In case you missed it, here is everything you need to know about building a data quality control framework using the latest tools and technologies.
What are the drivers of change for organisations needing to create a data quality framework?
The volume of data being captured is a key driver of change. The amount of data that organisations need to process is growing at an exponential rate, and the manual effort involved in creating a data quality framework simply cannot keep up, especially when dealing with things like alternative data and unstructured data.
As a result, organisations are suffering the consequences further downstream, because the quality of the data being analysed is still poor. To improve the quality of data at this scale, organisations need to research and invest in more automated data quality checks, which is where the introduction of AI and machine learning has played a key role.
In my opinion, it is almost infeasible for big data organisations to achieve an acceptable level of data quality by relying solely on manual procedures.
What types of approaches have been taken to improve data quality in the past, and why have they fallen short?
Again, this is down to a combination of factors, including, but not limited to, large volumes of data, reliance on manual data quality processes, and the difficulties of code-heavy operations.
The manual effort involved in generating good quality data solely via SQL-based checks, or using everyday spreadsheet tooling, isn’t feasible when the volume of data is as large as it is today. The time required and the incidence of human error often result in too large a percentage of the data failing to meet an appropriate standard of quality, which ultimately impacts the effectiveness of downstream business operations.
The same applies where intensive code solutions, like Python, R, and Java, are being relied upon. Finding employees with the technical knowledge and coding capabilities to build and maintain the necessary automated data quality checks is an ongoing challenge for a lot of organisations, and because people with these skills are always in demand, they are also difficult to retain.
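To make that concrete, here is a minimal, hypothetical sketch of the kind of hand-written check such an approach involves, written in Python with pandas; the file, column names, and rules are illustrative assumptions, and every new dataset or rule change means more code like this to write, test, and maintain.

```python
import pandas as pd

# Hypothetical customer extract; the file and column names are illustrative only.
df = pd.read_csv("customers.csv")

# A few typical hand-written data quality checks.
missing_email_rate = df["email"].isna().mean()                 # completeness
invalid_dates = pd.to_datetime(df["onboarding_date"],          # validity
                               errors="coerce").isna().sum()
duplicate_ids = df["customer_id"].duplicated().sum()           # uniqueness

print(f"Missing emails: {missing_email_rate:.1%}")
print(f"Invalid onboarding dates: {invalid_dates}")
print(f"Duplicate customer IDs: {duplicate_ids}")
```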
Both of these approaches create bottlenecks within the workstream: good quality data isn’t being fed into the business areas that actually analyse and extract value from it. Meanwhile, the number of requests for ready-to-use, good-quality data from business functions further down the pipeline is increasing. Whether that’s for insights and analytics, regulatory compliance (standardisation), or complex matching activities such as data migrations, mergers, and acquisitions, good quality data will always be indispensable.
How can a framework be built, integrated with existing systems, and sustained?
This process can be broken down into two steps.
1) Recognising how beneficial a data quality framework is to your organisation. Once the business area implementing the framework understands its importance and associated benefits, building, integrating, and maintaining the framework will naturally follow. Accepting that there are problems with the data is the first step; being willing to investigate where those problems lie then plays a crucial part in constructing the initial DQ checks and the automated DQ solution that will run them against the data.
2) Analysing, resolving, and reviewing the failing checks. The process needs to be something that business users are happy to adopt, or one that can be easily integrated into their current tools (whether built in-house or bought off the shelf). The framework should generate data quality metrics that can then be consumed by other existing systems, e.g. dashboards, ticketing tools, and AI/ML solutions. Having a process like this will promote wider adoption across the organisation and ultimately result in continuous data quality improvement.
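As an illustration of that second step, the sketch below shows one way a framework might expose its results as simple, machine-readable metrics for a dashboard or ticketing tool to pick up; the rule names, figures, and file path are hypothetical and not taken from the webinar.

```python
import json
from datetime import datetime, timezone

# Hypothetical results from one data quality run; in practice these would
# come from the framework's rule engine rather than being hard-coded.
run_results = {
    "run_timestamp": datetime.now(timezone.utc).isoformat(),
    "dataset": "customers",
    "checks": [
        {"rule": "email_not_null", "passed": 9820, "failed": 180},
        {"rule": "customer_id_unique", "passed": 9995, "failed": 5},
    ],
}

# Add an overall pass rate that downstream dashboards can trend over time.
passed = sum(c["passed"] for c in run_results["checks"])
total = passed + sum(c["failed"] for c in run_results["checks"])
run_results["pass_rate"] = round(passed / total, 4)

# Write to a location (or message queue / API) that downstream tools consume.
with open("dq_metrics_customers.json", "w") as f:
    json.dump(run_results, f, indent=2)
```

A ticketing integration could, for example, raise an issue whenever a rule’s failure count exceeds an agreed threshold.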
In terms of creating a sustainable data quality framework, a tool that meets the needs of the business users is more likely to be sustained. Given that most business units won’t have their own coding team, it’s worth using a low-code or no-code tool. This gives users the ability to create software and applications without traditional coding, making it a great option for businesses that need an application quickly, or that don’t have the resources to hire experienced coders.
Low-code/no-code software has some major advantages over manual coding, as it is less time-consuming, more cost-effective, and highly customisable.
What technologies, tools, solutions, and services are helpful?
Depending on where the organisation sits within its data quality journey, the answer could be different in every case. A data quality control framework could be a relatively new investment for an organisation that knows it has data quality issues but doesn’t know where to begin with applying DQ checks. In this case, automated profiling solutions can help business users explore the data that needs assessing and highlight outliers within it. Moreover, an AI/ML solution that can suggest which data quality checks or rules to run against your data can be a helpful tool for organisations beginning their data quality framework journey.
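For teams at that early stage, an automated profile can start out as simple as the sketch below, which uses pandas to summarise each column and flag numeric outliers with a basic interquartile-range rule; the dataset and thresholds are illustrative assumptions rather than a description of any particular profiling product.

```python
import pandas as pd

# Hypothetical dataset to profile; replace with the data being assessed.
df = pd.read_csv("trades.csv")

# Basic column-level profile: type, completeness, and cardinality.
profile = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "null_rate": df.isna().mean().round(4),
    "distinct_values": df.nunique(),
})
print(profile)

# Flag potential numeric outliers using a simple 1.5 * IQR rule.
for col in df.select_dtypes(include="number").columns:
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    outliers = df[(df[col] < q1 - 1.5 * iqr) | (df[col] > q3 + 1.5 * iqr)]
    print(f"{col}: {len(outliers)} potential outliers")
```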
Tools which can be adapted to both coders and non-coders alike are helpful for allowing data quality checks to be defined and built effectively. Similarly, a tool that allows business-specific checks to be integrated easily within the overall workflow will help in rolling out an end-to-end data quality control framework more quickly.
It’s also worth noting that an easy-to-use issue resolution application is beneficial as part of your data quality framework. This allows non-technical users within the business, who are usually the SMEs and the people who work with the data on a day-to-day basis, to locate and fix the break points within their data, avoiding bottlenecks within their workstream or further down the business.
When choosing a tool for your data quality framework, it’s worth considering the following questions to determine your organisation’s needs:
- Does the tool meet all the data quality requirements of your organisation?
- Can it find the poor-quality areas of your data?
- Can it execute the required checks against the data?
- Is it easy to fix this bad data/resolve the failing checks?
- Does the tool dive deeper into the cause of these failing DQ checks?
Could you give us a step-by-step guide to setting up a data quality control framework, from early decision-making to understanding if it is successful?
In the early stages of creating a data quality framework, the first decision you’ll need to make is which tool is fit for purpose. Can it be applied to your specific business needs? Can it be integrated easily within your current systems? Does it match the data quality maturity of the business? After considering these questions, the setup process for building a data quality framework is as follows:
- Conduct some initial data discovery. Drill down into your data to find the initial failure points, such as invalid entries, incomplete data, etc. The aim is to get an idea of what controls need to be put in place.
- Define and build the data quality checks that need to be performed on your data. Ideally, these checks would then be scheduled to run in an automated fashion (see the sketch after this list).
- Resolve the failing checks. Once resolved, ensure that the fixed data is what is being pushed further downstream, or back to the source.
- Record data quality metrics over time. This allows the breaks to be analysed, so business users can pinpoint exactly what is causing the poor-quality data.
- Maintenance. Once your data quality framework is complete, maintenance and ongoing improvements are necessary to ensure its success.
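As a rough illustration of the second and fourth steps above, the sketch below defines a few named checks, runs them against a dataset, and appends the results to a history file so they can be trended over time; the rules, columns, and file paths are hypothetical, and in practice the run would be triggered by a job scheduler or orchestrator rather than by hand.

```python
import os
from datetime import datetime, timezone

import pandas as pd

# Hypothetical rule set: each check returns a boolean Series (True = pass).
CHECKS = {
    "customer_id_not_null": lambda df: df["customer_id"].notna(),
    "customer_id_unique": lambda df: ~df["customer_id"].duplicated(keep=False),
    "country_code_valid": lambda df: df["country_code"].isin(["GB", "IE", "US"]),
}

def run_checks(df: pd.DataFrame) -> pd.DataFrame:
    """Run every check and return one row of metrics per rule."""
    timestamp = datetime.now(timezone.utc).isoformat()
    rows = []
    for name, check in CHECKS.items():
        passed = check(df)
        rows.append({
            "run_timestamp": timestamp,
            "rule": name,
            "pass_rate": round(float(passed.mean()), 4),
            "failed_rows": int((~passed).sum()),
        })
    return pd.DataFrame(rows)

df = pd.read_csv("customers.csv")   # the dataset under assessment
metrics = run_checks(df)

# Append to a running history so trends can be analysed over time (step 4).
history_path = "dq_metrics_history.csv"
metrics.to_csv(history_path, mode="a",
               header=not os.path.exists(history_path), index=False)
```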
In terms of monitoring its success, recording data quality statistics over time allows the business to see, at a high level, whether its data quality is improving or regressing. If it’s improving, then great: you know the framework is operating effectively and that the failure points highlighted by the DQ process are being resolved within the organisation. Another good measure of success is to dig deeper into the parts of the business that use this data to determine whether they are functioning more fluidly and no longer suffering the consequences of poor data.
Even if the numbers are on a downward trajectory, the organisation is being presented with evidence that it has a problem somewhere within its workstream and can therefore allocate the required resources to investigate the specific failing DQ checks.
Finally, what three pieces of advice would you give to practitioners working on, or planning to work on, a data quality control framework?
- First, you need buy-in. Ensure that people within the business understand the need for the framework and how it benefits them, as well as the organisation more widely. As a result, they’ll be more likely to be interested in learning how to use the tool effectively and maintaining its adoption (possibly even finding improvements to be made to the overall workflow). If there is a suitable budget, working with data quality software vendors can help in achieving quick wins.
- Next, consider which tool to invest in. It’s important that the tool is appropriate for the team that will be using it, so there are a few questions worth thinking about. What technological capabilities does the team have in order to build data quality checks suitable for the business needs? How mature is data quality within the team? It might transpire that you need to invest in a tool that provides initial data discovery, or perhaps one that schedules your pre-built DQ checks and pushes results for resolution. Or perhaps the priority will be using AI/ML to analyse the failing DQ checks. Dig a bit deeper here into what your requirements are in order to get the most out of your investment in a tool.
- Finally, understand the root cause of the poor data quality. Record data quality metrics over time and analyse where and when the failure points occur, perhaps visualising this within your data lineage tooling. This could then be tracked back to an automated process that is causing invalid or incomplete data to be generated.
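Continuing the hypothetical metrics history from the earlier sketch, the snippet below shows one way to analyse recorded metrics over time and surface the rules where failures concentrate, which is where a root-cause investigation would start; the column names and file name are assumptions rather than a prescribed format.

```python
import pandas as pd

# Load the accumulated run history (written by the earlier sketch).
history = pd.read_csv("dq_metrics_history.csv", parse_dates=["run_timestamp"])

# Which rules account for the most failures overall?
worst_rules = (
    history.groupby("rule")["failed_rows"].sum().sort_values(ascending=False)
)
print(worst_rules.head())

# How has each rule's pass rate trended day by day? A sudden drop points to
# the upstream process or date where the root cause should be investigated.
daily_trend = (
    history.groupby(["rule", pd.Grouper(key="run_timestamp", freq="D")])["pass_rate"]
    .mean()
)
print(daily_trend.tail(10))
```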
A data quality control framework is essential for ensuring that the data your organisation relies on is accurate, reliable, and consistent. By following the best practices outlined in this blog post, you can give your data quality control framework the best chance of being effective. To help minimise the manual effort involved, there are a number of technologies, tools, solutions, and services available to assist with creating and maintaining the framework.
If you would like more advice on creating or improving a data quality control framework for your organisation, our experienced data consultants are here to help.