When considering IT investments during budget season, the initiatives that most often make it onto the docket are software development, business partnerships, employee skill development and new hires. But as you work through these items, how deliberate are you in forecasting around the true value driver behind all of them: data quality? Data quality is no longer just a “nice to have”; it is a business imperative.
Without high-quality data, you cannot be confident that any of your analytics efforts will yield reliable results. That lack of confidence erodes trust with your clients and partners, and ultimately undermines your service or product. Even a great software application, if it runs on bad data, can derail your downstream efforts and cause serious business problems and bottlenecks. To avoid this scenario, look beyond software development and focus on data quality. In this article, we outline several methods to help ensure the data you feed into your software serves as an asset, not a hindrance.
Do Not Assume
It is more natural for people – even engineers! – to hope for the best rather than prepare for the worst. Assumptions that data quality is higher than it actually is can linger throughout the software development process. Yet your engineering team will inevitably inherit data from multiple geographies, various business units, and several vendors or third-party sources, so assuming high data quality without doing due diligence is wishful thinking. Be smart: address data quality up front. Answers to strategic business questions about employees, customers and revenues must come from trusted data sources that meet high data quality standards. Start with data quality assessments and testing from the beginning, and have protocols in place that define an acceptable level of data quality before moving on to subsequent phases.
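A protocol like this can be as simple as an automated gate that scores a dataset and blocks subsequent phases until it passes. Here is a minimal sketch in Python; the field names, sample records and the 95 percent threshold are illustrative assumptions, not prescriptions from this article.

```python
# Minimal data quality gate: score a dataset by the share of records with
# all required fields present and non-empty, then pass/fail it against an
# agreed threshold. (Field names and threshold are assumptions.)

def assess_quality(records, required_fields, threshold=0.95):
    """Return (completeness_score, acceptable) for a list of dict records."""
    if not records:
        return 0.0, False
    complete = sum(
        1 for r in records
        if all(r.get(f) not in (None, "") for f in required_fields)
    )
    score = complete / len(records)
    return score, score >= threshold

# Example: employee records inherited from two business units.
employees = [
    {"id": 1, "name": "Ana",  "email": "ana@example.com"},
    {"id": 2, "name": "Luis", "email": ""},               # missing email
    {"id": 3, "name": None,   "email": "c@example.com"},  # missing name
    {"id": 4, "name": "Mia",  "email": "mia@example.com"},
]

score, acceptable = assess_quality(employees, ["id", "name", "email"])
print(f"completeness: {score:.0%}, acceptable: {acceptable}")
# → completeness: 50%, acceptable: False
```

In practice the threshold itself is the protocol: decision makers agree on it before development starts, and the gate makes "acceptable data quality" a measurable fact rather than an assumption.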
All Roads Lead Back to Engineering
When it comes to taking responsibility for data quality, the buck stops with engineering. Data quality and data analytics require a platform managed by the engineering team, yet the datasets used to build that platform may or may not be within engineering’s control. You may be working with marketing’s third-party vendor, your finance department, or any number of other disparate data sources. Even so, when the application doesn’t work or the analytics results aren’t accurate, the engineering team is on the hook to justify the investment and its return.
Here are a few sample questions that can help shape the conversation with decision makers, and justify the investment in improved data quality and data cleansing:
- How many of us here believe that our employee database is 100 percent accurate?
- How accurate is our current customer database?
- Do we believe these databases are 75 percent accurate?
- Is that enough to make confident business decisions?
In most cases, you will receive noncommittal answers about data quality. Very few decision-makers believe their datasets are pristine, because very few are. Luckily, data quality assessments can be performed fairly easily and with minimal investment. One typical scenario involves three individuals, each with expertise in their own area, working together to build and refine data quality into the process early in the development cycle.
This trifecta of professionals working together can:
- Create and perform testing on sample data sets to confirm accuracy of results
- Create valid use cases for testing and measurement
- Create automated models to be used on large and complex data sets
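The third item – automated checks that scale to large, complex datasets – can start as a small library of named rules that each record must satisfy. The sketch below is one way to structure this; the rule names, fields and sample rows are assumptions for illustration.

```python
# Rule-based automated checks: each rule is a named predicate over a record,
# and the runner counts failures per rule so reviewers can prioritise
# cleansing. (Rules and field names below are illustrative assumptions.)

def make_rules():
    return {
        "id_present":    lambda row: row.get("id") is not None,
        "amount_nonneg": lambda row: isinstance(row.get("amount"), (int, float))
                                     and row["amount"] >= 0,
        "currency_iso":  lambda row: row.get("currency") in {"USD", "EUR", "CRC"},
    }

def run_checks(rows, rules):
    """Apply every rule to every row; return a failure count per rule."""
    failures = {name: 0 for name in rules}
    for row in rows:
        for name, check in rules.items():
            if not check(row):
                failures[name] += 1
    return failures

sample = [
    {"id": 1,    "amount": 10.0, "currency": "USD"},
    {"id": None, "amount": -5,   "currency": "USD"},
    {"id": 3,    "amount": 7.5,  "currency": "GBP"},
]

report = run_checks(sample, make_rules())
print(report)
# → {'id_present': 1, 'amount_nonneg': 1, 'currency_iso': 1}
```

Because the rules are plain data, the subject matter expert can propose them in business terms while the data and QA automation engineers encode and run them, which is exactly the division of labor the trifecta enables.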
After testing, during the control phase, verify that cleansing was performed and the appropriate reviews were completed. Data engineers and QA automation engineers are in high demand and short supply; using their time wisely is not only a cost-saving measure, it is a business best practice. Pairing them with a subject matter expert for a particular dataset can dramatically improve both the process and its quality outcomes.
Control and Security
As you plan for the eventualities of data quality cleansing, don’t forget the need for control and security measures for data management. Establishing and maintaining policies and procedures for data access and security can prevent contamination of the data in future uses.
As applications evolve, they are exposed to new algorithms and new data sources, which can degrade data quality over time. On the other hand, new data sources are also an opportunity to obtain better information and improve the quality of the answers your business rules provide.
Always ensure that all datasets come from trusted sources and adhere to appropriate security procedures. Many of the basic security requirements apply to data storage just as they do to any other system.
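The article does not prescribe specific controls, but one concrete, minimal example of "trusted source" enforcement is verifying that an incoming dataset file matches a checksum published by the provider before it is ingested. A sketch, with a temporary file standing in for a vendor-delivered extract:

```python
# Verify dataset integrity against a source-published SHA-256 checksum
# before ingestion. (The demo file and its contents are illustrative.)
import hashlib
import tempfile

def sha256_of(path, chunk_size=65536):
    """Stream a file in chunks and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_source(path, expected_hash):
    """Reject dataset files whose contents don't match the trusted checksum."""
    return sha256_of(path) == expected_hash

# Demo: a temporary file standing in for a vendor-delivered CSV extract.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"id,amount\n1,10\n")
    path = tmp.name

expected = sha256_of(path)  # in practice, supplied by the vendor out-of-band
print(verify_source(path, expected))  # → True
```

A check like this does not prove the data is accurate, but it does prevent silent corruption or tampering in transit from contaminating downstream systems, which is the kind of future-use protection the policies above are meant to provide.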
An added benefit of high-quality data is increased productivity. With high-quality data in place and processes to maintain it, you will have more control over your engineering efforts and can redirect them to other value-added activities. Productivity gains from QA automation should also be considered: while QA automation requires an upfront investment, it pays off in the long run and can serve as a major business accelerator by saving time and money.
As a strategic technology solutions partner, GAP often sees what happens when the trust killer of bad data shows up. We offer our clients deep experience in analytics (including data engineering) and extensive expertise in other verticals, including cloud, mobile and QA / QA automation services. We can provide your organization with resources in the following areas:
- Software development for cloud and mobile applications
- Data analytics and data science
- Information systems
- Machine learning and artificial intelligence
- Predictive modeling
- QA Automation
If you have any further questions regarding our services, please reach out to us.
About Sergio Morales Esquivel
Sergio Morales Esquivel is the Global Analytics Technology Strategist at Growth Acceleration Partners, and a professor at the analytics post-graduate program at Cenfotec University. Sergio leads the Data Analytics Center of Excellence at GAP, where he directs efforts to design and implement solutions to complex data-related problems. Sergio holds a B.S. in Computer Engineering and an M.S. in Computer Science from Tecnológico de Costa Rica. Outside of work, he enjoys traveling, making games and spreading the love for open software and hardware. You can connect with Sergio on his website or LinkedIn, or send him an email.