Data Quality: The Core of Strong Analytics

On the GAP blog, we cover a lot of topics related to big data, machine learning, and analytics.

But there’s an underlying assumption we make when talking about these topics: that your data is important.  It is the lifeblood of your company, so data quality matters.

In this blog post, we explore the importance of data quality. We look at why you should pay attention to the quality of your data, the attributes of good data, and some steps that will help you attain high levels of data quality.

The Data Economy

Some of the most valuable companies in the world today, such as Amazon, Apple and Microsoft, sell vastly different types of products and services, but they all have one very interesting thing in common: they convince users to part with their data in exchange for free services.


More than ever, the quantity of data being generated each second in the enterprise is a driving force for businesses that want to unearth market insights in order to gain a competitive edge. Additionally, the exponential growth of user-generated content, such as social media, encourages this push even more forcefully.

Want extra insights? Download our free “The ugly truth about poor quality data and what to look for in good data” fact sheet!

In 2017, Forbes, in conjunction with Dun & Bradstreet, undertook a survey to explore the state of adoption of data analytics initiatives. More than 300 executives in North America took part, and some of the key findings included:

  • Data analytics skills gaps persist across the enterprise, as 27% of analytics professionals surveyed cite this skills gap as a major impediment in their data initiatives.
  • Data analytics has moved from IT and finance to core business functions.
  • Today’s data-driven enterprise has a never-ending appetite for more and more data.
  • Analytical methods and tools often lag behind the needs and ambitions of most business leaders: 23% of analytics professionals are still using spreadsheets as their primary tool for data analysis.

Source: Forbes

The quality of your data fuels the analytics activities within your business, and as the old saying goes: garbage in, garbage out! This risk can be mitigated by asking the right questions.

Ask the right questions

Ensuring you get quality data often starts with asking the right questions, and the right questions depend on the domain you’re operating in.

For example, if you wish to perform sentiment analysis of social data, you might start by asking yourself: what would be an acceptable “success” rate for determining the sentiment of Twitter data? In a case like this, 70-80% accuracy can be acceptable. In other contexts, such as healthcare or finance, working from datasets that are only 70-80% accurate is more than likely not suitable.
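To make the idea concrete, here is a minimal sketch of this kind of threshold check in Python. The labels, predictions and 75% threshold are purely hypothetical placeholders for whatever success criterion your domain demands.

```python
# A hedged sketch: compare a sentiment model's predictions against a
# hand-labeled sample and check them against an agreed accuracy threshold.
# All data and the threshold below are illustrative placeholders.

def accuracy(predicted, actual):
    """Fraction of predictions matching the hand-labeled ground truth."""
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / len(actual)

# Hypothetical hand-labeled tweets and model output.
actual_labels = ["positive", "negative", "neutral", "positive", "negative"]
predicted_labels = ["positive", "negative", "positive", "positive", "negative"]

THRESHOLD = 0.75  # acceptable "success" rate agreed with stakeholders

score = accuracy(predicted_labels, actual_labels)
print(f"Accuracy: {score:.0%}")  # 80% in this toy sample
if score < THRESHOLD:
    print("Below the agreed threshold -- revisit the model or the data.")
```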

Other points to consider are:

  • What is the business driver for mining your existing datasets?  Are you trying to find insights that help boost profits or looking for ways to increase customer satisfaction or drive down employee attrition?
  • Can you define a KPI or unit of measure that will help to define “success”?
  • Are you looking for patterns? If you find them, what will that tell you?

Taking time to consider these questions will lay the foundations for you to then start thinking about the qualities your data needs to have.

Key Attributes of Quality Data

Not all data is equal! Some datasets are of higher quality than others, and there are specific attributes that tend to exist in such datasets. You can look for some of these attributes to help guide you during your analysis and data selection activities.

Accuracy

Has the data been verified for accuracy before being integrated into your decision-making? Inaccurate data can have serious and sometimes life-threatening consequences. Quality data is accurate and leaves little room for ambiguity.

Relevance

Is the data relevant enough to your problem domain to justify the time spent analyzing it or feeding it into machine learning models? Irrelevant data can result in inaccurate forecasts, whereas quality data is relevant to the task at hand.

Validity

Are boundaries in place for specific fields in your datasets? For example, fields that represent items such as gender, nationality and so on are often limited to a prescribed number of options. Having programmatic field validation to enforce these types of constraints helps ensure quality data as an outcome.
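As an illustration, here is a minimal sketch of programmatic field validation in Python. The field names and allowed values are hypothetical stand-ins for whatever constraints your schema prescribes.

```python
# A hedged sketch of validating fields against a prescribed set of options.
# The fields and allowed values below are illustrative, not a real schema.

ALLOWED_VALUES = {
    "gender": {"female", "male", "nonbinary", "undisclosed"},
    "nationality": {"CO", "CR", "MX", "US"},  # e.g. ISO country codes
}

def validate_record(record: dict) -> list:
    """Return a list of validation errors; an empty list means valid."""
    errors = []
    for field, allowed in ALLOWED_VALUES.items():
        value = record.get(field)
        if value not in allowed:
            errors.append(f"{field}: {value!r} is not one of {sorted(allowed)}")
    return errors

print(validate_record({"gender": "female", "nationality": "XX"}))
# ["nationality: 'XX' is not one of ['CO', 'CR', 'MX', 'US']"]
```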

Availability

How easy is it to get access to the data you need? In some instances, refreshing your datasets from the source can mean navigating legal and regulatory guidelines, which can hinder the data collection or refresh process. Quality datastores have high availability and, in some instances, offer APIs or web services that you can use to extract data of interest over protocols such as REST. Typically, the more current the data, the higher its quality.
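For instance, pulling data over REST might look like the following sketch, which uses Python’s `requests` library. The endpoint, query parameter and response shape are hypothetical; your provider’s actual API will differ.

```python
# A hedged sketch of extracting data from a hypothetical REST endpoint.
import requests

def fetch_records(base_url: str, since: str) -> list:
    """Fetch records updated since a given date, raising on HTTP errors."""
    response = requests.get(
        f"{base_url}/records",            # hypothetical resource path
        params={"updated_since": since},  # hypothetical query parameter
        timeout=30,
    )
    response.raise_for_status()  # surface availability problems early
    return response.json()

records = fetch_records("https://api.example.com/v1", since="2024-01-01")
print(f"Fetched {len(records)} records")
```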

Completeness

Does the dataset contain all the elements you need to complete your task or run through a specific business process? If not, you might have to introduce another dataset to get a more holistic view of the landscape. Quality data, on the other hand, will contain the information you need, saving you from having to cobble together datasets.
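A quick completeness check is easy to automate. The sketch below uses pandas to report the share of non-null values per column; the column names, sample data and 95% threshold are illustrative only.

```python
# A hedged sketch of a completeness check with pandas.
import pandas as pd

# Illustrative sample with deliberate gaps.
df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "email": ["a@example.com", None, "c@example.com", None],
    "region": ["LATAM", "LATAM", None, "NA"],
})

REQUIRED_COMPLETENESS = 0.95  # illustrative threshold

completeness = df.notna().mean()  # fraction of non-null values per column
for column, fraction in completeness.items():
    if fraction < REQUIRED_COMPLETENESS:
        print(f"Column {column!r} is only {fraction:.0%} complete")
```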

These are just some of the attributes you should look for when selecting datasets to ensure that you’re dealing with quality data, which brings us to the next point: how can you ensure you get quality data?


Ensuring you get quality data

In addition to looking for specific attributes in potential datasets, there are some other things you can consider to maximize the chance of getting quality data to feed into your analytics practice:

Who has access to the source information?

The more stakeholders involved with a dataset or schema, the more likely it is to be subject to change, which can mean a constantly moving target. You can mitigate this somewhat by determining who has access upfront and planning around such changes.

What happens when the information has been pulled from the source?

Does it get refreshed? Is it taken offline? Consider this to ensure you know whether additional processing needs to take place after you’ve extracted the data. For example, selecting a dataset from a source may trigger a server-side refresh, meaning you must undertake a daily or even more frequent refresh of your own.


Is the data in transit?

Email? FTP? Even a flash drive? Where is the data saved? Data that moves around can be trickier to handle, so try to work with datasets that are relatively static in their point of origin to help ensure you get consistent, quality data.

These are just some points to consider, and there are more. In a nutshell, try to work with static endpoints or databases that allow you to extract information in a predictable manner. If you’re operating in a dynamic environment, look for formal change management and release procedures that reduce the likelihood of receiving data structures that don’t sync up with interfaces at “your end”.

Summary

In this blog post, we’ve looked at data quality and how it forms the core of a strong analytics practice. We’ve also explored some of the key attributes you can expect to find in quality data, along with some points to consider to ensure you get quality data.


Here at Growth Acceleration Partners, we have extensive expertise in many verticals.  Our nearshore business model can keep costs down whilst maintaining the same level of quality and professionalism you’d experience from a domestic team.

Our Centers of Engineering Excellence in Latin America focus on combining business acumen with development expertise to help your business.  We can provide your organization with resources in the following areas:

  • Software development for cloud and mobile applications
  • Data analytics and data science
  • Information systems
  • Machine learning and artificial intelligence
  • Predictive modeling
  • QA and QA Automation

If you’d like to find out more, then visit our website here.  Or if you’d prefer, why not arrange a call with us?