The Full Data Journey from Ingestion to Insights

Remember when cloud was acclaimed as the simple solution to disentangling legacy IT infrastructure? Nostalgia isn’t what it used to be.

Amazon Web Services (AWS), at last count, has more than 225 products to choose from, ranging from classic compute, database and storage, through containers and application integration, to IoT, machine learning and blockchain. Want any help? You can thumb through the Well-Architected Framework, but it might take a while. Despite the training and certification options available from providers, the cloud skills gap persists, and for good reason. What can you do?

The answer, increasingly, lies in infrastructure automation. Gartner noted in its 2021 I&O leaders survey that 80% of respondents consider automation a top tactic for achieving cost optimization, yet many organizations don’t know where to start.

Public clouds are essentially ecosystems of products and services. Like a patchwork quilt, these services are ‘stitched’ together for your application or service. But who is doing the stitching? This is the overarching concept behind Red Hat OpenShift, which delivers the benefits of containerization (increased portability and more consistent performance) while taking away the complexity.

Even when you just want to get an application up and running, it’s complicated; there is no way around it. Take an ‘industry-standard’ data platform as an example, in three stages: ingest, transform, and analyze. You have the data sources, from customer data to CRM data to mobile apps. You have the ingestion tools: a SaaS tool, a streaming app and, using the AWS example, something like AWS Database Migration Service (DMS). The ingested data goes into a landing zone, where it then needs to be cleaned and curated. The curated files are stored in a format, such as Apache Parquet, that works with machine learning and data visualization tools used by data analysts and line-of-business users.
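The three stages described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not GAP’s or AWS’s implementation: the `Record` type, the field names (`source`, `customer_id`, `amount`) and the function names are all hypothetical, the landing zone is just an in-memory list, and writing the curated data out as Parquet is left out to keep the sketch dependency-free.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Record:
    """A curated row. All field names here are hypothetical."""
    source: str        # e.g. "crm" or "mobile_app"
    customer_id: str
    amount: float


def ingest(raw_rows):
    """Ingest: copy raw rows from every source into a landing zone, untouched."""
    return list(raw_rows)


def transform(landing_zone):
    """Transform: clean and curate, dropping rows that cannot be used."""
    curated = []
    for row in landing_zone:
        customer_id = str(row.get("customer_id", "")).strip()
        try:
            amount = float(row.get("amount"))
        except (TypeError, ValueError):
            continue  # skip rows whose amount cannot be parsed
        if not customer_id:
            continue  # skip rows with no customer identifier
        curated.append(Record(row.get("source", "unknown"), customer_id, amount))
    return curated


def analyze(curated):
    """Analyze: a stand-in 'insight', the total amount per customer."""
    totals = defaultdict(float)
    for record in curated:
        totals[record.customer_id] += record.amount
    return dict(totals)


def run_pipeline(raw_rows):
    """Chain the three stages: ingest, then transform, then analyze."""
    return analyze(transform(ingest(raw_rows)))
```

In a real platform each function would be a separate managed service, and the complexity the article describes lies mostly in provisioning and wiring those services together rather than in the logic itself.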

Most companies don’t need to reinvent the wheel, as Dave Moore, former chief innovation officer at U.S.-based strategic technology solutions provider Growth Acceleration Partners (GAP), explains.

“The response usually is ‘I’ve been wondering why AWS hasn’t done this,’” says Moore. “Instead, they just provide a big pool of all these components and say knock yourself out, go ahead and stitch it together how you think it should.

“Our stance is ‘I think we know what you’re trying to build, and if I were building it, here’s the way I would do it’ – and if you run this command, it’ll deploy and be ready to run in an hour,” adds Moore. “It kind of avoids those religious wars about ‘hey, you should use this technology or that technology.’”

Containers are just one example of infrastructure automation, defined most simply as the use of technology that performs tasks with reduced human assistance. Another is infrastructure as code (IaC), with tools such as Terraform, which codifies and manages underlying IT infrastructure as software. Terraform holds almost 37% of market share compared to 31% for Ansible, according to Slintel data, and the growth markers are clear: KBV Research projects a 21.9% CAGR for IaC between now and 2028.
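To make ‘infrastructure as software’ concrete, here is a minimal Python sketch of the declarative idea behind IaC tools: you describe the state you want, and the tool works out the steps needed to get there. It loosely mirrors a plan-then-apply workflow but is not how Terraform is implemented; the resource names, the dictionaries and the `plan` and `apply` functions are assumptions made purely for illustration, and nothing here talks to a real cloud.

```python
def plan(desired, current):
    """Return the (action, name) steps that move `current` to `desired`.

    Both arguments map a hypothetical resource name to its settings.
    """
    steps = []
    for name in sorted(desired):
        if name not in current:
            steps.append(("create", name))
        elif current[name] != desired[name]:
            steps.append(("update", name))
    for name in sorted(current):
        if name not in desired:
            steps.append(("delete", name))
    return steps


def apply(desired, current):
    """Apply the plan to an in-memory copy of the state and return it."""
    new_state = dict(current)
    for action, name in plan(desired, current):
        if action == "delete":
            del new_state[name]
        else:
            # "create" and "update" both end with the desired settings
            new_state[name] = desired[name]
    return new_state


if __name__ == "__main__":
    desired = {"bucket": {"versioning": True}, "queue": {"fifo": False}}
    current = {"bucket": {"versioning": False}, "vm": {"size": "small"}}
    for action, name in plan(desired, current):
        print(action, name)
```

Because the desired state lives in a file, it can be reviewed, versioned and re-applied like any other code, and running the plan a second time after a successful apply yields no further steps.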

GAP considers a particular application a ‘scenario’ and has implemented IaC for two specific scenarios. The first is what Moore refers to as a modern data and analytics platform – taking in the full journey from data ingestion to insights as mentioned above. The second is a highly scalable web app based on a microservices model and a serverless implementation. Both are implemented in the cloud-native technologies of each cloud provider and follow their Well-Architected Frameworks. From GAP’s perspective, it plays it by the book, but much faster and, arguably, much better.

Moore notes that the first part of the puzzle is knowing ‘infrastructure is complicated to get right.’ The second part is knowing ‘humans are the most expensive thing in the spreadsheet.’

IaC helps the former to a point, but not so much the latter. DevOps engineer was the most recruited job on LinkedIn in 2018. Understanding the hardcore infrastructure around containers and orchestration, Moore says, “is being perceived – and rightfully so, in my opinion – as a more valuable skill than having someone that can just come in and write some Java for you that saves data to the database.” Nearshoring talent from places such as Latin America, a model which GAP successfully utilizes, is one way of solving this problem.

But is there an opportunity to look beyond infrastructure as code? In the influential book Terraform: Up & Running, by Yevgeniy Brikman, there is a chapter dedicated to resilient, scalable, production-grade infrastructure and, crucially, production-grade infrastructure modules.

The checklist for production-grade infrastructure amounts to 15 tasks, extending well beyond the usual install, configure and provision to include cost optimization, documentation and tests. Brikman’s advice, when working on a new module, is to explicitly identify which checklist items will be implemented and which will be skipped, so that you can give your manager a more realistic time estimate.

Moore, referencing Brikman, puts it this way. “If you ask a team how long it takes them to get something running, if it’s a very small thing, they’ll say it’ll take us two weeks – but it’ll take them six,” he says. “But to do something that is an enterprise-level application, or a publicly-facing application, the infrastructure on that is at least eight to 12 months – and that’s for those who know what they’re doing.”

The truth is that this stuff is very complicated even if you are cloud-savvy. GAP offers a variety of transformation services, from public cloud advisory to architectural consulting, so Moore is used to the pain points.

“If [the customers] can live with points of view that we’ve implemented – and so far, we’re batting 1000 on that – then we’re good,” says Moore. “There may be a little bit of customization required, but most of them have already been burned on the customization train. They [want to] focus on the application itself, not all the infrastructure components that are required to make it work.

“There’s a lot to be said for a working system and working application out of the gate.”

Oct. 25, 2022

Written by James Bourne

Reprinted with permission from TechForge Media and Developer Tech magazine

Infrastructure automation: The full data journey from ingestion to insights
