Fast and big is the recipe to make machine learning a valuable resource

Stefaan Vervaet and Vivek Tyagi

The digital transformation era is upon us: data has become a modern-day currency whose value is unlocked by analysis. Analysis uncovers trends, patterns and associations that deliver valuable insights, new connections and precise predictions that can help businesses achieve better outcomes. Data is no longer generated only by traditional computer applications; it now comes from mobile and IoT devices, connected and autonomous vehicles, machine sensors, healthcare monitors and wearables, video surveillance, and the list is endless. It is no longer just about storing data, but about capturing, preserving, accessing and transforming it to take advantage of the value and intelligence it can deliver.

Modern businesses are realising the value of big data especially when paired with intelligent automation models fueled by machine learning (ML). These models are moving from the core data center right to the edge device itself where intelligent decision-making and time-to-action are driving better customer experiences and operational efficiencies. The goal is simple: make faster and more predictive decisions.

Typical ML models include 'trained' neural networks capable of executing a specific task, such as identifying and tagging all faces in an image or video sequence, or moving insights from 'what happened' to 'what will likely happen' (predictive analysis). What they all have in common is the challenge of big data. These models handle complex situations and are never truly finished; rather, they improve over time as more and more data points are captured, providing the fuel to train them.
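
To make the idea of a task-specific trained model concrete, here is a minimal sketch that tags faces in a single image using OpenCV's bundled pre-trained face detector. The image file name is a hypothetical placeholder, and this is only an illustration, not any particular production pipeline.

```python
# Minimal sketch: a pre-trained detector executing one specific task
# (tagging all faces in an image). "street_scene.jpg" is a placeholder.
import cv2

# Load the pre-trained frontal-face Haar cascade shipped with OpenCV.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

image = cv2.imread("street_scene.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Each detection is a bounding box (x, y, width, height) around one face.
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for i, (x, y, w, h) in enumerate(faces):
    print(f"face {i}: x={x}, y={y}, w={w}, h={h}")
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("street_scene_tagged.jpg", image)
```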

Businesses that use machine learning are gaining valuable insights, and this trend is only accelerating. The global ML market is expected to grow to $8.81 billion in revenue by 2022, at a 44.1 percent CAGR. The Indian government has also placed greater priority on new-generation technologies such as the Internet of Things, artificial intelligence (AI) and machine learning (ML). All of these developments have one basic requisite: data storage. Better business decisions depend on analysing humongous volumes of data. Businesses are rethinking their data strategies, not only to increase their competitiveness, but also to create infrastructures that enable data to live forever. There are steps that businesses can take to increase the value of their data.

Restore & Activate Your Tape Archives

To analyze big data, it must be captured and stored on reliable and accessible media, not only for immediate access but also to ensure it is of the highest possible integrity and accuracy. Old data, when aggregated with new data, may actually be the most valuable data for machine learning, since an abundance of stored data is needed to successfully run and train analytic models and to validate their reliability for production purposes. Archiving data is not about recovering datasets, but rather about preserving them and being able to access them easily using search and index techniques.
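
As a rough illustration of the 'search and index' idea, the sketch below catalogues archived files in a small SQLite index along with a checksum, so a dataset can later be located by name and its integrity verified. The file path and table name are hypothetical placeholders.

```python
# Minimal sketch: a searchable catalogue of archived datasets.
import hashlib
import sqlite3
from pathlib import Path

catalogue = sqlite3.connect("archive_catalogue.db")
catalogue.execute(
    """CREATE TABLE IF NOT EXISTS archive_index (
           name TEXT, location TEXT, sha256 TEXT, size_bytes INTEGER
       )"""
)

def register(path: str) -> None:
    """Add one archived file to the catalogue with its checksum."""
    data = Path(path).read_bytes()
    catalogue.execute(
        "INSERT INTO archive_index VALUES (?, ?, ?, ?)",
        (Path(path).name, path, hashlib.sha256(data).hexdigest(), len(data)),
    )
    catalogue.commit()

register("/archives/2017/sensor_logs_q3.parquet")  # hypothetical path

# Later: find archived datasets by name instead of hunting through tapes,
# and compare the stored checksum to verify integrity before analysis.
rows = catalogue.execute(
    "SELECT name, location, sha256 FROM archive_index WHERE name LIKE ?",
    ("%sensor%",),
).fetchall()
print(rows)
```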

Traditional tape storage methods are ineffective: tape deteriorates over time, can be difficult to find, and extracting data from it may require legacy equipment that lacks operating instructions. This does not mean that tape should be eliminated, because it remains viable for backup, but it does require that multiple copies be created in case data restoration or tape degradation issues occur. At least one copy of the data must be on online media if analysis is required.

Consolidate Your Storage

Value extraction improves over time as more data points are collected, and the true value emerges when different data assets from a variety of sources are correlated with one another. In a connected car, for example, combining video footage from outside cameras with engine diagnostics or radar information could deliver a better driving experience and even save lives.
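
As a simple sketch of correlating two such data assets, the example below aligns hypothetical camera-derived events with engine diagnostics by timestamp using pandas. The column names and values are illustrative only.

```python
# Minimal sketch: join two connected-car data sources on nearby timestamps.
import pandas as pd

camera_events = pd.DataFrame({
    "timestamp": pd.to_datetime(["2018-09-01 10:00:02", "2018-09-01 10:00:07"]),
    "detected_object": ["pedestrian", "cyclist"],
})

engine_diagnostics = pd.DataFrame({
    "timestamp": pd.to_datetime(["2018-09-01 10:00:01", "2018-09-01 10:00:06"]),
    "speed_kmh": [42, 35],
    "brake_pressure": [0.1, 0.8],
})

# Align each camera event with the nearest diagnostic reading (within 2 s),
# so the two sources can be analysed as one correlated dataset.
combined = pd.merge_asof(
    camera_events.sort_values("timestamp"),
    engine_diagnostics.sort_values("timestamp"),
    on="timestamp",
    direction="nearest",
    tolerance=pd.Timedelta("2s"),
)
print(combined)
```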

Correlating these new data formats streaming into the data center is quite a challenge for storage architects. It is not just the sheer capacity of data that is challenging, but also the disparate data formats and the set of applications that need to access them. As the industry moves from a monolithic hardware infrastructure to a disaggregated world, a similar transition is occurring for applications, with legacy apps being rewritten as new API-based applications. Having native cloud APIs (such as Amazon S3) available on-premises becomes very important in supporting application developers looking for portability and speed to innovate.

As such, there is no surprise that businesses are focusing on consolidating their assets into a single scale-out storage architecture that supports petabyte-scale and the ability to connect multiple applications to the same data at the same time. On-premises object storage or cloud storage systems serve a great purpose for these environments as they are designed to scale and support custom data formats.
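
To illustrate what an S3-compatible, on-premises object store looks like to an application developer, here is a minimal sketch using the standard boto3 S3 client pointed at a local endpoint; the endpoint URL, bucket name, object keys and credentials are hypothetical placeholders.

```python
# Minimal sketch: the same S3 client code can target an on-premises,
# S3-compatible object store just by changing the endpoint URL.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://objectstore.example.internal",  # placeholder on-prem endpoint
    aws_access_key_id="ACCESS_KEY",                        # placeholder credentials
    aws_secret_access_key="SECRET_KEY",
)

# Ingest a new object; any S3-aware application can read it back the same way.
with open("frame_0001.jpg", "rb") as f:
    s3.put_object(Bucket="ml-training-data",
                  Key="vehicles/cam01/frame_0001.jpg",
                  Body=f)

# List what has been consolidated into the shared bucket.
for obj in s3.list_objects_v2(Bucket="ml-training-data").get("Contents", []):
    print(obj["Key"], obj["Size"])
```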

Focus on Metadata

Obtaining value from big data is really about the metadata. Metadata extraction and the discovered correlations between those metadata insights are the foundation of ML models. Once a model is sufficiently trained, it can be put into production and help deliver faster decisions in the field, whether at the edge (such as a car) or in the cloud (such as a web-scale application).
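
As a loose sketch of the metadata-to-model idea, the example below trains a small classifier on hypothetical per-trip metadata features and then reuses it for an immediate prediction, the way a sufficiently trained model might be put into production. Every feature, label and value here is illustrative, not from the article.

```python
# Minimal sketch: a classifier trained on extracted metadata features.
from sklearn.ensemble import RandomForestClassifier

# Metadata extracted from raw data: [avg_speed_kmh, hard_brake_events, night_trip]
X_train = [
    [38, 0, 0],
    [72, 4, 1],
    [45, 1, 0],
    [80, 6, 1],
]
y_train = [0, 1, 0, 1]  # 0 = routine trip, 1 = incident-prone trip (hypothetical labels)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# "In production": new metadata arrives and a decision comes back immediately.
new_trip = [[65, 3, 1]]
print(model.predict(new_trip))
```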

Across industries and business processes, chief data officers (CDOs), data scientists and analysts are playing a more prominent role in helping to map the statistical significance of key problems and to translate the analysis quickly into business action.

Final thoughts

Businesses require a lot of old and new data to successfully run and train analytic models that provide valuable insights and golden nuggets of intelligence, and that are reliable enough to be deployed in a production environment. When we say big, we are talking about petabytes of capacity (not terabytes), with the ability to scale up at a moment's notice and the flexibility to ingest both file- and object-based data. When we say fast, we are not only talking about accelerating data capture at the edge but also about accelerating usage in the core by quickly feeding the GPUs that analyze and train the models. Therefore, for machine learning to be a valuable resource, to go fast, you have to go big first!

Stefaan Vervaet is senior director, Partner Alliance Engineering, Data Center Systems, Western Digital Corporation, and Vivek Tyagi is director of Business Development, Embedded & Enterprise, Western Digital India.
