What comes to mind when you think of the Internet of Things (IoT)? OK, fair enough. “When will all the hype stop?” is a valid first thought. But for IT professionals, the first thing that increasingly comes to mind is: that’s a lot of data!
Consider these examples to understand why:
- Pratt & Whitney’s Geared Turbo Fan engine is fitted with 5,000 sensors that generate up to 10 GB of data per second.
- New York utility Consolidated Edison is promising that its $1.3 billion smart meter rollout will give customers more fine-grained and timely data on their energy use, with residential electric meter reads every 15 minutes.
- GE, which has been a strong proponent of the industrial IoT, cites predictions that 50 billion devices will connect to the Internet by 2020.
That’s a lot of data. So where will it all go? Where will it be stored and protected? Let’s tackle these questions and offer some advice on how software-defined storage can tame the IoT data deluge.
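To put those figures in perspective, here is a rough back-of-envelope calculation. The 10 GB/s rate and 15-minute read interval come from the examples above; the flight duration, meter count, and record size are illustrative assumptions.

```python
# Back-of-envelope IoT data volumes, using figures cited above.
# The 2-hour flight, 3 million meters, and 100-byte reads are
# illustrative assumptions, not vendor figures.

GB = 1024 ** 3

# Pratt & Whitney GTF: up to 10 GB of sensor data per second.
engine_rate_gb_s = 10
flight_seconds = 2 * 60 * 60            # assume a 2-hour flight
per_flight_gb = engine_rate_gb_s * flight_seconds
print(f"One engine, one 2-hour flight: {per_flight_gb / 1024:.1f} TB")

# ConEd smart meters: one read every 15 minutes per meter.
reads_per_meter_per_day = 24 * 60 // 15  # 96 reads/day
meters = 3_000_000                       # assumed meter count
bytes_per_read = 100                     # assumed record size
daily_gb = meters * reads_per_meter_per_day * bytes_per_read / GB
print(f"Smart meter reads per day: ~{daily_gb:.1f} GB")
```

Even with conservative assumptions, a single engine produces tens of terabytes per flight, which is why capacity planning is the first question IT has to answer.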
IoT: The Data Stream that Keeps on Giving
One thing we talk about a lot at Hedvig with our customers and prospects is just how difficult it is to predict and plan storage capacity. Traditionally, companies try to purchase the hardware and software they estimate they’ll need three to five years out. That’s hard enough on its own. Now imagine how much harder it becomes when faced with a completely unpredictable IoT data stream.
Only in relatively few instances can organizations accurately ballpark their expected storage requirements, and only when there’s a well-defined, specific use case. To that end, it’s helpful to think of IoT technical challenges as twofold: first order and second order. The first-order challenges are the network connectivity, processing power, battery technology, and storage needed to ensure IoT devices and the resulting data can be effectively gathered, processed, and stored. The second-order challenges include data protection, security, privacy, compliance, and others.
As an example, LKAB, a large mining customer of ours in Sweden, uses geophone sensors to help with their mining operations. They didn’t struggle with sizing the first-order infrastructure needed to support this initiative, but found it difficult to size the second-order backup storage. As many of us know, not properly sizing your backup is a well-known pitfall in IT, but with IoT and its own storage implications, it can be doubly difficult. How much sensor data needs to be backed up? How critical is it to the business and what level of protection does it require? How often will it be accessed?
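The sizing questions above can be reduced to a simple model: daily ingest, the fraction worth backing up, retention, copies, and deduplication. A minimal sketch, with every figure a hypothetical planning input rather than LKAB data:

```python
# A minimal sketch of second-order (backup) sizing for sensor data.
# Every figure here is a hypothetical planning input, not LKAB data.

def backup_capacity_tb(ingest_gb_per_day: float,
                       backup_fraction: float,
                       retention_days: int,
                       copies: int = 2,
                       dedupe_ratio: float = 3.0) -> float:
    """Estimate required backup capacity in TB from daily ingest."""
    raw = ingest_gb_per_day * backup_fraction * retention_days * copies
    return raw / dedupe_ratio / 1024

# Example: 500 GB/day of geophone data, back up 40% of it,
# keep 90 days, 2 copies, assume 3:1 dedupe.
print(f"{backup_capacity_tb(500, 0.4, 90, 2, 3.0):.1f} TB")
```

The point is less the arithmetic than the inputs: each parameter maps to one of the questions above (how much to back up, how critical, how long to keep it), and getting any one of them wrong skews the whole estimate.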
While LKAB was able to accurately gauge its first-order infrastructure requirements, in other industries that would be nigh impossible. Consider the tennis racket manufacturer Babolat, which makes rackets with sensors embedded in them to give players feedback on their technique, speed, stroke and other metrics. Imagine if the company offered an option to store all of its customer data in the cloud. How many rackets might the company sell? In what countries? How many swings does an average player take in an average match? What’s an average match, for that matter? IT’s job of predicting business needs just keeps getting harder.
IoT Data Storage Considerations and Recommendations
When our CEO Avinash Lakshman founded Hedvig in 2012, it was just starting to dawn on us that IoT was indeed real and that the corresponding explosion of data was equally real. He also knew from his experience at Amazon and Facebook that the only way to tame exponential data growth was with a new, distributed systems approach. Hence the Hedvig Distributed Storage Platform.
With more enterprises – large and small – embracing data as an entirely new revenue stream and as a way to enhance existing business operations, here are three things to consider if you’re evaluating to what extent you should embrace IoT:
- IoT data is valuable to the business. IoT, like big data, is often a business-driven undertaking. The business has already decided that this IoT data is important – whether it’s sensor data, smartphone data, or other endpoint data – and that’s why it decided to deploy all of the connected devices, sensors and the like. The decision has been made there’s value to be harvested from the IoT data.
- Store all or none of your IoT data. It makes little sense for the business to say it values this IoT data and for IT to then agree to store only some of it. An enormous influx of data brings a commensurate increase in storage requirements: whatever storage architecture you ultimately choose will have to be sized significantly larger. As discussed, that means accurately sizing both your primary and secondary storage tiers.
- Analyze all of your IoT data. Once you have gathered, stored, and protected all of this data, you won’t know its true value until you analyze it. If you misjudge how much of this data you will store, then you’re implicitly limiting your ability to analyze that data. You also need to consider whether storage will become a bottleneck to any kind of analytics application. Hadoop, for example, has specific scaling and performance requirements depending on whether the IoT data is a large number of large files, or a large number of small files.
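The file-size point for Hadoop can be made concrete. HDFS’s NameNode keeps every file and block object in heap memory, commonly estimated at roughly 150 bytes per object, so the same volume of IoT data costs wildly different amounts of metadata depending on file size. A rough sketch, treating all figures as planning approximations:

```python
# Rough NameNode memory estimate illustrating HDFS's small-files
# problem. Uses the commonly cited ~150 bytes of NameNode heap per
# file/block object; treat all figures as rough approximations.

BYTES_PER_OBJECT = 150           # per file and per block, roughly
BLOCK_SIZE = 128 * 1024 ** 2     # default HDFS block size: 128 MB

def namenode_heap_gb(num_files: int, avg_file_bytes: int) -> float:
    blocks_per_file = max(1, -(-avg_file_bytes // BLOCK_SIZE))  # ceil
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT / 1024 ** 3

# Roughly 100 TB of IoT data, stored two ways:
small = namenode_heap_gb(num_files=1_000_000_000, avg_file_bytes=100 * 1024)
large = namenode_heap_gb(num_files=100_000, avg_file_bytes=1024 ** 3)
print(f"1B x 100 KB files: ~{small:.0f} GB of NameNode heap")
print(f"100K x 1 GB files: ~{large:.2f} GB of NameNode heap")
```

Hundreds of gigabytes of metadata versus a fraction of one, for the same payload: this is why the shape of your IoT data, not just its volume, drives analytics infrastructure sizing.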
In many ways it’s still early days for IoT data storage, so it’s hard to know exactly which best practices will emerge. That said, here are three general recommendations.
Recommendation #1: Design IoT storage for incremental scalability
For the vast majority of IoT implementations, our view is: don’t try to size your storage requirements at the outset. Instead, choose a product that can scale indefinitely: start small, then add more storage as the project changes and grows. For those of us already ensconced in the growing world of software-defined infrastructure, this may seem obvious. For others, IoT data may be the first killer use case for a software-defined storage pilot.
My colleague Chris provides a great illustration of the value of incremental scalability in this blog, walking through how Hedvig scales in lockstep with your needs.
Recommendation #2: Decide if IoT data storage (and backup) is a separate tier
Will IoT data reside in an existing tier of storage or will it be isolated in a dedicated tier? Historically, Tier 1 is for mission-critical, high-performance data; Tier 2 is for back-office applications and user data; and Tier 3 is usually backup and archive. But with software-defined storage, you gain far more flexibility. If you want to dedicate an entire cluster just to your IoT data, that’s fine. On the other hand, if you’d rather carve out a portion of it and apply specific policies and SLAs to it, that works, too. In software-defined storage, you simply roll in the right racks of commodity servers and set policies – significantly flattening the traditional tiering concept.
For example, you may choose to deploy higher-density disk drives on nodes with more moderate CPU and memory if you’re using software-defined storage to back up your IoT data. If you want the software-defined storage to be the primary storage, then simply deploy nodes with more cores and more RAM to drive higher IOPS. Looking to run Hadoop or other analytics? No problem, just scale out the cluster accordingly.
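The workload-to-hardware mapping just described can be sketched as a set of node profiles. The profile names, specs, and workload labels below are purely illustrative assumptions, not a Hedvig API:

```python
# Hypothetical node profiles for a software-defined storage cluster.
# Names, specs, and workload labels are illustrative only.

NODE_PROFILES = {
    "backup":    {"cores": 8,  "ram_gb": 64,  "drives": "12 x 16 TB HDD"},
    "primary":   {"cores": 32, "ram_gb": 256, "drives": "10 x 4 TB NVMe"},
    "analytics": {"cores": 48, "ram_gb": 512, "drives": "8 x 8 TB NVMe"},
}

def pick_profile(workload: str) -> dict:
    """Map a workload to the node profile it should scale out with."""
    mapping = {"iot_backup": "backup",
               "iot_primary": "primary",
               "hadoop": "analytics"}
    return NODE_PROFILES[mapping[workload]]

print(pick_profile("iot_backup"))
```

The design point: instead of buying a fixed tier, you pick a hardware profile per workload and scale each one independently within the same logical cluster.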
Recommendation #3: Pick a distributed storage platform to underpin IoT storage
Because of the distributed nature of IoT, it follows that the architecture and infrastructure you employ to support it should be distributed as well. That’s one of the defining characteristics of the software-defined infrastructure world: all of it is completely decoupled. You can change your storage and backup requirements easily as your needs evolve, spin up clusters and containers on demand, provision and allocate resources on the fly, and enforce policies and SLAs as required – it’s all elastic.
Where this really comes into play is in the nature of the IoT data. If it’s real-time sensor data, then you’ll need compute and storage closer to the sensor for immediate processing. This is an emerging trend for which Cisco has coined the term fog computing. If your software-defined storage is a true distributed system, then you can deploy nodes in a geographically dispersed environment – ensuring data locality for IoT sensors while still providing a single, virtualized storage pool.
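The locality idea above can be sketched in a few lines: sensors write to the node in their own region, yet all nodes belong to one logical pool. Region and node names here are hypothetical.

```python
# A minimal sketch of IoT data locality: prefer the storage node in
# the sensor's region, while all nodes form one logical pool.
# Regions and node names are hypothetical.

CLUSTER = {  # one virtual storage pool, geographically dispersed
    "eu-north": "node-stockholm",
    "us-east":  "node-virginia",
    "ap-south": "node-mumbai",
}

def local_node(sensor_region: str) -> str:
    """Prefer the node co-located with the sensor; fall back to any."""
    return CLUSTER.get(sensor_region, next(iter(CLUSTER.values())))

print(local_node("eu-north"))   # a local node exists
print(local_node("sa-east"))    # no local node: falls back
```

A real system would also replicate each write across regions for protection; the sketch shows only the placement decision that keeps processing close to the sensor.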
How Hedvig helps with IoT data storage
At the end of the day, the most important consideration is to ensure your IoT data storage strategy and implementation align with the business. If you’re just experimenting with IoT, then your needs are modest. But if the business goes all-in, you need to scale in lockstep with it. Hedvig is designed for incremental scalability, can help you expand or collapse your storage tiers as needed, and is built as a true distributed system. These three technical attributes will help you better align your IoT data storage with business IoT requirements.
If you’re interested in learning more, watch this short whiteboard video from Chris Kranz that illustrates how the platform works technically. Or simply click Learn More and we can pair you with the right Hedvig expert to answer any of your questions.