What do pilots, electrical grid operators, and data center managers have in common? They each control equipment that cannot be allowed to fail. A pilot error, a large-scale blackout, or a financial data center outage carries costs that are difficult to even quantify. It is their job to ensure their equipment is operating reliably at all times. Given that focus on reliability, it is understandable why, at least in the case of data centers, little attention has been paid to making better use of the data collected from these systems. This post examines how each type of operator manages critical equipment to avoid failure today, and introduces the idea that more intelligent control using big data can improve both safety and efficiency.
Extensive monitoring and real-time data feedback provide operators with the information they need to track equipment performance and make control decisions. In each case, there may be a thousand points of failure – all of which need to be accounted for and responded to should something go awry. This monitoring does help operators make good decisions. But failures still happen. One reason is that this approach is inherently reactive: operators watch until something goes wrong and then respond to fix the situation. What if control decisions weren’t based only on current data, but on analysis of historical data applied to current conditions to identify potential problems before they occur? A simple example will help explain the point:
Walk over to the roulette wheel in almost any casino and you’ll see a board showing the results of the last 8 spins. Studies show that if the last 8 spins have all been black, most people will bet red. Since a red has to come up at some point, your odds of landing a red are better, right? Wrong. On an American wheel (18 red, 18 black, and 2 green pockets), the odds of landing red are about 47.4%, the odds of landing black are about 47.4%, and the odds of landing green are about 5.3%…on every spin. We all fall into this poor-decision trap because we are misled by “current” information: our perception of the recent past leads us to make incorrect predictions about future events. If we analyzed the last 100,000 roulette outcomes, the data would clearly show that the odds of spinning black and red are identical. With a larger set of data, we can more accurately predict future outcomes. Pilots, electrical plant operators, and data center managers make the same kind of mistake, except the result is losing far more than a few casino chips.
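The claim is easy to verify with a quick simulation. The sketch below (pocket counts for a standard American double-zero wheel; the streak length of 4 is an illustrative choice) shows that the color frequencies match the stated odds, and that the frequency of red immediately after a run of blacks is no different:

```python
import random

# Simulate an American roulette wheel: 18 red, 18 black, 2 green pockets.
random.seed(42)
POCKETS = ["red"] * 18 + ["black"] * 18 + ["green"] * 2

N = 100_000
spins = [random.choice(POCKETS) for _ in range(N)]

# Overall frequencies over many spins.
print("red:  ", spins.count("red") / N)    # ~0.474
print("black:", spins.count("black") / N)  # ~0.474
print("green:", spins.count("green") / N)  # ~0.053

# Frequency of red immediately after a run of 4 blacks:
# still ~0.474, because each spin is independent of the last.
after_streak = [spins[i] for i in range(4, N)
                if all(s == "black" for s in spins[i - 4:i])]
print("red after 4 blacks:", after_streak.count("red") / len(after_streak))
```

Each spin is independent, so conditioning on a streak changes nothing – which is exactly why betting on “current” information fails.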
The important takeaway is that a large body of historical data can help predict the outcome of control decisions. If we could see the results of the past 10 million roulette spins, we would be even more confident in how to bet. Operators are often forced to make a decision under a set of conditions that may or may not be familiar to them. Historical data that surfaces how similar conditions played out before – even for rare events – would help operators understand the likely outcomes of their decisions before they make them. That understanding supports more intelligent control decisions and, ultimately, can prevent catastrophic outages.
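One simple way to surface “what happened last time under these conditions” is a nearest-neighbor lookup over historical condition vectors. The sketch below is purely illustrative – the sensor fields, values, and outcomes are hypothetical, not from any real system:

```python
import math

# Hypothetical history of (condition_vector, outcome) pairs.
# Fields: (inlet_temp_C, load_pct, fan_rpm) -> what happened next.
history = [
    ((24.0, 55.0, 1200.0), "stable"),
    ((31.5, 88.0, 2100.0), "overheat alarm"),
    ((25.5, 60.0, 1300.0), "stable"),
    ((30.8, 85.0, 2000.0), "overheat alarm"),
]

def distance(a, b):
    """Euclidean distance between two condition vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similar_outcomes(current, k=2):
    """Return the outcomes of the k most similar historical states."""
    ranked = sorted(history, key=lambda rec: distance(rec[0], current))
    return [outcome for _, outcome in ranked[:k]]

# Current conditions resemble the two states that ended in alarms.
print(similar_outcomes((31.0, 86.0, 2050.0)))
```

In a real deployment the features would need to be normalized first (here `fan_rpm` dominates the raw distance) and the history would span millions of records across many sites, but the principle is the same: let past outcomes inform the decision before it is made.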
Data – lots of data – provides useful context for proactively maintaining uptime and avoiding outages. Operators of critical equipment are taught how to respond when a failure occurs. However, with time-series data analyzed in real time and spanning a wide variety of locations and deployments, operators and control systems now have the intelligence to prevent incidents before they occur. It is time we start using the data and the intelligent analytics that are available.
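To make “prevent incidents before they occur” concrete, here is a minimal sketch of one common approach: flag a reading that drifts far from its recent rolling baseline before it ever crosses a hard failure threshold. The window size and z-score cutoff are illustrative assumptions, not tuned values:

```python
from collections import deque
import statistics

class DriftDetector:
    """Flag readings that deviate sharply from the recent rolling baseline."""

    def __init__(self, window=20, z_cutoff=3.0):
        self.readings = deque(maxlen=window)
        self.z_cutoff = z_cutoff

    def update(self, value):
        """Record a reading; return True if it is anomalous vs. recent history."""
        if len(self.readings) >= 5:
            mean = statistics.mean(self.readings)
            stdev = statistics.pstdev(self.readings) or 1e-9
            anomalous = abs(value - mean) / stdev > self.z_cutoff
        else:
            anomalous = False  # not enough history to judge yet
        self.readings.append(value)
        return anomalous

detector = DriftDetector()
normal = [22.0, 22.1, 21.9, 22.0, 22.2, 21.8, 22.1, 22.0]
flags = [detector.update(v) for v in normal]
print(any(flags))             # steady readings are not flagged
print(detector.update(30.0))  # a sudden jump is flagged before any hard limit
```

Production systems would use richer models trained on fleet-wide history rather than a single rolling window, but the shift in posture is the same: act on the trend, not on the alarm.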