The issue every facility manager hates most is the hot spot. Hot spots are like a disease: you cannot see them, and you only become aware of them by observing their symptoms. And beware when you do. Hot spots that persist for extended periods can cause poor server performance and, in the most extreme cases, server failures. Since the number one job of a facility manager is to prevent server failure, hot spots need to be diagnosed and treated early. This post examines why hot spots exist and why the industry’s current solution for mitigating them may actually make them worse.
Consider the following example describing data center airflow using a raised floor layout. Figure 1 shows a simplified side view layout for a data center with a Computer Room Air Conditioner (CRAC) unit on the left and two rows of IT equipment on the right. The CRAC unit and IT equipment sit on a raised floor. The area between the IT equipment is the cold aisle. Perforated tiles are indicated by the dotted line in the cold aisle.
As IT equipment rejects heat into the space, hot air rises and creates a warm thermal plenum in the ceiling above the IT equipment. The CRAC unit pulls the warm air in from the top, cools it down, and blows it into the raised floor. The warm thermal plenum and the underfloor cold plenum are shown in Figure 2.
Figure 3 shows how you expect the airflow to behave. As the cold air is discharged through the perforated tiles in the cold aisle, it is drawn through the IT equipment and rejected into the hot aisle, taking the server heat with it.
Is this how you think the airflow in your facility behaves? If so, you are wrong. The reason is that most data centers install excess cooling capacity. This is a logical way to ensure that you have enough cooling redundancy. However, most data centers run ALL of their installed cooling units, all of the time. This is the critical mistake.
If your facility has excess installed cooling capacity, and all of those units are running, then you create excess airflow in your data center. Excess airflow causes hot spots. Figure 4 shows why and how this occurs.
When you have an excessive amount of airflow, the air discharged through the perforated tiles bypasses the servers and is forced into the warm thermal plenum above the ceiling. This mixing causes the warm air to be displaced from the ceiling and back down to the IT equipment. Warm air finds its way over the top of racks, through the racks themselves, and even under the racks into the cold aisle. The warm air that finds its way back into the server inlet creates what we know as a hot spot.
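To put a rough number on "excessive," you can compare the airflow your servers actually draw with the airflow your cooling units deliver. The sketch below is only an illustration. It uses the common sensible-heat rule of thumb for air near sea level (CFM ≈ 3.16 × watts / ΔT in °F). The IT load, server temperature rise, per-unit airflow, and unit count are made-up values, and the function names are hypothetical.

```python
# Rule-of-thumb airflow balance for one room.
# Every input figure below is a hypothetical example, not measured data.


def it_airflow_cfm(it_load_watts: float, delta_t_f: float) -> float:
    """Airflow the IT equipment draws, in CFM.

    Uses the sensible-heat approximation CFM ~= 3.16 * watts / delta-T (deg F),
    which assumes standard air near sea level.
    """
    return 3.16 * it_load_watts / delta_t_f


def surplus_airflow_cfm(supply_cfm: float, demand_cfm: float) -> float:
    """Supply air the servers do not draw (zero when supply falls short)."""
    return max(0.0, supply_cfm - demand_cfm)


it_load_watts = 100_000      # assumed total IT load
server_delta_t_f = 20        # assumed temperature rise across the servers
crac_cfm_each = 12_000       # assumed airflow per cooling unit
units_running = 4            # assumed number of units switched on

demand = it_airflow_cfm(it_load_watts, server_delta_t_f)
supply = crac_cfm_each * units_running
surplus = surplus_airflow_cfm(supply, demand)

print(f"IT demand:  {demand:,.0f} CFM")
print(f"CRAC supply: {supply:,.0f} CFM")
print(f"Surplus:    {surplus:,.0f} CFM")
```

With these invented figures, the servers draw about 15,800 CFM while the units push 48,000 CFM, so roughly two thirds of the supply air has nowhere to go except around the servers. The sketch does not model the mixing itself. It only estimates how much air is available to bypass the IT equipment.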
Now, let me ask you a question. When you discover a hot spot, what do you do? Infrastructure improvements such as blanking panels, covering holes in the floor, or some kind of containment will help. But even after making these improvements, you are probably still finding hot spots, and you might come to the conclusion that you need more cooling. But then, after buying more cooling equipment or turning on more units, you find you have created more hot spots in other locations, and find yourself thinking, “What the f%#!”
By understanding why hot spots occur, you’ll realize why adding more cooling will make the problem worse. To sum up, adding more cooling adds more airflow. More airflow causes more mixing with the warm thermal plenum. And more mixing causes more warm air to find its way into the cold aisle, which results in more hot spots.
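The chain of reasoning above starts with "more cooling adds more airflow," and that first link is easy to sketch. The example below assumes each cooling unit delivers a fixed airflow (no variable-speed fans) and uses hypothetical numbers for the server demand and per-unit airflow. It only shows how the surplus grows with each unit you switch on, not how much mixing results.

```python
import math

# Hypothetical figures for illustration only.
IT_DEMAND_CFM = 15_800.0   # assumed airflow the servers draw
CFM_PER_UNIT = 12_000.0    # assumed fixed airflow per cooling unit


def surplus_by_units(demand_cfm: float, cfm_per_unit: float, max_units: int) -> dict:
    """Map each count of running units to the supply air the servers do not draw."""
    return {
        n: max(0.0, n * cfm_per_unit - demand_cfm)
        for n in range(1, max_units + 1)
    }


def min_units_for_demand(demand_cfm: float, cfm_per_unit: float) -> int:
    """Smallest number of units whose combined airflow covers the server demand."""
    return math.ceil(demand_cfm / cfm_per_unit)


surplus = surplus_by_units(IT_DEMAND_CFM, CFM_PER_UNIT, max_units=5)
needed = min_units_for_demand(IT_DEMAND_CFM, CFM_PER_UNIT)

for units, extra in surplus.items():
    print(f"{units} unit(s) running: {extra:,.0f} CFM surplus")
print(f"Units needed to cover server airflow: {needed}")
```

In this made-up room, two units already cover the server airflow. Every unit switched on beyond that adds another 12,000 CFM of air that the servers cannot use.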
The next time you uncover the dreaded hot spot, don’t start adding more cooling. Start by understanding the root cause of the “hot spot disease” and treat the symptoms to cure the disease.
Click here for more information about appropriate server inlet temperatures.