
Fail-safe IT systems - 7 easy tips for your company

Cyberattacks, power outages or user errors: hardware malfunctions or outright breakdowns can lead to long downtimes of your IT systems. The consequences for businesses are far-reaching. Reduced productivity and higher costs are common side effects when fail-safety has been neglected. Here are 7 easy tips to better protect your company.

But what does fail-safety mean?

Fail-safety refers to the measures that keep the IT infrastructure operating continuously without failures or malfunctions. Systems, installations and computer components should be able to keep performing their tasks reliably. If a system failure cannot be prevented after all, good fail-safety ensures that operation resumes quickly. Improved fail-safety is particularly important for companies, because data, running services and the network must be available at all times. The higher the fail-safety of systems and equipment, the less likely it is that the IT infrastructure will be disrupted. There are a variety of techniques and tips for increasing it. But why do IT infrastructures fail or become disrupted?

Why do failures occur? 

There are various reasons why a company's IT infrastructure fails or is disrupted. Ransomware attacks in particular are becoming more frequent and lead to outages. But other hacking attacks, hardware damage, power failures, user errors and fire, for example caused by an excessively high ambient temperature in data centres, are also common risk factors. They can paralyse the company within seconds, which is why IT needs an emergency plan.

But what consequences does the failure of the IT infrastructure have for your company? 

Low fail-safety leads to long IT downtimes, which also affect other company departments. Ongoing operations are interrupted and, above all, the productivity of your business processes drops. Your employees are forced to suspend work. Possible data loss can cause further problems. Longer downtimes also bring severe financial losses, because repairing the damage and getting the company's processes running again is costly.

But what simple tips can you use to increase your own company's fail-safety?  

Even a few adjustments minimise the risk of interruptions and malfunctions in your own company. Monitoring the ambient temperature, avoiding dust particles in and on the hardware and regular maintenance will protect your IT infrastructure.

1. Reducing the ambient temperature

The ambient temperature has a strong influence on the failure rate of electronic devices. Even at a temperature of 30 degrees Celsius, the failure rate doubles. It is therefore recommended to set up data centres in rooms with little direct sunlight. Temperature increases can also be prevented with an air conditioning system. Besides improving fail-safety, an appropriate temperature of 20 degrees Celsius also extends the service life of electronic devices.
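
The two temperatures mentioned above can be turned into a simple monitoring check. The following is only a minimal sketch: the status labels, the function names and the sensor names are assumptions made for illustration, and how the readings are obtained depends on the sensors in use.

```python
TARGET_C = 20.0   # appropriate temperature named in the text
WARN_C = 30.0     # temperature at which the text says the failure rate doubles


def classify_temperature(celsius: float) -> str:
    """Return 'ok', 'elevated' or 'critical' for a single reading (labels are assumptions)."""
    if celsius >= WARN_C:
        return "critical"
    if celsius > TARGET_C:
        return "elevated"
    return "ok"


def readings_needing_attention(readings: dict) -> dict:
    """Map every sensor whose reading is above the target to its status."""
    return {
        sensor: classify_temperature(value)
        for sensor, value in readings.items()
        if classify_temperature(value) != "ok"
    }


# Hypothetical sensor names and readings in degrees Celsius.
flagged = readings_needing_attention({"rack-a": 19.5, "rack-b": 24.0, "rack-c": 31.0})
```

In this sketch only the sensors above the target temperature are reported, so a check like this could run at regular intervals and stay silent as long as the room is cool enough.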

2. Avoiding dust and dirt 

Another cause of failures is dirt and dust on the surface of and inside the hardware. Air-conditioning systems and the built-in fans of active devices promote air circulation and, with it, the distribution of dirt. Dust is drawn in by coolers and fans, settles on and in electronic devices and reduces their heat dissipation. This leads to short circuits, overheating and malfunctions. Dust particles on the hardware also increase the risk of fire. To avoid these consequences, there should be no visible dust in the room. A regular finger test, wiping a finger over the surfaces of the hardware, shows whether dust has built up and helps reduce the risk of hardware damage.

3. Regular maintenance 

While regular finger tests help prevent dust damage, regular maintenance extends the lifespan of IT equipment. Damaged hardware and problems such as excessively loud fans or increased power consumption in the server room are partly due to neglected maintenance. Regular maintenance of servers, storage, switches and other electronic devices has a preventive effect. In addition, thorough checks should always be carried out at the intervals specified by the manufacturer, regardless of whether error messages have occurred. This is the only way to increase fail-safety and minimise the risk of hardware failures.

How can systems continue to perform their tasks in the event of disruptions?

More broadly, UPS systems, RAID systems, PDU power distributors and a firewall have a positive effect on fail-safety. In the event of a failure, they help the IT infrastructure to keep carrying out its assigned tasks reliably.

1. UPS systems

In the event of a power outage, a good UPS system is indispensable. Uninterruptible power supply (UPS) systems bridge power outages and keep the connected devices supplied. The batteries in a UPS cover the transition from mains power to generator power. This protects servers, desktop PCs and other hardware from crashing. UPS systems also filter the incoming mains voltage to eliminate common disturbances such as voltage dips or switching spikes.

2. RAID systems

In the event of disruptions, data loss is the greatest danger for companies. To protect against it, data can be stored redundantly using RAID systems. RAID stands for redundant array of independent disks.

RAID systems use 3 different techniques for storing data:

Mirroring of hard disks

With mirroring, the entire data set is stored on 2 different hard disks. If one hard disk fails, the data stored on it is not lost, because all the information is still available on the other hard disk. However, this requires a lot of storage space, since every piece of data is stored twice.

Figure: Mirroring of the data set
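
The mirroring technique described above can be sketched in a few lines. This is an illustration only: the "disks" are plain in-memory dictionaries, and the function names `mirrored_write` and `mirrored_read` are assumptions, not part of any real RAID software.

```python
def mirrored_write(disks: list, block_id: int, data: bytes) -> None:
    """Write the same block to every healthy disk in the mirror."""
    for disk in disks:
        if disk is not None:
            disk[block_id] = data


def mirrored_read(disks: list, block_id: int) -> bytes:
    """Read the block from the first healthy disk that holds it."""
    for disk in disks:
        if disk is not None and block_id in disk:
            return disk[block_id]
    raise IOError("block not available on any disk")


disks = [{}, {}]                      # two empty in-memory "disks"
mirrored_write(disks, 0, b"invoice data")
disks[0] = None                       # simulate the failure of the first disk
recovered = mirrored_read(disks, 0)   # still served by the second disk
```

The sketch also shows the cost mentioned in the text: every block is held twice, so two disks provide the usable capacity of one.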

Striping of data blocks

With striping, the data set is divided into "strips". Each hard disk stores only part of the data set, and the following hard disks each hold further strips. All hard disks are therefore needed to read the complete data set. If one hard disk fails, the information is unreadable and lost, because there is no redundant copy of the data. On the other hand, several hard disks can be read at the same time, so striping is particularly useful for large amounts of data.

 

Figure: Splitting of the data set
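
Striping as described above can be illustrated as follows. This is a simplified sketch under stated assumptions: the disks are in-memory lists, the strips are assigned in round-robin order, and the names `stripe` and `unstripe` as well as the strip size are chosen for illustration only.

```python
def stripe(data: bytes, disk_count: int, strip_size: int) -> list:
    """Distribute consecutive strips of the data round-robin across the disks."""
    disks = [[] for _ in range(disk_count)]
    for index in range(0, len(data), strip_size):
        strip = data[index:index + strip_size]
        disks[(index // strip_size) % disk_count].append(strip)
    return disks


def unstripe(disks: list) -> bytes:
    """Reassemble the data by reading the strips back in round-robin order."""
    if any(disk is None for disk in disks):
        raise IOError("a disk is missing: striped data cannot be rebuilt")
    result = []
    longest = max(len(disk) for disk in disks)
    for row in range(longest):
        for disk in disks:
            if row < len(disk):
                result.append(disk[row])
    return b"".join(result)


disks = stripe(b"ABCDEFGHIJ", disk_count=3, strip_size=2)
restored = unstripe(disks)   # works only while all three disks are present
```

If any entry in `disks` is replaced by `None` to simulate a failed disk, `unstripe` raises an error, which mirrors the point in the text that striping on its own offers no protection against a disk failure.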

 

Parity and striping

The 3rd technique combines striping with parity. As with striping, the data set is divided between different hard disks. In addition, checksums, so-called parity values ("P"), are stored on the hard disks. If a hard disk fails, the lost data can be calculated from these values and restored. This technique is considered highly fail-safe, but it is slower than the other techniques because the parity values have to be calculated.

 

Figure: Splitting of the data set
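
How a parity value allows lost data to be recalculated can be shown with a small example. The sketch below assumes XOR as the parity function, which is the usual choice for parity-based RAID levels, and uses three equally sized in-memory strips. The helper name `xor_blocks` is an assumption for illustration.

```python
from functools import reduce


def xor_blocks(blocks: list) -> bytes:
    """XOR equally sized byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data_blocks = [b"AAAA", b"BBBB", b"CCCC"]   # one strip per data disk
parity = xor_blocks(data_blocks)            # the parity value "P"

# Simulate losing the second disk and rebuild its strip from the remaining
# strips plus the parity value.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
```

Because XOR-ing a value with itself cancels it out, combining the surviving strips with the parity value leaves exactly the missing strip. Real RAID implementations additionally decide where the parity is placed across the disks, which this sketch leaves out.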

Based on these 3 methods, there are various RAID systems with special RAID levels. These are suitable for different system sizes and have different security levels. So a suitable RAID system can be found for every project and for the desired system security. 

3. PDU Power Distributors

Power distribution units (PDUs) are power strips that distribute power to individual devices and protect circuits. They are often used in data centres. They increase the fail-safety of the IT infrastructure by measuring current, voltage and power. If the measured values exceed defined limits, users receive a message by e-mail or SMS.
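
The limit check behind such notifications can be sketched as follows. The limit values, the field names and the function `exceeded_limits` are hypothetical and chosen for illustration only. How a real PDU exposes its measurements and sends e-mail or SMS messages depends on the manufacturer and is not shown here.

```python
# Hypothetical limits for one PDU outlet group.
LIMITS = {"current_a": 16.0, "voltage_v": 253.0, "power_w": 3680.0}


def exceeded_limits(measurement: dict, limits: dict) -> list:
    """Return one message for every measured value that is above its limit."""
    return [
        f"{name} = {value} exceeds limit {limits[name]}"
        for name, value in measurement.items()
        if name in limits and value > limits[name]
    ]


# Hypothetical measurement: current and power are too high, voltage is fine.
alerts = exceeded_limits(
    {"current_a": 17.2, "voltage_v": 230.0, "power_w": 3956.0}, LIMITS
)
```

Each entry in `alerts` could then be handed to whatever notification channel is in use.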

4. Firewalls

Firewalls protect the company network and ensure secure and undisturbed use of the Internet. Harmful and unauthorised access to your own network can be warded off with an appropriate firewall. Depending on the design, a firewall protects an entire computer network or only a single computer by controlling the data traffic between the Internet and the network. It decides whether data packets are allowed to pass, and thus whether the transmission takes place or not. To do this, firewalls open or block the network's entrances and exits (ports). When a computer is connected to the Internet, several ports are open, which increases the chance of intruders gaining access to the network. Firewalls prevent this and increase fail-safety.
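
The decision a firewall makes for each data packet can be illustrated with a simple port-based allow list. This is a deliberately reduced sketch: the policy, the packet fields and the function name `is_allowed` are assumptions, and real firewalls evaluate far more than the destination port.

```python
# Hypothetical policy: only web traffic (HTTP and HTTPS) may enter the network.
ALLOWED_INBOUND_PORTS = {80, 443}


def is_allowed(packet: dict, allowed_ports: set) -> bool:
    """Allow a packet only if its destination port is on the allow list."""
    return packet.get("dst_port") in allowed_ports


# Example packets using documentation-only IP addresses.
packets = [
    {"src": "203.0.113.7", "dst_port": 443},
    {"src": "203.0.113.9", "dst_port": 23},
]
decisions = [is_allowed(packet, ALLOWED_INBOUND_PORTS) for packet in packets]
```

Blocking everything that is not explicitly allowed keeps the number of reachable ports small, which matches the point above that every open port is a potential entry for intruders.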

Conclusion

The fail-safety of your IT infrastructure should have a permanent place in your IT contingency plan. Even simple tips increase fail-safety and help keep your IT processes available. Regular monitoring, constant care of the hardware and adherence to maintenance intervals can prevent some malfunctions and failures. Combined with security-enhancing network technology, your systems remain well protected in the event of a malfunction. Cybertrading itself uses UPS systems, several hardware servers for load distribution and redundant components to protect against disruptions.

Would you like to strengthen the security of your IT infrastructure?  

Then take a look at our online shop. There you will find a large selection of refurbished, used and new network technology from a wide range of manufacturers. Do you need support? Our sales team will be happy to advise you and assist you with your questions, problems or requests. Alternatively, you can also contact us via the contact form.