Unique data protection schemes

Storage system manufacturers are pursuing unique ways of processing large amounts of data while still providing redundancy in case of disaster. Some large SAN units incorporate intricate device block-level organization, essentially creating a low-level file system from the RAID perspective. Other SAN units keep an internal block-level transaction log, so that the SAN's control processor tracks all of the block-level writes to the individual disks. Using this transaction log, the SAN unit can recover from unexpected power failures or shutdowns.
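To make the idea concrete, here is a minimal Python sketch of such a block-level transaction log, assuming a simple log-then-write-then-commit discipline. The class, its fields, and the journal format are invented for illustration and do not reflect any vendor's actual design.

```python
# Illustrative sketch of a block-level transaction log (hypothetical
# format, not any vendor's actual on-disk layout).

class BlockJournal:
    def __init__(self):
        self.log = []   # append-only journal of intended block writes
        self.disk = {}  # stands in for the physical disk blocks

    def write_block(self, block_no, data):
        entry = {"block": block_no, "data": data, "committed": False}
        self.log.append(entry)       # 1. record the intent first
        self.disk[block_no] = data   # 2. perform the actual block write
        entry["committed"] = True    # 3. mark the entry committed

    def recover(self):
        # After an unexpected shutdown, replay committed writes and
        # discard any whose write never completed.
        for entry in self.log:
            if entry["committed"]:
                self.disk[entry["block"]] = entry["data"]
            # uncommitted entries are dropped: the write never finished

journal = BlockJournal()
journal.write_block(42, b"payload")
journal.recover()  # idempotent replay after a simulated power failure
```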

Some computer scientists specializing in the storage field are proposing to add more intelligence to the RAID array controller card so that it is 'file system aware.' Such a controller would improve recoverability when disaster strikes, the goal being a storage array that is more self-healing.

Another idea along these lines is a heterogeneous storage pool in which multiple computers can access information without being dependent on a specific system's file system. In organizations that run multiple hardware and operating system platforms, such a transparent file system would provide access to data regardless of which system wrote it.

Other computer scientists are approaching storage array redundancy quite differently. RAID is in use on a vast number of systems, yet researchers and engineers continue to look for new ways to protect data against failure. The goal driving this kind of RAID development is stronger data protection and redundancy without sacrificing performance.


Dealing with the Complexity of Storage Systems

In fact, even with all the advancements in storage technology, only about 20% of backup jobs are successful, according to Enterprise Strategy Group.

Each year, hundreds of new data storage products and technologies are introduced, all meant to make the job faster and easier. But with so many categories and options to consider, this complexity instead causes confusion, which ultimately leads to lost time and to the loss of the very data the new enhancements were meant to protect.

Hence the question for most IT professionals who have invested hundreds of thousands of dollars in state-of-the-art storage technology remains: "How can data loss still happen, and what am I supposed to do about it?"

Why Backups Still Fail
In a perfect world, a company would build its storage infrastructure from scratch using the new storage solutions and standardize on certain vendors or options. If everything then remained unchanged, incredibly powerful, rock-solid results could be achieved.

However, in the real world storage is messy. Nothing remains constant: newly created data arrives at an unyielding pace, while new regulations, such as Sarbanes-Oxley, mandate changes in data retention procedures. Since companies can rarely justify starting over from scratch, most add storage in incremental stages, introducing new elements from different vendors at different times. This is where the complexity of storage comes from.

All this complexity can lead to a variety of backup failures that catch companies unprepared for the ramifications of data loss. One reason backups fail is bad media: backup tapes that sit on a shelf for years can become damaged and unreadable, a common occurrence when tapes are not stored properly. Another reason is that companies lose track of the software with which the backups were created; for a restore to succeed, most software packages require that the exact original environment still be available. Finally, backups fail due to corruption in the backup process itself. Companies often change their data footprint without updating their backup procedure to keep up, so they are not backing up what they think they are. Without regular testing, all of these are likely sources of failure.
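One way to catch all three failure modes before a restore is actually needed is a routine verification pass against the backup set. Below is a minimal Python sketch, assuming the backups are plain files accompanied by a checksum manifest; the manifest format and function names are hypothetical, invented for illustration.

```python
# Hypothetical sketch of a periodic backup-verification pass: recompute
# checksums of backed-up files and compare them against a stored manifest.
import hashlib
import json
import pathlib

def sha256_of(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_backup(backup_dir, manifest_file):
    # manifest: JSON mapping of relative path -> expected SHA-256 digest
    manifest = json.loads(pathlib.Path(manifest_file).read_text())
    failures = []
    for rel_path, expected in manifest.items():
        target = pathlib.Path(backup_dir) / rel_path
        if not target.exists() or sha256_of(target) != expected:
            failures.append(rel_path)
    return failures  # an empty list means the backup verified cleanly
```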

What to Do When Your Backup Fails
No matter how much a company tries to speed operations and guard against problems with new products and technology, the threat of data loss remains, and backup and storage techniques do not always provide the necessary recovery. When an hour of downtime can result in millions of dollars lost, including data recovery in your overall disaster plan is critical, and it may be the only way to restore business continuity quickly and efficiently. When a data loss situation occurs, time is the most critical component. Decisions about the most prudent course of action must be made quickly, which is why administrators must understand when to repair, when to restore, and when to recover data.

When to Repair
This is as simple as running file repair tools such as fsck or CHKDSK (these tools attempt to repair broken links in the file system using very specific knowledge of how that file system is supposed to look) in read-only mode first, since running an actual repair on a system with many errors could overwrite data and make the problem worse. Depending on the results of the read-only diagnosis, the administrator can make an informed decision to repair or recover. If the diagnosis finds only a limited number of errors, it is probably fine to go ahead and fix them; the repair tool will yield good results.
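As a concrete illustration, the read-only pass can be scripted. The Python sketch below assumes a Linux host, an unmounted ext-family file system, and a placeholder device path; fsck's -n flag answers "no" to every prompt so nothing on disk is modified.

```python
# Sketch of a read-only diagnosis pass before attempting any repair.
# The device path is a placeholder; running fsck typically requires
# root privileges and an unmounted file system.
import subprocess

def dry_run_fsck(device="/dev/sdb1"):
    # fsck -n reports problems without modifying the file system.
    result = subprocess.run(["fsck", "-n", device],
                            capture_output=True, text=True)
    print(result.stdout)
    # fsck's exit status is a bitmask: 0 means no errors were found,
    # 4 means errors were left uncorrected.
    return result.returncode

if dry_run_fsck() == 0:
    print("Clean: a full repair pass is likely safe.")
else:
    print("Errors found: review the output before repairing.")
```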

Note: if your hard drive makes strange noises at any point, immediately skip to the recovery option.

When to Restore
The first question an admin should ask is how fresh the last backup is, and whether a restore will get them to a point where they can effectively continue normal operations. There is a significant difference between data from the last backup and data from the point of failure, so it is important to make that distinction right away. Only a recovery can help if critical data has never been backed up. Another important question is how long the restore will take to complete; if the necessary time is too long, they may need to look at other options. A final consideration is how much data they are trying to restore: restoring several terabytes from tape backups, for example, will take a long time.
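A back-of-the-envelope calculation makes the time question concrete. In the sketch below, the 150 MB/s sustained tape throughput is an assumed figure, not a measured one; substitute the real sustained rate of your drive.

```python
# Rough restore-time estimate. The default throughput is an assumption;
# replace it with your tape drive's actual sustained transfer rate.
def restore_hours(data_tb, throughput_mb_per_s=150):
    data_mb = data_tb * 1_000_000  # decimal TB to MB
    return data_mb / throughput_mb_per_s / 3600

# e.g. 5 TB at a sustained 150 MB/s:
print(f"{restore_hours(5):.1f} hours")  # roughly 9.3 hours
```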

When to Recover
The decision to recover comes down to whether a company's data loss situation is critical and how much downtime it can afford. If there is not enough time to schedule the restore process, it is probably best to move forward with recovery. Recovery is also the best method if backups turn out to be too old or suffer some type of corruption. The bottom line: if other options are attempted and fail, contact a recovery company immediately. Some administrators attempt multiple restores or repairs before trying recovery and actually cause more damage to the data.
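The rules of thumb from these three sections can be condensed into a few lines. The helper below is purely illustrative; every input and threshold is a policy choice for the administrator, not a fixed rule.

```python
# Hypothetical triage helper encoding the repair/restore/recover guidance
# above; the inputs are judgment calls the administrator must make.
def triage(drive_makes_noise, fsck_errors_few, backup_fresh_enough,
           restore_fits_window):
    if drive_makes_noise:
        return "recover"   # hardware failure: stop and call a recovery firm
    if fsck_errors_few:
        return "repair"    # limited errors: file system repair is likely safe
    if backup_fresh_enough and restore_fits_window:
        return "restore"   # recent backup that fits the downtime window
    return "recover"       # stale or corrupt backups, or no time to restore

print(triage(drive_makes_noise=False, fsck_errors_few=False,
             backup_fresh_enough=True, restore_fits_window=True))  # restore
```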
