Articles | DataRecoveryUnion.com

Glossary of Hard Disk Drive Terminology (Letter G)

Articles / 2009-01-09

GB (Gigabyte)
Western Digital defines a gigabyte as 1,000,000,000 (one billion) bytes or 1000 (one thousand) Megabytes.

Giant MR (GMR)
An advanced form of magnetoresistive (MR) head technology, in which IBM is believed to have made the greatest product advances.


Glossary of Hard Disk Drive Terminology (Letter F)

Articles / 2009-01-09

FAT (File Allocation Table)
A data table stored at the beginning of each FAT-formatted partition on the disk that the operating system uses to determine which clusters (groups of sectors) are allocated to each file and in what order.
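To make the “in what order” part concrete, here is a minimal Python sketch of following a file's cluster chain through a FAT. It assumes FAT16-style conventions (each entry holds the next cluster number, clusters are numbered from 2, and values of 0xFFF8 and above mark the end of the chain); the function name `follow_cluster_chain` is hypothetical, and real FAT12/FAT32 tables pack their entries differently.

```python
def follow_cluster_chain(fat: list[int], first_cluster: int, eoc_min: int = 0xFFF8) -> list[int]:
    """Return a file's clusters in order, starting at first_cluster.

    Assumes FAT16-style entries: fat[n] holds the next cluster number,
    values >= eoc_min mark the end of the chain, and values below 2
    (free or reserved) mean the chain is broken.
    """
    chain = []
    seen = set()
    cluster = first_cluster
    while 2 <= cluster < eoc_min:
        if cluster >= len(fat):
            raise ValueError(f"cluster {cluster} is outside the table")
        if cluster in seen:  # a damaged FAT can contain loops
            raise ValueError(f"cluster chain loops back to {cluster}")
        seen.add(cluster)
        chain.append(cluster)
        cluster = fat[cluster]  # the entry for this cluster names the next one
    return chain


# Clusters 3 -> 5 -> 9 -> end of chain
fat = [0xFFF8, 0xFFFF, 0x0000, 5, 0x0000, 9, 0x0000, 0x0000, 0x0000, 0xFFFF]
print(follow_cluster_chain(fat, 3))  # [3, 5, 9]
```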

Fdisk
A software utility used to partition a hard drive. This utility is included with DOS and Windows 95 operating systems.

Fetch
The process of retrieving data.

Fibre Channel (FC)
The general name given to an integrated set of standards being developed by an ANSI-approved X3 group. This set of standards defines new protocols for flexible information transfer. Fibre Channel supports three topologies: point-to-point, arbitrated loop, and fabric.

Fibre Channel Arbitrated Loop (FC-AL)
A subset of the Fibre Channel interconnection standards in which devices are connected in a loop and arbitrate for control of it. A serial storage interface designed to meet the needs of high-end applications.

Fiscal Periods
Three-month segments of a fiscal year. Western Digital, with a fiscal year ending June 30, has fiscal quarters ending on the last days of September, December, March and June.

Firmware
Permanent instructions and data programmed directly into the circuitry of read-only memory for controlling the operation of the computer.

FIT (Functional Integrity Testing)
A suite of tests Western Digital performs on all its drive products to ensure compatibility with different hosts, operating systems, adapters, application programs, and peripherals. This testing must be performed before the product can be released to manufacturing.

Flow Control
In PIO transfers, the ability of an EIDE drive to control the speed at which the host transfers data to or from the drive by using the IORDY signal. The host temporarily stops transferring data whenever the drive deasserts the IORDY signal. When the drive reasserts the IORDY signal, the host continues the data transfer.

Format
A process that prepares a hard drive to store data. Low-level formatting sets up the locations of sectors so user data can be stored in them. Western Digital hard drives are low-level formatted at the factory and therefore do not need to be low-level formatted by the end user. You need to perform a high-level format (with EZ-Drive or the Format command) on your new Western Digital hard drive before you can use it. Formatting erases all the information on a hard drive and sets up the file system needed for storing and retrieving files.

Formatted Capacity
The actual capacity available to store data in a mass storage device. The formatted capacity is the gross capacity minus the capacity taken up by the overhead data required for formatting the media.

Form Factor
The industry standard that defines the physical and external dimensions of a particular device.

Full-Duplex
A communication protocol that permits simultaneous transmission in both directions.


Memories of the Computer Anatomy

Articles / 2009-01-08

Hard drives are the memories of our computers. They store documents, data, voice recordings and even entire movies. Because hard drives are so spacious and efficient these days, we can start to believe that they offer permanent and secure storage for our data. Unfortunately, that is not the case.

For such hard-working devices, hard drives can be remarkably fragile. They store data on stacks of rotating metallic platters. Magnetic heads ‘float’ just above the platter surfaces, reading and writing information without making physical contact. The information that looks so real on a monitor is, in fact, a pattern of tiny magnetized regions on a metal platter.

Once people in an organization know how hard drives work, they understand how easy it is for data to be lost. As hard drives become smaller, so does their ‘tolerance’, the distance between the platter and the heads that read and write data. Bumping into a computer while the hard drive is running can make the head actually touch the platter and literally ‘rub out’ the data there. Contamination, like dust or moisture, or a slight change in power can also cause damaging head contact.

That’s why it is absolutely vital to switch off the hard drive at the first sign of any unusual noise, like grinding, scraping or chattering. If nothing’s wrong, nothing has been lost. But if there is physical damage taking place inside the drive, prompt action can keep it to an absolute minimum and more data will be available for recovery.


The New Media for Small and Medium Sized Storage Solutions

Articles / 2009-01-08

When small to medium-sized users need more storage capacity and faster backups than they can achieve with 8mm or DDS backups, there are two new formats to choose from.

Digital Linear Tape (DLT) systems have been available since 1985, but recent increases in both speed and capacity have given the technology a new lease on life. In fact, DLT has been the leading technology for small to medium-sized systems for the last several years. DDS (DAT) tapes were DLT’s only competitors in that market, but their tape heads tended to ‘drift’, which meant technicians had to monitor them to ensure reliable storage. DLT’s reliability is based on a ‘straight up and down’ recording mode.

Earlier this year, the introduction of Super DLT brought a tremendous boost in performance. Super DLT can store as much as 110 gigabytes on one cartridge, at a speed of 10 megabytes per second. With the speed of backup doubled, and capacity more than doubled, the technology can now reach ‘up’ to systems and networks that DLT previously couldn’t handle.

Competing technologies can offer very fast backups, but the tapes themselves contain very little data – hundreds of megabytes as opposed to hundreds of gigabytes that DLT offers.

Another recently emerged technology comparable to DLT is LTO, or Linear Tape Open, a consortium product from Seagate, IBM and Hewlett-Packard. LTO can put 100 gigabytes on a cartridge at up to 15 megabytes per second.

For cautious system administrators who don’t wish to try LTO, one technician said DLT is a more than acceptable choice: “Thirty million cartridges and a million tape drives can’t be wrong.”

Of course, Super DLT incorporates a good deal of new technology as well, and even though LTO is completely new technology, it “has a nice road map in front of it.” Super DLT uses a new recording format, but it does maintain a limited form of backwards compatibility with previous iterations of DLT. It can read older tapes, although it cannot write to them, which means it would probably be most useful in allowing organizations to keep their present archives in a usable form. Where users have thousands of tapes in their libraries, there can be a considerable saving in time and money if older tapes don’t have to be re-recorded onto newer ones. For users who are moving from 8mm or DDS format systems, and committed to re-recording all their data, there may be little to choose between LTO and Super DLT systems.

Today’s demands for storage capacity are increasing, and if anything there is going to be more pressure on our ability to back up, store, protect and retrieve data. Small and medium-sized users now have a choice: Super DLT, based on generations of iterative development and refinement, or LTO, new technology from a high-powered and stable group of technology companies.



Virus Protection Key to Healthy Computing

Articles / 2009-01-07

Computer viruses are proving to be highly complex, but preventing them from infecting your computer systems is simple. Use two well-known brands of anti-virus software and keep them as current as possible.

Beyond that, there are some simple, common-sense procedures that everyone should follow, whether at work or at home. Never open a file whose origins are unknown. In a simpler day, that rule applied only to executable files, files that actually do something: those with the extensions .exe, .com and .bat, each of which can start a program on your computer. Those viruses spread through games downloaded from the Internet, borrowed diskettes and the old ‘bulletin board’ services.

Today, unfortunately, a whole new wave of viruses has been unleashed on unsuspecting computer users because software manufacturers introduced feature-rich new programs without considering how vulnerable they are to viruses. Now, almost any document and many email messages can carry and spread ‘macro’ viruses at lightning speed. That’s why it is so important never to open messages or documents from unknown sources. Viruses can delete data, change file names or even damage the physical media where the data is stored.

How important is virus protection?
If your data is critical to your business operations, nothing is more important. Even though about 75 per cent of all data loss incidents are caused by human error or system malfunctions, a virus attack can still cripple your data center. A combination of regular, verified backups and constantly updated virus protection is absolutely essential to protect your data – and your organization.


Top 10 Data Recovery Bloopers

Articles / 2009-01-07

1. People Are the Problem, Not Technology
Disk drives today are typically reliable – human beings aren’t. A recent study found that approximately 15 percent of all unplanned downtime occurs because of human error.

2. When Worlds Collide
The company’s high-level IT executives purchased a “Cadillac” system, without knowing much about it. System implementation was left to a young and inexperienced IT team. When the crisis came, neither group could talk to the other about the system.

3. An Almost Perfect Plan
The company purchased and configured a high-end, expensive, and full-featured library for the company’s system backups. Unfortunately, the backup library was placed right beside the primary system. When the primary system got fried, so too did the backup library.

4. When the Crisis Deepens, People Do Sillier Things
When the office of a civil engineering firm was devastated by floods, its owners sent 17 soaked disks from three RAID arrays to a data recovery lab in plastic bags. For some reason, someone had frozen the bags before shipping them. As the disks thawed, even more damage was done.

5. It’s the Simple Things That Matter
The client, a successful business organization, purchased a “killer” UNIX network system, and put 300+ workers in place to manage it. Backups were done daily. Unfortunately, no one thought to put in place a system to restore the data to.

6. Buy Cheap, Pay Dearly
The organization bought an IBM system – but not from IBM. Then the system manager decided to configure the system uniquely, rather than following set procedures. When things went wrong with the system, it was next to impossible to recreate the configuration.

7. Lights Are On, But No One’s Home
A region-wide ambulance monitoring system suffered a serious disk failure, only to discover that its automated backup hadn’t run for fourteen months. A tape had jammed in the drive, but no one had noticed.

8. Hit Restore and All Will Be Well
After September’s WTC attacks, the company’s IT staff went across town to their backup system. They invoked Restore and proceeded to overwrite the backup system with data from the destroyed main system. Of course, all previous backups were lost.

9. In a Crisis, People Do Silly Things
The prime server in a large urban hospital’s system crashed. When minor errors started occurring, system operators, instead of gathering data about the errors, tried anything and everything, including repeatedly invoking a controller function that erased all the data on the RAID array.

10. The Truth, and Nothing But the Truth
After a data loss crisis, the company CEO and the IT staffer met with the data recovery team. No progress was made until the CEO was persuaded to leave the room. Then the IT staffer opened up, and solutions were developed.


HTML files and Text Files

Articles / 2009-01-06

After all known compound file formats had been carved, their sectors were bookmarked and removed from consideration as possibly belonging to text, HTML or any other files. Using the “gather text” feature of X-Ways Forensics (or a similar feature found in many other forensic tools), text was then extracted from the remaining sectors that were not bookmarked.
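The idea behind a “gather text” pass can be sketched in a few lines of Python. This is not how X-Ways Forensics implements its feature; it is a minimal illustration, assuming 512-byte sectors, a set of bookmarked sector numbers, and plain ASCII text (UTF-16 and other encodings are ignored). The function name `gather_text` and the `min_run` threshold are hypothetical.

```python
import re

SECTOR_SIZE = 512  # assumption: 512-byte disk sectors


def gather_text(image: bytes, bookmarked: set[int], min_run: int = 8) -> list[tuple[int, str]]:
    """Extract runs of printable ASCII from sectors that are not bookmarked.

    Returns (sector_number, text) pairs. Runs shorter than min_run bytes are
    ignored to cut down on noise from binary data. Runs that cross a sector
    boundary are split, because each sector is examined on its own.
    """
    printable_run = re.compile(rb"[\x20-\x7e\t\r\n]{%d,}" % min_run)
    results = []
    for sector_no in range(len(image) // SECTOR_SIZE):
        if sector_no in bookmarked:
            continue  # already claimed by a carved compound file
        sector = image[sector_no * SECTOR_SIZE:(sector_no + 1) * SECTOR_SIZE]
        for match in printable_run.finditer(sector):
            results.append((sector_no, match.group().decode("ascii")))
    return results
```

Skipping bookmarked sectors up front mirrors the order described above: compound files are claimed first, so their embedded strings do not show up as stray text.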

All .html and .txt files were manually carved and evaluated, since these formats have no compound file structure identifying the start, end, or location of structures within the file. Any fragmented text or .html files were put back together based on manual review of their content.
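Although HTML has no compound structure, its opening and closing tags can at least suggest where to start a manual review. The Python sketch below pairs each `<html` with the next `</html>`; it is a hypothetical helper (`html_candidates` is not a tool feature), and its output is only a list of candidate byte ranges to examine, not proof that a file is complete or unfragmented.

```python
import re
from typing import Optional

HTML_OPEN = re.compile(rb"<html", re.IGNORECASE)
HTML_CLOSE = re.compile(rb"</html\s*>", re.IGNORECASE)


def html_candidates(data: bytes) -> list[tuple[int, Optional[int]]]:
    """Pair each '<html' with the next '</html>' to suggest carving ranges.

    Returns (start, end) byte offsets with an exclusive end, or (start, None)
    when no closing tag follows. If a file lost its closing tag, the next
    file's '</html>' will be paired with it, so every range needs review.
    """
    candidates = []
    for opening in HTML_OPEN.finditer(data):
        closing = HTML_CLOSE.search(data, opening.end())
        candidates.append((opening.start(), closing.end() if closing else None))
    return candidates
```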


JPEG Files

Articles / 2009-01-06

Next we will look at carving JPEG graphic files, as specified in the document “Description of Exif file format.” For complete details of the file format specification, please refer to the hyperlink to the document, listed on page 1 of this paper.

The JPEG graphic file starts with a Start of Image (SOI) signature of “FF D8”. Following the SOI is a series of “Marker” blocks of data used for file information. Each of these markers begins with a signature “FF XX”, where “XX” identifies the type of marker. The 2 bytes following each marker header give the size of the marker data; this value is stored big-endian and counts the two size bytes themselves. The marker data immediately follows the size, and the next marker header “FF XX” immediately follows the previous marker data. There is no standard as to how many markers will exist, but following the markers, the signature “FF DA” indicates the Start of Scan (SOS) marker. The SOS marker is followed by a 2-byte size for the SOS header, which is immediately followed by the image stream that makes up the graphic. The end of the image stream is marked by the End of Image (EOI) signature “FF D9”.
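The marker layout above can be walked programmatically, which is a quick way to confirm that a candidate really is a JPEG before carving it. This Python sketch assumes a well-formed header area (no fill bytes and no stand-alone markers before SOS); `walk_jpeg_markers` is a hypothetical name.

```python
import struct

SOI = b"\xff\xd8"
SOS = 0xDA


def walk_jpeg_markers(data: bytes, start: int = 0) -> list[tuple[int, int, int]]:
    """List (offset, marker_type, length) for each marker from SOI up to SOS."""
    if data[start:start + 2] != SOI:
        raise ValueError("no Start of Image (FF D8) at the given offset")
    markers = []
    pos = start + 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker (FF xx) at offset {pos}")
        marker_type = data[pos + 1]
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])  # big-endian size
        markers.append((pos, marker_type, length))
        if marker_type == SOS:
            break  # the entropy-coded image stream follows the SOS header
        # Skip the 2-byte FF xx header, then 'length' bytes (which include the size field).
        pos += 2 + length
    return markers
```

If the walk fails partway through, the hit at `start` is either a false positive or a file whose header area is already damaged.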

In the event that a thumbnail graphic exists within the file, the thumbnail graphic will have the exact same components as the full-size graphic, with “FF D8” indicating the start of the thumbnail and “FF D9”, indicating the end of the thumbnail. Since thumbnails are significantly smaller and less likely to experience fragmentation than their larger parent full-size graphic, they can be used as a comparison tool for evaluating what the entire jpeg graphic is supposed to look like, in the event you must do a manual visual review of the carved graphic.

By searching first for all locations of the “FF D8 FF” signature, you identify the beginning of each jpeg graphic. The reason for searching for “FF D8 FF” is that there are different versions of jpeg files, some that start with “FF D8 FF E0” and some with “FF D8 FF E1”, and leaving off the 4th byte in your signature will catch all instances, but may result in some false hits.
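A minimal Python sketch of that first search pass is shown below. It reports the fourth byte alongside each hit (for example 0xE0 or 0xE1) so that hits can be sorted before carving; `find_jpeg_starts` is a hypothetical name, and every hit may still be a false positive.

```python
def find_jpeg_starts(image: bytes) -> list[tuple[int, int]]:
    """Return (offset, fourth_byte) for every 'FF D8 FF' hit in the image.

    fourth_byte is -1 when the hit sits at the very end of the image.
    """
    signature = b"\xff\xd8\xff"
    hits = []
    pos = image.find(signature)
    while pos != -1:
        fourth = image[pos + 3] if pos + 3 < len(image) else -1
        hits.append((pos, fourth))
        pos = image.find(signature, pos + 1)  # keep searching past this hit
    return hits
```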

Rather than carve a specific length of data, in this case we start at the beginning signature and carve until we find “FF D9”. For a non-fragmented jpeg graphic without a thumbnail, this carves the whole file. If we slightly modify our logic by adding the rule “if ‘FF D8’ occurs again before ‘FF D9’, then carve to the second instance of ‘FF D9’” to our search for jpegs, we will carve entire files, including their thumbnails, as long as they are not fragmented. Without this rule, the search would stop carving at the end of the thumbnail and produce an invalid jpeg. For a fragmented jpeg file, this carving method produces either a partial jpeg file or a complete jpeg file that contains extraneous data in the middle of it.
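The carving rule just described translates almost directly into code. This Python sketch assumes at most one embedded thumbnail per file, exactly as the rule does; `carve_jpeg` is a hypothetical name.

```python
SOI = b"\xff\xd8"
EOI = b"\xff\xd9"


def carve_jpeg(image: bytes, start: int):
    """Carve from the SOI at 'start' to the matching EOI, allowing one thumbnail.

    Carve to the first FF D9, unless another FF D8 occurs before it, in which
    case carve to the second FF D9. Returns None if no suitable FF D9 exists.
    """
    first_eoi = image.find(EOI, start + 2)
    if first_eoi == -1:
        return None
    # A second SOI before the first EOI means the first EOI closes a thumbnail.
    if image.find(SOI, start + 2, first_eoi) != -1:
        end = image.find(EOI, first_eoi + 2)
        if end == -1:
            return None
    else:
        end = first_eoi
    return image[start:end + 2]  # include the 2-byte EOI itself
```

As the text notes, a fragmented file carved this way still comes back either truncated or with foreign data inside it, so the output needs the review step that follows.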

After carving all jpeg files based on these rules, we next quickly review which carved jpeg files are complete and which are fragmented and need further analysis. With all the carved files in one folder, add that folder to a forensic tool with partial graphic file viewing capabilities, such as the “Outside In” viewer that is built into many existing forensic tools. Using a gallery view, you can quickly identify which files are not displaying properly, showing only a partial image, and require further analysis.

Once all fragmented or partial jpegs were identified, each was inspected visually to determine at what point the fragmentation occurred. This was done by estimating the percentage of the file that displayed correctly in the viewer before the image became corrupt. The raw data of the carved file was then reviewed at that percentage of the file to identify where the valid graphic data ended. For this process it was assumed that the extraneous data started at an offset that was a multiple of 512 bytes from the beginning of the file. Once the extraneous data was identified, it was removed from the partial jpeg and re-evaluated as possible sector data for other fragmented files that had previously been identified.
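The bookkeeping in that manual step can be sketched in Python. The 512-byte alignment is the assumption stated above; the helper names `estimate_break_offset` and `remove_extraneous` are hypothetical, and deciding how long the foreign block is remains a manual judgment.

```python
SECTOR_SIZE = 512  # the text assumes foreign data starts on a 512-byte boundary


def estimate_break_offset(carved_len: int, fraction_displayed: float) -> int:
    """Turn 'about X% of the picture displays correctly' into a sector-aligned offset.

    Rounds down, so the raw-data review starts at or before the real break point.
    """
    approx = int(carved_len * fraction_displayed)
    return approx - approx % SECTOR_SIZE


def remove_extraneous(carved: bytes, offset: int, length: int) -> tuple[bytes, bytes]:
    """Cut 'length' bytes of foreign data out of a carved file at 'offset'.

    Returns (repaired_file, removed_block); the removed block can then be
    tested as a missing piece of other fragmented files.
    """
    if offset % SECTOR_SIZE:
        raise ValueError("offset is not a multiple of the sector size")
    if offset + length > len(carved):
        raise ValueError("block extends past the end of the carved file")
    return carved[:offset] + carved[offset + length:], carved[offset:offset + length]
```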


MS Compound Document Files

Articles / 2009-01-06

(Includes documents, spreadsheets, templates and other MS office files)

Next we will look at carving MS Compound Document (and spreadsheet) files, as specified in the document “OpenOffice.org’s Documentation of the Microsoft Compound Document File Format.” For complete details of the file format specification, please refer to the hyperlink to the document, listed on page 1 of this paper.

As quoted from the above referenced document, “Compound document files are used to structure the contents of a document in the file. It is possible to divide the data into several streams, and to store these streams in different storages in the file. This way compound document files support a complete file system inside the file, the streams are like files in a real file system, and the storages are like sub-directories.”

All streams of a compound document file are divided into sectors. Sectors may contain internal control data of the compound document or parts of the user data. The entire file consists of a compound document header followed by a list of all sectors. The size of the sectors is set in the header and is then fixed for all sectors.

Example:
HEADER
SECTOR 0
SECTOR 1
SECTOR 2
SECTOR 3
SECTOR 4
SECTOR 5
SECTOR 6
…and so on…
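Because every sector has the same size and the header fills the first sector-sized block, the file offset of any numbered sector follows from simple arithmetic. The Python sketch below assumes that layout (for the usual 512-byte sectors this is offset = 512 + SID × 512); `sector_offset` is a hypothetical name.

```python
def sector_offset(sid: int, sector_size: int = 512) -> int:
    """Byte offset of sector 'sid' from the start of the compound document.

    Assumes the header occupies exactly one sector-sized block, so SECTOR 0
    starts immediately after it.
    """
    if sid < 0:
        raise ValueError("negative SIDs are special markers, not sector numbers")
    return (sid + 1) * sector_size
```

For example, SECTOR 3 of a document with 512-byte sectors starts 2048 bytes into the file.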

As we discussed in the section on Zip files, if you know what you are looking for, and where you expect to find it within the file, you can determine exactly what data belongs to the file in question and whether or not there is fragmented data within the file.

We start by searching for the Compound Document Header signature, “D0 CF 11 E0 A1 B1 1A E1,” to identify the beginning of each MS compound document. Next, at offset 0x1E from the beginning of the header, we find a 2-byte value that gives the sector size used in the document as a power of two; the usual value is 9, meaning 2^9 = 512 bytes/sector. Now, knowing the size of each sector that makes up the file, we can start looking for document structures and where within the file they should be located. As noted in the Zip file process mentioned earlier in this paper, the difference between the EXPECTED location of a structure and its ACTUAL location is the size of the fragmented data that doesn’t belong to the file.
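A minimal Python sketch of this first step follows. It assumes the disk image is available as bytes, that documents begin on a 512-byte disk-sector boundary, and that the value at 0x1E is the power-of-two sector size; `find_compound_documents` is a hypothetical name.

```python
import struct

CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def find_compound_documents(image: bytes, disk_sector: int = 512) -> list[tuple[int, int]]:
    """Return (disk_sector_number, document_sector_size) for each header found.

    Only hits that start on a disk-sector boundary are kept, on the assumption
    that files begin at the start of a sector.
    """
    hits = []
    pos = image.find(CFB_SIGNATURE)
    while pos != -1:
        if pos % disk_sector == 0 and pos + 0x20 <= len(image):
            (sector_shift,) = struct.unpack_from("<H", image, pos + 0x1E)
            hits.append((pos // disk_sector, 1 << sector_shift))  # 9 -> 512 bytes
        pos = image.find(CFB_SIGNATURE, pos + 1)
    return hits
```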

At file offset 0x2C, we find the # of sectors used by the Sector Allocation Table (SAT). Next, at file offset 0x30 we find the starting sector number (within the file) of the file’s Directory. Another important file structure is the Short-Sector Allocation Table (SSAT), whose starting sector # is located at file offset 0x3C, followed by the number of sectors making up the SSAT, located at file offset 0x40. Not all compound documents use an SSAT, in which case you can ignore these 8 bytes. Lastly, we look at the Master Sector Allocation Table (MSAT), whose starting sector # is located at file offset 0x44, followed by the number of sectors making up the MSAT, located at file offset 0x48. The following 436 bytes of data, which make up the rest of the first 512 bytes of the compound document file, start at file offset 0x4C and contain the first 109 sector IDs (SIDs) of the MSAT.
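The header fields listed above can be decoded with Python's `struct` module. This sketch assumes the common 512-byte-sector header layout and reads sector IDs as signed values, since negative SIDs are special markers (-1 for a free slot, -2 for end of chain). `parse_cfb_header` and its dictionary keys are hypothetical names.

```python
import struct


def parse_cfb_header(header: bytes) -> dict:
    """Decode the compound document header fields at the offsets given above."""
    if len(header) < 512 or header[:8] != bytes.fromhex("D0CF11E0A1B11AE1"):
        raise ValueError("not a compound document header")
    (sector_shift,) = struct.unpack_from("<H", header, 0x1E)
    num_sat_sectors, dir_start = struct.unpack_from("<Ii", header, 0x2C)
    ssat_start, num_ssat, msat_start, num_msat = struct.unpack_from("<iIiI", header, 0x3C)
    first_msat = struct.unpack_from("<109i", header, 0x4C)  # 436 bytes, ends at 0x200
    return {
        "sector_size": 1 << sector_shift,
        "num_sat_sectors": num_sat_sectors,
        "dir_start_sid": dir_start,
        "ssat_start_sid": ssat_start,  # -2 when there is no SSAT
        "num_ssat_sectors": num_ssat,
        "msat_start_sid": msat_start,
        "num_msat_sectors": num_msat,
        # The MSAT lists the SIDs of the sectors that hold the SAT; unused slots are -1.
        "sat_sector_sids": [sid for sid in first_msat if sid >= 0],
    }
```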

So, now that you know where certain items should be located, the next step is to locate them on the disk and find out whether they are at the expected sector number relative to the start of the document.

First, take the first SID listed in the MSAT (the 4-byte value at offset 0x4C), which points to the first sector of the Sector Allocation Table. Search for “01 00 00 00 02 00 00 00 03 00 00 00 04 00 00 00”, the run of consecutive sector IDs that often begins the SAT, and compare the sector number where you find it with the sector # of the start of the document plus one (for the 512-byte header) plus the 4-byte value at offset 0x4C. If there is a difference, then fragmentation occurs before the start of the SAT.
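The search-and-compare step can be sketched in Python. The sketch assumes 512-byte document sectors stored on 512-byte disk sectors, so a structure at SID n is expected at the document's starting disk sector plus 1 (for the header) plus n, and it only accepts pattern hits that begin a disk sector. `find_aligned` and `sector_gap` are hypothetical names.

```python
SAT_START_RUN = bytes.fromhex("01000000020000000300000004000000")


def find_aligned(image: bytes, pattern: bytes, from_sector: int, disk_sector: int = 512) -> list[int]:
    """Disk-sector numbers, at or after from_sector, where 'pattern' starts a sector."""
    hits = []
    pos = image.find(pattern, from_sector * disk_sector)
    while pos != -1:
        if pos % disk_sector == 0:
            hits.append(pos // disk_sector)
        pos = image.find(pattern, pos + 1)
    return hits


def sector_gap(doc_start_sector: int, sid: int, found_sector: int) -> int:
    """Sectors of foreign data before a structure that should sit at 'sid'.

    0 means the structure is exactly where the header says it should be.
    """
    expected = doc_start_sector + 1 + sid  # +1: the header fills the first sector
    return found_sector - expected
```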

Secondly, search forward from the document’s header for the beginning of the Directory. The signature for the start of the Directory is “52 00 6F 00 6F 00 74 00 20 00 45 00 6E 00 74 00 72 00 79 00” (“Root Entry” in little-endian UTF-16, matched case-sensitively). There may be leftover Directory entries from previous file edits, so look for more than one instance of “Root Entry”. Once you find the sector # of the start of the Directory, subtract the sector # of the start of the document, and compare the result against the 4-byte value at file offset 0x30 plus one, since the 512-byte header occupies the document’s first sector. If they match, no fragmentation exists between the start of the file and the Directory. If there is a difference, the difference is the amount of fragmented data that doesn’t belong to the document.

And lastly, review of the individual Directory Entries for the starting sector numbers and stream size of the objects will assist in determining where, before or after each object, any file fragmentation occurs.

The largest object within the compound document is most likely the “WordDocument” object, or the “Workbook” object for spreadsheets, which means that if fragmentation exists within a large compound document, it most likely occurs within those streams. As was mentioned earlier, the fragment itself can then be found through a process of elimination and/or manual review of the carved data, looking for a block the size of your calculated fragment that doesn’t belong to the document.

The Directory is an array of directory entries. Each directory entry is 128 bytes long, and the entries are listed in order of their appearance in the document. Each entry identifies the starting sector # of its object at directory entry offset 0x74 and the size of that object (in bytes) at offset 0x78.
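Reading the starting sector and size out of each entry is straightforward once the Directory's bytes have been carved. In this Python sketch, besides offsets 0x74 and 0x78 from the text, the name (UTF-16 at offset 0x00), the name length in bytes including its terminator (0x40), and the entry type (0x42) are taken from the compound document format; `parse_directory` and the type labels are hypothetical.

```python
import struct

ENTRY_SIZE = 128
ENTRY_TYPES = {1: "storage", 2: "stream", 5: "root"}


def parse_directory(dir_bytes: bytes) -> list[tuple[str, str, int, int]]:
    """Return (name, type, start_sid, size) for each non-empty directory entry."""
    entries = []
    for off in range(0, len(dir_bytes) - ENTRY_SIZE + 1, ENTRY_SIZE):
        entry = dir_bytes[off:off + ENTRY_SIZE]
        # 0x40: name length in bytes incl. the 2-byte terminator; 0x42: entry type
        name_len, entry_type = struct.unpack_from("<HB", entry, 0x40)
        if entry_type == 0:
            continue  # empty or unused slot
        name = entry[:max(name_len - 2, 0)].decode("utf-16-le", errors="replace")
        start_sid, size = struct.unpack_from("<iI", entry, 0x74)
        entries.append((name, ENTRY_TYPES.get(entry_type, str(entry_type)), start_sid, size))
    return entries
```

Streams smaller than the cutoff stored at header offset 0x38 (normally 4096 bytes) live in short sectors inside the root entry's stream, so only the larger streams, such as “WordDocument” or “Workbook”, map directly onto regular sectors.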


Zip Files

Articles / 2009-01-06

The first compound file format that we will look at is the Zip file, as specified in the document “APPNOTE.TXT – .ZIP File Format Specification”, revision date January 6, 2006, from PKWARE, Inc. For complete details of the file format specification, please refer to the hyperlink to the document, listed on page 1. The information described below applies to most common Zip files created with current versions of Zip archive utilities, such as WinZip.

A Zip file is broken into specific parts that can be searched for and identified by their separate signatures. The archive begins with the individual compressed files, followed by the central directory and the End of Central Directory Record.

These individual files are known as “local files”. Each one starts with a local file header with the signature “50 4B 03 04”, followed by the compressed file data and, for some files, a data descriptor, which can be identified by the signature “50 4B 07 08”. This sequence of local file header, file data and data descriptor repeats for each local file within the archive. The local file header is 30 bytes long plus the file name and extra field that follow it, and it records the compressed size of the file data (not counting the header itself) at offset 0x12, unless bit 3 of the 2-byte general purpose flag at offset 0x06 of the header is set. If this bit is set, the compressed size is instead stored in the data descriptor that immediately follows the local file’s data. In either case, it is also stored in the local file’s record in the central directory, which comes after all of the local files in the archive.
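The fields of the structure that begins with “50 4B 03 04” (called the local file header in PKWARE's APPNOTE.TXT) can be decoded with Python's `struct` module. This sketch reads only the fields used in this walkthrough; `parse_local_header` and its dictionary keys are hypothetical names, and file names are decoded as code page 437, which is an assumption for archives without the UTF-8 flag.

```python
import struct

LOCAL_HEADER_SIG = b"PK\x03\x04"  # 50 4B 03 04


def parse_local_header(data: bytes, pos: int) -> dict:
    """Decode the fixed part of a local file header found at 'pos'."""
    if data[pos:pos + 4] != LOCAL_HEADER_SIG:
        raise ValueError(f"no local file header at offset {pos}")
    (flags,) = struct.unpack_from("<H", data, pos + 0x06)
    compressed_size, _uncompressed, name_len, extra_len = struct.unpack_from("<IIHH", data, pos + 0x12)
    return {
        "uses_data_descriptor": bool(flags & 0x0008),  # bit 3: sizes follow the data
        "compressed_size": compressed_size,  # usually 0 when bit 3 is set
        "header_len": 30 + name_len + extra_len,  # fixed part + file name + extra field
        "name": data[pos + 30:pos + 30 + name_len].decode("cp437", errors="replace"),
    }
```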

The central directory at the end of each Zip archive can be identified by searching for the signature “50 4B 01 02”, which marks the beginning of each central directory record within the central directory. Lastly, the signature “50 4B 05 06” marks the “End of Central Directory Record”, which gives the size in bytes of the central directory and its starting offset relative to the beginning of the archive (the first local file header).

Upon identifying the signature “50 4B 05 06”, use the size and starting offset stored in the “End of Central Directory Record” to search backwards from the start of that signature by the correct number of bytes (directory size + starting offset), and determine whether that leaves you at the signature “50 4B 03 04”, which is the beginning of the first local file and the start of the archive.

The same check can also be performed in a forward manner: start at the first “50 4B 03 04” you find, search forward to the first “50 4B 05 06”, and compare the distance between the two with the directory size + starting offset, stored at offsets 0x0C and 0x10 respectively of the “End of Central Directory Record”.
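Both directions of this check can be sketched in Python. The sketch assumes a single-disk, non-Zip64 archive in which the End of Central Directory Record immediately follows the central directory; the function names `eocd_gap` and `archive_start_from_eocd` are hypothetical.

```python
import struct

LOCAL_HEADER_SIG = b"PK\x03\x04"
EOCD_SIG = b"PK\x05\x06"


def eocd_gap(image: bytes, first_local: int, eocd_pos: int) -> int:
    """Forward check: bytes of foreign data between the first local file and the EOCD.

    0 means the EOCD sits exactly directory size + starting offset bytes
    after the first local file header.
    """
    if image[eocd_pos:eocd_pos + 4] != EOCD_SIG:
        raise ValueError("no End of Central Directory Record at eocd_pos")
    cd_size, cd_offset = struct.unpack_from("<II", image, eocd_pos + 0x0C)
    return eocd_pos - (first_local + cd_offset + cd_size)


def archive_start_from_eocd(image: bytes, eocd_pos: int):
    """Backward check: the expected archive start, or None if no local file
    header signature is found there."""
    cd_size, cd_offset = struct.unpack_from("<II", image, eocd_pos + 0x0C)
    start = eocd_pos - (cd_size + cd_offset)
    if start >= 0 and image[start:start + 4] == LOCAL_HEADER_SIG:
        return start
    return None
```

A positive result from `eocd_gap` is the overall amount of foreign data inside the archive, which the per-file walk further on tries to account for.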

If the location of the “End of Central Directory Record” is at a further offset than your calculation, then you have a fragmented archive file. The difference between the actual location and your calculation is the size of the fragmented block of data that doesn’t belong to the archive file. The next step is determining where the fragment occurs and distinguishing between the archive data and the fragment(s) that don’t belong to the file.

To do this, we next look at the data descriptor, if present, at the end of each local file in the archive, or at the individual central directory records for each local file in the central directory. The compressed size of the local file’s data, which does not include its local file header, is located at offset 0x14 of each individual central directory record, which starts with the signature “50 4B 01 02.”
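Reading the compressed sizes out of the central directory can be sketched as follows. Besides the compressed size at 0x14, the sketch reads the name, extra field and comment lengths (0x1C, 0x1E, 0x20) to step from record to record, and the local header offset at 0x2A, all per APPNOTE.TXT; `read_central_directory` is a hypothetical name, and Zip64 records are not handled.

```python
import struct

CENTRAL_HEADER_SIG = b"PK\x01\x02"  # 50 4B 01 02


def read_central_directory(image: bytes, cd_start: int) -> list[tuple[str, int, int]]:
    """Return (name, compressed_size, local_header_offset) per central directory record."""
    records = []
    pos = cd_start
    while image[pos:pos + 4] == CENTRAL_HEADER_SIG and pos + 46 <= len(image):
        (compressed_size,) = struct.unpack_from("<I", image, pos + 0x14)
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", image, pos + 0x1C)
        (local_offset,) = struct.unpack_from("<I", image, pos + 0x2A)
        name = image[pos + 46:pos + 46 + name_len].decode("cp437", errors="replace")
        records.append((name, compressed_size, local_offset))
        pos += 46 + name_len + extra_len + comment_len  # fixed part + variable fields
    return records
```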

Once you have determined the starting point of each local file in the archive from its signature “50 4B 03 04”, and the length of each local file from either the data descriptor at the end of the local file or the length stored in its central directory record at the end of the archive, you can determine which individual local file(s) contain the fragmented portion of the archive.

Starting from the first local file header, go forward by the length of that header (30 bytes plus the file name and extra field lengths stored at offsets 0x1A and 0x1C), plus the compressed size found in either of the two locations above, plus the data descriptor if one is present; this should bring you to the start of the next local file header. If it does, the first local file is not fragmented. Continue with this method until there is a difference between the EXPECTED start of the next local file header and its ACTUAL start. The size of the difference is the amount of fragmentation that has occurred. Compare this difference with the overall difference noted earlier between the size of the archive and the location of the “End of Central Directory Record” to determine whether it accounts for all of the fragmentation in the archive or whether more fragmentation exists in another of the local files.
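The per-file walk can be sketched in Python as below. Following APPNOTE.TXT, the sketch treats the compressed size as excluding the local file header, so each step adds the 30-byte fixed header plus the file name and extra field. It also assumes data descriptors, when present, carry their signature (16 bytes in total) and that the archive is not Zip64. `local_file_gaps` is a hypothetical name, and foreign data that happens to contain a Zip signature would mislead it.

```python
import struct
from typing import Optional

LOCAL_HEADER_SIG = b"PK\x03\x04"
CENTRAL_HEADER_SIG = b"PK\x01\x02"


def local_file_gaps(image: bytes, start: int, compressed_sizes: list[int]) -> list[tuple[int, Optional[int]]]:
    """Walk local files in order and report (header_offset, gap_in_bytes).

    compressed_sizes come from the central directory, in archive order.
    A gap of 0 means the next structure is exactly where it was expected;
    None means no further structure was found.
    """
    results = []
    pos = start
    for compressed in compressed_sizes:
        if image[pos:pos + 4] != LOCAL_HEADER_SIG:
            raise ValueError(f"no local file header at offset {pos}")
        (flags,) = struct.unpack_from("<H", image, pos + 0x06)
        name_len, extra_len = struct.unpack_from("<HH", image, pos + 0x1A)
        expected = pos + 30 + name_len + extra_len + compressed
        if flags & 0x0008:
            expected += 16  # signature + CRC-32 + two 4-byte sizes
        # The next structure is another local file or the central directory.
        found = [p for p in (image.find(LOCAL_HEADER_SIG, expected),
                             image.find(CENTRAL_HEADER_SIG, expected)) if p != -1]
        if not found:
            results.append((pos, None))
            break
        actual = min(found)
        results.append((pos, actual - expected))
        pos = actual
    return results
```

Summing the non-zero gaps and comparing the total with the overall End of Central Directory discrepancy shows whether every fragment has been located.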

Once all local files in the archive that contain fragmentation are identified, and the size of each fragment is noted, review the sectors of those local files for a block of data the size of the identified fragment that doesn’t belong. How difficult this is depends on the type of the fragmented data.
