Visits from Murphy – recovering data from a failing hard disk
I’m a fan of Adobe software (Photoshop, Lightroom, etc.). I am not a fan of poorly performing software, especially when it dictates what will occur on my system instead of me directing things… In this case, what started out as a rant about software became a full-fledged visit from Murphy – a hard drive failure… My apologies to Adobe for thinking that their software was the cause of my system woes.
The scenario:
I had just completed updating a number of new images (editing meta-data for post-processing) when the behaviour of my aging XP system became somewhat erratic (Photoshop would not run an automated script that had run well just a few days earlier). I speculated that some recent software update was interfering and that a reboot would resolve things – wrong…
Stepping back a minute, my photo-image workflow sequence is something like:
1. capture images
2. copy image files from media to a live, temporary online backup on a separate system or external disk [files are removed after the DVD in step 6 is created]
3. copy from media to hard disk work folder(s) [for most non-client projects I skip step 2]
4. rename and update meta-data in image files; discard rejects
5. generate web-sized images
6. archive originals to DVD
7. format camera media (I prefer to do this here, since the files left on the media serve as both the original and a backup until the format is done)
8. archive on disk (space permitting)
Resuming the saga:
On reboot the system was extremely sluggish, and I blamed the poor performance on the updates from Adobe (a message was displayed for Adobe Reader updates) as well as my firewall software (which had also been updated). [Everything at once, right!] After a seemingly endless amount of time the system completed the boot sequence and I attempted to access my image folders. Opening ‘My Computer’ took, well, another long period; and then, to my surprise, three data drives (logical partitions) were shown as ‘un-formatted’. Uh-oh.
This particular system had three hard drives (two 160 GB drives and one 200 GB drive). Each drive was partitioned for specific uses as well as ‘replication’. The system was also configured for multi-OS booting (XP and Linux in this case). Since the two 160 GB drives were from the same year, I decided to install one new 500 GB drive and to remove the ‘good’ 160 GB drive and keep it as a backup/spare disk. I just happened to have a ‘rescue’ drive (40 GB with XP already installed), so my replacement sequence was:
- disconnect all drives
- install rescue drive and update to latest XP base (security patches, etc.)
- install the disk tool software from the vendor of the 500 GB (~465 GB free!) drive
- install new 500 GB drive
- use the vendor disk tools to create a new, bootable copy of the 40 GB drive (clone my boot drive onto the new 500 GB drive and re-size at the same time)
- shutdown, disconnect ‘rescue’ disk, set 500 GB disk as ‘boot disk’ and power up
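Much of the gap between the 500 GB on the label and the ~465 GB shown as free is a units effect: drive vendors count decimal gigabytes (10^9 bytes), while XP reports binary units (2^30 bytes) under the same ‘GB’ label. A minimal Python sketch of the conversion (the function name is my own, not taken from any tool):

```python
def gb_to_gib(decimal_gb: float) -> float:
    """Convert vendor gigabytes (10**9 bytes) to binary gibibytes (2**30 bytes)."""
    return decimal_gb * 10**9 / 2**30


# The three nominal drive sizes that appear in this post.
for size in (160, 500, 640):
    print(f"{size} GB is roughly {gb_to_gib(size):.1f} GiB")
```

The exact figure on a real drive will differ a little, since the true sector count is rarely a round number and the file system takes some space for itself.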
Now I have a working system on the new disk – can I get the data back from the ‘lost drive’? [Note: with appropriate backups the next efforts should not be needed; just restore from backups after replacing the failing disk. I am venturing forth simply to re-explore some of the (mostly) open-source solutions available at this time.]
—
A few days pass as I attend to other projects… I had previously used a Linux tool called ‘testdisk’; testdisk may be able to save a disk image, which you can then copy to a new disk. After attempting to work with the disk on the original system, I removed it and used an external USB disk housing to continue the recovery process. This both provides the flexibility of working on different (in my case newer) hardware (my Quad system) and gets the disk into a cooler environment outside of the tower case. [The temperatures inside PC cases can be quite high and can lead to failure of components, including disk drives...] My data rescue steps will be something like:
- move failing drive to external, USB housing
- attach to newer hardware (running Fedora)
- create a partition to store the disk image(s) from the failing disk
- use ‘testdisk’ to create disk images for selected partitions
- replicate these images to the new, 500GB disk
- continue previous system use (hopefully)
NOTE: use of tools like testdisk can lead to data loss; this should only be attempted by experienced SAs who have a ‘plan B’ at the ready. Caution is always advised when using such tools…
Output from the testdisk utility:
TestDisk 6.10, Data Recovery Utility, July 2008
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org
TestDisk is free software, and
comes with ABSOLUTELY NO WARRANTY.
Select a media (use Arrow keys, then press Enter):
Disk /dev/sda – 640 GB / 596 GiB – ATA WDC WD6400AAKS-2
Disk /dev/sdb – 500 GB / 465 GiB – ATA MAXTOR STM350032
Disk /dev/sdg – 160 GB / 149 GiB – Initio ST3160023A
[Proceed ] [ Quit ]
Note: Disk capacity must be correctly detected for a successful recovery.
If a disk listed above has incorrect size, check HD jumper settings, BIOS
detection, and install the latest OS patches and disk drivers.
I select the 160 GB drive (I started out using an OLD USB housing that did not support drives larger than 127 GB so I had to change to another external USB disk housing – note the warning above…)
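A likely explanation for a limit of that size, though this is an assumption and not something I verified on that housing, is a USB bridge that only supports 28-bit LBA addressing. With 512-byte sectors, 28 bits of sector address reach 128 GiB (about 137 decimal GB), and different tools label that barrier slightly differently. A small sketch of the arithmetic, with hypothetical function names:

```python
SECTOR_BYTES = 512  # assumed sector size for drives of this era


def max_addressable_bytes(lba_bits: int, sector_bytes: int = SECTOR_BYTES) -> int:
    """Largest capacity reachable with the given number of LBA address bits."""
    return (2 ** lba_bits) * sector_bytes


def drive_fits(drive_bytes: int, lba_bits: int) -> bool:
    """True if every sector of the drive can be addressed."""
    return drive_bytes <= max_addressable_bytes(lba_bits)


limit = max_addressable_bytes(28)
print(f"28-bit LBA limit: {limit} bytes = {limit / 2**30:.0f} GiB")
print("160 GB drive fully addressable with 28 bits:", drive_fits(160 * 10**9, 28))
print("160 GB drive fully addressable with 48 bits:", drive_fits(160 * 10**9, 48))
```

If the size testdisk reports for a disk is close to this limit and not to the size on the label, the housing or controller is the first thing to suspect.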
Disk /dev/sdg – 160 GB / 149 GiB – Initio ST3160023A
Please select the partition table type, press Enter when done.
[Intel ] Intel/PC partition
[EFI GPT] EFI GPT partition map (Mac i386, some x86_64…)
[Mac ] Apple partition map
[None ] Non partitioned media
[Sun ] Sun Solaris partition
[XBox ] XBox partition
[Return ] Return to disk selection
Note: Do NOT select ‘None’ for media with only a single partition. It’s very
rare for a drive to be ‘Non-partitioned’.
I select Intel.
Disk /dev/sdg – 160 GB / 149 GiB – CHS 19457 255 63
[ Analyse ] Analyse current partition structure and search for lost partitions
[ Advanced ] Filesystem Utils
[ Geometry ] Change disk geometry
[ Options ] Modify options
[ MBR Code ] Write TestDisk MBR code to first sector
[ Delete ] Delete all data in the partition table
[ Quit ] Return to disk selection
Note: Correct disk geometry is required for a successful recovery. ‘Analyse’
process may give some warnings if it thinks the logical geometry is mismatched.
I select Advanced.
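The CHS geometry in the header line can be sanity-checked against the reported capacity: cylinders × heads × sectors per track gives the sector count, and each sector is assumed here to hold 512 bytes. A quick sketch (the helper name is mine):

```python
def chs_capacity_bytes(cylinders: int, heads: int, sectors_per_track: int,
                       sector_bytes: int = 512) -> int:
    """Capacity implied by a logical CHS geometry (512-byte sectors assumed)."""
    return cylinders * heads * sectors_per_track * sector_bytes


# Geometry reported by testdisk for /dev/sdg: CHS 19457 255 63
capacity = chs_capacity_bytes(19457, 255, 63)
print(capacity, "bytes")
print(f"{capacity / 10**9:.1f} GB / {capacity / 2**30:.1f} GiB")
```

A result far from the size on the drive label would point to the kind of detection problem the testdisk notes warn about.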
Disk /dev/sdg – 160 GB / 149 GiB – CHS 19457 255 63
Partition Start End Size in sectors
1 * HPFS – NTFS 0 1 1 3186 254 63 51199092 [BD_01_E]
2 P HPFS – NTFS 3187 0 1 6373 254 63 51199155 [Dsk_1-1_F]
3 E extended LBA 6374 0 1 19456 254 63 210178395
5 L HPFS – NTFS 6374 1 1 17336 254 63 176120532 [Dsk1-85GB_H]
X extended 17362 0 1 19146 254 63 28676025
6 L Linux 17362 1 1 19146 254 63 28675962 [/]
X extended 19338 0 1 19456 254 63 1911735
7 L FAT32 19338 1 1 19456 254 63 1911672
[ Type ] [ Boot ] [Image Creation] [ Quit ]
Create an image
I select the first partition.
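The Start and End columns in that table are cylinder/head/sector triples, and the ‘Size in sectors’ column follows from them once each triple is converted to a linear sector number. A sketch using the standard CHS-to-LBA formula with the 255-head, 63-sector geometry shown above (function names are my own):

```python
HEADS = 255              # heads per cylinder, from the reported geometry
SECTORS_PER_TRACK = 63   # sectors per track, from the reported geometry


def chs_to_lba(cylinder: int, head: int, sector: int) -> int:
    """Convert a CHS address to a zero-based linear sector number."""
    # CHS sector numbers start at 1, hence the "- 1".
    return (cylinder * HEADS + head) * SECTORS_PER_TRACK + (sector - 1)


def partition_sectors(start: tuple, end: tuple) -> int:
    """Number of sectors between two CHS addresses, both ends included."""
    return chs_to_lba(*end) - chs_to_lba(*start) + 1


# First NTFS partition in the table: 0 1 1 -> 3186 254 63
print(partition_sectors((0, 1, 1), (3186, 254, 63)))
```

When the computed sizes agree with the table, the partition entries are at least internally consistent with the geometry.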
Do you want to save disk file image.dd in /mnt/125_GB_ext3 ? [Y/N]
To select another directory, use the arrow keys.
drwxr-xr-x 0 0 4096 28-Apr-2009 12:38 .
drwxr-xr-x 0 0 4096 9-Jan-2009 19:00 ..
drwx------ 0 0 16384 29-Sep-2008 22:42 lost+found
-rw-r--r-- 0 0 51 28-Apr-2009 12:54 testdisk.log
I enter ‘Y’ and the process proceeds. I had already changed into the desired folder/disk location (make sure you have enough free space!). The time required depends on hardware performance and on whether any disk problems are encountered while the disk is being read.
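The free-space check can be done before starting. Assuming the image is a raw copy of the whole partition, its size is the sector count from the partition table times 512 bytes. A sketch; the reserve value and the function names are arbitrary choices of mine:

```python
import shutil

SECTOR_BYTES = 512  # assumed sector size


def image_bytes(sectors: int) -> int:
    """Size of a raw image holding the given number of sectors."""
    return sectors * SECTOR_BYTES


def has_room(needed_bytes: int, free_bytes: int, reserve_bytes: int = 2**30) -> bool:
    """True if the image fits while leaving reserve_bytes untouched."""
    return free_bytes - reserve_bytes >= needed_bytes


def target_has_room(path: str, sectors: int) -> bool:
    """Check the file system that holds `path` against a partition's size."""
    return has_room(image_bytes(sectors), shutil.disk_usage(path).free)


# First partition from the table above: 51199092 sectors
print(image_bytes(51199092), "bytes needed")
```

For the run shown here the call would be `target_has_room("/mnt/125_GB_ext3", 51199092)`.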
Disk /dev/sdg – 160 GB / 149 GiB – Initio ST3160023A
1 * HPFS – NTFS 0 1 1 3186 254 63 51199092 [BD_01_E]
0 % >
Stop
I renamed the file from ‘image.dd’ to ‘BD_01_E.dd’ (testdisk uses the same output name each time, so you will need to rename each image or change to a new location for each rescue attempt…). The steps above are repeated for each disk partition that you wish to save as a disk image (in this case, three images will be saved). The images created for the small partitions were usable immediately. The image of the larger partition, however, was a problem. When I attempt to mount it I get:
mount -t ntfs-3g image.dd tmp_DRIVE (mount the file ‘image.dd’ for access via the folder ‘tmp_DRIVE’)
Record 6 has no FILE magic (0xe5da6857)
Failed to open inode FILE_Bitmap: Input/output error
Failed to mount ‘/media/test-data/recover_disk/image.dd’: Input/output error
NTFS is either inconsistent, or you have hardware faults, or you have a
SoftRAID/FakeRAID hardware. In the first case run chkdsk /f on Windows
then reboot into Windows TWICE. The usage of the /f parameter is very
important! If you have SoftRAID/FakeRAID then first you must activate
it and mount a different device under the /dev/mapper/ directory, (e.g.
/dev/mapper/nvidia_eahaabcc1). Please see the ‘dmraid’ documentation
for the details.
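The first line of that output says that record 6 of the NTFS master file table (MFT) does not begin with the four-byte ‘FILE’ signature that initialised MFT records carry. The same check can be run directly against an image file. The sketch below relies on the commonly documented NTFS boot sector layout (bytes per sector at offset 0x0B, sectors per cluster at 0x0D, MFT start cluster at 0x30, record size at 0x40) and assumes that the first MFT records are stored contiguously; the function names are hypothetical.

```python
import struct


def mft_layout(boot_sector: bytes):
    """Return (byte offset of the MFT, MFT record size) from an NTFS boot sector."""
    if boot_sector[3:11] != b"NTFS    ":
        raise ValueError("no NTFS signature in boot sector")
    (bytes_per_sector,) = struct.unpack_from("<H", boot_sector, 0x0B)
    sectors_per_cluster = boot_sector[0x0D]
    (mft_cluster,) = struct.unpack_from("<Q", boot_sector, 0x30)
    (raw_size,) = struct.unpack_from("<b", boot_sector, 0x40)
    cluster_bytes = bytes_per_sector * sectors_per_cluster
    # A negative value encodes the record size as a power of two (-10 -> 1024 bytes).
    record_bytes = (1 << -raw_size) if raw_size < 0 else raw_size * cluster_bytes
    return mft_cluster * cluster_bytes, record_bytes


def bad_records(image, count: int = 16):
    """Yield (record number, first four bytes) for records lacking the FILE magic."""
    image.seek(0)
    mft_offset, record_bytes = mft_layout(image.read(512))
    for number in range(count):
        image.seek(mft_offset + number * record_bytes)
        magic = image.read(4)
        if magic != b"FILE":
            yield number, magic
```

Opening the image with `open("image.dd", "rb")` and looping over `bad_records(...)` lists the damaged system records. Records beyond the first sixteen may legitimately be unused, which is why the default count is kept small.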
Using a Samba share and dd, I copy the image file to a new partition on the 500 GB drive.
dd bs=131072 if=recovered_image_file of=/dev/new_partition
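For readers less familiar with dd, the copy it performs can be sketched in Python: read the source in 131072-byte (128 KiB) blocks, write each block to the target, then compare checksums over the copied length. This only illustrates the loop and is not a replacement for dd; the paths passed in are placeholders, just as in the command above.

```python
import hashlib

BLOCK = 131072  # 128 KiB, the same block size as bs=131072


def copy_blocks(src_path: str, dst_path: str, block: int = BLOCK) -> int:
    """Copy src to dst block by block and return the number of bytes copied."""
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(block)
            if not chunk:
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied


def sha256_prefix(path: str, length: int, block: int = BLOCK) -> str:
    """SHA-256 of the first `length` bytes of a file or device."""
    digest = hashlib.sha256()
    remaining = length
    with open(path, "rb") as handle:
        while remaining > 0:
            chunk = handle.read(min(block, remaining))
            if not chunk:
                break
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()
```

The checksum is limited to a prefix because the target partition can be larger than the image; comparing `sha256_prefix(image, n)` with `sha256_prefix(partition, n)`, where `n` is the value returned by `copy_blocks`, confirms that the copied bytes match.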
When this completes I reboot into XP, and XP runs an automated disk check (which reports errors for over a thousand files). When the XP boot process completes I find that the disk is missing its main folder structure… I run the XP ‘error disk check’ process on the drive.
Hmm, the displayed folders show usage of only ~500 MB but the entire drive shows ~60 GB used. Where are the files? I reboot into Linux, mount the drive and find the ‘found.000’ folder. OK, reboot to XP and connect to the drive – I can’t ‘see’ the ‘found.000’ folder. Diving into a cmd shell, I connect to the drive and use ‘dir /a’ to see the folder. I attempt to change the attributes and remove the ‘SH’ (system, hidden) settings but that does not work. Hmm.
Next I simply use the full path on the explorer bar and gain access to the folder, i.e. D:\found.000. At this point I proceed to move the folders into new locations (some manual restoration to put things back where they should be on this disk.) As best I can tell, only one file was lost during this crash/rescue. I successfully restored the three disk image files to new partitions on the new 500GB drive. After I installed each partition I re-booted into XP which automatically ran ‘chkdsk’ on the new partition and restored ‘lost’ files.
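A mismatch like ~500 MB of visible folders against ~60 GB used is quicker to pin down with a per-folder tally taken from the Linux side, where the hidden and system attributes do not hide anything. A sketch, assuming the partition is mounted somewhere readable (the mount point you pass in is up to you, and the function names are mine):

```python
import os


def tree_bytes(root: str) -> int:
    """Total size of all files below root; unreadable entries are skipped."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # damaged entry: leave it out of the tally
    return total


def top_level_usage(mount_point: str) -> dict:
    """Map each top-level folder to its size in bytes, largest first."""
    usage = {}
    for entry in os.scandir(mount_point):
        if entry.is_dir(follow_symlinks=False):
            usage[entry.name] = tree_bytes(entry.path)
    return dict(sorted(usage.items(), key=lambda item: item[1], reverse=True))
```

On a volume in this state, a folder such as ‘found.000’ would be expected at the top of the result.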
In addition to testdisk, some other Linux tools that are useful for disk/data recovery include:
- ddrescue and
- dd_rescue (both provide rescue-type copy operations)
- ntfsclone --rescue (ntfsclone can also be used to clone working drives)
- photorec (recover image/photo files)
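The idea the rescue-style copiers share is to keep going past unreadable areas instead of aborting. A deliberately simplified Python sketch of that idea: read fixed-size blocks, and when a read fails or comes back short, fill the gap with zeros so that later data keeps its offset. The real tools do considerably more (ddrescue, for example, keeps a map of bad areas and can retry them later), so treat this purely as an illustration; the names are hypothetical.

```python
def rescue_copy(src, dst, total_bytes: int, block: int = 4096) -> list:
    """Copy total_bytes from src to dst (binary file objects), zero-filling failures.

    Returns the byte offsets of the blocks that could not be read in full.
    """
    bad_offsets = []
    for offset in range(0, total_bytes, block):
        size = min(block, total_bytes - offset)
        try:
            src.seek(offset)
            data = src.read(size)
        except OSError:
            data = b""  # nothing usable came back for this block
        if len(data) < size:
            bad_offsets.append(offset)
            data = data.ljust(size, b"\0")  # pad so offsets stay aligned
        dst.seek(offset)
        dst.write(data)
    return bad_offsets
```

A smaller block size loses less data around each bad spot but makes more read attempts against a drive that is already struggling, which is one reason the dedicated tools are the better choice for real recoveries.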
As mentioned in other posts, you should not attempt these types of operations without a secondary plan (e.g. hiring a service to attempt the restoration). Note that the time required for this type of process can be lengthy, since the variables include the performance of your CPU, RAM, disks and subsystems – I started many of the described actions and continued on with other tasks while the steps completed. Counting the time gaps between starting and completing each step, this took about eight hours; without any gaps I estimate that it could have been completed in about four hours. In general, do your recovery on as new a system/OS as possible.
The final message – how current are your backups?