(copied with permission from dolljunkie)

The better your 4x4, the further out you get stuck...

A tale of stupidity compounded

When working on the TRIUMF AG documentation, I wanted to make a small change to all the files to fix a syntax warning from tidy. So I wrote a quick shell script, similar to ones I've used on countless occasions. However, I wrote
#!/bin/sh
for f in *.html ; do
sed 's/rel=/type=text/css rel=/' $f>xxx ; mv xxx $f
done
instead of
for f in *.html ; do
sed 's%rel=%type=text/css rel=%' $f>xxx && mv xxx $f
done
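which, because the "/" in "text/css" ended the sed expression early and caused a syntax error, left xxx empty on every pass; the unconditional mv then clobbered all the files.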
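
A minimal sketch of a more defensive version of the same edit, for illustration only. The function name safe_sed and the .bak suffix are hypothetical, and mktemp is assumed to be available (it is common but not part of POSIX). The original is replaced only if sed exits successfully and produces non-empty output, and a copy of the original is kept first.

```shell
#!/bin/sh
# Hypothetical helper: apply one sed script to one file without risking the original.
safe_sed() {
    script=$1
    file=$2
    # Temporary file next to the original, so the final mv stays on one filesystem.
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    # Replace the original only if sed succeeded AND wrote something.
    if sed "$script" "$file" > "$tmp" && [ -s "$tmp" ]; then
        # Keep a copy of the original before overwriting it.
        cp -p "$file" "${file}.bak" && mv "$tmp" "$file"
    else
        rm -f "$tmp"
        echo "skipped $file: sed failed or produced no output" >&2
        return 1
    fi
}
```

It could be called in a loop such as: for f in *.html ; do safe_sed 's%rel=%type=text/css rel=%' "$f" ; done. With the broken 's/rel=/type=text/css rel=/' expression, every file would be skipped and left untouched.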

The files hadn't been backed up recently (a rather protracted system upgrade had "temporarily" removed the second disk), but I keep a record of filesystem inodes in case of accidental deletion. So I tried restoring a set of consecutive inodes to a temporary directory using debugfs, and recovered a few files. I then deleted the other recovered files and directories, using "rm -r". This was the second stupid mistake (third counting the lack of backups). I should have used debugfs to unlink them, and realized that recursively deleting a hard-linked directory called "930453" would delete useful files, not just copies.
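
One way such a record of inodes might be kept is sketched below. This is an assumption about the general approach, not the script actually used here; the function name record_inodes is hypothetical. It writes one line per regular file, with the inode number followed by the path.

```shell
#!/bin/sh
# Hypothetical helper: list "inode path" for every regular file under a directory.
record_inodes() {
    dir=$1
    out=$2
    # ls -i prints the inode number before each name;
    # -d makes ls list the given paths themselves rather than descending.
    find "$dir" -type f -exec ls -id {} + > "$out"
}
```

The output file is best kept on a different disk or machine from the files it describes, for the same reason a backup would be.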

It seems that because I had overwritten the HTML files rather than deleting them, the first inode of each file was missing. The shotgun approach of linking consecutive inodes may have given access to partly-overwritten documents, but it also linked some unrelated directories as well as the photo archive directory. The subsequent delete operation zapped the image files, so I was worse off than ever.

However, having the inode numbers of the original image files, I was able to re-link them and recover most of the images (less a couple whose inodes got allocated to something else and overwritten).

Some of the HTML files were backed up on another computer, but several months before. I also had paper copies of many of them. I was able to recover 15 from my browser cache at home (turn off timestamp checking, browse the site and save the pages as they appear on the screen). An attempt to recover pages from a browser cache at TRIUMF (which had been used to print the documents) was less successful. As the browser was still running, I killed it in order to get access to the disk cache from a new session at home. It is possible that some pages were in RAM cache, and if I'd been less hasty I might have recovered them using the console keyboard.

When checking the server logs to see if anyone else at TRIUMF might have cached copies, I found that Google had been crawling the site. The next morning I realized that I might be able to recover pages from Google's archive. Sure enough, a search of "site:andrew.triumf.ca AG" turned up 27 pages which I could recover by clicking "cached copy", saving the page, then stripping off the top 8 lines (plus removing Google's highlighting of "ag").
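
The line-stripping step can be sketched as follows. The function name strip_cache_header is hypothetical; the count of 8 lines comes from the description above and would need checking against each saved page. Removing the search-term highlighting is not covered, since the exact markup is not recorded here.

```shell
#!/bin/sh
# Hypothetical helper: print a saved cached page without its first 8 lines.
strip_cache_header() {
    # sed '1,8d' deletes lines 1 through 8 and passes the rest through unchanged.
    sed '1,8d' "$1"
}
```

The result goes to standard output, so it would be redirected to a new file rather than back onto the saved copy.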

Anyhow, it is all back in place again (and backed up on another machine).


TRIUMF VideoConferencing

A.Daviel