Catching the Ghost of a File

Boris Loza
Jun 6, 2021

Boris Loza, PhD, CISSP

In Takedown, the book by Tsutomu Shimomura and John Markoff about Shimomura's pursuit of Kevin Mitnick, you can read: “… I could make out patterns of information still stored on my computer’s disk that revealed the ghost of a file that had been created and then erased. Finding it was a little like examining a piece of paper on a yellow legal pad: even though the top page had been torn off, the impression of what was written on the missing sheet can be discerned on the remaining page.”

How can one find deleted files?

Elementary, Watson! In the UNIX world, everything is a file: UNIX treats hard disks, printers, modems, directories, and so on as files. When a file is created, it is assigned an inode, a data structure identified by a number (the inode number). When a file is deleted, the inode number recorded in its directory entry becomes 0. But the file does not vanish at once: information about it remains on the disk.

Listing Deleted Files

Because a directory is also a file, we can look inside it with the same commands we use for any other file: od and cat.

We will start with an octal dump of the directory (made with the od command) and look for deleted files. Before that, let’s list all files in the directory; this will help us better understand what we see later (all outputs are taken from Solaris 9):

$ ls -a

. .. Project status webstat.log

We can see that there are the following files in this directory: Project, status, webstat.log and also the directory itself (.) and the parent directory (..). Let’s try the following:

$ od -c .

0000000 \0 \b 023 337 \0 \f \0 001 . \0 \0 \0 \0 013 006 P

0000020 \0 \f \0 002 . . \0 \0 \0 \b 023 340 \0 020 \0 007

0000040 P r o j e c t \0 \0 \b 023 341 \0 024 \0 013

0000060 w e b s t a t . l o g \0 \0 \b 023 342

0000100 001 304 \0 006 s t a t u s \0 \0 \0 \0 \0 \0

0000120 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0

*

0001000

$

With the -c option, od displays single-byte characters. Certain non-graphic characters appear as C-language escapes:

null \0

backspace \b

form-feed \f

new-line \n

return \r

tab \t

Others appear as 3-digit octal numbers.

According to the od man page, each output line starts with the offset of its first byte from the start of the file, shown in octal. The first line starts at byte 0, the second at byte 20 octal (16 decimal), the third at byte 40 octal (32 decimal), and so on.
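To make the octal offsets concrete, here is a minimal C sketch that prints a buffer in a layout modeled on od -c: the offset in octal, 16 bytes per line, C escapes for a few control characters, printable ASCII as itself, and anything else as three octal digits, with a run of identical lines collapsed into a single *. The function name od_c_dump is hypothetical, and the column spacing only approximates the real od.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Format one byte into a 4-character field, approximating od -c. */
static void od_c_field(unsigned char b, char field[8])
{
    const char *esc = NULL;
    switch (b) {
    case '\0': esc = "\\0"; break;
    case '\b': esc = "\\b"; break;
    case '\f': esc = "\\f"; break;
    case '\n': esc = "\\n"; break;
    case '\r': esc = "\\r"; break;
    case '\t': esc = "\\t"; break;
    default:   break;
    }
    if (esc != NULL)
        snprintf(field, 8, "%4s", esc);            /* e.g. "  \0" */
    else if (b < 0x80 && isprint(b))
        snprintf(field, 8, "%4c", b);              /* e.g. "   P" */
    else
        snprintf(field, 8, " %03o", (unsigned)b);  /* e.g. " 337" */
}

/* Dump len bytes of buf to out: octal offset, then up to 16 fields.
   A line identical to the previous one is replaced by a single "*". */
void od_c_dump(const unsigned char *buf, size_t len, FILE *out)
{
    int starred = 0;
    assert(out != NULL);
    for (size_t off = 0; off < len; off += 16) {
        size_t n = len - off < 16 ? len - off : 16;
        if (off >= 16 && n == 16 && memcmp(buf + off, buf + off - 16, 16) == 0) {
            if (!starred)
                fputs("*\n", out);
            starred = 1;
            continue;
        }
        starred = 0;
        fprintf(out, "%07zo", off);                /* offset in octal */
        for (size_t i = 0; i < n; i++) {
            char field[8];
            od_c_field(buf[off + i], field);
            fputs(field, out);
        }
        fputc('\n', out);
    }
    fprintf(out, "%07zo\n", len);                  /* total size, also octal */
}
```

Calling od_c_dump(buf, n, stdout) on a 48-byte buffer labels its lines 0000000, 0000020 and 0000040, and ends with 0000060: offsets 0, 16, 32 and 48 in decimal.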

Our file names are easy to spot in this output. Now let’s delete the status file and compare the new od -c output with the previous one.

$ rm status

$ ls -a

. .. Project webstat.log

$ od -c .

0000000 \0 \b 023 337 \0 \f \0 001 . \0 \0 \0 \0 013 006 P

0000020 \0 \f \0 002 . . \0 \0 \0 \b 023 340 \0 020 \0 007

0000040 P r o j e c t \0 \0 \b 023 341 001 330 \0 013

0000060 w e b s t a t . l o g \0 \0 \0 \0 \0

0000100 001 304 \0 006 s t a t u s \0 \0 \0 \0 \0 \0

0000120 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0

*

0001000

Note that we can still see the status file. There are two differences. First, instead of \b 023 342 at the end of line 0000060, Solaris UFS put NULs. Second, on line 0000040, the two bytes after \b 023 341 changed from \0 024 to 001 330. We will see what both changes mean below. In general, a deleted file cannot be reliably restored without a backup. However, if you need to recover a deleted text file and you don’t have a backup, you can still try your luck (I will show you how to recover a deleted text file in the next article).

The cat -v -t -e command turns non-printable characters into a printable form.

A directory usually has some long lines, so it’s a good idea to pipe cat’s output through fold:

$ cat -v -t -e . | fold -62

^@^H^SM-_^@^L^@^A.^@^@^@^@^K^FP^@^L^@^B..^@^@^@^H^SM-`^@^P^@^G

Project^@^@^H^SM-a^AM-X^@^Kwebstat.log^@^@^@^@^@^AM-D^@^Fstatu

s^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^

@^@^@^@^@$

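You may also want to filter out the non-printable characters and keep only the names. Filtering the output of cat -v or od -c does not help much, because those commands have already turned every byte into printable text; it is simpler to filter the raw bytes. With a POSIX-conforming tr, for example, every run of non-printable bytes can be turned into a single newline (or use your own method):

$ LC_ALL=C tr -cs '[:print:]' '[\n*]' < .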
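The same filtering idea can be sketched in C, in the spirit of strings(1): print every run of at least min_len printable ASCII bytes on its own line. The function name print_runs is hypothetical. With min_len set to 1 the name "." survives, at the cost of stray single characters such as the P that is really part of the parent directory's inode number; a larger min_len drops both.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Print each run of at least min_len printable ASCII bytes (040..0176)
   in buf on its own line; returns how many runs qualified.
   Pass out == NULL to count without printing. */
size_t print_runs(const unsigned char *buf, size_t len, size_t min_len, FILE *out)
{
    size_t runs = 0, start = 0;
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i <= len; i++) {
        if (i < len && buf[i] >= 040 && buf[i] <= 0176)
            continue;                     /* still inside a printable run */
        size_t n = i - start;             /* the run is buf[start .. i-1] */
        if (n > 0 && n >= min_len) {
            if (out != NULL) {
                fwrite(buf + start, 1, n, out);
                fputc('\n', out);
            }
            runs++;
        }
        start = i + 1;                    /* the next run starts after this byte */
    }
    return runs;
}
```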

Understanding the Output

To understand these outputs better, let’s take a look at the UNIX File System (UFS) directory entry structure, which can be found in /usr/include/sys/ufs_fsdir.h (taken from Solaris):

struct direct {

uint32_t d_ino; /* inode number of entry */

u_short d_reclen; /* length of this record */

u_short d_namlen; /* length of string in d_name */

char d_name[MAXNAMLEN + 1];/* name must be no longer than this */

};
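The layout above can be turned into a small parser. This C sketch walks one directory block record by record, using d_reclen to step from one entry to the next and skipping records whose d_ino is 0, much as a directory reader would. Assumptions: the block is 512 bytes, the fields are big-endian (which is what the od -c dump implies), and article_dir is a hypothetical byte-for-byte reconstruction of the od -c dump shown earlier, before status was removed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DIRBLKSIZ 512   /* assumed size of one directory block */
#define MAXNAMLEN 255

/* The directory block from the od -c output, before "rm status". */
static const unsigned char article_dir[DIRBLKSIZ] = {
    0, 010, 023, 0337,  0, 014,   0, 1,   '.', 0, 0, 0,
    0, 013, 006, 'P',   0, 014,   0, 2,   '.', '.', 0, 0,
    0, 010, 023, 0340,  0, 020,   0, 7,   'P', 'r', 'o', 'j', 'e', 'c', 't', 0,
    0, 010, 023, 0341,  0, 024,   0, 013, 'w', 'e', 'b', 's', 't', 'a', 't', '.', 'l', 'o', 'g', 0,
    0, 010, 023, 0342,  1, 0304,  0, 6,   's', 't', 'a', 't', 'u', 's', 0, 0,
};   /* the remaining bytes are zero */

struct entry {
    uint32_t ino;                  /* d_ino */
    uint16_t reclen;               /* d_reclen */
    size_t off;                    /* where the record starts in the block */
    char name[MAXNAMLEN + 1];      /* d_name, NUL-terminated */
};

static uint32_t get_be32(const unsigned char *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 | (uint32_t)p[2] << 8 | p[3];
}

static uint16_t get_be16(const unsigned char *p)
{
    return (uint16_t)(p[0] << 8 | p[1]);
}

/* Collect up to max live entries from one block; returns how many. */
size_t walk_block(const unsigned char *blk, struct entry *out, size_t max)
{
    size_t n = 0;
    assert(blk != NULL && (out != NULL || max == 0));
    for (size_t off = 0; off + 8 <= DIRBLKSIZ; ) {
        uint32_t ino = get_be32(blk + off);
        uint16_t reclen = get_be16(blk + off + 4);
        uint16_t namlen = get_be16(blk + off + 6);
        /* Stop on anything that cannot be a valid record. */
        if (reclen < 8 || off + reclen > DIRBLKSIZ || namlen > MAXNAMLEN
            || 8u + namlen >= reclen)
            break;
        if (ino != 0 && n < max) {
            out[n].ino = ino;
            out[n].reclen = reclen;
            out[n].off = off;
            memcpy(out[n].name, blk + off + 8, namlen);
            out[n].name[namlen] = '\0';
            n++;
        }
        off += reclen;                 /* d_reclen leads to the next record */
    }
    return n;
}
```

Run against the block as it looked after rm status, the same loop would stop after four entries, because webstat.log's record then reaches the end of the block.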

Now we can better understand the od -c output. Let’s analyze the output displayed earlier, before status was deleted:

$ od -c .

0000000 \0 \b 023 337 \0 \f \0 001 . \0 \0 \0 \0 013 006 P

0000020 \0 \f \0 002 . . \0 \0 \0 \b 023 340 \0 020 \0 007

0000040 P r o j e c t \0 \0 \b 023 341 \0 024 \0 013

0000060 w e b s t a t . l o g \0 \0 \b 023 342

0000100 001 304 \0 006 s t a t u s \0 \0 \0 \0 \0 \0

0000120 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0

*

0001000

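As you can see from this output, and based on what we learned about the UFS directory structure, one can recognize the file names stored in d_name[MAXNAMLEN + 1]: ., .., Project, webstat.log, and status. Before each file name is d_namlen, the length of the name: 001 for ., 002 for .., 007 for Project, 013 (11 decimal) for webstat.log, and 006 for status. The two bytes before d_namlen are d_reclen, the length of the whole record: \0 \f (12) for . and .., \0 020 (16) for Project, \0 024 (20) for webstat.log, and 001 304 (452) for status, the last entry, whose record extends to the end of the 512-byte block.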

What can also be recognized in this output is the four-byte inode number of each entry, d_ino: \0 \b 023 337 for ., \0 013 006 P for .. (P is the byte 120 octal, which happens to be printable), \0 \b 023 340 for Project, \0 \b 023 341 for webstat.log, and \0 \b 023 342 for status (\b is 010 octal). All of them except the parent directory (..) share \0 \b 023, followed by 337, 340, 341, and 342. (Note that in octal, 340, which is 224 in decimal, comes directly after 337, which is 223 in decimal.) These files were created one after another, and if we look at their inode numbers, we find that they follow each other (except, of course, the parent directory ..):

$ ls -ia

529375 . 529376 Project 529377 webstat.log

722512 .. 529378 status
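The inode numbers in the dump can be checked by hand. Reading the four d_ino bytes most significant first, \0 \b 023 337 is 8 × 65536 + 19 × 256 + 223 = 524288 + 4864 + 223 = 529375, exactly what ls -ia reports for the directory. A minimal C sketch of the same conversion; it assumes big-endian byte order, which is what this dump shows, and the function name ino_from_bytes is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Combine the four d_ino bytes from the dump, most significant first
   (big-endian, as in the od -c output above). Byte values are written
   in octal to match od, e.g. 023 and 0337. */
uint32_t ino_from_bytes(unsigned b0, unsigned b1, unsigned b2, unsigned b3)
{
    assert(b0 <= 0377 && b1 <= 0377 && b2 <= 0377 && b3 <= 0377);
    return (uint32_t)b0 << 24 | (uint32_t)b1 << 16 | (uint32_t)b2 << 8 | (uint32_t)b3;
}
```

For example, ino_from_bytes(0, 013, 006, 'P') gives 722512, the inode of the parent directory: the P is simply the byte 0120.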

Now it’s time to dissect the cat -v output:

$ cat -v -t -e . | fold -62

^@^H^SM-_^@^L^@^A.^@^@^@^@^K^FP^@^L^@^B..^@^@^@^H^SM-`^@^P^@^G

Project^@^@^H^SM-a^AM-X^@^Kwebstat.log^@^@^@^@^@^AM-D^@^Fstatu

s^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^

@^@^@^@^@$

You have probably noticed sequences like ^H and ^G in the cat -v output. Those are control characters. Another character like this is ^@, the NUL character (ASCII 0). There are a lot of NULs in the directory; more about that below. A DEL character (ASCII 177 octal) is shown as ^? (see an ASCII chart).

cat -v has its own notation for characters outside the ASCII range, the ones with their high bit set, also called metacharacters: it prints them as M- followed by another character. There are several of them in the cat -v output above: M-_, M-`, M-a, M-X, and M-D.

To decode a metacharacter, add 200 octal to the value of the character that follows M-. Let’s look at M-a first. The octal value of the letter a is 141, so M-a is the character 141 + 200 = 341 octal: the last byte of the inode number of webstat.log.

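You can decode M-_ in the same way. The underscore is octal 137, so M-_ stands for 137 + 200 = 337 octal, the last byte of the inode number of the directory itself (.). The rule also covers control characters: M-^? would stand for DEL (octal 177) plus 200, that is, 377 octal.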
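These decoding rules are mechanical enough to write down. The following C sketch (catv_decode and catv_decode_str are hypothetical names) turns cat -v notation back into byte values: M- adds 200 octal, ^? is 177, ^x stands for the character x minus 100 octal, and anything else is itself. The notation is ambiguous when the original data contained a literal ^ or M-, which this sketch does not try to resolve.

```c
#include <assert.h>
#include <stddef.h>

/* Decode the cat -v token at the start of s into *byte.
   Returns how many characters of s were consumed (0 at the end of
   the string or after a dangling "M-"). */
size_t catv_decode(const char *s, unsigned char *byte)
{
    size_t used = 0;
    unsigned meta = 0;
    assert(s != NULL && byte != NULL);
    if (s[0] == 'M' && s[1] == '-') {                    /* high bit set */
        meta = 0200;
        s += 2;
        used = 2;
    }
    if (s[0] == '\0')
        return 0;
    if (s[0] == '^' && s[1] == '?') {                    /* DEL */
        *byte = (unsigned char)(meta | 0177);
        return used + 2;
    }
    if (s[0] == '^' && s[1] >= 0100 && s[1] <= 0137) {   /* ^@ .. ^_ */
        *byte = (unsigned char)(meta | (unsigned)(s[1] - 0100));
        return used + 2;
    }
    *byte = (unsigned char)(meta | (unsigned char)s[0]); /* plain character */
    return used + 1;
}

/* Decode a whole cat -v string into out (at most max bytes). */
size_t catv_decode_str(const char *s, unsigned char *out, size_t max)
{
    size_t n = 0, k;
    while (n < max && (k = catv_decode(s, &out[n])) > 0) {
        s += k;
        n++;
    }
    return n;
}
```

For example, catv_decode_str("^AM-X", out, 2) yields the bytes 001 and 330 octal, which is how webstat.log's record length appears in the cat -v output.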

If a character isn’t M-something or ^something, it's a regular printable character. The entries in the directory (., .., Project, webstat.log, and status) are all made of regular ASCII characters.

Note: The status entry is still shown by the command, although the file has been deleted from the directory. Solaris UFS zeroed its four-byte inode number (the four ^@ that follow webstat.log^@, just before ^AM-D) but left the name itself in place.

The cat(1) man page describes these options as follows:

-v Non-printing characters (with the exception of tabs, new-lines and form-feeds) are printed visibly. ASCII control characters (octal 000–037) are printed as ^n, where n is the corresponding ASCII character in the range octal 100–137 (@, A, B, C, . . ., X, Y, Z, [, \, ], ^, and _); the DEL character (octal 0177) is printed ^?. Other non-printable characters are printed as M-x, where x is the ASCII character specified by the low-order seven bits.

When used with the -v option, the following options may be used:

-e A $ character will be printed at the end of each line (prior to the new-line).

-t Tabs will be printed as ^I’s and formfeeds to be printed as ^L’s.

cat has two options, -t and -e, for displaying white space in a line. Without them, -v leaves tabs and form feeds as they are, and trailing spaces are hard to spot: -t shows tabs as ^I, and -e marks the end of each line with $, so trailing spaces become visible.
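Going the other way, here is a rough C sketch of the -v and -t rules quoted from the man page (catv_encode and catv_encode_buf are hypothetical names; -e, which only appends $ before each newline, is left out). Bytes with the high bit set get M- and are then handled by their low seven bits; DEL becomes ^?; other control characters become ^ followed by the character 100 octal higher; tab and form feed are left alone unless show_tabs (standing in for -t) is set, and newlines always pass through.

```c
#include <assert.h>
#include <string.h>

/* Render byte c into out (the longest result is "M-^?" plus NUL). */
void catv_encode(unsigned char c, int show_tabs, char out[5])
{
    char *p = out;
    int meta = c >= 0200;
    if (meta) {                          /* high bit: "M-" then the low 7 bits */
        *p++ = 'M';
        *p++ = '-';
        c &= 0177;
    }
    /* Without -t, tab and form feed are kept as-is; a plain newline is
       always kept. Bytes that had the high bit set are never kept. */
    int keep = !meta && (c == '\n' || (!show_tabs && (c == '\t' || c == '\f')));
    if (c == 0177) {
        *p++ = '^';
        *p++ = '?';
    } else if (c < 040 && !keep) {
        *p++ = '^';
        *p++ = (char)(c + 0100);         /* e.g. 010 -> 'H' */
    } else {
        *p++ = (char)c;
    }
    *p = '\0';
}

/* Encode len bytes into out, which needs room for 4 * len + 1 chars. */
void catv_encode_buf(const unsigned char *buf, size_t len, int show_tabs, char *out)
{
    char *p = out;
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        char tok[5];
        catv_encode(buf[i], show_tabs, tok);
        size_t k = strlen(tok);
        memcpy(p, tok, k);
        p += k;
    }
    *p = '\0';
}
```

Encoding the first nine bytes of the directory with show_tabs set reproduces the start of the cat -v -t -e output, ^@^H^SM-_^@^L^@^A.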

Next, it’s time for od -c, which is easier to explain than cat -v:

od -c shows some characters starting with a backslash (\). It uses the standard UNIX and C abbreviations for control characters, where it can. For instance, \n stands for a new line character, \t for a tab, etc.

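The \0 is a NUL character (ASCII 0). In the UFS directory above, NULs terminate each name and pad the entry to a multiple of four bytes; they also fill the unused space in the block and, as we saw, replace the inode number of a deleted entry. (In the older UNIX Version 7 directories, which had fixed 14-character names, NULs padded the names that were shorter than 14 characters.)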
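In UFS, the record lengths in the dump follow a simple rule: 8 bytes for d_ino, d_reclen and d_namlen, then the name plus its terminating NUL, rounded up to a multiple of 4. The sketch below is modeled on the DIRSIZ macro of BSD-style UFS headers; treat the exact form as an assumption, although it reproduces every record length in the dump except that of the last entry, which stretches to the end of the block. The names dirsiz and slack are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Smallest record that can hold a name of namlen characters:
   the 8-byte fixed header (d_ino 4, d_reclen 2, d_namlen 2) plus the
   name and its NUL, rounded up to the next multiple of 4. */
size_t dirsiz(size_t namlen)
{
    return 8 + ((namlen + 1 + 3) & ~(size_t)3);
}

/* Unused bytes at the end of a record: room that an absorbed,
   deleted entry may occupy, or the tail of a block's last entry. */
size_t slack(size_t reclen, const char *name)
{
    size_t need = dirsiz(strlen(name));
    assert(reclen >= need);
    return reclen - need;
}
```

After the deletion, slack(472, "webstat.log") is 452, exactly the size of the removed status record: that is the space where its ghost still lives.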

od -c shows the octal value of other characters as three digits. For instance, 007 means “the character 7 octal”; cat -v shows it as ^G (CTRL-G).

Metacharacters, the ones with octal values 200 and above, are shown as M-something by cat -v. In od -c, you’ll see their octal values, e.g., 341.

Each directory entry on a UFS file system starts with a four-byte inode number, d_ino, which points to the file’s entry in the disk’s inode table (on the older UNIX Version 7 file system, this number took two bytes). When you type a filename, UNIX uses this number to find the actual file information on the disk. The entry for this directory (.) starts with \0 \b 023 337 (inode 529375), its parent (..) with \0 013 006 P (inode 722512), and Project’s entry with \0 \b 023 340 (inode 529376).
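The name-to-inode mapping is also visible through the portable POSIX directory interface. This minimal sketch (list_inodes is a hypothetical name) prints each entry’s d_ino and name, much like ls -ia. Note that opendir and readdir only return live entries, which is why finding a deleted name requires reading the raw directory bytes, as this article does.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Print "inode name" for every entry readdir() returns for dir.
   Returns the number of entries, or -1 if dir cannot be opened. */
long list_inodes(const char *dir, FILE *out)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;
    long n = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (out != NULL)
            fprintf(out, "%lu %s\n", (unsigned long)e->d_ino, e->d_name);
        n++;
    }
    closedir(d);
    return n;
}
```

After rm status, list_inodes no longer reports the name, even though, on a file system like the UFS example, its bytes may still sit in the directory block.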

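When entries are deleted from a directory, the space is returned to the previous entry in the same directory block by increasing its d_reclen (see struct direct from /usr/include/sys/ufs_fsdir.h above). That is exactly what happened to webstat.log: its d_reclen grew from \0 024 (20) to 001 330 (472), that is, its own 20 bytes plus the 452 bytes of the deleted status entry. The name status survives inside that enlarged record until a new entry reuses the space.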
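The d_reclen rule is what makes the ghost findable: a deleted entry hides in the slack at the end of the record before it. The sketch below scans that slack in one directory block for anything that still looks like a struct direct record. It is a heuristic, not a recovery tool, and it assumes 512-byte blocks, big-endian fields and 4-byte alignment as in the dumps above; after_rm is a hypothetical reconstruction of the block as cat -v showed it after rm status, and find_ghosts is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DIRBLKSIZ 512   /* assumed directory block size */
#define MAXNAMLEN 255

/* The block after "rm status": webstat.log's d_reclen is now
   001 330 (472) and status's d_ino has been zeroed. */
static const unsigned char after_rm[DIRBLKSIZ] = {
    0, 010, 023, 0337,  0, 014,   0, 1,   '.', 0, 0, 0,
    0, 013, 006, 'P',   0, 014,   0, 2,   '.', '.', 0, 0,
    0, 010, 023, 0340,  0, 020,   0, 7,   'P', 'r', 'o', 'j', 'e', 'c', 't', 0,
    0, 010, 023, 0341,  1, 0330,  0, 013, 'w', 'e', 'b', 's', 't', 'a', 't', '.', 'l', 'o', 'g', 0,
    0, 0, 0, 0,         1, 0304,  0, 6,   's', 't', 'a', 't', 'u', 's', 0, 0,
};

static uint16_t be16(const unsigned char *p) { return (uint16_t)(p[0] << 8 | p[1]); }
static uint32_t be32(const unsigned char *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 | (uint32_t)p[2] << 8 | p[3];
}
static size_t dirsiz(size_t namlen) { return 8 + ((namlen + 1 + 3) & ~(size_t)3); }

/* Heuristic: could the `room` bytes at p hold a leftover record? */
static int plausible(const unsigned char *p, size_t room)
{
    size_t reclen = be16(p + 4), namlen = be16(p + 6);
    if (namlen == 0 || namlen > MAXNAMLEN || dirsiz(namlen) > room
        || reclen < dirsiz(namlen) || reclen % 4 != 0)
        return 0;
    for (size_t i = 0; i < namlen; i++)
        if (p[8 + i] < 040 || p[8 + i] > 0176 || p[8 + i] == '/')
            return 0;                  /* names are printable and contain no '/' */
    return p[8 + namlen] == '\0';
}

/* Report leftover records hidden in the slack of each record of blk. */
size_t find_ghosts(const unsigned char *blk, FILE *out)
{
    size_t found = 0;
    assert(blk != NULL);
    for (size_t off = 0; off + 8 <= DIRBLKSIZ; ) {
        size_t reclen = be16(blk + off + 4), namlen = be16(blk + off + 6);
        if (reclen < 8 || off + reclen > DIRBLKSIZ)
            break;                     /* not a valid record: give up */
        size_t end = off + reclen;
        /* Skip past a live record's own name; a record whose d_ino is 0
           is itself a candidate (a deleted first entry of a block). */
        size_t p = (be32(blk + off) != 0 && namlen <= MAXNAMLEN)
                       ? off + dirsiz(namlen) : off;
        while (p + 8 <= end) {
            if (plausible(blk + p, end - p)) {
                size_t len = be16(blk + p + 6);
                if (out != NULL)
                    fprintf(out, "ghost at offset %zu: inode %lu, name %.*s\n",
                            p, (unsigned long)be32(blk + p), (int)len,
                            (const char *)(blk + p + 8));
                found++;
                p += dirsiz(len);
            } else {
                p += 4;                /* records are 4-byte aligned */
            }
        }
        off = end;
    }
    return found;
}
```

On after_rm, find_ghosts reports one ghost, status at byte offset 60 (74 octal) with inode 0. On the block as it was before the deletion, it finds nothing, because the only slack there is zero-filled.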

Conclusion

In this article we have shown that you can often still find the names of files that were deleted from a directory, as long as their directory entries have not been reused. This may be particularly helpful when you are investigating an intrusion. You can find more hands-on tips and tricks in my book UNIX, Solaris and Linux: A Practical Security Cookbook, which deals with securing UNIX without any third-party tools.

Boris Loza

Dr Loza, professor of cybersecurity, is an award-winning professional. He has published many articles in cybersecurity magazines and is the author of several books.