Why were the photos recovered correctly, but not the videos?
A fairly common problem when recovering data from formatted memory cards used in digital cameras and camcorders is that photos are usually recovered correctly, but videos are damaged. This is most often due to the way data is written to the card when taking photos and recording videos. In the case of photos, this is quite simple, as all the material needed for recording is already known at the moment the sensor is exposed. The device can immediately create an appropriate header containing, among other things, Exif information and save the entire file in the appropriate place on the partition.
The situation is different for videos. The recording device does not know how long the video will be. Therefore, it is not able to create the appropriate header until the recording is finished. It is also too risky to keep the recorded video material in the buffer until the end of recording, as this can easily lead to the buffer overflowing. For this reason, the recorded material is immediately saved to the card before the file header is created.
Camera and camcorder software handles this task in different ways. It is possible to reserve an area where the header will later be created or to create additional auxiliary files. However, the simplest and therefore quite popular solution is to first save the video stream as it is recorded and only at the end create the header. This approach leads to the fragmentation of the file into two parts: a larger one containing the main part of the file and a smaller one, typically the size of one cluster, containing the file header.
As long as valid metadata describing this file exists on the partition, there is no problem reading the clusters in the correct order and opening the file correctly. However, if the partition is formatted, its metadata is overwritten with new metadata describing an "empty" partition. Because of the way data recovery programs work, this makes automatic recovery of fragmented files much more difficult and often practically impossible.
How do data recovery programs work?
Essentially, all data recovery programs operate in a very similar manner. They scan a given area of the media and look for characteristic patterns, which may be fragments of logical structures or files. Data recovery programs typically have two basic scanning modes: fast and thorough. The names of these modes vary between programs, but the functionality is very similar.
Fast scanning involves searching for remnants of file system logical structures in locations where they are most likely to be found (usually the beginning and end areas of both the entire media and identified partitions), and then following the addresses indicated in the found structure fragments. This approach often allows for a quick positive result in the case of minor damage to logical structures, but in more serious situations, it is too imprecise and may not find many files that are still recoverable.
Thorough scanning is slower because it analyzes each sector for content that may correspond to structures defined in the program's database. This mode allows you to find files not only within the logical structure described by file system metadata, but also independently of it, based on signatures corresponding to individual file types (so-called RAW search, sometimes described as "rough" recovery). Most file types have characteristic structures (signatures) that mark the beginning of the file and allow its type to be identified regardless of its extension; some file types also have a signature marking the end.
If the metadata describing a file and its location on a partition is damaged, searching by signatures often becomes the only practical way to recover it. This method involves copying a sequence of sectors starting from the identified file signature and continuing to the end signature (if present in the given file type), the beginning of the next file, or the end of the partition (hence, in some cases, programs can recover files of absurdly large sizes). For this reason, fragmented files can only be partially recovered and sometimes contain fragments of other files, which directly causes corruption. Files recovered using signatures lose their original names, directory locations, and other attributes described by file system metadata.
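The cutting rule described above can be illustrated with a minimal sketch. It assumes a disk image already loaded into memory, 512-byte sectors, and a single start signature (the JPEG start-of-image bytes); the function name `carve_by_signature` is hypothetical, and real programs use far larger signature databases.

```python
SECTOR = 512
JPEG_START = b"\xff\xd8\xff"  # JPEG start-of-image signature


def carve_by_signature(image: bytes, signature: bytes = JPEG_START,
                       sector: int = SECTOR):
    """Return (start_lba, data) pairs for every sector that begins with
    the signature. Each file is cut at the next signature or at the end
    of the image, exactly like a naive RAW search."""
    starts = [
        offset // sector
        for offset in range(0, len(image), sector)
        if image.startswith(signature, offset)
    ]
    total_sectors = (len(image) + sector - 1) // sector
    carved = []
    for i, lba in enumerate(starts):
        end_lba = starts[i + 1] if i + 1 < len(starts) else total_sectors
        carved.append((lba, image[lba * sector:end_lba * sector]))
    return carved
```

Note that any sectors lying between two signatures are attached to the earlier file, whether or not they belong to it. This is how foreign fragments end up inside recovered files.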
Basic information about FAT file systems.
FAT file systems (the name comes from File Allocation Table) have a history dating back to the late 1970s, floppy discs, and the CP/M operating system. This family of relatively simple file systems is supported by virtually all operating systems, which is why it is so commonly used in portable storage media, including memory cards used in digital cameras. The FAT file system family currently includes four systems: FAT12, FAT16, FAT32, and exFAT, sometimes referred to as FAT64. These systems differ in the details of their metadata, which we won't go into here, but their general architecture and operating principles are common.
A FAT partition begins with a reserved area containing the boot sector, and starting with FAT32, also includes a copy of it and the FSInfo structure. The internal structure of the boot sector varies depending on the system version, but it always includes key partition information, such as its location, cluster size (expressed as a number of sectors, which is always a power of two), the number, location, and size of file allocation tables, and so on. Next comes the area where the file allocation tables are stored (usually two), and in the case of FAT12 and FAT16 systems, the root directory area. The rest of the partition is occupied by the clustering area, which is used to store data.
In FAT32 and exFAT systems, the root directory is located in the clustering area. This avoids situations where no more objects can be added because the root directory itself has run out of space, even though there is still enough free space in the clustering area. When the root directory fills an entire cluster, its further entries are placed in another free cluster, so the directory typically becomes fragmented. The internal directory structure also differs between file system variants, but detailed knowledge of it is not necessary. It is enough to know that the directory entry indicates the location of the first cluster occupied by a given file and contains information about its size.
Because file sizes typically exceed cluster sizes, files typically occupy multiple clusters. A file allocation table (FAT) stores information about which clusters each file occupies. It contains a sequence of numbers that are 12-bit (for FAT12), 16-bit (for FAT16), or 32-bit (for FAT32 and exFAT). If we pay attention to the cluster numbers in FAT32 partitions, we will notice that the 4 most significant bits of the cluster numbers are always set to zero, so the addressing is actually 28-bit. This is usually explained by the old, 28-bit version of LBA addressing (Logical Block Addressing). When FAT32 was created, the limitations of LBA addressing did not allow for the practical use of 32-bit addressing anyway.
The first two fields of the file allocation table (corresponding to clusters 0 and 1) are occupied by a signature identifying this table. For this reason, clusters in FAT file systems are numbered from 2. The following fields contain the numbers of the clusters holding the subsequent fragments of each file. In addition, the file allocation table contains values with a special meaning.
A zeroed value indicates a free cluster. A number composed of bits set to 1 (0x0FFFFFFF for FAT32 – because the 4 most significant bits are zeroed) indicates the end of the cluster chain, i.e. the last cluster of a given file. The values FF7, FFF7, 0FFFFFF7, and FFFFFFF7 (respectively for the individual FAT system variants) indicate a damaged cluster that will be bypassed during file allocation.
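As an illustration, the following sketch decodes a FAT32 table and follows a cluster chain. It is a simplified example, not a full driver: the function names are hypothetical, and it treats every value from 0x0FFFFFF8 upward as an end-of-chain marker, since FAT32 implementations accept that whole range and not only 0x0FFFFFFF.

```python
import struct

FAT32_MASK = 0x0FFFFFFF     # the 4 most significant bits are ignored
FAT32_BAD = 0x0FFFFFF7      # damaged cluster
FAT32_EOC_MIN = 0x0FFFFFF8  # this value and above end a chain


def read_fat32(fat: bytes):
    """Decode a raw FAT32 table into a list of 28-bit entries."""
    return [value & FAT32_MASK for (value,) in struct.iter_unpack("<I", fat)]


def cluster_chain(entries, first_cluster):
    """Follow the chain that starts at first_cluster and return the list
    of cluster numbers, stopping at a free, bad, or end-of-chain entry."""
    chain, seen = [], set()
    cluster = first_cluster
    while 2 <= cluster < len(entries) and cluster not in seen:
        chain.append(cluster)
        seen.add(cluster)  # guards against loops in a corrupted table
        nxt = entries[cluster]
        if nxt == 0 or nxt == FAT32_BAD or nxt >= FAT32_EOC_MIN:
            break
        cluster = nxt
    return chain
```

With a table that has been zeroed by a quick format, `cluster_chain` returns only the first cluster of any file, which is exactly the information that is lost for fragmented files.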
Consequences of formatting a FAT partition.
There are two main types of partition formatting methods: quick formatting and full formatting. Quick formatting involves zeroing the file allocation table (marking all available clusters on the partition as free) and deleting all entries in the root directory. This results in loss of access to the remaining partition content, which is no longer addressed in the logical structure. However, as long as it is not overwritten with new content, it is still recoverable.
After a quick format, not only files but also subdirectories remain on the partition, which usually allows for at least partial reconstruction of the logical structure. If we know the cluster size and can determine the beginning of the clustering area, we can associate found files with entries in the subdirectories indicating the cluster numbers in which they started. It's helpful that formatting is often performed with default settings, so the parameters of the new partition often have the same values as the old one. However, information about files addressed directly by the root directory is lost.
Unfortunately, the loss of the file allocation table means that only non-fragmented files, located in a continuous sequence of consecutive clusters, can be easily recovered. In the case of fragmented files, information about the location of individual file fragments is irretrievably lost. This is also a significant reason not to format partitions from which data is to be recovered, even though formatting is sometimes recommended by incompetent users on low-quality internet forums.
Full formatting means erasing the entire partition, thus irreversibly destroying its contents. During full formatting, the state of sectors is also checked, and if errors are found, clusters containing such sectors are marked as damaged. Currently, the default formatting method is usually quick formatting, but some devices perform full formatting.
The TRIM function also poses a risk to data recovery, as it can lead to the physical erasure of areas that are unallocated in the logical structures. Therefore, on a computer used for data recovery, it's better to disable this feature. For newer operating systems, especially Windows 10 and Windows 11, it's also worth checking whether TRIM support has been automatically re-enabled by the system during an update.
The practice of recovering videos after formatting a memory card.
Because of the way digital cameras and camcorders store video files, described earlier, their fragmentation pattern is generally quite simple. Typically, there is one large, sequentially recorded fragment containing all the clusters of the file except the first one, followed immediately by the first cluster of the file, which holds its header. Therefore, it's possible to assemble such a file into a single whole by combining the header with the appropriate sequence of clusters containing the video material in the correct order. How can this be achieved in practice?
Identifying the cluster size.
This is a relatively simple task. Formatting a partition with default settings will almost certainly result in exactly the same cluster size, so you can simply check it in the boot record. A slightly more difficult but more universal method, which also works in the event of a lost boot record, involves comparing the size of the file allocation table with the partition size. A partition with a specific size can contain fewer, larger, or more, smaller clusters. The number of clusters on the partition is mathematically related to the size of the file allocation table.
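The relationship between the table size and the number of clusters can be sketched for FAT32 as follows. This is a rough estimate under stated assumptions: 4 bytes per table entry, two reserved entries, and a formatter that does not oversize the table by a factor of two or more. The function names are hypothetical, and the partition size can be used as an approximation of the clustering area size.

```python
def fat32_sectors_needed(data_sectors, sectors_per_cluster,
                         bytes_per_sector=512):
    """Sectors one FAT32 table needs to describe the clustering area."""
    clusters = data_sectors // sectors_per_cluster
    table_bytes = (clusters + 2) * 4  # 2 reserved entries, 4 bytes each
    return (table_bytes + bytes_per_sector - 1) // bytes_per_sector


def guess_sectors_per_cluster(data_sectors, fat_sectors,
                              bytes_per_sector=512):
    """Return the smallest power-of-two cluster size (in sectors) for
    which a table of fat_sectors sectors is large enough, or None."""
    for spc in (1, 2, 4, 8, 16, 32, 64, 128):
        needed = fat32_sectors_needed(data_sectors, spc, bytes_per_sector)
        if needed <= fat_sectors:
            return spc
    return None
```

For example, a clustering area of 62,500,000 sectors with a table of 7,630 sectors yields 64 sectors per cluster, i.e. 32 KiB clusters with 512-byte sectors. Halving the cluster size would require a table about twice as large, which is why the estimate tolerates small amounts of padding.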
Another method, requiring little analytical effort, is to check the sizes of files found using a RAW search. Very often, the smallest files found this way will be the size of a single cluster. You can also search for directory fragments. Because of the way records describing subsequent files are added to them, directories often fragment into single clusters very easily.
As a last resort, this task can be approached by trial and error, exploring various possibilities. Knowing that cluster size in sectors is always expressed as a power of two, the number of variants that require testing is limited. It's a good idea to start with the most likely possibilities, such as the default value for a given file system and partition size.
Identifying the order of files.
This task is very simple, and the way data recovery programs work comes to our aid. When searching by signatures, these programs automatically name the files they find, and the names typically contain the LBA number of the sector in which the file was found. These names may be adorned with additional symbols (e.g., "f," "$," etc.) depending on the program author's idea, but to arrange the list of files in the order corresponding to their location on the partition, it is usually sufficient to sort the files by name (numerically, if the sector numbers are not padded with leading zeros).
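A numeric sort can be sketched as follows. The naming pattern ("f" followed by the sector number) is only an assumed example, and the helper names are hypothetical; the sketch takes the first run of digits in each name as the LBA.

```python
import re


def lba_from_name(name: str) -> int:
    """Extract the first run of digits in a file name as the LBA."""
    match = re.search(r"\d+", name)
    if match is None:
        raise ValueError(f"no sector number in file name: {name!r}")
    return int(match.group())


def sort_by_lba(names):
    """Order recovered files by their position on the partition."""
    return sorted(names, key=lba_from_name)
```

A plain alphabetical sort would place "f1000.mov" before "f998.jpg", whereas `sort_by_lba` keeps the physical order.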
Identification of the beginning of the cluster sequence with video material.
The difficulty of this part of the process largely depends on the content stored on the partition. If the partition contained only videos, this is relatively simple. After sorting, each recovered file consists of the header cluster of one video followed by the stream of the next video. Simply cut the header cluster from the beginning of each file and place it in front of the stream carried by the preceding file.
However, in practice, we usually save both videos and photos to the memory card, usually in a rather random order. This means that a video can be saved not only after the previous video and recovered with the previous video's header, but also after the photo. When sorting recovered files as described above, we often notice that the photo preceding the video is unusually large, while videos preceding photos are often only a single cluster in size.
This is because the data recovery program, failing to encounter the signature of the next file's beginning, appends subsequent clusters to the previous file, even if they exceed its original size. Photos are very tolerant of appended content and open correctly regardless of the extraneous content they carry. This tolerance is also one reason why image files can be used, for example, to smuggle malicious code. For our purposes, the key here is to free the image from the burden of redundant clusters and combine these clusters with the appropriate video file header.
So how can we find the end of the image and the beginning of the video in such a large file? If we're dealing with files with an end-of-file signature, we can try to locate it and thus determine where one file ends and a fragment of the next begins. It's important to remember that while the start-of-file signature is located in a predictable, usually fixed location (not always the first bytes of the file), the end-of-file signature can occur at any offset within the sector. The search tool must therefore be configured to look for it at any position, not only at sector boundaries.
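For JPEG files, the end-of-image marker is the byte pair FF D9. The following sketch lists every occurrence in an oversized recovered file together with the next cluster boundary, where the appended video stream would begin. The function name is hypothetical. The result is only a list of candidates: an embedded Exif thumbnail has its own end marker, and the appended video data may contain the same byte pair by chance.

```python
JPEG_END = b"\xff\xd9"  # JPEG end-of-image marker


def end_marker_candidates(data: bytes, cluster_size: int):
    """Return (end_offset, next_cluster_boundary) for each FF D9 found.
    end_offset is the first byte after the marker, relative to the
    start of the recovered file (assumed to start on a cluster)."""
    results = []
    pos = data.find(JPEG_END)
    while pos != -1:
        end = pos + len(JPEG_END)
        boundary = (end + cluster_size - 1) // cluster_size * cluster_size
        results.append((end, boundary))
        pos = data.find(JPEG_END, pos + 1)
    return results
```

In practice, the candidate whose end offset is closest to the typical photo size for the given camera is usually the one worth testing first.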
The trial-and-error method is simple and doesn't require any special skills, but it's also tedious and time-consuming, so it is impractical when combining large numbers of files. It involves cutting off a fragment corresponding in size to other typical photos and treating it as the image, on the assumption that the camera saves images to files of a similar size. If the video doesn't play but the photo still opens correctly, the part kept as the photo is still too large. If you cut off too much of the file as video and leave too little of it as a photo, the photo won't open completely, and the ratio of the correct part of the image to the damaged part will roughly match the ratio of the truncated file size to its actual size.
It's much easier to trim erroneously attached clusters from a file if you know its size. By dividing the file size by the cluster size, you can precisely calculate the number of clusters occupied by the file. Remember to always round up the result, because if even one byte of the file occupies another cluster, that cluster will be entirely dedicated to that file. File size information can sometimes be found in its internal metadata. For photos, it's worth checking whether this information is stored in Exif.
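The calculation can be sketched as follows, assuming the real file size is already known and the recovered file starts on a cluster boundary; the function names are hypothetical.

```python
def clusters_for(file_size: int, cluster_size: int) -> int:
    """Number of clusters occupied by a file, rounded up."""
    return (file_size + cluster_size - 1) // cluster_size


def split_at_real_size(carved: bytes, file_size: int, cluster_size: int):
    """Split an oversized recovered file into the real file and the
    clusters that were wrongly appended after it."""
    allocated = clusters_for(file_size, cluster_size) * cluster_size
    return carved[:file_size], carved[allocated:]
```

For a 40,000-byte photo on a partition with 32 KiB clusters, `clusters_for` returns 2, so everything from byte 65,536 onward belongs to another file.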
A more difficult, but more universal, method involves converting the LBA address determined from the file name to the partition's cluster address and finding the directory record describing the file based on the cluster number it indicates. This is feasible because digital cameras typically don't save files to the root directory, but to subdirectories with names like "DCIM," "IMG," "MOV", etc., so the contents of these subdirectories are not lost during formatting. Finding the appropriate record allows for an accurate determination of the file size based on the information contained in the directory record.
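The conversion and the directory lookup can be sketched as follows for FAT12/16/32 short (8.3) directory entries; exFAT uses a different entry layout and is not covered. The sketch assumes that the LBA of the start of the clustering area is known, and the function names are hypothetical. The field offsets used (first cluster at bytes 20-21 and 26-27, size at bytes 28-31, attributes at byte 11) follow the standard FAT directory entry format.

```python
import struct


def lba_to_cluster(lba, data_start_lba, sectors_per_cluster):
    """Convert a sector address to a cluster number (clusters start at 2)."""
    return (lba - data_start_lba) // sectors_per_cluster + 2


def parse_short_entry(entry: bytes):
    """Return (raw_name, first_cluster, size) from a 32-byte entry."""
    (high,) = struct.unpack_from("<H", entry, 20)
    low, size = struct.unpack_from("<HI", entry, 26)
    return entry[0:11], (high << 16) | low, size


def find_size(directory: bytes, cluster_number: int):
    """Scan directory contents for the entry whose first cluster matches
    cluster_number and return the recorded file size, or None."""
    for offset in range(0, len(directory) - 31, 32):
        entry = directory[offset:offset + 32]
        if entry[0] == 0x00:                # no further entries
            break
        if (entry[11] & 0x3F) == 0x0F:      # long file name fragment
            continue
        _, first_cluster, size = parse_short_entry(entry)
        if first_cluster == cluster_number:
            return size
    return None
```

The size returned by `find_size` can then be used to cut the recovered file at the correct number of clusters.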
This method can be used to determine the size of both photos and videos. In the first case, we will know how much of the real photo is in the photo found by the data recovery program. In the second, we can calculate how many clusters we need to go back to cut the video stream, which we will then append to the cluster containing the appropriate header. A practical disadvantage of this method is that subdirectories containing a large number of files are easily fragmented and are often scattered in single clusters across the entire partition. The most likely location for the next subdirectory cluster is the cluster immediately following the end of the file described in the last record of the current subdirectory cluster.
Another practical method for separating a photo from the video material attached to it by the data recovery program relies on the fact that compressed data, such as *.jpg files and various video formats, has high entropy, whereas the part of the last sector after the end of the file is zeroed. Finding a zeroed fragment within the file, aligned to the end of the physical allocation unit (or, in the case of semiconductor media, often to the end of the cluster), usually allows us to locate the place where the image should be separated from the video stream. With each of the methods described above, it is also important to be aware that the files recovered by the program may be contaminated with fragments of directory contents. Such content is generally easy to identify, even for inexperienced users, because the file names it contains can be read as ASCII text.
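The search for zero-filled slack can be sketched as follows. The function name and the 16-byte threshold are assumptions chosen for illustration: a run of 16 zero bytes is very unlikely inside compressed data, but a file that ends fewer than 16 bytes before a cluster boundary will be missed.

```python
def zero_tail_boundaries(data: bytes, cluster_size: int, min_run: int = 16):
    """Return the offsets of cluster boundaries that are immediately
    preceded by at least min_run zero bytes. Each one is a candidate
    split point between a file and the data appended after it."""
    zeros = bytes(min_run)
    candidates = []
    for end in range(cluster_size, len(data) + 1, cluster_size):
        if data[end - min_run:end] == zeros:
            candidates.append(end)
    return candidates
```

The first candidate is usually the most interesting one, since it marks the earliest point at which a file could have ended.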
Combining into a whole and checking the files.
The final part of the process is combining the identified and properly sorted "cluster header – rest of file" pairs into a single unit. This task, like any other part of the process, can be performed using any hexadecimal editor; however, for practical reasons, it's worth choosing a program that allows for convenient work with LBA sector addressing, and ideally, cluster addressing. Working with larger allocation units will help avoid errors resulting from imprecisely selecting copied file fragments.
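For the simplest case, a partition that contained only videos, the joining step can be sketched as follows. The sketch assumes that every header occupies exactly one cluster, that the recovered files are already sorted by position, and that the stream of the first video, which lies before the first header and is therefore not part of any recovered file, has been extracted separately. All names are hypothetical.

```python
def rebuild_videos(carved_files, cluster_size, leading_stream=b""):
    """carved_files: recovered files in on-disk order, each consisting of
    one video's header cluster followed by the NEXT video's stream.
    leading_stream: stream of the first video, extracted by hand.
    Returns the videos with each header placed before its own stream."""
    rebuilt = []
    stream = leading_stream
    for carved in carved_files:
        header = carved[:cluster_size]
        rebuilt.append(header + stream)
        stream = carved[cluster_size:]  # belongs to the next header
    return rebuilt
```

Each rebuilt file should then be opened in a player to verify it. When photos are interleaved with videos, the streams must first be separated from the photos using the methods described earlier.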
Fragmentation - level hard.
In certain situations, fragmentation can be more serious than just the splitting of video files into two parts due to the recording method. FAT file systems are quite simple, and their drivers lack sophisticated algorithms to prevent file fragmentation. Fragmentation can be particularly facilitated by, for example, the presence of clusters freed up by previously deleted files scattered across the partition.
File corruption and fragmentation can also occur when there are errors in the metadata describing the logical structure of the file system and the chkdsk program is run. This program often damages the logical structure even further by assigning some clusters, often belonging to existing files, to files named "FILEnnnn.CHK". Due to the high risk of causing secondary damage, using tools such as chkdsk, fsck, or scandisk is not recommended. Without first securing the initial state by making a sector-by-sector copy, using these tools is a serious mistake that can significantly impede or even prevent data recovery.
The simplicity of FAT-type systems offers hope for effective file recovery after fragmentation. They typically allocate files in the first available clusters. This allows for attempts to separate clusters occupied by healthy and complete files and reassemble at least some of the files from the remaining clusters. Information stored in directory records about the location of the first cluster and the size of deleted files (these files will have their first filename symbol replaced with 0xE5) and missing files can be useful here.
This facilitates combining clusters into larger groups, potentially containing fragments of a single file, and further attempts to reassemble files from larger fragments. If the cause of file loss was damage to logical structures other than formatting, such as incorrect writes, using chkdsk, etc., it is also worth analyzing the file allocation tables – preferably both copies, paying attention to any discrepancies between them. For some file types, analyzing their internal metadata and structure can also be useful. This can make it easier to select the correct cluster chains and assemble them in the correct order, but the more the partition has been used and the more writes have been made to it, the more difficult this task will be.
An additional complication can be the occurrence of cross-linked files. Cross-linked files are different files, at least partially allocated in the same clusters. Of course, a cluster can contain the content of only one of these files, so it will contain a fragment of the file last written to the cluster. This means that fully correct recovery of files whose fragments were stored in that cluster will not be possible, as their content has been at least partially corrupted (overwritten) by another file.