From the user's perspective, data is stored in files arranged in a logical directory structure. Directory trees reside on partitions, which are logically separated parts of disks or of disk sets such as RAID arrays. Keeping files in an ordered structure is the job of file systems, which differ significantly in their details. Despite these differences, all file systems share certain traits. They place files in allocation units called, depending on the nomenclature of the particular file system, clusters (in file systems associated with Microsoft) or blocks (typically in environments derived from Unix). Because the term "block" is heavily overloaded, we will use the term "cluster" for these allocation units to avoid confusion.
Clusters contain a number of LBA (Logical Block Addressing) sectors that is always a power of two. LBA addressing was introduced in the first half of the 1980s to simplify hard disk management, replace CHS (cylinder, head, sector) addressing, which referred directly to physical sectors, and ensure compatibility between devices from different manufacturers. It was adopted by the ATA and SCSI standards and is also used by other communication protocols for exchanging information with various storage media.
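To make the cluster-to-sector relationship concrete, here is a minimal Python sketch of the conversion; the partition offset and cluster size below are hypothetical values chosen for illustration:

```python
# Minimal sketch: converting a file-system cluster number to an LBA range.
# The layout values (partition start, cluster size) are hypothetical.

SECTORS_PER_CLUSTER = 8     # always a power of two; 8 x 512 B = 4 KiB clusters
PARTITION_START_LBA = 2048  # first LBA sector of the partition

def cluster_to_lba(cluster: int) -> range:
    """Return the range of LBA sectors backing a given cluster."""
    first = PARTITION_START_LBA + cluster * SECTORS_PER_CLUSTER
    return range(first, first + SECTORS_PER_CLUSTER)

print(list(cluster_to_lba(10)))  # LBA sectors 2128..2135
```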
When data needs to be read from or written to the medium, the file system's cluster addressing is converted to LBA addressing, and the relevant commands reference sectors in that scheme. Hard disks, however, are still built from one or more platters carrying circular, concentric tracks divided into physical sectors. LBA numbers must therefore be properly assigned to specific physical sectors. This task is handled by a part of the firmware called the logical-to-physical address translation subsystem, or translator for short.
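As a toy illustration of what a translator does, the sketch below maps LBA numbers onto an idealized geometry with a constant number of sectors per track; real drives use zoned recording and defect lists (discussed later), so actual translators are far more involved:

```python
# A toy logical-to-physical translator for an idealized disk geometry
# (constant sectors per track, no defects, no zoned recording).
HEADS = 4               # platter surfaces / heads (hypothetical)
SECTORS_PER_TRACK = 63  # physical sectors per track (hypothetical)

def lba_to_chs(lba: int) -> tuple[int, int, int]:
    """Map an LBA number to (cylinder, head, sector); sectors count from 1."""
    cylinder = lba // (HEADS * SECTORS_PER_TRACK)
    head = (lba // SECTORS_PER_TRACK) % HEADS
    sector = lba % SECTORS_PER_TRACK + 1
    return cylinder, head, sector

print(lba_to_chs(0))    # (0, 0, 1)
print(lba_to_chs(255))  # (1, 0, 4)
```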
The situation is similar for semiconductor media, which store data in Flash-NAND chips. They are also addressed using LBA, although media of this type have no platters, tracks or sectors. Flash-NAND chips address data in pages (the minimum unit of programming and reading) and blocks (the minimum unit of erasure); sectors are emulated by the controller purely for communication with the outside world. Here too the firmware contains a part responsible for translating logical addresses into physical ones, commonly called the Flash Translation Layer (FTL).
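A minimal sketch of a page-mapped FTL may help picture this; the structures and the naive allocator below are hypothetical simplifications, not any vendor's actual design:

```python
# Minimal sketch of a page-mapped Flash Translation Layer: a table from
# logical page number to a physical (block, page) location.

class ToyFTL:
    def __init__(self):
        self.l2p = {}            # logical page -> (block, page)
        self.pages = {}          # (block, page) -> data, stand-in for NAND
        self.next_free = (0, 0)  # naive bump allocator, illustration only

    def write(self, lpn: int, data: bytes) -> None:
        # NAND cannot overwrite in place: every write goes to a fresh
        # physical page; the previously mapped page becomes stale.
        block, page = self.next_free
        self.pages[(block, page)] = data
        self.l2p[lpn] = (block, page)
        self.next_free = (block, page + 1)  # real FTLs allocate per block

    def read(self, lpn: int) -> bytes | None:
        loc = self.l2p.get(lpn)
        return self.pages[loc] if loc else None

ftl = ToyFTL()
ftl.write(7, b"old")
ftl.write(7, b"new")  # same logical page, new physical page
print(ftl.read(7))    # b'new'; page (0, 0) still holds b'old' but is stale
print(ftl.l2p[7])     # (0, 1)
```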
RAID arrays use LBA addressing as well. When we address a RAID array, we refer to its LBA addressing, which the array controller then converts into the LBA addressing of the individual member disks. In a complex environment with nested arrays (arrays built on top of other arrays), the LBA address of the array we are addressing is first converted into the addresses of its constituent arrays, and only then into the addresses of the physical disks.
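For a simple RAID 0, this conversion can be sketched as below; the stripe size and disk count are hypothetical:

```python
# Hedged sketch: translating an array LBA to (member disk, disk LBA) for a
# simple RAID 0 layout. Stripe size and disk count are hypothetical.

STRIPE_SECTORS = 128  # sectors per stripe unit (64 KiB at 512 B/sector)
N_DISKS = 4

def array_lba_to_disk(lba: int) -> tuple[int, int]:
    stripe, offset = divmod(lba, STRIPE_SECTORS)
    disk = stripe % N_DISKS                               # round-robin striping
    disk_lba = (stripe // N_DISKS) * STRIPE_SECTORS + offset
    return disk, disk_lba

print(array_lba_to_disk(1000))  # (3, 232)
```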
Virtual machine disk files also rely on LBA addressing. Each such file contains its own file system, independent of the file system that holds the virtual machine file itself. Inside such a file, a virtual cluster address is converted into virtual LBA addressing, which tells us which fragment of the virtual disk we want to read or write. Only then can these addresses be converted into the real clusters of the file system holding the virtual machine file, and finally into the real LBA addresses of the physical medium.
When we buy a new disk, we think of it as empty. Only once we initialize it, create partitions, format them and start placing data on them do we begin to think of it as slowly filling up. We can also use operating system tools to check how much free and occupied space we have. But occupied and free space are meaningful only in relation to the logical structures of the file system.
In the physical sense, every medium always stores some information, even if it cannot be interpreted at the logical level. Whichever LBA sector we address, we will always get some answer, regardless of whether that sector is free or occupied in terms of logical structures. On a new disk it will be a sequence of bytes with the value 0x00, but on a used disk it is not so obvious.
It may seem that free space means sectors filled with zeros and that sectors with any other content are occupied. In reality it is different. Occupied space consists of the disk areas allocated in the logical structures of the file system, even if they contain sectors filled with zeros. If those zeros are described as part of a file or of some logical structure and are interpretable in some way, we cannot treat them as free space. Conversely, even areas containing coherent, complete files are, from the file system's point of view, free space in which other data can be placed, so long as they are not addressed in its logical structures. Such unaddressed files are not visible in the directory structure, but unless they are overwritten, they remain recoverable.
For decades, translating LBA to physical addressing in hard drives was relatively simple. Physical sectors received sequential LBA numbers from 0 up to the last number implied by the drive's nominal capacity. LBA addresses were assigned starting from the tracks at the outer edge of the platter, with numbers increasing towards the spindle. This is because tracks with a larger radius are longer, so more sectors fit on them than on shorter tracks closer to the axis.
Thanks to this arrangement, the initial LBA sectors can be read faster, because more of them pass under the head in a single rotation of the platter. This is easy to observe when benchmarking a disk: the read speed graph falls as LBA addresses increase. When a disk uses multiple platter surfaces, the numbering on a given surface is periodically interrupted so that subsequent addresses continue on other surfaces. The assignment of LBA address ranges to particular platter surfaces is called the head map, because it tells us which head reads which LBA ranges.
One of the translator's tasks is to bypass bad sectors. From the translation subsystem's point of view, bad sectors fall into factory defects and those that appear during operation. Factory defects are recorded on the primary defect list (P-list) and are simply skipped when LBA numbers are assigned. The others are discovered while the disk is in service and recorded on the growth defect list (G-list); because they already have LBA numbers assigned, those numbers are redirected to reserve sectors in the remapping (reallocation) process.
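A toy model of both lists might look as follows; the physical layout and the defect lists themselves are invented for illustration:

```python
# Toy model of defect handling: P-list sectors never receive LBA numbers,
# while G-list entries redirect an existing LBA to a reserve sector.

P_LIST = {3, 7}     # factory-defective physical sectors
G_LIST = {5: 1000}  # LBA -> reserve physical sector (remapped in service)

# Assign LBA numbers by walking physical sectors and skipping the P-list.
lba_to_phys = {}
lba = 0
for phys in range(12):
    if phys in P_LIST:
        continue  # factory defect: no LBA number assigned
    lba_to_phys[lba] = phys
    lba += 1

def translate(lba: int) -> int:
    if lba in G_LIST:  # grown defect: redirected to the reserve area
        return G_LIST[lba]
    return lba_to_phys[lba]

print(translate(3))  # physical sector 4 (sector 3 was skipped at the factory)
print(translate(5))  # 1000 (remapped during operation)
```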
The details of the translation subsystem differ between firmware architectures and disk models, but such detail is not needed to understand how TRIM works. In fact, the most significant change between the introduction of LBA addressing and the invention of Shingled Magnetic Recording was the replacement of 512-byte sectors with 4-kilobyte ones (Advanced Format). With Advanced Format, one physical sector corresponds to 8 LBA sectors.
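The mapping is simple arithmetic, as the snippet below shows; it also notes the practical consequence that on 512-byte-emulation drives a write smaller than 4 KiB implies a read-modify-write of the whole physical sector:

```python
# With 4 KiB physical sectors (Advanced Format, 512e), eight 512-byte LBA
# sectors share one physical sector, so a sub-sector write requires
# reading, modifying and rewriting the whole 4 KiB sector.
def lba_to_physical_4k(lba: int) -> tuple[int, int]:
    return lba // 8, (lba % 8) * 512  # (physical sector, byte offset)

print(lba_to_physical_4k(21))  # (2, 2560)
```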
In their efforts to increase recording density, hard drive manufacturers noticed that the read head can pick up a signal from a much narrower track than the write head lays down. They therefore developed Shingled Magnetic Recording (SMR), in which each new track partially overlaps the previous one. This required redesigning the write head to induce its magnetic field asymmetrically, so that the strongest signal is recorded as close as possible to the previous track without corrupting it.
A negative consequence of Shingled Magnetic Recording was the loss of random sector access on write. When we write new content to a sector, the field induced by the head damages the content of sectors on the subsequent, overlapping tracks. To avoid rewriting the entire platter surface each time, manufacturers group tracks into SMR zones of several dozen MB and separate these zones with guard gaps.
Even so, changing a single sector still requires rewriting an entire group of tracks. Since this hurts write performance badly, manufacturers decided to write new sector content not where it was physically located before, but wherever is most convenient at the moment. The details vary from manufacturer to manufacturer, but their common feature is breaking the once relatively permanent binding of an LBA address to a specific physical location.
In SMR disks, LBA addresses can easily change their physical location, and a second level of the translator is responsible for tracking where they currently reside. Such dynamic relocation of logical addresses requires special tables in the disk firmware to record the changes. Practically every write involves recording a new physical location for some LBA addresses. This raises the risk of errors in the second-level translator modules, which is a common cause of SMR disk failure and loss of access to data.
A characteristic feature of semiconductor media is that existing content cannot be directly overwritten. This follows from the physics of data storage: electrons can be placed only in empty floating gates of transistors, which must be erased before they can be reprogrammed. Consequently, when we change the content of an LBA sector, the new content is written to a different physical location, and the outdated content must be erased to free a physical allocation unit for subsequent writes.
This approach requires that some physical allocation units are always ready to accept new data, so LBA addressing cannot cover the entire physical capacity of the medium. The ratio of available LBA sectors to physical capacity is a compromise between the desire to sell devices with the highest possible advertised capacity and the need to keep a reserve that allows efficient, failure-free operation.
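As a back-of-the-envelope illustration with hypothetical figures (the sector count below is typical for a drive advertised as 512 GB):

```python
# Illustrative arithmetic (hypothetical figures): how much raw NAND stays
# outside LBA addressing as an over-provisioning reserve.
raw_bytes = 512 * 2**30      # 512 GiB of physical NAND
lba_sectors = 1_000_215_216  # LBA sectors exposed by a "512 GB" drive
usable_bytes = lba_sectors * 512
op_ratio = raw_bytes / usable_bytes - 1
print(f"over-provisioning: {op_ratio:.1%}")  # roughly 7%
```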
The coexistence of write and erase operations makes editing the contents of LBA sectors in semiconductor media more complex than in magnetic media, which allow direct overwriting. Writing the new content of LBA sectors to a different physical location forces the new location to be recorded in the translation tables. Blocks containing invalid data, in turn, are removed from LBA addressing and earmarked for erasure. Under this scheme, LBA addresses are not tied to specific physical allocation units but rotate across them in a pattern determined by the operations performed.
An additional complication is that semiconductor media use two different allocation units: pages, the minimum unit of reading and programming, and blocks, the minimum unit of erasure. A page typically stores from one to thirty-two 512-byte LBA sectors (their number per page is always a power of two) plus the redundant information needed for the medium to operate correctly. Blocks in turn contain from several to several hundred pages; until a certain point the number of pages per block was a power of two, but for a good few years now this rule has no longer been followed.
In practice, this means that changes at the LBA level leave physical allocation units holding a mixture of LBA sectors with current content and stale ones. This becomes problematic when the number of free blocks ready to accept new data starts to shrink. Another problem semiconductor media struggle with is wear caused by program and erase operations, which eventually leads to device failure. These are the problems that motivated the introduction of the TRIM function.
The TRIM function rests on the observation that, at least in theory, there is no need from the user's point of view to physically store data that is not allocated in the logical structures of the file system. It is enough for the translation subsystem to know which sectors lie in areas occupied by files or by the logical structures describing them, and which are free space in the logical sense. For the latter, instead of physically reading the requested sectors, the controller can immediately respond with sectors filled with zeros.
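A minimal sketch of such a TRIM-aware read path, with invented structures standing in for the translator and the NAND:

```python
# Minimal sketch of a TRIM-aware read path: trimmed LBA sectors are answered
# with zeros without touching the NAND at all. Structures are hypothetical.

SECTOR = 512
trimmed: set[int] = set()     # LBA sectors reported unallocated by the OS
flash: dict[int, bytes] = {}  # stand-in for the physical medium

def trim(lba_range: range) -> None:
    trimmed.update(lba_range)  # firmware may erase the backing blocks later

def read(lba: int) -> bytes:
    if lba in trimmed:
        return b"\x00" * SECTOR  # no physical read needed
    return flash.get(lba, b"\x00" * SECTOR)

flash[100] = b"secret".ljust(SECTOR, b"\x00")
trim(range(100, 108))
print(read(100)[:6])  # zeros, even though the NAND still holds b'secret'
```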
The operation of TRIM is somewhat analogous to sparse files, where larger runs of zeros in a file need not be physically written to the partition. In some file systems it is enough for the metadata to record how many zero-filled clusters there are and exactly where they should be inserted into the file when it is read. If the SSD firmware can keep similar information in the Flash Translation Layer, the translator will not look up the physical location of the requested sectors but will instead tell the controller that they are "empty" sectors.
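The sparse-file mechanism itself is easy to demonstrate; on Unix-like systems with a file system that supports sparse files, the following runnable snippet creates a file whose logical size far exceeds its physical allocation:

```python
# Seeking past the end of a file and writing creates a "hole" that
# sparse-file-capable file systems do not physically store.
import os

with open("sparse.bin", "wb") as f:
    f.seek(100 * 1024 * 1024)  # 100 MiB hole
    f.write(b"end")

st = os.stat("sparse.bin")
print(st.st_size)         # logical size: 100 MiB + 3 bytes
print(st.st_blocks * 512) # physical allocation: typically just a few KiB
```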
The introduction of TRIM has increased SSD performance. Not having to physically read sectors lying in the "free" area of the disk means a faster response. It also lets the drive maintain more erased blocks, which not only improves write performance but also reduces the program/erase load on the medium and makes life easier for wear-leveling algorithms. This matters for the longevity of semiconductor media, which depends above all on write and erase operations.
Since storage devices do not interpret the data they hold but merely store it, TRIM requires, in addition to firmware support, a source of information about what is happening at the level of the file system's logical structures. The disk must learn from somewhere which sectors are "occupied" and which are not. This information is supplied to the controller by the operating system managing the file systems on the disk's partitions. Most operating systems in use today support TRIM, and support is enabled by default.
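On the OS side this can be pictured as the file system driver queuing discard ranges whenever clusters are freed; the sketch below uses hypothetical layout values:

```python
# Hedged sketch of the OS side: when clusters are freed, the file system
# driver queues discard (TRIM) commands for the matching LBA ranges.

SECTORS_PER_CLUSTER = 8
PARTITION_START_LBA = 2048
discard_queue: list[range] = []  # ranges later sent to the drive as TRIM

def free_clusters(clusters: list[int]) -> None:
    for c in clusters:
        first = PARTITION_START_LBA + c * SECTORS_PER_CLUSTER
        discard_queue.append(range(first, first + SECTORS_PER_CLUSTER))

free_clusters([10, 11, 42])  # e.g. a small file was deleted
print(discard_queue)
```

On Linux, batched trimming can also be triggered explicitly with the fstrim utility.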
However, the operating system does not support TRIM for every file system it can otherwise handle. There are also situations where the operating system refuses to cooperate with certain disks even though both the system and the medium support TRIM. An example is older versions of macOS, which required third-party software to enable TRIM for SSDs from vendors other than Apple.
For TRIM to work, it must also be supported by the communication protocol used to talk to the medium. TRIM therefore will not work with older hardware that predates version 8 of the ATA standard. There can also be problems with TRIM support in RAID controllers, and the function is not passed through by most USB adapters. The equivalent of TRIM in the SCSI standard is UNMAP.
Data in semiconductor media is not written sequentially but scattered across different physical allocation units. Many factors related to performance and durability contribute to this. Flash-NAND memories are quite slow, so the performance of media built on them comes from processing data in parallel across many chips, much as RAID arrays process data in parallel across many disks.
The second important cause of data scattering is the Joule heat released during writing and erasing. Programming and erasing Flash-NAND cells exploit the quantum-mechanical phenomenon of Fowler-Nordheim tunneling, which requires raising the voltage inside the chip and involves energy losses. If physical writes in Flash-NAND chips were performed sequentially, local overheating could frequently damage the memory.
For this reason, these memories are often logically divided into two or four parts, with data written to them in an interleaved manner. As a result of all the above, data written to semiconductor media is dispersed in such a way that adjacent LBA sectors usually end up in different physical pages. This is easy to see on media suffering from read errors: when scanning such a medium, one often observes runs of damaged sectors interleaved with healthy ones. The damaged sectors are read from a damaged page, while the correctly read sectors between them reside in healthy pages.
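The interleaving and its forensic symptom can be illustrated with a toy modulo mapping; the plane count and the damaged plane are hypothetical:

```python
# Toy illustration of interleaving: consecutive LBA sectors land in
# different NAND planes/chips, so one damaged page shows up during a scan
# as a periodic pattern of bad sectors.
N_PLANES = 4

def plane_of(lba: int) -> int:
    return lba % N_PLANES  # round-robin dispersal across planes

damaged_plane = 2
scan = ["BAD" if plane_of(lba) == damaged_plane else "ok" for lba in range(12)]
print(scan)  # ['ok', 'ok', 'BAD', 'ok', 'ok', 'ok', 'BAD', ...]
```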
When changes are made at the LBA level, the new content of the changed sectors is written to other physical pages located in other physical blocks. Since blocks contain many pages, nowadays usually hundreds, it is easy to end up with blocks holding some pages with current content and some that are outdated. As long as a block contains current pages, it cannot be erased and prepared to accept new data.
This situation is very inconvenient for managing the medium's addressing, because too many blocks may end up containing only a small number of current pages and therefore remain unerasable. This can significantly reduce the number of available erased blocks and make writing new data difficult, even though from the logical point of view the medium still has plenty of free space.
The Garbage Collection procedure solves this problem. Inspired by garbage collection in RAM, it moves pages with current content out of blocks that contain relatively few such pages and into other blocks, allowing the vacated blocks to be erased. The process usually runs in the background, when the controller is not busy servicing commands. TRIM allows garbage collection to skip pages whose data is still physically current but has been deleted at the logical level. This not only speeds up the medium but also reduces its program and erase load.
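A hedged sketch of TRIM-aware garbage collection, with invented data structures: the victim is the block with the fewest valid pages, and pages that are current but trimmed are simply not copied:

```python
# Hedged sketch of TRIM-aware garbage collection.

def collect(blocks: dict[int, dict[int, bool]], trimmed: set[int]) -> int:
    """blocks: block -> {logical_page: is_current}; returns the erased block."""
    # Pick the block with the fewest pages holding current content.
    victim = min(blocks, key=lambda b: sum(blocks[b].values()))
    moved = [p for p, current in blocks[victim].items()
             if current and p not in trimmed]  # trimmed pages are skipped
    # ... write `moved` pages into a fresh block and update the FTL mapping ...
    del blocks[victim]                         # victim can now be erased
    return victim

blocks = {0: {1: True, 2: False, 3: True},   # 2 valid pages
          1: {4: True, 5: False, 6: False}}  # 1 valid page -> victim
print(collect(blocks, trimmed={4}))          # erases block 1, copies nothing
```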
In SMR disks, the use of TRIM follows from the loss of random sector access on write. This forced manufacturers to look for ways to salvage SMR write performance, which made the logical-to-physical translation subsystem more complex. Different manufacturers approach the problem of writing data on SMR disks differently.
One solution is a buffer of conventionally recorded tracks called the Media Cache. Data arriving at the disk is first written to this buffer and only later, when the disk is not busy executing user commands, moved to the appropriate SMR zones. Thanks to this, the long process of rewriting entire SMR zones and reorganizing the Media Cache can usually happen in the background, unnoticed by the user. Writing very large amounts of data at once is problematic, however, especially when it includes many small files. In such situations the Media Cache can fill up and the write speed can drop drastically.
Another approach writes data directly to SMR zones, often different zones from those where the given LBA sectors previously resided. There is a certain similarity here to the rotation of LBA addresses across physical allocation units in semiconductor media. Although magnetic media such as hard drives need no erase operation, since existing data can be overwritten directly, when rewriting an SMR zone we still must not damage the content of the sectors that are not being changed.
In both of the solutions above, TRIM therefore benefits drive performance. It helps on writes, because a drive that knows which areas are unallocated in the file systems' logical structures can skip writing them to their target location, and on reads, because instead of physically locating sectors holding "empty space" the drive can present zero-filled sectors at the interface. And because write operations do not wear out hard drives the way they do solid-state media, TRIM does not affect their lifespan as it does with SSDs.
On media using TRIM, the fact that data has been deleted at the logical level means we cannot recover deleted files by working at the LBA level: in response to a request to read the sectors that contained those files, the disk will return sectors filled with zeros. At the physical level, however, the data may still exist. Here the situation differs between hard drives and SSDs.
Because semiconductor media must physically erase data, the physical destruction of logically deleted content begins essentially immediately after deletion. Importantly, this process runs at the firmware level: as long as the medium is powered, it proceeds independently of any commands received from the computer and cannot be stopped by the write blockers commonly used in computer forensics. Block erasure can be halted by desoldering the memory chips and reading them in a programmer, or by putting the SSD into safe mode.
When Flash-NAND chips are desoldered, data recovery can run into a number of practical obstacles, the most serious being encryption. Encryption is common in SSDs today, not only because it markets well as a security feature but also because encryption algorithms randomize data effectively, which helps reduce bit read errors. Where encryption is not used, dynamic randomization often is, and it is mathematically difficult to decode. Another serious obstacle in decoding binary images of memory chips is internal compression, whose purpose is to reduce the volume of physically recorded data and thus the number of wear-inducing write operations. For these reasons, chip-off is far from always a promising route to effective data recovery.
An alternative is to put the SSD into safe mode, in which the controller does not access the memory on its own. If safe mode provides access to physical addressing while blocking background processes, it is possible to recover the portion of the data that has not yet been physically erased. However, since the physical erasure of data marked by TRIM as logically deleted can complete within a few to a dozen or so minutes, the chances of recovering anything this way are rather illusory.
The situation is somewhat better with hard drives, which have no data erase operation. On hard drives, logically deleted data remains intact in physical addressing until it is overwritten with other content. Response time and whether the drive has been powered on are therefore not as critical for data recovery, even when TRIM is supported. Nevertheless, remember that SMR drives are not free of background processes, such as Media Cache reorganization, and that they overwrite entire SMR zones rather than individual sectors.
Where data destruction is concerned, TRIM makes it harder to verify that the job was actually done. At the logical level we may get a response suggesting the data has been effectively destroyed while it still exists at the physical addressing level and can be recovered. Only the block erasure operations running in the background later lead to the actual destruction of the data.
Moreover, with both semiconductor media and SMR disks there is a risk of undeleted data remaining in areas outside LBA addressing. The risk is greater for hard drives, because in semiconductor media blocks outside LBA addressing should be physically erased. For data destruction it is therefore commonly recommended to use the Secure Erase procedure, which can operate closer to the medium's physical addressing and should thus ensure effective destruction.
Nevertheless, doubts are sometimes raised about the correctness of Secure Erase implementations in device firmware. An interesting example of a faulty Secure Erase implementation in eMMC memories was demonstrated by Aya Fukami at the Flash Data Recovery & Digital Forensic Summit 2024. In the presentation "Exploiting the eMMC security features using the VNR", she showed eMMC chips from which almost 100% of the content could be recovered, even though a Secure Erase operation had been performed and zero-filled sectors were being returned at the LBA level. Dropping down to the physical addressing level showed that the content of the physical allocation units was only slightly damaged and largely recoverable.
For this reason, when destroying data it is worth paying attention to how long the Secure Erase operation takes. If it completes suspiciously quickly, in less time than it would take to write the entire medium, then with a probability bordering on certainty the data was not physically destroyed and only the logical-to-physical translation subsystem was manipulated. On encrypted media the encryption key may additionally have been destroyed (so-called crypto-erase), which significantly improves the security of the operation. Even so, in such situations it is better to treat the Secure Erase implementation with limited trust and destroy the data by overwriting the entire medium.
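A quick plausibility check with hypothetical figures:

```python
# Back-of-the-envelope check: a full physical overwrite cannot finish
# faster than capacity divided by sustained write speed.
capacity_gb = 2000  # 2 TB drive (hypothetical)
write_mb_s = 150    # sustained sequential write speed (hypothetical)
min_hours = capacity_gb * 1000 / write_mb_s / 3600
print(f"minimum plausible time: {min_hours:.1f} h")  # about 3.7 h
```

A Secure Erase that returns after seconds on such a drive clearly did not overwrite the platters or the NAND.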