Correct deletion of data is an important, yet often neglected and poorly understood, aspect of information security. The need to delete data securely and irreversibly, so that it cannot be retrieved by others, can arise for a number of reasons. It is often governed by legislation, such as the GDPR (General Data Protection Regulation), laws protecting state secrets, or laws obliging private entities to protect certain categories of information. It can also arise from contracts and agreements governing the terms of cooperation and defining the scope of trade secrets. And sometimes, without any obligation, we simply want to protect our interests and privacy and do not wish outsiders to know everything about us. Deleting data also has its dark side: hiding and destroying digital evidence of crime. This too can be done wisely and effectively, or foolishly and ineptly.
In this article, I refer to Peter Gutmann's publication "Secure Deletion of Data from Magnetic and Solid-State Memory", presented at the Sixth USENIX Security Symposium in July 1996.
It is the most cited publication in the context of data overwriting and the basis of one of the most popular algorithms for destroying information. In some circles, Peter Gutmann's work has risen to the level of religious dogma and he is seen as an unquestionable authority. Nevertheless, this publication contains a number of theses and assumptions that raise doubts as to whether its author really understands the workings of hard drives and the physics of information storage. And it is on these passages that we will focus further.
We can classify data carriers in many ways. In particular, we can divide them into analogue and digital. A digital storage medium is one that stores information in a way that machines can understand, as a sequence of logical states interpreted as zeros and ones. Other data carriers are called analogue data carriers. However, even in the case of digital media, the basis for determining logical states is certain analogue physical states digitised by encoding and decoding processes. The very process of interpreting the physical states as specific logical states follows an accepted convention.
The most commonly used criterion for classifying storage media is the physical phenomenon used to store the states that are interpreted as logical states. With regard to data storage technology, we can distinguish the following types of media:
MAGNETIC:
Hard Disk Drives (HDD),
Floppy Disks (FDD),
magnetic tapes (Linear Tape Open - LTO),
OPTICAL:
Compact Disks (CD),
Digital Versatile Discs (DVD),
Blu-Ray (BD-R),
High Definition DVD (HD-DVD),
SEMICONDUCTORS:
Solid State Drives (SSD),
pendrives,
memory cards (SD, CF, MMC, xD, SM, MSPro...),
embedded Flash-NAND memories (eMMC, MCP...),
RESISTIVE:
Phase Change Random Access Memory (PCRAM),
Magnetoresistive Random Access Memory (MRAM),
ReRAM,
NanoRAM,
PAPER:
punched cards,
perforated tape.
From the point of view of information destruction, it is important to classify data carriers into non-volatile (energy-independent, capable of long-term, multi-year data storage even without connection to a power source) and volatile (requiring continuous supply of power to sustain logical states). The latter include DRAMs (Dynamic Random Access Memory) and SRAMs (Static Random Access Memory). In the case of volatile media, it is sufficient to briefly disconnect the power supply to irreversibly delete the data. They then lose their logical states, which is why we will not discuss them further.
Data carriers can also be divided into rewritable and write-once (non-rewritable) media. Write-once media can only be written to once; their contents cannot be changed afterwards. The most typical examples of non-rewritable media are CD-ROMs and DVD-ROMs. With this category of media, it is not possible to destroy the contents by replacing them with other contents, and the medium must be physically destroyed in order to delete the information. In the case of rewritable media, on the other hand, the contents can be changed, if not an unlimited number of times, then at least a very large number of times, which makes it possible to use data overwriting as a method of destroying information.
Data destruction is governed by various standards developed by various governmental, military and scientific institutions. These standards describe different methods and classify in different ways the information that should be destroyed, often prescribing different methods of data destruction depending on the content of the media. However, if we realise that the interpretation of data takes place at the level of the logical structures of file systems and software, we can easily understand that the content of the data has no influence on the destruction process. From the point of view of the storage medium and the physics of storage, there is no significant difference between the different streams of zeros and ones, regardless of how we interpret them at the logical level and what subjective meaning we assign to them.
Standards describing data destruction contain a number of discrepancies and assess the effectiveness of particular data destruction methods in different ways. In some cases, multi-stage procedures combining several destruction methods are recommended. This approach is also popular in many internal procedures based on different standards, sometimes dictated by the need to ensure compliance with multiple regulations. A detailed reading of the standards reveals a number of points where one may doubt how well their authors understand the operation of data media, and some recommendations even look as if they had been transcribed directly from regulations governing the destruction of paper documents, but such an analysis of the recommendations contained in the standards listed below is beyond the scope of this article.
Below you will find a list of the most popular and widely used standards describing data destruction:
1. AFSSI-5020 (Air Force System Security Instruction 5020),
2. CSEC ITSG-06 (Communication Security Establishment Canada, Information Technology Security Guide - 06),
3. HMG-IS5 (Her/His Majesty's Government Infosec Standard 5),
4. IEEE 2883-2022 (Institute of Electrical and Electronics Engineers, Standard for Sanitizing Storage),
5. NAVSO P-5239-26 (Navy Staff Office Publication 5239-26, Information Systems Security Program Guidelines),
6. NISPOM DoD 5220.22-M (National Industrial Security Program Operating Manual, Department of Defense 5220.22-M),
7. NIST SP 800-88 (National Institute of Standards and Technology, Guidelines for Media Sanitization),
8. NCSC-TG-025 (National Computer Security Center, Technical Guidelines 025, A Guide to Understanding Data Remanence in Automated Information Systems),
9. RCMP TSSIT OPS-II (Royal Canadian Mounted Police, Technical Security Standards for Information Technology, Media Sanitization),
10. VSITR (Verschlusssachen IT Richtlinien),
11. ГОСТ Р50739-95 (Средства вычислительной техники. Защита от несанкционированного доступа к информации. Общие технические требования; in English: Computer hardware. Protection against unauthorised access to information. General technical requirements).
The aforementioned standards classify data destruction methods in different ways, but from a technical point of view and with a view to our objective, it is important to divide these methods into effective and ineffective ones. We may consider a data destruction method effective if, after its use, data recovery is impossible both with known and available data recovery methods and with methods that could potentially be developed in the future. The remaining methods, including those that leave only a theoretical possibility of data recovery, are ineffective.
By adopting this definition of the effectiveness of data destruction methods, we can draw two practical conclusions to optimise data destruction procedures. Firstly, we can discard as unnecessary effort and cost all ineffective data destruction methods because they do not contribute to the objective. Secondly, we can limit the procedure to one selected effective method because it is sufficient for data destruction.
With this approach, we can focus on identifying effective information destruction methods for given categories of media. We must bear in mind that the effectiveness of data destruction methods may vary depending on the technology used to store the information. For example, demagnetisation may be effective for magnetic media, but will not work for semiconductor or optical media.
Data destruction methods are also divided into hardware (physical) and software (logical) methods. Hardware-based methods involve acting on the medium in such a way as to make its contents unreadable. However, it is important to note that destroying or damaging a medium is not the same as destroying the information on it, and not every kind of disk damage makes data recovery impossible. On the contrary, companies specialising in data recovery routinely recover information from hardware-damaged media, including media damaged deliberately with the intention of destroying their contents, and in many cases for which practical data recovery methods have not yet been developed, there is a theoretical basis for developing such methods in the future.
Software-based methods, on the other hand, aim to destroy the information itself without damaging the medium. Unlike physical methods, they allow selective destruction of chosen data without destroying the entire contents of the medium. These methods boil down to destroying data by replacing it with other content, i.e. overwriting. If the data is not overwritten, but only the metadata describing it in the logical structures of the file system is deleted, the information itself remains recoverable.
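Below is a minimal sketch, in Python, of what a software-based overwrite of a single file might look like on a POSIX system. The function name and the single pass of random data are my own illustrative choices; journaling and copy-on-write file systems, as well as SSD wear levelling, can keep old copies of the data elsewhere, so the sketch demonstrates the principle rather than guaranteeing destruction:

import os

# Sketch only: overwrite a file's contents in place, flush them to the device,
# and only then remove the metadata. Assumes the file system rewrites blocks
# in place, which journaling/COW file systems and SSDs do not guarantee.
def overwrite_and_delete(path, passes=1):
    size = os.path.getsize(path)
    with open(path, "r+b", buffering=0) as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                chunk = min(remaining, 1 << 20)
                f.write(os.urandom(chunk))   # replace old content with random bytes
                remaining -= chunk
            os.fsync(f.fileno())             # push the new content to the device
    os.remove(path)                          # only now delete the file's metadata

# overwrite_and_delete("secret.txt")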
However, doubts are still raised about the recoverability of overwritten data. The controversy most often relates to the required number of overwrite passes necessary for proper data destruction. Sometimes attention is also drawn to the overwriting patterns used. These doubts are often fuelled by marketing materials designed to trick users into choosing a particular data destruction method or tool, usually by discrediting alternatives.
Concepts for methods aimed at enabling the recovery of overwritten data emerged in the late 1980s and early 1990s. At that time, a number of studies were undertaken aimed primarily at recovering the previous magnetisation state of the magnetic layer using magnetic force microscopy, among which the work of the team led by Romel Gomez deserves special attention. Less popular were oscilloscopic studies of the signal captured from the magnetic head block. Peter Gutmann's article is a kind of summary of the work carried out in the late '80s and the first half of the '90s and proposes a solution to dispel doubts about the effectiveness of data overwriting.
Physical methods of data destruction include:
mechanical (from hammering and reaming to shredding the media with special shredders),
thermal (from throwing into a fire or roasting in an oven to melting in metallurgical furnaces),
chemical (treating the media with various chemicals),
demagnetisation (exposing the medium to a magnetic field),
inductive (using different types of radiation, e.g. UV, ionising, microwave),
pyrotechnic (using pyrotechnic and explosive materials).
Software-based methods include:
throwing files to the system trash bin (moving to a special directory, an obviously ineffective method),
deleting at the file system metadata level (the possibility of data recovery depends on many factors, e.g. the type of media and the operation of the TRIM function),
partition formatting (effectiveness depends on the formatting method as well as on the type of media, firmware solutions, TRIM support, etc.),
overwriting (single or multi-pass using different types of overwriting patterns - that's what this article is about),
Secure Erase (media cleansing procedure implemented at firmware level),
Block Erase (a physical block erasure procedure implemented in the firmware of semiconductor media).
In what follows, we will focus on the effectiveness of data overwriting as a method of destroying information stored on hard disks, because this issue constitutes an essential part of the considerations contained in Peter Gutmann's article. I will refer to selected passages of that article which indicate that the author's understanding of certain issues is inadequate and leads to erroneous conclusions. I will also draw attention to some highly stretched theses used to justify the necessity of multiple overwriting passes to ensure the effectiveness of the method.
"... when a one is written to disk the media records a one, and when a zero is written the media records a zero. However the actual effect is closer to obtaining a 0.95 when a zero is overwritten with a one, and a 1.05 when a one is overwritten with a one."
To address this assumption, we need to know what physically constitutes a bit in magnetic recording. What physical state represents a logical zero and what state represents a one. In order to understand this, let us first look at how magnetic media are read.
Data from magnetic media is read by heads floating above a magnetised surface (in the case of hard disk drives) or moving along it (in the case of magnetic tapes, floppy disks and some models of the oldest, vintage hard disk drives from the early days of this type of construction). The magnetised surface moving under the head induces an electrical waveform. Pulses in this waveform are induced by an alternating magnetic field. And it is these impulses that are interpreted as logical ones. In contrast, a logical zero is the absence of such an impulse.
So what constitutes an area of constant magnetisation, and what an area of alternating magnetisation? In any body exhibiting magnetic properties, we can distinguish areas of homogeneous magnetisation - magnetic domains. These domains are separated from each other by domain walls - areas where the polarisation vector of the magnetisation reverses. It is these walls that are the areas of alternating magnetisation which induce the pulses denoting logical ones, while the domains themselves are areas of constant magnetisation.
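The relationship described above can be shown with a toy decoder. The +1/-1 samples and the one-sample-per-bit-window timing are purely illustrative assumptions, not how a real read channel works; the point is only that a reversal of magnetisation is read as a one and its absence as a zero:

# Toy illustration: a logical one is a magnetisation reversal (domain wall)
# inside a bit window, a logical zero is the absence of a reversal.
def read_transitions(magnetisation):
    """magnetisation: sequence of +1/-1 samples, one per bit window."""
    bits = []
    for prev, curr in zip(magnetisation, magnetisation[1:]):
        bits.append(1 if curr != prev else 0)  # reversal -> pulse -> logical one
    return bits

print(read_transitions([+1, +1, -1, -1, -1, +1, -1, +1, +1]))
# -> [0, 1, 0, 0, 1, 1, 1, 0]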
Magnetic recording involves giving the medium a specific, desired ordering of its surface magnetisation. In this process, domains can reverse their polarity, but also change their size. Domain walls may shift or disappear, and new domain walls may be formed, resulting in the separation of new domains. In order to speak of a logical one being overwritten by another one, after the surface has been remagnetised the domain wall would have to lie in exactly the same place as a domain wall did in the previous magnetisation. In practice, therefore, it is not possible to state unequivocally that a logical one has been overwritten with a one or with a zero.
The previous polarisation of the magnetisation can affect the shape and width of the domain walls and thus the shape of the pulses they induce. This issue is described in detail by Serhiy Kozhenevsky (Сергій Коженевський) in his book 'Перезапись информации'. However, if we wanted to recover overwritten data in this way, it is not the previous polarity of the domain magnetisation that should interest us, but the previous arrangement of the domain walls. The results of the oscilloscope studies described there do not indicate that it would be possible to determine with sufficient accuracy the arrangement of the domain walls in the state before overwriting.
In addition, we must not forget other factors influencing the height of the pulse amplitudes. It largely depends on the distance between the domain walls. The closer they are to each other, the lower the signal amplitudes induced by them will be. The deviations also depend on the local properties of the magnetic surface and the state of the crystal structure. The magnetisation state of the surface and the parameters of the signal to be read are also affected by external magnetic fields and fluctuations in the temperature and supply voltage of the hard disk drive.
In the case of perpendicular magnetic recording, a very important source of electromagnetic noise is the soft underlayer (SUL) used to close the field lines induced by the recording head. At the time Gutmann's article was written, hard disk drives used only longitudinal recording, but nowadays all hard disk drives use perpendicular recording. Filtering out the effects of the above-mentioned factors on the signal waveform, in order to isolate interference due solely to the previous magnetisation state, is all the more difficult as some of these factors depend on external conditions that cannot be faithfully reproduced.
The above quote, as well as subsequent ones from Peter Gutmann's article, indicates that he may not understand the data encoding process in hard drives. In fact, one gets the impression from his entire publication that data is stored on the disk as a raw, unprocessed sequence of ones and zeros fed to the disk interface by the computer. This is all the more strange because at the same time he himself mentions various methods of encoding data and even tries to match the overwriting patterns of his algorithm to them.
In reality, the data stored on the disk is encoded and does not resemble the input data stream at all. Since data is susceptible to errors and corruption at every stage of processing and storage, various safeguards in the form of checksums and Error Correction Codes (ECC) are commonly used. Data stored on disk is likewise protected by appropriate correction codes. The details have evolved over time and also vary between manufacturers' drives, but for our purposes it is sufficient to know that such codes exist and that they are calculated and appended to each sector of the disk when it is written, in order to protect its contents.
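The principle that the redundancy is computed from whatever content is currently being written can be sketched as follows. Real drive controllers use much stronger Reed-Solomon or LDPC codes; CRC32 is only a stand-in here, and the sector size constant and function name are assumptions made for the example:

import os
import zlib

SECTOR = 4096  # bytes of user data per "Advanced Format" sector

# Sketch only: append check data derived from the sector content at write time.
def encode_sector(user_data: bytes) -> bytes:
    assert len(user_data) == SECTOR
    check = zlib.crc32(user_data).to_bytes(4, "little")
    return user_data + check                 # what conceptually goes to the platter

old = encode_sector(b"A" * SECTOR)
new = encode_sector(os.urandom(SECTOR))      # overwrite: fresh data, fresh check data
print(old[-4:] != new[-4:])                  # True - the old codes are gone as well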
Data stored on disk is also randomised. The purpose of randomisation is to break up long sequences of repetitive symbols. Long sequences of the same symbols or repeated sequences of symbols can contribute to unfavourable wave phenomena in the write and read channel, such as standing waves, wave reflections or parasitic harmonics. They can also cause inter-symbol interference (ISI) - shifts between individual symbols in the data stream. And because the tracks stored on the platter surface are adjacent to other tracks, there is inductive interference between them called Inter Track Interference (ITI). Randomisation helps to reduce the impact of this interference.
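Randomisation can be pictured as XOR-ing the data with a pseudo-random keystream. Real drive scramblers are vendor-specific and their parameters are not public; the LFSR length, taps and seed below are arbitrary choices made purely for illustration:

# Sketch of additive scrambling: XOR the data with an LFSR-generated keystream.
# The same function descrambles, because XOR with the same keystream is its own inverse.
def lfsr_stream(seed, taps, nbits):
    state = seed
    for _ in range(nbits):
        yield state & 1
        fb = 0
        for t in taps:
            fb ^= (state >> t) & 1
        state = (state >> 1) | (fb << 15)    # 16-bit Fibonacci LFSR

def scramble(data: bytes) -> bytes:
    bits = lfsr_stream(seed=0xACE1, taps=(0, 2, 3, 5), nbits=8 * len(data))
    out = bytearray(data)
    for i in range(len(out)):
        mask = 0
        for b in range(8):
            mask |= next(bits) << b
        out[i] ^= mask
    return bytes(out)

payload = b"\x00" * 16                         # long run of identical symbols
print(scramble(payload).hex())                 # pseudo-random-looking bytes instead
print(scramble(scramble(payload)) == payload)  # True - descrambling restores the data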
The most important stage of data encoding, from our point of view, is the preparation of the data to be written to the platter. The first method of encoding information used in hard disk drives was FM (Frequency Modulation). This involved writing the pulses of a clock signal and inserting data bits between them. If the bit was a "1", an extra pulse was inserted between the clock pulses; if it was a logical "0", no pulse was inserted.
This was an inefficient method in which the '0' bit was encoded with one longer magnetic domain and the '1' with two shorter ones. Over time, an attempt was made to optimise it with the introduction of the MFM (Modified Frequency Modulation) method, in which storage density was improved by reducing the number of clock pulses. However, the real revolution came with RLL (Run Length Limited) coding, which allowed the complete elimination of the clock component and increased the data packing density to several bits per magnetic domain.
RLL coding is self-clocking. It consists of placing a certain number of zeros between successive pulses, with the decoder chip calculating that number from the distance between the pulses. This means that one domain can encode several bits, with the number of zeros between the ones depending on the length of the domain. The minimum and maximum number of zeros that may occur between ones are chosen taking into account the factors affecting the signal frequency (the achievable sizes of stable magnetic domains, platter speed, etc.), the sensitivity of the read heads, the signal-processing capability of the decoder chip and the error correction provided by ECC codes, so as to minimise read errors and loss of signal synchronisation.
At the same time, since a magnetic domain must lie between two domain walls, no two logical ones can occur consecutively in RLL coding - they must always be separated by at least one zero. As the actual data rarely meets this condition, it must be recoded using appropriate conversion tables. Recovering literal single bits is therefore impossible, and attempts to recover other small portions of data are hampered by the need to properly address and decode these fragments.
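The difference between FM, MFM and the run-length constraint can be made concrete with a few lines of code. In the sketch below a "1" in the encoded stream stands for a flux reversal and a "0" for its absence; the helper measuring the shortest run of zeros between ones is my own addition, and real RLL(1,7) or RLL(2,7) codes use conversion tables rather than the simple MFM rule shown here:

# FM: a clock reversal before every data bit. MFM: the clock reversal is kept
# only between two zero data bits, so two reversals are never adjacent - which
# is exactly the d >= 1 constraint of RLL-type codes.
def fm_encode(bits):
    out = []
    for d in bits:
        out += [1, d]
    return out

def mfm_encode(bits):
    out, prev = [], 0
    for d in bits:
        clock = 1 if (prev == 0 and d == 0) else 0
        out += [clock, d]
        prev = d
    return out

def min_zero_run_between_ones(code):
    gaps, run, seen_one = [], 0, False
    for b in code:
        if b == 1:
            if seen_one:
                gaps.append(run)
            run, seen_one = 0, True
        else:
            run += 1
    return min(gaps) if gaps else None

data = [1, 0, 1, 1, 0, 0, 0, 1]
print(min_zero_run_between_ones(fm_encode(data)))    # 0 - adjacent reversals occur
print(min_zero_run_between_ones(mfm_encode(data)))   # 1 - reversals always separated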
You can learn more about data coding from the book by the RLL code developer Kees Schouhamer Immink, 'Codes for Mass Data Storage Systems', as well as from Bane Vasić and Erozan M. Kurtas' 'Coding and Signal Processing for Magnetic Recording Systems'. If you want to learn more about the data coding process, you might also be interested in Charles Sobey's work on drive-independent data recovery. The process of studying magnetic platters and decoding data independently of the disk is also described in Isaac D. Mayergoyz and Chun Tse's book "Spin-stand Microscopy of Hard Disk Data".
"...when data is written to the medium, the write head sets the polarity of most, but not all, of the magnetic domains. This is partially due to the inability of the writing device to write in exactly the same location each time, and partially due to the variations in media sensitivity and field strength over time and among devices."
Based on what we already know about data coding, we can conclude that during operation the heads do not write individual magnetic domains one by one. That would not be consistent with the RLL coding scheme, in which the number of logical zeros between the ones is determined by the distance between consecutive domain walls (the domain length), so when other data is written the domain lengths must change.
In addition, it is not technically possible to address individual magnetic domains. Part of the surface of the platter is dedicated to the information necessary to ensure the correct operation of the disk. This category includes, among other things, the servo sectors that allow correct track identification and control of the head's trajectory over its centre, as well as the sector headers that allow them to be correctly addressed.
And it is the sectors (formerly 512 B of user data, 4 kB in the modern "Advanced Format" variant) that are the minimum addressing unit. To get an idea of this, you may want to look at the ATA and SCSI standards, which were developed in the mid-1980s and have since been the primary documents describing the operation of hard disk drives and ensuring their compatibility. While these standards have evolved over the decades, they have never provided for addressing units other than sectors.
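On a Linux system you can check for yourself what addressing granularity a drive reports to the host. The sysfs attributes below are standard; the device name is an assumption and should be adjusted. A typical "Advanced Format" drive reports 512 bytes logical (for compatibility) and 4096 bytes physical, and in neither case is anything smaller addressable:

# Read the logical and physical block sizes the kernel exposes for a drive.
def read_queue_param(dev, name):
    with open(f"/sys/block/{dev}/queue/{name}") as f:
        return int(f.read())

for name in ("logical_block_size", "physical_block_size"):
    print(name, read_queue_param("sda", name))   # "sda" is an assumed device name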
And this is how disks work. Even if you want to change a single bit of a sector, this requires re-encoding the entire sector and forming the corresponding electromagnetic signal waveform, which is then written to the appropriate physical location. If you want to see this in practice, create a small text file. Locate it and check in a hex editor what the contents of its sector look like. You can change the zeros at the end of the sector, beyond the end of the file, to other content to see whether they will be retained when you edit the file. Then edit this file in Notepad and check the contents of the sector in the hex editor again. You will see that the remainder of the old sector content beyond the file size has been replaced by zeros. Claims about writing, reading, recovering or addressing single bits are therefore nonsense.
"Deviations in the position of the drive head from the original track may leave significant portions of the previous data along the track edge relatively untouched."
This statement made sense at a time when hard drives still used stepper motors to position the magnetic head block. A stepper motor, as the name suggests, always rotates by a preset step or a multiple of it. It is not possible to set it to intermediate positions. And this characteristic of stepper motors resulted in the risk of writing a track with a fixed offset from the previous position, for example due to the inability to compensate for differences in temperature expansion of individual disk components. It was for this reason that it was recommended to run the drive for at least half an hour before performing low-level formatting to ensure that all components warm up evenly.
The process of replacing stepper motors with linear (Voice Coil Motors - VCM) ones began around the mid-1980s and by the time Peter Gutmann published his article, it had come to an end. Kalok, the last company to manufacture hard drives with stepper motors, went bankrupt in 1994. Two years is enough time for the publication to at least acknowledge the presence of drives on the market with stepless adjustable magnetic head blocks with VCMs or at least make it clear that the statement quoted above refers to drives with stepper motors.
Linear (VCM) motors are constructed from a coil placed between two permanent magnets. Current flowing through the coil, which sits in the fixed magnetic field of the magnets, produces a force that moves the coil relative to them. Typically, positioners rotate around an axis and move the heads over the surface of the platters in an arc, but solutions based on a reciprocating movement of the coil have also been used in the past. However, this solution was more complicated and took up more space inside the case, and for these reasons it was quickly abandoned.
Replacing stepper motors with linear ones forced changes to the head positioning and track-following subsystem. Stepless head positioning opens up the possibility of precisely keeping the head over the centre of the track, but it also requires feedback to control the head's position over the platter. Servo sectors spaced at equal intervals on the platter surfaces serve this purpose. The number of servo sectors varies between drive models. In many cases, you can check it in the Victoria diagnostic utility: if the programme displays the parameter "Wedges" in the drive passport, this is the number of servo sectors.
The servo sectors contain a range of information used to identify the number of the track being read, control the speed of the platters, correctly synchronise the signal and keep the head on the centre of the track. Given the purpose of this article, we will focus on the latter. Each servo sector contains servo burst fields that generate a Position Error Signal (PES). This signal makes it possible to determine in which direction and by how much the head deviates from the centre of the track.
Based on the position error signal, the signal processor can issue a command to the motor controller to adjust the position of the head. Since the number of servo sectors per track in a hard disk drive typically exceeds 100, it is in practice not possible to stably maintain the head's flight along the edge of a track. If the head deviates from the centre of the track, the positioning mechanism will seek to correct its position as quickly as possible. Even if compensation encounters some difficulty, it is much more likely that the head will oscillate near the centre of the track than that it will fly along one of its edges.
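The argument can be illustrated with a toy proportional servo loop. The gain, noise level and number of servo samples below are arbitrary assumptions, and real drives use far more sophisticated controllers, but even this crude sketch shows the head being pulled back towards the track centre rather than settling along an edge:

import random

track_centre = 0.0         # track centre; edges at +/- 0.5 of the track pitch
offset = 0.4               # initial deviation towards the track edge
gain = 0.5                 # proportional correction applied per servo sample

positions = []
for servo_sample in range(200):              # roughly one revolution of servo samples
    pes = track_centre - offset              # error signal read from a servo sector
    offset += gain * pes                     # controller pulls the head back
    offset += random.gauss(0, 0.01)          # vibration, turbulence, electrical noise
    positions.append(offset)

print(max(abs(p) for p in positions[20:]))   # small residual wobble around the centre,
                                             # nowhere near the 0.5 of a track edge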
Of course, it is possible to write with such an offset from the centre of the track that a subsequent write will leave small portions of the previous magnetisation intact, but as the recording density increases, such a situation becomes less and less likely. It is also extremely unlikely that such deviations would leave intact "significant portions of previous data". If anything survives, it will at most be small fragments that are difficult to address and decode, and it is also difficult to determine when those writes were made. Based on the information given earlier in this article, we already know that in order to practically decode data recovered from the edge of a track, we would need at least an entire coherent sector at our disposal.
In today's ultra-high-density hard drives, the risk of fragments of old data being left along the edge of a track is negligible, and such a signal would in addition be strongly disturbed by the magnetisation of adjacent tracks. In the case of drives using SMR (Shingled Magnetic Recording), this risk is eliminated entirely by the partial overwriting of previous tracks when subsequent tracks are written. Moreover, much more sophisticated positioning and head-position control solutions are used, such as multi-stage actuators. Nevertheless, even with disks from the first half of the '90s, no one has managed to demonstrate a practical example of recovering overwritten data read from the edge of a track.
The subject of hard disk drive servo mechanics, track finding and tracking, and motor speed control is too vast to be discussed in more detail here. It has been covered in several books, among which it is worth pointing out:
"Механика и сервосистема" by Serhiy Kozhenevsky (Сергій Коженевський),
"Hard Disk Drive Mechatronics and Control" by Abdullah al-Mamun, Guoxiao Guo and Chao Bi,
"Hard Disk Drive Servo Systems" by Ben M. Chen, Tong H. Lee, Kemao Peng and Venkatakrishnan Venkataramanan.
"When all the above factors are combined it turns out that each track contains an image of everything ever written to it, but that the contribution from each 'layer' gets progressively smaller the further back it was made."
Everyone has probably heard the analogy of overwriting data to erasing inscriptions on paper with a pencil. Yes, the original entries on a piece of paper are visible for a very long time and even if they are carefully erased, you can still try to read their fragments or guess individual symbols. And it seems that Peter Gutmann also succumbed to the magic of this analogy. But does it make sense at all in relation to magnetic recording?
The heads do not add any new layers during recording; they change the ordering of the magnetisation of a single magnetic layer. Remagnetisation does not superimpose a new record on the previous one, but destroys it by arranging the sequence of domain walls in a different way. This action is therefore much more similar to, for example, changing symbols made of matches by rearranging them, and the analogy of covering entries on paper with crayons is inadequate at best.
But are the heads actually capable of irreversibly destroying the previous magnetic record? Here we need to consider the relation between the field induced by the heads and the coercivity of the magnetic layer, i.e. the field strength required to remagnetise it. The coercivity of the cobalt alloys typically used in hard disk drives is about 0.5 T, whereas magnetic heads are capable of inducing fields of more than 2 T. In addition, the magnetic layers are too thin (their thickness is measured in tens of nanometres) for two or more layers of domains with different magnetisation polarities to exist stably within them. For comparison, demagnetisers (degaussers) inducing fields of around 1 T are sufficient to destroy data in the demagnetisation process, even though the platters are shielded by metal casing elements.
It is worth taking this opportunity to draw attention to the energy-assisted recording drives just appearing on the market - HAMR (Heat-Assisted Magnetic Recording) and MAMR (Microwave-Assisted Magnetic Recording). These drives use iron-platinum alloys with a coercivity of approximately 6 T as the magnetic layer. The field induced by the recording heads is clearly too weak to remagnetise such a layer, so recording must be supported by an additional energy source that locally heats the platter surface to a temperature close to the Curie point. The Curie point is the characteristic temperature of a magnetic material at which it loses its magnetisation and is therefore much easier to remagnetise. This information is important for the destruction of data by demagnetisation, as energy-assisted recording drives will be resistant to today's popular demagnetisers, and new devices will need to be developed to destroy them.
"The general concept behind an overwriting scheme is to flip each magnetic domain on the disk back and forth as much as possible (this is the basic idea behind degaussing) without writing the same pattern twice in a row."
Why does Peter Gutmann mix up data overwriting with degaussing (demagnetisation) here? We can consider the magnetisation of a magnetic substance in two respects. On the macro scale, we consider a body to be magnetised if it itself induces a magnetic field. It then has a non-zero magnetisation, which is the resultant of the magnetisation of its magnetic domains. In this sense, hard disk platters are not magnetised. This can easily be verified by observing how platters removed from a hard drive interact with metals that should respond to external magnetisation.
At the nanoscale, every magnetic body is magnetised in some way. If the magnetisation is not imparted by an external magnetic field, magnetic domains arise spontaneously and the fields induced by them cancel each other out. Magnetic recording consists of arranging the magnetic domains in such a way that they represent the logical states we want, which we can interpret as specific information. A functioning hard disk always has an ordered magnetisation, always contains some information and even if we consider it to be empty at the level of logical structures, we can always see some values in the hex-editor.
Degaussing involves applying an electromagnetic pulse in such a way as to destroy this ordering, with the consequence that the domains on the platter remain in a state of chaotic magnetisation. Such magnetisation is not interpretable, so nothing can be read from the platters, the heads cannot find the servo signal and the drive is destroyed.
Overwriting, on the other hand, involves replacing the existing magnetisation ordering with another that is still logically interpretable but represents worthless information. It is not necessary, however, to change the polarity of every magnetic domain in order to destroy the data. It is sufficient that the magnetic domains are arranged differently than they originally were.
Degaussing and overwriting are two different methods of destroying data, in which the objective is achieved by different means. In the case of degaussing, an external device completely destroys the ordering of the magnetisation, thereby destroying the disk as a device. Overwriting, on the other hand, only changes the magnetisation ordering of the sectors being overwritten, leaving the service area information, servo sectors and sector headers intact, and it allows selective data destruction, such as the erasure of selected files.
"To erase magnetic media, we need to overwrite it many times with alternating patterns in order to expose it to a magnetic field oscillating fast enough that it does the desired flipping of the magnetic domains in a reasonable amount of time. Unfortunately, there is a complication in that we need to saturate the disk surface to the greatest depth possible, and very high frequency signals only "scratch the surface" of the magnetic medium (...). Disk drive manufacturers, in trying to achieve ever-higher densities, use the highest possible frequencies, whereas we really require the lowest frequency a disk drive can produce. Even this is still rather high. The best we can do is to use the lowest frequency possible for overwrites, to penetrate as deeply as possible into the recording medium."
As we already know, it is not so much the reversal of the polarity of the individual magnetic domains as the displacement of the domain walls that is important for the destruction of data in magnetic recording. Besides, the frequency of the magnetic field used for data recording depends primarily on the frequency of the signal to be recorded. Given the data encoding process, obtaining a signal with the highest possible frequency (containing the highest possible number of logical ones in relation to zeros) would require an understanding and consideration of all encoding steps.
The idea itself most likely comes from the method of demagnetizing magnetized bodies on a macro scale. Since it is very difficult to influence such a body with a field exactly corresponding to its coercivity in order to demagnetize it, and it is much more likely to reverse the polarization of the magnetization, demagnetization is performed using a high-frequency field with decreasing intensity. In this way, with each polarity reversal, the body is magnetized less and less (remanence drops from saturation to a state close to zero). In the case of a hard disk, the recording heads induce a magnetic field on the surface of the platter rotating beneath them, and the time during which a given area can be remagnetized depends primarily on the rotation speed of the platter.
In his article, Peter Gutmann frequently refers to certain elements of data encoding, but at the same time he treats the issue very superficially and piecemeal, often bending it to fit his assumed thesis that multiple overwrites are necessary for secure destruction. He essentially ignores the processes of resizing, merging and splitting magnetic domains, which are crucial for RLL coding, and instead focuses excessively on the process of reversing their polarity. There is a lack of coherence in his considerations, which we have already noticed and will see again later. Besides, as I mentioned above, the magnetic layer is too thin not to be magnetised to saturation in the very first pass. This is especially true for perpendicular recording, in which the magnetisation polarisation vector is perpendicular to the platter surface, so that the domains themselves are aligned vertically in the magnetic layer.
“Therefore even if some data is reliably erased, it may be possible to recover it using the built-in error-correction capabilities of the drive.”
Here is another example of Peter Gutmann's excessively relaxed approach to the issue of data coding. The above sentence suggests the possibility of deleting the contents of a sector while leaving the correction codes associated with it. This is not possible because the correction codes are calculated at the data encoding stage and added to the sector before the signal waveform is formed, which will be induced by the recording head and written to the platter. By overwriting a sector with other content, we will also overwrite the correction codes associated with the original data.
In older disk models, it was possible to intentionally generate incorrect checksums and save a sector with correction codes that did not match the user's data. Although such sectors cannot be read correctly and when trying to read them, the disk returns a UNC error, the correction codes associated with the previous sector content are destroyed and replaced with new ones. This possibility is implemented, for example, in the MHDD program by the commands "MAKEBAD" - creating a "bad" sector in the indicated LBA (Logical Block Addressing) address or "RANDOMBAD" - creating "bad" sectors in random locations.
Moreover, Peter Gutmann clearly overestimates the correction capabilities of ECC codes. Although correction codes allow the location and correction of bit errors, this applies to a limited number of errors occurring in existing and readable sectors. Typically, correction codes can correct about 200 bit errors per sector, and if the number of errors exceeds the code's capacity, the drive issues a UNC error. This is definitely not enough to attempt to reconstruct the content of a non-existent sector solely based on its correction codes. We must remember that bit errors may also occur in the correction code itself.
"Data which is overwritten an arbitrary large number of times can still be recovered provided that the new data isn't written to the same location as the original data..."
Peter Gutmann clearly contradicts himself in this sentence. He assumes that data overwritten any number of times can still be recovered, provided no new data is written to the same location. But the essence of overwriting is precisely to write new data in place of the data we want to destroy, even if the new data is an overwriting pattern that is not interpretable at the logical level - for the disk it is the same kind of data stream as any other. It would be very strange if Peter Gutmann did not understand this. At the same time, this sentence directly undermines the point of multi-pass overwriting and confirms that the first overwriting pass destroys the data.
"The article states that «The encoding of hard disks is provided using PRML and EPRML», but at the time the Usenix article was written MFM and RLL was the standard hard drive encoding technique... "
In the epilogue, Peter Gutmann refers to the 2008 article by Craig Wright, Dave Kleiman and Shyaam Sundhar R. S., "Overwriting Hard Drive Data: The Great Wiping Controversy". The authors of that publication practically verified the assumptions presented by Peter Gutmann and demonstrated the impossibility of recovering overwritten data by micromagnetic analysis of the platter surface in search of traces of the previous magnetisation. Although the authors approached the issue of data coding rather loosely, here we are mainly concerned with the Gutmann algorithm and the article describing it.
Peter Gutmann argues that the research by Craig Wright, Dave Kleiman and Shyaam Sundhar R. S. is inadequate and should not call his findings into question, because the drives they examined used PRML, while at the time he wrote his article the standard data encoding methods were MFM and RLL. This is an unfounded accusation, because PRML is not a data coding technique and does not replace either MFM or RLL; it is used in signal detection and decoding, replacing the older peak-detection method of detecting pulse peaks. PRML has been used since the early 1990s and therefore should not have been unfamiliar to Peter Gutmann in 1996. MFM, on the other hand, was displaced from hard drives by RLL as early as the mid-1980s, and by the mid-1990s it was used only in floppy disks.
In the first decades of hard drives, the recording density was low and the domains were quite large, so the domain walls were located at relatively large distances from one another. They therefore produced clear pulses with high amplitudes and easy-to-detect peaks in the signal read by the heads. The increasing recording density degraded the signal-to-noise ratio, while the introduction of RLL coding eliminated the clock component, which increased the risk of the signal desynchronising and the decoder circuit calculating the wrong number of zeros between successive ones. At that point the peak-detection method turned out to be insufficient and was replaced by the PRML method.
PRML (Partial Response - Maximum Likelihood) is a detection method that determines the most likely recorded sequence from the partial-response waveform of the read signal. This method does not focus on capturing successive pulse peaks, but analyses the entire signal waveform and seeks to determine the most probable pulse distribution. PRML, unlike peak detection, does not use reference threshold values; it analyses the shape and height of the amplitudes of all pulses and, on this basis, determines which of them come from the recorded signal and which from background noise. Knowledge of the encoding method used during recording makes it possible to reject variants of the signal waveform that are incompatible with it, e.g. those containing fewer or more zeros between two ones than allowed for a given version of the RLL code.
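The "most likely sequence" idea can be illustrated with a toy partial-response detector. The sketch below uses a PR4 target (1 - D^2), additive Gaussian noise and an exhaustive search in place of the Viterbi algorithm used in real read channels; the bit pattern and noise level are arbitrary assumptions:

import itertools
import random

# Noiseless PR4 channel output: each sample is a[k] - a[k-2] for bits a[k] in {0, 1}.
def pr4_output(bits):
    padded = [0, 0] + list(bits)
    return [padded[k] - padded[k - 2] for k in range(2, len(padded))]

# Brute-force maximum-likelihood detection: pick the bit sequence whose noiseless
# channel output lies closest (in Euclidean distance) to the received samples.
def ml_detect(samples, nbits):
    best, best_dist = None, float("inf")
    for candidate in itertools.product([0, 1], repeat=nbits):
        dist = sum((s - y) ** 2 for s, y in zip(samples, pr4_output(candidate)))
        if dist < best_dist:
            best, best_dist = candidate, dist
    return list(best)

recorded = [1, 0, 1, 1, 0, 0, 1, 0]
noisy = [y + random.gauss(0, 0.2) for y in pr4_output(recorded)]
print(ml_detect(noisy, len(recorded)) == recorded)   # usually True at this noise level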
Peter Gutmann's questioning of the results of Craig Wright, Dave Kleiman and Shyaam Sundhar R. S. on this basis only proves that even after 2008 he remained out of touch with the data coding and signal processing solutions used in hard drives. Suggesting that PRML replaced RLL encoding is as much of a mistake as saying that SMR replaced perpendicular recording. After the publication of their article, interest in research into recovering overwritten data using magnetic force microscopy essentially disappeared. Similarly, in the case of oscilloscopic studies of the signal waveform captured directly from the heads, Serhiy Kozhenevsky's (Сергій Коженевський) work did not provide reasonable grounds for hoping that they could be used for the practical recovery of overwritten data.
This does not mean, however, that overwriting data is free from risks and threats. User errors, uncontrolled interruptions in the process, device and software failures, or intentional actions aimed at preventing effective data destruction are always possible. There are also risks related to the possibility of data being found accidentally or intentionally hidden outside the LBA addressing.
Data can be found in areas hidden outside the LBA addressing using the HPA (Host Protected Area) or DCO (Device Configuration Overlay) features. In the case of SMR drives, stale data may survive in an uncontrolled manner outside the LBA addressing, and locating and reliably overwriting it requires analysis and understanding of the LBA-to-physical address translation subsystem. Every disk also contains sectors that have not been assigned an LBA address. These are, for example, spare sectors or physical sectors at the end of the disk beyond those needed to reach its nominal capacity. Such sectors can be used to intentionally hide data, but both hiding it and later reading it require appropriate knowledge of the disk firmware and the ability to work with physical addressing.
However, multiple overwrites do not protect against any of the above risks. Improving the security of the data overwriting process should focus primarily on analyzing the subsystem of translating logical addresses (LBA) into physical addresses and aiming at overwriting data in physical addressing. Therefore, if we do not care about selective erasing of selected files, but want to destroy the entire contents of the disk, it is better to choose the Secure Erase procedure, which works closer to the hardware than programs operating in LBA addressing. Data is irretrievably destroyed in the first overwriting pass. Each subsequent one is just an unnecessary expense and a waste of time, and this is probably a sufficient reason to finally throw the Gutmann algorithm into the trash.
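For completeness, here is a sketch of how such whole-drive sanitisation might be driven from Linux. It relies on hdparm, which exposes the ATA security commands implemented in the drive firmware; the device path and password are placeholders, the commands are only printed rather than executed, and the flags should be verified against the hdparm documentation before use:

import subprocess

DEV = "/dev/sdX"        # placeholder - triple-check before pointing at a real drive
PASSWORD = "p"

steps = [
    ["hdparm", "-I", DEV],                           # security state, estimated erase time
    ["hdparm", "-N", DEV],                           # check whether an HPA hides sectors
    ["hdparm", "--dco-identify", DEV],               # check for a DCO doing the same
    ["hdparm", "--user-master", "u", "--security-set-pass", PASSWORD, DEV],
    ["hdparm", "--user-master", "u", "--security-erase", PASSWORD, DEV],
]

for cmd in steps:
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)    # uncomment to actually execute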
[1] Gutmann, P.: Secure Deletion of Data from Magnetic and Solid-State Memory. Proceedings of the Sixth USENIX Security Symposium, San Jose, CA, July 22-25 (1996).
[2] Коженевський, С. Р.: Взгляд на жёсткий диск "изнутри". Магнитные головки. ООО "ЕПОС", Київ (2009).
[3] Gomez, R., Adly, A., Mayergoyz, I., Burke, E.: Magnetic Force Scanning Tunnelling Microscope Imaging of Overwritten Data. IEEE Transactions on Magnetics 28(5) (1992).
[4] Gomez, R., Burke, E., Adly, A., Mayergoyz, I., Gorczyca, J.: Microscopic Investigations of Overwritten Data. Journal of Applied Physics 73(10), 6001 (1993).
[5] Bertram, H. N.: Theory of Magnetic Recording. Cambridge University Press, London (1994).
[6] Bertram, H. N., Fiedler, L. D.: Amplitude and bit shift spectra comparison in thin metallic media. IEEE Transactions on Magnetics 19(5) (1983).
[7] Коженевський, С. Р.: Взгляд на жёсткий диск "изнутри". Перезапись информации. ООО "ЕПОС", Київ (2006).
[8] Khizroev, S., Litvinov, D.: Perpendicular Magnetic Recording. Kluwer Academic Publishers, Dordrecht (2004).
[9] Schouhamer Immink, K. A.: Codes for Mass Data Storage Systems. Shannon Foundation Publishers, Eindhoven (2004).
[10] Vasić, B., Kurtas, E. M.: Coding and Signal Processing for Magnetic Recording Systems. CRC Press LLC, Boca Raton (2005).
[11] Wu, Z.: Coding and Iterative Detection for Magnetic Recording Channels. Springer Science+Business Media LLC, New York (2000).
[12] Sobey, Ch. H.: Drive-Independent Data Recovery: The Current State-of-the-Art. IEEE Transactions on Magnetics 42(2) (2006).
[13] Mayergoyz, I. D., Tse, C.: Spin-stand Microscopy of Hard Disk Data. Elsevier Science Ltd., Amsterdam (2007).
[14] Amer, A., Holliday, J., Long, D. D. E., Miller, E. L., Paris, J.-F., Schwartz, T. S. J.: Data Management and Layout for Shingled Magnetic Recording. IEEE Transactions on Magnetics 47(10) (2011).
[15] Miura, K., Yamamoto, E., Aoi, H., Muraoka, H.: Skew angle effect in shingled writing magnetic recording. Physics Procedia 16 (2011).
[16] Коженевський, С. Р.: Взгляд на жёсткий диск "изнутри". Механика и сервосистема. ООО "ЕПОС", Київ (2007).
[17] Al Mamun, A., Guo, G. X., Bi, Ch.: Hard Disk Drive Mechatronics and Control. CRC Press, Boca Raton (2006).
[18] Chen, B. M., Lee, T. H., Peng, K., Venkataramanan, V.: Hard Disk Drive Servo Systems. Springer-Verlag, London (2006).
[19] Du, C., Pang, C. K.: Multi-Stage Actuation Systems and Control. CRC Press, Boca Raton (2019).
[20] Plumer, M. L., van Ek, J., Weller, D.: The Physics of Ultra-High-Density Magnetic Recording. Springer-Verlag, Berlin (2001).
[21] Ababei, R.-V., Ellis, M. O. A., Evans, R. F. L., Chantrell, R. W.: Anomalous damping dependence of the switching time in Fe/FePt bilayer recording media. Physical Review B 99, 024427 (2019).
[22] Riggle, C. M., McCarthy, S. G.: Design of Error Correction Systems for Disk Drives. IEEE Transactions on Magnetics 34(4) (1998).
[23] Wright, C., Kleiman, D., Shyaam Sundhar, R. S.: Overwriting Hard Drive Data: The Great Wiping Controversy. In: Sekar, R., Pujari, A. K. (eds.): ICISS 2008, LNCS 5352, Springer-Verlag, Berlin, Heidelberg (2008).
[24] Sugawara, T., Yamagishi, M., Mutoh, H., Shimoda, K., Mizoshita, Y.: Viterbi detector including PRML and EPRML. IEEE Transactions on Magnetics 29(6) (1993).
[25] Gupta, M. R., Hoeschele, M. D., Rogers, M. K.: Hidden Disk Areas: HPA and DCO. International Journal of Digital Evidence 5(1) (2006).