TuxOnIce, version 3.0
- What is it?
- Why would you want it?
- What do you need to use it?
- Why not just use the version already in the kernel?
- How do you use it?
- What do all those entries in /sys/power/tuxonice do?
- How do you get support?
- I think I've found a bug. What should I do?
- When will XXX be supported?
- How does it work?
- Who wrote TuxOnIce?
1. What is it?
Imagine you're sitting at your computer, working away. For some reason, you need to turn off your computer for a while - perhaps it's time to go home for the day. When you come back to your computer next, you're going to want to carry on where you left off. Now imagine that you could push a button and have your computer store the contents of its memory to disk and power down. Then, when you next start up your computer, it loads that image back into memory and you can carry on from where you were, just as if you'd never turned the computer off. You have far less time to start up, no reopening of applications or finding what directory you put that file in yesterday. That's what TuxOnIce does.
TuxOnIce has a long heritage. It began life as work by Gabor Kuti, who, with some help from Pavel Machek, got an early version going in 1999. The project was then taken over by Florent Chabaud while still in alpha version numbers. Nigel Cunningham came on the scene when Florent was unable to continue, moving the project into betas, then 1.0, 2.0 and so on up to the present series. During the 2.0 series, the name was contracted to Suspend2 and the website suspend2.net created. Beginning around July 2007, the software was renamed TuxOnIce, to make it clear that it is more concerned with hibernation than suspend to ram.
Pavel Machek's swsusp code, which was merged around 2.5.17, retains the original name. It was essentially a fork of the beta code until Rafael Wysocki came on the scene in 2005 and began to improve it further.
2. Why would you want it?
Why wouldn't you want it?
Being able to save the state of your system and quickly restore it improves your productivity - you get a useful system in far less time than through the normal boot process.
3. What do you need to use it?
a. Kernel Support.
i) The TuxOnIce patch.
TuxOnIce is implemented as part of the Linux kernel. This version is not part of Linus's 2.6 tree at the moment, so you will need to download the kernel source and apply the latest patch. Having done that, enable the appropriate options in make [menu|x]config (under Power Management Options - look for "Enhanced Hibernation"), compile and install your kernel. TuxOnIce works with SMP, Highmem, preemption, fuse filesystems, x86-32, PPC and x86_64.
TuxOnIce patches are available from http://tuxonice.net.
ii) Compression support.
Compression support is implemented via the cryptoapi. You will therefore want to select any Cryptoapi transforms that you want to use on your image from the Cryptoapi menu while configuring your kernel. Part of the TuxOnIce patch adds a new cryptoapi compression called LZF. We recommend the use of this compression method - it is very fast and still achieves good compression.
You can also tell TuxOnIce to write its image to an encrypted and/or compressed filesystem/swap partition. In that case, you don't need to do anything special for TuxOnIce when it comes to kernel configuration.
iii) Configuring other options.
While you're configuring your kernel, try to configure as much as possible to build as modules. We recommend this because there are a number of drivers that are still in the process of implementing proper power management support. In those cases, the best way to work around their current lack is to build them as modules and remove the modules while hibernating. You might also bug the driver authors to get their support up to speed, or even help!
b. Storage.
i) Swap.
TuxOnIce can store the hibernation image in your swap partition, a swap file or a combination thereof. Whichever combination you choose, you will probably want to create enough swap space to store the largest image you could have, plus the space you'd normally use for swap. A good rule of thumb would be to calculate the amount of swap you'd want without using TuxOnIce, and then add the amount of memory you have. This swapspace can be arranged in any way you'd like. It can be in one partition or file, or spread over a number. The only requirement is that they be active when you start a hibernation cycle.
There is one exception to this requirement. TuxOnIce has the ability to turn on one swap file or partition at the start of hibernating and turn it back off at the end. If you want to ensure you have enough memory to store an image when your memory is fully used, you might want to make one swap partition or file for 'normal' use, and another for TuxOnIce to activate & deactivate automatically. (Further details below).
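The rule of thumb for sizing swap described above is simple arithmetic, and can be sketched as a pair of shell functions. The function names are illustrative only and are not part of TuxOnIce or the hibernate script; the second helper assumes a Linux /proc/meminfo, where MemTotal is reported in kB.

```shell
#!/bin/sh
# Rule of thumb from the text: the swap you would want anyway, plus the
# amount of memory you have. Both arguments are in megabytes.
# (Hypothetical helper name, for illustration only.)
recommended_swap_mb() {
    normal_swap_mb=$1
    mem_mb=$2
    echo $((normal_swap_mb + mem_mb))
}

# Installed RAM in megabytes, read from /proc/meminfo (MemTotal is in kB).
installed_ram_mb() {
    awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo
}
```

For example, `recommended_swap_mb 1024 "$(installed_ram_mb)"` prints the total to provision if you would normally want 1GB of swap.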
ii) Normal files.
TuxOnIce includes a 'file allocator'. The file allocator can store your image in a simple file. Since Linux has the concept of everything being a file, this is more powerful than it initially sounds. If, for example, you were to set up a network block device file, you could hibernate to a network server. This has been tested and works to a point, but nbd itself isn't stateless enough for our purposes.
Take extra care when setting up the file allocator. If you just type commands without thinking and then try to hibernate, you could cause irreversible corruption on your filesystems! Make sure you have backups.
Most people will only want to hibernate to a local file. To achieve that, do something along the lines of:
echo "TuxOnIce" > /hibernation-file
dd if=/dev/zero bs=1M count=512 >> /hibernation-file
This will create a 512MB file called /hibernation-file. To get TuxOnIce to use it:
echo /hibernation-file > /sys/power/tuxonice/file/target
Then read back /sys/power/tuxonice/resume and put the result into your bootloader's configuration (see also step c, below):
---EXAMPLE-ONLY-DON'T-COPY-AND-PASTE---
# cat /sys/power/tuxonice/resume
file:/dev/hda2:0x1e001
---EXAMPLE-ONLY-DON'T-COPY-AND-PASTE---
In this example, we would edit the append= line of our lilo.conf|menu.lst so that it included:
---EXAMPLE-ONLY-DON'T-COPY-AND-PASTE---
resume=file:/dev/hda2:0x1e001
---EXAMPLE-ONLY-DON'T-COPY-AND-PASTE---
For those who are thinking 'Could I make the file sparse?', the answer is 'No!'. At the moment, there is no way for TuxOnIce to fill in the holes in a sparse file while hibernating. In the longer term (post merge!), I'd like to change things so that the file could be dynamically resized and have holes filled as needed. Right now, however, that's not possible and not a priority.
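The file creation commands shown earlier in this section can be wrapped in a small function that takes the path and size as parameters. This is only a sketch: the function name is made up for illustration, and the refusal to overwrite an existing file is an extra precaution added here rather than something TuxOnIce requires. Appending real zeroes with dd (rather than seeking) is what keeps the file from being sparse.

```shell
#!/bin/sh
# Create a non-sparse image file for the file allocator.
# Usage: create_hibernation_file PATH SIZE_MB
# (Hypothetical helper, not part of TuxOnIce.)
create_hibernation_file() {
    target=$1
    size_mb=$2
    # Precaution added for this sketch: never clobber an existing file.
    if [ -e "$target" ]; then
        echo "refusing to overwrite $target" >&2
        return 1
    fi
    # The signature goes first, as in the commands in the text.
    echo "TuxOnIce" > "$target" || return 1
    # Append SIZE_MB megabytes of zeroes so every block is really allocated.
    dd if=/dev/zero bs=1M count="$size_mb" >> "$target" 2>/dev/null
}
```

Calling `create_hibernation_file /hibernation-file 512` would reproduce the example above; you would still need to point /sys/power/tuxonice/file/target at the result yourself.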
c. Bootloader configuration.
Using TuxOnIce also requires that you add an extra parameter to your lilo.conf or equivalent. Here's an example for a swap partition:
append="resume=swap:/dev/hda1"
This would tell TuxOnIce that /dev/hda1 is a swap partition you have. TuxOnIce will use the swap signature of this partition as a pointer to your data when you hibernate. This means that (in this example) /dev/hda1 doesn't need to be the swap partition where all of your data is actually stored. It just needs to be a swap partition that has a valid signature.
You don't need to have a swap partition for this purpose. TuxOnIce can also use a swap file, but usage is a little more complex. Having made your swap file, turn it on and do
cat /sys/power/tuxonice/swap/headerlocations
(this assumes you've already compiled your kernel with TuxOnIce support and booted it). The results of the cat command will tell you what you need to put in lilo.conf:
For swap partitions like /dev/hda1, simply use resume=/dev/hda1. For swapfile `swapfile`, use resume=swap:/dev/hda2:0x242d.
If the swapfile changes for any reason (it is moved to a different location, it is deleted and recreated, or the filesystem is defragmented) then you will have to check /sys/power/tuxonice/swap/headerlocations for a new resume_block value.
Once you've compiled and installed the kernel and adjusted your bootloader configuration, you should only need to reboot for the most basic part of TuxOnIce to be ready.
If you only compile in the swap allocator, or only compile in the file allocator, you don't need to add the "swap:" part of the resume= parameters above. resume=/dev/hda2:0x242d will work just as well. If you have compiled both and your storage is on swap, you can also use this format (the swap allocator is the default allocator).
When compiling your kernel, one of the options in the 'Power Management Support' menu, just above the 'Enhanced Hibernation (TuxOnIce)' entry is called 'Default resume partition'. This can be used to set a default value for the resume= parameter.
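The resume= formats used in this section all follow one pattern: an allocator name, a device, and (for swap files and ordinary files) a block offset. As a rough sketch, a helper like the following could assemble the string for a bootloader entry. The function name is hypothetical, and the offset must always be taken from TuxOnIce's own output, never guessed.

```shell
#!/bin/sh
# Build a resume= parameter from its parts.
# Usage: resume_param ALLOCATOR DEVICE [OFFSET]
#   ALLOCATOR is "swap" or "file"; OFFSET is optional (e.g. 0x242d).
# (Hypothetical helper, for illustration only.)
resume_param() {
    allocator=$1
    device=$2
    offset=${3:-}
    param="resume=${allocator}:${device}"
    # Swap files and ordinary files also need the header block offset.
    if [ -n "$offset" ]; then
        param="${param}:${offset}"
    fi
    echo "$param"
}
```

With the values from the examples above, `resume_param swap /dev/hda1` prints resume=swap:/dev/hda1 and `resume_param swap /dev/hda2 0x242d` prints resume=swap:/dev/hda2:0x242d.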
d. The hibernate script.
Since the driver model in 2.6 kernels is still being developed, you may need to do more than just configure TuxOnIce. Users of TuxOnIce usually start the process via a script which prepares for the hibernation cycle, tells the kernel to do its stuff and then restores things afterwards. This script might involve:
- Switching to a text console and back if X doesn't like the video card status on resume.
- Un/reloading drivers that don't play well with hibernation.
Note that you might not be able to unload some drivers if there are processes using them. You might have to kill off processes that hold devices open. Hint: if your X server accesses a USB mouse, doing a 'chvt' to a text console releases the device and you can unload the module.
Check out the latest script (available on tuxonice.net).
e. The userspace user interface.
TuxOnIce has very limited support for displaying status if you only apply the kernel patch - it can printk messages, but that is all. In addition, some of the functions mentioned in this document (such as cancelling a cycle or performing interactive debugging) are unavailable. To utilise these functions, or simply get a nice display, you need the 'userui' component. Userui comes in three flavours, usplash, fbsplash and text. Text should work on any console. Usplash and fbsplash require the appropriate (distro specific?) support.
To utilise a userui, TuxOnIce just needs to be told where to find the userspace binary:
echo "/usr/local/sbin/tuxoniceui_fbsplash" > /sys/power/tuxonice/user_interface/program
The hibernate script can do this for you, and a default value for this setting can be configured when compiling the kernel. This path is also stored in the image header, so if you have an initrd or initramfs, you can use the userui during the first part of resuming (prior to the atomic restore) by putting the binary in the same path in your initrd/ramfs. Alternatively, you can put it in a different location and do an echo similar to the above prior to the echo > do_resume. The value saved in the image header will then be ignored.
4. Why not just use the version already in the kernel?
The version in the vanilla kernel has a number of drawbacks. The most serious of these are:
- it has a maximum image size of 1/2 total memory.
- it doesn't allocate storage until after it has snapshotted memory. This means that you can't be sure hibernating will work until you see it start to write the image.
- it performs all of its I/O synchronously.
- it does not allow you to press escape to cancel a cycle.
- it does not allow you to automatically swapon a file when starting a cycle.
- it does not allow you to use multiple swap partitions.
- it does not allow you to use swapfiles.
- it does not allow you to use ordinary files.
- it just invalidates an image and continues to boot if you accidentally boot the wrong kernel after hibernating.
- it doesn't support any sort of nice display while hibernating.
- it is moving toward requiring that you have an initrd/initramfs to ever have a hope of resuming (uswsusp). While uswsusp will address some of the concerns above, it won't address all of them, and will be more complicated to get set up.
5. How do you use it?
A hibernation cycle can be started directly by doing:
echo > /sys/power/tuxonice/do_hibernate
In practice, though, you'll probably want to use the hibernate script to unload modules, configure the kernel the way you like it and so on. In that case, you'd run the hibernate script as root.
See the hibernate script's man page for more details on the options it takes.
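If you do start a cycle directly rather than through the hibernate script, it can help to check first that the running kernel actually provides the entry and that you are allowed to write to it. The following is a minimal sketch of such a guard. The TOI_DIR variable and the function name are inventions for this example (TOI_DIR exists only so the function can be tried against a scratch directory); only the do_hibernate entry itself comes from TuxOnIce.

```shell
#!/bin/sh
# Base directory of the TuxOnIce sysfs entries. Overridable so the
# function can be exercised without touching the real sysfs tree.
TOI_DIR=${TOI_DIR:-/sys/power/tuxonice}

# Start a hibernation cycle, but only if do_hibernate is present and
# writable (it is writable only by root on a TuxOnIce kernel).
toi_hibernate() {
    if [ ! -w "$TOI_DIR/do_hibernate" ]; then
        echo "cannot write $TOI_DIR/do_hibernate" >&2
        return 1
    fi
    # Writing anything to the entry starts the cycle.
    echo > "$TOI_DIR/do_hibernate"
}
```

This does none of the module unloading or console switching that the hibernate script performs, so it is no substitute for the script on hardware that needs those steps.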
If you're using the text or splash user interface modules, one feature of TuxOnIce that you might find useful is that you can press Escape at any time during hibernating, and the process will be aborted.
Due to the way hibernation works, this means you'll have your system back and perfectly usable almost instantly. The only exception is when it's at the very end of writing the image. Then it will need to reload a small (usually 4-50MB, depending upon the image characteristics) portion first.
Likewise, when resuming, you can press escape and resuming will be aborted. The computer will then power down again according to settings at that time for the powerdown method or rebooting.
You can change the settings for powering down while the image is being written by pressing 'R' to toggle rebooting and 'O' to toggle between suspending to ram and powering down completely.
If you run into problems with resuming, adding the "noresume" option to the kernel command line will let you skip the resume step and recover your system. This option shouldn't normally be needed, because TuxOnIce modifies the image header prior to the atomic restore, and will thus prompt you if it detects that you've tried to resume an image before (this flag is removed if you press Escape to cancel a resume, so you won't be prompted then).
Recent kernels (2.6.24 onwards) add support for resuming from a different kernel to the one that was hibernated (thanks to Rafael for his work on this - I've just embraced and enhanced the support for TuxOnIce). This should further reduce the need for you to use the noresume option.
6. What do all those entries in /sys/power/tuxonice do?
/sys/power/tuxonice is the directory which contains files you can use to tune and configure TuxOnIce to your liking. The exact contents of the directory will depend upon the version of TuxOnIce you're running and the options you selected at compile time. In the following descriptions, names in brackets refer to compile time options. (Note that they're all dependent upon you having selected CONFIG_TUXONICE in the first place!).
Since the values of these settings can open potential security risks, the writeable ones are accessible only to the root user. You may want to configure sudo to allow you to invoke your hibernate script as an ordinary user.
Use cryptoapi hashing routines to verify that Pageset2 pages don't change while we're saving the first part of the image, and to get any pages that do change resaved in the atomic copy. This should normally not be needed, but if you're seeing issues, please enable this. If your issues stop you being able to resume, enable this option, hibernate and cancel the cycle after the atomic copy is done. If the debugging info shows a non-zero number of pages resaved, please report this to Nigel.
Set the cryptoapi algorithm used for compressing the image.
These values allow you to set an expected compression ratio, which TuxOnIce will use in calculating whether it meets constraints on the image size. If this expected compression ratio is not attained, the hibernation cycle will abort, so it is wise to allow some margin. You can see what compression ratio is achieved in the logs after hibernating.
This file returns information about your configuration that may be helpful in diagnosing problems with hibernating.
When anything is written to this file, the kernel side of TuxOnIce will begin to attempt to write an image to disk and power down. You'll normally want to run the hibernate script instead, to get modules unloaded first.
When anything is written to this file TuxOnIce will attempt to read and restore an image. If there is no image, it will return almost immediately. If an image exists, the echo > will never return. Instead, the original kernel context will be restored and the original echo > do_hibernate will return.
These options can be used to temporarily disable various parts of TuxOnIce.
When TuxOnIce does its atomic copy, it calls the driver model suspend and resume methods. If you have DRI enabled with a driver such as fglrx, this can result in the driver allocating a substantial amount of memory for storing its state. Extra_pages_allowance tells TuxOnIce how much extra memory it should ensure is available for those allocations. If your attempts at hibernating end with a message in dmesg indicating that insufficient extra pages were allowed, you need to increase this value.
Read this value to get the current setting. Write to it to point TuxOnIce at a new storage location for the file allocator. See section 3.b.ii above for details of how to set up the file allocator.
This entry can be used to get TuxOnIce to just test the freezer and prepare an image without actually doing a hibernation cycle. It is useful for diagnosing freezing and image preparation issues.
Can be used in a script to determine whether a valid image exists at the location currently pointed to by resume=. Returns up to three lines. The first is whether an image exists (-1 for unsure, otherwise 0 or 1). If an image exists, additional lines will return the machine and version. Echoing anything to this entry removes any current image.
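A script might interpret the first line of this entry roughly as follows. This is a sketch only: the function name and the wording of the messages are made up, and the path of the entry is passed in as an argument rather than assumed.

```shell
#!/bin/sh
# Interpret the first line of the image-existence entry.
# Usage: image_status PATH_TO_ENTRY
# Prints a short description; returns 2 if the entry can't be read
# or holds an unexpected value. (Hypothetical helper.)
image_status() {
    entry=$1
    # Only the first line carries the flag; any further lines hold the
    # machine and version of the image.
    flag=$(head -n 1 "$entry" 2>/dev/null) || return 2
    case $flag in
        1)  echo "image present" ;;
        0)  echo "no image" ;;
        -1) echo "unsure" ;;
        *)  echo "unexpected value: $flag"; return 2 ;;
    esac
}
```

Remember that reading the entry is harmless, but writing anything to it removes the current image.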
The maximum size of hibernation image written to disk, measured in megabytes (1024*1024).
The value returned by this file can be used by scripts and configuration tools to determine what entries should be looked for. The value is incremented whenever an entry in /sys/power/tuxonice is obsoleted or added.
The result of the last hibernation cycle, as defined in include/linux/suspend-debug.h with the values SUSPEND_ABORTED to SUSPEND_KEPT_IMAGE. This is a bitmask.
Setting this option results in all messages printed being logged. Normally, only a subset are logged, so as to not slow the process and not clutter the logs. Useful for debugging. It can be toggled during a cycle by pressing 'L'.
This option is used during debugging, to make TuxOnIce pause between each step of the process. It is ignored when the nice display is on.
Used to select a method by which TuxOnIce should powerdown after writing the image. Currently:
- 0: Don't use ACPI to power off.
- 3: Attempt to enter Suspend-to-ram.
- 4: Attempt to enter ACPI S4 mode.
- 5: Attempt to power down via ACPI S5 mode.
Note that these options are highly dependent upon your hardware & software:
- 3: When successful, your machine suspends to ram instead of powering off. The advantage of using this mode is that it doesn't matter whether your battery has enough charge to make it through to your next resume. If it lasts, you will simply resume from suspend to ram (and the image on disk will be discarded). If the battery runs out, you will resume from disk instead. The disadvantage is that it takes longer than a normal suspend-to-ram to enter the state, since the suspend-to-disk image needs to be written first.
- 4/5: When successful, your machine will be off and consume (almost) no power. But it might still react to some external events like opening the lid or traffic on a network or usb device. For the bios, resume is then the same as warm boot, similar to a situation where you used the command `reboot' to reboot your machine. If your machine has problems on warm boot or if you want to protect your machine with the bios password, this is probably not the right choice. Mode 4 may be necessary on some machines where ACPI wake up methods need to be run to properly reinitialise hardware after a hibernation cycle.
- 0: Switch the machine completely off. The only possible wakeup is the power button. For the bios, resume is then the same as a cold boot, in particular you would have to provide your bios boot password if your machine uses that feature for booting.
This option can be used to limit the granularity of the progress bar displayed with a bootsplash screen. The value is the maximum number of steps. That is, 10 will make the progress bar jump in 10% increments.
This option causes TuxOnIce to reboot rather than powering down at the end of saving an image. It can be toggled during a cycle by pressing 'R'.
This entry can be read after resuming to see the commandline that was used when resuming began. You might use this to set up two bootloader entries that are the same apart from the fact that one includes an extra append= argument "at_work=1". You could then grep resume_commandline in your post-resume scripts and configure networking (for example) differently depending upon whether you're at home or work. resume_commandline can be set to arbitrary text if you wish to remove sensitive contents.
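The at_work=1 idea can be sketched as a tiny test function for a post-resume script. The function name is hypothetical; the file to search is passed as an argument, so nothing here depends on where the entry lives. The -w flag makes grep match at_work=1 as a whole word, so at_work=10 or not_at_work=1 would not count.

```shell
#!/bin/sh
# Succeed if the given command line file contains the at_work=1 argument.
# Usage: resumed_at_work PATH_TO_COMMANDLINE_FILE
# (Hypothetical helper, for illustration only.)
resumed_at_work() {
    grep -qw 'at_work=1' "$1"
}
```

Assuming the entry is found at /sys/power/tuxonice/resume_commandline, a script could then branch with `if resumed_at_work /sys/power/tuxonice/resume_commandline; then ...; fi` and apply its work network settings inside the branch.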
This entry is used to specify the swapfile or partition that TuxOnIce will attempt to swapon/swapoff automatically. Thus, if I normally use /dev/hda1 for swap, and want to use /dev/hda2 specifically for my hibernation image, I would
echo /dev/hda2 > /sys/power/tuxonice/swap/swapfile
/dev/hda2 would then be automatically swapon'd and swapoff'd. Note that the swapon and swapoff occur while other processes are frozen (including kswapd) so this swap file will not be used up when attempting to free memory. The partition/file is also given the highest priority, so other swapfiles/partitions will only be used to save the image when this one is filled.
The value of this file is used by headerlocations along with any currently activated swapfiles/partitions.
This option tells you the resume= options to use for swap devices you currently have activated. It is particularly useful when you only want to use a swap file to store your image. See above for further details.
This entry can be used to toggle the NOFREEZE flag on a process, to allow it to run during hibernating. It should be used with extreme caution. There are strict limitations on what a process running during hibernation can do. This is really only intended for use by TuxOnIce's helpers (userui in particular).
This entry is used to tell TuxOnIce what userspace program to use for providing a user interface while hibernating. The program uses a netlink socket to pass messages back and forth to the kernel, allowing all of the functions formerly implemented in the kernel user interface components.
This value, together with the console log level, controls what debugging information is displayed. The console log level determines the level of detail, and this value determines what detail is displayed. This value is a bit vector, and the meaning of the bits can be found in the kernel tree in include/linux/tuxonice.h. It can be overridden using the kernel's command line option suspend_dbg.
This determines the value of the console log level at the start of a hibernation cycle. If debugging is compiled in, the console log level can be changed during a cycle by pressing the digit keys. Meanings are:
- 0: Nice display.
- 1: Nice display plus numerical progress.
- 2: Errors only.
- 3: Low level debugging info.
- 4: Medium level debugging info.
- 5: High level debugging info.
- 6: Verbose debugging info.
Setting this to "1" will enable you to abort a hibernation cycle or a resume by pressing escape; "0" (default) disables this feature. Note that enabling this option means that you cannot initiate a hibernation cycle and then walk away from your computer, expecting it to be secure. With this feature disabled, you can validly have this expectation once TuxOnIce begins to write the image to disk. (Prior to this point, it is possible that TuxOnIce might abort because of failure to freeze all processes or because constraints on its ability to save the image are not met).
The version of TuxOnIce you have compiled into the currently running kernel.
7. How do you get support?
Glad you asked. TuxOnIce is being actively maintained and supported by Nigel (the guy doing most of the kernel coding at the moment), Bernard (who maintains the hibernate script and userspace user interface components) and its users.
Resources available include HowTos, FAQs and a Wiki, all available via tuxonice.net. You can find the mailing lists there.
8. I think I've found a bug. What should I do?
By far and away, the most common problems people have with TuxOnIce relate to drivers not having adequate power management support. In this case, it is not a bug with TuxOnIce, but we can still help you. As we mentioned above, such issues can usually be worked around by building the functionality as modules and unloading them while hibernating. Please visit the Wiki for up-to-date lists of known issues and work arounds.
If this information doesn't help, try running:
..and sending the output to the users mailing list.
Good information on how to provide us with useful information from an oops is found in the file REPORTING-BUGS, in the top level directory of the kernel tree. If you get an oops, please especially note the information about running what is printed on the screen through ksymoops. The raw information is useless.
9. When will XXX be supported?
If there's a feature missing from TuxOnIce that you'd like, feel free to ask. We try to be obliging, within reason.
Patches are welcome. Please send to the list.
10. How does it work?
TuxOnIce does its work in a number of steps.
a. Freezing system activity.
The first main stage in hibernating is to stop all other activity. This is achieved in stages. Processes are considered in four groups, which we will describe in reverse order for clarity's sake: Threads with the PF_NOFREEZE flag, kernel threads without this flag, userspace processes with the PF_SYNCTHREAD flag and all other processes. The first set (PF_NOFREEZE) are untouched by the refrigerator code. They are allowed to run during hibernating and resuming, and are used to support user interaction, storage access or the like. Other kernel threads (those unneeded while hibernating) are frozen last. This leaves us with userspace processes that need to be frozen. When a process enters one of the *_sync system calls, we set a PF_SYNCTHREAD flag on that process for the duration of that call. Processes that have this flag are frozen after processes without it, so that we can seek to ensure that dirty data is synced to disk as quickly as possible in a situation where other processes may be submitting writes at the same time. Freezing the processes that are submitting data stops new I/O from being submitted. Syncthreads can then cleanly finish their work. So the order is:
- Userspace processes without PF_SYNCTHREAD or PF_NOFREEZE;
- Userspace processes with PF_SYNCTHREAD (they won't have NOFREEZE);
- Kernel processes without PF_NOFREEZE.
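The ordering above can be restated as a small decision function. This is purely an illustration of the rules as described in this section, written in shell for consistency with the rest of the document; it is not how the kernel's freezer is implemented, and the function name and argument convention are invented for the example.

```shell
#!/bin/sh
# Report the stage at which a task would be frozen under the rules above.
# Usage: freeze_stage KIND NOFREEZE SYNCTHREAD
#   KIND is "user" or "kernel"; the two flags are 0 or 1.
# Prints 1, 2 or 3, or "never" for PF_NOFREEZE tasks.
freeze_stage() {
    kind=$1
    nofreeze=$2
    syncthread=$3
    if [ "$nofreeze" -eq 1 ]; then
        echo "never"    # untouched by the refrigerator code
    elif [ "$kind" = "kernel" ]; then
        echo 3          # remaining kernel threads are frozen last
    elif [ "$syncthread" -eq 1 ]; then
        echo 2          # let in-progress syncs finish their writes
    else
        echo 1          # ordinary userspace is frozen first
    fi
}
```

For instance, `freeze_stage user 0 1` prints 2, reflecting that a process inside a sync call is frozen only after the processes that might still be submitting writes.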
b. Eating memory.
For a successful hibernation cycle, you need to have enough disk space to store the image and enough memory for the various limitations of TuxOnIce's algorithm. You can also specify a maximum image size. In order to meet those constraints, TuxOnIce may 'eat' memory. If, after freezing processes, the constraints aren't met, TuxOnIce will thaw all the other processes and begin to eat memory until its calculations indicate the constraints are met. It will then freeze processes again and recheck its calculations.
c. Allocation of storage.
Next, TuxOnIce allocates the storage that will be used to save the image.
The core of TuxOnIce knows nothing about how or where pages are stored. We therefore request the active allocator (remember you might have compiled in more than one!) to allocate enough storage for our expected image size. If this request cannot be fulfilled, we eat more memory and try again. If it is fulfilled, we seek to allocate additional storage, just in case our expected compression ratio (if any) isn't achieved. This time, however, we just continue if we can't allocate enough storage.
If these calls to our allocator change the characteristics of the image such that we haven't allocated enough memory, we also loop. (The allocator may well need to allocate space for its storage information).
d. Write the first part of the image.
TuxOnIce stores the image in two sets of pages called 'pagesets'. Pageset 2 contains pages on the active and inactive lists; essentially the page cache. Pageset 1 contains all other pages, including the kernel. We use two pagesets for one important reason: We need to make an atomic copy of the kernel to ensure consistency of the image. Without a second pageset, that would limit us to an image that was at most half the amount of memory available. Using two pagesets allows us to store a full image. Since pageset 2 pages won't be needed in saving pageset 1, we first save pageset 2 pages. We can then make our atomic copy of the remaining pages using both pageset 2 pages and any other pages that are free. While saving both pagesets, we are careful not to corrupt the image. Among other things, we use lowlevel block I/O routines that don't change the pagecache contents.
The next step, then, is writing pageset 2.
e. Suspending drivers and storing processor context.
Having written pageset2, TuxOnIce calls the power management functions to notify drivers of the hibernation, and saves the processor state in preparation for the atomic copy of memory we are about to make.
f. Atomic copy.
At this stage, everything else but the TuxOnIce code is halted. Processes are frozen or idling, drivers are quiesced and have stored (ideally and where necessary) their configuration in memory we are about to atomically copy. In our lowlevel architecture specific code, we have saved the CPU state. We can therefore now do our atomic copy before resuming drivers etc.
g. Save the atomic copy (pageset 1).
TuxOnIce can then write the atomic copy of the remaining pages. Since we have copied the pages into other locations, we can continue to use the normal block I/O routines without fear of corrupting our image.
h. Save the image header.
Nearly there! We save our settings and other parameters needed for reloading pageset 1 in an 'image header'. We also tell our allocator to serialise its data at this stage, so that it can reread the image at resume time.
i. Set the image header.
Finally, we edit the header at our resume= location. The signature is changed by the allocator to reflect the fact that an image exists, and to point to the start of that data if necessary (swap allocator).
j. Power down.
Or reboot if we're debugging and the appropriate option is selected.
k. Reloading the image.
Reloading the image is essentially the reverse of all the above. We load our copy of pageset 1, being careful to choose locations that aren't going to be overwritten as we copy it back (We start very early in the boot process, so there are no other processes to quiesce here). We then copy pageset 1 b