Windows x64 - All the Same Yet Very Different, Part 2: Kernel Memory, /3GB, PTEs, (Non-) Paged Pool
This is the second part of a mini-series on Windows x64, focusing on behind-the-scenes changes in the operating system. In the first article I explained key concepts of the x86 platform, namely that user-mode processes each get 4 GB of virtual address space but can use only half of it, because 2 GB are reserved for the operating system kernel.

The Kernel – Another Beast Altogether

In order to understand why, we need to take a look at how kernel mode differs from user mode. Kernel mode basically is god mode. In kernel mode, or ring 0, you own the machine and can do everything. Like crashing the machine for good. Such crashes in ring 0 differ from "ordinary" program failures in that they cause the dreaded blue screen of death (BSOD) on Windows computers. Blue screens, by the way, are not random stops but rather more or less graceful shutdowns during which the system in most cases is still capable of accessing the hard disk and storing a memory dump file there.

Some trivia on that: You may have wondered why the page file needs to be at least as large as the physical RAM for the system to be able to create a full memory dump. This is because the file system driver code in memory may be corrupted, or the driver may even be the cause of the crash. With the page file, the system has a location on the hard disk that is guaranteed to be available. When a memory dump is created during a blue screen, the contents of RAM are written directly to the disk sectors occupied by the page file, bypassing the file system driver. During the next boot, the session manager subsystem (SMSS.EXE) extracts the dump (and copies it to %SYSTEMROOT%\MEMORY.DMP) before the page file is put to its proper use.

Causes of Death

To digress some more: Have you ever wondered what actually causes blue screens? Well, hardware failures usually don't. If you have a severe hardware problem, the system will typically just freeze or reboot out of the blue. No, blue screens are mostly caused by software failures.
To put it bluntly: some programmer's carelessness is responsible for your overtime! Blue screens are caused by "misbehaving" kernel code that tries to do things it shouldn't. Read: accessing memory in a way the current IRQL does not allow (IRQL_NOT_LESS_OR_EQUAL). Or: not checking a buffer size correctly and writing to a memory area occupied by another driver's data or even code, which is similar to driving a tank through densely populated terrain. Only the highest development standards and rigorous testing can harden device drivers enough to withstand even unlikely conditions or heavy load. That is the simple reason why most blue screens are not caused by Microsoft code but by some third-party driver.

Cause Study

What to do in case of a blue screen? You will want to know which driver caused it. In many cases, determining the culprit is amazingly simple. In others, getting to the root cause (and driver) can be extremely hard. If WinDbg does not point you in the right direction, a support call with Microsoft will probably help. However, they will need a memory dump file to analyze the crash. You can configure several types of memory dumps to be created in case of a blue screen. Mini dumps are practically useless. Full memory dumps are too large for most purposes (try uploading an 8 GB dump file to Microsoft support - not that they mind; it just takes far too long). Kernel dumps are the way to go. This knowledge base article explains the configuration options.

The Tale of the Rings

Let's now move back to the saga of the rings. An x86 processor has four protection rings: ring 0 to ring 3, with ring 0 being the most and ring 3 the least privileged. As mentioned above, the kernel operates in ring 0, which gives it full access to the hardware. Windows does not use rings 1 and 2. User-mode applications execute in ring 3, each cosily encapsulated in the safety "bubble" of virtual memory.
Interestingly, with the recent addition of hardware virtualization technology to the CPU, the x86 architecture now offers a new "ring -1" that hypervisors can use to control ring 0 hardware access.

Why 2 GB Are Not Enough...

With all this background information passed on, we can move on to the inherent limitations built into the x86 architecture. As mentioned earlier, the kernel has only 2 GB of global memory available for itself, and that memory remains the same no matter which user-mode process is currently active. In those 2 GB it needs to keep track of every process and thread started, of every file opened, of every network connection, even of every single registry handle opened by applications. While one handle to a registry key or a file does not amount to much, hundreds of thousands of them can occupy a lot of memory. The more applications are running, the more handles and various other memory structures are needed. For that reason terminal servers, which typically have 50-60 active users, often run out of available kernel memory before other system resources are depleted.

The kernel's memory consists of several areas, most notably "paged pool" (memory pages that can be swapped out), "non-paged pool" (pages that must stay resident in RAM) and "System PTEs" (page table entries). Each of these areas has a fixed maximum size that is calculated during system startup. Determining both the current and maximum values for a given system is easy once you know how to do it:

...And How to Check
- Install the Debugging Tools for Windows
- Download and unzip LiveKd from Sysinternals
- If you installed the Debugging Tools into a different folder than \Program Files\Microsoft\Debugging Tools for Windows then you need to copy LiveKd into the installation directory of the Debugging Tools
- Run LiveKd.exe and enter a path to an existing empty directory to be used as symbol path.
- At the debugger prompt, enter !vm. The following fields of its output are the interesting ones:
- Free System PTEs tells you how many PTEs are still available.
- NonPagedPool Usage tells you how much non-paged pool is in use.
- NonPagedPool Max tells you the total size of non-paged pool memory.
- PagedPool Usage tells you how much paged pool is in use.
- PagedPool Maximum tells you the total size of paged pool memory.
- Is that even necessary?
- At what cost?
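Coming back to the pool figures reported by the debugger: to see how close a system is to its kernel memory limits, it helps to turn the usage and maximum values into percentages. The following is a minimal sketch of that arithmetic in Python, not an official tool. It assumes the debugger output has been copied into a string and that each line has the form "Name: pages ( size Kb)". The sample text and all numbers in it are invented for illustration, and the function names are hypothetical.

```python
import re

# Hypothetical excerpt in the style of the debugger's memory summary.
# All numbers are invented for illustration, not taken from a real system.
SAMPLE_VM_OUTPUT = """\
Free System PTEs:      30000 (   120000 Kb)
NonPagedPool Usage:    16384 (    65536 Kb)
NonPagedPool Max:      65536 (   262144 Kb)
PagedPool Usage:       40000 (   160000 Kb)
PagedPool Maximum:     80000 (   320000 Kb)
"""

# Assumed line format: "<name>: <pages> ( <kilobytes> Kb)".
LINE_RE = re.compile(
    r"^\s*(?P<name>[^:]+):\s+(?P<pages>\d+)\s+\(\s*(?P<kb>\d+)\s*Kb\)",
    re.IGNORECASE,
)


def parse_vm(text):
    """Return a dict mapping each recognized field name to its size in KB."""
    fields = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            fields[match.group("name").strip()] = int(match.group("kb"))
    return fields


def pool_usage_percent(fields, usage_key, max_key):
    """Return how much of a pool is in use, as a percentage of its maximum."""
    return 100.0 * fields[usage_key] / fields[max_key]


if __name__ == "__main__":
    fields = parse_vm(SAMPLE_VM_OUTPUT)
    nonpaged = pool_usage_percent(fields, "NonPagedPool Usage", "NonPagedPool Max")
    paged = pool_usage_percent(fields, "PagedPool Usage", "PagedPool Maximum")
    print(f"Free System PTEs: {fields['Free System PTEs']} KB")
    print(f"Non-paged pool in use: {nonpaged:.1f}%")
    print(f"Paged pool in use: {paged:.1f}%")
```

With the invented sample values, the non-paged pool comes out at 25.0% and the paged pool at 50.0% of their respective maximums. On a real system, paste in the actual debugger output; lines that do not fit the assumed format are simply skipped.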