Академический Документы
Профессиональный Документы
Культура Документы
This is the first blog post in a series I'll write over the coming months called Pushing the Limits of Windows that describes how Windows and applications use a particular resource, the licensing and implementation-derived limits of the resource, how to measure the resources usage, and how to diagnose leaks. To be able to manage your Windows systems effectively you need to understand how Windows manages physical resources, such as CPUs and memory, as well as logical resources, such as virtual memory, handles, and window manager objects. Knowing the limits of those resources and how to track their usage enables you to attribute resource usage to the applications that consume them, effectively size a system for a particular workload, and identify applications that leak resources. Heres the index of the entire Pushing the Limits series. While they can stand on their own, they assume that you read them in order. Pushing the Limits of Windows: Physical Memory Pushing the Limits of Windows: Virtual Memory Pushing the Limits of Windows: Paged and Nonpaged Pool Pushing the Limits of Windows: Processes and Threads Pushing the Limits of Windows: Handles Pushing the Limits of Windows: USER and GDI Objects Part 1 Pushing the Limits of Windows: USER and GDI Objects Part 2
Physical Memory
One of the most fundamental resources on a computer is physical memory. Windows' memory manager is responsible with populating memory with the code and data of active processes, device drivers, and the operating system itself. Because most systems access more code and data than can fit in physical memory as they run, physical memory is in essence a window into the code and data used over time. The amount of memory can therefore affect performance, because when data or code a process or the operating system needs is not present, the memory manager must bring it in from disk. Besides affecting performance, the amount of physical memory impacts other resource limits. For example, the amount of non-paged pool, operating system buffers backed by physical memory, is obviously constrained by physical memory. Physical memory also contributes to the system virtual memory limit, which is the sum of roughly the size of physical memory plus the maximum configured size of any paging files. Physical memory also can indirectly limit the maximum number of processes, which I'll talk about in a future post on process and thread limits.
You can see physical memory support licensing differentiation across the server SKUs for all versions of Windows. For example, the 32-bit version of Windows Server 2008 Standard supports only 4GB, while the 32-bit Windows Server 2008 Datacenter supports 64GB. Likewise, the 64-bit Windows Server 2008 Standard supports 32GB and the 64-bit Windows Server 2008 Datacenter can handle a whopping 2TB. There aren't many 2TB systems out there, but the Windows Server Performance Team knows of a couple, including one they had in their lab at one point. Here's a screenshot of Task Manager running on that system:
The maximum 32-bit limit of 128GB, supported by Windows Server 2003 Datacenter Edition, comes from the fact that structures the Memory Manager uses to track physical memory would consume too much of the system's virtual address space on larger systems. The Memory Manager keeps track of each page of memory in an array called the PFN database and, for performance, it maps the entire PFN database into virtual memory. Because it represents each page of memory with a 28-byte data structure, the PFN database on a 128GB system requires about 980MB. 32-bit Windows has a 4GB virtual address space defined by hardware that it splits by default between the currently executing user-mode process (e.g. Notepad) and the system. 980MB therefore consumes almost half the available 2GB of system virtual address space, leaving only 1GB for mapping the kernel, device drivers, system cache and other system data structures, making that a reasonable cut off:
That's also why the memory limits table lists lower limits for the same SKU's when booted with 4GB tuning (called 4GT and enabled with the Boot.ini's /3GB or /USERVA, and Bcdedit's /Set IncreaseUserVa boot options), because 4GT moves the split to give 3GB to user mode and leave only 1GB for the system. For improved performance, Windows Server 2008 reserves more for system address space by lowering its maximum 32-bit physical memory support to 64GB. The Memory Manager could accommodate more memory by mapping pieces of the PFN database into the system address as needed, but that would add complexity and possibly reduce performance with the added overhead of map and unmap operations. It's only recently that systems have become large enough for that to be considered, but because the system address space is not a constraint for mapping the entire PFN database on 64-bit Windows, support for more memory is left to 64-bit Windows. The maximum 2TB limit of 64-bit Windows Server 2008 Datacenter doesn't come from any implementation or hardware limitation, but Microsoft will only support configurations they can test. As of the release of Windows Server 2008, the largest system available anywhere was 2TB and so Windows caps its use of physical memory there.
generally surfaced these problems. The problematic client driver ecosystem led to the decision for client SKUs to ignore physical memory that resides above 4GB, even though they can theoretically address it.
The result is that, if you have a system with 3GB or more of memory and you are running a 32-bit Windows client, you may not be getting the benefit of all of the RAM. On Windows 2000, Windows XP and Windows Vista RTM, you can see how much RAM Windows has accessible to it in the System Properties dialog, Task Manager's Performance page, and, on Windows XP and Windows Vista (including SP1), in the Msinfo32 and Winver utilities. On Window Vista SP1, some of these locations changed to show installed RAM, rather than available RAM, as documented in this Knowledge Base article. On my 4GB laptop, when booted with 32-bit Vista, the amount of physical memory available is 3.5GB, as seen in the Msinfo32 utility:
You can see physical memory layout with the Meminfo tool by Alex Ionescu (who's contributing to the 5th Edition of the Windows Internals that I'm coauthoring with David Solomon). Here's the output of Meminfo when I run it on that system with the -r switch to dump physical memory ranges:
Note the gap in the memory address range from page 9F0000 to page 100000, and another gap from DFE6D000 to FFFFFFFF (4GB). However, when I boot that system with 64-bit Vista, all 4GB show up as available and you can see how Windows uses the remaining 500MB of RAM that are above the 4GB boundary:
What's occupying the holes below 4GB? The Device Manager can answer that question. To check, launch "devmgmt.msc", select Resources by Connection in the View Menu, and expand the Memory node. On my laptop, the primary consumer of mapped device memory is, unsurprisingly, the video card, which consumes 256MB in the range E0000000-EFFFFFFF:
Other miscellaneous devices account for most of the rest, and the PCI bus reserves additional ranges for devices as part of the conservative estimation the firmware uses during boot. The consumption of memory addresses below 4GB can be drastic on high-end gaming systems with large video cards. For example, I purchased one from a boutique gaming rig company that came with 4GB of RAM and two 1GB video cards. I hadn't specified the OS version and assumed that they'd put 64-bit Vista on it, but it came with the 32-bit version and as a result only 2.2GB of the memory was accessible by Windows. You can see a giant memory hole from 8FEF0000 to FFFFFFFF in this Meminfo output from the system after I installed 64-bit Windows:
Device Manager reveals that 512MB of the over 2GB hole is for the video cards (256MB each), and it looks like the firmware has reserved more for either dynamic mappings or because it was conservative in its estimate:
Even systems with as little as 2GB can be prevented from having all their memory usable under 32-bit Windows because of chipsets that aggressively reserve memory regions for devices. Our shared family computer, which we purchased only a few months ago from a major OEM, reports that only 1.97GB of the 2GB installed is available:
The physical address range from 7E700000 to FFFFFFFF is reserved by the PCI bus and devices, which leaves a theoretical maximum of 7E700000 bytes (1.976GB) of physical address space, but even some of that is reserved for device memory, which explains why Windows reports 1.97GB.
Because device vendors now have to submit both 32-bit and 64-bit drivers to Microsoft's Windows Hardware Quality Laboratories (WHQL) to obtain a driver signing certificate, the majority of device drivers today can probably handle physical addresses above the 4GB line. However, 32-bit Windows will continue to ignore memory above it because there is still some difficult to measure risk, and OEMs are (or at least should be) moving to 64-bit Windows where it's not an issue. The bottom line is that you can fully utilize your system's memory (up the SKU's limit) with 64-bit Windows, regardless of the amount, and if you are purchasing a high end gaming system you should definitely ask the OEM to put 64-bit Windows on it at the factory.
On all versions of Windows you can graph available memory using the Performance Monitor by adding the Available Bytes counter in the Memory performance counter group:
You can see the instantaneous value in Process Explorer's System Information dialog, or, on versions of Windows prior to Vista, on Task Manager's Performance page.
Applications might use Heap APIs, the .NET garbage collector, or the C runtime malloc library to allocate virtual memory, but under the hood all of these rely on the VirtualAlloc API. When an application runs out of address space then VirtualAlloc, and therefore the memory managers layered on top of it, return errors (represented by a NULL address). The Testlimit utility, which I wrote for the 4th Edition of Windows Internals to demonstrate various Windows limits, calls VirtualAlloc repeatedly until it gets an error when you specify the r switch. Thus, when you run the 32-bit version of Testlimit on 32bit Windows, it will consume the entire 2GB of its address space:
2010 MB isnt quite 2GB, but Testlimits other code and data, including its executable and system DLLs, account for the difference. You can see the total amount of address space its consumed by looking at its Virtual Size in Process Explorer:
Some applications, like SQL Server and Active Directory, manage large data structures and perform better the more that they can load into their address space at the same time. Windows NT 4 SP3 therefore introduced a boot option, /3GB, that gives a process 3GB of its 4GB address space by reducing the size of the system address space to 1GB, and Windows XP and Windows Server 2003 introduced the /userva option that moves the split anywhere between 2GB and 3GB:
To take advantage of the address space above the 2GB line, however, a process must have the large address space aware flag set in its executable image. Access to the additional virtual memory is opt-in because some applications have assumed that theyd be given at most 2GB of the address space. Since the high bit of a pointer referencing an address below 2GB is always zero, they would use the high bit in their pointers as a flag for their own data, clearing it of course before referencing the data. If they ran with a 3GB address space they would inadvertently truncate pointers that have values greater than 2GB, causing program errors including possible data corruption. All Microsoft server products and data intensive executables in Windows are marked with the large address space awareness flag, including Chkdsk.exe, Lsass.exe (which hosts Active Directory services on a domain controller), Smss.exe (the session manager), and Esentutl.exe (the Active Directory Jet database repair tool). You can see whether an image has the flag with the Dumpbin utility, which comes with Visual Studio:
Testlimit is also marked large-address aware, so if you run it with the r switch when booted with the 3GB of user address space, youll see something like this:
Because the address space on 64-bit Windows is much larger than 4GB, something Ill describe shortly, Windows can give 32-bit processes the maximum 4GB that they can address and use the rest for the operating systems virtual memory. If you run Testlimit on 64-bit Windows, youll see it consume the entire 32-bit addressable address space:
64-bit processes use 64-bit pointers, so their theoretical maximum address space is 16 exabytes (2^64). However, Windows doesnt divide the address space evenly between the active process and the system, but instead defines a region in the address space for the process and others for various system memory resources, like system page table entries (PTEs), the file cache, and paged and non-paged pools. The size of the process address space is different on IA64 and x64 versions of Windows where the sizes were chosen by balancing what applications need against the memory costs of the overhead (page table pages and translation lookaside buffer - TLB - entries) needed to support the address space. On x64, thats 8192GB (8TB) and on IA64 its 7168GB (7TB - the 1TB difference from x64 comes from the fact that the top level page directory on IA64 reserves slots for Wow64 mappings). On both IA64 and x64 versions
of Windows, the size of the various resource address space regions is 128GB (e.g. non-paged pool is assigned 128GB of the address space), with the exception of the file cache, which is assigned 1TB. The address space of a 64-bit process therefore looks something like this:
The figure isnt drawn to scale, because even 8TB, much less 128GB, would be a small sliver. Suffice it to say that like our universe, theres a lot of emptiness in the address space of a 64-bit process. When you run the 64-bit version of Testlimit (Testlimit64) on 64-bit Windows with the r switch, youll see it consume 8TB, which is the size of the part of the address space it can manage:
Committed Memory
Testlimits r switch has it reserve virtual memory, but not actually commit it. Reserved virtual memory cant actually store data or code, but applications sometimes use a reservation to create a large block of virtual memory and then commit it as needed to ensure that the committed memory is contiguous in the address space. When a process commits a region of virtual memory, the operating system guarantees that it can maintain all the data the process stores in the memory either in physical memory or on disk. That means that a process can run up against another limit: the commit limit. As youd expect from the description of the commit guarantee, the commit limit is the sum of physical memory and the sizes of the paging files. In reality, not quite all of physical memory counts toward the commit limit since the operating system reserves part of physical memory for its own use. The amount of committed virtual memory for all the active processes, called the current commit charge, cannot exceed the system commit limit. When the commit limit is reached, virtual allocations that commit memory fail. That means that even a standard 32-bit process may get virtual memory allocation failures before it hits the 2GB address space limit. The current commit charge and commit limit is tracked by Process Explorer in its System Information window in the Commit Charge section and in the Commit History bar chart and graph:
Task Manager prior to Vista and Windows Server 2008 shows the current commit charge and limit similarly, but calls the current commit charge "PF Usage" in its graph:
On Vista and Server 2008, Task Manager doesn't show the commit charge graph and labels the current commit charge and limit values with "Page File" (despite the fact that they will be non-zero values even if you have no paging file):
You can stress the commit limit by running Testlimit with the -m switch, which directs it to allocate committed memory. The 32-bit version of Testlimit may or may not hit its address space limit before hitting the commit limit, depending on the size of physical memory, the size of the paging files and the current commit charge when you run it. If you're running 32-bit Windows and want to see how the system behaves when you hit the commit limit, simply run multiple instances of Testlimit until one hits the commit limit before exhausting its address space. Note that, by default, the paging file is configured to grow, which means that the commit limit will grow when the commit charge nears it. And even when when the paging file hits its maximum size, Windows is holding back some memory and its internal tuning, as well as that of applications that cache data, might free up more. Testlimit anticipates this and when it reaches the commit limit, it sleeps for a few seconds and then tries to allocate more memory, repeating this indefinitely until you terminate it. If you run the 64-bit version of Testlimit, it will almost certainly will hit the commit limit before exhausting its address space, unless physical memory and the paging files sum to more than 8TB, which as described previously is the size of the 64-bit application-accessible address space. Here's the partial
output of the 64-bit Testlimit running on my 8GB system (I specified an allocation size of 100MB to make it leak more quickly):
And here's the commit history graph with steps when Testlimit paused to allow the paging file to grow:
When system virtual memory runs low, applications may fail and you might get strange error messages when attempting routine operations. In most cases, though, Windows will be able present you the lowmemory resolution dialog, like it did for me when I ran this test:
After you exit Testlimit, the commit limit will likely drop again when the memory manager truncates the tail of the paging file that it created to accommodate Testlimit's extreme commit requests. Here, Process
Explorer shows that the current limit is well below the peak that was achieved when Testlimit was running:
Pagefile-backed virtual memory is harder to attribute, because it can be shared between processes. In fact, there's no process-specific counter you can look at to see how much a process has allocated or is referencing. When you run Testlimit with the -s switch, it allocates pagefile-backed virtual memory until it hits the commit limit, but even after consuming over 29GB of commit, the virtual memory statistics for the process don't provide any indication that it's the one responsible:
For that reason, I added the -l switch to Handle a while ago. A process must open a pagefile-backed virtual memory object, called a section, for it to create a mapping of pagefile-backed virtual memory in its address space. While Windows preserves existing virtual memory even if an application closes the handle to the section that it was made from, most applications keep the handle open. The -l switch prints the size of the allocation for pagefile-backed sections that processes have open. Here's partial output for the handles open by Testlimit after it has run with the -s switch:
You can see that Testlimit is allocating pagefile-backed memory in 1MB blocks and if you summed the size of all the sections it had opened, you'd see that it was at least one of the processes contributing large amounts to the commit charge.
suggestions are based on multiplying RAM size by some factor, with common values being 1.2, 1.5 and 2. Now that you understand the role that the paging file plays in defining a systems commit limit and how processes contribute to the commit charge, youre well positioned to see how useless such formulas truly are. Since the commit limit sets an upper bound on how much private and pagefile-backed virtual memory can be allocated concurrently by running processes, the only way to reasonably size the paging file is to know the maximum total commit charge for the programs you like to have running at the same time. If the commit limit is smaller than that number, your programs wont be able to allocate the virtual memory they want and will fail to run properly. So how do you know how much commit charge your workloads require? You might have noticed in the screenshots that Windows tracks that number and Process Explorer shows it: Peak Commit Charge. To optimally size your paging file you should start all the applications you run at the same time, load typical data sets, and then note the commit charge peak (or look at this value after a period of time where you know maximum load was attained). Set the paging file minimum to be that value minus the amount of RAM in your system (if the value is negative, pick a minimum size to permit the kind of crash dump you are configured for). If you want to have some breathing room for potentially large commit demands, set the maximum to double that number. Some feel having no paging file results in better performance, but in general, having a paging file means Windows can write pages on the modified list (which represent pages that arent being accessed actively but have not been saved to disk) out to the paging file, thus making that memory available for more useful purposes (processes or file cache). So while there may be some workloads that perform better with no paging file, in general having one will mean more usable memory being available to the system (never mind that Windows wont be able to write kernel crash dumps without a paging file sized large enough to hold them). Paging file configuration is in the System properties, which you can get to by typing sysdm.cpl into the Run dialog, clicking on the Advanced tab, clicking on the Performance Options button, clicking on the Advanced tab (this is really advanced), and then clicking on the Change button:
Youll notice that the default configuration is for Windows to automatically manage the page file size. When that option is set on Windows XP and Server 2003, Windows creates a single paging file thats minimum size is 1.5 times RAM if RAM is less than 1GB, and RAM if it's greater than 1GB, and that has
a maximum size that's three times RAM. On Windows Vista and Server 2008, the minimum is intended to be large enough to hold a kernel-memory crash dump and is RAM plus 300MB or 1GB, whichever is larger. The maximum is either three times the size of RAM or 4GB, whichever is larger. That explains why the peak commit on my 8GB 64-bit system thats visible in one of the screenshots is 32GB. I guess whoever wrote that code got their guidance from one of those magazines I mentioned! A couple of final limits related to virtual memory are the maximum size and number of paging files supported by Windows. 32-bit Windows has a maximum paging file size of 16TB (4GB if you for some reason run in non-PAE mode) and 64-bit Windows can having paging files that are up to 16TB in size on x64 and 32TB on IA64. For all versions, Windows supports up to 16 paging files, where each must be on a separate volume.
Nonpaged Pool
The kernel and device drivers use nonpaged pool to store data that might be accessed when the system cant handle page faults. The kernel enters such a state when it executes interrupt service routines (ISRs) and deferred procedure calls (DPCs), which are functions related to hardware interrupts. Page faults are also illegal when the kernel or a device driver acquires a spin lock, which, because they are the only type of lock that can be used within ISRs and DPCs, must be used to protect data structures that are accessed from within ISRs or DPCs and either other ISRs or DPCs or code executing on kernel threads. Failure by a driver to honor these rules results in the most common crash code, IRQL_NOT_LESS_OR_EQUAL. Nonpaged pool is therefore always kept present in physical memory and nonpaged pool virtual memory is assigned physical memory. Common system data structures stored in nonpaged pool include the kernel and objects that represent processes and threads, synchronization objects like mutexes, semaphores and events, references to files, which are represented as file objects, and I/O request packets (IRPs), which represent I/O operations.
Paged Pool
Paged pool, on the other hand, gets its name from the fact that Windows can write the data it stores to the paging file, allowing the physical memory it occupies to be repurposed. Just as for user-mode virtual memory, when a driver or the system references paged pool memory thats in the paging file, an operation called a page fault occurs, and the memory manager reads the data back into physical memory. The largest consumer of paged pool, at least on Windows Vista and later, is typically the Registry, since references to registry keys and other registry data structures are stored in paged pool. The data structures that represent memory mapped files, called sections internally, are also stored in paged pool. Device drivers use the ExAllocatePoolWithTag API to allocate nonpaged and paged pool, specifying the type of pool desired as one of the parameters. Another parameter is a 4-byte Tag, which drivers are supposed to use to uniquely identify the memory they allocate, and that can be a useful key for tracking down drivers that leak pool, as Ill show later.
Pool nonpaged bytes Pool paged bytes (virtual size of paged pool some may be paged out) Pool paged resident bytes (physical size of paged pool)
However, there are no performance counters for the maximum size of these pools. They can be viewed with the kernel debugger !vm command, but with Windows Vista and later to use the kernel debugger in local kernel debugging mode you must boot the system in debugging mode, which disables MPEG2 playback. So instead, use Process Explorer to view both the currently allocated pool sizes, as well as the maximum. To see the maximum, youll need to configure Process Explorer to use symbol files for the operating system. First, install the latest Debugging Tools for Windows package. Then run Process Explorer and open the Symbol Configuration dialog in the Options menu and point it at the dbghelp.dll in the Debugging Tools for Windows installation directory and set the symbol path to point at Microsofts symbol server:
After youve configured symbols, open the System Information dialog (click System Information in the View menu or press Ctrl+I) to see the pool information in the Kernel Memory section. Heres what that looks like on a 2GB Windows XP system:
physical memory on the system. The amount it assigns to nonpaged pool starts at 128MB on a system with 512MB and goes up to 256MB for a system with a little over 1GB or more. On a system booted with the /3GB option, which expands the user-mode address space to 3GB at the expense of the kernel address space, the maximum nonpaged pool is 128MB. The Process Explorer screenshot shown earlier reports the 256MB maximum on a 2GB Windows XP system booted without the /3GB switch. The memory manager in 32-bit Windows Vista and later, including Server 2008 and Windows 7 (there is no 32-bit version of Windows Server 2008 R2) doesnt carve up the system address statically; instead, it dynamically assigns ranges to different types of memory according to changing demands. However, it still sets a maximum for nonpaged pool thats based on the amount of physical memory, either slightly more than 75% of physical memory or 2GB, whichever is smaller. Heres the maximum on a 2GB Windows Server 2008 system:
2GB 32-bit Windows Server 2008 64-bit Windows systems have a much larger address space, so the memory manager can carve it up statically without worrying that different types might not have enough space. 64-bit Windows XP and Windows Server 2003 set the maximum nonpaged pool to a little over 400K per MB of RAM or 128GB, whichever is smaller. Heres a screenshot from a 2GB 64-bit Windows XP system:
2GB 64-bit Windows XP 64-bit Windows Vista, Windows Server 2008, Windows 7 and Windows Server 2008 R2 memory managers match their 32-bit counterparts (where applicable as mentioned earlier, there is no 32-bit version of Windows Server 2008 R2) by setting the maximum to approximately 75% of RAM, but they cap the maximum at 128GB instead of 2GB. Heres the screenshot from a 2GB 64-bit Windows Vista system, which has a nonpaged pool limit similar to that of the 32-bit Windows Server 2008 system shown earlier.
2GB 32-bit Windows Server 2008 Finally, heres the limit on an 8GB 64-bit Windows 7 system:
8GB 64-bit Windows 7 Heres a table summarizing the nonpaged pool limits across different version of Windows: 32-bit XP, Server 2003 64-bit
up to 1.2GB RAM: 32-256 min( ~400K/MB of RAM, MB 128GB) > 1.2GB RAM: 256MB
Vista, Server 2008, min( ~75% of RAM, 2GB) min(~75% of RAM, 128GB) Windows 7, Server 2008 R2
2GB 32-bit Windows XP 32-bit Windows Server 2003 reserves more space for paged pool, so its upper limit is 650MB.
Since 32-bit Windows Vista and later have dynamic kernel address space, they simply set the limit to 2GB. Paged pool will therefore run out either when the system address space is full or the system commit limit is reached. 64-bit Windows XP and Windows Server 2003 set their maximums to four times the nonpaged pool limit or 128GB, whichever is smaller. Here again is the screenshot from the 64-bit Windows XP system, which shows that the paged pool limit is exactly four times that of nonpaged pool:
2GB 64-bit Windows XP Finally, 64-bit versions of Windows Vista, Windows Server 2008, Windows 7 and Windows Server 2008 R2 simply set the maximum to 128GB, allowing paged pools limit to track the system commit limit. Heres the screenshot of the 64-bit Windows 7 system again:
8GB 64-bit Windows 7 Heres a summary of paged pool limits across operating systems: 32-bit XP, Server 2003 64-bit
XP: up to 491MB min( 4 * nonpaged pool limit, Server 2003: up to 128GB) 650MB
Vista, Server 2008, min( system commit limit, min( system commit limit, 128GB) Windows 7, Server 2008 2GB) R2
Dont run this on a system unless youre prepared for possible data loss, as applications and I/O operations will start failing when pool runs out. You might even get a blue screen if the driver doesnt handle the out-of-memory condition correctly (which is considered a bug in the driver). The Windows Hardware Quality Laboratory (WHQL) stresses drivers using the Driver Verifier, a tool built into Windows, to make sure that they can tolerate out-of-pool conditions without crashing, but you might have third-party drivers that havent gone through such testing or that have bugs that werent caught during WHQL testing. I ran Notmyfault on a variety of test systems in virtual machines to see how they behaved and didnt encounter any system crashes, but did see erratic behavior. After nonpaged pool ran out on a 64-bit Windows XP system, for example, trying to launch a command prompt resulted in this dialog:
On a 32-bit Windows Server 2008 system where I already had a command prompt running, even simple operations like changing the current directory and directory listings started to fail after nonpaged pool was exhausted:
On one test system, I eventually saw this error message indicating that data had potentially been lost. I hope you never see this dialog on a real system!
Running out of paged pool causes similar errors. Heres the result of trying to launch Notepad from a command prompt on a 32-bit Windows XP system after paged pool had run out. Note how Windows failed to redraw the windows title bar and the different errors encountered for each attempt:
And heres the start menus Accessories folder failing to populate on a 64-bit Windows Server 2008 system thats out of paged pool:
Here you can see the system commit level, also displayed on Process Explorers System Information dialog, quickly rise as Notmyfault leaks large chunks of paged pool and hits the 2GB maximum on a 2GB 32-bit Windows Server 2008 system:
The reason that Windows doesnt simply crash when pool is exhausted, even though the system is unusable, is that pool exhaustion can be a temporary condition caused by an extreme workload peak, after which pool is freed and the system returns to normal operation. When a driver (or the kernel) leaks pool, however, the condition is permanent and identifying the cause of the leak becomes important. Thats where the pool tags described at the beginning of the post come into play.
When you suspect a pool leak and the system is still able to launch additional applications, Poolmon, a tool in the Windows Driver Kit, shows you the number of allocations and outstanding bytes of allocation by type of pool and the tag passed into calls of ExAllocatePoolWithTag. Various hotkeys cause Poolmon to sort by different columns; to find the leaking allocation type, use either b to sort by bytes or d to sort by the difference between the number of allocations and frees. Heres Poolmon running on a system where Notmyfault has leaked 14 allocations of about 100MB each:
After identifying the guilty tag in the left column, in this case Leak, the next step is finding the driver thats using it. Since the tags are stored in the driver image, you can do that by scanning driver images for the tag in question. The Strings utility from Sysinternals dumps printable strings in the files you specify (that are by default a minimum of three characters in length), and since most device driver images are in the %Systemroot%\System32\Drivers directory, you can open a command prompt, change to that directory and execute strings * | findstr <tag>. After youve found a match, you can dump the drivers version information with the Sysinternals Sigcheck utility. Heres what that process looks like when looking for the driver using Leak:
If a system has crashed and you suspect that its due to pool exhaustion, load the crash dump file into the Windbg debugger, which is included in the Debugging Tools for Windows package, and use the !vm command to confirm it. Heres the output of !vm on a system where Notmyfault has exhausted nonpaged pool:
Once youve confirmed a leak, use the !poolused command to get a view of pool usage by tag thats similar to Poolmons. !poolused by default shows unsorted summary information, so specify 1 as the the option to sort by paged pool usage and 2 to sort by nonpaged pool usage:
Use Strings on the system where the dump came from to search for the driver using the tag that you find causing the problem. So far in this blog series Ive covered the most fundamental limits in Windows, including physical memory, virtual memory, paged and nonpaged pool. Next time Ill talk about the limits for the number of processes and threads that Windows supports, which are limits that derive from these.
Thread Limits
Besides basic information about a thread, including its CPU register state, scheduling priority, and resource usage accounting, every thread has a portion of the process address space assigned to it, called a stack, which the thread can use as scratch storage as it executes program code to pass function parameters, maintain local variables, and save function return addresses. So that the systems virtual memory isnt unnecessarily wasted, only part of the stack is initially allocated, or committed and the rest is simply reserved. Because stacks grow downward in memory, the system places guard pages beyond the committed part of the stack that trigger an automatic commitment of additional memory (called a stack expansion) when accessed. This figure shows how a stacks committed region grows down and the guard page moves when the stack expands, with a 32-bit address space as an example (not drawn to scale):
The Portable Executable (PE) structures of the executable image specify the amount of address space reserved and initially committed for a threads stack. The linker defaults to a reserve of 1MB and commit of one page (4K), but developers can override these values either by changing the PE values when they link their program or for an individual thread in a call to CreateThread. You can use a tool like Dumpbin that comes with Visual Studio to look at the settings for an executable. Heres the Dumpbin output with the /headers option for the executable generated by a new Visual Studio project:
Converting the numbers from hexadecimal, you can see the stack reserve size is 1MB and the initial commit is 4K and using the new Sysinternals VMMap tool to attach to this process and view its address space, you can clearly see a thread stacks initial committed page, a guard page, and the rest of the reserved stack memory:
Because each thread consumes part of a processs address space, processes have a basic limit on the number of threads they can create thats imposed by the size of their address space divided by the thread stack size.
Again, since part of the address space was already used by the code and initial heap, not all of the 2GB was available for thread stacks, thus the total threads created could not quite reach the theoretical limit of 2,048. I linked the Testlimit executable with the large address space-aware option, meaning that if its presented with more than 2GB of address space (for example on 32-bit systems booted with the /3GB or /USERVA Boot.ini option or its equivalent BCD option on Vista and later increaseuserva), it will use it. 32-bit processes are given 4GB of address space when they run on 64-bit Windows, so how many threads can the 32-bit Testlimit create when run on 64-bit Windows? Based on what weve covered so far, the answer should be roughly 4096 (4GB divided by 1MB), but the number is actually significantly smaller. Heres 32-bit Testlimit running on 64-bit Windows XP:
The reason for the discrepancy comes from the fact that when you run a 32-bit application on 64-bit Windows, it is actually a 64-bit process that executes 64-bit code on behalf of the 32-bit threads, and therefore there is a 64-bit thread stack and a 32-bit thread stack area reserved for each thread. The 64-bit stack has a reserve of 256K (except that on systems prior to Vista, the initial threads 64-bit stack is 1MB). Because every 32-bit thread begins its life in 64-bit mode and the stack space it uses when starting exceeds a page, youll typically see at least 16KB of the 64-bit stack committed. Heres an example of a 32-bit threads 64-bit and 32-bit stacks (the one labeled Wow64 is the 32-bit stack):
32-bit Testlimit was able to create 3,204 threads on 64-bit Windows, which given that each thread uses 1MB+256K of address space for stack (again, except the first on versions of Windows prior to Vista, which uses 1MB+1MB), is exactly what youd expect. I got different results when I ran 32-bit Testlimit on 64-bit Windows 7, however:
The difference between the Windows XP result and the Windows 7 result is caused by the more random nature of address space layout introduced in Windows Vista, Address Space Load Randomization (ASLR), that leads to some fragmentation. Randomization of DLL loading, thread stack and heap placement, helps defend against malware code injection. As you can see from this VMMap output, theres 357MB of address space still available, but the largest free block is only 128K in size, which is smaller than the 1MB required for a 32-bit stack:
As I mentioned, a developer can override the default stack reserve. One reason to do so is to avoid wasting address space when a threads stack usage will always be significantly less than the default 1MB. Testlimit sets the default stack reservation in its PE image to 64K and when you include the n switch along with the t switch, Testlimit creates threads with 64K stacks. Heres the output on a 32-bit Windows XP system with 256MB RAM (I did this experiment on a small system to highlight this particular limit):
Note the different error, which implies that address space isnt the issue here. In fact, 64K stacks should allow for around 32,000 threads (2GB/64K = 32,768). Whats the limit thats being hit in this case? A look at the likely candidates, including commit and pool, dont give any clues, as theyre all below their limits:
Its only a look at additional memory information in the kernel debugger that reveals the threshold thats being hit, resident available memory, which has been exhausted:
Resident available memory is the physical memory that can be assigned to data or code that must be kept in RAM. Nonpaged pool and nonpaged drivers count against it, for example, as does memory thats locked in RAM for device I/O operations. Every thread has both a user-mode stack, which is what Ive been talking about, but they also have a kernel-mode stack thats used when they run in kernel mode, for example while executing system calls. When a thread is active its kernel stack is locked in memory so that the thread can execute code in the kernel that cant page fault. A basic kernel stack is 12K on 32-bit Windows and 24K on 64-bit Windows. 14,225 threads require about 170MB of resident available memory, which corresponds to exactly how much is free on this system when Testlimit isnt running:
Once the resident available memory limit is hit, many basic operations begin failing. For example, heres the error I got when I double-clicked on the desktops Internet Explorer shortcut:
As expected, when run on 64-bit Windows with 256MB of RAM, Testlimit is only able to create 6,600 threads roughly half what it created on 32-bit Windows with 256MB RAM - before running out of resident available memory:
The reason I said basic kernel stack earlier is that a thread that executes graphics or windowing functions gets a large stack when it executes the first call thats 20K on 32-bit Windows and 48K on 64-bit Windows. Testlimits threads dont call any such APIs, so they have basic kernel stacks.
In this case, its the initial thread stack commit that causes the system to run out of virtual memory and the paging file is too small error. Once the commit level reached the size of RAM, the rate of thread creation slowed to a crawl because the system started thrashing, paging out stacks of threads created earlier to make room for the stacks of new threads, and the paging file had to expand. The results are the same when the n switch is specified, because the threads have the same initial stack commitment.
Process Limits
The number of processes that Windows supports obviously must be less than the number of threads, since each process has one thread and a process itself causes additional resource usage. 32-bit Testlimit running on a 2GB 64-bit Windows XP system created about 8,400 processes:
A look in the kernel debugger shows that it hit the resident available memory limit:
If the only cost of a process with respect to resident available memory was the kernel-mode thread stack, Testlimit would have been able to create far more than 8,400 threads on a 2GB system. The amount of resident available memory on this system when Testlimit isnt running is 1.9GB:
Dividing the amount of resident memory Testlimit used (1.9GB) by the number of processes it created (8,400) yields 230K of resident memory per process. Since a 64-bit kernel stack is 24K, that leaves about 206K unaccounted for. Wheres the rest of the cost coming from? When a process is created, Windows reserves enough physical memory to accommodate the processs minimum working set size. This acts as a guarantee to the process that no matter what, there will enough physical memory available to hold enough data to satisfy its minimum working set. The default working set size happens to be 200KB, a fact thats evident when you add the Minimum Working Set column to Process Explorers display:
The remaining roughly 6K is resident available memory charged for additional non-pageable memory allocated to represent a process. A process on 32-bit Windows will use slightly less resident memory because its kernel-mode thread stack is smaller. As they can for user-mode thread stacks, processes can override their default working set size with the SetProcessWorkingSetSize function. Testlimit supports a n switch, that when combined with p, causes child processes of the main Testlimit process to set their working set to the minimum possible, which is 80K. Because the child processes must run to shrink their working sets, Testlimit sleeps after it cant create any more processes and then tries again to give its children a chance to execute. Testlimit executed with the n switch on a Windows 7 system with 4GB of RAM hit a limit other than resident available memory: the system commit limit:
Here you can see the kernel debugger reporting not only that the system commit limit had been hit, but that there have been thousands of memory allocation failures, both virtual and paged pool allocations, following the exhaustion of the commit limit (the system commit limit was actually hit several times as the paging file was filled and then grown to raise the limit):
The baseline commitment before Testlimit ran was about 1.5GB, so the threads had consumed about 8GB of committed memory. Each process therefore consumed roughly 8GB/6,600, or 1.2MB. The output of the kernel debuggers !vm command, which shows the private memory allocated by each active process, confirms that calculation:
The initial thread stack commitment, described earlier, has a negligible impact with the rest coming from the memory required for the process address space data structures, page table entries, the handle table, process and thread objects, and private data the process creates when it initializes.
commit limit. In any case, applications that create enough threads or processes to get anywhere near these limits should rethink their design, as there are almost always alternate ways to accomplish the same goals with a reasonable number. For instance, the general goal for a scalable application is to keep the number of threads running equal to the number of CPUs (with NUMA changing this to consider CPUs per node) and one way to achieve that is to switch from using synchronous I/O to using asynchronous I/O and rely on I/O completion ports to help match the number of running threads to the number of CPUs.
When an application wants to manage one of these resources it first must call the appropriate API to create or open the resource. For instance, the CreateFile function opens or creates a file, the RegOpenKeyEx function opens a registry key, and the CreateSemaphoreEx function opens or creates a semaphore. If the function succeeds, Windows allocates a handle in the handle table of the applications process and returns the handle value, which applications treat as opaque but that is actually the index of the returned handle in the handle table. With the handle in hand, the application then queries or manipulates the object by passing the handle value to subsequent API functions like ReadFile, SetEvent, SetThreadPriority, and MapViewOfFile. The system can look up the object the handle refers to by indexing into the handle table to locate the corresponding handle entry, which contains a pointer to the object. The handle entry also stores the accesses the process was granted at the time it opened the object, which enables the system to make sure it doesnt allow the process to perform an operation on the object for which it didnt ask permission. For example, if the process successfully opened a file for read access, the handle entry would look like this:
If the process tried to write to the file, the function would fail because the access hadnt been granted and the cached read access means that the system doesnt have to execute a more expensive access-check again.
The result doesnt represent the total number of handles a process can create, however, because system DLLs open various objects during process initialization. You see a processs total handle count by adding a handle count column to Task Manager or Process Explorer. The total shown for Testlimit in this case is 16,711,680:
When you run Testlimit on a 32-bit system, the number of handles it can create is slightly different:
Where do the differences come from? The answer lies in the way that the Executive, which is responsible for managing handle tables, sets the per-process handle limit, as well as the size of a handle table entry. In one of the rare cases where Windows sets a hard-coded upper limit on a resource, the Executive defines 16,777,216 (16*1024*1024) as the maximum number of handles a process can allocate. Any process that has more than a ten thousand handles open at any given point in time is likely either poorly designed or has a handle leak, so a limit of 16 million is essentially infinite and can simply help prevent a process
with a leak from impacting the rest of the system. Understanding why the numbers Task Manager shows dont equal the hard-coded maximum requires a look at the way the Executive organizes handle tables. A handle table entry must be large enough to store the granted-access mask and an object pointer. The access mask is 32-bits, but the pointer size obviously depends on whether its a 32-bit or 64-bit system. Thus, a handle entry is 8-bytes on 32-bit Windows and 12-bytes on 64-bit Windows. 64-bit Windows aligns the handle entry data structure on 64-bit boundaries, so a 64-bit handle entry actually consumes 16bytes. Heres the definition for a handle entry on 64-bit Windows, as shown in a kernel debugger using the dt (dump type) command:
The output reveals that the structure is actually a union that can sometimes store information other than an object pointer and access mask, but those two fields are highlighted. The Executive allocates handle tables on demand in page-sized blocks that it divides into handle table entries. That means a page, which is 4096 bytes on both x86 and x64, can store 512 entries on 32-bit Windows and 256 entries on 64-bit Windows. The Executive determines the maximum number of pages to allocate for handle entries by dividing the hard-coded maximum,16,777,216, by the number of handle entries in a page, which results on 32-bit Windows to 32,768 and on 64-bit Windows to 65,536. Because the Executive uses the first entry of each page for its own tracking information, the number of handles available to a process is actually 16,777,216 minus those numbers, which explains the results obtained by Testlimit: 16,777,216-65,536 is 16,711,680 and 16,777,216-65,536-32,768 is 16,744,448.
On 64-bit Windows, there are 256 pointers in a page of top-level pointers. That means the total paged pool usage for a full handle table is 16,777,216/256*4096, which is 256MB. A look at Testlimits paged pool usage on 64-bit Windows confirms the calculation:
Paged pool is generally large enough to more than accommodate those sizes, but as I stated earlier, a process that creates that many handles is almost certainly going to exhaust other resources, and if it reaches the per-process handle limit it will probably fail itself because it cant open any other objects.
Handle Leaks
A handle leaker will have a handle count that rises over time. The reason that a handle leak is so insidious is that unlike the handles Testlimit creates, which all point to the same object, a process leaking handles is probably leaking objects as well. For example, if a process creates events but fails to close them, it will leak both handle entries and event objects. Event objects consume nonpaged pool, so the leak will impact nonpaged pool in addition to paged pool. You can graphically spot the objects a process is leaking using Process Explorers handle view because it highlights new handles in green and closed handles in red; if you see lots of green with infrequent red then you might be seeing a leak. You can watch Process Explorers handle highlighting in action by opening a Command Prompt process, selecting the process in Process Explorer, opening the handle-view lower pane and then changing directory in the Command Prompt. The old working directorys handle will highlight in red and the new one in green:
By default, Process Explorer only shows handles that reference objects that have names, which means that you wont see all the handles a process is using unless you select Show Unnamed Handles and Mappings from the View menu. Here are some of the unnamed handles in Command Prompts handle table:
Just like most bugs, only the developer of the code thats leaking can fix it. If you spot a leak in a process that can host multiple components or extensions, like Explorer, a Service Host or Internet Explorer, then the question is what component is the one responsible for the leak. Figuring that out might enable you to avoid the problem by disabling or uninstalling the problematic extension, fix the problem by checking for an update, or report the bug to the vendor. Fortunately, Windows includes a handle tracing facility that you can use to help identify leaks and the responsible software. Its enabled on a per-process basis and when active causes the Executive to record a stack trace at the time every handle is created or closed. You can enable it either by using the Application Verifier, a free download from Microsoft, or by using the Windows Debugger (Windbg). You should use the Application Verifier if you want the system to track a processs handle activity from when it starts. In
either case, youll need to use a debugger and the !htrace debugger command to view the trace information. To demonstrate the tracing in action, I launched Windbg and attached to the Command Prompt I had opened earlier. I then executed the !htrace command with the -enable switch to turn on handle tracing:
I let the processs execution continue and changed directory again. Then I switched back to Windbg, stopped the processs execution, and executed htrace without any options, which has it list all the open and close operations the process executed since the previous !htrace snapshot (created with the snapshot option) or from when handle tracing was enabled. Heres the output of the command for the same session:
The events are printed from most recent operation to least, so reading from the bottom, Command Prompt opened handle 0xb8, then closed it, next opened handle 0x22c, and finally closed handle 0xec. Process Explorer would show handle 0x22c in green and 0xec in red if it was refreshed after the directory change, but probably wouldnt see 0xb8 unless it happened to refresh between the open and close of that handle. The stack for 0x22cs open reveals that it was the result of Command Prompt (cmd.exe) executing its
ChangeDirectory function. Adding the handle value column to Process Explorer confirms that the new handle is 0x22c:
If youre just looking for leaks, you should use !htrace with the diff switch, which has it show only new handles since the last snapshot or start of tracing. Executing that command shows just handle 0x22c, as expected:
Finally, a great video that presents more tips for debugging handle leaks is this Channel 9 interview with Jeff Dailey, a Microsoft Escalation Engineer that debugs them for a living: http://channel9.msdn.com/posts/jeff_dailey/Understanding-handle-leaks-and-how-to-use-htrace-to-findthem/ Next time Ill look at limits for a couple of other handle-based resources, GDI Object and USER Objects. Handles to those resources are managed by the Windows subsystem, not the Executive, so use different resources and have different limits.
Since every process is associated with a specific session and the operating system usually only needs access to the session-specific data of the current processs session, Windows defines a view into a processs session data in the processs virtual address space. Thus, when the system switches between threads of different processes, it also switches address spaces, switching the current session view. When the Csrss.exe process of Session 0 is the current process, for example, the address space mappings include the system address space (which is included in every processs address space), Csrsss address space, and the Session 0 address space. The region of memory mapping a sessions data is known as Session View Space or Session Space. When the system switches to a thread from Session 1s Explorer process, the mappings change accordingly, and when it switches to a thread from Notepad, the Session 1 Session Space remains mapped:
Note that the figure isnt exactly correct for 32-bit Windows Vista and higher, since dynamic system address space means that the Session Space isnt necessarily contiguous and can grow and shrink as required on those systems. The next concept is the desktop, an object defined by the window manager to represent a virtual display that includes the windows associated with the desktop (note that this is different than Explorers definition of a desktop, which is the users directory with shortcuts and other objects the user places there). The default desktop is named Default, but applications can create additional desktops and switch the connection to the logical display, something the Sysinternals Desktops utility uses to create up to four virtual desktops a user can switch between. Finally, to support multiple virtual displays that are associated with the same window manager instance, the window manager defines the window station object. A window station is associated with a particular session and a session can have multiple window stations, but each session has only one interactive window station, called Winsta0, that can connect with a physical or logical display, keyboard and mouse; the other window stations are essentially headless and support for them exists solely to isolate processes that expect window manager services, but that shouldnt. For example, the system creates non-interactive window stations for each service account with which it associates processes running in the account, since Windows services should not display user-interfaces. You can see the window stations associated with Session 0 by looking in the Object Manager namespace under the \Windows directory using the Sysinternals Winobj tool (viewing the directory requires running elevated with administrative privileges). Here you can see that a window station the Microsoft Windows
Search Service creates to run search filters in, window stations for each of the three built-in service accounts (System, Network Service and Local Service), and Session 0s interactive window station:
You can see the window stations associated with other sessions in the Sessions directory in the Object Manager namespace. Heres the only window station, the interactive WinSta0 window station, associated with my logon session:
This diagram shows the relationship between sessions, window stations, and desktops for a system that has one user logged into Session1 on the physical console and another logged on to Session 2 via a remote desktop connection where the user has run a virtual desktops utility and switched the display to Desktop1.
Besides having an association with a particular session, processes are associated with a specific window station and desktop, though processes can switch between both and threads can switch between desktops. Thus, every processs association can be represented with a hierarchical path like this: Session 1\WinSta0\Default. You can in most cases indirectly determine what window station and desktop a process is connected to by looking at its handle table in Process Explorers handle view to see the names of the objects it has open. This screen shots of the handle table of an Explorer process show that it is connected to the Default desktop on WinSta0 of Session 1:
User Objects
With the fundamental concepts in hand, lets turn our attention first to USER objects. USER objects get their name from the fact that they represent user interface elements like desktops, windows, menus, cursors, icons, and accelerator tables (menu keyboard shortcuts). Despite the fact that USER objects are associated with a specific desktop, they must be accessible from all the desktops of a session, for example to allow a process on one desktop to register for a hotkey that can be entered on any of them. For that reason, the window manager assigns USER object identifiers that are scoped to a window station. A basic limitation imposed by the window manager is that no process can create more than 10,000 USER objects. That limitation attempts to prevent a single process from exhausting the resources associated with USER objects, either because its programmed with algorithms that can create excessive number of objects or because it leaks objects by allocating them and not deleting them when its through using them. You can easily verify this limit by running the Sysinternals Testlimit utility with the u switch, which directs Testlimit to create as many USER objects as it can:
The window manager keeps track of how many USER objects a process allocates, which you can see when you add the USER Objects column to Process Explorers display, so that you can keep tabs on the number of objects processes allocate. This screenshot shows that, as expected, Windows system processes, including Lsass.exe (the Local Security Authority Subsystem) and service processes like Svchost, dont allocate USER objects because they have no user interface:
Process Explorer shows the number of USER objects a process has allocated on the Performance page of a processs process properties dialog:
One fundamental limitation on the number of USER objects comes from the fact that their identifiers were 16-bit values in the first versions of Windows, which were 16-bit. When 32-bit support was added in later versions, USER identifiers had to remain restricted to 16-bit values so that 16-bit processes could interact with windows and other USER objects created by 32-bit processes. Thus, 65,535 (2^16) is the limit on the total number of USER objects that can be created on a session (and for historical reasons, windows must have even-numbered identifiers, so there can be a maximum of 32,768 windows per session). You can verify this limit by running multiple copies of Testlimit with the u switch until you cant create any more. Assuming the processes you already have running arent using an excessive number of objects, you should be able to run 7 copies, where the first 6 allocate 10,000 objects and the last allocates the difference between the number of already allocated objects and 65,535:
Be sure that youre prepared to hard power-off your system when you do this because the desktop may become unusable. Many operations, like even opening the start menus shutdown menu, require USER objects, and when no more can be allocated the system will behave in bizarre ways. I wasnt even able to terminate a Notepad process I had running by clicking on its close menu button after USER objects had been exhausted. So far Ive talked only about the limits associated with absolute number of USER objects a process or window station can allocate, but there are other limits caused by the storage used for the USER objects themselves. Each desktop has its own region of memory, called the desktop heap, from which most USER objects created on the desktop are allocated. Because desktop heaps are stored in Session Space and 32-bit address spaces limit the amount of kernel-mode address space, the sizes of desktop heaps are capped at a relatively modest amount. They also vary in size depending on the type of desktop they are for and whether the system is a 32-bit or 64-bit system. Matthew Justices Desktop Heap Overview and Desktop Heap, Part 2 articles from the NT Debugging Blog do an excellent job of documenting desktop heap sizes up through Windows Vista SP1. This table summarizes the sizes across Windows versions up through Windows Server 2008 R2: Interactive Non-Interactive Winlogon Disconnect
Desktop Windows XP 32-bit 3 MB Windows Server 20033 MB 32-bit Windows Server 200320 MB 64-bit Windows 12 MB Vista/Windows Server 2008 32-bit Windows 20 MB Vista/Windows Server 2008 64-bit Windows 7 32-bit 12 MB Windows 7/Windows20 MB Server 2008 R2 64-bit
Desktop 64 KB 64 KB 96 KB 64 KB
768 KB
192 KB
96 KB
512 KB 768 KB
128 KB 192 KB
64 KB 96 KB
Its worth noting that the original release of Windows Vista 32-bit had the 3 MB Interactive Heap size that previous 32-bit versions of Windows had. After the release our telemetry showed us that some users occasionally ran out of heap, presumably because they were running more applications on systems that had more memory, so SP1 raised the size to 12 MB. Its also possible to override the default desktop heap sizes with registry settings described in Matthews article. On versions of Windows prior to Windows Vista, you can use the Microsoft Desktop Heap Monitor tool to view the sizes of the Desktop Heaps and how much of each is in use. Heres the output of the tool on a 32-bit Windows XP system showing that only 5.6% of heap (172 KB) on the interactive desktop, Default, has been consumed:
The tool hasnt been updated for Windows Vista because the larger desktop heap sizes on newer versions of Windows mean that desktop heap is rarely exhausted before other USER object limits are hit. However, you can use Testlimit with the u and i switches to see how the system behaves when interactive desktop heap exhaustion occurs. The switch combination has Testlimit create window class data structures that have 4 KB of extra class storage until it fails. Heres the output of Testlimit when run immediately after I captured the above Desktop Heap Monitor output. 2823 KB plus the 172 KB that Desktop Heap Monitor said was already allocated equals about 3 MB:
Though theres no way to determine how much heap is in use on newer systems, the window manager writes an event to the system event log when theres heap exhaustion that can help troubleshoot window manager issues:
That covers USER object limits. Stay tuned for Part 2, where Ill discuss the limits related to window manager GDI objects.
GDI Objects
GDI objects represent graphical device interface resources like fonts, bitmaps, brushes, pens, and device contexts (drawing surfaces). As it does for USER objects, the window manager limits processes to at most 10,000 GDI objects, which you can verify with Testlimit using the g switch:
You can look at an individual processs GDI object usage on the Performance page of its Process Explorer process properties dialog and add the GDI Objects column to Process Explorer to watch GDI object usage across processes:
Also like USER objects, 16-bit interoperability means that USER objects have 16-bit identifiers, limiting them to 65,535 per session. Heres the desktop as it appeared when Testlimit hit that limit on a Windows Vista 64-bit system:
Note the Start button on the bottom left where it belongs, but the rest of the task bar at the top of the screen. The desktop has turned black and the sidebar has lost most of its color. Your mileage may vary, but you can see that bizarre things start to happen, potentially making it impossible to interact with the desktop in a reliable way. Heres what the display switched to when I pressed the Start button:
Unlike USER objects, GDI objects arent allocated from desktop heaps; instead, on Windows XP and Windows Server 2003 systems that dont have Terminal Services installed, they allocate from general paged pool; on all other systems they allocate from per-session session pool. The kernel debuggers !vm 4 command dumps general virtual memory information, including session information at the end of the output. On a Windows XP system it shows that session paged pool is unused:
On a Windows Server 2003 system without Terminal Services, the output is similar:
The GDI object memory limit on these systems is therefore the paged pool limit, as described in my previous post, Pushing the Limits of Windows: Paged and Nonpaged Pool. However, when Terminal Services are installed on the same Windows Server 2003 system, you can see from the non-zero session pool usage that GDI objects come from session pool:
The !vm 4 command in the above output also shows the session paged pool maximum and session pool sizes, but the session paged pool maximum and session space sizes dont display on Windows Vista and higher because they are variable. Session paged pool usage on those systems is capped by either the amount of address space it can grow to or the System Commit Limit, whichever is smaller. Heres the output of the command on a Windows 7 system showing the current session paged pool usage by session:
As youd expect, the main interactive session, Session 1, is consuming the most session paged pool.
You can use the Testlimit tool with the g 0 switch to see what happens when the storage used for GDI objects is exhausted. The number you specify after the g is the size of the GDI bitmap objects Testlimit allocates, but a size of 0 has Testlimit simply try and allocate the largest objects possible. Heres the result on a 32-bit Windows XP system:
On a Windows XP or Windows Server 2003 that doesnt have Terminal Services installed you can use the Poolmon utility from Windows Driver Kit (WDK) to see the GDI object allocations by their pool tag. The output of Poolmon the while Testlimit was exhausting paged pool on the WIndows XP system looks like this when sorted by bytes allocated (type b in the Poolmon display to sort by bytes allocated), by inference indicating that Gh05 is the tag for bitmap objects on Windows Server 2003:
On a Windows Server 2003 system with Terminal Services installed, and on Windows Vista and higher, you have to use Poolmon with the /s switch to specify which session you want to view. Heres Testlimit executed on a Windows Server 2003 system that has Terminal Services installed:
The command poolmon /s1 shows the tags with the largest allocation contributing for Session 1. You can see the Gh15 tag at the top, showing that a different pool tag is being used for bitmap allocations:
Note how Testlimit was able to allocate around 58 MB of bitmap data (that number doesnt account for GDIs internal overhead for a bitmap object) on the Windows XP system, but only 10MB on the Windows Server 2003 system. The smaller number comes from the fact that session pool on the Windows Server 2003 Terminal Server system is only 32 MB, which is about the amount of memory Poolmon shows attributed to the Gh15 tag. The output of !vm 4 confirms that session pool for Session1 is been consumed and that subsequent attempts to allocate GDI objects from session pool have failed:
You can also use the !poolused kernel debugger command to look at session pool usage. First, switch to the correct session by using the .process command with the /p switch and the address of a process object thats connected to the session. To see what processes are running in a particular session, use the ! sprocess command. Heres the output of !poolmon on the same Windows Server 2003 system, where the c option to !poolused has it sort the output by allocated bytes:
Unfortunately, theres no public mapping between the window managers heap tags and the objects they represent, but the kernel debuggers !poolused command uses the triage.ini file from the debuggers installation directory to print more descriptive information about a tag. The command reports that Gh15 is GDITAG_HMGR_SPRITE_TYPE, which is only slightly more helpful, but others are more clear. Fortunately, most GDI and USER object issues are limited to a particular process hitting the per-process 10,000 object limit and so more advanced investigation to figure out what process is responsible for exhausting session pool or allocating GDI objects to exhaust paged pool is unnecessary. Next time Ill take a look at System Page Table Entries (System PTEs), another key system resource that can has limits that can be hit, especially on Remote Desktop sessions on Windows Server 2003 systems.