A Simple Mutator
We're going to try and illustrate what I've just explained using a simple C# program; a Mutator, as it's called. This program makes an instance of a collection, which it then assigns to a local variable, and because this collection is assigned to the local variable, it'll be live. And because this local variable is used throughout the execution of the while loop you can see below, it'll be live for the rest of the program:
var collect = new List<A>();
while (true)
{
    collect.Add(new A());
    new A();
    new A();
}
Listing 1 A simple mutator

What we're doing is allocating three instances of a small class we've called A. The first instance we'll put into the collection; it will remain live because it's referenced by the collection, which is in turn referenced by the local variable. The two other instances will then be allocated, but they won't be referenced by anything, so they'll be dead. If we consider how this allocation pattern looks when we study Gen0, we'll see it looks something like this:
Allocation in Generation 0
Figure 1 A hypothetically empty Generation 0

Here we're assuming that Gen0 is empty at the point when our function starts running. Now of course that isn't ever true; when you start the CLR up, the base class libraries are loaded right away, and they allocate a lot of objects on the heap before your program runs. We'll ignore those for the moment (for the sake of clarity), and we'll also ignore the collection object that we assigned to that local variable in our first step. So let's look at the while loop, which we know is allocating 3 instances every time it iterates. We've coloured the instance that is referenced by the collection black to mark it as a live object, and the two other instances, which are the dead objects, are coloured red.
Figure 2 The initial allocations by the simple mutator (not to scale)

We'll obviously have the same pattern in the second iteration: allocate one live object, followed by two dead objects, and the third iteration will be the same again. Looking at Figure 3, you can clearly see that, as we go into the fourth iteration, we'll find we have no more space left in Gen0, so we'll need to perform a collection.
No Space? Copy
When dealing with Generations 0, 1 and 2 (i.e. the Small Object Heap), the .NET CLR uses a copying strategy; in this instance, it tries to promote the live objects out of Gen0 and into Gen1. The idea is to find all of the live objects in Gen0 and copy them into some free space within Gen1 (which we're also assuming is empty, for clarity). Bear in mind that this is just the first step in the collection process, which applies in the same way when the GC is working from Gen1 to Gen2, and is illustrated using the arrows below:
Figure 4 Copying live objects from Gen0 to Gen1 (or Genn to Genn+1)

At this point, the GC needs to go through the other objects on the heap which reference our instances (such as our collection object), and fix up the pointers from those objects so that they now point to the new locations to which the referenced objects have been copied. This is a slightly tricky step, because at the point where the GC is copying the objects, it also needs to make sure that no threads are actually manipulating those contents. If they were, then it might miss updates that were made to the old versions of the objects after it had copied them, or it might actually modify pointers between objects in such a way that it forgets to copy some object forward. In order to manage this, the .NET runtime brings the threads to what are known as safe points. Essentially, it stops the threads and gives itself an opportunity to redirect these pointers from old objects in Gen0 to the newly created copies in Gen1.
Figure 5 Updating pointers to reference the newly copied object, one Generation up

Of course, the cool thing is that once the GC has done that, it can recycle the whole of Gen0, and can do so without individually scanning the objects that it used to hold. After all, it knows that the live objects have been safely promoted and are correctly referenced, and everything else is dead, and thus irrelevant. So, assuming most objects die young, we've only had to process a very small number of objects in order to recycle the whole of Gen0.
Observations
The basic trick behind the .NET Generational GC is that objects are allowed to move (or rather, are copied). This is a great way to get them out of the way so that we can reuse their memory without having to process every object individually. It also means that the amount of time needed to perform a collection is proportional to the number of live objects which the GC has to move, rather than the number of dead objects in memory, which it's going to ignore anyway.

However, as a result of this system, we do have the overhead of needing to get all threads to a safe point, where we can fix up the pointers to reference the location to which each object has been copied. This obviously has repercussions on the design of the runtime. For example, you need access to data from the JIT and from the program itself, telling you the offsets within the various objects at which you might find pointers, so that you can a) scan them to find the live objects, and b) fix them up at some later time.

A term you occasionally hear associated with this promotion policy and its effects is bump allocation, which just means that we have the handy ability to allocate things very quickly. If Gen0 starts out completely blank then, when we want to allocate our first object, all we have to do is increment the pointer which initially points to the beginning of the generation by a number of bytes corresponding to the size of that first object. We then know that we can immediately place the next object at that newly offset location, move the pointer along by the size of that object, and so on. This gives us the clean, stacked layout which we saw in the earlier figures, where the objects all occur one after the other.
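The bump allocation idea above can be sketched in a few lines of C#. This is a toy model, not the CLR's actual allocator: Gen0 is represented by a byte array, and the class and member names are invented for illustration.

```csharp
using System;

// A minimal sketch of bump allocation: "allocating" an object just means
// advancing an offset into the generation by the object's size.
class BumpAllocator
{
    private readonly byte[] _gen0;
    private int _next; // offset of the first free byte

    public BumpAllocator(int capacity)
    {
        _gen0 = new byte[capacity];
        _next = 0;
    }

    // Returns the offset at which the "object" was placed,
    // or -1 if Gen0 is full and a collection would be needed.
    public int Allocate(int sizeInBytes)
    {
        if (_next + sizeInBytes > _gen0.Length)
            return -1; // no space left: time to collect

        int offset = _next;
        _next += sizeInBytes; // bump the pointer: that's the whole allocation
        return offset;
    }
}
```

Allocating three 12-byte objects in a row places them at offsets 0, 12 and 24, which is exactly the clean, stacked layout shown in the earlier figures.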
Now it turns out that it's not quite as easy to do all that as you might think, because there can be multiple threads, and if you want to avoid locking during the allocation of the object and the incrementing of the pointer, you need to do some trickery to ensure that you don't need to do any thread locking on the fast path (the path you normally take). This is covered in more detail in the downloadable webinar that accompanies this discussion, but I won't go into it here.
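One flavour of that trickery can be sketched with a compare-and-swap: each thread tries to claim its slice of the generation atomically, retrying if another thread bumped the pointer first. This is only a hypothetical illustration; the real CLR also uses per-thread allocation contexts, which this sketch omits entirely.

```csharp
using System;
using System.Threading;

// A lock-free bump allocator sketch: no lock is taken on the fast path;
// contention is resolved by retrying the compare-and-swap.
class ConcurrentBumpAllocator
{
    private readonly int _capacity;
    private int _next; // offset of the first free byte

    public ConcurrentBumpAllocator(int capacity)
    {
        _capacity = capacity;
    }

    public int Allocate(int sizeInBytes)
    {
        while (true)
        {
            int observed = _next;
            if (observed + sizeInBytes > _capacity)
                return -1; // full: a collection would be triggered here

            // Claim [observed, observed + sizeInBytes) only if no other
            // thread has moved the pointer since we read it.
            if (Interlocked.CompareExchange(ref _next, observed + sizeInBytes, observed) == observed)
                return observed;
            // Another thread won the race; loop and try again.
        }
    }
}
```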
Objects which are > 85k in size are another matter altogether, and we won't worry about them for now.
As I said earlier, most objects die young, and this structure means that the GC can focus its attention on collecting just Gen0 (i.e. just a sub-set of the available memory), which is where we expect to get the greatest return in terms of recycling dead memory.
Periodic Measurements
First of all, it's important to remember that these counters are updated periodically, and in particular the .NET memory counters are only updated when a collection happens. That means that if no collection is happening, the counter is stuck at its current reading. As a result, things like the average values you see in Perfmon are not really telling you exactly what's happening inside your application, although they're admittedly better than nothing. To demonstrate some of this, I've written a simple C# program that has the same basic structure that we saw before: we make a collection object, assign it to a local variable, and we allocate instances of a small class; each instance will take about 12 bytes on x86. However, we constrain the allocation rate, and only allocate one of these objects every millisecond. Naturally, at this rate of accumulation, and given the capacity of Gen0 being 1 or 2 MB, it's going to take quite a few seconds before we fill Gen0 up and provoke a collection:
class Program
{
    static void Main(string[] args)
    {
        var accumulator = new List<Program>();
        while (true)
        {
            DateTime start = DateTime.Now;
            while ((DateTime.Now - start).TotalSeconds < 15)
            {
                accumulator.Add(new Program());
                Thread.Sleep(1);
            }
            Console.WriteLine(accumulator.Count);
        }
    }
}
Figure 7 Apparently spiking allocations

If you look at such a program running under Perfmon, instead of seeing a constant Allocated Bytes/sec counter (which is what we know our program is actually doing), the periodic nature of the measurement driving the counter makes it look as if the allocation rate is spiking whenever collections happen.
Figure 8 Visualizing the varying generation sizes

It's also important to remember that the runtime itself is measuring what's happening. Every time a collection happens, it works out what percentage of the objects survived in order to adapt, choosing optimal sizes for the various Generations and trying to maximize throughput. So, if you graph some of these things, like the various heap sizes, you'll find you get misleading figures. For example, you can see in Figure 8 that the system decided to enlarge Gen2 by a massive amount, and then chose later to shrink it down again. In short, even though Perfmon gives us this pseudo-realtime feel, what it shows us is not necessarily exactly how the application itself is behaving.
Getting Lower
In order to really see how your application is behaving, you need to dive into it and look at things at the object or type level, and there are several ways to do this.
The first way, which is illustrated in Figure 9 and Figure 10, is to use WinDbg, which is part of the Debugging Tools for Windows, and which you can attach to a running executable. The .NET framework itself comes with a debugger extension, called SOS, which you can load into WinDbg and which then allows you to scan the heaps and find details about the objects they contain. Essentially, loading that DLL makes a whole set of extra commands (which know about .NET memory layout) available to the debugger. In Figure 10, for example, we're dumping all objects of the type Program, and it tells us that there were 3953 instances of that type on the heap at the point when I took this snapshot. It also shows us that each instance is taking up 12 bytes of memory. Now, if we consider a particular instance, we can use commands like !gcroot to try and relate that object back to the root that's actually keeping it in memory, and that path will show us how it's being kept alive, which can be pretty useful information.
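A session of that kind looks roughly like the following. The object address shown is made up for illustration; in a real session you would pick an address from the !dumpheap output.

```
0:000> .loadby sos clr          load SOS from the same directory as the CLR (.NET 4+)
0:000> !dumpheap -type Program  list every heap object whose type name matches "Program"
0:000> !gcroot 0x0235a9f4       show the chain of references rooting that object
```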
Figure 11 ANTS Memory Profiler displaying performance counters

I just want to quickly mention in passing that there are other tools that allow you to do this; I'll inevitably nod to our own ANTS Memory Profiler as an example. All of these tools try to make it easier to deal with the vast amount of information available in memory debugging. In the case of ANTS, it first shows us the information from the various performance counters at the top of the screen (see Figure 11 above) in order to guide us to a point in time at which we might want to take a snapshot (which is a dump of all the objects in the heap). The profiler then has tools to allow you to compare your snapshots, to try and work out which objects have survived unexpectedly, and which objects have been allocated in vast numbers when you didn't expect it. It also allows us to do the root-finding trick we saw a moment ago, but in a much more graphical way (see Figure 12).
Figure 12 Using ANTS Memory Profiler to find an object's roots

So, to wrap up the discussion of performance counters: you can use WinDbg, which gives you lots of information but is hard to navigate, or you can use more graphical tools, which offer you filtering and a means to graphically explore the contents of the heap at the point when you took the snapshot. Either way, always bear in mind that the performance counters that these are based on are not necessarily representative of what's going on within your application in real time.
Old Definition
When you used malloc and free to manage memory yourself, a leak was what happened any time you forgot to do the free part. Basically, you'd allocate some memory, you'd do some work with it, and then you'd forget to release it back to the runtime system. Or maybe you were dealing with a very large data structure, and you couldn't work out what the actual root node into that structure was, which made it very hard for you to start freeing things. Or maybe you called into a library routine which gave you some objects back, and it wasn't quite clear whether it was you or the library that would later free those objects.
In addition, prematurely releasing objects was often fatal. Say you allocated an object, freed it, and then continued to try and use it. If the memory you were trying to access had since been allocated to a different object, you'd find yourself with two objects of different types competing over the same memory, and that would often cause things to go catastrophically wrong.
New Definition
The good news is that those days are gone, and the .NET runtime, which takes care of freeing objects for you, is also ultra-cautious. It works out whether it thinks a particular object is going to be needed while your program runs, and it will only release that object if it can completely guarantee that it is not going to be needed again. The difficulty, of course, is that it's hard to have an effective cost model in your head describing when the objects you allocate are actually going to be freed again, so understanding your own code can pose its own challenges! Moreover, while this managed memory is a boon, it also has loopholes which allow objects to live longer than they should.
Libraries
You'll find some libraries have caches within themselves to improve their performance, but they may not have a very good lifetime policy on those caches. So you might find that you're unintentionally keeping the last 50 results, or something like that. Even if you don't call into that library for a long time, the cache is still going to stay live, and all of those objects are still going to be around.
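A hypothetical library of this shape might look like the sketch below. All the names here are invented; the point is only that a static collection roots everything it holds for the life of the process, whether or not anyone calls into the library again.

```csharp
using System.Collections.Generic;

// An imaginary library with an internal "last 50 results" cache.
static class LookupLibrary
{
    // Static root: everything in this list stays live until the process
    // (or AppDomain) dies, even if Lookup is never called again.
    private static readonly List<string> s_recentResults = new List<string>();

    public static string Lookup(int key)
    {
        string result = "result-" + key; // stands in for some expensive work

        s_recentResults.Add(result);
        if (s_recentResults.Count > 50)   // keep only the last 50 results...
            s_recentResults.RemoveAt(0);  // ...but those 50 live forever

        return result;
    }

    public static int CachedCount => s_recentResults.Count;
}
```

From the caller's point of view nothing suggests that fifty result objects are being kept alive behind the scenes, which is exactly why this pattern shows up as a surprise in heap snapshots.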
The Compiler
My favorite of all of these problems has to do with the way the compiler translates more modern C# constructs to run on the CLR 2 infrastructure that the .NET framework provides. Closures and lambda expressions are a very good example. Lambda expressions are not represented as objects in themselves at the level of IL in the CLR, but as compiler-generated classes which are used to maintain references to what were the local variables.
class Program
{
    private static Func<int> s_LongLived;

    static void Main(string[] args)
    {
        var x = 20;
        var y = new int[20200];
        Func<int> getSum = () => x + y.Length;
        Func<int> getFirst = () => x;
        s_LongLived = getFirst;
    }
}
Listing 3 Illustrating compiler translation with a lambda expression

In this simple example, we have two local variables which are referenced by lambda expressions, one of which lives for a very long time by being put in a static field. Now, in order to make the lifetime of these local variables match the lifetime of the lambda function, the C# compiler wraps the local variables into a class, of which it makes an instance, and then represents the lambda functions as delegates on that class. So in the case of this example, even though the local variable y doesn't need to live for a long time (because the long-lived lambda expression only refers to the variable x), we'll find that, due to the way the C# compiler behaves, this large array will live for a very long time.
Figure 13 Using .NET Reflector to see how the C# compiler unintentionally keeps objects alive unnecessarily

If we look in .NET Reflector to see how that code is generated, we see the compiler has generated this extra display class (see Figure 13, above), and the local variables are actually represented as fields within that display class. However, no effort is made to clear out those fields, even when the system knows that their values can't be accessed in the future.
At the point that garbage collection happens, rather than moving these live objects into a new generation, the GC just makes a note of where they are, and then scans through the dead objects, noting their address ranges as free blocks which it'll try to use later for allocation requests.
Figure 15 The post-collection heap, with the free blocks noted by the GC

Once again, crucially, there is no copying; everything is left in place.
Some Observations
That has some advantages: for starters, there is no movement, so we don't have to do any fix-ups, and we don't need to bring other threads to a safe point in order to adjust their pointers. That also gives us a potential parallelism advantage. The trouble with this model is that it introduces potential fragmentation, and we'll see in just a moment what fragmentation really means.

The GC also has to decide at what point it actually collects this Large Object Heap, and the decision was to make this area synonymous with Gen2, at least for GC purposes. This means that whenever a collection of Gen2 occurs, so too does a collection of the Large Object Heap. The repercussion of that decision is that temporary large objects don't really fit into this model. For small objects, it was fine to create things temporarily, as they'd be very quickly and cheaply recycled, but for large objects that's clearly not the case. So if we create temporary large objects, it can be a very long time before a Gen2 collection is carried out, and those temporary objects will be holding memory throughout that period. We also saw that, in order to find the free blocks of memory, the GC has to walk all of the objects on this heap. This is much more expensive from a paging perspective, as we're actually touching a lot more of the live memory.
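You can observe the "LOH is synonymous with Gen2" decision directly: GC.GetGeneration reports generation 2 for a freshly allocated large object, while an equally fresh small object starts life in Gen0 (assuming no collection happens between the allocation and the check).

```csharp
using System;

class LohDemo
{
    static void Main()
    {
        // Small arrays go on the Small Object Heap, starting life in Gen0.
        var small = new byte[1000];

        // Arrays over the ~85k threshold go straight to the Large Object
        // Heap, which the runtime reports as generation 2 immediately.
        var large = new byte[100000];

        Console.WriteLine(GC.GetGeneration(small)); // typically 0
        Console.WriteLine(GC.GetGeneration(large)); // 2
    }
}
```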
Figure 16 An allocation pattern likely to generate a certain degree of fragmentation

After collection, those dead objects are all marked as free space, ready to be recycled:
The problem occurs when you try to allocate an object that's actually slightly bigger than any of these free blocks...
Figure 18 A hypothetical object, larger than any of the available free slots

Obviously, the GC finds that this new object won't fit into any of these free areas. So, even though there is enough free memory available in total to satisfy the request for an object of that size, the GC finds it doesn't actually have anywhere to put the new object, and is forced to grow the LOH to make the allocation possible:
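The arithmetic of that failure is easy to model. The toy class below (all names invented, and much simpler than the real free-list logic) tracks only the sizes of the free blocks and places a new object first-fit:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A toy model of LOH fragmentation: an allocation succeeds only if some
// single free block can hold it, regardless of the total free space.
class LohModel
{
    private readonly List<int> _freeBlocks = new List<int>();

    public void AddFreeBlock(int size) => _freeBlocks.Add(size);

    public int TotalFree => _freeBlocks.Sum();

    public bool TryAllocate(int size)
    {
        int i = _freeBlocks.FindIndex(b => b >= size);
        if (i < 0)
            return false; // fragmented: no single block is big enough
        _freeBlocks[i] -= size; // carve the object out of the chosen block
        return true;
    }
}
```

Three free blocks of 100,000 bytes each give 300,000 bytes free in total, yet a 150,000-byte allocation still fails, and on the real LOH the heap would have to grow instead.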
Conclusion
We've looked at five different issues you might have with .NET memory management, and tried to tell you a little of the story and history behind each of them. I think the conclusion is really that there's a lot going on inside the heap of your process, and ideally you need to be able to visualize what's going on in order to understand why things are being kept alive longer than you think.