* Writing better performing .NET and Mono applications

<center>
Miguel de Icaza (miguel@novell.com)<br>
Ben Maurer (bmaurer@users.sourceforge.net)
</center>

        The following document contains a few hints on how to improve
        the performance of your Mono/.NET applications.

        These are just guidelines, and you should still profile your
        code to find the actual performance problems in your
        application. It is never a smart idea to make a change in the
        hope of improving the performance of your code without first
        measuring. In general, these guidelines should serve as ideas
        to help you figure out `how can I make this method run faster?'

        It is up to you to figure out `which method is running slowly?'

** Using the Mono profiler

        So, how does one measure which methods are running slowly? A
        profiler helps with this task. Mono includes a profiler that is
        built into the runtime system. You can invoke this profiler on
        your program by running it with the --profile flag.

<pre class="shell">
        mono --profile program.exe
</pre>

        The above will instruct Mono to instrument your application
        for profiling.  The default Mono profiler will record the time
        spent in a routine, the number of times the routine was called,
        the memory consumed by each method broken down by invoker, and
        the total amount of memory consumed.

        It does this by asking the JIT to insert a call to the profiler
        every time a method is entered or left. The profiler measures
        the time elapsed between the beginning and the end of the
        call. The profiler is also notified of allocations.

        When the program has finished executing, the profiler prints the
        data in human-readable format. It looks like:

<pre class="shell">
Total time spent compiling 227 methods (sec): 0.07154
Slowest method to compile (sec): 0.01893: System.Console::.cctor()
Time(ms) Count   P/call(ms) Method name
########################
  91.681       1   91.681   .DebugOne::Main()
  Callers (with count) that contribute at least for 1%:
           1  100 % .DebugOne::Main(object,intptr,intptr)
...
Total number of calls: 3741
...
Allocation profiler
Total mem Method
########################
     406 KB .DebugOne::Main()
         406 KB     1000 System.Int32[]
  Callers (with count) that contribute at least for 1%:
           1  100 % .DebugOne::Main(object,intptr,intptr)
Total memory allocated: 448 KB
</pre>

        At the top, the report lists each method that was called. The
        entries are sorted by the total time that the program spent
        within the method. Each entry then shows how many times the
        method was called, and the average time per call.

        Below this, it shows the top callers of the method. This is very useful
        data. If you find, for example, that the method Data::Computate () takes
        a very long time to run, you can look to see if any of the calls can be
        avoided.

        Two warnings must be given about the method data. First,
        the profiler itself adds overhead to every call. As a result,
        a method that is called a very large number of times may show
        up as consuming lots of time, when in reality it does not
        consume much time at all. If you see a method that has a very
        high number of calls, you may be able to ignore it. However,
        do consider removing calls if possible, as that will sometimes
        help performance. This problem is often seen with the use
        of the built-in collection types.

        Second, due to the nature of the profiler, recursive calls
        show extremely large times (because the profiler double counts
        when the method calls itself). One easy way to spot this
        problem: if a method is shown as taking more time than the
        Main method, it is very likely recursive.

        Below the method data, allocation data is shown. This shows
        how much memory each method allocates. The number beside
        the method is the total amount of memory. Below that, the
        total is broken down by type. Then, the caller data is given.
        This data is again useful when you want to figure out which
        calls to eliminate.

        You might want to keep a close eye on the memory consumption
        and on the method invocation counts.   A lot of the
        performance gains in MCS, for example, came from reducing its
        memory usage, as opposed to changes in the execution path.
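
        To make the sample report above more concrete, here is a
        sketch of a program that could produce a similar allocation
        profile. The original test program is not shown in this
        document, so the body below is a guess: it assumes 1000
        arrays of 100 ints each (roughly 400,000 bytes of element
        data plus per-array overhead, in the same range as the 406 KB
        reported). The helper AllocateArrays is hypothetical; with it,
        the allocations would be attributed to that method rather
        than directly to Main.

<pre class="shell">
using System;

// Hypothetical program: the class name matches the sample report,
// but the body is only a guess at what produced it.
class DebugOne {
        // Allocates `count' arrays of `length' ints each and returns
        // the total number of elements allocated.
        public static int AllocateArrays (int count, int length)
        {
                int total = 0;
                for (int i = 0; i < count; i++) {
                        // Each allocation is reported as System.Int32[]
                        int [] data = new int [length];
                        total += data.Length;
                }
                return total;
        }

        static void Main ()
        {
                Console.WriteLine (AllocateArrays (1000, 100));
        }
}
</pre>

        Running a program like this under mono --profile should list
        the System.Int32[] allocations under the allocating method,
        which is the kind of entry to look for when hunting for
        memory-hungry code.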

** Profiling without JIT instrumentation

        You might also be interested in using mono --aot to generate
        precompiled code, and then using a system like `oprofile' to
        profile your programs.

** Memory Management in the .NET/Mono world

        Since Mono and .NET offer automatic garbage collection,
        programmers are freed from having to track and dispose of the
        objects they use (except for classes that implement
        IDisposable).   This is a great productivity gain, but if you
        create thousands of objects, the garbage collector has to do
        more work, and that might slow down your application.

        Remember, each time you allocate an object, the GC is forced
        to find space for the object. Each object has an 8-byte overhead
        (4 bytes to tell what type it is, then 4 bytes for a sync
        block). If the GC finds that it is running out of room, it will
        scan every object for pointers, looking for unreferenced
        objects. Every extra object you allocate is one more object
        that the GC must later make the effort to free.

        Mono uses the Boehm GC, which is a conservative collector,
        and this might lead to some memory fragmentation. Unlike
        generational GC systems, it also has to scan the entire
        allocated memory pool.
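
        As an illustration of the allocation cost described above,
        the following sketch contrasts a loop that allocates a new
        object on every iteration with one that reuses a single
        object. The class and method names are made up for this
        example; whether the difference matters in your application
        is something only a profile can tell you.

<pre class="shell">
using System;
using System.Text;

// Hypothetical example: both methods compute the same result,
// but the second one gives the GC far fewer objects to track.
class ReuseBuffer {
        // Allocates a fresh StringBuilder for every word.
        public static int LengthsWithFreshBuilder (string [] words)
        {
                int total = 0;
                foreach (string w in words) {
                        StringBuilder sb = new StringBuilder ();
                        sb.Append ('[').Append (w).Append (']');
                        total += sb.Length;
                }
                return total;
        }

        // Allocates one StringBuilder and resets it between words.
        public static int LengthsWithSharedBuilder (string [] words)
        {
                int total = 0;
                StringBuilder sb = new StringBuilder ();
                foreach (string w in words) {
                        sb.Length = 0;  // reset instead of reallocating
                        sb.Append ('[').Append (w).Append (']');
                        total += sb.Length;
                }
                return total;
        }

        static void Main ()
        {
                string [] words = new string [] { "a", "bc" };
                Console.WriteLine (LengthsWithFreshBuilder (words));
                Console.WriteLine (LengthsWithSharedBuilder (words));
        }
}
</pre>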

*** Boxing
        The .NET framework provides a rich hierarchy of object types.
        Each object not only has value information, but also type
        information associated with it. This type information makes
        many types of programs easier to write. It also has a cost
        associated with it: the type information takes up space.

        In order to reduce the cost of type information, almost every
        object-oriented language has the concept of `primitives'.
        They usually map to types such as integers and booleans. These
        types do not have any type information associated with them.

        However, the language also must be able to treat primitives
        as first-class data, on the same footing as objects. Languages
        handle this issue in different ways. Some choose to make a
        special class for each primitive, and force the user to do an
        operation such as:
<pre class="shell">
// This is Java
list.add (new Integer (1));
System.out.println (((Integer) list.get (0)).intValue ());
</pre>

        The C# design team was not satisfied with this type
        of construct. They added a notion of `boxing' to the language.

        Boxing performs the same thing as Java's <code>new Integer (1)</code>.
        The user is not forced to write the extra code. However,
        behind the scenes the <em>same thing</em> is being done
        by the runtime. Each time a primitive is cast to an object,
        a new object is allocated.

        You must be careful when casting a primitive to an object.
        Note that because it is an implicit conversion, you will
        not see it in your code. For example, boxing is happening here:

<pre class="shell">
ArrayList foo = new ArrayList ();
foo.Add (1);
</pre>

        In high-performance code, this operation can be very costly.
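
        The following sketch makes the hidden allocation visible and
        shows one way to avoid it. The class and method names are
        made up for this example. It relies on the fact that every
        boxing conversion produces a separate object, so two boxes of
        the same value are never the same reference.

<pre class="shell">
using System;
using System.Collections;

// Hypothetical example illustrating where boxing happens.
class BoxingDemo {
        // Each conversion of an int to object allocates its own box.
        public static bool BoxesAreDistinct (int value)
        {
                object a = value;       // boxes
                object b = value;       // boxes again
                return !Object.ReferenceEquals (a, b);
        }

        // ArrayList stores object references, so every Add boxes.
        public static int SumBoxed (int count)
        {
                ArrayList list = new ArrayList ();
                for (int i = 0; i < count; i++)
                        list.Add (i);           // implicit boxing
                int sum = 0;
                for (int i = 0; i < list.Count; i++)
                        sum += (int) list [i];  // unboxing
                return sum;
        }

        // An int [] stores the values directly: no boxing at all.
        public static int SumUnboxed (int count)
        {
                int [] values = new int [count];
                for (int i = 0; i < count; i++)
                        values [i] = i;
                int sum = 0;
                for (int i = 0; i < values.Length; i++)
                        sum += values [i];
                return sum;
        }

        static void Main ()
        {
                Console.WriteLine (BoxesAreDistinct (1));
                Console.WriteLine (SumBoxed (10));
                Console.WriteLine (SumUnboxed (10));
        }
}
</pre>

        Both sums give the same result, but SumBoxed allocates one
        object per element while SumUnboxed allocates only the array.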

*** Using structs instead of classes for small objects

        For small objects, you might want to consider using value
        types (structs) instead of objects (classes).

        However, you must be careful not to use the struct as an
        object; in that case it will be boxed, and will actually be
        more costly.

        As a rule of thumb, only use structs if you have a small
        number of fields (totaling less than 32 bytes), and
        need to pass the item `by value'. You should not box the struct.
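
        A minimal sketch of a struct that fits this rule of thumb
        follows. The Point type and the helper methods are
        hypothetical. The struct has two int fields (8 bytes), is
        copied when passed to a method, and gets boxed as soon as it
        is stored in an object reference.

<pre class="shell">
using System;

// Hypothetical small value type: two ints, 8 bytes of data.
struct Point {
        public int X;
        public int Y;

        public Point (int x, int y)
        {
                X = x;
                Y = y;
        }
}

class StructDemo {
        // The parameter is a copy, so the caller's Point is unchanged.
        public static int MovedX (Point p, int dx)
        {
                p.X += dx;
                return p.X;
        }

        // Converting the struct to object boxes it: each conversion
        // allocates a new object on the heap.
        public static bool BoxedCopiesAreDistinct (Point p)
        {
                object a = p;
                object b = p;
                return !Object.ReferenceEquals (a, b);
        }

        static void Main ()
        {
                Point origin = new Point (1, 2);
                Console.WriteLine (MovedX (origin, 10));
                Console.WriteLine (origin.X);
                Console.WriteLine (BoxedCopiesAreDistinct (origin));
        }
}
</pre>

        Storing such a struct in an ArrayList, or passing it to a
        method that takes object, triggers the same boxing as in
        BoxedCopiesAreDistinct.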

*** Assisting the Garbage Collector

        Although the Garbage Collector will do the right thing in
        terms of releasing and finalizing objects on time, you can
        assist it by clearing the fields that point to objects you no
        longer need.  This means that some objects might become
        eligible for collection earlier than they otherwise would,
        which can help reduce the memory consumption and the work
        that the GC has to do.
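
        As a sketch of this idea, the hypothetical class below keeps
        a large temporary buffer in a field and sets the field to
        null once the buffer is no longer needed. The names are made
        up for this example, and when the memory is actually
        reclaimed remains up to the collector.

<pre class="shell">
using System;

// Hypothetical long-lived object that needs a large buffer only briefly.
class ReportBuilder {
        byte [] scratch = new byte [1024 * 1024];
        int checksum;

        public bool HoldsScratch {
                get { return scratch != null; }
        }

        public int Checksum {
                get { return checksum; }
        }

        // Uses the buffer once, then drops the reference so the buffer
        // can be collected even though this object stays alive.
        public void Build ()
        {
                for (int i = 0; i < scratch.Length; i++)
                        scratch [i] = (byte) (i & 0xff);
                checksum = scratch [scratch.Length - 1];
                scratch = null;
        }

        static void Main ()
        {
                ReportBuilder builder = new ReportBuilder ();
                builder.Build ();
                Console.WriteLine (builder.HoldsScratch);
                Console.WriteLine (builder.Checksum);
        }
}
</pre>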

** Common problems with <tt>foreach</tt>

        The <tt>foreach</tt> C# statement handles various kinds of
        different constructs (about seven different code patterns are
        generated).   Typically foreach generates more efficient code
        than loops constructed manually, and also ensures that objects
        which implement IDisposable are properly released.

        But foreach sometimes generates code that performs badly
        under stress.  Foreach performs badly when it is used in tight
        loops, and its use leads to the creation of many enumerators.
        Although technically obtaining an enumerator for some objects
        like ArrayList is more efficient than using the ArrayList
        indexer, the pressure introduced by the extra memory
        requirements and the demands on the garbage collector makes it
        less efficient overall.

        There is no straightforward rule on when to use foreach, and
        when to use a manual loop.  The best thing to do is to always
        use foreach, and only when profiling shows a problem, replace
        foreach with for loops.
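
        If a profile does point at a foreach over an ArrayList, the
        replacement could look like the sketch below. The class and
        method names are hypothetical. Both methods return the same
        sum; the first obtains an enumerator object from the
        ArrayList on every call, while the second uses only the
        indexer.

<pre class="shell">
using System;
using System.Collections;

// Hypothetical example: two equivalent ways to sum an ArrayList of ints.
class LoopDemo {
        // foreach: the ArrayList hands out a new enumerator per call.
        public static int SumForeach (ArrayList list)
        {
                int sum = 0;
                foreach (int n in list)
                        sum += n;
                return sum;
        }

        // Manual loop: no enumerator is created, only indexer calls.
        public static int SumFor (ArrayList list)
        {
                int sum = 0;
                int count = list.Count;
                for (int i = 0; i < count; i++)
                        sum += (int) list [i];
                return sum;
        }

        static void Main ()
        {
                ArrayList list = new ArrayList ();
                list.Add (1);
                list.Add (2);
                list.Add (3);
                Console.WriteLine (SumForeach (list));
                Console.WriteLine (SumFor (list));
        }
}
</pre>

        The manual version only pays off when the method is called
        very often, so measure before and after making the change.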