* Writing better performing .NET and Mono applications

	Miguel de Icaza (miguel@novell.com)<br>
	Ben Maurer (bmaurer@users.sourceforge.net)

The following document contains a few hints on how to improve
the performance of your Mono/.NET applications.

These are just guidelines, and you should still profile your
code to find the actual performance problems in your
application. It is never a smart idea to make a change in the
hope of improving performance without measuring first. In
general, these guidelines should serve as ideas to help you
figure out `how can I make this method run faster?'; it is up
to you to figure out `which method is running slowly?'.

** Using the Mono profiler

So, how does one measure which methods are running slowly? A
profiler helps with this task. Mono includes a profiler that is
built into the runtime system. You can invoke this profiler on
your program by running with the --profile flag:

	mono --profile program.exe

The above will instruct Mono to instrument your application
for profiling. The default Mono profiler will record the time
spent in a routine, the number of times the routine was called,
the memory consumed by each method broken down by invoker, and
the total amount of memory consumed.

It does this by asking the JIT to insert a call to the profiler
every time a method is entered or left. The profiler measures
the time elapsed between the beginning and the end of the call.
The profiler is also notified of allocations.

When the program has finished executing, the profiler prints
the data in human-readable format. It looks like:

	Total time spent compiling 227 methods (sec): 0.07154
	Slowest method to compile (sec): 0.01893: System.Console::.cctor()
	Time(ms) Count   P/call(ms) Method name
	########################
	  91.681       1   91.681   .DebugOne::Main()
	  Callers (with count) that contribute at least for 1%:
	       1  100 % .DebugOne::Main(object,intptr,intptr)
	Total number of calls: 3741

	Allocation profiler
	Total mem Method
	########################
	     406 KB .DebugOne::Main()
	         406 KB    1000 System.Int32[]
	  Callers (with count) that contribute at least for 1%:
	       1  100 % .DebugOne::Main(object,intptr,intptr)
	Total memory allocated: 448 KB

At the top, it shows each method that is called. The data is
sorted by the total time that the program spent within the
method. Then it shows how many times the method was called,
and the average time per call.

Below this, it shows the top callers of the method. This is
very useful data. If you find, for example, that the method
Data::Computate () takes a very long time to run, you can look
to see if any of the calls can be avoided.

Two warnings must be given about the method data. First, the
profiler has an overhead associated with it. As such, a high
number of calls to a method may show up as consuming lots of
time, when in reality they do not consume much time at all. If
you see a method that has a very high number of calls, you may
be able to ignore it. However, do consider removing calls if
possible, as that will sometimes help performance. This problem
is often seen with the use of built-in collection types.

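As a hedged sketch (the ArrayList and the SumAll method below
are made up for illustration, not taken from the profile above),
one easy way to cut down a very large call count is to hoist a
repeated collection call out of the loop:

	using System.Collections;

	class CountExample {
		static int SumAll (ArrayList items)
		{
			int sum = 0;

			// Calling items.Count on every iteration shows up in the
			// profile as thousands of tiny calls; caching it removes
			// that noise and sometimes helps performance as well.
			int count = items.Count;
			for (int i = 0; i < count; i++)
				sum += (int) items [i];
			return sum;
		}
	}
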
Secondly, due to the nature of the profiler, recursive calls
have extremely large times (because the profiler double counts
when a method calls itself). One easy way to spot this: if a
method is shown as taking more time than the Main method, it is
very likely recursive and being double counted.

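For example (a made-up method, not part of the output shown
earlier), a recursive routine like the following is timed once
per nested invocation, and each parent's time includes its
children, so the profiler can report it as taking more time
than Main itself:

	class Fib {
		// Every recursive invocation is timed separately, and a
		// parent's time includes the time of its children, so the
		// reported total for Compute is far larger than the real
		// wall-clock time spent in it.
		static int Compute (int n)
		{
			if (n < 2)
				return n;
			return Compute (n - 1) + Compute (n - 2);
		}

		static void Main ()
		{
			System.Console.WriteLine (Compute (30));
		}
	}
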
Below the method data, allocation data is shown. This shows
how much memory each method allocates. The number beside the
method is the total amount of memory. Below that, it is broken
down into types. Then, the caller data is given. This data is
again useful when you want to figure out how to eliminate calls.

You might want to keep a close eye on the memory consumption
and on the method invocation counts. A lot of the performance
gains in MCS, for example, came from reducing its memory usage,
as opposed to changes in the execution path.

** Profiling without JIT instrumentation

You might also be interested in using mono --aot to generate
precompiled code, and then use a system like `oprofile' to
profile your programs.

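A minimal sketch of that workflow (the oprofile side is system
specific, so only the Mono commands are shown): precompile the
assembly, then run the program as usual. When the generated
native image is present, Mono uses it, and a system-wide
profiler can then attribute samples to the precompiled code.

	mono --aot program.exe
	mono program.exe
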
** Memory Management in the .NET/Mono world.

Since Mono and .NET offer automatic garbage collection, the
programmer is freed from having to track and dispose of the
objects they consume (except for IDisposable-like classes).
This is a great productivity gain, but if you create thousands
of objects, that will make the garbage collector do more work,
and it might slow down your application.

Remember, each time you allocate an object, the GC is forced
to find space for the object. Each object has an 8 byte
overhead (4 to tell what type it is, then 4 for a sync block).
If the GC finds that it is running out of room, it will scan
every object for pointers, looking for unreferenced objects.
If you allocate extra objects, the GC must then do the extra
work of freeing them.

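As a rough sketch (the method names are only for illustration),
building a string by repeated concatenation creates a new
intermediate string on every iteration, while reusing a single
StringBuilder produces far fewer objects for the GC to track:

	using System.Text;

	class Concat {
		// Allocates a new, larger string on each iteration; every
		// intermediate string immediately becomes garbage.
		static string Slow (string[] parts)
		{
			string result = "";
			foreach (string s in parts)
				result += s;
			return result;
		}

		// A single StringBuilder grows its internal buffer instead,
		// so the GC has far fewer objects to find space for and free.
		static string Fast (string[] parts)
		{
			StringBuilder sb = new StringBuilder ();
			foreach (string s in parts)
				sb.Append (s);
			return sb.ToString ();
		}
	}
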
Mono uses the Boehm GC, which is a conservative collector, and
this might lead to some memory fragmentation; also, unlike
generational GC systems, it has to scan the entire allocated
memory.

*** Boxing

The .NET framework provides a rich hierarchy of object types.
Each object not only has value information, but also type
information associated with it. This type information makes
many kinds of programs easier to write. It also has a cost
associated with it: the type information takes up space.

In order to reduce the cost of type information, almost every
object-oriented language has the concept of `primitives'.
They usually map to types such as integers and booleans. These
types do not have any type information associated with them.

However, the language also must be able to treat primitives
as first class datums -- on the same footing as objects.
Languages handle this issue in different ways. Some choose to
make a special class for each primitive, and force the user to
do an explicit conversion. For example, in Java you would write:

	list.add (new Integer (1));
	System.out.println (list.get (1).intValue ());

The C# design team was not satisfied with this type of
construct. They added a notion of `boxing' to the language.

Boxing performs the same operation as Java's <code>new Integer (1)</code>.
The user is not forced to write the extra code. However,
behind the scenes the <em>same thing</em> is being done by the
runtime. Each time a primitive is cast to an object, a new
object is allocated.

You must be careful when casting a primitive to an object.
Note that because it is an implicit conversion, you will not
see it in your code. For example, boxing is happening here:

	ArrayList foo = new ArrayList ();
	foo.Add (1);

In high performance code, this operation can be very costly.

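A small sketch of the cost (the names are illustrative): every
Add of an int into an ArrayList boxes the value and allocates a
small object, while a plain int[] involves no boxing at all:

	using System.Collections;

	class BoxingCost {
		static void Main ()
		{
			ArrayList list = new ArrayList ();
			for (int i = 0; i < 1000; i++)
				list.Add (i);     // each int is boxed: 1000 small objects

			int[] array = new int [1000];
			for (int i = 0; i < 1000; i++)
				array [i] = i;    // no boxing, one allocation for the array
		}
	}
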
*** Using structs instead of classes for small objects

For small objects, you might want to consider using value
types (structs) instead of objects (classes).

However, you must be careful that you do not use the struct
as an object; in that case it will actually be more costly.

As a rule of thumb, only use structs if you have a small
number of fields (totaling less than 32 bytes), and you need
to pass the item `by value'. You should not box the object.

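A minimal sketch, assuming a small value that is always passed
by value and never cast to object (the Point type here is
hypothetical):

	// Two ints, well under the 32 byte guideline.  Instances live
	// on the stack or inline in their container, with no per-object
	// header, and are copied when passed around.
	struct Point {
		public int X, Y;

		public Point (int x, int y)
		{
			X = x;
			Y = y;
		}
	}

	// The equivalent class would allocate every Point on the GC heap
	// and add the 8 byte object header to each instance.
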
*** Assisting the Garbage Collector

Although the Garbage Collector will do the right thing in
terms of releasing and finalizing objects on time, you can
assist it by clearing the fields that point to objects. This
means that some objects might become eligible for collection
earlier than they otherwise would, which can help reduce the
memory consumption and reduce the work that the GC has to do.

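For instance (the field and method names are made up), clearing
a field that holds a large object as soon as it is no longer
needed makes the object collectable right away instead of
keeping it alive for the lifetime of its container:

	class Report {
		byte[] rawData;          // large buffer, only needed while parsing

		public void Process ()
		{
			Parse (rawData);

			// The buffer is not needed after parsing; dropping the
			// reference lets the GC reclaim it on the next collection.
			rawData = null;
		}

		void Parse (byte[] data)
		{
			// ... use the data ...
		}
	}
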
** Common problems with <tt>foreach</tt>

The <tt>foreach</tt> C# statement handles several different
constructs (about seven different code patterns are generated).
Typically foreach generates more efficient code than loops
constructed manually, and it also ensures that objects which
implement IDisposable are properly released.

But foreach sometimes might generate code that performs badly
under stress. Foreach performs badly when it is used in tight
loops and its use leads to the creation of many enumerators.
Although technically obtaining an enumerator for some objects
like ArrayList is more efficient than using the ArrayList
indexer, the pressure introduced due to the extra memory
requirements and the demands on the garbage collector make it
perform worse overall.

There is no straightforward rule on when to use foreach and
when to use a manual loop. The best thing to do is to always
use foreach, and only when a profile shows a problem, replace
foreach with for loops.

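If the profile does point at an enumerator-heavy inner loop,
the replacement is mechanical. A sketch (the names are
illustrative):

	using System.Collections;

	class Loops {
		// Convenient, but each call allocates an enumerator object;
		// in a tight inner loop that adds pressure on the GC.
		static int SumForeach (ArrayList items)
		{
			int sum = 0;
			foreach (int i in items)
				sum += i;
			return sum;
		}

		// The manual loop uses the indexer and allocates nothing.
		static int SumFor (ArrayList items)
		{
			int sum = 0;
			for (int i = 0; i < items.Count; i++)
				sum += (int) items [i];
			return sum;
		}
	}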