<chapter id="implementation">
  <title>Low-level Implementation</title>
  <para>Details of Wine's Low-level Implementation...</para>

  <sect1 id="undoc-func">
    <title>Undocumented APIs</title>

    <para>
      Some background: On the i386 class of machines, stack entries are
      usually dword (4 bytes) in size, little-endian. The stack grows
      downward in memory. The stack pointer, maintained in the
      <literal>esp</literal> register, points to the last valid entry;
      thus, the operation of pushing a value onto the stack involves
      decrementing <literal>esp</literal> and then moving the value into
      the memory pointed to by <literal>esp</literal>
      (i.e., <literal>push p</literal> in assembly resembles
      <literal>*(--esp) = p;</literal> in C). Removing (popping)
      values off the stack is the reverse (i.e., <literal>pop p</literal>
      corresponds to <literal>p = *(esp++);</literal> in C).
    </para>

    <para>
      In the <literal>stdcall</literal> calling convention, arguments are
      pushed onto the stack right-to-left. For example, the C call
      <function>myfunction(40, 20, 70, 30);</function> is expressed in
      Intel assembly as:
      <screen>
push 30
push 70
push 20
push 40
call myfunction
      </screen>
      The called function is responsible for removing the arguments
      from the stack. Thus, before the call to
      <function>myfunction</function>, the stack would look like:
      <screen>
         [local variable or temporary]
         [local variable or temporary]
          30
          70
          20
  esp ->  40
      </screen>
      After the call returns, it should look like:
      <screen>
         [local variable or temporary]
  esp -> [local variable or temporary]
      </screen>
    </para>

    <para>
      To restore the stack to this state, the called function must know how
      many arguments to remove (which is the number of arguments it takes).
      This is a problem if the function is undocumented.
    </para>

    <para>
      One way to attempt to document the number of arguments each function
      takes is to create a wrapper around that function that detects the
      stack offset. Essentially, each wrapper assumes that the function will
      take a large number of arguments. The wrapper copies each of these
      arguments into its stack, calls the actual function, and then calculates
      the number of arguments by checking <literal>esp</literal> before and
      after the call.
    </para>

    <para>
      The main problem with this scheme is that the function must actually
      be called from another program. Many of these functions are seldom
      used. An attempt was made to aggressively query each function in a
      given library (<filename>ntdll.dll</filename>) by passing 64 arguments,
      all 0, to each function. Unfortunately, Windows NT quickly goes to a
      blue screen of death, even if the program is run from a
      non-administrator account.
    </para>

    <para>
      Another method that has been much more successful is to work out
      how many arguments each function removes from the stack by
      locating its return instruction. This instruction,
      <literal>ret hhll</literal> (where <symbol>hhll</symbol> is the
      number of bytes to remove, i.e. the number of arguments times 4),
      is encoded as the bytes <literal>0xc2 ll hh</literal> in memory.
      It is a reasonable assumption that few, if any, functions take
      more than 16 arguments; therefore, simply searching for
      <literal>hh == 0 &amp;&amp; ll &lt; 0x40</literal> starting from the
      address of a function yields the correct number of arguments most
      of the time.
    </para>

    <para>
      Of course, this is not without errors. <literal>ret 00ll</literal>
      is not the only instruction that can have the byte sequence
      <literal>0xc2 ll 0x0</literal>; for example,
      <literal>push 0x000040c2</literal> has the byte sequence
      <literal>0x68 0xc2 0x40 0x0 0x0</literal>, which matches
      the above. Properly, the utility should look for this sequence
      only on an instruction boundary; unfortunately, finding
      instruction boundaries on an i386 requires implementing a full
      disassembler -- quite a daunting task. Besides, the probability
      of having such a byte sequence that is not the actual return
      instruction is fairly low.
    </para>

    <para>
      Much more troublesome is the non-linear flow of a function. For
      example, consider the following two functions:
      <screen>
somefunction1:
        jmp  somefunction1_impl

somefunction2:
        ret  0004

somefunction1_impl:
        ret  0008
      </screen>
      In this case, we would incorrectly detect both
      <function>somefunction1</function> and
      <function>somefunction2</function> as taking only a single
      argument, whereas <function>somefunction1</function> really
      takes two arguments.
    </para>

    <para>
      With these limitations in mind, it is possible to implement more stubs
      in Wine and, eventually, the functions themselves.
    </para>
  </sect1>

  <sect1 id="accel-impl">
    <title>Accelerators</title>

    <para>
      There are <emphasis>three</emphasis> differently sized
      accelerator structures exposed to the user:
    </para>
    <orderedlist>
      <listitem>
        <para>
          Accelerators in NE resources. This is also the internal
          layout of the global handle <type>HACCEL</type> (16 and
          32) in Windows 95 and Wine. They are exposed to the user
          as Win16 global handles <type>HACCEL16</type> and
          <type>HACCEL32</type> by the Win16/Win32 API.
          These are 5 bytes long, with no padding:
          <programlisting>
BYTE   fVirt;
WORD   key;
WORD   cmd;
          </programlisting>
        </para>
      </listitem>
      <listitem>
        <para>
          Accelerators in PE resources. They are exposed to the user
          only by directly accessing PE resources.
          These have a size of 8 bytes:
        </para>
        <programlisting>
BYTE   fVirt;
BYTE   pad0;
WORD   key;
WORD   cmd;
WORD   pad1;
        </programlisting>
      </listitem>
      <listitem>
        <para>
          Accelerators in the Win32 API. These are exposed to the
          user by the <function>CopyAcceleratorTable</function>
          and <function>CreateAcceleratorTable</function> functions
          in the Win32 API.
          These have a size of 6 bytes:
        </para>
        <programlisting>
BYTE   fVirt;
BYTE   pad0;
WORD   key;
WORD   cmd;
        </programlisting>
      </listitem>
    </orderedlist>

    <para>
      Why two types of accelerators in the Win32 API? We can only
      guess, but my best bet is that the Win32 resource compiler
      cannot or does not handle struct packing. Win32 <type>ACCEL</type>
      is defined using <function>#pragma pack(2)</function> for the
      compiler but without any packing for RC, so it will assume
      <function>#pragma pack(4)</function>.
    </para>

  </sect1>

  <sect1 id="hardware-trace">
    <title>Doing A Hardware Trace</title>

    <para>
      The primary reason to do this is to reverse engineer a
      hardware device for which you don't have documentation, but
      can get to work under Wine.
    </para>
    <para>
      This section is aimed at parallel port devices, and in particular
      parallel port scanners, which are now so cheap they are
      virtually being given away. The problem is that few
      manufacturers will release any programming information, which
      prevents drivers being written for SANE, and the traditional
      technique of using DOSemu to produce the traces does not work
      as the scanners invariably only have drivers for Windows.
    </para>
    <para>
      Presuming that you have compiled and installed Wine, the first
      thing to do is to enable direct hardware access to your
      parallel port. To do this, edit <filename>config</filename>
      (usually in <filename>~/.wine/</filename>) and in the
      ports section add the following two lines:
    </para>
    <programlisting>
read=0x378,0x379,0x37a,0x37c,0x77a
write=0x378,0x379,0x37a,0x37c,0x77a
    </programlisting>
    <para>
      This adds the necessary access required for an SPP/PS2/EPP/ECP
      parallel port on LPT1. You will need to adjust these numbers
      accordingly if your parallel port is on LPT2 or LPT0.
    </para>
    <para>
      When starting Wine use the following command line, where
      <literal>XXXX</literal> is the program you need to run in
      order to access your scanner, and <literal>YYYY</literal> is
      the file your trace will be stored in:
    </para>
    <programlisting>
wine -debugmsg +io XXXX 2&gt; &gt;(sed 's/^[^:]*:io:[^ ]* //' &gt; YYYY)
    </programlisting>
    <para>
      You will need large amounts of hard disk space (read hundreds
      of megabytes if you do a full page scan), and for reasonable
      performance a really fast processor and lots of RAM.
    </para>
    <para>
      You will need to postprocess the output into a more manageable
      format, using the <command>shrink</command> program. First
      you need to compile the source (which is located at the end of
      this section):
      <programlisting>
cc shrink.c -o shrink
      </programlisting>
    </para>
    <para>
      Use the <command>shrink</command> program to reduce the
      physical size of the raw log as follows:
    </para>
    <programlisting>
cat log | shrink &gt; log2
    </programlisting>
    <para>
      The trace has the basic form of
    </para>
    <programlisting>
XXXX &gt; YY @ ZZZZ:ZZZZ
    </programlisting>
    <para>
      where <literal>XXXX</literal> is the port in hexadecimal being
      accessed, <literal>YY</literal> is the data written to (or read
      from) the port, and <literal>ZZZZ:ZZZZ</literal> is the address
      in memory of the instruction that accessed the port. The
      direction of the arrow indicates whether the data was written
      or read from the port.
    </para>
    <programlisting>
&gt; data was written to the port
&lt; data was read from the port
    </programlisting>
    <para>
      My basic tip for interpreting these logs is to pay close
      attention to the addresses of the I/O instructions. Their
      grouping and sometimes proximity should reveal the presence of
      subroutines in the driver. By studying the different versions
      you should be able to work them out. For example, consider the
      following section of trace from my UMAX Astra 600P:
    </para>
285 <programlisting>
286 0x378 &gt; 55 @ 0297:01ec
287 0x37a &gt; 05 @ 0297:01f5
288 0x379 &lt; 8f @ 0297:01fa
289 0x37a &gt; 04 @ 0297:0211
290 0x378 &gt; aa @ 0297:01ec
291 0x37a &gt; 05 @ 0297:01f5
292 0x379 &lt; 8f @ 0297:01fa
293 0x37a &gt; 04 @ 0297:0211
294 0x378 &gt; 00 @ 0297:01ec
295 0x37a &gt; 05 @ 0297:01f5
296 0x379 &lt; 8f @ 0297:01fa
297 0x37a &gt; 04 @ 0297:0211
298 0x378 &gt; 00 @ 0297:01ec
299 0x37a &gt; 05 @ 0297:01f5
300 0x379 &lt; 8f @ 0297:01fa
301 0x37a &gt; 04 @ 0297:0211
302 0x378 &gt; 00 @ 0297:01ec
303 0x37a &gt; 05 @ 0297:01f5
304 0x379 &lt; 8f @ 0297:01fa
305 0x37a &gt; 04 @ 0297:0211
306 0x378 &gt; 00 @ 0297:01ec
307 0x37a &gt; 05 @ 0297:01f5
308 0x379 &lt; 8f @ 0297:01fa
309 0x37a &gt; 04 @ 0297:0211
310 </programlisting>
    <para>
      As you can see, there is a repeating structure starting at
      address <literal>0297:01ec</literal> that consists of four I/O
      accesses on the parallel port. The first access writes a
      changing byte to the data port; the second always writes the
      byte <literal>0x05</literal> to the control port; then a value,
      which always seems to be <literal>0x8f</literal>, is read from
      the status port; and finally the byte <literal>0x04</literal>
      is written to the control port. By studying this and other
      sections of the trace we can write a C routine that emulates
      this, shown below with some macros to make reading and writing
      the parallel port registers easier to follow.
    </para>
    <programlisting>
/* inb() and outb() are declared here on Linux/x86 (glibc) */
#include &lt;sys/io.h&gt;

#define r_dtr(x)        inb(x)
#define r_str(x)        inb((x)+1)
#define r_ctr(x)        inb((x)+2)
#define w_dtr(x,y)      outb(y, x)
#define w_str(x,y)      outb(y, (x)+1)
#define w_ctr(x,y)      outb(y, (x)+2)

/* Seems to be sending a command byte to the scanner */
int udpp_put(int udpp_base, unsigned char command)
{
    int loop, value;

    w_dtr(udpp_base, command);
    w_ctr(udpp_base, 0x05);

    /* Poll the status port until bit 7 is set, at most 10 times */
    for (loop=0; loop &lt; 10; loop++)
        if ((value = r_str(udpp_base)) & 0x80)
        {
            w_ctr(udpp_base, 0x04);
            return value & 0xf8;
        }

    /* Bit 7 never came up: flag the failure in bit 0 */
    return (value & 0xf8) | 0x01;
}
    </programlisting>
    <para>
      For the UMAX Astra 600P only seven such routines exist (well,
      14 really: seven for SPP and seven for EPP). Whether you
      choose to disassemble the driver at this point to verify the
      routines is your own choice. If you do, the addresses from the
      trace should help in locating them in the disassembly.
    </para>
    <para>
      You will probably then find it useful to write a script or a
      Perl or C program to analyse the log file and decode it further,
      as this can reveal higher-level groupings of the low-level
      routines. For example, the logs from my UMAX Astra 600P, when
      decoded further, reveal the following (this is a small snippet):
    </para>
    <programlisting>
start:
put: 55 8f
put: aa 8f
put: 00 8f
put: 00 8f
put: 00 8f
put: c2 8f
wait: ff
get: af,87
wait: ff
get: af,87
end: cc
start:
put: 55 8f
put: aa 8f
put: 00 8f
put: 03 8f
put: 05 8f
put: 84 8f
wait: ff
    </programlisting>
    <para>
      From this it is easy to see that the <varname>put</varname>
      routine is often grouped together in five successive calls
      sending information to the scanner. Once these are understood
      it should be possible to process the logs further to show the
      higher level routines in an easy to see format. Once the
      highest level format that you can derive from this process is
      understood, you then need to produce a series of scans varying
      only one parameter between them, so you can discover how to
      set the various parameters for the scanner.
    </para>

    <para>
      The following is the <filename>shrink.c</filename> program:
      <programlisting>
/* Copyright David Campbell &lt;campbell@torque.net&gt; */
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;

int
main (void)
{
  char buff[256], lastline[256];
  int count;

  count = 0;
  lastline[0] = 0;

  /* Test the result of fgets() rather than feof(), so that the
     final line is not processed twice. */
  while (fgets (buff, sizeof (buff), stdin) != NULL)
    {
      if (strcmp (buff, lastline) == 0)
        {
          count++;
        }
      else
        {
          if (count &gt; 1)
            fprintf (stdout, "# Last line repeated %i times #\n", count);
          fprintf (stdout, "%s", buff);
          strcpy (lastline, buff);
          count = 1;
        }
    }

  /* Report a run of repeats that ends the input. */
  if (count &gt; 1)
    fprintf (stdout, "# Last line repeated %i times #\n", count);

  return 0;
}
      </programlisting>
    </para>
  </sect1>

</chapter>

<!-- Keep this comment at the end of the file
Local variables:
mode: sgml
sgml-parent-document:("wine-devel.sgml" "set" "book" "part" "chapter" "")
End:
-->