1 # Rust Coding Standards
3 You MUST follow the standards laid out in `doc/HACKING/CodingStandards.md`,
6 ## Module/Crate Declarations
8 Each Tor C module which is being rewritten MUST be in its own crate.
9 See the structure of `src/rust` for examples.
11 In your crate, you MUST use `lib.rs` ONLY for pulling in external
12 crates (e.g. `extern crate libc;`) and exporting public objects from
13 other Rust modules (e.g. `pub use mymodule::foo;`). For example, if
14 you create a crate in `src/rust/yourcrate`, your Rust code should
15 live in `src/rust/yourcrate/yourcode.rs` and the public interface
16 to it should be exported in `src/rust/yourcrate/lib.rs`.
18 If your code is to be called from Tor C code, you MUST define a safe
19 `ffi.rs`. See the "Safety" section further down for more details.
21 For example, in a hypothetical `tor_addition` Rust module:
23 In `src/rust/tor_addition/addition.rs`:
25 pub fn get_sum(a: i32, b: i32) -> i32 {
29 In `src/rust/tor_addition/lib.rs`:
33 In `src/rust/tor_addition/ffi.rs`:
36 pub extern "C" fn tor_get_sum(a: c_int, b: c_int) -> c_int {
40 If your Rust code must call out to parts of Tor's C code, you must
41 declare the functions you are calling in the `external` crate, located
42 at `src/rust/external`.
44 <!-- XXX get better examples of how to declare these externs, when/how they -->
45 <!-- XXX are unsafe, what they are expected to do —isis -->
47 Modules should strive to be below 500 lines (tests excluded). Single
48 responsibility and limited dependencies should be a guiding standard.
50 If you have any external modules as dependencies (e.g. `extern crate
51 libc;`), you MUST declare them in your crate's `lib.rs` and NOT in any
54 ## Dependencies and versions
56 In general, we use modules from only the Rust standard library
57 whenever possible. We will review including external crates on a
60 If a crate only contains traits meant for compatibility between Rust
61 crates, such as [the digest crate](https://crates.io/crates/digest) or
62 [the failure crate](https://crates.io/crates/failure), it is very likely
63 permissible to add it as a dependency. However, a brief review should
64 be conducted as to the usefulness of implementing external traits
65 (i.e. how widespread is the usage, how many other crates either
66 implement the traits or have trait bounds based upon them), as well as
67 the stability of the traits (i.e. if the trait is going to change, we'll
68 potentially have to re-do all our implementations of it).
70 For large external libraries, especially which implement features which
71 would be labour-intensive to reproduce/maintain ourselves, such as
72 cryptographic or mathematical/statistics libraries, only crates which
73 have stabilised to 1.0.0 should be considered, however, again, we may
74 make exceptions on a case-by-case basis.
76 Currently, Tor requires that you use the latest stable Rust version. At
77 some point in the future, we will freeze on a given stable Rust version,
78 to ensure backward compatibility with stable distributions that ship it.
80 ## Updating/Adding Dependencies
82 To add/remove/update dependencies, first add your dependencies,
83 exactly specifying their versions, into the appropriate *crate-level*
84 `Cargo.toml` in `src/rust/` (i.e. *not* `/src/rust/Cargo.toml`, but
85 instead the one for your crate). Also, investigate whether your
86 dependency has any optional dependencies which are unnecessary but are
87 enabled by default. If so, you'll likely be able to enable/disable
88 them via some feature, e.g.:
92 foo = { version = "1.0.0", default-features = false }
95 Next, run `/scripts/maint/updateRustDependencies.sh`. Then, go into
96 `src/ext/rust` and commit the changes to the `tor-rust-dependencies`
101 You MUST include `#![deny(missing_docs)]` in your crate.
103 For function/method comments, you SHOULD include a one-sentence, "first person"
104 description of function behaviour (see requirements for documentation as
105 described in `src/HACKING/CodingStandards.md`), then an `# Inputs` section
106 for inputs or initialisation values, a `# Returns` section for return
107 values/types, a `# Warning` section containing warnings for unsafe behaviours or
108 panics that could happen. For publicly accessible
109 types/constants/objects/functions/methods, you SHOULD also include an
110 `# Examples` section with runnable doctests.
112 You MUST document your module with _module docstring_ comments,
113 i.e. `//!` at the beginning of each line.
117 You SHOULD consider breaking up large literal numbers with `_` when it makes it
118 more human readable to do so, e.g. `let x: u64 = 100_000_000_000`.
122 All code MUST be unittested and integration tested.
124 Public functions/objects exported from a crate SHOULD include doctests
125 describing how the function/object is expected to be used.
127 Integration tests SHOULD go into a `tests/` directory inside your
128 crate. Unittests SHOULD go into their own module inside the module
129 they are testing, e.g. in `src/rust/tor_addition/addition.rs` you
137 fn addition_with_zero() {
138 let sum: i32 = get_sum(5i32, 0i32);
145 The external `test` crate can be used for most benchmarking. However, using
146 this crate requires nightly Rust. Since we may want to switch to a more
147 stable Rust compiler eventually, we shouldn't do things which will automatically
148 break builds for stable compilers. Therefore, you MUST feature-gate your
149 benchmarks in the following manner.
151 If you wish to benchmark some of your Rust code, you MUST put the
152 following in the `[features]` section of your crate's `Cargo.toml`:
157 Next, in your crate's `lib.rs` you MUST put:
159 #[cfg(all(test, feature = "bench"))]
162 This ensures that the external crate `test`, which contains utilities
163 for basic benchmarks, is only used when running benchmarks via `cargo
164 bench --features bench`.
166 Finally, to write your benchmark code, in
167 `src/rust/tor_addition/addition.rs` you SHOULD put:
169 #[cfg(all(test, features = "bench"))]
175 fn addition_small_integers(b: &mut Bencher) {
176 b.iter(| | get_sum(5i32, 0i32));
182 If you wish to fuzz parts of your code, please see the
183 [cargo fuzz](https://github.com/rust-fuzz/cargo-fuzz) crate, which uses
184 [libfuzzer-sys](https://github.com/rust-fuzz/libfuzzer-sys).
186 ## Whitespace & Formatting
188 You MUST run `rustfmt` (https://github.com/rust-lang-nursery/rustfmt)
189 on your code before your code will be merged. You can install rustfmt
190 by doing `cargo install rustfmt-nightly` and then run it with `cargo
195 You SHOULD read [the nomicon](https://doc.rust-lang.org/nomicon/) before writing
196 Rust FFI code. It is *highly advised* that you read and write normal Rust code
197 before attempting to write FFI or any other unsafe code.
199 Here are some additional bits of advice and rules:
201 0. Any behaviours which Rust considers to be undefined are forbidden
203 From https://doc.rust-lang.org/reference/behavior-considered-undefined.html:
205 > Behavior considered undefined
207 > The following is a list of behavior which is forbidden in all Rust code,
208 > including within unsafe blocks and unsafe functions. Type checking provides the
209 > guarantee that these issues are never caused by safe code.
212 > * Dereferencing a null/dangling raw pointer
213 > * Reads of [undef](https://llvm.org/docs/LangRef.html#undefined-values)
214 > (uninitialized) memory
216 > [pointer aliasing rules](https://llvm.org/docs/LangRef.html#pointer-aliasing-rules)
217 > with raw pointers (a subset of the rules used by C)
218 > * `&mut T` and `&T` follow LLVM’s scoped noalias model, except if the `&T`
219 > contains an `UnsafeCell<U>`. Unsafe code must not violate these aliasing
221 > * Mutating non-mutable data (that is, data reached through a shared
222 > reference or data owned by a `let` binding), unless that data is
223 > contained within an `UnsafeCell<U>`.
224 > * Invoking undefined behavior via compiler intrinsics:
225 > - Indexing outside of the bounds of an object with
226 > `std::ptr::offset` (`offset` intrinsic), with the exception of
227 > one byte past the end which is permitted.
228 > - Using `std::ptr::copy_nonoverlapping_memory` (`memcpy32`/`memcpy64`
229 > intrinsics) on overlapping buffers
230 > * Invalid values in primitive types, even in private fields/locals:
231 > - Dangling/null references or boxes
232 > - A value other than `false` (0) or `true` (1) in a `bool`
233 > - A discriminant in an `enum` not included in the type definition
234 > - A value in a `char` which is a surrogate or above `char::MAX`
235 > - Non-UTF-8 byte sequences in a `str`
236 > * Unwinding into Rust from foreign code or unwinding from Rust into foreign
237 > code. Rust's failure system is not compatible with exception handling in other
238 > languages. Unwinding must be caught and handled at FFI boundaries.
242 If you call `unwrap()`, anywhere, even in a test, you MUST include
243 an inline comment stating how the unwrap will either 1) never fail,
244 or 2) should fail (i.e. in a unittest).
246 You SHOULD NOT use `unwrap()` anywhere in which it is possible to handle the
247 potential error with the eel operator, `?` or another non panicking way.
248 For example, consider a function which parses a string into an integer:
250 fn parse_port_number(config_string: &str) -> u16 {
251 u16::from_str_radix(config_string, 10).unwrap()
254 There are numerous ways this can fail, and the `unwrap()` will cause the
255 whole program to byte the dust! Instead, either you SHOULD use `ok()`
256 (or another equivalent function which will return an `Option` or a `Result`)
257 and change the return type to be compatible:
259 fn parse_port_number(config_string: &str) -> Option<u16> {
260 u16::from_str_radix(config_string, 10).ok()
263 or you SHOULD use `or()` (or another similar method):
265 fn parse_port_number(config_string: &str) -> Option<u16> {
266 u16::from_str_radix(config_string, 10).or(Err("Couldn't parse port into a u16")
269 Using methods like `or()` can be particularly handy when you must do
270 something afterwards with the data, for example, if we wanted to guarantee
271 that the port is high. Combining these methods with the eel operator (`?`)
272 makes this even easier:
274 fn parse_port_number(config_string: &str) -> Result<u16, Err> {
275 let port = u16::from_str_radix(config_string, 10).or(Err("Couldn't parse port into a u16"))?;
280 return Err("Low ports not allowed");
286 If you use `unsafe`, you MUST describe a contract in your
287 documentation which describes how and when the unsafe code may
288 fail, and what expectations are made w.r.t. the interfaces to
289 unsafe code. This is also REQUIRED for major pieces of FFI between
292 When creating an FFI in Rust for C code to call, it is NOT REQUIRED
293 to declare the entire function `unsafe`. For example, rather than doing:
296 pub unsafe extern "C" fn increment_and_combine_numbers(mut numbers: [u8; 4]) -> u32 {
297 for number in &mut numbers {
300 std::mem::transmute::<[u8; 4], u32>(numbers)
303 You SHOULD instead do:
306 pub extern "C" fn increment_and_combine_numbers(mut numbers: [u8; 4]) -> u32 {
307 for index in 0..numbers.len() {
311 std::mem::transmute::<[u8; 4], u32>(numbers)
315 3. Pass only C-compatible primitive types and bytes over the boundary
317 Rust's C-compatible primitive types are integers and floats.
318 These types are declared in the [libc crate](https://doc.rust-lang.org/libc/x86_64-unknown-linux-gnu/libc/index.html#types).
319 Most Rust objects have different [representations](https://doc.rust-lang.org/libc/x86_64-unknown-linux-gnu/libc/index.html#types)
320 in C and Rust, so they can't be passed using FFI.
322 Tor currently uses the following Rust primitive types from libc for FFI:
323 * defined-size integers: `uint32_t`
324 * native-sized integers: `c_int`
325 * native-sized floats: `c_double`
326 * native-sized raw pointers: `* c_void`, `* c_char`, `** c_char`
328 TODO: C smartlist to Stringlist conversion using FFI
330 The only non-primitive type which may cross the FFI boundary is
331 bytes, e.g. `&[u8]`. This SHOULD be done on the Rust side by
332 passing a pointer (`*mut libc::c_char`). The length can be passed
333 explicitly (`libc::size_t`), or the string can be NUL-byte terminated
336 One might be tempted to do this via doing
337 `CString::new("blah").unwrap().into_raw()`. This has several problems:
339 a) If you do `CString::new("bl\x00ah")` then the unwrap() will fail
340 due to the additional NULL terminator, causing a dangling
341 pointer to be returned (as well as a potential use-after-free).
343 b) Returning the raw pointer will cause the CString to run its deallocator,
344 which causes any C code which tries to access the contents to dereference a
347 c) If we were to do `as_raw()` this would result in a potential double-free
348 since the Rust deallocator would run and possibly Tor's deallocator.
350 d) Calling `into_raw()` without later using the same pointer in Rust to call
351 `from_raw()` and then deallocate in Rust can result in a
352 [memory leak](https://doc.rust-lang.org/std/ffi/struct.CString.html#method.into_raw).
354 [It was determined](https://github.com/rust-lang/rust/pull/41074) that this
355 is safe to do if you use the same allocator in C and Rust and also specify
356 the memory alignment for CString (except that there is no way to specify
357 the alignment for CString). It is believed that the alignment is always 1,
358 which would mean it's safe to dealloc the resulting `*mut c_char` in Tor's
359 C code. However, the Rust developers are not willing to guarantee the
360 stability of, or a contract for, this behaviour, citing concerns that this
361 is potentially extremely and subtly unsafe.
363 4. Perform an allocation on the other side of the boundary
365 After crossing the boundary, the other side MUST perform an
366 allocation to copy the data and is therefore responsible for
367 freeing that memory later.
369 5. No touching other language's enums
371 Rust enums should never be touched from C (nor can they be safely
372 `#[repr(C)]`) nor vice versa:
374 > "The chosen size is the default enum size for the target platform's C
375 > ABI. Note that enum representation in C is implementation defined, so this is
376 > really a "best guess". In particular, this may be incorrect when the C code
377 > of interest is compiled with certain flags."
379 (from https://gankro.github.io/nomicon/other-reprs.html)
383 Wherever possible and sensical, you SHOULD create new types in a
384 manner which prevents type confusion or misuse. For example,
385 rather than using an untyped mapping between strings and integers
388 use std::collections::HashMap;
390 pub fn get_elements_with_over_9000_points(map: &HashMap<String, usize>) -> Vec<String> {
394 It would be safer to define a new type, such that some other usage
395 of `HashMap<String, usize>` cannot be confused for this type:
397 pub struct DragonBallZPowers(pub HashMap<String, usize>);
399 impl DragonBallZPowers {
400 pub fn over_nine_thousand<'a>(&'a self) -> Vec<&'a String> {
401 let mut powerful_enough: Vec<&'a String> = Vec::with_capacity(5);
403 for (character, power) in &self.0 {
405 powerful_enough.push(character);
412 Note the following code, which uses Rust's type aliasing, is valid
413 but it does NOT meet the desired type safety goals:
415 pub type Power = usize;
417 pub fn over_nine_thousand(power: &Power) -> bool {
424 // We can still do the following:
425 let his_power: usize = 9001;
426 over_nine_thousand(&his_power);
428 7. Unsafe mucking around with lifetimes
430 Because lifetimes are technically, in type theory terms, a kind, i.e. a
431 family of types, individual lifetimes can be treated as types. For example,
432 one can arbitrarily extend and shorten lifetime using `std::mem::transmute`:
434 struct R<'a>(&'a i32);
436 unsafe fn extend_lifetime<'b>(r: R<'b>) -> R<'static> {
437 std::mem::transmute::<R<'b>, R<'static>>(r)
440 unsafe fn shorten_invariant_lifetime<'b, 'c>(r: &'b mut R<'static>) -> &'b mut R<'c> {
441 std::mem::transmute::<&'b mut R<'static>, &'b mut R<'c>>(r)
444 Calling `extend_lifetime()` would cause an `R` passed into it to live forever
445 for the life of the program (the `'static` lifetime). Similarly,
446 `shorten_invariant_lifetime()` could be used to take something meant to live
447 forever, and cause it to disappear! This is incredibly unsafe. If you're
448 going to be mucking around with lifetimes like this, first, you better have
449 an extremely good reason, and second, you may as be honest and explicit about
450 it, and for ferris' sake just use a raw pointer.
452 In short, just because lifetimes can be treated like types doesn't mean you
455 8. Doing excessively unsafe things when there's a safer alternative
457 Similarly to #7, often there are excessively unsafe ways to do a task and a
458 simpler, safer way. You MUST choose the safer option where possible.
460 For example, `std::mem::transmute` can be abused in ways where casting with
461 `as` would be both simpler and safer:
465 let ptr_num_transmute = unsafe { std::mem::transmute::<&i32, usize>(ptr)};
467 // Use an `as` cast instead
468 let ptr_num_cast = ptr as *const i32 as usize;
470 In fact, using `std::mem::transmute` for *any* reason is a code smell and as
471 such SHOULD be avoided.
473 9. Casting integers with `as`
475 This is generally fine to do, but it has some behaviours which you should be
476 aware of. Casting down chops off the high bits, e.g.:
478 let x: u32 = 4294967295;
479 println!("{}", x as u16); // prints 65535
481 Some cases which you MUST NOT do include:
483 * Casting an `u128` down to an `f32` or vice versa (e.g.
484 `u128::MAX as f32` but this isn't only a problem with overflowing
485 as it is also undefined behaviour for `42.0f32 as u128`),
487 * Casting between integers and floats when the thing being cast
488 cannot fit into the type it is being casted into, e.g.:
490 println!("{}", 42949.0f32 as u8); // prints 197 in debug mode and 0 in release
491 println!("{}", 1.04E+17 as u8); // prints 0 in both modes
492 println!("{}", (0.0/0.0) as i64); // prints whatever the heck LLVM wants
494 Because this behaviour is undefined, it can even produce segfaults in
495 safe Rust code. For example, the following program built in release
499 pub fn trigger_ub(sl: &[u8; 666]) -> &[u8] {
500 // Note that the float is out of the range of `usize`, invoking UB when casting.
501 let idx = 1e99999f64 as usize;
502 &sl[idx..] // The bound check is elided due to `idx` being of an undefined value.
506 println!("{}", trigger_ub(&[1; 666])[999999]); // ~ out of bound
509 And in debug mode panics with:
511 thread 'main' panicked at 'slice index starts at 140721821254240 but ends at 666', /checkout/src/libcore/slice/mod.rs:754:4