summaryrefslogtreecommitdiffstats
path: root/talk.md
blob: 5e5d53e8976f58a2b958d93f9681feb11fa15dd3 (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
# Motivation

Rust is a new systems language being developed by Mozilla. It is still
in development, but the first stable release is planned for later this
year. As a heads up, the details in this talk will be based on the state
of the current master branch, *not* the latest development release.

Rust's goal is to provide an alternative for projects which would
otherwise be written in C or C++. The problem with C-based languages is
that they are extremely difficult to use correctly, as a project gets
big enough. It is very easy to accidentally write C++ code that causes
segmentation faults (unrecoverable errors caused by accessing memory
that doesn't exist), silent memory corruption, and all kinds of other
issues that can result in security issues and data loss.

Now, the reason that people still use C++ is because of the high level
of control that it gives you. This control allows you to write code
which wouldn't even be possible in other languages (how would you write
an interrupt handler in Perl?), and also allows you to write extremely
efficient code (computation-heavy code can run several orders of
magnitude faster in C++ compared to Perl). We often talk about premature
optimization, and how getting that last 10 or 15% of performance out of
a piece of code isn't actually worth it, but that is largely a factor of
the field that most of us work in. We can ignore those optimizations
because of how insignificant they are compared to the time it takes for
the OS to read some data in from disk or from a database, but that does
imply that a 10% difference in speed in the disk controller or database
code can actually matter.

Rust's goal, therefore, is to provide the same level of control you get
when writing in C++ while removing as many of the dangerous sharp edges
as possible. Its philosophy is based strongly on the idea of zero cost
abstractions. One of the main benefits of writing in C++ is that it is
fairly straightforward to see how a given piece of C++ code translates
down into machine instructions. To succeed, Rust needs to retain that.
This means no mandatory boxing of variables, no mandatory garbage
collection or reference counting, and really no mandatory runtime at
all. Instead, Rust has things like the ability to optionally box
variables explicitly, and have the compiler verify that they are used
and cleaned up properly, using the same sort of memory management you
would write by hand in C++ using new and delete. When it introduces an
entirely new abstraction like closures, it makes sure that those
closures are inlinable, so code written using them can end up just as
efficient as code written without the abstraction layer. These new
abstractions can be used by Rust's compiler to completely eliminate
things like null pointers, memory corruption, data races in concurrent
code, and use of uninitialized data while adding no overhead at all.

Sometimes avoiding those kinds of things isn't possible, though - for
instance, Rust is self-hosting, and so it needs to be able to talk to
the operating system somehow. Also, there are situations where a safe
implementation of an algorithm would be possible, but being able to
"cheat" internally can make the code much faster while still providing
an entirely safe public API. For this case, Rust also provides a way to
disable most of its safety checking within specific scopes. In effect,
the code within these unsafe blocks becomes an alternative syntax for C,
so anything you would be able to express in C should be possible within
that limited scope.

Now, a common question at this point is "Why a new language? Couldn't
you just write a better C++ compiler instead?" There are a couple
answers here. First, given the level of safety that Rust is targeting,
effectively no existing C++ programs would even compile. So much of the
reasoning behind why existing programs are safe is implicit that there
is no hope of writing a compiler which can figure it all out. So at this
point, you need to start adding additional annotations and such in order
to make it all explicit, and then you already basically have another
language. Also, Rust is still built on top of LLVM (the backend for
clang), so it's not like it's starting entirely from scratch - Rust
isn't throwing out the years of work that has gone into optimizing C++
code because most of that optimization only happens once it gets to the
compiler backend, and that is still the same.

Another common question is "Why Mozilla?" Well, as mentioned earlier,
there are a few places where every bit of speed counts, and these days,
web browsers are definitely one of those places. Really, if you squint a
bit, web browsers are basically on the level of operating systems at
this point. They run all kinds of untrusted code, all of that untrusted
code has to go through them to access the hardware, and their job is to
keep it all safe, sandboxed, and secure. Firefox, though, is around 8
million lines of C++ code at this point, and it's effectively impossible
to write 8 million lines of C++ code without a memory or concurrency bug
showing up somewhere. The issue with those kinds of bugs though is that
they are completely invisible until the exact right circumstances
occur, and so the normal strategies of testing and things like that
don't really help all that much. Mozilla and the other browser makers
are doing an excellent job at keeping things running the way they are,
but it's not clear at all if that's going to be sustainable in the long
term. With that in mind, Mozilla is using Rust to write a new browser
rendering engine called Servo, which is built from the ground up to be
both secure, leveraging Rust's stronger safety guarantees, and fast,
being built from the ground up to support pervasive (and safe)
parallelism, among other things. It already has parallel layout and
rendering, and passes the Acid2 test, and while it's not likely to
replace Firefox for quite some time yet, the goal is to have a usable
browser based on Servo implemented by the end of the year.

# Overview

## Language structure

Rust's syntax is based on C and ML, among a few others. Like Perl, it's
a whitespace-insensitive, brace-based language, but unlike Perl, pretty
much everything is an expression, including things like if statements.
This is what "hello world" looks like in Rust. Functions are declared
with 'fn', the entry point to the program is the function 'main' (just
like in C), and 'println!' is Rust's equivalent to printf.

Here's a more complicated example (from the main page of the Rust
website). As you can see, variables are declared using 'let', must be
initialized at the point of declaration, and are immutable by default.
Mutable variables are declared using 'let mut'. Iteration is done
through 'for' and 'while' loops. In this example, the 'chars' method on
a string returns an iterator which returns each character in the string
in turn. Characters in Rust are four byte Unicode codepoints, and
strings are stored internally in utf8. Another minor point is that like
Perl 6, for loops (and while loops, and conditionals) don't require
parentheses around the condition.

Rust also has pattern matching, similar to ML. Matching can be done on
arbitrary data structures, and the compiler verifies that the match is
exhaustive, so not only is it more readable than a series of if
statements, it is also more safe.

Finally, you can see a more complicated example of 'println!' at the
end. The trailing '!' indicates that 'println!' is a macro, so it can do
things not normally possible in the language syntax. This is a general
rule in order to make the language more easily parsable by external
tools - macros are introduced with an identifier that ends with an
exclamation mark, and must be delimited by matching parentheses,
brackets, or braces. The pattern language that println! uses is actually
based on Python rather than printf. A bare set of braces means to
automatically choose the correct stringification based on the type of
the given parameter (for types that define one, which includes most
builtin types). You can also pass the specifier explicitly if you need
to pass arguments to it, and the special '{:?}' specifier uses
reflection mechanisms in order to print out complicated data structures
for debugging, even if they haven't implemented a stringification.

As mentioned earlier, for loops use iterators for iteration. This lets
them avoid using more memory than necessary, and also allows operations
to be easily composed. In this example, for instance, we take the chars
iterator and filter out the spaces, leaving only the characters we care
about. This is all done without ever building a new list - the character
values are calculated out of the string directly. The filter method (and
most of the other iterator methods) can (most likely) then be inlined,
and the resulting code is no different from what you would write
otherwise by manually moving pointers around.

Another thing to note is that the filter method takes a closure as an
argument. Closure syntax is based on Ruby's block syntax. In this case,
the closure takes a borrowed pointer to the character to be filtered,
which is why the parameter is declared as '&x'. We'll get into what
exactly that means later in the talk.

Notice also that the closure doesn't require a return statement. Rust
works the same way that Perl does, in that return statements are
optional at the end of a function body, whether it's a closure or a
named function. There is one minor difference in that just as in Perl,
semicolons are statement separators rather than terminators, but unlike
in Perl, empty statements aren't ignored, so if you want to implicitly
return a value, the final semicolon must be omitted, or else your
function will be returning nil.

## Type System

In addition to basic types like integers floating point values, and
arrays, Rust also has several different ways to build more complicated
data structures. The most basic way is using structs, like this. Structs
in Rust are pretty much the same as structs in C, but you can actually
initialize them anywhere you allocate them (in fact, you're required
to). These structs are also entirely compatible on the memory
representation level with C, and so passing structs back and forth
between Rust and C is guaranteed to work.

Rust also has enum types, just like C. One advantage to them over C
enums is that when they are used in a pattern match, the compiler checks
that your match statement covers all of the possible enum values (like
this), and that it doesn't include values that don't exist (like this).
A bigger advantage though is that Rust enums aren't just enums - they
are actually algebraic data types in disguise. For instance, the Color
enum could be extended to include a custom color, like this. Here, the
Custom enum value includes data attached to it, which we can extract
through destructuring bind in the match statement (note that
destructuring bind also works identically in 'let' statements). The Rust
standard library includes some useful examples of enums, such as an
Option type, which looks like this.

The option type is also a good example of Rust's support for generics.
Structs, enums, and functions (as well as a few other things) can be
parameterized by types. This works pretty much identically to C++
templates, in that the compiler will see which types are actually being
used for the parameter, and generate separate copies of the type or
function for each type argument that was used.

As you can see from these examples, Rust is also capable of type
inference. You almost never have to explicitly specify types when
defining variables or calling functions, even when using things like
destructuring bind. One exception here is method signatures. One of
Rust's design principles is that public API should always be explicit to
avoid accidental incompatibilities, and so things like function
signatures require explicit types. Another exception is that you can't
infer on return values, but that's usually only relevant when using
generics.

In addition to the basic builtin types, Rust's standard library also
includes a lot of helpful data structures. The two that you'll probably
be using most often are Vec and str (roughly corresponding to vector and
string in C++). Here's an example of using vectors - you can see the
vector being initialized and modified, and printing the length and the
individual values. Here's a similar example using strings. Something to
notice is how both vectors and strings have special initialization
syntax (the vec! macro and the String::from_str function). This is
because the builtin vectors and string that you can use with bare
brackets or a bare quoted string are fixed size, which allows them to be
allocated in place, which is much more efficient in general. If you want
to be able to modify the string or vector, you need to create a
modifiable version, which requires special initialization. You can
easily get fixed size slices out of the data stored in a growable vector
or string, though, and this is useful because the majority of functions
in the Rust standard library operate on fixed size slices.

One other thing you may have noticed in the previous examples is that I
was calling methods on the vectors and strings. Rust allows you to
define implementations of types using the impl keyword. You can define
class methods, which are called just like normal functions, as well as
instance methods, which are distinguished from class methods by taking
an initial 'self' parameter (we'll talk about what that '&' means
later). Methods use static dispatch - dynamic dispatch does exist, but
it's more complicated and not really in the scope of this talk.

One final aspect to the type system I'd like to cover is traits. Traits
work pretty similar to implementations elsewhere - they represent a
common bundle of behavior that can be implemented by any given type.
Traits can have default implementations for their methods, and can be
implemented on a type either by the author of the trait or by the author
of the type, for maximum flexibility. Traits can also be used as bounds
on type parameters, in order to write functions that only operate on
types that implement a given trait. Traits are also used to implement
various builtin features like operator overloading, as well as things in
the standard library - for instance, the Show trait implements the
default formatting behavior for println! as seen here. The details of
this implementation aren't important, just the fact that this is all
handled through traits.

## Pointers and ownership

You may have heard that Rust has all of these different kinds of
pointers and it's all confusing. This is no longer really the case. As
the language is moving towards a stable release, the development team
has been putting a lot of effort into simplifying the language and
removing features that don't really pull their weight.

In general, most data you will deal with will be values allocated
statically on the stack. If you need an integer, you can just declare an
integer variable and use it. The same thing holds true for more
complicated data structures - for instance, the Point example earlier.
Allocating as much as possible on the stack is a good thing because
stack allocation is extremely fast.

Stack variables have limitations though, in that they are only valid in
the function in which they are declared. They can only be passed into
functions and returned from functions by copying. This is fine for small
types like integers, but can have a significant impact for larger types.
In order to pass data around without requiring copying it everywhere,
you'll need to use pointers. The most common type of pointer you'll
encounter is the borrowed pointer. When you take a borrowed pointer to a
piece of data, the compiler verifies that the data it's pointing to
lives as least as long as the pointer - if it doesn't, then it throws a
compile-time error. Once it has verified this, you can use it however
you want, and you'll know that it will never end up pointing to invalid
data. This means that borrowed pointers have no runtime impact at all -
they don't require any cleanup because the compiler already verified
that the data will be cleaned up elsewhere.

Take this C++ example, for instance. This program will happily compile,
and result in undefined behavior since the variable being pointed to no
longer exists once the function returns. This is called a "dangling
pointer", and can also happen when you dynamically allocate memory, but
free it too early. In contrast, if we translate the same example into
Rust, a compile time error is issued, telling us that we're trying to
make a borrowed pointer live longer than the thing it points to.

Borrowed pointers allow you to take references to existing data easily
enough, but sometimes you need to create data that will outlive the
current function's scope. In other words, you need to allocate a new
chunk of memory that you own, and ensure that it is cleaned up. For this
case, Rust allows you to "box" values, which just means to allocate a
chunk of memory and give you a pointer to it instead. For instance, we
can fix our earlier example like this. Here we create a new boxed value
with the integer 2 inside it, and then we return that boxed value. Since
this memory was dynamically allocated rather than allocated on the
stack, it still exists when the function in which it was allocated
returns, and so we can then use it by dereferencing it.

One thing you'll notice here is that there is no deallocation code
anywhere. We're not actually leaking memory here - Rust can determine
at compile time where the allocated memory is done being used, and it
automatically inserts the call to free the memory at that point. The way
it determines this is by using a concept called "ownership" (boxed
values are sometimes called "owned pointers"). See this example: if I
create a boxed value and then try to store it in two different
variables, I get a compiler error. This is because boxed values aren't
copied, they are "moved". Assigning a boxed value to a different
variable doesn't copy anything at all, it just changes the name of the
variable that can be used to access the same data. Only a single
variable can own a boxed value at any given point, and given that
constraint, it is trivial to just trace through the code to see where
the value is no longer used.

Boxed values are not usually used on their own like this, however. In
almost all cases, for simple values, stack allocated values with
borrowed pointers are sufficient, and where they aren't, copying values
doesn't have a large enough performance impact to worry about. Where
boxed values are useful is in building data structures. Take this linked
list example, for instance. If you try to compile this code, you'll get
an error, because the compiler has no way of knowing how big the List
data structure is, since it contains a copy of itself. The solution here
is to instead make it contain a pointer to a copy of itself, which works
because pointers have a fixed size. Boxed values are also used in the
implementation of things like strings and vectors, since the data they
contain may need to be reallocated as they grow, and so storing the data
externally makes that possible.

Finally, we also have unsafe pointers (also called raw pointers), but
these are only intended for use when interoperating with C (these
pointers work exactly like C pointers). You can ignore their existence
entirely when writing normal Rust code.

Something you may have noticed in how we are using borrowed pointers and
boxed values is that they must always be initialized. Null pointers do
not exist in Rust (except when using unsafe pointers). Instead, you can
use the Option type mentioned earlier to wrap any pointers you want. The
compiler has an optimization for this which allows it to use a single
normal pointer as the representation, since it knows that null is an
invalid value for these pointers and the Option type has a single
"extra" value outside of the normal pointer range, and so using Option
with pointers actually has no overhead at all. This eliminates a huge
range of potential errors, since it's no longer possible to forget to
check a value for null - if you do, your program will fail to compile.

## Concurrency

Rust has also put a lot of effort into concurrency. In the interest of
time, I'm just going to give a brief overview, but the most interesting
point is that not only can the Rust type system ensure that your code
uses memory safely, it can also ensure that your code has no data races
when accessing the same memory from different threads. This allows you
to use parallelism quite a bit more effectively than you would be able
to without those guarantees, because figuring out where data races might
be in your code is incredibly hard to do on your own, and so usually
languages just fall back on copying a lot more than is necessary. Rust
just expands the ownership semantics I mentioned earlier with regards to
boxed values to also be applied to shared memory.

Rust's concurrency model is based around tasks. Tasks default to mapping
directly to threads (1:1 model), but they also have an optional M:N
scheduler if OS-level threads are too heavy. The basic idea is that all
data races are caused by data that is both mutable and aliasable, and so
any memory that is shared between tasks must be either entirely
immutable, or it must be owned by the task. Here's a basic example which
calculates the value of the Ackermann function at a given point in a
background task, and the main task waits for the result and then prints
it out. The channel function here is similar to the 'pipe' operator in
Perl - it just creates a one-way communication channel that the tasks
can communicate with. Now, clearly the channel can't be entirely
immutable, since you have to be able to send data across it, so the
thing that makes this example work is the 'proc' keyword here. A 'proc'
is a special type of closure which takes ownership of anything it closes
over (normal closures just take borrowed pointers to things they close
over). In this case, it closes over the writing end of the channel, and
so the main task can no longer access that end of the pipe, and neither
can any other tasks you might try to spawn in the same scope (if you
tried to, you would get a compilation error). This ensures that at any
given point in your program's execution, there is only a single task
trying to write to the pipe at any given time, and only a single task
trying to read from the pipe at any given time, so your program remains
deterministic. On the other hand, 'm' and 'n' are entirely immutable,
and so there are no issues with them being accessed from both the main
task and the calculation task.

## Misc

Rust also has quite a few other useful features that I didn't touch on.
It has namespacing and a module system with privacy controls. It has
integrated testing and benchmarks. It has quite a few compiler lint
checks, from warnings about things like unused variables and dead code
to optional errors about entire language features like "allocation" or
"unsafe blocks", and they can all be adjusted to be ignored, to warn, or
to error independently. It can interoperate with C directly, via extern
"C" blocks. The entire runtime and standard library can even be left out
or replaced in order to write things like kernels or embedded code -
there are already existing projects for writing a simple kernel in Rust
and running Rust code on Arduinos. There is a powerful macro system
available which is still constrained enough to not make writing external
parsing tools impossible. And the language is very flexible - most
language features are implemented via normal Rust functions which can be
overriden - either via traits for operations on new data types, or via
special "language items" for low level operations like memory
allocation.

# Contributing

So you've heard all of this and you're interested in learning more? A
good start to getting into the language is the tutorial on the Rust
website, as well as play.rust-lang.org and rustbyexample.com. If you're
interested in getting into Rust development, Rust is developed entirely
openly, and is always welcoming of new contributors. Discussion happens
both on IRC (on irc.mozilla.org) and on the rust-dev mailing list, and
decisions are made during open meetings between Mozilla's Rust team. For
keeping up with the language changes until 1.0 is released, This Week In
Rust is an excellent resource - it documents the major changes to the
language and libraries on a weekly basis, in case you don't have the
time to keep up with everything going on. Finally, Rust has a community
Standards of Conduct that is regularly enforced by the core team, and
this has helped to make the Rust community to be, in my experience, one
of the friendliest and most pleasant programming communities I've seen.
If this talk seemed interesting to you at all, I highly recommend
getting involved.

Any questions?