The Little Book of Rust Macros

Note: This is a continuation of Daniel Keep's Book which has not been updated since the early summer of 2016, adapted to make use of mdBook.

View the rendered version here and the repository here.

A chinese version of this book can be found here.

This book is an attempt to distill the Rust community's collective knowledge of Rust macros, the Macros by Example ones as well as procedural macros(WIP). As such, both additions (in the form of pull requests) and requests (in the form of issues) are very much welcome. If something's unclear, opens up questions or is not understandable as written down, fear not to make an issue asking for clarification. The goal is for this book to become the best learning resource possible.

The original Little Book of Rust Macros has helped me immensely with understanding Macros by Example style macros while I was still learning the language. Unfortunately, the original book hasn't been updated since april of 2016, while the Rust language as well as it's macro-system keeps evolving. Which is why I took up the task to update the book and keep it updated as well as I can while also adding new found things to it. In hopes that it will help out all the fresh faces coming to Rust understanding its macro systems, a part of the language a people tend to have trouble with.

This book expects you to have basic knowledge of Rust, it will not explain language features or constructs that are irrelevant to macros. No prior knowledge of macros is assumed. Having read and understood the first seven chapters of the Rust Book is a must, though having read the majority of the book is recommended.

Thanks

A big thank you to Daniel Keep for the original work as well as all the contributors that added to the original which can be found here.

License

This work inherits the licenses of the original, hence it is licensed under both the Creative Commons Attribution-ShareAlike 4.0 International License and the MIT license.

Syntax Extensions

Before talking about Rust's different macro systems it is worthwhile to discuss the general mechanism they are built on: syntax extensions.

To do that, we must first discuss how Rust source is processed by the compiler, and the general mechanisms on which user-defined macros and proc-macros are built upon.

Note: This book will use the term syntax extension from now on when talking about all of rust's different macro kinds in general to reduce potential confusion with the upcoming declarative macro 2.0 proposal which uses the macro keyword.

Source Analysis

Tokenization

The first stage of compilation for a Rust program is tokenization. This is where the source text is transformed into a sequence of tokens (i.e. indivisible lexical units; the programming language equivalent of "words"). Rust has various kinds of tokens, such as:

  • Identifiers: foo, Bambous, self, we_can_dance, LaCaravane, …
  • Literals: 42, 72u32, 0_______0, 1.0e-40, "ferris was here", …
  • Keywords: _, fn, self, match, yield, macro, …
  • Symbols: [, :, ::, ?, ~, @1, …

…among others. There are some things to note about the above: first, self is both an identifier and a keyword. In almost all cases, self is a keyword, but it is possible for it to be treated as an identifier, which will come up later (along with much cursing). Secondly, the list of keywords includes some suspicious entries such as yield and macro that aren't actually in the language, but are parsed by the compiler—these are reserved for future use. Third, the list of symbols also includes entries that aren't used by the language. In the case of <-, it is vestigial: it was removed from the grammar, but not from the lexer. As a final point, note that :: is a distinct token; it is not simply two adjacent : tokens. The same is true of all multi-character symbol tokens in Rust, as of Rust 1.2.2

1

@ has a purpose, though most people seem to forget about it completely: it is used in patterns to bind a non-terminal part of the pattern to a name.

2

Technically rust currently(1.46) has two lexers, rustc_lexer which only emits single character symbols as tokens and the lexer in rustc_parse which sees multi-character symbols as distinct tokens.

As a point of comparison, it is at this stage that some languages have their macro layer, though Rust does not. For example, C/C++ macros are effectively processed at this point. 3 This is why the following code works: 4

#define SUB void
#define BEGIN {
#define END }

SUB main() BEGIN
    printf("Oh, the horror!\n");
END
3

In fact, the C preprocessor uses a different lexical structure to C itself, but the distinction is broadly irrelevant.

4

Whether it should work is an entirely different question.

Parsing

The next stage is parsing, where the stream of tokens is turned into an Abstract Syntax Tree (AST). This involves building up the syntactic structure of the program in memory. For example, the token sequence 1 + 2 is transformed into the equivalent of:

┌─────────┐   ┌─────────┐
│ BinOp   │ ┌╴│ LitInt  │
│ op: Add │ │ │ val: 1  │
│ lhs: ◌  │╶┘ └─────────┘
│ rhs: ◌  │╶┐ ┌─────────┐
└─────────┘ └╴│ LitInt  │
              │ val: 2  │
              └─────────┘

The AST contains the structure of the entire program, though it is based on purely lexical information. For example, although the compiler may know that a particular expression is referring to a variable called a, at this stage, it has no way of knowing what a is, or even where it comes from.

It is after the AST has been constructed that macros are processed. However, before we can discuss that, we have to talk about token trees.

Token trees

Token trees are somewhere between tokens and the AST. Firstly, almost all tokens are also token trees; more specifically, they are leaves. There is one other kind of thing that can be a token tree leaf, but we will come back to that later.

The only basic tokens that are not leaves are the "grouping" tokens: (...), [...], and {...}. These three are the interior nodes of token trees, and what give them their structure. To give a concrete example, this sequence of tokens:

a + b + (c + d[0]) + e

would be parsed into the following token trees:

«a» «+» «b» «+» «(   )» «+» «e»
          ╭────────┴──────────╮
           «c» «+» «d» «[   ]»
                        ╭─┴─╮
                         «0»

Note that this has no relationship to the AST the expression would produce; instead of a single root node, there are seven token trees at the root level. For reference, the AST would be:

              ┌─────────┐
              │ BinOp   │
              │ op: Add │
            ┌╴│ lhs: ◌  │
┌─────────┐ │ │ rhs: ◌  │╶┐ ┌─────────┐
│ Var     │╶┘ └─────────┘ └╴│ BinOp   │
│ name: a │                 │ op: Add │
└─────────┘               ┌╴│ lhs: ◌  │
              ┌─────────┐ │ │ rhs: ◌  │╶┐ ┌─────────┐
              │ Var     │╶┘ └─────────┘ └╴│ BinOp   │
              │ name: b │                 │ op: Add │
              └─────────┘               ┌╴│ lhs: ◌  │
                            ┌─────────┐ │ │ rhs: ◌  │╶┐ ┌─────────┐
                            │ BinOp   │╶┘ └─────────┘ └╴│ Var     │
                            │ op: Add │                 │ name: e │
                          ┌╴│ lhs: ◌  │                 └─────────┘
              ┌─────────┐ │ │ rhs: ◌  │╶┐ ┌─────────┐
              │ Var     │╶┘ └─────────┘ └╴│ Index   │
              │ name: c │               ┌╴│ arr: ◌  │
              └─────────┘   ┌─────────┐ │ │ ind: ◌  │╶┐ ┌─────────┐
                            │ Var     │╶┘ └─────────┘ └╴│ LitInt  │
                            │ name: d │                 │ val: 0  │
                            └─────────┘                 └─────────┘

It is important to understand the distinction between the AST and token trees. When writing macros, you have to deal with both as distinct things.

One other aspect of this to note: it is impossible to have an unpaired paren, bracket or brace; nor is it possible to have incorrectly nested groups in a token tree.

Macros in the AST

As previously mentioned, macro processing in Rust happens after the construction of the AST. As such, the syntax used to invoke a macro must be a proper part of the language's syntax. In fact, there are several "syntax extension" forms which are part of Rust's syntax. Specifically, the following 4 forms (by way of examples):

  1. # [ $arg ]; e.g. #[derive(Clone)], #[no_mangle], …
  2. # ! [ $arg ]; e.g. #![allow(dead_code)], #![crate_name="blang"], …
  3. $name ! $arg; e.g. println!("Hi!"), concat!("a", "b"), …
  4. $name ! $arg0 $arg1; e.g. macro_rules! dummy { () => {}; }.

The first two are attributes which annotate items, expressions and statements. They can be classified into different kinds, built-in attributes, proc-macro attributes and derive attributes. proc-macro attributes and derive attributes can be implemented with the second macro system that Rust offers, procedural macros. built-in attributes on the other hand are attributes implemented by the compiler.

The third form $name ! $arg are function-like macros. It is the form available for use with macro_rules!, macro and also procedural macros. Note that this form is not limited to macro_rules! macros: it is a generic syntax extension form. For example, whilst format! is a macro_rules! macro, format_args! (which is used to implement format!) is not as it is a compiler builtin.

The fourth form is essentially a variation which is not available to macros. In fact, the only case where this form is used at all is with the macro_rules! construct itself which.

So, starting with the third form, how does the Rust parser know what the $arg in ($name ! $arg) looks like for every possible syntax extension? The answer is that it doesn't have to. Instead, the argument of a syntax extension invocation is a single token tree. More specifically, it is a single, non-leaf token tree; (...), [...], or {...}. With that knowledge, it should become apparent how the parser can understand all of the following invocation forms:

bitflags! {
    struct Color: u8 {
        const RED    = 0b0001,
        const GREEN  = 0b0010,
        const BLUE   = 0b0100,
        const BRIGHT = 0b1000,
    }
}

lazy_static! {
    static ref FIB_100: u32 = {
        fn fib(a: u32) -> u32 {
            match a {
                0 => 0,
                1 => 1,
                a => fib(a-1) + fib(a-2)
            }
        }

        fib(100)
    };
}

fn main() {
    use Color::*;
    let colors = vec![RED, GREEN, BLUE];
    println!("Hello, World!");
}

Although the above invocations may look like they contain various kinds of Rust code, the parser simply sees a collection of meaningless token trees. To make this clearer, we can replace all these syntactic "black boxes" with ⬚, leaving us with:

bitflags! ⬚

lazy_static! ⬚

fn main() {
    let colors = vec! ⬚;
    println! ⬚;
}

Just to reiterate: the parser does not assume anything about ⬚; it remembers the tokens it contains, but doesn't try to understand them. This means ⬚ can be anything, even invalid Rust! As to why this is a good thing, we will come back to that at a later point.

So, does this also apply to $arg in form 1 and 2, and to ther two args in form 4? Kind of. The $arg$ for form 1 and 2 is a bit different in that it is not directly a token tree, but a simple path that is either followed by an = token and a literal expression, or a token tree. We will explore this more in-depth in the appropriate proc-macro chapter. The important part here is that this form as well, makes use of token trees to describe the input. The 4th form in general is more special and accepts a very specific grammar that also makes use of token trees though. The specifics of this form do not matter at this point so we will skip them until they become relevant.

The important takeaways from this are:

  • There are multiple kinds of syntax extensions in Rust.
  • Just seeing something of the form $name! $arg, doesn't tell you what kind of syntax extension it might be. It could be a macro_rules! macro, a proc-macro or maybe even a builtin.
  • The input to every ! macro invocation, that is form 3, is a single non-leaf token tree.
  • Syntax extensions are parsed as part of the abstract syntax tree.

The last point is the most important, as it has significant implications. Because syntax extensions are parsed into the AST, they can only appear in positions where they are explicitly supported. Specifically syntax extensions can appear in place of the following:

  • Patterns
  • Statements
  • Expressions
  • Items(this includes impl items)
  • Types

Some things not on this list:

  • Identifiers
  • Match arms
  • Struct fields

There is absolutely, definitely no way to use syntax extensions in any position not on the first list.

Expansion

Expansion is a relatively simple affair. At some point after the construction of the AST, but before the compiler begins constructing its semantic understanding of the program, it will expand all syntax extensions.

This involves traversing the AST, locating syntax extension invocations and replacing them with their expansion.

Once the compiler has run a syntax extension, it expects the result to be parsable as one of a limited set of syntax elements, based on context. For example, if you invoke a syntax extension at module scope, the compiler will parse the result into an AST node that represents an item. If you invoke a syntax extension in expression position, the compiler will parse the result into an expression AST node.

In fact, it can turn a syntax extension result into any of the following:

  • an expression,
  • a pattern,
  • a type,
  • zero or more items, or
  • zero or more statements.

In other words, where you can invoke a syntax extension determines what its result will be interpreted as.

The compiler will take this AST node and completely replace the syntax extension's invocation node with the output node. This is a structural operation, not a textual one!

For example, consider the following:

let eight = 2 * four!();

We can visualize this partial AST as follows:

┌─────────────┐
│ Let         │
│ name: eight │   ┌─────────┐
│ init: ◌     │╶─╴│ BinOp   │
└─────────────┘   │ op: Mul │
                ┌╴│ lhs: ◌  │
     ┌────────┐ │ │ rhs: ◌  │╶┐ ┌────────────┐
     │ LitInt │╶┘ └─────────┘ └╴│ Macro      │
     │ val: 2 │                 │ name: four │
     └────────┘                 │ body: ()   │
                                └────────────┘

From context, four!() must expand to an expression (the initializer can only be an expression). Thus, whatever the actual expansion is, it will be interpreted as a complete expression. In this case, we will assume four! is defined such that it expands to the expression 1 + 3. As a result, expanding this invocation will result in the AST changing to:

┌─────────────┐
│ Let         │
│ name: eight │   ┌─────────┐
│ init: ◌     │╶─╴│ BinOp   │
└─────────────┘   │ op: Mul │
                ┌╴│ lhs: ◌  │
     ┌────────┐ │ │ rhs: ◌  │╶┐ ┌─────────┐
     │ LitInt │╶┘ └─────────┘ └╴│ BinOp   │
     │ val: 2 │                 │ op: Add │
     └────────┘               ┌╴│ lhs: ◌  │
                   ┌────────┐ │ │ rhs: ◌  │╶┐ ┌────────┐
                   │ LitInt │╶┘ └─────────┘ └╴│ LitInt │
                   │ val: 1 │                 │ val: 3 │
                   └────────┘                 └────────┘

This can be written out like so:

let eight = 2 * (1 + 3);

Note that we added parentheses despite them not being in the expansion. Remember that the compiler always treats the expansion of a syntax extension as a complete AST node, not as a mere sequence of tokens. To put it another way, even if you don't explicitly wrap a complex expression in parentheses, there is no way for the compiler to "misinterpret" the result, or change the order of evaluation.

It is important to understand that syntax extension expansions are treated as AST nodes, as this design has two further implications:

  • In addition to there being a limited number of invocation positions, syntax extension can only expand to the kind of AST node the parser expects at that position.
  • As a consequence of the above, syntax extension absolutely cannot expand to incomplete or syntactically invalid constructs.

There is one further thing to note about expansion: what happens when a syntax extension expands to something that contains another syntax extension invocation. For example, consider an alternative definition of four!; what happens if it expands to 1 + three!()?

let x = four!();

Expands to:

let x = 1 + three!();

This is resolved by the compiler checking the result of expansions for additional syntax extension invocations, and expanding them. Thus, a second expansion step turns the above into:

let x = 1 + 3;

The takeaway here is that expansion happens in "passes"; as many as is needed to completely expand all invocations.

Well, not quite. In fact, the compiler imposes an upper limit on the number of such recursive passes it is willing to run before giving up. This is known as the syntax extension recursion limit and defaults to 32. If the 32nd expansion contains a syntax extension invocation, the compiler will abort with an error indicating that the recursion limit was exceeded.

This limit can be raised using the #![recursion_limit="…"] attribute, though it must be done crate-wide. Generally, it is recommended to try and keep syntax extension below this limit wherever possible as it may impact compilation times.

Hygiene

Hygiene is an important concept for macros, it describes the ability for a macro to work in its own syntax context, not affecting nor being affected by its surroundings. In other words this means that a syntax extension should be invocable anywhere without interfering with its surrounding context.

In a perfect world all syntax extensions in rust would be fully hygienic, unfortunately this isn't the case, so care should be taken to avoid writing syntax extensions that aren't fully hygienic. We will go into general hygiene concepts here which will be touched upon in the corresponding hygiene chapters for the different syntax extensions rust has to offer.

 

Hygiene mainly affects identifiers and paths emitted by syntax extensions. In short, if an identifier created by a syntax extension cannot be accessed by the environment where the syntax extension has been invoked it is hygienic in regards to that identifier. Likewise, if an identifier used in a syntax extension cannot reference something defined outside of a syntax extension it is considered hygienic.

Note: The terms create and use refer to the position the identifier is in. That is the Foo in struct Foo {} or the foo in let foo = …; are created in the sense that they introduce something new under the name, but the Foo in fn foo(_: Foo) {} or the foo in foo + 3 are usages in the sense that they are referring to something existing.

This is best shown by example:

Let's assume we have some syntax extension make_local that expands to let local = 0;, that is is creates the identifier local. Then given the following snippet:

make_local!();
assert_eq!(local, 0);

If the local in assert_eq!(local, 0); resolves to the local defined by the syntax extension, the syntax extension is not hygienic(at least in regards to local names/bindings).

Now let's assume we have some syntax extension use_local that expands to local = 42;, that is it makes use of the identifier local. Then given the following snippet:

let mut local = 0;
use_local!();

If the local inside of the syntax extension for the given invocation resolves to the local defined before its invocation, the syntax extension is not hygienic either.

This is a rather short introduction to the general concept of hygiene. It will be explained in more depth in the corresponding macro_rules! hygiene and proc-macro hygiene chapters, with their specific peculiarities.

Debugging

rustc provides a number of tools to debug general syntax extensions, as well as some more specific ones tailored towards declarative and procedural macros respectively.

Sometimes, it is what the extension expands to that proves problematic as you do not usually see the expanded code. Fortunately rustc offers the ability to look at the expanded code, for this, the --pretty argument can be used. Given the following code:

// Shorthand for initializing a `String`.
macro_rules! S {
    ($e:expr) => {String::from($e)};
}

fn main() {
    let world = S!("World");
    println!("Hello, {}!", world);
}

compiled with the following command:

rustc -Z unstable-options -Zunpretty=expanded hello.rs

produces the following output (modified for formatting):

#![feature(prelude_import)]
#[prelude_import]
use std::prelude::rust_2018::*;
#[macro_use]
extern crate std;
// Shorthand for initializing a `String`.
macro_rules! S { ($e : expr) => { String :: from($e) } ; }

fn main() {
    let world = String::from("World");
    {
        ::std::io::_print(
            ::core::fmt::Arguments::new_v1(
                &["Hello, ", "!\n"],
                &match (&world,) {
                    (arg0,) => [
                        ::core::fmt::ArgumentV1::new(arg0, ::core::fmt::Display::fmt)
                    ],
                }
            )
        );
    };
}

But not just rustc exposes means to aid in debugging syntax extensions. For the aforementioned --pretty=expanded option, there exists a nice cargo plugin called cargo-expand made by dtolnay which is basically just a wrapper around it.

You can also use the playground, clicking on its TOOLS button in the top right gives you the option to expand syntax extensions as well!

Declarative Macros

This chapter will introduce Rust's declarative macro system: macro_rules!.

There are two different introductions in this chapter, a methodical and a practical.

The former will attempt to give you a complete and thorough explanation of how the system works, while the latter one will cover more practical examples. As such, the methodical introduction is intended for people who just want the system as a whole explained, while the practical introduction guides one through the implementation of a single macro.

Following up the two introductions it offers some generally very useful patterns and building blocks for creating feature rich macros.

Other resources about declarative macros include the Macros chapter of the Rust Book which is a more approachable, high-level explanation as well as the reference chapter which goes more into the precise details of things.

Note: This book will usually use the term mbe(Macro-By-Example), mbe macro or macro_rules! macro when talking about macro_rules! macros.

Macros, A Methodical Introduction

This chapter will introduce Rust's declarative Macro-By-Example system by explaining the system as a whole. It will do so by first going into the construct's syntax and its key parts and then following it up with more general information that one should at least be aware of.

macro_rules!

With all that in mind, we can introduce macro_rules! itself. As noted previously, macro_rules! is itself a syntax extension, meaning it is technically not part of the Rust syntax. It uses the following forms:

macro_rules! $name {
    $rule0 ;
    $rule1 ;
    // …
    $ruleN ;
}

There must be at least one rule, and you can omit the semicolon after the last rule. You can use brackets([]), parentheses(()) or braces({}).

Each "rule" looks like the following:

    ($matcher) => {$expansion}

Like before, the types of parentheses used can be any kind, but parentheses around the matcher and braces around the expansion are somewhat conventional. The expansion part of a rule is also called its transcriber.

Note that the choice of the parentheses does not matter in regards to how the mbe macro may be invoked. In fact, function-like macros can be invoked with any kind of parentheses as well, but invocations with { .. } and ( ... );, notice the trailing semicolon, are special in that their expansion will always be parsed as an item.

If you are wondering, the macro_rules! invocation expands to... nothing. At least, nothing that appears in the AST; rather, it manipulates compiler-internal structures to register the mbe macro. As such, you can technically use macro_rules! in any position where an empty expansion is valid.

Matching

When a macro_rules! macro is invoked, the macro_rules! interpreter goes through the rules one by one, in declaration order. For each rule, it tries to match the contents of the input token tree against that rule's matcher. A matcher must match the entirety of the input to be considered a match.

If the input matches the matcher, the invocation is replaced by the expansion; otherwise, the next rule is tried. If all rules fail to match, the expansion fails with an error.

The simplest example is of an empty matcher:

macro_rules! four {
    () => { 1 + 3 };
}

This matches if and only if the input is also empty (i.e. four!(), four![] or four!{}).

Note that the specific grouping tokens you use when you invoke the function-like macro are not matched, they are in fact not passed to the invocation at all. That is, you can invoke the above macro as four![] and it will still match. Only the contents of the input token tree are considered.

Matchers can also contain literal token trees, which must be matched exactly. This is done by simply writing the token trees normally. For example, to match the sequence 4 fn ['spang "whammo"] @_@, you would write:

macro_rules! gibberish {
    (4 fn ['spang "whammo"] @_@) => {...};
}

You can use any token tree that you can write.

Metavariables

Matchers can also contain captures. These allow input to be matched based on some general grammar category, with the result captured to a metavariable which can then be substituted into the output.

Captures are written as a dollar ($) followed by an identifier, a colon (:), and finally the kind of capture which is also called the fragment-specifier, which must be one of the following:

  • block: a block (i.e. a block of statements and/or an expression, surrounded by braces)
  • expr: an expression
  • ident: an identifier (this includes keywords)
  • item: an item, like a function, struct, module, impl, etc.
  • lifetime: a lifetime (e.g. 'foo, 'static, ...)
  • literal: a literal (e.g. "Hello World!", 3.14, '🦀', ...)
  • meta: a meta item; the things that go inside the #[...] and #![...] attributes
  • pat: a pattern
  • path: a path (e.g. foo, ::std::mem::replace, transmute::<_, int>, …)
  • stmt: a statement
  • tt: a single token tree
  • ty: a type
  • vis: a possible empty visibility qualifier (e.g. pub, pub(in crate), ...)

For more in-depth description of the fragment specifiers, check out the Fragment Specifiers chapter.

For example, here is a macro_rules! macro which captures its input as an expression under the metavariable $e:

macro_rules! one_expression {
    ($e:expr) => {...};
}

These metavariables leverage the Rust compiler's parser, ensuring that they are always "correct". An expr metavariables will always capture a complete, valid expression for the version of Rust being compiled.

You can mix literal token trees and metavariables, within limits (explained in Metavariables and Expansion Redux).

To refer to a metavariable you simply write $name, as the type of the variable is already specified in the matcher. For example:

macro_rules! times_five {
    ($e:expr) => { 5 * $e };
}

Much like macro expansion, metavariables are substituted as complete AST nodes. This means that no matter what sequence of tokens is captured by $e, it will be interpreted as a single, complete expression.

You can also have multiple metavariables in a single matcher:

macro_rules! multiply_add {
    ($a:expr, $b:expr, $c:expr) => { $a * ($b + $c) };
}

And use them as often as you like in the expansion:

macro_rules! discard {
    ($e:expr) => {};
}
macro_rules! repeat {
    ($e:expr) => { $e; $e; $e; };
}

There is also a special metavariable called $crate which can be used to refer to the current crate.

Repetitions

Matchers can contain repetitions. These allow a sequence of tokens to be matched. These have the general form $ ( ... ) sep rep.

  • $ is a literal dollar token.

  • ( ... ) is the paren-grouped matcher being repeated.

  • sep is an optional separator token. It may not be a delimiter or one of the repetition operators. Common examples are , and ;.

  • rep is the required repeat operator. Currently, this can be:

    • ?: indicating at most one repetition
    • *: indicating zero or more repetitions
    • +: indicating one or more repetitions

    Since ? represents at most one occurrence, it cannot be used with a separator.

Repetitions can contain any other valid matcher, including literal token trees, metavariables, and other repetitions allowing arbitrary nesting.

Repetitions use the same syntax in the expansion and repeated metavariables can only be accessed inside of repetitions in the expansion.

For example, below is a mbe macro which formats each element as a string. It matches zero or more comma-separated expressions and expands to an expression that constructs a vector.

macro_rules! vec_strs {
    (
        // Start a repetition:
        $(
            // Each repeat must contain an expression...
            $element:expr
        )
        // ...separated by commas...
        ,
        // ...zero or more times.
        *
    ) => {
        // Enclose the expansion in a block so that we can use
        // multiple statements.
        {
            let mut v = Vec::new();

            // Start a repetition:
            $(
                // Each repeat will contain the following statement, with
                // $element replaced with the corresponding expression.
                v.push(format!("{}", $element));
            )*

            v
        }
    };
}

fn main() {
    let s = vec_strs![1, "a", true, 3.14159f32];
    assert_eq!(s, &["1", "a", "true", "3.14159"]);
}

You can repeat multiple metavariables in a single repetition as long as all metavariables repeat equally often. So this invocation of the following macro works:


#![allow(unused)]
fn main() {
macro_rules! repeat_two {
    ($($i:ident)*, $($i2:ident)*) => {
        $( let $i: (); let $i2: (); )*
    }
}

repeat_two!( a b c d e f, u v w x y z );
}

But this does not:


#![allow(unused)]
fn main() {
macro_rules! repeat_two {
    ($($i:ident)*, $($i2:ident)*) => {
        $( let $i: (); let $i2: (); )*
    }
}

repeat_two!( a b c d e f, x y z );
}

failing with the following error

error: meta-variable `i` repeats 6 times, but `i2` repeats 3 times
 --> src/main.rs:6:10
  |
6 |         $( let $i: (); let $i2: (); )*
  |          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 

For the complete grammar definition you may want to consult the Macros By Example chapter of the Rust reference.

Minutiae

This section goes through some of the finer details of the macro_rules! system. At a minimum, you should try to be at least aware of these details and issues.

Fragment Specifiers

As shown in the macro_rules chapter, Rust, as of 1.53, has 13 fragment specifiers. This section will go a bit more into detail for some of them and shows a few example inputs of what each matcher matches.

Note: Capturing with anything but the ident, lifetime and tt fragments will render the captured AST opaque, making it impossible to further match it with other fragment specifiers in future macro invocations.

block

The block fragment solely matches a block expression, which consists of an opening { brace, followed by any number of statements and finally followed by a closing } brace.

macro_rules! blocks {
    ($($block:block)*) => ();
}

blocks! {
    {}
    {
        let zig;
    }
    { 2 }
}
fn main() {}

expr

The expr fragment matches any kind of expression(Rust has a lot of them, given it is an expression orientated language).

macro_rules! expressions {
    ($($expr:expr)*) => ();
}

expressions! {
    "literal"
    funcall()
    future.await
    break 'foo bar
}
fn main() {}

ident

The ident fragment matches an identifier or keyword.

macro_rules! idents {
    ($($ident:ident)*) => ();
}

idents! {
    // _ <- This is not an ident, it is a pattern
    foo
    async
    O_________O
    _____O_____
}
fn main() {}

item

The item fragment simply matches any of Rust's item definitions, not identifiers that refer to items. This includes visibility modifiers.

macro_rules! items {
    ($($item:item)*) => ();
}

items! {
    struct Foo;
    enum Bar {
        Baz
    }
    impl Foo {}
    pub use crate::foo;
    /*...*/
}
fn main() {}

lifetime

The lifetime fragment matches a lifetime or label. It's quite similar to ident but with a prepended '.

macro_rules! lifetimes {
    ($($lifetime:lifetime)*) => ();
}

lifetimes! {
    'static
    'shiv
    '_
}
fn main() {}

literal

The literal fragment matches any literal expression.

macro_rules! literals {
    ($($literal:literal)*) => ();
}

literals! {
    -1
    "hello world"
    2.3
    b'b'
    true
}
fn main() {}

meta

The meta fragment matches an attribute, to be more precise, the contents of an attribute. That is, it will match a simple path, one without generic arguments followed by a delimited token tree or an = followed by a literal expression.

Note: You will usually see this fragment being used in a matcher like #[$meta:meta] or #![$meta:meta] to actually capture an attribute.

macro_rules! metas {
    ($($meta:meta)*) => ();
}

metas! {
    ASimplePath
    super::man
    path = "home"
    foo(bar)
}
fn main() {}

Doc-Comment Fact: Doc-Comments like /// ... and !// ... are actually syntax sugar for attributes! They desugar to #[doc="..."] and #![doc="..."] respectively, meaning you can match on them like with attributes!

pat

The pat fragment matches any kind of pattern, except or-patterns.

macro_rules! patterns {
    ($($pat:pat)*) => ();
}

patterns! {
    "literal"
    _
    0..5
    ref mut PatternsAreNice
}
fn main() {}

path

The path fragment matches a so called TypePath style path. This includes the function style trait forms, Fn() -> ().

macro_rules! paths {
    ($($path:path)*) => ();
}

paths! {
    ASimplePath
    ::A::B::C::D
    G::<eneri>::C
    FnMut(u32) -> ()
}
fn main() {}

stmt

The statement fragment solely matches a statement without its trailing semicolon, unless its an item statement that requires one.

What would be an item statement that requires one? A Unit-Struct would be a simple one, as defining one requires a trailing semicolon.

Let's use a simple example to show exactly what is meant with this. We use a macro that merely emits what it captures:

macro_rules! statements {
    ($($stmt:stmt)*) => ($($stmt)*);
}

fn main() {
    statements! {
        struct Foo;
        fn foo() {}
        let zig = 3
        let zig = 3;
        3
        3;
        if true {} else {}
        {}
    }
}

Expanding this, via the playground for example1, gives us roughly the following:

/* snip */

fn main() {
    struct Foo;
    fn foo() { }
    let zig = 3;
    let zig = 3;
    ;
    3;
    3;
    ;
    if true { } else { }
    { }
}

From this we can tell a few things:

The first you should be able to see immediately is that while the stmt fragment doesn't capture trailing semicolons, it still emits them when required, even if the statement is already followed by one. The simple reason for that is that semicolons on their own are already valid statements which the fragment captures eagerly. So our macro isn't capturing 8 times, but 11! This can be important when doing multiples repetitions and expanding these in one repetition expansion, as the repetition numbers have to match in those cases.

Another thing you should be able to notice here is that the trailing semicolon of the struct Foo; item statement is being matched, otherwise we would've seen an extra one like in the other cases. This makes sense as we already said, that for item statements that require one, the trailing semicolon will be matched with.

A last observation is that expressions get emitted back with a trailing semicolon, unless the expression solely consists of only a block expression or control flow expression.

The fine details of what was just mentioned here can be looked up in the reference.

Fortunately, these fine details here are usually not of importance whatsoever, with the small exception that was mentioned earlier in regards to repetitions which by itself shouldn't be a common problem to run into.

1

See the debugging chapter for tips on how to do this.

tt

The tt fragment matches a TokenTree. If you need a refresher on what exactly a TokenTree was you may want to revisit the TokenTree chapter of this book. The tt fragment is one of the most powerful fragments, as it can match nearly anything while still allowing you to inspect the contents of it at a later state in the macro.

This allows one to make use of very powerful patterns like the tt-muncher or the push-down-accumulator.

ty

The ty fragment matches any kind of type expression.

macro_rules! types {
    ($($type:ty)*) => ();
}

types! {
    foo::bar
    bool
    [u8]
    impl IntoIterator<Item = u32>
}
fn main() {}

vis

The vis fragment matches a possibly empty Visibility qualifier. Emphasis lies on the possibly empty part. You can kind of think of this fragment as having an implicit ? repetition to it, meaning you don't, and in fact cannot, wrap it in a direct repetition, though this only applies for matching. When expanding this fragment you expand it without a $(...)? repetition block.

macro_rules! visibilities {
    //         ∨~~Note this comma, since we cannot repeat a `vis` fragment on its own
    ($($vis:vis,)*) => ();
}

visibilities! {
    , // no vis is fine, as its implicitly
    pub,
    pub(crate),
    pub(in super),
    pub(in some_path),
}
fn main() {}

Metavariables and Expansion Redux

Once the parser begins consuming tokens for a metavariable, it cannot stop or backtrack. This means that the second rule of the following macro cannot ever match, no matter what input is provided:

macro_rules! dead_rule {
    ($e:expr) => { ... };
    ($i:ident +) => { ... };
}

Consider what happens if this macro is invoked as dead_rule!(x+). The interpreter will start at the first rule, and attempt to parse the input as an expression. The first token x is valid as an expression. The second token is also valid in an expression, forming a binary addition node.

At this point, given that there is no right-hand side of the addition, you might expect the parser to give up and try the next rule. Instead, the parser will panic and abort the entire compilation, citing a syntax error.

As such, it is important in general that you write macro rules from most-specific to least-specific.

To defend against future syntax changes altering the interpretation of macro input, macro_rules! restricts what can follow various metavariables. The complete list, showing what may follow what fragment specifier, as of Rust 1.46 is as follows:

  • stmt and expr: =>, ,, or ;
  • pat: =>, ,, =, |, if, in
  • path and ty:=>, ,, =, |, ;, :, >, >>, [, {, as, where, or a macro variable of the block fragment specifier.
  • vis: ,, an identifier other than a non-raw priv, any token that can begin a type or a metavariable with an ident, ty, or path fragment specifier.
  • All other fragment specifiers have no restrictions.

Repetitions also adhere to these restrictions, meaning if a repetition can repeat multiple times(* or +), then the contents must be able to follow themselves. If a repetition can repeat zero times (? or *) then what comes after the repetition must be able to follow what comes before.

The parser also does not perform any kind of lookahead. That means if the compiler cannot unambiguously determine how to parse the macro invocation one token at a time, it will abort with an ambiguity error. A simple example that triggers this:


#![allow(unused)]
fn main() {
macro_rules! ambiguity {
    ($($i:ident)* $i2:ident) => { };
}

// error:
//    local ambiguity: multiple parsing options: built-in NTs ident ('i') or ident ('i2').
ambiguity!(an_identifier);
}

The parser does not look ahead past the identifier to see if the following token is a ), which would allow it to parse properly.

One aspect of substitution that often surprises people is that substitution is not token-based, despite very much looking like it.

Consider the following:

macro_rules! capture_then_match_tokens {
    ($e:expr) => {match_tokens!($e)};
}

macro_rules! match_tokens {
    ($a:tt + $b:tt) => {"got an addition"};
    (($i:ident)) => {"got an identifier"};
    ($($other:tt)*) => {"got something else"};
}

fn main() {
    println!("{}\n{}\n{}\n",
        match_tokens!((caravan)),
        match_tokens!(3 + 6),
        match_tokens!(5));
    println!("{}\n{}\n{}",
        capture_then_match_tokens!((caravan)),
        capture_then_match_tokens!(3 + 6),
        capture_then_match_tokens!(5));
}

The output is:

got an identifier
got an addition
got something else

got something else
got something else
got something else

By parsing the input into an AST node, the substituted result becomes un-destructible; i.e. you cannot examine the contents or match against it ever again.

Here is another example which can be particularly confusing:

macro_rules! capture_then_what_is {
    (#[$m:meta]) => {what_is!(#[$m])};
}

macro_rules! what_is {
    (#[no_mangle]) => {"no_mangle attribute"};
    (#[inline]) => {"inline attribute"};
    ($($tts:tt)*) => {concat!("something else (", stringify!($($tts)*), ")")};
}

fn main() {
    println!(
        "{}\n{}\n{}\n{}",
        what_is!(#[no_mangle]),
        what_is!(#[inline]),
        capture_then_what_is!(#[no_mangle]),
        capture_then_what_is!(#[inline]),
    );
}

The output is:

no_mangle attribute
inline attribute
something else (#[no_mangle])
something else (#[inline])

The only way to avoid this is to capture using the tt, ident or lifetime kinds. Once you capture with anything else, the only thing you can do with the result from then on is substitute it directly into the output.

Hygiene

macro_rules! macros in Rust are partially hygienic, also called mixed hygiene. Specifically, they are hygienic when it comes to local variables, labels and $crate, but nothing else.

Hygiene works by attaching an invisible "syntax context" value to all identifiers. When two identifiers are compared, both the identifiers' textual names and syntax contexts must be identical for the two to be considered equal.

To illustrate this, consider the following code:

macro_rules! using_a {
    ($e:expr) => {
        {
            let a = 42;
            $e
        }
    }
}

let four = using_a!(a / 10);

We will use the background colour to denote the syntax context. Now, let's expand the macro invocation:

let four = {
    let a = 42;
    a / 10
};

First, recall that macro_rules! invocations effectively disappear during expansion.

Second, if you attempt to compile this code, the compiler will respond with something along the following lines:

error[E0425]: cannot find value `a` in this scope
  --> src/main.rs:13:21
   |
13 | let four = using_a!(a / 10);
   |                     ^ not found in this scope

Note that the background colour (i.e. syntax context) for the expanded macro changes as part of expansion. Each macro_rules! macro expansion is given a new, unique syntax context for its contents. As a result, there are two different as in the expanded code: one in the first syntax context, the second in the other. In other words, a is not the same identifier as a, however similar they may appear.

That said, tokens that were substituted into the expanded output retain their original syntax context (by virtue of having been provided to the macro as opposed to being part of the macro itself). Thus, the solution is to modify the macro as follows:

macro_rules! using_a {
    ($a:ident, $e:expr) => {
        {
            let $a = 42;
            $e
        }
    }
}

let four = using_a!(a, a / 10);

Which, upon expansion becomes:

let four = {
    let a = 42;
    a / 10
};

The compiler will accept this code because there is only one a being used.

$crate

Hygiene is also the reason that we need the $crate metavariable when our macro needs access to other items in the defining crate. What this special metavariable does is that it expands to an absolute path to the defining crate.

//// Definitions in the `helper_macro` crate.
#[macro_export]
macro_rules! helped {
    // () => { helper!() } // This might lead to an error due to 'helper' not being in scope.
    () => { $crate::helper!() }
}

#[macro_export]
macro_rules! helper {
    () => { () }
}

//// Usage in another crate.
// Note that `helper_macro::helper` is not imported!
use helper_macro::helped;

fn unit() {
   // but it still works due to `$crate` properly expanding to the crate path `helper_macro`
   helped!();
}

Note that, because $crate refers to the current crate, it must be used with a fully qualified module path when referring to non-macro items:


#![allow(unused)]
fn main() {
pub mod inner {
    #[macro_export]
    macro_rules! call_foo {
        () => { $crate::inner::foo() };
    }

    pub fn foo() {}
}
}

Non-Identifier Identifiers

There are two tokens which you are likely to run into eventually that look like identifiers, but aren't. Except when they are.

First is self. This is very definitely a keyword. However, it also happens to fit the definition of an identifier. In regular Rust code, there's no way for self to be interpreted as an identifier, but it can happen with macro_rules! macros:

macro_rules! what_is {
    (self) => {"the keyword `self`"};
    ($i:ident) => {concat!("the identifier `", stringify!($i), "`")};
}

macro_rules! call_with_ident {
    ($c:ident($i:ident)) => {$c!($i)};
}

fn main() {
    println!("{}", what_is!(self));
    println!("{}", call_with_ident!(what_is(self)));
}

The above outputs:

the keyword `self`
the keyword `self`

But that makes no sense; call_with_ident! required an identifier, matched one, and substituted it! So self is both a keyword and not a keyword at the same time. You might wonder how this is in any way important. Take this example:

macro_rules! make_mutable {
    ($i:ident) => {let mut $i = $i;};
}

struct Dummy(i32);

impl Dummy {
    fn double(self) -> Dummy {
        make_mutable!(self);
        self.0 *= 2;
        self
    }
}

fn main() {
    println!("{:?}", Dummy(4).double().0);
}

This fails to compile with:

error: `mut` must be followed by a named binding
 --> src/main.rs:2:24
  |
2 |     ($i:ident) => {let mut $i = $i;};
  |                        ^^^^^^ help: remove the `mut` prefix: `self`
...
9 |         make_mutable!(self);
  |         -------------------- in this macro invocation
  |
  = note: `mut` may be followed by `variable` and `variable @ pattern`

So the macro will happily match self as an identifier, allowing you to use it in cases where you can't actually use it. But, fine; it somehow remembers that self is a keyword even when it's an identifier, so you should be able to do this, right?

macro_rules! make_self_mutable {
    ($i:ident) => {let mut $i = self;};
}

struct Dummy(i32);

impl Dummy {
    fn double(self) -> Dummy {
        make_self_mutable!(mut_self);
        mut_self.0 *= 2;
        mut_self
    }
}

fn main() {
    println!("{:?}", Dummy(4).double().0);
}

This fails with:

error[E0424]: expected value, found module `self`
  --> src/main.rs:2:33
   |
2  |       ($i:ident) => {let mut $i = self;};
   |                                   ^^^^ `self` value is a keyword only available in methods with a `self` parameter
...
8  | /     fn double(self) -> Dummy {
9  | |         make_self_mutable!(mut_self);
   | |         ----------------------------- in this macro invocation
10 | |         mut_self.0 *= 2;
11 | |         mut_self
12 | |     }
   | |_____- this function has a `self` parameter, but a macro invocation can only access identifiers it receives from parameters
   |

Now the compiler thinks we refer to our module with self, but that doesn't make sense. We already have a self right there, in the function signature which is definitely not a module. It's almost like it's complaining that the self it's trying to use isn't the same self... as though the self keyword has hygiene, like an... identifier.

macro_rules! double_method {
    ($body:expr) => {
        fn double(mut self) -> Dummy {
            $body
        }
    };
}

struct Dummy(i32);

impl Dummy {
    double_method! {{
        self.0 *= 2;
        self
    }}
}

fn main() {
    println!("{:?}", Dummy(4).double().0);
}

Same error. What about...

macro_rules! double_method {
    ($self_:ident, $body:expr) => {
        fn double(mut $self_) -> Dummy {
            $body
        }
    };
}

struct Dummy(i32);

impl Dummy {
    double_method! {self, {
        self.0 *= 2;
        self
    }}
}

fn main() {
    println!("{:?}", Dummy(4).double().0);
}

At last, this works. So self is both a keyword and an identifier when it feels like it. Surely this works for other, similar constructs, right?

macro_rules! double_method {
    ($self_:ident, $body:expr) => {
        fn double($self_) -> Dummy {
            $body
        }
    };
}

struct Dummy(i32);

impl Dummy {
    double_method! {_, 0}
}

fn main() {
    println!("{:?}", Dummy(4).double().0);
}
error: no rules expected the token `_`
  --> src/main.rs:12:21
   |
1  | macro_rules! double_method {
   | -------------------------- when calling this macro
...
12 |     double_method! {_, 0}
   |                     ^ no rules expected this token in macro call

No, of course not. _ is a keyword that is valid in patterns and expressions, but somehow isn't an identifier like the keyword self is, despite matching the definition of an identifier just the same.

You might think you can get around this by using $self_:pat instead; that way, _ will match! Except, no, because self isn't a pattern. Joy.

The only work around for this (in cases where you want to accept some combination of these tokens) is to use a tt matcher instead.

Debugging

Note: This is a list of debugging tools specifically tailored towards declarative macros, additional means of debugging these can be found in the debugging chapter of syntax extensions.

One of the most useful is trace_macros!, which is a directive to the compiler instructing it to dump every macro_rules! macro invocation prior to expansion. For example, given the following:

// Note: make sure to use a nightly channel compiler.
#![feature(trace_macros)]

macro_rules! each_tt {
    () => {};
    ($_tt:tt $($rest:tt)*) => {each_tt!($($rest)*);};
}

each_tt!(foo bar baz quux);
trace_macros!(true);
each_tt!(spim wak plee whum);
trace_macros!(false);
each_tt!(trom qlip winp xod);

fn main() {}

The output is:

note: trace_macro
  --> src/main.rs:11:1
   |
11 | each_tt!(spim wak plee whum);
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: expanding `each_tt! { spim wak plee whum }`
   = note: to `each_tt ! (wak plee whum) ;`
   = note: expanding `each_tt! { wak plee whum }`
   = note: to `each_tt ! (plee whum) ;`
   = note: expanding `each_tt! { plee whum }`
   = note: to `each_tt ! (whum) ;`
   = note: expanding `each_tt! { whum }`
   = note: to `each_tt ! () ;`
   = note: expanding `each_tt! {  }`
   = note: to ``

This is particularly invaluable when debugging deeply recursive macro_rules! macros. You can also enable this from the command-line by adding -Z trace-macros to the compiler command line.

Secondly, there is log_syntax! which causes the compiler to output all tokens passed to it. For example, this makes the compiler sing a song:

// Note: make sure to use a nightly channel compiler.
#![feature(log_syntax)]

macro_rules! sing {
    () => {};
    ($tt:tt $($rest:tt)*) => {log_syntax!($tt); sing!($($rest)*);};
}

sing! {
    ^ < @ < . @ *
    '\x08' '{' '"' _ # ' '
    - @ '$' && / _ %
    ! ( '\t' @ | = >
    ; '\x08' '\'' + '$' ? '\x7f'
    , # '"' ~ | ) '\x07'
}

fn main() {}

This can be used to do slightly more targeted debugging than trace_macros!.

Another amazing tool is lukaslueg's macro_railroad, a tool that allows you visualize and generate syntax diagrams for Rust's macro_rules! macros. It visualizes the accepted macro's grammar as an automata.

Scoping

The way in which mbe macros are scoped can be somewhat unintuitive. They use two forms of scopes: textual scope, and path-based scope.

When such a macro is invoked by an unqualified identifier(an identifier that isn't part of a multi-segment path), it is first looked up in textual scoping and then in path-based scoping should the first lookup not yield any results. If it is invoked by a qualified identifier it will skip the textual scoping lookup and instead only do a look up in the path-based scoping.

Textual Scope

Firstly, unlike everything else in the language, function-like macros will remain visible in sub-modules.

macro_rules! X { () => {}; }
mod a {
    X!(); // defined
}
mod b {
    X!(); // defined
}
mod c {
    X!(); // defined
}
fn main() {}

Note: In these examples, remember that all of them have the same behavior when the module contents are in separate files.

Secondly, also unlike everything else in the language, macro_rules! macros are only accessible after their definition. Also note that this example demonstrates how macro_rules! macros do not "leak" out of their defining scope:

mod a {
    // X!(); // undefined
}
mod b {
    // X!(); // undefined
    macro_rules! X { () => {}; }
    X!(); // defined
}
mod c {
    // X!(); // undefined
}
fn main() {}

To be clear, this lexical order dependency applies even if you move the macro to an outer scope:

mod a {
    // X!(); // undefined
}
macro_rules! X { () => {}; }
mod b {
    X!(); // defined
}
mod c {
    X!(); // defined
}
fn main() {}

However, this dependency does not apply to macros themselves:

mod a {
    // X!(); // undefined
}
macro_rules! X { () => { Y!(); }; }
mod b {
    // X!(); // defined, but Y! is undefined
}
macro_rules! Y { () => {}; }
mod c {
    X!(); // defined, and so is Y!
}
fn main() {}

Defining macro_rules! macros multiple times is allowed and the most recent declaration will simply shadow previous ones unless it has gone out of scope.


#![allow(unused)]
fn main() {
macro_rules! X { (1) => {}; }
X!(1);
macro_rules! X { (2) => {}; }
// X!(1); // Error: no rule matches `1`
X!(2);

mod a {
    macro_rules! X { (3) => {}; }
    // X!(2); // Error: no rule matches `2`
    X!(3);
}
// X!(3); // Error: no rule matches `3`
X!(2);

}

macro_rules! macros can be exported from a module using the #[macro_use] attribute. Using this on a module is similar to saying that you do not want to have the module's macro's scope end with the module.

mod a {
    // X!(); // undefined
}
#[macro_use]
mod b {
    macro_rules! X { () => {}; }
    X!(); // defined
}
mod c {
    X!(); // defined
}
fn main() {}

Note that this can interact in somewhat bizarre ways due to the fact that identifiers in a macro_rules! macro (including other macros) are only resolved upon expansion:

mod a {
    // X!(); // undefined
}
#[macro_use]
mod b {
    macro_rules! X { () => { Y!(); }; }
    // X!(); // defined, but Y! is undefined
}
macro_rules! Y { () => {}; }
mod c {
    X!(); // defined, and so is Y!
}
fn main() {}

Another complication is that #[macro_use] applied to an extern crate does not behave this way: such declarations are effectively hoisted to the top of the module. Thus, assuming X! is defined in an external crate called mac, the following holds:

mod a {
    // X!(); // defined, but Y! is undefined
}
macro_rules! Y { () => {}; }
mod b {
    X!(); // defined, and so is Y!
}
#[macro_use] extern crate macs;
mod c {
    X!(); // defined, and so is Y!
}
fn main() {}

Finally, note that these scoping behaviors apply to functions as well, with the exception of #[macro_use] (which isn't applicable):

macro_rules! X {
    () => { Y!() };
}

fn a() {
    macro_rules! Y { () => {"Hi!"} }
    assert_eq!(X!(), "Hi!");
    {
        assert_eq!(X!(), "Hi!");
        macro_rules! Y { () => {"Bye!"} }
        assert_eq!(X!(), "Bye!");
    }
    assert_eq!(X!(), "Hi!");
}

fn b() {
    macro_rules! Y { () => {"One more"} }
    assert_eq!(X!(), "One more");
}

fn main() {
    a();
    b();
}

These scoping rules are why a common piece of advice is to place all macro_rules! macros which should be accessible "crate wide" at the very top of your root module, before any other modules. This ensures they are available consistently. This also applies to mod definitions for files, as in:

#[macro_use]
mod some_mod_that_defines_macros;
mod some_mod_that_uses_those_macros;

The order here is important, swap the declaration order and it won't compile.

Path-Based Scope

By default, a macro_rules! macro has no path-based scope. However, if it has the #[macro_export] attribute, then it is declared in the crate root scope and can be referred to similar to how you refer to any other item. The Import and Export chapter goes more in-depth into said attribute.

Import and Export

Importing macro_rules! macros differs between the two Rust Editions, 2015 and 2018. It is recommended to read both parts nevertheless, as the 2018 Edition can still use the constructs that are explained in the 2015 Edition.

Edition 2015

In Edition 2015 you have to use the #[macro_use] attribute that has already been introduced in the scoping chapter. This can be applied to either modules or external crates. For example:

#[macro_use]
mod macros {
    macro_rules! X { () => { Y!(); } }
    macro_rules! Y { () => {} }
}

X!();

fn main() {}

macro_rules! macros can be exported from the current crate using #[macro_export]. Note that this ignores all visibility.

Given the following definition for a library package macs:

mod macros {
    #[macro_export] macro_rules! X { () => { Y!(); } }
    #[macro_export] macro_rules! Y { () => {} }
}

// X! and Y! are *not* defined here, but *are* exported,
// despite `macros` being private.

The following code will work as expected:

X!(); // X is defined
#[macro_use] extern crate macs;
X!();

fn main() {}

This works, as said in the scoping chapter, because #[macro_use] works slightly different on extern crates, as it basically hoists the exported macros out of the crate to the top of the module.

Note: you can only #[macro_use] an external crate from the root module.

Finally, when importing macro_rules! macros from an external crate, you can control which macros you import. You can use this to limit namespace pollution, or to override specific macros, like so:

// Import *only* the `X!` macro.
#[macro_use(X)] extern crate macs;

// X!(); // X is defined, but Y! is undefined

macro_rules! Y { () => {} }

X!(); // X is defined, and so is Y!

fn main() {}

When exporting macro_rules! macros, it is often useful to refer to non-macro symbols in the defining crate. Because crates can be renamed, there is a special substitution variable available: $crate. This will always expand to an absolute path prefix to the containing crate(e.g. :: macs).

Note that unless your compiler version is >= 1.30, this does not work for macro_rules! macros, since macro_rules! macros do not interact with regular name resolution in any way. Otherwise, you cannot use something like $crate::Y! to refer to a particular macro within your crate. The implication, combined with selective imports via #[macro_use] is that there is currently no way to guarantee any given macro will be available when imported by another crate.

It is recommended that you always use absolute paths to non-macro names, to avoid conflicts, including names in the standard library.

Edition 2018

The 2018 Edition made our lives a lot easier when it comes to macro_rules! macros. Why you ask? Quite simply because it managed to make them feel more like proper items than some special thing in the language. What this means is that we can properly import and use them in a namespaced fashion!

So instead of using #[macro_use] to import every exported macro from a crate into the global namespace we can now do the following:

use some_crate::some_macro;

fn main() {
    some_macro!("hello");
    // as well as
    some_crate::some_other_macro!("macro world");
}

Unfortunately, this only applies for external crates, if you use macro_rules! macros that you have defined in your own crate you are still required to go with #[macro_use] on the defining modules. So scoping applies there the same way as before as well.

The $crate prefix works in this version for everything, macros and items alike since this Edition came out with Rust 1.31.

Macros, A Practical Introduction

This chapter will introduce Rust's declarative Macro-By-Example system using a relatively simple, practical example. It does not attempt to explain all of the intricacies of the system; its goal is to get you comfortable with how and why macros are written.

There is also the Macros chapter of the Rust Book which is another high-level explanation, and the methodical introduction chapter of this book, which explains the macro system in detail.

A Little Context

Note: don't panic! What follows is the only math that will be talked about. You can quite safely skip this section if you just want to get to the meat of the article.

If you aren't familiar, a recurrence relation is a sequence where each value is defined in terms of one or more previous values, with one or more initial values to get the whole thing started. For example, the Fibonacci sequence can be defined by the relation:

\[F_{n} = 0, 1, ..., F_{n-1} + F_{n-2}\]

Thus, the first two numbers in the sequence are 0 and 1, with the third being \( F_{0} + F_{1} = 0 + 1 = 1\), the fourth \( F_{1} + F_{2} = 1 + 1 = 2\), and so on forever.

Now, because such a sequence can go on forever, that makes defining a fibonacci function a little tricky, since you obviously don't want to try returning a complete vector. What you want is to return something which will lazily compute elements of the sequence as needed.

In Rust, that means producing an Iterator. This is not especially hard, but there is a fair amount of boilerplate involved: you need to define a custom type, work out what state needs to be stored in it, then implement the Iterator trait for it.

However, recurrence relations are simple enough that almost all of these details can be abstracted out with a little macro_rules! macro-based code generation.

So, with all that having been said, let's get started.

Construction

Usually, when working on a new macro_rules! macro, the first thing I do is decide what the invocation should look like. In this specific case, my first attempt looked like this:

let fib = recurrence![a[n] = 0, 1, ..., a[n-1] + a[n-2]];

for e in fib.take(10) { println!("{}", e) }

From that, we can take a stab at how the macro_rules! macro should be defined, even if we aren't sure of the actual expansion. This is useful because if you can't figure out how to parse the input syntax, then maybe you need to change it.

macro_rules! recurrence {
    ( a[n] = $($inits:expr),+ , ... , $recur:expr ) => { /* ... */ };
}
fn main() {}

Assuming you aren't familiar with the syntax, allow me to elucidate. This is defining a syntax extension, using the macro_rules! system, called recurrence!. This macro_rules! macro has a single parsing rule. That rule says the input to the invocation must match:

  • the literal token sequence a [ n ] =,
  • a repeating (the $( ... )) sequence, using , as a separator, and one or more (+) repeats of:
    • a valid expression captured into the metavariable inits ($inits:expr)
  • the literal token sequence , ... ,,
  • a valid expression captured into the metavariable recur ($recur:expr).

Finally, the rule says that if the input matches this rule, then the invocation should be replaced by the token sequence /* ... */.

It's worth noting that inits, as implied by the name, actually contains all the expressions that match in this position, not just the first or last. What's more, it captures them as a sequence as opposed to, say, irreversibly pasting them all together. Also note that you can do "zero or more" with a repetition by using * instead of + and even optional, "zero or one" with ?. There is no support for "zero or one" or more specific numbers of repetitions.

As an exercise, let's take the proposed input and feed it through the rule, to see how it is processed. The "Position" column will show which part of the syntax pattern needs to be matched against next, denoted by a "⌂". Note that in some cases, there might be more than one possible "next" element to match against. "Input" will contain all of the tokens that have not been consumed yet. inits and recur will contain the contents of those bindings.

Position Input inits recur
a[n] = $($inits:expr),+ , ... , $recur:expr a[n] = 0, 1, ..., a[n-1] + a[n-2]
a[n] = $($inits:expr),+ , ... , $recur:expr [n] = 0, 1, ..., a[n-1] + a[n-2]
a[n] = $($inits:expr),+ , ... , $recur:expr n] = 0, 1, ..., a[n-1] + a[n-2]
a[n] = $($inits:expr),+ , ... , $recur:expr ] = 0, 1, ..., a[n-1] + a[n-2]
a[n] = $($inits:expr),+ , ... , $recur:expr = 0, 1, ..., a[n-1] + a[n-2]
a[n] = $($inits:expr),+ , ... , $recur:expr 0, 1, ..., a[n-1] + a[n-2]
a[n] = $($inits:expr),+ , ... , $recur:expr 0, 1, ..., a[n-1] + a[n-2]
a[n] = $($inits:expr),+ , ... , $recur:expr ⌂ ⌂ , 1, ..., a[n-1] + a[n-2] 0
Note: there are two ⌂ here, because the next input token might match either the comma separator betweenem> elements in the repetition, or the comma after the repetition. The macro system will keep track of both possibilities, until it is able to decide which one to follow.
a[n] = $($inits:expr),+ , ... , $recur:expr ⌂ ⌂ 1, ..., a[n-1] + a[n-2] 0
a[n] = $($inits:expr),+ , ... , $recur:expr ⌂ ⌂ , ..., a[n-1] + a[n-2] 0, 1
Note: the third, crossed-out marker indicates that the macro system has, as a consequence of the last token consumed, eliminated one of the previous possible branches.
a[n] = $($inits:expr),+ , ... , $recur:expr ⌂ ⌂ ..., a[n-1] + a[n-2] 0, 1
a[n] = $($inits:expr),+ , ... , $recur:expr , a[n-1] + a[n-2] 0, 1
a[n] = $($inits:expr),+ , ... , $recur:expr a[n-1] + a[n-2] 0, 1
a[n] = $($inits:expr),+ , ... , $recur:expr 0, 1 a[n-1] + a[n-2]
Note: this particular step should make it clear that a binding like $recur:expr will consume an entire expression, using the compiler's knowledge of what constitutes a valid expression. As will be noted later, you can do this for other language constructs, too.

The key take-away from this is that the macro system will try to incrementally match the tokens provided as input to the macro against the provided rules. We'll come back to the "try" part.

Now, let's begin writing the final, fully expanded form. For this expansion, I was looking for something like:

let fib = {
    struct Recurrence {
        mem: [u64; 2],
        pos: usize,
    }

This will be the actual iterator type. mem will be the memo buffer to hold the last few values so the recurrence can be computed. pos is to keep track of the value of n.

Aside: I've chosen u64 as a "sufficiently large" type for the elements of this sequence. Don't worry about how this will work out for other sequences; we'll come to it.

    impl Iterator for Recurrence {
        type Item = u64;

        fn next(&mut self) -> Option<Self::Item> {
            if self.pos < 2 {
                let next_val = self.mem[self.pos];
                self.pos += 1;
                Some(next_val)

We need a branch to yield the initial values of the sequence; nothing tricky.

            } else {
                let a = /* something */;
                let n = self.pos;
                let next_val = a[n-1] + a[n-2];

                self.mem.TODO_shuffle_down_and_append(next_val);

                self.pos += 1;
                Some(next_val)
            }
        }
    }

This is a bit harder; we'll come back and look at how exactly to define a. Also, TODO_shuffle_down_and_append is another placeholder; I want something that places next_val on the end of the array, shuffling the rest down by one space, dropping the 0th element.


    Recurrence { mem: [0, 1], pos: 0 }
};

for e in fib.take(10) { println!("{}", e) }

Lastly, return an instance of our new structure, which can then be iterated over. To summarize, the complete expansion is:

let fib = {
    struct Recurrence {
        mem: [u64; 2],
        pos: usize,
    }

    impl Iterator for Recurrence {
        type Item = u64;

        fn next(&mut self) -> Option<u64> {
            if self.pos < 2 {
                let next_val = self.mem[self.pos];
                self.pos += 1;
                Some(next_val)
            } else {
                let a = /* something */;
                let n = self.pos;
                let next_val = (a[n-1] + a[n-2]);

                self.mem.TODO_shuffle_down_and_append(next_val.clone());

                self.pos += 1;
                Some(next_val)
            }
        }
    }

    Recurrence { mem: [0, 1], pos: 0 }
};

for e in fib.take(10) { println!("{}", e) }

Aside: Yes, this does mean we're defining a different Recurrence struct and its implementation for each invocation. Most of this will optimise away in the final binary.

It's also useful to check your expansion as you're writing it. If you see anything in the expansion that needs to vary with the invocation, but isn't in the actual accepted syntax of our macro, you should work out where to introduce it. In this case, we've added u64, but that's not necessarily what the user wants, nor is it in the macro syntax. So let's fix that.

macro_rules! recurrence {
    ( a[n]: $sty:ty = $($inits:expr),+ , ... , $recur:expr ) => { /* ... */ };
}

/*
let fib = recurrence![a[n]: u64 = 0, 1, ..., a[n-1] + a[n-2]];

for e in fib.take(10) { println!("{}", e) }
*/
fn main() {}

Here, I've added a new metavariable: sty which should be a type.

Aside: if you're wondering, the bit after the colon in a metavariable can be one of several kinds of syntax matchers. The most common ones are item, expr, and ty. A complete explanation can be found in Macros, A Methodical Introduction; macro_rules! (Matchers).

There's one other thing to be aware of: in the interests of future-proofing the language, the compiler restricts what tokens you're allowed to put after a matcher, depending on what kind it is. Typically, this comes up when trying to match expressions or statements; those can only be followed by one of =>, ,, and ;.

A complete list can be found in Macros, A Methodical Introduction; Minutiae; Metavariables and Expansion Redux.

Indexing and Shuffling

I will skim a bit over this part, since it's effectively tangential to the macro-related stuff. We want to make it so that the user can access previous values in the sequence by indexing a; we want it to act as a sliding window keeping the last few (in this case, 2) elements of the sequence.

We can do this pretty easily with a wrapper type:

struct IndexOffset<'a> {
    slice: &'a [u64; 2],
    offset: usize,
}

impl<'a> Index<usize> for IndexOffset<'a> {
    type Output = u64;

    fn index<'b>(&'b self, index: usize) -> &'b u64 {
        use std::num::Wrapping;

        let index = Wrapping(index);
        let offset = Wrapping(self.offset);
        let window = Wrapping(2);

        let real_index = index - offset + window;
        &self.slice[real_index.0]
    }
}

Aside: since lifetimes come up a lot with people new to Rust, a quick explanation: 'a and 'b are lifetime parameters that are used to track where a reference (i.e. a borrowed pointer to some data) is valid. In this case, IndexOffset borrows a reference to our iterator's data, so it needs to keep track of how long it's allowed to hold that reference for, using 'a.

'b is used because the Index::index function (which is how subscript syntax is actually implemented) is also parameterized on a lifetime, on account of returning a borrowed reference. 'a and 'b are not necessarily the same thing in all cases. The borrow checker will make sure that even though we don't explicitly relate 'a and 'b to one another, we don't accidentally violate memory safety.

This changes the definition of a to:

let a = IndexOffset { slice: &self.mem, offset: n };

The only remaining question is what to do about TODO_shuffle_down_and_append. I wasn't able to find a method in the standard library with exactly the semantics I wanted, but it isn't hard to do by hand.

{
    use std::mem::swap;

    let mut swap_tmp = next_val;
    for i in (0..2).rev() {
        swap(&mut swap_tmp, &mut self.mem[i]);
    }
}

This swaps the new value into the end of the array, swapping the other elements down one space.

Aside: doing it this way means that this code will work for non-copyable types, as well.

The working code thus far now looks like this:

macro_rules! recurrence {
    ( a[n]: $sty:ty = $($inits:expr),+ , ... , $recur:expr ) => { /* ... */ };
}

fn main() {
    /*
    let fib = recurrence![a[n]: u64 = 0, 1, ..., a[n-1] + a[n-2]];

    for e in fib.take(10) { println!("{}", e) }
    */
    let fib = {
        use std::ops::Index;

        struct Recurrence {
            mem: [u64; 2],
            pos: usize,
        }

        struct IndexOffset<'a> {
            slice: &'a [u64; 2],
            offset: usize,
        }

        impl<'a> Index<usize> for IndexOffset<'a> {
            type Output = u64;

            #[inline(always)]
            fn index<'b>(&'b self, index: usize) -> &'b u64 {
                use std::num::Wrapping;

                let index = Wrapping(index);
                let offset = Wrapping(self.offset);
                let window = Wrapping(2);

                let real_index = index - offset + window;
                &self.slice[real_index.0]
            }
        }

        impl Iterator for Recurrence {
            type Item = u64;

            #[inline]
            fn next(&mut self) -> Option<u64> {
                if self.pos < 2 {
                    let next_val = self.mem[self.pos];
                    self.pos += 1;
                    Some(next_val)
                } else {
                    let next_val = {
                        let n = self.pos;
                        let a = IndexOffset { slice: &self.mem, offset: n };
                        (a[n-1] + a[n-2])
                    };

                    {
                        use std::mem::swap;

                        let mut swap_tmp = next_val;
                        for i in (0..2).rev() {
                            swap(&mut swap_tmp, &mut self.mem[i]);
                        }
                    }

                    self.pos += 1;
                    Some(next_val)
                }
            }
        }

        Recurrence { mem: [0, 1], pos: 0 }
    };

    for e in fib.take(10) { println!("{}", e) }
}

Note that I've changed the order of the declarations of n and a, as well as wrapped them(along with the recurrence expression) in a block. The reason for the first should be obvious(n needs to be defined first so I can use it for a). The reason for the second is that the borrowed reference &self.mem will prevent the swaps later on from happening (you cannot mutate something that is aliased elsewhere). The block ensures that the &self.mem borrow expires before then.

Incidentally, the only reason the code that does the mem swaps is in a block is to narrow the scope in which std::mem::swap is available, for the sake of being tidy.

If we take this code and run it, we get:

0
1
1
2
3
5
8
13
21
34

Success! Now, let's copy & paste this into the macro expansion, and replace the expanded code with an invocation. This gives us:

macro_rules! recurrence {
    ( a[n]: $sty:ty = $($inits:expr),+ , ... , $recur:expr ) => {
        {
            /*
                What follows here is *literally* the code from before,
                cut and pasted into a new position. No other changes
                have been made.
            */

            use std::ops::Index;

            struct Recurrence {
                mem: [u64; 2],
                pos: usize,
            }

            struct IndexOffset<'a> {
                slice: &'a [u64; 2],
                offset: usize,
            }

            impl<'a> Index<usize> for IndexOffset<'a> {
                type Output = u64;

                fn index<'b>(&'b self, index: usize) -> &'b u64 {
                    use std::num::Wrapping;

                    let index = Wrapping(index);
                    let offset = Wrapping(self.offset);
                    let window = Wrapping(2);

                    let real_index = index - offset + window;
                    &self.slice[real_index.0]
                }
            }

            impl Iterator for Recurrence {
                type Item = u64;

                fn next(&mut self) -> Option<u64> {
                    if self.pos < 2 {
                        let next_val = self.mem[self.pos];
                        self.pos += 1;
                        Some(next_val)
                    } else {
                        let next_val = {
                            let n = self.pos;
                            let a = IndexOffset { slice: &self.mem, offset: n };
                            (a[n-1] + a[n-2])
                        };

                        {
                            use std::mem::swap;

                            let mut swap_tmp = next_val;
                            for i in (0..2).rev() {
                                swap(&mut swap_tmp, &mut self.mem[i]);
                            }
                        }

                        self.pos += 1;
                        Some(next_val)
                    }
                }
            }

            Recurrence { mem: [0, 1], pos: 0 }
        }
    };
}

fn main() {
    let fib = recurrence![a[n]: u64 = 0, 1, ..., a[n-1] + a[n-2]];

    for e in fib.take(10) { println!("{}", e) }
}

Obviously, we aren't using the metavariables yet, but we can change that fairly easily. However, if we try to compile this, rustc aborts, telling us:

error: local ambiguity: multiple parsing options: built-in NTs expr ('inits') or 1 other option.
  --> src/main.rs:75:45
   |
75 |     let fib = recurrence![a[n]: u64 = 0, 1, ..., a[n-1] + a[n-2]];
   |

Here, we've run into a limitation of the macro_rules system. The problem is that second comma. When it sees it during expansion, macro_rules can't decide if it's supposed to parse another expression for inits, or .... Sadly, it isn't quite clever enough to realise that ... isn't a valid expression, so it gives up. Theoretically, this should work as desired, but currently doesn't.

Aside: I did fib a little about how our rule would be interpreted by the macro system. In general, it should work as described, but doesn't in this case. The macro_rules machinery, as it stands, has its foibles, and its worthwhile remembering that on occasion, you'll need to contort a little to get it to work.

In this particular case, there are two issues. First, the macro system doesn't know what does and does not constitute the various grammar elements (e.g. an expression); that's the parser's job. As such, it doesn't know that ... isn't an expression. Secondly, it has no way of trying to capture a compound grammar element (like an expression) without 100% committing to that capture.

In other words, it can ask the parser to try and parse some input as an expression, but the parser will respond to any problems by aborting. The only way the macro system can currently deal with this is to just try to forbid situations where this could be a problem.

On the bright side, this is a state of affairs that exactly no one is enthusiastic about. The macro keyword has already been reserved for a more rigorously-defined future macro system. Until then, needs must.

Thankfully, the fix is relatively simple: we remove the comma from the syntax. To keep things balanced, we'll remove both commas around ...:

macro_rules! recurrence {
    ( a[n]: $sty:ty = $($inits:expr),+ ... $recur:expr ) => {
//                                     ^~~ changed
        /* ... */
        // Cheat :D
        (vec![0u64, 1, 2, 3, 5, 8, 13, 21, 34]).into_iter()
    };
}

fn main() {
    let fib = recurrence![a[n]: u64 = 0, 1 ... a[n-1] + a[n-2]];
//                                         ^~~ changed

    for e in fib.take(10) { println!("{}", e) }
}

Success! ... or so we thought. Turns out this is being rejected by the compiler nowadays, while it was fine back when this was written. The reason for this is that the compiler now recognizes the ... as a token, and as we know we may only use =>, , or ; after an expression fragment. So unfortunately we are now out of luck as our dreamed up syntax will not work out this way, so let us just choose one that looks the most befitting that we are allowed to use instead, I'd say replacing , with ; works.

macro_rules! recurrence {
    ( a[n]: $sty:ty = $($inits:expr),+ ; ... ; $recur:expr ) => {
//                                     ^~~~~~^ changed
        /* ... */
        // Cheat :D
        (vec![0u64, 1, 2, 3, 5, 8, 13, 21, 34]).into_iter()
    };
}

fn main() {
    let fib = recurrence![a[n]: u64 = 0, 1; ...; a[n-1] + a[n-2]];
//                                        ^~~~~^ changed

    for e in fib.take(10) { println!("{}", e) }
}

Success! But for real this time.

Substitution

Substituting something you've captured in a macro is quite simple; you can insert the contents of a metavariable $sty:ty by using $sty. So, let's go through and fix the u64s:

macro_rules! recurrence {
    ( a[n]: $sty:ty = $($inits:expr),+ ; ... ; $recur:expr ) => {
        {
            use std::ops::Index;

            struct Recurrence {
                mem: [$sty; 2],
//                    ^~~~ changed
                pos: usize,
            }

            struct IndexOffset<'a> {
                slice: &'a [$sty; 2],
//                          ^~~~ changed
                offset: usize,
            }

            impl<'a> Index<usize> for IndexOffset<'a> {
                type Output = $sty;
//                            ^~~~ changed

                #[inline(always)]
                fn index<'b>(&'b self, index: usize) -> &'b $sty {
//                                                          ^~~~ changed
                    use std::num::Wrapping;

                    let index = Wrapping(index);
                    let offset = Wrapping(self.offset);
                    let window = Wrapping(2);

                    let real_index = index - offset + window;
                    &self.slice[real_index.0]
                }
            }

            impl Iterator for Recurrence {
                type Item = $sty;
//                          ^~~~ changed

                #[inline]
                fn next(&mut self) -> Option<$sty> {
//                                           ^~~~ changed
                    /* ... */
                    if self.pos < 2 {
                        let next_val = self.mem[self.pos];
                        self.pos += 1;
                        Some(next_val)
                    } else {
                        let next_val = {
                            let n = self.pos;
                            let a = IndexOffset { slice: &self.mem, offset: n };
                            (a[n-1] + a[n-2])
                        };

                        {
                            use std::mem::swap;

                            let mut swap_tmp = next_val;
                            for i in (0..2).rev() {
                                swap(&mut swap_tmp, &mut self.mem[i]);
                            }
                        }

                        self.pos += 1;
                        Some(next_val)
                    }
                }
            }

            Recurrence { mem: [0, 1], pos: 0 }
        }
    };
}

fn main() {
    let fib = recurrence![a[n]: u64 = 0, 1; ...; a[n-1] + a[n-2]];

    for e in fib.take(10) { println!("{}", e) }
}

Let's tackle a harder one: how to turn inits into both the array literal [0, 1] and the array type, [$sty; 2]. The first one we can do like so:

            Recurrence { mem: [$($inits),+], pos: 0 }
//                             ^~~~~~~~~~~ changed

This effectively does the opposite of the capture: repeat inits one or more times, separating each with a comma. This expands to the expected sequence of tokens: 0, 1.

Somehow turning inits into a literal 2 is a little trickier. It turns out that there's no direct way to do this, but we can do it by using a second macro_rules! macro. Let's take this one step at a time.

macro_rules! count_exprs {
    /* ??? */
    () => {}
}
fn main() {}

The obvious case is: given zero expressions, you would expect count_exprs to expand to a literal 0.

macro_rules! count_exprs {
    () => (0);
//  ^~~~~~~~~~ added
}
fn main() {
    const _0: usize = count_exprs!();
    assert_eq!(_0, 0);
}

Aside: You may have noticed I used parentheses here instead of curly braces for the expansion. macro_rules really doesn't care what you use, so long as it's one of the "matcher" pairs: ( ), { } or [ ]. In fact, you can switch out the matchers on the macro itself(i.e. the matchers right after the macro name), the matchers around the syntax rule, and the matchers around the corresponding expansion.

You can also switch out the matchers used when you invoke a macro, but in a more limited fashion: a macro invoked as { ... } or ( ... ); will always be parsed as an item (i.e. like a struct or fn declaration). This is important when using macros in a function body; it helps disambiguate between "parse like an expression" and "parse like a statement".

What if you have one expression? That should be a literal 1.

macro_rules! count_exprs {
    () => (0);
    ($e:expr) => (1);
//  ^~~~~~~~~~~~~~~~~ added
}
fn main() {
    const _0: usize = count_exprs!();
    const _1: usize = count_exprs!(x);
    assert_eq!(_0, 0);
    assert_eq!(_1, 1);
}

Two?

macro_rules! count_exprs {
    () => (0);
    ($e:expr) => (1);
    ($e0:expr, $e1:expr) => (2);
//  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ added
}
fn main() {
    const _0: usize = count_exprs!();
    const _1: usize = count_exprs!(x);
    const _2: usize = count_exprs!(x, y);
    assert_eq!(_0, 0);
    assert_eq!(_1, 1);
    assert_eq!(_2, 2);
}

We can "simplify" this a little by re-expressing the case of two expressions recursively.

macro_rules! count_exprs {
    () => (0);
    ($e:expr) => (1);
    ($e0:expr, $e1:expr) => (1 + count_exprs!($e1));
//                           ^~~~~~~~~~~~~~~~~~~~~ changed
}
fn main() {
    const _0: usize = count_exprs!();
    const _1: usize = count_exprs!(x);
    const _2: usize = count_exprs!(x, y);
    assert_eq!(_0, 0);
    assert_eq!(_1, 1);
    assert_eq!(_2, 2);
}

This is fine since Rust can fold 1 + 1 into a constant value. What if we have three expressions?

macro_rules! count_exprs {
    () => (0);
    ($e:expr) => (1);
    ($e0:expr, $e1:expr) => (1 + count_exprs!($e1));
    ($e0:expr, $e1:expr, $e2:expr) => (1 + count_exprs!($e1, $e2));
//  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ added
}
fn main() {
    const _0: usize = count_exprs!();
    const _1: usize = count_exprs!(x);
    const _2: usize = count_exprs!(x, y);
    const _3: usize = count_exprs!(x, y, z);
    assert_eq!(_0, 0);
    assert_eq!(_1, 1);
    assert_eq!(_2, 2);
    assert_eq!(_3, 3);
}

Aside: You might be wondering if we could reverse the order of these rules. In this particular case, yes, but the macro system can sometimes be picky about what it is and is not willing to recover from. If you ever find yourself with a multi-rule macro that you swear should work, but gives you errors about unexpected tokens, try changing the order of the rules.

Hopefully, you can see the pattern here. We can always reduce the list of expressions by matching one expression, followed by zero or more expressions, expanding that into 1 + a count.

macro_rules! count_exprs {
    () => (0);
    ($head:expr) => (1);
    ($head:expr, $($tail:expr),*) => (1 + count_exprs!($($tail),*));
//  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ changed
}
fn main() {
    const _0: usize = count_exprs!();
    const _1: usize = count_exprs!(x);
    const _2: usize = count_exprs!(x, y);
    const _3: usize = count_exprs!(x, y, z);
    assert_eq!(_0, 0);
    assert_eq!(_1, 1);
    assert_eq!(_2, 2);
    assert_eq!(_3, 3);
}

JFTE: this is not the only, or even the best way of counting things. You may wish to peruse the Counting section later for a more efficient way.

With this, we can now modify recurrence to determine the necessary size of mem.

// added:
macro_rules! count_exprs {
    () => (0);
    ($head:expr) => (1);
    ($head:expr, $($tail:expr),*) => (1 + count_exprs!($($tail),*));
}

macro_rules! recurrence {
    ( a[n]: $sty:ty = $($inits:expr),+ ; ... ; $recur:expr ) => {
        {
            use std::ops::Index;

            const MEM_SIZE: usize = count_exprs!($($inits),+);
//          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ added

            struct Recurrence {
                mem: [$sty; MEM_SIZE],
//                          ^~~~~~~~ changed
                pos: usize,
            }

            struct IndexOffset<'a> {
                slice: &'a [$sty; MEM_SIZE],
//                                ^~~~~~~~ changed
                offset: usize,
            }

            impl<'a> Index<usize> for IndexOffset<'a> {
                type Output = $sty;

                #[inline(always)]
                fn index<'b>(&'b self, index: usize) -> &'b $sty {
                    use std::num::Wrapping;

                    let index = Wrapping(index);
                    let offset = Wrapping(self.offset);
                    let window = Wrapping(MEM_SIZE);
//                                        ^~~~~~~~ changed

                    let real_index = index - offset + window;
                    &self.slice[real_index.0]
                }
            }

            impl Iterator for Recurrence {
                type Item = $sty;

                #[inline]
                fn next(&mut self) -> Option<$sty> {
                    if self.pos < MEM_SIZE {
//                                ^~~~~~~~ changed
                        let next_val = self.mem[self.pos];
                        self.pos += 1;
                        Some(next_val)
                    } else {
                        let next_val = {
                            let n = self.pos;
                            let a = IndexOffset { slice: &self.mem, offset: n };
                            (a[n-1] + a[n-2])
                        };

                        {
                            use std::mem::swap;

                            let mut swap_tmp = next_val;
                            for i in (0..MEM_SIZE).rev() {
//                                       ^~~~~~~~ changed
                                swap(&mut swap_tmp, &mut self.mem[i]);
                            }
                        }

                        self.pos += 1;
                        Some(next_val)
                    }
                }
            }

            Recurrence { mem: [$($inits),+], pos: 0 }
        }
    };
}
/* ... */

fn main() {
    let fib = recurrence![a[n]: u64 = 0, 1; ...; a[n-1] + a[n-2]];

    for e in fib.take(10) { println!("{}", e) }
}

With that done, we can now substitute the last thing: the recur expression.

macro_rules! count_exprs {
    () => (0);
    ($head:expr $(, $tail:expr)*) => (1 + count_exprs!($($tail),*));
}
macro_rules! recurrence {
    ( a[n]: $sty:ty = $($inits:expr),+ ; ... ; $recur:expr ) => {
        {
            use std::ops::Index;

            const MEM_SIZE: usize = count_exprs!($($inits),+);
            struct Recurrence {
                mem: [$sty; MEM_SIZE],
                pos: usize,
            }
            struct IndexOffset<'a> {
                slice: &'a [$sty; MEM_SIZE],
                offset: usize,
            }
            impl<'a> Index<usize> for IndexOffset<'a> {
                type Output = $sty;

                #[inline(always)]
                fn index<'b>(&'b self, index: usize) -> &'b $sty {
                    use std::num::Wrapping;

                    let index = Wrapping(index);
                    let offset = Wrapping(self.offset);
                    let window = Wrapping(MEM_SIZE);

                    let real_index = index - offset + window;
                    &self.slice[real_index.0]
                }
            }
            impl Iterator for Recurrence {
              type Item = $sty;
/* ... */
                #[inline]
                fn next(&mut self) -> Option<u64> {
                    if self.pos < MEM_SIZE {
                        let next_val = self.mem[self.pos];
                        self.pos += 1;
                        Some(next_val)
                    } else {
                        let next_val = {
                            let n = self.pos;
                            let a = IndexOffset { slice: &self.mem, offset: n };
                            $recur
//                          ^~~~~~ changed
                        };
                        {
                            use std::mem::swap;
                            let mut swap_tmp = next_val;
                            for i in (0..MEM_SIZE).rev() {
                                swap(&mut swap_tmp, &mut self.mem[i]);
                            }
                        }
                        self.pos += 1;
                        Some(next_val)
                    }
                }
/* ... */
            }
            Recurrence { mem: [$($inits),+], pos: 0 }
        }
    };
}
fn main() {
    let fib = recurrence![a[n]: u64 = 1, 1; ...; a[n-1] + a[n-2]];
    for e in fib.take(10) { println!("{}", e) }
}

And, when we compile our finished macro_rules! macro...

error[E0425]: cannot find value `a` in this scope
  --> src/main.rs:68:50
   |
68 |     let fib = recurrence![a[n]: u64 = 1, 1; ...; a[n-1] + a[n-2]];
   |                                                  ^ not found in this scope

error[E0425]: cannot find value `n` in this scope
  --> src/main.rs:68:52
   |
68 |     let fib = recurrence![a[n]: u64 = 1, 1; ...; a[n-1] + a[n-2]];
   |                                                    ^ not found in this scope

error[E0425]: cannot find value `a` in this scope
  --> src/main.rs:68:59
   |
68 |     let fib = recurrence![a[n]: u64 = 1, 1; ...; a[n-1] + a[n-2]];
   |                                                           ^ not found in this scope

error[E0425]: cannot find value `n` in this scope
  --> src/main.rs:68:61
   |
68 |     let fib = recurrence![a[n]: u64 = 1, 1; ...; a[n-1] + a[n-2]];
   |                                                             ^ not found in this scope

... wait, what? That can't be right... let's check what the macro is expanding to.

$ rustc -Z unstable-options --pretty expanded recurrence.rs

The --pretty expanded argument tells rustc to perform macro expansion, then turn the resulting AST back into source code. Because this option isn't considered stable yet, we also need -Z unstable-options. The output (after cleaning up some formatting) is shown below; in particular, note the place in the code where $recur was substituted:

#![feature(no_std)]
#![no_std]
#[prelude_import]
use std::prelude::v1::*;
#[macro_use]
extern crate std as std;
fn main() {
    let fib = {
        use std::ops::Index;
        const MEM_SIZE: usize = 1 + 1;
        struct Recurrence {
            mem: [u64; MEM_SIZE],
            pos: usize,
        }
        struct IndexOffset<'a> {
            slice: &'a [u64; MEM_SIZE],
            offset: usize,
        }
        impl <'a> Index<usize> for IndexOffset<'a> {
            type Output = u64;
            #[inline(always)]
            fn index<'b>(&'b self, index: usize) -> &'b u64 {
                use std::num::Wrapping;
                let index = Wrapping(index);
                let offset = Wrapping(self.offset);
                let window = Wrapping(MEM_SIZE);
                let real_index = index - offset + window;
                &self.slice[real_index.0]
            }
        }
        impl Iterator for Recurrence {
            type Item = u64;
            #[inline]
            fn next(&mut self) -> Option<u64> {
                if self.pos < MEM_SIZE {
                    let next_val = self.mem[self.pos];
                    self.pos += 1;
                    Some(next_val)
                } else {
                    let next_val = {
                        let n = self.pos;
                        let a = IndexOffset{slice: &self.mem, offset: n,};
                        a[n - 1] + a[n - 2]
                    };
                    {
                        use std::mem::swap;
                        let mut swap_tmp = next_val;
                        {
                            let result =
                                match ::std::iter::IntoIterator::into_iter((0..MEM_SIZE).rev()) {
                                    mut iter => loop {
                                        match ::std::iter::Iterator::next(&mut iter) {
                                            ::std::option::Option::Some(i) => {
                                                swap(&mut swap_tmp, &mut self.mem[i]);
                                            }
                                            ::std::option::Option::None => break,
                                        }
                                    },
                                };
                            result
                        }
                    }
                    self.pos += 1;
                    Some(next_val)
                }
            }
        }
        Recurrence{mem: [0, 1], pos: 0,}
    };
    {
        let result =
            match ::std::iter::IntoIterator::into_iter(fib.take(10)) {
                mut iter => loop {
                    match ::std::iter::Iterator::next(&mut iter) {
                        ::std::option::Option::Some(e) => {
                            ::std::io::_print(::std::fmt::Arguments::new_v1(
                                {
                                    static __STATIC_FMTSTR: &'static [&'static str] = &["", "\n"];
                                    __STATIC_FMTSTR
                                },
                                &match (&e,) {
                                    (__arg0,) => [::std::fmt::ArgumentV1::new(__arg0, ::std::fmt::Display::fmt)],
                                }
                            ))
                        }
                        ::std::option::Option::None => break,
                    }
                },
            };
        result
    }
}

But that looks fine! If we add a few missing #![feature(...)] attributes and feed it to a nightly build of rustc, it even compiles! ... what?!

Aside: You can't compile the above with a non-nightly build of rustc. This is because the expansion of the println! macro depends on internal compiler details which are not publicly stabilized.

Being Hygienic

The issue here is that identifiers in Rust syntax extensions are hygienic. That is, identifiers from two different contexts cannot collide. To show the difference, let's take a simpler example.

macro_rules! using_a {
    ($e:expr) => {
        {
            let a = 42i;
            $e
        }
    }
}

let four = using_a!(a / 10);
fn main() {}

This macro simply takes an expression, then wraps it in a block with a variable a defined. We then use this as a round-about way of computing 4. There are actually two syntax contexts involved in this example, but they're invisible. So, to help with this, let's give each context a different colour. Let's start with the unexpanded code, where there is only a single context:

macro_rules! using_a {
    ($e:expr) => {
        {
            let a = 42;
            $e
        }
    }
}

let four = using_a!(a / 10);

Now, let's expand the invocation.

let four = {
    let a = 42;
    a / 10
};

As you can see, the a that's defined by the macro invocation is in a different context to the a we provided in our invocation. As such, the compiler treats them as completely different identifiers, even though they have the same lexical appearance.

This is something to be really careful of when working on macro_rules! macros, syntax extensions in general even: they can produce ASTs which will not compile, but which will compile if written out by hand, or dumped using --pretty expanded.

The solution to this is to capture the identifier with the appropriate syntax context. To do that, we need to again adjust our macro syntax. To continue with our simpler example:

macro_rules! using_a {
    ($a:ident, $e:expr) => {
        {
            let $a = 42;
            $e
        }
    }
}

let four = using_a!(a, a / 10);

This now expands to:

let four = {
    let a = 42;
    a / 10
};

Now, the contexts match, and the code will compile. We can make this adjustment to our recurrence! macro by explicitly capturing a and n. After making the necessary changes, we have:

macro_rules! count_exprs {
    () => (0);
    ($head:expr) => (1);
    ($head:expr, $($tail:expr),*) => (1 + count_exprs!($($tail),*));
}

macro_rules! recurrence {
    ( $seq:ident [ $ind:ident ]: $sty:ty = $($inits:expr),+ ; ... ; $recur:expr ) => {
//    ^~~~~~~~~~   ^~~~~~~~~~ changed
        {
            use std::ops::Index;

            const MEM_SIZE: usize = count_exprs!($($inits),+);

            struct Recurrence {
                mem: [$sty; MEM_SIZE],
                pos: usize,
            }

            struct IndexOffset<'a> {
                slice: &'a [$sty; MEM_SIZE],
                offset: usize,
            }

            impl<'a> Index<usize> for IndexOffset<'a> {
                type Output = $sty;

                #[inline(always)]
                fn index<'b>(&'b self, index: usize) -> &'b $sty {
                    use std::num::Wrapping;

                    let index = Wrapping(index);
                    let offset = Wrapping(self.offset);
                    let window = Wrapping(MEM_SIZE);

                    let real_index = index - offset + window;
                    &self.slice[real_index.0]
                }
            }

            impl Iterator for Recurrence {
                type Item = $sty;

                #[inline]
                fn next(&mut self) -> Option<$sty> {
                    if self.pos < MEM_SIZE {
                        let next_val = self.mem[self.pos];
                        self.pos += 1;
                        Some(next_val)
                    } else {
                        let next_val = {
                            let $ind = self.pos;
//                              ^~~~ changed
                            let $seq = IndexOffset { slice: &self.mem, offset: $ind };
//                              ^~~~ changed
                            $recur
                        };

                        {
                            use std::mem::swap;

                            let mut swap_tmp = next_val;
                            for i in (0..MEM_SIZE).rev() {
                                swap(&mut swap_tmp, &mut self.mem[i]);
                            }
                        }

                        self.pos += 1;
                        Some(next_val)
                    }
                }
            }

            Recurrence { mem: [$($inits),+], pos: 0 }
        }
    };
}

fn main() {
    let fib = recurrence![a[n]: u64 = 0, 1; ...; a[n-1] + a[n-2]];

    for e in fib.take(10) { println!("{}", e) }
}

And it compiles! Now, let's try with a different sequence.

macro_rules! count_exprs {
    () => (0);
    ($head:expr) => (1);
    ($head:expr, $($tail:expr),*) => (1 + count_exprs!($($tail),*));
}

macro_rules! recurrence {
    ( $seq:ident [ $ind:ident ]: $sty:ty = $($inits:expr),+ ; ... ; $recur:expr ) => {
        {
            use std::ops::Index;

            const MEM_SIZE: usize = count_exprs!($($inits),+);

            struct Recurrence {
                mem: [$sty; MEM_SIZE],
                pos: usize,
            }

            struct IndexOffset<'a> {
                slice: &'a [$sty; MEM_SIZE],
                offset: usize,
            }

            impl<'a> Index<usize> for IndexOffset<'a> {
                type Output = $sty;

                #[inline(always)]
                fn index<'b>(&'b self, index: usize) -> &'b $sty {
                    use std::num::Wrapping;

                    let index = Wrapping(index);
                    let offset = Wrapping(self.offset);
                    let window = Wrapping(MEM_SIZE);

                    let real_index = index - offset + window;
                    &self.slice[real_index.0]
                }
            }

            impl Iterator for Recurrence {
                type Item = $sty;

                #[inline]
                fn next(&mut self) -> Option<$sty> {
                    if self.pos < MEM_SIZE {
                        let next_val = self.mem[self.pos];
                        self.pos += 1;
                        Some(next_val)
                    } else {
                        let next_val = {
                            let $ind = self.pos;
                            let $seq = IndexOffset { slice: &self.mem, offset: $ind };
                            $recur
                        };

                        {
                            use std::mem::swap;

                            let mut swap_tmp = next_val;
                            for i in (0..MEM_SIZE).rev() {
                                swap(&mut swap_tmp, &mut self.mem[i]);
                            }
                        }

                        self.pos += 1;
                        Some(next_val)
                    }
                }
            }

            Recurrence { mem: [$($inits),+], pos: 0 }
        }
    };
}

fn main() {
for e in recurrence!(f[i]: f64 = 1.0; ...; f[i-1] * i as f64).take(10) {
    println!("{}", e)
}
}

Which gives us:

1
1
2
6
24
120
720
5040
40320
362880

Success!

Patterns

Parsing and expansion patterns.

Callbacks

macro_rules! call_with_larch {
    ($callback:ident) => { $callback!(larch) };
}

macro_rules! expand_to_larch {
    () => { larch };
}

macro_rules! recognize_tree {
    (larch) => { println!("#1, the Larch.") };
    (redwood) => { println!("#2, the Mighty Redwood.") };
    (fir) => { println!("#3, the Fir.") };
    (chestnut) => { println!("#4, the Horse Chestnut.") };
    (pine) => { println!("#5, the Scots Pine.") };
    ($($other:tt)*) => { println!("I don't know; some kind of birch maybe?") };
}

fn main() {
    recognize_tree!(expand_to_larch!());
    call_with_larch!(recognize_tree);
}

Due to the order that macros are expanded in, it is (as of Rust 1.2) impossible to pass information to a macro from the expansion of another macro. This can make modularizing macros very difficult.

An alternative is to use recursion and pass a callback. Here is a trace of the above example to demonstrate how this takes place:

recognize_tree! { expand_to_larch ! (  ) }
println! { "I don't know; some kind of birch maybe?" }
// ...

call_with_larch! { recognize_tree }
recognize_tree! { larch }
println! { "#1, the Larch." }
// ...

Using a tt repetition, one can also forward arbitrary arguments to a callback.

macro_rules! callback {
    ($callback:ident( $($args:tt)* )) => {
        $callback!( $($args)* )
    };
}

fn main() {
    callback!(callback(println("Yes, this *was* unnecessary.")));
}

You can, of course, insert additional tokens in the arguments as needed.

Incremental TT Munchers

macro_rules! mixed_rules {
    () => {};
    (trace $name:ident; $($tail:tt)*) => {
        {
            println!(concat!(stringify!($name), " = {:?}"), $name);
            mixed_rules!($($tail)*);
        }
    };
    (trace $name:ident = $init:expr; $($tail:tt)*) => {
        {
            let $name = $init;
            println!(concat!(stringify!($name), " = {:?}"), $name);
            mixed_rules!($($tail)*);
        }
    };
}

fn main() {
    let a = 42;
    let b = "Ho-dee-oh-di-oh-di-oh!";
    let c = (false, 2, 'c');
    mixed_rules!(
        trace a;
        trace b;
        trace c;
        trace b = "They took her where they put the crazies.";
        trace b;
    );
}

This pattern is perhaps the most powerful macro parsing technique available, allowing one to parse grammars of significant complexity.

A TT muncher is a recursive macro_rules! macro that works by incrementally processing its input one step at a time. At each step, it matches and removes (munches) some sequence of tokens from the start of its input, generates some intermediate output, then recurses on the input tail.

The reason for "TT" in the name specifically is that the unprocessed part of the input is always captured as $($tail:tt)*. This is done as a tt repetition is the only way to losslessly capture part of a macro's input.

The only hard restrictions on TT munchers are those imposed on the macro_rules! macro system as a whole:

  • You can only match against literals and grammar constructs which can be captured by macro_rules!.
  • You cannot match unbalanced groups.

It is important, however, to keep the macro recursion limit in mind. macro_rules! does not have any form of tail recursion elimination or optimization. It is recommended that, when writing a TT muncher, you make reasonable efforts to keep recursion as limited as possible. This can be done by adding additional rules to account for variation in the input (as opposed to recursion into an intermediate layer), or by making compromises on the input syntax to make using standard repetitions more tractable.

Internal Rules

#[macro_export]
macro_rules! foo {
    (@as_expr $e:expr) => {$e};

    ($($tts:tt)*) => {
        foo!(@as_expr $($tts)*)
    };
}

fn main() {
    assert_eq!(foo!(42), 42);
}

Internal rules can be used to unify multiple macro_rules! macros into one, or to make it easier to read and write TT Munchers by explicitly naming what rule you wish to call in a macro.

So why is it useful to unify multiple mbe macros into one? The main reasoning for this is how they are handled in the 2015 Edition of Rust due to macro_rules! macros not being namespaced in said edition. This gives one the troubles of having to re-export all the internal macro_rules! macros as well polluting the global macro namespace or even worse, macro name collisions with other crates. In short, it's quite a hassle. This fortunately isn't really a problem anymore nowadays with a rustc version >= 1.30, for more information consult the Import and Export chapter.

Nevertheless, let's talk about how we can unify multiple macro_rules! macros into one with this technique and what exactly this technique even is.

We have two macro_rules! macros, the common as_expr! macro and a foo macro that makes use of the first one:

#[macro_export]
macro_rules! as_expr { ($e:expr) => {$e} }

#[macro_export]
macro_rules! foo {
    ($($tts:tt)*) => {
        as_expr!($($tts)*)
    };
}

fn main() {
    assert_eq!(foo!(42), 42);
}

This is definitely not the nicest solution we could have for this macro, as it pollutes the global macro namespace as mentioned earlier. In this specific case as_expr is also a very simple macro that we only used once, so let's "embed" this macro in our foo macro with internal rules! To do so we simply prepend a new matcher for our macro consists of the matcher used in the as_expr macro, but with a small addition. We prepend a tokentree that makes it match only when specifically asked to. In this case we can for example use @as_expr, so our matcher becomes (@as_expr $e:expr) => {$e};. With this we get the macro that was defined at the very top of this page:

#[macro_export]
macro_rules! foo {
    (@as_expr $e:expr) => {$e};

    ($($tts:tt)*) => {
        foo!(@as_expr $($tts)*)
    };
}

fn main() {
    assert_eq!(foo!(42), 42);
}

You see how we embedded the as_expr macro in the foo one? All that changed is that instead of invoking the as_expr macro, we now invoke foo recursively but with a special token tree prepended to the arguments, foo!(@as_expr $($tts)*). If you look closely you might even see that this pattern can be combined quite nicely with TT Munchers!

The reason for using @ was that, as of Rust 1.2, the @ token is not used in prefix position; as such, it cannot conflict with anything. This reasoning became obsolete later on when in Rust 1.7 macro matchers got future proofed by emitting a warning to prevent certain tokens from being allowed to follow certain fragments1, which in Rust 1.12 became a hard-error. There other symbols or unique prefixes may be used as desired, but use of @ has started to become widespread, so using it may aid readers in understanding your macro.

Note: in the early days of Rust the @ token was previously used in prefix position to denote a garbage-collected pointer, back when the language used sigils to denote pointer types. Its only current purpose is for binding names to patterns. For this, however, it is used as an infix operator, and thus does not conflict with its use here.

Additionally, internal rules will often come before any "bare" rules, to avoid issues with macro_rules! incorrectly attempting to parse an internal invocation as something it cannot possibly be, such as an expression.

Push-down Accumulation


#![allow(unused)]
fn main() {
macro_rules! init_array {
    (@accum (0, $_e:expr) -> ($($body:tt)*))
        => {init_array!(@as_expr [$($body)*])};
    (@accum (1, $e:expr) -> ($($body:tt)*))
        => {init_array!(@accum (0, $e) -> ($($body)* $e,))};
    (@accum (2, $e:expr) -> ($($body:tt)*))
        => {init_array!(@accum (1, $e) -> ($($body)* $e,))};
    (@accum (3, $e:expr) -> ($($body:tt)*))
        => {init_array!(@accum (2, $e) -> ($($body)* $e,))};
    (@as_expr $e:expr) => {$e};
    [$e:expr; $n:tt] => {
        {
            let e = $e;
            init_array!(@accum ($n, e.clone()) -> ())
        }
    };
}

let strings: [String; 3] = init_array![String::from("hi!"); 3];
assert_eq!(format!("{:?}", strings), "[\"hi!\", \"hi!\", \"hi!\"]");
}

All syntax extensions in Rust must result in a complete, supported syntax element (such as an expression, item, etc.). This means that it is impossible to have a syntax extension expand to a partial construct.

One might hope that the above example could be more directly expressed like so:

macro_rules! init_array {
    (@accum 0, $_e:expr) => {/* empty */};
    (@accum 1, $e:expr) => {$e};
    (@accum 2, $e:expr) => {$e, init_array!(@accum 1, $e)};
    (@accum 3, $e:expr) => {$e, init_array!(@accum 2, $e)};
    [$e:expr; $n:tt] => {
        {
            let e = $e;
            [init_array!(@accum $n, e)]
        }
    };
}

The expectation is that the expansion of the array literal would proceed as follows:

            [init_array!(@accum 3, e)]
            [e, init_array!(@accum 2, e)]
            [e, e, init_array!(@accum 1, e)]
            [e, e, e]

However, this would require each intermediate step to expand to an incomplete expression. Even though the intermediate results will never be used outside of a macro context, it is still forbidden.

Push-down, however, allows us to incrementally build up a sequence of tokens without needing to actually have a complete construct at any point prior to completion. In the example given at the top, the sequence of invocations proceeds as follows:

init_array! { String:: from ( "hi!" ) ; 3 }
init_array! { @ accum ( 3 , e . clone (  ) ) -> (  ) }
init_array! { @ accum ( 2 , e.clone() ) -> ( e.clone() , ) }
init_array! { @ accum ( 1 , e.clone() ) -> ( e.clone() , e.clone() , ) }
init_array! { @ accum ( 0 , e.clone() ) -> ( e.clone() , e.clone() , e.clone() , ) }
init_array! { @ as_expr [ e.clone() , e.clone() , e.clone() , ] }

As you can see, each layer adds to the accumulated output until the terminating rule finally emits it as a complete construct.

The only critical part of the above formulation is the use of $($body:tt)* to preserve the output without triggering parsing. The use of ($input) -> ($output) is simply a convention adopted to help clarify the behavior of such macros.

Push-down accumulation is frequently used as part of incremental TT munchers, as it allows arbitrarily complex intermediate results to be constructed. Internal Rules were of use here as well, as they simplify creating such macros.

Repetition Replacement

macro_rules! replace_expr {
    ($_t:tt $sub:expr) => {$sub};
}

This pattern is where a matched repetition sequence is simply discarded, with the variable being used to instead drive some repeated pattern that is related to the input only in terms of length.

For example, consider constructing a default instance of a tuple with more than 12 elements (the limit as of Rust 1.2).


#![allow(unused)]
fn main() {
macro_rules! tuple_default {
    ($($tup_tys:ty),*) => {
        (
            $(
                replace_expr!(
                    ($tup_tys)
                    Default::default()
                ),
            )*
        )
    };
}

macro_rules! replace_expr {
    ($_t:tt $sub:expr) => {$sub};
}

assert_eq!(tuple_default!(i32, bool, String), (i32::default(), bool::default(), String::default()));
}

JFTE: we could have simply used $tup_tys::default().

Here, we are not actually using the matched types. Instead, we throw them away and replace them with a single, repeated expression. To put it another way, we don't care what the types are, only how many there are.

TT Bundling

macro_rules! call_a_or_b_on_tail {
    ((a: $a:ident, b: $b:ident), call a: $($tail:tt)*) => {
        $a(stringify!($($tail)*))
    };

    ((a: $a:ident, b: $b:ident), call b: $($tail:tt)*) => {
        $b(stringify!($($tail)*))
    };

    ($ab:tt, $_skip:tt $($tail:tt)*) => {
        call_a_or_b_on_tail!($ab, $($tail)*)
    };
}

fn compute_len(s: &str) -> Option<usize> {
    Some(s.len())
}

fn show_tail(s: &str) -> Option<usize> {
    println!("tail: {:?}", s);
    None
}

fn main() {
    assert_eq!(
        call_a_or_b_on_tail!(
            (a: compute_len, b: show_tail),
            the recursive part that skips over all these
            tokens doesn't much care whether we will call a
            or call b: only the terminal rules care.
        ),
        None
    );
    assert_eq!(
        call_a_or_b_on_tail!(
            (a: compute_len, b: show_tail),
            and now, to justify the existence of two paths
            we will also call a: its input should somehow
            be self-referential, so let's make it return
            some eighty-six!
        ),
        Some(92)
    );
}

In particularly complex recursive macros, a large number of arguments may be needed in order to carry identifiers and expressions to successive layers. However, depending on the implementation there may be many intermediate layers which need to forward these arguments, but do not need to use them.

As such, it can be very useful to bundle all such arguments together into a single TT by placing them in a group. This allows layers which do not need to use the arguments to simply capture and substitute a single tt, rather than having to exactly capture and substitute the entire argument group.

The example above bundles the $a and $b expressions into a group which can then be forwarded as a single tt by the recursive rule. This group is then destructured by the terminal rules to access the expressions.

Building Blocks

Reusable snippets of macro_rules! macro code.

AST Coercion

The Rust parser is not very robust in the face of tt substitutions. Problems can arise when the parser is expecting a particular grammar construct and instead finds a lump of substituted tt tokens. Rather than attempt to parse them, it will often just give up. In these cases, it is necessary to employ an AST coercion.

#![allow(dead_code)]
macro_rules! as_expr { ($e:expr) => {$e} }
macro_rules! as_item { ($i:item) => {$i} }
macro_rules! as_pat  { ($p:pat)  => {$p} }
macro_rules! as_stmt { ($s:stmt) => {$s} }
macro_rules! as_ty   { ($t:ty)   => {$t} }

as_item!{struct Dummy;}

fn main() {
    as_stmt!(let as_pat!(_): as_ty!(_) = as_expr!(42));
}

These coercions are often used with push-down accumulation macros in order to get the parser to treat the final tt sequence as a particular kind of grammar construct.

Note that this specific set of macros is determined by what macros are allowed to expand to, not what they are able to capture.

Counting

What follows are several techniques for counting in macro_rules! macros:

Note: If you are just interested in the most efficient way look here

Repetition with replacement

Counting things in a macro is a surprisingly tricky task. The simplest way is to use replacement with a repetition match.

macro_rules! replace_expr {
    ($_t:tt $sub:expr) => {$sub};
}

macro_rules! count_tts {
    ($($tts:tt)*) => {0usize $(+ replace_expr!($tts 1usize))*};
}

fn main() {
    assert_eq!(count_tts!(0 1 2), 3);
}

This is a fine approach for smallish numbers, but will likely crash the compiler with inputs of around 500 or so tokens. Consider that the output will look something like this:

0usize + 1usize + /* ~500 `+ 1usize`s */ + 1usize

The compiler must parse this into an AST, which will produce what is effectively a perfectly unbalanced binary tree 500+ levels deep.

Recursion

An older approach is to use recursion.

macro_rules! count_tts {
    () => {0usize};
    ($_head:tt $($tail:tt)*) => {1usize + count_tts!($($tail)*)};
}

fn main() {
    assert_eq!(count_tts!(0 1 2), 3);
}

Note: As of rustc 1.2, the compiler has grievous performance problems when large numbers of integer literals of unknown type must undergo inference. We are using explicitly usize-typed literals here to avoid that.

If this is not suitable (such as when the type must be substitutable), you can help matters by using as (e.g. 0 as $ty, 1 as $ty, etc.).

This works, but will trivially exceed the recursion limit. Unlike the repetition approach, you can extend the input size by matching multiple tokens at once.

macro_rules! count_tts {
    ($_a:tt $_b:tt $_c:tt $_d:tt $_e:tt
     $_f:tt $_g:tt $_h:tt $_i:tt $_j:tt
     $_k:tt $_l:tt $_m:tt $_n:tt $_o:tt
     $_p:tt $_q:tt $_r:tt $_s:tt $_t:tt
     $($tail:tt)*)
        => {20usize + count_tts!($($tail)*)};
    ($_a:tt $_b:tt $_c:tt $_d:tt $_e:tt
     $_f:tt $_g:tt $_h:tt $_i:tt $_j:tt
     $($tail:tt)*)
        => {10usize + count_tts!($($tail)*)};
    ($_a:tt $_b:tt $_c:tt $_d:tt $_e:tt
     $($tail:tt)*)
        => {5usize + count_tts!($($tail)*)};
    ($_a:tt
     $($tail:tt)*)
        => {1usize + count_tts!($($tail)*)};
    () => {0usize};
}

fn main() {
    assert_eq!(700, count_tts!(
        ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,,
        ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,,

        ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,,
        ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,,

        ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,,
        ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,,

        ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,,
        ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,,

        ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,,
        ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,,

        // Repetition breaks somewhere after this
        ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,,
        ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,,

        ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,,
        ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,,
    ));
}

This particular formulation will work up to ~1,200 tokens.

Slice length

A third approach is to help the compiler construct a shallow AST that won't lead to a stack overflow. This can be done by constructing an array literal and calling the len method.

macro_rules! replace_expr {
    ($_t:tt $sub:expr) => {$sub};
}

macro_rules! count_tts {
    ($($tts:tt)*) => {<[()]>::len(&[$(replace_expr!($tts ())),*])};
}

fn main() {
    assert_eq!(count_tts!(0 1 2), 3);
}

This has been tested to work up to 10,000 tokens, and can probably go much higher.

Array length

Another modification of the previous approach is to use const generics stabilized in Rust 1.51. It's only slightly slower than slice length method on 20,000 tokens and works in const contexts.

const fn count_helper<const N: usize>(_: [(); N]) -> usize { N }

macro_rules! replace_expr {
    ($_t:tt $sub:expr) => { $sub }
}

macro_rules! count_tts {
    ($($smth:tt)*) => {
        count_helper([$(replace_expr!($smth ())),*])
    }
}

fn main() {
    assert_eq!(count_tts!(0 1 2), 3);
}

Enum counting

This approach can be used where you need to count a set of mutually distinct identifiers.

macro_rules! count_idents {
    () => {0};
    ($last_ident:ident, $($idents:ident),* $(,)?) => {
        {
            #[allow(dead_code, non_camel_case_types)]
            enum Idents { $($idents,)* $last_ident }
            const COUNT: u32 = Idents::$last_ident as u32 + 1;
            COUNT
        }
    };
}

fn main() {
    const COUNT: u32 = count_idents!(A, B, C);
    assert_eq!(COUNT, 3);
}

This method does have two drawbacks. As implied above, it can only count valid identifiers (which are also not keywords), and it does not allow those identifiers to repeat.

Bit twiddling

Another recursive approach using bit operations:

macro_rules! count_tts {
    () => { 0 };
    ($odd:tt $($a:tt $b:tt)*) => { (count_tts!($($a)*) << 1) | 1 };
    ($($a:tt $even:tt)*) => { count_tts!($($a)*) << 1 };
}

fn main() {
    assert_eq!(count_tts!(0 1 2), 3);
}

This approach is pretty smart as it effectively halves its input whenever its even and then multiplying the counter by 2 (or in this case shifting 1 bit to the left which is equivalent). If the input is uneven it simply takes one token tree from the input ors the token tree to the previous counter which is equivalent to adding 1 as the lowest bit has to be a 0 at this point due to the previous shifting. Rinse and repeat until we hit the base rule () => 0.

The benefit of this is that the constructed AST expression that makes up the counter value will grow with a complexity of O(log(n)) instead of O(n) like the other approaches. Be aware that you can still hit the recursion limit with this if you try hard enough. Credits for this method go to Reddit user YatoRust.

Let's go through the procedure by hand once:

count_tts!(0 0 0 0 0 0 0 0 0 0);

This invocation will match the third rule due to the fact that we have an even number of token trees(10). The matcher names the odd token trees in the sequence $a and the even ones $even but the expansion only makes use of $a, which means it effectively discards all the even elements cutting the input in half. So the invocation now becomes:

count_tts!(0 0 0 0 0) << 1;

This invocation will now match the second rule as its input is an uneven amount of token trees. In this case the first token tree is discarded to make the input even again, then we also do the halving step in this invocation again since we know the input would be even now anyways. Therefor we can count 1 for the uneven discard and multiply by 2 again since we also halved.

((count_tts!(0 0) << 1) | 1) << 1;
((count_tts!(0) << 1 << 1) | 1) << 1;
(((count_tts!() | 1) << 1 << 1) | 1) << 1;
((((0 << 1) | 1) << 1 << 1) | 1) << 1;

Now to check if we expanded correctly manually we can use a one of the tools we introduced for debugging. When expanding the macro there we should get:

((((0 << 1) | 1) << 1 << 1) | 1) << 1;

That's the same so we didn't make any mistakes, great!

Abacus Counters

Provisional: needs a more compelling example. Matching nested groups that are not denoted by Rust groups is sufficiently unusual that it may not merit inclusion.

Note: this section assumes understanding of push-down accumulation and incremental TT munchers.

macro_rules! abacus {
    ((- $($moves:tt)*) -> (+ $($count:tt)*)) => {
        abacus!(($($moves)*) -> ($($count)*))
    };
    ((- $($moves:tt)*) -> ($($count:tt)*)) => {
        abacus!(($($moves)*) -> (- $($count)*))
    };
    ((+ $($moves:tt)*) -> (- $($count:tt)*)) => {
        abacus!(($($moves)*) -> ($($count)*))
    };
    ((+ $($moves:tt)*) -> ($($count:tt)*)) => {
        abacus!(($($moves)*) -> (+ $($count)*))
    };

    // Check if the final result is zero.
    (() -> ()) => { true };
    (() -> ($($count:tt)+)) => { false };
}

fn main() {
    let equals_zero = abacus!((++-+-+++--++---++----+) -> ());
    assert_eq!(equals_zero, true);
}

This technique can be used in cases where you need to keep track of a varying counter that starts at or near zero, and must support the following operations:

  • Increment by one.
  • Decrement by one.
  • Compare to zero (or any other fixed, finite value).

A value of n is represented by n instances of a specific token stored in a group. Modifications are done using recursion and push-down accumulation. Assuming the token used is x, the operations above are implemented as follows:

  • Increment by one: match ($($count:tt)*), substitute (x $($count)*).
  • Decrement by one: match (x $($count:tt)*), substitute ($($count)*).
  • Compare to zero: match ().
  • Compare to one: match (x).
  • Compare to two: match (x x).
  • (and so on...)

In this way, operations on the counter are like flicking tokens back and forth like an abacus.1

1

This desperately thin reasoning conceals the real reason for this name: to avoid having yet another thing with "token" in the name. Talk to your writer about avoiding semantic satiation today!
In fairness, it could also have been called "unary counting".

In cases where you want to represent negative values, -n can be represented as n instances of a different token. In the example given above, +n is stored as n + tokens, and -m is stored as m - tokens.

In this case, the operations become slightly more complicated; increment and decrement effectively reverse their usual meanings when the counter is negative. To which given + and - for the positive and negative tokens respectively, the operations change to:

  • Increment by one:
    • match (), substitute (+).
    • match (- $($count:tt)*), substitute ($($count)*).
    • match ($($count:tt)+), substitute (+ $($count)+).
  • Decrement by one:
    • match (), substitute (-).
    • match (+ $($count:tt)*), substitute ($($count)*).
    • match ($($count:tt)+), substitute (- $($count)+).
  • Compare to 0: match ().
  • Compare to +1: match (+).
  • Compare to -1: match (-).
  • Compare to +2: match (++).
  • Compare to -2: match (--).
  • (and so on...)

Note that the example at the top combines some of the rules together (for example, it combines increment on () and ($($count:tt)+) into an increment on ($($count:tt)*)).

If you want to extract the actual value of the counter, this can be done using a regular counter macro. For the example above, the terminal rules can be replaced with the following:

macro_rules! abacus {
    // ...

    // This extracts the counter as an integer expression.
    (() -> ()) => {0};
    (() -> (- $($count:tt)*)) => {
        - ( count_tts!($( $count_tts:tt )*) )
    };
    (() -> (+ $($count:tt)*)) => {
        count_tts!($( $count_tts:tt )*)
    };
}

// One of the many token tree counting macros in the counting chapter
macro_rules! count_tts {
    // ...
}

JFTE: strictly speaking, the above formulation of abacus! is needlessly complex. It can be implemented much more efficiently using repetition, provided you do not need to match against the counter's value in a macro:

macro_rules! abacus {
    (-) => {-1};
    (+) => {1};
    ($( $moves:tt )*) => {
        0 $(+ abacus!($moves))*
    }
}

Parsing Rust

Parsing some of Rust's items can be useful in certain situations. This section will show a few macros that can parse some of Rust's more complex items like structs and functions to a certain extent. The goal of these macros is not to be able to parse the entire grammar of the items but to parse parts that are in general quite useful without being too complex to parse. This means we ignore things like generics and such.

The main points of interest of these macros are their matchers. The transcribers are only there for example purposes and are usually not that impressive.

Function

macro_rules! function_item_matcher {
    (

        $( #[$meta:meta] )*
    //  ^~~~attributes~~~~^
        $vis:vis fn $name:ident ( $( $arg_name:ident : $arg_ty:ty ),* $(,)? )
    //                          ^~~~~~~~~~~~~~~~argument list!~~~~~~~~~~~~~~^
            $( -> $ret_ty:ty )?
    //      ^~~~return type~~~^
            { $($tt:tt)* }
    //      ^~~~~body~~~~^
    ) => {
        $( #[$meta] )*
        $vis fn $name ( $( $arg_name : $arg_ty ),* ) $( -> $ret_ty )? { $($tt)* }
    }
}

function_item_matcher!(
   #[inline]
   #[cold]
   pub fn foo(bar: i32, baz: i32, ) -> String {
       format!("{} {}", bar, baz)
   }
);

fn main() {
    assert_eq!(foo(13, 37), "13 37");
}

A simple function matcher that ignores qualifiers like unsafe, async, ... as well a generics and where clauses. If parsing those is required it is likely that you are better off using a proc-macro instead.

This lets you for example, inspect the function signature, generate some extra things from it and then re-emit the entire function again. Kind of like a Derive proc-macro but weaker and for functions.

Ideally we would like to use a pattern fragment specifier instead of an ident for the arguments but this is currently not allowed. Fortunately people don't use non-identifier patterns in function signatures that often so this is okay(a shame, really).

Method

The macro for parsing basic functions is nice and all, but sometimes we would like to also parse methods, functions that refer to their object via some form of self usage. This makes things a bit trickier:

WIP

Struct

macro_rules! struct_item_matcher {
    // Unit-Struct
    (
        $( #[$meta:meta] )*
    //  ^~~~attributes~~~~^
        $vis:vis struct $name:ident;
    ) => {
        $( #[$meta] )*
        $vis struct $name;
    };

    // Tuple-Struct
    (
        $( #[$meta:meta] )*
    //  ^~~~attributes~~~~^
        $vis:vis struct $name:ident (
            $(
                $( #[$field_meta:meta] )*
    //          ^~~~field attributes~~~~^
                $field_vis:vis $field_ty:ty
    //          ^~~~~~a single field~~~~~~^
            ),*
        $(,)? );
    ) => {
        $( #[$meta] )*
        $vis struct $name (
            $(
                $( #[$field_meta] )*
                $field_vis $field_ty
            ),*
        );
    };

    // Named-Struct
    (
        $( #[$meta:meta] )*
    //  ^~~~attributes~~~~^
        $vis:vis struct $name:ident {
            $(
                $( #[$field_meta:meta] )*
    //          ^~~~field attributes~~~!^
                $field_vis:vis $field_name:ident : $field_ty:ty
    //          ^~~~~~~~~~~~~~~~~a single field~~~~~~~~~~~~~~~^
            ),*
        $(,)? }
    ) => {
        $( #[$meta] )*
        $vis struct $name {
            $(
                $( #[$field_meta] )*
                $field_vis $field_name : $field_ty
            ),*
        }
    }
}

struct_item_matcher!(
   #[derive(Copy, Clone)]
   pub(crate) struct Foo {
      pub bar: i32,
      baz: &'static str,
      qux: f32
   }
);
struct_item_matcher!(
   #[derive(Copy, Clone)]
   pub(crate) struct Bar;
);
struct_item_matcher!(
   #[derive(Clone)]
   pub(crate) struct Baz (i32, pub f32, String);
);
fn main() {
   let _: Foo = Foo { bar: 42, baz: "macros can be nice", qux: 3.14, };
   let _: Bar = Bar;
   let _: Baz = Baz(2, 0.1234, String::new());
}

Enum

Parsing enums is a bit more complex than structs so we will finally make use of some of the patterns we have discussed, Incremental TT Muncher and Internal Rules. Instead of just building the parsed enum again we will merely visit all the tokens of the enum, as rebuilding the enum would require us to collect all the parsed tokens temporarily again via a Push Down Accumulator.


#![allow(unused)]
fn main() {
macro_rules! enum_item_matcher {
    // tuple variant
    (@variant $variant:ident (
        $(
            $( #[$field_meta:meta] )*
    //      ^~~~field attributes~~~~^
            $field_vis:vis $field_ty:ty
    //      ^~~~~~a single field~~~~~~^
        ),* $(,)?
    //∨~~rest of input~~∨
    ) $(, $($tt:tt)* )? ) => {

        // process rest of the enum
        $( enum_item_matcher!(@variant $( $tt )*) )?
    };
    // named variant
    (@variant $variant:ident {
        $(
            $( #[$field_meta:meta] )*
    //      ^~~~field attributes~~~!^
            $field_vis:vis $field_name:ident : $field_ty:ty
    //      ^~~~~~~~~~~~~~~~~a single field~~~~~~~~~~~~~~~^
        ),* $(,)?
    //∨~~rest of input~~∨
    } $(, $($tt:tt)* )? ) => {
        // process rest of the enum
        $( enum_item_matcher!(@variant $( $tt )*) )?
    };
    // unit variant
    (@variant $variant:ident $(, $($tt:tt)* )? ) => {
        // process rest of the enum
        $( enum_item_matcher!(@variant $( $tt )*) )?
    };
    // trailing comma
    (@variant ,) => {};
    // base case
    (@variant) => {};
    // entry point
    (
        $( #[$meta:meta] )*
        $vis:vis enum $name:ident {
            $($tt:tt)*
        }
    ) => {
        enum_item_matcher!(@variant $($tt)*)
    };
}

enum_item_matcher!(
   #[derive(Copy, Clone)]
   pub(crate) enum Foo {
       Bar,
       Baz,
   }
);
enum_item_matcher!(
   #[derive(Copy, Clone)]
   pub(crate) enum Bar {
       Foo(i32, f32),
       Bar,
       Baz(),
   }
);
enum_item_matcher!(
   #[derive(Clone)]
   pub(crate) enum Baz {}
);
}

Procedural Macros

This chapter will introduce Rust's second syntax extension type, procedural macros.

As with the declarative macros chapter, this one is also split into a methodical and a (WIP) practical subchapter with the former being a more formal introduction and the latter being a more practical oriented one.

A lot of the basic information covered has been sourced from the rust reference, as most knowledge about procedural macros is currently located there.

A Methodical Introduction

This chapter will introduce Rust's procedural macro system by explaining the system as a whole.

Unlike a declarative macro, a procedural macro takes the form of a rust function taking in a token stream(or two) and outputting a token stream.

A proc-macro is at its core just a function exported from a crate with the proc-macro crate type, so when writing multiple proc macros you can have them all live in one crate.

Note: When using Cargo, to define a proc-macro crate you define and set the lib.proc-macro key in the Cargo.toml to true.

[lib]
proc-macro = true

A proc-macro type crate implicitly links to the compiler-provided proc_macro crate, which contains all the things you need to get going with developing procedural macros. The two most important types exposed by the crate are the TokenStream, which are the proc-macro variant of the already familiar token trees as well as the Span, which describes a part of source code used primarily for error reporting and hygiene. See the Hygiene and Spans chapter for more information.

As proc-macros therefore are functions living in a crate, they can be addressed as all the other items in a rust project. All thats required to add the crate to the dependency graph of a project and bring the desired item into scope.

Note: Procedural macros invocations still run at the same stage in the compiler expansion-wise as declarative macros, just that they are standalone rust programs that the compiler compiles, runs, and finally either replaces or appends to.

Types of procedural macros

With procedural macros, there are actually exist 3 different kinds with each having slightly different properties.

  • function-like proc-macros which are used to implement $name ! $arg invocable macros
  • attribute proc-macros which are used to implement #[$arg] attributes
  • derive proc-macros which are used to implement a derive, an input to a #[derive(…)] attribute

At their core, all 3 work almost the same with a few differences in their inputs and output reflected by their function definition. As mentioned all a procedural macro really is, is a function that maps a token stream so let's take a quick look at each basic definition and their differences.

function-like

#[proc_macro]
pub fn my_proc_macro(input: TokenStream) -> TokenStream {
    TokenStream::new()
}

attribute

#[proc_macro_attribute]
pub fn my_attribute(input: TokenStream, annotated_item: TokenStream) -> TokenStream {
    TokenStream::new()
}

derive

#[proc_macro_derive(MyDerive)]
pub fn my_derive(annotated_item: TokenStream) -> TokenStream {
    TokenStream::new()
}

As shown, the basic structure is the same for each, a public function marked with an attribute defining its procedural macro type returning a TokenStream. Note how the return type is a TokenStream and not a result or something else that gives the notion of being fallible. This does not mean that proc-macros cannot fail though, in fact they have two ways of reporting errors, the first one being to panic and the second to emit a compile_error! invocation. If a proc-macro panics the compiler will catch it and emit the payload as an error coming from the macro invocation.

Beware: The compiler will happily hang on endless loops spun up inside proc-macros causing the compilation of crates using the proc-macro to hang as well.

Function-like

Function-like procedural macros are invoked like declarative macros that is makro!(…).

This type of macro is the simplest of the three though. It is also the only one which you can't differentiate from declarative macros when solely looking at the invocation.

A simple skeleton of a function-like procedural macro looks like the following:

use proc_macro::TokenStream;

#[proc_macro]
pub fn tlborm_fn_macro(input: TokenStream) -> TokenStream {
    input
}

As one can see this is in fact just a mapping from one TokenStream to another where the input will be the tokens inside of the invocation delimiters, e.g. for an example invocation foo!(bar) the input token stream would consist of the bar token. The returned token stream will replace the macro invocation.

For this macro type the same placement and expansion rules apply as for declarative macros, that is the macro must output a correct token stream for the invocation location. Unlike with declarative macros though, function-like procedural macros do not have certain restrictions imposed on their inputs though. That is the restrictions for what may follow fragment specifiers listed in the Metavariables and Expansion Redux chapter listed is not applicable here, as the procedural macros work on the tokens directly instead of matching them against fragment specifiers or similar.

With that said it is apparent that the procedural counter part to these macros is more powerful as they can arbitrarily modify their input, and produce any output desired as long as its within the bounds of the language syntax.

Usage example:

use tlborm_proc::tlborm_attribute;

fn foo() {
    tlborm_attribute!(be quick; time is mana);
}

Attribute

Attribute procedural macros define new outer attributes which can be attached to items. This type can be invoked with the #[attr] or #[attr(…)] syntax where is an arbitrary token tree.

A simple skeleton of an attribute procedural macro looks like the following:

use proc_macro::TokenStream;

#[proc_macro_attribute]
pub fn tlborm_attribute(input: TokenStream, annotated_item: TokenStream) -> TokenStream {
    annotated_item
}

Of note here is that unlike the other two procedural macro kinds, this one has two input parameters instead of one.

  • The first parameter is the delimited token tree following the attribute's, name excluding the delimiters around it. It is empty if the attribute is written bare, that is just a name without a (TokenTree) following it, e.g. #[attr].
  • The second token stream is the item the attribute is attached to without the attribute this proc macro defines. As this is an active attribute, the attribute will be stripped from the item before it is being passed to the proc macro.

The returned token stream will replace the annotated item fully. Note that the replacement does not have to be a single item, it can be 0 or more.

Usage example:

use tlborm_proc::tlborm_attribute;

#[tlborm_attribute]
fn foo() {}

#[tlborm_attribute(attributes are pretty handsome)]
fn bar() {}

Derive

Derive procedural macros define new inputs for the derive attribute. This type can be invoked by feeding it to a derive attribute's input, e.g. #[derive(TlbormDerive)].

A simple skeleton of a derive procedural macro looks like the following:

use proc_macro::TokenStream;

#[proc_macro_derive(TlbormDerive)]
pub fn tlborm_derive(item: TokenStream) -> TokenStream {
    TokenStream::neW()
}

The proc_macro_derive is a bit more special in that it requires an extra identifier, this identifier will become the actual name of the derive proc macro. The input token stream is the item the derive attribute is attached to, that is, it will always be an enum, struct or union as these are the only items a derive attribute can annotated. The returned token stream will be appended to the containing block or module of the annotated item with the requirement that the token stream consists of a set of valid items.

Usage example:

use tlborm_proc::TlbormDerive;

#[derive(TlbormDerive)]
struct Foo;

Helper Attributes

Derive proc macros are a bit more special in that they can add additional attributes visible only in the scope of the item definition. These attributes are called derive macro helper attributes and are inert. Their purpose is to give derive proc macros additional customizability on a per field or variant basis, that is these attributes can be used to annotate fields or enum variants while having no effect on their own. As they are inert they will not be stripped and are visible to all macros.

They can be defined by adding an attributes(helper0, helper1, ..) argument to the proc_macro_derive attribute containing a comma separated list of identifiers which are the names of the helper attributes.

Thus a simple skeleton of a derive procedural macro with helper attributes looks like the following:

use proc_macro::TokenStream;

#[proc_macro_derive(TlbormDerive, attributes(tlborm_helper))]
pub fn tlborm_derive(item: TokenStream) -> TokenStream {
    TokenStream::neW()
}

That is all there is to helper attributes, to consume them in the proc macro the implementation will then have to check the attributes of fields and variants to see whether they are attributed with the corresponding helper. It is an error to use a helper attribute if none of the used derive macros of the given item declare it as such, as the compiler will then instead try to resolve it as a normal attribute.

Usage example:

use tlborm_proc::TlbormDerive;

#[derive(TlbormDerive)]
struct Foo {
    #[tlborm_helper]
    field: u32
}

#[derive(TlbormDerive)]
enum Bar {
    #[tlborm_helper]
    Variant { #[tlborm_helper] field: u32 }
}

Third-Party Crates

Note: Crates beyond the automatically linked proc_macro crate are not required to write procedural macros. The crates listed here merely make writing them simpler and more concise, while potentially adding to the compilation time of the procedural macro due to added dependencies.

As procedural macros live in a crate they can naturally depend on (crates.io) crates. turns out the crate ecosystem has some really helpful crates tailored towards procedural macros that this chapter will quickly go over, most of which will be used in the following chapters to implement the example macros. As these are merely quick introductions it is advised to look at each crate's documentation for more in-depth information if required.

proc-macro2

proc-macro2, the successor of the proc_macro crate! Or so you might think but that is of course not correct, the name might be a bit misleading. This crate is actually just a wrapper around the proc_macro crate serving two specific purposes, taken from the documentation:

  • Bring proc-macro-like functionality to other contexts like build.rs and main.rs.
  • Make procedural macros unit testable.

As the proc_macro crate is exclusive to proc_macro type crates, making them unit testable or accessing them from non-proc macro code is next to impossible. With that in mind the proc-macro2 crate mimics the original proc_macro crate's api, acting as a wrapper in proc-macro crates and standing on its own in non-proc-macro crates. Hence it is advised to build libraries targeting proc-macro code to be built against proc-macro2 instead as that will enable those libraries to be unit testable, which is also the reason why the following listed crates take and emit proc-macro2::TokenStreams instead. When a proc_macro token stream is required, one can simply .into() the proc-macro2 token stream to get the proc_macro version and vice-versa.

Procedural macros using the proc-macro2 crate will usually import the proc-macro2::TokenStream in an aliased form like use proc-macro2::TokenStream as TokenStream2.

quote

The quote crate mainly exposes just one macro, the quote! macro.

This little macro allows you to easily create token streams by writing the actual source out as syntax while also giving you the power of interpolating tokens right into the written syntax. Interpolation can be done by using the #local syntax where local refers to a local in the current scope. Likewise #( #local )* can be used to interpolate over an iterator of types that implement ToTokens, this works similar to declarative macro_rules! repetitions in that they allow a separator as well as extra tokens inside the repetition.

let name = /* some identifier */;
let exprs = /* an iterator over expressions tokenstreams */;
let expanded = quote! {
    impl SomeTrait for #name { // #name interpolates the name local from above
        fn some_function(&self) -> usize {
            #( #exprs )* // #name interpolates exprs by iterating the iterator
        }
    }
};

This a very useful tool when preparing macro output avoiding the need of creating a token stream by inserting tokens one by one.

Note: As stated earlier, this crate makes use of proc_macro2 and thus the quote! macro returns a proc-macro2::TokenStream.

syn

The syn crate is a parsing library for parsing a stream of Rust tokens into a syntax tree of Rust source code. It is a very powerful library that makes parsing proc-macro input quite a bit easier, as the proc_macro crate itself does not expose any kind of parsing capabilities, merely the tokens. As the library can be a heavy compilation dependency, it makes heavy use of feature gates to allow users to cut it as small as required.

So what does it offer? A bunch of things.

First of all it has definitions and parsing for all standard Rust syntax nodes(when the full feature is enabled), as well as a DeriveInput type which encapsulates all the information a derive macro gets passed as an input stream as a structured input(requires the derive feature, enabled by default). These can be used right out of the box with the parse_macro_input! macro(requires the parsing and proc-macro features, enabled by default) to parse token streams into these types.

If rust syntax doesn't cut it, and instead one wishes to parse custom non-rust syntax the crate also offers a generic parsing API, mainly in the form of the Parse trait(requires the parsing feature, enabled by default).

Aside from this the types exposed by the library keep location information and spans which allows procedural macros to emit detailed error messages pointing at the macro input at the points of interest.

As this is again a library for procedural macros, it makes use of the proc_macro2 token streams and spans and as such, conversions may be required.

Hygiene and Spans

This chapter talks about procedural macro hygiene and the type that encodes it, Span.

Every token in a TokenStream has an associated Span holding some additional info. A span, as its documentation states, is A region of source code, along with macro expansion information. It points into a region of the original source code(important for displaying diagnostics at the correct places) as well as holding the kind of hygiene for this location. The hygiene is relevant mainly for identifiers, as it allows or forbids the identifier from referencing things or being referenced by things defined outside of the invocation.

There are 3 kinds of hygiene(which can be seen by the constructors of the Span type):

  • definition site(unstable): A span that resolves at the macro definition site. Identifiers with this span will not be able to reference things defined outside or be referenced by things outside of the invocation. This is what one would call "hygienic".
  • mixed site: A span that has the same hygiene as macro_rules declarative macros, that is it may resolve to definition site or call site depending on the type of identifier. See here for more information.
  • call site: A span that resolves to the invocation site. Identifiers in this case will behave as if written directly at the call site, that is they freely resolve to things defined outside of the invocation and can be referenced from the outside as well. This is what one would call "unhygienic".

Glossary

A place for obscure words and their descriptions. If you feel like there is an important word missing here, please open an issue or a pull request.

Function-like macro

A function like macro describes a syntax extension that can be invoked via the form identifier!(...). It is called this way due to its resemblance of a function call.

Syntax Extension

The mechanism rust's macro_rules! and procedural macros are built on.