Counting
What follows are several techniques for counting in macro_rules!
macros:
Note: If you are just interested in the most efficient way look here
Repetition with replacement
Counting things in a macro is a surprisingly tricky task. The simplest way is to use replacement with a repetition match.
macro_rules! replace_expr { ($_t:tt $sub:expr) => {$sub}; } macro_rules! count_tts { ($($tts:tt)*) => {0usize $(+ replace_expr!($tts 1usize))*}; } fn main() { assert_eq!(count_tts!(0 1 2), 3); }
This is a fine approach for smallish numbers, but will likely crash the compiler with inputs of around 500 or so tokens. Consider that the output will look something like this:
0usize + 1usize + /* ~500 `+ 1usize`s */ + 1usize
The compiler must parse this into an AST, which will produce what is effectively a perfectly unbalanced binary tree 500+ levels deep.
Recursion
An older approach is to use recursion.
macro_rules! count_tts { () => {0usize}; ($_head:tt $($tail:tt)*) => {1usize + count_tts!($($tail)*)}; } fn main() { assert_eq!(count_tts!(0 1 2), 3); }
Note: As of
rustc
1.2, the compiler has grievous performance problems when large numbers of integer literals of unknown type must undergo inference. We are using explicitlyusize
-typed literals here to avoid that.If this is not suitable (such as when the type must be substitutable), you can help matters by using
as
(e.g.0 as $ty
,1 as $ty
, etc.).
This works, but will trivially exceed the recursion limit. Unlike the repetition approach, you can extend the input size by matching multiple tokens at once.
macro_rules! count_tts { ($_a:tt $_b:tt $_c:tt $_d:tt $_e:tt $_f:tt $_g:tt $_h:tt $_i:tt $_j:tt $_k:tt $_l:tt $_m:tt $_n:tt $_o:tt $_p:tt $_q:tt $_r:tt $_s:tt $_t:tt $($tail:tt)*) => {20usize + count_tts!($($tail)*)}; ($_a:tt $_b:tt $_c:tt $_d:tt $_e:tt $_f:tt $_g:tt $_h:tt $_i:tt $_j:tt $($tail:tt)*) => {10usize + count_tts!($($tail)*)}; ($_a:tt $_b:tt $_c:tt $_d:tt $_e:tt $($tail:tt)*) => {5usize + count_tts!($($tail)*)}; ($_a:tt $($tail:tt)*) => {1usize + count_tts!($($tail)*)}; () => {0usize}; } fn main() { assert_eq!(700, count_tts!( ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, // Repetition breaks somewhere after this ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, )); }
This particular formulation will work up to ~1,200 tokens.
Slice length
A third approach is to help the compiler construct a shallow AST that won't lead to a stack overflow.
This can be done by constructing an array literal and calling the len
method.
macro_rules! replace_expr { ($_t:tt $sub:expr) => {$sub}; } macro_rules! count_tts { ($($tts:tt)*) => {<[()]>::len(&[$(replace_expr!($tts ())),*])}; } fn main() { assert_eq!(count_tts!(0 1 2), 3); }
This has been tested to work up to 10,000 tokens, and can probably go much higher.
Array length
Another modification of the previous approach is to use const generics stabilized in Rust 1.51. It's only slightly slower than slice length method on 20,000 tokens and works in const contexts.
const fn count_helper<const N: usize>(_: [(); N]) -> usize { N } macro_rules! replace_expr { ($_t:tt $sub:expr) => { $sub } } macro_rules! count_tts { ($($smth:tt)*) => { count_helper([$(replace_expr!($smth ())),*]) } } fn main() { assert_eq!(count_tts!(0 1 2), 3); }
Enum counting
This approach can be used where you need to count a set of mutually distinct identifiers.
macro_rules! count_idents { () => {0}; ($last_ident:ident, $($idents:ident),* $(,)?) => { { #[allow(dead_code, non_camel_case_types)] enum Idents { $($idents,)* $last_ident } const COUNT: u32 = Idents::$last_ident as u32 + 1; COUNT } }; } fn main() { const COUNT: u32 = count_idents!(A, B, C); assert_eq!(COUNT, 3); }
This method does have two drawbacks. As implied above, it can only count valid identifiers (which are also not keywords), and it does not allow those identifiers to repeat.
Bit twiddling
Another recursive approach using bit operations:
macro_rules! count_tts { () => { 0 }; ($odd:tt $($a:tt $b:tt)*) => { (count_tts!($($a)*) << 1) | 1 }; ($($a:tt $even:tt)*) => { count_tts!($($a)*) << 1 }; } fn main() { assert_eq!(count_tts!(0 1 2), 3); }
This approach is pretty smart as it effectively halves its input whenever its even and then multiplying the counter by 2 (or in this case shifting 1 bit to the left which is equivalent).
If the input is uneven it simply takes one token tree from the input or
s the token tree to the previous counter which is equivalent to adding 1 as the lowest bit has to be a 0 at this point due to the previous shifting.
Rinse and repeat until we hit the base rule () => 0
.
The benefit of this is that the constructed AST expression that makes up the counter value will grow with a complexity of O(log(n))
instead of O(n)
like the other approaches.
Be aware that you can still hit the recursion limit with this if you try hard enough.
Credits for this method go to Reddit user YatoRust
.
Let's go through the procedure by hand once:
count_tts!(0 0 0 0 0 0 0 0 0 0);
This invocation will match the third rule due to the fact that we have an even number of token trees(10).
The matcher names the odd token trees in the sequence $a
and the even ones $even
but the expansion only makes use of $a
, which means it effectively discards all the even elements cutting the input in half.
So the invocation now becomes:
count_tts!(0 0 0 0 0) << 1;
This invocation will now match the second rule as its input is an uneven amount of token trees. In this case the first token tree is discarded to make the input even again, then we also do the halving step in this invocation again since we know the input would be even now anyways. Therefore we can count 1 for the uneven discard and multiply by 2 again since we also halved.
((count_tts!(0 0) << 1) | 1) << 1;
((count_tts!(0) << 1 << 1) | 1) << 1;
((((count_tts!() << 1) | 1) << 1 << 1) | 1) << 1;
((((0 << 1) | 1) << 1 << 1) | 1) << 1;
Now to check if we expanded correctly manually we can use a one of the tools we introduced for debugging
.
When expanding the macro there we should get:
((((0 << 1) | 1) << 1 << 1) | 1) << 1;
That's the same so we didn't make any mistakes, great!