
Missed optimization of boolean computations for the usize type #140103

Open
ZhonghaoPan-nju opened this issue Apr 21, 2025 · 7 comments
Labels
A-LLVM  Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
C-optimization  Category: An issue highlighting optimization opportunities or PRs implementing such
I-slow  Issue: Problems and improvements with respect to performance of generated code.
T-compiler  Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@ZhonghaoPan-nju

The following example shows that rustc does not optimize boolean computations for the usize type.

I tried this code (compiled with opt-level=3):
https://www.godbolt.org/z/zY3Kqj8ax

#[no_mangle]
fn even(x: usize) -> bool {
    if !(x >= (0 + 2)) {
        return false;
    } else if x == 2 {
        return true;
    } else {
        return even(x - 2);
    }
}

I expected to see this happen:

even:
        cmp     rdi, 2
        jae     .LBB0_2
        xor     eax, eax
        ret
.LBB0_2:
        mov     rax, rdi
        add     rdi, -2
        and     rax, -2
        cmp     rax, 2
        jne     .LBB0_2
        test    rdi, rdi
        sete    al
        ret

Instead, this happened:

even:
        cmp     rdi, 2
        jb      .LBB0_1
        lea     rcx, [rdi - 2]
        mov     eax, ecx
        shr     eax
        inc     eax
        and     eax, 7
        je      .LBB0_3
        add     eax, eax
        xor     edx, edx
.LBB0_5:
        add     rdx, 2
        cmp     rax, rdx
        jne     .LBB0_5
        sub     rdi, rdx
        sete    al
        cmp     rcx, 14
        jae     .LBB0_8
.LBB0_10:
        and     al, 1
        ret
.LBB0_1:
        xor     eax, eax
        and     al, 1
        ret
.LBB0_3:
        cmp     rcx, 14
        jb      .LBB0_10
.LBB0_8:
        add     rdi, -16
        cmp     rdi, 1
        ja      .LBB0_8
        test    rdi, rdi
        sete    al
        and     al, 1
        ret

However, when I change the type from usize to i32, the compiler generates the desired assembly code. I can also get the optimized result by rewriting if !(x >= (0 + 2)) as if x < 2.
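
For comparison, here is a sketch of the rewritten guard described above (the name even_rewritten is only for illustration); per the observation above, this form produces the compact assembly shown under "I expected to see this happen":

#[no_mangle]
fn even_rewritten(x: usize) -> bool {
    // Identical logic to the reproducer; only the guard is written
    // directly as x < 2 instead of !(x >= (0 + 2)).
    if x < 2 {
        return false;
    } else if x == 2 {
        return true;
    } else {
        return even_rewritten(x - 2);
    }
}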

Therefore, I suspect that something goes wrong when optimizing boolean computations for the usize type. If addressed, this could improve rustc's code generation for cases like this one.

Could you please review the situation? Thank you!

Meta

rustc 1.85.0-nightly (d117b7f21 2024-12-31)
binary: rustc
commit-hash: d117b7f211835282b3b177dc64245fff0327c04c
commit-date: 2024-12-31
host: x86_64-unknown-linux-gnu
release: 1.85.0-nightly
LLVM version: 19.1.6
@ZhonghaoPan-nju ZhonghaoPan-nju added the C-bug Category: This is a bug. label Apr 21, 2025
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Apr 21, 2025
@jieyouxu jieyouxu added C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such and removed C-bug Category: This is a bug. labels Apr 21, 2025
@saethlin
Member

I clicked on the godbolt link and I see the good assembly. So I'm confused.

@ZhonghaoPan-nju
Author

@saethlin I have checked the godbolt link https://www.godbolt.org/z/zY3Kqj8ax. The editor pane Rust Resource #2 contains the example with the missed optimization, and the compiler pane rustc 1.86.0 (Editor #2) shows the corresponding assembly. The other panes in the link are for comparison.

To show the problem clearly, here is another godbolt link that reproduces exactly the bad assembly from this issue:
https://www.godbolt.org/z/fEW4zExWf

Thank you!

@saethlin
Member

saethlin commented Apr 21, 2025

More fool me. I tried to open the link on mobile and got confused by how godbolt renders on mobile 🤦

Alive2 says this optimization is valid: https://alive2.llvm.org/ce/z/LaPXQ2

Also, the only diff in the IR we generate is whether the comparison is signed or unsigned: https://www.godbolt.org/z/GnTrWErcs
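
A minimal Rust sketch of where that signed/unsigned difference comes from (guard_unsigned and guard_signed are illustrative names, not from the issue): the same guard on a usize lowers to an unsigned comparison in the IR, while on an i32 it lowers to a signed one.

// usize is unsigned, so the guard becomes an unsigned comparison (icmp ult/uge).
fn guard_unsigned(x: usize) -> bool {
    !(x >= (0 + 2))
}

// i32 is signed, so the same guard becomes a signed comparison (icmp slt/sge).
fn guard_signed(x: i32) -> bool {
    !(x >= (0 + 2))
}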

@saethlin saethlin added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Apr 21, 2025
@ZhonghaoPan-nju
Author

@saethlin Thank you for your review. I’ll learn it and keep following this issue. :)

@nikic
Contributor

nikic commented Apr 21, 2025

As far as I can tell the difference here is that the loop gets runtime unrolled in the usize case. Is the resulting code actually slower (for large inputs)?

Just because you see less assembly doesn't mean that the code is less optimized.

@nikic
Contributor

nikic commented Apr 21, 2025

I'll also add that, based on extensive experience in the LLVM project, missed optimization issues that are based around some kind of automated identification approach tend to provide significant negative value to the project. They result in contributors wasting a substantial amount of time implementing, reviewing and maintaining optimizations that do not provide any real-world value.

Making such reports actually useful requires significantly more effort on behalf of the reporter -- you need to understand what the root cause of the optimization difference is and then consider whether fixing that root cause can plausibly benefit real-world code. The usually more reliable approach is to instead directly work on a corpus of real-world code, and identify missed optimization opportunities in it.

I'd appreciate it if missed optimization reports that are not directly derived from unmodified real-world code were at least tagged as such, to help prioritize them appropriately.

@ZhonghaoPan-nju
Author

@nikic Thank you very much for your comment, which has given me a deeper understanding of project maintenance regarding missed optimization opportunities.

Indeed, in complex systems, the handling of missed optimization issues tends to be more conservative. Through my research on detecting optimization oversights in various complex systems, I’ve observed that developers often do not prioritize minor optimization opportunities, as the associated costs outweigh the benefits.

In academia, some scholars adopt a relatively straightforward criterion: test cases whose assembly output differs by more than 100 lines under differential testing are reported. However, this standard is not entirely accurate, since the size of the program itself also strongly influences the number of assembly lines; for smaller programs, even smaller differences warrant attention. In this issue, the bad example is more than twice as long as the good example in assembly lines, which is why it caught my attention.

More importantly, as you mentioned, the root cause of the missed optimization and its actual performance impact are what truly determine the value of the issue. In this case, the root cause may lie in the handling of boolean operations on the usize type, which keeps the recursion from being compiled into the compact loop shown above and thereby slows down the program. There is also a noticeable performance difference in practice. Below are the perf reports for the two programs with an input of 333:

Bad example:

         2,048,659      cpu_core/cycles/                                            
     <not counted>      cpu_atom/cycles/                                              (0.00%)
         1,878,650      cpu_core/instructions/                                      
     <not counted>      cpu_atom/instructions/                                        (0.00%)
            20,471      cpu_core/cache-misses/                                      
     <not counted>      cpu_atom/cache-misses/                                        (0.00%)
           371,849      cpu_core/branches/                                          
     <not counted>      cpu_atom/branches/                                            (0.00%)
            10,912      cpu_core/branch-misses/                                     
     <not counted>      cpu_atom/branch-misses/                                       (0.00%)

       0.000708355 seconds time elapsed
       0.000692000 seconds user
       0.000000000 seconds sys  

Good example:

         1,806,808      cpu_core/cycles/                                            
     <not counted>      cpu_atom/cycles/                                              (0.00%)
         1,841,896      cpu_core/instructions/                                      
     <not counted>      cpu_atom/instructions/                                        (0.00%)
            19,881      cpu_core/cache-misses/                                      
     <not counted>      cpu_atom/cache-misses/                                        (0.00%)
           365,033      cpu_core/branches/                                          
     <not counted>      cpu_atom/branches/                                            (0.00%)
            11,035      cpu_core/branch-misses/                                     
     <not counted>      cpu_atom/branch-misses/                                       (0.00%)

       0.000700774 seconds time elapsed
       0.000657000 seconds user
       0.000000000 seconds sys  

The CPU cycle counts show a significant difference, indicating that this optimization does have meaningful impact.
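
For context, a minimal sketch of the kind of driver that could produce such measurements (the exact harness is not included in this issue, so the binary layout and the use of black_box here are assumptions):

fn main() {
    // black_box keeps the compiler from folding the call away at compile time.
    let result = even(std::hint::black_box(333));
    std::hint::black_box(result);
    // Build one binary per variant and compare them with `perf stat ./<binary>`.
}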

Finally, does this missed optimization opportunity matter in the real world? I believe it does. Although the even-number check in this issue may not be the best implementation, usize and recursion are widely used in real programs. For large-scale computations, if more recursive patterns can be optimized into loops, it could lead to substantial performance improvements.

Thank you again for your response—it has been very enlightening. I hope my comment can be helpful to rustc, and please correct me if I’ve made any mistakes. Thank you!
