Skip to content

Bad codegen for comparing struct of two 16bit ints #140167

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
bjorn3 opened this issue Apr 22, 2025 · 3 comments
Open

Bad codegen for comparing struct of two 16bit ints #140167

bjorn3 opened this issue Apr 22, 2025 · 3 comments
Labels
A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@bjorn3
Copy link
Member

bjorn3 commented Apr 22, 2025

I tried this code:

#[derive(Clone, Copy, PartialEq)]
pub struct Foo {
    pub x: u16,
    pub y: u16,
}

#[no_mangle]
fn eq(a: &Foo, b: &Foo) -> bool {
    a == b
}

I expected to see this happen: a and b are loaded into a single register each and then the registers are compared against each other.

Instead, this happened:

For -Copt-level=2:

eq:
        movzx   eax, word ptr [rdi]
        movzx   ecx, word ptr [rdi + 2]
        xor     ax, word ptr [rsi]
        movzx   edx, word ptr [rsi + 2]
        xor     edx, ecx
        or      dx, ax
        sete    al
        ret

For -Copt-level=3:

eq:
        movd    xmm0, dword ptr [rdi]
        movd    xmm1, dword ptr [rsi]
        pcmpeqw xmm1, xmm0
        pshuflw xmm0, xmm1, 80
        pshufd  xmm0, xmm0, 80
        movmskpd        eax, xmm0
        cmp     eax, 3
        sete    al
        ret

Meta

rustc --version --verbose:
Both 1.86 and nightly.

@bjorn3 bjorn3 added C-bug Category: This is a bug. I-slow Issue: Problems and improvements with respect to performance of generated code. labels Apr 22, 2025
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Apr 22, 2025
@bjorn3 bjorn3 changed the title Bad codegen for comparing struct of 16bit ints Bad codegen for comparing struct of two 16bit ints Apr 22, 2025
@bjorn3
Copy link
Member Author

bjorn3 commented Apr 22, 2025

Using transmute::<_, u32>(read_unaligned(self)) and transmute::<_, u32>(read_unaligned(other)) inside of the PartialEq impl produces the expected codegen.

@folkertdev
Copy link
Contributor

Trying this in C shows that it is also unable to optimize (the equivalent of) Foo::eq, but it is able to generate good code when memcmp is used:

https://godbolt.org/z/cqhM9hThK

#include <stdint.h>
#include <stdbool.h>
#include <string.h>

typedef struct {
    uint16_t x;
    uint16_t y;
} Foo;

bool eq(const Foo* a, const Foo* b) {
    return memcmp(a, b, sizeof(Foo)) == 0;
}

bool partial_eq(const Foo* a, const Foo* b) {
    return a->x == b->x && a->y == b->y;
}
eq:
        mov     eax, dword ptr [rdi]
        cmp     eax, dword ptr [rsi]
        sete    al
        ret

partial_eq:
        movzx   eax, word ptr [rdi]
        cmp     ax, word ptr [rsi]
        jne     .LBB1_1
        movzx   eax, word ptr [rdi + 2]
        cmp     ax, word ptr [rsi + 2]
        sete    al
        ret
.LBB1_1:
        xor     eax, eax
        ret

That suggests that codegen for Eq could be improved by using memcmp in some cases. (fields must be copy, no padding , probably other conditions that I'm missing).

@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented Apr 22, 2025

At the time derives are expanded, field types aren’t known at all. The derive just gets tokens and it might not even see the same tokens as later stages of the compiler. So the derive can’t do the necessary checks (notably including: is bitwise equality even correct for all field types). In theory the built-in derives could expand to some magic intrinsic that’s lowered much later when types are known, but that’s a big hammer. I don’t see any fundamental reason why LLVM shouldn’t be able to do this optimization (at least for Rust) and that would help non-derived code as well. But I haven’t checked Alive2.

@Urgau Urgau added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-codegen Area: Code generation and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Apr 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. I-slow Issue: Problems and improvements with respect to performance of generated code.
Projects
None yet
Development

No branches or pull requests

5 participants