frabert a day ago

This has been a sore point in a lot of discussions regarding compiler optimizations and cryptographic code, how compilers and compiler engineers are sabotaging the efforts of cryptographers in making sure there are no side-channels in their code. The issue has never been the compiler, and has always been the language: there was never a way to express the right intention from within C (or most other languages, really).

This primitive we're trying to introduce is meant to make up for this shortcoming without having to introduce additional rules in the standard.

  • Asooka 3 hours ago

    There really ought to be a subset of C that lets you write portable assembly. One where only a defined set of optimisations are allowed and required to be performed, "inline" means always inline, the "register" and "auto" keywords have their original meanings, every stack variable is allocated unless otherwise indicated, every expression has defined evaluation order, every read/write from/to an address is carried out, nothing is ever reordered, and undefined behaviour is switched to machine-specific behaviour. Currently if you need that level of control, your only option is writing it in assembly, which gets painful when you need to support multiple architectures, or want fancy features like autocomplete or structs and functions.

    • delta_p_delta_x 2 hours ago

      > want fancy features like autocomplete or structs and functions

      I would argue that given a certain ISA, it's probably easier to write an autocomplete extension for assembly targeting that ISA, rather than autocomplete for C, or goodness forbid, C++.

      Likewise for structs, functions, jump targets, etc. One could probably set up snippets corresponding to different sorts of conditional execution—loops, if/else/while, switch, etc.

    • saagarjha 2 hours ago

      Why would you like register and auto to have meaning?

      • thyristan an hour ago

        Because for timing-sensitive code, those are important. If a variable is really a register, cache-based timing attacks just don't happen, because there is no cache in between.

  • jfindper 14 hours ago

    >how compilers and compiler engineers are sabotaging the efforts of cryptographers

    I'm not exposed to this space very often, so maybe you or someone else could give me some context. "Sabotage" is a deliberate effort to ruin/hinder something. Are compiler engineers deliberately hindering the efforts of cryptographers? If yes... is there a reason why? Some long-running feud or something?

    Or, through the course of their efforts to make compilers faster/etc, are cryptographers just getting the "short end of the stick" so to speak? Perhaps forgotten about because the number of cryptographers is dwarfed by the number of non-cryptographers? (Or any other explanation that I'm unaware of?)

    • chowells 14 hours ago

      It's more a viewpoint thing. Any construct cryptographers find that runs in constant time is something that could be optimized to run faster for non-cryptographic code. Constant-time constructs essentially are optimizer bug reports. There is always the danger that by popularizing a technique you are drawing the attention of a compiler contributor who wants to speed up a benchmark of that same construct in non-cryptographic code. So maybe it's not intended as sabotage, but it can sure feel that way when everything you do is explicitly targeted to be changed after you do it.

    • stouset 14 hours ago

      It’s not intentional. The motivations of CPU designers, compiler writers, and optimizers are at odds with those of cryptographers. The former want to use every trick possible to squeeze out additional performance in the most common cases, while the latter absolutely require indistinguishable performance across all possibilities.

      CPUs love to do branch prediction to have computation already performed in the case where it guesses the branch correctly, but cryptographic code needs equal performance no matter the input.

      When a programmer asks for some register or memory location to be zeroed, they generally just want to be able to use a zero in some later operation and so it doesn’t really matter that a previous value was really overwritten. When a cryptographer does, they generally are trying to make it impossible to read the previous value. And they want to be able to have some guarantee that it wasn’t implicitly copied somewhere else in the interim.

    • layer8 13 hours ago

      “Sabotage” can be used in a figurative sense that doesn’t insinuate intent. An adjacent example is “self-sabotage”, which doesn’t imply intent.

  • fooker 14 hours ago

    > making sure there are no side-channels in their code

    Any side effect is a side channel. There are always going to be side channels in real code running on real hardware.

    Sure you can change your code, compiler, or, or even hardware to account for this but at it's core that is security by obscurity.

amluto 12 hours ago

Too bad that Intel chips more or less reserve the right to take LLVM’s nice output and make it non-constant-time anyway. See:

https://www.intel.com/content/www/us/en/developer/articles/t...

Sure, you could run on some hypothetical OS that supports DOITM and insert syscalls around every manipulation of secret data. Yeah, right.

  • JoshTriplett 11 hours ago

    Last I saw, it seemed like the plan was to unconditionally enable it, and on the off chance there's ever a piece of hardware where it's a substantial performance win, offer a way to opt out of it.

    • amluto 8 hours ago

      I advocated for that, and then I completely lost track of the status.

      The whole design is ridiculous.

  • stingraycharles 10 hours ago

    Sorry, I may be missing the point here, but reading that page doesn’t immediately make it obvious to me what that feature is. Is it some constant time execution mechanism that you can enable / disable on a per-thread basis to do… what exactly?

    • seanhunter 3 hours ago

      As a concrete example, say I have a (very naive and bad) password-checker that works like this pseudocode:

        > for i = 1 to len(real_password) {
        >   if entered_password[i] != real_password[i] {
        >      return FAILURE
        >   }
        > }
        >
        > return SUCCESS 
      
      OK now an alert attacker with the ability to very accurately record the time it takes to check the password can determine the length at least of the real password, because the time complexity of this check is O(length of the real password), and they could also gradually determine the password itself because the check would take longer as the attacker got each successive character correct.

      Taking this general idea and expanding it, there are lots of places where the timing of branches of code can leak information about some secret, so in cryptographic code in particular, it’s often beneficial to be able to ensure that two branches (the success and failure branches in the above) take exactly the same amount of time so the timing doesn’t leak information. So to fix the above you would probably want to do two things. Firstly set a boolean to failure and still continue the checking to ensure the “return failure quickly” problem doesn’t leak information and also change your password check to check against a fixed-width hash or something so the length of the password itself wasn’t a factor.

      The problem is lots of performance optimizations (pipelining, branch prediction etc) work specifically against this goal- they aim to take branches quickly in the happy path of the code because normally that’s what you want to ensure optimal performance.

      So say instead of the above I do

        > bool status = SUCCESS
        > for i = 1 to hash_length {
        >   if hash_of_entered_password[i] != hash_of_real_password[i] {
        >      status = FAILURE
        >   }
        > }
        >
        > return status 
      
      …I don’t want the optimizer to realize that when status becomes FAILURE it can never become SUCCESS again and the loop doesn’t do anything else so just return early. I want it to actually run the pointless comparison of the rest of the hash so the timing is exactly the same each time.

      But now my check is constant time but I’ve shifted the burden onto the person who writes the hash function. That has to run in constant time or my check will once again leak. So in general people want the ability to tell the compiler that they want a particular piece of code to run in constant time. At the moment, in the general case I think you have to break into inline assembly to achieve this.

      • rkagerer 2 hours ago

        Why not just always spin until a fixed number of ticks (or microseconds for slewing clocks) have passed (starting from function entry), prior to returning?

        Obviously this doesn't mitigate power usage side channel attacks, but that's not the point here.

        It's time-bound, so let's check time.

        • seanhunter an hour ago

          You could totally do that, but in the exact same way as the above, you'd want the compiler not to optimize your spinlock away thinking it wasn't needed. My understanding is in lots of real applications, the asm code that I mentioned is in part making sure it waits specific numbers of clocks in each branch to ensure they all exactly balance.

    • wat10000 10 hours ago

      It turns off CPU features that could cause execution time to vary in a way that depends on the data being operated on.

ethin 13 hours ago

So this makes me curious: is there a reason we don't do something like a __builtin_ct_begin()/__builtin_ct_end() set of intrinsics? Where the begin intrinsic begins a constant-time code region, and all code within that region must be constant-time, and that region must be ended with an end() call? I'm not too familiar with compiler intrinsics or how these things work so thought I'd ask. The intrinsic could be scoped such that the compiler can use it's implementation-defined behavior freedom to enforce the begin/end pairs. But Idk, maybe this isn't feasible?

  • zzo38computer 12 hours ago

    Maybe it might be better to be implemented as a function attribute instead

    • ethin 12 hours ago

      Or a pragma? Like how OpenMP did it?

charcircuit 6 hours ago

These are meaningless without guarantees that the processor will run the instructions in constant time and not run the code as fast as possible. Claims like cmov on x86 always being constant time are dangerous because a microcode update could change that to not be the case anymore. Programmers want an actual guarantee that the code will take the same amount of time.

We should be asking our CPU vendors to support enabling a constant time mode of some sort for sensitive operations.

  • Gibbon1 2 hours ago

    That's been one of my counters to the bitch that C isn't safe. The underlying architecture isn't safe.

    That said WG21 and WG14 don't seem to be able to get the memo that safety is more important than single core speed. Or as I suspect a bunch members are actually malicious.

zzo38computer 13 hours ago

I think __builtin_ct_select and __builtin_ct_expr would be good ideas. (They could also be implemented in GCC in future, as well as LLVM.)

In some cases it might be necessary to consider the possibility of invalid memory accesses (and avoid the side-channels when doing so). (The example given in the article works around this issue, but I don't know if there are any situations where this will not help.)

  • connicpu 13 hours ago

    The side channel from memory access timings are exactly why cmov is its own instruction on x86_64. It retrieves the memory regardless of the condition value. Anything else would change the timings based on condition. If you're going to segfault that's going to be visible to an attacker regardless because you're going to hang up.

    • wahern 13 hours ago

      AFAIU, cmov wasn't originally intended to be a guaranteed constant-time operation, Intel and AMD won't commit to keeping it constant-time in the future, but it just so happened that at one point it was implemented in constant-time across CPUs, cryptographers picked up on this and began using it, and now Intel and AMD tacitly recognize this dependency. See, e.g., https://www.intel.com/content/www/us/en/developer/articles/t...

      > The CMOVcc instruction runs in time independent of its arguments in all current x86 architecture processors. This includes variants that load from memory. The load is performed before the condition is tested. Future versions of the architecture may introduce new addressing modes that do not exhibit this property.

    • zzo38computer 13 hours ago

      I mean the possibility that the rest of the program guarantees that the address is valid if the condition is true but otherwise it might be valid or invalid. This is probably not important for most applications, but I don't know if there are some unusual ones where it would matter.