Modern processors batch many lines of code to optimize for speed. What happens in case of a branche? Learn how to branches can slow down the CPU.

Brancheless Programming Python example.

In 1986 I would fire up Turbo Assembler on my C64 and write assembly code that was blazingly fast. The processor would process one instruction per cycle and the world was easy and OK.

Action factory

Since then, chip manufactures did many things to improve processing power. They make transitors smaller and fit more and more on one wafer. I hear a transistor can be made with a 1000x1000x1000 atoms these days.

But shrinking CPU’s is not the only way to make processors faster anymore. Moore’s law is not only about more transistors but also about optimizing the code. By this I mean the code in the processor.

One of the tricks of modern CPU’s is batching instructions. Many lines of code are loaded and analysed at once. But what happens if there are branches? Well, it turns out that branches have a high chance of going one direction and the CPU optimizes for that branche. But in case the branche goes the other direction, the pipe is flushed and there is a performance hit.

This video Branchless Programming: Why If is Sloowww… and what we can do about it! shows two examples in c++. It shows the difference between the branched and brancheless code because the assembly code generated from the branched c++ code has j(ump) opcodes and the brancheless does not.

Python disassemble

How does Python deal with branched and branchless code? Python has a disassembler that we use to show the byte code that is generated.

Here is a function that returns the smallest number of any two given numbers:

def smaller(a, b):
if a < b:
return a
else:
return b

print(smaller(1, 2))
print(smaller(10, 20))

Output:

1
10

To show the generated byte code, you can use the dis function from the sys module. Here is an example. I’ve omitted the function calls.

import dis

def smaller(a, b):
if a < b:
return a
else:
return b

dis.dis(smaller)

Output:

4           0 LOAD_FAST                0 (a)
    2 LOAD_FAST                1 (b)
    4 COMPARE_OP               0 (<)
    6 POP_JUMP_IF_FALSE       12

5           8 LOAD_FAST                0 (a)
   10 RETURN_VALUE

7     >>   12 LOAD_FAST                1 (b)
   14 RETURN_VALUE
   16 LOAD_CONST               0 (None)
   18 RETURN_VALUE

And there we see the JUMP statement: 6 POP_JUMP_IF_FALSE 12. This indicates a branche: If FALSE, jump to 12. If not, continue

Branche

Brancheless Byte Code

Let’s rewrite the program. The function still uses relational expressions but does not use them to branche from it.

import dis

def smaller(a, b):
return a * (a < b) + b * (b <= a)

dis.dis(smaller)

Output:

4           0 LOAD_FAST                0 (a)
    2 LOAD_FAST                0 (a)
    4 LOAD_FAST                1 (b)
    6 COMPARE_OP               0 (<)
    8 BINARY_MULTIPLY
   10 LOAD_FAST                1 (b)
   12 LOAD_FAST                1 (b)
   14 LOAD_FAST                0 (a)
   16 COMPARE_OP               1 (<=)
   18 BINARY_MULTIPLY
   20 BINARY_ADD
   22 RETURN_VALUE

The re-written programm does not use branches anymore but the effect of the function is still the same!

Conclusion

Writing brancheless code can speed up the code. However we are talking about fractions of seconds. This can be interesting if you are doing something in a loop and repeat it over and over.

But you probably know enough about code to make a smart decision when it comes down to write readable code that can be understood by you future you. The question is: Do you want to write an if/else statement that anyone understands, or a more cryptic line of code that prevents branching?

That said, it is still interesting to know about branched code v.d. brancheless code.

Written by Loek van den Ouweland on 2020-07-10.
Questions regarding this artice? You can send them to the address below.