Bitslicing with Quine-McCluskey

Data Orthogonalization for Cryptography

Part one gave a short introduction of bitslicing as a concept, talked about its use cases, truth tables, software multiplexers, LUTs, and manual optimization.

The second covered Karnaugh mapping, a visual method to simplify Boolean algebra expressions that takes advantage of humans’ pattern-recognition capability, but is unfortunately limited to at most four inputs in its original variant.

Part three will introduce the Quine-McCluskey algorithm, a tabulation method that, in combination with Petrick’s method, can minimize circuits with an arbitrary number of input values. Both are relatively simple to implement in software.

Part 1: Bitslicing, An Introduction
Part 2: Bitslicing with Karnaugh maps
Part 3: Bitslicing with Quine-McCluskey

The Quine-McCluskey algorithm

Here is the 3-to-2-bit S-box from the previous posts again:

uint8_t SBOX[] = { 1, 0, 3, 1, 2, 2, 3, 0 };

Without much ado, we’ll jump right in and bitslice functions for both its output bits in parallel. You’ll probably recognize a few similarities to K-maps, except that the steps are rather mechanical and don’t require visual pattern-recognition abilities.

Step 1: Listing minterms

The lookup table SBOX[] can be expressed as the Boolean functions fL(a,b,c) and fR(a,b,c). Here are their truth tables, with each combination of inputs assigned a symbol mi. Rows m0-m7 will be called minterms.

fL(a,b,c)
a b c fL
m00000
m10010
m20101
m30110
m41001
m51011
m61101
m71110
fR(a,b,c)
a b c fR
m00001
m10010
m20101
m30111
m41000
m51010
m61101
m71110

We’re interested only in the minterms where the function evaluates to 1 and will ignore all others. Boolean functions can already be constructed with just those tables. In Boolean algebra, OR can be expressed as addition, AND as multiplication. The negation of x is represented by x.

fL(a,b,c) = ∑ m(2,4,5,6)
          = m2 + m4 + m5 + m6
          = abc + abc + abc + abc

fR(a,b,c) = ∑ m(0,2,3,6)
          = m0 + m2 + m3 + m6
          = abc + abc + abc + abc

Well, that’s a start. Translated into C, these functions would be constant-time but not even close to minimal.

Step 2: Bit Buckets

Now that we have all these minterms, we’ll put them in separate buckets based on the number of 1s in their inputs a, b, and c.

fL(a,b,c)
# of 1s minterm binary
1m2010
m4100
2m5101
m6110
fR(a,b,c)
# of 1smintermbinary
0m0000
1m2010
2m3011
m6110

The reasoning here is the same as the Gray code ordering for Karnaugh maps. If we start with the minterms in the first bucket n, only bucket n+1 might contain matching minterms where only a single variable changes. They can’t be in any of the other buckets.

Step 3: Merging minterms

Why would you even look for pairs of minterms with a one-variable difference? Because they can be merged to simplify our expression. These combinations are called minterms of size 2.

All minterms have output 1, so if the only difference is exactly one input variable, then the output is independent of it. For example, (a & ~b & c) | (a & b & c) can be reduced to just a & c, the expression value is independent of b.

fL(a,b,c)
# of 1s minterm binary size-2
1m2010m2,6—10
m4100m4,510—
m4,61—0
2m5101
m6110
fR(a,b,c)
# of 1smintermbinarysize-2
0m0000m0,20—0
1m2010m2,301—
m2,6—10
2m3011
m6110

Always start with the minterms in the very first bucket at the top of the table. For every minterm in bucket n, we try to find a minterm in bucket n+1 with a one-bit difference in the binary column. Any matches will be recorded as pairs and entered into the size-2 column of bucket n.

m2=010 and m6=110 for example differ in only the first input variable, a. They merge into m2,6=—10, with a dash marking the position of the irrelevant input bit.

Once all minterms were combined (as far as possible), we’ll continue with the next size. Minterms of size bigger than 1 have dashes for irrelevant input bits and it’s important to treat those as a “third bit value”. In other words, their dashes must be at the same positions, otherwise they can’t be merged.

There’s nothing left to merge for fL(a,b,c) as all its size-2 minterms are in the first bucket. For fR(a,b,c), none of the size-2 minterms in the first bucket match any of those in the second, their dashes are all in different positions.

Step 4: Prime Implicants

All minterms from the previous step that can’t be combined any further are called prime implicants. Entering them into a table let’s us check how well they cover the original minterms determined by step 1.

If any prime implicant is the only one to cover a minterm, it’s called an essential prime implicant (marked with an asterisk). It’s essential because it must be included in the resulting minimal form, otherwise we’d miss one of the input values combinations.

fL(a,b,c)
m2 m4 m5 m6 abc
m2,6*xx-10
m4,5*xx10-
m4,6 xx1-0
fR(a,b,c)
m0 m2 m3 m6 abc
m0,2*xx0-0
m2,3*xx01-
m2,6*xx-10

Prime implicant m2,6* on the left for example is the only one that covers m2. m4,5* is the only one that covers m5. Not only is m4,6 not essential, but we actually don’t need it at all: m4 and m6 are already covered by the essential prime implicants. All prime implicants of fR(a,b,c) are essential, so we need all of them.

When bitslicing functions with many input variables it may happen that you are left with a number of non-essential prime implicants that can be combined in various ways to cover the missing minterms. Petrick’s method helps finding a minimum solution. It’s tedious to do manually, but not hard to automate.

Step 5: Minimal Forms

Finally, we derive minimal forms of our Boolean functions by looking at the abc column of the essential prime implicants. Input variables marked with dashes are ignored.

fL(a,b,c) = m2,6 + m4,5 = bc + ab

The code for SBOXL() with 8-bit inputs:

uint8_t SBOXL(uint8_t a, uint8_t b, uint8_t c) {
  return (b & ~c) | (a & ~b);
}

fR(a,b,c), reduced to the combination of its three essential prime implicants:

fR(a,b,c) = m0,2 + m2,3 + m2,6 = ac + ab + bc

And SBOXR() as expected:

uint8_t SBOXR(uint8_t a, uint8_t b, uint8_t c) {
  return (~a & ~c) | (~a & b) | (b & ~c);
}

Combining SBOXL() and SBOXR() yields the familiar version of SBOX(), after eliminating common subexpressions and taking out common factors.

void SBOX(uint8_t a, uint8_t b, uint8_t c, uint8_t* l, uint8_t* r) {
  uint8_t na = ~a;
  uint8_t nb = ~b;
  uint8_t nc = ~c;

  uint8_t t0 = b & nc;
  uint8_t t1 = b | nc;

  *l = (a & nb) | t0;
  *r = (na & t1) | t0;
}

Bitslicing a DES S-box

When I started writing this blog post I thought it would be nice to ditch the small S-box from the previous posts, and naively bitslice a “real” S-box, like the ones used in DES.

But these are 6-to-4-bit S-boxes, how much more effort can it be? As it turns out, humans are terrible at understanding exponential growth. Here are my intermediate results after an hour of writing, trying to bitslice just one of the four output bits:

I gave up when I spotted a few mistakes that would likely lead to a non-minimal solution. Bitslicing a function with that many input variables manually is laborious and probably not worth it, except that it definitely helped me understand the steps of the algorithm better.

As mentioned in the beginning, Quine-McCluskey and Petrick’s method can be implemented in software rather easily, so that’s what I did instead. I’ll explain how, and what to consider, in the next post.