Bitslicing with Quine-McCluskey

Data Orthogonalization for Cryptography

August 27, 2018

Part one gave a short introduction of bitslicing as a concept, talked about its use cases, truth tables, software multiplexers, LUTs, and manual optimization.

The second covered Karnaugh mapping, a visual method to simplify Boolean algebra expressions that takes advantage of humans’ pattern-recognition capability, but is unfortunately limited to at most four inputs in its original variant.

Part three will introduce the Quine-McCluskey algorithm, a tabulation method that, in combination with Petrick’s method, can minimize circuits with an arbitrary number of input values. Both are relatively simple to implement in software.

Part 1: Bitslicing, An Introduction
Part 2: Bitslicing with Karnaugh maps
Part 3: Bitslicing with Quine-McCluskey

The Quine-McCluskey algorithm

Here is the 3-to-2-bit S-box from the previous posts again:

uint8_t SBOX[] = { 1, 0, 3, 1, 2, 2, 3, 0 };

Without much ado, we’ll jump right in and bitslice functions for both its output bits in parallel. You’ll probably recognize a few similarities to K-maps, except that the steps are rather mechanical and don’t require visual pattern-recognition abilities.

Step 1: Listing minterms

The lookup table SBOX[] can be expressed as the Boolean functions f_L(a,b,c) and f_R(a,b,c). Here are their truth tables, with each combination of inputs assigned a symbol m_i. Rows m₀-m₇ will be called minterms.

f_L(a,b,c)
	a	b	c	f_L
m₀	0	0	0	0
m₁	0	0	1	0
m₂	0	1	0	1
m₃	0	1	1	0
m₄	1	0	0	1
m₅	1	0	1	1
m₆	1	1	0	1
m₇	1	1	1	0

f_R(a,b,c)
	a	b	c	f_R
m₀	0	0	0	1
m₁	0	0	1	0
m₂	0	1	0	1
m₃	0	1	1	1
m₄	1	0	0	0
m₅	1	0	1	0
m₆	1	1	0	1
m₇	1	1	1	0

We’re interested only in the minterms where the function evaluates to 1 and will ignore all others. Boolean functions can already be constructed with just those tables. In Boolean algebra, OR can be expressed as addition, AND as multiplication. The negation of x is represented by x.

f_L(a,b,c) = ∑ m(2,4,5,6)
          = m₂ + m₄ + m₅ + m₆
          = abc + abc + abc + abc

f_R(a,b,c) = ∑ m(0,2,3,6)
          = m₀ + m₂ + m₃ + m₆
          = abc + abc + abc + abc

Well, that’s a start. Translated into C, these functions would be constant-time but not even close to minimal.

Step 2: Bit Buckets

Now that we have all these minterms, we’ll put them in separate buckets based on the number of 1s in their inputs a, b, and c.

f_L(a,b,c)
# of 1s	minterm	binary
1	m₂	010
	m₄	100
2	m₅	101
	m₆	110

f_R(a,b,c)
# of 1s	minterm	binary
0	m₀	000
1	m₂	010
2	m₃	011
	m₆	110

The reasoning here is the same as the Gray code ordering for Karnaugh maps. If we start with the minterms in the first bucket n, only bucket n+1 might contain matching minterms where only a single variable changes. They can’t be in any of the other buckets.

Step 3: Merging minterms

Why would you even look for pairs of minterms with a one-variable difference? Because they can be merged to simplify our expression. These combinations are called minterms of size 2.

All minterms have output 1, so if the only difference is exactly one input variable, then the output is independent of it. For example, (a & ~b & c) | (a & b & c) can be reduced to just a & c, the expression value is independent of b.

f_L(a,b,c)
# of 1s	minterm	binary	size-2
1	m₂	010	m_2,6	—10
	m₄	100	m_4,5	10—
			m_4,6	1—0
2	m₅	101
	m₆	110

f_R(a,b,c)
# of 1s	minterm	binary	size-2
0	m₀	000	m_0,2	0—0
1	m₂	010	m_2,3	01—
			m_2,6	—10
2	m₃	011
	m₆	110

Always start with the minterms in the very first bucket at the top of the table. For every minterm in bucket n, we try to find a minterm in bucket n+1 with a one-bit difference in the binary column. Any matches will be recorded as pairs and entered into the size-2 column of bucket n.

m₂=010 and m₆=110 for example differ in only the first input variable, a. They merge into m_2,6=—10, with a dash marking the position of the irrelevant input bit.

Once all minterms were combined (as far as possible), we’ll continue with the next size. Minterms of size bigger than 1 have dashes for irrelevant input bits and it’s important to treat those as a “third bit value”. In other words, their dashes must be at the same positions, otherwise they can’t be merged.

There’s nothing left to merge for f_L(a,b,c) as all its size-2 minterms are in the first bucket. For f_R(a,b,c), none of the size-2 minterms in the first bucket match any of those in the second, their dashes are all in different positions.

Step 4: Prime Implicants

All minterms from the previous step that can’t be combined any further are called prime implicants. Entering them into a table let’s us check how well they cover the original minterms determined by step 1.

If any prime implicant is the only one to cover a minterm, it’s called an essential prime implicant (marked with an asterisk). It’s essential because it must be included in the resulting minimal form, otherwise we’d miss one of the input values combinations.

f_L(a,b,c)
	m₂	m₄	m₅	m₆	abc
m_2,6*	x			x	-10
m_4,5*		x	x		10-
m_4,6		x		x	1-0

f_R(a,b,c)
	m₀	m₂	m₃	m₆	abc
m_0,2*	x	x			0-0
m_2,3*		x	x		01-
m_2,6*		x		x	-10

Prime implicant m_2,6* on the left for example is the only one that covers m₂. m_4,5* is the only one that covers m₅. Not only is m_4,6 not essential, but we actually don’t need it at all: m₄ and m₆ are already covered by the essential prime implicants. All prime implicants of f_R(a,b,c) are essential, so we need all of them.

When bitslicing functions with many input variables it may happen that you are left with a number of non-essential prime implicants that can be combined in various ways to cover the missing minterms. Petrick’s method helps finding a minimum solution. It’s tedious to do manually, but not hard to automate.

Step 5: Minimal Forms

Finally, we derive minimal forms of our Boolean functions by looking at the abc column of the essential prime implicants. Input variables marked with dashes are ignored.

f_L(a,b,c) = m_2,6 + m_4,5 = bc + ab

The code for SBOXL() with 8-bit inputs:

uint8_t SBOXL(uint8_t a, uint8_t b, uint8_t c) {
  return (b & ~c) | (a & ~b);
}

f_R(a,b,c), reduced to the combination of its three essential prime implicants:

f_R(a,b,c) = m_0,2 + m_2,3 + m_2,6 = ac + ab + bc

And SBOXR() as expected:

uint8_t SBOXR(uint8_t a, uint8_t b, uint8_t c) {
  return (~a & ~c) | (~a & b) | (b & ~c);
}

Combining SBOXL() and SBOXR() yields the familiar version of SBOX(), after eliminating common subexpressions and taking out common factors.

void SBOX(uint8_t a, uint8_t b, uint8_t c, uint8_t* l, uint8_t* r) {
  uint8_t na = ~a;
  uint8_t nb = ~b;
  uint8_t nc = ~c;

  uint8_t t0 = b & nc;
  uint8_t t1 = b | nc;

  *l = (a & nb) | t0;
  *r = (na & t1) | t0;
}

Bitslicing a DES S-box

When I started writing this blog post I thought it would be nice to ditch the small S-box from the previous posts, and naively bitslice a “real” S-box, like the ones used in DES.

But these are 6-to-4-bit S-boxes, how much more effort can it be? As it turns out, humans are terrible at understanding exponential growth. Here are my intermediate results after an hour of writing, trying to bitslice just one of the four output bits:

I gave up when I spotted a few mistakes that would likely lead to a non-minimal solution. Bitslicing a function with that many input variables manually is laborious and probably not worth it, except that it definitely helped me understand the steps of the algorithm better.

As mentioned in the beginning, Quine-McCluskey and Petrick’s method can be implemented in software rather easily, so that’s what I did instead. I’ll explain how, and what to consider, in the next post.