Part one gave a short introduction to bitslicing as a concept and talked about its use cases, truth tables, software multiplexers, LUTs, and manual optimization.
The second covered Karnaugh mapping, a visual method to simplify Boolean algebra expressions that takes advantage of humans’ pattern-recognition capability, but is unfortunately limited to at most four inputs in its original variant.
Part three will introduce the Quine-McCluskey algorithm, a tabulation method that, in combination with Petrick’s method, can minimize circuits with an arbitrary number of input values. Both are relatively simple to implement in software.
The Quine-McCluskey algorithm
Here is the 3-to-2-bit S-box from the previous posts again:
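A minimal sketch of that table in C, reconstructed from the minterm lists given in Step 1. It assumes that the left output bit is the more significant one and that a is the most significant input bit; the element type is an assumption as well.

```c
#include <stdint.h>

/* 3-to-2-bit S-box. The index is the 3-bit input abc (a is the most
 * significant bit), the value is the 2-bit output: left bit fL,
 * right bit fR. Contents are derived from the minterm lists
 * fL = sum m(2,4,5,6) and fR = sum m(0,2,3,6). */
static const uint8_t SBOX[8] = { 1, 0, 3, 1, 2, 2, 3, 0 };
```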
Without much ado, we’ll jump right in and bitslice functions for both its output bits in parallel. You’ll probably recognize a few similarities to K-maps, except that the steps are rather mechanical and don’t require visual pattern-recognition abilities.
Step 1: Listing minterms
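The lookup table SBOX can be expressed as the Boolean functions fL(a,b,c) and fR(a,b,c). Here are their truth tables, with each combination of inputs assigned a symbol mi, where i is the row index. Rows m0-m7 will be called minterms.

| minterm | a | b | c | fL | fR |
|---|---|---|---|---|---|
| m0 | 0 | 0 | 0 | 0 | 1 |
| m1 | 0 | 0 | 1 | 0 | 0 |
| m2 | 0 | 1 | 0 | 1 | 1 |
| m3 | 0 | 1 | 1 | 0 | 1 |
| m4 | 1 | 0 | 0 | 1 | 0 |
| m5 | 1 | 0 | 1 | 1 | 0 |
| m6 | 1 | 1 | 0 | 1 | 1 |
| m7 | 1 | 1 | 1 | 0 | 0 |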
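The truth table columns can also be generated from the lookup table. The following is a sketch only: the table contents are reconstructed from the minterm lists in this post, and truth_column is a hypothetical helper name.

```c
#include <stdint.h>

/* Assumed S-box contents, consistent with the minterm lists in the text. */
static const uint8_t SBOX[8] = { 1, 0, 3, 1, 2, 2, 3, 0 };

/* Returns the truth table column of one output bit as a bitmask:
 * bit i of the result is the function value for row m_i.
 * out_bit = 1 selects fL (left bit), out_bit = 0 selects fR (right bit). */
static uint8_t truth_column(const uint8_t table[8], unsigned out_bit)
{
    uint8_t column = 0;
    for (unsigned i = 0; i < 8; i++) {
        if ((table[i] >> out_bit) & 1u)
            column |= (uint8_t)(1u << i);
    }
    return column;
}
```

With this encoding, the set bits of the returned mask are exactly the rows where the function evaluates to 1.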
We’re interested only in the minterms where the function evaluates to 1 and will ignore all others. Boolean functions can already be constructed with just those tables. In Boolean algebra, OR can be expressed as addition, AND as multiplication. The negation of $x$ is represented by $\overline{x}$.

$$f_L(a,b,c) = \sum m(2,4,5,6) = m_2 + m_4 + m_5 + m_6 = \overline{a}b\overline{c} + a\overline{b}\,\overline{c} + a\overline{b}c + ab\overline{c}$$

$$f_R(a,b,c) = \sum m(0,2,3,6) = m_0 + m_2 + m_3 + m_6 = \overline{a}\,\overline{b}\,\overline{c} + \overline{a}b\overline{c} + \overline{a}bc + ab\overline{c}$$
Well, that’s a start. Translated into C, these functions would be constant-time but not even close to minimal.
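As a sketch of what such a direct translation might look like, here is one AND term per minterm, all ORed together. The function names and the use of uint8_t operands (eight independent one-bit lanes per argument) are assumptions.

```c
#include <stdint.h>

/* Canonical sum of minterms for fL: m2 + m4 + m5 + m6. */
static uint8_t SBOXL_naive(uint8_t a, uint8_t b, uint8_t c)
{
    return (uint8_t)((~a &  b & ~c) |   /* m2 = 010 */
                     ( a & ~b & ~c) |   /* m4 = 100 */
                     ( a & ~b &  c) |   /* m5 = 101 */
                     ( a &  b & ~c));   /* m6 = 110 */
}

/* Canonical sum of minterms for fR: m0 + m2 + m3 + m6. */
static uint8_t SBOXR_naive(uint8_t a, uint8_t b, uint8_t c)
{
    return (uint8_t)((~a & ~b & ~c) |   /* m0 = 000 */
                     (~a &  b & ~c) |   /* m2 = 010 */
                     (~a &  b &  c) |   /* m3 = 011 */
                     ( a &  b & ~c));   /* m6 = 110 */
}
```

Calling either function with a = 0xF0, b = 0xCC, c = 0xAA puts input value i into lane i, so the result is the whole truth table column in one byte.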
Step 2: Bit Buckets
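Now that we have all these minterms, we’ll put them in separate buckets based on the number of 1s in their inputs a, b, and c.

fL(a,b,c):

| # of 1s | minterm | binary |
|---|---|---|
| 1 | m2 | 010 |
| 1 | m4 | 100 |
| 2 | m5 | 101 |
| 2 | m6 | 110 |

fR(a,b,c):

| # of 1s | minterm | binary |
|---|---|---|
| 0 | m0 | 000 |
| 1 | m2 | 010 |
| 2 | m3 | 011 |
| 2 | m6 | 110 |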
The reasoning here is the same as the Gray code ordering for Karnaugh maps. If we start with the minterms in the first bucket n, only bucket n+1 might contain matching minterms where only a single variable changes. They can’t be in any of the other buckets.
Step 3: Merging minterms
Why would you even look for pairs of minterms with a one-variable difference? Because they can be merged to simplify our expression. These combinations are called minterms of size 2.
All minterms have output 1, so if two of them differ in exactly one input variable, then the output is independent of that variable. For example, (a & ~b & c) | (a & b & c) can be reduced to just a & c; the value of the expression is independent of b.
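fL(a,b,c):

| # of 1s | minterm | binary | size-2 |
|---|---|---|---|
| 1 | m2 | 010 | m2,6 = -10 |
| 1 | m4 | 100 | m4,5 = 10-, m4,6 = 1-0 |
| 2 | m5 | 101 | |
| 2 | m6 | 110 | |

fR(a,b,c):

| # of 1s | minterm | binary | size-2 |
|---|---|---|---|
| 0 | m0 | 000 | m0,2 = 0-0 |
| 1 | m2 | 010 | m2,3 = 01-, m2,6 = -10 |
| 2 | m3 | 011 | |
| 2 | m6 | 110 | |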
Always start with the minterms in the very first bucket at the top of the table. For every minterm in bucket n, we try to find a minterm in bucket n+1 with a one-bit difference in the binary column. Any matches will be recorded as pairs and entered into the size-2 column of bucket n.
m2=010 and m6=110, for example, differ in only the first input variable, a. They merge into m2,6=-10, with a dash marking the position of the irrelevant input bit.
Once all minterms have been combined (as far as possible), we’ll continue with the next size. Minterms of size bigger than 1 have dashes for irrelevant input bits, and it’s important to treat those as a “third bit value”. In other words, two of them can be merged only if their dashes are at the same positions.
There’s nothing left to merge for fL(a,b,c) as all its size-2 minterms are in the first bucket. For fR(a,b,c), none of the size-2 minterms in the first bucket match any of those in the second, their dashes are all in different positions.
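One possible way to express the merging rule in code is to store each (merged) minterm as a pair of bitmasks, one for the fixed bits and one for the dashes. The implicant type and try_merge are hypothetical names, and the encoding is an assumption made for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* A (merged) minterm over inputs a, b, c, with a as bit 2.
 * A set bit in `dashes` marks an irrelevant input; the matching
 * bit in `value` is kept at 0. */
typedef struct {
    uint8_t value;
    uint8_t dashes;
} implicant;

/* Tries to merge x and y. Succeeds only if the dashes are at the same
 * positions and the remaining bits differ in exactly one position. */
static bool try_merge(implicant x, implicant y, implicant *merged)
{
    if (x.dashes != y.dashes)
        return false;
    uint8_t diff = x.value ^ y.value;
    if (diff == 0 || (diff & (diff - 1)) != 0)  /* not exactly one bit */
        return false;
    merged->value  = (uint8_t)(x.value & ~diff); /* clear the merged bit */
    merged->dashes = (uint8_t)(x.dashes | diff); /* and mark it as a dash */
    return true;
}
```

Merging m2 (010) with m6 (110) yields value 010 with a dash on bit 2, i.e. -10, while 0-0 and 01- are rejected because their dashes are in different positions.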
Step 4: Prime Implicants
All minterms from the previous step that can’t be combined any further are called prime implicants. Entering them into a table lets us check how well they cover the original minterms determined by step 1.
If any prime implicant is the only one to cover a minterm, it’s called an essential prime implicant (marked with an asterisk). It’s essential because it must be included in the resulting minimal form, otherwise we’d miss one of the input value combinations.
Prime implicant m2,6* on the left for example is the only one that covers m2. m4,5* is the only one that covers m5. Not only is m4,6 not essential, but we actually don’t need it at all: m4 and m6 are already covered by the essential prime implicants. All prime implicants of fR(a,b,c) are essential, so we need all of them.
When bitslicing functions with many input variables, it may happen that you are left with a number of non-essential prime implicants that can be combined in various ways to cover the missing minterms. Petrick’s method helps find a minimum solution. It’s tedious to do manually, but not hard to automate.
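The coverage check from this step could be sketched as follows. This covers only the detection of essential prime implicants, not Petrick’s method; the implicant encoding (fixed bits plus a dash mask) and the names covers and is_essential are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t value;   /* fixed input bits, dashed positions are 0 */
    uint8_t dashes;  /* set bits mark irrelevant inputs */
} implicant;

/* True if the implicant covers the given minterm index. */
static bool covers(implicant p, uint8_t minterm)
{
    return (uint8_t)(minterm & ~p.dashes) == p.value;
}

/* True if primes[which] is the only prime implicant covering at least
 * one of the given minterms. */
static bool is_essential(const implicant *primes, size_t num_primes,
                         size_t which,
                         const uint8_t *minterms, size_t num_minterms)
{
    for (size_t m = 0; m < num_minterms; m++) {
        if (!covers(primes[which], minterms[m]))
            continue;
        bool covered_elsewhere = false;
        for (size_t p = 0; p < num_primes; p++) {
            if (p != which && covers(primes[p], minterms[m]))
                covered_elsewhere = true;
        }
        if (!covered_elsewhere)
            return true;
    }
    return false;
}
```

For fL, with prime implicants -10, 10- and 1-0, the first two are reported as essential and the third is not, matching the reasoning above.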
Step 5: Minimal Forms
Finally, we derive minimal forms of our Boolean functions by looking at the abc column of the essential prime implicants. Input variables marked with dashes are ignored.
$$f_L(a,b,c) = m_{2,6} + m_{4,5} = b\overline{c} + a\overline{b}$$
The code for SBOXL() with 8-bit inputs:
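A minimal sketch of that function, translated directly from the minimal form. It assumes uint8_t operands in which every bit position is an independent lane, so one call evaluates eight S-box inputs at once; the exact signature is an assumption.

```c
#include <stdint.h>

/* fL = (b AND NOT c) OR (a AND NOT b), evaluated on eight lanes at once. */
uint8_t SBOXL(uint8_t a, uint8_t b, uint8_t c)
{
    return (uint8_t)((b & ~c) | (a & ~b));
}
```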
fR(a,b,c), reduced to the combination of its three essential prime implicants:
$$f_R(a,b,c) = m_{0,2} + m_{2,3} + m_{2,6} = \overline{a}\,\overline{c} + \overline{a}b + b\overline{c}$$
SBOXR() as expected:
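A minimal sketch of the direct translation of those three terms, again assuming uint8_t operands with one independent lane per bit; the signature is an assumption.

```c
#include <stdint.h>

/* fR = (NOT a AND NOT c) OR (NOT a AND b) OR (b AND NOT c),
 * evaluated on eight lanes at once. */
uint8_t SBOXR(uint8_t a, uint8_t b, uint8_t c)
{
    return (uint8_t)((~a & ~c) | (~a & b) | (b & ~c));
}
```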
Combining SBOXL() and SBOXR() yields the familiar version of SBOX() after eliminating common subexpressions and taking out common factors.
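One way such a combined function might look, as a sketch: the term b AND NOT c appears in both minimal forms and is computed once, and NOT a is factored out of the two remaining terms of fR. The name sbox_bitsliced and its signature are hypothetical and not taken from the earlier posts.

```c
#include <stdint.h>

/* Computes both output bits on eight lanes at once.
 * *l receives fL, *r receives fR. */
void sbox_bitsliced(uint8_t a, uint8_t b, uint8_t c, uint8_t *l, uint8_t *r)
{
    uint8_t nc = (uint8_t)~c;
    uint8_t t0 = b & nc;                    /* shared term: b AND NOT c */
    *l = (uint8_t)((a & ~b) | t0);          /* fL */
    *r = (uint8_t)((~a & (b | nc)) | t0);   /* fR with NOT a factored out */
}
```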
Bitslicing a DES S-box
When I started writing this blog post I thought it would be nice to ditch the small S-box from the previous posts, and naively bitslice a “real” S-box, like the ones used in DES.
But these are just 6-to-4-bit S-boxes; how much more effort can it be? As it turns out, humans are terrible at understanding exponential growth. Here are my intermediate results after an hour of writing, trying to bitslice just one of the four output bits:
I gave up when I spotted a few mistakes that would likely lead to a non-minimal solution. Bitslicing a function with that many input variables manually is laborious and probably not worth it, except that it definitely helped me understand the steps of the algorithm better.
As mentioned in the beginning, Quine-McCluskey and Petrick’s method can be implemented in software rather easily, so that’s what I did instead. I’ll explain how, and what to consider, in the next post.