Thursday, 15 November 2012

Machinery Translation

I ported my Soma-cube-solving algorithm to C++, to see how the implementation would be affected by the language. This time, because there's no such thing as Literate C++, I've left the program listing to the very end. I tried to write idiomatic, readable C++. For example, I overloaded operators where that seemed mathematically plausible, rather than wherever possible. Similarly, I used STL machinery as I imagine its authors intended, and tried not to disappear down the pseudo-functional rabbit-hole suggested by STL functions like std::transform. I didn't try to second-guess compiler behaviour, letting functions return containers by value. And I avoided playing tricks involving the memory layout of data structures.

I ended up with a 388-line program (350 non-blank lines). That compares to about 150 lines in the Haskell version (about 105 non-blank lines) once the computation of statistics is included.

This is how the program performed:

$ g++ -O2 cube.cc -o cube
$ time ./cube > solutions-c.txt 
Possibilities (explored, kept) at each stage:
  (4,4)
  (256,130)
  (9360,2334)
  (168048,16625)
  (1596000,26403)
  (2534688,4080)
  (587520,240)
Total possibilities explored: 4895876
Total possibilities kept: 49816

real    0m0.032s
user    0m0.022s
sys 0m0.006s

The report of the work done is exactly the same as the one from the Haskell version, which is reassuring. (The solutions are also the same, though listed in a slightly different order.)

The C++ version is a little more than twice as fast: 0.03s instead of 0.07s.

What's not obvious from the source code or the numbers is how much I missed having any kind of interactive session while writing the C++ version. It's really useful, when writing code, to be able to evaluate a handful of expressions to see whether each function appears to do the right thing.

And, as promised, here's the source code:

#include <iostream>
#include <set>
#include <string>
#include <sstream>
#include <vector>

using namespace std;

enum Shape { L, S, T, R, P, Q, Y, kNumShapes };

const char* plan(Shape s) {
  switch (s) {
  case L: return
      "XXX"
      "X  ";
  case S: return
      " XX"
      "XX ";
  case T: return
      "XXX"
      " X ";
  case R: return
      "XX "
      "X  ";
  case P: return
      "X  "
      "X  "
      "   "
      "XX ";
  case Q: return
      " X "
      " X "
      "   "
      "XX ";
  case Y: return
      "XX "
      "X  "
      "   "
      "X  ";
  }
}

char name(Shape s) {
  switch (s) {
  case L: return 'L';
  case S: return 'S';
  case T: return 'T';
  case R: return 'R';
  case P: return 'P';
  case Q: return 'Q';
  case Y: return 'Y';
  }
}

const int N = 3;

struct Form { char a[N][N][N]; };

bool operator==(const Form& x, const Form& y) {
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
      for (int k = 0; k < N; ++k)
        if (x.a[i][j][k] != y.a[i][j][k]) return false;
  return true;
}

bool operator<(const Form& x, const Form& y) {
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
      for (int k = 0; k < N; ++k)
        if (x.a[i][j][k] < y.a[i][j][k]) return true;
        else if (x.a[i][j][k] > y.a[i][j][k]) return false;
  return false;
}

ostream& operator<<(ostream& o, const Form& f) {
  for (int j = 0; j < N; ++j) {
    if (j > 0) o << endl;
    for (int i = 0; i < N; ++i){
      if (i > 0) o << "  ";
      o << '|';
      for (int k = 0; k < N; ++k)
        o << f.a[i][j][k];
      o << '|';
    }
  }
  return o;
}

Form empty() {
  Form f;
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
      for (int k = 0; k < N; ++k)
        f.a[i][j][k] = ' ';
  return f;
}

Form operator+(const Form& x, const Form& y) {
  Form z;
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
      for (int k = 0; k < N; ++k) {
        char xa = x.a[i][j][k];
        char ya = y.a[i][j][k];
        char& za = z.a[i][j][k];
        if (xa == ' ' && ya == ' ') za = ' ';
        else if (ya == ' ') za = xa;
        else if (xa == ' ') za = ya;
        else za = '#';
      }
  return z;
}

string flatten(const Form& f) {
  ostringstream o;
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
      for (int k = 0; k < N; ++k)
        o << f.a[i][j][k];
  return o.str();
}

int as_bits(const Form& f) {
  int bits = 0;
  int n = 0;
  string s = flatten(f);
  for (string::const_iterator it = s.begin();
       it != s.end();
       ++it, ++n)
    if (*it != ' ')
      bits |= 1 << n;
  return bits;
}

int size(const Form& f) {
  int n = 0;
  string s = flatten(f);
  for (string::const_iterator it = s.begin();
       it != s.end();
       ++it)
    if (*it != ' ')
      ++n;
  return n;
}

Form default_form(Shape s) {
  const char* p = plan(s);
  Form f;
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
      for (int k = 0; k < N; ++k) {
        char& fa = f.a[i][j][k];
        switch(*p) {
        case 0: fa = ' '; break;
        case ' ': fa = ' '; ++p; break;
        default: fa = name(s); ++p;
        }
      }
  return f;
}

Form tx(const Form& f) {
  Form g;
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
      for (int k = 0; k < N; ++k)
        g.a[i][j][k] = (k == 0) ? ' ' : f.a[i][j][k-1];
  return g;
}

Form ty(const Form& f) {
  Form g;
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
      for (int k = 0; k < N; ++k)
        g.a[i][j][k] = (j == 0) ? ' ' : f.a[i][j-1][k];
  return g;
}

Form tz(const Form& f) {
  Form g;
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
      for (int k = 0; k < N; ++k)
        g.a[i][j][k] = (i == 0) ? ' ' : f.a[i-1][j][k];
  return g;
}

Form rx(const Form& f) {
  Form g;
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
      for (int k = 0; k < N; ++k)
        g.a[i][j][k] = f.a[j][N-1-i][k];
  return g;
}

Form ry(const Form& f) {
  Form g;
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
      for (int k = 0; k < N; ++k)
        g.a[i][j][k] = f.a[k][j][N-1-i];
  return g;
}

Form rz(const Form& f) {
  Form g;
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
      for (int k = 0; k < N; ++k)
        g.a[i][j][k] = f.a[i][N-1-k][j];
  return g;
}

vector<Form> majors(const Form& f) {
  vector<Form> result;
  result.push_back(f);
  result.push_back(ry(f));
  result.push_back(ry(ry(f)));
  result.push_back(ry(ry(ry(f))));
  result.push_back(rz(f));
  result.push_back(rz(rz(rz(f))));
  return result;
}

vector<Form> minors(const Form& f) {
  vector<Form> result;
  result.push_back(f);
  result.push_back(rx(f));
  result.push_back(rx(rx(f)));
  result.push_back(rx(rx(rx((f)))));
  return result;
}

bool in_bounds(Shape s, const Form& f) {
  return size(default_form(s)) == size(f);
}

vector<Form> shifts(Shape s) {
  vector<Form> result;
  Form fi = default_form(s);
  for (int i = 0; i < N; ++i) {
    Form fj = fi;
    for (int j = 0; j < N; ++j) {
      Form fk = fj;
      for (int k = 0; k < N; ++k) {
        if (in_bounds(s, fk))
          result.push_back(fk);
        fk = tz(fk);
      }
      fj = ty(fj);
    }
    fi = tx(fi);
  }
  return result;
}

vector<Form> all_forms(Shape s) {
  vector<Form> result;
  if (s == L) {
    Form f = default_form(s);
    result.push_back(f);
    result.push_back(ty(f));
    result.push_back(tz(f));
    result.push_back(ty(tz(f)));
  } else {
    set<Form> distinct_forms;
    vector<Form> f_sh = shifts(s);
    for (vector<Form>::const_iterator it_sh = f_sh.begin();
         it_sh != f_sh.end();
         ++it_sh) {
      vector<Form> f_min = minors(*it_sh);
      for (vector<Form>::const_iterator it_min = f_min.begin();
           it_min != f_min.end();
           ++it_min) {
        vector<Form> f_maj = majors(*it_min);
        for (vector<Form>::const_iterator it_maj = f_maj.begin();
             it_maj != f_maj.end();
             ++it_maj) {
          if (distinct_forms.find(*it_maj) == distinct_forms.end()) {
            result.push_back(*it_maj);
            distinct_forms.insert(*it_maj);
          }
        }
      }
    }
  }
  return result;
}

typedef pair<Form, int> FormBits;

vector<vector<FormBits> > combos() {
  vector<vector<FormBits> > result;
  for (int s = 0; s < kNumShapes; ++s) {
    result.push_back(vector<FormBits>());
    vector<FormBits>& forms_and_bits = result.back();
    vector<Form> forms = all_forms(static_cast<Shape>(s));
    for (vector<Form>::const_iterator it = forms.begin();
         it != forms.end();
         ++it) {
      forms_and_bits.push_back(FormBits(*it, as_bits(*it)));
    }
  }
  return result;
}

struct Stats {
  Stats() : tried(0), kept(0) {}
  int tried;
  int kept;
};

Stats operator+(const Stats& x, const Stats& y) {
  Stats z;
  z.tried = x.tried + y.tried;
  z.kept = x.kept + y.kept;
  return z;
}

template<typename C>
struct CompareBySize {
  bool operator()(const C& x, const C& y) const {
    return x.size() < y.size();
  }
};

pair<vector<Form>, vector<Stats> > solutions() {
  vector<vector<FormBits> > cs = combos();

  sort(cs.begin(), cs.end(), CompareBySize<vector<FormBits> >());

  vector<FormBits> partials;
  vector<Stats> stats;
  partials.push_back(FormBits(empty(), 0));

  for (vector<vector<FormBits> >::const_iterator fbs = cs.begin();
       fbs != cs.end();
       ++fbs) {
    vector<FormBits> next_partials;
    stats.push_back(Stats());
    Stats& partial_stats = stats.back();
    for (vector<FormBits>::const_iterator p = partials.begin();
         p != partials.end();
         ++p)
      for (vector<FormBits>::const_iterator fb = fbs->begin();
           fb != fbs->end();
           ++fb) {
        ++partial_stats.tried;
        if ((p->second & fb->second) == 0) {
          next_partials.push_back(FormBits(p->first + fb->first,
                                           p->second | fb->second));
          ++partial_stats.kept;
        }
      }
    partials.swap(next_partials);
  }

  pair<vector<Form>, vector<Stats> > result;
  for (vector<FormBits>::const_iterator p = partials.begin();
       p != partials.end();
       ++p)
    result.first.push_back(p->first);
  result.second.swap(stats);
  return result;
}

int main(int argc, char* argv[]) {
  pair<vector<Form>, vector<Stats> > sols = solutions();
  for (vector<Form>::const_iterator it = sols.first.begin();
       it != sols.first.end();
       ++it)
    cout << *it << endl << endl;

  Stats totals;
  cerr << "Possibilities (explored, kept) at each stage:" << endl;
  for (vector<Stats>::const_iterator it = sols.second.begin();
       it != sols.second.end();
       ++it) {
    totals = totals + *it;
    cerr << "  (" << it->tried << ", " << it->kept << ')' << endl;
  }
  cerr << "Total possibilities explored: " << totals.tried << endl;
  cerr << "Total possibilities kept: " << totals.kept << endl;
  return 0;
}

Tuesday, 13 November 2012

A Better Sort of Input

In a previous post I developed a backtracking algorithm for finding solutions of a Soma cube. It was good to discover that an optimized version of the program would find all the solutions in only 1.2s.

Later, I began to wonder how much of the speed was down to being able to prune the solution space effectively, versus simply being able to check each candidate solution very quickly. A revised version of the program emits statistics to help find out.

> module CountingCube where
> import Cube hiding (main)
> import Data.Bits
> import Data.Monoid (mappend, mempty)
> import Data.List (intercalate, sortBy)
> import Data.Ord (comparing)
> import Data.Word
> import System.IO

In this version, the computation returns not just a list of solutions, but also a list of details of how much work was done for each additional shape folded into the partial solution. (The work list has to be reversed once it is complete because it is built up backwards.)

> countedSolutions = (map fst pbs, reverse work) where
>     (pbs, work) = foldr addForm ([(mempty, 0)], []) combos

For each shape, all the forms of that shape have to be tried against all the partial solutions built so far, so the number of candidates tried will be the product of those two lists' lengths. The number of candidates kept is just the number of partial solutions resulting from this step of the fold.

> addForm form (partialSolutions, work) = (partialSolutions', work') where
>     partialSolutions' = [(ps `mappend` f, psb .|. fb) |
>                          (f, fb) <- form,
>                          (ps, psb) <- partialSolutions,
>                          psb .&. fb == 0]
>     work' = (tried, kept) : work
>     tried = length form * length partialSolutions
>     kept = length partialSolutions'

As before, the program prints out all the solutions, separated by newlines. But this time it also describes how much work was done.

> printSolutions (solutions, work) = do
>   putStrLn $ intercalate "\n" [show s | s <- solutions]
>   hPutStrLn stderr $ unlines $ [
>                  "Possibilities (explored, kept) at each stage:"
>                 ] ++ map (("  " ++) . show) work ++ [
>                  "Total possibilities explored: " ++ show (sum (map fst work)),
>                  "Total possibilities kept: " ++ show (sum (map snd work))
>                  ]

> main = printSolutions countedSolutions

How did that turn out?

$ ghc --make -main-is CountingCube -O2 CountingCube.lhs
[2 of 2] Compiling CountingCube     ( CountingCube.lhs, CountingCube.o )
Linking CountingCube ...
$ time ./CountingCube > solutions.txt 
Possibilities (explored, kept) at each stage:
  (64,64)
  (6144,2544)
  (244224,38448)
  (5536512,805584)
  (58002048,1817232)
  (130840704,552384)
  (2209536,240)
Total possibilities explored: 196839232
Total possibilities kept: 3216496


real    0m3.594s
user    0m3.374s
sys 0m0.212s

Because the fold is right-associative, it starts with the last shape in the list: Y. As can be seen, the Y was tried in all 64 positions in which it could appear, and was found to fit in all of them - just as expected, given that no shapes had been added ahead of it. The next shape to be tried fitted in the position tried 2544 out of 6144 times - less than half of the time.

Overall, the number of moves attempted was 196,839,232. The entire solution space (up to rotations and reflections) has 268,435,456 possibilities, but each of seven shapes has to be placed for each one of those, so a completely stupid non-backtracking implementation would try 1,879,048,192 moves. So pruning of the solution space like this avoids about 90% of the work.

But it gets better. While writing a version of the same algorithm in C++ (of which more later) I noticed that the total work done was different, even though the number of solutions ended up being the same, simply because the shapes were being considered in a different order. Of course ... so it would be better to try to place the shapes having fewer forms first, because they constrain the solution space more so there should be less backtracking to do.

Note that smarterSolutions here is identical to countedSolutions above, except that it sorts the input first.

> smarterSolutions = (map fst pbs, reverse work) where
>     (pbs, work) = foldr addForm
>                   ([(mempty, 0)], [])
>                   (reverse $ sortBy (comparing length) combos)

> main' = printSolutions smarterSolutions

GHC --make unfortunately doesn't seem to tell the difference between programs built with different -main-is settings, so the previous run's outputs need to be deleted first.

$ rm CountingCube CountingCube.o CountingCube.hi
$ ghc --make -main-is "CountingCube.main'" -O2 CountingCube.lhs
[2 of 2] Compiling CountingCube     ( CountingCube.lhs, CountingCube.o )
Linking CountingCube ...
$ time ./CountingCube > solutions.txt 
Possibilities (explored, kept) at each stage:
  (4,4)
  (256,130)
  (9360,2334)
  (168048,16625)
  (1596000,26403)
  (2534688,4080)
  (587520,240)
Total possibilities explored: 4895876
Total possibilities kept: 49816


real    0m0.070s
user    0m0.061s
sys 0m0.007s

That really was a valuable optimisation! Instead of trying nearly 200,000,000 moves, this version explores only 4,895,876 - it is over 40 times more efficient. And the solutions are now all found in under a tenth of a second. I am embarrassed that I didn't think of this sooner.

Thursday, 8 November 2012

When Simpler Isn't Better

In a previous post I investigated the number of ways of solving a Soma cube by writing a Haskell program to find out for me. I was pleased to find that an optimised version of the program took about 1.2 seconds to find 240 solutions.

In that program, I represented each partial solution and each position of a shape as a 27-bit number, allowing simple bitwise operations to combine forms and filter out the invalid ones. Was that a premature optimisation? It would definitely have been faster in terms of programming time not to write all the conversion to bit patterns, and to operate directly on the representation of forms as lists of characters. But would the resulting program have run as fast?

A different version of just a few functions will show whether the extra work was necessary.

> module DumbCube where
> import Cube hiding (main)
> import Data.Monoid (mappend)
> import Data.List (intercalate)

For each shape, all the forms are precalculated, but not the bit representations that Cube.combos computes.

> dumbCombos = [allForms shape | shape <- enumFrom L]

Likewise, the fold to enumerate solutions combines form representations directly, instead of working on any bit representations.

> dumbSolutions = foldr1 addForm dumbCombos where
>   addForm partialSolution form = [cf |
>                                   cf <- [(pf `mappend` f) |
>                                          f <- form,
>                                          pf <- partialSolution],
>                                   not ('#' `elem` flatten cf)]

Finally, as before, print out all the solutions, separated by newlines.

> main = putStrLn $ intercalate "\n" [show s | s <- dumbSolutions]

How fast is the result now?

$ ghc --make -main-is DumbCube -O2 DumbCube.lhs
[1 of 2] Compiling Cube             ( Cube.lhs, Cube.o )
[2 of 2] Compiling DumbCube         ( DumbCube.lhs, DumbCube.o )
Linking DumbCube ...
$ time ./DumbCube > solutions.txt 

real    4m0.143s
user    3m58.932s
sys 0m1.072s

This slightly simpler program produces the same output but takes about 240 seconds - almost exactly one second per solution, on average. That also means it's almost exactly 200 times slower than the original program with its bitwise representation. That conversion to patterns of 27 bits obviously was worthwhile!