Solving a gamebook: a smarter solver

This post is part of the "Gamebook solver" series.

In the previous episode, a "simple" solver was presented, along with ideas to make it go much faster (namely memoization and parallelism). It is however possible to do much better, with only a little bit of work.

A simple observation

In order to compare two possible choices, the previously presented algorithm scores them and picks the one with the highest score. The score of a choice is the sum of the scores of all possible sub-solutions (corresponding to all possible outcomes of that choice), weighted by their probabilities.
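As a minimal sketch of this scoring scheme (using a hypothetical State type in place of the real game state, not the solver's actual code):

data State = Done Rational                 -- the game ended with this score
           | Choices [[(Rational, State)]] -- each choice has weighted outcomes

-- Naive scoring: a choice is worth the probability-weighted sum of
-- the scores of its outcomes, and the player picks the best choice.
score :: State -> Rational
score (Done s)     = s
score (Choices cs) = maximum [ sum [ p * score st | (p, st) <- c ] | c <- cs ]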

This means that all possible states and all possible configurations are checked. Memoization saves us from checking the same state twice, but it might be possible to do better!

In particular, it is often obvious to a human player that some choices are better than others:

  • It is sometimes possible to avoid a fight.
  • Some choices lead to certain death.
  • Some choices lead to free items.

I am currently thinking about a general solution to this sort of apparently "simple" decision, but there is a simple observation that leads to a good speedup with minimal complexity: many random events only occur with a fairly small probability. Here is an example with the first fight, using some default starting values for our character:

Remaining HP   Probability   Cumulative probability
20             12.80%         12.80%
19             10.97%         23.77%
21              9.60%         33.37%
22              7.70%         41.07%
18              7.03%         48.10%
15              6.21%         54.31%
25              6.20%         60.51%
14              6.05%         66.56%
13              5.34%         71.90%
16              5.24%         77.14%
23              5.10%         82.24%
17              4.73%         86.97%
12              3.74%         90.71%
24              3.60%         94.31%
11              1.55%         95.87%
 8              0.82%         96.69%
 9              0.76%         97.45%
 7              0.72%         98.17%
10              0.72%         98.88%
 6              0.55%         99.43%
 5              0.33%         99.76%
 4              0.12%         99.88%
 0              0.05%         99.93%
 3              0.03%         99.96%
 1              0.02%         99.98%
 2              0.02%        100.00%

There are 26 possible outcomes, so 26 branches spreading from this node, each of which branches further. But exploring just 6 of these branches (fewer than a quarter) accounts for 54% of the total score of this node. Exploring half of the outcomes accounts for 87% of it!

Lazy evaluation to the rescue

The idea, when comparing multiple choices, is to explore only as many outcomes as necessary to find the best one. It would be possible to do that explicitly, but I decided to use lazy evaluation to keep the algorithm a bit simpler.

Let's start with the following data structure:

data PScore = Approximate { _proba  :: !Proba    -- probability mass evaluated so far
                          , _pscore :: !Rational -- score accumulated over that mass
                          , _next   ::  PScore } -- lazily evaluated refinement
            | Certain !Rational                  -- final, fully evaluated score
            deriving (Show, Eq, Generic)

It is a recursive data structure, a bit like a list, except that the last constructor always contains a value. Representing the expected value of a six-sided die roll, once completely forced, it looks like this:

Approximate { _proba = 1 % 6, _pscore = 1 % 6, _next =
  Approximate {_proba = 1 % 3, _pscore = 1 % 2, _next =
    Approximate {_proba = 1 % 2, _pscore = 1 % 1, _next =
      Approximate {_proba = 2 % 3, _pscore = 5 % 3, _next =
        Approximate {_proba = 5 % 6, _pscore = 5 % 2, _next =
          Certain (7 % 2)
}}}}}

As can be seen, the _proba field represents how certain the _pscore field is. As the structure is traversed, the _proba field value increases, until the Certain constructor is reached (meaning the probability is 1).
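For illustration, here is one way such a chain could be built lazily from a list of weighted outcomes. This is a sketch of my own (the hypothetical mkPScore is not necessarily how the solver builds its chains, and it assumes type Proba = Rational):

{-# LANGUAGE BangPatterns #-}

-- Build a PScore chain from probability/score pairs. Since the _next
-- field is lazy, each outcome's score is only computed when the
-- corresponding link is forced.
mkPScore :: [(Proba, Rational)] -> PScore
mkPScore = go 0 0
  where
    go _ acc [] = Certain acc
    go !pacc !acc ((p, s) : rest)
      | null rest = Certain acc'
      | otherwise = Approximate pacc' acc' (go pacc' acc' rest)
      where
        pacc' = pacc + p
        acc'  = acc + p * s

Applied to the six outcomes of a die, each with probability 1 % 6, this reproduces exactly the chain shown above.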

Now, if a maximum possible score is known, it is possible to compare solutions this way:

scoreCompare :: Rational -> PScore -> PScore -> Ordering
-- Both scores are fully evaluated: compare them directly.
scoreCompare _ (Certain sa) (Certain sb) = compare sa sb
-- One side is certain: decide early when possible, otherwise force
-- one more link of the approximate side.
scoreCompare smax na@(Certain sa) (Approximate pb sb nb)
  | sa < sb = LT
  | sa > sb / pb = GT
  | otherwise = scoreCompare smax na nb
scoreCompare smax (Approximate pa sa na) nb@(Certain sb)
  | sb < sa = GT
  | sb > sa / pa = LT
  | otherwise = scoreCompare smax na nb
-- Both sides are approximate: decide as soon as one side's
-- extrapolated score beats the other side's best achievable score,
-- otherwise force a link on the side that is further from certainty.
scoreCompare smax ca@(Approximate pa sa na) cb@(Approximate pb sb nb)
  | sa / pa > sb + (1 - pb) * smax = GT
  | sb / pb > sa + (1 - pa) * smax = LT
  | pa > pb   = scoreCompare smax ca nb
  | otherwise = scoreCompare smax na cb

If constructed properly, forcing the next link in the PScore chain should only force the evaluation of the corresponding outcome. This means that when comparing two PScores, only the minimum number of underlying outcomes is evaluated.
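As a quick illustration, reusing the hypothetical mkPScore from above (with (%) from Data.Ratio in scope): comparing a guaranteed score of 6 against a six-sided die roll only needs to force the very first link of the chain.

-- A six-sided die roll as a lazy PScore chain:
die :: PScore
die = mkPScore [ (1 % 6, fromIntegral f) | f <- [1 .. 6 :: Integer] ]

-- After forcing a single Approximate link, the guard sa > sb / pb
-- holds (6 > (1 % 6) / (1 % 6) = 1), so scoreCompare returns GT
-- without evaluating the five remaining outcomes.
best :: Ordering
best = scoreCompare 6 (Certain 6) die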

But does it make a difference?

I have to admit I ran a few tests really hoping this would help. During my previous attempts, it only gave marginal speedups (around 5%). But this time, with the work that has been put into optimizing the game state representation, the speedups are much more substantial.

When computing the probability of reaching chapter 150, the "simple" solver took 4 hours and 26 minutes, whereas with this optimization it only took 47 minutes! That is a huge speedup, and I must say I am quite proud of it.

What's next?

Now that this optimization is done, and working even better than I expected, I am entering terra incognita. I only have a few optimization ideas left, and the program is still far from efficient enough to solve the whole book in a single run.

During previous attempts, I obtained an approximation of the solution by combining manual patching of the chapters' descriptions (to force some choices I knew to be better) with a "checkpoint" system. Gamebooks have natural "choke points" that the player has to go through. My idea was to compute partial solutions between each pair of consecutive choke points. While this makes the problem computationally easy to solve, it is also very approximate: some choices only have consequences after the next choke point, so doing this blindly is not sound.

This time, I will try to automatically simplify the chapters' structure by removing branches that offer no advantage over others. I am still unclear on how this should be done, but hopefully it will turn out well!