### Belief Optimization

*This post is part of an ongoing dialog between my friend Ernie and me about the validity of Christian belief (roughly speaking).*

Ernie had asked me to respond to his description of beliefs as a complex vector field. If you have math anxiety, you might want to stop reading now.

Ernie wrote:

> Perhaps even more importantly, I interpret this as more like a complex vector field than a series of Booleans. That is, each Belief is not a simple, independent True/False decision, but a complex system with varying degrees of precision, confidence, applicability, and inter-relatedness. The value of logic is that it uses analytic propositions to link two synthetic Beliefs in a discrete chain, so that one testable or accepted Belief can be used to validate another, thus untangling the web into a series of more-or-less distinct strands.

My own view, or mental model really, is similar, but not quite the same. The way I think about this (when I think about it in this way at all) is more like a series of belief values, where each value ranges from 0 to 1, expressing the strength of my belief about (or confidence in) an individual statement. (Or, perhaps from -1 to 1, with -1 being certainty that it is false and 1 being certainty that it is true.) The varying degrees of precision and applicability would be reflected in the individual statements themselves, not in the values associated with those statements. Likewise, the interrelatedness of beliefs would be reflected in relations external to the belief values themselves.

Some statements may contradict other statements. If so, I cannot assign them both high belief values. If I *do* have high belief values in contradictory statements, I have to lower some or all of them. Of course, each belief statement is related to many others, so I cannot just lower one or two in isolation. If I lower my confidence *B_{i}* in statement *S_{i}*, but *S_{i}* is strongly implied by *S_{j}* and my corresponding *B_{j}* is high, then I have a new kind of contradiction. Conceptually, I can calculate the total contradiction score *C(B_{1}, ..., B_{N})* that exists among my beliefs. So my goal is to find the set of values *B_{1}, ..., B_{N}* that minimizes *C*.^{1}
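To make this concrete, here is a toy sketch of what a contradiction score might look like. The particular scoring rules (and all the names) are my own illustration, not anything rigorous:

```python
def contradiction_score(beliefs, contradicts, implies):
    """A toy contradiction score C(B_1, ..., B_N).

    beliefs: list of belief values in [0, 1].
    contradicts: pairs (i, j) of mutually contradictory statements.
    implies: pairs (i, j) meaning statement i strongly implies statement j.
    """
    c = 0.0
    # Believing both of two contradictory statements is penalized.
    for i, j in contradicts:
        c += beliefs[i] * beliefs[j]
    # Believing a premise while doubting its consequence is also penalized.
    for i, j in implies:
        c += max(0.0, beliefs[i] - beliefs[j])
    return c

# Statement 0 contradicts statement 1, and statement 0 implies statement 2.
score = contradiction_score([0.9, 0.8, 0.2], [(0, 1)], [(0, 2)])
```

Minimizing *C* would then mean searching over all the belief values at once, which is where the rest of this post is headed.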

Now, there are necessarily a very large number of related statements. That is, *N* is large. So I am searching an N-dimensional space for that point with the absolute minimum value of *C(B_{1}, ..., B_{N})*. This is conceptually just like many other searching problems that are encountered in various applications.

For instance, the software that I work on for my job does automated employee scheduling. For each employee there are a number of rules that constrain what they can do and when, and there are also varying staffing needs. This can be viewed as a discrete searching problem, where each possible assignment to each possible staff person is a different variable that can be assigned either 0 or 1, corresponding to that assignment being made or not. The goal is to meet all the staffing needs without breaking any of the rules. Each rule broken is assigned a penalty, so we are trying to minimize the penalty. For a typical schedule, the search space can easily have 10,000 binary variables, so that there are 2^{10,000} or about 10^{3,500} possible solutions. This is a vast number. There are fewer than 10^{100} particles in the visible universe. As it happens, though, this search space can be "pruned" extensively, and reasonable (though not necessarily optimal) solutions can be found in as little as a couple of minutes, depending on the number of rules. (And, in actuality, I do not model the problem as a set of binary variables, but that is not important here.)
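As a toy illustration of that 0/1 modeling (nothing like the real product's model; the employees, rules, and penalty weights here are made up), consider two employees and two shifts, small enough to search by brute force:

```python
from itertools import product

EMPLOYEES, SHIFTS = 2, 2
NEEDED = {0: 1, 1: 1}  # each shift needs exactly one person

def penalty(assign):
    """assign[(e, s)] is 1 if employee e works shift s, else 0."""
    p = 0
    # Staffing-need rules: penalize over- or under-staffing each shift.
    for s in range(SHIFTS):
        staffed = sum(assign[(e, s)] for e in range(EMPLOYEES))
        p += abs(staffed - NEEDED[s]) * 10
    # A work rule: nobody is allowed to work both shifts.
    for e in range(EMPLOYEES):
        if all(assign[(e, s)] for s in range(SHIFTS)):
            p += 5
    return p

# Brute force over all 2^4 assignments is fine here; a real schedule
# with ~10,000 binary variables needs pruning instead.
best = min(
    (dict(zip(product(range(EMPLOYEES), range(SHIFTS)), bits))
     for bits in product((0, 1), repeat=EMPLOYEES * SHIFTS)),
    key=penalty,
)
print(penalty(best))  # prints 0: a rule-respecting schedule exists here
```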

Anyway, I bring that up because for me, thinking about beliefs as an optimization problem allows me to apply what I know about optimization to beliefs. For instance, a simple approach to solving a **maximization** problem is called hill-climbing. (For a minimization problem, you just invert everything.) In hill-climbing, you pick a point at random from the entire search space and evaluate your objective function (the function that you are trying to maximize). Then you pick one dimension and move along that dimension as long as the value of the objective function increases. (If the objective function is differentiable, this is an exercise in calculus; otherwise, you have to sample various points and hope that the objective function is not too "wild".) Once you find the maximum along one dimension, you switch dimensions and do it again. You keep cycling through all the dimensions until you can no longer find an improvement along any dimension. In two dimensions, you can think about a landscape. First you walk north or south until you are as high as you can get. Then you walk east or west until you are as high as you can get, then north or south, etc. Eventually, you will not be able to go any higher in either direction.
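That dimension-by-dimension walk can be sketched in a few lines. This version uses sampling with a fixed step rather than calculus, and takes its starting point as an argument; the single-hill objective at the end is invented purely for illustration:

```python
def hill_climb(objective, point, step=0.1, max_cycles=100):
    """Maximize objective by sweeping one dimension at a time."""
    point = list(point)
    for _ in range(max_cycles):
        improved = False
        for d in range(len(point)):            # pick one dimension
            for direction in (step, -step):    # try walking each way
                while True:                    # keep going while it helps
                    candidate = list(point)
                    candidate[d] += direction
                    if objective(candidate) > objective(point):
                        point = candidate
                        improved = True
                    else:
                        break
        if not improved:                       # no dimension improves: a (local) maximum
            return point
    return point

# A single smooth hill peaked at (2, -1): hill-climbing finds the top.
peak = hill_climb(lambda p: -(p[0] - 2) ** 2 - (p[1] + 1) ** 2, [0.0, 0.0])
```

With one big hill this converges to the peak; the next paragraph is about what goes wrong when the landscape is bumpier.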

Hill-climbing is very simple, and for certain kinds of objective functions it works splendidly. If there is a single big hill on your landscape, then hill-climbing will find the top. But if there are bumps, mounds and small hills in addition to the largest hill, hill-climbing can get stuck on one of these "local maxima". That's bad. Other searching techniques use a variety of mechanisms to avoid getting stuck. For instance, you might take steps of a fixed size in each direction. If the step takes you higher, you always take it. If it takes you lower, you accept it only with probability *P*. As time goes on, you lower *P*. Early on, you avoid getting stuck on local maxima because the higher initial value for *P* allows you to "escape". Later, as *P* gets lower, escaping becomes less likely and the search (hopefully) converges on a better solution. This kind of approach is called simulated annealing. There are many others as well.
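Here is a bare-bones sketch of that accept-worse-moves-with-probability-*P* idea. The bumpy one-dimensional landscape is invented for illustration, and real annealing schedules are tuned far more carefully than this:

```python
import math
import random

def simulated_annealing(objective, point, step=1.0, p_start=0.8,
                        cooling=0.99, iters=5000):
    """Maximize objective; downhill moves are accepted with a shrinking probability p."""
    random.seed(0)                             # fixed seed so the sketch is repeatable
    point = list(point)
    best = list(point)
    p = p_start
    for _ in range(iters):
        candidate = [x + random.uniform(-step, step) for x in point]
        # Uphill moves are always taken; downhill moves only with probability p.
        if objective(candidate) > objective(point) or random.random() < p:
            point = candidate
        if objective(point) > objective(best):  # remember the best point visited
            best = list(point)
        p *= cooling                            # lower p as time goes on
    return best

# A big hill at x = 0 plus ripples that could trap plain hill-climbing.
def bumpy(pt):
    return -pt[0] ** 2 + math.cos(5 * pt[0])

best = simulated_annealing(bumpy, [3.0])
```

Early on, the high *p* lets the search wander across the ripples; as *p* decays, it settles onto the highest hill it has found.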

Bringing this back to the whole vector-of-beliefs idea, where we are trying to minimize *C(B_{1}, ..., B_{N})*, a local minimum is a place where any small adjustment to any *B_{i}* causes *C* to increase (meaning that my beliefs become more contradictory). But that does not mean that I am at the global minimum. A sufficiently large adjustment to my belief vector may allow me to escape a local minimum and reach another one that is lower. Probably nobody is actually at even a local minimum, since even evaluating the objective function is beyond us. But we are not all in even the same valley. Loosely speaking, Christianity might be one valley, Islam another, atheism another, and so on. Each valley has smaller gullies, and various Christian denominations might be found at the bottom of different gullies within the larger Christian valley. Some people may be far away from even local minima, perhaps even living near the peaks. We might call such people crazy, or at least irrational, since this means that their own beliefs are highly contradictory.

Ernie thinks his valley is lower than mine. I think mine is lower than his. We are trying to agree on something we can use as an altimeter. But I think the geek-meter just went off the chart.

^{1} Actually, the objective function would probably have some terms related to the total amount of certainty. If I use values from -1 to 1, then my objective function would include, for all *i*, *B_{i}^{2}*.
