16 Jul 2020 - The Dark Side

LAHC: A Must For Your Software Engineering Toolbox

tl;dr Late Acceptance Hill Climbing, LAHC, is such a simple and efficient optimization algorithm that I think it belongs in any software person's toolbox.

I have always been fond of metaheuristic optimization algorithms. Evolutionary algorithms and the like fascinate me, simply because of biomimetism, and also, a bit, because they work. They can apply to almost every optimization problem. The caveat is that even though you can apply them everywhere, these approaches give approximate solutions and often take quite some time to run. So beware about using the right tool for the job.

Your scientists were so preoccupied with whether or not they could, they didn't
stop to think if they should

A few years ago, I wrote Gimuby, a fairly complete Ruby library to experiment with genetic algorithms and a variant called the island model. I kept a taste for that kind of thing.

The first Covid-19 lockdown gave me the opportunity to try some live coding, so if you wish to see 3h of live programming around LAHC, be my guest. If you need a library for Ruby, the product of this session, Lahcuby, is available.


The algorithm was introduced by Burke and Bykov, 2016. I did not run any benchmark, but the paper claims state-of-the-art performance compared to similar methods. This is not what I find the most interesting here, even though nobody would have mentioned the algorithm without it.

If you consider a random problem and a random metaheuristic algorithm to solve it, there are many parameters to consider: first how you represent the problem and its potential solutions, but also how you set the different parameters of your algorithm, and those two things interact with one another. For example, the way you represent an itinerary of a traveling salesman, how you mix two itineraries into a third one, how you select which itineraries to keep, and how, and how often, you randomly change an itinerary. All those things have an impact on one another when you are trying to get the best possible result. Meaning that fine tuning your optimization algorithm is an optimization problem by itself.
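To make the traveling salesman example concrete, here is a minimal sketch of two of those choices - a representation with its cost function, and one possible way to randomly change an itinerary. The function names and the 2-opt move are my own illustration, not from the article:

```python
import random

# Hypothetical representation: a tour is a list of city indices, and its cost
# is the total distance of visiting them in order, returning to the start.
def tour_cost(tour, distances):
    return sum(distances[tour[i]][tour[(i + 1) % len(tour)]]
               for i in range(len(tour)))

# One way to "randomly change an itinerary": reverse a random segment of the
# tour (the classic 2-opt move). Other choices would interact differently
# with the rest of the algorithm's parameters.
def two_opt_variation(tour):
    i, j = sorted(random.sample(range(len(tour)), 2))
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
```

Swapping `two_opt_variation` for another move operator is exactly the kind of decision that feeds back into every other hyperparameter.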

Your next T-shirt should say:

I used metaheuristics on my optimization problem, and all I got was this lousy hyperparameter optimization problem

The cool thing with LAHC is its simplicity: there is a single parameter. Your mileage may not vary, and you may find consistency in your results. I do not believe in silver bullets, but this might be as close as you can get.

How it works

Let's start with a simple gradient descent algorithm:

solution = get_initial_solution()
cost = get_cost(solution)

iteration_count = 0
max_iteration_count = 1000
while iteration_count < max_iteration_count:
    iteration_count += 1

    alternative_solution = get_variation(solution)
    alternative_cost = get_cost(alternative_solution)

    if alternative_cost < cost:
        # We have found a better solution
        solution = alternative_solution
        cost = alternative_cost
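To see the loop above run end to end, here is a possible instantiation on a toy problem of my own choosing - minimizing x² over the integers. The helper names match the pseudocode; their bodies are illustrative assumptions:

```python
import random

# Hypothetical toy problem: minimize x squared over the integers.
def get_initial_solution():
    return random.randint(-100, 100)

def get_cost(solution):
    return solution ** 2

def get_variation(solution):
    return solution + random.choice([-1, 1])  # take a small random step

solution = get_initial_solution()
cost = get_cost(solution)

iteration_count = 0
max_iteration_count = 1000
while iteration_count < max_iteration_count:
    iteration_count += 1

    alternative_solution = get_variation(solution)
    alternative_cost = get_cost(alternative_solution)

    if alternative_cost < cost:
        # We have found a better solution
        solution = alternative_solution
        cost = alternative_cost
```

On this convex toy problem plain hill climbing is enough; the point of LAHC is to keep roughly this structure while escaping local minima on harder landscapes.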

I am starting with this because LAHC is very close to gradient descent, but it should not get stuck in local minima. Actually, "gradient descent" is just another way to say "hill climbing".

Here are the two differences:

  1. Instead of keeping only a solution, the algorithm keeps a solution plus a history of the previously encountered costs; we will call it the Memory.
  2. Instead of accepting an alternative solution only if it is better than the current one, the alternative solution is also accepted if it is better than the cost at the current Memory Index. If such a case occurs, the alternative solution's cost gets stored in the Memory.

All the rest is housekeeping:

  • Initializing the Memory: the authors set it to the initial cost. My personal view differs; it could be set with random solutions' costs to lessen the risk of a bad initialization. I keep the authors' view in the implementation below to show a simpler version.
  • Rotating the Memory Index.
  • Deciding when to end the loop: the authors suggest setting a minimum number of iterations and then waiting for enough idle iterations - i.e. iterations where the current solution does not get updated. To keep things simple I just set a maximum number of iterations here.
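The authors' stopping rule could be sketched as follows; the function and threshold names are mine, and I am assuming "enough idle iterations" means consecutive non-improving ones:

```python
def run_with_idle_stop(step, min_iteration_count, max_idle_iteration_count):
    # `step` performs one optimization iteration and returns True when the
    # current solution got updated. We loop until the minimum number of
    # iterations has passed AND enough consecutive idle iterations occurred.
    iteration_count = 0
    idle_iteration_count = 0
    while (iteration_count < min_iteration_count
           or idle_iteration_count < max_idle_iteration_count):
        iteration_count += 1
        if step():
            idle_iteration_count = 0
        else:
            idle_iteration_count += 1
    return iteration_count
```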

Note that the only parameter here is the Memory Size.

memory_size = 30  # The only algorithm parameter

solution = get_initial_solution()
cost = get_cost(solution)
cost_memory = [cost] * memory_size  # Memory initialization

iteration_count = 0
max_iteration_count = 1000
while iteration_count < max_iteration_count:  # Exit condition 
    iteration_count += 1

    alternative_solution = get_variation(solution)
    alternative_cost = get_cost(alternative_solution)
    better_than_solution = alternative_cost < cost

    memory_index = iteration_count % memory_size
    current_memorized_cost = cost_memory[memory_index]
    better_than_memory = alternative_cost < current_memorized_cost

    if better_than_solution or better_than_memory:
        # We have found a better solution
        solution = alternative_solution
        cost = alternative_cost

    if better_than_memory:  
        # We update the memory, that will constrain the algorithm to finer 
        # grain solutions over time 
        cost_memory[memory_index] = alternative_cost
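Wrapped into a reusable function and applied to a toy problem of my own (minimizing x² over the integers), the snippet above could be used like this; the wrapper and the problem definition are illustrative, not from the paper:

```python
import random

def lahc(get_initial_solution, get_cost, get_variation,
         memory_size=30, max_iteration_count=1000):
    solution = get_initial_solution()
    cost = get_cost(solution)
    cost_memory = [cost] * memory_size  # Memory initialization

    for iteration_count in range(1, max_iteration_count + 1):
        alternative_solution = get_variation(solution)
        alternative_cost = get_cost(alternative_solution)

        memory_index = iteration_count % memory_size
        better_than_memory = alternative_cost < cost_memory[memory_index]

        if alternative_cost < cost or better_than_memory:
            solution, cost = alternative_solution, alternative_cost
        if better_than_memory:
            cost_memory[memory_index] = alternative_cost

    return solution, cost

# Toy usage: minimize x squared, starting from a random integer.
best_solution, best_cost = lahc(
    get_initial_solution=lambda: random.randint(-100, 100),
    get_cost=lambda x: x ** 2,
    get_variation=lambda x: x + random.choice([-1, 1]),
)
```

The three callables are the only problem-specific parts; everything else, including the single Memory Size parameter, stays untouched from problem to problem.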

My main criticism about LAHC is its name, Late Acceptance Hill Climbing. The acceptance is not late.

The benchmark against which a solution is accepted is lagged or large - meaning there is more than one reference point. Would it be better to have it named LaBeHC, Lagged/Large Benchmark Hill Climbing?


Low tuning required

As discussed above, depending on the problem and also on the implementation, the logic behind obtaining a solution variation often impacts the optimal configuration of hyperparameters. Here, the only hyperparameter, Memory Size, is enough: a larger Memory allows more changes in the solution, giving a better solution space exploration. Over time, no matter how big the Memory Size, the algorithm will progressively switch from exploration to exploitation to refine the best solution found.

Low entry barrier

The presented LAHC implementation is only about 25 extra lines compared to a gradient descent. It is probably about the same size as a grid search, but it should provide much better results.

This means that, unlike most metaheuristics, you can implement it inside your program in a couple of hours regardless of your technical stack.

Same benefits as other metaheuristics

I did not put the original paper's results to the test, but the approach is respected and I think it deserves due credit. In my experience, I replaced both gradient descent and grid search with it and observed great results in both cases... one major counterexample being the live programming session - the curse of the live demo.

Fräntz Miccoli

This blog collects my notes about software engineering and computer science. My main interests are architecture, quality and machine learning, but content in this blog may diverge from time to time.

For some time now, I have been the happy cofounder, COO & CTO of Nexvia.

Ideas are expressed here to be challenged.
