Simulated annealing algorithm is a global search optimization algorithm that is inspired by the annealing technique in metallurgy. In this one, Let’s understand the exact algorithm behind simulated annealing and then implement it in Python from scratch.

## First, What is Annealing?

In simple terms, ‘Annealing’ is a technique, where a metal is heated to a high temperature and slowly cooled down to improve its physical properties. When the metal is hot, the molecules randomly re-arrange themselves at a rapid pace.

As the metal starts to cool down, the re-arranging process occurs at a much slower rate. In the end, the resultant metal will be a desired workable metal. The factors of time and metal’s energy at a particular time will supervise the entire process.

In machine learning, Simulated annealing algorithm mimics this process and is used to find optimal (or most predictive) features in the feature selection process.

Let’s now try to draw parallel’s between Annealing in metallurgy and Simulated annealing for Feature selection:

In terms of feature selection,

**Set of features**represents the arrangement of molecules in the material(metal).**No. of Iterations**represents time. Therefore, as the no. of iterations decreases temperature decreases.- Change in
**Predictive performance**between the previous and the current iteration represents the change in material’s energy.

Simulated Annealing is a stochastic global search optimization algorithm which means it operates well on non-linear objective functions as well while other local search algorithms won’t operate well on this condition.

Ok, it sounds somewhat similar to Stochastic hill climbing. What’s the difference?

## Stochastic Hill Climbing (Vs) Simulated Annealing

A considerably upgraded version of stochastic hill-climbing is simulated annealing

Consider that you are climbing a hill and trying to find the optimal steps to reach the top. The main difference between stochastic hill-climbing and simulated annealing is that in stochastic hill-climbing steps are taken at random and the current point is replaced with a new point provided the new point is an improvement to the previous point.

Whereas in simulated annealing, the search works the same way but sometimes the worse points are also accepted to allow the algorithm to learn answers that are eventually better.

## Simulated annealing algorithm

Let’s go over the exact Simulated Annealing algorithm, step-by-step.

- The initial step is to select a subset of features at random.
- Then choose the no. of iterations. A ML model is then built and the predictive performance (otherwise called objective function) is calculated.
- A small percentage of features are randomly included/excluded from the model. This is just to ‘perturb’ the features. Then the predictive performance is calculated once again for this new set of features.

Two things can happen here:

- If performance
**Increases in the new set**then the new feature set is**Accepted**. - If the performance of the new feature set has
**worse performance**, then the**Acceptance Probability**(otherwise called**metropolis acceptance criterion**) is calculated. (You will see the formula and its significance shortly. Stay with the flow for now.)

Once the acceptance probability is calculated, generate a random number between 0 – 1 and :

- If the
**Random Number**>**Acceptance Probability**then the new feature set is**Rejected**and the previous feature set will be continued to be used. - If the
**Random Number**<**Acceptance Probability**then the new feature set is**Accepted**.

The impact of randomness by this process helps simulated annealing to not get stuck at local optimums in search of a global optimum.

Keep doing this for the chosen number of iterations.

.

Now, how is all of this related to ‘annealing’ concept of cooling temperature?, you might wonder.

The ‘acceptance probability’ takes care of that. The formula for acceptance probability is designed in such a way that, as the number of iterations increase, the probability of accepting bad performance comes down. As a result, fewer changes are accepted.

Let’s look at the formula now.

## Formula for acceptance probability( a.k.a Metropolis acceptance criterion)

The formula for acceptance probability is as follows:

Where,

i = No. Of Iterations,

c = Controls the amount of perturbation that can happen,

old = Old score,

new = New score.

The acceptance probability can be understood as a function of time and change in performance with a constant ‘c’, which is used to control the rate of perturbation happening in the features. Usually, ‘c’ is set to be 1.

## Working example of acceptance probability formula:

Consider the problem in hand is to optimize the accuracy of a machine learning model. Assume that the previous solution is 77% and the current solution is 73% :

When,

no. of iteration i = 1 and c = 1

iteration i = 5 and c = 1

and finally when iteration i = 10 and c = 1

As you can see after 10 iterations the acceptance probability came down to 0.0055453. Thus, **as the no. of iterations increases, the chances of accepting a worse solution decreases.**

## Now, Why does simulated annealing accept worse performing feature sets ?

If the algorithm tends to accept only the best performing feature sets the probability of getting stuck in the local optima gets very high which is not good. Even if the algorithm is going to continuously face poor-performing feature sets for a certain number of times it allows for better chances of finding the global optima which may exist elsewhere. As the acceptance probability decreases with time (iterations), it tends to go back to the last known local optimum and starts its search for global optimum once again. So the chances of settling on a worser performing results is diminished.

When the temperature is high the chances of worse-performing features getting accepted is high and as the no. of iterations goes up, temperature decreases, and that in turn decreases the chances of worse-performing features getting accepted.

The intent here is that, when the temperature is high, the algorithm moves freely in the search space, and as temperature decreases the algorithm is forced to converge at global optima.

## Implementing Simulated annealing from scratch in python

Consider the problem of hill climbing. Consider a person named ‘Mia’ trying to climb to the top of the hill or the global optimum. In this search hunt towards global optimum, the required attributes will be:

- Area of the search space. Let’s say area to be [-6,6]
- A start point where ‘Mia’ can start her search hunt.
- Step_size that ‘Mia’ is going to take.
- Number of attempts ‘Mia’ is going to make. As of algorithm this would be no. of iterations.
- Steeps and slopes she climbs as she tries to reach the top/global optimum. As of algorithm this would be temperature.

Another thing to note here is that both the temperature and no. of iterations arguments will be predefined.

Now how would ‘Mia’ know whether her step is betterment to the previous step or not?

Her steps are validated by a function called ‘objective’. The objective function will be the ‘square of the step taken’. This is because the steps ‘Mia’ is going to take are going to be totally random between the bounds of the specified area and that means there are chances of getting a negative value also, To make it positive, the objective function is used. If this new step is betterment then she will continue on that path.

If her step is not good: The acceptance probability/Metropolis acceptance criterion is calculated. After that, a random number will be generated using rand().

If random_number > Acceptance probability: Reject the new step Else: Accept the new step

Just an overview,

In this code, the steps taken by ‘Mia’ will be random and not user-fed values. Each time there is an improvement/betterment in the steps taken towards global optimum, those values alongside the previous value get saved into a list called `outputs`

.

The initial step is to import necessary libraries.

```
from numpy import asarray, exp
from numpy.random import randn, rand, seed
from matplotlib import pyplot
```

Let’s define the `objective`

function to evaluate the steps taken by mia.

```
# Objective function is the square of steps taken
def objective(step):
return step[0] ** 2
```

Now that the `objective`

function is defined. ‘Mia’ needs to start the search hunt from some point right ?. Only if she has a start point she can progress towards the global optimum.

The below code cell gives us a random start point between the range of the `area`

of the search space. Let’s also see the evaluation of this `start_point`

.

```
seed(1)
area = asarray([[-6.0, 6.0]])
# area[:,0] = -6.0, area[:,1] = 6.0, ran(len(area))generates random number within the length of area.
# length of interval is 1.
start_point = area[:, 0] + rand( len( area ) ) * ( area[:, 1] - area[:, 0] )
print(start_point)
print('start_point=', start_point)
print('objective function evaluation of start point=',objective(start_point))
```

```
[-0.99573594]
start_point= [-0.99573594]
objective function evaluation of start point= 0.9914900693154707
```

Seems like the new point obtained( objective function evaluated point ) is better than the `start_point`

.

seed(1) is a Pseudorandom_number_generator.

By using seed(1) same random numbers will get generated each time the code cell is run,

Let’s now define the simulated annealing algorithm as a function. The parameters needed are:

- Objective function.
- Area of the search space.
- No. of iterations.
- step_size.
- Temperature.

After defining the function, the `start_point`

is initialized then, this `start_point`

is getting evaluated by the `objective`

function and that is stored into `start_point_eval`

```
def sa(objective, area = ([[-6.0,6.0]]), n_iterations = 1200, step_size = 0.1, temperature = 12):
# Generating a random start point for the search hunt
start_point = area[:, 0] + rand( len( area ) ) * ( area[:, 1] - area[:, 0] )
# Evaluating the start_point
start_point_eval = objective(start_point)
```

Now `start_point`

and `objective`

function evaluation of start point(`start_point_eval`

) needs to be stored so that each time an improvement happens, the progress can be seen.

```
# Storing the start point and its objective function evaluation into mia_start_point and mia_start_eval.
mia_start_point, mia_start_eval = start_point, start_point_eval
# this empty list will get updated over time once looping starts.
outputs = []
```

‘Mia’ start point and her start point evaluation are stored into `mia_start_point`

and `mia_start_eval`

. `Outputs`

is an empty list that will get updated over time once looping starts. As of now, ‘Mia’ started at a point and evaluated that point. Now she has to take her first step towards her search hunt and to do so, a for loop is defined ranging from 0 to the iteration number we specify.

```
# Looping from 0 to the iteration number we specify
for i in range(iterations):
# First step taken by mia
mia_step = mia_start_point + randn( len( area ) ) * step_size
```

The first step will be in accordance with Gaussian distribution where the mean is the current point and standard deviation is defined by the `step_size`

. In a nutshell, this means the steps taken will be 3 * `step_size`

of the current point.

```
# Evaluating the first step
mia_step_eval = objective(mia_step)
```

This new point obtained must be checked whether it is better than the current point, if it is better, then replace the current point with the new point. Then append those new points into our outputs list. If the new point is better:

(i) Iteration count

(ii) Previous best

(iii) New best

are printed.

```
# The new step is checked whether it is better than current step. If better, the current point is replaced with new point.
if mia_step_eval < start_point_eval:
start_point, start_point_eval = mia_step, mia_step_eval
# Step gets appended into the list
outputs.append(start_point_eval)
# printing out the iteration number, best_so_far and new_best
print('iteration Number = ',i," ", 'best_so_far = ',start_point," " ,'new_best = ',start_point_eval)
```

If the new point isn’t a promising solution, then the difference between the `objective`

function evaluation of the current solution(`mia_step_eval`

) and current working solution(`mia_start_eval`

) is calculated. One of the popular ways of calculating `temperature`

is by using the “Fast Simulated Annealing Method” which is as follows:

**temperature = initial_temperature / (iteration_number + 1)**

```
difference = mia_step_eval - mia_start_eval
# Temperature is calculated
t = temperature / float(i + 1)
```

`difference`

gives us the difference between the old point and the new point so that the acceptance probability/metropolis acceptance criterion can be calculated. This helps in calculating the probability of accepting a point with worse performance than the current point.

```
# Acceptance probability is calculated
mac = exp(-difference / t)
```

Then a random number is generated using rand() and if the **Random Number** > **Acceptance Probability** then the new point will be **Rejected** and if **Random Number** < **Acceptance Probability** then the new point will be **Accepted**.

```
if difference < 0 or rand() < mac:
# Storing the values
mia_start_point, mia_start_eval = mia_step, mia_step_eval
return [start_point, start_point_eval, outputs] #indenting is outside because return belongs to 'SA' function
```

The last step is to pass values to the parameters of the simulated annealing function.

```
seed(1)
# define the area of the search space
area = asarray([[-6.0, 6.0]])
# initial temperature
temperature = 12
# define the total no. of iterations
iterations = 1200
# define maximum step_size
step_size = 0.1
# perform the simulated annealing search
start_point, output, outputs = sa(objective, area, n_iterations, step_size, temp)
```

Putting all these codes together into a single code cell this is how the final code looks like:

```
from numpy import asarray, exp
from numpy.random import randn, rand, seed
from matplotlib import pyplot
# Define objective function
def objective(step):
return step[0] ** 2.0
# Define simulated annealing algorithm
def sa(objective, area, iterations, step_size, temperature):
# create initial point
start_point = area[:, 0] + rand( len( area ) ) * ( area[:, 1] - area[:, 0] )
# evaluate initial point
start_point_eval = objective(start_point)
# Assign previous and new solution to previous and new_point_eval variable
mia_start_point, mia_start_eval = start_point, start_point_eval
outputs = []
for i in range(iterations):
# First step by mia
mia_step = mia_start_point + randn( len( area ) ) * step_size
mia_step_eval = objective(mia_step)
if mia_step_eval < start_point_eval:
start_point, start_point_eval = mia_step, mia_step_eval
#Append the new values into the output list
outputs.append(start_point_eval)
print('Acceptance Criteria = %.5f' % mac," ",'iteration Number = ',i," ", 'best_so_far = ',start_point," " ,'new_best = %.5f' % start_point_eval)
difference = mia_step_eval - mia_start_eval
t = temperature / float(i + 1)
# calculate Metropolis Acceptance Criterion / Acceptance Probability
mac = exp(-difference / t)
# check whether the new point is acceptable
if difference < 0 or rand() < mac:
mia_start_point, mia_start_eval = mia_step, mia_step_eval
return [start_point, start_point_eval, outputs]
seed(1)
# define the area of the search space
area = asarray([[-6.0, 6.0]])
# initial temperature
temperature = 12
# define the total no. of iterations
iterations = 1200
# define maximum step_size
step_size = 0.1
# perform the simulated annealing search
start_point, output, outputs = sa(objective, area, iterations, step_size, temperature)
#plotting the values
pyplot.plot(outputs, 'ro-')
pyplot.xlabel('Improvement Value')
pyplot.ylabel('Evaluation of Objective Function')
pyplot.show()
```

```
Acceptance Criteria = 0.76232 iteration Number = 37 best_so_far = [-0.95340594] new_best = 0.90898
Acceptance Criteria = 0.87733 iteration Number = 39 best_so_far = [-0.91563305] new_best = 0.83838
Acceptance Criteria = 1.44710 iteration Number = 40 best_so_far = [-0.85680363] new_best = 0.73411
Acceptance Criteria = 1.42798 iteration Number = 41 best_so_far = [-0.8221177] new_best = 0.67588
Acceptance Criteria = 1.22608 iteration Number = 42 best_so_far = [-0.68541443] new_best = 0.46979
Acceptance Criteria = 2.09273 iteration Number = 43 best_so_far = [-0.61804282] new_best = 0.38198
Acceptance Criteria = 1.16478 iteration Number = 50 best_so_far = [-0.42564785] new_best = 0.18118
Acceptance Criteria = 0.81414 iteration Number = 66 best_so_far = [-0.35632231] new_best = 0.12697
Acceptance Criteria = 1.74595 iteration Number = 67 best_so_far = [-0.33780667] new_best = 0.11411
Acceptance Criteria = 1.02155 iteration Number = 72 best_so_far = [-0.3368772] new_best = 0.11349
Acceptance Criteria = 1.20897 iteration Number = 75 best_so_far = [-0.29671621] new_best = 0.08804
Acceptance Criteria = 1.47133 iteration Number = 88 best_so_far = [-0.29346186] new_best = 0.08612
Acceptance Criteria = 1.78859 iteration Number = 89 best_so_far = [-0.2525718] new_best = 0.06379
Acceptance Criteria = 0.29369 iteration Number = 93 best_so_far = [-0.20893326] new_best = 0.04365
Acceptance Criteria = 1.68933 iteration Number = 94 best_so_far = [-0.12724854] new_best = 0.01619
Acceptance Criteria = 1.24284 iteration Number = 95 best_so_far = [-0.00883141] new_best = 0.00008
Acceptance Criteria = 0.99774 iteration Number = 102 best_so_far = [0.00581387] new_best = 0.00003
Acceptance Criteria = 1.05301 iteration Number = 118 best_so_far = [0.00137051] new_best = 0.00000
Acceptance Criteria = 0.99611 iteration Number = 138 best_so_far = [0.0009528] new_best = 0.00000
Acceptance Criteria = 1.33458 iteration Number = 158 best_so_far = [-0.00047896] new_best = 0.00000
Acceptance Criteria = 0.50157 iteration Number = 473 best_so_far = [0.00045742] new_best = 0.00000
Acceptance Criteria = 1.02387 iteration Number = 482 best_so_far = [0.00010925] new_best = 0.00000
```

So this output shows us, in which iteration the improvement happened, the previous best point, and the new best point. The graph shows that there are about 22 improvements ( red circle ) as the algorithm reaches the global optima. There are certain places where there are no big improvements but as the algorithm reaches the end there are many improvements.

This is the python implementation of the simulated annealing algorithm.

Feel free to change the `area`

, `step_size`

and other inputs to see what you get.

## Advantage and limitations of simulated annealing

## Advantages

Some of the advantages worth mentioning are:

- Simulated Annealing Algorithm can work with cost functions and arbitrary systems.
- Easy to code even if the problem in hand is complex.
- This technique guarantees finding an optimal solution by not getting stuck in local optima.

## Limitations

Some limitations of this technique are:

- This technique cannot tell whether it has found the optimal solution or not.
- In problems with few local minima, this method is not necessary, gradient descent would do the job.
- It tends to be a very time consuming procedure.