12 Feb 2023 - The Dark Side

Optimizing A Toy Model Using Gradient Descent And JAX

The deep learning community relies on powerful libraries whose mathematical capabilities go far beyond anything I could dream of implementing myself. Back in the day, I worked on an artificial neural network project where we implemented the derivatives by hand wherever we needed them. Seeing these libraries made me want to toy around with their capabilities on other models, not necessarily artificial neural networks.

We are going to try to fit a toy econometric model using gradient descent.


A single dependency: JAX. JAX is a numerical computation library with a NumPy-compatible API that can leverage your GPU for learning… which is probably overkill here.

pipenv --python 3.10 install jax

So here are the imports of our main.py:

import jax
import jax.numpy as numpy
import random

Let’s fake some training data

raw_knowledge_base = [
    # (sector, surface, parkings, property price to be added)
    (0, 100, 1),
    (0, 50, 0),
    (0, 120, 1),
    (0, 120, 2),
    (0, 110, 1),
    (0, 220, 3),
    (0, 100, 2),
    (0, 90, 0),
    (1, 40, 1),
    (1, 50, 0),
    (1, 80, 0),
    (1, 120, 2),
    (1, 110, 0),
    (1, 30, 0),
    (1, 140, 1),
    (1, 100, 1),
    (1, 40, 1),
]
def get_mock_price(sector: int, surface: float, parkings: int) -> float:
    """
    This function is just here to generate our learning data;
    in the real world, we would use the known price for each entry above.
    """
    standard_surface_price = 10000 * surface
    standard_parking_price = 40000 * parkings
    # 20% premium for sector 1
    sector_surface_premium = standard_surface_price * sector * 0.2  
    # 10% premium for sector 1
    sector_parking_premium = standard_parking_price * sector * 0.1  
    raw_price = standard_surface_price + standard_parking_price + \
        sector_surface_premium + sector_parking_premium
    
    random_factor = 1 - (random.random() - 0.5) / 10  # from -5 to +5%
    return round(raw_price * random_factor)
    
# Add a price to each entry above
knowledge_base = [
    (sector, surface, parkings, get_mock_price(sector, surface, parkings))
    for (sector, surface, parkings) in raw_knowledge_base
]
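
Each entry of knowledge_base now carries a noisy price. For instance (the exact value changes on every run because of the random factor):

print(knowledge_base[0])
# e.g. (0, 100, 1, 1052340): about 100 * 10000 + 1 * 40000, give or take 5%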

At this point, we only have some fake data to which we are going to try to fit our model.

Our model

To make it a bit more readable, we are splitting the model parameters from the model input itself.

def get_model_price(params, x):
    """
    params: 
        * [0] average price / sqm
        * [1] price / sqm premium in sector 0
        * [2] price / sqm premium in sector 1
        * [3] price per parking
    """
    
    surface_value = (params[0] + params[1] * (1 - x[0]) + params[2] * x[0]) * x[1]
    parking_value = params[3] * x[2]
    return surface_value + parking_value
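
As a quick sanity check, we can call the model on the first entry with a hand-picked parameter vector; the values below are made up for illustration:

# Illustrative parameters: 10 000 / sqm base price, no premium in sector 0,
# a 2 000 / sqm premium in sector 1, 40 000 per parking spot
params = numpy.asarray([10000., 0., 2000., 40000.])
print(get_model_price(params, (0, 100, 1)))  # (10000 + 0) * 100 + 40000 * 1 = 1040000.0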

The error

Our game is now to find the model parameters that minimize the error over every entry of our knowledge_base. The first step, then, is to be able to associate an error with a given set of model parameters.

def get_model_error(params):
    error = 0.0
    
    for sector, surface, parkings, expected_price in knowledge_base:
        x = (sector, surface, parkings)
        model_price = get_model_price(params, x)
        delta = expected_price - model_price
        # sqrt(delta ** 2) is simply the absolute value of delta
        error += numpy.sqrt(numpy.power(delta, 2))
        
    return error
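
The Python loop above works, but JAX is more comfortable with array operations. Here is an equivalent vectorized sketch, assuming we pack knowledge_base into arrays first (and using the fact that the square root of a square is just the absolute value):

# Vectorized equivalent (a sketch)
xs = numpy.asarray([(s, su, p) for (s, su, p, _) in knowledge_base])
expected_prices = numpy.asarray([price for (_, _, _, price) in knowledge_base])

def get_model_error_vectorized(params):
    # Same formula as get_model_price, applied to all rows at once
    surface_values = (params[0] + params[1] * (1 - xs[:, 0]) + params[2] * xs[:, 0]) * xs[:, 1]
    parking_values = params[3] * xs[:, 2]
    return numpy.abs(expected_prices - (surface_values + parking_values)).sum()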

Gradient descent iteration

We start with an initial set of parameters.

current_params = numpy.asarray([
    3000. + random.random() * 5000., 
    (random.random() - 0.5) * 500., 
    (random.random() - 0.5) * 500.,
    3000. + (random.random() - 0.5) * 3000.
])

Then during our iteration cycle:

  1. We compute the gradient at the current parameters.
  2. This gradient is a vector that indicates the slope of the error function at this specific point of the parameter space. If we move in the direction it points to, we increase the error… so we just have to move in the opposite direction.
  3. Because just doing that would be too easy, we weight the gradient with a learning rate. If the learning rate is too high, the parameters change too fast and the error may not improve at all. If it is too low, the error decreases too slowly and we risk being trapped in a local optimum.
  4. The learning rate is usually adapted over time. Here, we reduce it every time the error fails to improve.

previous_error = None
learning_rate = 2.

for i in range(0, 1000):
    # Parameter update
    gradient = jax.jacfwd(get_model_error)(current_params)
    current_params -= learning_rate * gradient
    
    # Learning rate update
    new_error = get_model_error(current_params)
    if previous_error is not None:
        if previous_error < new_error:
            learning_rate *= 0.9
            
        if learning_rate < 0.005:
            break
        
    previous_error = new_error
    
print(current_params)
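
Since get_model_error returns a single scalar, jax.grad does the same job as jax.jacfwd here, and wrapping it in jax.jit compiles the computation once instead of retracing it at every iteration. A minimal variant of the derivative call (the rest of the loop stays the same):

# Compile the gradient once, then reuse it at each iteration (a sketch)
error_gradient = jax.jit(jax.grad(get_model_error))

# inside the loop, the update becomes:
#     gradient = error_gradient(current_params)
#     current_params -= learning_rate * gradient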

Conclusion

We get something like [9489 809 2320 3805] after rounding.

If we roll back to our parameters’ description:

  • Average price / sqm: 9 489.
  • Price / sqm premium in sector 0: 809. So in fact, sector 0 has an average price of 9 489 + 809 = 10 298, where our get_mock_price models 10 000. Not too bad.
  • Price / sqm premium in sector 1: 2 320. Sector 1 has an average of 9 489 + 2 320 = 11 809 against a known value of 12 000. This is pretty decent as well.
  • Price per parking: 3 805. This is far below the known value of 40 000 to 44 000. Perhaps that is due to the order of magnitude difference between the parking spots' contribution and the total price.

Considering that the average property is worth about 1 075 000, the ±5% random factor represents a spread of roughly 107 500, which clearly outweighs the parking contribution. That could be an explanation. One way to counterbalance this would be to gather more data first, and perhaps also to fit the parameters with the largest contribution before the more secondary ones; here, the parking terms.
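
A quick back-of-the-envelope check of that explanation (the exact numbers vary with the random factor):

# Compare the noise magnitude with the average parking contribution
average_price = sum(price for (_, _, _, price) in knowledge_base) / len(knowledge_base)
noise_spread = 0.1 * average_price  # the random factor spans from -5% to +5%
average_parking_value = 40000 * sum(p for (_, _, p, _) in knowledge_base) / len(knowledge_base)
print(noise_spread, average_parking_value)  # roughly 107 500 vs 37 600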

