The deep learning community relies on powerful libraries whose mathematical capabilities go far beyond anything I could write myself. Years ago, I worked on an artificial neural network project where we implemented the derivatives by hand, exactly where we needed them. Seeing these libraries made me want to toy around with their capabilities on other kinds of models, not necessarily artificial neural networks.
We are going to try to fit a toy econometric model using gradient descent.
A single dependency: JAX. JAX is a NumPy-compatible computation library that can leverage your GPU for learning… which is probably overkill here.
pipenv --python 3.10 install jax
So here are the imports of our main.py:
import jax
import jax.numpy as numpy
import random
Let’s fake some training data
raw_knowledge_base = [
# (sector, surface, parkings, property price to be added)
(0, 100, 1),
(0, 50, 0),
(0, 120, 1),
(0, 120, 2),
(0, 110, 1),
(0, 220, 3),
(0, 100, 2),
(0, 90, 0),
(1, 40, 1),
(1, 50, 0),
(1, 80, 0),
(1, 120, 2),
(1, 110, 0),
(1, 30, 0),
(1, 140, 1),
(1, 100, 1),
(1, 40, 1),
]
def get_mock_price(sector: int, surface: float, parkings: int) -> float:
    """
    This function is just here to generate our learning data;
    in the real world we would use the known price of each line above.
    """
    standard_surface_price = 10000 * surface
    standard_parking_price = 40000 * parkings
    # 20% surface premium for sector 1
    sector_surface_premium = standard_surface_price * sector * 0.2
    # 10% parking premium for sector 1
    sector_parking_premium = standard_parking_price * sector * 0.1
    raw_price = standard_surface_price + standard_parking_price + \
        sector_surface_premium + sector_parking_premium
    random_factor = 1 - (random.random() - 0.5) / 10  # from -5% to +5%
    return round(raw_price * random_factor)
# Add a price to each line above
knowledge_base = [
(sector, surface, parkings, get_mock_price(sector, surface, parkings))
for (sector, surface, parkings) in raw_knowledge_base
]
At this point, we only have some fake data on which we are going to try to fit our model.
Our model
To make it a bit more readable, we are splitting the model parameters from the model input itself.
def get_model_price(params, x):
    """
    params:
    * [0] average price / sqm
    * [1] price / sqm premium in sector 0
    * [2] price / sqm premium in sector 1
    * [3] price per parking
    x: (sector, surface, parkings)
    """
    surface_value = (params[0] + params[1] * (1 - x[0]) + params[2] * x[0]) * x[1]
    parking_value = params[3] * x[2]
    return surface_value + parking_value
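As a quick sanity check, we can price one flat by hand. The parameter values below are purely illustrative round numbers that roughly mirror get_mock_price, not anything fitted:
illustration_params = numpy.asarray([10000., 0., 2000., 40000.])
# Illustrative parameters only: 10 000 / sqm on average, no premium in
# sector 0, 2 000 / sqm premium in sector 1, 40 000 per parking spot.
# A sector-1 flat of 100 sqm with one parking spot:
# (10000 + 0 * (1 - 1) + 2000 * 1) * 100 + 40000 * 1 = 1 240 000
print(get_model_price(illustration_params, (1, 100, 1)))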
The error
Our game is now to find the model parameters that minimize the error over every entry of our knowledge_base. So the first thing is to be able to associate an error to a set of model parameters.
def get_model_error(params):
    error = 0.0
    for sector, surface, parkings, expected_price in knowledge_base:
        x = (sector, surface, parkings)
        model_price = get_model_price(params, x)
        delta = expected_price - model_price
        # sqrt(delta²) is simply |delta|: we sum the absolute errors
        error += numpy.sqrt(numpy.power(delta, 2))
    return error
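To get a feel for the scale of this error, we can evaluate it on the illustrative parameters from the earlier snippet. The exact figure changes on every run because of the random factor baked into get_mock_price:
# Error of the illustrative parameters above; the value changes on every
# run because of the random factor in get_mock_price
print(get_model_error(illustration_params))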
Gradient descent iteration
We start with an initial set of parameters.
current_params = numpy.asarray([
    3000. + random.random() * 5000.,
    (random.random() - 0.5) * 500.,
    (random.random() - 0.5) * 500.,
    3000. + (random.random() - 0.5) * 3000.
])
Then during our iteration cycle:
- We compute the gradient at the current parameters.
- This gradient is a vector that indicates the slope of the error function at this specific point of the parameter space. If we move in the direction it points, we increase the error… so we just have to do the opposite.
- Because just doing that would be too easy, we weight it with a learning rate. If the learning rate is too high, the parameters change too fast and the model's performance does not improve. If it is too low, the error decreases too slowly and we risk getting trapped in a local optimum.
- The learning rate is usually adapted over time. Here, we reduce it every time our error does not improve.
previous_error = None
learning_rate = 2.
for i in range(0, 1000):
    # Parameter update: move against the gradient of the error
    gradient = jax.jacfwd(get_model_error)(current_params)
    current_params -= learning_rate * gradient
    # Learning rate update: shrink it whenever the error got worse
    new_error = get_model_error(current_params)
    if previous_error is not None:
        if previous_error < new_error:
            learning_rate *= 0.9
            if learning_rate < 0.005:
                break
    previous_error = new_error
print(current_params)
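A side note on jax.jacfwd: since get_model_error returns a single scalar, its Jacobian with respect to the parameters is simply the gradient, so jax.grad would do the same job here:
# Equivalent gradient computation; jax.grad only accepts scalar-output functions
gradient = jax.grad(get_model_error)(current_params)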
Conclusion
We get something like [9489 809 2320 3805] after rounding.
If we roll back to our parameters' description:
- Average price / sqm: 9 489.
- Price / sqm premium in sector 0: 809. So in fact, sector 0 has an average price of 9 489 + 809 = 10 298, where our get_mock_price uses 10 000. Not too bad.
- Price / sqm premium in sector 1: 2 320. Sector 1 has an average of 9 489 + 2 320 = 11 809 against a known value of 12 000. This is pretty ok as well.
- Price per parking: 3 805. This is a lot below the known value, between 40 000 and 44 000. Perhaps that is due to the order of magnitude difference between the parking spots' contribution and the total price.
Considering that the average property is worth about 1 075 000, the random part (±5%, so a 10% range) has a weight of about 107 500, clearly outweighing the parking contribution. That can be an explanation. One way to counterbalance this would be to get more data first, and perhaps also to fit the parameters with the most important contribution first and the more secondary ones later; here, the parking term. A possible sketch of that idea follows.
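Purely as an untested sketch of that last idea, one could reuse the loop above but mask the gradient so that only a subset of the parameters moves at each stage. The masks, stage ordering and iteration counts below are assumptions, not tuned values:
def fit(params, mask, steps=1000):
    """Same gradient descent as above, but only the masked parameters move."""
    previous_error = None
    learning_rate = 2.
    for _ in range(steps):
        gradient = jax.grad(get_model_error)(params)
        params = params - learning_rate * gradient * mask
        new_error = get_model_error(params)
        if previous_error is not None and previous_error < new_error:
            learning_rate *= 0.9
            if learning_rate < 0.005:
                break
        previous_error = new_error
    return params

# Fresh random starting point, as before
staged_params = numpy.asarray([
    3000. + random.random() * 5000.,
    (random.random() - 0.5) * 500.,
    (random.random() - 0.5) * 500.,
    3000. + (random.random() - 0.5) * 3000.
])
# Stage 1: fit the surface-related parameters, parking frozen
staged_params = fit(staged_params, numpy.asarray([1., 1., 1., 0.]))
# Stage 2: fit the parking parameter alone
staged_params = fit(staged_params, numpy.asarray([0., 0., 0., 1.]))
print(staged_params)
Whether this actually brings the parking parameter closer to 40 000 is left to the reader; with this little data, the ±5% noise may still dominate.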