Wednesday, March 12, 2025

Speeding Up QuantLib with Matlogica’s AADC Library

  











I recently had the opportunity to connect with Dmitry Goloubentsev from Matlogica and explore their AADC (Adjoint Algorithmic Differentiation Compiler) library. I’m amazed by the speedups they’re achieving on top of QuantLib. For instance, doing 100 revals for a 10K OIS swap portfolio takes around 60 seconds with QuantLib on my machine. Once compiled into an AADC Kernel, doing 100 revals on the same portfolio takes only 200 ms. This makes it completely feasible to calculate VaR for large portfolios in seconds, even with minimal hardware.

Below is an example of how you can compile your QuantLib-based Python code into an AADC Kernel and run multiple revaluations.



Simple VaR example

Suppose you have a function price_portfolio to compute NPV of a portfolio that uses QuantLib and looks as follows:

def price_portfolio(swaps, curves, curve_rates):
    npv = 0.
    for curve, rates in zip(curves, curve_rates):
        for quote, rate in zip(curve.quotes, rates):
            quote.setValue(rate)
    for swap in swaps:
        npv += swap.NPV()
    return npv

This function assumes that the relevant curve and swap objects are already built; it simply updates the quote objects linked to those curves. As such it's quite efficient, taking ~0.6 seconds on my desktop for a portfolio of 10,000 OIS swaps. To calculate historical VaR from this function, you might call it 100 times with different historical curves, which would take ~60 seconds.
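For concreteness, once you have the scenario NPVs, historical VaR is just a percentile of the simulated P&L distribution. A minimal sketch (the helper name and the made-up scenario values below are mine, not part of the QuantLib example):

```python
import numpy as np

def historical_var(base_npv, scenario_npvs, confidence=0.99):
    # P&L of each historical scenario relative to today's NPV
    pnl = np.asarray(scenario_npvs) - base_npv
    # VaR is the loss at the (1 - confidence) quantile, reported as a positive number
    return -np.percentile(pnl, (1.0 - confidence) * 100.0)

base = 1_000_000.0
scenarios = base + np.linspace(-50_000.0, 50_000.0, 101)  # made-up scenario NPVs
var_99 = historical_var(base, scenarios)  # 49000.0 with these inputs
```

The expensive part is producing the 100 scenario NPVs, which is exactly what AADC accelerates below.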



Applying AADC


AADC can yield amazing speedups on the example above, but before jumping into the details, let's first walk through the core library functions (for a deeper understanding, refer to https://matlogica.com/Implicit-Function-Theorem-Live-Risk-Practice-QuantLib.php).

Compiling: One can think of AADC as a just-in-time compiler that takes your Python and QuantLib code, along with the relevant curve and trade setup, and converts it into an executable kernel that accepts updated rate values. Below is a code sample of how one would 'compile' the call to the function above.

def compile_kernel(swaps, curves):
    kernel = aadc.Kernel()
    kernel.start_recording()
    curve_rates = []
    curve_args = []
    for curve in curves:
        zero_rates = aadc.array(np.zeros(len(curve.tenors)))
        curve_rates.append(zero_rates)
        curve_args.append(zero_rates.mark_as_input())

    res = price_portfolio(swaps, curves, curve_rates)
    res_aadc = res.mark_as_output()
    request = {res_aadc: []}
    kernel.stop_recording()
    return (kernel, request, curve_args)

A few things to note about this function:

  • It marks the inputs that can change between calls, and tags the output.

  • Inputs and outputs need to be of special data types: aadc.array in this example, but aadc.float works as well.

  • The curve_rates values passed to price_portfolio during recording do not really matter; they are initialized to zeros in this example.

  • The function returns three outputs: kernel, request and curve_args. These will be needed later to call the compiled kernel.
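Conceptually, this record-then-replay pattern works like building a tape of operations once and re-executing it with new inputs. A toy illustration of the general shape (this is not AADC's actual mechanism, just the idea):

```python
class Tape:
    """Toy operation tape: record a computation once, replay it with new inputs."""
    def __init__(self):
        self.ops = []

    def record(self, fn):
        self.ops.append(fn)

    def replay(self, x):
        for fn in self.ops:
            x = fn(x)
        return x

tape = Tape()
tape.record(lambda x: x * 2)  # stand-ins for the recorded pricing operations
tape.record(lambda x: x + 1)
first = tape.replay(3)    # -> 7: same recorded operations, new input each time
second = tape.replay(10)  # -> 21
```

AADC goes much further than this (it compiles the recorded operations to native code), but the contract is the same: record once, then evaluate repeatedly with fresh inputs.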


Running the function: To evaluate the compiled function you need three things: the kernel acquired earlier, an inputs structure providing a value for each expected input, and a request structure defining the expected result fields. The example below demonstrates how to invoke the kernel with new curve_rates. Note the construction of a dictionary mapping the tagged input arguments to the provided curve_rates. The first element of the returned list is a dictionary with the same keys as the request dictionary, with each calculated value returned as a one-element array. Running this compiled function yields a 10-20x speedup over the original execution.

def call_kernel(kernel, request, curve_args, curve_rates):
    inputs = {}
    for curve_arg, rates in zip(curve_args, curve_rates):
        for arg_point, rate in zip(curve_arg, rates):
            inputs[arg_point] = rate

    r = aadc.evaluate(kernel, request, inputs)
    res_aadc = list(request.keys())[0]
    return r[0][res_aadc][0]
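The inputs dictionary built here is just a flattening of nested (curve, point) pairs into a single mapping. In pure-Python terms (the string handles below are stand-ins for the real AADC argument objects):

```python
def flatten_inputs(curve_args, curve_rates):
    # Pair each per-curve argument handle with its rate, as call_kernel does
    inputs = {}
    for curve_arg, rates in zip(curve_args, curve_rates):
        for arg_point, rate in zip(curve_arg, rates):
            inputs[arg_point] = rate
    return inputs

args = [["usd_1y", "usd_5y"], ["eur_1y"]]  # stand-in argument handles
rates = [[0.031, 0.034], [0.027]]
inputs = flatten_inputs(args, rates)
```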


Running a batch: To compute VaR as in the earlier example, you could call the above function 100 times and get the 10-20x speedup, but aadc.evaluate also allows you to pass an array of 100 values for each curve point. It then evaluates all 100 scenarios internally, and you will see per-call speedups of over 100x. You would, however, need to modify the call_kernel function above to return the full list of results rather than just the first element. The example below demonstrates how to calculate NPVs for 100 random rate scenarios; call_kernel will then return a list of 100 NPVs.

curve_rates = []
for curve in curves:
    rates = []
    for _ in curve.tenors:
        rates.append(np.random.randint(0, 99, 100) * 0.005 * 0.02 + 0.0025)
    curve_rates.append(rates)

npvs = call_kernel(kernel, request, curve_args, curve_rates)

If that's not fast enough, aadc.evaluate also allows you to distribute calculations across local threads, further speeding up runtimes.


It's also not too hard to create a Python decorator that hides the complexity of compiling and calling the kernel from the API user. The decorator below can be applied to the price_portfolio function.

def with_aadc_kernel(func):
    _cache = {}

    def wrapper(swaps, curves, curve_rates):
        key = func.__name__
        if key not in _cache:
            # First call: record the kernel using placeholder rates.
            # Note the placeholders get their own name (rec_rates) so they
            # don't shadow the caller's curve_rates argument.
            kernel = aadc.Kernel()
            kernel.start_recording()
            rec_rates = []
            curve_args = []
            for curve in curves:
                zero_rates = aadc.array(np.zeros(len(curve.tenors)))
                rec_rates.append(zero_rates)
                curve_args.append(zero_rates.mark_as_input())

            res = func(swaps, curves, rec_rates)
            res_aadc = res.mark_as_output()
            request = {res_aadc: []}
            kernel.stop_recording()
            _cache[key] = (kernel, request, curve_args)

        # Subsequent calls: use cached kernel
        kernel, request, curve_args = _cache[key]
        inputs = {}
        for curve_arg, rates in zip(curve_args, curve_rates):
            for arg_point, rate in zip(curve_arg, rates):
                inputs[arg_point] = rate

        r = aadc.evaluate(kernel, request, inputs, aadc.ThreadPool(1))
        res_aadc = list(request.keys())[0]
        result = r[0][res_aadc]
        if len(result) == 1:
            result = float(result[0])
        return result

    return wrapper
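The compile-once, reuse-thereafter shape of this decorator can be illustrated without AADC at all; here the one-time "compilation" is replaced by a counter so the caching behaviour is visible (the names below are illustrative):

```python
def compile_once(func):
    _cache = {}

    def wrapper(*args, **kwargs):
        key = func.__name__
        if key not in _cache:
            # Expensive one-time setup (kernel recording) would happen here
            _cache[key] = {"setups": 0, "calls": 0}
            _cache[key]["setups"] += 1
        state = _cache[key]
        state["calls"] += 1
        return func(*args, **kwargs), state

    return wrapper

@compile_once
def square(x):
    return x * x

first_result, state = square(3)   # setup happens on this call only
second_result, state = square(4)  # reuses the cached state
```

One caveat of keying the cache on `func.__name__` alone, as both versions do: if the portfolio or curves change, the stale kernel would still be reused, so in practice you may want the cache key to incorporate the portfolio identity as well.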





Compile Time and Serialization

Kernel compilation incurs only modest overhead. Even including compilation time, you can compute 100 scenarios with AADC in roughly the same time it would take to compute 10 iterations using plain QuantLib.

Furthermore, the compiled kernel is fully serializable, meaning you can save time on swap and curve construction during subsequent runs. It also lets you precisely reproduce results on a different machine if needed.



Additional Resources

In addition to calculating the results of your function, AADC can automatically calculate gradients with respect to the inputs. For an explanation of this and more examples, see the Matlogica article linked above.

You can also see my full code sample here:
https://github.com/kozyarchuk/aadc_demo/blob/main/aadc_demo.py




