Deep backward stochastic differential equation method

Introduction

Deep BSDE (deep backward stochastic differential equation) is a numerical method that combines deep learning with backward stochastic differential equations (BSDEs). The method is particularly useful for solving high-dimensional problems in financial derivatives pricing and risk management. By using deep neural networks as function approximators, deep BSDE addresses the computational challenges that traditional numerical methods face in high-dimensional settings^[1].

History

Backward stochastic differential equations were introduced by Jean-Michel Bismut in 1973 in the linear case^[2]. In the 1990s, Étienne Pardoux and Shige Peng established the existence and uniqueness theory for solutions of nonlinear BSDEs, and BSDEs were applied to financial mathematics and control theory. For instance, BSDEs have been widely used in option pricing, risk measurement, and dynamic hedging.

Deep learning is a machine learning method based on multilayer neural networks. Its core concept can be traced back to the neural computing models of the 1940s. In the 1980s, the backpropagation algorithm made the training of multilayer neural networks practical. In 2006, the deep belief networks proposed by Geoffrey Hinton and others rekindled interest in deep learning. Since then, deep learning has driven major advances in image processing, speech recognition, natural language processing, and other fields.

As financial problems become more complex, traditional numerical methods for BSDEs (such as the Monte Carlo method, finite difference method, etc.) have shown limitations such as high computational complexity and the curse of dimensionality.

In high-dimensional scenarios, the Monte Carlo method requires numerous simulation paths to ensure accuracy, resulting in lengthy computation times. In particular, for nonlinear BSDEs, the convergence rate is slow, making it challenging to handle complex financial derivative pricing problems.
The finite difference method, on the other hand, experiences exponential growth in the number of computation grids with increasing dimensions, leading to significant computational and storage demands. This method is generally suitable for simple boundary conditions and low-dimensional BSDEs, but it is less effective in complex situations.
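The exponential growth of grid size can be made concrete with a little arithmetic. The following is a minimal sketch, assuming a uniform grid with the same number of points along every axis and 8 bytes per stored value; the function name `grid_points` and the choice of 100 points per axis are illustrative assumptions, not part of any cited method.

```python
def grid_points(points_per_axis: int, dim: int) -> int:
    """Number of nodes in a uniform tensor-product grid (illustrative helper)."""
    return points_per_axis ** dim

# Assumption: 100 points per axis, one 8-byte float stored per node.
for dim in (1, 2, 3, 5, 10):
    n = grid_points(100, dim)
    print(f"d={dim:2d}: {n:.3e} nodes, about {n * 8 / 1e9:.3e} GB")
```

Already at ten dimensions the node count is far beyond what can be stored, which is the practical meaning of the curse of dimensionality for grid-based schemes.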

The combination of deep learning with BSDEs, known as deep BSDE, was proposed by Han, Jentzen, and E in 2018 as a solution to the high-dimensional challenges faced by traditional numerical methods^[1]. The Deep BSDE approach leverages the powerful nonlinear fitting capabilities of deep learning, approximating the solution of BSDEs by constructing neural networks. The specific idea is to represent the solution of a BSDE as the output of a neural network and train the network to approximate the solution.

Model

Mathematical Method

Backward stochastic differential equations (BSDEs) are a mathematical tool applied extensively in stochastic control, financial mathematics, and related fields. Unlike traditional stochastic differential equations (SDEs), which are solved forward in time from an initial condition, BSDEs are solved backward, starting from a condition at a future time and moving back to the present.

This unique characteristic makes BSDEs particularly suitable for problems involving terminal conditions and uncertainties^[3].

Fix a terminal time $T>0$ and a probability space $(\Omega ,{\mathcal {F}},\mathbb {P} )$ . Let $(B_{t})_{t\in [0,T]}$ be a Brownian motion with natural filtration $({\mathcal {F}}_{t})_{t\in [0,T]}$ . A backward stochastic differential equation is an integral equation of the type

$$Y_{t}=\xi +\int _{t}^{T}f(s,Y_{s},Z_{s})\,\mathrm {d} s-\int _{t}^{T}Z_{s}\,\mathrm {d} B_{s},\quad t\in [0,T].\qquad (1)$$

In this equation:

$f:[0,T]\times \mathbb {R} \times \mathbb {R} \to \mathbb {R}$ is called the generator of the BSDE.
$\xi$ is an ${\mathcal {F}}_{T}$-measurable random variable, the terminal condition specified at time $T$.
$(Y_{t},Z_{t})_{t\in [0,T]}$ is the solution process, a pair of stochastic processes $(Y_{t})_{t\in [0,T]}$ and $(Z_{t})_{t\in [0,T]}$ that are adapted to the filtration $({\mathcal {F}}_{t})_{t\in [0,T]}$.
$(B_{s})_{s\in [0,T]}$ is the standard Brownian motion introduced above.

The goal is to find adapted processes $Y_{t}$ and $Z_{t}$ that satisfy this equation. Traditional numerical methods struggle with BSDEs due to the curse of dimensionality, which makes computations in high-dimensional spaces extremely challenging.
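Written in differential form, the integral equation reads $\mathrm{d}Y_{t}=-f(t,Y_{t},Z_{t})\,\mathrm{d}t+Z_{t}\,\mathrm{d}B_{t}$ with $Y_{T}=\xi$, so on a time grid one step is approximately $Y_{t_{n+1}}\approx Y_{t_{n}}-f(t_{n},Y_{t_{n}},Z_{t_{n}})\Delta t+Z_{t_{n}}\Delta B_{n}$. The sketch below illustrates this Euler step under strong simplifying assumptions: a one-dimensional Brownian motion, a known $Z$, and the trivial case $f=0$, $\xi =B_{T}$, whose solution is $Y_{t}=B_{t}$, $Z_{t}=1$. The function name `euler_bsde_forward` and all parameter values are hypothetical.

```python
import numpy as np

def euler_bsde_forward(y0, z_fn, f, T=1.0, n_steps=100, n_paths=5, seed=0):
    """Roll Y forward along simulated Brownian paths, given Y_0 and a rule for Z."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    B = np.zeros(n_paths)              # Brownian motion, B_0 = 0
    Y = np.full(n_paths, float(y0))    # candidate initial value Y_0
    for n in range(n_steps):
        t = n * dt
        dB = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        Z = z_fn(t, B)
        # Euler step of dY = -f dt + Z dB
        Y = Y - f(t, Y, Z) * dt + Z * dB
        B = B + dB
    return Y, B

# Trivial test case (assumption): f = 0 and xi = B_T, so Y_t = B_t and Z_t = 1.
Y_T, B_T = euler_bsde_forward(
    y0=0.0,
    z_fn=lambda t, b: np.ones_like(b),
    f=lambda t, y, z: np.zeros_like(y),
)
print(np.max(np.abs(Y_T - B_T)))  # mismatch with the terminal condition
```

With the correct $Y_{0}$ and $Z$, the simulated $Y_{T}$ matches the terminal condition on every path. In a genuine problem both are unknown, and the mismatch at time $T$ is what a numerical method has to drive to zero.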

Neural Network Architecture

Figure: Neural network framework of the deep BSDE method.

Deep learning encompasses a class of machine learning techniques that have transformed numerous fields by enabling the modeling and interpretation of intricate data structures. These methods are distinguished by a hierarchical architecture comprising multiple layers of interconnected nodes, or neurons. This architecture allows deep neural networks to learn abstract representations of data autonomously, making them particularly effective in tasks such as image recognition, natural language processing, and financial modeling^[4]. The core of the deep BSDE method lies in designing an appropriate neural network structure (such as fully connected networks or recurrent neural networks) and selecting effective optimization algorithms.

The choice of deep BSDE network architecture, the number of layers, and the number of neurons per layer are crucial hyperparameters that significantly impact the performance of the deep BSDE method. The deep BSDE method constructs neural networks to approximate the solutions for $Y$ and $Z$ , and utilizes stochastic gradient descent and other optimization algorithms for training^[1].

The figure illustrates the network architecture for the deep BSDE method. Note that $\nabla u(t_{n},X_{t_{n}})$ denotes the variable approximated directly by subnetworks, and $u(t_{n},X_{t_{n}})$ denotes the variable computed iteratively in the network. There are three types of connections in this network:

i) $X_{t_{n}}\rightarrow h_{1}^{n}\rightarrow h_{2}^{n}\rightarrow \ldots \rightarrow h_{H}^{n}\rightarrow \nabla u(t_{n},X_{t_{n}})$ is the multilayer feedforward neural network approximating the spatial gradients at time $t=t_{n}$ . The weights $\theta _{n}$ of this subnetwork are the parameters optimized.

ii) $(u(t_{n},X_{t_{n}}),\nabla u(t_{n},X_{t_{n}}),W_{t_{n+1}}-W_{t_{n}})\rightarrow u(t_{n+1},X_{t_{n+1}})$ is the forward iteration providing the final output of the network as an approximation of $u(t_{N},X_{t_{N}})$ , characterized by Eqs. 5 and 6. There are no parameters optimized in this type of connection.

iii) $(X_{t_{n}},W_{t_{n+1}}-W_{t_{n}})\rightarrow X_{t_{n+1}}$ is the shortcut connecting blocks at different times, characterized by Eqs. 4 and 6. There are also no parameters optimized in this type of connection.
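The three connection types can be sketched as a single forward pass. This is a minimal NumPy illustration, not the reference implementation: it assumes the forward process is a scaled Brownian motion $X_{t_{n+1}}=X_{t_{n}}+\sigma (W_{t_{n+1}}-W_{t_{n}})$, uses one small tanh subnetwork per time step (including $t_{0}$), and takes a hypothetical generator `f` and terminal function `g`. All names (`init_subnet`, `subnet`, `deep_bsde_forward`) and sizes are assumptions, and no training step is shown.

```python
import numpy as np

def init_subnet(rng, d, hidden):
    """Random weights theta_n for one subnetwork (one hidden layer, assumption)."""
    return {"W1": rng.normal(0, 1 / np.sqrt(d), (d, hidden)), "b1": np.zeros(hidden),
            "W2": rng.normal(0, 1 / np.sqrt(hidden), (hidden, d)), "b2": np.zeros(d)}

def subnet(theta, x):
    """Connection (i): X_{t_n} -> hidden layer -> approximation of grad u(t_n, X_{t_n})."""
    h = np.tanh(x @ theta["W1"] + theta["b1"])
    return h @ theta["W2"] + theta["b2"]

def deep_bsde_forward(u0, thetas, x0, f, g, sigma, T, n_paths, rng):
    n_steps, d = len(thetas), x0.shape[0]
    dt = T / n_steps
    X = np.tile(x0, (n_paths, 1))          # all paths start at x0
    u = np.full(n_paths, float(u0))        # u(t_0, X_0), a trainable scalar
    for n in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), (n_paths, d))
        Z = sigma * subnet(thetas[n], X)   # sigma * grad u at (t_n, X_{t_n})
        # Connection (ii): forward iteration of u, no trainable parameters
        u = u - f(n * dt, X, u, Z) * dt + np.sum(Z * dW, axis=1)
        # Connection (iii): shortcut X_{t_n} -> X_{t_{n+1}}, no trainable parameters
        X = X + sigma * dW
    loss = np.mean((u - g(X)) ** 2)        # mismatch with the terminal condition
    return u, X, loss

rng = np.random.default_rng(0)
d, n_steps = 10, 20
thetas = [init_subnet(rng, d, hidden=16) for _ in range(n_steps)]
u_T, X_T, loss = deep_bsde_forward(
    u0=1.0, thetas=thetas, x0=np.zeros(d),
    f=lambda t, x, y, z: -0.05 * y,             # hypothetical linear generator
    g=lambda x: np.sum(x ** 2, axis=1),         # hypothetical terminal function
    sigma=1.0, T=1.0, n_paths=256, rng=rng)
print(loss)
```

In a full implementation the scalar `u0` and the weights in `thetas` would be updated by a stochastic gradient method to minimize `loss`, typically inside an automatic differentiation framework.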

Algorithms

Adam Algorithm

function ADAM( $\alpha$ , $\beta _{1}$ , $\beta _{2}$ , $\epsilon$ , ${\mathcal {G}}(\theta )$ , $\theta _{0}$ ) is

    // This function implements the Adam optimization algorithm
    // for minimizing the target function  ${\mathcal {G}}(\theta )$ .

     $m_{0}:=0$  // Initialize the first moment vector
     $v_{0}:=0$  // Initialize the second moment vector
     $t:=0$    // Initialize timestep

    // Step 1: Initialize parameters
     $\theta _{t}:=\theta _{0}$ 

    // Step 2: Optimization loop
    while  $\theta _{t}$  has not converged do
         $t:=t+1$ 
         $g_{t}:=\nabla _{\theta }{\mathcal {G}}_{t}(\theta _{t-1})$  // Compute gradient of  ${\mathcal {G}}$  at timestep  $t$ 
         $m_{t}:=\beta _{1}\cdot m_{t-1}+(1-\beta _{1})\cdot g_{t}$  // Update biased first moment estimate
         $v_{t}:=\beta _{2}\cdot v_{t-1}+(1-\beta _{2})\cdot g_{t}^{2}$  // Update biased second raw moment estimate
         ${\widehat {m}}_{t}:={\frac {m_{t}}{(1-\beta _{1}^{t})}}$  // Compute bias-corrected first moment estimate
         ${\widehat {v}}_{t}:={\frac {v_{t}}{(1-\beta _{2}^{t})}}$  // Compute bias-corrected second moment estimate
         $\theta _{t}:=\theta _{t-1}-{\frac {\alpha \cdot {\widehat {m}}_{t}}{({\sqrt {{\widehat {v}}_{t}}}+\epsilon )}}$  // Update parameters
    
    return  $\theta _{t}$
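The pseudocode above translates almost line for line into NumPy. The sketch below is illustrative only: it replaces the convergence test with a fixed number of steps, and the quadratic test objective, the step size, and the function name `adam` are assumptions chosen for the example.

```python
import numpy as np

def adam(grad, theta0, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8, n_steps=1000):
    """Adam update loop; `grad(theta)` returns the gradient of the objective."""
    theta = np.asarray(theta0, dtype=float).copy()
    m = np.zeros_like(theta)   # first moment vector
    v = np.zeros_like(theta)   # second moment vector
    for t in range(1, n_steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g          # biased first moment estimate
        v = beta2 * v + (1 - beta2) * g ** 2     # biased second raw moment estimate
        m_hat = m / (1 - beta1 ** t)             # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)             # bias-corrected second moment
        theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta

# Hypothetical objective G(theta) = ||theta - target||^2 with gradient 2 (theta - target).
target = np.array([1.0, -2.0])
theta_star = adam(lambda th: 2.0 * (th - target), theta0=np.zeros(2),
                  alpha=0.05, n_steps=2000)
print(theta_star)
```

In the deep BSDE setting the gradient would come from a mini-batch of simulated paths, so `grad` would be stochastic rather than exact as in this toy objective.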

Backpropagation Algorithm for Multilayer Feedforward Neural Networks

function BackPropagation(set $D=\left\{(\mathbf {x} _{k},\mathbf {y} _{k})\right\}_{k=1}^{m}$ ) is

    // This function implements the backpropagation algorithm
    // for training a multi-layer feedforward neural network.

    // Step 1: Random initialization
    // Step 2: Optimization loop
    repeat until termination condition is met:
        for each  $(\mathbf {x} _{k},\mathbf {y} _{k})\in D$ :
             ${\hat {\mathbf {y} }}_{k}:=f(\beta _{j}-\theta _{j})$  // Compute output
            // Compute gradients
            for each output neuron  $j$ :
                 $g_{j}:={\hat {y}}_{j}^{k}(1-{\hat {y}}_{j}^{k})(y_{j}^{k}-{\hat {y}}_{j}^{k})$  // Gradient term of output neuron (negative error gradient, so the updates below descend)
            for each hidden neuron  $h$ :
                 $e_{h}:=b_{h}(1-b_{h})\sum _{j=1}^{\ell }w_{hj}g_{j}$  // Gradient of hidden neuron
            // Update weights
            for each weight  $w_{hj}$ :
                 $\Delta w_{hj}:=\eta g_{j}b_{h}$  // Update rule for weight
            for each weight  $v_{ih}$ :
                 $\Delta v_{ih}:=\eta e_{h}x_{i}$  // Update rule for weight
            // Update parameters
            for each parameter  $\theta _{j}$ :
                 $\Delta \theta _{j}:=-\eta g_{j}$  // Update rule for parameter
            for each parameter  $\gamma _{h}$ :
                 $\Delta \gamma _{h}:=-\eta e_{h}$  // Update rule for parameter

    // Step 3: Construct the trained multi-layer feedforward neural network

    return trained neural network
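A small NumPy version of this procedure for a network with one hidden layer and sigmoid activations is sketched below. It takes $g_{j}={\hat {y}}_{j}(1-{\hat {y}}_{j})(y_{j}-{\hat {y}}_{j})$, the negative gradient of the squared error with respect to the output pre-activation, so that the listed update rules reduce the error. The network size, learning rate, number of epochs, and the logical-OR training set are assumptions made for the illustration, as is the function name `train_backprop`.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_backprop(X, Y, n_hidden=4, eta=0.5, n_epochs=2000, seed=0):
    rng = np.random.default_rng(seed)
    d, l = X.shape[1], Y.shape[1]
    # Step 1: random initialization of weights (v, w) and thresholds (gamma, theta)
    v = rng.uniform(-1, 1, (d, n_hidden)); gamma = rng.uniform(-1, 1, n_hidden)
    w = rng.uniform(-1, 1, (n_hidden, l)); theta = rng.uniform(-1, 1, l)

    def predict(x):
        b = sigmoid(x @ v - gamma)          # hidden outputs b_h
        return sigmoid(b @ w - theta)       # network outputs y_hat_j

    def mean_error():
        return 0.5 * np.mean(np.sum((predict(X) - Y) ** 2, axis=1))

    error_before = mean_error()
    # Step 2: optimization loop, one update per training example
    for _ in range(n_epochs):
        for x, y in zip(X, Y):
            b = sigmoid(x @ v - gamma)
            y_hat = sigmoid(b @ w - theta)
            g = y_hat * (1 - y_hat) * (y - y_hat)   # output-layer term g_j
            e = b * (1 - b) * (w @ g)               # hidden-layer term e_h
            w += eta * np.outer(b, g)               # Delta w_hj = eta g_j b_h
            v += eta * np.outer(x, e)               # Delta v_ih = eta e_h x_i
            theta -= eta * g                        # Delta theta_j = -eta g_j
            gamma -= eta * e                        # Delta gamma_h = -eta e_h
    return predict, error_before, mean_error()

# Toy data set (assumption): the logical OR of two binary inputs.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
Y = np.array([[0.0], [1.0], [1.0], [1.0]])
predict, error_before, error_after = train_backprop(X, Y)
print(error_before, error_after)
```

In practice these derivatives are produced by automatic differentiation in a deep learning framework rather than coded by hand.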

Numerical Solution for Optimal Investment Portfolio

function OptimalInvestment( $W_{t_{i+1}}-W_{t_{i}}$ , $x$ , $\theta =(X_{0},H_{0},\theta _{1},\theta _{2},\dots ,\theta _{N-1})$ ) is

    // This function calculates the optimal investment portfolio using
    // the specified parameters and stochastic processes.

    // Step 1: Initialization
    for  $k:=0$  to maxstep do
         $M_{0}^{k,m}:=0$ ,  $X_{0}^{k,m}:=X_{0}^{k}$  // Parameter initialization
        for  $i:=0$  to  $N-1$  do
             $H_{t_{i}}^{k,m}:={\mathcal {NN}}(M_{t_{i}}^{k,m};\theta _{i}^{k})$  // Update feedforward neural network unit
             $M_{t_{i+1}}^{k,m}:=M_{t_{i}}^{k,m}+{\big (}(1-\phi )(\mu _{t_{i}}-M_{t_{i}}^{k,m}){\big )}(t_{i+1}-t_{i})+\sigma _{t_{i}}(W_{t_{i+1}}-W_{t_{i}})$ 
             $X_{t_{i+1}}^{k,m}:=X_{t_{i}}^{k,m}+{\big [}H_{t_{i}}^{k,m}(\phi (M_{t_{i}}^{k,m}-\mu _{t_{i}})+\mu _{t_{i}}){\big ]}(t_{i+1}-t_{i})+H_{t_{i}}^{k,m}(W_{t_{i+1}}-W_{t_{i}})$ 
        // Step 2: Compute loss function
         ${\mathcal {L}}(t):={\frac {1}{M}}\sum _{m=1}^{M}\left|X_{t_{N}}^{k,m}-g(M_{t_{N}}^{k,m})\right|^{2}$ 
        // Step 3: Update parameters using ADAM optimization
         $\theta ^{k+1}:=\operatorname {ADAM} (\theta ^{k},\nabla {\mathcal {L}}(t))$ 
         $X_{0}^{k+1}:=\operatorname {ADAM} (X_{0}^{k},\nabla {\mathcal {L}}(t))$ 

    // Step 4: Return terminal state
    return  $(M_{t_{N}},X_{t_{N}})$
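The inner simulation loop and the loss of the pseudocode above can be sketched in NumPy as follows. This is an illustration under simplifying assumptions: $\mu$ and $\sigma$ are taken as constants, the time grid is uniform, the neural network unit is replaced by an arbitrary callable `policy(i, M)`, and the parameter update by ADAM is omitted. The function name `simulate_portfolio`, the terminal function `g`, and all numerical values are hypothetical.

```python
import numpy as np

def simulate_portfolio(policy, x0, mu, sigma, phi, g,
                       T=1.0, n_steps=50, n_paths=1000, seed=0):
    """Simulate (M, X) on a uniform grid and return the terminal states and the loss."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    M = np.zeros(n_paths)               # M_0 = 0
    X = np.full(n_paths, float(x0))     # X_0 = x0 on every path
    for i in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        H = policy(i, M)                # stands in for NN(M_{t_i}; theta_i)
        M_next = M + (1 - phi) * (mu - M) * dt + sigma * dW
        X = X + H * (phi * (M - mu) + mu) * dt + H * dW
        M = M_next
    loss = np.mean((X - g(M)) ** 2)     # Monte Carlo estimate of the terminal mismatch
    return M, X, loss

# Sanity check (assumption): holding nothing (H = 0) leaves X at its initial value.
M_T, X_T, loss = simulate_portfolio(
    policy=lambda i, m: np.zeros_like(m),
    x0=1.0, mu=0.05, sigma=0.2, phi=0.5,
    g=lambda m: np.ones_like(m))
print(loss)
```

Replacing `policy` with a trainable network and minimizing `loss` over its weights and the initial value recovers the structure of the training loop in the pseudocode.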

Application

Deep BSDE is widely used in the fields of financial derivatives pricing, risk management, and asset allocation. It is particularly suitable for:

High-Dimensional Option Pricing: Pricing complex derivatives like basket options and Asian options, which involve multiple underlying assets^[1].
Risk Measurement: Calculating risk measures such as Conditional Value-at-Risk (CVaR) and Expected Shortfall (ES)^[5].
Dynamic Asset Allocation: Determining optimal strategies for asset allocation over time in a stochastic environment^[5].

Advantages and Disadvantages

Advantages

High-Dimensional Capability: Compared to traditional numerical methods, deep BSDE performs exceptionally well in high-dimensional problems.
Flexibility: The incorporation of deep neural networks allows this method to adapt to various types of BSDEs and financial models.
Parallel Computing: Deep learning frameworks support GPU acceleration, significantly improving computational efficiency^[1]^[5].

Disadvantages

Training Time: Training deep neural networks typically requires substantial data and computational resources.
Parameter Sensitivity: The choice of neural network architecture and hyperparameters greatly impacts the results, often requiring experience and trial-and-error^[1]^[5].

External Links

Deep Learning for High-Dimensional PDEs (https://arxiv.org/abs/1707.02568)
Backward Stochastic Differential Equations (https://www.math.ku.dk/english/research/conferences/2018/bsde/)
Curse of Dimensionality - Scholarpedia (http://www.scholarpedia.org/article/Curse_of_dimensionality)

References

[1] Han, J.; Jentzen, A.; E, W. (2018). "Solving high-dimensional partial differential equations using deep learning". Proceedings of the National Academy of Sciences. 115 (34): 8505–8510.
[2] Bismut, Jean-Michel (1973). "Conjugate convex functions in optimal stochastic control". Journal of Mathematical Analysis and Applications. 44 (2): 384–404. doi:10.1016/0022-247X(73)90066-8.
[3] Pardoux, E.; Peng, S. (1990). "Adapted solution of a backward stochastic differential equation". Systems & Control Letters. 14 (1): 55–61.
[4] LeCun, Y.; Bengio, Y.; Hinton, G. (2015). "Deep learning". Nature. 521 (7553): 436–444.
[5] Beck, C.; E, W.; Jentzen, A. (2019). "Machine learning approximation algorithms for high-dimensional fully nonlinear partial differential equations and second-order backward stochastic differential equations". Journal of Nonlinear Science. 29 (4): 1563–1619.
