While the objective (linearized improvement) is geometrically meaningful, the Euclidean constraint <math>\|\theta_{t+1} - \theta_t\| \leq \epsilon</math> introduces coordinate dependence. To address this, the natural policy gradient replaces the Euclidean constraint with a [[Kullback–Leibler divergence]] (KL) constraint:<math display="block">\begin{cases}
\max_{\theta_{t+1}} J(\theta_t) + (\theta_{t+1} - \theta_t)^T \nabla_\theta J(\theta_t) \\
\bar{D}_{KL}(\pi_{\theta_{t+1}} \| \pi_{\theta_t}) \leq \epsilon
\end{cases}</math>where the KL divergence between two policies is '''averaged''' over the state distribution under policy <math>\pi_{\theta_t}</math>. That is,<math display="block">\bar{D}_{KL}(\pi_{\theta_{t+1}} \| \pi_{\theta_t}) := \mathbb E_{s \sim \pi_{\theta_t}}[D_{KL}( \pi_{\theta_{t+1}}(\cdot | s) \| \pi_{\theta_t}(\cdot | s) )]</math> This ensures updates are invariant to invertible affine parameter transformations.
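In practice the averaged KL term is estimated by Monte Carlo over states visited under the current policy. A minimal sketch, assuming discrete-action policies represented by per-state action logits (the function names and array shapes here are illustrative, not from the article):

```python
import numpy as np

def softmax(logits):
    """Convert action logits to probabilities, row-wise."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def averaged_kl(logits_new, logits_old):
    """Monte Carlo estimate of the averaged KL divergence
    E_{s ~ pi_old}[ D_KL( pi_new(.|s) || pi_old(.|s) ) ],
    given action logits of both policies at states sampled under pi_old.
    logits_* have shape (num_states, num_actions)."""
    p_new = softmax(logits_new)
    p_old = softmax(logits_old)
    # Per-state KL, then empirical average over the sampled states.
    kl_per_state = (p_new * (np.log(p_new) - np.log(p_old))).sum(axis=-1)
    return kl_per_state.mean()
```

A trust-region step would accept a candidate <math>\theta_{t+1}</math> only if this estimate stays below the chosen <math>\epsilon</math>; when both policies coincide the estimate is zero, and it is nonnegative otherwise.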