Reproducing kernel Hilbert space

== Connection between RKHS with ReLU function ==
The [[Rectifier (neural networks)|ReLU function]] is commonly defined as <math>f(x)=\max (0, x)</math> and is a mainstay in the architecture of neural networks, where it is used as an activation function. Below we construct an RKHS using a ReLU-like nonlinear function, which illustrates the representation power of neural networks with ReLU activations.

We work with the Hilbert space <math> \mathcal{H} </math> of absolutely continuous functions <math> f:[0,\infty)\to\mathbb{R} </math> with <math> f(0)=0 </math> and square-integrable derivative, equipped with the inner product <math> \langle f,g \rangle_{\mathcal{H}} = \int_0^{\infty}f'(x)g'(x) dx </math>. By the [[Fundamental theorem of calculus|Fundamental Theorem of Calculus]], every such function can be recovered from its derivative: <math> f(t)=\int_0^t f'(x) dx </math>.
 
We start by introducing the indicator function <math> G(x,t) = \begin{cases} 1, & \text{if }0\leq x<t \\ 0, & \text{otherwise}\end{cases} </math> and considering its running integral <math>K_t(x)</math>, which satisfies <math>K_t(0)=0</math> and <math>K_t'(x)=G(x,t)</math>: <math>K_t(x)=\int\limits_{0}^{x} G(z,t)dz=\begin{cases}
x, & \text{if } 0\leq x<t\\
t, & \text{otherwise}
\end{cases}=\min(x, t)</math>.
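As a sanity check, the construction can be verified numerically. The following Python sketch (step counts and tolerances are arbitrary choices, not part of the construction) confirms that the running integral of <math>G(\cdot,t)</math> equals <math>\min(x,t)</math>, and that the resulting kernel produces nonnegative quadratic forms, since <math>\textstyle\sum_{i,j} c_i c_j \min(x_i,x_j) = \int_0^\infty \big(\sum_i c_i G(z,x_i)\big)^2 dz</math>:

```python
import random

def G(x, t):
    """Indicator: 1 if 0 <= x < t, else 0."""
    return 1.0 if 0 <= x < t else 0.0

def K(x, t):
    """K_t(x) = min(x, t), the closed form of the running integral of G(., t)."""
    return min(x, t)

# Check K_t(x) against a midpoint Riemann sum of G(., t) on [0, x].
t, x, n = 2.0, 3.5, 100000
riemann = sum(G((i + 0.5) * x / n, t) * x / n for i in range(n))
integral_ok = abs(riemann - K(x, t)) < 1e-3
print(integral_ok)  # True

# Positive semidefiniteness: for any points x_i and coefficients c_i,
# sum_ij c_i c_j min(x_i, x_j) is the integral of a square, hence >= 0.
random.seed(0)
psd_ok = True
for _ in range(100):
    xs = [random.uniform(0.1, 10.0) for _ in range(5)]
    cs = [random.uniform(-1.0, 1.0) for _ in range(5)]
    q = sum(ci * cj * K(xi, xj)
            for ci, xi in zip(cs, xs) for cj, xj in zip(cs, xs))
    psd_ok = psd_ok and q >= -1e-12
print(psd_ok)  # True
```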
 
Now if we take <math>t\rightarrow \infty</math>, we have <math>K_\infty(x)=\begin{cases}
x, & \text{if } x\geq 0\\
0, & \text{otherwise}
\end{cases}</math>,
which can be compared to the ReLU function <math>
\operatorname{ReLU}(x) = \begin{cases} x, & \text{if }x>0 \\ 0, & \text{otherwise}\end{cases}</math>,
with which it agrees on <math>[0,\infty)</math>.

More generally, every <math>K_t</math> is a ramp function, and can be written in terms of ReLU via the identity <math>\min(x,t)=x-\operatorname{ReLU}(x-t)</math>.
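Each <math>K_t</math> is a ramp expressible through ReLU via the identity <math>\min(x,t)=x-\operatorname{ReLU}(x-t)</math> (if <math>x\geq t</math>, the right side is <math>x-(x-t)=t</math>; otherwise it is <math>x-0=x</math>). A quick numerical check on an arbitrary grid of nonnegative points:

```python
def relu(x):
    """ReLU(x) = max(0, x)."""
    return max(0.0, x)

# Verify min(x, t) = x - ReLU(x - t) on a grid of nonnegative points.
identity_ok = all(
    abs(min(x / 10.0, t / 10.0) - (x / 10.0 - relu(x / 10.0 - t / 10.0))) < 1e-12
    for x in range(100) for t in range(100)
)
print(identity_ok)  # True
```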
 
We can then show that <math>K_t</math> is a reproducing kernel for <math>\mathcal{H}</math>. The kernel <math>K(x,t)=\min(x,t)</math> is positive definite: every Gram matrix it generates is positive semidefinite. The reproducing property follows from the [[Fundamental theorem of calculus|Fundamental Theorem of Calculus]]: for all <math> t\in [0,\infty) </math> and <math> f \in \mathcal{H} </math>,

<math> f(t)=\int_{0}^{t} f'(x) dx=\int_{0}^{\infty} G(x,t)f'(x) dx = \int_{0}^{\infty} K_{t}'(x)f'(x) dx= \langle K_t,f \rangle_{\mathcal{H}} </math>.
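The reproducing property can also be checked numerically. The sketch below uses the hypothetical test function <math>f(x)=1-e^{-x}</math>, which lies in the space since <math>f(0)=0</math> and <math>f'(x)=e^{-x}</math> is square-integrable; the evaluation point and step count are arbitrary:

```python
import math

def G(x, t):
    """Indicator: 1 if 0 <= x < t, else 0 (this equals K_t'(x))."""
    return 1.0 if 0 <= x < t else 0.0

f = lambda x: 1.0 - math.exp(-x)   # f(0) = 0
fp = lambda x: math.exp(-x)        # f' is square-integrable on [0, inf)

# <K_t, f>_H = integral_0^inf K_t'(x) f'(x) dx; since K_t'(x) = G(x, t)
# vanishes for x >= t, the integral reduces to [0, t]. Midpoint rule:
t, n = 1.7, 100000
inner = sum(G((i + 0.5) * t / n, t) * fp((i + 0.5) * t / n) * t / n
            for i in range(n))
reproduces = abs(inner - f(t)) < 1e-8
print(reproduces)  # True
```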
Thus <math>\mathcal{H}</math> is the RKHS associated with the kernel <math>K(x,t)=\min(x,t)</math>. Using this formulation, we can find the norm-minimizing function in this Hilbert space that fits given data, which corresponds to the optimum when training a neural network with ReLU activation.
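The minimization claim can be illustrated with kernel interpolation: by the representer theorem, the minimum-norm interpolant in this space has the form <math>\textstyle f^*(x)=\sum_i c_i \min(x,x_i)</math>, a piecewise-linear function, i.e. a sum of ReLU-style ramps. A minimal sketch with made-up data points:

```python
# Data to interpolate (made-up illustrative values).
xs = [1.0, 2.0, 4.0]
ys = [0.5, 1.5, 1.0]
n = len(xs)

# Representer theorem: the minimum-norm interpolant is
# f*(x) = sum_i coef_i * min(x, x_i), with coef solving the Gram system
# sum_j min(x_i, x_j) * coef_j = y_i. Solve by Gaussian elimination.
A = [[min(xi, xj) for xj in xs] for xi in xs]
b = ys[:]
for k in range(n):
    p = max(range(k, n), key=lambda r: abs(A[r][k]))  # partial pivoting
    A[k], A[p] = A[p], A[k]
    b[k], b[p] = b[p], b[k]
    for r in range(k + 1, n):
        m = A[r][k] / A[k][k]
        for j in range(k, n):
            A[r][j] -= m * A[k][j]
        b[r] -= m * b[k]
coef = [0.0] * n
for k in range(n - 1, -1, -1):
    coef[k] = (b[k] - sum(A[k][j] * coef[j] for j in range(k + 1, n))) / A[k][k]

fstar = lambda x: sum(c * min(x, xi) for c, xi in zip(coef, xs))
interpolates = all(abs(fstar(xi) - yi) < 1e-9 for xi, yi in zip(xs, ys))
print(interpolates)  # True
```

Since <math>\min(x,x_i)=x-\operatorname{ReLU}(x-x_i)</math>, the interpolant is exactly a one-hidden-layer ReLU network plus a linear term.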
== See also ==
*[[Positive definite kernel]]