Reproducing kernel Hilbert space

== Connection between RKHS with ReLU function ==
The [[Rectifier (neural networks)|ReLU function]] is commonly defined as <math>f(x)=\max (0, x)</math> and is a mainstay in the architecture of neural networks, where it is used as an activation function. Below we construct an RKHS using a ReLU-like nonlinear function, which illustrates the representation power of neural networks with ReLU activations.

We work with the Hilbert space <math> \mathcal{H} </math> of absolutely continuous functions <math> f:[0,\infty)\to\mathbb{R} </math> with <math> f(0)=0 </math> and square-integrable derivative, equipped with the inner product <math> \langle f,g \rangle_{\mathcal{H}} = \int_0^{\infty}f'(x)g'(x) dx </math>. By the [[Fundamental theorem of calculus|Fundamental Theorem of Calculus]], every such function can be recovered from its derivative: <math> f(t)=\int_0^t f'(x) dx </math>.
 
We start by introducing the indicator function <math> G(x,t) = \begin{cases} 1, & \text{if }0\leq x<t \\ 0, & \text{otherwise}\end{cases} </math> and considering its running integral <math>K_t(x)</math>, which satisfies <math>K_t(0)=0</math> and <math>K_t'(x)=G(x,t)</math>: <math>K_t(x)=\int\limits_{0}^{x} G(z,t)dz=\begin{cases}
x, & \text{if } 0\leq x<t\\
t, & \text{otherwise}
\end{cases}=\min(x, t)</math>.
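As a sanity check, the construction can be verified numerically. The following Python sketch (step counts and tolerances are arbitrary choices, not part of the construction) confirms that the running integral of <math>G(\cdot,t)</math> equals <math>\min(x,t)</math>, and that the resulting kernel produces nonnegative quadratic forms, since <math>\textstyle\sum_{i,j} c_i c_j \min(x_i,x_j) = \int_0^\infty \big(\sum_i c_i G(z,x_i)\big)^2 dz</math>:

```python
import random

def G(x, t):
    """Indicator: 1 if 0 <= x < t, else 0."""
    return 1.0 if 0 <= x < t else 0.0

def K(x, t):
    """K_t(x) = min(x, t), the closed form of the running integral of G(., t)."""
    return min(x, t)

# Check K_t(x) against a midpoint Riemann sum of G(., t) on [0, x].
t, x, n = 2.0, 3.5, 100000
riemann = sum(G((i + 0.5) * x / n, t) * x / n for i in range(n))
integral_ok = abs(riemann - K(x, t)) < 1e-3
print(integral_ok)  # True

# Positive semidefiniteness: for any points x_i and coefficients c_i,
# sum_ij c_i c_j min(x_i, x_j) is the integral of a square, hence >= 0.
random.seed(0)
psd_ok = True
for _ in range(100):
    xs = [random.uniform(0.1, 10.0) for _ in range(5)]
    cs = [random.uniform(-1.0, 1.0) for _ in range(5)]
    q = sum(ci * cj * K(xi, xj)
            for ci, xi in zip(cs, xs) for cj, xj in zip(cs, xs))
    psd_ok = psd_ok and q >= -1e-12
print(psd_ok)  # True
```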
 
Now if we take <math>t\rightarrow \infty</math>, we have <math>K_\infty(x)=\begin{cases}
x, & \text{if } x\geq 0\\
0, & \text{otherwise}
\end{cases}</math>,
which can be compared to the ReLU function <math>
\operatorname{ReLU}(x) = \begin{cases} x, & \text{if }x>0 \\ 0, & \text{otherwise}\end{cases}</math>,
with which it agrees on <math>[0,\infty)</math>.

More generally, every <math>K_t</math> is a ramp function, and can be written in terms of ReLU via the identity <math>\min(x,t)=x-\operatorname{ReLU}(x-t)</math>.
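Each <math>K_t</math> is a ramp expressible through ReLU via the identity <math>\min(x,t)=x-\operatorname{ReLU}(x-t)</math> (if <math>x\geq t</math>, the right side is <math>x-(x-t)=t</math>; otherwise it is <math>x-0=x</math>). A quick numerical check on an arbitrary grid of nonnegative points:

```python
def relu(x):
    """ReLU(x) = max(0, x)."""
    return max(0.0, x)

# Verify min(x, t) = x - ReLU(x - t) on a grid of nonnegative points.
identity_ok = all(
    abs(min(x / 10.0, t / 10.0) - (x / 10.0 - relu(x / 10.0 - t / 10.0))) < 1e-12
    for x in range(100) for t in range(100)
)
print(identity_ok)  # True
```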
 
We can then show that <math>K_t</math> is a reproducing kernel for <math>\mathcal{H}</math>. The kernel <math>K(x,t)=\min(x,t)</math> is positive definite: every Gram matrix it generates is positive semidefinite. The reproducing property follows from the [[Fundamental theorem of calculus|Fundamental Theorem of Calculus]]: for all <math> t\in [0,\infty) </math> and <math> f \in \mathcal{H} </math>,

<math> f(t)=\int_{0}^{t} f'(x) dx=\int_{0}^{\infty} G(x,t)f'(x) dx = \int_{0}^{\infty} K_{t}'(x)f'(x) dx= \langle K_t,f \rangle_{\mathcal{H}} </math>.
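The reproducing property can also be checked numerically. The sketch below uses the hypothetical test function <math>f(x)=1-e^{-x}</math>, which lies in the space since <math>f(0)=0</math> and <math>f'(x)=e^{-x}</math> is square-integrable; the evaluation point and step count are arbitrary:

```python
import math

def G(x, t):
    """Indicator: 1 if 0 <= x < t, else 0 (this equals K_t'(x))."""
    return 1.0 if 0 <= x < t else 0.0

f = lambda x: 1.0 - math.exp(-x)   # f(0) = 0
fp = lambda x: math.exp(-x)        # f' is square-integrable on [0, inf)

# <K_t, f>_H = integral_0^inf K_t'(x) f'(x) dx; since K_t'(x) = G(x, t)
# vanishes for x >= t, the integral reduces to [0, t]. Midpoint rule:
t, n = 1.7, 100000
inner = sum(G((i + 0.5) * t / n, t) * fp((i + 0.5) * t / n) * t / n
            for i in range(n))
reproduces = abs(inner - f(t)) < 1e-8
print(reproduces)  # True
```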
Thus <math>\mathcal{H}</math> is the RKHS associated with the kernel <math>K(x,t)=\min(x,t)</math>. Using this formulation, we can find the norm-minimizing function in this Hilbert space that fits given data, which corresponds to the optimum when training a neural network with ReLU activation.
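The minimization claim can be illustrated with kernel interpolation: by the representer theorem, the minimum-norm interpolant in this space has the form <math>\textstyle f^*(x)=\sum_i c_i \min(x,x_i)</math>, a piecewise-linear function, i.e. a sum of ReLU-style ramps. A minimal sketch with made-up data points:

```python
# Data to interpolate (made-up illustrative values).
xs = [1.0, 2.0, 4.0]
ys = [0.5, 1.5, 1.0]
n = len(xs)

# Representer theorem: the minimum-norm interpolant is
# f*(x) = sum_i coef_i * min(x, x_i), with coef solving the Gram system
# sum_j min(x_i, x_j) * coef_j = y_i. Solve by Gaussian elimination.
A = [[min(xi, xj) for xj in xs] for xi in xs]
b = ys[:]
for k in range(n):
    p = max(range(k, n), key=lambda r: abs(A[r][k]))  # partial pivoting
    A[k], A[p] = A[p], A[k]
    b[k], b[p] = b[p], b[k]
    for r in range(k + 1, n):
        m = A[r][k] / A[k][k]
        for j in range(k, n):
            A[r][j] -= m * A[k][j]
        b[r] -= m * b[k]
coef = [0.0] * n
for k in range(n - 1, -1, -1):
    coef[k] = (b[k] - sum(A[k][j] * coef[j] for j in range(k + 1, n))) / A[k][k]

fstar = lambda x: sum(c * min(x, xi) for c, xi in zip(coef, xs))
interpolates = all(abs(fstar(xi) - yi) < 1e-9 for xi, yi in zip(xs, ys))
print(interpolates)  # True
```

Since <math>\min(x,x_i)=x-\operatorname{ReLU}(x-x_i)</math>, the interpolant is exactly a one-hidden-layer ReLU network plus a linear term.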
== See also ==
*[[Positive definite kernel]]