The transformations are in general nonlinear and are estimated iteratively from the data.
== Mathematical description ==
Let <math>Y,X_1,\dots,X_p</math> be random variables. We use <math>X_1,\dots,X_p</math> to predict <math>Y</math>. Suppose <math>\theta(Y),\varphi_1(X_1),\dots,\varphi_p(X_p)</math> are mean-zero transformations of these variables. The fraction of the variance of <math>\theta(Y)</math> not explained by <math>\sum_{i=1}^p \varphi_i(X_i)</math> is then
: <math> e^2(\theta,\varphi_1,\dots,\varphi_p)=\frac{E\left[\left(\theta(Y)-\sum_{i=1}^p \varphi_i(X_i)\right)^2\right]}{E\left[\theta^2(Y)\right]}</math>
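In a sample the expectations can be replaced by averages. The Python sketch below (the function name <code>unexplained_fraction</code> is illustrative, and the transformed values are assumed to be already evaluated and centred) computes this sample version of <math>e^2</math>, the quantity ACE seeks to minimize subject to <math>E\theta^2(Y)=1</math>.

```python
def unexplained_fraction(theta, phis):
    """Sample version of e^2.

    theta: list of theta(y_k) values, k = 1..n
    phis:  list of p lists, phis[i][k] = phi_i(x_ik)
    All transformed values are assumed to be centred (mean zero).
    The ratio of sums below equals the ratio of sample means.
    """
    n = len(theta)
    residual_ss = 0.0
    for k in range(n):
        fit = sum(phi[k] for phi in phis)   # sum_{i=1}^p phi_i(X_i) at observation k
        residual_ss += (theta[k] - fit) ** 2
    theta_ss = sum(t * t for t in theta)    # sample analogue of E[theta^2(Y)]
    return residual_ss / theta_ss


# Perfect additive fit: theta = phi_1 + phi_2, so nothing is left unexplained.
print(unexplained_fraction([1.0, -1.0, 2.0, -2.0],
                           [[1.0, -1.0, 0.0, 0.0], [0.0, 0.0, 2.0, -2.0]]))  # 0.0
# Zero transformations of the predictors explain none of the variance.
print(unexplained_fraction([1.0, -1.0, 2.0, -2.0], [[0.0] * 4]))  # 1.0
```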
In the bivariate case, the ACE algorithm can also be regarded as a method for estimating the maximal correlation between two variables.
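For two discrete variables the conditional expectations are exact group means, so the bivariate iteration can be sketched directly: alternate <math>\varphi(X)\leftarrow E[\theta(Y)\mid X]</math> and <math>\theta(Y)\leftarrow E[\varphi(X)\mid Y]/\|E[\varphi(X)\mid Y]\|</math>, then report the correlation of the two transformations. This Python sketch assumes hashable sample values and uses a fixed number of iterations instead of a convergence test; the function names are illustrative.

```python
import math
from collections import defaultdict


def cond_mean(values, groups):
    """E[values | groups] for a discrete conditioning variable (group means)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [sums[g] / counts[g] for g in groups]


def standardize(v):
    """Centre to mean zero and scale to unit (population) variance."""
    m = sum(v) / len(v)
    c = [a - m for a in v]
    sd = math.sqrt(sum(a * a for a in c) / len(c))
    return [a / sd for a in c] if sd > 0 else [0.0] * len(v)


def maximal_correlation(x, y, iters=50):
    """Bivariate ACE with exact conditional expectations."""
    theta = standardize(y)                       # start with standardized Y
    for _ in range(iters):
        phi = cond_mean(theta, x)                # phi(X) = E[theta(Y) | X]
        theta = standardize(cond_mean(phi, y))   # theta(Y) = E[phi(X) | Y], renormalized
    phi = standardize(cond_mean(theta, x))
    return sum(t * f for t, f in zip(theta, phi)) / len(theta)  # corr(theta, phi)


x = [-2, -1, 0, 1, 2] * 20
y = [v * v for v in x]   # Y = X^2: uncorrelated with X, yet a function of X
print(round(maximal_correlation(x, y), 6))  # 1.0
```

Here the ordinary (Pearson) correlation between <math>X</math> and <math>Y</math> is zero because the values of <math>X</math> are symmetric about zero, while the estimated maximal correlation is 1: <math>\theta(Y)</math> is standardized <math>Y</math> and <math>\varphi(X)</math> is standardized <math>X^2</math>.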
== Software implementation ==
The ACE algorithm was developed in the context of known distributions. In practice, data distributions are seldom known, so the conditional expectations must be estimated from the data, typically with a scatterplot smoother. The [[R language]] package <kbd>acepack</kbd> implements the ACE algorithm. The following example shows its usage:
plot(a$x,a$tx) # view the carrier transformation
plot(a$tx,a$ty) # examine the linearity of the fitted model
== Discussion ==
The ACE algorithm provides a fully automated method for estimating optimal transformations in multiple regression. It also provides a method for estimating the maximal correlation between random variables. Since the iteration usually terminates after a small number of passes, the time complexity of the algorithm is <math>O(np)</math>, where <math>n</math> is the number of samples and <math>p</math> is the number of predictors. The algorithm is therefore reasonably efficient computationally.
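A simplified, data-based version of the iteration for <math>p</math> predictors is sketched below in Python. An inner backfitting loop updates each <math>\varphi_j</math> by smoothing the partial residual <math>\theta-\sum_{k\ne j}\varphi_k</math> against <math>X_j</math>; an outer step updates <math>\theta</math> by smoothing <math>\sum_j\varphi_j</math> against <math>Y</math> and rescaling to unit variance. As an assumption made for brevity, a fixed-width running mean over neighbours in sorted order replaces the super smoother of the original implementation; each variable is sorted once, so every smoothing pass is linear in <math>n</math>. Names and default parameters are illustrative.

```python
import math


def running_mean(values, order, half_width):
    """Estimate E[values | v] by averaging `values` over the 2*half_width+1
    nearest neighbours in `order`, the precomputed sort order of v."""
    n = len(values)
    prefix = [0.0]
    for i in order:                       # prefix sums in sorted order
        prefix.append(prefix[-1] + values[i])
    out = [0.0] * n
    for rank, i in enumerate(order):
        lo, hi = max(0, rank - half_width), min(n, rank + half_width + 1)
        out[i] = (prefix[hi] - prefix[lo]) / (hi - lo)
    return out


def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def standardize(v):
    """Mean zero, unit (population) variance; assumes v is not constant."""
    c = center(v)
    sd = math.sqrt(sum(a * a for a in c) / len(c))
    return [a / sd for a in c]


def ace(xs, y, outer_iters=20, inner_iters=5, half_width=5, tol=1e-8):
    """xs: list of p predictor columns; y: response column.
    Returns (theta, phis, e2) evaluated at the sample points."""
    n, p = len(y), len(xs)
    x_orders = [sorted(range(n), key=col.__getitem__) for col in xs]  # sort once
    y_order = sorted(range(n), key=y.__getitem__)
    theta = standardize(y)
    phis = [[0.0] * n for _ in range(p)]
    e2 = math.inf
    for _ in range(outer_iters):
        for _ in range(inner_iters):
            # Backfitting: phi_j <- E[theta - sum_{k != j} phi_k | X_j]
            for j in range(p):
                partial = [theta[i] - sum(phis[k][i] for k in range(p) if k != j)
                           for i in range(n)]
                phis[j] = center(running_mean(partial, x_orders[j], half_width))
        fit = [sum(phi[i] for phi in phis) for i in range(n)]
        # theta <- E[sum_j phi_j | Y], rescaled so that mean(theta^2) = 1
        theta = standardize(running_mean(fit, y_order, half_width))
        new_e2 = sum((t - f) ** 2 for t, f in zip(theta, fit)) / n
        converged = abs(e2 - new_e2) < tol
        e2 = new_e2
        if converged:
            break
    return theta, phis, e2


grid = [i / 100 - 1 for i in range(201)]          # symmetric grid on [-1, 1]
theta, phis, e2 = ace([grid], [g * g for g in grid])
print(e2 < 0.05)  # True
```

With <math>Y=X^2</math> on a symmetric grid the Pearson correlation of <math>X</math> and <math>Y</math> is zero, yet the returned <math>e^2</math> is small because the estimated transformations absorb the quadratic relationship. The half-width of the running mean plays the role of the smoother span.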
A strong advantage of the ACE procedure is its ability to incorporate variables of quite different types in terms of the set of values they can assume. The transformation functions <math>\theta(y), \varphi_i(x_i)</math> take values on the real line, but their arguments can take values in any set. For example, ordered real variables and unordered categorical variables can be incorporated in the same regression equation; variables of mixed type are admissible.
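Only the conditional-expectation step depends on the type of a variable. The hypothetical Python sketch below runs the inner backfitting loop for a fixed <math>\theta</math> and picks a smoother per predictor: group means for an unordered categorical predictor (any hashable labels) and a running mean for an ordered real one. In both cases the resulting <math>\varphi_j</math> values are real numbers, so a categorical predictor is effectively assigned one real score per category. The smoother choices and the synthetic data are assumptions for illustration.

```python
import math
from collections import defaultdict


def group_mean(values, column):
    """E[values | column] for an unordered categorical column."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, c in zip(values, column):
        sums[c] += v
        counts[c] += 1
    return [sums[c] / counts[c] for c in column]


def running_mean(values, column, half_width=4):
    """E[values | column] for an ordered real column: mean over sorted neighbours."""
    order = sorted(range(len(column)), key=column.__getitem__)
    prefix = [0.0]
    for i in order:
        prefix.append(prefix[-1] + values[i])
    out = [0.0] * len(values)
    for rank, i in enumerate(order):
        lo, hi = max(0, rank - half_width), min(len(values), rank + half_width + 1)
        out[i] = (prefix[hi] - prefix[lo]) / (hi - lo)
    return out


SMOOTHERS = {"categorical": group_mean, "ordered": running_mean}


def backfit(theta, columns, kinds, sweeps=10):
    """Fit theta ~ sum_j phi_j(X_j), choosing the smoother by variable type."""
    n = len(theta)
    phis = [[0.0] * n for _ in columns]
    for _ in range(sweeps):
        for j, (col, kind) in enumerate(zip(columns, kinds)):
            partial = [theta[i] - sum(phis[k][i] for k in range(len(columns)) if k != j)
                       for i in range(n)]
            smoothed = SMOOTHERS[kind](partial, col)
            m = sum(smoothed) / n
            phis[j] = [s - m for s in smoothed]   # keep each phi_j mean-zero
    return phis


# Synthetic mixed-type data: a colour label plus a real-valued x.
n = 300
colour = [["red", "green", "blue"][i % 3] for i in range(n)]
x = [i / n for i in range(n)]
score = {"red": 0.0, "green": 2.0, "blue": 5.0}
y = [score[c] + v for c, v in zip(colour, x)]
m = sum(y) / n
sd = math.sqrt(sum((v - m) ** 2 for v in y) / n)
theta = [(v - m) / sd for v in y]
phi_colour, phi_x = backfit(theta, [colour, x], ["categorical", "ordered"])
print({c: round(phi_colour[i], 3) for i, c in enumerate(["red", "green", "blue"])})
```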
As a tool for data analysis, the ACE procedure provides graphical output that indicates whether transformations are needed and guides their choice. If a particular plot suggests a familiar functional form for a transformation, the data can be pre-transformed using this functional form and the ACE algorithm rerun.
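Recognising a familiar form can also be done numerically. The hypothetical helper below (not part of ACE or <kbd>acepack</kbd>) scores a few candidate functions <math>f</math> by the absolute correlation between an estimated transformation <math>\varphi(x)</math> and <math>f(x)</math>; a score close to 1 suggests that <math>\varphi</math> is nearly affine in <math>f</math>, making <math>f</math> a candidate pre-transformation. The candidate list is an arbitrary assumption, the logarithm, square-root and reciprocal candidates require positive <math>x</math>, and <code>phi_hat</code> is synthetic rather than an actual ACE output.

```python
import math


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


# Candidate functional forms (assumed list; log, sqrt, reciprocal need x > 0).
CANDIDATES = {
    "identity": lambda t: t,
    "log": math.log,
    "sqrt": math.sqrt,
    "square": lambda t: t * t,
    "reciprocal": lambda t: 1.0 / t,
}


def rank_forms(x, phi_values, candidates=CANDIDATES):
    """Return (name, |corr(phi(x), f(x))|) pairs, best match first."""
    scores = {name: abs(pearson(phi_values, [f(t) for t in x]))
              for name, f in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


x = [float(k) for k in range(1, 51)]
phi_hat = [2.0 * math.log(t) - 3.0 for t in x]   # synthetic stand-in for an estimated phi
print(rank_forms(x, phi_hat)[0][0])  # log
```

If the best score is close to 1, <math>x</math> could be replaced by <math>f(x)</math> before ACE is rerun, as described above.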
As with any regression procedure, a high degree of association between predictor variables can sometimes make the individual transformation estimates highly variable, even though the complete model is reasonably stable. When this is suspected, running the algorithm on randomly selected subsets of the data or on bootstrap samples can help assess this variability.
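A bootstrap check of this kind can be sketched as follows. To keep the Python example self-contained, the bivariate discrete version of ACE serves as a stand-in estimator of a single transformation <math>\varphi</math>; in practice the full ACE fit would be rerun on every resample. Because <math>(\theta,\varphi)</math> is determined only up to a common sign, each fit is sign-aligned with <math>Y</math> (assumed numeric) before the per-category spread is computed. The function names, the number of resamples and the seed are illustrative.

```python
import math
import random
import statistics
from collections import defaultdict


def group_means(values, groups):
    """Map each group label to the mean of `values` within that group."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}


def standardize(v):
    m = sum(v) / len(v)
    sd = math.sqrt(sum((a - m) ** 2 for a in v) / len(v))
    return [(a - m) / sd for a in v] if sd > 0 else [0.0] * len(v)


def phi_by_category(x, y, iters=20):
    """Stand-in estimator: bivariate ACE for discrete x and y,
    returning the estimated transformation phi as {category: value}."""
    theta = standardize(y)
    for _ in range(iters):
        phi = group_means(theta, x)                   # phi(X) = E[theta(Y) | X]
        back = group_means([phi[g] for g in x], y)    # E[phi(X) | Y]
        theta = standardize([back[v] for v in y])     # rescale to unit variance
    if sum(t * v for t, v in zip(theta, y)) < 0:      # fix the arbitrary common sign
        theta = [-t for t in theta]
    return group_means(theta, x)


def bootstrap_spread(x, y, n_boot=200, seed=0):
    """Standard deviation of each category's phi estimate across bootstrap resamples."""
    rng = random.Random(seed)
    n = len(x)
    draws = defaultdict(list)
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]    # resample rows with replacement
        est = phi_by_category([x[i] for i in idx], [y[i] for i in idx])
        for g, value in est.items():
            draws[g].append(value)
    return {g: statistics.stdev(vals) for g, vals in draws.items()}


x = [-2, -1, 0, 1, 2] * 20
y = [v * v for v in x]
print(bootstrap_spread(x, y))
```

In this noise-free example the spread comes only from resampling fluctuations in the mean and variance used to standardize <math>\theta</math>; large spreads for some categories would indicate unstable transformation estimates.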