This example shows an NLREG program that fits a line in 3-dimensional
space to a set of data points.
This problem seems similar to what simple linear regression does: fit a
straight line to a set of data points. However, ordinary
linear regression minimizes the sum of the squared deviations between the
points and the line, and it defines the deviation as the distance
in the vertical (Y) direction. The problem we are going to solve in this
example minimizes the direct distance between the points and the line.
The direct distance is along a line that runs from the point and is
perpendicular to the target line. In the following figure, the
distance 'd' is the direct distance from the point at (X0,Y0,Z0) to the line.
The parametric equation for a 3D line is:
Xp = X0 + Vx*t
Yp = Y0 + Vy*t
Zp = Z0 + Vz*t
Where (X0,Y0,Z0) is some point on the line and
<Vx,Vy,Vz> is a vector defining the direction of the line.
t is the parameter whose value is varied to define points on the line.
With this definition, there are six parameters: X0, Y0, Z0, Vx, Vy, Vz.
But this overspecifies the line because a 3D line can be defined by 4 parameters
as long as it is not parallel to one of the X, Y or Z planes.
When fitting a function to data, it is important that there are no mutually
dependent (redundant) parameters in the function. If there are mutually
dependent parameters, then there is no unique solution, and the fitting
process will not converge.
So we need to eliminate two parameters.
Rather than allowing an arbitrary point (X0,Y0,Z0) to specify a point on
the line, we will force Z0 to be 0 and make X0 and Y0 be the coordinates on
the X-Y plane where the line penetrates the plane (i.e., where Z is zero).
This eliminates Z0 as a parameter that needs to be computed. We can do
this as long as we know that the line is not parallel to the X-Y plane, so
it intersects it at some point.
Next, we will work on the direction vector <Vx,Vy,Vz> that defines
the direction of the line. Scaling the direction vector by a non-zero
factor changes its length but not its direction (e.g., the direction
defined by the vector <1,2,3> is the same as <2,4,6>, but the
second vector is twice as long). If we scale the direction vector by 1/Vz
to force Vz to be 1, then we can define a revised direction vector,
<Vx2,Vy2,Vz2>
where Vx2 = Vx/Vz, Vy2 = Vy/Vz and Vz2 = 1
So we will force Vz to be 1 and define Vx and Vy as multiples of Vz.
This eliminates Vz as a parameter that needs to be computed. Note that this
is only valid if Vz is not zero which means the line is not parallel to the
X-Y plane. You can divide by Vx or Vy if you want to allow the line to be
parallel to the X-Y plane but not some other plane.
The NLREG statements for this analysis are as follows:
/*
* Fit a 3D line in parametric form to a set of points in (X,Y,Z) space.
*/
Title "Fit a 3D parametric line to a set of data points";
/*
* The input values are a set of (Xp,Yp, Zp) coordinates
* for each point to be fit.
*/
Variables Xp, Yp, Zp;
/*
* Parameters whose values are forced to be constant to avoid mutually
* dependent parameters.
*/
Constant Z0 = 0;
Constant Vz = 1;
/*
* So, the parameters remaining to be computed are:
*
* X0 = The X coordinate on the X-Y plane where the line penetrates.
* Y0 = The Y coordinate on the X-Y plane where the line penetrates.
* Vx = The change in X value relative to the change in Z for a change of 1 on the t parameter.
* Vy = The change in Y value relative to the change in Z for a change of 1 on the t parameter.
*/
Parameters X0, Y0, Vx, Vy;
/*
* Define some work variables.
*/
Double A, Dx, Dy, Dz, Distance, length;
Double Ux, Uy, Uz;
/*
* Compute a unit vector (Ux,Uy,Uz) with length = 1 for the direction.
* This normalizes the direction vector .
*/
length = sqrt(Vx^2 + Vy^2 + Vz^2);
Ux = Vx / length;
Uy = Vy / length;
Uz = Vz / length;
/*
* Compute the distance for a point (Xp,Yp,Zp) from the line.
*/
A = Ux*(Xp-X0) + Uy*(Yp-Y0) + Uz*(Zp-Z0);
Dx = (Xp-X0) - A*Ux;
Dy = (Yp-Y0) - A*Uy;
Dz = (Zp-Z0) - A*Uz;
Distance = sqrt(Dx^2 + Dy^2 + Dz^2);
/*
* The function to be minimized by NLREG is the sum of the squared
* distances of the points from the line:
*/
Function Distance;
/*
* Data values for the Xp,Yp,Zp points follow.
*/
Data;
[data values go here]
Note that there is no dependent variable or equal sign to the left
of the function. NLREG will determine the values of the
parameters X0, Y0, Vx and Vy such that the sum of the squared values
of the function (i.e., the sum of the squared distances) is
minimized.