ARMARARTAGCalibration

From DigitalBlacksmith

Calibrating ARMAR with ARTag

Overview

I am currently using two types of tracking with ARMAR:

  • InterSense 900/1200: This is used to position the camera (Valve Source Player) inside a scene. This can be thought of as "long range" or coarse tracking.
  • ARTag: This is used to position individual objects inside the camera's field of view

I am trying to use ARTag to help "back out" the inherent transform that exists between an InterSense 900/1200 tracker and a tracked object.

If you want to use information from the tracker to position the tracked objects in the real world, then you need to know exactly where the tracked object is in world coordinates. However, it's difficult to mount InterSense tracking sensors directly on tracked objects. Usually you have to mount the sensor on top of or near the tracked object. This imparts a small (but painful) transformation in the tracking. Examples:

  • Tracking a camera that is (a) providing video for a video see-through display and (b) positioning a virtual camera in a 3D scene.
  • Tracking your hand such that when a person's real hand moves to the location of a virtual object, a proper collision is detected.

My problem is the former. I have a CCD camera that is capturing video and displaying it on the back buffer of a 3D scene. I need the exact position of this CCD camera because the 3D scene "camera" (that is responsible for showing virtual content) must be aligned with the CCD camera for virtual content to appear coincident with real world counterparts.

The basic idea is to use ARTag to help estimate the transform between the camera center and the tracker. The short version: we place an ARTag fiducial at a known real world coordinate (p), then place two "virtual" markers in the 3D scene. The first marker is statically coded at the same coordinate (p) in the 3D scene. This virtual object gets its rendered position and orientation from the virtual camera, which is tied to the InterSense tracker. The second virtual marker gets its position and orientation from ARTag, which uses imagery from the camera that is mounted with the tracker.

Registration

I had the thought of using ARTag to help calibrate the IS900 registration. The issue at hand is that objects in my scene are not appearing precisely where I want them to on the video backbuffer. This is the classic registration problem in Augmented Reality. In my current set up, there is a transformation (call it matrix T) between the Intersense tracker and the camera. This is a result of how I have the tracker mounted near the camera. Obviously, you can't place the tracker on the focal plane of the camera, so this will always be the case.

To resolve this mysterious transformation T we can either physically measure it or estimate it with computer vision. Physical measurement will get us close, but requires very precise (mm scale) measurements of orientation and position.

I thought we might use ARTag to help. The idea is to place 2 copies of some virtual object m in the scene -- one tracked by ARTag and one that is registered (using the game engine map) at a known location in the scene, and thus is positioned by the view manager according to where the player is looking. The physical ARTag fiducial feeding ARTag is positioned at the same coordinates as the virtual map-registered object. In short:

  • ma - The ARTag tracked model. This model is positioned by taking the player's eye position and applying the model view matrix transform that is produced by ARTag. The fiducial is placed at some known point p in the environment.
  • mb - The map registered model. This model is added to the 3D scene map at the same location p as the physical ARTag fiducial.


Ideally, ma and mb would appear at the same point in the scene. That is, the ARTag model, which is linked to the player/camera eye through some transformation, would be in the same location as the rendered map-registered model. Let's designate m^{r}_{a} as the rendered point of the ARTag model and m^{r}_{b} as the rendered point of the map model. So we want:

m^{r}_{a} = m^{r}_{b}

For this to occur, the render transformation chain for each model would have to result in model pixels ma and mb (think of these as either individual pixels in identical models or identical sets of pixels in identical models) being generated at the same point at the end of the rendering.

Transformation Chains

Let's define the transformations:

  • S - The transformation coming from the Intersense tracker. Physical world space
  • T - The unknown transformation from the Intersense tracker to the camera/view space
  • V - The transformation of any point in the virtual world space to camera/view space
  • A - The transformation (Model View Matrix) of the ARTag marker in player space
  • Wb - The transformation of the map registered model from model space to virtual world space
  • P - The transformation of points from the player space to the virtual world space




Now let's identify the transformation chains.

Base ARTag model transformation chain

Here's the transformation of the ARTag model to view space:

m^{r}_{a} = m_{a} \times P \times A \times V

This chain takes the ARTag tracked model, rotates/translates it to player space (P), rotates/translates it along the player's line of sight (A), then positions it in view space for rendering (V).
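Because the chain is written with the point on the left (m x P x A x V), the matrices are applied left to right, and the order matters as soon as a rotation is involved. The following is a minimal sketch of that convention, assuming row vectors with the translation stored in the bottom row (the same layout as the matrices in the toy example on this page). The helper names are hypothetical and are not taken from the ARMAR code.

```python
import math

def translation(tx, ty, tz):
    """4x4 translation for row vectors (offset stored in the bottom row)."""
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [float(tx), float(ty), float(tz), 1.0]]

def rotation_z(degrees):
    """4x4 rotation about z, written for row vectors."""
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    return [[c, s, 0.0, 0.0],
            [-s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def apply(point, *matrices):
    """Compute point x M1 x M2 x ..., applying the matrices left to right."""
    for m in matrices:
        point = [sum(point[k] * m[k][j] for k in range(4)) for j in range(4)]
    return point

p = [1.0, 0.0, 0.0, 1.0]
rotate_then_move = apply(p, rotation_z(90), translation(15, 0, 0))
move_then_rotate = apply(p, translation(15, 0, 0), rotation_z(90))

print([round(x, 9) for x in rotate_then_move])  # [15.0, 1.0, 0.0, 1.0]
print([round(x, 9) for x in move_then_rotate])  # [0.0, 16.0, 0.0, 1.0]
```

The two results differ, which is why the order of the terms in each chain has to be preserved through the rest of the derivation.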

Base Map model transformation chain

Here's the transformation of the map registered pixel to view space:

m^{r}_{b} = m_{b} \times W_{b} \times V

This chain takes the map registered model, translates/rotates it into world space, then positions it in view space for rendering.

The Role of the InterSense Tracker

The Intersense tracker is responsible for changing the location of the player/camera. So the player/camera transform (P) is really the tracker transform concatenated with the unknown offset (T) between the tracker and the camera/player. So we rewrite P as:

P = T \times S

Additionally, the view space of the rendering is a function of the InterSense tracker. Because we are linking the player to the tracker, and the game engine prescribes that the player's view is the "camera" view, any change in the tracking (S) or the unknown offset (T) will change V.

So we rewrite V as:

V = T \times S
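With P and V both written as T x S, the two render chains can be evaluated side by side. This is a small sketch using made-up, translation-only matrices (they are not measurements from the real rig), with A chosen so that T x S x A equals Wb. Under that assumption both chains should land on the same rendered point.

```python
def translation(tx, ty, tz):
    """4x4 translation for row vectors (offset stored in the bottom row)."""
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [float(tx), float(ty), float(tz), 1.0]]

def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def apply(point, *matrices):
    """Compute point x M1 x M2 x ..., applying the matrices left to right."""
    for m in matrices:
        point = [sum(point[k] * m[k][j] for k in range(4)) for j in range(4)]
    return point

# Illustrative values only.
T = translation(0, 0, -1)     # tracker-to-camera offset (unknown in practice)
S = translation(2, 3, 1)      # tracker reading
W_b = translation(15, 0, 0)   # map model placement
A = translation(13, -3, 0)    # ARTag result, chosen so that T x S x A == W_b

P = matmul(T, S)
V = matmul(T, S)

m = [0.0, 0.0, 0.0, 1.0]      # the same model point is used for ma and mb
m_a_r = apply(m, P, A, V)     # ARTag chain
m_b_r = apply(m, W_b, V)      # map chain

print(m_a_r)  # [17.0, 3.0, 0.0, 1.0]
print(m_b_r)  # [17.0, 3.0, 0.0, 1.0]
```

If A is perturbed (for example by a marker detection error), the two printed points separate, which is the registration error seen on the back buffer.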

Solving the Matrix Chains

As mentioned above, we want:

m^{r}_{a} = m^{r}_{b}

Substituting our base chains:



m_{a}  \times P \times A \times V =  m_{b} \times W_{b} \times V



And expanding the V and P terms:



m_{a}  \times (T \times S) \times A \times T \times S =  m_{b} \times W_{b} \times T \times S



Let:



Z = (T \times S)



Then:



m_{a}  \times Z \times A \times Z \times Z^{-1} =  m_{b} \times W_{b} \times Z \times Z^{-1}




Simplifying:



m_{a} \times Z \times A  \times I =  m_{b} \times W_{b} \times I



And dropping identities and bringing back T and S:



m_{a}  \times T \times S  \times A  =  m_{b} \times W_{b}



OK... this makes sense. It says that a model placed through the player transform (remember: P = T x S) and then rotated/translated out to the ARTag fiducial (A) lands on the same point as the map model in world space.



Now, because the ARTag model and the map model share the same pixels:



m_{a} = m_{b}



So we drop these terms from both sides:



T \times S \times A =  W_{b}



Right-multiplying both sides by the inverse of A:



T \times S \times A \times A^{-1} =  W_{b} \times A^{-1}



Yields:



T \times S =  W_{b} \times A^{-1}



This also makes sense. It says the player/camera (remember P = T x S) should end up at the same location as if we started at the map model and applied the inverse of the rotation and translation that ARTag reports between the camera and the marker.



Right-multiplying both sides by the inverse of S:



T =  W_{b} \times A^{-1} \times S^{-1}



Using the inverse identity (GH)^{-1} = H^{-1}G^{-1}, we have A^{-1} \times S^{-1} = (S \times A)^{-1}, so:


T =  W_{b} \times (S \times A)^{-1}



OK... this also makes sense. We know that Wb is hard coded to the physical coordinates of the test marker. If we multiply that by the inverse of the ARTag matrix, that puts us at the camera's physical world coordinates (from ARTag, not InterSense). If we multiply this matrix by the inverse of the tracker (S) we remove the effects of the InterSense tracker, and are left with essentially the "difference" between where ARTag places the camera and where the InterSense tracker places it.
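The relationship T = Wb x A^-1 x S^-1 can be checked numerically. The sketch below assumes rigid transforms (rotation plus translation) in the row-vector layout used on this page and uses made-up values; solve_T and the other helper names are hypothetical and are not the CARMARCalibration code. It also checks that A^-1 x S^-1 is the inverse of (S x A) and not of (A x S), which matters once S contains a rotation.

```python
import math

def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def rigid(deg_z, tx, ty, tz):
    """Row-vector rigid transform: rotate about z by deg_z, then translate."""
    c = math.cos(math.radians(deg_z))
    s = math.sin(math.radians(deg_z))
    return [[c, s, 0.0, 0.0],
            [-s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [float(tx), float(ty), float(tz), 1.0]]

def rigid_inverse(m):
    """Inverse of [[R, 0], [t, 1]], which is [[R^T, 0], [-t R^T, 1]]."""
    rt = [[m[j][i] for j in range(3)] for i in range(3)]
    t = m[3][:3]
    neg = [-sum(t[k] * rt[k][j] for k in range(3)) for j in range(3)]
    return [rt[0] + [0.0], rt[1] + [0.0], rt[2] + [0.0], neg + [1.0]]

def solve_T(W_b, A, S):
    """T = W_b x A^-1 x S^-1 (row-vector convention)."""
    return matmul(matmul(W_b, rigid_inverse(A)), rigid_inverse(S))

def close(a, b, tol=1e-9):
    """True when two 4x4 matrices agree element-wise within tol."""
    return all(abs(a[i][j] - b[i][j]) < tol
               for i in range(4) for j in range(4))

# Made-up ground truth, used only to check the algebra.
T_true = rigid(0, 0, 0, -1)
S = rigid(90, 0, 0, 1)
W_b = rigid(0, 15, 0, 0)
# Synthesize the A that ARTag would report if T x S x A == W_b held exactly.
A = matmul(rigid_inverse(matmul(T_true, S)), W_b)

T_est = solve_T(W_b, A, S)
T_swapped = matmul(W_b, rigid_inverse(matmul(A, S)))  # inverse of (A x S)

print(close(T_est, T_true))      # True
print(close(T_swapped, T_true))  # False
```

With pure translations the two orderings give the same answer, so a translation-only test cannot reveal a swapped product.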

...Let's test this!

Testing

Toy Example

Suppose that the InterSense tracker was perfectly mounted coplanar with the focal plane of the camera, but is exactly 1 unit up from the focal center (or rather, the focal center is one unit down from the tracker):



T=\left[ \begin{array}{cccc}  1 & 0 & 0 & 0 \\  0 & 1 & 0 & 0 \\  0 & 0 & 1 & 0 \\  0 & 0 & -1 & 1 \end{array} \right]


So, if our equation for T is correct then the above matrix should be our result...

Now suppose that we had our calibration point mounted on a calibration pedestal at real world coordinates (15,0,0,1):

Note: I am using the homogeneous representation of a position vector (e.g. a point)



W_{b} = \left[ \begin{array}{cccc}  1 & 0 & 0 & 0 \\  0 & 1 & 0 & 0 \\  0 & 0 & 1 & 0 \\  15 & 0 & 0 & 1 \end{array} \right]



Let's assume that the InterSense tracker reports the position of the camera (the one detecting ARTag fiducials) as one unit up from the origin:



S=\left[ \begin{array}{cccc}  1 & 0 & 0 & 0 \\  0 & 1 & 0 & 0 \\  0 & 0 & 1 & 0 \\  0 & 0 & 1 & 1 \end{array} \right]



Let's assume that we have an ARTag fiducial on the calibration pedestal. There is no camera distortion, and the intrinsic parameters are known (perfectly). The ARTag Model View matrix reports the marker as exactly 15 units out along the camera FOV and with no rotation:



A=\left[ \begin{array}{cccc}  1 & 0 & 0 & 0 \\  0 & 1 & 0 & 0 \\  0 & 0 & 1 & 0 \\  15 & 0 & 0 & 1 \end{array} \right]



Using our equation, with the inverse of the product taken in the order (S \times A)^{-1} = A^{-1} \times S^{-1}:

T =  W_{b} \times (S \times A)^{-1}



We get:

T =     \left[ \begin{array}{cccc}  1 & 0 & 0 & 0 \\  0 & 1 & 0 & 0 \\  0 & 0 & 1 & 0 \\  15 & 0 & 0 & 1 \end{array} \right]    \left( \left[ \begin{array}{cccc}  1 & 0 & 0 & 0 \\  0 & 1 & 0 & 0 \\  0 & 0 & 1 & 0 \\  0 & 0 & 1 & 1 \end{array} \right]  \left[ \begin{array}{cccc}  1 & 0 & 0 & 0 \\  0 & 1 & 0 & 0 \\  0 & 0 & 1 & 0 \\  15 & 0 & 0 & 1 \end{array} \right]  \right)^{-1}



Which resolves to:




\left[ \begin{array}{cccc}  1 & 0 & 0 & 0 \\  0 & 1 & 0 & 0 \\  0 & 0 & 1 & 0 \\  0 & 0 & -1 & 1 \end{array} \right]



Which corresponds to our injected T.
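As a check on the arithmetic, the intermediate products can be written out by hand. All three matrices in this toy example are pure translations, so inverting one negates its bottom-row offsets and multiplying two adds their offsets:

A^{-1} \times S^{-1} = \left[ \begin{array}{cccc}  1 & 0 & 0 & 0 \\  0 & 1 & 0 & 0 \\  0 & 0 & 1 & 0 \\  -15 & 0 & -1 & 1 \end{array} \right]

W_{b} \times A^{-1} \times S^{-1} = \left[ \begin{array}{cccc}  1 & 0 & 0 & 0 \\  0 & 1 & 0 & 0 \\  0 & 0 & 1 & 0 \\  15 - 15 & 0 & 0 - 1 & 1 \end{array} \right] = \left[ \begin{array}{cccc}  1 & 0 & 0 & 0 \\  0 & 1 & 0 & 0 \\  0 & 0 & 1 & 0 \\  0 & 0 & -1 & 1 \end{array} \right]

Note that this example contains no rotation, so it only exercises the translation part of the equation.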

Now...need to test it in the application!

--Steve 23:47, 26 March 2008 (EDT)

Application Testing

I've coded up the relationship for T in a special CARMARCalibration object.

After a lot of tweaking, I finally have it close -- it generates a consistent T value, in the range of where I expect given crude measurements.

However, I am getting a stray Y offset that I didn't expect.

The problem appears to be in the rotation matrices, especially the one that is used to derive A. I'm not totally convinced that it's correct. I thought I could just reuse the one from the CARTagTrackedObject class.

Possible sources of errors include:

  • Bad Marker/Weak Toolbar: I am getting a weird canting of toolbar1 on the artag object that is used for this experiment.

I checked and rechecked the cf file, but I need to (a) check it again and (b) try other markers. Also, why am I only using a toolbar? Why not make a denser constellation so as to yield more accurate results?

  • Using pPlayer->EyeAngles() instead of AbsAngles.
  • Loading the A matrix (and setting modelViewMatrix to it).
  • In the A matrix I send to the calibration object, is that ARTag space, player space, or game world space?
  • The OpenGL to DirectX translation that I've used (successfully) in the artag object may not be the best way to go.

I basically manipulate the signs in the MVM coming from ARTag and swap things around. Why not try changing the coordinate system in the cf file to return DirectX coordinates? (This doesn't track in ARTag.) Or why not try removing the translation, rotating the coordinate frame, then reapplying the translation? Links: [1]
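One common way to do the sign manipulation in a single step is to conjugate the matrix with a z-flip. The sketch below is an assumption-laden illustration, not the conversion used in the artag object: it assumes the 16 model view values arrive in OpenGL's column-major order, that the target wants row vectors, and that the only handedness difference is the direction of z. The function name is hypothetical, and the game engine may need an additional axis remap that is not shown here.

```python
def gl_modelview_to_row_vector_lh(gl16):
    """Convert a column-major OpenGL matrix to row-vector form with z flipped.

    Reading the 16 column-major values row by row yields the transpose,
    which is the row-vector form of the same transform. Multiplying by
    F = diag(1, 1, -1, 1) on both sides then mirrors the z axis.
    """
    rows = [[float(v) for v in gl16[4 * i:4 * i + 4]] for i in range(4)]
    flip = [1.0, 1.0, -1.0, 1.0]
    return [[flip[i] * rows[i][j] * flip[j] for j in range(4)]
            for i in range(4)]

# A made-up OpenGL matrix: translation by (1, 2, 3), stored column-major.
gl_translation = [1, 0, 0, 0,
                  0, 1, 0, 0,
                  0, 0, 1, 0,
                  1, 2, 3, 1]

converted = gl_modelview_to_row_vector_lh(gl_translation)
print(converted[3])  # [1.0, 2.0, -3.0, 1.0]
```

Conjugating with the flip negates every element in the third row and third column except the shared diagonal entry, so rotations stay valid rotations instead of picking up a reflection.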

Verifying ARTag to Valve Matrix Chains

I decided to trace the transformations all the way back to ARTag and the OpenGL code.

Click here to follow my trek

Estimation of T

Once we are confident that the equation is good, we will use a least-squares regression to approximate the values of T.
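For the translation part of T, the least-squares estimate over several per-frame estimates is simply the per-axis mean, since the mean minimizes the sum of squared differences. The sketch below assumes each sample is a 4x4 row-vector matrix already computed from one (S, A) pair, and the sample values are invented for illustration. It does not attempt to average the rotation part, which needs more care than an element-wise mean.

```python
def translation(tx, ty, tz):
    """4x4 translation for row vectors (offset stored in the bottom row)."""
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [float(tx), float(ty), float(tz), 1.0]]

def mean_offset(estimates):
    """Least-squares translation: the per-axis mean of the bottom rows."""
    n = len(estimates)
    return [sum(m[3][axis] for m in estimates) / n for axis in range(3)]

def rms_residual(estimates, offset):
    """Root-mean-square distance of the samples from the fitted offset."""
    n = len(estimates)
    squared = sum((m[3][axis] - offset[axis]) ** 2
                  for m in estimates for axis in range(3))
    return (squared / n) ** 0.5

# Invented per-frame estimates of T scattered around (0, 0, -1).
samples = [translation(0.1, 0.0, -1.0),
           translation(-0.1, 0.05, -0.9),
           translation(0.0, -0.05, -1.1)]

offset = mean_offset(samples)
spread = rms_residual(samples, offset)
print([round(v, 6) for v in offset], round(spread, 6))
```

The residual gives a rough indication of how noisy the per-frame estimates are, which may help decide how many samples to collect.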

Something akin to this.

References

Yuko Uematsu, Hideo Saito - AR registration by merging multiple planar markers at arbitrary positions and poses via projective space
ICAT '05: Proceedings of the 2005 international conference on Augmented tele-existence pp. 48--55, New York, NY, USA, 2005

Kato, H., Billinghurst, M. - Marker tracking and HMD calibration for a video-based augmented reality conferencing system
Augmented Reality, 1999. (IWAR '99) Proceedings. 2nd IEEE and ACM International Workshop on pp. 85-94, 1999

Fuhrmann A.L., Splechtna R., Přikryl J. - Comprehensive Calibration and Registration Procedures for Augmented Reality
Proceedings of the Joint 5th Immersive Projection Technology and 7th Eurographics Virtual Environments Workshop (EGVE-01) pp. 219-228, 2001