Monday, April 8, 2013

Barriers to triangulation in video

I've been doing a little work here and there. I figure it's time to provide an update, even though I haven't reached the milestones I was pursuing.

New Landmarks

I created a new table setup with new landmarks, with an emphasis on the area over the table. Mr. W suggested that using the legs under the table was not helping, because it's the area over the table where we need accuracy, and panning the camera down to see the legs removed our view of the playing area above the table. Both good observations.

So now I have lots of points over the table. I used some more objects from around the house and either rested them on the table or stood them up along its sides. Most importantly, I am making use of the other half of the table to provide a backdrop with interesting features that I can measure. Here is the left image of the new landmarks, with the points marked.

[Image: left-camera view of the new landmark setup, with the marked points]

I was hoping that this setup would improve my reprojection error. In a previous post, I believe I said that I had the reprojection error down to 1.5 cm or something like that. I now think I was mistaken. It was more like 3.5 cm when using cameras far apart, which ought to have triangulation advantages.

Unfortunately, my reprojection error is not better. I'm now getting about 4 cm average error, which is fairly significant. Knowing the ball's location to within 4 cm is not that reassuring.

The first possible explanation is the limited resolution of the images: I'm only marking landmarks to the nearest pixel, and many of the new landmarks are farther away from the cameras, meaning that a single pixel covers more space.
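To put rough numbers on that (purely illustrative, since I haven't characterized the cameras): with a 640-pixel-wide image and a 60° horizontal field of view, the focal length works out to about 640 / (2·tan 30°) ≈ 554 pixels, so at 3 m away a single pixel spans roughly 3000 mm / 554 ≈ 5.4 mm. A half-pixel marking error in each camera can therefore easily grow to centimeters after triangulation, especially where the viewing angles are shallow.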

The second possible explanation is that my real-world measurements might not be accurate enough. Some of these new landmarks are not as easy to measure. For example, for the vertical cardboard boxes, I was able to accurately measure the bottom, outside, near corner of the box, and then measure the height and width of the box. In a perfect world, that would locate the top landmark well. In an imperfect world, the box may be leaning one way or the other. Likewise, the vertical metal posts jutting up from the floor along the edge of the table might not be perfectly vertical; they might be leaning in or out, forward or back.

My best idea to overcome these problems is to run a numerical optimization that adjusts the x,y image locations I've selected to subpixel accuracy, and even adjusts the x,y,z locations of the uncertain landmarks (the table corners are exact by definition, but the others are error-prone measurements). That's a fair bit of effort, and it would make use of a non-free library that I use for work... meaning my code would no longer be open source and shareable.
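For concreteness, here is a minimal sketch of the residual such an optimization would minimize, assuming 3x4 projection matrices for each camera. All the names are hypothetical, and the optimizer itself (the non-free part) is not shown:

```cpp
#include <opencv2/core/core.hpp>
#include <vector>

// Sketch: given current estimates of the uncertain landmarks' 3D positions,
// project them through each camera's 3x4 projection matrix (CV_64F) and
// compare against the pixels I marked. An optimizer would adjust the 3D
// points to shrink the sum of squared residuals.
std::vector<double> reprojectionResiduals(
    const std::vector<cv::Point3d>& landmarks,     // current 3D estimates
    const std::vector<cv::Point2d>& markedLeft,    // marked pixels, left camera
    const std::vector<cv::Point2d>& markedRight,   // marked pixels, right camera
    const cv::Mat& Pleft, const cv::Mat& Pright)   // 3x4 projection matrices
{
    std::vector<double> residuals;
    for (size_t i = 0; i < landmarks.size(); ++i) {
        cv::Mat X = (cv::Mat_<double>(4, 1) <<
                     landmarks[i].x, landmarks[i].y, landmarks[i].z, 1.0);
        for (int cam = 0; cam < 2; ++cam) {
            const cv::Mat& P = cam == 0 ? Pleft : Pright;
            const cv::Point2d& obs = cam == 0 ? markedLeft[i] : markedRight[i];
            cv::Mat x = P * X;                     // homogeneous projection
            residuals.push_back(x.at<double>(0) / x.at<double>(2) - obs.x);
            residuals.push_back(x.at<double>(1) / x.at<double>(2) - obs.y);
        }
    }
    return residuals;
}
```

The subpixel half of the idea would add the marked 2D points as free parameters alongside the uncertain 3D points.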

Video Sync

The next big problem I encountered is the synchronization of the left and right cameras. As I've said before, I need to know how far apart the left and right images are in time so that I can interpolate points before doing triangulation. Last time I wrote about this, I didn't know how I was going to solve it, so I was briefly quite happy when I found that OpenCV will provide a timestamp for each frame as you read a video file, using the VideoCapture::get(CV_CAP_PROP_POS_MSEC) method.
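Reading them is straightforward; this is roughly what I'm doing (a minimal sketch, with "left.avi" as an illustrative filename):

```cpp
#include <opencv2/highgui/highgui.hpp>
#include <cstdio>

int main()
{
    cv::VideoCapture cap("left.avi");
    if (!cap.isOpened()) return 1;

    cv::Mat frame;
    for (int n = 0; ; ++n) {
        // Position of the video in ms, i.e. the time of the frame about
        // to be decoded.
        double msec = cap.get(CV_CAP_PROP_POS_MSEC);
        if (!cap.read(frame)) break;
        std::printf("frame %d: %.1f ms\n", n, msec);
    }
    return 0;
}
```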

My joy was short-lived, however, when I found that the timestamps don't seem to be accurate. The timestamps are milliseconds since video start, not an absolute time, so the offset between the starting times of the two videos needs to be found. I was trying to calculate that offset by identifying the frames/timestamps before and after six ball bounces in the videos; a ball bounce is an easy event to synchronize on. Unfortunately, the offset at each of those six bounces was different, drifting over time: it started at 1403 ms and ended at 1475 ms. That 72 ms of drift is more than two frame intervals at a typical 30 fps, far too much to ignore.

I think the timestamps are simply computed by applying the average frame rate to the frame index, rather than by actually measuring time, and the real frame rate isn't steady enough for that to work. Said another way, I think OpenCV assumes the frames are evenly spaced in time, but in reality there is jitter in the frame rate. This would mean that the timestamps are essentially useless for synchronizing.
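A quick way to test that theory (the same sketch as above, extended): compare each reported timestamp against frame_index * 1000 / fps. If the two always agree exactly, the timestamps carry no real timing information.

```cpp
#include <opencv2/highgui/highgui.hpp>
#include <cstdio>

int main()
{
    cv::VideoCapture cap("left.avi");              // illustrative filename again
    if (!cap.isOpened()) return 1;

    double fps = cap.get(CV_CAP_PROP_FPS);         // container's average frame rate
    if (fps <= 0) return 1;

    cv::Mat frame;
    for (int n = 0; ; ++n) {
        double reported = cap.get(CV_CAP_PROP_POS_MSEC);
        if (!cap.read(frame)) break;
        double predicted = n * 1000.0 / fps;       // evenly-spaced assumption
        std::printf("frame %d: reported %.1f ms, even spacing predicts %.1f ms\n",
                    n, reported, predicted);
    }
    return 0;
}
```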

Triangulation

I have manually marked the x,y locations of the ball in a few seconds of each camera's video; even with my wonderful ball-marking tool, that still takes many minutes per second of video. Now I don't know what to do with them, because I don't know how to correlate the two cameras. I need the x,y locations of the ball in both cameras at the same instant in order to do a triangulation. Mr. W has been proposing to interpolate between one camera's ball positions in consecutive frames to approximate the location at the same instant as a frame from the other camera, but that requires at least being able to choose the nearest frame from the other video (and the interpolation itself really needs even more timing precision).
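Here is what that would look like if the timing were known: a sketch assuming 3x4 projection matrices and trustworthy per-frame timestamps for each camera (all names hypothetical):

```cpp
#include <opencv2/core/core.hpp>
#include <opencv2/calib3d/calib3d.hpp>

// Sketch of Mr. W's idea: linearly interpolate the right camera's ball
// position to the instant of a left-camera frame, then triangulate.
// Assumes the cameras' relative timing is known -- exactly the part the
// sync problem makes hard.
cv::Point3d ballAtLeftInstant(
    double tLeft,                                   // left frame's time (ms)
    const cv::Point2d& ballLeft,                    // ball pixel in left frame
    double tR0, const cv::Point2d& ballR0,          // right frame just before tLeft
    double tR1, const cv::Point2d& ballR1,          // right frame just after tLeft
    const cv::Mat& Pleft, const cv::Mat& Pright)    // 3x4 projection matrices
{
    // Fraction of the way from the earlier right frame to the later one.
    double a = (tLeft - tR0) / (tR1 - tR0);
    cv::Point2d ballRight = ballR0 + (ballR1 - ballR0) * a;

    cv::Mat ptsLeft  = (cv::Mat_<double>(2, 1) << ballLeft.x, ballLeft.y);
    cv::Mat ptsRight = (cv::Mat_<double>(2, 1) << ballRight.x, ballRight.y);
    cv::Mat X;                                      // 4x1 homogeneous result
    cv::triangulatePoints(Pleft, Pright, ptsLeft, ptsRight, X);
    X.convertTo(X, CV_64F);                         // output type can vary

    double w = X.at<double>(3, 0);
    return cv::Point3d(X.at<double>(0, 0) / w,
                       X.at<double>(1, 0) / w,
                       X.at<double>(2, 0) / w);
}
```

Everything here hinges on tLeft, tR0, and tR1 being trustworthy, which is exactly what the bogus timestamps take away.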

Even if I solve that problem, my triangulation will still be about 4 cm off the true location.

So that's where things are at the moment. I'm not sure what's going to happen next. I might explore the live video capture capabilities of OpenCV to see if I can synchronize better that way. (Recall that I am currently using guvcview to record videos, then replaying them with OpenCV.) Maybe I can get more accurate timestamps that way, though it would make research much harder if I needed to do everything live.