# Some thoughts on range sensor design

## Range sensor design and calibration

From: Bob Cromwell cromwell@ecn.purdue.edu Date: Fri, 11 Feb 2000 18:57:11 -0500 (EST) To: rvl-people@ecn.purdue.edu Subject: Range sensor design and calibration Cc: Stuart.Poss@usm.edu, gyoung@sla.purdue.edu, sblask@harris.com, seth@cs.uiuc.edu Given recent lab discussions of range sensor design, I thought I'd put in my US$ 0.02. If you don't care about designing and calibrating single-plane structured light range sensors, bail out now.... MEASURING RESOLUTION (OR SPATIAL SAMPLING RATES) One common question about a sensor is, "What is its resolution?" Well, to be pedantic, the sensor has no resolution. It does have a spatial sampling rate that we could quantify. Or "granularity" if you prefer. These sensors are really 2-D sensors -- they can only measure points within a plane. You translate (or rotate) the sensor (or target) to sweep out a volume of interest. Now, how thick is your light plane, meaning how wide is the laser stripe? Depends on your optics and the surface characteristics. At some scale, most anything is translucent, so no matter how narrow you make your beam, the stripe will "bloom" on the target surface. Also, to quickly dash any remaining hopes of subatomic sensing, consider that your roughly 512x485 camera sensor array must sample the target region. Divide that by two to be safe (and allow for image-position- dependent sampling rates), and you can assume that you cannot reliably measure anything smaller than 1/200 the maximum target size. SPATIAL SAMPLING PERPENDICULAR TO THE LIGHT PLANE What relationship should we have between the stripe width and the size of the perpendicular steps made by the sensor (or target)? Again, the answer is, "it depends". Depends on the texture of the target, and whether or not you care about that texture. If you're doing gross object recognition (e.g., radiator tubes, or auto tires), you really want to limit the amount of data to just what you really need to recognize and locate the object of interest. The steps should be MUCH larger than the beamwidth! If you're doing high-resolution data capture, however, when you move more than one beamwidth you risk missing a feature, whether it's convex (e.g., a spine on a fish skull) or concave (e.g., a sign on a cuneiform tablet). You would like to move by one beamwidth per step. But it's more complex yet, as the beam doesn't have a nice boxcar-shaped cross-section of intensity. It's vaguely Gaussian (the precise shape doesn't matter, because it's not as if the video camera and digitizer have anything resembling a linear response). By "beamwidth" we could mean something like the width where the intensity has fallen to 1% of maximum. There are some interesting signal-processing discussion to be had over how much meaningful information can be extracted by steps smaller than this approximate beamwidth. But on a practical matter, make your steps too small and you drown in data! Wow, all this and I haven't even mentioned sampling within one frame yet... SPATIAL SAMPLING WITHIN THE LIGHT PLANE The sampling rate within one digitized frame is the sampling rate within one planar slice of the target volume. For the time being we'll just consider the trivial one-slice data collection. The camera's optical axis intersects the light plane at an acute angle, meaning: -- The camera sees a roughly trapezoidal region of the light plane -- Spatial sampling varies with position in that region, and thus with (row,column) position in the image You can derive equations that describe the spatial sampling rate as a function of (row,column) position, see the first chapter of my thesis if you're really curious. But those equations are useless for anything more than estimates for proposed designs, as they are based on six variables, only two of which you can accurately measure. Can we measure spatial sampling? Sure, we'll just have to calibrate the camera first... CALIBRATING THE SENSOR Calibration is probably best done by the method described in 'Modeling and calibration of a Structured Light Scanner for 3-D Robot Vision' (C.H. Chen and A.C. Kak, Proceedings of the IEEE International Conference on Robotics and Automation, pp. 807-815, Raleigh NC, March 1987). In this method you have a 4x3 matrix T, and measure the (row,column) position of a point of interest. Then: |x'| |row| |x| |x'/t| |y'| = T |col| and |y| = |y'/t| |z'| | 1 | |z| |z'/t| |t | Some words on hardware -- the math assumes that a simple pinhole model is appropriate for the camera and lens. Things that violate this assumption are very wide-angle lenses (e.g., more than maybe 45 degree field of view), relatively large apertures, and optically complex lens assemblies (e.g., zoom lenses, or anything with too many elements). Oddly enough, this is a case where cheaper is usually better, at least for the lens.... Let's say we grabbed an image of a calibration target, something with easy-to-find (row,column) image positions of things for which we accurately know their real (x,y,z) world positions. A block of pins is useful. The first such pin is really at [x1,y1,z1] and we spotted it at [r1,c1] in the image. The second is at [x2,y2,z2] and was spotted at [r2,c2]. And so on, to pin number N. The image processing applied to locate the target points is crucial. It's best to threshold the image by setting to zero all points dimmer than some limit. Find the remaining non-zero connected components. Find the center of mass. You should have the target image coordinates in subpixel accuracy. Of course, your eventual spotting of measured points should be done in a similar way, so you're measuring as you calibrated. Now, to assume we have N pairs of world/image coordinate pairs, and to cut-and-paste from some huge comment blocks in some software I wrote, we have a bunch of equations of the following form, where T is unknown: |t11 t12 t13||ri| |xi'| |xi| |xi'/t| |t21 t22 t23||ci| = |yi'| |yi| = |yi'/t| {Eq 1} |t31 t32 t33|| 1| |zi'| |zi| |zi'/t| |t41 t42 t43| |ti | Rearranging the second part: |xi'| = |xi*ti| |xi' - xi*ti| |yi'| = |yi*ti| |yi' - yi*ti| = 0 {Eq 2} |zi'| = |zi*ti| |zi' - zi*ti| We could solve for each row of T individually, where we would have N equations of each. But let's not really solve for them individually, that's too much work. Just arrange the equations like we're going to do it. For the first row: t11*r1 + t12*c1 + t13 = x1' t11*r2 + t12*c2 + t13 = x2' {Eq 3} ........................... t11*rN + t12*cN + t13 = xN' and for the second row: t21*r1 + t22*c1 + t23 = y1' t21*r2 + t22*c2 + t23 = y2' {Eq 4} ........................... t21*rN + t22*cN + t23 = yN' and for the third row: t31*r1 + t32*c1 + t33 = z1' t31*r2 + t32*c2 + t33 = z2' {Eq 5} ........................... t31*rN + t32*cN + t33 = zN' and for the fourth row: t41*r1 + t42*c1 + t43 = t1 t41*r2 + t42*c2 + t43 = t2 {Eq 6} .......................... t41*rN + t42*cN + t43 = tN Expanding {Eq 2} above, using blocks {Eq 3} and {Eq 6}, gives us: t11*r1 + t12*c1 + t13 - t41*r1*x1 - t42*c1*x1 - t43*x1 = 0 t11*r2 + t12*c2 + t13 - t41*r2*x2 - t42*c2*x2 - t43*x2 = 0 ............................................................... t11*rN + t12*cN + t13 - t41*rN*xN - t42*cN*xN - t43*xN = 0 and using blocks {Eq 4} and {Eq 6}: t21*r1 + t22*c1 + t23 - t41*r1*y1 - t42*c1*y1 - t43*y1 = 0 t21*r2 + t22*c2 + t23 - t41*r2*y2 - t42*c2*y2 - t43*y2 = 0 ............................................................... t21*rN + t22*cN + t23 - t41*rN*yN - t42*cN*yN - t43*yN = 0 and using blocks {Eq 5} and {Eq 6}: t31*r1 + t32*c1 + t33 - t41*r1*z1 - t42*c1*z1 - t43*z1 = 0 t31*r2 + t32*c2 + t33 - t41*r2*z2 - t42*c2*z2 - t43*z2 = 0 ............................................................... t31*rN + t32*cN + t33 - t41*rN*zN - t42*cN*zN - t43*zN = 0 The lower-right term in the matrix T, t43, can be arbitrarily set to 1, which will scale the remainder of the matrix as needed. Yes, we're going to solve the whole mess at once! Put those three blocks of N equations each together, lining up the columns, set t43=1, rearrange just a little, and notice that we've got a sparsely populated matrix multiplied by the vector [t11,t12,t13,t21,t22,t23,t31,t32,t33,t41,t42], and we've got 3*N equations in 11 unknowns: |r1 c1 1 0 0 0 0 0 0 -r1*x1 -c1*x1||t11| |x1| |r2 c2 1 0 0 0 0 0 0 -r2*x2 -c2*x2||t12| |x2| |..........................................||t13| |..| |rN cN 1 0 0 0 0 0 0 -rN*xN -cN*xN||t21| |xN| | 0 0 0 r1 c1 1 0 0 0 -r1*y1 -c1*y1||t22| |y1| | 0 0 0 r2 c2 1 0 0 0 -r2*y2 -c2*y2||t23| = |y2| {Eq 7} |..........................................||t31| |..| | 0 0 0 rN cN 1 0 0 0 -rN*yN -cN*yN||t32| |yN| | 0 0 0 0 0 0 r1 c1 1 -r1*z1 -c1*z1||t33| |z1| | 0 0 0 0 0 0 r2 c2 1 -r2*z2 -c2*z2||t41| |z2| |..........................................||t42| |..| | 0 0 0 0 0 0 rN cN 1 -rN*zN -cN*zN| |zN| Now, see "Numerical Recipies in C", Press, Flannery, Teukolsky, and Vetterling, 1998, sections 2.0-2.1. Our problem is to solve for x in Ax=b, where in our {Eq 7}, "A" is the big matrix, "x" is the vector of tij, and "b" is the vector [x1,x2,....zN]:: | r1 ... -c1*x1 | | t11 | | x1 | A = | ... ... ...... | x = | ... | b = | .. | | 0 ... -cN*zN | | t41 | | zN | A has (many) more rows than columns, and thus the problem is (very) overdetermined. We form the normal equations and find the linear least squares solution (i.e., that solution minimizing the sum of the squares of the differences between the left and right sides of the above equations) by solving for (At*A)x = (At)b, where "At" is the transpose of matrix A and "*" is matrix multiplication. At this point, code like that in the book can be applied. Since we have 3*N equations in 11 unknowns, we could scrape by with just 4 target points. But that would be bad -- what if one pin were bent, or we mismeasured due to dust on the pin, or the camera was noisy, or .... It's best to have more points. Just make sure that no three are collinear or the math can go terribly wrong. You could measure the quality of T by using it to calculate [x,y,z] positions for the target [row,column] values, and comparing that to the known values. The min/average/max error at this back-calculation might give you a useful measure of calibration quality. It will also quite likely point out when you have a bent pin on your calibration target.... USING THE CALIBRATION MATRIX Now our sensor is calibrated and T is fully populated. We can calculate an [x,y,z] position for any measured [row,column] position. That also means we can calculate [x,y,z] positions for points we didn't necessarily measure -- handy for characterizing spatial sampling! The error for a one-pixel error in column selection (the common error for our sensors with the light stripe segments roughly parallel to columns) is the Euclidean distance between the [x,y,z] positions calculated for [row,column] and [row,column+1]. Or [row,column-1], if you prefer... Let's call this along-row sampling rate delta_r. The error for a one-pixel error in row measurement (which is the sampling along one light stripe segment in a digitized frame) is the Euclidean distance between the [x,y,z] positions calculated for [row,column] and [row+1,column]. Or [row-1,column], if you prefer... Let's call this along-column sampling rate delta_c. There are (roughly) a quarter million [row,column] positions in a digitized video frame at which the above two measures could be calculated, so which ones should we pay attention to? Presumably you have the camera pointing more or less right at the area of interest. With Nr rows and Nc columns in the digitized frame, the position [Nr/2,Nc/2] might be an appropriate place to check the spatial sampling rate. Both the best and worst spatial sampling rates will appear somewhere around the margin of the image, so we can quickly calculate sampling rates for the 2*(Nr + Nc) border pixel locations. (OK, so the best sampling rate will be somewhere within the image if the point on the light plane at which a perpendicular to the plane passes through the camera's virtual pinhole actually appears in the image, and in this degenerate case will be at that point. Not like this is likely to happen) PUTTING IT ALL TOGETHER We have delta_r and delta_c, sampling within one plane. And our step between frames defines the third measure of sampling, let's call this delta_f. The three sampling rates are (more or less) perpendicular to each other. What relationship should they have to each other? Good question. Again, what criteria do you want for data quality estimation? If you're cautious, you might consider the limiting sampling rate for a sensor to be the maximum of those. As you can likely reduce delta_f by making your steps smaller (and you certainly could with some sort of gearing on the stepper motor, or micro-stepping), the maximum of delta_r and delta_c would be the limiting factor. However, if delta_r, delta_c, and delta_f are not of the same order of magnitude, serious consideration of sampling rates is of questionable meaning. If you can precisely locate points, but the points must be spaced very far apart, does that really help? Another thing to consider is that delta_r is more of an error measure, while delta_c and delta_f are sampling. The only one you can directly control is delta_f, as delta_r and delta_c are functions of geometry, optics, and video signaling (scan lines per frame and pixels per scan line), or CCD array dimensions for digital cameras. Bob

### Image Enhancement

### Applications

### Data Collection and Analysis

**Only of historical interest...**

Viewport size:

© by
Bob Cromwell
Mar 2017. Created with
`vim`

and
ImageMagick,
hosted on
OpenBSD
with
Apache.

Root password available
here,
privacy policy here,
contact info here.