Intel® RealSense™ Cross Platform API
This document describes the projection mathematics relating the images provided by librealsense
to their associated 3D coordinate systems, as well as the relationships between those coordinate systems. These facilities are mathematically equivalent to those provided by previous APIs and SDKs, but may use slightly different phrasing of coefficients and formulas.
Each stream of images provided by librealsense
is associated with a separate 2D coordinate space, specified in pixels, with the coordinate [0,0]
referring to the center of the top left pixel in the image, and [w-1,h-1]
referring to the center of the bottom right pixel in an image containing exactly w
columns and h
rows. That is, from the perspective of the camera, the x-axis points to the right and the y-axis points down. Coordinates within this space are referred to as "pixel coordinates", and are used to index into images to find the content of particular pixels.
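For instance, a pixel at coordinate [x,y] of a tightly packed, row-major image buffer (the layout used by depth frames returned from rs_get_frame_data(...)) can be indexed as follows:

```c
/* Compute the linear index of pixel [x,y] in a row-major image buffer
   with `width` columns; x grows to the right, y grows downward. */
int pixel_index(int x, int y, int width)
{
    return y * width + x;
}
```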
Each stream of images provided by librealsense
is also associated with a separate 3D coordinate space, specified in meters, with the coordinate [0,0,0]
referring to the center of the physical imager. Within this space, the positive x-axis points to the right, the positive y-axis points down, and the positive z-axis points forward. Coordinates within this space are referred to as "points", and are used to describe locations within 3D space that might be visible within a particular image.
The relationship between a stream's 2D and 3D coordinate systems is described by its intrinsic camera parameters, contained in the rs_intrinsics
struct. Each model of RealSense device is somewhat different, and the rs_intrinsics
struct must be capable of describing the images produced by all of them. The basic set of assumptions is described below:
- The `width` and `height` fields describe the number of columns and rows in the image, respectively
- The `fx` and `fy` fields describe the focal length of the image, as a multiple of pixel width and height
- The `fx` and `fy` fields are allowed to be different (though they are commonly close)
- The `ppx` and `ppy` fields describe the pixel coordinates of the principal point (center of projection)
- The `model` field describes which of several supported distortion models was used to calibrate the image, and the `coeffs` field provides an array of up to five coefficients describing the distortion model
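For reference, the struct declared in rs.h looks roughly like the following (comments paraphrased; consult the header of your librealsense version for the authoritative declaration):

```c
typedef struct rs_intrinsics
{
    int width;           /* width of the image, in pixels */
    int height;          /* height of the image, in pixels */
    float ppx;           /* horizontal coordinate of the principal point, in pixels */
    float ppy;           /* vertical coordinate of the principal point, in pixels */
    float fx;            /* focal length, as a multiple of pixel width */
    float fy;            /* focal length, as a multiple of pixel height */
    rs_distortion model; /* distortion model of the image */
    float coeffs[5];     /* distortion coefficients */
} rs_intrinsics;
```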
Knowing the intrinsic camera parameters of an image allows you to carry out two fundamental mapping operations:

- Projection, from 3D point coordinates to 2D pixel coordinates, performed by calling rs_project_point_to_pixel(...)
- Deprojection, from 2D pixel coordinates (plus a depth value) to 3D point coordinates, performed by calling rs_deproject_pixel_to_point(...)

Intrinsic parameters can be retrieved via a call to rs_get_stream_intrinsics(...) for any stream which has been enabled with a call to rs_enable_stream(...) or rs_enable_stream_preset(...). This is necessary because the intrinsic parameters may differ depending on the resolution/aspect ratio of the requested images.
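As a minimal sketch of both operations (error handling omitted; rs_project_point_to_pixel and rs_deproject_pixel_to_point are provided by rsutil.h, and the round trip is only meaningful when the stream's distortion model supports both operations, as described below):

```c
#include <librealsense/rs.h>
#include <librealsense/rsutil.h>

/* Deproject a depth pixel to a 3D point, then project it back.
   Assumes `dev` has its depth stream enabled, and that the stream's
   distortion model permits both operations (e.g. RS_DISTORTION_NONE). */
void round_trip(rs_device * dev, const float pixel[2], float depth_in_meters)
{
    rs_intrinsics intrin;
    rs_get_stream_intrinsics(dev, RS_STREAM_DEPTH, &intrin, NULL);

    float point[3];
    rs_deproject_pixel_to_point(point, &intrin, pixel, depth_in_meters);

    float reprojected[2];
    rs_project_point_to_pixel(reprojected, &intrin, point);
    /* reprojected[0] ~= pixel[0], reprojected[1] ~= pixel[1] */
}
```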
Based on the design of each model of RealSense device, the different streams may be exposed via different distortion models.
- None. The image has no distortion, as though produced by an idealized pinhole camera. Such images can be used with both rs_project_point_to_pixel(...) and rs_deproject_pixel_to_point(...).
- Modified Brown-Conrady distortion. The image is distorted, and the calibration provides a closed-form mapping from undistorted coordinates to distorted coordinates, so such images can only be used with rs_project_point_to_pixel(...). This model is used by the RealSense R200's color image stream.
- Inverse Brown-Conrady distortion. The image is distorted, and the calibration provides a closed-form mapping from distorted coordinates to undistorted coordinates, so such images can only be used with rs_deproject_pixel_to_point(...). This model is used by the RealSense F200 and SR300's depth and infrared image streams.

Although it is inconvenient that projection and deprojection cannot always be applied to an image, the inconvenience is minimized by the fact that RealSense devices always support deprojection from depth images, and always support projection to color images. Therefore, it is always possible to map a depth image into a set of 3D points (a point cloud), and it is always possible to discover where a 3D object would appear on the color image.
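In the distortion-free case, both operations reduce to the standard pinhole model. A minimal sketch (the distorted models add a polynomial warp on top of these formulas):

```c
#include <librealsense/rs.h>

/* Pinhole projection and deprojection for RS_DISTORTION_NONE. */
void project_pinhole(const rs_intrinsics * i, const float point[3], float pixel[2])
{
    pixel[0] = i->fx * (point[0] / point[2]) + i->ppx;
    pixel[1] = i->fy * (point[1] / point[2]) + i->ppy;
}

void deproject_pinhole(const rs_intrinsics * i, const float pixel[2], float depth, float point[3])
{
    point[0] = depth * (pixel[0] - i->ppx) / i->fx;
    point[1] = depth * (pixel[1] - i->ppy) / i->fy;
    point[2] = depth;
}
```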
The 3D coordinate systems of each stream may in general be distinct. For instance, it is common for depth to be generated from one or more infrared imagers, while the color stream is provided by a separate color imager. The relationship between the separate 3D coordinate systems of separate streams is described by their extrinsic parameters, contained in the rs_extrinsics
struct. The basic set of assumptions is described below:
- The `translation` field contains the 3D translation between the imagers' physical positions, specified in meters
- The `rotation` field contains a 3x3 orthonormal rotation matrix between the imagers' physical orientations

Knowing the extrinsic parameters between two streams allows you to transform points from one coordinate space to another, which can be done by calling rs_transform_point_to_point(...). This operation is defined as a standard affine transformation using a 3x3 rotation matrix and a 3-component translation vector.
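Concretely, the transform computes to_point = rotation * from_point + translation. A sketch of the equivalent arithmetic (mirroring rsutil.h, where the rotation matrix is stored in column-major order):

```c
#include <librealsense/rs.h>

/* Apply extrinsics: to_point = rotation * from_point + translation.
   rs_extrinsics stores its 3x3 rotation matrix column-major. */
void transform_point(const rs_extrinsics * extrin,
                     const float from_point[3], float to_point[3])
{
    for (int i = 0; i < 3; ++i)
    {
        to_point[i] = extrin->rotation[i + 0] * from_point[0]
                    + extrin->rotation[i + 3] * from_point[1]
                    + extrin->rotation[i + 6] * from_point[2]
                    + extrin->translation[i];
    }
}
```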
Extrinsic parameters can be retrieved via a call to rs_get_device_extrinsics(...)
between any two streams which are supported by the device. One does not need to enable any streams beforehand; the device extrinsics are assumed to be independent of the content of the streams' images and constant for a given device for the lifetime of the program.
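Putting intrinsics and extrinsics together, the following sketch (error handling omitted; assumes the depth and color streams have been enabled) maps a single depth pixel to the corresponding color pixel:

```c
#include <librealsense/rs.h>
#include <librealsense/rsutil.h>

/* Map a depth pixel with known depth (in meters) to a color pixel. */
void depth_pixel_to_color_pixel(rs_device * dev, const float depth_pixel[2],
                                float depth_in_meters, float color_pixel[2])
{
    rs_intrinsics depth_intrin, color_intrin;
    rs_extrinsics depth_to_color;
    rs_get_stream_intrinsics(dev, RS_STREAM_DEPTH, &depth_intrin, NULL);
    rs_get_stream_intrinsics(dev, RS_STREAM_COLOR, &color_intrin, NULL);
    rs_get_device_extrinsics(dev, RS_STREAM_DEPTH, RS_STREAM_COLOR, &depth_to_color, NULL);

    float depth_point[3], color_point[3];
    rs_deproject_pixel_to_point(depth_point, &depth_intrin, depth_pixel, depth_in_meters);
    rs_transform_point_to_point(color_point, &depth_to_color, depth_point);
    rs_project_point_to_pixel(color_pixel, &color_intrin, color_point);
}
```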
As mentioned above, mapping from 2D pixel coordinates to 3D point coordinates via the rs_intrinsics
structure and the rs_deproject_pixel_to_point(...)
function requires knowledge of the depth of that pixel in meters. Certain pixel formats exposed by librealsense
contain per-pixel depth information, and can be immediately used with this function. Other images do not contain per-pixel depth information, and thus would typically be projected into instead of deprojected from.
`RS_FORMAT_Z16` or `rs::format::z16`

Depth is stored as one unsigned 16-bit integer per pixel, mapped linearly to depth in camera-specific units. The distance, in meters, corresponding to one integer increment can be queried via rs_get_device_depth_scale(...). The following pseudocode shows how to retrieve the depth of a pixel in meters:

```c
const float scale = rs_get_device_depth_scale(dev, NULL);
const uint16_t * image = (const uint16_t *)rs_get_frame_data(dev, RS_STREAM_DEPTH, NULL);
float depth_in_meters = scale * image[pixel_index];
```

If a device fails to determine the depth of a given pixel, a value of zero is stored in the depth image. The R200's depth scale can be adjusted by calling rs_set_device_option(...) with RS_OPTION_R200_DEPTH_UNITS, which specifies the number of micrometers per one increment of depth: 1000 would indicate millimeter scale and 10000 would indicate centimeter scale, while 31 would roughly approximate the F200's 1/32nd of a millimeter scale.

`RS_FORMAT_DISPARITY16` or `rs::format::disparity16`
Depth is stored as one unsigned 16-bit integer per pixel, mapped inversely to depth (i.e., as a disparity value) in camera-specific units. The scale factor relating disparity values to depth can be queried via rs_get_device_depth_scale(...). The following pseudocode shows how to retrieve the depth of a pixel in meters:

```c
const float scale = rs_get_device_depth_scale(dev, NULL);
const uint16_t * image = (const uint16_t *)rs_get_frame_data(dev, RS_STREAM_DEPTH, NULL);
float depth_in_meters = scale / image[pixel_index];
```
Unlike `RS_FORMAT_Z16`, a disparity value of zero is meaningful: a stereo match with zero disparity will occur for objects "at infinity", objects which are so far away that the parallax between the two imagers is negligible. By contrast, there is a maximum possible disparity: the R200 only matches up to 63 pixels of disparity in hardware, and even if a software stereo search were run on an image, you would never see a disparity greater than the total width of the stereo image. Therefore, when the device fails to find a stereo match for a given pixel, a value of 0xFFFF will be stored in the depth image as a sentinel.

The R200's disparity multiplier can be adjusted by calling rs_set_device_option(...) with RS_OPTION_R200_DISPARITY_MULTIPLIER. For instance, setting it to 100 would indicate that 100 units in the disparity map are equivalent to one pixel of disparity.
It is not necessary to know what model of RealSense device is plugged in to successfully make use of the projection capabilities of librealsense. However, developers can take advantage of certain known properties of given devices.
On the R200, for example:

- The left and right infrared streams are rectified: they have identical intrinsics, no distortion, and an extrinsic transformation that is a pure translation along the x-axis (translation[1] and translation[2] are zero)
- As a consequence, the y component of pixel coordinates can be used interchangeably between these two streams
- Depth and infrared images natively differ slightly in resolution; librealsense will pad the depth image or crop the infrared image if you request matching resolutions
- Rectified color images have no distortion and no rotation relative to the depth/infrared streams, so the x and y components of pixel coordinates can be mapped independently between depth/infrared and rectified color
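As an illustration of the matched-resolution behavior (a sketch; the specific resolution, format, and frame rate here are assumptions and must be values the device actually supports):

```c
#include <librealsense/rs.h>

/* Request depth and infrared at the same resolution; librealsense will
   pad the depth image or crop the infrared image so they align. */
void enable_matched_streams(rs_device * dev)
{
    rs_enable_stream(dev, RS_STREAM_DEPTH,    480, 360, RS_FORMAT_Z16, 30, NULL);
    rs_enable_stream(dev, RS_STREAM_INFRARED, 480, 360, RS_FORMAT_Y8,  30, NULL);
}
```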