Camera & LiDAR Sensor Fusion Key Concept: Intrinsic Parameters

Into the world of sensor fusion annotation, and how intrinsic matrices help AI systems understand spatial data by combining 2D images and 3D

BasicAI R&D Team

1. A Glimpse into the World of 3D Point Clouds

🤔 Imagine you're struck by a strange affliction that makes everything you see a blurry, indistinguishable blob of the same color 👨🦯. You attempt to embrace your loved one 💑, only to collide headfirst with a pillar 🤕. The next time, you cleverly avoid the pillar 😏, but it turns out that was your significant other all along 🤦.

An ambiguous point cloud image shared on the Nvidia forum. Can you make out any details?

This is the issue with 3D point clouds—they provide accurate location information but lack semantic data, making it difficult for AI to distinguish between different objects 😵💫. Just look at the point cloud image above; it resembles a virtual digital space from an '80s cyberpunk movie 👨💻.

Naturally, we imagine that it would be great to combine the location data in 3D space with the visual information from 2D images. And so, the concept of 2D/3D fusion annotation was born 📽️.

2. The Principles of Fusion Annotation

Why is it called "fusion" annotation? Can't we just annotate both the 3D point cloud and 2D image separately? Well, that would be inefficient! 🙅

Since cameras can map points from 3D space onto 2D images, we can simulate the "photo-taking" process after annotating a 3D point cloud. This way, the 3D annotation is "captured" in the 2D image without the need for further annotation 📸.

The principle behind camera imaging is the pinhole camera model 🕯️, which you might recall from middle school physics.

Now consider this: everyone perceives the world with themselves as the origin. Since 3D point clouds are generated by LiDAR sensors, their coordinate systems originate in the "eye" of the LiDAR (i.e., LiDAR's origin). Meanwhile, cameras use their own coordinate systems (with the camera's origin), which change based on the camera's position and angle. 🌏

We'll refer to these two coordinate systems as the point cloud coordinate system and the camera coordinate system. 👐

In short, mapping a point from a 3D point cloud to a 2D image involves two steps: first, transforming the point's coordinates from the point cloud coordinate system (3D) to the camera coordinate system (3D); then, simulating the photo-taking process to convert the point to the image coordinate system (2D). 🧐

The complete mapping process: the yellow line represents the transformation from point cloud to camera coordinate system, and the red line represents the transformation from camera to image coordinate system.

Linear algebra teaches us that matrix operations correspond to spatial transformations 📐. Essentially, changing coordinate systems is just a matter of matrix manipulation 😌. Two crucial matrices are the intrinsic matrix (from camera coordinate system to image coordinate system) and the extrinsic matrix (from point cloud coordinate system to camera coordinate system) 👨🏫.
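To make the two steps concrete, here is a minimal sketch in Python with NumPy. Every number in it is an assumption for illustration: the extrinsic matrix is a made-up axis swap with no translation (LiDAR x forward, y left, z up; camera x right, y down, z forward), and the intrinsic values are placeholders whose meaning is derived in Section 3. It is not the method of any particular annotation tool.

```python
import numpy as np

# Hypothetical extrinsic matrix (4x4): point cloud coordinates -> camera coordinates.
# Assumed axes: LiDAR x forward, y left, z up; camera x right, y down, z forward.
EXTRINSIC = np.array([
    [0.0, -1.0,  0.0, 0.0],
    [0.0,  0.0, -1.0, 0.0],
    [1.0,  0.0,  0.0, 0.0],
    [0.0,  0.0,  0.0, 1.0],
])

# Hypothetical intrinsic matrix (3x3): camera coordinates -> image coordinates.
INTRINSIC = np.array([
    [1000.0,    0.0, 640.0],
    [   0.0, 1000.0, 360.0],
    [   0.0,    0.0,   1.0],
])


def lidar_to_pixel(point_lidar):
    """Map one 3D point (point cloud coordinates) to (u, v) pixel coordinates."""
    # Step 1: point cloud coordinate system -> camera coordinate system.
    p_h = np.append(np.asarray(point_lidar, dtype=float), 1.0)
    xc, yc, zc = (EXTRINSIC @ p_h)[:3]
    # Step 2: camera coordinate system -> image coordinate system.
    u, v, w = INTRINSIC @ np.array([xc, yc, zc])
    return u / w, v / w


# A point 10 m ahead of the LiDAR, 1 m to the left, 0.5 m up.
u, v = lidar_to_pixel([10.0, 1.0, 0.5])
print(u, v)
```

The sketch assumes the point lies in front of the camera; a real pipeline would also need to discard points whose depth in the camera frame is zero or negative.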

3. Intrinsic Matrix

The pinhole camera model diagram from the OpenCV documentation (for convenience, the image plane is mirrored to the front of the camera): https://docs.opencv.org/master/pinhole_camera_model.png

For now, let's ignore the extrinsic parameters and consider the coordinate system where the large blue arrow is located as the camera coordinate system. The coordinates of the tip of the arrow are represented as (Xc, Yc, Zc) ✍️. The transformation from the camera coordinate system to the image coordinate system is not accomplished in one step. Below, we'll break it down into multiple steps and explain it in detail 👨🏫.

3.1 First Spatial Transformation: Linear Transformation

The first spatial transformation is the scaling transformation from the camera coordinate system to the blue coordinate system in the image, which is a linear transformation. Since many materials do not specifically emphasize the blue coordinate system, we'll name it the "intermediate coordinate system" here. According to similar triangles (the triangle formed by the person's head, feet, and camera focal point Fc), the coordinates of the small blue arrow can be obtained ✍️:
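Calling the coordinates of the small blue arrow (x, y), a notation assumed here to match the x, y axes of the intermediate coordinate system, the similar-triangle relation would read:

```latex
x = f \cdot \frac{X_c}{Z_c}, \qquad y = f \cdot \frac{Y_c}{Z_c}
```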

Here, f is the focal length, which is the distance from the origin of the camera coordinate system (Fc in the diagram) to the image plane, in millimeters ✍️.

At this point, the intermediate coordinate system is not yet the image coordinate system, for two reasons. First, image coordinates are measured in pixels rather than millimeters, so a unit conversion is still needed. Second, the origin of the intermediate coordinate system lies on the camera's optical axis (the point where the Zc axis meets the image plane), while the origin of the image coordinate system is the upper left corner of the image, so a translation is also needed ✍️.

The blue coordinate system is the current x, y coordinate system, and the brown coordinate system is the actual image coordinate system.

3.2 Unit Conversion

To convert units, the current coordinates are divided by a conversion factor ✍️. In the x and y directions, the conversion factors are dx and dy, which give the physical size of one pixel on the sensor in millimeters per pixel ✍️. After the division, the coordinates have the same unit as the image coordinate system (pixels), so we'll rename them u' and v' ✍️.
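Written out, with (x, y) assumed to denote the millimeter coordinates in the intermediate coordinate system, the conversion would be:

```latex
u' = \frac{x}{dx}, \qquad v' = \frac{y}{dy}
```

Dividing millimeters by millimeters per pixel leaves pixels, which is why the factor divides rather than multiplies.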

Generally, the focal length and conversion factors are combined and simplified as:
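A sketch of that simplification, assuming the similar-triangle result x = f · Xc / Zc and y = f · Yc / Zc from Section 3.1:

```latex
f_x = \frac{f}{dx}, \qquad f_y = \frac{f}{dy}
\qquad \Longrightarrow \qquad
u' = f_x \cdot \frac{X_c}{Z_c}, \qquad v' = f_y \cdot \frac{Y_c}{Z_c}
```

Both fx and fy are therefore focal lengths expressed in pixels rather than millimeters.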

3.3 Second Spatial Transformation: Translation Transformation

The second spatial transformation is a translation ✍️, which, strictly speaking, is not a linear transformation because it moves the origin. After adding a constant to the current coordinates for translation, the final u and v are obtained as ✍️:
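Assuming the added constants are the cx and cy mentioned in the note below, that is, the pixel position of the intermediate origin measured from the upper left corner of the image, the result would be:

```latex
u = u' + c_x = f_x \cdot \frac{X_c}{Z_c} + c_x, \qquad
v = v' + c_y = f_y \cdot \frac{Y_c}{Z_c} + c_y
```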

Note: Sometimes, fx, fy, cx, and cy are also represented as fu, fv, cu, and cv.
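The three steps of Sections 3.1 to 3.3 can be sketched as a small Python function. The function name and the sample values (a 4 mm focal length, 0.004 mm pixels, a 1280 x 720 image centered on the optical axis) are assumptions chosen for illustration, not parameters of a real camera.

```python
def camera_to_pixel_stepwise(xc, yc, zc, f_mm, dx_mm, dy_mm, cx, cy):
    """Map a camera-frame point (xc, yc, zc) to pixel coordinates (u, v)."""
    # 3.1 Scaling by similar triangles: millimeters on the image plane.
    x = f_mm * xc / zc
    y = f_mm * yc / zc
    # 3.2 Unit conversion: millimeters -> pixels (dx, dy are mm per pixel).
    u_prime = x / dx_mm
    v_prime = y / dy_mm
    # 3.3 Translation: move the origin to the upper left corner of the image.
    return u_prime + cx, v_prime + cy


# Assumed sample values: f = 4 mm and 0.004 mm pixels give fx = fy = 1000 pixels.
u, v = camera_to_pixel_stepwise(
    -1.0, -0.5, 10.0,
    f_mm=4.0, dx_mm=0.004, dy_mm=0.004,
    cx=640.0, cy=360.0,
)
print(u, v)  # approximately (540.0, 310.0), up to floating-point rounding
```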

3.4 Overall Transformation: Affine Transformation

The above two spatial transformations are equivalent to a single affine transformation, and low-dimensional affine transformations can be achieved through linear transformations in a higher dimension ✍️. First, let's represent the complete affine transformation ✍️:
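In matrix form, using fx, fy, cx, and cy as named in Section 3.3, the affine transformation can be written as a scaling followed by a translation:

```latex
\begin{bmatrix} u \\ v \end{bmatrix}
=
\begin{bmatrix} f_x & 0 \\ 0 & f_y \end{bmatrix}
\begin{bmatrix} X_c / Z_c \\ Y_c / Z_c \end{bmatrix}
+
\begin{bmatrix} c_x \\ c_y \end{bmatrix}
```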

Increase the dimension! Convert to a linear transformation ✍️:
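Appending a constant 1 to both coordinate vectors (homogeneous coordinates) lets the translation move into the matrix as an extra column. A sketch of the resulting form:

```latex
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
=
\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} X_c / Z_c \\ Y_c / Z_c \\ 1 \end{bmatrix}
```

Expanding the first row gives u = fx · Xc / Zc + cx, the same expression as in Section 3.3.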

Multiply both sides of the equation by Zc ✍️:
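After the multiplication, the right-hand side contains the plain camera coordinates. The symbol K for the 3x3 matrix is an assumption here, borrowed from common usage:

```latex
Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
=
\underbrace{\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}}_{K}
\begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix}
```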

Looking closely at the matrix equation above, we find that it matches our intuitive understanding of the photographic process 😲. The diagonal entries fx and fy (the red diagonal matrix) correspond to the scaling of space, that is, the object is scaled to fit into the photo; the last column, containing cx and cy (the blue column), corresponds to the translation of the coordinate origin from the optical axis to the upper left corner of the image; finally, keeping only the first two coordinates (u, v) of (u, v, 1) corresponds to the dimension reduction from 3D to 2D 👏.

In the end, we find that by multiplying the coordinates in the camera coordinate system by a 3x3 matrix, and then dividing by Zc, we can obtain the coordinates u and v of the point in the image coordinate system 🎉.

This matrix is called the intrinsic matrix 🙆.

The name of this matrix reflects both the spatial transformation of the coordinates from the outside of the camera to the inside and the fact that these parameters are determined by the internal factors of the camera 📷.

Although the derivation process above is quite complex, there is no need to worry about intermediate variables when actually using it; just input the final fx, fy, cx, and cy 👌.
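As a minimal sketch of that usage, the NumPy code below builds the 3x3 matrix from the four values and projects several camera-frame points at once. The helper names `intrinsic_matrix` and `project_points` and all numeric values are hypothetical, and the sketch assumes every point has Zc > 0 (in front of the camera) and ignores lens distortion.

```python
import numpy as np


def intrinsic_matrix(fx, fy, cx, cy):
    """Assemble the 3x3 intrinsic matrix from its four parameters."""
    return np.array([
        [fx, 0.0, cx],
        [0.0, fy, cy],
        [0.0, 0.0, 1.0],
    ])


def project_points(points_cam, K):
    """Project N x 3 camera-frame points to N x 2 pixel coordinates."""
    points_cam = np.asarray(points_cam, dtype=float).reshape(-1, 3)
    scaled = points_cam @ K.T              # each row is Zc * (u, v, 1)
    return scaled[:, :2] / scaled[:, 2:3]  # divide by Zc, keep (u, v)


# Assumed values for a 1280 x 720 image.
K = intrinsic_matrix(fx=1000.0, fy=1000.0, cx=640.0, cy=360.0)
points = [[-1.0, -0.5, 10.0], [0.0, 0.0, 5.0], [2.0, 1.0, 20.0]]
print(project_points(points, K))
```

A point on the optical axis, such as the second sample point, lands exactly on (cx, cy), which is a quick sanity check for any set of intrinsic parameters.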
