Sensor Fusion

Why combining sensors is essential, the difference between early and late fusion, how Kalman filters work, and practical sensor fusion examples.

No single sensor tells the whole story. Cameras see rich detail but lack depth. LiDAR measures distance precisely but has no color. IMUs track motion but drift over time. Sensor fusion combines multiple sensors to get a better picture than any one sensor alone.

Why Combine Sensors?

Each sensor has strengths and weaknesses:

Sensor         | Strengths                              | Weaknesses
Camera         | Rich texture, color, high resolution   | No depth; fails in darkness
LiDAR          | Precise 3D geometry; works in darkness | No color/texture; expensive
IMU            | High-frequency motion updates          | Drifts over time; no absolute position
GPS            | Absolute global position               | 1–5 m error; doesn't work indoors
Wheel encoders | Smooth short-term motion               | Accumulates error (wheel slip, drift)

Fusion compensates for each sensor's weaknesses:

  • Camera + LiDAR: 3D bounding boxes with visual classification
  • IMU + GPS: Smooth trajectory with absolute position correction
  • Wheel encoders + IMU: Accurate short-term odometry, corrected by periodic landmarks
Note

Self-driving cars use camera + LiDAR + radar + GPS + IMU. Redundancy is critical — if LiDAR fails (heavy rain), cameras and radar keep the car safe. If GPS drops out (tunnel), IMU and odometry maintain position until GPS returns.

Early vs. Late Fusion

Two fundamental strategies for combining sensors:

Early Fusion (Sensor-Level)

Combine raw data before processing:

  • Align camera and depth images pixel-by-pixel
  • Merge LiDAR points with camera colors
  • Create a unified representation (e.g., colored point cloud)

Pros:

  • Maximum information preserved
  • More accurate when sensors are perfectly aligned

Cons:

  • Requires tight synchronization
  • Calibration is critical (misalignment ruins everything)
  • Computationally expensive
Early Fusion: RGB-D Point Cloud
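As a concrete sketch of early fusion, here is a minimal RGB-D back-projection that builds a colored point cloud. It assumes a pinhole camera model with intrinsics fx, fy, cx, cy, and a depth image already registered pixel-for-pixel to the color image — that alignment is exactly the calibration burden mentioned above:

```python
import numpy as np

def depth_to_colored_cloud(depth, rgb, fx, fy, cx, cy):
    """Back-project a depth image to 3D and attach each pixel's color.

    Assumes depth (H, W) in meters and rgb (H, W, 3) are already
    registered (aligned pixel-for-pixel) -- the hard part of early fusion.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx           # pinhole back-projection
    y = (v - cy) * z / fy
    valid = z > 0                   # drop pixels with no depth reading
    points = np.stack([x[valid], y[valid], z[valid]], axis=1)
    colors = rgb[valid]
    return points, colors           # (N, 3) positions + (N, 3) colors

# Tiny 2x2 example: one pixel has no depth, so the cloud has 3 points.
depth = np.array([[1.0, 2.0], [0.0, 1.5]])
rgb = np.zeros((2, 2, 3), dtype=np.uint8)
pts, cols = depth_to_colored_cloud(depth, rgb, fx=100, fy=100, cx=1, cy=1)
print(pts.shape)  # (3, 3)
```

Because fusion happens at the pixel level, a single miscalibrated intrinsic shifts every 3D point — which is why early fusion lives or dies by calibration.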

Late Fusion (Decision-Level)

Process each sensor independently, then combine results:

  • Camera detects "person" at pixel (320, 240)
  • LiDAR measures distance 2.5m at that angle
  • Combine: "person at (x=2.5m, y=0, z=0)"

Pros:

  • Each sensor can run at its own rate
  • Easier to handle sensor failures (just drop that input)
  • More modular — swap sensors without rewriting the whole pipeline

Cons:

  • Information loss (each sensor's pipeline makes irreversible decisions)
  • Harder to resolve conflicts (camera says "person", LiDAR says "empty")
Late Fusion: Detection + Depth
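A late-fusion version of the person example above, as a sketch: assume the camera pipeline reports only a detection's pixel column, and the LiDAR pipeline reports only a range at the matching bearing (the small-angle pinhole bearing model and the 90° field of view are illustrative):

```python
import math

def fuse_detection_with_range(pixel_x, image_width, hfov_deg, lidar_range):
    """Turn a 2D detection into a 3D position using a LiDAR range.

    Minimal late fusion: each pipeline has already reduced its raw data
    to a few numbers, and only those summaries are combined.
    """
    # Bearing of the pixel relative to the optical axis.
    bearing = math.radians((pixel_x / image_width - 0.5) * hfov_deg)
    x = lidar_range * math.cos(bearing)   # forward
    y = lidar_range * math.sin(bearing)   # left/right
    return x, y

# Camera says "person" at pixel column 320 of 640 (image center);
# LiDAR says the range at that bearing is 2.5 m.
x, y = fuse_detection_with_range(320, 640, hfov_deg=90, lidar_range=2.5)
print(round(x, 2), round(y, 2))  # 2.5 0.0
```

Note how little data crosses the interface — a pixel coordinate and a range. That narrow interface is what makes late fusion modular and easy to debug.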
Tip

For robotics, late fusion is more common — it's robust to sensor timing differences and easier to debug. Early fusion is used when you need the absolute best accuracy (e.g., 3D reconstruction, precision manipulation).

The Kalman Filter: Fusion in Motion

When tracking moving objects (or your own robot's position), you need to handle:

  • Noisy measurements — sensors aren't perfect
  • Predictions — where will the object be next?
  • Updates — how do we correct the prediction when new data arrives?

The Kalman filter is the standard solution. It's a recursive algorithm that:

  1. Predicts the next state using a motion model
  2. Updates the prediction using new sensor data
  3. Weights the prediction vs. measurement based on their uncertainties

Kalman Filter Example: Tracking a Ball

1D Kalman Filter (Position Tracking)
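A minimal sketch of those three steps for a single position variable (the class name and the noise values are illustrative, not a library API):

```python
class Kalman1D:
    """Minimal 1D Kalman filter tracking a position."""

    def __init__(self, x0, p0, process_noise, measurement_noise):
        self.x = x0                 # state estimate (position)
        self.p = p0                 # estimate variance (uncertainty)
        self.q = process_noise      # how much we distrust the motion model
        self.r = measurement_noise  # how much we distrust the sensor

    def predict(self, velocity, dt):
        # 1. Predict: move the state with the motion model, grow uncertainty.
        self.x += velocity * dt
        self.p += self.q

    def update(self, z):
        # 2. Update: the Kalman gain weighs prediction vs. measurement.
        k = self.p / (self.p + self.r)
        self.x += k * (z - self.x)  # 3. blend toward measurement by gain k
        self.p *= (1 - k)           # certainty grows after each measurement
        return self.x

# Track a ball moving at ~1 m/s from noisy position readings.
kf = Kalman1D(x0=0.0, p0=1.0, process_noise=0.01, measurement_noise=0.5)
for z in [1.1, 2.0, 2.9, 4.2]:
    kf.predict(velocity=1.0, dt=1.0)
    est = kf.update(z)
```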

The beauty: if the sensor is noisy (high measurement noise), the filter trusts the prediction more; if the motion model is uncertain (high process noise), it trusts the sensor more. The Kalman gain computes this trade-off from the two uncertainties, so you get optimal fusion automatically.

Practical Sensor Fusion Examples

1. Camera + LiDAR for Object Detection

  • Camera: Detect "car" with 2D bounding box
  • LiDAR: Measure point cloud in that bbox region
  • Fusion: Fit 3D bounding box to LiDAR points, label it "car"
  • Result: 3D position, orientation, and size of the car

2. GPS + IMU for Drone Localization

  • GPS: Noisy absolute position, 1 Hz
  • IMU: Clean acceleration/rotation, 100 Hz
  • Fusion (Kalman): Integrate IMU for smooth high-frequency position, correct with GPS every second
  • Result: 100 Hz position updates with GPS-level long-term accuracy
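This predict-correct loop can be sketched in a toy 1D version: integrate a biased "IMU" velocity at 100 Hz and blend toward a "GPS" fix at 1 Hz. All numbers are made up, and a fixed 0.5 blend weight stands in for a proper Kalman gain:

```python
dt = 0.01                       # 100 Hz IMU rate
true_v = 1.0                    # actual forward speed, m/s
bias = 0.05                     # IMU integration bias -> drift (m/s)
est = 0.0
for step in range(1, 301):      # 3 seconds of flight
    est += (true_v + bias) * dt     # predict: integrate biased IMU velocity
    if step % 100 == 0:             # 1 Hz GPS fix arrives
        gps = step * dt * true_v    # noiseless GPS for the sketch
        est += 0.5 * (gps - est)    # correct: pull estimate toward GPS
# Without corrections the estimate would drift to 3.15 m; with them it
# stays within a few centimeters of the true 3.0 m.
```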

3. Stereo Camera + Wheel Odometry for Navigation

  • Wheel encoders: Fast, smooth motion estimates (but drift over time)
  • Stereo camera: Slow, precise position from visual landmarks
  • Fusion: Use odometry between landmark observations, reset drift when landmarks are detected
  • Result: Accurate position even during fast motion or wheel slip
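A toy run of this drift-and-reset pattern (the 2% slip figure and the landmark positions are made up):

```python
def run(landmark_steps):
    est = 0.0
    for step in range(1, 101):      # 100 wheel-odometry steps of 0.1 m
        est += 0.1 * 1.02           # encoders over-count 2% (wheel slip)
        if step in landmark_steps:
            est = step * 0.1        # landmark fix resets accumulated drift
    return abs(est - 10.0)          # error vs. true position of 10 m

drift_raw = run(set())              # pure odometry
drift_fused = run({40, 80})         # odometry + two landmark fixes
```

Pure odometry accumulates the full 2% error over 10 m (0.2 m); with two landmark fixes only the drift since the last fix remains (0.04 m).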

What's Next?

You've now learned the fundamentals of robot perception — cameras, LiDAR, depth sensing, object detection, and sensor fusion. In the next module, we'll explore localization and mapping — how robots figure out where they are and build maps of their environment using these sensors.
