Sensor Fusion

Why combining sensors is essential, the difference between early and late fusion, how Kalman filters work, and practical sensor fusion examples.

No single sensor tells the whole story. Cameras see rich detail but lack depth. LiDAR measures distance precisely but has no color. IMUs track motion but drift over time. Sensor fusion combines multiple sensors to get a better picture than any one sensor alone.

Why Combine Sensors?

Each sensor has strengths and weaknesses:

Sensor         | Strengths                              | Weaknesses
Camera         | Rich texture, color, high resolution   | No depth; fails in darkness
LiDAR          | Precise 3D geometry; works in darkness | No color/texture; expensive
IMU            | High-frequency motion updates          | Drifts over time; no absolute position
GPS            | Absolute global position               | 1–5 m error; doesn't work indoors
Wheel encoders | Smooth short-term motion               | Accumulates error (wheel slip, drift)

Fusion compensates for each sensor's weaknesses:

  • Camera + LiDAR: 3D bounding boxes with visual classification
  • IMU + GPS: Smooth trajectory with absolute position correction
  • Wheel encoders + IMU: Accurate short-term odometry, corrected by periodic landmarks
Note

Self-driving cars use camera + LiDAR + radar + GPS + IMU. Redundancy is critical — if LiDAR fails (heavy rain), cameras and radar keep the car safe. If GPS drops out (tunnel), IMU and odometry maintain position until GPS returns.

Early vs. Late Fusion

Two fundamental strategies for combining sensors:

Early Fusion (Sensor-Level)

Combine raw data before processing:

  • Align camera and depth images pixel-by-pixel
  • Merge LiDAR points with camera colors
  • Create a unified representation (e.g., colored point cloud)

Pros:

  • Maximum information preserved
  • More accurate when sensors are perfectly aligned

Cons:

  • Requires tight synchronization
  • Calibration is critical (misalignment ruins everything)
  • Computationally expensive
Early Fusion: RGB-D Point Cloud
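As a concrete sketch of early fusion, here is a minimal RGB-D back-projection that builds a colored point cloud. It assumes a pinhole camera model with intrinsics fx, fy, cx, cy, and a depth image already registered pixel-for-pixel to the color image — that alignment is exactly the calibration burden mentioned above:

```python
import numpy as np

def depth_to_colored_cloud(depth, rgb, fx, fy, cx, cy):
    """Back-project a depth image to 3D and attach each pixel's color.

    Assumes depth (H, W) in meters and rgb (H, W, 3) are already
    registered (aligned pixel-for-pixel) -- the hard part of early fusion.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx           # pinhole back-projection
    y = (v - cy) * z / fy
    valid = z > 0                   # drop pixels with no depth reading
    points = np.stack([x[valid], y[valid], z[valid]], axis=1)
    colors = rgb[valid]
    return points, colors           # (N, 3) positions + (N, 3) colors

# Tiny 2x2 example: one pixel has no depth, so the cloud has 3 points.
depth = np.array([[1.0, 2.0], [0.0, 1.5]])
rgb = np.zeros((2, 2, 3), dtype=np.uint8)
pts, cols = depth_to_colored_cloud(depth, rgb, fx=100, fy=100, cx=1, cy=1)
print(pts.shape)  # (3, 3)
```

Because fusion happens at the pixel level, a single miscalibrated intrinsic shifts every 3D point — which is why early fusion lives or dies by calibration.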

Late Fusion (Decision-Level)

Process each sensor independently, then combine results:

  • Camera detects "person" at pixel (320, 240)
  • LiDAR measures distance 2.5m at that angle
  • Combine: "person at (x=2.5m, y=0, z=0)"

Pros:

  • Each sensor can run at its own rate
  • Easier to handle sensor failures (just drop that input)
  • More modular — swap sensors without rewriting the whole pipeline

Cons:

  • Information loss (each sensor's pipeline makes irreversible decisions)
  • Harder to resolve conflicts (camera says "person", LiDAR says "empty")
Late Fusion: Detection + Depth
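A late-fusion version of the person example above, as a sketch: assume the camera pipeline reports only a detection's pixel column, and the LiDAR pipeline reports only a range at the matching bearing (the small-angle pinhole bearing model and the 90° field of view are illustrative):

```python
import math

def fuse_detection_with_range(pixel_x, image_width, hfov_deg, lidar_range):
    """Turn a 2D detection into a 3D position using a LiDAR range.

    Minimal late fusion: each pipeline has already reduced its raw data
    to a few numbers, and only those summaries are combined.
    """
    # Bearing of the pixel relative to the optical axis.
    bearing = math.radians((pixel_x / image_width - 0.5) * hfov_deg)
    x = lidar_range * math.cos(bearing)   # forward
    y = lidar_range * math.sin(bearing)   # left/right
    return x, y

# Camera says "person" at pixel column 320 of 640 (image center);
# LiDAR says the range at that bearing is 2.5 m.
x, y = fuse_detection_with_range(320, 640, hfov_deg=90, lidar_range=2.5)
print(round(x, 2), round(y, 2))  # 2.5 0.0
```

Note how little data crosses the interface — a pixel coordinate and a range. That narrow interface is what makes late fusion modular and easy to debug.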
Tip

For robotics, late fusion is more common — it's robust to sensor timing differences and easier to debug. Early fusion is used when you need the absolute best accuracy (e.g., 3D reconstruction, precision manipulation).

The Kalman Filter: Fusion in Motion

When tracking moving objects (or your own robot's position), you need to handle:

  • Noisy measurements — sensors aren't perfect
  • Predictions — where will the object be next?
  • Updates — how do we correct the prediction when new data arrives?

The Kalman filter is the standard solution. It's a recursive algorithm that:

  1. Predicts the next state using a motion model
  2. Updates the prediction using new sensor data
  3. Weights the prediction vs. measurement based on their uncertainties

Kalman Filter Example: Tracking a Ball

1D Kalman Filter (Position Tracking)
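A minimal sketch of those three steps for a single position variable (the class name and the noise values are illustrative, not a library API):

```python
class Kalman1D:
    """Minimal 1D Kalman filter tracking a position."""

    def __init__(self, x0, p0, process_noise, measurement_noise):
        self.x = x0                 # state estimate (position)
        self.p = p0                 # estimate variance (uncertainty)
        self.q = process_noise      # how much we distrust the motion model
        self.r = measurement_noise  # how much we distrust the sensor

    def predict(self, velocity, dt):
        # 1. Predict: move the state with the motion model, grow uncertainty.
        self.x += velocity * dt
        self.p += self.q

    def update(self, z):
        # 2. Update: the Kalman gain weighs prediction vs. measurement.
        k = self.p / (self.p + self.r)
        self.x += k * (z - self.x)  # 3. blend toward measurement by gain k
        self.p *= (1 - k)           # certainty grows after each measurement
        return self.x

# Track a ball moving at ~1 m/s from noisy position readings.
kf = Kalman1D(x0=0.0, p0=1.0, process_noise=0.01, measurement_noise=0.5)
for z in [1.1, 2.0, 2.9, 4.2]:
    kf.predict(velocity=1.0, dt=1.0)
    est = kf.update(z)
```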

The beauty: if the sensor is noisy (high measurement noise), the filter trusts the prediction more; if the motion model is uncertain (high process noise), it trusts the sensor more. The Kalman gain computes this trade-off from the two uncertainties, so you get optimal fusion automatically.

Practical Sensor Fusion Examples

1. Camera + LiDAR for Object Detection

  • Camera: Detect "car" with 2D bounding box
  • LiDAR: Measure point cloud in that bbox region
  • Fusion: Fit 3D bounding box to LiDAR points, label it "car"
  • Result: 3D position, orientation, and size of the car

2. GPS + IMU for Drone Localization

  • GPS: Noisy absolute position, 1 Hz
  • IMU: Clean acceleration/rotation, 100 Hz
  • Fusion (Kalman): Integrate IMU for smooth high-frequency position, correct with GPS every second
  • Result: 100 Hz position updates with GPS-level long-term accuracy
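This predict-correct loop can be sketched in a toy 1D version: integrate a biased "IMU" velocity at 100 Hz and blend toward a "GPS" fix at 1 Hz. All numbers are made up, and a fixed 0.5 blend weight stands in for a proper Kalman gain:

```python
dt = 0.01                       # 100 Hz IMU rate
true_v = 1.0                    # actual forward speed, m/s
bias = 0.05                     # IMU integration bias -> drift (m/s)
est = 0.0
for step in range(1, 301):      # 3 seconds of flight
    est += (true_v + bias) * dt     # predict: integrate biased IMU velocity
    if step % 100 == 0:             # 1 Hz GPS fix arrives
        gps = step * dt * true_v    # noiseless GPS for the sketch
        est += 0.5 * (gps - est)    # correct: pull estimate toward GPS
# Without corrections the estimate would drift to 3.15 m; with them it
# stays within a few centimeters of the true 3.0 m.
```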

3. Stereo Camera + Wheel Odometry for Navigation

  • Wheel encoders: Fast, smooth motion estimates (but drift over time)
  • Stereo camera: Slow, precise position from visual landmarks
  • Fusion: Use odometry between landmark observations, reset drift when landmarks are detected
  • Result: Accurate position even during fast motion or wheel slip
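A toy run of this drift-and-reset pattern (the 2% slip figure and the landmark positions are made up):

```python
def run(landmark_steps):
    est = 0.0
    for step in range(1, 101):      # 100 wheel-odometry steps of 0.1 m
        est += 0.1 * 1.02           # encoders over-count 2% (wheel slip)
        if step in landmark_steps:
            est = step * 0.1        # landmark fix resets accumulated drift
    return abs(est - 10.0)          # error vs. true position of 10 m

drift_raw = run(set())              # pure odometry
drift_fused = run({40, 80})         # odometry + two landmark fixes
```

Pure odometry accumulates the full 2% error over 10 m (0.2 m); with two landmark fixes only the drift since the last fix remains (0.04 m).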

What's Next?

You've now learned the fundamentals of robot perception — cameras, LiDAR, depth sensing, object detection, and sensor fusion. In the next module, we'll explore localization and mapping — how robots figure out where they are and build maps of their environment using these sensors.
