The Chicken-and-Egg Problem

Here's a riddle: to build a map, you need to know where you are. To know where you are, you need a map. So if you have neither... where do you start?

That's the SLAM problem — Simultaneous Localization And Mapping. The robot must build a map of an unknown environment while simultaneously figuring out its position on that incomplete map. It's like drawing a floor plan of a dark building while blindfolded, using only your footsteps and a flashlight to sense nearby walls.

SLAM is considered one of the fundamental problems in mobile robotics, and solving it robustly is what separates toy robots from serious autonomous systems.

Why SLAM is Hard

Let's break down the difficulty:

Circular Dependency

You need your position to add sensor data to the map in the right place. You need the map to figure out your position. Every measurement depends on every previous measurement — errors compound.

Sensor Drift

Odometry (wheel encoders, IMU) accumulates error over time. Walk 100 meters counting steps, and you might be off by several meters. The map you're building is based on these drifting position estimates.

Data Association

When the robot sees a corner, is it a new corner to add to the map, or one it saw 30 seconds ago? Getting this wrong creates duplicate landmarks or destroys your map.

Scale

A building might have thousands of landmarks. Tracking the relationships between all of them requires sophisticated data structures.

Note

The breakthrough that made SLAM practical was realizing you don't need to maintain a perfect map constantly — you just need to track uncertainty and occasionally correct accumulated errors through "loop closure" (which we'll cover in the next lesson).

Two Main Approaches

SLAM algorithms fall into two broad categories:

1. Feature-Based SLAM

The robot extracts distinctive landmarks (corners, edges, specific objects) from sensor data and tracks them as it moves. The map is a list of landmark positions.

How it works:

Detect features in sensor data (camera: corners, edges; LiDAR: lines, planes)
Match features across consecutive observations to track them
Use feature positions to estimate robot motion
Refine both robot trajectory and feature positions together

Pros:

Compact maps (just landmark coordinates)
Fast for sparse environments
Good for visual SLAM with cameras

Cons:

Requires distinctive features (fails in featureless hallways)
Matching features across views is hard

Feature-Based SLAM Outline

2. Graph-Based SLAM

Instead of maintaining a single map, the algorithm builds a graph where nodes are robot poses at different times, and edges are spatial constraints (odometry measurements, landmark observations).

The magic happens during graph optimization — periodically, the algorithm adjusts all poses to make the constraints as consistent as possible. This is like adjusting a house of cards until all the pieces fit together without contradiction.

How it works:

Add a node for each robot pose as it moves
Add edges between consecutive poses (from odometry)
When recognizing a previous location, add a "loop closure" edge
Periodically optimize the graph to minimize constraint violations
Build the map from the optimized trajectory

Pros:

Handles large-scale environments
Naturally incorporates loop closure
Can produce occupancy grids or feature maps

Cons:

Periodic optimization can be slow
Requires good loop closure detection

Tip

Modern SLAM systems often combine both approaches — use features for tracking frame-to-frame, but maintain a pose graph for global optimization.

The Role of Filtering vs. Smoothing

There are two philosophies for handling SLAM uncertainty:

Filtering (EKF-SLAM, FastSLAM)

Maintain a probability distribution over the current robot pose and map. Update this distribution incrementally with each new measurement. Once a decision is made about past poses, it's locked in.

Fast per-update
Can't revise history
Errors accumulate

Smoothing (Graph-Based SLAM)

Maintain a history of poses and constraints. Periodically re-optimize the entire trajectory, revising past poses if new evidence (like loop closure) suggests they were wrong.

Slower periodic optimization
Can correct past errors
Better final map quality

Most modern SLAM systems use smoothing because loop closure (next lesson) is critical for long-term accuracy, and smoothing naturally incorporates it.

Visual SLAM vs. LiDAR SLAM

SLAM can use different sensors:

Sensor Type	Pros	Cons	Common Use
Camera	Rich features, cheap, passive	Struggles in darkness, affected by lighting	Drones, AR/VR headsets
LiDAR	Works in any lighting, accurate depth	Expensive, can't read text/color	Self-driving cars, indoor robots
Both (Fusion)	Best of both worlds	Complexity, sensor synchronization	High-end autonomous systems

Visual SLAM (using cameras) is called VSLAM and often uses feature matching (ORB-SLAM, PTAM). LiDAR SLAM often uses scan matching (ICP, Cartographer).

What's Next?

SLAM works surprisingly well for short-term mapping, but there's a catch: errors slowly accumulate as the robot explores. The solution is loop closure — recognizing when you've returned to a previously visited place. That's our next lesson, and it's the secret ingredient that makes long-term SLAM possible.

SLAM Explained