Innovation

Now, robots can listen like humans

Researchers are working to bring a sense of acoustic interpretation to robots.

Imagine sitting in a dark movie theater, wondering how much soda is left in your oversized cup. Instead of taking off the lid to check, you give it a little shake, listening for the rattle of ice cubes to estimate whether you’ll need a refill. As you set the cup down, your mind wanders to the armrest. A few taps and the hollow sound convinces you it’s plastic, not wood.

This effortless ability to interpret the world through sound is second nature to us. Now, researchers are working to bring this sense of acoustic interpretation to robots, adding to their growing arsenal of sensory capabilities.

At the upcoming Conference on Robot Learning (CoRL 2024) in Munich, Germany, a team from Duke University will present new research on a system called SonicSense, which allows robots to perceive their environment through sound in ways previously limited to humans.

“Robots today primarily rely on vision to understand the world,” said Jiaxun Liu, lead author of the study and a first-year Ph.D. student in Boyuan Chen’s lab at Duke’s Department of Mechanical Engineering. “We wanted to create a solution that could handle the complexity of everyday objects, giving robots a richer ability to ‘feel’ and interpret their surroundings.”

SonicSense is equipped with a robotic hand featuring four fingers, each with a contact microphone embedded in its fingertip. These sensors detect vibrations generated when the robot taps, grasps, or shakes an object, filtering out ambient noise to focus on the object’s acoustic signals.
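The article does not spell out the signal-processing details, but the core idea of isolating an object’s contact vibrations from ambient noise can be sketched with a simple high-pass filter. The sampling rate and cutoff frequency below are illustrative assumptions, not values from the study:

```python
# Illustrative sketch (not the authors' code): isolate contact vibrations
# picked up by a fingertip microphone by removing low-frequency ambient noise.
import numpy as np
from scipy.signal import butter, filtfilt

SAMPLE_RATE = 48_000   # assumed sampling rate (Hz)
CUTOFF_HZ = 100        # assumed cutoff; a real system would tune this

def isolate_contact_signal(raw: np.ndarray) -> np.ndarray:
    """High-pass filter a raw fingertip-microphone recording."""
    b, a = butter(4, CUTOFF_HZ / (SAMPLE_RATE / 2), btype="highpass")
    return filtfilt(b, a, raw)

# Example: one second of synthetic audio (low-frequency hum plus a short tap)
t = np.linspace(0, 1, SAMPLE_RATE, endpoint=False)
hum = 0.5 * np.sin(2 * np.pi * 60 * t)                  # ambient noise
tap = np.exp(-80 * t) * np.sin(2 * np.pi * 2_000 * t)   # object vibration
clean = isolate_contact_signal(hum + tap)
```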

By analyzing these vibrations, SonicSense can determine an object’s material and 3D shape. Using its AI-powered system, it can identify familiar objects in as few as four interactions. For unfamiliar objects, it may take up to 20 interactions to reach a conclusion.
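The article does not describe the recognition model itself, so the following is only a minimal sketch of the general idea under stated assumptions: each tap is reduced to a coarse spectral feature vector, a generic classifier scores the object’s material, and evidence is pooled across several interactions before committing to an answer. The feature choice, the classifier, and the synthetic training data are all placeholders, not the SonicSense design:

```python
# Illustrative sketch (not the SonicSense model): classify an object's material
# by pooling spectral features across repeated tap interactions.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def spectral_features(signal: np.ndarray, n_bins: int = 32) -> np.ndarray:
    """Reduce a tap recording to a coarse log-magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(signal))
    bins = np.array_split(spectrum, n_bins)
    return np.log1p(np.array([b.mean() for b in bins]))

# Assume a small labeled set of tap recordings for known materials
# (random placeholders here; real recordings would come from the hand).
rng = np.random.default_rng(0)
train_signals = [rng.standard_normal(4_800) for _ in range(40)]
train_labels = ["plastic", "wood", "metal", "glass"] * 10
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit([spectral_features(s) for s in train_signals], train_labels)

def identify(interactions: list[np.ndarray]) -> str:
    """Majority vote over the taps gathered so far (roughly four for familiar
    objects, more for unfamiliar ones, per the article's description)."""
    votes = clf.predict([spectral_features(s) for s in interactions])
    values, counts = np.unique(votes, return_counts=True)
    return values[counts.argmax()]
```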

“SonicSense gives robots a new way to hear and feel, similar to humans, which could revolutionize how they interact with objects,” explained Boyuan Chen. “While vision is important, sound offers additional information that can reveal details invisible to the eye.”

In their research, Chen’s team demonstrated SonicSense’s impressive abilities. For example, by shaking a box of dice, the system can count the dice and determine their shape. When shaking a bottle of water, it can accurately measure the liquid inside. By tapping around an object, SonicSense can construct a 3D model of its shape and determine its material—just like a person feeling their way around an object in the dark.

Although other systems have attempted similar approaches, SonicSense goes beyond previous efforts by using four fingers instead of one, contact microphones that filter out ambient noise, and advanced AI algorithms. This combination allows it to identify objects with complex geometries, mixed materials, and even reflective or transparent surfaces—challenges that typically stump vision-based systems.

  • Source: press release from Duke University