Today, we’re taking a deep dive into the Earth Species Project (ESP)—a trailblazing non-profit harnessing artificial intelligence to decode animal communication. From the haunting codas of sperm whales to the whispered dialects of crows, ESP is unraveling the complex “languages” that animals use, revealing intelligences we never fully appreciated. With 2025 marking a banner year of advancements, including a major workshop at NeurIPS and fresh open-source releases, this field is evolving faster than ever.
In this post, I’ll unpack ESP’s mission, its evolutionary history, the sophisticated AI technologies powering their work, ongoing experiments with a spotlight on 2025 highlights, key achievements, and the profound implications for conservation and ethics. Drawing from their 2024 annual report (which looks ahead to 2025), recent GitHub updates, and community calls, we’ll go beyond the headlines to understand how these tools are creating a “virtuous cycle” of discovery. Let’s embark on this journey: consider it your insider’s map to interspecies AI.
What is the Earth Species Project?
At its heart, the Earth Species Project is a non-profit research lab and impact organization dedicated to decoding non-human communication through AI.
Launched in 2017, ESP envisions a world where understanding animal “languages” fosters deeper relationships with nature, ultimately safeguarding biodiversity amid escalating threats like climate change and habitat destruction.
They approach animal vocalizations not as mere signals but as intricate systems—complete with syntax, semantics, emotion, and cultural variation—using machine learning to translate them into insights humans can grasp.
ESP’s global, fully remote team blends AI researchers, ethologists, engineers, and conservationists, emphasizing diversity and ethical innovation. Their board boasts luminaries like Christiana Figueres (former UN climate chief), Brewster Kahle (Internet Archive founder), and Kay Firth-Butterfield (AI governance expert), ensuring a holistic perspective. Core values, such as “collective foresight,” “moral courage,” and “playful curiosity,” drive decisions, prioritizing open science and tools that empower researchers worldwide without exploiting wildlife.
A Brief History: From Visionary Spark to AI Frontier
ESP emerged from the founders’ conviction that AI’s rapid progress in the late 2010s could revolutionize bioacoustics.
Britt Selvitelle (Twitter co-founder) and Aza Raskin (Mozilla Labs co-founder and Center for Humane Technology advocate) kicked things off, later joined by Katie Zacarian, a conservationist, AI leader, and underwater photographer, who went on to become CEO. Early seed funding from donors like Reid Hoffman and the Waverley Street Foundation fueled initial experiments in pattern recognition for animal sounds.
The 2020s accelerated: 2022-2023 saw the launch of benchmarks like BEANS for evaluating bioacoustic models. 2024 was transformative, with the release of NatureLM-audio and milestones detailed in their annual report, laying groundwork for “bold strides” in 2025. Heading into this year, ESP raised $17M in grants to expand their AI capabilities, focusing on cultural shifts toward interspecies empathy. Leadership remains stable, with Zacarian at the helm, though the organization continues to evolve through community input and new hires in communications and research.
Core Technologies: The AI Arsenal Unlocking Animal Tongues
ESP’s “AI-first” strategy creates multimodal models that integrate audio, video, and behavioral data, forming a feedback loop where discoveries refine algorithms. Here’s a closer look at their toolkit, emphasizing 2025 refinements:
NatureLM-audio: This groundbreaking large audio-language model, open-sourced in 2024 and iterated in 2025, is tailored for bioacoustics.
Trained on millions of hours from Xeno-Canto, iNaturalist, and human speech datasets, it excels in zero-shot learning: identifying unseen species, inferring emotions (e.g., aggressive vs. affiliative tones), counting callers in choruses, and synthesizing calls. A standout feature is “domain transfer”: leveraging human language patterns to achieve 20% accuracy in naming novel species, hinting at universal linguistic structures. In 2025, an interactive Hugging Face demo invites global tinkering, accelerating ethological applications.
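To make that zero-shot workflow concrete, here’s a minimal sketch of what prompting an audio-language model like this could look like. Fair warning: the import, class, and checkpoint ID below are hypothetical placeholders, not ESP’s actual API; check the NatureLM-audio repo and the Hugging Face demo for the real interface.

```python
import librosa

# Hypothetical wrapper around an audio-language model: audio + instruction
# in, free-form text out. These names are placeholders, not ESP's real API.
from naturelm_audio import NatureLM  # hypothetical package/import

model = NatureLM.from_pretrained("EarthSpeciesProject/NatureLM-audio")  # placeholder ID

# Load a field recording at the 16 kHz rate audio models commonly expect
audio, sr = librosa.load("mystery_call.wav", sr=16_000)

# One checkpoint, many zero-shot tasks: species ID, affect, caller counting
for prompt in (
    "What species is vocalizing in this clip?",
    "Is the tone aggressive or affiliative?",
    "How many distinct callers do you hear?",
):
    print(prompt, "->", model.generate(audio=audio, prompt=prompt))
```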
AVES and BirdAVES: Self-supervised encoders for vocalizations; BirdAVES, updated in 2025, boosts bird-task performance by over 20% via specialized fine-tuning. These models thrive on noisy, real-world data, generalizing across taxa.
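Because AVES builds on self-supervised speech encoders in the HuBERT family, a typical downstream use is extracting clip embeddings for clustering or lightweight classifiers. Here’s a minimal TorchAudio sketch, assuming you’ve pulled an AVES or BirdAVES checkpoint from the ESP repo (the checkpoint filename below is a placeholder):

```python
import torch
import torchaudio

# Build a base HuBERT-style encoder; in practice you would load ESP's
# released AVES or BirdAVES weights into it (filename is a placeholder).
model = torchaudio.models.hubert_base()
# model.load_state_dict(torch.load("birdaves_checkpoint.pt"))  # hypothetical
model.eval()

waveform, sr = torchaudio.load("field_recording.wav")
waveform = torchaudio.functional.resample(waveform, sr, 16_000)

with torch.no_grad():
    # extract_features returns per-layer frame embeddings plus lengths
    features, _ = model.extract_features(waveform)

# Mean-pool the last layer into one vector per clip, ready for clustering
clip_embedding = features[-1].mean(dim=1)
print(clip_embedding.shape)  # (1, 768) for a base-sized encoder
```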
Specialized Tools:
Voxaboxen: An annotation platform for precise vocalization labeling, updated on GitHub in 2025 for collaborative use.
Biodenoising: AI-driven noise removal for field recordings, essential for decoding faint signals in wild environments (a hand-rolled illustration of the idea follows this list).
BEANS and BEANS-ZERO: Benchmarks for classification and zero-shot tasks, now including 2025 extensions for multimodal evaluation.
BEBE: Integrates bio-logger movement data with audio, updated in 2025 for holistic behavior analysis.
All are open-source on GitHub, democratizing access and inviting contributions.
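For a feel of what a tool like Biodenoising automates with learned models, here’s a deliberately crude spectral-gating denoiser. This hand-rolled version is purely illustrative; it is not ESP’s algorithm, and a real field pipeline would use the Biodenoising package itself.

```python
import numpy as np
import librosa
import soundfile as sf

# Crude spectral gating: estimate a per-frequency noise floor from the
# quietest frames, then zero out energy that does not clearly exceed it.
y, sr = librosa.load("field_recording.wav", sr=None)
stft = librosa.stft(y)
magnitude, phase = np.abs(stft), np.angle(stft)

# The 10th-percentile magnitude per frequency bin approximates the noise floor
noise_floor = np.percentile(magnitude, 10, axis=1, keepdims=True)
mask = magnitude > 2.0 * noise_floor  # keep only clearly above-floor energy
cleaned = magnitude * mask * np.exp(1j * phase)

sf.write("denoised.wav", librosa.istft(cleaned), sr)
```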
Ongoing Projects and 2025 Experiments
ESP’s “decode experiments” target species-specific questions, partnering with biologists for fieldwork. 2025 emphasizes playback studies—testing synthetic calls on live animals—and large-scale data synthesis.
Crow Dialects: NatureLM-audio reveals “family dialects” in Hawaiian crows, where captives maintain tool-use-linked vocal traditions; 2025 reintroduction efforts use this for monitoring.
Whale and Dolphin Codas: Collaborating with Project CETI, ESP decodes click sequences with rhythmic “vowels,” experimenting with synthetic playbacks on humpback whales and zebra finches to test responses.
Frog Calls: While not a direct ESP project, their models align with Australia’s FrogID initiative, where citizen scientists logged 34,000+ calls in 2024; 2025’s FrogID Week (Nov 7-16) could integrate NatureLM for automated verification, saving hundreds of hours on species ID in noisy habitats.
A 2025 highlight: ESP’s NeurIPS workshop on AI for Animal Communication has opened a call for papers, fostering cross-disciplinary breakthroughs.
Key Achievements: Milestones That Echo Across Species
ESP’s impact is tangible: 2024’s NatureLM-audio release set state-of-the-art results on bioacoustics benchmarks, while 2025 GitHub activity (e.g., biodcase_2025_task1) advances public datasets.
Publications in Science and Nature, media spots (e.g., Joe Rogan #2076), and tools like BirdAVES, which have boosted accuracy on bioacoustic tasks by 20%+, have raised the field’s profile. Their Discord community, active since 2021, now drives collaborative decode projects.
Implications: Conservation, Ethics, and a Multispecies Future
By decoding emotions in calls (e.g., crow “mourning” or whale stress), ESP aids conservation—rerouting ships via distress alerts or tracking dialects for population health.
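As a toy illustration of that alerting idea, the sketch below slides a fixed window over an audio stream and raises an event when a detector crosses a threshold. The detector here is a stub (a simple energy heuristic) standing in for whatever trained model a real deployment would use; none of this is an ESP system.

```python
import numpy as np

SAMPLE_RATE = 16_000
WINDOW_SEC = 5.0
ALERT_THRESHOLD = 0.9  # illustrative; a real system would be carefully tuned

def distress_probability(window: np.ndarray) -> float:
    """Stub detector: loud, sustained energy scores higher.

    A real pipeline would replace this with a trained audio classifier.
    """
    rms = float(np.sqrt(np.mean(window ** 2)))
    return min(1.0, rms * 10.0)

def monitor(stream):
    """Slide a fixed-length window over hydrophone chunks, yielding alerts."""
    window_len = int(SAMPLE_RATE * WINDOW_SEC)
    buffer = np.zeros(0, dtype=np.float32)
    for chunk in stream:  # iterable of raw float32 sample arrays
        buffer = np.concatenate([buffer, chunk])
        while len(buffer) >= window_len:
            window, buffer = buffer[:window_len], buffer[window_len:]
            p = distress_probability(window)
            if p >= ALERT_THRESHOLD:
                yield {"event": "possible_distress", "confidence": round(p, 3)}

# Example with synthetic chunks: a loud burst should trigger alerts
chunks = [np.random.randn(8_000).astype(np.float32) * 0.5 for _ in range(20)]
for alert in monitor(chunks):
    print(alert)
```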
Ethically, they advocate for “animal privacy” and bias-free models, influencing global AI guidelines.
Looking ahead, 2025’s momentum could enable real-time “translators,” redefining human-nature bonds.
What animal voice intrigues you most? Share in the comments, and subscribe for more deep dives—like my upcoming series on cetacean AI. Until next time, keep listening to the wild! 🐋🔊
Sources and further reading: ESP’s site, GitHub repos, and NeurIPS announcements.
