Piecing It All Together
The Cacophony Project is ambitious, with many moving parts. It can be difficult to understand what our technology does and how the various components of the project fit together.
At a high level, the project is made up of a number of parts which feed data to each other.
Let's look at these parts in more detail, following the general flow of data. Links to related technologies and source code repositories are provided for readers with an interest in software development.
Audio recorder
This is where the Cacophony Project began. The Cacophonometer is an Android application which runs on inexpensive mobile phones. When installed in the field, it makes regular, objective, GPS-located, timestamped audio recordings. These recordings (which include bird-song) are uploaded to the Cacophony Project's cloud-based API server for storage and analysis.
Over time, the collected recordings can be used to determine bird population trends for a given location. Eventually we may also be able to automatically identify bird species in the recordings. Having a permanent, centralised and well-organised set of recordings opens up the possibility of a wide range of analyses.
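To make the flow concrete, here is a rough sketch of such an upload. The endpoint path, field names and token handling are illustrative assumptions for this article, not the real API:

    # Rough sketch of uploading one recording with its metadata. The endpoint
    # path, field names and token scheme are assumptions, not the real API.
    import json
    import requests

    API = "https://api.example.org"  # placeholder for the project's API server

    def upload_recording(token, wav_path, lat, lon, recorded_at):
        """Post a timestamped, GPS-located recording for storage and analysis."""
        metadata = {
            "type": "audio",
            "location": {"lat": lat, "lon": lon},
            "recordingDateTime": recorded_at,  # ISO 8601 timestamp
        }
        with open(wav_path, "rb") as f:
            response = requests.post(
                f"{API}/recordings",
                headers={"Authorization": token},
                data={"data": json.dumps(metadata)},
                files={"file": f},
            )
        response.raise_for_status()
        return response.json()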
Thermal video platform
To accurately detect predators in New Zealand's natural places, we have developed a thermal video platform. It's a rugged embedded system consisting of a Raspberry Pi computer paired with a FLIR Lepton 3 thermal camera. Additional custom hardware provides audio output, power scheduling and watchdog capabilities. It can run on battery or mains power.
On-board software watches the thermal camera feed and uses a highly-tuned motion detection algorithm to detect warm moving objects (hopefully an animal). When an animal is detected, a recording is made which is then uploaded to the Cacophony Project API server for storage and analysis.
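The production detector is carefully tuned for the camera's characteristics, but the essence of the approach, comparing each frame against a slowly adapting background estimate and triggering when enough pixels warm up, can be sketched briefly. The thresholds below are made-up placeholders:

    # Minimal frame-differencing sketch of thermal motion detection.
    # The real detector is far more heavily tuned; thresholds are placeholders.
    import numpy as np

    DELTA_THRESHOLD = 40      # raw sensor counts a pixel must rise by (assumed)
    TRIGGER_PIXELS = 25       # how many changed pixels count as motion (assumed)
    BACKGROUND_WEIGHT = 0.98  # how slowly the background estimate adapts

    def detect_motion(frames):
        """Yield True for each frame that looks like a warm moving object."""
        background = None
        for frame in frames:  # each frame: a 2-D array of raw thermal values
            frame = frame.astype(np.float32)
            if background is None:
                background = frame.copy()
            delta = frame - background  # warm new objects show up positive
            moving = np.count_nonzero(delta > DELTA_THRESHOLD)
            # Fold the frame into the slowly adapting background estimate.
            background = (BACKGROUND_WEIGHT * background
                          + (1 - BACKGROUND_WEIGHT) * frame)
            yield moving > TRIGGER_PIXELS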
Because the thermal video feed itself is used to detect animals, the camera is many times more sensitive than typical trail cameras, which rely on a passive infrared sensor to trigger recording. This makes it the perfect tool for observing small, fast-moving predators such as stoats, which are otherwise difficult to detect.
Our thermal video platform is more than just a camera. Because it's based around a powerful computer, it's exceptionally flexible: for example, it can play audio on a schedule to lure predators and can interface with external hardware to trigger traps.
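As an illustration of that flexibility (and not our actual lure or trap software), a short script on the Pi could drive both behaviours. The pin number, schedule and audio path are invented for the example:

    # Hypothetical sketch of schedule-driven audio lures and trap triggering
    # on a Raspberry Pi. The pin number, times and file path are invented.
    import datetime
    import subprocess
    import time

    import RPi.GPIO as GPIO

    TRAP_PIN = 17                    # assumed GPIO pin wired to a trap
    LURE_TIMES = {"21:00", "23:30"}  # assumed night-time lure schedule

    GPIO.setmode(GPIO.BCM)
    GPIO.setup(TRAP_PIN, GPIO.OUT)

    def trigger_trap():
        """Pulse the trap's trigger line; detection logic would call this."""
        GPIO.output(TRAP_PIN, GPIO.HIGH)
        time.sleep(0.5)
        GPIO.output(TRAP_PIN, GPIO.LOW)

    while True:
        if datetime.datetime.now().strftime("%H:%M") in LURE_TIMES:
            subprocess.run(["aplay", "/var/lib/lures/lure.wav"])  # ALSA player
        time.sleep(60)  # check the schedule once a minute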
Sidekick app
When installing one of our thermal video devices it's important to be able to see what the camera is looking at. We have developed the Sidekick application, which lets users wirelessly connect to a nearby thermal camera and view its live thermal video feed. This is essential for ensuring the camera is aligned correctly.
The use of a wireless connection means there's no messing around with cables or SD cards in the rain, and no extra openings in the case which could let in moisture or dust.
To address situations where mobile network coverage isn't available, Sidekick can also retrieve recordings from nearby thermal camera devices. A person carrying a phone running Sidekick can simply walk up to a thermal camera and collect its recordings. These can be uploaded to the API server later, once internet connectivity is available, ensuring that all recordings are kept in one central location for later retrieval.
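The retrieval step might look something like the sketch below. The device address and endpoint paths are assumptions made for illustration, not the camera's real interface:

    # Hedged sketch of Sidekick-style offload over a direct wireless link.
    # The device address and endpoint paths are illustrative assumptions.
    import pathlib
    import requests

    DEVICE = "http://192.168.4.1"  # a typical hotspot address; assumed here
    STORE = pathlib.Path("offloaded")
    STORE.mkdir(exist_ok=True)

    def offload_recordings():
        """Copy every recording off a nearby camera for later upload."""
        for rec in requests.get(f"{DEVICE}/recordings").json():
            data = requests.get(f"{DEVICE}/recordings/{rec['id']}").content
            (STORE / rec["name"]).write_bytes(data)
            # Once safely stored, the copy on the camera can be removed.
            requests.delete(f"{DEVICE}/recordings/{rec['id']}")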
API server
The API server is in many ways the heart of the Cacophony Project, providing a structured repository of recordings and metadata. As the central data store, most of the components of the Cacophony Project interact with the API server.
The server exposes a conventional, well-documented REST API. It is backed by a SQL database for metadata and an object store for the recordings themselves.
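A typical interaction, with endpoint names and parameters invented for the example, authenticates once and then queries recordings:

    # Illustrative REST usage. Endpoint paths, parameters and response shapes
    # are assumptions standing in for the real, documented API.
    import requests

    API = "https://api.example.org"

    # Authenticate once, then present the token on subsequent requests.
    token = requests.post(
        f"{API}/authenticate_user",
        json={"username": "alice", "password": "example-password"},
    ).json()["token"]

    # Metadata queries are served from the SQL database...
    recent = requests.get(
        f"{API}/recordings",
        headers={"Authorization": token},
        params={"type": "thermalRaw", "limit": 20},
    ).json()

    # ...while the recording files themselves come from the object store.
    for rec in recent["rows"]:
        print(rec["id"], f"{API}/recordings/{rec['id']}/download")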
Machine learning pipeline
The use of machine learning to automatically identify predators is one of the novel and exciting aspects of the Cacophony Project. We have developed a software pipeline which takes the thermal video recordings collected from the field, processes them, and uses them to train and test a classifier model.
Each training run takes many hours and results in a TensorFlow model which can be used to identify predators in thermal video footage. This model is applied to thermal video recordings as they are uploaded to the API server, but will soon be run directly on our cameras to allow autonomous detection of predators.
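As a flavour of what the pipeline produces, here is a heavily simplified TensorFlow sketch: a small per-frame CNN pooled across time. The clip dimensions, labels and layers are simplified assumptions, and the real pipeline does substantial tracking and preprocessing before anything reaches the model:

    # Heavily simplified sketch of training a thermal clip classifier.
    # Clip dimensions, labels and architecture are assumptions for this article.
    import tensorflow as tf

    FRAMES, HEIGHT, WIDTH = 27, 48, 48  # assumed clip dimensions
    LABELS = ["possum", "rat", "stoat", "bird", "false-positive"]

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(FRAMES, HEIGHT, WIDTH, 1)),
        # Run a small CNN over every frame, then pool across time.
        tf.keras.layers.TimeDistributed(
            tf.keras.layers.Conv2D(16, 3, activation="relu")),
        tf.keras.layers.TimeDistributed(tf.keras.layers.MaxPooling2D()),
        tf.keras.layers.TimeDistributed(tf.keras.layers.Flatten()),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(len(LABELS), activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    # train_clips: (N, FRAMES, HEIGHT, WIDTH, 1) arrays; train_labels: indices.
    # model.fit(train_clips, train_labels, epochs=10, validation_split=0.1)
    # model.save("thermal-classifier")  # the artifact applied to new uploads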
Upload processing
Newly uploaded recordings are processed to convert them into formats people can consume and to extract interesting information from them.
For audio recordings, some basic file format conversion and signal processing is done. We'll soon also be doing bird-song analysis at this stage.
For thermal video recordings, our machine learning classifier is applied to identify and tag predators. This gives us feedback on how well the classifier is performing, and the automated tags help us quickly home in on interesting recordings when we check in on recent activity.
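A processing worker might dispatch on recording type roughly as follows. The job-queue endpoints, tag format and classify() stub are illustrative assumptions, with ffmpeg standing in for the conversion step:

    # Sketch of a processing worker. The job endpoints, tag format and the
    # classify() stub are illustrative assumptions, not the real pipeline.
    import subprocess
    import requests

    API = "https://api.example.org"

    def classify(path):
        """Stub standing in for the trained TensorFlow model."""
        return "possum", 0.9

    def process_next_job(token):
        headers = {"Authorization": token}
        job = requests.get(f"{API}/processing/next", headers=headers).json()
        if job["type"] == "audio":
            # Basic file format conversion, e.g. the raw upload to MP3.
            subprocess.run(["ffmpeg", "-i", job["raw_file"], "out.mp3"],
                           check=True)
        elif job["type"] == "thermalRaw":
            # Apply the classifier and record its verdict as an automatic tag.
            label, confidence = classify(job["raw_file"])
            requests.post(f"{API}/recordings/{job['id']}/tags",
                          headers=headers,
                          json={"what": label, "confidence": confidence,
                                "automatic": True})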
Web Interface
We have developed a sophisticated web interface which can be used to quickly query recordings, play them back, add tags and comments, and download them. There are also comprehensive administrative features for controlling access to recordings. The web interface is one part of the system that the Cacophony Project team and end users interact with every day.
Management
[Image: server metrics dashboard]
To keep track of all of the project's components, we make heavy use of SaltStack for configuration management and remote command execution. We use a time series database to track the long-term health of our systems, with various dashboards to visualise what's going on.
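As a small example of the kind of health data involved, a component could report a metric with one HTTP call, here assuming an InfluxDB 1.x-style write endpoint (the host, database name and measurement are invented):

    # Hedged example of reporting one health metric in InfluxDB line protocol.
    # The endpoint, database name and measurement are invented for this sketch.
    import time
    import requests

    def report_disk_free(host, gigabytes):
        point = f"disk_free,host={host} value={gigabytes} {time.time_ns()}"
        requests.post("http://metrics.example.org:8086/write",
                      params={"db": "cacophony"}, data=point)

    report_disk_free("api-server", 123.4)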
What's next?
We hope that this article has provided some interesting background on what we've created so far. We firmly believe that a powerful and flexible system like this has the potential to turn the tide in the fight for New Zealand's native birds.
Look out for an upcoming article which explains our roadmap and likely changes to the project's architecture.