Thermal Camera Reliability Revisited

A critical component of our Cacophonator devices is the FLIR Lepton 3 thermal camera. It's our eyes in the dark as we look for invasive predators. As discussed previously, reliably extracting video frames out of the camera has been a challenge. Our software would often "lose sync" with the camera, resulting in lost video data.

Here's an example of the result of loss of sync with the camera. Note how the possum appears to suddenly jump at one point.

We had found that keeping the Raspberry Pi's CPU clock frequency at its lowest rate (600 MHz) significantly reduced the loss of sync events. This approach wasn't perfect but the error rate was good enough and we've been running our Raspberry Pis like this for some time now.

We're planning to run our machine-learning-based classifier on the Cacophonator soon, which means we need access to more processing power. That in turn means allowing the Raspberry Pi to run at its higher clock rate (1.2 GHz) and revisiting the camera reliability issue.

Data and observations were gathered and sent to the Raspberry Pi developers. We noticed that loss of synchronisation with the camera seemed more likely when our recorder software was in the process of generating a Cacophony Project Thermal Video (CPTV) file. Our initial theory was that writes to the Raspberry Pi's SD card were somehow interfering with the camera, but when we configured the recorder to write to a RAM disk, the problem persisted. The extra load of encoding compressed CPTV files seemed to be increasing the rate of sync issues. This was backed up by some of the suggestions we got back from the Raspberry Pi team.

Could we separate the recorder code that communicates with the camera from the other parts which handle motion detection and generating CPTV files? If these parts were in their own processes perhaps we could prevent the CPU load caused by generating recordings from interfering with the camera interface.
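To make the idea concrete, here is a minimal sketch of that split in Python. It is not the project's actual code: the function names, the 8-byte header, the fake frame data, and the use of a socket pair are all assumptions made for illustration. The frame size assumes the Lepton 3's 160x120 resolution at 2 bytes per pixel. One process does nothing except produce frames; the other does the CPU-heavy work (compression stands in for CPTV encoding).

```python
import os
import socket
import struct
import zlib

# Assumption: 160x120 pixels at 2 bytes per pixel.
FRAME_BYTES = 160 * 120 * 2
# Hypothetical wire format: sequence number and payload length.
HEADER = struct.Struct("!II")


def recv_exact(sock, n):
    """Read exactly n bytes from the socket or raise EOFError."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise EOFError("peer closed the socket")
        buf.extend(chunk)
    return bytes(buf)


def camera_process(sock, count):
    """Stand-in for the camera interface: only reads and forwards frames."""
    for seq in range(count):
        frame = bytes([seq % 256]) * FRAME_BYTES  # fake pixel data
        sock.sendall(HEADER.pack(seq, len(frame)) + frame)
    sock.close()


def recorder_process(sock, count):
    """Stand-in for the recorder: receives frames and compresses them."""
    results = []
    for _ in range(count):
        seq, length = HEADER.unpack(recv_exact(sock, HEADER.size))
        frame = recv_exact(sock, length)
        results.append((seq, length, len(zlib.compress(frame))))
    return results


def run_pipeline(count=5):
    """Fork a camera process and consume its frames in this process."""
    camera_end, recorder_end = socket.socketpair()
    pid = os.fork()
    if pid == 0:  # child: camera side only
        recorder_end.close()
        try:
            camera_process(camera_end, count)
        finally:
            os._exit(0)
    camera_end.close()
    try:
        return recorder_process(recorder_end, count)
    finally:
        recorder_end.close()
        os.waitpid(pid, 0)


if __name__ == "__main__":
    for seq, raw, packed in run_pipeline():
        print(f"frame {seq}: {raw} bytes raw, {packed} bytes compressed")
```

Because the two halves only share a socket, the operating system can schedule them independently, and the consumer can be replaced without touching the camera side.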

Oscilloscope output sent to the Raspberry Pi team

A prototype of this approach was quickly hacked together and the initial results looked promising. Loss of sync events still happened, but less often. Things got even better when we isolated the camera reading part to its own CPU core. More research and experimentation led us to Linux's real-time process scheduling features. A real-time process gets priority over most other things running on the system, and running the camera interface process with real-time priority gave a huge improvement in reliability. In fact, we found that running the camera interface process at real-time priority mostly removed the need to give it its own CPU core.
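The two Linux features mentioned above, CPU affinity and real-time scheduling, can be sketched with Python's standard `os` module. This is an illustration only, not how the project's camera interface process does it: the helper names `pin_to_core` and `request_realtime` are hypothetical, and the choice of the `SCHED_FIFO` policy is an assumption (the post only says "real-time priority"). Requesting a real-time policy normally needs root or the `CAP_SYS_NICE` capability, so the sketch reports failure rather than crashing.

```python
import os


def pin_to_core(core):
    """Restrict the calling process to a single CPU core.

    Returns the resulting affinity set so the caller can confirm it.
    """
    os.sched_setaffinity(0, {core})  # pid 0 means "this process"
    return os.sched_getaffinity(0)


def request_realtime(priority=None):
    """Try to move the calling process to the SCHED_FIFO real-time policy.

    Returns True on success, False if the kernel refuses the request
    (typically a permission error when not running with CAP_SYS_NICE).
    """
    lowest = os.sched_get_priority_min(os.SCHED_FIFO)
    highest = os.sched_get_priority_max(os.SCHED_FIFO)
    if priority is None:
        priority = lowest
    priority = max(lowest, min(highest, priority))  # clamp to valid range
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
    except OSError:
        return False
    return True


if __name__ == "__main__":
    first_core = min(os.sched_getaffinity(0))
    print("affinity:", pin_to_core(first_core))
    print("real-time scheduling granted:", request_realtime())
```

A real-time process that never blocks can starve everything else on its core, so this kind of priority suits a process that spends most of its time waiting on hardware, as a camera reader does.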

Where we're at now:

The camera is much more reliable. Camera synchronisation is still occasionally lost but this is now a relatively rare event.
Camera reliability is unaffected by other load on the system. Thermal video footage remains stable even with all 4 CPU cores running at 100%.
Camera reliability is unaffected by the processor running at its "turbo" frequency (1.2 GHz). This allows us to run more demanding workloads on our Cacophonator devices.
Splitting out the camera reading code makes it easier for us to swap in different consumers of the thermal video frames. The camera interface process can remain as-is when we start running the machine learning classifier.

If you're interested in seeing what we've done, the changes can be found in the thermal-recorder repository on GitHub. The camera interface process can be found in the cmd/leptond directory. The recorder component is at cmd/thermal-recorder.
