In this entry we are very pleased to welcome back Tim Hunt from the Centre for Information Technology at Wintec. He has an update for us on his work in automatically detecting Morepork (Ruru) from audio recordings made by our bird monitors.
In a previous blog (https://www.2040.co.nz/blogs/news/first-morepork-automatically-identified), you may have read about how I started down the road of automatically detecting morepork calls. Since then I’ve made some further progress and thought I’d share the journey so far with you.
Here is an initial look at the numbers of morepork calls that we are starting to see automatically using the model that has been developed. Each colour is for a different recorder/location:
To see if we have found any moreporks on a particular recorder, you can scroll down through your recordings (see pic below) and see the red markers in the Tags column.
Then click on the ID and you will see the tags listed as below. Note: currently recordings are only processed with the model on an ad-hoc basis.
Although I’ve been involved with IT for many years, and keep hearing about the wonders of machine learning, I hadn’t personally created a machine learning application. It is a big and daunting area of knowledge and expertise that I just didn’t have. So, as you do, I started watching YouTube videos and found some great tutorials on using TensorFlow (https://www.tensorflow.org/), seemingly the go-to tool for machine learning. However, I had a nagging feeling that I wasn’t really understanding what was going on, so I kept an open mind about alternative ways of doing machine learning. I had also heard about a tool called Weka (https://www.cs.waikato.ac.nz/ml/weka/), which happens to have been created at The University of Waikato, right here in my hometown of Hamilton. I thought I should really have a look to see what it was capable of.
Weka is a ‘workbench for machine learning’ and comes with a huge amount of helpful resources, including a book that has been used by many people who are not necessarily maths whizzes but wish to understand and implement machine learning solutions. It even has a picture of an owl ‘hiding’ in a tree on the front cover – a good omen! I decided to take a closer look.
Machine learning basics – a simple understanding
Through watching the online tutorials (https://www.youtube.com/user/WekaMOOC) and reading the book I slowly increased my knowledge of machine learning and how to use the Weka workbench. From the perspective of being able to recognize morepork calls, what we need to do is:
1) Find examples of morepork calls (see that previous blog),
2) Extract ‘features’ from the audio of each example (I’ll come back to this later),
3) Give the (feature) examples to the machine learning algorithm (I just tried a few different algorithms available in Weka),
4) ‘Ask’ the algorithm to adjust its internal parameters (that’s a bit of a black box) until it finds a set that works well for deciding (classifying) whether the sounds are morepork or not – this is known as learning – and the result is some code that we call a model,
5) Evaluate the model to see how well it works on sounds it hasn’t yet heard.
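As a rough illustration of these five steps, here is a minimal Python sketch. It is not the Weka workflow described above: the feature vectors are made-up numbers, and the 'algorithm' is a simple nearest-centroid classifier chosen only because it fits in a few lines. All function and variable names are hypothetical.

```python
import random

def nearest_centroid_train(examples):
    """'Learn' one centroid (mean feature vector) per label."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(model, features):
    """Classify by the closest centroid (squared Euclidean distance)."""
    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: distance(model[label]))

# Steps 1-2: invented two-number feature vectors standing in for real
# features extracted from audio examples (assumption, illustration only).
rng = random.Random(0)

def fake_example(label):
    centre = (0.8, 0.2) if label == "morepork" else (0.2, 0.7)
    return [c + rng.gauss(0, 0.05) for c in centre], label

data = [fake_example(label) for label in ["morepork", "other"] * 50]
rng.shuffle(data)

# Hold some examples back so the model is evaluated on sounds it has not 'heard'.
train, test = data[:70], data[70:]

# Steps 3-4: give the examples to the algorithm and let it fit its parameters.
model = nearest_centroid_train(train)

# Step 5: evaluate on the held-back examples.
correct = sum(predict(model, features) == label for features, label in test)
accuracy = correct / len(test)
print(f"held-out accuracy: {accuracy:.2f}")
```

The important habit is the split between `train` and `test`: a model that is only checked against the examples it learned from will look better than it really is.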
I kept going around in circles on how to do this. Weka requires the features to be input as a sequence of numbers or categories and I kept getting conceptually stuck on how to represent an audio signal in this format. Here is what a typical recording looks like when you view how the volume changes with time. Can you see the morepork? Unlikely!
There are various published techniques and code for extracting features from audio but again I was concerned about losing information about how the sound changed with time and how to deal with the calls starting at different times. To cut a long story short, I ended up using a common technique of representing the sound as a spectrogram which indicates how the frequencies change with time. Can you see the morepork in this next image?
Maybe, but which of those blobs are morepork? Well, since we know that morepork calls are in the frequency range of 700-1000 Hz, let's look there.
There we go – what you see is about 19 morepork calls in the one-minute recording.
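The idea of looking only at the 700-1000 Hz band can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used to produce the images: the sample rate, frame length and test tones are invented, and a plain discrete Fourier transform is computed only for the frequency bins inside the band.

```python
import cmath
import math

def band_energy(frame, sample_rate, low_hz, high_hz):
    """Sum of squared DFT magnitudes for the bins between low_hz and high_hz."""
    n = len(frame)
    # Hann window to limit leakage between neighbouring frequencies.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    first_bin = math.ceil(low_hz * n / sample_rate)
    last_bin = math.floor(high_hz * n / sample_rate)
    energy = 0.0
    for k in range(first_bin, last_bin + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed))
        energy += abs(coeff) ** 2
    return energy

def tone(freq_hz, amplitude, n, sample_rate):
    """A pure sine tone, used here as a stand-in for real audio."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]

SAMPLE_RATE = 8000  # samples per second (assumed for the example)
FRAME = 400         # samples per frame, giving 20 Hz per frequency bin

quiet_call = tone(800, 0.05, FRAME, SAMPLE_RATE)   # quiet tone inside the band
loud_noise = tone(3000, 1.0, FRAME, SAMPLE_RATE)   # loud tone outside the band
mixed = [a + b for a, b in zip(quiet_call, loud_noise)]

call_only = band_energy(quiet_call, SAMPLE_RATE, 700, 1000)
noise_only = band_energy(loud_noise, SAMPLE_RATE, 700, 1000)
with_noise = band_energy(mixed, SAMPLE_RATE, 700, 1000)

print(f"call only:  {call_only:.3f}")
print(f"noise only: {noise_only:.3f}")
print(f"call+noise: {with_noise:.3f}")
```

Repeating this for consecutive frames of a recording gives one row of numbers per frame, which is essentially the cropped spectrogram. In this contrived case the loud 3000 Hz tone contributes almost nothing to the band, so the quiet 800 Hz tone still stands out.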
To extract features, I ‘cut out’ each of the example calls from that image (and many others) and feed them into Weka, which can automatically create features from the images. Don’t worry, I didn’t cut them out by hand; instead I used some modified code written by Chris Blackbourn that finds locations in an audio file where the volume increases. It is worth noting that even if there are very loud background noises in the recording, as long as the noises are not in the 700 – 1000 Hz frequency range, they will not interfere with the image of a very quiet morepork call.
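Finding places where the volume increases can be illustrated with a very simple rule. The sketch below is not Chris Blackbourn's code, which is not shown here. It assumes we already have one energy value per frame for the band of interest, and it flags a frame as an 'onset' when its energy is at least three times that of the previous frame. The threshold, the clip length and all names are hypothetical.

```python
def find_onsets(energies, rise_ratio=3.0, floor=1e-9):
    """Return frame indices where energy jumps by rise_ratio over the previous frame."""
    onsets = []
    for i in range(1, len(energies)):
        previous = max(energies[i - 1], floor)  # avoid dividing by zero
        if energies[i] / previous >= rise_ratio:
            onsets.append(i)
    return onsets

def cut_clip(energies, onset, clip_frames):
    """Slice a fixed-length window of frames starting at an onset."""
    return energies[onset:onset + clip_frames]

# Invented per-frame energies: quiet background with two louder events.
band_energy_per_frame = [0.1, 0.1, 0.1, 2.0, 2.2, 1.9, 0.1, 0.1, 3.0, 2.8, 0.1]

onsets = find_onsets(band_energy_per_frame)
clips = [cut_clip(band_energy_per_frame, onset, 3) for onset in onsets]

print("onset frames:", onsets)
print("clips:", clips)
```

In practice each clip would be a small rectangle cut from the spectrogram image rather than a list of three numbers, but the principle of cutting a fixed-size window at each detected rise is the same.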
I’ve heard that before?
Creating a model turned out to be a very iterative process. As I said before, once you have a model, you need to confirm that it is working – this can only be done by listening to the recordings and comparing what the model predicted with what each sound actually appears to be. I’m no bird expert, far from it, which is partly why I chose the classic sounding more-----pork call to start with. I soon found myself listening to the same recordings over and over again for each new model iteration. It was time for a database and some software to keep track of everything. I was also getting suspicious that the model might work better if all the non-morepork calls were actually categorized correctly rather than just as ‘not-morepork’ as I had been doing. Here is one of the interfaces from the Audio Manager application that I've developed to help with this process – you can get the code at https://github.com/TheCacophonyProject/audio_manager
For each audio clip that is fed into the model, it returns a predicted classification along with how sure (certain) it is of its prediction. The less certain the model is, the more likely it is to be wrong, that is, to say a clip is a morepork when it isn’t. The current model is far from perfect; however, it is giving enough information to be useful. In the following graph, the bottom blue dashed line shows that as the model certainty reduces (left to right) the number of errors increases.
The top solid red line shows that the number of moreporks we actually find increases when we accept a higher number of errors. There is an obvious trade-off between locating all the morepork calls and having too many errors, known as false positives. The model is currently set to ‘tag’ moreporks on the Cacophony interface only if it has a certainty greater than 0.5 (50%).
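The trade-off can be made concrete with a small Python sketch. The certainty values and labels below are invented purely for illustration and are not results from the real model; the function name is hypothetical.

```python
def tag_counts(predictions, threshold):
    """Count true and false positives among clips tagged at this threshold.

    predictions: list of (certainty, is_really_morepork) pairs for clips
    that the model predicted to be morepork.
    """
    tagged = [actual for certainty, actual in predictions if certainty > threshold]
    true_positives = sum(tagged)
    false_positives = len(tagged) - true_positives
    return true_positives, false_positives

# Made-up (certainty, label confirmed by listening) pairs.
predictions = [
    (0.95, True), (0.90, True), (0.85, True), (0.80, False),
    (0.70, True), (0.60, True), (0.55, False), (0.45, True),
    (0.40, False), (0.30, False),
]

for threshold in (0.9, 0.7, 0.5, 0.3):
    found, errors = tag_counts(predictions, threshold)
    print(f"threshold {threshold}: {found} moreporks found, {errors} false positives")
```

Lowering the threshold tags more clips, so more real moreporks are found, but more of the tags turn out to be wrong.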
- Currently, not all the morepork calls in a recording are located and passed to the model for analysis, so the detection of these places of interest can be improved.
- The model currently uses the location of the Bird Monitor recorder as one of its input features. This means it may not perform as well for new Bird Monitors, or for those with few moreporks present, as it does for locations with many examples.
- There will certainly be better methods of creating the features – maybe I (or you) could use TensorFlow to create these features.
Many thanks to Tim for this article. Tim (and everyone at Cacophony) would love to hear any feedback you have.
You can leave a comment below and/or you can email us at firstname.lastname@example.org.