Accelerating Tagging Using Machine Learning Classification
We now have many thermal cameras deployed at various locations, collecting recordings every night. This means we have a lot of thermal video footage to manually tag every day so that it can be used to improve our machine learning classifier. As I write this, we have collected almost 30,000 thermal video recordings.
Our recordings include many false positives - where a non-animal object triggered our camera's motion detector. We also have many, many recordings of birds, particularly at dawn and dusk. Sifting through all the false positives and bird footage is very time-consuming.
Given that our classifier is already quite capable - especially when it comes to identifying birds and false positives - we're now starting to use it to streamline our manual tagging efforts. We're effectively using the classifier to help improve itself. All recordings uploaded to the Cacophony Project API server are now run through the classifier. If this produces a clear classification, the recording is tagged accordingly. If the classifier is unsure about a recording, it's tagged as "unidentified" so that we can inspect it later, work out why the classifier had trouble and hopefully improve it.
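To make that flow concrete, here's a minimal Python sketch of the idea. The classify() stub, the tag names and the 0.8 confidence threshold are illustrative assumptions, not our actual implementation:

```python
# Sketch of the upload-time tagging flow. classify() is a stand-in for
# the real classifier, and the 0.8 threshold is an assumed cut-off for
# what counts as a "clear" classification.

CLEAR_THRESHOLD = 0.8  # assumed confidence required for a clear result

def classify(recording):
    """Stand-in for the real classifier: returns (label, confidence)."""
    return ("bird", 0.93)

def auto_tag(recording):
    label, confidence = classify(recording)
    if confidence >= CLEAR_THRESHOLD:
        return label           # clear result: tag the recording with the label
    return "unidentified"      # unsure: flag for later human inspection

print(auto_tag("recording-123.cptv"))  # -> bird
```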
To avoid undesirable training outcomes caused by feeding automated tags back into the classifier training process, it's important that automatically generated tags are kept separate from human generated tags. Our API and database have been extended so that automatic tags are clearly identified. We want automatic classifications to be used only to aid humans doing manual tagging.
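As a rough illustration of keeping the two kinds of tags apart, here's one way it could look. The Tag structure and its automatic flag are assumptions for the sake of the example, not our actual schema:

```python
# Illustrative only: automatic tags carry a flag so they can never be
# mistaken for human tags, and training data draws on human tags alone.

from dataclasses import dataclass

@dataclass
class Tag:
    recording_id: int
    label: str         # e.g. "bird", "false positive", "possum"
    automatic: bool    # True if produced by the classifier

def training_tags(tags):
    """Keep only human-generated tags for classifier training."""
    return [t for t in tags if not t.automatic]

tags = [Tag(1, "bird", automatic=True), Tag(1, "bird", automatic=False)]
print(training_tags(tags))  # only the human-generated tag remains
```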
We're now working on improvements to our user interface so that we can take better advantage of the classifier-generated tags. Being able to easily skip over false positives and bird footage will make our tagging efforts much more effective. We'll still need to check some of the recordings that the classifier tagged as a false positive or bird, but not every one. Random sampling to check that the classifier continues to make sensible choices should be sufficient. This will take far less time than having to watch through every single recording.
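Here's a quick sketch of what that spot-checking might look like; the 5% sampling rate and the helper name are illustrative assumptions:

```python
# Pick a small random subset of classifier-tagged recordings for a human
# to double-check, instead of reviewing every one.

import random

SAMPLE_RATE = 0.05  # assumed fraction of auto-tagged recordings to review

def sample_for_review(recordings, rate=SAMPLE_RATE, seed=None):
    """Return a random sample of auto-tagged recordings for human review."""
    rng = random.Random(seed)
    k = max(1, round(len(recordings) * rate))
    return rng.sample(recordings, k)

recordings = [f"rec-{i}.cptv" for i in range(200)]
print(sample_for_review(recordings, seed=42))  # ~10 recordings to check
```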