This page is an online demo of our recent research results on audio captioning.
The full presentation of our method and results is in our paper, "WaveTransformer: A novel architecture for audio captioning", which is available here and has been submitted for review to the 46th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021.
Below you can find three columns. Each column contains an audio player with two categories of textual descriptions (captions) beneath it. The captions in both categories describe the sound you can hear from the audio player. The two categories are:
The columns correspond to a categorization of the predicted captions according to the evaluation metrics we employed.
We evaluated our method on the Clotho evaluation split, which consists of 1045 audio files and their associated captions. The metric scores for our method are: