API reference

The specifications of the Cochlear.ai Sense models and examples of their output are described below. The output examples assume that the input is either a 3-second audio file or a 1-second audio stream.

speech_detector

Input type: audio file
Prediction unit: 1 second
Inter-prediction duration: 0.5 seconds
Sample-rate: 16000 Hz (recommended) or higher
Output examples:
 
{"result": [{"speech": [0.972, 0.995, 1.0, 0.994, 0.992]}]}
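The five values in this example are consistent with the stated assumptions: a 1-second prediction unit advanced in 0.5-second steps fits five times into a 3-second file. The following Python sketch reproduces that arithmetic. It assumes that only windows fitting entirely inside the file are scored (the reference does not say how a trailing partial window is handled), and `window_starts` is a hypothetical helper, not part of the Sense API.

```python
def window_starts(duration_s, unit_s=1.0, hop_s=0.5):
    """Start times (in seconds) of every full prediction window in a file.

    Assumption: a window is scored only if it fits entirely inside the audio.
    """
    starts = []
    i = 0
    # The small tolerance guards against floating-point rounding at the file end.
    while i * hop_s + unit_s <= duration_s + 1e-9:
        starts.append(round(i * hop_s, 3))
        i += 1
    return starts


print(window_starts(3.0))  # [0.0, 0.5, 1.0, 1.5, 2.0]
```

Under the same assumption, a 1-second input yields a single window, which matches the single-value outputs shown for the stream models.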

music_detector

Input type: audio file
Prediction unit: 1 second
Inter-prediction duration: 0.5 seconds
Sample-rate: 16000 Hz (recommended) or higher
Output examples:
 
{"result": [{"music": [0.602, 0.789, 0.515, 0.866, 1.0]}]}

age_gender

Input type: audio file
Prediction unit: 1 second
Inter-prediction duration: 0.5 seconds
Sample-rate: 16000 Hz (recommended) or higher
Output examples:
 
{"result": [{"age/gender": "child", "probability": [0.173, 0.202, 0.336, 0.775, 0.997]},
            {"age/gender": "male", "probability": [0.654, 0.461, 0.125, 0.051, 0.001]},
            {"age/gender": "female", "probability": [0.173, 0.336, 0.539, 0.174, 0.002]}]}
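Each class in this output carries its own list of per-window probabilities, so finding the most likely class for a given window means reading across the three lists. One possible way to do that in Python is sketched below; the response string is copied from the example above, and `top_label_per_window` is a hypothetical helper name, not a function of the Sense API.

```python
import json

response = """{"result": [
  {"age/gender": "child", "probability": [0.173, 0.202, 0.336, 0.775, 0.997]},
  {"age/gender": "male", "probability": [0.654, 0.461, 0.125, 0.051, 0.001]},
  {"age/gender": "female", "probability": [0.173, 0.336, 0.539, 0.174, 0.002]}]}"""


def top_label_per_window(payload):
    """Return the highest-probability class label for each prediction window."""
    entries = json.loads(payload)["result"]
    labels = [entry["age/gender"] for entry in entries]
    # zip(*...) regroups the per-class lists into one tuple per window.
    per_window = zip(*(entry["probability"] for entry in entries))
    return [labels[max(range(len(p)), key=p.__getitem__)] for p in per_window]


print(top_label_per_window(response))
# ['male', 'male', 'female', 'child', 'child']
```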

music_genre

Input type: audio file
Prediction unit: Entire audio
Inter-prediction duration: N/A
Sample-rate: 22050 Hz (recommended) or higher
Output examples:
 
{"result": [{"genre": ["Alternative", "Dance"], "probability": [0.443, 0.411]}]}
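In this output the labels and their probabilities arrive as two parallel lists. A small Python sketch for pairing them up is shown below, assuming the two lists always have the same length and order as in the example; `genre_scores` is a hypothetical helper name.

```python
import json

response = '{"result": [{"genre": ["Alternative", "Dance"], "probability": [0.443, 0.411]}]}'


def genre_scores(payload):
    """Map each genre label to its probability."""
    entry = json.loads(payload)["result"][0]
    # Assumption: "genre" and "probability" are aligned index by index.
    return dict(zip(entry["genre"], entry["probability"]))


print(genre_scores(response))  # {'Alternative': 0.443, 'Dance': 0.411}
```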

music_mood

Input type: audio file
Prediction unit: Entire audio
Inter-prediction duration: N/A
Sample-rate: 22050 Hz (recommended) or higher
Output examples:
 
{"result": [{"arousal": [0.536], "valence": [0.029]}]}

music_tempo

Note that the outputs denote the top-two tempo candidates in bpm and their corresponding probabilities.

Input type: audio file
Prediction unit: Entire audio
Inter-prediction duration: N/A
Sample-rate: 22050 Hz or higher
Output examples:
 
{"result": [{"tempo": [72.0, 36.0], "probability": [0.881, 0.119]}]}

music_key

Note that the output denotes the top-one key candidate and its corresponding probability.

Input type: audio file
Prediction unit: Entire audio
Inter-prediction duration: N/A
Sample-rate: 22050 Hz or higher
Output examples:
 
{"result": [{"key": ["Gb"], "probability": [0.752]}]}

event

Input type: audio file
Prediction unit: 1 second
Inter-prediction duration: 0.5 seconds
Sample-rate: 22050 Hz (recommended) or higher
Output examples:
 
{"result": [{"event": "babycry", "probability": [0.999, 1.0, 0.531, 0.091, 0.486]}]}
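Because consecutive 1-second windows overlap by 0.5 seconds, a client may want to merge the windows in which an event is detected into continuous time ranges. The Python sketch below is one way to do this. The 0.5 threshold is an arbitrary illustrative choice, the mapping of window index `i` to a start time of `i * hop_s` is an assumption about how windows are laid out, and `active_segments` is a hypothetical helper, not part of the Sense API.

```python
def active_segments(probs, threshold=0.5, unit_s=1.0, hop_s=0.5):
    """Merge windows at or above the threshold into (start, end) ranges in seconds."""
    segments = []
    for i, p in enumerate(probs):
        if p < threshold:
            continue
        # Assumption: window i covers [i * hop_s, i * hop_s + unit_s].
        start, end = i * hop_s, i * hop_s + unit_s
        if segments and start <= segments[-1][1]:
            # Overlaps or touches the previous range, so extend it.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments


print(active_segments([0.999, 1.0, 0.531, 0.091, 0.486]))  # [(0.0, 2.0)]
```

With the example probabilities, the first three windows pass the threshold and merge into a single range covering the first two seconds of the file.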

speech_detector_stream

Prediction unit: 1 second
Inter-prediction duration: 0.5 seconds
Sample-rate: 16000 Hz
Output examples:
 
{"result": [{"speech": [0.972]}]}
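Each stream response carries a single probability, so any view over time has to be built up on the client side. As a rough sketch, the Python class below keeps a rolling mean over the most recent responses. It assumes each response has already been decoded into a dictionary with the shape shown above; `RollingMean`, the window size of 3, and the second and third input values are illustrative choices, not part of the Sense API or its documented output.

```python
from collections import deque


class RollingMean:
    """Average the speech probability over the last `size` stream responses."""

    def __init__(self, size=3):
        # deque with maxlen drops the oldest value automatically.
        self.values = deque(maxlen=size)

    def update(self, response):
        self.values.append(response["result"][0]["speech"][0])
        return sum(self.values) / len(self.values)


smoother = RollingMean(size=3)
# 0.972 comes from the example above; 0.9 and 0.2 are made-up inputs.
for p in (0.972, 0.9, 0.2):
    mean = smoother.update({"result": [{"speech": [p]}]})
    print(round(mean, 3))
```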

music_detector_stream

Prediction unit: 1 second
Inter-prediction duration: 0.5 seconds
Sample-rate: 16000 Hz
Output examples:
 
{"result": [{"music": [0.602]}]}

age_gender_stream

Prediction unit: 1 second
Inter-prediction duration: 0.5 seconds
Sample-rate: 16000 Hz
Output examples:
 
{"result": [{"age/gender": "child", "probability": [0.173]},
            {"age/gender": "male", "probability": [0.654]},
            {"age/gender": "female", "probability": [0.173]}]}

music_genre_stream

Prediction unit: 3 seconds
Inter-prediction duration: 0.5 seconds
Sample-rate: 22050 Hz
Output examples:
 
{"result": [{"genre": ["Alternative", "Dance"], "probability": [0.443, 0.411]}]}

music_mood_stream

Prediction unit: 3 seconds
Inter-prediction duration: 0.5 seconds
Sample-rate: 22050 Hz
Output examples:
 
{"result": [{"arousal": [0.536], "valence": [0.029]}]}

event_stream

Prediction unit: 1 second
Inter-prediction duration: 0.5 seconds
Sample-rate: 22050 Hz
Output examples:
 
{"result": [{"event": "babycry", "probability": [0.999]}]}