What is Cochlear.ai Sense?¶
Cochlear.ai offers audio cognition systems as a service. Our cloud API service, Cochlear.ai Sense, enables developers to analyze audio content by extracting non-verbal information. It is based on the gRPC framework, and Python, Java, and Node.js are supported in the beta version.
If you need any help or support, please do not hesitate to send us an email at firstname.lastname@example.org.
If you find audio samples that do not work properly with our API, or run into any issues during development, please send them to us. It would be greatly appreciated. Thank you for your participation!
- File input methods
- ‘speech_detector’ (speech activity detection)
- ‘music_detector’ (music activity detection)
- ‘age_gender’ (age and gender detection)
- ‘music_genre’ (music genre detection)
- ‘music_mood’ (music mood estimation)
- ‘music_tempo’ (music tempo detection)
- ‘music_key’ (music key detection)
- ‘event’ (audio event detection)
- Streaming input methods
- ‘speech_detector_stream’ (speech activity detection)
- ‘music_detector_stream’ (music activity detection)
- ‘age_gender_stream’ (age and gender detection)
- ‘music_genre_stream’ (music genre detection)
- ‘music_mood_stream’ (music mood estimation)
- ‘event_stream’ (audio event detection)
For ‘event’ and ‘event_stream’, the following subtasks are available:
‘babycry’, ‘carhorn’, ‘cough’, ‘dogbark’, ‘glassbreak’, ‘siren’, ‘snoring’
For all other methods, the subtask is ignored.
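As an illustration, the method and subtask lists above can be mirrored in a small client-side helper. Note that the constants and the `subtask_applies` function below are our own illustrative names, not part of the Cochlear.ai Sense client library:

```python
# Illustrative sketch only: these names are not part of the Sense client
# library; they simply mirror the method and subtask lists documented above.

FILE_METHODS = {
    "speech_detector", "music_detector", "age_gender", "music_genre",
    "music_mood", "music_tempo", "music_key", "event",
}
# Streaming variants exist for every file method except tempo and key.
STREAM_METHODS = {m + "_stream" for m in FILE_METHODS
                  if m not in ("music_tempo", "music_key")}
EVENT_SUBTASKS = {
    "babycry", "carhorn", "cough", "dogbark", "glassbreak", "siren", "snoring",
}

def subtask_applies(method, subtask):
    """Return True if the subtask will actually be used by the given method.

    Subtasks are only meaningful for 'event' and 'event_stream'; every
    other method ignores them.
    """
    if method not in FILE_METHODS | STREAM_METHODS:
        raise ValueError("unknown method: %s" % method)
    if method not in ("event", "event_stream"):
        return False
    return subtask in EVENT_SUBTASKS

print(subtask_applies("event", "glassbreak"))        # True
print(subtask_applies("music_genre", "glassbreak"))  # False
```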
Key features of beta version¶
Our beta version API includes the following major updates compared to the alpha version.
- Improved latency
- Streaming input support
- Example client codes of other languages (Java, Node.js)
- Additional functionalities
- Speech and music activity detection
- Age and gender detection
- Additional sound event class (glassbreak)
- Improved performance
In this short tutorial, we introduce Cochlear.ai Sense API and go through the process of analyzing your first audio content.
Step 1. Get your Free API key¶
Every API access is managed with an API key. If you are a first-time user, visit http://cochlear.ai/beta-subscription/ to get your free API key.
All API keys are limited to 700 audio files and 10 minutes of audio streams per method per day.
Daily quota: 700 calls per method (audio file) / 10 minutes per method (audio stream)
Daily quotas reset every 24 hours at midnight (GMT+0).
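Since quotas reset at midnight GMT+0, a client can compute how long it has to wait after hitting its daily limit. The helper below is a Python 3 sketch of ours, not part of the Sense client:

```python
# Illustrative helper (not part of the Sense client): seconds remaining
# until the next 00:00 GMT+0, when daily quotas reset.
from datetime import datetime, timedelta, timezone

def seconds_until_quota_reset(now=None):
    """Return the number of seconds until the next 00:00 UTC."""
    now = now or datetime.now(timezone.utc)
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0)
    return (next_midnight - now).total_seconds()
```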
Step 2. Clone this repository¶
This repository contains the libraries required to use the Cochlear.ai Sense API. Clone it with the command below, or download it manually.
$ git clone https://github.com/cochlearai/sense-client
Step 3. Set up your environment (Python)¶
This step and the next assume a Python 2.7 environment running on Ubuntu. If you are using Java, please refer to the following documents:
The tutorial for Node.js will be available soon.
- Install portaudio
This is required only for streaming methods.
$ apt install python-dev portaudio19-dev
- Install pip
Run the following commands.
$ wget https://bootstrap.pypa.io/get-pip.py
$ python get-pip.py
As an alternative, you can also install pip with the apt-get command.
$ apt-get install python-pip
To install the dependencies presented below, pip version 10.0.1 or later is recommended.
- (Optional) Install virtualenv
If you want to set up the Python environment in a virtualenv, run the following commands.
$ pip install virtualenv
$ virtualenv venv
$ source venv/bin/activate
You can verify that the virtual environment is activated by the (venv) prefix in the terminal window.
- Install python libraries
Run the following commands.
$ pip install --upgrade pip
$ pip install --no-cache-dir -r requirements.txt
Step 4. Make your first call (Python)¶
- Example codes
For examples of the file input methods and the streaming methods, please refer to ./examples/example_file.py and ./examples/example_stream.py, respectively.
After inserting your API key into the example code, run the commands below.
$ python ./examples/example_file.py
$ python ./examples/example_stream.py
We recommend that input audio files not exceed 100 MB.
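To catch oversized uploads early, you can check the file size on the client side before calling the API. The helper below is an illustrative sketch, not part of the Sense client:

```python
# Illustrative client-side check (not part of the Sense client): warn before
# uploading a file larger than the recommended 100 MB limit.
import os

MAX_RECOMMENDED_BYTES = 100 * 1024 * 1024  # 100 MB

def check_audio_size(path):
    """Return True if the file is within the recommended size limit."""
    size = os.path.getsize(path)
    if size > MAX_RECOMMENDED_BYTES:
        print("Warning: %s is %.1f MB; consider splitting it."
              % (path, size / (1024.0 * 1024.0)))
        return False
    return True
```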
Note that the type of the result is determined not by the input audio but by the method you call. For example, if you call a music analysis method on speech data, the model will treat the input as a music signal and make predictions based on its knowledge of music. Please make sure the method matches the kind of audio input you are using.
- Audio coming directly from the microphone may return unstable results.
- If the original sampling rate of your audio file does not match our requirement, send the file as-is rather than resampling it yourself.
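If you just want to know a WAV file's sampling rate before sending it (without resampling anything yourself), the Python standard library is enough. This is a Python 3 sketch of our own, unrelated to the Sense client:

```python
# Illustrative sketch: inspect a WAV file's sampling rate with the standard
# library only. No resampling is performed.
import wave

def wav_sample_rate(path):
    """Return the sampling rate (Hz) of a WAV file."""
    with wave.open(path, "rb") as wf:
        return wf.getframerate()
```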