Getting started

What is Sense?

We offer audio cognition systems as a service. Our cloud API service, Sense, enables developers to analyze audio content by extracting non-verbal information. It is built on the gRPC framework, and Python, Java, and Node.js are supported in the beta version.

If you need any help or support, please do not hesitate to send us an email at

If you come across audio samples that do not work properly with our API, please send them to us, and let us know about any issues you face during development. Your feedback is greatly appreciated. Thank you for your participation!

Available tasks

  • File input methods
  1. ‘speech_detector’ (speech activity detection)
  2. ‘music_detector’ (music activity detection)
  3. ‘age_gender’ (age and gender detection)
  4. ‘music_genre’ (music genre detection)
  5. ‘music_mood’ (music mood estimation)
  6. ‘music_tempo’ (music tempo detection)
  7. ‘music_key’ (music key detection)
  8. ‘event’ (audio event detection)
  • Streaming input methods
  1. ‘speech_detector_stream’ (speech activity detection)
  2. ‘music_detector_stream’ (music activity detection)
  3. ‘age_gender_stream’ (age and gender detection)
  4. ‘music_genre_stream’ (music genre detection)
  5. ‘music_mood_stream’ (music mood estimation)
  6. ‘event_stream’ (audio event detection)

For ‘event’ and ‘event_stream’, the following subtasks are available.

‘babycry’, ‘carhorn’, ‘cough’, ‘dogbark’, ‘glassbreak’, ‘siren’, ‘snoring’

In other cases, the subtask will be ignored.
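The task/subtask rules above can be sketched as a small client-side check. The task and subtask names come from the lists in this section; the helper function itself is illustrative and not part of the Sense client library.

```python
# Illustrative helper: decide which subtask (if any) to send with a call.
# Only the 'event' and 'event_stream' tasks accept a subtask; for all
# other tasks the subtask is ignored by the API.

EVENT_TASKS = {"event", "event_stream"}
EVENT_SUBTASKS = {"babycry", "carhorn", "cough", "dogbark",
                  "glassbreak", "siren", "snoring"}

def resolve_subtask(task, subtask):
    """Return the subtask to send, or None when it would be ignored."""
    if task in EVENT_TASKS and subtask in EVENT_SUBTASKS:
        return subtask
    return None

print(resolve_subtask("event", "glassbreak"))   # glassbreak
print(resolve_subtask("music_genre", "siren"))  # None (subtask ignored)
```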

Key features of beta version

Our beta version API includes the following major updates compared to the alpha version.

  • Improved latency
  • Streaming input support
  • Example client code in other languages (Java, Node.js)
  • Additional functionalities
    • Speech and music activity detection
    • Age and gender detection
  • Additional sound event class (glassbreak)
  • Improved performance

Quick Tutorial

In this short tutorial, we introduce the Sense API and walk through the process of analyzing your first audio content.

Step 1. Get your Free API key

Every API call is authenticated with an API key. If you are a first-time user, visit to get your free API key.

All API keys are limited to 700 audio files and 10 minutes of audio streaming per method per day.

Daily quota: 700 calls per method (audio file) / 10 minutes per method (audio stream)

Daily quotas are reset at the end of each 24-hour window (GMT+0).
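If you schedule batch jobs around the quota, it helps to know how long remains in the current window. A minimal sketch, assuming the 24-hour window rolls over at midnight GMT+0 (our reading of the policy above; confirm against your account dashboard):

```python
# Sketch: seconds remaining until the daily quota resets, assuming
# the window rolls over at midnight GMT+0 (an assumption, not an
# official guarantee).
from datetime import datetime, timedelta

def seconds_until_reset(now):
    """Seconds from `now` (a naive UTC datetime) to the next midnight UTC."""
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0)
    return int((next_midnight - now).total_seconds())

print(seconds_until_reset(datetime(2020, 1, 1, 23, 0, 0)))  # 3600
```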

Step 2. Clone this repository

This repository contains the libraries required to use the Sense API. Run the command below, or download the repository manually.

$ git clone

Step 3. Setup your environment (python)

This and the following steps assume a Python 2.7 environment running on Ubuntu. If you are using Java, please refer to the following documents:

A tutorial for Node.js will be added soon.

  • Install portaudio

This is required only for streaming methods.

$ apt install python-dev portaudio19-dev
  • Install pip

Run the following commands.

$ wget
$ python

Alternatively, you can install pip with the apt-get command:

$ apt-get install python-pip

To install the dependencies in the following steps, pip version 10.0.1 or later is recommended.

  • (Optional) Install virtualenv

If you want to set up the Python environment in a virtualenv, run the following commands.

$ pip install virtualenv
$ virtualenv venv
$ source venv/bin/activate

You can verify that the virtual environment is activated by the (venv) prefix in your terminal prompt.
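Besides checking the prompt prefix, you can confirm the active environment from inside the interpreter. A small sketch that covers both virtualenv (which sets `sys.real_prefix`) and the built-in venv module (which makes `sys.base_prefix` differ from `sys.prefix`):

```python
# Sketch: detect whether the running interpreter is inside a
# virtual environment (virtualenv or the built-in venv module).
import sys

def in_virtualenv():
    # virtualenv sets sys.real_prefix; venv makes base_prefix differ
    return (hasattr(sys, "real_prefix")
            or getattr(sys, "base_prefix", sys.prefix) != sys.prefix)

print(in_virtualenv())
```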

  • Install python libraries

Run the following commands.

$ pip install --upgrade pip
$ pip install --no-cache-dir -r requirements.txt

Step 4. Make your first call (python)

  • Example codes

For examples of the file input methods and the streaming methods, please refer to ./examples/ and ./examples/, respectively.

After inserting your API key into the example code, run the commands below.

$ python ./examples/
$ python ./examples/

We recommend that input audio files not exceed 100 MB.
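You can enforce this recommendation client-side before uploading. A minimal sketch; the 100 MB figure comes from the recommendation above, and the helper is illustrative rather than part of the Sense client:

```python
# Sketch: reject oversized audio files before uploading them.
# 100 MB is the recommended limit from the documentation above.
import os

MAX_BYTES = 100 * 1024 * 1024  # 100 MB

def check_size(path):
    """Return the file size in bytes, or raise if it exceeds the limit."""
    size = os.path.getsize(path)
    if size > MAX_BYTES:
        raise ValueError("audio file is %d bytes; keep it under 100 MB" % size)
    return size
```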

Note that the type of the result is determined not by the input audio but by the method you call. For example, if you call a music analysis method on speech data, the model will treat the input as a music signal and make predictions based on its knowledge of music. Be mindful of the kind of audio input you are using.
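Since the API will not pick the method for you, it can help to keep an explicit mapping from what you know about the content to the methods that make sense for it. The mapping below is illustrative only (the method names come from the "Available tasks" list; the grouping and the `event` fallback are our assumptions, not an official taxonomy):

```python
# Illustrative mapping: pick analysis methods from the known content
# type, since the API interprets input according to the method called.

METHODS_BY_CONTENT = {
    "speech": ["speech_detector", "age_gender"],
    "music": ["music_detector", "music_genre", "music_mood",
              "music_tempo", "music_key"],
}

def methods_for(content_type):
    # Fall back to generic event detection when the content is unknown
    # (an assumption for this sketch, not API behavior).
    return METHODS_BY_CONTENT.get(content_type, ["event"])

print(methods_for("music"))
```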


  • Audio coming directly from the microphone may return unstable results.
  • If the original sampling rate of your audio file does not match our requirement, submit the file as it is rather than resampling it yourself.
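If you want to know your file's sampling rate without resampling it, you can read it from the header. A minimal sketch using the standard-library wave module (WAV files only; the required rate itself is not stated here, so check it against the API documentation):

```python
# Sketch: read the sampling rate from a WAV file header using the
# standard library, without touching the audio samples themselves.
import wave

def wav_sample_rate(path):
    """Return the sampling rate (Hz) declared in a WAV file's header."""
    w = wave.open(path, "rb")
    try:
        return w.getframerate()
    finally:
        w.close()
```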