Provides developers with accurate speech recognition that supports both real-time commands and short audio in multiple formats, including PCM, WAV, and OGG. It is mainly used for intelligent voice interaction in scenarios such as social chat and smart home.

Key Features

  • Accuracy Rate Over 91%

    Advanced ASR integrated with SoundAI's high-precision sound localization, multi-microphone noise reduction, and echo cancellation technologies.

  • Support for Multiple Languages and Dialects

    Recognizes Chinese and English, as well as Cantonese, Sichuanese, Northeastern, and other dialects.

  • Personalized Hot Word Recognition

    Supports massive hot word lists and lets you upload hot words for proper nouns that are poorly recognized, improving recognition accuracy.

  • Fast Response

    Recognition results are returned within 150 to 200 ms, and total recognition time is about 0.3 × the audio duration. This speed keeps voice communication smooth.
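The timing figure above translates directly into a quick estimate. This minimal sketch assumes the stated 0.3 factor scales linearly with clip length; the function name is illustrative and not part of any SoundAI SDK.

```python
# Illustrative helper (not a SoundAI API): estimates total recognition time
# from the stated figure of about 0.3 x the audio duration.
REALTIME_FACTOR = 0.3


def estimated_recognition_time(audio_seconds: float) -> float:
    """Return the approximate recognition time, in seconds, for a clip."""
    if audio_seconds < 0:
        raise ValueError("audio duration must be non-negative")
    return audio_seconds * REALTIME_FACTOR


# A 10-second clip should take roughly 3 seconds to recognize.
print(f"{estimated_recognition_time(10.0):.1f} s")
```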

Solutions for Your Business

Smart Home

Provides far-field voice control for smart home devices, fully meeting the application requirements of home appliances.

Social Chat

Converts voice messages into text in social chat apps.

Voice Search

Adds voice search to existing search software, making search more convenient and efficient.

Traditional Electrical Appliances

Converts the speech of both parties in human-computer interaction into text to enhance the interactive experience.


Do I need to label audio that is in a dialect?
Yes. The currently supported dialects are Hubei, Sichuan, Northeastern, Shandong, Henan, and Cantonese. Specify the dialect of the uploaded audio in the request header. Supported audio formats differ by dialect; see the table for a comparison.
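Declaring the dialect amounts to adding one header to the recognition request. The sketch below is hypothetical: the header name `X-Dialect`, the value spellings, and the helper name are placeholders for illustration, since the actual header is defined in the service's API reference. It only validates the dialect against the list above and builds the header dictionary.

```python
from typing import Dict, Optional

# HYPOTHETICAL placeholder: replace with the header name documented for the service.
DIALECT_HEADER = "X-Dialect"

# Dialects listed above as currently supported (value spellings are assumed).
SUPPORTED_DIALECTS = {"Hubei", "Sichuan", "Northeast", "Shandong", "Henan", "Cantonese"}


def build_dialect_headers(dialect: str, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` with the dialect of the uploaded audio declared."""
    if dialect not in SUPPORTED_DIALECTS:
        raise ValueError(f"unsupported dialect: {dialect!r}")
    headers = dict(base or {})  # copy so the caller's dict is not modified
    headers[DIALECT_HEADER] = dialect
    return headers
```

The returned dictionary can be passed as the headers of whatever HTTP client sends the audio; check the format table first, because not every format is accepted for every dialect.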
What audio formats and sampling rates does the speech recognition service support?
Currently, the ASR service supports only two sampling rates: 16 kHz and 8 kHz. For audio at other rates, such as 48 kHz, resample it to 16 kHz before calling the speech recognition service. Supported audio formats differ by service; see each service's page for details.
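Resampling 48 kHz audio to 16 kHz is a decimation by 3, and the signal must be low-pass filtered first so that content above the new 8 kHz Nyquist limit does not alias. The sketch below assumes mono float samples and an integer ratio to 16 kHz; `prepare_for_asr` is an illustrative name, and in practice a library routine such as `scipy.signal.resample_poly` or an audio tool can do the same job.

```python
import numpy as np

SUPPORTED_RATES = (16000, 8000)  # sampling rates accepted by the ASR service
TARGET_RATE = 16000


def lowpass_fir(cutoff: float, num_taps: int = 101) -> np.ndarray:
    """Hamming-windowed sinc low-pass filter; cutoff in cycles/sample (0 to 0.5)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = 2 * cutoff * np.sinc(2 * cutoff * n) * np.hamming(num_taps)
    return h / h.sum()  # normalize to unity gain at DC


def prepare_for_asr(samples, src_rate: int):
    """Return (samples, rate): unchanged if the rate is supported, else downsampled to 16 kHz."""
    x = np.asarray(samples, dtype=np.float64)
    if src_rate in SUPPORTED_RATES:
        return x, src_rate
    if src_rate <= TARGET_RATE or src_rate % TARGET_RATE != 0:
        raise ValueError(f"{src_rate} Hz is not an integer multiple of {TARGET_RATE} Hz")
    factor = src_rate // TARGET_RATE
    h = lowpass_fir(0.5 / factor)  # cut everything above the new Nyquist frequency
    delay = (len(h) - 1) // 2      # linear-phase filter delay, removed to keep alignment
    filtered = np.convolve(x, h)[delay:delay + len(x)]
    return filtered[::factor], TARGET_RATE  # keep every `factor`-th sample
```

Integer samples (for example 16-bit PCM) should be converted to float before filtering and back afterwards. Rates such as 44.1 kHz need a rational resampler and are rejected by this sketch.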
Place names and personal names are recognized incorrectly. What can I do?
These are domain-specific terms. Compile them into a hot word list and upload it, then add the list's vocabulary ID to the request header to improve recognition of those terms. For details on uploading hot words, see the hot word page.
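Preparing a hot word list and referencing it in later requests can be sketched as follows. This is a hypothetical illustration: the one-term-per-line list format, the header name `X-Hotword-Vocabulary-Id`, and both helper names are assumptions, and the upload call itself is omitted because its endpoint is described on the hot word page.

```python
from typing import Dict, Iterable

# HYPOTHETICAL placeholder: use the header name given on the hot word page.
HOTWORD_HEADER = "X-Hotword-Vocabulary-Id"


def build_hotword_list(terms: Iterable[str]) -> str:
    """Trim terms, drop blanks and duplicates (first occurrence wins), one term per line."""
    seen = set()
    lines = []
    for term in terms:
        term = term.strip()
        if term and term not in seen:
            seen.add(term)
            lines.append(term)
    return "\n".join(lines)  # assumed upload format: one hot word per line


def with_hotword_vocabulary(headers: Dict[str, str], vocabulary_id: str) -> Dict[str, str]:
    """Return a copy of `headers` that references an uploaded hot word list."""
    result = dict(headers)
    result[HOTWORD_HEADER] = vocabulary_id
    return result
```

Deduplicating before upload keeps the list small, and copying the headers lets the same base headers be reused for requests that do not need the vocabulary.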
