This repository stores the definitions and generated code for Speechly public APIs.
There are also higher-level client libraries available for selected platforms, which contain microphone and audio management functions, as well as the connection state management that otherwise would be needed separately on top of these definitions. See Speechly Client Libraries for more information about these.
Protobuf stub generation is pretty easy, so if you need support for a language not in the list, you can always generate the stubs separately.
Make sure to check language-specific READMEs.
See the language-specific examples in the respective subdirectories for a more detailed description of how to use the generated code. The following describes the basic API flow of a Speechly client, which sends speech to the API and receives results at the same time.
An API Reference with detailed documentation about the APIs is generated from the protobuf source files.
All gRPC connections to Speechly APIs must use secure channels, meaning that the connection is encrypted with TLS. The secure channel should be opened to
api.speechly.com:443. This channel can then be used to access all of the APIs.
The first step in connecting to the Speechly API is to call
speechly.identity.v2.IdentityAPI and create an access token to use for future calls.
device_id, a device identifier that the API can use to match the microphone acoustic profile
app_id to select a specific Speechly application to use, or
project_id to use a project containing multiple applications
speechly.identity.v2.IdentityAPI/Login (the stubs help here)
LoginResponse will contain an access token and expiry information. A new access token should be fetched before the expiration to prevent unnecessary errors.
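The login parameters above can be sketched as a small validation helper. This is only an illustration: `build_login_request` is a hypothetical name, the plain dict stands in for the generated `LoginRequest` message (whose exact field layout is not shown here), and the "exactly one of app_id or project_id" rule is one reading of the "or" in the list above.

```python
from typing import Optional


def build_login_request(device_id: str,
                        app_id: Optional[str] = None,
                        project_id: Optional[str] = None) -> dict:
    """Return a plain dict standing in for the generated LoginRequest.

    Hypothetical helper: the real message is built with the generated stubs.
    """
    if not device_id:
        raise ValueError("device_id is required")
    # Assumption: the token is scoped to either one application or one project.
    if (app_id is None) == (project_id is None):
        raise ValueError("provide exactly one of app_id or project_id")
    request = {"device_id": device_id}
    if app_id is not None:
        request["app_id"] = app_id
    else:
        request["project_id"] = project_id
    return request
```

Validating the arguments before the call keeps a malformed request from costing a network round trip.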
IdentityAPI/Login is the only API call which does not require authentication metadata. All other APIs require that the access token received from Login is attached to the request metadata with key authorization and value Bearer TOKEN (replace TOKEN with the actual token).
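As a minimal sketch, the metadata entry described above can be built like this. `auth_metadata` is a hypothetical helper name, not part of the generated code.

```python
def auth_metadata(token: str) -> tuple:
    """Build request metadata carrying the access token from Login."""
    if not token:
        raise ValueError("token must be non-empty")
    # Key "authorization", value "Bearer TOKEN", as described above.
    return (("authorization", f"Bearer {token}"),)
```

With the Python grpc package, a tuple of key-value pairs like this can be passed as the `metadata` argument of a stub call.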
If the token is expired or otherwise invalid, all API calls will terminate with gRPC status code
PERMISSION_DENIED. A reason is included in the error details.
The token will expire after a certain amount of time, stated in the
LoginResponse message. Keep the token once it has been received, reuse it for multiple connections, and refresh it only when it is close to expiration. This will make the API calls as fast as possible.
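One way to reuse a token and refresh it shortly before expiration is a small cache, sketched below. Everything here is an assumption for illustration: `TokenCache` is a hypothetical class, the `login` callable is expected to wrap the actual Login call and return the token together with its expiry as a Unix timestamp, and the 60-second margin is an arbitrary choice.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Keep one access token and fetch a new one only near expiration."""

    def __init__(self,
                 login: Callable[[], Tuple[str, float]],
                 refresh_margin_s: float = 60.0,
                 clock: Callable[[], float] = time.time) -> None:
        self._login = login              # hypothetical wrapper around Login
        self._margin = refresh_margin_s  # refresh this long before expiry
        self._clock = clock              # injectable for testing
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return a usable token, logging in again only when needed."""
        near_expiry = self._clock() >= self._expires_at - self._margin
        if self._token is None or near_expiry:
            self._token, self._expires_at = self._login()
        return self._token
```

Calling `get()` before opening each connection keeps the number of Login calls low while avoiding requests with an expired token.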
speechly.slu.v1.SLU/Stream is used to send audio in and receive results based on the target Speechly application configuration. An access token from
IdentityAPI is required to access it.
A generic example of an
speechly.slu.v1.SLU/Stream. Remember to include the access token in the stream's metadata.
SLURequest and all responses are of type
SLUResponse. These are envelopes that will contain different types of data, depending on the situation:
SLURequest.config message, describing the audio stream
SLURequest.event.START message when the speech stream is started
SLUResponse.started message, containing the
SLU stream is bidirectional, it will receive data at the same time as it sends data. Refer to the docs to see the meaning of different types of
SLUResponse.finished event, containing the
audioContext id that was finished
The connection can be kept open, but an active speech stream (audioContext) will have a maximum duration of 5 minutes.
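The order of outgoing messages on the stream can be sketched as a generator. This is an illustration under assumptions, not the generated API: plain dicts stand in for `SLURequest` envelopes, the config fields `sample_rate_hertz` and `channels`, the `audio` key, and the closing `STOP` event are assumed names, and `slu_requests` is a hypothetical helper.

```python
from typing import Iterable, Iterator


def slu_requests(audio_chunks: Iterable[bytes],
                 sample_rate_hertz: int = 16000,
                 channels: int = 1) -> Iterator[dict]:
    """Yield stand-ins for SLURequest envelopes in sending order."""
    # 1. Config message describing the audio stream (field names assumed).
    yield {"config": {"sample_rate_hertz": sample_rate_hertz,
                      "channels": channels}}
    # 2. START event when the speech stream is started.
    yield {"event": "START"}
    # 3. Audio chunks; empty chunks are skipped.
    for chunk in audio_chunks:
        if chunk:
            yield {"audio": chunk}
    # 4. Assumed STOP event closing the active speech stream.
    yield {"event": "STOP"}
```

A generator fits a bidirectional stream well, since gRPC client stubs in Python accept an iterator of requests and responses can be read while it is still being consumed.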
There are other APIs that can be used to manage Speechly applications. Instead of integrating with these, a quicker alternative is to use the Speechly command. Nevertheless, the APIs are documented and usable if required.
The build is done with
You can run the build for all languages with
make build from the root of this repo.
Speechly is a developer tool for building real-time multimodal voice user interfaces. It enables developers and designers to enhance their current touch user interface with voice functionalities for a better user experience. Speechly key features:
Instead of using buttons, input fields and dropdowns, Speechly enables users to interact with the application by using voice.
Users get real-time visual feedback on the form as they speak and are encouraged to go on. If there's an error, the user can correct it either with the traditional touch user interface or by voice.