streamsx.sttgateway package
Speech To Text gateway integration for IBM Streams
For details of implementing applications in Python for IBM Streams including IBM Cloud Pak for Data and the Streaming Analytics service running on IBM Cloud see:
-
class streamsx.sttgateway.WatsonSTT(credentials, base_language_model, partial_result=False, **options)
Bases: streamsx.topology.composite.Map
Composite map transformation for WatsonSTT
This operator ingests audio data in the form of a file (.wav, .mp3 etc.) or RAW audio and transcribes that audio into text via the IBM Watson STT (Speech To Text) cloud service. It sends the audio data over the WebSocket interface to the configured Watson STT service running in the IBM public cloud or in IBM Cloud Pak for Data. It then outputs transcriptions of speech in the form of utterances or as full text, as configured. An utterance is a group of transcribed words meant to approximate a sentence.
Audio data must be in 16-bit little endian, mono format. For the Telephony model and configurations, the audio must have an 8 kHz sampling rate. For the Broadband model and configurations, the audio must have a 16 kHz sampling rate. The data can be provided as a .wav file or as RAW uncompressed PCM audio.
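The audio requirements above can be checked before a file is sent to the operator. The following is a minimal sketch using only the Python standard library; `check_wav_format` and `make_silent_wav` are hypothetical helpers written for illustration and are not part of streamsx.sttgateway. It covers .wav input only, since RAW PCM carries no header to inspect.

```python
import io
import wave

# Sampling rates named in the text above: 8 kHz (Telephony) and 16 kHz (Broadband).
ACCEPTED_RATES_HZ = (8000, 16000)


def check_wav_format(wav_file):
    """Return a list of problems found in a WAV file (empty list means OK).

    `wav_file` is a path string or a binary file-like object.
    Hypothetical helper, not part of streamsx.sttgateway.
    """
    problems = []
    with wave.open(wav_file, "rb") as w:
        if w.getsampwidth() != 2:  # 2 bytes per sample = 16-bit
            problems.append("expected 16-bit samples, got %d-bit" % (8 * w.getsampwidth()))
        if w.getnchannels() != 1:
            problems.append("expected mono, got %d channels" % w.getnchannels())
        if w.getframerate() not in ACCEPTED_RATES_HZ:
            problems.append("unexpected sampling rate %d Hz" % w.getframerate())
    return problems


def make_silent_wav(rate_hz, channels=1, sample_width=2, frames=160):
    """Build a small silent WAV in memory so the check can be tried without files."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width)
        w.setframerate(rate_hz)
        w.writeframes(b"\x00" * (frames * channels * sample_width))
    buf.seek(0)
    return buf


ok_problems = check_wav_format(make_silent_wav(8000))
bad_problems = check_wav_format(make_silent_wav(44100, channels=2))
```

Here `ok_problems` is empty, while `bad_problems` reports the channel count and the sampling rate.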
Note
The input stream must contain an attribute with the name speech of type blob or bytes, for example StreamSchema('tuple<blob speech>') or typing.NamedTuple('SttInput', [('speech', bytes)]).
A window punctuation marker or an empty speech blob may be used to mark the end of a conversation. Thus a conversation can be a composite of multiple audio files. When the end of a conversation is encountered, the STT engine delivers all results of the current conversation and flushes all buffers.
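The end-of-conversation convention can be pictured with plain Python, independent of any Streams API. The sketch below is an illustration only: `conversation_blocks` is a hypothetical generator that cuts several audio parts into blocks under one conversation id and appends a single empty blob as the end marker.

```python
def conversation_blocks(conversation_id, audio_parts, block_size=512):
    """Yield (conversationId, speech) pairs for one conversation.

    Each entry of `audio_parts` (bytes) is cut into blocks of at most
    `block_size` bytes. One empty blob is emitted after the last part
    to mark the end of the conversation.
    """
    for part in audio_parts:
        for start in range(0, len(part), block_size):
            yield (conversation_id, part[start:start + block_size])
    yield (conversation_id, b"")  # empty speech blob = end of conversation


# Two audio parts (placeholder bytes) forming a single conversation.
blocks = list(conversation_blocks("call-1", [b"a" * 1000, b"b" * 100]))
block_sizes = [len(speech) for _, speech in blocks]
```

Only the final tuple carries an empty blob, so results for "call-1" would be delivered once, after both parts have been processed.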
Example for reading audio files and speech to text transformation:
    import streamsx.sttgateway as stt
    import streamsx.standard.files as stdfiles
    from streamsx.topology.topology import Topology
    from streamsx.topology.schema import StreamSchema
    import streamsx.spl.op as op
    import typing
    import os

    # credentials for WatsonSTT service
    stt_creds = {
        "url": "wss://xxxx/instances/xxxx/v1/recognize",
        "access_token": "xxxx",
    }

    topo = Topology()

    # add sample files to application bundle
    sample_audio_dir='/your-directory-with-wav-files' # either dir or single file
    dirname = 'etc'
    topo.add_file_dependency(sample_audio_dir, dirname)
    if os.path.isdir(sample_audio_dir):
        dirname = dirname + '/' + os.path.basename(sample_audio_dir)
    dirname = op.Expression.expression('getApplicationDir()+"/'+dirname+'"')

    s = topo.source(stdfiles.DirectoryScan(directory=dirname, pattern='.*call-center.*\.wav$'))
    SttInput = typing.NamedTuple('SttInput', [('conversationId', str), ('speech', bytes)])
    files = s.map(stdfiles.BlockFilesReader(block_size=512, file_name='conversationId'), schema=SttInput)

    SttResult = typing.NamedTuple('SttResult', [('conversationId', str), ('utteranceText', str)])
    res = files.map(stt.WatsonSTT(credentials=stt_creds, base_language_model='en-US_NarrowbandModel'), schema=SttResult)
    res.print()
-
credentials
Name of the application configuration or dict containing the credentials for WatsonSTT. The dict contains either “url” and “access_token” (STT service in Cloud Pak for Data) or “url”, “api_key” and “iam_token_url” (STT IBM cloud service).
Example for WatsonSTT in Cloud Pak for Data:
    credentials = {
        "url": "wss://xxxx/instances/xxxx/v1/recognize",
        "access_token": "xxxx",
    }
Example for WatsonSTT IBM cloud service:
    credentials = {
        "url": "wss://xxxx/instances/xxxx/v1/recognize",
        "api_key": "xxxx",
        "iam_token_url": "https://iam.cloud.ibm.com/identity/token",
    }
- Type
str|dict
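The two credential layouts described above can be told apart by their keys. The following sketch is a hypothetical pre-flight check, not something the package provides; `credentials_kind` and its return strings are invented for illustration, and preferring "access_token" when both layouts are present is an arbitrary choice made here.

```python
def credentials_kind(credentials):
    """Classify a WatsonSTT credentials value by the keys it contains.

    Hypothetical helper, not part of streamsx.sttgateway.
    """
    if isinstance(credentials, str):
        # A str is taken to be the name of an application configuration.
        return "application configuration"
    if not isinstance(credentials, dict):
        raise TypeError("credentials must be str or dict")
    if "url" not in credentials:
        raise ValueError('credentials dict requires "url"')
    if "access_token" in credentials:
        return "Cloud Pak for Data"
    if "api_key" in credentials and "iam_token_url" in credentials:
        return "IBM cloud service"
    raise ValueError(
        'credentials dict requires "access_token", or "api_key" and "iam_token_url"'
    )


cpd_kind = credentials_kind({
    "url": "wss://xxxx/instances/xxxx/v1/recognize",
    "access_token": "xxxx",
})
cloud_kind = credentials_kind({
    "url": "wss://xxxx/instances/xxxx/v1/recognize",
    "api_key": "xxxx",
    "iam_token_url": "https://iam.cloud.ibm.com/identity/token",
})
```

Running such a check before submitting the topology surfaces an incomplete dict early instead of at connection time.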
-
base_language_model
Specifies the name of the Watson STT base language model that should be used. See https://cloud.ibm.com/docs/services/speech-to-text?topic=speech-to-text-input#models
- Type
str
-
partial_result
True to get partial utterances, False to get the full text after transcribing the entire audio (default).
- Type
bool
-
options
Additional optional parameters passed as variable keyword arguments.
- Type
kwargs
-
property content_type
Content type to be used for transcription. Default is audio/wav.
- Type
str
-
property filter_profanity
Indicates whether profanity should be filtered from a transcript. Default is False. See https://cloud.ibm.com/docs/services/speech-to-text?topic=speech-to-text-output#profanity_filter
- Type
bool
-
property keywords_spotting_threshold
Specifies the minimum confidence level that the STT service must have for an utterance word to match a given keyword. A value of 0.0 disables this feature. Valid values must be less than 1.0. Default is 0.3. See https://cloud.ibm.com/docs/services/speech-to-text?topic=speech-to-text-output#keyword_spotting
- Type
float
-
property keywords_to_be_spotted
Specifies a list (array) of strings to be spotted. Default is an empty list.
Example for list format:
    ['keyword1','keyword2']
Example for str format:
    "['keyword1','keyword2']"
- Type
list(str)
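The list and str formats shown above describe the same value. As an illustration, the sketch below converts either form into a plain list of strings with the standard library. `normalize_keywords` is a hypothetical helper; how the package itself parses the str format is not stated here, so treating it as a Python list literal is an assumption.

```python
import ast


def normalize_keywords(keywords):
    """Return the keywords as a list of str, accepting list or str format.

    Assumption: the str format is a Python list literal such as
    "['keyword1','keyword2']". Hypothetical helper for illustration.
    """
    if isinstance(keywords, str):
        # literal_eval only evaluates literals, so no code is executed.
        keywords = ast.literal_eval(keywords)
    if not isinstance(keywords, (list, tuple)):
        raise ValueError("keywords must be a list of strings")
    if not all(isinstance(k, str) for k in keywords):
        raise ValueError("every keyword must be a string")
    return list(keywords)


from_list = normalize_keywords(['keyword1', 'keyword2'])
from_str = normalize_keywords("['keyword1','keyword2']")
```

Both calls produce the same list, which makes it easy to accept keywords from a configuration file as text.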
-
property max_utterance_alternatives
Indicates the required number of n-best alternative hypotheses for the transcription results. Default is 3. See https://cloud.ibm.com/docs/services/speech-to-text?topic=speech-to-text-output#max_alternatives
Note
This parameter is ignored if partial_result is False.
- Type
int
-
property non_final_utterances_needed
Controls the output of non-final utterances. Default is False.
Note
This parameter is ignored if partial_result is False.
- Type
bool
-
populate(topology, stream, schema, name, **options)
Populate the topology with this composite map transformation. Subclasses must implement the populate function. populate is called when the composite is added to the topology with:
    transformed_stream = input_stream.map(myMapComposite)
- Parameters
topology – Topology containing the composite map.
stream – Stream to be transformed.
schema – Schema passed into map.
name – Name passed into map.
**options – Future options passed to map.
- Returns
Single stream representing the transformation of stream.
- Return type
Stream
-
streamsx.sttgateway.configure_connection(instance, credentials, name='sttConnection')
Configures IBM Streams for a connection to Watson STT.
Creates an application configuration object containing the required properties with connection information.
Example for creating a configuration for a Streams instance with connection details:
    from icpd_core import icpd_util
    from streamsx.rest_primitives import Instance
    import streamsx.topology.context  # provides ConfigParams used below
    import streamsx.sttgateway as stt

    cfg=icpd_util.get_service_instance_details(name='your-streams-instance')
    cfg[streamsx.topology.context.ConfigParams.SSL_VERIFY] = False
    instance = Instance.of_service(cfg)

    sample_credentials = {
        'url': 'wss://hostplaceholder/speech-to-text/ibm-wc/instances/1188888444444/api/v1/recognize',
        'access_token': 'sample-access-token'
    }
    app_cfg = stt.configure_connection(instance, credentials=sample_credentials, name='stt-sample')
- Parameters
instance (streamsx.rest_primitives.Instance) – IBM Streams instance object.
credentials (dict) – dict containing “url” and “access_token” (STT service in Cloud Pak for Data) or “url” and “api_key” and “iam_token_url” (STT IBM cloud service).
name (str) – Name of the application configuration.
- Returns
Name of the application configuration.
-
streamsx.sttgateway.download_toolkit(url=None, target_dir=None)
Downloads the latest sttgateway toolkit from GitHub.
Example for updating the sttgateway toolkit for your topology with the latest toolkit from GitHub:
    import streamsx.sttgateway as stt
    from streamsx.spl import toolkit
    # download sttgateway toolkit from GitHub
    stt_toolkit_location = stt.download_toolkit()
    # add the toolkit to topology
    toolkit.add_toolkit(topology, stt_toolkit_location)
Example for updating the topology with a specific version of the sttgateway toolkit using a URL:
    import streamsx.sttgateway as stt
    from streamsx.spl import toolkit
    url220 = 'https://github.com//IBMStreams/streamsx.sttgateway/releases/download/v2.2.0/streamsx.sttgateway-2.2.0-ced653b-20200331-1219.tgz'
    stt_toolkit_location = stt.download_toolkit(url=url220)
    toolkit.add_toolkit(topology, stt_toolkit_location)
- Parameters
url (str) – Link to toolkit archive (*.tgz) to be downloaded. Use this parameter to download a specific version of the toolkit.
target_dir (str) – The directory the toolkit is unpacked to. If a relative path is given, the path is appended to the system temporary directory, for example to /tmp on Unix/Linux systems. If target_dir is None, a location relative to the system temporary directory is chosen.
- Returns
the location of the downloaded sttgateway toolkit
- Return type
str
Note
This function requires an outgoing Internet connection.
New in version 0.5.
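The target_dir rules for download_toolkit can be sketched with the standard library. This is an illustration of the documented behaviour only, not the package's code: `resolve_target_dir` and the default directory name are hypothetical, and returning an absolute path unchanged is an assumption, since the description above covers only relative paths and None.

```python
import os
import tempfile


def resolve_target_dir(target_dir=None, default_name="sttgateway-toolkit"):
    """Return the directory a toolkit would be unpacked to.

    Hypothetical helper: `default_name` is invented for illustration.
    """
    tmp = tempfile.gettempdir()  # for example /tmp on Unix/Linux systems
    if target_dir is None:
        return os.path.join(tmp, default_name)
    if os.path.isabs(target_dir):
        return target_dir  # assumption: absolute paths are used unchanged
    return os.path.join(tmp, target_dir)


relative_location = resolve_target_dir("toolkits/stt")
default_location = resolve_target_dir()
```

A relative target_dir therefore never lands in the current working directory, which matters when cleaning up downloaded toolkits.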
-
class streamsx.sttgateway.schema.GatewaySchema
Bases: object
Structured stream schemas for WatsonSTT().
-
AccessToken = <streamsx.topology.schema.StreamSchema object>
This schema is used internally in WatsonSTT() by the access token generator.
-
STTInput = <streamsx.topology.schema.StreamSchema object>
Use this schema as input for WatsonSTT().
The schema defines the following attributes:
conversationId(rstring) - identifier, for example file name
speech(blob) - audio data
-
STTResult = <streamsx.topology.schema.StreamSchema object>
This schema is used as output in WatsonSTT().
The schema defines the following attributes:
conversationId(rstring) - identifier, for example file name
transcriptionCompleted(boolean) - boolean value to indicate whether the full transcription/conversation is completed
sttErrorMessage(rstring) - Watson STT error message if any.
utteranceStartTime(float64) - start time of an utterance relative to the start of the audio
utteranceEndTime(float64) - end time of an utterance relative to the start of the audio
utterance(rstring) - the transcription of audio in the form of a single utterance
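The STTResult attributes above are enough to rebuild a transcript per conversation. The sketch below works on plain dicts that mimic result tuples. It is an assumption that tuples can be handled this way downstream, and `collect_transcripts` and the sample data are hypothetical.

```python
from collections import defaultdict


def collect_transcripts(results):
    """Join utterances per conversationId once transcriptionCompleted is True.

    `results` is an iterable of dicts keyed by the STTResult attribute names.
    Conversations that never complete are left out of the returned dict.
    """
    pending = defaultdict(list)
    finished = {}
    for r in results:
        cid = r["conversationId"]
        if r.get("utterance"):
            pending[cid].append((r["utteranceStartTime"], r["utterance"]))
        if r["transcriptionCompleted"]:
            # Order by start time, then join the utterance texts.
            parts = sorted(pending.pop(cid, []))
            finished[cid] = " ".join(text.strip() for _, text in parts)
    return finished


# Hypothetical sample data shaped like STTResult tuples.
sample_results = [
    {"conversationId": "a.wav", "transcriptionCompleted": False, "sttErrorMessage": "",
     "utteranceStartTime": 0.0, "utteranceEndTime": 1.2, "utterance": "hello there "},
    {"conversationId": "b.wav", "transcriptionCompleted": False, "sttErrorMessage": "",
     "utteranceStartTime": 0.0, "utteranceEndTime": 0.9, "utterance": "good morning"},
    {"conversationId": "a.wav", "transcriptionCompleted": False, "sttErrorMessage": "",
     "utteranceStartTime": 1.5, "utteranceEndTime": 2.8, "utterance": "how can I help"},
    {"conversationId": "a.wav", "transcriptionCompleted": True, "sttErrorMessage": "",
     "utteranceStartTime": 0.0, "utteranceEndTime": 0.0, "utterance": ""},
]
transcripts = collect_transcripts(sample_results)
```

In this sample only "a.wav" is reported, because "b.wav" has not yet signalled completion.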
-
STTResultKeywordExtension = <streamsx.topology.schema.StreamSchema object>
This schema is added to the STTResult schema when keywords_to_be_spotted is set in WatsonSTT().
The schema defines the following attributes:
keywordsSpottingResults(map<rstring, list<tuple<float64 startTime, float64 endTime, float64 confidence>>>) - The keys of the map are the spotted keywords.
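The nested map type is easier to read with a concrete value. The sketch below assumes the map arrives in Python as a dict whose values are lists of dicts with startTime, endTime and confidence keys. That representation is an assumption, and `keyword_hits` and the sample data are hypothetical.

```python
def keyword_hits(keywords_spotting_results, min_confidence=0.0):
    """Flatten the keyword map into (startTime, keyword, confidence) tuples.

    Occurrences below `min_confidence` are dropped and the remainder is
    sorted by start time.
    """
    hits = []
    for keyword, occurrences in keywords_spotting_results.items():
        for occ in occurrences:
            if occ["confidence"] >= min_confidence:
                hits.append((occ["startTime"], keyword, occ["confidence"]))
    return sorted(hits)


# Hypothetical value shaped like keywordsSpottingResults.
sample_keywords = {
    "refund": [
        {"startTime": 3.2, "endTime": 3.7, "confidence": 0.91},
        {"startTime": 10.0, "endTime": 10.4, "confidence": 0.42},
    ],
    "invoice": [
        {"startTime": 1.0, "endTime": 1.5, "confidence": 0.77},
    ],
}
confident_hits = keyword_hits(sample_keywords, min_confidence=0.5)
```

With a minimum confidence of 0.5, the second "refund" occurrence is dropped and the remaining hits come back in time order.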
-
STTResultPartialExtension = <streamsx.topology.schema.StreamSchema object>
This schema is added to the STTResult schema when the result mode is partial in WatsonSTT().
The schema defines the following attributes:
finalizedUtterance(boolean) - boolean value to indicate if this is an interim partial utterance or a finalized utterance.
confidence(float64) - confidence value for an interim partial utterance or for a finalized utterance or for the full text.
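In partial result mode a consumer often wants only finalized utterances. The sketch below filters dicts that mimic result tuples carrying the two extension attributes. The dict representation, `finalized_utterances` and the sample data are assumptions made for illustration.

```python
def finalized_utterances(results, min_confidence=0.0):
    """Return utterance texts that are finalized and meet the confidence floor."""
    return [
        r["utterance"]
        for r in results
        if r["finalizedUtterance"] and r["confidence"] >= min_confidence
    ]


# Hypothetical partial-mode results: interim guesses followed by final ones.
sample_partial = [
    {"utterance": "hello", "finalizedUtterance": False, "confidence": 0.0},
    {"utterance": "hello there", "finalizedUtterance": True, "confidence": 0.93},
    {"utterance": "how can", "finalizedUtterance": False, "confidence": 0.0},
    {"utterance": "how can I kelp", "finalizedUtterance": True, "confidence": 0.38},
]
kept = finalized_utterances(sample_partial, min_confidence=0.5)
```

Interim utterances are skipped, and the low-confidence final utterance is filtered out by the threshold.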
-