streamsx.sttgateway package

Speech To Text gateway integration for IBM Streams

For details of implementing applications in Python for IBM Streams including IBM Cloud Pak for Data and the Streaming Analytics service running on IBM Cloud see:

class streamsx.sttgateway.WatsonSTT(credentials, base_language_model, partial_result=False, **options)

Bases: streamsx.topology.composite.Map

Composite map transformation for WatsonSTT

This operator ingests audio data, either as a file (.wav, .mp3, etc.) or as RAW audio, and transcribes that audio into text via the IBM Watson STT (Speech To Text) service. It sends the audio data over the WebSocket interface to the configured Watson STT service, which runs either in the IBM public cloud or in IBM Cloud Pak for Data. It then outputs the transcription either as utterances or as full text, as configured. An utterance is a group of transcribed words meant to approximate a sentence.

Audio data must be in 16-bit little-endian mono format. For the Telephony model and configurations, the audio must have an 8 kHz sampling rate; for the Broadband model and configurations, it must have a 16 kHz sampling rate. The data can be provided as a .wav file or as RAW uncompressed PCM audio.
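The format requirements above can be checked locally before audio is sent to the service. The following is a minimal sketch using Python's standard wave module; the helper check_stt_wav and the model_family labels "telephony" and "broadband" are hypothetical and not part of streamsx.sttgateway. It only handles .wav files; since the wave module reads little-endian RIFF/WAVE data, byte order is not checked separately.

```python
import wave

# Required sample rates per model family, as stated in the requirements above.
REQUIRED_RATE_HZ = {"telephony": 8000, "broadband": 16000}


def check_stt_wav(path: str, model_family: str) -> list:
    """Return a list of format problems found in a .wav file (empty list = OK).

    model_family is a hypothetical label ("telephony" or "broadband")
    used only by this sketch.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        # 2 bytes per sample means 16-bit audio
        if wav.getsampwidth() != 2:
            problems.append(f"sample width is {8 * wav.getsampwidth()} bits, expected 16")
        # mono audio has exactly one channel
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected mono")
        expected = REQUIRED_RATE_HZ[model_family]
        if wav.getframerate() != expected:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected {expected} Hz")
    return problems
```

For example, check_stt_wav('call-center-1.wav', 'telephony') (a hypothetical file name) returns an empty list for an 8 kHz, 16-bit mono file.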

Note

The input stream must contain an attribute named speech of type blob or bytes, for example:

  • StreamSchema('tuple<blob speech>')

  • typing.NamedTuple('SttInput', [('speech', bytes)]).

A window punctuation marker or an empty speech blob can be used to mark the end of a conversation, so a conversation can be composed of multiple audio files. When the end of a conversation is encountered, the STT engine delivers all results of the current conversation and flushes all buffers.
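A minimal sketch of the empty-blob convention, assuming the audio of one conversation is already in memory as one or more byte strings: the hypothetical helper conversation_tuples splits the parts into fixed-size blocks (512 bytes, the block size BlockFilesReader uses in the example below) and appends an empty speech blob as the end-of-conversation marker. In a real topology these tuples would come from a source operator rather than a Python list.

```python
import typing

# Mirrors the input schema of the example below: an identifier plus a speech blob.
SttInput = typing.NamedTuple("SttInput", [("conversationId", str), ("speech", bytes)])


def conversation_tuples(conversation_id: str,
                        audio_parts: typing.List[bytes],
                        block_size: int = 512) -> typing.List[SttInput]:
    """Split the audio parts of one conversation into blocks and close the
    conversation with an empty speech blob (hypothetical helper)."""
    tuples = []
    for part in audio_parts:
        for offset in range(0, len(part), block_size):
            tuples.append(SttInput(conversation_id, part[offset:offset + block_size]))
    # The empty blob marks the end of the conversation, so results are delivered
    # and buffers are flushed before the next conversation starts.
    tuples.append(SttInput(conversation_id, b""))
    return tuples
```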

Example for reading audio files and speech to text transformation:

import streamsx.sttgateway as stt
import streamsx.standard.files as stdfiles
from streamsx.topology.topology import Topology
from streamsx.topology.schema import StreamSchema
import streamsx.spl.op as op
import typing
import os

# credentials for WatsonSTT service
stt_creds = {
    "url": "wss://xxxx/instances/xxxx/v1/recognize",
    "access_token": "xxxx",
}

topo = Topology()

# add sample files to application bundle
sample_audio_dir='/your-directory-with-wav-files' # either dir or single file
dirname = 'etc'
topo.add_file_dependency(sample_audio_dir, dirname)
if os.path.isdir(sample_audio_dir):
    dirname = dirname + '/' + os.path.basename(sample_audio_dir)
dirname = op.Expression.expression('getApplicationDir()+"/'+dirname+'"')

# scan the directory for matching .wav files (raw string keeps the regex escape "\." intact)
s = topo.source(stdfiles.DirectoryScan(directory=dirname, pattern=r'.*call-center.*\.wav$'))
SttInput = typing.NamedTuple('SttInput', [('conversationId', str), ('speech', bytes)])
files = s.map(stdfiles.BlockFilesReader(block_size=512, file_name='conversationId'), schema=SttInput)

SttResult = typing.NamedTuple('SttResult', [('conversationId', str), ('utteranceText', str)])
res = files.map(stt.WatsonSTT(credentials=stt_creds, base_language_model='en-US_NarrowbandModel'), schema=SttResult)

res.print()

credentials

Name of an application configuration, or a dict containing the credentials for WatsonSTT. The dict contains either “url” and “access_token” (STT service in Cloud Pak for Data) or “url”, “api_key”, and “iam_token_url” (STT service in IBM Cloud).

Example for WatsonSTT in Cloud Pak for Data:

credentials = {
    "url": "wss://xxxx/instances/xxxx/v1/recognize",
    "access_token": "xxxx",
}

Example for the WatsonSTT service in IBM Cloud:

credentials = {
    "url": "wss://xxxx/instances/xxxx/v1/recognize",
    "api_key": "xxxx",
    "iam_token_url": "https://iam.cloud.ibm.com/identity/token",
}

Type

str|dict
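The two accepted dict layouts can be told apart by their keys. The following sketch classifies a credentials argument; credentials_kind and its return labels are hypothetical and not part of streamsx.sttgateway, which may validate credentials differently.

```python
import typing

# Key sets described in the credentials documentation above.
CP4D_KEYS = {"url", "access_token"}
IBM_CLOUD_KEYS = {"url", "api_key", "iam_token_url"}


def credentials_kind(credentials: typing.Union[str, dict]) -> str:
    """Classify a credentials argument (hypothetical helper)."""
    if isinstance(credentials, str):
        # a plain string is taken as the name of an application configuration
        return "app_config"
    keys = set(credentials)
    if IBM_CLOUD_KEYS <= keys:
        return "ibm_cloud"
    if CP4D_KEYS <= keys:
        return "cloud_pak_for_data"
    raise ValueError(f"unsupported credentials keys: {sorted(keys)}")
```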

base_language_model

This parameter specifies the name of the Watson STT base language model to use, for example en-US_NarrowbandModel. For the available models, see https://cloud.ibm.com/docs/services/speech-to-text?topic=speech-to-text-input#models

Type

str

partial_result

True to get partial utterances, False to get the full text after transcribing the entire audio (default).

Type

bool

options

Additional optional parameters, passed as keyword arguments.

Type

kwargs

property content_type

Content type to be used for transcription. (Default is audio/wav)

Type

str

property filter_profanity

This parameter indicates whether profanity should be filtered from the transcript. (Default is False) See https://cloud.ibm.com/docs/services/speech-to-text?topic=speech-to-text-output#profanity_filter

Type

bool

property keywords_spotting_threshold

This parameter specifies the minimum confidence level that the STT service must have for an utterance word to match a given keyword. A value of 0.0 disables this feature. The value must be less than 1.0. (Default is 0.3) See https://cloud.ibm.com/docs/services/speech-to-text?topic=speech-to-text-output#keyword_spotting

Type

float

property keywords_to_be_spotted

This parameter specifies the keywords to be spotted, given either as a list of strings or as a string in list format. (Default is an empty list) Example for list format:

['keyword1','keyword2']

Example for str format:

"['keyword1','keyword2']"

Type

list(str)
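Since keywords may be given in either format, a caller can normalize them before use. A minimal sketch, assuming the str format is a Python list literal as in the example above; normalize_keywords is a hypothetical helper, not the operator's own conversion logic. ast.literal_eval parses the literal without executing arbitrary code.

```python
import ast
import typing


def normalize_keywords(value: typing.Union[str, typing.List[str], None]) -> typing.List[str]:
    """Accept either documented format and return a plain list of strings."""
    if value is None:
        return []  # default: no keywords to spot
    if isinstance(value, str):
        # safely parse "['keyword1','keyword2']" into a list
        value = ast.literal_eval(value)
    if not isinstance(value, list) or not all(isinstance(k, str) for k in value):
        raise TypeError("keywords must be a list of strings")
    return list(value)
```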

property max_utterance_alternatives

This parameter specifies the number of n-best alternative hypotheses to return for the transcription results. (Default is 3) See https://cloud.ibm.com/docs/services/speech-to-text?topic=speech-to-text-output#max_alternatives

Note

This parameter is ignored if partial_result is False.

Type

int

property non_final_utterances_needed

This parameter controls whether non-final utterances are output. (Default is False)

Note

This parameter is ignored if partial_result is False.

Type

bool
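Two of the options above only matter in partial result mode. The sketch below makes that rule explicit by computing which options would actually influence the output; effective_options is a hypothetical helper, and WatsonSTT itself simply ignores the partial-only options when partial_result is False.

```python
# Options that, per the notes above, are ignored when partial_result is False.
PARTIAL_ONLY_OPTIONS = ("max_utterance_alternatives", "non_final_utterances_needed")


def effective_options(partial_result: bool, **options) -> dict:
    """Return the subset of options that affects the result (hypothetical helper)."""
    if partial_result:
        return dict(options)
    # drop options that only apply to partial results
    return {k: v for k, v in options.items() if k not in PARTIAL_ONLY_OPTIONS}
```

For example, effective_options(False, filter_profanity=True, max_utterance_alternatives=5) keeps only filter_profanity.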

populate(topology, stream, schema, name, **options)

Populate the topology with this composite map transformation. Subclasses must implement the populate function. populate is called when the composite is added to the topology with:

transformed_stream = input_stream.map(myMapComposite)

Parameters
  • topology – Topology containing the composite map.

  • stream – Stream to be transformed.

  • schema – Schema passed into map.

  • name – Name passed into map.

  • **options – Future options passed to map.

Returns

Single stream representing the transformation of stream.

Return type

Stream

streamsx.sttgateway.configure_connection(instance, credentials, name='sttConnection')

Configures IBM Streams for a connection to Watson STT.

Creates an application configuration object containing the required properties with connection information.

Example for creating a configuration for a Streams instance with connection details:

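from icpd_core import icpd_util
from streamsx.rest_primitives import Instance
from streamsx.topology.context import ConfigParams
import streamsx.sttgateway as stt

# get the service details of your Streams instance in Cloud Pak for Data
cfg = icpd_util.get_service_instance_details(name='your-streams-instance')
# disable SSL certificate verification (for example, with self-signed certificates)
cfg[ConfigParams.SSL_VERIFY] = False
instance = Instance.of_service(cfg)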
sample_credentials = {
    'url': 'wss://hostplaceholder/speech-to-text/ibm-wc/instances/1188888444444/api/v1/recognize',
    'access_token': 'sample-access-token'
}
app_cfg = stt.configure_connection(instance, credentials=sample_credentials, name='stt-sample')

Parameters
  • instance (streamsx.rest_primitives.Instance) – IBM Streams instance object.

  • credentials (dict) – dict containing “url” and “access_token” (STT service in Cloud Pak for Data) or “url”, “api_key”, and “iam_token_url” (STT service in IBM Cloud).

  • name (str) – Name of the application configuration.

Returns

Name of the application configuration.

streamsx.sttgateway.download_toolkit(url=None, target_dir=None)

Downloads the latest sttgateway toolkit from GitHub.

Example for adding the latest sttgateway toolkit from GitHub to your topology:

import streamsx.sttgateway as stt
from streamsx.spl import toolkit
from streamsx.topology.topology import Topology

topology = Topology()
# download sttgateway toolkit from GitHub
stt_toolkit_location = stt.download_toolkit()
# add the toolkit to topology
toolkit.add_toolkit(topology, stt_toolkit_location)

Example for adding a specific version of the sttgateway toolkit to your topology, using a URL:

import streamsx.sttgateway as stt
from streamsx.spl import toolkit
from streamsx.topology.topology import Topology

topology = Topology()
url220 = 'https://github.com/IBMStreams/streamsx.sttgateway/releases/download/v2.2.0/streamsx.sttgateway-2.2.0-ced653b-20200331-1219.tgz'
stt_toolkit_location = stt.download_toolkit(url=url220)
toolkit.add_toolkit(topology, stt_toolkit_location)

Parameters
  • url (str) – URL of the toolkit archive (*.tgz) to download. Use this parameter to download a specific version of the toolkit.

  • target_dir (str) – The directory into which the toolkit is unpacked. If a relative path is given, it is appended to the system temporary directory, for example /tmp on Unix/Linux systems. If target_dir is None, a location relative to the system temporary directory is chosen.

Returns

The location of the downloaded sttgateway toolkit.

Return type

str

Note

This function requires an outgoing Internet connection.

New in version 0.5.
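The target_dir rules above can be sketched with the standard tempfile module. resolve_target_dir is a hypothetical helper, not the function's own code; in particular, using a fresh tempfile.mkdtemp() directory when target_dir is None is an assumption, since the exact location chosen by download_toolkit is not documented here.

```python
import os
import tempfile
import typing


def resolve_target_dir(target_dir: typing.Optional[str]) -> str:
    """Sketch of the documented target_dir rules (hypothetical helper)."""
    if target_dir is None:
        # Assumption: create a new directory under the system temporary directory.
        return tempfile.mkdtemp()
    if not os.path.isabs(target_dir):
        # relative paths are appended to the system temporary directory
        return os.path.join(tempfile.gettempdir(), target_dir)
    # absolute paths are used unchanged
    return target_dir
```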

class streamsx.sttgateway.schema.GatewaySchema

Bases: object

Structured stream schemas for WatsonSTT().

AccessToken = <streamsx.topology.schema.StreamSchema object>

This schema is used internally in WatsonSTT() by the access token generator.

STTInput = <streamsx.topology.schema.StreamSchema object>

Use this schema as input for WatsonSTT().

The schema defines the following attributes:

  • conversationId(rstring) - identifier, for example file name

  • speech(blob) - audio data

STTResult = <streamsx.topology.schema.StreamSchema object>

This schema is used as output in WatsonSTT().

The schema defines the following attributes:

  • conversationId(rstring) - identifier, for example file name

  • transcriptionCompleted(boolean) - boolean value to indicate whether the full transcription/conversation is completed

  • sttErrorMessage(rstring) - Watson STT error message, if any.

  • utteranceStartTime(float64) - start time of an utterance relative to the start of the audio

  • utteranceEndTime(float64) - end time of an utterance relative to the start of the audio

  • utterance(rstring) - the transcription of audio in the form of a single utterance
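For Python code that consumes WatsonSTT output, a typing.NamedTuple with the same attribute names as STTResult may be convenient, for example as the schema argument of map in the way the example at the top uses its own SttResult. This sketch assumes the usual streamsx type mapping (rstring to str, boolean to bool, float64 to float); the describe helper is hypothetical.

```python
import typing

# Python-side mirror of the STTResult attributes listed above.
SttResult = typing.NamedTuple("SttResult", [
    ("conversationId", str),
    ("transcriptionCompleted", bool),
    ("sttErrorMessage", str),
    ("utteranceStartTime", float),
    ("utteranceEndTime", float),
    ("utterance", str),
])


def describe(result: SttResult) -> str:
    """Format one result tuple for logging (hypothetical helper)."""
    if result.sttErrorMessage:
        return f"{result.conversationId}: ERROR {result.sttErrorMessage}"
    # start and end times are relative to the start of the audio
    return (f"{result.conversationId} [{result.utteranceStartTime:.2f}-"
            f"{result.utteranceEndTime:.2f}] {result.utterance}")
```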

STTResultKeywordExtension = <streamsx.topology.schema.StreamSchema object>

This schema is added to the STTResult schema when keywords_to_be_spotted is set in WatsonSTT().

The schema defines the following attributes:

  • keywordsSpottingResults(map<rstring, list<tuple<float64 startTime, float64 endTime, float64 confidence>>>) - The keys of the map are the spotted keywords.
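How keywordsSpottingResults arrives in Python depends on how the stream is converted; the sketch below simply assumes a dict that maps each keyword to a list of (startTime, endTime, confidence) tuples, mirroring the SPL type above. KeywordHit and confident_keywords are hypothetical names.

```python
import typing

# Python view of map<rstring, list<tuple<float64 startTime, float64 endTime, float64 confidence>>>
KeywordHit = typing.NamedTuple("KeywordHit", [("startTime", float), ("endTime", float), ("confidence", float)])
KeywordResults = typing.Dict[str, typing.List[KeywordHit]]


def confident_keywords(results: KeywordResults, min_confidence: float) -> typing.List[str]:
    """Return, sorted, the keywords with at least one hit at or above min_confidence."""
    return sorted(
        keyword
        for keyword, hits in results.items()
        if any(hit.confidence >= min_confidence for hit in hits)
    )
```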

STTResultPartialExtension = <streamsx.topology.schema.StreamSchema object>

This schema is added to the STTResult schema when partial_result is True in WatsonSTT().

The schema defines the following attributes:

  • finalizedUtterance(boolean) - boolean value to indicate if this is an interim partial utterance or a finalized utterance.

  • confidence(float64) - confidence value for an interim partial utterance or for a finalized utterance or for the full text.
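With partial results, interim and finalized utterances both appear on the output stream. A minimal sketch, assuming results are available as Python tuples carrying the attributes named here: it keeps only finalized utterances and joins them per conversation. PartialResult and finalized_transcript are hypothetical names.

```python
import typing

# Fields from STTResult plus the partial extension that this sketch needs.
PartialResult = typing.NamedTuple("PartialResult", [
    ("conversationId", str),
    ("utterance", str),
    ("finalizedUtterance", bool),
    ("confidence", float),
])


def finalized_transcript(results: typing.Iterable[PartialResult]) -> typing.Dict[str, str]:
    """Join finalized utterances per conversation, skipping interim ones."""
    transcript: typing.Dict[str, typing.List[str]] = {}
    for r in results:
        if r.finalizedUtterance:
            transcript.setdefault(r.conversationId, []).append(r.utterance)
    return {cid: " ".join(parts) for cid, parts in transcript.items()}
```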
