Pre-13.0 Release Notes
Reference Number: AA-02195
Important: These are release notes prior to LumenVox version 13.0.100
Only release notes for versions 10.5.110 (August 14, 2012) through 12.2.100 (September 2, 2014) are shown here. Please refer to other articles for other release notes.
12.2.100 (September 2, 2014):
Improvements and New Features:
Added many internal cosmetic and functional changes to the Speech Tuner relating to speed, stability and performance in addition to specific features and changes described below.
- Added new Random Sentence Generator to the Speech Tuner.
- Added new Pronunciation Checker functionality to the Speech Tuner
- Added new filtering options to Speech Tuner for audio length, signal-to-noise ratio, menu index, RTF and grammar set index
- Added new status filtering options to Speech Tuner, allowing for OOG, OOC, Transcribed, No-Input and No-Match
- Added new Out-Of-Coverage (OOC) option to Speech Tuner, along with option to treat OOC as OOG for backward compatibility.
- Added new Tuning Wizard functionality to Speech Tuner, which analyzes data and reports any issues detected
- Added new Speech Tuner option to load callsre files recursively from a selected folder, significantly increasing loading speed when many files are being used
- Added new option to Speech Tuner allowing users to specify the number of threads (and corresponding speech ports/licenses) that will be used. This can dramatically increase Speech Tuner performance over previous single-threaded behavior (while using more licenses).
- Added new API option to specify PROP_EX_CONFIDENCE_THRESHOLD for an application. This is called automatically when using the Media Server, but API customers wishing to track and use confidence threshold values during tuning should use this API functionality.
- Added new client_property.conf setting for MENU_ID_STRING_MODE for client applications. This setting defines which information is used to determine the uniqueness of an active grammar set. It is used in conjunction with the Speech Tuner to automatically discover menus and grammar sets, so that data can be organized more naturally using these constructs.
- Added new options in Speech Tuner to allow specification of strategy to use when determining menu/grammar set uniqueness. Also added option to force this method to be used when loading data (essentially overriding whatever setting was in effect when the callsre files were generated)
- Added new automatic log cleanup mechanism to the LVManager service. Now log files/folders will be automatically cleaned up after a number of days (specified in manager.conf). Separate configuration settings are available for resource logging files (used by the Dashboard charts) and regular log files. This mechanism also caters for the extended logging mode, which rolls log files over to new sub-folders based on the current date, and will also clean up any vacated sub-directories as part of this processing.
- Added new options to 'Manager' dashboard configuration screen to specify settings for log_file_max_age (default = 0/disabled) and resource_log_file_max_age (default = 30 days)
- Added new Dashboard options to allow configuration of log cleanup mechanism via Manager configuration screen
- Added new Dashboard functionality to display machine uptime in main Summary view
- Added option to specify audio Sample Rate in SimpleTTSClient_c example code. This already existed in the C++ sample. Also changed the samples to use the default values specified in the client_property.conf file unless explicitly specified on the command line. Previously the audio format would always default to ULAW when not specified.
- Added new Speech Tuner option to right-click TTS interaction and edit the corresponding SSML via context menu
- Added new menu and grammar set processing functionality to the Speech Tuner, offering a more intuitive organization of data, based on the combinations of grammars used for different interactions
- Added new menu and grammar set context toolbar to the Speech Tuner to allow easy switching between these new contexts
- Added SSML interaction processing support to CallIndexer
- Added CPA and AMD specific testing to LVShowConfig
- Added new Grammar Editor option allowing users to select either the currently active 'editor' grammar, or all selected grammars when parsing using the grammar editor. This new option allows multiple grammars to be parsed at once, as opposed to a single grammar as implemented previously. A separate parse result will be displayed for each active grammar.
- Added new Speech Tuner statistics to track "False No-Input" and "Correct No-Input", allowing better analysis of No-Input interactions
- Modified the behavior of MIN_FREE_DISK_SPACE in logs.conf. Now, when this is set to a value of 0, the disk free space checking algorithm will be disabled.
- Modified customer example code to demonstrate the correct use of IGNORE_LV_DEPRICATED when defined.
- Modified comments in customer example code relating to disambiguation
- Modified SSML parser for TTS1 to support both "spell-out" and "spell" as options for 'say-as'
- Updated SSML parser to improve support for decimal and hexadecimal numeric escape sequences
- Improved Speech Tuner auto-complete code to auto-close XML tags when editing grammars, lexicons and SSML documents
- Improved reporting of grammar load failures both within API logging and also when using the Speech Tuner. Often obscure grammar load failures (such as failure to load referenced grammar or lexicon files) are now more clearly reported.
- Updated Windows builds to use the latest version (1.0.1g) of OpenSSL to fix any potential issues relating to the Heartbleed bug. Linux versions use the version currently installed on the host machine, so can be updated independently
- Increased the severity of messages logged when the Media Server runs out of resources.
- Changed Dashboard process charts to not set the 'smoothed' plot option by default
- Modified Media Server to better handle the situation where multiple INVITE requests are being handled in parallel for the same session. Better detection is now in place to avoid responding if a previous thread is already processing the first request and starting the corresponding session. This fixes a potential issue where the Media Server is being overwhelmed by a large volume of session requests (this situation is not typically encountered).
- Modified callsre logging for No-Input type events to now record which grammars were active, since this can be useful when later tuning. Also recording more clearly which voice and dtmf grammars were active when performing DTMF decodes (again, for better subsequent tuning).
- Modified Speech Tuner confidence histogram display to better utilize display window height. The display of confidence threshold ranges was also improved.
- Modified Speech Tuner to display optional messages if insufficient transcribed data is available, along with helpful tips
- Modified the Speech Tuner to better represent data from callsre files before (input) and after (output) when performing decodes within Tester View
- Modified Speech Tuner transcriber view to disable transcription options for non-transcribable interactions (such as TTS interactions).
- Modified Speech Tuner SSML and Grammar Editors to automatically adjust scrollbars to match contents when the lines being edited are particularly long. Now the scrollbars should be correct when editing all documents.
- Modified Grammar Editor in the Speech Tuner to allow multiple grammars and lexicons to be opened in different tabs until unloaded by the user. This allows easier switching between grammars
- Modified the grammar lists shown in Summary, Grammar Editor and Tester Views to be filtered by the new menu / grammar set context.
- Changed Speech Tuner NO_INPUT/"~Barge in Timeout" interactions so that these are no longer considered as DECODE FAILURES
- Modified Speech Tuner behavior when evaluating Decode and Transcript text matches, to now strip out any noise tags. Previously, utterances with noise tags would always show up as a "semantic match", even if the raw text matched apart from the noise tags. After this change, the same utterance shows up as "correct".
- Improved Speech Tuner handling of loading incompatible versions of tuner zips. Previous versions would fail without specifically identifying the cause of the failure as loading a tuner zip created by a newer speech tuner than the current one. An appropriate message box is now displayed when there is a failure due to incompatible tuner zip versions.
- Added support for custom ASR lexicons to Tier1 and Tier2 licenses. Previously this functionality was restricted to Tier3 and Tier4
- Fixed a logging bug, which miscalculated the amount of free disk space when using certain block sizes in Linux.
- Fixed a small memory leak in the Media Server when AMD and CPA are active at the same time in a session
- Fixed minor quantization issue in ASR statistics reporting
- Fixed minor ASR statistics issue where, when statistics were reset (via the Dashboard), the maximum queue size was being set to 0 instead of the currently active value
- Fixed some incorrect statistical logging for ASR and TTS. This issue led to some very large numbers being generated, due to a wrap-around issue, which would throw off the average timing calculations.
- Suppressed logging of Content-ID for Media Server when secure-context is active, since this may be used to expose potentially sensitive information
- Suppressed logging of grammar labels in LVSpeechPortAPILog.txt when secure context was active.
- Fixed a minor bug where grammars were not being automatically deactivated when they were unloaded from the port. Now UnloadGrammar will also deactivate the grammar.
- Fixed a bug in GetPropertyEx when using the PROP_EX_LOGGING_ENCRYPTION_LEVEL option. Previously this would have returned 0, regardless of the actual setting
- Fixed a complex bug when dealing with multiple ASR servers that were being used by API or Media Server clients in a load-balanced configuration. Previously, it was possible for a grammar load failure to occasionally occur depending on how the grammar was being cached.
- Fixed incorrect accents within builtin Mexican Spanish date and currency grammars.
- Fixed an issue in the Speech Tuner where some TTS interactions were not correctly displaying the corresponding 'platform' information for an interaction.
- Fixed a grammar related issue in the Speech Tuner when working with grammars having the same URI but with different raw text
- Fixed file encoding display issues within Speech Tuner. Previously conversion to UTF-8 or ISO-8859-1 was performed internally before the grammar was displayed in the editor. Now the selection is more faithful and consistent with the actual format of the file (and can be changed on the fly as needed). This applies to grammar, lexicon and SSML files being edited
- Fixed a Speech Tuner file permission installation issue which resolves the inability to write log files when run without administrative privileges
- Fixed issues in Dashboard when viewing log files of services with or without the extended path mechanism (logs created in subdirectory based on current date). Also corrected an issue when viewing logs containing less than 350 characters
- Fixed issue with ASR going into an unrecoverable state when attempting to process a decode with multiple conflicting or invalid languages
- Fixed a bug in the client (Media Server or LVSpeechPort) which threw an exception in a specific situation, indicating "LicenseClient::ValidateTimedLicense" as the culprit in stack trace.
- Fixed Speech Tuner bug when extracting a TunerZip file where file handles were not closed as soon as extraction was complete, but were instead closed when the application was closed. This prevented the user from deleting the extracted folder until the Speech Tuner application was closed which also prevented re-extraction to the same folder.
- Fixed Speech Tuner "Send To LumenVox" functionality to include all grammars instead of just the last loaded grammar.
- Fixed a minor cosmetic issue when scrolling up or down one line at a time when viewing logs within the Dashboard.
- Fixed a problem when viewing logs within the Dashboard for services using extended logging, with log files in different sub-folders
- Fixed a Grammar Editor bug in the Speech Tuner where escaped quotations (") in SSML XML attributes were being mishandled. This was noticeable when specifying a <phoneme> element in SSML with the "x-sampa" alphabet, where a quote (") as part of the sampa phoneme syntax indicates a primary stress. Even though the quote was correctly escaped using &quot; in the input SSML, on preparsing the SSML it was being unescaped to " by the internal handling, thus breaking the XML.
- Fixed bug when loading referenced custom lexicons from callsre files via the Speech Tuner.
- Fixed a small leak in LVSpeechPort when using custom lexicons.
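The automatic log cleanup added to the LVManager service in this release (files removed after a configurable number of days, vacated date-based sub-folders removed afterwards, and a value of 0 disabling the mechanism) can be illustrated with a short sketch. This is not LumenVox code: the function name `clean_old_logs` and its parameters are hypothetical, and the only behavior taken from the notes above is the age threshold, the 0/disabled convention and the removal of emptied sub-folders.

```python
import os
import time


def clean_old_logs(root, max_age_days, now=None):
    """Delete files under `root` older than `max_age_days`, then prune empty sub-folders.

    A max_age_days of 0 disables cleanup, mirroring the documented default
    for log_file_max_age. Returns the list of removed paths.
    (Hypothetical helper, for illustration only.)
    """
    if max_age_days <= 0:
        return []  # cleanup disabled
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    # Walk bottom-up so sub-folders are visited before their parents.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) < cutoff:
                os.remove(path)
                removed.append(path)
        # Remove any sub-folder that is now empty; never remove the root itself.
        if dirpath != root and not os.listdir(dirpath):
            os.rmdir(dirpath)
            removed.append(dirpath)
    return removed
```

Calling `clean_old_logs("/path/to/logs", 30)` would correspond to a 30-day setting such as the documented default for resource_log_file_max_age; the path here is a placeholder.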
12.1.100 (February 10, 2014):
Improvements and New Features:
- Added feature in Speech Tuner to allow changing the TTS Server ip-address via the Options dialog.
- Added feature in Speech Tuner to allow text-encoding for SSML to be specified. The encoding is auto detected when a file is loaded from disk but can also be modified via a drop down box to either UTF-8 or ISO-8859-1
- Added feature in Speech Tuner to allow editing of grammars as either UTF-8 or ISO-8859-1 encoding type.
- Improved handling of grammar load failures in MRCP to return more appropriate completion causes: 005 gram-compilation-failure, 009 uri-failure, and 010 language-unsupported. The optional MRCP headers "Failed-URI" and "Failed-URI-Cause" are also now populated upon a URI failure
- Improved handling of SSML load failures in MRCP to return more appropriate completion causes in case of URI failure. The optional MRCP headers "Failed-URI" and "Failed-URI-Cause" are also now populated upon a URI failure
- New Dashboard option allowing statistics to be reset as needed
- Improved rules for es-MX builtin:grammar/date
- Changed secure context functionality to be more secure by suppressing grammar label, URI and grammar text when secure_context=1 (active)
- Changed Media Server default ports for mrcp_server_port_base and rtp_server_port_base to 20000 and 25000 respectively to avoid overlap with the CentOS6 ephemeral port range.
- Added new check at Media Server startup to detect overlap between ephemeral port range and RTP/MRCP port ranges. A warning message is logged to the media_server_app.txt log as well as lumenvox_critical.txt if any overlap is detected
- Added new checks to Media Server startup to detect and report any overlapping of MRCP and RTP port ranges. This is similar to the new ephemeral port range checks.
- Added checks in LVShowConfig/lv_show_config to log a warning if there are any overlaps between the RTP Port, MRCP Port and Ephemeral Port ranges.
- Added new estimate grammar complexity feature, viewable in Speech Tuner in the grammar property page. This value represents a relative complexity for the specified grammar when used with the LumenVox ASR engine. This may be particularly useful when debugging or determining scalability when using this grammar.
- Modified order of interpretations returned when multiple grammars match the input to be sorted by grammar activation order rather than alphabetical order of label
- Modified Dashboard, adding the ability to detect and correctly locate log files when extended logging mode is enabled (log files stored in date-based sub-folders)
- New dashboard monitoring functionality to allow many more days to be displayed (previously limited to one day). Also allows for scaling of Y (vertical) axis, and persistence when hiding series. Another new option allows only currently running processes to be shown, or all tracked processes (not necessarily still running). Scaling in the x (horizontal) direction can now be done using a slider with min/max options or by using an optional settings dialog. Requires updated Manager from 12.1 in order to work correctly (this functionality is not backwards compatible with previous versions).
- After adding the extended Dashboard monitoring functionality, a more compact and efficient method of data transfer was implemented. When viewing more than 10 hours of data, it is displayed in a summarized format, based on hourly min/max range. When viewing less than 10 hours of data, more detailed minute-based information is displayed. These changes allow for a much more responsive user experience and much less unnecessary data traversing the network.
- Added TTS API to return Error String for a specified Return Code. These new calls are LV_TTS_ReturnErrorString and LVTTSClient::ReturnErrorString, and make the TTS interfaces more consistent with the ASR interfaces.
- New feature to stop logging to centralized logs if there is less than a specified amount of free disk space. This amount can be specified in logs.conf, with the default being 100MB.
- New Media Server NUM_CHANNELS setting that should be set to the anticipated maximum number of channels (ASR + TTS + CPA ports) being handled by the server. This single setting will automatically assign appropriately scaled settings to num_spawning_threads, num_graveyard_threads, num_mrcp_threads, num_rtp_threads and listening_socket_size to optimize memory use and performance. Note that num_spawning_threads, num_graveyard_threads, num_mrcp_threads, num_rtp_threads and listening_socket_size settings should be assigned as either 'auto' or 'default' values to enable this mechanism. For RTP event threads, MRCP event threads and spawning threads, the value automatically assigned will become 2 for up to 500 channels and then scale up to 4 threads for 1000 channels or greater.
- For graveyard threads, we set the number of threads as 4 up to 400 lines and then scale up to 10 threads for 1000 lines or greater.
- Deprecated support for Upgrade Analyzer tool, which is no longer required.
- Deprecated the following Global Grammars API calls:
LVSpeechPort::LoadGlobalGrammar(const char* label, const char* uri);
LVSpeechPort::LoadGlobalGrammar(const char* label_is_uri);
LVSpeechPort::LoadGlobalGrammarFromBuffer(const char* label, const char* buffer_string);
LVSpeechPort::ActivateGlobalGrammar(const char* name);
LVSpeechPort::UnloadGlobalGrammar(const char* uri);
LVSpeechPort::LoadGlobalGrammarFromObject(const char* label, LVGrammar& Grammar);
NOTE: These API calls can still be used by including LV_SRE_Deprecated.h
- Updated confidence scoring algorithms that show modest overall improvement to confidence scores. This change may not be noticeable to most customers, but allows for future planned improvements
- Improved customer examples to clean up some inconsistencies and make the code more consistent with our recommended coding practices
- Improved Speech Tuner grammar and SSML editors to modify the auto-indent feature when a new-line is entered. There is now more consistent white-space formatting to match the previous line, which gives a better and more consistent user experience.
- Added more detailed logging messages when encountering a failure to load from an external reference in a grammar
- Fixed grammar bug in Speech Tuner when loading up tunerzip files which led to the wrong grammar being used in decodes
- Modified Media Server configuration from within Dashboard to permit 0 (disabled) as a valid entry for SIP_port and RTSP_port.
- Fix for minor leak when setting custom js footer in compatibility mode
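The port range checks added to Media Server startup and LVShowConfig in this release amount to testing each pair of ranges for overlap. A minimal sketch of that kind of check follows. It is not the Media Server's implementation: the function names are hypothetical, the base ports 20000 and 25000 come from the notes above, and the range widths and the ephemeral range shown are assumptions chosen only for illustration.

```python
def ranges_overlap(a, b):
    """Return True if two inclusive (low, high) port ranges share any port."""
    return a[0] <= b[1] and b[0] <= a[1]


def find_port_overlaps(ranges):
    """Return (name, name) pairs, in sorted name order, for overlapping ranges."""
    names = sorted(ranges)
    return [
        (first, second)
        for i, first in enumerate(names)
        for second in names[i + 1:]
        if ranges_overlap(ranges[first], ranges[second])
    ]


# Assumed example values: only the two base ports are taken from the notes.
example = {
    "mrcp": (20000, 24999),
    "rtp": (25000, 29999),
    "ephemeral": (32768, 61000),
}
for first, second in find_port_overlaps(example):
    print(f"WARNING: {first} port range overlaps {second} port range")
```

On Linux, the ephemeral range actually in effect can be read from /proc/sys/net/ipv4/ip_local_port_range rather than assumed.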
12.0.100 (November 18, 2013):
Improvements and New Features:
- Various changes to logging mechanism to allow settings to be modified by LVCONFIG/logs.conf file. This file will be created if it doesn't exist.
- Multiple streamlining and performance improvements across all LumenVox products, resulting in more responsive handling of requests, and more efficient use of memory and resources. These changes may be most evident for users running a very large number of simultaneous sessions, which should now perform noticeably better at higher loads.
- Added 4 new TTS languages and voices
- European Portuguese Male: Adriano
- Indian English Female: Rani
- Turkish Female: Sevi
- Swedish Female: Janna
- Added SSML support for <sentence> as a substitute for <s> and <paragraph> as a substitute for <p>. While this is not required by the SSML specification, customers have frequently used these tokens interchangeably.
- Improved TTS logging of SSML parsing to specify the missing mandatory property that caused an element to be skipped.
- Modified Tuner filters to allow string matching to be case insensitive. For example, searching "Transcript Text" using the "Contains" option will now perform case-insensitive matching. Previously this was case sensitive.
- Improved error handling for loading W3C n-grams. Malformed n-grams will now be handled in a more robust way, and any errors detected will be reported in a clearer manner.
- Removed unwanted constraints on logging verbosity settings. These were previously being capped at 3. Note that using higher verbosity settings on high-bandwidth systems may have an impact on performance due to the amount of data written to the logs.
- Various internal changes to licensing mechanism that will be deployed in future versions to assist users when deploying or updating licenses. These changes will have no effect on new or existing users for the moment.
- Added support for optional quotes around vendor-specific-parameters. Customers have asked us to support things like secure_context="1" as well as secure_context=1. This was added for all vendor-specific-parameters. Note that when querying vendor-specific-parameters after setting them, the returned values will reflect what the client set (with or without quotes, as appropriate). Our default remains without quotes.
- Added new Media Server configuration option (max_rtp_packet_size) that sets the maximum size of RTP packets that can be received. These changes also lower the default value of DEFAULT_RTP_PACKET_LENGTH to 200 (from 260). Permitted RTP packet sizes are now in the range 180-3000 bytes.
- Added new options to LVManager service that allow tracking of system and process resources over time. By default, statistics from LumenVox processes will be tracked, however users can easily configure settings to monitor additional processes instead, or in addition to these. Samples are taken by default once every 60 seconds. This sample frequency can also be changed by configuration setting.
- New Dashboard 'Monitoring' page that graphically shows the performance history of the system and processes that are being tracked. Real-time CPU, memory, thread, handle and disk use can be displayed, including up to 24 hours of history.
- Modified Dashboard to allow selection of verbosity settings up to and including 5 (Highest). Note that verbosity settings this high can have an impact on throughput if a very large number of simultaneous ports are being used.
- Added Dashboard option to display installed acoustic model languages when viewing the ASR configuration page.
- Added Dashboard options to display TTS voice names that are installed when viewing the TTS Server configuration page.
- Added several new Dashboard options allowing configuration of ASR 'ENABLE_APP_STAT_LOGGING', Media Server 'USE_SPEECH_COMPLETE', 'FORCE_INCREMENT_RTSP_CSEQ' and 'MAX_RTP_PACKET_SIZE'
- Many changes to accommodate suppression of grammars being logged when running in secure_context mode.
- Added more statistical tracking of resource use within ASR, which can optionally be sent to the asr_server_status.txt
- Replaced timing mechanism used by Media Server when tracking various session based timers. The new mechanism is significantly more accurate, and allows timing precision down to a small number of milliseconds in most cases.
- Implemented more reasonable floor values for Speed vs Accuracy settings, resulting in fewer recognizer-errors being returned when using edge-case settings.
- Added a new 'force_increment_rtsp_cseq' configuration setting to Media Server for RTSP message processing. The default is 0, in which multiple replies for the same request will have the same RTSP CSeq as the original request. Setting this as 1 uses stricter adherence to the specification, where outgoing (server originated) RTSP messages will have their own CSeq numbers, starting from a value of 1. The default setting of 0 will behave as previous versions.
- Added a new 'use_speech_incomplete' Media Server configuration option. If enabled, the greater of speech-complete-timeout or speech-incomplete-timeout will be used for EOS delay. If disabled, only speech-complete-timeout will be used (as in previous versions).
- Changes were made to builtin DTMF grammar processing where the specified language parameter will be used when selecting the appropriate grammar.
- Added ISO-8859-1 encoding declaration to NLSML output to better support accented characters being returned to other platforms via MRCP. This resolves potential ambiguity where the other platform assumes UTF-8 encoding.
- Removed the 'Save-Waveform: false' header from SimpleASRClient requests, thus allowing users to control save-waveform from their configuration files as needed.
- Minor styling change on Dashboard create_server_id.html page to be clearer for users.
- Deprecated API functions LV_SRE_DecodePitch, LV_SRE_DecodeEnergy and LV_SRE_GetVoiceChannelData.
- Deprecated the following properties, that were not being used and have not been supported for some time:
- Improved TTS SSML handling for languages including handling of Gwendolyn and Gavin when the language is not specified. We now do a better job of picking the language in a smart fashion if both cy-GB (Welsh) and en-GB (British English with a Welsh accent) voices are available. If there was a language specified in any parent property that is valid for the current element in SSML, we use that language in picking the language for these voices.
- Several TTS enhancements, including:
- Rebuilt voice models
- Improved pronunciation accuracy in Russian, Romanian, English, Brazilian Portuguese, Spanish, Italian, Dutch and German voices
- Improved pronunciation of URLs in Italian, Dutch, Danish and Brazilian Portuguese voices
- Improved support for VXML phone number and time formats in SSML <say-as>
- Improved splitting of paragraphs ending with a single EOL character
- Fixed a small memory leak detected across all LumenVox products when connection between client and server is lost under heavy load conditions.
- Fixed Speech Tuner problem which mishandled external references when loading grammars from tuner zip files
- Fixed Speech Tuner issues with dragging/dropping callsre files, including encryption callbacks. Also added Welsh and Russian voices and languages to SSML editor auto-complete options.
- Fixed bug in TTS where prosody rate changes applied as a relative value were ignored, e.g. "+20%" or "-15%" would not work in previous versions. Absolute values such as 1.20 or 0.85 have always worked, and continue to work.
- Fixed an issue in the Media Server where a received MRCP message may not trigger an immediate response when under high load in Linux
- Fix for forward-slash characters used in GrXML phrases. These are now surrounded by quotes when processed.
- Removed fractional dollars from builtin:currency. This fixes an issue where invalid currency utterances could be accepted, also leading to an invalid semantic interpretation
- Fix for exception in SimpleMRCPClient if an audio file with no suffix/extension was specified.
- Fixed a port/license leak in the Speech Tuner when saving a pre-compiled grammar
- Fixed issues with secure_context log suppression
- Fixes to better handle the situation where users specify a NULL or empty dtmf-term-char, if specified at the request level. Previously a null string would result in the default # from the session being used.
- Several changes to correctly implement dtmf-term-timeout, and slightly adjusting dtmf-interdigit-timeout. These changes affect MRCPv1 and MRCPv2.
- Fixed a TTS bug where, if no specific voice or gender is specified, the original priority list will be used in deciding which voice to use as opposed to being affected by the previous synthesis voice or gender.
- Fixed a problem where compiled grammars over a certain size could not be saved using GrammarLoader due to a timeout problem. Previously, the timeout was only associated with the grammar compilation, but if the size of the compiled grammar was significant then the fetching of the compiled grammar could fail due to a small fixed timeout that was used. This has been changed to allow the remainder of the specified PROP_EX_LOAD_GRAMMAR_TIMEOUT to be used when fetching the compiled grammar
- Fixed a bug where nested NULL rules at the end of a sequence of symbols were not being handled correctly
- Fixed an issue where square brackets in the middle of grammar URIs were causing the URIs to be incorrectly terminated at that point. These '[' and ']' characters are now classed as permitted within URI strings.
- Fix for grammar issue where rules with similar constructs could cause unwanted additional parses. In particular this issue was being seen for the builtin:dtmf/currency grammar. Now these types of grammars should return the correct number of parses and in a shorter period of time
- Fix to better handle newlines within SISR literals, which previously had problems when evaluating newlines within strings and would incorrectly return a syntax error. This fixes an issue with the interpretations returned for grammars using a tag-format sisr/1.0-literals with a newline in the tag.
- Fix for problems seen during grammar load when performing a server-initiated grammar fetch from client while also performing a client-initiated load request using the same grammar. This issue may have been seen when using multiple ASR servers from a single client and under some significant load
- Added missing root rule to builtin es-MX date grammar
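The prosody rate fix listed above treats a relative value such as "+20%" as equivalent to the absolute multiplier 1.20, and "-15%" as 0.85. That mapping can be sketched as follows. The function name `rate_multiplier` is hypothetical and this is not the TTS engine's parser; it only handles signed percentages and bare numbers, not named rate labels.

```python
def rate_multiplier(rate):
    """Convert a prosody rate string to a speed multiplier.

    A signed percentage ("+20%", "-15%") is a relative change from the
    normal rate; a bare number ("1.20") is taken as an absolute multiplier.
    (Hypothetical helper, for illustration only.)
    """
    text = rate.strip()
    if text.endswith("%"):
        # Relative form: a signed percentage change from the normal rate.
        return 1.0 + float(text[:-1]) / 100.0
    return float(text)
```

With this mapping, `rate_multiplier("+20%")` and `rate_multiplier("1.20")` describe the same speaking rate, which is the equivalence the fix restores.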
11.3.100 (August 27, 2013):
Improvements and New Features:
- Added new TTS API functions LV_TTS_GetLastSynthesisError and LV_TTS_GetLastSynthesisErrorCode to provide more verbosity when TTS synthesis errors are detected.
- Added Response (callsre) file encryption to core ASR and TTS functionality, as well as adding the necessary decryption functionality to Speech Tuner. Please see our new Securing Sensitive Data article for more information.
- Added support for the new Russian TTS voice Nikita
- Improved ASR acoustic model performance for all languages and all models. This is a major change to the ASR and should provide more accurate results in most cases.
- Added support for 22 kHz TTS1 voices for API customers (no Media Server transport of 22 kHz is supported). Users can now choose between 8 kHz (telephony) and 22 kHz (Web / Mobile / Desktop / Embedded / Other) options.
- Modified SimpleTTSClient to allow a new -rate option so that users can specify either 8000 (default) or 22050 Hz when performing synthesis.
- Added support for TTS1 viseme generation with new PROP_EX_VISEME_GENERATION option and matching VISEME_GENERATION client_property.conf setting. These new API functions are used to provide access to visemes:
- Minor updates to the TTS API comments. Also added the C++ LVTTSClient versions of the recently added GetLastSynthesisError and GetLastSynthesisErrorCode functions. LVTTSClient member functions now correctly return LV_INVALID_TTS_HANDLE instead of LV_SYSTEM_ERROR if called when m_client is NULL (before the class was initialized, or after it is destroyed)
- Improved user feedback when saving TTS audio to file
- Improved loading of grammars to reduce time taken processing cached grammars. This should reduce the amount of string processing that occurs on repeated load grammar requests and fix a possible threading issue when reading URI fragments.
- Added new internal caching mechanism to Speech Tuner to improve performance and user experience. This is especially important when dealing with encrypted response files, but has benefits for all response file handling.
- Fixed a minor issue where Media Server would add unwanted telephone-event and fmtp headers to the SDP for MRCPv1 SETUP replies when running in compatibility mode 1.
- Minor corrections to API log label decoration in calls to UnloadGrammar
- Fix for a problem with Avaya AEP 6 when NOT using new RTSP session per call. Previously the media server could crash when processing two calls in the same thread after one call terminates. Note that users should select the "Use New RTSP Session Per Call" setting, so this situation should never occur.
- Minor change to resolve Speech Tuner exceptions when launching from Windows Explorer shell by double-clicking interactions files.
- Fixed issue that was introduced in 11.0 when using SSML for TTS1 when only the "gender" attribute was specified for a "voice" element without an "xml:lang" attribute. The "gender" attribute was previously being ignored in this scenario.
- Fixed a bug in the Speech Tuner which prevented platform information from being displayed for Response Files.
- Fixed some existing return values that were previously incorrect when calling LV_TTS_SetPropertyEx
- Fixed typo in SimpleMRCPClient usage information. Previously an argument of 1 for -secure_context was indicated. However, there should be no argument for a -secure_context specifier
- Fix for grammar compilation handling of recursive rules. This fixes cases where out-of-grammar recognition may be returned from the decoder when using complex grammars that contain recursion or an unterminated repeat operation.
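The SSML "gender" fix above concerns documents where a "voice" element carries only a "gender" attribute and no "xml:lang" of its own. As a rough illustration of that document shape, the following Python sketch builds such a document with the standard library. The helper name voice_by_gender and the sample text are hypothetical, and nothing here calls the LumenVox API.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def voice_by_gender(text, gender, doc_lang="en-US"):
    """Build SSML whose <voice> element specifies only a gender (hypothetical helper)."""
    # Serialize the SSML namespace as the default namespace (no prefix).
    ET.register_namespace("", SSML_NS)
    speak = ET.Element(
        f"{{{SSML_NS}}}speak",
        {"version": "1.0", f"{{{XML_NS}}}lang": doc_lang},
    )
    # Only "gender" is set here; the <voice> element has no xml:lang attribute.
    voice = ET.SubElement(speak, f"{{{SSML_NS}}}voice", {"gender": gender})
    voice.text = text
    return ET.tostring(speak, encoding="unicode")


doc = voice_by_gender("Hello", "female")
print(doc)
```

The resulting string could then be passed to whichever synthesis entry point an application already uses.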
11.2.200 (May 22, 2013):
Improvements and New Features:
- Added missing Completion-Cause header to DEFINE-GRAMMAR responses for both good and bad MRCP results. This header was missing from both MRCPv1 and MRCPv2. This change makes the Media Server more consistent with both specifications
- Fixed an issue when encountering grammars with more than 10,000 characters within a single line without any line feeds. This caused the LumenVox Client, Media Server and Speech Tuner to issue a segmentation fault
- Fixed SSML processing of the following XML-escaped punctuation characters within text to be spoken as part of a TTS synthesis request: " ' < >. Previously these caused errors when processing the XML syntax of the SSML, causing the requested synthesis to fail.
- Minor fix to processing of special VOID rule handling. This would only affect grammars using the VOID rule together with a repeat operator
- Fix for internal GUID being returned in results in place of grammar label when running in CPA/AMD modes
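The SSML fix above relates to XML-escaped punctuation in text to be spoken. As a minimal sketch of how an application might escape such characters before placing text in an SSML document, the Python below uses only the standard library. The helper name build_ssml and the sample sentence are hypothetical, not part of any LumenVox interface.

```python
from xml.sax.saxutils import escape

# escape() handles &, < and > by default; the quote characters are added here.
_QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def build_ssml(text, lang="en-US"):
    """Wrap plain text in a minimal SSML <speak> element (hypothetical helper)."""
    return (
        '<?xml version="1.0"?>'
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang="{lang}">'
        f"{escape(text, _QUOTE_ENTITIES)}"
        "</speak>"
    )


ssml = build_ssml('Say "hello" if 3 < 5 & it\'s > 2')
print(ssml)
```

Parsing the result with any XML parser should give back the original sentence unchanged, which is a quick way to check that the escaping is well-formed.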
11.2.100 (May 6, 2013):
Improvements and New Features:
- Added new functionality to support LumenVox/CMU and SAMPA format phoneme strings, significantly extending the previous custom lexicon functionality. Please refer to our knowledgebase article for more details on using custom lexicons, including the recent enhancements.
- Added new LV_SRE_GetSampaToLumenVoxConversion, LV_SRE_GetLumenVoxToSampaConversion, LV_SRE_IsValidLumenVoxPhonemeString and LV_SRE_IsValidSampaPhonemeString API functions as part of other custom lexicon enhancements
- Many changes to add support for custom ASR lexicons in Speech Tuner, allowing them to be created and/or edited within the grammar editor section. Lexicons can also now be edited and tested within the tuner, like grammars.
- Updated the Speech Tuner Phonetic Speller dialog (with optional SAMPA support) to support new custom lexicon functionality.
- Added HTTPS support for fetching grammar and SSML documents. Two new configuration options were added to the global section of client_property.conf to give users control over certificate verification options: SSL_VERIFYPEER and CERTIFICATE_AUTHORITY_FILE.
- SSL_VERIFYPEER defaults to 1, but may be set to 0 to skip certificate verification for trusted sites.
- CERTIFICATE_AUTHORITY_FILE may be used to specify the path to a CA cert file to be used to verify peer certificates upon HTTPS requests.
- Moved AuthorizationLog.txt log file from LVBIN to LVLOGS folder to be consistent with other logs. This is only used when the license server is run in authentication mode, which is fairly uncommon.
- Minor optimization to the Speech Port to reduce the number of threads used. This should improve resource utilization performance when running many ports simultaneously.
- The EULA document shipped with LumenVox products was updated.
- Minor cosmetic changes to improve refreshing of Revert button in Speech Tuner's SSML editor
- Added auto-indent and improved auto-complete and syntax-highlighting functionality in Speech Tuner grammar and SSML editors.
- Updated example ABNF/GrXML/SSML and new lexicon files when creating new files in the Speech Tuner
- Minor changes to move logging files LVLicenseReport.logdata to LVBIN in Windows and /var/lumenvox/license_reports in Linux, and also LVLicense_A.logdata and LVLicense_B.logdata to LVBIN in Windows and /etc/lumenvox in Linux.
- Minor change to formatting of NLSML to remove a carriage return when used with Avaya Orchestration Designer, which did not handle this correctly in previous versions. This issue does not affect the Avaya Aura Experience interface, or other Avaya products.
- Added new higher verbosity logging level (LOGGING_VERBOSITY_HIGHER) to reduce logging load on systems. LOGGING_VERBOSITY_HIGHER should only be used when fully debugging, as it will likely affect performance under medium to high load.
- Improved responsiveness of Media Server when under significant load to prevent unwanted delays in MRCP message processing
- Due to improvements made in the LumenVox ASR’s handling of grammars internally, the ASR server has changed the way it responds to speech clients (e.g. the Media Server). This means that users with pre-11.2 speech clients must upgrade those clients if they wish to use the 11.2 ASR server. Attempting to use a pre-11.2 speech client with an 11.2 or newer ASR server will result in grammar labels being replaced with hash codes.
- Fix for issue relating to multiple grammars sharing the same session: label in the Media Server. Previously if the same grammar was used a second time in the same session, but with a different label (Content-Id), the original label may be returned in the result string.
- Fixed a grammar processing bug, related to grammars defining rule paths that contained a NULL (special) rule in addition to a semantic tag. For example:
<rule id='example' scope='public'>
- Minor fix to Media Server SDP parsing to accommodate optional encoding parameters at the end of rtpmap lines, which were previously causing 486 Busy replies to session requests if present.
- Minor fix to NLSML formatting to remove an unwanted quote mark at the end of the <result> element when preparing a nomatch reply
- Fixed LV_SRE_WaitForEngineToIdle, which in versions since 10.0 had a bug that caused the call to block for up to the full decode timeout if the decode did not complete.
- Fixed a memory leak that could occur when writing to disk fails or when handling a very large number of simultaneous logging messages under extreme system load
- Fixed a problem when attempting to load a malformed XML grammar into the Speech Tuner grammar editor. Previously the grammar would fail to load, now it can be loaded (indicating error) and can be corrected.
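The SSL_VERIFYPEER and CERTIFICATE_AUTHORITY_FILE options described in this release control peer certificate verification for HTTPS fetches. The Python sketch below shows how a generic TLS client typically maps two settings of this kind onto a connection context. It illustrates the general technique only, is not LumenVox's implementation, and the function and parameter names are hypothetical.

```python
import ssl


def make_tls_context(verify_peer=True, ca_file=None):
    """Build a client TLS context from two settings analogous to
    SSL_VERIFYPEER and CERTIFICATE_AUTHORITY_FILE (hypothetical helper)."""
    # With ca_file=None, Python falls back to the system's default CA store.
    context = ssl.create_default_context(cafile=ca_file)
    if not verify_peer:
        # Comparable in spirit to SSL_VERIFYPEER = 0: accept any certificate.
        # check_hostname must be disabled before verify_mode can be relaxed.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


strict = make_tls_context()                     # verification on (the default)
relaxed = make_tls_context(verify_peer=False)   # verification skipped
print(strict.verify_mode, relaxed.verify_mode)
```

Skipping verification is only reasonable for hosts that are already trusted by other means, which matches the guidance given for SSL_VERIFYPEER above.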
11.1.100 (March 13, 2013):
Improvements and New Features:
- Added support for SIP over TCP connectivity. Now, either TCP or UDP SIP connections can be established with the Media Server. These connections share the same sip_port number as defined within the configuration file (with the default value for sip_port being 5060).
- Added SIP/TCP connectivity support to SimpleMRCPClient utility, consistent with new Media Server functionality. See the new "-transport TCP" option in the Using the SimpleMRCPClient article for details.
- Added SIP/TCP support to lv_show_config (Linux) / LVShowConfig (Windows), consistent with new Media Server functionality. See the Running lv_show_config article for more details.
- Added support for text/grammar-ref-list media type to DEFINE-GRAMMAR requests within the Media Server, as described in RFC6787. Although this media type is not explicitly supported by MRCPv1, the LumenVox implementation supports this type for both MRCPv1 and MRCPv2 protocols. See the Recognizer DEFINE-GRAMMAR article for details on how different grammar references can be used with the LumenVox Media Server.
- Added Completion-Reason header to indicate the reason behind grammar load failures where possible. This is an MRCPv2 only feature. MRCPv1 users should continue to consult the client and grammar logs for additional information relating to grammar load failures. Additional logging changes now also include more verbosity when reporting such errors (for both MRCPv1 and MRCPv2).
- Added new configurable APP_STAT_LOGGING option in tts_server.conf, which enables logging useful statistical information to (tts_server_status.txt) file regarding the TTS Server. This change is consistent with other status logging for other LumenVox services.
- Added support for non-compliant GrXML grammar files that are missing their XML prolog at the top of the file. This is consistent with non-compliant behavior in some other vendors. See Section 4.3 of the SRGS specification for compliance details. We now consider the <?xml version="1.0"?> to be an optional requirement in GrXML grammars when specified inline, or by URI reference. This change in support applies to both Media Server and C/C++ API level integration.
- Added new Media Server option to suppress sending 'TRYING' messages in response to SIP INVITEs. This change modifies earlier behavior, which would automatically send these optional messages. Now the default is to NOT send these messages, however this can be overridden by setting the send_sip_trying configuration to 1 if needed. Not sending these messages is more efficient for both client and server resources when connecting.
- Added size and duration parameters when reporting waveform-uri in MRCPv2 RECOGNITION-COMPLETE messages. These two parameters are now appended to the end of the Waveform-URI (when enabled). Size is reported in bytes and duration is in milliseconds. Note that these parameters are only defined in MRCPv2. This functionality is now consistent with the latest MRCPv2 specification document. See section 9.4.8 of the MRCPv2 specification for details on using Waveform-URI
- Improved layout and formatting of Media Server MRCPv1 and MRCPv2 logging messages to provide clearer reporting.
- Significant set of changes to Media Server to support request-based parameter settings as well as session-based. Now the scope of the header settings is correctly applied to either session (when using SET-PARAMS/GET-PARAMS) or request (SPEAK/RECOGNIZE/etc.) as needed. This set of changes also optimizes the handling of all settings so that they are now only applied to the ASR/TTS port when needed. Previously, settings may have been applied multiple times unnecessarily, utilizing more CPU than was strictly necessary.
- Added new HTTP-based Dashboard to replace the previous (and now deprecated) LVDashboard GUI application, which was Windows-only. The new Dashboard application provides remote web access to LumenVox services installed on a Windows or Linux machine. The web server is included within the lv_manager codebase and can therefore coexist with IIS and/or Apache, or any other web servers installed on the system. Please see the LumenVox Dashboard Overview article for more information.
- Added new Welsh voices: Gavin (Male), Gwendolyn (Female). Note that these voices can be configured to speak the Welsh language (cy-GB) or British English with a Welsh accent (en-GB). The language code can be used to specify which language to use when utilizing these voices.
- Added the option to use non en-US builtin grammars within the Media Server. The "Speech-Language" is now used to determine the appropriate builtin grammar. See the Built-in Grammars article for details on adding support for additional languages as needed, where you can provide your own versions of builtin grammars for various languages.
- Added new shortcuts and hyperlinks to Windows installation packages, making the products and documentation more accessible to new users
- Added new Tools package for Windows, which includes the new Dashboard and LVManager utilities
- Modified the default Media Server values for "Start-Input-Timers" (MRCPv2) and "Recognizer-Start-Timers" (MRCPv1) from "false" to "true" to comply with the MRCP specifications. Users may override this default by adding a configuration value (recognizer_start_timers) into the [MRCP] section of media_server.conf. For example, adding the line 'recognizer_start_timers= false', will force this default to be false instead of true as required by the MRCP specifications. Note that this is a change from the previous implementation, and may result in unexpected No-Input-Timeout events if applications were not adequately controlling these timers, for example if this value was not being set to either true or false for RECOGNIZE tasks. See respective sections in the MRCPv2 Specification or MRCPv1 Specification for details.
- Optimized processing of RTP streaming data, reducing CPU and memory load slightly. Also added handling of processing an incorrect/invalid IP address from the client, falling back to using the client IP address if necessary.
- Minor wording change in the Media Server logs - "Error accessing RTP..." is now used instead of "Error while listening to RTP..." since this may be emitted whenever sending data, not necessarily only when listening to the port (for example if unable to use the UDP port for TTS streaming)
- Changed the order of the SDP headers in SIP messages to be more consistent with other vendors. This is in response to some customers that (incorrectly) expect these to be in a certain order.
- Reduced logging of OPTIONS and DESCRIBE requests to prevent log contamination. Messages are still shown to indicate these requests were processed, but the contents of the inbound and outbound packets are no longer logged. The number of these requests can still be seen in media_server_app.txt and also media_server_status.txt
- The EULA document shipped with LumenVox products was updated.
- Documented reported CentOS6 Media Server high idle CPU utilization issue in a new knowledgebase article.
- Changed LV_SRE_SetCustomCallGuid to return LV_INVALID_HPORT when an invalid port was detected. Previously LV_FAILURE would result from this situation, which is less clear.
- Added new error code LV_FUNCTION_DISABLED (-69) defined as "The selected function is currently disabled." This is reported whenever an API call is made that cannot succeed due to functionality being disabled, such as attempting to call AddEvent when callsre logging is disabled. This is technically not an error, however this return code indicates that the specified action was not carried out. This was added to improve logging clarity, and this change is also reflected in the LVErrorCodes/lv_error_codes utility.
- Minor changes to logging for LV_SRE_SetPropertyEx, LV_SRE_SetClientPropertyExPermanent, LV_SRE_CreateClient, LV_SRE_GetStringProperty, LV_SRE_GetAvailableLanguageIndex and LV_SRE_GetCallGuid to improve readability and reduce possible confusion.
- Media Server was modified to remove the now-defunct Monitoring port functionality, which is now replaced by the new Dashboard functionality described above.
- Improved shutdown sequence for core connectivity between modules, allowing faster and smoother shutdown whenever stopping or restarting services
- Removed no longer used configuration settings 'Log' and 'MAINTENANCE_MESSAGING_PORT' from CallIndexer configuration.
- Modified Media Server grammar loading to pass in session parameters so that MRCP parameters such as "Speech-Language" and "Fetch-Timeout" specified in a SET-PARAMS request are applied to the grammar load from a subsequent DEFINE-GRAMMAR, RECOGNIZE, or INTERPRET request
- Modified Media Server to improve performance, reduce footprint and perform better under extreme load conditions. These changes included optimizing code, cleanup and also addressing a limitation encountered in Linux Operating Systems when running a very large number of simultaneous channels, using more than 1024 sockets. This value equals around 300 channels, depending on a number of factors such as protocol being used and number of active logs, open connections to ASR, License and TTS servers, etc.
- Added clearer logging and return values when returning from a grammar load failure due to licensing in delayed licensing mode and in some grammar load time out situations. Also addressed minor issues when performing many load grammar requests of the same grammar in a large number of simultaneous threads.
- Fixed a problem associated with MRCPv1 TTS and RTP audio streams when using proxy servers. Specifically, whenever a client requested TTS RTP audio to be sent to a different IP address than the client address using the c=IN IP4 specifier in the SDP of the SETUP request, this was not being honored. MRCPv2 continues to work correctly in this situation.
- Fixed a minor leak in Speech Tuner when a Call Indexer IP address was specified, but the Call Indexer was not available.
- Fixed a minor leak in Media Server when processing DTMF packets when not in recognition mode, or when the session is closed/closing while the packet is being processed.
- Fixed a minor issue when streaming the last packet of synthesized audio, which if less than the full packet length would result in a STREAM_STATUS_END_SPEECH message for both the last (partial) message, as well as the following zero-length packet.
- Fixed a problem related to HTTP caching (introduced in version 10.5.110) when fetching remote grammar files. There was a problem with superfluous carriage-returns in the request packets, which caused the HTTP caching mechanism at the server to respond inappropriately. With Apache, the file was re-sent with each request (negating the effect of caching) and with IIS, these requests were rejected, causing grammar load failures.
- Fixed incorrect or skipped reporting of <mark> tags in SSML when used before a <break> tag or at the end of the document, which occurred in TTS2. SSML marks should now be correctly reported across both TTS1 and TTS2 voices.
- Fixed a bug in the API call to GetAvailableLicensesCount, which previously would only work with "Engine" products. Now this will correctly work with all license types.
- Fixed a Speech Tuner bug parsing GrXML grammar external references. Now both apostrophe and quote marks can be used. Previously the code was expecting only quotes, so would stall whenever presented with apostrophes. This problem was isolated to when using the Speech Tuner only, and does not affect core ASR functionality.
- Fixed a bug in Media Server MRCPv2, affecting BARGE-IN-OCCURRED request processing. Previously the reply to this request incorrectly had the RequestId of the active SPEAK request instead of the RequestId of the BARGE-IN-OCCURRED as was required by the specification.
- Fixed a bug introduced in 11.0, which caused an unusual situation if a grammar load was requested, then immediately canceled, leaving it in a pending state. This could affect the following grammar load and recognition task.
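This release appends size and duration parameters to the Waveform-URI reported in MRCPv2 RECOGNITION-COMPLETE messages. The Python sketch below shows one way a client might split such a header value into its URI and parameters. It assumes the header value has the form of a URI in angle brackets followed by semicolon-separated name=value pairs, as in the MRCPv2 specification examples. The function name and the sample URI are hypothetical.

```python
import re


def parse_waveform_uri(header_value):
    """Split a Waveform-URI header value into (uri, params) (hypothetical helper)."""
    match = re.match(r"\s*<([^>]*)>(.*)$", header_value)
    if match is None:
        raise ValueError("expected a URI enclosed in angle brackets")
    uri, rest = match.groups()
    params = {}
    for part in rest.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate empty segments and stray whitespace
        name, _, value = part.partition("=")
        # size is reported in bytes and duration in milliseconds
        params[name.strip().lower()] = int(value)
    return uri, params


uri, params = parse_waveform_uri(
    "<http://example.com/session123/audio.wav>; size=342456;duration=25435"
)
print(uri, params)
```

A caller could use the size value to pre-allocate a download buffer and the duration value to report utterance length without fetching the audio.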
11.0.300 (November 20, 2012):
Improvements and New Features:
- Added support for several new TTS languages and voices, as described in our knowledgebase article http://www.lumenvox.com/knowledgebase/index.php?/article/AA-01577
- Here is a complete list of TTS voices included in 11.0, bringing the total voice count to 39 (including 1 deprecated), of which 22 are new. New voices are shown with an asterisk beside the name:
=== TTS1 ===
- American English
- Chris (Male)
- * Andrew (Male)
- * Alvin (Male)
- * Jackie (Female)
- * Amanda (Female)
- * Kim (Female)
- * Leah (Female Child)
- British English
- Ben (Male)
- * Megan (Female)
- Standard German
- Heidi (Female)
- European French
- Margot (Female)
- Castilian Spanish
- Antonio (Male)
- Martina (Female)
- North American Spanish
- Lorena (Female)
- Australian English
- * Ian (Male)
- * Mikkel (Male)
- * Helsa (Female)
- * Henrick (Male)
- * Anneka (Female)
- * Angelo (Male)
- * Emilia (Female)
- Canadian French
- * Elodie (Female)
- Brazilian Portuguese
- * Gustavo (Male)
- * Giovanna (Female)
- * Jacub (Male)
- * Karolina (Female)
- * Isak (Male)
- * Birta (Female)
=== TTS2 ===
- American English
- Rebecca (Female)
- Stacey (Female, deprecated)
- British English
- Sophie (Female)
- Latin American Spanish
- Changed the default voice selection behavior for TTS. Previously if no voice name or voice gender was specified, the voice gender defaulted to male under certain complex conditions, based on the (OS dependent) voice loading sequence. However, this behavior was not consistent and was simplified to allow for a predictable default priority list. This change may affect customers if they were not specifying a particular voice and relying on the defaults since a different voice may be getting synthesized now by default, however the new behavior should be much more predictable for users using several languages and voices. Please review the knowledgebase article at http://www.lumenvox.com/knowledgebase/index.php?/article/AA-01616 for more details.
- Added support for precompiled grammars. Grammars may now be pre-compiled, stored to disk or HTTP grammar server and loaded/used in their precompiled form as needed. This can be used to optimize performance by removing the need to compile these grammars at run time on production machines. The command line tool GrammarLoader (lv_grammar_loader in Linux) now includes optional parameters to support precompiled grammar generation. See knowledgebase article http://www.lumenvox.com/knowledgebase/index.php?/article/AA-01618 for more details on using precompiled grammars and http://www.lumenvox.com/knowledgebase/index.php?/article/AA-01089 for details of the updated GrammarLoader utility.
- Added support for precompiled grammars within the Speech Tuner, so that these are indicated as such, and can be used during tuning and testing. Note that precompiled grammars cannot be modified in the grammar editor. Should precompiled grammars ever need to be changed, the original grammar that produced them will need to be modified and recompiled as needed. The Speech Tuner can be used to save precompiled grammars from within the Grammar Editor.
- In conjunction with other grammar loading changes related to precompiled grammars, the Speech Tuner has been modified to allow loading of grammars from specified URI references, and also builtin:grammar/ specifiers. Regular SRGS grammars as well as precompiled grammars can now be loaded in this way in addition to previous file-based and callsre-based references. See http://www.lumenvox.com/knowledgebase/index.php?/article/AA-01117 for more details on loading grammars into the Speech Tuner.
- Added support for suppressed logging as part of our PCI compliance initiative. Users may now specify the com.lumenvox.secure_context Vendor-Specific-Parameter header to suppress logging for ASR type events when using the Media Server, and may specify the com.lumenvox.tts.secure_context Vendor-Specific-Parameter header to suppress logging for TTS events when using the Media Server. When this header value is set to 1, logging will be suppressed and any potentially sensitive data that would normally be recorded in the logs will be replaced with a value of _SUPPRESSED instead. Note that when this mode is active for a session, DTMF digit logging will also be suppressed. When secure_context mode is active, this suppression will be passed to all aspects of the product logging so that not only the Media Server logs are affected, but all logs will have sensitive information suppressed. Note that this can be enabled on a per port basis (or per session basis in the case of Media Server). Callsre log files will also have potentially sensitive data suppressed, and will not record audio for affected interactions. MRCP users can specify the (tts.)secure_context header as part of a SET-PARAMS/GET-PARAMS request or as part of RECOGNIZE/INTERPRET/SPEAK requests as needed. The default value for secure_context and tts.secure_context can be specified in media_server.conf, or also within the client_property.conf file associated with the underlying speech/TTS port(s). If there is a conflict between these two configurations, the more secure option will always be used. These default values can be overridden using the headers specified above on a per-interaction basis within sessions as needed.
- Added new PROP_EX_SECURE_CONTEXT property value that can be used with LVSpeechPort::SetPropertyEx (or corresponding LV_SRE_SetPropertyEx) as well as LVTTSClient::SetPropertyEx (or corresponding LV_TTS_SetPropertyEx). When this property is set to a value of 1, logging for the specified port will be suppressed to avoid writing out any potentially sensitive data to log files. This property can be changed between ASR/TTS interactions within the port to selectively enable (1) or disable (0) the suppression of this potentially sensitive data. These changes are part of our ongoing PCI compliance initiative.
- Added new functionality to LVShowConfig (Windows)/lv_show_config (Linux). Added MRCP v1 and v2 testing of ASR and/or TTS functionality using this utility. Previously MRCP support was excluded and this utility only implemented API-level testing.
- Added LumenVox manager connectivity test to LVShowConfig/lv_show_config to allow connectivity testing and configuration reporting to be included in this utility.
- Changed the utility's license check to test for availability of any kind of license, including SLM, SpeechPort, VoxLite, CPA and/or AMD to be more verbose for all customers. Also, improved verbosity to clarify whether no licenses were available, or no license server could be reached, whereas previously this was shown as no licenses being available when either situation occurred.
- In addition to adding new functionality to LVShowConfig (Windows)/lv_show_config (Linux) mentioned above, the specified parameters to this utility have been improved to be clearer and more consistent for users. The usage information now shows only the following list of parameters -a, -all_config, -all_test, -license_test, -asr_test, -mrcp_test, -tts_test, -h, -o. The -a parameter flag now runs -all_config as well as -all_test. Note that the -o (output) parameter will now save all output in the specified output file, whereas previously only the configuration values were reported in the output file. Now test results and configuration values will be stored in the output file, which improves usefulness. See http://www.lumenvox.com/knowledgebase/index.php?/article/AA-01635
- Added support for SSML documents housed on HTTP servers, or using a file system path, to the Media Server so that such documents will be fetched and used from the specified HTTP URI or file location. MRCP users will now be able to use the SPEAK request along with Content-Type: text/uri-list as an alternative to the previously supported Content-Type: application/synthesis+ssml.
- Added new LV_TTS_SynthesizeURL to C API and corresponding SynthesizeURL method to C++ LVTTSClient. These functions now allow API users to specify SSML documents located on HTTP servers when performing TTS synthesis.
- Improved logging to LVSpeechPortAPILog.txt so that in addition to logging parameter values passed into API functions, results from these API calls are now also logged (assuming logging verbosity is configured to report these).
- Added new SimpleMRCPClient utility application to allow users to exercise basic functionality of MRCPv1 and MRCPv2 when configuring or testing Media Server. This utility accepts grammar and audio file parameters, then connects to the specified Media Server to verify basic ASR and/or TTS functionality. This can be useful for customers to validate whether all LumenVox components are correctly licensed and configured. See http://www.lumenvox.com/knowledgebase/index.php?/article/AA-01633
- Added new max_num_rtp_packets_buffered configuration setting to media_server.conf, allowing users to specify the maximum number of unprocessed RTP packets to be held on to when the media server is not in recognition mode i.e. it buffers the unprocessed audio that is usually discarded between recognitions and spools it in when the next recognition starts. Do not use if the media server session is being shared with different calls since the noise baseline may end up getting calculated with noise from a different caller. This setting is useful when there are large delays between prompts during which there is no active recognition and the user may have said something that should have been captured for the next prompt.
- Added support for configurable PUBLIC_RULE_ACTIVATION_MODE option in sre_server.conf file. In version 10.5, the behavior of public rules in SRGS grammars was modified so that all top-level public rules would be activated along with the root rule in accordance with the SRGS conformance test (http://www.w3.org/Voice/2003/srgs-ir/test/conformance-4.grxml). This caused an unexpected change in behavior for some customers, and this option is now configurable. Default behavior is now backwards compatible with versions prior to 10.5.110, i.e. rules that are not referenced in an SRGS grammar are unreachable. To enable SRGS compliant behavior, this value can be set to 1.
- Improvements were made to MRCP compatibility mode 1 to provide more compliant behavior. This change includes implementing a template-based method of applying custom changes, allowing users to modify the NLSML output, for example to emulate other speech vendors or to support alternate result container formats as needed. Also, when Compatibility mode 1 is specified, a GrXML "tag" supplied as a property of <item> within grammars is now permitted. Previously only explicit child tag elements were supported.
- Previously, a RECOGNIZE request using a session: identifier from a previously defined grammar could only return results from the first grammar in the URI list. Now all of the grammars in the uri-list from the DEFINE-GRAMMAR request will be activated when using the corresponding session: identifier
- Added a new command line utility application LVErrorCodes (Windows) / lv_errorcodes (Linux) to help users interpret the meaning of error codes emitted from the various LumenVox function calls. See http://www.lumenvox.com/knowledgebase/index.php?/article/AA-01636 for usage information and more details
- Added new API function LV_SRE_LoadGrammarWithParameters to accommodate loading of grammars from URI with an optional list of MRCP-style headers. This can be useful when loading parameters in an MRCP or HTTP environment, or when loading grammars where parameters cannot be stored within the grammar itself, such as Fetch-Timeout and Cache-Control parameters. See the knowledgebase article http://www.lumenvox.com/knowledgebase/index.php?/article/AA-01624 for more details.
- Improved user feedback in upgrade analyzer utility to provide better clarity to users wanting to check their configuration before upgrading to newer versions of LumenVox products. This useful utility was previously not well documented or understood by users. See the knowledgebase article http://www.lumenvox.com/knowledgebase/index.php?/article/AA-01637 for more details.
- Improved sample ASR applications to automatically detect the presence of WAV headers for audio files being passed in. These utilities were designed to only accept headerless audio files, however many users complained of performance issues when incorrectly using wav files. This change is designed to avoid such frustration, however the desired input files for sample ASR applications remain headerless (ulaw) audio files, which will give the best performance. Previously, these unwanted WAV headers at the beginning of audio files would manifest themselves as clicks or noise at the beginning of recognition, and would adversely affect noise reduction settings and voice activity detection, which would ultimately interfere with recognition results.
- Added auto detection of wave headers in LoadVoiceChannel to strip out the wave header. This only affects direct LoadVoiceChannel and does not affect Streaming interface in any way. If a Wave header is detected, the audio format contained within it will automatically be used. The Audio Header and Audio Footer are detected in a robust manner if the data is determined to have a Wave header. Most customers will not notice this change, since LoadVoiceChannel is designed to accept headerless audio (non-wave-files), however this change can help situations where the incorrect use of wave files are attempted. If there is any ambiguity in detecting the audio format or the header/footer, the old behavior will persist. Previously, these unwanted WAV headers at the beginning of audio files would manifest themselves as clicks or noise at the beginning of recognition, and would adversely affect noise reduction settings and voice activity detection, which would ultimately interfere with recognition results.
- Modified TTS Server to add check for XML syntax errors before parsing in TTS1 engine so that behavior is consistent with TTS2. This change may affect customers who were attempting to perform SSML on a malformed XML document using TTS1. Previous behavior would result in a successful synthesis with empty audio. This behavior has been changed to return a synthesis error and log out details of the xml syntax error (e.g. line number, error description) to tts_server_app.txt and also made available at LV_TTS_GetLastSSMLError() and/or LVTTSClient::GetLastSSMLError().
- Added warnings in MultiThreadedStreamingExamples to notify the user that an audio file with a WAVE header was loaded. No attempt is made to compensate for the wave headers since it would detract from the example of our API.
- Modified Media Server to correct a problem in the Linux implementation where any Recognition-Mode that was specified was causing RECOGNIZE requests to fail. Now only unsupported values (such as hotword) specified in the Recognition-Mode header will be rejected. To clarify, LumenVox only supports Recognition-Mode: normal at this time, and any other value specified for this header will be rejected as unsupported. As part of this change, a specified Recognition-Mode header value no longer persists within the session across subsequent RECOGNIZE requests. This minor change should not affect any users, but should prevent unwanted/unsupported values from persisting.
- Modified asr_server_grammar.txt logging to reduce the severity of log messages related to cached grammars not being located. These common and benign warnings were being incorrectly reported with ERROR severity. These are now correctly reported as INFO severity.
- Modified logging of LV_SRE_SetPropertyEx to correctly appear in the LVSpeechPortAPILog instead of the client_asr log to be more consistent with other API logging activity.
- Modified the documentation used when describing the LICENSE_TYPE setting within client_property.conf to be more verbose and clear when users are implementing SLM, CPA or AMD
- Modified the comments associated with TIMEOUT_INFINITE to clarify that this definition is only applicable when calling one of the WaitForEngineToIdle functions, and should not be used with any SetPropertyEx function calls.
- Minor change to PROP_EX_LOAD_GRAMMAR_TIMEOUT handling to prevent values outside of the permitted range of 1000 to 2147483647. Previously, invalid values of <= 0 or > 2147483647 caused undesirable/unpredictable results (for example, affecting the Digium connector bridge). Now any value outside the permitted range will be ignored in favor of the current value, or the default (200000) as appropriate.
- Added minor comment to LV_SRE_Defines.h to clarify that SPX_8KHZ and SPX_16KHZ are deprecated audio formats.
- Minor edits and code cleanup were performed on the sample ASR and TTS code.
- Modified TTS Server to improve the ability to switch voices in TTS1 between languages by just specifying the voice name in the SSML <voice> element. Previously for TTS1 the voice language also had to be specified along with the voice name when switching to a voice in a different language. This change makes TTS1 more consistent with TTS2 behavior.
- SimpleTTSClient applications have been modified to accept SSML via specified URI in addition to previous options. Examples were also added to the usage information.
- Modified the way in which results are handled whenever com.lumenvox.end-of-speech-timeout expires during a decode (following barge-in). Previously, any result in this situation resulted in success-maxtime along with the returned result. Now, if the confidence score is below the threshold, a no-match-maxtime will be returned with no result.
- Modified client shutdown procedure to avoid assertions whenever License Server could not be contacted.
- Modified the following sample application projects to include _CRT_SECURE_NO_WARNINGS and _CRT_SECURE_NO_DEPRECATE precompiler directives to remove benign compile-time warnings:
- Sample TTS applications have been modified to create WAV headers when producing output files. In version 10.5, the files generated were headerless ulaw, which was an undocumented and undesired change from previous versions, so this is now corrected and behaves as it did prior to 10.5, thus producing correctly formed wav files.
- Modified Speech Tuner to improve shutdown whenever non-existent CallIndexer machine references are being used. This problem was relatively benign, however this change reduces the apparent delay and exception logging side-effects of the problem.
- Fixed a bug affecting TTS2 SSML parsing where an invalid or non-existent audio URI reference would cause synthesis failure instead of playing the alternate text contained within the <audio> tag in such situations. Note that an invalid audio URI reference in this context may mean an audio file that is not in the supported format (16-bit, 16 kHz PCM with wav header). TTS1 was not affected by this issue.
- Fixed referencing builtin:grammar from a grammar document with parameters. This resolves a problem introduced in 10.5 where parameters were being incorrectly stripped from the URL when builtin grammars were being specified from within other SRGS grammars. This problem manifested itself as the following message in asr_server_grammar.txt: "Referencing an external root rule. But the root was not defined in target grammar."
- Modified handling of PROP_EX_DECODE_TIMEOUT to ignore values of <= 0 if specified for this value. Now if values of <= 0 are specified, the currently active value will be retained (if set) or the default (typically 20000ms) will be used. This is the change that prompted the 10.5.300 maintenance build, so this behavior has not changed since that version, but this change now affects both Windows and Linux builds.
- Fixed a bug introduced in 10.4.500 where multiple parses for a single interpretation were not correctly added to the NLSML recognition result. This only affected decodes with >1 SRGS grammar parse paths that resulted in the same Semantic Interpretation. This gave a single result instead of the correct number of results based on the actual number of parses.
- Removed unwanted .o and .d files that were incorrectly shipped with sample Linux projects
- Fixed a small leak in TTS Server which was introduced in 10.5.110, where significant repeated load would eventually lead to synthesis failure due to depletion of available handles. The number of synthesis iterations needed to encounter this problem is > 120,000. Restarting the TTS Server clears the handles, however users are encouraged to upgrade to 11.0 to avoid this defect.
- Fixed a bug when processing additional URI parameters when passed in with grammar request. These were incorrectly being stripped from the subsequent HTTP fetch request. This problem was introduced in 10.5.110, where a query string present in an HTTP URI would not be passed along to the fetching of the HTTP document.
- Fixed a bug that was introduced in 10.5.110 where old voice names were not correctly supported in a backward compatible way for TTS1 users, effectively disabling voice switching within single SSML requests when using the older voice names.
- Fixed a problem when processing malformed n-gram documents. Previously it was possible to cause a fatal exception in the ASR server when such a malformed SLM was used.
- Fixed a bug in TTS Server where accented characters did not work for TTS1 when specified within an SSML document. It is likely that this bug has existed since the introduction of TTS in version 10.0. Accented characters did, however, work with TTS1 when plain text (non-SSML) input was used. This bug did not affect TTS2, in which accented characters work as expected for SSML and plain text.
- Minor change to LVSpeechPort destructor to check whether HPort was NULL prior to performing further cleanup. The previous behavior was harmless, but it produced some undesired log messages whenever the port had already been released normally.
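
The wave header detection described above for the sample ASR applications and LoadVoiceChannel can be illustrated with a minimal sketch. This is not the LumenVox implementation; the function name find_wav_payload and its return shape are hypothetical. It only assumes the standard RIFF/WAVE layout and returns nothing when the data is ambiguous, so a caller could fall back to treating the buffer as headerless audio.

```python
import struct

def find_wav_payload(data: bytes):
    """Hypothetical helper: return (format_tag, sample_rate, payload) when
    data starts with a well-formed RIFF/WAVE header, otherwise None."""
    # A RIFF/WAVE file starts with "RIFF", a 4-byte size, then "WAVE".
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        return None
    fmt = None
    pos = 12
    # Walk the chunk list: each chunk is a 4-byte id, a 4-byte size, a body.
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (chunk_size,) = struct.unpack_from("<I", data, pos + 4)
        body = pos + 8
        if chunk_id == b"fmt ":
            if chunk_size < 16 or body + 16 > len(data):
                return None  # truncated format chunk: treat as ambiguous
            format_tag, _channels, sample_rate = struct.unpack_from("<HHI", data, body)
            fmt = (format_tag, sample_rate)
        elif chunk_id == b"data":
            if fmt is None:
                return None  # audio before any format description: ambiguous
            # Slicing by chunk size drops both the header and any trailing
            # chunks (a "footer") that follow the audio samples.
            return fmt[0], fmt[1], data[body:body + chunk_size]
        pos = body + chunk_size + (chunk_size & 1)  # chunks are word-aligned
    return None
```

A caller would pass the returned payload on as headerless audio, and keep the original buffer unchanged whenever None is returned.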
10.5.300 (September 21, 2012):
- Maintenance fix for Linux only. This corrects a change in behavior between 10.4 and 10.5 if users specified a timeout value of 0 when calling SetPropertyEx with a PropertyName value of PROP_EX_DECODE_TIMEOUT. In previous versions, an invalid value of 0 was ignored but in version 10.5 changes were introduced to utilize this value. The fix is to once again ignore such invalid values.
Note that this change affected the Asterisk Connector Bridge, so Asterisk users should avoid 10.5.100 and 10.5.200 and use 10.5.300 instead.
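
The "ignore invalid values" rule for PROP_EX_DECODE_TIMEOUT can be sketched as follows. This is an illustration of the rule as stated in these notes, not LumenVox code: the function resolve_timeout and its parameters are hypothetical, and the absence of an upper bound for the decode timeout is an assumption. The same pattern fits the PROP_EX_LOAD_GRAMMAR_TIMEOUT range check (1000 to 2147483647, default 200000) described in the later release notes.

```python
def resolve_timeout(requested, current, default, minimum=1, maximum=None):
    """Hypothetical helper: return the timeout (ms) that should be in effect
    after a caller asks for `requested`.

    An out-of-range request is ignored: the currently active value is kept
    if one was set, otherwise the default is used.
    """
    in_range = requested >= minimum and (maximum is None or requested <= maximum)
    if in_range:
        return requested
    return current if current is not None else default


# Decode timeout: values <= 0 are ignored (default "typically 20000ms").
print(resolve_timeout(0, current=None, default=20000))     # falls back to default
print(resolve_timeout(0, current=5000, default=20000))     # keeps current value
print(resolve_timeout(3000, current=5000, default=20000))  # accepts valid value

# Load-grammar timeout: permitted range 1000 to 2147483647, default 200000.
print(resolve_timeout(500, current=None, default=200000,
                      minimum=1000, maximum=2147483647))
```

Keeping the active value instead of raising an error matches the behavior described for callers, such as the Asterisk Connector Bridge, that pass 0.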
10.5.200 (August 29, 2012):
- Changed internal logging behavior in message routing subsystem to avoid reporting unwanted late/ignored messages, since these could be interpreted as problems by customers, when they were, in fact, benign
- lv_show_config output was modified to display default values where appropriate, thus removing ambiguity from the values previously shown
- Corrected minor typos in SimpleASRClient_c and SimpleASRClient_cpp customer examples
- Modified Speech Tuner to correct a minor bug causing exceptions when accessing Call Indexer running in 32-bit CentOS5/Red Hat 5 build
- Corrected a problem affecting the single stream mode of CPA. This would only affect users attempting to use the CPA/AMD features in version 10.5.110 release who opted for the unrecommended single stream method
- Fixed a problem with the customer examples Visual Studio solutions, where the 64-bit option had not been defined, requiring users to define it themselves.
- Fixed a problem introduced in 10.5.110 where specifying DTMF and AMD together would result in undesired speech barge-in if detected
- Fixed a bug introduced in 10.5.110 when performing LoadGrammar with multiple optional parameters, which produced erroneous duplicates that had the effect of creating new cached grammar entries when encountered. These are now normalized.
- Improved memory management in statistical pronunciation modeling, which corrects a very small, slow increase in memory use over time.
- Fixed SSML preparser which previously ignored optional emphasis child elements for TTS2 engine only. This was contrary to the documentation.
- Fixed a minor typo in lv_show_config, where GRAMMAR settings were incorrectly listed under a STREAM heading.
- Fixed a bug where, when defining a global grammar using the same label as a previous global grammar, the former grammar would persist. Now the new grammar will replace any former definition.
- Corrected a bug introduced in 10.5.110, which reported an incorrect 0-value vocab size for global grammars. This problem only affected reported values shown on screen in the Speech Tuner application.
- Fixed a build time error, introduced in 10.5.110, which caused lv_show_config to fail when running on certain platforms, reporting that lumenvox.conf could not be found
10.5.110 (August 14, 2012):
Improvements and New Features:
- Added new Call Progress Analysis functionality. This is a significant enhancement over the previously available Answering Machine Detection mechanism. Please refer to the Knowledge Base for full details of these advanced features and capabilities.
- Grammar processing has been improved to cater for more optional parameters specified with grammars, and also meta-data within grammars has been greatly enhanced.
- TTS Voice names have been changed, along with all revised TTS documentation to offer more clarity to users. See LumenVox TTS Voices page and section for more details.
- New functionality has been added to the streaming interface. See API documentation for DELAYED_LICENSE_ACQUISITION, LV_SRE_StreamStartListening, LV_SRE_RegisterGrammar or the equivalent C++ methods LVSpeechPort::StreamStartListening, LVSpeechPort::RegisterGrammar.
- Implemented HTTP caching when referencing remote grammars. This change is in addition to the internal grammar caching mechanisms that were already in place, and allows some degree of additional control over the caching process.
- An optimized builtin:grammar/date grammar is now distributed with the product, offering better speed and memory performance when this grammar is mixed with others. The old version will also continue to be distributed for now, but this may be phased out in the future. See builtin:grammar/date_with_month_checks to reference the old version.
- Results returned from AMD will now be BEEP instead of the earlier ++BEEP++, which was not a valid token in a grammar. The built-in grammar was updated to accommodate this change.
- Improved Media Server to better handle extreme load conditions
- Revised and improved example code being distributed to users to be more consistent with current best practices and more clearly demonstrate the preferred streaming method.
- Added better Virtualization support, allowing smoother installation onto VM instances
- Added configurable behavior for unknown language code (for VXML compliance)
- Improved SSML parser to be more consistent when working with multiple voices. Also added more flexible support in say-as processing to be compatible with other vendors' custom implementations. See the Introduction to SSML section in the Knowledge Base for more details.
- Modified Media Server recognition timer behavior to be based upon barge-in rather than recognition-start-timers
- Modified the help link within the Speech Tuner to reference the new help in our Knowledge Base rather than the older help webpages.
- The SimpleSREClient has been renamed to SimpleASRClient to be more consistent with naming conventions. See the Using the SimpleASR Client article for details on using this tool.
- The parameter types of the StreamSetParameter functions in our C and C++ APIs have changed internally. This requires no changes in customer code, but may require a recompile to work correctly. It is anticipated that this change breaks backward compatibility with previous versions, including the Asterisk connector-bridge. See the LV_SRE_StreamSetParameter and LVSpeechPort::StreamSetParameter articles for details.
- Modified Media Server NLSML results to correctly report ambiguous interpretations that match different specified active grammars, so that they correctly appear in their own <interpretation> element.
- Fix for Media Server where zero length synthesis would not produce SPEAK-COMPLETE
- Fixed a bug associated with emailing critical error messages in some situations. This likely caused certain critical errors reported by the ASR to be ignored