Speech Engine Release Notes
10.4.300 (March 28, 2012):
Minor Changes and Fixes:
- TTS Server now logs SSML events when verbosity is set to 2 (verbosity 3 was previously
required)
- Fixed a bug when processing TTS1 lexicon references when using SSML 1.0
- Fixed a rare bug specific to CentOS6, 64-bit Operating System, which resulted in
unexpected decode results due to an incompatible memory move operation.
- Fixed a bug in the logic that suppresses more than 10 critical errors per hour.
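The suppression rule above (no more than 10 critical errors reported per hour) can be illustrated with a sliding-window limiter. This is only a minimal sketch, not the Speech Engine's actual logic; the class name `CriticalErrorLimiter` and its parameters are hypothetical.

```python
from collections import deque


class CriticalErrorLimiter:
    """Hypothetical limiter: allow at most `limit` reports per `window` seconds."""

    def __init__(self, limit=10, window=3600.0):
        self.limit = limit
        self.window = window
        self._stamps = deque()  # times of the reports that were allowed through

    def allow(self, now):
        """Return True if an error at time `now` (in seconds) should be reported."""
        # Forget reports that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) < self.limit:
            self._stamps.append(now)
            return True
        return False  # suppressed


limiter = CriticalErrorLimiter()
# Twelve errors, one per minute: the first ten pass, the last two are suppressed.
results = [limiter.allow(60.0 * i) for i in range(12)]
# One hour after the first report, the window has room again.
later = limiter.allow(3600.0)
```

Passing the clock value in as `now` keeps the sketch deterministic and easy to test.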
10.4.200 (February 1, 2012):
Minor Changes and Fixes:
- Changed MultiThreadedStreamingCExample to allow running of AMD decodes.
- Re-enabled switching of voices in SSML, which had stopped working in 10.4.100 (a regression)
10.4.100 (January 20, 2012):
Improvements and New Features:
- Added new American English acoustic models with improved accuracy and performance
- Logging Verbosity settings have been implemented across all products. These settings can
be set at startup within the corresponding product configuration files. At runtime, this
setting can be changed using the LumenVox Dashboard product, which also has the option of
making the changes temporary or persistent beyond service restart. The default value is
set as LOW, which means only errors and warnings will be logged. Users are encouraged to
set their desired level of logging verbosity. Note that maximum logging can adversely
affect system performance when large numbers of operations are being processed on the
system. In other words, reducing logging verbosity can improve throughput performance
- Added new PROP_EX_LOGGING_VERBOSITY option to SetPropertyEx and GetPropertyEx, allowing
users to adjust client logging verbosity at run time
- Changed LumenVox Dashboard to allow configuration of logging verbosity and viewing of
service configuration settings. This can be accessed by right-clicking on the service
in Summary view, and selecting the configure menu option. Selecting the 'make changes
permanent' checkbox will apply configuration changes to the startup configuration files
- Added new Vendor-Specific-Parameters: com.lumenvox.compatibility-mode option, allowing
compatibility mode to be selected or deselected by using SET-PARAMS (etc.) without the
need to restart the Media Server.
- Added new Australian English TTS voice
- Added statistical performance tracking option to TTS Server. This information will be
periodically recorded in tts_server_status.txt located in the logs folder
- Added new waveform_url_prefix configuration option to Media Server, allowing users to
specify the prefix for returned Waveform-URI references. This may be useful when
configurations expose these files from a local http server
- Added Media Server support for wildcard option to GET-PARAMS requests for recognizer
and synthesizer resources for both MRCP V1 and V2 for better specification compliance
- Added LV_SRE_SetClientPropertyExPermanent and LV_SRE_CommitClientPropertySettings to
client API, along with C++ equivalents. These can be used to permanently commit config
changes to configuration files on disk, so that they persist beyond service restart. The
only setting that is supported by this mechanism at present is LOGGING_VERBOSITY, however
more support will be added in future versions
- The Windows installer now checks for the required administrator privileges on Windows 7
and 2008, where UAC is required, and if they are not present, generates a suitable message
in place of the cryptic message shown in previous versions
- Changed the User Agent string returned by Media Server to show version number
information. In MRCPv1, this string is now returned in the session name header (s=)
- Changed French pronunciation rules to prevent empty word phonetic pronunciations
- When processing GrXML grammars, the xmlns attribute is now considered optional and the
value of "http://www.w3.org/2001/06/grammar" will be implied if not supplied. Previously
this attribute was required in all GrXML grammars
- Overall inter-component communication was improved to increase performance and reduce
connectivity issues
- SimpleTTSClient now has default values for language, voice and gender, making it easier
to use. The SimpleTTSClient code was also modified to help clarify the process flow
so that it may be easier for users to follow and implement their own functionality
- SimpleTTSClient and SimpleSREClient now provide more verbose feedback, allowing users to
clearly see what types of issues they have, rather than returning a numeric error code
- Improved performance of Media Server to increase throughput when large numbers of
simultaneous sessions were being processed. This improves overall SRE and TTS performance
and increases the speed at which RTSP and SIP sessions are created and destroyed
- Modified TTS Server to support all voices by default, without the need to configure
into one of two modes as before. The corresponding TTS_ENGINE setting in configuration
no longer needs to be used and is now ignored because of this change
- Performance of TTS Server was improved by removing a single-threaded code bottleneck
that only became apparent under significant load. This portion of code was reworked
to improve performance under such load
- Commas that were present in GrXML grammars were previously escaped with double-quotes
which was incorrect. Commas are now discarded as part of the grammar compilation process.
This should have no adverse effect on customers since commas do not have any acoustic
reference
- Changed Media Server to add more verbose logging for 'BusyHere' events
- Changed the Media Server behavior for recognition-timeout. Now works more according to
spec, using its own timer instead of just setting the EndOfSpeechTimeout in the streaming
interface. EndOfSpeechTimeout can now be set separately, defaulting to the private
value in the media_server.config file.
- Note that due to this modified behavior, users may experience "no-match-maxtime" earlier
than in previous LumenVox versions. In these cases, please configure maxspeechtimeout
to the desired value, which will be dialog dependent.
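The optional xmlns handling for GrXML grammars described above can be sketched as follows. This is an illustration of the rule only, not the Speech Engine's parser; the function name `grammar_namespace` and the sample grammars are hypothetical.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"


def grammar_namespace(grxml_text):
    """Return the root element's namespace, implying SRGS_NS when none is declared."""
    root = ET.fromstring(grxml_text)
    if root.tag.startswith("{"):
        # ElementTree writes namespaced tags as "{namespace}localname".
        return root.tag[1:].split("}", 1)[0]
    return SRGS_NS


with_ns = (
    '<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0" root="r">'
    '<rule id="r">yes</rule></grammar>'
)
without_ns = '<grammar version="1.0" root="r"><rule id="r">yes</rule></grammar>'
other_ns = '<grammar xmlns="urn:example" version="1.0" root="r"/>'
```

Both `with_ns` and `without_ns` resolve to the same namespace, which is the behavior the note describes; an explicitly different namespace is returned unchanged.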
Minor Changes and Fixes:
- The default value for the trim_silence setting in client_property.conf has been changed
from 990 to 970 to correspond with revised acoustic model performance
- In Linux, the permissions applied to log and cache folders have been changed to allow
better compatibility when non-root users attempt to access these. The folders are now
created with 755 for the parent folders and 777 for child folders. Previously all
folders were created with 755 permission, which prevented non-root users access, since
execute permission is required to list and traverse folders
- Changed Media Server behavior to delay decodes until all associated grammars have been
fully loaded and activated. This change avoids the recognizer returning early from a
RECOGNIZE when the client had not correctly waited for completion of DEFINE-GRAMMAR
requests before issuing a RECOGNIZE request, which could result in no-match
- Removed enable_logging configuration option from Media Server configuration since we now
have logging verbosity which offers better control
- Removed the no longer used optional media_server_log configuration item
- Removed validation of dtmf-term-char for less strict (more flexible) performance
- Fixed a bug where the Media Server incorrectly responds to OPTIONS request with 0=
(using zero) instead of o= (using oh) for the reply headers, which was not compliant
- Project settings for the sample MultiThreadedStreamingCExample code now avoid unsupported
Unicode and debug library references
- SSML processing of ampersand characters has been improved again to allow better handling
of escaped and unescaped instances when they are encountered
- Fixed a problem where certain port-scanner software would cause an exception within the
Media Server
- Fixed a problem in Media Server where invalid negative timestamps were being reported for
some packets
- Fixed the critical log emailer application, which was not parsing log entries correctly,
resulting in emails not being sent out.
10.3.200 (November 7, 2011):
Minor Changes and Fixes:
- Fixed SSML preparser problem when prosody rate is specified with a decimal value. This was previously not being handled correctly.
- Fixed a problem affecting Windows installations only, which prevented correct upgrade from a previous version of the Speech Engine due to incorrect InstallShield product code setting.
- Minor Fix to SimpleTTSClient application to prevent possible buffer overrun.
10.3.100 (November 1, 2011):
Improvements and New Features:
-
Added new Answering Machine Detection algorithms, replacing previous versions, with
significant improvements in accuracy and reduced CPU overhead.
-
Added support for 8 new TTS voices across 4 new languages:
  - German Male
  - German Female
  - European French Male
  - European French Female
  - Castilian Spanish Male
  - Castilian Spanish Female
  - North American Spanish Male
  - North American Spanish Female
Licensing and use of these new voices is similar to previous languages and voices
-
Added new routines that were in the C API, but not in the C++ API
  - GetAvailableLanguageCount
  - GetAvailableLanguageIndex
  - GetCallGuid
  - GetDecodeAcousticModel
  - GetLogFileName
  - GetPhoneticPronunciation
  - GetPhoneticPronunciationCount
  - SetCustomCallGuid
These are now members of LVSpeechPort class, and documentation can be found here:
https://www.lumenvox.com/help/speechEngine/programmers/coreapi/introduction.htm
-
Consistent with en-US, there are now definitions for the pronunciation of numerals in
en-IN, en-AU and en-GB
-
Added support for 64-bit RHEL6 / CentOS6 (no 32-bit support of this Operating System is
planned). Removed support for RHEL4/CentOS4 and Debian
-
Changes to Media Server to accommodate clients that do not wait for grammar loads or
response from RECOGNIZE before streaming audio and DTMF. Any decoder or DTMF activity
needed while waiting for grammar loads to complete will be postponed to prevent
unpredictable results due to incorrect grammar(s) being active
-
SimpleSREClient and SimpleTTSClient now have symbolic links in /usr/bin and /etc/lumenvox
on Linux systems. The data files ABNFDigits.gram and 8587070707.pcm are now copied into
/etc/lumenvox, making them easier for customers to access. The symbolic links are named
SimpleSREClient and SimpleTTSClient. The original files can still be found in
/usr/share/doc/lumenvox/client/examples/
-
The Speech Engine now allows mixing of grammar tag-formats. Previously any such mixing
would result in undesirable semantic interpretation and was not supported
-
Minor speed improvements when processing grammars
-
Changed internal VAD settings to be less sensitive to low volume speech using the default
settings. This change slightly reduces false barge-ins
-
Media Server now performs better parsing of 'Content-Type' when grammars are specified
to avoid problems when optional (valid or invalid) parameters are specified
-
New options were added to the Dashboard File menu to be more intuitive for new users
when adding, changing or removing machines
-
Dashboard now keeps track of selected log type when switching machines in log view to
avoid the need to keep re-selecting it
-
Dashboard now automatically refreshes the status view values every few seconds
-
Centralized logging now trims or pads the third field to a fixed length for better
readability
-
Updated the internal architecture of the TTS server for better performance. This change
should not be readily noticeable by users, but allows better flexibility for more
languages and voices as they become available
-
Various improvements to the speech engine, including optimizations to the memory
management algorithms
-
Header file comments have been improved for the API functions, correcting out of date
information and including newly added functionality and options. This should be
consistent with the recent website documentation updates
-
The Call Indexer port is now configurable, with 7595 being the new default instead of
50800. This change coincides with similar changes to the Speech Tuner, which can now
specify the port number for Call Indexers as needed
-
Various changes to allow client applications a faster shutdown
-
Better support when converting GrXML to ABNF grammar formats when dealing with DTMF and
punctuation characters
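The centralized logging change listed above (trimming or padding the third field to a fixed length) amounts to the following. This is a sketch under stated assumptions: the field separator, the width of 12 and the function names are hypothetical, since the notes do not specify them.

```python
def fix_width(field, width):
    """Trim or pad `field` with spaces so it is exactly `width` characters long."""
    return field[:width].ljust(width)


def format_entry(fields, width=12, separator=" | "):
    """Join log fields, forcing the third one to a fixed width (assumed layout)."""
    fields = list(fields)
    if len(fields) > 2:
        fields[2] = fix_width(fields[2], width)
    return separator.join(fields)


short = format_entry(["2011-11-01", "WARN", "tts", "voice not found"])
long_ = format_entry(["2011-11-01", "WARN", "media_server_core", "busy"])
```

With a fixed-width third field, the message text that follows starts in the same column on every line, which is what makes the log easier to scan.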
Improvements and Major Changes:
-
Changed severity of startup checks to critical from error, since these prevent ASR engine
startup from taking place
-
Previous TTS language packs defining es-LA (Latin America) have been changed to a more
correct es-MX (Mexico) language definition to add better clarification for the various
Spanish dialects now supported
-
Minor changes to Dashboard menu and tooltips to have more consistent wording
-
Following some internal changes to built-in grammar management, Mexican Spanish built-in
grammars will now be distributed with ASR packages in addition to American English
Minor Changes and Fixes:
-
Fixed a problem where the Dashboard would fail to restart the ASR engine on Linux
machines due to an internal permissions issue
-
Fixed a problem in the Speech Tuner when opening a grammar file from Explorer. Previously
a confusing and incorrect message would ask whether you wanted to delete all current data
and would not load the grammar. This fix also applies to SSML files.
-
Fixed TTS processing to correctly escape ampersands in the SSML. These were previously
causing synthesis problems
-
Fixed Media Server to allow for optional quotes around boundary specifier for multipart/
mixed grammar specifications in accordance with RFC1521
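The ampersand fix noted above concerns a common SSML pitfall: a bare `&` is not valid XML, while `&amp;` and other entities must be left alone. The sketch below shows one way to escape only the bare ones. It is not the TTS server's implementation, and the function name `escape_bare_ampersands` is hypothetical.

```python
import re

# An ampersand that does not already start an entity or character reference.
_BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)")


def escape_bare_ampersands(text):
    """Replace bare '&' with '&amp;' while leaving existing entities untouched."""
    return _BARE_AMP.sub("&amp;", text)


once = escape_bare_ampersands("AT&T")
twice = escape_bare_ampersands(once)  # already escaped, so unchanged
mixed = escape_bare_ampersands("a & b &lt; c &#38; d")
```

Because already-escaped text passes through unchanged, the function is safe to apply more than once. A limitation of this simple pattern is that text such as `&Jerry;` is treated as an entity even though it is not a defined one.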
10.2.900 (October 13, 2011):
Improvements and Major Changes:
-
This is a significant maintenance release that fixes a timer wraparound problem, which occurs when the tick counter wraps around as the system uptime counter reaches 49 days, 17 hours and 2 minutes. Client applications using the Speech Port could throw an exception at that point.
-
The TTS server shipping in 10.2.900 does not correctly perform an upgrade. You must uninstall any previous version before installing the newest one. This is because the Service Name changed in the installation package and previous versions prevented the service from being removed from the command line. If you encounter installation errors 1923 or 1920, you should perform an uninstall before installing the newer version. Future versions should not encounter this problem once older versions have been uninstalled.
10.1.700 (October 7, 2011):
Improvements and Major Changes:
-
This is a significant maintenance release that fixes a timer wraparound problem, which occurs when the tick counter wraps around as the system uptime counter reaches 49 days, 17 hours and 2 minutes. Client applications using the Speech Port could throw an exception at that point.
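The uptime figure quoted above is consistent with a 32-bit tick counter that counts milliseconds, although the notes do not state the counter's width or unit, so treat that as an assumption. The sketch below reproduces the arithmetic and shows a wraparound-safe way to measure elapsed ticks; the function names are hypothetical.

```python
TICK_MODULUS = 2 ** 32  # assumed: a 32-bit millisecond tick counter wraps here


def wrap_point(modulus_ms=TICK_MODULUS):
    """Split the wrap interval into (days, hours, minutes, seconds)."""
    seconds = modulus_ms // 1000
    days, rem = divmod(seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return days, hours, minutes, secs


def elapsed_ticks(start, now, modulus=TICK_MODULUS):
    """Elapsed ticks from `start` to `now`, correct across a single wraparound."""
    return (now - start) % modulus


# A timer started 296 ms before the wrap and read 204 ms after it.
start, now = TICK_MODULUS - 296, 204
naive = now - start               # negative, which is the kind of value that breaks timers
safe = elapsed_ticks(start, now)  # 500
```

Taking the difference modulo the counter size is the usual remedy: it stays correct as long as the measured interval is shorter than one full wrap.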
10.2.800 (September 22, 2011):
Minor Changes and Fixes:
-
Minor fix to Media Server when reporting IN-PROGRESS for INTERPRET request. This was
previously sent as an event instead of a response type packet
10.2.700 (September 13, 2011):
Minor Changes and Fixes:
-
Minor fix to Media Server processing where INTERPRET requests were being made quickly,
before a previous DEFINE-GRAMMAR request had completed, thus causing a sequencing
problem.
-
Minor fix to Media Server when performing INTERPRET with DTMF grammars, which previously
returned an empty success result instead of no-match. Also changed the input mode to
correctly return dtmf in these cases too
10.2.600 (August 25, 2011):
Minor Changes and Fixes:
-
Minor fix to number of days to/from license expiration being displayed in Dashboard (math
error)
-
Minor fix to configuration file generation following installation in connection with
Manager product
-
Fix to license type switching when used with AMD mode. Problems were seen when repeatedly
entering and leaving AMD mode within the same session
-
Minor coding changes to avoid valgrind warnings
10.2.500 (August 5, 2011):
Improvements and New Features:
-
Minor improvement to AMD detection accuracy
10.2.400 (August 3, 2011):
Minor Changes and Fixes:
-
Fixed a problem with RTP stream timestamp from TTS synthesis streams to reduce calculated
jitter.
-
Minor fix to correctly release cached license from a non-primary server. It is unlikely
customers would encounter this issue in the field, but if they did its effect would be
minimal
-
Changed recent message when locales are not present from an error to a warning
10.2.200 (July 27, 2011):
Improvements and New Features:
-
Added better checks for locale information during ASR startup
-
Added better auto-detection of UTF-8 strings in TTS input
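One common way to auto-detect UTF-8 input, as mentioned above, is to attempt a strict UTF-8 decode and fall back to a single-byte encoding when it fails. The notes do not describe the TTS server's actual method, so this is only a sketch; the function name and the `latin-1` fallback are assumptions.

```python
def decode_tts_input(raw, fallback="latin-1"):
    """Decode bytes as UTF-8 when they are valid UTF-8, else use the fallback."""
    try:
        return raw.decode("utf-8", errors="strict"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode(fallback), fallback


utf8_result = decode_tts_input("café".encode("utf-8"))
latin_result = decode_tts_input("café".encode("latin-1"))
```

Strict decoding works as a detector because most non-ASCII text in a single-byte encoding is not valid UTF-8. Pure ASCII input decodes identically either way.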
10.2.100 (July 22, 2011):
Improvements and New Features:
-
Added a new LumenVox Dashboard application to monitor various LumenVox
services, statistics and event logs (on the local machine, remote machines, or both). All
license maintenance functions can be performed using the Dashboard application. This change
also includes the addition of a new LumenVox Manager service, which communicates with the existing
LumenVox services and acts as an interface to the Dashboard. The Dashboard is a Windows-only
GUI tool but the Manager service runs in both Windows and Linux so the Dashboard can monitor and
manage both Windows and Linux machines.
-
As part of our ongoing effort to improve logging, we have implemented many changes to the
logging mechanism, allowing logs to be remotely accessed (via Dashboard). This process
also included renaming logs; see table below for more details. Log entries
now contain severity information for each event, and filtering of these events is
possible when viewing them in the Dashboard. For example, just reviewing critical errors and warnings is now
easily possible.
-
As part of our ongoing effort to improve statistics reporting, we have implemented new
statistics tracking mechanisms within the ASR Server and TTS Server. These are in
addition to the statistics tracking mechanisms previously added to Media Server and
License Servers. These statistics can be viewed using the new Dashboard application, or
by reviewing the individual application statistics logs.
-
Revised memory management improves overall performance, making the most of available
system resources. This change has the effect of reducing the overall memory consumption
of all LumenVox applications. In test conditions, dramatic reduction in memory use was
observed, at times using less than 40% memory than previous versions for the same load
tests.
-
Added SSML pre-parser to the TTS server, which now attempts to determine whether any
unexpected elements are present within the SSML passed in from user applications. If any
are detected, an attempt will be made to determine the user's intent. If it can be determined,
the SSML will be updated accordingly. If no sense can be made of that portion of the
request, it will be ignored. This is in response to user requests for a more forgiving
SSML parsing mechanism. Now the server will make an effort to interpret any SSML sent
and will only respond with an error as a last resort. Note that any substitutions or
deletions will be recorded in the TTS server log.
-
Added the option of displaying pre-parsed and modified SSML in the Speech Tuner via the SSML
properties dialog.
-
Answering machine detection code has been reworked to provide significantly faster
response when detecting beeps. Typical improvements seen were from 622ms in version 10.1
to a response time of 96ms in 10.2. In addition, new methods of beep detection were
added to improve efficiency.
-
Fax tone detection was added to the Answering Machine Detection module.
-
Added support for viewing answering machine/fax detection events through the Speech Tuner.
-
Acoustic models for UK English Digits, Mexican Spanish Digits and Indian English Digits
were retrained and updated to provide better overall performance.
Improvements and Major Changes:
-
Renamed logs in accordance with the new centralized logging mechanism. All log files should now
share a common naming convention of component_name_log_type.txt. New and old log
filenames are listed below:
Old Log Name                    | New Log Name |
LVStatus_TTSServer.log          | tts_server_status.txt |
LVStatus_SREServer.log          | asr_server_status.txt |
LVStatus_LVLicenseServer.log    | license_server_status.txt |
LVStatus_MediaServer.log        | media_server_status.txt |
LVApp_SRE.log                   | asr_server_app.txt |
SpeechServerLog.txt             | asr_server_app.txt |
LVApp_SpeechPort.log            | client_asr.txt |
---                             | client_tts.txt |
---                             | client_license.txt |
LVApp_LicenseServer.log         | license_server_app.txt |
LvMediaServer.txt               | media_server_app.txt |
LVApp_Tuner.log                 | tuner_app.txt |
LVApp_Tuner.log                 | call_indexer_app.txt |
LVApp_TTSServer.log             | tts_server_app.txt |
---                             | dashboard_app.txt |
---                             | manager_app.txt |
GrammarLoadLog.txt              | asr_server_grammar.txt |
LVApp_Critical.log              | lumenvox_critical.txt |
LVApp_MediaServer_Restart.log   | media_server_restart.txt |
LVApp_SREServer_Restart.log     | asr_server_restart.txt |
LVApp_LicenseServer_Restart.log | license_server_restart.txt |
LVApp_TTSServer_Restart.log     | tts_server_restart.txt |
---                             | manager_restart.txt |
-
The default wind-back-time for decodes has been changed from the previous value of 325ms
to 480ms. This change is due to resolution improvements in Voice Activity Detection code.
-
Both LvMediaServerMonitor and LVLicenseAdministrator applications will no longer be
supported after version 10.2. All of the functionality these applications provided is
now handled by the LumenVox Dashboard. Migration across to this new application is
therefore encouraged.
-
Media Server configuration settings are now initialized to a setting of 'default' which
will use the default value. This saves users from remembering the default value, and
may also be useful in future to identify non-default settings.
-
Removed definitions for unused DeactivateGrammar(int index) and GetUtteranceScore from
LVSpeechPort.h header file, since these functions no longer exist.
-
Minor change to decoder processing when presented with an invalid language as part of a
decode request. Logging now states much more clearly what the problem is.
-
The phonetic speller tool within Speech Tuner has been improved to maintain a history of
phrases, which can now be copied and pasted into external applications as needed.
-
Result strings have generally been changed from 'No Error' to 'Success' so that searching
for the word 'error' in log files is more productive.
-
Minor change to the response timing of internal messaging mechanism to better respond to
dropped connections.
-
AMD functionality is now exposed via the regular API mechanism. Previously this was not
a primary method of accessing AMD features.
-
The SimpleTTSClient application was modified to be more flexible and also provide more
feedback to users when errors occur, such as when TTS server is not available.
-
Minor change to add more post-allocation memory checks to help prevent problems when
running into low memory situations.
Minor Changes and Fixes:
-
A minor change to the Media Server code was made to accommodate dtmf-char=none, which
was previously expecting a single DTMF character value.
-
Fixed a minor bug in Speech Tuner where, after users rapidly switched between calls and then
deselected all calls in the browser view, some interactions were incorrectly shown as
unprocessed.
-
Fixed a bug where the Speech Tuner audio display was incorrectly scaling very long audio
clips, which caused the tick marks to appear too close together. There was also a minor
rounding issue in the same section of code, making the ticks appear slightly offset
occasionally. This too was fixed.
-
Fixed a bug in Speech Tuner where flagging a number of interactions, then filtering
them out and subsequently unfiltering them caused the flag marker for these to be
cleared, which was incorrect behavior
-
Fixed the Linux installer to auto-create the configuration file for the call indexer.
Previously this would not have been created until the first run of the call indexer
which is not expected behavior.
-
Fixed problem with command line installation/removal of the TTS server. Previously
installing and removing of the server using the InstallShield utility was the only way
to perform the operations, which is not expected behavior.
-
Fixed a problem with the Call Indexer, which was recording log events to the Speech
Tuner's log file instead of its own.
-
Fixed a problem in Media Server when parsing a 'simple' RTSP SETUP request with
transport line ending with the client port number. This was previously being rejected.
-
Modified Media Server configuration to use default 480ms for wind back time from 325ms.
Note that this change should have no effect on customers, and is a result of internal
resolution refinement.
10.1.600 (July 13, 2011):
Minor Changes and Fixes:
-
Fixes a problem loading and activating en-GB grammars that was introduced with 10.1.100 - all users running en-GB should consider upgrading to this version from other 10.1 releases
10.1.500 (June 20, 2011):
Improvements and Minor Changes:
-
This is a maintenance release to resolve a minor packaging issue in
10.1.400 and should be used in place of 10.1.400. The packaging problem related to a conflict with the xulrunner package in CentOS5 64-bit only
10.1.400 (June 1, 2011):
Improvements and Minor Changes:
-
Speech Tuner now better handles malformed GrXML
when files are being loaded, and reports such errors.
-
Removed unused 'engine' logs folder in Linux.
All Speech Engine log files are now saved in the 'sre' folder.
- Improved NLSML formatting when the Media Server is in compatibility mode in order to
better structure results with multiple parses and multiple n-best results.
Minor Changes and Fixes:
- Fixed rare problem on Windows where Speech Engine failed to restart correctly following a power outage which
caused corruption to one of the files accessed during startup
-
Fixed a Media Server problem where TTS audio streams were stopped prematurely on occasion.
- Fixed a Media Server problem where timestamps in TTS audio streams were not consistent with wall clock if different streams were started and stopped within the same session. Now the timestamps are recalculated at the beginning of each stream segment.
10.1.300 (May 24, 2011):
Do not use this version in production
10.1.200 (April 22, 2011):
Improvements and Minor Changes:
-
Speech Tuner confidence histogram display was modified to display confidence at
threshold when the mouse is clicked onto the histogram
-
Exposed an internal Speech Engine setting, allowing large and complex grammars to be
processed in specific ways. This is needed for very specific grammars that
fail to compile due to their complexity. Contact LumenVox Technical Support
for more information on this feature and for assistance when working with large/complex grammars.
-
Minor change to timing mechanism used for restarting services when installing language
packs.
-
Minor change to Speech Tuner TTS view which better handles audio being displayed.
-
Improved Speech Engine shutting down with non-critical acoustic model loading failure.
-
Improved server-side SLM caching.
-
Fix for Speech Tuner when dragging and dropping many .callsre files.
-
Fix for Speech Tuner that caused the number of SRE interactions to
not be displayed correctly in the Platforms list on the summary
view when using certain operating systems, including Windows 7.
10.1.100 (April 11, 2011):
Major Improvements and New Features:
Improvements and Major Changes:
-
The Media Server has a revamped communications subsystem, using significantly fewer
threads and offering noticeably improved throughput performance.
-
The Speech Tuner has a new Streaming options page, allowing various stream parameters to be
configured and tried within the Speech Tuner environment.
-
Added a new log called LVApp_Critical.log for tracking serious application issues
that require user attention. This log should now be the first place users look when troubleshooting
LumenVox issues.
-
The Speech Tuner now has auto-completion of tags when editing GrXML grammars.
-
Added support for Multi-Part grammars in the Media Server.
-
Added option to specify Round-Robin or First-Available port allocation mode in Media
Server. The new Round-Robin mode was added to offer better performance when cycling
ports quickly when the system is under extreme load, avoiding CLOSE_WAIT socket issues.
-
Improved performance in critical low memory situations.
-
Improved Speech Engine decode throughput performance.
-
Modified the default port number ranges used by Media Server to avoid problems with
ephemeral ports. Here are the new recommended (default) values:
mrcp_server_port_base = 30000 (was previously 49922)
rtp_server_port_base = 35000 (was previously 50922)
monitoring_port = 29900 (was previously 39911)
-
Improved License Server performance when processing large numbers of simultaneous
requests.
-
Modified the client licensing mechanism to offer significantly improved performance
by temporarily caching released licenses, thus removing the need for round trip to the
license server if another license is requested within a short time period.
-
Speech Tuner now allows filtering and sorting in the Call Browser view, similar to that
in other views.
-
N-best results are now sorted by confidence score, in descending order. Previously there had been
confusion surrounding the designation of the confidence scores of lower n-best results
which was caused by different methods of calculating these scores for independent
results. A more unified approach to confidence scoring has been implemented for
n-best results lower than the top value to eliminate this confusion.
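The n-best ordering described above means a client can treat the first entry as the most confident one. The sketch below shows the same ordering applied to a list held by an application; the interpretations and scores are made-up sample data, not Speech Engine output.

```python
from operator import itemgetter

# Hypothetical n-best list of (interpretation, confidence score) pairs.
n_best = [("boston", 412), ("austin", 870), ("houston", 655)]

# Sort by confidence score, highest first.
ranked = sorted(n_best, key=itemgetter(1), reverse=True)
top_interpretation, top_score = ranked[0]
```

Python's `sorted` is stable, so entries with equal scores keep their original relative order.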
Minor Changes and Fixes:
-
Changed inconsistent Media Server settings to correctly reflect compliance with MRCP
specifications for both MRCPv1 and MRCPv2. The values for these settings, both in the config
file and when used with SET-PARAMS/GET-PARAMS, are now correct as follows. The LumenVox API
values are also shown for comparison:
Interface | Speed vs Accuracy | Sensitivity Level     |
MRCPv1    | 0 is low speed    | 0 is low sensitivity  |
MRCPv2    | 0 is high speed   | 0 is low sensitivity  |
LV API    | 0 is high speed   | 0 is high sensitivity |
-
Added clearer indication of XML parsing errors in Grammar Editor, and also much clearer
indication of grammar processing errors when they are detected.
-
Modified Mexican Spanish builtin/currency grammar to correctly return semantic
interpretations.
-
Modified Mexican Spanish builtin/time grammar to include the accent on TRÉS. Previously
this accent was missing.
-
Modified American English builtin/time to correctly return 1400h instead of 1400 which
was being returned previously.
-
Fixed LV_SRE_GetAvailableLicensesCount when License Server is in Authentication Mode.
Now only the count from the first of such servers is considered in computing the total
available licenses, instead of duplicating the number of available licenses.
-
Modified message routing code to prevent unwanted double-close of sockets under certain
specific error conditions.
-
Fixed a problem when loading grammars from specified URI locations containing '%20'
in place of spaces. The '%20' was incorrectly being converted to ' ', causing subsequent internal
http or ftp fetch requests to fail.
-
Clarified API logging of failed OpenPort/CreateClient calls to include an error
description (text) in addition to the error code previously returned.
-
Added new startup/shutdown logging for LumenVox products, allowing technical support
members to more quickly assist customers.
-
Application configuration settings are now reported to logs at startup.
-
Removed support for legacy semi-continuous models, thus reducing the installation
package sizes. For example, Australian English is now 22 MB, down from the earlier 148 MB.
-
Modified SimpleSREClient code to use CreateClient and DestroyClient in favor of
recently deprecated OpenPort and ClosePort calls.
-
License Server changed to report expiration dates of installed licenses to log at
startup.
-
Added Speech Tuner option to display the .callsre file associated with a selected
interaction in the advanced tab of the property dialog.
-
Added diagnostic statistics in the Media Server tracking for MRCPv2
in addition to previous MRCPv1 values.
-
Modified the Speech Tuner's method of calculating real time factor for decodes,
since this was being incorrectly shown, causing times on Linux to be some 20 times
faster than they actually were. Windows RTF times were unaffected.
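For readers unfamiliar with the term, the sketch below assumes the usual definition of real time factor (decode time divided by audio duration); these notes do not spell out the exact formula the Speech Tuner uses, so treat the function as illustrative only.

```python
def real_time_factor(decode_seconds: float, audio_seconds: float) -> float:
    """Return decode time divided by audio duration (lower means faster decoding)."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return decode_seconds / audio_seconds


# A 2-second utterance decoded in half a second runs at a quarter of real time.
example_rtf = real_time_factor(0.5, 2.0)
print(example_rtf)
```

Under this definition, a timer that under-reports decode time by a factor of 20 would also make the displayed RTF 20 times smaller than the true value.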
10.0.1020 (Feb. 21, 2011):
Minor Changes:
-
Minor release related to packaging for TTS voices. Will not affect most users.
10.0.1019 (Feb. 14, 2011):
Please note that there are a very large number of changes in the 10.0 release of LumenVox, as it represents one of
our biggest releases ever. In addition to a number of fixes and improvements, we have added native 64-bit binaries to our releases,
written a text-to-speech (TTS) server that is integrated with our Media Server, and made quite a few changes to our C and C++ interfaces.
Customers upgrading from older versions are advised to read through these release notes carefully.
Major Improvements and New Features:
-
Windows users should see our new instructions for Downloading the LumenVox products
as the download process has changed.
-
64-Bit Versions of Server and Client products in both Windows and Linux are officially
released. This supports full, native 64-Bit performance on supported 64-Bit operating
systems (Linux Red Hat 5 64-bit/CentOS5 64-bit, Windows Server 2008 R2, Windows 7 64-bit).
Note that only Intel x64 (AMD64 mode) processors are supported (not Itanium).
When running LumenVox in 64-bit native mode, you need one of the following supported
Operating Systems:
- Linux RHEL 5 / CentOS 5 (64-bit)
- Windows Server 2008 R2 (64-bit)
- Windows 7 (64-bit)
Minimum supported machine configuration:
- 8 GB Memory
- 200 GB Hard Drive
- 8 processor cores
Note that machine configurations are highly application specific, depending on things
like grammar size and number of simultaneous calls. Please contact LumenVox Technical
Support for assistance in determining your specific hardware needs.
-
Added new LumenVox Text To Speech functionality. This has been added to the Speech Port
API and also is embedded within the Media Server (both MRCPv1 and MRCPv2). The new TTS
functionality is exposed in non-MRCP implementations via two new LVSpeechPort header
files LV_TTS.h (C interface) and LVTTSClient.h (C++ interface). Note that the TTS server
is initially being released in 32-bit mode only. This can be run on 64-bit versions of
supported versions of both Windows and Linux. The native 64-bit version will follow in
the next product version. Most users should not encounter any performance limitations of
this 32-bit component.
-
Added TTS events to callsre response log files (when enabled). These can now be viewed
in the Speech Tuner for diagnostic purposes. When using the Media Server, these log
files now use the name of the active session rather than a GUID so that they can be
more easily matched up later. These log files contain a combination of SRE and TTS
events that happen within a single session, so the overall call flow can be visualized
within the Speech Tuner and optionally, audio from both types of events can be replayed.
Correct logging of DTMF events to callsre files was also added to LumenVox Media Server.
-
TTS functionality has been integrated into the LumenVox Media Server, which connects
to the LumenVox TTS Server (via licensing). SPEAK, PAUSE, RESUME, SET-PARAMS and
GET-PARAMS requests are serviced via MRCPv1 and MRCPv2 connectivity. MRCP CONTROL
requests are not handled at this time. Of note, when plain text TTS requests are made,
the TTS Server will utilize any SET-PARAMS settings that have been specified for the
session (or non-standard settings from the configuration) and internally produce and
handle this as a full SSML request. By contrast, SSML requests sent from external
client applications (platforms) must embed any SSML settings within their SSML
request string. Also note that syntax errors that are detected in the specified SSML
string passed into the TTS parser are reported back to the MRCP client in the failed
completion packet with details contained in a "Completion-Reason" header where possible.
-
Added new GetPropertyEx API functionality to expose the values that are set using the
companion SetPropertyEx API functions. This interface is available from both the C++
SpeechPort and also the C-style API interface.
-
Added a new command line utility allowing grammars to be pre-loaded to the specified SRE
servers via batch or script files as was requested by some customers.
This utility is called lv_grammar_loader in Linux and
GrammarLoader.exe in Windows.
Usage: GrammarLoader.exe <grammar-file> <server-ip> [timeout]
The timeout (specified in seconds) indicates the timeout for the grammar load call.
Defaults to 3600 (1 hour) if none specified
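A script that pre-loads several grammars might be sketched as follows. This is a hedged Python example that only relies on the usage line shown above; the grammar file names and server address are hypothetical, and the meaning of the utility's exit code is not documented here, so the sketch simply reports it.

```python
import subprocess
from typing import Dict, List, Optional


def build_loader_command(executable: str, grammar_file: str, server_ip: str,
                         timeout_seconds: Optional[int] = None) -> List[str]:
    """Build the argument list: <grammar-file> <server-ip> [timeout]."""
    command = [executable, grammar_file, server_ip]
    if timeout_seconds is not None:
        # The timeout is optional; when omitted the utility uses its own default.
        command.append(str(timeout_seconds))
    return command


def preload_grammars(executable: str, grammar_files: List[str], server_ip: str,
                     timeout_seconds: Optional[int] = None) -> Dict[str, int]:
    """Run the loader once per grammar and collect each process exit code."""
    exit_codes = {}
    for grammar_file in grammar_files:
        command = build_loader_command(executable, grammar_file, server_ip, timeout_seconds)
        exit_codes[grammar_file] = subprocess.run(command).returncode
    return exit_codes


# Hypothetical values, shown without running the utility.
print(build_loader_command("GrammarLoader.exe", "digits.grxml", "10.0.0.5", 60))
```

On Linux the same sketch would be called with lv_grammar_loader as the executable name.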
-
Added Statistical Language Model (SLM) support to the Engine. Use of SLMs requires a new "SLM" license
type that includes all functionality of the Full license plus support for SLMs. Please contact LumenVox
for more information about using SLM functionality.
Improvements and Major Changes:
-
Optimized the packaging of installation packages to remove uncommonly used medium and
high resolution models. Also split the installation packages into language packs,
which reduces the installation download size for most customers. Please be sure to read
the new installation instructions for more information on these language packages.
-
The default value of STRICT_SISR_COMPLIANCE in the client_property.conf file is now 1. This
should not affect users who upgrade, as existing values should be copied into the new file, but
any new installation of LumenVox will have this set. Anyone still using the older SISR tag-format,
which uses $ instead of out, should verify that the setting is 0. LumenVox encourages all developers
to switch to the current SISR tag-format as soon as it is convenient.
-
Many functions and definitions that were previously deprecated have been removed from the API.
Any applications using these should be modified to avoid their use. Please consult
LumenVox technical support if you require assistance with this.
The following items have been removed:
Defines and typedefs-
-----------------------
H_SPT
H_SPT_PRE_ORDER_ITR
H_SPT_CHILDREN_ITR
LV_NOT_A_VALID_PROPERTY_VALUE
LV_BAD_HPORT
LV_GRAMMAR_WARNING
LV_GRAMMAR_ERROR
LV_DECODE_USE_ABNF_GRAMMAR
Functions and methods-
-----------------------
LVParseTree_GetIteratorBegin()
LVParseTree_GetIteratorEnd()
LVParseTree_GetConceptIteratorBegin()
LVParseTree_GetConceptIteratorEnd()
LVParseTree_GetTerminalIteratorBegin()
LVParseTree_GetTerminalIteratorEnd()
LVParseTree_GetTagIteratorBegin()
LVParseTree_GetTagIteratorEnd()
LVParseTree_Node_GetParent()
LVParseTree_Node_GetIteratorBegin()
LVParseTree_Node_GetIteratorEnd()
LVParseTree_Node_GetChildrenIteratorBegin()
LVParseTree_Node_GetChildrenIteratorEnd()
LVParseTree_Node_GetTerminalIteratorBegin()
LVParseTree_Node_GetTerminalIteratorEnd()
LVParseTree_Node_GetTagIteratorBegin()
LVParseTree_Node_GetTagIteratorEnd()
SI_DATA_Clone()
SI_DATA_Release()
SI_DATA_Is_Equal()
SI_DATA_Type()
SI_DATA_Print()
SI_DATA_GetBool()
SI_DATA_GetInt()
SI_DATA_GetDouble()
SI_DATA_GetString()
SI_DATA_Object_Size()
SI_DATA_Object_Property_Id()
SI_DATA_Object_Property_Value()
SI_DATA_Object_Property_Exists()
SI_DATA_Array_Size()
SI_DATA_Array_Element()
LVGrammar_SaveCompiledGrammar()
LVGrammar_LoadCompiledGrammar()
LV_SRE_GetPhonemes()
LV_SRE_SetProperty()
LV_SRE_GetUtteranceScore()
LV_SRE_SetBuiltinGrammarURI()
LV_SRE_IsGlobalGrammarLoaded()
LV_SRE_UnloadGlobalGrammars()
LV_SRE_GetParseTreeHandle()
LV_SRE_GetNumberOfConceptsReturned()
LV_SRE_GetConcept()
LV_SRE_GetPhraseDecoded()
LV_SRE_GetRawTextDecoded()
LV_SRE_GetPhonemesDecoded()
LV_SRE_GetConceptScore()
LV_SRE_AddPhrase()
LV_SRE_LoadStandardGrammar()
LV_SRE_RemoveConcept()
LV_SRE_ResetGrammar()
LV_SRE_SetConceptRepetition()
LV_SRE_LoadGrammarIdx()
LV_SRE_LoadGrammarFromBufferIdx()
LV_SRE_LoadGrammarFromObjectIdx()
LV_SRE_UnloadGrammarIdx()
LV_SRE_IsGrammarLoadedIdx()
LV_SRE_ActivateGrammarIdx()
LV_SRE_DeactivateGrammarIdx()
LV_SRE_SwitchFromHotMode()
LVSpeechPort::GetNumberOfConceptsReturned(int VoiceChannel)
LVSpeechPort::GetConcept(int VoiceChannel, int Index)
LVSpeechPort::GetPhraseDecoded(int VoiceChannel, int Index)
LVSpeechPort::GetRawTextDecoded(int VoiceChannel, int Index)
LVSpeechPort::GetPhonemes(int VoiceChannel, int Index)
LVSpeechPort::GetPhonemesDecoded(int VoiceChannel, int Index)
LVSpeechPort::GetConceptScore(int VoiceChannel, int Index)
LVSpeechPort::AddPhrase(int GrammarSet, const char* Concept, const char* Phrase)
LVSpeechPort::LoadStandardGrammar(int GrammarSet, int DefaultGrammar)
LVSpeechPort::RemoveConcept(int GrammarSet, const char* Concept)
LVSpeechPort::ResetGrammar(int GrammarSet)
LVSpeechPort::SetProperty(int property,int value)
LVSpeechPort::LoadGrammarFromBuffer(int index, const char* buffer_string)
LVSpeechPort::LoadGrammarFromObject(int index, LVGrammar& Grammar)
LVSpeechPort::IsGrammarLoaded(int index)
LVSpeechPort::UnloadGrammar(int index)
LVSpeechPort::IsGlobalGrammarLoaded(const char* label)
LVSpeechPort::UnloadGlobalGrammars()
LVSpeechPort::SetBuiltinGrammarURI(const char* Name, lv_bool DTMF, const char* URI)
LVSpeechPort::ActivateGrammar(int index)
LVSpeechPort::SwitchFromHotMode()
LVSpeechPort::LoadGrammar(int index, const char* uri)
LVGrammar::SaveCompiledGrammar(const char* filename)
LVGrammar::LoadCompiledGrammar(const char* filename)
LVParseTree_Node::Parent()
-
Some API functions were deprecated. These will continue to be supported for some time
and can be used by including the LV_SRE_Deprecated.h header file:
LV_SRE_OpenPort2()/LV_SRE_OpenPort() replaced by LV_SRE_CreateClient()
LV_SRE_ClosePort() replaced by LV_SRE_DestroyClient()
LVSpeechPort::OpenPort() replaced by LVSpeechPort::CreateClient()
LVSpeechPort::ClosePort() replaced by LVSpeechPort::DestroyClient()
-
As noted above, CreateClient should now be called in place of
OpenPort or OpenPort2. This new single
function is a consolidation of the previous two. In addition, the ClosePort function
call has now been renamed DestroyClient to be more consistent with other API names.
The previous functions can continue to be called by including the LV_SRE_Deprecated.h
header file, but this older functionality will be removed in some future product version
so we recommend moving over to using the new function names as soon as is practical.
-
Several unused and no longer supported settings have been removed:
PROP_EX_DECODE_OPTIMIZATION
PROP_EX_SEARCH_BEAM_WIDTH
PROP_EX_LANGUAGE
-
Several settings have been deprecated, and will be removed in future versions:
PROP_EX_LIC_SERVER_HOSTNAME - replaced by new PROP_EX_LICENSE_SERVERS
PROP_EX_LIC_SERVER_PORTNUM - replaced by new PROP_EX_LICENSE_SERVERS
-
All references to the long data type in the exposed API functions have
been replaced with the int data type instead. This is for better
32/64-bit compatibility, since int data types are the same size in both 32- and 64-bit
implementations of Windows and Linux.
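The size difference behind this change can be checked on any machine. The following Python sketch uses the standard ctypes module to report the sizes of the C int and long types for the platform it runs on; it illustrates the general C data-model issue and is not part of the LumenVox API.

```python
import ctypes

# Sizes, in bytes, of the C types as seen by the platform's C compiler.
int_bytes = ctypes.sizeof(ctypes.c_int)
long_bytes = ctypes.sizeof(ctypes.c_long)
pointer_bytes = ctypes.sizeof(ctypes.c_void_p)

print(f"int: {int_bytes} bytes, long: {long_bytes} bytes, pointer: {pointer_bytes} bytes")

# On 64-bit Linux a long is typically 8 bytes, while on 64-bit Windows it
# stays at 4 bytes, so an API built on long changes shape between platforms.
# An int is 4 bytes on the mainstream 32- and 64-bit targets.
```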
-
Configuration files on Windows are now located in a single common folder. This is the LVCONFIG
folder, which by default is set to $LVBIN/config (with $LVBIN defaulting to
Program Files\LumenVox\Engine\). On Linux, there were previously application-specific variables
in lumenvox_settings.conf that identified the full path to each configuration file. Now a
global variable called LVCONFIG (in a section named "GLOBAL") in lumenvox_properties.conf
identifies the directory where all LumenVox products look for their application-specific
configuration files. By default, LVCONFIG points to /etc/lumenvox. When performing an upgrade from an
earlier version, the installation process will scan the old locations,
making backup copies of old configuration files as needed and moving the old settings
into the new files created in this folder. User settings should therefore be carried
to the new locations where possible. Note that some unused settings have been removed.
When this is encountered during an upgrade scan, the settings will be copied to the new
location, but will be commented out as removed. This change is in combination with a
new mechanism for creating configuration files following installation, which was
previously performed by the installation packaging process, but is now performed by a
command line utility application (ConfigurationUpdater.exe on Windows,
lv_configuration_updater on Linux). This utility should not need to be run by users
and should only be used in conjunction with LumenVox technical support. One noticeable
effect of these various configuration changes is that now, if users wish to revert to
the original, default settings at any time, they can rename or delete a configuration
file and it will be created by the host application the next time it is started.
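The Linux lookup described above can be sketched in Python as follows. This is an illustration only: it assumes the global settings file uses INI-style "[GLOBAL]" section markers with "key = value" lines, which these notes suggest but do not fully specify, and the function names are made up for the example.

```python
import configparser
import posixpath

# Assumed file content; the real file may contain many other settings.
SAMPLE_SETTINGS = """
[GLOBAL]
LVCONFIG = /etc/lumenvox
"""


def resolve_config_dir(settings_text: str, default_dir: str = "/etc/lumenvox") -> str:
    """Return the LVCONFIG directory from the GLOBAL section, or the default."""
    parser = configparser.ConfigParser()
    parser.read_string(settings_text)
    # fallback covers both a missing section and a missing key.
    return parser.get("GLOBAL", "LVCONFIG", fallback=default_dir)


def config_file_path(settings_text: str, file_name: str) -> str:
    """Join the resolved directory with an application-specific file name."""
    return posixpath.join(resolve_config_dir(settings_text), file_name)


print(config_file_path(SAMPLE_SETTINGS, "media_server.conf"))
```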
-
Renamed Media Server configuration from file mediaserver.conf to media_server.conf in
both Windows and Linux.
-
Added support for custom XML lexicon files to be specified via the <lexicon> element
inside of a grammar.
-
Added a new command line utility allowing users to see their current client settings and
configuration file locations more easily. This utility can optionally perform a simple
request to determine availability of licenses with the current configuration settings
(particularly useful when running in authentication mode).
This utility is called lv_show_config in Linux and LVShowConfig.exe
in Windows. It can be run with no options in order to see a usage message.
-
Added new LV_SRE_GetAvailableLanguageCount and
LV_SRE_GetAvailableLanguageIndex API
functions to expose loaded acoustic model language information to the client. Note
that these are advanced functions and therefore are declared in LV_SRE_Advanced.h
-
Added new LV_SRE_SetCustomCallGuid and LV_SRE_GetCallGuid API
functions that can be
used to specify the callsre filename for the session (this is done automatically when
running the Media Server). If these are not used, the previous functionality of
using a random GUID each time will continue to work as before. Note that these are
advanced functions and therefore are declared in LV_SRE_Advanced.h
-
Exposed some C++ functions that were previously only accessible using the C-Style API.
Added to LVSpeechPort.h:
LVSpeechPort::ReturnGrammarErrorString
LVSpeechPort::GetGrammarVocabSize
LVSpeechPort::GetAvailableLicensesCount
LVSpeechPort::IsServerAvailable
Added to LV_SRE_Grammar.h:
LVGrammar::GetLanguage
LVGrammar::GetMode
LVGrammar::GetTagFormat
-
The Speech Tuner now requires a separate Speech Tuner license in order to run. Please contact
our sales or support departments for details.
-
The Media Server now uses a default confidence-threshold of 0.05 in place of 0.45 due to
recent confidence score changes returning a broader spectrum of results. This can still
be changed as always by specifying a new value in the configuration file or as part of an MRCP/VXML
setting.
-
The Engine and the client have had significant improvements made to better handle very large and/or
recursive grammars.
-
The Speech Tuner's transcriber view has been changed to allow users to pause and resume audio
playback (using ~ key), and new options to jog forward and backward 5% through the audio
using Page Up / Page Down keys.
-
Added new upgrade analysis tool, which allows users to identify potential problems
before upgrading to newer versions of LumenVox products. In particular, if there
would be any licensing issues encountered when moving to a newer version, these would
be identified so that they can be resolved before upgrading. This is an optional tool.
-
Added Media Server handling of the SIP Record-Route headers if present in SIP INVITE
requests. If Record-Route headers are not present, there is no effect. If one or more
are present, they will be preserved in their correct order and relayed back to the
client whenever appropriate.
-
Changed Media Server to allow more control over Save-Waveform functionality. This
includes persistent state of Save-Waveform flag as well as configuration file override
for the default Save-Waveform flag. The Save-Waveform flag can now be set from
SET-PARAMS (or other) requests in addition to just RECOGNIZE tasks. The naming convention
of generated audio files has also changed to include the RECOGNITION task ID, so that
each audio file within the session can easily be identified.
-
Improved robustness across all products to better handle out-of-memory situations and
attempt to fail gracefully, continuing operation where possible. Also improved
robustness of Media Server when running under significant stress situations either under
heavy load, or low resource situations to be tolerant of socket and memory failures at
the operating system level.
-
Improved Media Server throughput performance.
-
Improved built-in time grammars to return results more compatible with other vendors.
Minor Changes and Fixes:
-
In addition to the configuration file location, all of the configuration files have had
sections added to help users better identify the area or module affected by each setting.
-
All of the older, now unused semi-continuous decoder functionality has now been removed
to allow better future performance and remove unnecessary overhead. This change included
removing some older acoustic models from the installation packages since they are no
longer necessary.
-
Invalid Tag-Formats that are used in grammars are now reported during grammar load,
rather than after use as before.
-
Added grammar display/selection option within the Grammar View and generally modified how
grammars are handled within Speech Tuner. This includes a change allowing users to
toggle between .callsre grammars and loaded grammars. Also, currently loaded grammars are
unloaded whenever a new tuner database is opened.
-
Media Server now supports "Logging-Tag" header, when specified. This tag is stored in
the corresponding callsre file and can be filtered using the Speech Tuner.
-
New recognizer_resource_url and synthesizer_resource_url settings were added to the
Media Server configuration file to indicate the ASR/TTS resources users wish to target.
-
Added new SimpleTTSClient example application to exercise the new TTS functionality. This
accompanies the newly renamed SimpleSREClient example application to exercise the SRE
functionality (this was previously called SimpleClient).
-
Added support for builtin:grammar of non-US English languages. See the SetPropertyEx
option PROP_EX_BUILTIN_GRAMMAR_LANGUAGE for more details. Note that this change is
automatically implemented within the LumenVox Media Server when the grammar language is
specified (via 'Speech-Language') in the MRCP RECOGNIZE or DEFINE-GRAMMAR request.
-
Added new error codes in the range -51 to -68 which are used by the TTS handling code
and License handling code as well as additional exception handling.
-
Fixed a minor bug in Speech Tuner where zero length transcriptions were causing problems
when deactivating the view (switching to another, or shutting down).
-
Identified and fixed a problem in Media Server where MRCP packets could have been
mishandled if the sending machine compacted multiple messages into a single transport
packet in a certain way. This would not have been a very common problem.
-
Identified and fixed a problem relating to registering and unregistering the LumenVox
Media Server service from the command line (Windows only). This was tracked down to an
incorrect name being used by the InstallShield packaging code. This has now been
corrected. This would only have become apparent to Windows users attempting to manually
remove and reinstall the Media Server service.
-
Added new options to Speech Tuner, allowing users to specify the VAD streaming parameters
prior to running a test. These include SNR and Volume Sensitivity settings as well as
the VAD Init Mode (silence trimmed).
-
Added versioning to Speech Tuner Interactions files.
-
Modified Media Server SIP error detection algorithm to be more tolerant of errors
before resetting the UDP socket. This may have the effect of improving overall
performance when UDP receive errors persist.
-
Modified Media Server port allocation algorithm to favor round robin approach instead
of lowest available. This may have the effect of improving performance when cycling
through large numbers of sessions in quick succession due to TCP CLOSE-WAIT timing.
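The difference between the two strategies can be sketched in a few lines of Python. This is an illustrative model only, not the Media Server's implementation; the class name and the port range are hypothetical.

```python
from typing import List, Optional, Set


class PortAllocator:
    """Toy allocator comparing lowest-available and round-robin selection."""

    def __init__(self, first_port: int, last_port: int) -> None:
        self.ports: List[int] = list(range(first_port, last_port + 1))
        self.in_use: Set[int] = set()
        self._next_index = 0  # where the next round-robin search starts

    def allocate_lowest(self) -> Optional[int]:
        """Always hand out the lowest free port, so released ports are reused at once."""
        for port in self.ports:
            if port not in self.in_use:
                self.in_use.add(port)
                return port
        return None

    def allocate_round_robin(self) -> Optional[int]:
        """Continue from the last allocation, giving released ports time to settle."""
        for offset in range(len(self.ports)):
            index = (self._next_index + offset) % len(self.ports)
            port = self.ports[index]
            if port not in self.in_use:
                self.in_use.add(port)
                self._next_index = (index + 1) % len(self.ports)
                return port
        return None

    def release(self, port: int) -> None:
        self.in_use.discard(port)
```

With lowest-available selection, a port released a moment ago is handed straight back out; with round robin, it is not reused until the allocator has cycled through the rest of the range.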
-
Removed the previous memory check that would fail decodes, indicating insufficient memory,
if available physical memory was below 80 MB. This helps low-resource system
configurations in which physical memory is unavailable and virtual memory is used
instead; virtual memory is monitored using other methods.
-
Internally, Windows binaries are now built using Visual Studio 2008 in place of the
previous Visual Studio 2005 (this change should not affect most users).
-
Major product components now log out their operating system environment and location of
configuration settings files they are using at startup.
-
Improved logging in the License Server. Now installed licenses are reported to the log
file at both application startup and following a merge operation when new licenses are
added. Also, as each license is acquired, the number of used/remaining licenses of that
type are reported.
-
Media Server now utilizes the decode timeout specified in the configuration file when
performing the final steps of a decode (internally). This was previously using a fixed
20 second timeout. This change should not be noticeable to users.
-
Improved API logging of SetPropertyEx and GetPropertyEx in the speech port to give
clearer indication of the meanings of values specified.
-
During Windows installation, the installer now checks for MDAC (required for emailer.exe)
and installs it if needed.
-
Significant performance improvement in the Media Server Monitor, which now performs caching
and periodic (500ms) updates to prevent backlogs of messages that couldn't be written
quickly enough. Also, the auto-scroll feature has been improved to stabilize selection
and viewing area when new events are being appended to the display window.
-
Speech Tuner now offers better handling of .csv files and better reporting of files when
they cannot be found or loaded correctly.
-
Removed some diagnostic (and possibly confusing) messages from the console output when
installing/removing some LumenVox services from the command line (Windows only).
-
Logging was changed to remove unnecessary events being recorded that were cluttering the
otherwise useful logs. Also recategorized some events that were being reported with the
wrong level of severity.
-
Modified example grammars to use the correct {$=""} tags instead of the :""
shorthand.
-
Added the option of having the license server create Info.bts files when needed using
the new optional /SYSINFO command line parameter. This can be useful as an alternative
to lv_license_manager, which can also produce these files when needed.
9.5.100 (May 10, 2010):
Improvements and New Features:
-
The confidence scoring mechanism used by the decoder has been almost entirely rewritten. A completely
new set of algorithms is used in calculating scores, which greatly improves the reliability
and meaning of the reported confidence scores. In particular, it provides a much clearer separation between
correct and incorrect results.
-
Due to the significantly revised confidence score calculation methods added to version 9.5, there is
no backward compatibility between the 9.5 server and older clients (or vice versa). You should recompile
your applications and be sure that you have updated all components prior to deploying 9.5.
-
All acoustic models for the various languages have been rebuilt using the new continuous methodology
introduced for American English in 9.0. This should lead to improved accuracy with each language. It also
means that we are dropping support for semi-continuous mode as it is no longer needed. If you were previously using
semi-continuous mode (e.g. for Spanish support) you should switch to continuous mode.
-
The LumenVox software can now be configured to automatically
send e-mail alerts when critical errors occur. This is disabled by default.
-
On Windows installations, a new config file called lumenvox_settings.conf
will be installed in %CommonProgramFiles%/LumenVox/. This will be used to
hold global settings for all LumenVox products, including the new e-mail settings for reporting
critical errors. (As this file already existed on Linux, it has simply had the new [GLOBAL] section
added to it.)
-
The jitter buffer mechanism in the Media Server was rewritten. This should now perform
better when sequencing problems between audio and dtmf occur and also when dropped packets are encountered.
-
New LV_SRE_GetAvailableLicensesCount
and LVSpeechPort::GetAvailableLicensesCount
API functions allow client applications to query the number of available licenses from a License Server.
-
The speech server now has several options for controlling memory use and determining what happens when system memory
becomes low. Please see the sre_server.conf documentation for the
following settings: FRAME_TRACK_MODE, CRITICAL_MEMORY_THRESHOLD, LOW_MEMORY_THRESHOLD, and LIMITED_MEMORY_THRESHOLD.
-
The Engine now has an optional answering machine detection mode that can be enabled with a special
license type. This is useful for outbound calling, as it can very reliably detect answering machine
or voicemail beeps. Please contact LumenVox for information about obtaining answering machine detection
licenses.
-
Linux packages now ship with a script called lvservices_restarter.sh that allows you to
configure the system to automatically restart the LumenVox processes.
-
Please Note: In LumenVox 10.0, we will be completely removing a number of deprecated functions, including
the concept/phrase interface. If you are still using functions that are in the deprecated header, please consider changing them
now.
Fixes and Minor Enhancements:
-
Acoustic model file versions are now reported at Engine startup.
-
Various Media Server statistics are now periodically logged to a new file called LVStatus_LVMediaServer.log.
This file is located in the same directory as the Media Server log (by default
C:\Program Files\Lumenvox\Engine\Logs\ on Windows or /var/log/lumenvox/mediaserver/ on Linux).
-
All products now log a version number, operating system, and LumenVox environment variables at startup to aid
in troubleshooting of common problems.
-
Fixed a memory leak in the License Server.
-
When the License Server is running in authentication mode, it will now log usernames in plaintext (previously
usernames were logged as obfuscated hashcodes).
-
Linux client packages now include a compiled binary of the SimpleClient sample
along with the source code (helpful when there is no compiler on the target machine).
-
Changed the media server configuration file to better describe the various logging
options in the responses (callsre) files.
-
The LV_SRE_IsServerAvailable API function signature was changed to return an integer
instead of bool. The bool was inappropriate here since this is a C type interface. Now
1 means that an SRE server is available, 0 means there is not.
-
Configuration files now have a GLOBAL section where previously there were no section markers. Applications
that looked for settings in unmarked locations will now look for them in the GLOBAL section.
-
Added dropped packet monitoring to Media Server statistics.
-
Added new error codes to LV_Error_Codes.h
-
Fixed a problem with the CallIndexer which previously had the possibility of waiting for an
infinite amount of time between scans. Now it will default to 1 day between scans.
-
Fixed a bug in the Linux version of the Call Indexer which caused the application to look
in the wrong location for configuration file. The application will now use the correct
/etc/lumenvox/ location for the configuration file.
-
On Windows, the Call Indexer now expects its configuration file to be located in the new %LVCONFIG%
folder, which defaults to %LVBIN%\config.
-
Added the missing LVCallIndexer.ini to Linux RPM packages.
-
Fixed a problem with the grammar parser that allowed for infinite recursion in grammars.
-
Modified the mechanism that established a connection between speech clients and servers at client
startup. The client will now try for up to 2 seconds to connect to a server before giving up.
Previously, on very fast machines, connections could time out before this connection had been fully established.
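A retry loop with an overall deadline, of the general kind described above, might look like the following Python sketch. The function name, the retry interval, and the try_connect callback are assumptions made for the example; the client's actual mechanism is internal and not shown in these notes.

```python
import time
from typing import Callable


def connect_with_deadline(try_connect: Callable[[], bool],
                          max_wait_seconds: float = 2.0,
                          retry_interval_seconds: float = 0.05) -> bool:
    """Retry try_connect until it succeeds or the overall deadline would be exceeded."""
    deadline = time.monotonic() + max_wait_seconds
    while True:
        if try_connect():
            return True
        # Give up if sleeping for another interval would run past the deadline.
        if time.monotonic() + retry_interval_seconds > deadline:
            return False
        time.sleep(retry_interval_seconds)
```

The callback is always tried at least once, and a monotonic clock is used so that changes to the system time do not shorten or extend the wait.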
-
Modified the way in which Speech Tuner stores transcription text to allow for commas to be included
without affecting the comma separated file format, which previously caused corruption of the data when
reloading the files. Commas detected in transcript text are now encoded when saved.
This method is now used for Transcript, Decode, Transcript SI, Comments, Error String and ModelName strings.
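The general idea can be sketched in Python as below. The escape sequence used here is a hypothetical placeholder chosen for the example, not necessarily the encoding the Speech Tuner writes, and the helper names are likewise assumptions.

```python
from typing import List

COMMA_ESCAPE = "&#44;"  # hypothetical escape sequence, assumed for illustration


def encode_field(text: str) -> str:
    """Replace literal commas so they cannot be mistaken for field separators."""
    return text.replace(",", COMMA_ESCAPE)


def decode_field(text: str) -> str:
    """Restore the literal commas when a saved field is read back."""
    return text.replace(COMMA_ESCAPE, ",")


def save_row(fields: List[str]) -> str:
    return ",".join(encode_field(field) for field in fields)


def load_row(line: str) -> List[str]:
    return [decode_field(field) for field in line.split(",")]


row = ["yes, please", "YES PLEASE", "caller hesitated, then spoke"]
saved = save_row(row)
print(saved)
print(load_row(saved) == row)
```

One limitation of this simple scheme is that text which already contains the escape sequence would be decoded as a comma on reload.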
-
Speech Tuner now correctly calculates Word Accuracy scoring. Previously the number of
words was being derived incorrectly. Word deletions are no longer used in counting
the number of total words in the statistics.
-
Speech Tuner was modified to update word counts displayed in the word list to include
mismatches (insertions and substitutions only).
-
Added correct C-style API function attributes to some deprecated functions. This change should not
adversely affect users.
-
Added new grammar list option to Speech Tuner, allowing grammars to be viewed and
selected from within the grammar editor window.
-
Added options to specify more than one active grammar for use when testing with the Speech Tuner. This makes
working with grammars much easier.
-
Added improved grammar loading and error processing to Speech Tuner, allowing users
to more clearly see what problems are being reported for problematic grammars.
-
Improved grammar loading in Speech Tuner to resolve external references wherever possible. These references are
cached to local temporary files, allowing complex referenced grammars contained in callsre files to be edited and
tested on the fly.
-
Improved Speech Tuner prompting to save grammars when changing views or exiting the
application.
-
Fixed a bug in the Call Indexer which could cause a segmentation fault on Linux under certain conditions.
9.2.400 (February 26, 2010):
-
Fixed problems with custom pronunciations which caused custom pronunciations to be silently ignored
in a number of cases.
-
Fixed problem with new grammar caching mechanism where the original GrXML grammar text was not being
correctly stored in the resulting callsre files when using the streaming interface or Media Server.
-
Fixed problem with grammar parser resolving URIs that specified relative paths using the ../
syntax.
-
Fixed problems with Speech Tuner loading malformed grammars into grammar editor.
-
Fixed cosmetic problem in Speech Tuner when displaying decode progress with greater than 65535
interactions due to minor overflow issue.
-
Fixed bug that caused grammar results to sometimes be returned in incorrect case.
-
Added missing acoustic models to Speech Tuner installation that are needed for phonetic
speller tool.
9.2.300 (February 16, 2010):
-
Fixed bug in Media Server where an empty result was returned with a confidence score of 0 when
a bad VAD silence was detected due to insufficient leading silence in the decoded audio.
The Media Server now returns no-match if the recognized string length is less than 1 and confidence
is set to 0.
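The rule above can be sketched as a small check. This assumes that both parts of the rule are conditions on the decode result (empty text and a confidence of 0). The DecodeResult type and the return labels are hypothetical and are not part of the Media Server API.

```python
from dataclasses import dataclass


@dataclass
class DecodeResult:
    """Hypothetical container for a decode result."""
    text: str
    confidence: int


def classify(result: DecodeResult) -> str:
    """Report no-match for an empty result with zero confidence."""
    if len(result.text) < 1 and result.confidence == 0:
        return "no-match"
    return "match"


print(classify(DecodeResult(text="", confidence=0)))
print(classify(DecodeResult(text="stop", confidence=720)))
```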
-
Fixed bug in the client where a grammar label of greater than 260 characters could
create a buffer overrun situation and cause an exception.
9.2.200 (February 9, 2010):
-
A brand new Speech Tuner has been released! Full details are available in the
Tuner documentation
-
Significant internal changes to allow better handling of grammars. This specifically addresses
large grammar issues but also allows faster processing of all grammars. If you previously had
trouble loading large grammars, please try upgrading to 9.2.200 or later.
-
Changed default location of the Media Server configuration file in Windows (it will now default
to a folder called config in the Engine installation directory). In the event that your %LVBIN% variable
is set to somewhere other than the Engine installation directory, the Media Server will look in %LVBIN%\config.
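The lookup order described above can be sketched as follows. The function name and example paths are hypothetical, and comparing the two directories as case-insensitive normalized Windows paths is an assumption, not documented behavior.

```python
import ntpath  # Windows path rules, usable on any platform


def media_server_config_dir(install_dir, lvbin=None):
    """Return the folder searched for the Media Server configuration file."""

    def canonical(path):
        return ntpath.normcase(ntpath.normpath(path))

    # If %LVBIN% points somewhere other than the installation directory,
    # the config folder under %LVBIN% is used instead.
    if lvbin and canonical(lvbin) != canonical(install_dir):
        return ntpath.join(lvbin, "config")
    return ntpath.join(install_dir, "config")


# Hypothetical paths, for illustration only.
print(media_server_config_dir(r"C:\Example\Engine"))
print(media_server_config_dir(r"C:\Example\Engine", r"D:\Other\bin"))
```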
-
Streamlined decode processing code allowing decodes to run faster generally.
-
Added more verbose logging to grammar loading and parsing.
-
Better compatibility with alaw audio format in Media Server. The Media Server is now faithful
to the format specified in SDP, not to RTP packet markings. Previously, RTP packets
were not processed correctly if there was a mismatch between the SDP specifier
and the RTP specifier. Also resolved a possible ambiguity when a client requests both alaw and
ulaw audio formats at the same time (the first one specified is now used).
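A minimal sketch of the "first specified wins" rule, assuming the formats are offered as static RTP payload types on the SDP m=audio line (0 is PCMU/ulaw and 8 is PCMA/alaw per RFC 3551). The function name and the sample offer are hypothetical. This is not the Media Server's own SDP handling.

```python
# Static RTP payload types from RFC 3551.
G711_PAYLOAD_TYPES = {"0": "ulaw", "8": "alaw"}


def pick_audio_format(sdp: str):
    """Return the first G.711 format listed on the m=audio line, or None."""
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m=audio "):
            # Layout: m=audio <port> <proto> <fmt> <fmt> ...
            for fmt in line.split()[3:]:
                if fmt in G711_PAYLOAD_TYPES:
                    return G711_PAYLOAD_TYPES[fmt]
    return None


# Hypothetical offer listing alaw (8) before ulaw (0).
offer = "v=0\r\nc=IN IP4 192.0.2.10\r\nm=audio 5004 RTP/AVP 8 0\r\n"
print(pick_audio_format(offer))
```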
-
Fixed a bug in the internal statistical pronunciation model relating to pronunciation of
words (typically nouns) that would not generally be in the language dictionary. This fix
improves accuracy in these cases.
-
Fixed a problem with possible non-unique Session-ID strings in MRCPv1 sessions.
-
The Media Server's RTSP interface will now correctly respond with 486 Busy Here when speech port is
unavailable at SETUP. Previously the SETUP would succeed, but subsequent DEFINE-GRAMMAR
or other calls would fail.
-
Improved the load balancing between client and multiple SRE servers using a new algorithm
designed to better utilize and balance SRE resources.
-
Client side grammar caching has been added. This reduces the amount of time needed for
subsequent grammar load requests from each client. This can significantly speed up loading
GrXML grammars, which need to be converted to ABNF prior to sending to server. Several new
settings have been added to client_property.conf
including: CLIENT_CACHE_ENABLE, CLIENT_CACHE_EXPIRATION, CLIENT_CACHE_MAX_NUMBER and
CLIENT_CACHE_MAX_MEMORY.
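To illustrate why caching helps, here is a minimal sketch of a client-side cache that skips repeated GrXML to ABNF conversion. The class, its parameters, and the eviction policy are assumptions made for illustration. They loosely mirror an expiration time and a maximum entry count, do not model a memory limit, and do not describe how the client actually interprets the settings listed above.

```python
import time
from collections import OrderedDict


class GrammarCache:
    """Hypothetical cache of converted grammars, keyed by the GrXML text."""

    def __init__(self, expiration_seconds, max_entries, clock=time.monotonic):
        self._expiration = expiration_seconds
        self._max_entries = max_entries
        self._clock = clock
        self._entries = OrderedDict()  # grxml text -> (stored_at, converted)

    def get_or_convert(self, grxml_text, convert):
        now = self._clock()
        entry = self._entries.get(grxml_text)
        if entry is not None and now - entry[0] < self._expiration:
            # Fresh hit: reuse the earlier conversion.
            self._entries.move_to_end(grxml_text)
            return entry[1]
        # Miss or expired entry: convert again and store the result.
        converted = convert(grxml_text)
        self._entries[grxml_text] = (now, converted)
        self._entries.move_to_end(grxml_text)
        # Drop the least recently used entries beyond the limit.
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)
        return converted


# Stand-in for the real conversion step, which is not shown here.
demo_cache = GrammarCache(expiration_seconds=3600, max_entries=100)
print(demo_cache.get_or_convert("<grammar/>", lambda text: "abnf for " + text))
```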
-
The Media Server was getting into a deadlock state under extremely specific conditions on Linux
platforms. This problem was identified and fixed.
-
Changed configuration file parser to better handle situations where config file is
missing.
-
Fixed a problem in the way threads are stopped in Linux. This allows more robust thread
control.
-
Fixed problem in Media Server in Linux, where locking would occur sometimes due to
blocking socket call.
-
Changes to Linux shutdown code improve shutdown performance and reduce potential
problems.
-
More robust handling of configuration values in Media Server to prevent minor rounding
problems on some Linux distributions.
-
Media Server changed to better handle multiple duplicate SETUP requests in the same
session.
-
The Engine installation package now includes a new Call Indexer service. This works in conjunction with the new
Speech Tuner to index and serve callsre files as needed on Linux and Windows machines
connected to Speech Tuner.
-
Logging of License Authentication requests was improved to indicate licenses on a per-user basis.
-
Fixed a grammar reference bug where a non-root-rule was being referred to in a
referenced grammar.
-
Fixed bug in server side grammar cache, when expiring grammars.
-
Added fix for NAT problems with certain routers and firewalls that would automatically
close license connections after a period of inactivity. The socket is now pinged more
frequently to keep it open.
-
Fixed small memory leak when socket message connectivity is lost in certain conditions.
-
Fixed a bug where callsre responses were being stored in the incorrect folder under
certain conditions.
-
License Administrator changes were made to prevent freezing of GUI when connecting to License
Server that is under stress.
-
Fixed bug with multiple sequential $NULL rules being parsed in a grammar.
-
Fixed problem with GrXML to ABNF conversion that would not correctly assign grammar
weighting to $NULL rules in the generated ABNF output. Removed unnecessary double $NULL
rules.
-
Added option to automatically clear cache folders one time after installing or upgrading
software to avoid possible version compatibility problems. Upon installation or upgrade, a file
called cached_grammars.key will be generated in the server-side grammar cache folder, and a file
called cached_client_grammars.key will be generated in the client-side grammar cache folder.
If the Engine or client detects its key file, it will clear the corresponding cache (including
the key file). This means that after upgrading, large grammars may take a while to load the first
time, because the cache has been cleared.
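The one-time clearing behavior can be sketched as below. The function name is hypothetical, and the sketch assumes a flat cache folder containing only files. It is not the Engine's implementation.

```python
import os
import tempfile


def clear_cache_if_key_present(cache_dir: str, key_name: str) -> bool:
    """Clear the cache folder once if the key file is found.

    Returns True when the cache was cleared. The key file is removed along
    with the cached files, so the next call does nothing.
    """
    if not os.path.isfile(os.path.join(cache_dir, key_name)):
        return False
    for name in os.listdir(cache_dir):
        path = os.path.join(cache_dir, name)
        if os.path.isfile(path):
            os.remove(path)
    return True


# Demonstration in a temporary folder standing in for a grammar cache.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("old_grammar.cache", "cached_grammars.key"):
        open(os.path.join(demo_dir, name), "w").close()
    print(clear_cache_if_key_present(demo_dir, "cached_grammars.key"))
    print(clear_cache_if_key_present(demo_dir, "cached_grammars.key"))
```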
-
Removed medium and high resolution acoustic models from packaging to reduce shipping size
(these models are still available by custom request).
-
Removed some unwanted/unnecessary logging.
-
Windows 2000 is no longer a supported operating system. Windows Vista and Windows 7 have been
tested and are now supported.
-
Support for Fedora Core 9 has been dropped, but support added for FC12.
-
Fixed bugs when handling custom pronunciations in (default) continuous decode mode.
9.1 (October 2009):
-
The LumenVox products now support licensing authentication, allowing for
the License Server to only provide licenses to clients who authenticate with a user name and
password. You can use this if you wish to provide licenses to customers over the public Internet,
or to take advantage of our new
subscription licenses.
-
American English (en-US) now has acoustic models at three different
resolutions that you can use. The higher resolution models
offer better accuracy, but use more memory and require more time for decodes. By default, only the lowest
resolution model is loaded.
-
The "-di" suffix in a language declaration is no longer needed to indicate that digits-only
acoustic models should be used. The Engine will now make this switch automatically if it detects
that all loaded grammars use only digit words. This should offer much greater compatibility when using
the built-in digits grammars, especially with VXML applications. If you continue to specify -di in a language,
it will be ignored.
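The detection described above could look roughly like the following check. The digit word list, the function name, and the treatment of "no grammars loaded" as not digits-only are all assumptions. The Engine's actual vocabulary and logic are not documented here.

```python
# Assumed set of English digit words, for illustration only.
DIGIT_WORDS = {
    "zero", "oh", "one", "two", "three", "four",
    "five", "six", "seven", "eight", "nine",
}


def uses_only_digit_words(grammar_vocabularies):
    """Return True if every word in every loaded grammar is a digit word.

    grammar_vocabularies is a list with one list of words per loaded grammar.
    """
    return bool(grammar_vocabularies) and all(
        word.lower() in DIGIT_WORDS
        for vocabulary in grammar_vocabularies
        for word in vocabulary
    )


print(uses_only_digit_words([["one", "two"], ["oh", "nine"]]))
print(uses_only_digit_words([["one", "two"], ["yes", "no"]]))
```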
-
There have been a number of bug fixes and enhancements that should improve compatibility between the LumenVox
Media Server and various voice platforms that use MRCP.
-
We have streamlined the Speech Engine such that its memory footprint has been reduced substantially. Under
light loads, you should find that the Engine uses only about 150 MB of memory.
-
Be sure to note that if you upgrade to 9.1, you should
download and install new license files following
our new upgrade procedures that were put into place in 9.0.
-
Fixed a bug that caused loads to be improperly distributed when using multiple speech servers.
-
Added instructions for using VoiceGenie 7,
Genesys Voice Platform 7.6, and
Syntellect Communications Platform
with the LumenVox Speech Engine.
9.0 (July 2009):
- This is the first release of the LumenVox
Continuous HMM Decoder for use with the Speech Engine.
The Continuous model does not use compression, so it has higher resolution, resulting in increased accuracy.
The continuous models have shown an accuracy increase across various domains, but at the expense of
approximately 15-20% more processing time.
- There are new noise reduction options available in 9.0, allowing better
recognition in noisy environments.
- The LumenVox Speech Engine version 9.0 offers improved support of the
$GARBAGE rule, which allows grammars to be
defined where utterances before and after the desired phrase can be ignored. This is programmatically
challenging to do correctly, and this new version sets out to improve performance over previous
implementations of this rule.
- The new LumenVox Media Server provides
an interface to our Speech Engine via MRCPv1 (RTSP) and MRCPv2 (SIP) connectivity. These are two
networking protocols commonly used among a variety of leading speech engines.
-
The LumenVox License Server will no longer allow old licenses to work with the newer versions of the
Speech Engine. This means that to use the latest versions of the software, you must ensure your software
maintenance is up to date and download and install licenses with the newer maintenance date. For more information,
see Upgrading LumenVox Software.
8.6 (January 2009):
-
New configuration files have been added to the Engine. This should
allow greater control over Engine settings without having to use the API. A few older configuration files, such as the old
license_client.conf, have been merged into these new files. If you still have the old configuration files in place,
the Engine will prefer the values from those, so it should not break backwards compatibility for any users.
-
The Engine has a new startup procedure that should help catch problems earlier. There is a new
startup procedure for the speech client (the SpeechPort interface) and a new
startup procedure for the speech server.
-
We have revamped all of our example application code. It should be better
commented and more cleanly written. We have also added a few new sample applications.
8.5.100 (May 2008):
-
Licenses can now be uninstalled, using the latest License
Server. If you have a machine with licenses that were set up before the release of 8.5, you will need to upgrade those licenses.
Users who do not need to uninstall a license should be unaffected by this change.
-
The location of files on Linux has been restructured. This represents a major revamp of the LumenVox software on
Linux. Existing Linux users will need to take this into account when they upgrade. Note that we have also completely dropped
the environment variables ($LVBIN, $LVLIB, etc.) on our Linux installations, and we have split the Engine into three
separate Linux packages: an Engine client package, an Engine server package, and a "core" package containing files
shared across all products. Please see Linux Directory Structure for information
about where our files are now installed.
-
Short words (e.g. "back" or "stop") should now have higher confidence scores when correctly recognized.
8.0.300 (December 2007):
-
Added new utilities related to phonetic
spellings and our internal dictionary of words. You can now use
LV_SRE_CheckWordsInDictionary
(C API) / CheckWordsInDictionary
(C++ API) to determine whether a word or string of words is in the dictionary of a given language.
LV_SRE_GetPhoneticPronunciationCount
(C API) / GetPhoneticPronunciationCount (C++ API) returns the number of pronunciations the Engine has for a string of words, and
LV_SRE_GetPhoneticPronunciation
(C API) / GetPhoneticPronunciation (C++ API) returns the actual phonemes for a string of words.
-
Improved the Engine's performance in a number of ways, particularly relating to memory use on Linux. Fixed some potential
memory leaks.
-
Improved memory performance for the MRCP Server.