---
source_path: "faq.md"
canonical_url: "https://doc.sensory.com/tnl/7.8/faq/"
---

# Frequently Asked Questions

- [Recipes](#faq-recipes) — push audio, capture speech after a wake word, push-to-talk, UI-thread callbacks.
- [Concepts](#faq-concepts) — thread safety, sample rate, model types, LVCSR, RTOS integration.
- [Performance](#faq-performance) — code size, memory, wake-word tuning.
- [Troubleshooting](#faq-troubleshooting) — debug audio, model compatibility, display issues.

## Recipes

Short answers for tasks that come right after
[Your first program](https://doc.sensory.com/tnl/7.8/getting-started/your-first-program.md#your-first-program). Follow the links
for full samples and API detail.

### How do I push audio?

Use **push** mode when your app owns the audio path — for example a custom
driver, an RTOS without a blocking read API, or fixed-size buffers from
another thread. The library does not read the microphone for you; you pass
each chunk to [push](https://doc.sensory.com/tnl/7.8/api/inference.md#push) on [->audio-pcm](https://doc.sensory.com/tnl/7.8/api/setting-keys/runtime.md#-audio-pcm).

Contrast with **pull** mode on [Your first program](https://doc.sensory.com/tnl/7.8/getting-started/your-first-program.md#your-first-program):
there you attach [fromAudioDevice](https://doc.sensory.com/tnl/7.8/api/io.md#fromaudiodevice) (or a file stream) and call [run](https://doc.sensory.com/tnl/7.8/api/inference.md#run).

1. [new](https://doc.sensory.com/tnl/7.8/api/inference.md#new) / `new SnsrSession()` — empty [Session](https://doc.sensory.com/tnl/7.8/api/inference.md#session).
2. [load](https://doc.sensory.com/tnl/7.8/api/inference.md#load) your `.snsr` model.
3. [setHandler](https://doc.sensory.com/tnl/7.8/api/inference.md#sethandler) for [^result](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#result) and any other [events](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#events) you need.
4. Loop: read a chunk from your driver (often 10–20 ms of 16 kHz PCM), then
   `snsrPush(s, SNSR_SOURCE_AUDIO_PCM, buffer, nbytes)` (Java: `session.push(...)`).
5. When finished, call [stop](https://doc.sensory.com/tnl/7.8/api/inference.md#stop) once to flush buffered audio, then [release](https://doc.sensory.com/tnl/7.8/api/heap.md#release).

Do **not** attach an input stream with [setStream](https://doc.sensory.com/tnl/7.8/api/inference.md#setters) on [->audio-pcm](https://doc.sensory.com/tnl/7.8/api/setting-keys/runtime.md#-audio-pcm) in push
mode — audio enters only through [push](https://doc.sensory.com/tnl/7.8/api/inference.md#push). Handlers run on the thread that
calls [push](https://doc.sensory.com/tnl/7.8/api/inference.md#push) (see [UI thread callbacks](https://doc.sensory.com/tnl/7.8/faq.md#ui-thread-callbacks)).

<!-- tab: c -->

**C/C++**

```c
/* 15 ms @ 16 kHz mono 16-bit LE */
#define CHUNK 480
char pcm[CHUNK];
size_t n = myAudioRead(pcm, CHUNK);  /* your driver */
snsrPush(s, SNSR_SOURCE_AUDIO_PCM, pcm, n);
```
<!-- /tab -->

<!-- tab: java -->

**Java**

```java
byte[] pcm = myAudioRead();  /* your driver */
session.push(Snsr.SOURCE_AUDIO_PCM, pcm);
```
<!-- /tab -->

<!-- tab: py -->

**Python**

```python
# 15 ms @ 16 kHz mono 16-bit LE
CHUNK = 480
pcm = my_audio_read(CHUNK)  # your driver
session.push(snsr.SOURCE_AUDIO_PCM, pcm)
```
<!-- /tab -->
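
The `480` in the snippets above follows from the audio format. As a minimal, SDK-independent sketch of that arithmetic (the helper names `chunk_bytes` and `split_chunks` are illustrative and not part of the TrulyNatural API), a buffer can be cut into push-sized pieces like this:

```python
def chunk_bytes(duration_ms, sample_rate_hz=16000, channels=1, bytes_per_sample=2):
    """Bytes of PCM needed to hold `duration_ms` of audio."""
    frames = sample_rate_hz * duration_ms // 1000
    return frames * channels * bytes_per_sample


def split_chunks(pcm, size):
    """Yield consecutive `size`-byte slices of `pcm`; the last may be shorter."""
    for offset in range(0, len(pcm), size):
        yield pcm[offset:offset + size]


CHUNK = chunk_bytes(15)                  # 15 ms of 16 kHz mono 16-bit audio
one_second = bytes(chunk_bytes(1000))    # silence, a stand-in for driver data
chunks = list(split_chunks(one_second, CHUNK))
```

Each slice would then be handed to [push](https://doc.sensory.com/tnl/7.8/api/inference.md#push) in turn.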

**Also see these related items:** [API overview § Push mode](https://doc.sensory.com/tnl/7.8/api/overview.md#processing-modes), [push-audio.c](https://doc.sensory.com/tnl/7.8/api/sample/c/push-audio.md#push-audioc), [spot-data.c](https://doc.sensory.com/tnl/7.8/api/sample/c/spot-data.md#spot-datac)

### How do I capture the audio that fired the wake word?

Use a **composed** spotter + VAD model so the SDK segments speech after the
trigger and writes PCM to an output stream.

1. Build or download a pipeline model — for example compose
   [tpl-spot-vad](https://doc.sensory.com/tnl/7.8/models/index.md#tpl-spot-vad) with [snsr-edit](https://doc.sensory.com/tnl/7.8/tools/snsr-edit.md#snsr-edit) (see [tpl-spot-vad-type](https://doc.sensory.com/tnl/7.8/models/tpl/tpl-spot-vad.md#tpl-spot-vad-type)), or use a
   pre-built spot+VAD `.snsr` from your SDK tree.
2. [load](https://doc.sensory.com/tnl/7.8/api/inference.md#load) the pipeline into a [Session](https://doc.sensory.com/tnl/7.8/api/inference.md#session).
3. Attach live input on [->audio-pcm](https://doc.sensory.com/tnl/7.8/api/setting-keys/runtime.md#-audio-pcm) ([fromAudioDevice](https://doc.sensory.com/tnl/7.8/api/io.md#fromaudiodevice) or your push loop).
4. Attach a WAV (or buffer) sink on [<-audio-pcm](https://doc.sensory.com/tnl/7.8/api/setting-keys/runtime.md#-audio-pcm):
   `snsrSetStream(s, SNSR_SINK_AUDIO_PCM, snsrStreamFromFileName("out.wav", "w"))`
   (wrap with [fromAudioStream](https://doc.sensory.com/tnl/7.8/api/io.md#fromaudiostream) if needed — see [live-segment.c](https://doc.sensory.com/tnl/7.8/api/sample/c/live-segment.md#live-segmentc)).
5. Register [^result](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#result) to note the spot, and [^end](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#end) (or [^limit](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#limit)) to know
   when the following utterance ended; stop [run](https://doc.sensory.com/tnl/7.8/api/inference.md#run) or return [STOP](https://doc.sensory.com/tnl/7.8/api/inference.md#rc_stop)
   from the endpoint handler.
6. Optional: set [include-leading-silence](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#include-leading-silence) to `1` (or [include-wake-word-audio](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#include-wake-word-audio))
   if the saved clip should include the wake word audio, not only the command.

The Java sample [segmentSpottedAudio.java](https://doc.sensory.com/tnl/7.8/api/sample/java/segmentSpottedAudio.md#segmentspottedaudiojava) runs this flow with Gradle; C uses
the same settings in [live-segment.c](https://doc.sensory.com/tnl/7.8/api/sample/c/live-segment.md#live-segmentc).

Read [begin-ms](https://doc.sensory.com/tnl/7.8/api/setting-keys/results.md#begin-ms) / [end-ms](https://doc.sensory.com/tnl/7.8/api/setting-keys/results.md#end-ms) in the endpoint handler if you need
timestamps without writing a file.
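
If the application keeps its own copy of the audio it fed to the session, those timestamps can be mapped back to byte offsets. The sketch below assumes that the timestamps count from the start of the audio stream and that the audio is 16 kHz mono 16-bit PCM; `ms_to_offset` and `slice_segment` are illustrative names, not SDK functions.

```python
SAMPLE_RATE_HZ = 16000
BYTES_PER_SAMPLE = 2  # 16-bit mono


def ms_to_offset(ms):
    """Byte offset of the sample that starts at `ms` milliseconds."""
    return (ms * SAMPLE_RATE_HZ // 1000) * BYTES_PER_SAMPLE


def slice_segment(pcm, begin_ms, end_ms):
    """Return the bytes of `pcm` between two stream-relative timestamps."""
    return pcm[ms_to_offset(begin_ms):ms_to_offset(end_ms)]


recording = bytes(ms_to_offset(3000))          # 3 s of silence as a stand-in
segment = slice_segment(recording, 1050, 2530)  # example timestamps
```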

### How do I gate a command set with a push-to-talk button or other external event?

Use [tpl-spot-sequential](https://doc.sensory.com/tnl/7.8/models/tpl/tpl-spot-sequential.md#tpl-spot-sequential-type) with [loop](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#loop) `= 2`. This
template normally listens for a wake word in slot `0`, then a command in
slot `1`; with `loop = 2` it skips slot `0` and pins listening to the command
recognizer in slot `1` until you reset it.

A typical "wake word *or* push-to-talk" flow:

1. Build a sequential model with your wake word in slot `0` and your command
   set in slot `1` ([tpl-spot-sequential](https://doc.sensory.com/tnl/7.8/models/tpl/tpl-spot-sequential.md#tpl-spot-sequential-type) § Examples).
2. Run the [Session](https://doc.sensory.com/tnl/7.8/api/inference.md#session) in the default mode (`loop = 0`). Slot `0` listens
   for the wake word and hands off to slot `1` after a spot.
3. When the user presses the push-to-talk button, set `loop = 2` from your UI
   thread (or whichever thread receives the button event) and the recognizer
   will treat the next utterance as a slot-`1` command.
4. After the command spots, set `loop = 0` to resume always-listening behavior.
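
The switching logic in steps 3 and 4 can be sketched independently of the SDK. In this sketch `apply_loop` is a hypothetical callable supplied by the application that writes the value to the session's `loop` setting; the class name and method names are likewise illustrative.

```python
import threading


class PushToTalkGate:
    """Tracks the `loop` value for a wake-word-or-button flow."""

    LISTEN_FOR_WAKE_WORD = 0   # default: slot 0, then slot 1
    COMMAND_ONLY = 2           # skip slot 0, listen in slot 1

    def __init__(self, apply_loop):
        self._apply_loop = apply_loop   # hypothetical: writes `loop` to the session
        self._lock = threading.Lock()   # serializes calls made through this gate
        self.loop = self.LISTEN_FOR_WAKE_WORD

    def _set(self, value):
        with self._lock:
            self.loop = value
            self._apply_loop(value)

    def on_button_press(self):
        """Step 3: treat the next utterance as a slot-1 command."""
        self._set(self.COMMAND_ONLY)

    def on_command_result(self):
        """Step 4: resume always-listening behavior."""
        self._set(self.LISTEN_FOR_WAKE_WORD)


applied = []                          # stand-in for the real session
gate = PushToTalkGate(applied.append)
gate.on_button_press()
gate.on_command_result()
```

The lock only orders calls that go through the gate. It does not by itself make the session safe to touch from several threads, so the rules in the [thread-safe](https://doc.sensory.com/tnl/7.8/faq.md#thread-safe) FAQ still apply.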

If you want regular wake-word-gated recognition, but the wake word may come
at the *end* of the utterance (for example, "… please, computer"), use
[wake-word-at-end](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#wake-word-at-end) on a
[tpl-spot-vad-lvcsr](https://doc.sensory.com/tnl/7.8/models/tpl/tpl-spot-vad-lvcsr.md#tpl-spot-vad-lvcsr-type),
[tpl-opt-spot-vad-lvcsr](https://doc.sensory.com/tnl/7.8/models/tpl/tpl-opt-spot-vad-lvcsr.md#tpl-opt-spot-vad-lvcsr-type), or
[tpl-spot-vad](https://doc.sensory.com/tnl/7.8/models/tpl/tpl-spot-vad.md#tpl-spot-vad-type) pipeline.

The older two-session pattern (one [Session](https://doc.sensory.com/tnl/7.8/api/inference.md#session) for the wake word, another
for push-to-talk) is no longer recommended; a single sequential model has the
same behavior and shares one audio path.

**Also see these related items:** [tpl-spot-sequential](https://doc.sensory.com/tnl/7.8/models/tpl/tpl-spot-sequential.md#tpl-spot-sequential-type), [loop](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#loop), [wake-word-at-end](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#wake-word-at-end)

### How do I wire callbacks into a UI thread?

[Session](https://doc.sensory.com/tnl/7.8/api/inference.md#session) [events](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#events) and [^result](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#result) handlers run on the **same thread** that
calls [run](https://doc.sensory.com/tnl/7.8/api/inference.md#run) or [push](https://doc.sensory.com/tnl/7.8/api/inference.md#push) — not on your UI thread. Keep handlers short: copy what
you need, then return. Update the UI from your toolkit's main thread.

| Platform | Pattern |
|----------|---------|
| **Android** | Run [run](https://doc.sensory.com/tnl/7.8/api/inference.md#run) / [push](https://doc.sensory.com/tnl/7.8/api/inference.md#push) on a worker thread or `HandlerThread`; post UI work with `Handler` / `runOnUiThread`. See [snsr-debug](https://doc.sensory.com/tnl/7.8/api/sample/android/snsr-debug.md#snsr-debug) (`PhraseSpot` worker thread). |
| **iOS (C API)** | Call [run](https://doc.sensory.com/tnl/7.8/api/inference.md#run) from a background `DispatchQueue`; update SwiftUI/UIKit on the main actor. See [PhraseSpot](https://doc.sensory.com/tnl/7.8/api/sample/ios/phrasespot.md#ios-ps). |
| **Java desktop** | Run recognition off the EDT; use `SwingUtilities.invokeLater` (or equivalent) for UI updates. |

The one cross-thread exception: you may call [stop](https://doc.sensory.com/tnl/7.8/api/inference.md#stop) from another thread to
unblock a [run](https://doc.sensory.com/tnl/7.8/api/inference.md#run) that is waiting on live audio ([thread-safe](https://doc.sensory.com/tnl/7.8/faq.md#thread-safe) FAQ).

Never share a [Session](https://doc.sensory.com/tnl/7.8/api/inference.md#session) or [Stream](https://doc.sensory.com/tnl/7.8/api/io.md#stream) handle across threads without your own
lock; create one session per recognition worker.
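
One toolkit-neutral way to follow this advice is a thread-safe queue between the recognition thread and the UI thread. The sketch below uses only the Python standard library; `on_result`, `recognition_worker`, and `drain_ui_events` are hypothetical names, and the worker merely simulates the thread that would call [run](https://doc.sensory.com/tnl/7.8/api/inference.md#run) or [push](https://doc.sensory.com/tnl/7.8/api/inference.md#push).

```python
import queue
import threading

ui_events = queue.Queue()


def on_result(text):
    """Hypothetical result handler: copy what is needed, then return quickly."""
    ui_events.put(("result", str(text)))


def recognition_worker(utterances):
    """Stand-in for the thread that calls run or push and invokes the handler."""
    for text in utterances:
        on_result(text)
    ui_events.put(("done", None))


def drain_ui_events():
    """Runs on the UI thread; collects results until the worker signals completion."""
    shown = []
    while True:
        kind, payload = ui_events.get()
        if kind == "done":
            return shown
        shown.append(payload)   # a real app would update a widget here


worker = threading.Thread(target=recognition_worker, args=(["hello", "volume up"],))
worker.start()
displayed = drain_ui_events()
worker.join()
```

The blocking `get()` keeps the example short. A real UI thread would more likely poll with `get_nowait()` from a timer, or use the toolkit's own posting mechanism from the table above.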

**Also see these related items:** [Your first program](https://doc.sensory.com/tnl/7.8/getting-started/your-first-program.md#your-first-program) (platform tabs),
[push](https://doc.sensory.com/tnl/7.8/api/inference.md#push), [live-spot.c](https://doc.sensory.com/tnl/7.8/api/sample/c/live-spot.md#live-spotc)

## Concepts

### Is this SDK thread-safe?

Yes, as long as [Session](https://doc.sensory.com/tnl/7.8/api/inference.md#session) and [Stream](https://doc.sensory.com/tnl/7.8/api/io.md#stream) handles are not shared between threads.
The number of handles per thread is limited only by system resources.

If you need to share one of these handles across threads, you _must_
provide application-level mutual exclusion locking.

**Note:**

There is just one exception to this requirement:
You may call [stop](https://doc.sensory.com/tnl/7.8/api/inference.md#stop) on a [Session](https://doc.sensory.com/tnl/7.8/api/inference.md#session) handle from a different thread than
the one [run](https://doc.sensory.com/tnl/7.8/api/inference.md#run) is executing on.

If you replace the dynamic memory allocator with [config](https://doc.sensory.com/tnl/7.8/api/library-config.md#config) and [CONFIG_ALLOC](https://doc.sensory.com/tnl/7.8/api/library-config.md#config_alloc),
the new allocator implementation must be thread-safe. Use [allocLock](https://doc.sensory.com/tnl/7.8/api/library-config.md#alloclock)
to add thread-safety to an allocator that is not.

### What sample rate does the SDK expect?

Sample rate is technically model-dependent — read the active model's
[samples-per-second](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#samples-per-second) setting if you need to confirm — but every model
shipped in this TrulyNatural distribution requires **16 kHz, mono,
16-bit signed PCM**. If you need a model that runs at a lower rate
(typically 8 kHz for telephony audio), [contact Sales][Sales].

When your capture device runs at a different rate, follow these rules:

- **Never up-sample to 16 kHz.** Up-sampling does not add the high-frequency
  information the recognizer relies on; the resulting audio sounds similar
  to a human listener but recognition accuracy will be noticeably worse than
  on natively recorded 16 kHz audio. If 16 kHz capture is not available,
  [contact Sales][Sales] about a sub-16 kHz model instead.
- **Down-sampling from a higher rate is fine** — for example, from 48 kHz on
  a typical USB microphone — provided you follow standard down-sampling
  practice. In particular, apply a low-pass anti-aliasing filter with a
  cut-off below the new Nyquist frequency before decimating; otherwise
  high-frequency content will fold back into the band the recognizer cares
  about and degrade accuracy.

Most platform audio APIs (Android `AudioRecord`, iOS Audio Queue Services,
ALSA, Core Audio, Windows Multimedia Extensions) will resample correctly when
you ask them for 16 kHz directly; prefer that over rolling your own filter
chain.
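
For illustration only, here is what "filter, then decimate" means for a 48 kHz to 16 kHz conversion. This is a plain-Python sketch and not SDK code: the filter design (a 127-tap Hamming-windowed sinc with a 7.2 kHz cut-off) and all function names are assumptions chosen for the example, and a platform resampler remains the better choice in production.

```python
import math


def lowpass_taps(cutoff_hz, rate_hz, num_taps=127):
    """Hamming-windowed sinc low-pass FIR, normalized to unity gain at DC."""
    fc = cutoff_hz / rate_hz                 # cut-off in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def downsample(samples, factor, taps):
    """Low-pass filter, keeping every `factor`-th output (fully overlapped region only)."""
    out = []
    for start in range(0, len(samples) - len(taps) + 1, factor):
        out.append(sum(t * samples[start + k] for k, t in enumerate(taps)))
    return out


def tone(freq_hz, rate_hz=48000, count=4800):
    """Unit-amplitude sine wave, `count` samples long."""
    return [math.sin(2 * math.pi * freq_hz * n / rate_hz) for n in range(count)]


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


TAPS = lowpass_taps(7200, 48000)
kept = rms(downsample(tone(1000), 3, TAPS))        # well inside the pass band
rejected = rms(downsample(tone(10000), 3, TAPS))   # above the new 8 kHz Nyquist
```

The 1 kHz tone passes through essentially unchanged, while the 10 kHz tone, which would otherwise alias to 6 kHz after decimation, is strongly attenuated.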

### What is a Command Set?

Command sets are phrase spotters with more than one phrase. These are
frequently tuned to have a limited [listen-window](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#listen-window).

Command set recognizers have [task-type](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#task-type) `==` [phrasespot](https://doc.sensory.com/tnl/7.8/api/setting-keys/values.md#phrasespot)
and can be used as a drop-in replacement for any wake word. No code changes are required.

Most command sets are tuned for use after an always-listening keyword
spotter. The [tpl-spot-sequential](https://doc.sensory.com/tnl/7.8/models/tpl/tpl-spot-sequential.md#tpl-spot-sequential-type) template provides
a convenient way to build such a model.

### Can I run two wake word models at the same time?

Yes, see [tpl-spot-concurrent](https://doc.sensory.com/tnl/7.8/models/tpl/tpl-spot-concurrent.md#tpl-spot-concurrent-type).

### Can I create a trigger-to-search model?

Yes. Create a new phrase spot model from the [tpl-spot-vad](https://doc.sensory.com/tnl/7.8/models/tpl/tpl-spot-vad.md#tpl-spot-vad-type) template.

### How do I enroll Fixed Trigger models?

EFT models use the same API, and follow the same enrollment recipe as UDT models.

Replace the UDT model _udt-universal-3.67.1.0.snsr_ in any of the
examples with an EFT enrollment model such as _eft-hbg-enUS-23.0.0.9.snsr_.

### How do I improve the user experience for wake words in poor audio environments?

Use a spotter model with Smart Wake Word support. See
[low-fr-operating-point](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#low-fr-operating-point) and [duration-ms](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#duration-ms).

### How do I spot phrases on a Real-Time Operating System (RTOS) with a custom audio driver and no filesystem?

Implement a custom stream modeled on
[data-stream.c](https://doc.sensory.com/tnl/7.8/api/sample/c/data-stream.md#data-streamc), which is used in [spot-data-stream.c](https://doc.sensory.com/tnl/7.8/api/sample/c/spot-data-stream.md#spot-data-streamc). That sample shows
how to build a custom stream that encapsulates your audio driver
and that your [Session](https://doc.sensory.com/tnl/7.8/api/inference.md#session) can pull data from.

An alternative is to push data onto a stream. See [spot-data.c](https://doc.sensory.com/tnl/7.8/api/sample/c/spot-data.md#spot-datac).
You can take data chunks of any size (for example, whatever your
audio driver provides) and push them onto a stream to be read by a [Session](https://doc.sensory.com/tnl/7.8/api/inference.md#session).

### How do I use Large Vocabulary Continuous Speech Recognition? _(TrulyNatural only)_

This TrulyNatural release includes three different ways of running a
speech-to-text recognizer: [without audio segmentation](https://doc.sensory.com/tnl/7.8/faq.md#lvcsr-no-vad),
[with VAD audio segmentation](https://doc.sensory.com/tnl/7.8/faq.md#lvcsr-vad), and with [wake word gated VAD](https://doc.sensory.com/tnl/7.8/faq.md#lvcsr-spot-vad).

**Note:**

The [^result](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#result) callback only happens when a VAD
endpoint is detected, or the end of the input stream is reached. For
applications with live audio recognition, LVCSR recognizers should
always be used with a VAD, such as [tpl-opt-spot-vad-lvcsr](https://doc.sensory.com/tnl/7.8/models/tpl/tpl-opt-spot-vad-lvcsr.md#tpl-opt-spot-vad-lvcsr-type),
[tpl-spot-vad-lvcsr](https://doc.sensory.com/tnl/7.8/models/tpl/tpl-spot-vad-lvcsr.md#tpl-spot-vad-lvcsr-type), or [tpl-vad-lvcsr](https://doc.sensory.com/tnl/7.8/models/tpl/tpl-vad-lvcsr.md#tpl-vad-lvcsr-type).

#### LVCSR without audio segmentation

The _stt-enUS-automotive-medium-2.3.15-pnc.snsr_ model included in this distribution
is a generic broad-domain US English speech-to-text recognizer with a
special domain focus on automotive commands.

```console
% bin/snsr-eval -t model/stt-enUS-automotive-medium-2.3.15-pnc.snsr \
    data/enrollments/armadillo-1-3-c.wav
P     40    200 Im
P     80    640 Armadillo
P    120   1120 Armadillo playing
P    120   1520 Armadillo play marsa
P    120   1880 Armadillo play more songs by
P    120   2320 Armadillo play more songs by this art
P    120   2600 Armadillo play more songs by this artist
P    120   2640 Armadillo play more songs by this artist
NLU intent: music_player (0.9849) = armadillo play more songs by this artist
   120   2640 Armadillo play more songs by this artist.
```

Preliminary or partial results above are prefixed with `P`. Suppress
these by setting the [partial-result-interval](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#partial-result-interval) to `0`:

```console
% bin/snsr-eval -t model/stt-enUS-automotive-medium-2.3.15-pnc.snsr \
    -s partial-result-interval=0 \
    data/enrollments/armadillo-1-3-c.wav
NLU intent: music_player (0.9849) = armadillo play more songs by this artist
   120   2640 Armadillo play more songs by this artist.
```
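
If a script post-processes this console output, partial and final result lines can be told apart by the `P` prefix. The parser below is a sketch that assumes the line layout shown in the two transcripts above (optional `P`, begin time, end time, text); it is not an official description of the [snsr-eval](https://doc.sensory.com/tnl/7.8/tools/snsr-eval.md#snsr-eval) output format, and `parse_result_line` is an illustrative name.

```python
import re

# Optional "P", then begin and end times in ms, then the transcription.
RESULT_LINE = re.compile(r"^(P)?\s*(\d+)\s+(\d+)\s+(.*)$")


def parse_result_line(line):
    """Return a dict for a result line, or None for any other line."""
    match = RESULT_LINE.match(line)
    if match is None:
        return None
    partial, begin_ms, end_ms, text = match.groups()
    return {
        "partial": partial == "P",
        "begin_ms": int(begin_ms),
        "end_ms": int(end_ms),
        "text": text,
    }


partial = parse_result_line("P     80    640 Armadillo")
final = parse_result_line("   120   2640 Armadillo play more songs by this artist.")
other = parse_result_line("NLU intent: music_player (0.9849) = armadillo play")
```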

#### LVCSR with VAD-segmented audio

Large vocabulary recognizers perform better when used with a Voice
Activity Detector that removes extraneous leading and trailing
silence.

Create such a VAD-lvcsr model using the [tpl-vad-lvcsr](https://doc.sensory.com/tnl/7.8/models/tpl/tpl-vad-lvcsr.md#tpl-vad-lvcsr-type) template:

```console
% bin/snsr-edit -t model/tpl-vad-lvcsr-3.17.0.snsr \
    -f 0 model/stt-enUS-automotive-medium-2.3.15-pnc.snsr \
    -o vad-stt-enUS-automotive-medium-pnc.snsr
```

Evaluate using [snsr-eval](https://doc.sensory.com/tnl/7.8/tools/snsr-eval.md#snsr-eval):

```console
% bin/snsr-eval -t vad-stt-enUS-automotive-medium-pnc.snsr \
    data/enrollments/armadillo-1-0-c.wav
P    230    830 Armadilla
P    270   1150 Armadillo, eight
P    310   1630 Armadillo, eighteen percent
P    310   1910 Armadillo. Eighteen percent of s
P    310   2430 Armadillo, eighteen percent of six hundred
P    310   2790 Armadillo, eighteen percent of six hundred and forty
P    310   3150 Armadillo, eighteen percent of six hundred forty three
NLU intent: no_command (0.9765) = armadillo eighteen percent of 643
NLU entity:   number (0.9564) = 643
   310   3190 Armadillo, eighteen percent of six hundred forty three.
```

#### LVCSR following a wake word

The [tpl-spot-vad-lvcsr](https://doc.sensory.com/tnl/7.8/models/tpl/tpl-spot-vad-lvcsr.md#tpl-spot-vad-lvcsr-type) template provides a way to start a
large-vocabulary recognizer with a spotted wake word. The example
below enrolls a wake word, then uses the enrolled spotter with the
broad-domain recognizer.

Create an enrolled spotter for "jackalope":

```console
% spot-enroll -vt model/udt-universal-3.67.1.0.snsr \
    +jackalope \
    data/enrollments/jackalope-1-0.wav \
    data/enrollments/jackalope-1-1.wav \
    data/enrollments/jackalope-1-4.wav \
    data/enrollments/jackalope-1-3.wav
Adapting: 100% complete.
Enrolled model saved to "enrolled-sv.snsr"
```

Combine the enrolled spotter and the broad-domain recognizer using the
[tpl-spot-vad-lvcsr-3.23.0.snsr](https://doc.sensory.com/tnl/7.8/models/index.md#tpl-spot-vad-lvcsr) template:

```console
% snsr-edit -vt model/tpl-spot-vad-lvcsr-3.23.0.snsr \
  -f 0 enrolled-sv.snsr \
  -f 1 model/stt-enUS-automotive-medium-2.3.15-pnc.snsr \
  -s include-leading-silence=1 \
  -o jackalope-stt-enUS-automotive-medium-pnc.snsr
Saved edited model to "jackalope-stt-enUS-automotive-medium-pnc.snsr".
```

Evaluate using [snsr-eval](https://doc.sensory.com/tnl/7.8/tools/snsr-eval.md#snsr-eval). The wake word is not included
in the LVCSR transcription.

```console
% snsr-eval -t jackalope-stt-enUS-automotive-medium-pnc.snsr \
    data/enrollments/jackalope-1-2-c.wav
P   1050   1530 Directions
P   1050   1930 Directions to sus
P   1050   2370 Directions to Susan's house
P   1050   2530 Directions to Susan's house
NLU intent: navigation (0.9973) = directions to susan's house
NLU entity:   navigation_location (0.9811) = susan's house
  1050   2530 Directions to Susan's house.
```

#### LVCSR with lightweight NLU parsing

The included LVCSR and STT models support a lightweight natural language mark-up.
This can significantly simplify application code that
has to interpret recognition results. See [grammar-based recognition](https://doc.sensory.com/tnl/7.8/models/types/lvcsr.md#grammar-based-recognition)
for a description of the grammar syntax.

##### NLU with custom grammar recognizers

```console
% snsr-eval -t model/lvcsr-build-enUS-14.0.2-5MB.snsr \
    -s partial-result-interval=0 \
    -f grammar-stream data/grammars/enrollments-nlu-slot.txt \
    data/enrollments/armadillo-1-4-c.wav
NLU intent: avcontrol (0) =  record a video
NLU entity:   action (0) = record
NLU entity:   type (0) = video
   435   1920 armadillo record a video
```

##### NLU with broad-domain recognizers

In TrulyNatural [v6.16.0](https://doc.sensory.com/tnl/7.8/changes/version-6.md#v6.16.0) and later, NLU parsing is a separate processing
step that occurs after the [^result](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#result) event. NLU parsing includes
a special `.` symbol that matches any input word. This allows
crafting of more robust island parsers that can be used with free-form
recognition results from a broad-domain model.

This example detects a small set of microwave control commands
using _lvcsr-lib-enUS-14.0.2.snsr_.

**Note:**

The _stt-enUS-automotive-medium-2.3.15-pnc.snsr_ model includes
machine-learned NLU processing for automotive command tasks. If you
use [nlu-grammar-stream](https://doc.sensory.com/tnl/7.8/api/setting-keys/runtime.md#nlu-grammar-stream) with this model the grammar-based NLU will
override the machine-learned NLU parsing.

**`tiny-microwave.nlu`**

```
# Microwave command NLU post-processor grammar
# tiny-microwave.nlu

# power level setting, "fifty percent". don't capture optional "power"
power = ~s.percent power?;

# timer duration, "two minutes and ten seconds"
duration = ~s.timer;

# defrost command: the word "defrost" followed by
# zero or more power or duration values, both captured
# .* matches any input word sequence
defrost = defrost ( .* ({power} | {duration}) .* )* ;

# default action matches any input and discards it
default = .:*;

# set clock time: the word "clock" or "time" followed by
# a time ("seven twenty nine pm").
# ignore spurious words before and after the time specification
clock = (clock | time) .* {time ~s.time} .*;

# list of all the actions we've defined, captured
action = {defrost} | {clock} | {default};

# match any one of the actions, ignoring unknown words before
# and after
nlu = <s> .* $action .* </s>;
```

Build and run a recognizer with live input.

```console
% snsr-eval -vat model/stt-enUS-automotive-medium-2.3.15-pnc.snsr \
    -t model/lvcsr-lib-enUS-14.0.2.snsr \
    -f nlu-grammar-stream tiny-microwave.nlu \
    -s partial-result-interval=0
Using live audio from default capture device. ^C to stop.

# "Defrost my soup for 15 minutes at 30% power"
  4035   8835 [^end] VAD speech region.
NLU intent: defrost (0) = defrost my soup for 15 minutes at thirty percent power
NLU entity:   duration (0) = 15 minutes
NLU entity:   power (0) = thirty percent power
  4310   8470 (0.4805) Defrost my soup for fifteen minutes at thirty percent power.

# "Could you set the clock to 3:43 pm?"
 48165  51810 [^end] VAD speech region.
NLU intent: clock (0) = clock to 15:43
NLU entity:   time (0) = 15:43
 48360  51360 (0.163) Could you set the clock to three? Forty three P? M.
```

##### Dealing with NLU parse ambiguity

An ambiguous NLU grammar can produce more than one valid parse result.
The NLU processor scores these alternates and returns the best hypotheses
in order, up to [nlu-match-max](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#nlu-match-max). During the [^nlu-slot](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#nlu-slot) callback,
[nlu-match-count](https://doc.sensory.com/tnl/7.8/api/setting-keys/results.md#nlu-match-count) reports the number of alternates available, and
[nlu-match-index](https://doc.sensory.com/tnl/7.8/api/setting-keys/results.md#nlu-match-index) identifies the current alternate.

[nlu-match-max](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#nlu-match-max) defaults to `1` for best compatibility with
earlier releases.

**Warning:**

Resolving NLU ambiguity can be expensive both in terms of computation and
heap memory use.

Avoid using patterns that match arbitrary input in multiple ways:
```
g = <s> {left .*} {right .*} </s>;
```

This example uses two NLU grammars: _system.nlu_ for basic
functionality provided by a product, and _app.nlu_ to extend
NLU processing for a plug-in application. If the application
duplicates some of the system NLU actions, those duplicates need to be
reported for the system to take appropriate action.

**`system.nlu`**

```
# system.nlu
volume = volume: {volume-level ~s.percent};
preset = preset: number:? ~s.number-integer-0-9;
system = {volume} | {preset};
# :/-0.1 adds a small weight bias towards the ~app class, so
# ~app will outscore $system for identical matches
plugin = :/-0.1 ~app;
action = {system} | {plugin};
nlu = <s> $action </s>;
```

**`app.nlu`**

```
# app.nlu
media-control = ~s.control.media;
preset = preset: ( one | two | three | four | five );
nlu = {media-control} | {preset};
```

Build and run a recognizer with live input. Set the value
for [nlu-match-max](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#nlu-match-max) to allow up to ten alternate matches.

```console
% snsr-eval -vvat model/stt-enUS-automotive-medium-2.3.15-pnc.snsr \
    -t model/lvcsr-lib-enUS-14.0.2.snsr \
    -f nlu-grammar-stream system.nlu \
    -f nlu-grammar-stream.app app.nlu \
    -s partial-result-interval=0 \
    -s nlu-match-max=10
Using live audio from default capture device. ^C to stop.

# "volume 50%"
# in system grammar
  5235 [^begin]
  4710   6645 [^end] VAD speech region.
NLU intent: system (0) =  fifty percent
NLU entity:   volume.volume-level (0) = fifty percent
NLU  1/1 nlu-slot-value.system (0) = { volume { volume-level fifty percent } }
NLU  1/1 nlu-slot-value.system.volume (0) = { volume-level fifty percent }
NLU  1/1 nlu-slot-value.system.volume.volume-level (0) = fifty percent
phrase:
  4990   6270 (0.8939) Volume. Fifty percent.
words:
  4990   5470 (0.8955) Volume.
  5550   5870 (0.9986) Fifty
  5950   6270 (0.9996) percent.

# "fast forward"
# in plugin grammar
 17070 [^begin]
 16545  17940 [^end] VAD speech region.
NLU intent: plugin (0) =  fast forward
NLU entity:   media-control (0) = fast forward
NLU  1/1 nlu-slot-value.plugin (0) = { media-control fast forward }
NLU  1/1 nlu-slot-value.plugin.media-control (0) = fast forward
phrase:
 16860  17540 (0.7646) Fast forward.
words:
 16860  17100 (0.9913) Fast
 17220  17540 (0.7713) forward.

# "preset 5"
# in both system and plugin grammars, but plugin reported first
# due to the weight bias
 22290 [^begin]
 21765  23325 [^end] VAD speech region.
NLU intent: plugin (0) =  five
NLU entity:   preset (0) = five
NLU  1/2 nlu-slot-value.plugin (0) = { preset five }
NLU  1/2 nlu-slot-value.plugin.preset (0) = five
NLU intent: system (0) =  five
NLU entity:   preset (0) = five
NLU  2/2 nlu-slot-value.system (0) = { preset five }
NLU  2/2 nlu-slot-value.system.preset (0) = five
phrase:
 22040  22920 (0.9432) Preset. Five.
words:
 22040  22480 (0.9443) Preset.
 22680  22920 (0.9988) Five.
```

##### How do I take action on an NLU result?

You can think of an intent as specifying which function or method
you should call to perform an action. Entities identify parts of
the utterance that include additional detail. For example, a
`call_contact` intent might have a `contact_name` entity that
specifies who to call.

- Register a handler for [^nlu-intent](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#nlu-intent)
- In this handler,
    + Retrieve [nlu-intent-name](https://doc.sensory.com/tnl/7.8/api/setting-keys/results.md#nlu-intent-name) as a string.
    + Map this intent name to an action. Do this by comparing the
      intent name to all valid intent names for which you want to
      perform an action.
    + If the matched action requires additional data, retrieve
      the expected [nlu-entity-value](https://doc.sensory.com/tnl/7.8/api/setting-keys/results.md#nlu-entity-value) by name.
    + Call a function (specified by the intent value) with zero or
      more arguments specified by the entity values.
    + Return from the intent event handler with [OK](https://doc.sensory.com/tnl/7.8/api/inference.md#rc_ok).
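
The mapping step above is often written as a lookup table rather than a chain of comparisons. The sketch below is SDK-independent: `dispatch` and the `get_entity` callable are hypothetical, with `get_entity` standing in for whatever retrieves an entity value by name inside the handler, and `stop_playback` is an invented second intent added for contrast.

```python
def call_contact(contact_name):
    return f"calling {contact_name}"


def stop_playback():
    return "playback stopped"


# intent name -> (function to call, names of the entities it needs)
ACTIONS = {
    "call_contact": (call_contact, ("contact_name",)),
    "stop_playback": (stop_playback, ()),
}


def dispatch(intent_name, get_entity):
    """Run the action for `intent_name`; `get_entity` looks up an entity by name."""
    entry = ACTIONS.get(intent_name)
    if entry is None:
        return None                      # an intent this application ignores
    action, entity_names = entry
    args = [get_entity(name) for name in entity_names]
    return action(*args)


entities = {"contact_name": "Susan"}     # stand-in for values read in the handler
outcome = dispatch("call_contact", entities.get)
```

Keeping the table separate from the handler makes it easy to see which intents the application acts on and which entities each one expects.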

## Performance

### How can I reduce application code size?

By default, any application linked against the TrulyNatural library
can run any model (_.snsr_) file supported by the library. You can
reduce the overall code size of an application by limiting the library
capabilities to only the models of interest.

Use [snsr-edit](https://doc.sensory.com/tnl/7.8/tools/snsr-edit.md#snsr-edit) with the `-i` flag to create custom initialization
code that references only the modules used by the models included in your
application. For example:

```console
% snsr-edit -v -i -t spot-voicegenie-enUS-6.5.1-m.snsr
Output written to "snsr-custom-init.c".
```

This creates a custom initialization file, _snsr-custom-init.c_,
that references only the code modules used by
_spot-voicegenie-enUS-6.5.1-m.snsr_. Add this file to your
application, and compile with `-DSNSR_USE_SUBSET`.
This will replace all calls to [snsrNew](https://doc.sensory.com/tnl/7.8/api/inference.md#new) with a variant that
initializes only the required modules. See [Compile-time macros § SNSR_USE_SUBSET](https://doc.sensory.com/tnl/7.8/api/compile-macros.md#snsr-use-subset).

You can further reduce code size by linking at the function instead of
the module level.
See [sample/c/Makefile](https://doc.sensory.com/tnl/7.8/api/sample/c/index.md#makefile) for compiler and linker flag examples (`-ffunction-sections`).

### Can I avoid dynamic memory allocation?

You can avoid all calls to `malloc()`, `realloc()`, and `free()` by
replacing the memory allocator with [CONFIG_ALLOC](https://doc.sensory.com/tnl/7.8/api/library-config.md#config_alloc).

For embedded use, [allocTLSF](https://doc.sensory.com/tnl/7.8/api/library-config.md#alloctlsf) is a good choice. Use it with one or
more pre-defined read-write memory segments that remain valid for the
lifetime of the application.

### How do I improve wake word performance?

The options described below are model customizations.
[Contact Sensory](https://doc.sensory.com/tnl/7.8/contact.md#contact) if you are interested in pursuing them.
There may be additional cost involved, and not every combination is possible
on every platform or for every trigger specification.

#### How to measure real-time factor and MIPS

* To measure the real-time factor, time how long it takes to run
  the spotter over a long audio file. Then,
  real-time factor = (run time in seconds) / (length of audio in seconds).
* To measure the MIPS on your device, use a profiler such as perf
  to count instructions while running the spotter over an audio file. Then,
  MIPS = (number of instructions) / (length of audio in seconds × 1,000,000).
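
As a worked illustration of the two formulas above, the following sketch
computes both values from measurements you collect yourself. The function
names and the sample numbers are hypothetical; the instruction count is
assumed to come from a profiler such as perf.

```python
def real_time_factor(run_time_s: float, audio_length_s: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    return run_time_s / audio_length_s


def mips(instructions: int, audio_length_s: float) -> float:
    """Million instructions executed per second of audio processed."""
    return instructions / (audio_length_s * 1_000_000)


# Hypothetical measurements: a 600 s audio file processed in 30 s,
# with 12 billion instructions reported by the profiler.
rtf = real_time_factor(run_time_s=30.0, audio_length_s=600.0)
load = mips(instructions=12_000_000_000, audio_length_s=600.0)

print(f"real-time factor: {rtf:.3f}")
print(f"MIPS: {load:.1f}")
```

A longer audio file averages out start-up cost and gives a steadier estimate
for both figures.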

#### What if the spotter runs too slow, or consumes too many cycles?

Try a [multi-threaded](https://doc.sensory.com/tnl/7.8/faq.md#multi-threaded), [frame-stacked](https://doc.sensory.com/tnl/7.8/faq.md#frame-stacked), or [little-big](https://doc.sensory.com/tnl/7.8/faq.md#little-big) spotter.
A smaller spotter model is another option: it uses less CPU
(in proportion to its size), with a small reduction in FA and FR performance.
Contact Sensory to see if these options are right for you.

#### What if the spotter consumes too much memory?

1. Contact Sensory for a smaller model.
1. If your platform runs code directly from ROM, consider converting
   the spotter to compiled-in code. This will run from read-only code
   space, and reduce heap requirements.
   Use the [snsr-edit](https://doc.sensory.com/tnl/7.8/tools/snsr-edit.md#snsr-edit) tool to create a C source file from any
   spotter model. See [fromCode](https://doc.sensory.com/tnl/7.8/api/io.md#fromcode) and the examples
   [spot-data-stream.c](https://doc.sensory.com/tnl/7.8/api/sample/c/spot-data-stream.md#spot-data-streamc) and [spot-data.c](https://doc.sensory.com/tnl/7.8/api/sample/c/spot-data.md#spot-datac).

#### What is a little-big spotter?

A little-big spotter does sequential recognition: it first runs a
low-power spotter and, when that spots, re-processes the audio with a
high-power state-of-the-art spotter. This reduces the average CPU cycles
(and hence power) required to run a spotter, at the cost of a small increase
in latency. The combined model behaves like a high-power spotter.
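
To see why the average load drops, consider a simplified model. This is an
assumption made for illustration, not a description of the library's
internals: the low-power stage runs all the time, and the high-power stage
runs only for the fraction of time after the first stage has fired. The
function name and all numbers below are hypothetical.

```python
def average_load(little_mips: float, big_mips: float, big_duty_cycle: float) -> float:
    """Average MIPS when the big stage runs only a fraction of the time.

    big_duty_cycle is the fraction of time (0.0 to 1.0) during which the
    high-power stage is active. Simplified model, not the library's own.
    """
    if not 0.0 <= big_duty_cycle <= 1.0:
        raise ValueError("big_duty_cycle must be between 0 and 1")
    return little_mips + big_duty_cycle * big_mips


# Hypothetical figures: a 10 MIPS low-power stage, a 100 MIPS high-power
# stage, and the high-power stage active 2% of the time.
combined = average_load(little_mips=10.0, big_mips=100.0, big_duty_cycle=0.02)
always_big = 100.0

print(f"little-big average: {combined:.1f} MIPS")
print(f"high-power only:    {always_big:.1f} MIPS")
```

Under this model the saving depends on how rarely the first stage fires, so
measure on your own audio before relying on any particular figure.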

#### What is a frame-stacked spotter?

Frame-stacked spotters reduce the CPU load by 30–45%, in exchange for
a small reduction in FA and FR performance. The resolution of time
alignments is also reduced by a factor of two.

#### What is a multi-threaded spotter?

Multi-threaded spotters speed up execution on CPUs with more than one core.

## Troubleshooting

### How do I diagnose wake word audio issues?

Create a new wake word model from the [tpl-spot-debug](https://doc.sensory.com/tnl/7.8/models/tpl/tpl-spot-debug.md#tpl-spot-debug-type)
template. See the [notes](https://doc.sensory.com/tnl/7.8/models/tpl/tpl-spot-debug.md#tpl-spot-debug-type-notes) and [example](https://doc.sensory.com/tnl/7.8/models/tpl/tpl-spot-debug.md#tpl-spot-debug-type-examples).

### Can I use models from the beta releases?

Yes. This release is compatible with older models, but you must relax the
task-version requirement check in your application.

Use `"~0.5.0 || 1.0.0"` instead of `"1.0.0"`, for example:

<!-- tab: c -->

**C/C++**

```c
snsrRequire(session, SNSR_TASK_VERSION, "~0.5.0 || 1.0.0");
```
<!-- /tab -->

<!-- tab: java -->

**Java**

```java
session.require(Snsr.TASK_VERSION, "~0.5.0 || 1.0.0");
```
<!-- /tab -->

<!-- tab: py -->

**Python**

```python
session.require(snsr.TASK_VERSION, "~0.5.0 || 1.0.0")
```
<!-- /tab -->

The models included in the [v6.0.0](https://doc.sensory.com/tnl/7.8/changes/version-6.md#v6.0.0) release use
[task-version](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#task-version) values of `1.0.0`.
This makes these models incompatible with 5.0.0-beta releases.

### How do I display international characters in results?

On Windows systems, when using Sensory STT models with [snsr-eval](https://doc.sensory.com/tnl/7.8/tools/snsr-eval.md#snsr-eval) [v7.3.0](https://doc.sensory.com/tnl/7.8/changes/index.md#v7.3.0) or
earlier, international characters, for example Chinese (zhCN), may appear as garbled symbols
such as "Σ╜á σÑ╜ σÉù" instead of the correct [UTF-8][] characters "你 好 吗".

This is a display encoding issue, not an issue with the recognition output itself.
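
The garbled symbols are what UTF-8 bytes look like when a console decodes
them with a legacy code page. The sketch below reverses the effect with
Python's standard codecs, assuming the console uses code page 437 (the
actual default depends on the Windows locale).

```python
# The garbled text from the example above, as shown by the console.
garbled = "Σ╜á σÑ╜ σÉù"

# Recover the raw bytes the console received (assumes code page 437),
# then decode them as the UTF-8 they actually are.
raw = garbled.encode("cp437")
recovered = raw.decode("utf-8")

print(raw.hex(" "))
print(recovered)

# Going the other way reproduces the garbling.
assert recovered.encode("utf-8").decode("cp437") == garbled
```

The bytes are identical in both cases; only the code page used to display
them differs.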

Two solutions are available:

1.  Set Console Code Page to UTF-8

    Before running `snsr-eval.exe`, run the following command in the Windows Command Prompt:

    ```
    chcp 65001
    ```

    This sets the console's code page to UTF-8, enabling correct display of international characters.

    `snsr-eval` [v7.4.0](https://doc.sensory.com/tnl/7.8/changes/index.md#v7.4.0) and later does this before writing any output.

2.  Enable System-Wide UTF-8 Support (Recommended for Long-Term Use)

    - Open Settings > Time & Language > Administrative Language Settings
    - Under Change system locale, check: "**Beta: Use Unicode UTF-8 for worldwide language support**"
    - Save your changes and restart your computer to apply them

    This setting ensures that all applications and the console will handle UTF-8 properly by default.

<!-- Reference definitions from includes/links.md -->
[Sales]: https://www.sensory.com/contact/ "Sensory Sales"
[UTF-8]: https://en.wikipedia.org/wiki/UTF-8

<!-- Abbreviation definitions from includes/abbreviations.md -->
*[ALSA]: Advanced Linux Sound Architecture
*[API]: Application Programming Interface
*[EFT]: Enrolled Fixed Trigger: fixed wake words adapted to a speaker to improve accuracy
*[FA]: False Accept: the recognizer triggered when the target phrase was not spoken
*[FR]: False Reject: the recognizer did not trigger when the target phrase was spoken
*[LVCSR]: Large Vocabulary Continuous Speech Recognition model, feed-forward neural net acoustic model with FST decoder
*[MIPS]: Million Instructions Per Second
*[NLU]: Natural Language Understanding model
*[PNC]: Punctuation and Capitalization, an STT model variant that emits cased text with punctuation
*[ROM]: Read-Only Memory, typically nonvolatile flash memory
*[RTOS]: Real-Time Operating System
*[SDK]: Software Development Kit
*[STT]: Speech To Text: transformers with language model and CTC decoding
*[TNL]: TrulyNatural, Sensory's large-vocabulary speech recognition technology
*[UDT]: User-Defined Trigger: enrolled wake words and command sets
*[VAD]: Voice Activity Detector
