---
source_path: "models/tpl/tpl-opt-spot-vad-lvcsr.md"
canonical_url: "https://doc.sensory.com/tnl/7.8/models/tpl/tpl-opt-spot-vad-lvcsr/"
---

# tpl-opt-spot-vad-lvcsr _(TrulyNatural only)_

This [template](https://doc.sensory.com/tnl/7.8/models/tpl/index.md#template-type) _optionally_ runs the [wake word](https://doc.sensory.com/tnl/7.8/models/types/wake-word.md#wake-word-type) recognizer in slot [0](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#0)
until it detects the wake word, then segments the audio that follows with a [VAD](https://doc.sensory.com/tnl/7.8/models/types/vad.md#vad-type)
and sends the segmented audio to the [LVCSR](https://doc.sensory.com/tnl/7.8/models/types/lvcsr.md#lvcsr-type) or [STT](https://doc.sensory.com/tnl/7.8/models/types/stt.md#stt-type)
recognizer in slot [1](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#1).

[slot](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#slot) controls whether `tpl-opt-spot-vad-lvcsr` waits for the wake word:

* With `slot == 0` it waits for the wake word before starting the VAD.
  In this mode the behavior is that of [tpl-spot-vad-lvcsr](https://doc.sensory.com/tnl/7.8/models/tpl/tpl-spot-vad-lvcsr.md#tpl-spot-vad-lvcsr-type).
* With `slot == 1` it starts the VAD immediately and the behavior is that of [tpl-vad-lvcsr](https://doc.sensory.com/tnl/7.8/models/tpl/tpl-vad-lvcsr.md#tpl-vad-lvcsr-type).

You can change [slot](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#slot) at runtime. Use this to gate only the first of a series of commands
with a wake word.

`tpl-opt-spot-vad-lvcsr` has [task-type](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#task-type)` == `[phrasespot](https://doc.sensory.com/tnl/7.8/api/setting-keys/values.md#phrasespot).

Expected [task types](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#task-type):

* **Slot 0:** [phrasespot](https://doc.sensory.com/tnl/7.8/api/setting-keys/values.md#phrasespot)
* **Slot 1:** [lvcsr](https://doc.sensory.com/tnl/7.8/api/setting-keys/values.md#lvcsr)

**Also see these related items:** [tpl-opt-spot-vad-lvcsr-1.28.0.snsr](https://doc.sensory.com/tnl/7.8/models/index.md#tpl-opt-spot-vad-lvcsr), [tpl-spot-vad-lvcsr](https://doc.sensory.com/tnl/7.8/models/tpl/tpl-spot-vad-lvcsr.md#tpl-spot-vad-lvcsr-type), [tpl-vad-lvcsr](https://doc.sensory.com/tnl/7.8/models/tpl/tpl-vad-lvcsr.md#tpl-vad-lvcsr-type)

## Operation

```mermaid
flowchart TD
  start((start))
  slotCheck0{slot == 0?}

  start --> slotCheck0
  slotCheck0 -->|yes| startWW
  slotCheck0 -->|no| fetch0

  subgraph slot0[<b>slot 0</b> &lpar;phrasespot&rpar;]
    startWW((start))
    fetchWW[/samples from ->audio-pcm/]
    audioWW(^sample-count)
    processWW[process]
    result(0.^result)
    stopWW((stop))
    startWW --> fetchWW
    fetchWW --> audioWW
    audioWW --> processWW
    processWW --> fetchWW
    processWW -->|recognize| result
    result --> stopWW
  end

  subgraph slot1[<b>slot 1</b> &lpar;lvcsr&rpar;]
    startSTT((start))
    startSTTfinal((start))
    stopSTT((stop))
    stopSTTpartial((stop))
    processSTT[process]
    partialSTT(^result-partial)
    intentSTT(^nlu-intent)
    slotSTT(^nlu-slot)
    resultSTT(^result)
    nluSTT{NLU<br>match?}

    slmSTT{SLM<br>included?}
    generateSTT[generate]
    slmstartSTT(^slm-start)
    slmresultpartialSTT(^slm-result-partial)
    slmresultSTT(^slm-result)

    startSTT --> processSTT
    processSTT ---->|hypothesis| partialSTT
    partialSTT --> stopSTTpartial

    startSTTfinal --> nluSTT
    nluSTT -->|yes| intentSTT
    nluSTT -->|no| resultSTT
    intentSTT --> slotSTT
    slotSTT --> resultSTT
    slotSTT -->|more| intentSTT

    resultSTT --> slmSTT
    slmSTT -->|yes| slmstartSTT
    slmSTT -->|no| stopSTT
    slmstartSTT -->|OK| generateSTT
    slmstartSTT -->|STOP| stopSTT
    generateSTT -->|response| slmresultpartialSTT
    slmresultpartialSTT --> generateSTT
    generateSTT -->|done| slmresultSTT
    slmresultSTT --> stopSTT
  end

  listenBegin(^listen-begin)
  listenEnd(^listen-end)

  stopWW --> listenBegin
  listenBegin --> fetch0

  fetch0[/samples from ->audio-pcm/]
  fetch1[/samples from ->audio-pcm/]
  audio0(^sample-count)
  audio1(^sample-count)

  silence(^silence)
  begin(^begin)
  END(^end)
  limit(^limit)

  process0[VAD process]
  process1[VAD process]

  final@{ shape: f-circ }

  slotCheck1{slot == 0?}

  fetch0 --> audio0
  audio0 --> process0
  process0 --> fetch0
  process0 -->|speech start| begin
  process0 -->|timeout| silence
  silence ~~~ final
  silence --> slotCheck1

  begin --> fetch1
  fetch1 --> audio1
  audio1 --> process1

  process1 --> startSTT
  stopSTTpartial --> fetch1

  process1 -->|speech end| END
  process1 -->|speech limit| limit
  END --> final
  limit --> final

  final --> startSTTfinal
  stopSTT --> slotCheck1

  slotCheck1 -->|no| fetch0
  slotCheck1 -->|yes| listenEnd
  listenEnd --> startWW
```

Operation flow.

1. Read audio data from [->audio-pcm](https://doc.sensory.com/tnl/7.8/api/setting-keys/runtime.md#-audio-pcm).
2. Invoke [^sample-count](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#sample-count-event).
3. If processing does not detect a wake word, continue at step 1.
4. Invoke [0.^result](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#result) for the wake word.
5. Invoke [^listen-begin](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#listen-begin) and start VAD processing.
6. Read audio data from [->audio-pcm](https://doc.sensory.com/tnl/7.8/api/setting-keys/runtime.md#-audio-pcm).
7. Invoke [^sample-count](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#sample-count-event).
8. If VAD processing does not detect the start of speech within the [leading-silence](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#leading-silence) timeout, invoke [^silence](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#silence) and continue at step 15.
9. Invoke [^begin](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#begin) if processing detects the start of speech, else continue at step 6.
10. Read audio data from [->audio-pcm](https://doc.sensory.com/tnl/7.8/api/setting-keys/runtime.md#-audio-pcm).
11. Invoke [^sample-count](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#sample-count-event).
12. If VAD processing detects an endpoint invoke either [^limit](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#limit) or [^end](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#end) and continue at step 14.
13. Process VAD-segmented audio in the LVCSR or STT recognizer.
    * Invoke [^result-partial](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#result-partial) with interim recognition result hypothesis.
    * Continue at step 10.
14. Produce a final LVCSR or STT recognition hypothesis.
    * Invoke [^nlu-intent](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#nlu-intent) and [^nlu-slot](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#nlu-slot) for each NLU intent found.
    * Invoke [^result](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#result) with the final recognition hypothesis.
    * If there's no SLM, continue at step 15.
    * Invoke [^slm-start](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#slm-start). If the callback returns [STOP](https://doc.sensory.com/tnl/7.8/api/inference.md#rc_stop), continue at step 15.
    * Generate SLM result, invoking [^slm-result-partial](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#slm-result-partial) on each generated token.
    * Invoke [^slm-result](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#slm-result) with complete SLM result.
15. If `slot == 0`, invoke [^listen-end](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#listen-end) and start listening for the wake word again at step 1. Otherwise continue at step 6, as shown in the flowchart.

Register callback handlers with [setHandler](https://doc.sensory.com/tnl/7.8/api/inference.md#sethandler) only for those events you're interested in.
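The numbered steps above can be modelled as a small state machine. The sketch below is a toy simulation in plain Python, not SDK code: the function name `simulate`, the frame labels (`"wake-word"`, `"speech"`, `"timeout"`, `"end"`, `"limit"`), and the choice of one partial result per speech frame are all assumptions made for illustration. It omits the NLU and SLM events and treats `slot` as fixed for the whole run.

```python
from typing import Iterable, List


def simulate(frames: Iterable[str], slot: int = 0) -> List[str]:
    """Return event names in the order a handler would see them (toy model)."""
    events: List[str] = []
    # slot == 0 starts at the wake word, any other value starts at the VAD.
    state = "wake" if slot == 0 else "vad-wait"

    def finish() -> str:
        # Step 15: with slot == 0 return to the wake word, else keep using the VAD.
        if slot == 0:
            events.append("^listen-end")
            return "wake"
        return "vad-wait"

    for frame in frames:
        events.append("^sample-count")                    # steps 2, 7, 11
        if state == "wake":
            if frame == "wake-word":
                events += ["0.^result", "^listen-begin"]  # steps 4 and 5
                state = "vad-wait"
        elif state == "vad-wait":
            if frame == "timeout":
                events.append("^silence")                 # step 8
                state = finish()
            elif frame == "speech":
                events.append("^begin")                   # step 9
                state = "vad-speech"
        else:  # state == "vad-speech"
            if frame in ("end", "limit"):
                events.append("^" + frame)                # step 12
                events.append("^result")                  # step 14, NLU and SLM omitted
                state = finish()
            else:
                events.append("^result-partial")          # step 13


    return events


if __name__ == "__main__":
    for name in simulate(["noise", "wake-word", "speech", "speech", "end"]):
        print(name)
```

Running the same frames with `slot=1` skips the wake word stage, so the model emits neither `0.^result` nor the `^listen-begin` and `^listen-end` events.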

## Settings

**Available events:** [^begin](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#begin), [^end](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#end), [^limit](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#limit), [^listen-begin](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#listen-begin), [^listen-end](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#listen-end), [^nlu-intent](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#nlu-intent), [^nlu-slot](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#nlu-slot), [^result](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#result), [^result-partial](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#result-partial), [^sample-count](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#sample-count-event), [^silence](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#silence), [^slm-result](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#slm-result), [^slm-result-partial](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#slm-result-partial), [^slm-start](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#slm-start)

**Available iterators:** [operating-point-iterator](https://doc.sensory.com/tnl/7.8/api/setting-keys/iterators.md#operating-point-iterator), [vocab-iterator](https://doc.sensory.com/tnl/7.8/api/setting-keys/iterators.md#vocab-iterator)

**Available results:** [audio-stream](https://doc.sensory.com/tnl/7.8/api/setting-keys/results.md#audio-stream), [audio-stream-first](https://doc.sensory.com/tnl/7.8/api/setting-keys/results.md#audio-stream-first), [audio-stream-last](https://doc.sensory.com/tnl/7.8/api/setting-keys/results.md#audio-stream-last)

**Available runtime settings:** [->audio-pcm](https://doc.sensory.com/tnl/7.8/api/setting-keys/runtime.md#-audio-pcm), [audio-stream-from](https://doc.sensory.com/tnl/7.8/api/setting-keys/runtime.md#audio-stream-from), [audio-stream-to](https://doc.sensory.com/tnl/7.8/api/setting-keys/runtime.md#audio-stream-to)

**Available configuration settings:** [audio-stream-size](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#audio-stream-size), [backlog-interval](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#backlog-interval), [backoff](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#backoff), [custom-vocab](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#custom-vocab), [delay](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#delay), [duration-ms](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#duration-ms), [hold-over](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#hold-over), [include-leading-silence](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#include-leading-silence), [include-wake-word-audio](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#include-wake-word-audio), [leading-silence](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#leading-silence), [low-fr-operating-point](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#low-fr-operating-point), [max-recording](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#max-recording), [operating-point](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#operating-point), [partial-result-interval](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#partial-result-interval), [samples-per-second](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#samples-per-second), [slot](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#slot), [stt-profile](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#stt-profile), [sv-threshold](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#sv-threshold), [wake-word-at-end](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#wake-word-at-end)

**Available values:** [lvcsr](https://doc.sensory.com/tnl/7.8/api/setting-keys/values.md#lvcsr), [phrasespot](https://doc.sensory.com/tnl/7.8/api/setting-keys/values.md#phrasespot)

**Also see these related items:** [live-spot.c](https://doc.sensory.com/tnl/7.8/api/sample/c/live-spot.md#live-spot-code), [snsr-eval.c](https://doc.sensory.com/tnl/7.8/api/sample/c/snsr-eval.md#snsr-eval-code), [PhraseSpot.java](https://doc.sensory.com/tnl/7.8/api/sample/android/enroll-trigger.md#et-code)

## Notes

Use this template for command-and-control applications that require a wake word
to initiate commands in some contexts but not in others.

Set [slot](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#slot)`= 1` in the [^result](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#result) handler, and [slot](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#slot)`= 0` in the
[^silence](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#silence) handler. With this configuration the recognizer requires a wake word to start
listening only for the first in a series of interactions. After this it will revert to requiring
a wake word only if the user does not say anything for at least [leading-silence](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#leading-silence) ms.
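The effect of this handler policy can be traced with a toy model. The sketch below is plain Python and does not call the SDK: `SlotGate`, `on_result`, `on_silence`, and `wake_word_required` are hypothetical names, and the `slot` attribute merely stands in for the session's [slot](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#slot) setting. In a real application the two methods would be the bodies of the `^result` and `^silence` handlers.

```python
from typing import Iterable, List


class SlotGate:
    """Toy stand-in for a session's `slot` setting plus the two handlers."""

    def __init__(self) -> None:
        self.slot = 0  # start by requiring the wake word

    def on_result(self) -> None:
        # A command was recognized: skip the wake word for the next one.
        self.slot = 1

    def on_silence(self) -> None:
        # Leading-silence timeout: require the wake word again.
        self.slot = 0


def wake_word_required(outcomes: Iterable[str]) -> List[bool]:
    """For each interaction, report whether it had to start with the wake word."""
    gate = SlotGate()
    required: List[bool] = []
    for outcome in outcomes:
        required.append(gate.slot == 0)
        if outcome == "result":
            gate.on_result()
        elif outcome == "silence":
            gate.on_silence()
        else:
            raise ValueError(f"unknown outcome: {outcome!r}")
    return required


if __name__ == "__main__":
    # Two commands, then a pause, then another command.
    print(wake_word_required(["result", "result", "silence", "result"]))
```

In this model only the first command and the command after the pause need the wake word.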

VAD settings [backoff](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#backoff), [hold-over](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#hold-over), [leading-silence](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#leading-silence), [max-recording](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#max-recording), and [trailing-silence](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#trailing-silence)
apply to both slot 0 and slot 1, but [include-leading-silence](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#include-leading-silence) applies only to slot 0.

Set [include-wake-word-audio](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#include-wake-word-audio)` = 1` to include the wake word audio in the
samples passed to the LVCSR or STT recognizer. STT hypotheses do not include the wake word
text unless Sensory specifically configured the model to do so.

The [^result-partial](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#result-partial) and [^result](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#result) events are for the LVCSR or STT recognizer
in slot 1. If you need direct access to the wake word result, prefix the event
with the slot path: `0.^result`. Use the slot prefix to read values in the `0.^result` event handler too. For example, call [getString](https://doc.sensory.com/tnl/7.8/api/inference.md#getters) with key [0.text](https://doc.sensory.com/tnl/7.8/api/setting-keys/results.md#text) to read the wake word transcription.
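Building the prefixed keys in one place avoids typos when several handlers need them. The helper below is a minimal sketch in plain Python, not part of the SDK: `slot_key` is a hypothetical name, and it only reproduces the `<slot>.<key>` pattern shown above for `0.^result` and `0.text`. Whether a given key is valid for a given slot depends on the model.

```python
def slot_key(slot: int, key: str) -> str:
    """Prefix an event or result key with its slot path, e.g. 0 and "text" give "0.text"."""
    if slot not in (0, 1):
        raise ValueError("this template has only slots 0 and 1")
    if not key:
        raise ValueError("key must not be empty")
    return f"{slot}.{key}"


if __name__ == "__main__":
    print(slot_key(0, "^result"))
    print(slot_key(0, "text"))
```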

## Examples

### Select wake-word or VAD-only behavior

```console
% cd $HOME/Sensory/TrulyNaturalSDK/7.9.0-pre.0

% bin/snsr-edit -o opt-vg-stt.snsr\
    -t model/tpl-opt-spot-vad-lvcsr-1.28.0.snsr\
    -f 0 model/spot-voicegenie-enUS-6.5.1-m.snsr\
    -f 1 model/stt-enUS-automotive-medium-2.3.15-pnc.snsr\
    -s include-wake-word-audio=1

# Say "Voice genie, open the sunroof."
% bin/snsr-eval -vt opt-vg-stt.snsr
Using live audio from default capture device. ^C to stop.
P  33010  33490 (0.3201) Open the sun
P  33050  33890 (0.7712) Open the sunroof
 32010  34185 [^end] VAD speech region.
NLU intent: open_window (0.9956) = open the sunroof
NLU entity:   roof (0.9595) = sunroof
 33050  33890 (0.5731) Open the sunroof.
^C

# Select the VAD-only path with slot=1
# Say "Close all the windows"
% bin/snsr-eval -vt opt-vg-stt.snsr -s slot=1
Using live audio from default capture device. ^C to stop.
P   2150   2670 (0.257) Clothes. All
P   2190   3150 (0.7631) Close. All the wind
P   2190   3430 (0.9899) Close all the windows
  1950   3855 [^end] VAD speech region.
NLU intent: close_window (0.9977) = close all the windows
  2190   3470 (0.9244) Close all the windows.
^C
```
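To post-process a transcript like the one above, the hypothesis lines can be picked out with a regular expression. The sketch below is plain Python and assumes the line layout seen in this example only (an optional `P` marker for partial results, two integers, a score in parentheses, then the text). `Hypothesis` and `parse_hypothesis` are hypothetical names, and the output format of `snsr-eval` may differ between versions and options.

```python
import re
from typing import NamedTuple, Optional


class Hypothesis(NamedTuple):
    partial: bool  # True for lines that start with "P"
    start: int     # first number on the line, as printed
    end: int       # second number on the line, as printed
    score: float   # value in parentheses
    text: str      # recognized text


# Optional "P", two integers, "(score)", then the hypothesis text.
_HYP = re.compile(r"^(P)?\s*(\d+)\s+(\d+)\s+\(([0-9.eE+-]+)\)\s+(.*)$")


def parse_hypothesis(line: str) -> Optional[Hypothesis]:
    """Parse one hypothesis line, or return None for any other kind of line."""
    m = _HYP.match(line)
    if m is None:
        return None
    return Hypothesis(
        partial=m.group(1) is not None,
        start=int(m.group(2)),
        end=int(m.group(3)),
        score=float(m.group(4)),
        text=m.group(5),
    )


if __name__ == "__main__":
    for sample in (
        "P  33010  33490 (0.3201) Open the sun",
        " 32010  34185 [^end] VAD speech region.",
        " 33050  33890 (0.5731) Open the sunroof.",
    ):
        print(parse_hypothesis(sample))
```

Event lines such as `[^end]` and the `NLU` lines do not match the pattern, so the function returns `None` for them.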

### Use trailing wake-word

Recognize a phrase with the wake word at either end of an utterance.

```console
% cd $HOME/Sensory/TrulyNaturalSDK/7.9.0-pre.0

% bin/snsr-edit -o opt-vg-stt-vg.snsr\
    -t model/tpl-opt-spot-vad-lvcsr-1.28.0.snsr\
    -f 0 model/spot-voicegenie-enUS-6.5.1-m.snsr\
    -f 1 model/stt-enUS-automotive-medium-2.3.15-pnc.snsr\
    -s include-wake-word-audio=1\
    -s wake-word-at-end=1

# Say "Voice genie, set the radio to 91.5 FM."
% bin/snsr-eval -vt opt-vg-stt-vg.snsr
Using live audio from default capture device. ^C to stop.
P   4360   5000 (0.2927) Set. The radio
P   4400   5280 (5.7e-07) Set the radio to n
P   4400   5760 (0.7336) Set the radio to ninety-one
P   4400   6120 (0.6005) Set the radio to ninety one point
P   4400   6440 (0.5195) Set the radio to ninety one point. Five
P   4400   6480 (0.6733) Set the radio to ninety one point. Five
  3405   7455 [^end] VAD speech region.
NLU intent: set_radio (0.9674) = set the radio to 91.5 FM
NLU entity:   radio_station (0.9688) = 91.5 FM
  4400   7080 (0.3896) Set the radio to ninety one point. Five F. M.

# Say "Will it rain in Portland tomorrow, Voice Genie?"
 15225  17490 [^end] VAD speech region.
NLU intent: no_command (0.9977) = will it rain in portland tomorrow
NLU entity:   time (0.9773) = tomorrow
 15460  17260 (0.6731) Will it rain in Portland tomorrow?
^C
```

<!-- Abbreviation definitions from includes/abbreviations.md -->
*[API]: Application Programming Interface
*[FR]: False Reject: the recognizer did not trigger when the target phrase was spoken
*[LVCSR]: Large Vocabulary Continuous Speech Recognition model, feed-forward neural net acoustic model with FST decoder
*[NLU]: Natural Language Understanding model
*[SLM]: Generative Small Language Model
*[STT]: Speech To Text: transformers with language model and CTC decoding
*[TNL]: TrulyNatural, Sensory's large-vocabulary speech recognition technology
*[VAD]: Voice Activity Detector