---
source_path: "models/tpl/tpl-vad-lvcsr.md"
canonical_url: "https://doc.sensory.com/tnl/7.8/models/tpl/tpl-vad-lvcsr/"
---

# tpl-vad-lvcsr _(TrulyNatural only)_

This [template](https://doc.sensory.com/tnl/7.8/models/tpl/index.md#template-type) detects speech with a [VAD](https://doc.sensory.com/tnl/7.8/models/types/vad.md#vad-type)
and sends the segmented audio to the [LVCSR](https://doc.sensory.com/tnl/7.8/models/types/lvcsr.md#lvcsr-type) or [STT](https://doc.sensory.com/tnl/7.8/models/types/stt.md#stt-type)
recognizer in slot [0](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#0).

`tpl-vad-lvcsr` has [task-type](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#task-type)` == `[phrasespot](https://doc.sensory.com/tnl/7.8/api/setting-keys/values.md#phrasespot).

Expected [task types](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#task-type):

* **Slot 0:** [lvcsr](https://doc.sensory.com/tnl/7.8/api/setting-keys/values.md#lvcsr)

**Also see these related items:** [tpl-vad-lvcsr-3.17.0.snsr](https://doc.sensory.com/tnl/7.8/models/index.md#tpl-vad-lvcsr), [tpl-opt-spot-vad-lvcsr](https://doc.sensory.com/tnl/7.8/models/tpl/tpl-opt-spot-vad-lvcsr.md#tpl-opt-spot-vad-lvcsr-type)

## Operation

```mermaid
flowchart TD
  start((start))
  start --> fetch0

  subgraph slot0[<b>slot 0</b> &lpar;lvcsr&rpar;]
    startSTT((start))
    startSTTfinal((start))
    stopSTT((stop))
    stopSTTpartial((stop))
    processSTT[process]
    partialSTT(^result-partial)
    intentSTT(^nlu-intent)
    slotSTT(^nlu-slot)
    resultSTT(^result)
    nluSTT{NLU<br>match?}

    slmSTT{SLM<br>included?}
    generateSTT[generate]
    slmstartSTT(^slm-start)
    slmresultpartialSTT(^slm-result-partial)
    slmresultSTT(^slm-result)

    startSTT --> processSTT
    processSTT ---->|hypothesis| partialSTT
    partialSTT --> stopSTTpartial

    startSTTfinal --> nluSTT
    nluSTT -->|yes| intentSTT
    nluSTT -->|no| resultSTT
    intentSTT --> slotSTT
    slotSTT --> resultSTT
    slotSTT -->|more| intentSTT

    resultSTT --> slmSTT
    slmSTT -->|yes| slmstartSTT
    slmSTT -->|no| stopSTT
    slmstartSTT -->|OK| generateSTT
    slmstartSTT -->|STOP| stopSTT
    generateSTT -->|response| slmresultpartialSTT
    slmresultpartialSTT --> generateSTT
    generateSTT -->|done| slmresultSTT
    slmresultSTT --> stopSTT
  end

  fetch0[/samples from ->audio-pcm/]
  fetch1[/samples from ->audio-pcm/]
  audio0(^sample-count)
  audio1(^sample-count)

  silence(^silence)
  begin(^begin)
  END(^end)
  limit(^limit)

  process0[VAD process]
  process1[VAD process]

  final@{ shape: f-circ }
  listenEnd@{ shape: f-circ }

  fetch0 --> audio0
  audio0 --> process0
  process0 --> fetch0
  process0 -->|speech start| begin
  process0 -->|timeout| silence
  silence ~~~ final
  silence --> listenEnd

  begin --> fetch1
  fetch1 --> audio1
  audio1 --> process1

  process1 --> startSTT
  stopSTTpartial --> fetch1

  process1 -->|speech end| END
  process1 -->|speech limit| limit
  END --> final
  limit --> final

  final --> startSTTfinal
  stopSTT --> listenEnd
  listenEnd ----> fetch0
```

Operation flow.

1. Read audio data from [->audio-pcm](https://doc.sensory.com/tnl/7.8/api/setting-keys/runtime.md#-audio-pcm).
2. Invoke [^sample-count](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#sample-count-event).
3. If VAD processing does not detect the start of speech within the [leading-silence](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#leading-silence) timeout, invoke [^silence](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#silence) and continue at step 1.
4. Invoke [^begin](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#begin) if processing detects the start of speech, else continue at step 1.
5. Read audio date from [->audio-pcm](https://doc.sensory.com/tnl/7.8/api/setting-keys/runtime.md#-audio-pcm).
6. Invoke [^sample-count](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#sample-count-event).
7. If VAD processing detects an endpoint invoke either [^limit](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#limit) or [^end](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#end) and continue at step 9.
8. Process VAD segmented audio in the LVCSR or STT recognizer
    * Invoke [^result-partial](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#result-partial) with interim recognition result hypothesis.
    * Continue at step 5.
9. Produce a final LVCSR or STT recognition hypothesis.
    * Invoke [^nlu-intent](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#nlu-intent) and [^nlu-slot](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#nlu-slot) for each NLU intent found.
    * Invoke [^result](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#result) with the final recognition hypothesis.
    * If there's no SLM, continue at step 1.
    * Invoke [^slm-start](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#slm-start), if the callback returns [STOP](https://doc.sensory.com/tnl/7.8/api/inference.md#rc_stop), continue at step 1.
    * Generate SLM result, invoking [^slm-result-partial](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#slm-result-partial) on each generated token.
    * Invoke [^slm-result](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#slm-result) with complete SLM result.
    * Continue at step 1.

Register callback handlers with [setHandler](https://doc.sensory.com/tnl/7.8/api/inference.md#sethandler) only for those events you're interested in.

## Settings

**Available events:** [^begin](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#begin), [^end](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#end), [^limit](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#limit), [^nlu-intent](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#nlu-intent), [^nlu-slot](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#nlu-slot), [^result](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#result), [^result-partial](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#result-partial), [^sample-count](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#sample-count-event), [^silence](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#silence), [^slm-result](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#slm-result), [^slm-result-partial](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#slm-result-partial), [^slm-start](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#slm-start)

**Available iterators:** _none_

**Available results:** [audio-stream](https://doc.sensory.com/tnl/7.8/api/setting-keys/results.md#audio-stream), [audio-stream-first](https://doc.sensory.com/tnl/7.8/api/setting-keys/results.md#audio-stream-first), [audio-stream-last](https://doc.sensory.com/tnl/7.8/api/setting-keys/results.md#audio-stream-last)

**Available runtime settings:** [->audio-pcm](https://doc.sensory.com/tnl/7.8/api/setting-keys/runtime.md#-audio-pcm), [audio-stream-from](https://doc.sensory.com/tnl/7.8/api/setting-keys/runtime.md#audio-stream-from), [audio-stream-to](https://doc.sensory.com/tnl/7.8/api/setting-keys/runtime.md#audio-stream-to)

**Available configuration settings:** [audio-stream-size](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#audio-stream-size), [audio-stream-size](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#audio-stream-size), [backoff](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#backoff), [custom-vocab](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#custom-vocab), [hold-over](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#hold-over), [include-leading-silence](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#include-leading-silence), [leading-silence](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#leading-silence), [max-recording](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#max-recording), [partial-result-interval](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#partial-result-interval), [samples-per-second](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#samples-per-second), [stt-profile](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#stt-profile)

**Available values:** [lvcsr](https://doc.sensory.com/tnl/7.8/api/setting-keys/values.md#lvcsr), [phrasespot](https://doc.sensory.com/tnl/7.8/api/setting-keys/values.md#phrasespot)

**Also see these related items:** [live-spot.c](https://doc.sensory.com/tnl/7.8/api/sample/c/live-spot.md#live-spot-code), [snsr-eval.c](https://doc.sensory.com/tnl/7.8/api/sample/c/snsr-eval.md#snsr-eval-code), [PhraseSpot.java](https://doc.sensory.com/tnl/7.8/api/sample/android/enroll-trigger.md#et-code)

## Notes

Use this template for command and control type applications where commands are initiated
just by speaking.

## Examples

```console
% cd $HOME/Sensory/TrulyNaturalSDK/7.9.0-pre.0

% bin/snsr-edit -o vad-stt.snsr\
    -t model/tpl-vad-lvcsr-3.17.0.snsr\
    -f 0 model/stt-enUS-automotive-medium-2.3.15-pnc.snsr

# Say, for example: "Turn the air conditioning up all the way"
% snsr-eval -t vad-stt.snsr
P   1000   1040 T
P   1000   1600 Turn the egg
P   1040   2040 Turn the air conditioner
P   1040   2320 Turn the air conditioning up
P   1040   2760 Turn the air conditioning up all the way
NLU intent: set_fan (0.9547) = turn the air conditioning up 100%
NLU entity:   hvac (0.9744) = air conditioning
NLU entity:   percentage_value (0.8963) = 100%
  1040   2880 Turn the air conditioning up all the way.
^C
```

<!-- Abbreviation definitions from includes/abbreviations.md -->
*[API]: Application Programming Interface
*[LVCSR]: Large Vocabulary Continuous Speech Recognition model, feed-forward neural net acoustic model with FST decoder
*[NLU]: Natural Language Understanding model
*[SLM]: Generative Small Language Model
*[STT]: Speech To Text: transformers with language model and CTC decoding
*[TNL]: TrulyNatural, Sensory's large-vocabulary speech recognition technology
*[VAD]: Voice Activity Detector