---
source_path: "models/tpl/tpl-spot-vad.md"
canonical_url: "https://doc.sensory.com/tnl/7.8/models/tpl/tpl-spot-vad/"
---

# tpl-spot-vad

This [template](https://doc.sensory.com/tnl/7.8/models/tpl/index.md#template-type) runs the [wake word](https://doc.sensory.com/tnl/7.8/models/types/wake-word.md#wake-word-type) in slot [0](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#0)
until it detects, then does start- and endpoint detection with a [VAD](https://doc.sensory.com/tnl/7.8/models/types/vad.md#vad-type)
on the audio stream following the wake word.

`tpl-spot-vad` has [task-type](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#task-type)` == `[phrasespot-vad](https://doc.sensory.com/tnl/7.8/api/setting-keys/values.md#phrasespot-vad).

Expected [task types](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#task-type):

* **Slot 0:** [phrasespot](https://doc.sensory.com/tnl/7.8/api/setting-keys/values.md#phrasespot)

**Also see these related items:** [tpl-spot-vad-3.13.0.snsr](https://doc.sensory.com/tnl/7.8/models/index.md#tpl-spot-vad)

## Operation

```mermaid
flowchart TD
  start((start))
  start --> startWW

  subgraph slot0[<b>slot 0</b> &lpar;phrasespot&rpar;]
    startWW((start))
    fetchWW[/samples from ->audio-pcm/]
    audioWW(^sample-count)
    processWW[process]
    result(^result)
    stopWW((stop))
    startWW --> fetchWW
    fetchWW --> audioWW
    audioWW --> processWW
    processWW --> fetchWW
    processWW -->|recognize| result
    result --> stopWW
  end

  listenBegin(^listen-begin)
  listenEnd(^listen-end)

  stopWW --> listenBegin
  listenBegin --> fetch0

  fetch0[/samples from ->audio-pcm/]
  fetch1[/samples from ->audio-pcm/]
  audio0(^sample-count)
  audio1(^sample-count)

  silence(^silence)
  begin(^begin)
  END(^end)
  limit(^limit)

  process0[process]
  process1[process]
  out[\samples to <-audio-pcm\]

  final@{ shape: f-circ }

  fetch0 --> audio0
  audio0 --> process0
  process0 --> fetch0
  process0 -->|speech start| begin
  process0 -->|timeout| silence
  silence --> final

  begin --> fetch1
  fetch1 --> audio1
  audio1 --> out
  out --> process1
  process1 --> fetch1
  process1 -->|speech end| END
  process1 -->|speech limit| limit
  END --> final
  limit --> final

  final --> listenEnd
  listenEnd --> startWW
```

Operation flow.

1. Read audio data from [->audio-pcm](https://doc.sensory.com/tnl/7.8/api/setting-keys/runtime.md#-audio-pcm).
2. Invoke [^sample-count](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#sample-count-event).
3. If processing detects a vocabulary phrase, skip to step 5.
4. Continue processing until [STREAM_END](https://doc.sensory.com/tnl/7.8/api/inference.md#rc_stream_end) occurs on [->audio-pcm](https://doc.sensory.com/tnl/7.8/api/setting-keys/runtime.md#-audio-pcm),
   or one of the event handlers returns a code other than [OK](https://doc.sensory.com/tnl/7.8/api/inference.md#rc_ok).
5. Invoke [^result](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#result)
6. Invoke [^listen-begin](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#listen-begin)
7. Read audio data from [->audio-pcm](https://doc.sensory.com/tnl/7.8/api/setting-keys/runtime.md#-audio-pcm).
8. Invoke [^sample-count](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#sample-count-event).
9. If speech detected within [leading-silence](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#leading-silence) ms continue at step 12.
10. If _no_ speech detected within [leading-silence](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#leading-silence) ms, invoke [^silence](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#silence)
   and skip to step 19.
11. Continue processing at step 7 until [STREAM_END](https://doc.sensory.com/tnl/7.8/api/inference.md#rc_stream_end).
12. Invoke [^begin](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#begin).
13. Read audio data from [->audio-pcm](https://doc.sensory.com/tnl/7.8/api/setting-keys/runtime.md#-audio-pcm).
14. Invoke [^sample-count](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#sample-count-event).
15. If [pass-through](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#pass-through) `== 1` write speech samples to [<-audio-pcm](https://doc.sensory.com/tnl/7.8/api/setting-keys/runtime.md#audio-pcm-out).
16. If end detected within [max-recording](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#max-recording) ms, invoke [^end](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#end)
    and skip to step 19.
17. If end _not_ detected within [max-recording](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#max-recording) ms, invoke [^limit](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#limit)
    and skip to step 19.
18. Continue processing at step 13 until [STREAM_END](https://doc.sensory.com/tnl/7.8/api/inference.md#rc_stream_end).
19. Invoke [^listen-end](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#listen-end)
20. Restart at step 1.

Register callback handlers with [setHandler](https://doc.sensory.com/tnl/7.8/api/inference.md#sethandler) only for those events you're interested in.

## Settings

**Available events:** [^begin](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#begin), [^end](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#end), [^limit](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#limit), [^listen-begin](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#listen-begin), [^listen-end](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#listen-end), [^result](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#result), [^sample-count](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#sample-count-event), [^silence](https://doc.sensory.com/tnl/7.8/api/setting-keys/events.md#silence)

**Available iterators:** [operating-point-iterator](https://doc.sensory.com/tnl/7.8/api/setting-keys/iterators.md#operating-point-iterator), [vocab-iterator](https://doc.sensory.com/tnl/7.8/api/setting-keys/iterators.md#vocab-iterator)

**Available results:** [audio-stream](https://doc.sensory.com/tnl/7.8/api/setting-keys/results.md#audio-stream), [audio-stream-first](https://doc.sensory.com/tnl/7.8/api/setting-keys/results.md#audio-stream-first), [audio-stream-last](https://doc.sensory.com/tnl/7.8/api/setting-keys/results.md#audio-stream-last)

**Available runtime settings:** [->audio-pcm](https://doc.sensory.com/tnl/7.8/api/setting-keys/runtime.md#-audio-pcm), [<-audio-pcm](https://doc.sensory.com/tnl/7.8/api/setting-keys/runtime.md#audio-pcm-out), [audio-stream-from](https://doc.sensory.com/tnl/7.8/api/setting-keys/runtime.md#audio-stream-from), [audio-stream-to](https://doc.sensory.com/tnl/7.8/api/setting-keys/runtime.md#audio-stream-to), [dsp-acmodel-stream](https://doc.sensory.com/tnl/7.8/api/setting-keys/runtime.md#dsp-acmodel-stream), [dsp-header-stream](https://doc.sensory.com/tnl/7.8/api/setting-keys/runtime.md#dsp-header-stream), [dsp-search-stream](https://doc.sensory.com/tnl/7.8/api/setting-keys/runtime.md#dsp-search-stream), [skip-to-ms](https://doc.sensory.com/tnl/7.8/api/setting-keys/runtime.md#skip-to-ms), [skip-to-sample](https://doc.sensory.com/tnl/7.8/api/setting-keys/runtime.md#skip-to-sample)

**Available configuration settings:** [audio-stream-size](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#audio-stream-size), [backoff](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#backoff), [delay](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#delay), [dsp-target](https://doc.sensory.com/tnl/7.8/api/setting-keys/runtime.md#dsp-target), [duration-ms](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#duration-ms), [hold-over](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#hold-over), [include-leading-silence](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#include-leading-silence), [include-wake-word-audio](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#include-wake-word-audio), [leading-silence](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#leading-silence), [listen-window](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#listen-window), [low-fr-operating-point](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#low-fr-operating-point), [max-recording](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#max-recording), [operating-point](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#operating-point), [pass-through](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#pass-through), [samples-per-second](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#samples-per-second), [sv-threshold](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#sv-threshold), [wake-word-at-end](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#wake-word-at-end)

**Available values:** [phrasespot-vad](https://doc.sensory.com/tnl/7.8/api/setting-keys/values.md#phrasespot-vad)

**Also see these related items:** [live-segment.c](https://doc.sensory.com/tnl/7.8/api/sample/c/live-segment.md#live-segment-code), [snsr-eval.c](https://doc.sensory.com/tnl/7.8/api/sample/c/snsr-eval.md#snsr-eval-code), [segmentSpottedAudio.java](https://doc.sensory.com/tnl/7.8/api/sample/java/segmentSpottedAudio.md#segmentspottedaudio-code)

## Notes

Use this for wake-word gated audio sent to cloud engines.

Set [include-wake-word-audio](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#include-wake-word-audio) `= 1` to include the wake word audio in the
VAD audio output stream.

This template writes the VAD-segmented audio to [<-audio-pcm](https://doc.sensory.com/tnl/7.8/api/setting-keys/runtime.md#audio-pcm-out).
If your application does not use this, set [pass-through](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#pass-through) `= 0`.

## Examples

```console
% cd $HOME/Sensory/TrulyNaturalSDK/7.9.0-pre.0

% bin/snsr-edit -o vg-vad.snsr\
    -t model/tpl-spot-vad-3.13.0.snsr\
    -f 0 model/spot-voicegenie-enUS-6.5.1-m.snsr\
    -s include-wake-word-audio=1

# Say "Voice genie, what's the capital of Oregon?"
% bin/snsr-eval -o vad-audio.wav -vvt vg-vad.snsr
Using live audio from default capture device. ^C to stop.
Using operating point 8.
Available operating points: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21.
Available vocabulary:
  1: "voicegenie"
phrase:
  1950   2550 (1) voicegenie
words:
  1950   2550 (1) voicegenie

  2730 [^listen-begin]
  2730 [^begin]
  1650   4200 [^end] VAD speech region.
  4980 [^listen-end]
^C
```

Review _vad-audio.wav_: the recording starts [backoff](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#backoff) ms before the
the beginning of "voice genie" and continues until [hold-over](https://doc.sensory.com/tnl/7.8/api/setting-keys/configuration.md#hold-over) ms after the end
of the utterance.

<!-- Abbreviation definitions from includes/abbreviations.md -->
*[API]: Application Programming Interface
*[FR]: False Reject: the recognizer did not trigger when the target phrase was spoken
*[TNL]: TrulyNatural, Sensory's large-vocabulary speech recognition technology
*[VAD]: Voice Activity Detector
