---
source_path: "tools/snsr-eval-batch.md"
canonical_url: "https://doc.sensory.com/tnl/7.8/tools/snsr-eval-batch/"
---

# snsr-eval-batch

This tool runs a [Wake word](https://doc.sensory.com/tnl/7.8/models/types/wake-word.md#wake-word-type), [LVCSR](https://doc.sensory.com/tnl/7.8/models/types/lvcsr.md#lvcsr-type) or [STT](https://doc.sensory.com/tnl/7.8/models/types/stt.md#stt-type)
model over a (typically large) number of audio files to measure its performance in
terms of the false accept (FA) rate and the false reject (FR) ratio. It can also
measure command substitution, or word-error-rate (WER) for LVCSR and STT models.

## Test data requirements

Audio files used for FR (in-vocabulary) testing:

- Must contain a single target phrase utterance per file.
- Must contain lead-in ambient audio before the target phrase begins.
  + In most cases one second of ambient audio will suffice.
  + For custom spotters, refer to the documentation delivered with the
    model for the exact requirements.
  + Most models created after May 2020 include a setting,
    `min-in-vocab-duration`, which specifies the minimum required lead-in
    time in milliseconds.
    You can query this with `snsr-edit -t model.snsr -q min-in-vocab-duration`.
  + Recognition events that happen during the required lead-in time are
    counted as errors. See `INVFA` in the log file format table below.
  + You can override the minimum lead-in requirement on the command-line
    (with `-s min-in-vocab-duration=0`), but doing
    so means you will be testing the model outside of its intended
    operating environment.
- The FR ratio is calculated as the fraction of the in-vocabulary files
  that the spotter model did not find the phrase in, expressed as a
  percentage. Example: Out of 2000 files, 120 did not trigger the spotter.
  The false-reject ratio is therefore 6.0%.
- If reference phrase checking is used, then mismatches will be noted as
  substitutions (SB code) and be included in the FR count and ratio.
- If word-error-rate scoring (`-w`) is used, then the total words,
  substitutions, insertions and deletions in each phrase will be noted. The
  totals for each across the entire test set are also reported.
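
The FR ratio arithmetic from the list above can be sketched in a few lines of Python. The function name `fr_ratio` and the substitution count in the second call are illustrative assumptions, not part of the SDK or of any real test run.

```python
def fr_ratio(total_files: int, missed_files: int, substitutions: int = 0) -> float:
    """Return the false-reject ratio as a percentage of in-vocabulary files.

    Substitutions (reference mismatches) are added to the misses, as
    described for reference phrase checking.
    """
    if total_files <= 0:
        raise ValueError("total_files must be positive")
    return 100.0 * (missed_files + substitutions) / total_files


# Worked example from the text: 120 of 2000 files did not trigger the spotter.
print(f"{fr_ratio(2000, 120):.1f}%")  # prints 6.0%

# Invented numbers: the same run with 30 reference mismatches counted as well.
print(f"{fr_ratio(2000, 120, substitutions=30):.1f}%")  # prints 7.5%
```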

Audio files used for FA testing:

- Should be much longer than the in-vocabulary examples.
- Should contain a selection of noise expected to be encountered during
  regular use.
- Must not contain explicit instances of the target phrase.
- The FA rate is calculated as the average number of times the spotter
  model mistakenly triggered per hour.
  Example: Out of 120 hours of audio, the spotter triggered 60 times.
  The false-accept rate is 0.5 / hour.
- If you run `snsr-eval-batch` with the `-u` flag, unexpected recognition
  events from the FR testing files are included in the false accept totals.
  These unexpected events include:
  + Spots that happen during the required lead-in period.
  + The second and all subsequent spots, as each in-vocabulary file
    must contain only a single target phrase utterance.
- FA testing applies only to wake words. Commands, LVCSR and STT are not
  continuous-listening technologies, so FA testing is not relevant to
  them.
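
As a minimal sketch, the FA rate from the worked example above is the trigger count divided by the hours of audio. The helper name `fa_rate` is hypothetical and not part of the SDK.

```python
def fa_rate(false_accepts: int, audio_hours: float) -> float:
    """Return the average number of false accepts per hour of audio."""
    if audio_hours <= 0:
        raise ValueError("audio_hours must be positive")
    return false_accepts / audio_hours


# Worked example from the text: 60 triggers in 120 hours of audio.
print(f"{fa_rate(60, 120.0):.1f} / hour")  # prints 0.5 / hour
```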

## Usage

```
Runs a TrulyNatural SDK wake word model file on test data
and reports the false accept rate, false reject ratio, and execution speed.

usage: snsr-eval-batch -t task [options]
 options:
  -a                  : Add tpl-vad-lvcsr to LVCSR and STT models
  -c filename         : csv in-vocabulary (FR) and reference filename list
  -f setting filename : load filename into task setting
  -h                  : show this help and exit
  -i filename         : in-vocabulary (FR) filename list
  -j threads          : number of concurrent jobs (default: 1)
  -l filename         : log output file (default: <task>.log)
  -n                  : normalize results (lower case, strip punctuation)
  -o filename         : out-of-vocabulary (FA) filename list
  -s setting=value    : override a task setting
  -t task             : specify task filename (required)
  -u                  : count in-vocabulary FAs
  -v [-v [-v]]        : increase verbosity
  -w                  : calculate word-error rate on in-vocabulary audio

At least one of -i, -c, or -o is required.
-c and -i cannot be used together.

-c file format is two comma-separated filespecs '<audio.wav>,<reference.txt>'

Settings are strings used as keys to query or change task behavior.
Most frequently used for wake words and command sets is operating-point.
Refer to the TrulyNatural SDK documentation for a complete list and
descriptions of all supported settings.
```

- The files specified by the `-i` and `-o` options must contain
  exactly one audio file path per line, with no extraneous whitespace. The
  line separator is the newline character, `\n`.
- Files specified by the `-c` option must be a comma-separated value (CSV)
  file with exactly one audio file path and reference file path per line,
  and no extraneous whitespace. Each line will have two comma-separated
  fields. The first field is an audio file, and the second field is a text
  file containing the reference (expected result). UTF-8 is supported.
- Some combination of `-c`, `-i`, and/or `-o` must be specified.
- `-c` and `-i` cannot be used together.
- `-w` requires `-c`.
- `-u` counts unexpected phrase spots in the in-vocabulary (FR) data
  towards the false accept total. This only has an effect for spotters
  that require a significant lead-in time to stabilize. This flag can only
  be used when testing wake words and commands; it cannot be combined
  with `-w`.
- The `-j` option determines the number of concurrent threads to start.
  For multi-core CPUs this can significantly speed up the evaluation.
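
The list-file rules above (one path per line, `\n` separators, no stray whitespace) are easy to get wrong when lists are assembled by hand. The following Python sketch writes both kinds of list; the function names are hypothetical. Two points are assumptions rather than documented behavior: the tool is assumed to accept a trailing newline after the last entry, and because the text does not describe CSV quoting, the sketch refuses paths that contain a comma.

```python
def write_file_list(audio_paths, list_path):
    """Write one audio path per line, for use with -i or -o."""
    # newline="\n" keeps Python from translating line endings on Windows.
    with open(list_path, "w", encoding="utf-8", newline="\n") as out:
        for path in audio_paths:
            out.write(str(path).strip() + "\n")


def write_reference_csv(pairs, csv_path):
    """Write '<audio>,<reference>' lines, for use with -c."""
    with open(csv_path, "w", encoding="utf-8", newline="\n") as out:
        for audio, reference in pairs:
            audio, reference = str(audio).strip(), str(reference).strip()
            # Assumption: no quoting is supported, so commas cannot be escaped.
            if "," in audio or "," in reference:
                raise ValueError(f"comma in path: {audio!r}, {reference!r}")
            out.write(f"{audio},{reference}\n")
```

For example, `write_file_list(["inv/0001.wav", "inv/0002.wav"], "inv.txt")` would produce a file suitable for `-i inv.txt`; the audio paths here are placeholders.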

## Example

```console
% snsr-eval-batch -v -v -v -t alexa-fr.snsr \
    -i inv.txt -o oov.txt -j 6 -s operating-point=10
Writing log to "alexa-fr.log"
    INV:  2612 files,  23.128 hr,  23:07:39.285
    OOV:   686 files, 142.984 hr, 142:59:01.345
  Total:  3298 files, 166.111 hr, 166:06:40.630
Using operating point 10.
Available operating points: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20.
3298 files, 166.111 hr, 118 FA  0.83/hr,  3.33% FR, 2525 TA,  658.9x RT
```

- 3298 files processed.
- 166.111 hours of audio processed.
- 118 false accept spots, which is an FA rate of 0.83 per hour
  (118 spots over 142.984 hours of out-of-vocabulary audio).
- 3.33% false reject ratio (87 of the 2612 in-vocabulary files).
- 2525 true accept spots on in-vocabulary test audio.
- 658.9 real-time factor.

```console
% snsr-eval-batch -a -t model/stt-enUS-automotive-medium-2.3.15-pnc.snsr \
    -c stt-16kHz-en-general-quicktest-full.csv -w -n -j 6
999 files, 1.281 hr, 9174 Words, 833 Substitutions, 198 Insertions, 120 Deletions, 12.546% WER,     5.3 xRT
```

- 999 files processed.
- 1.281 hours of audio processed.
- 9174 total words in test.
- 833 substitutions.
- 198 insertions.
- 120 deletions.
- 12.546% Word Error Rate.
- 5.3 real-time factor.
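
The reported WER is consistent with the standard definition, (substitutions + insertions + deletions) / reference words. The sketch below recomputes it from the totals in the console output; `word_error_rate` is a hypothetical helper, and the formula is the conventional one rather than something confirmed from the tool's source.

```python
def word_error_rate(words: int, substitutions: int,
                    insertions: int, deletions: int) -> float:
    """Return WER as a percentage of the reference word count."""
    if words <= 0:
        raise ValueError("words must be positive")
    return 100.0 * (substitutions + insertions + deletions) / words


# Totals from the console output: 9174 words, 833 S, 198 I, 120 D.
print(f"{word_error_rate(9174, 833, 198, 120):.3f}%")  # prints 12.546%
```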

## Log file format

`snsr-eval-batch` produces a log file in plain text format. Each line in
this file follows the same pattern: `KEY [subkey] [detail]`.

KEY | subkey | detail | notes
-|-|-|-
CMDFR | "filename" | reference | false reject, no matches detected in this file
CMDSB | "filename" | start-ms end-ms "phrase" "reference" [sv-score](https://doc.sensory.com/tnl/7.8/api/setting-keys/results.md#sv-score) [score](https://doc.sensory.com/tnl/7.8/api/setting-keys/results.md#score) | mismatch between command phrase and reference
CMDTA | "filename" | start-ms end-ms "phrase" "reference" [sv-score](https://doc.sensory.com/tnl/7.8/api/setting-keys/results.md#sv-score) [score](https://doc.sensory.com/tnl/7.8/api/setting-keys/results.md#score) | true-accept with reference phrase checking
ERROR | error message | | unexpected error encountered
FR+SBCOUNT | double | | total number of false-reject + substitution errors
FR+SBRATIO | double | % | overall false-reject + substitution ratio
FACOUNT | integer | | total number of false-accept spots
FARATE | double | / hr | overall false-accept rate
FRCOUNT | double | | total number of false-reject errors
FRRATIO | double | % | overall false-reject ratio
INFO | start-time | YYYY-MM-DD HH:MM:SS.sss UTC | job start time in UTC
INFO | completion-time | YYYY-MM-DD HH:MM:SS.sss UTC | job end time in UTC
INFO | duration | double | total job duration in seconds
INFO | sdk-name | "TrulyHandsfree" or "TrulyNatural" |  
INFO | sdk-version | version-string | snsr-eval-batch SDK version
INFO | command-line | command-line arguments | includes `argv[0]`
INFO | operating-point | integer | selected operating point
INFO | inv-files | integer | number of in-vocabulary (FR) test files
INFO | inv-seconds | integer | seconds of in-vocabulary audio
INFO | inv-hours | HHH:MM:SS.sss | inv-seconds as hours, minutes, seconds
INFO | oov-files | integer | number of out-of-vocabulary (FA) test files
INFO | oov-seconds | integer | seconds of out-of-vocabulary audio
INFO | oov-hours | HHH:MM:SS.sss | oov-seconds as hours, minutes, seconds
INFO | inv/oov-seconds | integer | seconds of OOV audio in FR test files
INFO | inv/oov-hours | HHH:MM:SS.sss | inv/oov-seconds as hours, minutes, seconds
INFO | rejected-files | integer | number of rejected files (not used in the test)
INFO | real-time-factor | double | total duration of audio processed divided by the processing time
INVFA | "filename" | start-ms end-ms "phrase" [sv-score](https://doc.sensory.com/tnl/7.8/api/setting-keys/results.md#sv-score) [score](https://doc.sensory.com/tnl/7.8/api/setting-keys/results.md#score) | FA in in-vocabulary test file. This is a spot that happened during the `min-in-vocab-duration` lead-in period, or an additional, spurious, spot recognized in the in-vocabulary file.
INVFR | "filename" | | false reject, no spot in this file
INVTA | "filename" | start-ms end-ms "phrase" [sv-score](https://doc.sensory.com/tnl/7.8/api/setting-keys/results.md#sv-score) [score](https://doc.sensory.com/tnl/7.8/api/setting-keys/results.md#score) | true accept
INVTX | "filename" | N spots | more than one spot in this file
OOVFA | "filename" | start-ms end-ms "phrase" [sv-score](https://doc.sensory.com/tnl/7.8/api/setting-keys/results.md#sv-score) [score](https://doc.sensory.com/tnl/7.8/api/setting-keys/results.md#score) | FA in out-of-vocabulary test file
REJECT | "filename" | reason | filename was rejected as unusable
STTFR | "filename" | reference | false reject, no matches detected in this file
STTSB | "filename" | start-ms end-ms "phrase" "reference" word-count, substitutions, additions, deletions, word-error-rate | mismatch between LVCSR/STT phrase and reference
STTTA | "filename" | start-ms end-ms "phrase" "reference" word-count, substitutions, additions, deletions, word-error-rate | true-accept (no mismatch) between LVCSR/STT phrase and reference
TACOUNT | integer | | total number of true-accept spots
WER | double | % | overall word-error-rate
WER_DELETIONS | integer | | total number of WER deletions
WER_INSERTIONS | integer | | total number of WER insertions
WER_SUBSTITUTIONS | integer | | total number of WER substitutions
WER_WORDS | integer | | total number of WER words
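
Because every log line starts with a KEY, a log can be summarized with a small script. The Python sketch below counts lines per KEY and collects the files reported as `INVFR`. It assumes fields are separated by whitespace and that filenames and phrases are wrapped in double quotes, as the table suggests; the exact on-disk layout is not specified here, so check it against a real log. The sample lines are invented for illustration, and the function names are hypothetical.

```python
import re
from collections import Counter

# A field is either a double-quoted string or a run of non-space characters.
_FIELD = re.compile(r'"([^"]*)"|(\S+)')


def parse_log_line(line):
    """Split one log line into (key, fields); return None for blank lines."""
    fields = [m.group(1) if m.group(1) is not None else m.group(2)
              for m in _FIELD.finditer(line)]
    if not fields:
        return None
    return fields[0], fields[1:]


def summarize(lines):
    """Count log lines per KEY and list the files that were false rejects."""
    counts = Counter()
    missed = []
    for line in lines:
        parsed = parse_log_line(line)
        if parsed is None:
            continue
        key, fields = parsed
        counts[key] += 1
        if key == "INVFR" and fields:
            missed.append(fields[0])
    return counts, missed


# Hypothetical sample lines, not taken from a real log.
sample = [
    'INVTA "a.wav" 1200 1850 "alexa" 0 0.93',
    'INVFR "b.wav"',
    'OOVFA "noise.wav" 52000 52600 "alexa" 0 0.61',
    'FRRATIO 50.0 %',
]
counts, missed = summarize(sample)
print(counts["INVTA"], counts["OOVFA"], missed)  # prints 1 1 ['b.wav']
```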

<!-- Abbreviation definitions from includes/abbreviations.md -->
*[API]: Application Programming Interface
*[FA]: False Accept: the recognizer triggered when the target phrase was not spoken
*[FR]: False Reject: the recognizer did not trigger when the target phrase was spoken
*[LVCSR]: Large Vocabulary Continuous Speech Recognition model, feed-forward neural net acoustic model with FST decoder
*[SDK]: Software Development Kit
*[STT]: Speech To Text: transformers with language model and CTC decoding
*[TNL]: TrulyNatural, Sensory's large-vocabulary speech recognition technology
