
FileTranscriber API

Transcribe speech from audio/video files to text using the whisper.cpp speech-to-text implementation.

FileTranscriber(options)

Create a new FileTranscriber instance.

Syntax

const transcriber = new FileTranscriber(options);

Parameters

| Param | Type | Default | Description |
| --- | --- | --- | --- |
| `options` | `FileTranscriberOptions` | | Configuration object with the properties below. |
| `createModule` | `(moduleArg = {}) => Promise<any>` | | The `createModule()` function exported from `@transcribe/shout`. |
| `model` | `string \| File` | | whisper.cpp model file in ggml format. A string is treated as a URL and loaded with `fetch()`; a `File` is used as provided. |
| `workerPath` | `string` | The directory where `shout.wasm.js` is located | Path to the `shout.wasm.worker.mjs` file. |
| `dtwType` | `DtwType`: `"tiny" \| "base" \| "small" \| "tiny.en" \| "base.en" \| "small.en"` | `""` | Type of the model passed in `model`. Set it to compute word-level timestamps with the DTW (dynamic time warping) algorithm. |
| `onReady` | `() => void` | `() => {}` | Called when `init()` has finished. |
| `onProgress` | `(progress: number) => void` | `() => {}` | Called with the current progress, from 0 to 100, whenever a new segment is ready. |
| `onCanceled` | `() => void` | `() => {}` | Called after the transcription has been canceled. |
| `onSegment` | `(segment: TranscribeResultSegment) => void` | `() => {}` | Called when a new transcribed segment is ready. |
| `onComplete` | `(result: TranscriptionResult) => void` | `() => {}` | Called when the transcription is complete. |
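Because `dtwType` describes the same model that `model` points to, one option is to derive it from the model's file name. The sketch below assumes the usual whisper.cpp naming scheme (`ggml-<type>.bin`, optionally followed by a quantization suffix such as `-q5_1`); the helper `dtwTypeFromModelName` is hypothetical and not part of the library.

```javascript
// Hypothetical helper, not part of the FileTranscriber API.
// Assumes whisper.cpp model file names such as "ggml-base.en.bin" or "ggml-small-q5_1.bin".
const DTW_MODEL_NAME = /^ggml-(tiny|base|small)(\.en)?(?:-q\d+_\d+)?\.bin$/;

function dtwTypeFromModelName(model) {
  // Accept a URL/path string or anything with a `name` property (e.g. a browser File).
  const name = typeof model === "string" ? model : model.name;
  // Keep only the last path segment and drop any query string.
  const baseName = name.split("?")[0].split("/").pop();
  const match = DTW_MODEL_NAME.exec(baseName);
  // Fall back to "" (the documented default) for models outside the DtwType list.
  return match ? match[1] + (match[2] ?? "") : "";
}
```

The result could then be passed as `dtwType: dtwTypeFromModelName(model)` in the options object. Models such as `ggml-large-v3.bin` map to `""` because they are not in the `DtwType` list.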

Returns

Returns a new FileTranscriber instance.
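A sketch of wiring the callbacks from the table above when constructing an instance. To keep it runnable without a browser or the wasm build, the `FileTranscriber` class and the `createModule` function are passed in as arguments; in an application they would be imported (the package name `@transcribe/transcriber` for `FileTranscriber` is an assumption). The helper `createLoggingTranscriber` is hypothetical.

```javascript
// Hypothetical helper: builds a FileTranscriber whose callbacks log progress and collect segments.
// FileTranscriber and createModule are injected so the sketch does not depend on a real import.
function createLoggingTranscriber(FileTranscriber, createModule, model, log = console.log) {
  const segments = [];
  const transcriber = new FileTranscriber({
    createModule, // the createModule() export of @transcribe/shout
    model, // URL string (fetched) or a File
    onReady: () => log("model loaded"),
    onProgress: (progress) => log(`progress: ${progress}%`),
    onSegment: (segment) => segments.push(segment), // keep partial results as they arrive
    onCanceled: () => log("transcription canceled"),
    onComplete: () => log(`done, ${segments.length} segment(s)`),
  });
  return { transcriber, segments };
}
```

In a page this might be called as `createLoggingTranscriber(FileTranscriber, createModule, "/models/ggml-tiny.en.bin")`, where the model path is only an example.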

init()

Loads the model and creates a new shout instance. Must be called before transcribe().

Syntax

await transcriber.init();

Returns

A Promise that resolves to void (Promise<void>) once initialization is complete.
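Since init() loads the model, an application that transcribes on demand may want to run it only once. The following sketch caches the init() promise so that concurrent callers share one initialization; if init() rejects, the cache is cleared so a later call can retry. The helper `onceReady` is hypothetical.

```javascript
// Hypothetical helper: returns a function that calls transcriber.init() at most once,
// unless it fails, in which case the next call tries again.
function onceReady(transcriber) {
  let pending = null;
  return function ready() {
    if (!pending) {
      pending = transcriber.init().catch((error) => {
        pending = null; // allow a retry after a failed model load
        throw error;
      });
    }
    return pending;
  };
}
```

Each code path that needs the model can then `await ready()` before calling transcribe() without loading the model twice.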

transcribe(file, options?)

Transcribes audio to text. Returns a Promise that resolves with a TranscriptionResult object containing the transcription data as JSON.

Syntax

await transcriber.transcribe("my.mp3");
await transcriber.transcribe("my.mp3", options);
await transcriber.transcribe(file, options);

Parameters

| Param | Type | Default | Description |
| --- | --- | --- | --- |
| `file` | `string \| File` | | URL of the audio file, or a `File` object. |
| `options` | `object` | | Optional transcription settings with the properties below. |
| `lang` | `string` | `"auto"` | Language code of the audio, e.g. `en`. `"auto"` detects the language. |
| `threads` | `number` | `this.maxThreads` | Number of threads to use. Defaults to the maximum available. |
| `translate` | `boolean` | `false` | Translate the result to English. |
| `max_len` | `number` | `0` | Maximum number of characters in a single segment; `0` means no limit. |
| `split_on_word` | `boolean` | `false` | If `true`, the transcriber tries to split the text on word boundaries. |
| `suppress_non_speech` | `boolean` | `false` | If `true`, the transcriber tries to suppress non-speech segments. |
| `token_timestamps` | `boolean` | `true` | If `true`, token-level timestamps are calculated. |

Returns

A Promise<TranscriptionResult> that resolves to a JSON object containing all transcription data, such as the text and timestamps.
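Because init() loads the model once, it can pay off to reuse one instance for several files. The sketch below transcribes files one after another with the same options. It assumes, which this reference does not state, that an initialized instance accepts repeated transcribe() calls as long as only one runs at a time. The helper `transcribeAll` and the `subtitleOptions` values are illustrative.

```javascript
// Hypothetical helper: transcribes several files sequentially with one initialized instance.
// Assumption: transcribe() may be called repeatedly, one job at a time.
async function transcribeAll(transcriber, files, options = {}) {
  const results = [];
  for (const file of files) {
    // Await each job before starting the next; no parallel jobs on one instance.
    results.push(await transcriber.transcribe(file, options));
  }
  return results;
}

// Example options: English audio, at most 40 characters per segment, split on word boundaries.
const subtitleOptions = { lang: "en", max_len: 40, split_on_word: true };
```

After `await transcriber.init()`, a call such as `transcribeAll(transcriber, ["one.mp3", "two.mp3"], subtitleOptions)` would resolve to one result per file, in input order.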

cancel()

Cancels the current transcription. May take some time.

Syntax

await transcriber.cancel();

Returns

A Promise that resolves to void (Promise<void>).
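One use of cancel() is a time limit. The sketch below races transcribe() against a timer and calls cancel() if the timer wins. How the pending transcribe() promise settles after cancel() is not described here, so the sketch does not wait for it and only swallows a possible late rejection. The helper `transcribeWithTimeout` is hypothetical.

```javascript
// Hypothetical helper: gives up on a transcription after `ms` milliseconds.
const TIMED_OUT = Symbol("timed out");

async function transcribeWithTimeout(transcriber, file, ms, options = {}) {
  const job = transcriber.transcribe(file, options);
  // Behaviour of the pending promise after cancel() is undocumented here;
  // attach a no-op handler so a late rejection is not reported as unhandled.
  job.catch(() => {});

  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve(TIMED_OUT), ms);
  });

  try {
    const winner = await Promise.race([job, timeout]);
    if (winner === TIMED_OUT) {
      await transcriber.cancel(); // cancellation may take some time
      return null; // no complete result
    }
    return winner;
  } finally {
    clearTimeout(timer);
  }
}
```

Returning `null` keeps the timeout case distinguishable from a real result; segments delivered through `onSegment` before the cancel are still available to the caller.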

destroy()

Destroys the shout instance and frees its wasm memory.

Syntax

transcriber.destroy();
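Since destroy() frees the wasm memory, it is worth calling even when a transcription fails. A minimal sketch using try/finally follows. destroy() is only called once init() has succeeded, because this reference does not say whether destroying an uninitialized instance is safe. The helper `withTranscriber` is hypothetical.

```javascript
// Hypothetical helper: runs `work` with an initialized transcriber and always destroys it afterwards.
async function withTranscriber(transcriber, work) {
  await transcriber.init(); // if this throws, nothing was initialized, so nothing is destroyed
  try {
    return await work(transcriber);
  } finally {
    transcriber.destroy(); // free the shout instance and its wasm memory
  }
}
```

For example, `await withTranscriber(transcriber, (t) => t.transcribe("my.mp3"))` returns the transcription result and leaves no shout instance behind.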