FileTranscriber API

Transcribe speech from audio/video files to text using the whisper.cpp speech-to-text implementation.

FileTranscriber(options)

Create a new FileTranscriber instance.

const transcriber = new FileTranscriber(options);

Param	Type	Default	Description
options	`FileTranscriberOptions`
createModule	`(moduleArg = {}) => Promise<any>`		Exported `createModule()` function from `@transcribe/shout`
model	`string` \| `File`		Whisper.cpp model file in ggml format. Will call `fetch()` if string, otherwise will use the provided file.
workerPath	`string`	Defaults to the directory where `shout.wasm.js` is located.	Path to `shout.wasm.worker.mjs` file.
dtwType	`DtwType: "tiny" \| "base" \| "small" \| "tiny.en" \| "base.en" \| "small.en"`	`""`	Type of the model in use. Set it to compute word-level timestamps with the DTW algorithm.
onReady	`() => void`	`() => {}`	Called after init.
onProgress	`(progress: number) => void`	`() => {}`	Called on progress (new segment), `0..100`
onCanceled	`() => void`	`() => {}`	Called after transcription process got canceled.
onSegment	`(segment: TranscribeResultSegment) => void`	`() => {}`	Called when a new transcribed segment is ready.
onComplete	`(result: TranscriptionResult) => void`	`() => {}`	Called when transcription is complete.

Returns a new FileTranscriber instance.
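
As a rough sketch of how the options in the table fit together, the helper below assembles an options object with every callback wired up. `makeOptions`, `log`, and `segments` are hypothetical names used for illustration and are not part of the library; `createModule` is assumed to be imported from `@transcribe/shout` by the caller, and the model path in the usage comment is a placeholder.

```javascript
// Hypothetical helper (not part of the library): builds constructor options
// using only the fields listed in the options table.
function makeOptions(createModule, model, log = console.log) {
  const segments = []; // filled as onSegment fires
  const options = {
    createModule, // exported createModule() from @transcribe/shout
    model, // URL string (fetched) or a File object
    onReady: () => log("ready"),
    onProgress: (progress) => log(`progress: ${progress}%`),
    onSegment: (segment) => segments.push(segment),
    onCanceled: () => log("canceled"),
    onComplete: (result) => log("complete", result),
  };
  return { options, segments };
}

// Usage (placeholder model path; assumes FileTranscriber and createModule
// are already imported in your build):
//   const { options } = makeOptions(createModule, "/models/ggml-tiny.bin");
//   const transcriber = new FileTranscriber(options);
```

Keeping the collected segments outside the options object avoids passing fields the constructor does not document.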

init()

Loads the model and creates a new shout instance. Must be called before transcribe().

await transcriber.init();

Returns a Promise that resolves to void (Promise<void>).
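
Because init() must finish before transcribe() is called, one possible pattern is to memoize the init promise so that several callers share a single initialization. This is a minimal sketch: `ensureInit` is a hypothetical helper, not part of the library, and it assumes only that `init()` returns a promise as described above.

```javascript
// Hypothetical helper (not part of the library): call init() at most once
// per transcriber instance and let every caller await the same promise.
const initPromises = new WeakMap();

function ensureInit(transcriber) {
  if (!initPromises.has(transcriber)) {
    initPromises.set(transcriber, transcriber.init());
  }
  return initPromises.get(transcriber);
}

// Usage:
//   await ensureInit(transcriber);
//   const result = await transcriber.transcribe("my.mp3");
```

A WeakMap is used so that a destroyed transcriber can be garbage collected along with its cached promise.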

transcribe(audio, options)

Transcribes audio to text and returns a Promise that resolves with a TranscriptionResult that contains the transcription data as JSON.

await transcriber.transcribe("my.mp3");
await transcriber.transcribe("my.mp3", options);
await transcriber.transcribe(file, options);

Param	Type	Default	Description
audio	`string \| File`		URL to audio file or `File` object.
options	`FileTranscriberOptions`
lang	`string`	`"auto"`	Language code of the audio language (e.g. `en`)
threads	`number`	`this.maxThreads`	Number of threads to use. Defaults to max available.
translate	`boolean`	`false`	Translate result to English.
max_len	`number`	`0`	Max number of characters in a single segment, `0` means no limit.
split_on_word	`boolean`	`false`	If `true`, transcriber will try to split the text on word boundaries.
suppress_non_speech	`boolean`	`false`	If `true`, transcriber will try to suppress non-speech segments.
token_timestamps	`boolean`	`true`	If `true`, token level timestamps will be calculated.

Returns a Promise that resolves to a transcription result (Promise<TranscriptionResult>): JSON containing all transcription data, such as text and timestamps.
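
To illustrate how the per-call options combine, the sketch below requests short, caption-like segments. `captionOptions` and `transcribeCaptions` are hypothetical helpers, and the 42-character limit is an arbitrary value chosen for the example, not a library default.

```javascript
// Hypothetical helper (not part of the library): per-call options for
// short segments that break at word boundaries where possible.
function captionOptions(lang = "auto", maxChars = 42) {
  return {
    lang, // language code, or "auto"
    max_len: maxChars, // cap on characters per segment (0 = no limit)
    split_on_word: true, // try to split on word boundaries
    token_timestamps: true, // compute token-level timestamps
  };
}

// Passes the options above to transcribe(); audio is a URL or a File.
function transcribeCaptions(transcriber, audio, lang) {
  return transcriber.transcribe(audio, captionOptions(lang));
}

// Usage (after init() has resolved):
//   const result = await transcribeCaptions(transcriber, "my.mp3", "en");
```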

cancel()

Cancels the current transcription. May take some time.

await transcriber.cancel();

Returns a Promise that resolves to void (Promise<void>).
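
One way to hook cancel() into application code is to trigger it from a standard `AbortSignal`, for example when the user clicks a stop button. The sketch below is an assumption about usage, not a documented integration: `cancelOnAbort` is a hypothetical helper, and how a pending transcribe() promise settles after cancellation is not specified here.

```javascript
// Hypothetical helper (not part of the library): call transcriber.cancel()
// when the given AbortSignal fires. Returns a function that detaches the
// listener again.
function cancelOnAbort(transcriber, signal) {
  const onAbort = () => {
    // cancel() returns a promise; report failures instead of dropping them.
    Promise.resolve(transcriber.cancel()).catch((err) => {
      console.error("cancel failed:", err);
    });
  };
  if (signal.aborted) {
    onAbort();
  } else {
    signal.addEventListener("abort", onAbort, { once: true });
  }
  return () => signal.removeEventListener("abort", onAbort);
}

// Usage:
//   const controller = new AbortController();
//   const detach = cancelOnAbort(transcriber, controller.signal);
//   stopButton.onclick = () => controller.abort();
```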

destroy()

Destroys the shout instance and frees wasm memory.

transcriber.destroy();
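
Putting the methods together, a one-shot transcription might follow the order init(), transcribe(), destroy(). The sketch below is one possible arrangement under that assumption; `transcribeOnce` is a hypothetical helper, not part of the library. It uses `try`/`finally` so that wasm memory is freed even if loading or transcription fails.

```javascript
// Hypothetical helper (not part of the library): run a single transcription
// and always release the instance afterwards.
async function transcribeOnce(transcriber, audio, options = {}) {
  try {
    await transcriber.init(); // must complete before transcribe()
    return await transcriber.transcribe(audio, options);
  } finally {
    transcriber.destroy(); // frees wasm memory, also on failure
  }
}

// Usage:
//   const result = await transcribeOnce(transcriber, "my.mp3", { lang: "en" });
```

For repeated transcriptions it is probably cheaper to keep one initialized instance and call destroy() only when it is no longer needed.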