Transcription

Video transcription consists of converting the audio content of a video into text.

In a more general context, this process is also called Automatic Speech Recognition (ASR) or Speech to Text.

This package provides a common API to several transcription backends, currently:

  • openai-whisper CLI
  • faster-whisper (via whisper-ctranslate2 CLI)

Potential candidates could be: whisper-cpp, vosk, ...
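
As an illustration of what a common API over several CLI backends can look like, here is a minimal sketch. Every type and function name in it (`TranscribeArgs`, `TranscriberBackend`, `getBackend`, ...) is hypothetical and is not an export of this package; the sketch only assumes that both CLIs accept the `--model`, `--output_format` and `--output_dir` options.

```typescript
// Hypothetical types for illustration only: NOT the package's real API.
interface TranscribeArgs {
  mediaFilePath: string
  model: string
  format: 'txt' | 'srt' | 'vtt'
  outputDirectory: string
}

interface CommandLine {
  command: string
  args: string[]
}

// The shared contract: each backend knows how to turn the same arguments into its own command line.
interface TranscriberBackend {
  readonly name: string
  buildCommand (args: TranscribeArgs): CommandLine
}

// Options assumed to be understood by both CLIs.
function sharedArgs ({ mediaFilePath, model, format, outputDirectory }: TranscribeArgs): string[] {
  return [ mediaFilePath, '--model', model, '--output_format', format, '--output_dir', outputDirectory ]
}

const openaiWhisper: TranscriberBackend = {
  name: 'openai-whisper',
  buildCommand: args => ({ command: 'whisper', args: sharedArgs(args) })
}

const fasterWhisper: TranscriberBackend = {
  name: 'whisper-ctranslate2',
  buildCommand: args => ({ command: 'whisper-ctranslate2', args: sharedArgs(args) })
}

// Registry keyed by engine name, so callers never depend on a concrete backend.
const backends = new Map<string, TranscriberBackend>(
  [ openaiWhisper, fasterWhisper ].map(b => [ b.name, b ] as [string, TranscriberBackend])
)

function getBackend (engineName: string): TranscriberBackend {
  const backend = backends.get(engineName)
  if (!backend) throw new Error(`Unknown transcription engine: ${engineName}`)

  return backend
}

// Usage: the calling code is identical whichever engine is selected.
const { command, args } = getBackend('whisper-ctranslate2').buildCommand({
  mediaFilePath: './myVideo.mp4',
  model: 'tiny',
  format: 'txt',
  outputDirectory: '/tmp/transcription'
})
console.log(command, args.join(' '))
```

With this shape, supporting another engine only requires one more registry entry that implements `buildCommand`.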

Requirements

  • Python 3
  • pip

And at least one of the following transcription backends:

  • Python:
    • openai-whisper
    • whisper-ctranslate2>=0.4.3

Usage

Create a transcriber manually:

import { OpenaiTranscriber } from '@peertube/peertube-transcription'

(async () => {
  // Optional: only needed if you want to use a local installation of the transcription engines
  const binDirectory = 'local/pip/path/bin'

  // Create a transcriber powered by OpenAI Whisper CLI
  const transcriber = new OpenaiTranscriber({
    name: 'openai-whisper',
    command: 'whisper',
    languageDetection: true,
    binDirectory
  });

  // If not installed globally, install the transcriber engine (uses pip under the hood)
  await transcriber.install('local/pip/path')

  // Transcribe
  const transcriptFile = await transcriber.transcribe({
    mediaFilePath: './myVideo.mp4',
    model: 'tiny',
    format: 'txt'
  });

  console.log(transcriptFile.path);
  console.log(await transcriptFile.read());
})();

Using a local model file:

import { WhisperBuiltinModel } from '@peertube/peertube-transcription/dist'

// Reuses the `transcriber` instance created in the previous example
const transcriptFile = await transcriber.transcribe({
  mediaFilePath: './myVideo.mp4',
  model: await WhisperBuiltinModel.fromPath('./models/large.pt'),
  format: 'txt'
});

You may use the built-in factory if you're happy with the default configuration:

import { transcriberFactory } from '@peertube/peertube-transcription'

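// `transcriberName` is the name of the engine to use;
// `compatibleWinstonLogger` is a logger exposing a winston-compatible interface
const transcriber = transcriberFactory.createFromEngineName({
  engineName: transcriberName,
  logger: compatibleWinstonLogger,
  transcriptDirectory: '/tmp/transcription'
})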

For further usage examples, see ../tests/src/transcription/whisper/transcriber/openai-transcriber.spec.ts

Lexicon

  • ONNX: Open Neural Network eXchange. A specification for representing models; the ONNX Runtime runs models stored in this format.
  • GPTs: Generative Pre-Trained Transformers
  • LLM: Large Language Model
  • NLP: Natural Language Processing
  • MLP: Multilayer Perceptron
  • ASR: Automatic Speech Recognition
  • WER: Word Error Rate
  • CER: Character Error Rate
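
To make the last two entries concrete: WER and CER are commonly computed as the edit distance (substitutions + deletions + insertions) between a reference transcript and the produced transcript, divided by the length of the reference, counted in words for WER and in characters for CER. The sketch below assumes a non-empty reference and applies no text normalization (case, punctuation); the function names are illustrative and are not part of this package.

```typescript
// Levenshtein edit distance between two token sequences (words or characters).
function editDistance<T> (reference: T[], hypothesis: T[]): number {
  // previous[j] = distance between the reference tokens seen so far and the first j hypothesis tokens
  let previous = Array.from({ length: hypothesis.length + 1 }, (_, j) => j)

  for (let i = 1; i <= reference.length; i++) {
    const current = [ i ]

    for (let j = 1; j <= hypothesis.length; j++) {
      const substitution = previous[j - 1] + (reference[i - 1] === hypothesis[j - 1] ? 0 : 1)
      const deletion = previous[j] + 1
      const insertion = current[j - 1] + 1

      current.push(Math.min(substitution, deletion, insertion))
    }

    previous = current
  }

  return previous[hypothesis.length]
}

// Split on whitespace and drop empty tokens.
function toWords (text: string): string[] {
  return text.trim().split(/\s+/).filter(word => word.length > 0)
}

// WER = word-level edit distance / number of reference words
function wordErrorRate (reference: string, hypothesis: string): number {
  const referenceWords = toWords(reference)

  return editDistance(referenceWords, toWords(hypothesis)) / referenceWords.length
}

// CER = character-level edit distance / number of reference characters
function characterErrorRate (reference: string, hypothesis: string): number {
  const referenceChars = Array.from(reference)

  return editDistance(referenceChars, Array.from(hypothesis)) / referenceChars.length
}

// One substitution (sat -> sit) and one deletion (the): 2 edits over 6 reference words
console.log(wordErrorRate('the cat sat on the mat', 'the cat sit on mat'))

// 3 character edits over 6 reference characters
console.log(characterErrorRate('kitten', 'sitting'))
```

Both rates can exceed 1 when the produced transcript contains many more tokens than the reference, since insertions count as errors.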