91 lines
2.5 KiB
Markdown
91 lines
2.5 KiB
Markdown
# Transcription
|
|
|
|
Video **transcription** consists in transcribing the audio content of a video to a text.
|
|
|
|
> This process might be called __Automatic Speech Recognition__ or __Speech to Text__ in more general context.
|
|
|
|
Provide a common API to many transcription backend, currently:
|
|
- `openai-whisper` CLI
|
|
- `faster-whisper` (*via* `whisper-ctranslate2` CLI)
|
|
|
|
> Potential candidates could be: whisper-cpp, vosk, ...
|
|
|
|
## Requirements
|
|
- Python 3
|
|
- PIP
|
|
|
|
And at least one of the following transcription backend:
|
|
- Python:
|
|
- `openai-whisper`
|
|
- `whisper-ctranslate2>=0.4.3`
|
|
|
|
## Usage
|
|
|
|
Create a transcriber manually:
|
|
|
|
```typescript
|
|
import { OpenaiTranscriber } from '@peertube/peertube-transcription'
|
|
|
|
(async () => {
|
|
// Optional if you want to use a local installation of transcribe engines
|
|
const binDirectory = 'local/pip/path/bin'
|
|
|
|
// Create a transcriber powered by OpenAI Whisper CLI
|
|
const transcriber = new OpenaiTranscriber({
|
|
name: 'openai-whisper',
|
|
command: 'whisper',
|
|
languageDetection: true,
|
|
binDirectory
|
|
});
|
|
|
|
// If not installed globally, install the transcriber engine (use pip under the hood)
|
|
await transcriber.install('local/pip/path')
|
|
|
|
// Transcribe
|
|
const transcriptFile = await transcriber.transcribe({
|
|
mediaFilePath: './myVideo.mp4',
|
|
model: 'tiny',
|
|
format: 'txt'
|
|
});
|
|
|
|
console.log(transcriptFile.path);
|
|
console.log(await transcriptFile.read());
|
|
})();
|
|
```
|
|
|
|
Using a local model file:
|
|
|
|
```typescript
|
|
import { WhisperBuiltinModel } from '@peertube/peertube-transcription/dist'
|
|
|
|
const transcriptFile = await transcriber.transcribe({
|
|
mediaFilePath: './myVideo.mp4',
|
|
model: await WhisperBuiltinModel.fromPath('./models/large.pt'),
|
|
format: 'txt'
|
|
});
|
|
```
|
|
|
|
You may use the builtin Factory if you're happy with the default configuration:
|
|
|
|
```Typescript
|
|
import { transcriberFactory } from '@peertube/peertube-transcription'
|
|
|
|
transcriberFactory.createFromEngineName({
|
|
engineName: transcriberName,
|
|
logger: compatibleWinstonLogger,
|
|
transcriptDirectory: '/tmp/transcription'
|
|
})
|
|
```
|
|
> For further usage [../tests/src/transcription/whisper/transcriber/openai-transcriber.spec.ts](../tests/src/transcription/whisper/transcriber/openai-transcriber.spec.ts)
|
|
|
|
|
|
## Lexicon
|
|
- ONNX: Open Neural Network eXchange. A specification, the ONNX Runtime run these models.
|
|
- GPTs: Generative Pre-Trained Transformers
|
|
- LLM: Large Language Models
|
|
- NLP: Natural Language Processing
|
|
- MLP: Multilayer Perceptron
|
|
- ASR: Automatic Speech Recognition
|
|
- WER: Word Error Rate
|
|
- CER: Character Error Rate
|