Whisper - From Audio Files to Text

(2024-11-08)

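I already knew that text can be turned into an audio file with a speech synthesis engine, but I had never considered the reverse, that is, transcribing the text from an audio file.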

It turns out this can be done with a tool called whisper.

Installing whisper

Following ChatGPT's advice, I installed whisper.

pip install --break-system-packages --user openai-whisper

The full installation took up nearly 4 GB. I am on a tethered connection with a 15 GB/month data plan, so a download that size is a bit painful.

On closer inspection, the package (openai-whisper) is OpenAI's own. Oh well.
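Besides the Python package, whisper needs the ffmpeg command-line tool on the PATH to decode audio, and with `pip --user` the `whisper` command usually lands in `~/.local/bin`, which must also be on the PATH. A minimal sketch for checking both after installing; `require_cmds` is a hypothetical helper name, not part of whisper.

```sh
#!/bin/sh
# require_cmds: hypothetical helper that reports every command not found
# on PATH. Returns 0 if all commands are present, 1 otherwise.
require_cmds() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# whisper runs ffmpeg to decode the audio, so both must be available.
require_cmds whisper ffmpeg || echo "Install ffmpeg and/or add ~/.local/bin to PATH" >&2
```

If either name is reported as missing, installing ffmpeg with the system package manager or extending PATH should be enough before trying a conversion.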

Trying an actual conversion

I prepared the following English audio clip, aaa.mp3.

Let's convert this audio to text.

whisper aaa.mp3 --model tiny --language English

Several files are created (.txt, .vtt, .srt, .tsv and .json). The generated text file contains:

The ball is on the floor. It is a red ball. It is a rubber ball. The baby looks at the ball.
The cat looks at the ball. The cat is black. The cat walks over to the ball.
The cat hits the ball with its paw. The ball rolls on the floor.
The baby smiles.
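By default whisper writes the transcript in every format, naming each file after the audio (aaa.txt, aaa.srt, and so on). When only the text is wanted, whisper's `--output_format txt` option avoids the extra files. For files that already exist, a sketch like the following deletes only the side files of one recording, instead of a wildcard such as `rm *.json` that would hit unrelated files. `remove_side_files` is a hypothetical helper and assumes the outputs sit in the current directory.

```sh
#!/bin/sh
# remove_side_files: hypothetical helper. Given an audio path such as
# ./before/aaa.mp3, delete aaa.vtt, aaa.srt, aaa.tsv and aaa.json from the
# current directory and keep aaa.txt.
remove_side_files() {
    base="${1##*/}"      # aaa.mp3 (directory part removed)
    base="${base%.*}"    # aaa (last extension removed)
    for ext in vtt srt tsv json; do
        rm -f "./${base}.${ext}"   # -f: no error if a file is absent
    done
}
```

For example, `remove_side_files aaa.mp3` run after the command above leaves only aaa.txt from that run.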

This is impressive.

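If this works, any audio file I have can be turned into a text file, so I can build material I understand 100 % and listen to it repeatedly. That might be an efficient way to study.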

Batch conversion with a shell script

As usual, I wrote a shell script to convert all the files in one go.

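#!/bin/sh
# Transcribe every .mp3 under ./before and save the transcripts in ./after

output_dir="./after"
mkdir -p "$output_dir"

# Read the file list line by line so that names containing spaces survive
find ./before -name "*.mp3" | while IFS= read -r file; do
    fname_ext="${file##*/}"    # aaa.mp3 (directory removed)
    fname="${fname_ext%.*}"    # aaa (extension removed)
    echo "Processing: $fname"
    # Write only the .txt transcript, directly into $output_dir, so no
    # mv/rm step is needed; stdin is redirected so the command cannot
    # consume the file list the loop is reading.
    whisper "$file" --model tiny --language English \
        --output_format txt --output_dir "$output_dir" < /dev/null
done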
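Before spending time and CPU on a long batch, it can help to preview which files the loop would process and where each transcript would land. A minimal dry-run sketch, assuming the same ./before and ./after layout; `plan_transcriptions` is a hypothetical name, and skipping inputs whose .txt already exists is an added assumption so that a rerun only lists new files.

```sh
#!/bin/sh
# plan_transcriptions: hypothetical dry-run helper. Prints "input -> output"
# for every .mp3 under $1 whose transcript does not yet exist in $2.
# whisper is never called; nothing is transcribed.
plan_transcriptions() {
    in_dir="$1"
    out_dir="$2"
    find "$in_dir" -name "*.mp3" | sort | while IFS= read -r file; do
        name="${file##*/}"                 # strip the directory part
        out="${out_dir}/${name%.*}.txt"    # same name whisper would use
        if [ -e "$out" ]; then
            continue    # already transcribed on an earlier run
        fi
        echo "$file -> $out"
    done
}
```

Running `plan_transcriptions ./before ./after` shows the pending work; the same `[ -e "$out" ]` check could be added to the real loop to make reruns incremental.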