Cocoa Practice Notebook (2024-01)

iOS/iPhone/iPad/watchOS/tvOS/MacOSX/Android programming, Objective-C, Cocoa, Swift, and more

2024-01-01 [macOS][iOS] TextToSpeech

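Text-to-speech on the Macintosh has a long history: MacinTalk was used for the live demonstration at the Macintosh launch event in 1984. MacinTalk later came under the umbrella name PlainTalk and, if I remember correctly, the library provided for using it was the Speech Manager.

Presumably as a continuation of that lineage, NSSpeechSynthesizer was provided as the Cocoa class for speech when Mac OS X arrived.

I recently looked into using text-to-speech (TTS) and found that NSSpeechSynthesizer is now deprecated. Its successor appears to be AVSpeechSynthesizer, which was added to AVFoundation.

My guess at why NSSpeechSynthesizer was deprecated: it was macOS-only, and supporting iOS while modernizing the old API would have required major changes, so a separate class was introduced instead.

Using the Speech Synthesis framework is simple. First, configure the text and the voice.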

import AVFoundation

// Sample text to read aloud (placeholder value).
let text = "こんにちは"

// Create an utterance.
let utterance = AVSpeechUtterance(string: text)
 
// Configure the utterance.
utterance.rate = 0.57
utterance.pitchMultiplier = 0.8
utterance.postUtteranceDelay = 0.2
utterance.volume = 0.8
 
// Retrieve the Japanese voice.
let voice = AVSpeechSynthesisVoice(language: "ja-JP")
 
// Assign the voice to the utterance.
utterance.voice = voice
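
The settings above can be bundled into a small function. The following is only a sketch: `clampedSpeechRate` and `makeJapaneseUtterance` are hypothetical names that do not appear in the original project. It assumes you want to keep the rate within the range defined by the AVFoundation constants `AVSpeechUtteranceMinimumSpeechRate` and `AVSpeechUtteranceMaximumSpeechRate`.

```swift
import AVFoundation

/// Hypothetical helper: clamps a requested rate into the range AVFoundation defines.
func clampedSpeechRate(_ requested: Float) -> Float {
    min(max(requested, AVSpeechUtteranceMinimumSpeechRate),
        AVSpeechUtteranceMaximumSpeechRate)
}

/// Hypothetical helper: builds an utterance configured for Japanese.
func makeJapaneseUtterance(_ text: String,
                           rate: Float = AVSpeechUtteranceDefaultSpeechRate) -> AVSpeechUtterance {
    let utterance = AVSpeechUtterance(string: text)
    utterance.rate = clampedSpeechRate(rate)
    // init?(language:) is failable; a nil voice means the system default voice is used.
    utterance.voice = AVSpeechSynthesisVoice(language: "ja-JP")
    return utterance
}
```

With this, `makeJapaneseUtterance("こんにちは", rate: 0.57)` would return an utterance ready to hand to a synthesizer.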

Then speak the text.

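// Create a speech synthesizer. `synthesizer` is assumed to be a stored property
// (var synthesizer: AVSpeechSynthesizer?) so that it stays alive while speaking.
synthesizer = AVSpeechSynthesizer()

// Tell the synthesizer to speak the utterance.
synthesizer?.speak(utterance)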
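
If you want to know when speech has finished, one option is to wrap the synthesizer in a class that adopts `AVSpeechSynthesizerDelegate`. The following is a minimal sketch. The class name `Speaker` is hypothetical and not taken from the original project, and the `"ja-JP"` default is only an assumption for illustration.

```swift
import AVFoundation

/// Hypothetical wrapper that owns the synthesizer and reports completion.
final class Speaker: NSObject, AVSpeechSynthesizerDelegate {
    // A stored property keeps the synthesizer alive while it is speaking.
    private let synthesizer = AVSpeechSynthesizer()

    override init() {
        super.init()
        synthesizer.delegate = self
    }

    /// Queues `text` for speech using a voice for the given language code.
    func speak(_ text: String, language: String = "ja-JP") {
        let utterance = AVSpeechUtterance(string: text)
        utterance.voice = AVSpeechSynthesisVoice(language: language)
        synthesizer.speak(utterance)
    }

    /// Stops speech immediately, if any is in progress.
    func stop() {
        synthesizer.stopSpeaking(at: .immediate)
    }

    // Delegate callback invoked after an utterance has been spoken.
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didFinish utterance: AVSpeechUtterance) {
        print("Finished: \(utterance.speechString)")
    }
}
```

The caller keeps a `Speaker` instance around (for example, in a view model) and calls `speak(_:)` on it.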

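In Japanese, the reading of a kanji depends on context, so you may want to attach readings (furigana) to the text. Speech Synthesis Markup Language (SSML) lets you annotate the text with this kind of information.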

<speak>
    Hello
    <break time="1s"/>
    <prosody rate="200%">nice to meet you!</prosody>
</speak>

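Pass this to the AVSpeechUtterance initializer that takes an SSML representation.

// init?(ssmlRepresentation:) is failable, so `utterance` is an optional here.
let utterance = AVSpeechUtterance(ssmlRepresentation: ssml)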
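
Because the SSML initializer is failable and only exists on newer OS versions, it may help to build the SSML string and handle failure in one place. The sketch below assumes that approach. `xmlEscaped`, `makeSSML`, and `makeUtterance` are hypothetical helper names, and it uses only the `<speak>` and `<break>` elements that appear in the sample above.

```swift
import AVFoundation

/// Hypothetical helper: escapes characters that are special in XML text.
func xmlEscaped(_ text: String) -> String {
    text.replacingOccurrences(of: "&", with: "&amp;")
        .replacingOccurrences(of: "<", with: "&lt;")
        .replacingOccurrences(of: ">", with: "&gt;")
        .replacingOccurrences(of: "\"", with: "&quot;")
}

/// Hypothetical helper: wraps `text` in <speak> and appends a pause.
func makeSSML(_ text: String, pauseSeconds: Int) -> String {
    "<speak>\(xmlEscaped(text))<break time=\"\(pauseSeconds)s\"/></speak>"
}

/// Builds an utterance from SSML, falling back to plain text if the
/// failable initializer returns nil.
@available(iOS 16.0, macOS 13.0, tvOS 16.0, watchOS 9.0, *)
func makeUtterance(ssml: String, fallbackText: String) -> AVSpeechUtterance {
    AVSpeechUtterance(ssmlRepresentation: ssml)
        ?? AVSpeechUtterance(string: fallbackText)
}
```

For example, `makeUtterance(ssml: makeSSML("Hello", pauseSeconds: 1), fallbackText: "Hello")` always yields a usable utterance, with or without the markup.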

You can get the list of available voices with the following code.

let voices = AVSpeechSynthesisVoice.speechVoices()
print("\(voices)")

Here are the Japanese voices picked out of the log.

[AVSpeechSynthesisVoice 0x600000934170] Language: ja-JP, Name: Kyoko, Quality: Enhanced [com.apple.voice.enhanced.ja-JP.Kyoko], 
[AVSpeechSynthesisVoice 0x600000934400] Language: ja-JP, Name: Otoya, Quality: Enhanced [com.apple.voice.enhanced.ja-JP.Otoya], 
[AVSpeechSynthesisVoice 0x60000093bca0] Language: ja-JP, Name: Kyoko, Quality: Default [com.apple.voice.compact.ja-JP.Kyoko], 
[AVSpeechSynthesisVoice 0x60000093bef0] Language: ja-JP, Name: Hattori, Quality: Default [com.apple.ttsbundle.siri_Hattori_ja-JP_compact], 
[AVSpeechSynthesisVoice 0x600000934330] Language: ja-JP, Name: Otoya, Quality: Default [com.apple.voice.compact.ja-JP.Otoya], 
[AVSpeechSynthesisVoice 0x6000009343d0] Language: ja-JP, Name: O-Ren, Quality: Default [com.apple.ttsbundle.siri_O-Ren_ja-JP_compact],

This matches what is listed under System Voice in System Settings.
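
Instead of picking entries out of the log by eye, the list can be filtered in code. This is a rough sketch. `voices(for:in:)` is a hypothetical helper, and sorting by the raw value of `quality` assumes that a higher raw value means a higher-quality voice.

```swift
import AVFoundation

/// Hypothetical helper: voices for one language code, higher quality first.
/// `all` defaults to the installed voices and can be replaced for testing.
func voices(for language: String,
            in all: [AVSpeechSynthesisVoice] = AVSpeechSynthesisVoice.speechVoices())
    -> [AVSpeechSynthesisVoice] {
    all.filter { $0.language == language }
        .sorted { $0.quality.rawValue > $1.quality.rawValue }
}

// List the name and identifier of each Japanese voice.
let japaneseVoices = voices(for: "ja-JP")
for voice in japaneseVoices {
    print(voice.name, voice.identifier)
}

// Use the first (highest-quality) voice, if any is installed.
let sample = AVSpeechUtterance(string: "こんにちは")
if let best = japaneseVoices.first {
    sample.voice = best
}
```

Which voices appear depends on what is installed on the device, so the result can be empty.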

_ 【Source Code】

Available on GitHub.
https://github.com/murakami/workbook/tree/master/multiplatform/TextToSpeech - GitHub
