
How to properly set up AVAudioSession and AVAudioEngine when using SFSpeechRecognizer and AVSpeechSynthesizer

  Jad Ghadry  ·  asked 6 years ago

    I am trying to create an app that uses both STT (Speech to Text) and TTS (Text to Speech). However, I have run into some obscure issues and would appreciate your expertise.

    The app consists of a button in the center of the screen which, when tapped, starts the desired speech recognition using the code below.

    import AVFoundation
    import Speech



    // MARK: - Constant Properties
    
    let audioEngine = AVAudioEngine()
    
    
    
    // MARK: - Optional Properties
    
    var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
    var recognitionTask: SFSpeechRecognitionTask?
    var speechRecognizer: SFSpeechRecognizer?
    
    
    
    // MARK: - Functions
    
    internal func startSpeechRecognition() {
    
        // Instantiate the recognitionRequest property.
        self.recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
    
        // Set up the audio session.
        let audioSession = AVAudioSession.sharedInstance()
        do {
            try audioSession.setCategory(.record, mode: .measurement, options: [.defaultToSpeaker, .duckOthers])
            try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
        } catch {
            print("An error has occurred while setting the AVAudioSession.")
        }
    
        // Set up the audio input tap.
        let inputNode = self.audioEngine.inputNode
        let inputNodeFormat = inputNode.outputFormat(forBus: 0)
    
        self.audioEngine.inputNode.installTap(onBus: 0, bufferSize: 512, format: inputNodeFormat, block: { [unowned self] buffer, time in
            self.recognitionRequest?.append(buffer)
        })
    
        // Start the recognition task.
        guard
            let speechRecognizer = self.speechRecognizer,
            let recognitionRequest = self.recognitionRequest else {
                fatalError("One or more properties could not be instantiated.")
        }
    
        self.recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequest, resultHandler: { [unowned self] result, error in
    
            if error != nil {
    
                // Stop the audio engine and recognition task.
                self.stopSpeechRecognition()
    
            } else if let result = result {
    
                let bestTranscriptionString = result.bestTranscription.formattedString
    
                // `command` is assumed to be a String property declared elsewhere in this class.
                self.command = bestTranscriptionString
                print(bestTranscriptionString)
    
            }
    
        })
    
        // Start the audioEngine.
        do {
            try self.audioEngine.start()
        } catch {
            print("Could not start the audioEngine property.")
        }
    
    }
    
    
    
    internal func stopSpeechRecognition() {
    
        // Stop the audio engine.
        self.audioEngine.stop()
        self.audioEngine.inputNode.removeTap(onBus: 0)
    
        // End and deallocate the recognition request.
        self.recognitionRequest?.endAudio()
        self.recognitionRequest = nil
    
        // Cancel and deallocate the recognition task.
        self.recognitionTask?.cancel()
        self.recognitionTask = nil
    
    }
    

    Used on its own, this code works like a charm. However, as soon as I tried to have an AVSpeechSynthesizer speak the transcribed text, the trouble started.
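
    For context, the TTS side was nothing exotic, roughly the following (a sketch; speakCommand and the voice are placeholders, not my exact code):

    let speechSynthesizer = AVSpeechSynthesizer()

    internal func speakCommand() {
        // Speak the latest transcription stored in `command`.
        let utterance = AVSpeechUtterance(string: self.command)
        utterance.voice = AVSpeechSynthesisVoice(language: "en-US")
        self.speechSynthesizer.speak(utterance)
    }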

    I went through the suggestions in multiple Stack Overflow posts, which recommended replacing

    audioSession.setCategory(.record, mode: .measurement, options: [.defaultToSpeaker, .duckOthers])
    

    with the following:

    audioSession.setCategory(.playAndRecord, mode: .default, options: [.defaultToSpeaker, .duckOthers])
    

    But this was in vain; the app was still crashing after running STT and TTS one after the other.

    The solution was for me to use this instead of the aforementioned:

    audioSession.setCategory(.multiRoute, mode: .default, options: [.defaultToSpeaker, .duckOthers])
    

    1 Answer  |  6 years ago
  Ángel Téllez, Daniel Diehl  ·  6 years ago

    I am working on an app that uses both SFSpeechRecognizer and AVSpeechSynthesizer as well, and for me .setCategory(.playAndRecord, mode: .default) works fine; according to Apple, it is the best category for what we need. I am even able to .speak() every transcription of the recognition task without any problem.

    As for why the .multiRoute category works: I guess the problem lies with the AVAudioInputNode. If you see an error in the console like

    Terminating app due to uncaught exception 'com.apple.coreaudio.avfaudio', reason: 'required condition is false: nullptr == Tap()'

    then you either need to reset the audio engine or remove the input node's tap before installing a new one. The .multiRoute category seems to work because, by its nature, it is able to reuse the same input node with different audio streams and routes.
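
    For instance, a defensive pattern like this (my own sketch, not code from the question) avoids reinstalling a tap on a bus that already has one:

    // Remove any previous tap before installing a new one, so reusing the
    // same AVAudioInputNode does not hit the `nullptr == Tap()` condition.
    let inputNode = audioEngine.inputNode
    inputNode.removeTap(onBus: 0)
    let format = inputNode.outputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
        self.recognitionRequest.append(buffer)
    }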

    Below is the logic I followed in my program, based on Apple's WWDC session :

    override func viewDidLoad() { // or init(), or wherever appropriate
        super.viewDidLoad()
        try? AVAudioSession.sharedInstance().setCategory(.playAndRecord, mode: .default)
    }
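
    This only sets the category; if nothing else in your app activates the session, you may also need to activate it, as the question's own setup does:

    try? AVAudioSession.sharedInstance().setActive(true, options: .notifyOthersOnDeactivation)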
    

    Verification / permissions

    func shouldProcessSpeechRecognition() {
        guard AVAudioSession.sharedInstance().recordPermission == .granted,
            speechRecognizerAuthorizationStatus == .authorized,
            let speechRecognizer = speechRecognizer, speechRecognizer.isAvailable else { return }
            //Continue only if we have authorization and recognizer is available
    
            startSpeechRecognition()
    }
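
    If you have not requested these permissions yet, a sketch like this would do it (speechRecognizerAuthorizationStatus is assumed to be a stored property you update yourself):

    SFSpeechRecognizer.requestAuthorization { status in
        // Persist the status, e.g. into speechRecognizerAuthorizationStatus.
    }
    AVAudioSession.sharedInstance().requestRecordPermission { granted in
        // Only enable the record button when `granted` is true.
    }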
    

    Start STT

    func startSpeechRecognition() {
        let format = audioEngine.inputNode.outputFormat(forBus: 0)
        audioEngine.inputNode.installTap(onBus: 0, bufferSize: 1024, format: format) { [unowned self] (buffer, _) in
            self.recognitionRequest.append(buffer)
        }
        audioEngine.prepare()
        do {
            try audioEngine.start()
        recognitionTask = speechRecognizer!.recognitionTask(with: recognitionRequest, resultHandler: {...})
        } catch {...}
    }
    

    Finish STT

    func endSpeechRecognition() {
        recognitionTask?.finish()
        stopAudioEngine()
    }
    

    Cancel STT

    func cancelSpeechRecognition() {
        recognitionTask?.cancel()
        stopAudioEngine()
    }
    

    Stop the audio engine

    func stopAudioEngine() {
        audioEngine.stop()
        audioEngine.inputNode.removeTap(onBus: 0)
        recognitionRequest.endAudio()        
    }
    

    And whenever you want, you can create an AVSpeechSynthesizer reference and speak an utterance.
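
    A minimal sketch of that last step, keeping a strong reference so the synthesizer is not deallocated mid-utterance (names and text are my own):

    let speechSynthesizer = AVSpeechSynthesizer()

    func speak(_ text: String) {
        let utterance = AVSpeechUtterance(string: text)
        speechSynthesizer.speak(utterance)
    }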