Would a full audio recording of the entire interview work for your use case @dr_michaelmarks? Could you describe your specific use case?
Even if we record the entire interview (e.g., the way I proposed above), timestamps from the audit feature can mark exactly where each question (screen) starts and ends, so it would still be possible to extract the audio for specific questions only.
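To make that concrete, here is a rough sketch of how the timestamps could be mapped onto the recording. It is only an illustration, not an existing feature: I'm assuming the audit log is a CSV with `event`, `node`, `start` and `end` columns holding millisecond timestamps, and that we know when the recording began. The function name `question_segments` and the `recording_start_ms` value are hypothetical.

```python
import csv
import io


def question_segments(audit_csv_text, recording_start_ms):
    """Return (node, start_s, end_s) offsets into the recording, one per question event.

    Assumes (not confirmed) an audit CSV with columns event,node,start,end,
    where start/end are millisecond timestamps on the same clock as
    recording_start_ms.
    """
    segments = []
    for row in csv.DictReader(io.StringIO(audit_csv_text)):
        # Skip non-question events and rows that have no end timestamp.
        if row["event"] != "question" or not row["end"]:
            continue
        start_s = (int(row["start"]) - recording_start_ms) / 1000
        end_s = (int(row["end"]) - recording_start_ms) / 1000
        segments.append((row["node"], start_s, end_s))
    return segments


# Made-up audit data, for illustration only.
sample = (
    "event,node,start,end\n"
    "form start,,1000,\n"
    "question,/data/name,1500,4000\n"
    "question,/data/age,4000,9500\n"
)
for node, start_s, end_s in question_segments(sample, recording_start_ms=1000):
    print(node, start_s, end_s)
```

Each `(start_s, end_s)` pair could then be handed to whatever audio tool does the cutting, keeping only the questions of interest.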
Another option would be to require audio for the survey, and then add a per-question parameter so that audio is explicitly required only for the questions that set it.
My feeling is that this would be more difficult to implement, but I understand how it could be important for some use cases.