I wanted to offer some feedback with an eye (ear?) toward using automated transcription tools. Google's best practices for their Speech-to-Text API say:
For optimal results...Use a lossless codec to record and transmit audio. FLAC or LINEAR16 is recommended. If your application must use a lossy codec to conserve bandwidth, we recommend the AMR_WB, OGG_OPUS or SPEEX_WITH_HEADER_BYTE codecs, in that preferred order.
"AMR_WB" is defined as "Adaptive Multi-Rate Wideband" with a sample rate of 16kHz. Plain ol' AMR (narrow band, as used in street-amr.amr
) is not recommended at all. Possibly helpful—Wikipedia claims that Android provides a mechanism to encode AMR Wideband:
For encoding, another open-source library exists as well, provided by VisualOn. It is included in the Android mobile operating system.
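To make Google's stated preference order concrete, here is a minimal sketch of how an app might choose among whatever encodings a device can produce. The helper name `pick_encoding` and the `allow_lossy` flag are hypothetical, invented for illustration. The strings are just the labels from the quoted guidance, and the relative order of FLAC and LINEAR16 is arbitrary here, since Google lists them as equals.

```python
# Hypothetical helper: names and structure are illustrative, not part of any API.
# Labels are taken from the quoted Google guidance.
LOSSLESS = ["FLAC", "LINEAR16"]  # order between these two is arbitrary
LOSSY_PREFERRED = ["AMR_WB", "OGG_OPUS", "SPEEX_WITH_HEADER_BYTE"]


def pick_encoding(available, allow_lossy=True):
    """Return the most preferred encoding present in `available`, or None.

    Lossless codecs are tried first; the lossy codecs follow in the order
    the quoted guidance gives. Anything else (such as narrowband "AMR")
    is never selected.
    """
    candidates = LOSSLESS + (LOSSY_PREFERRED if allow_lossy else [])
    for name in candidates:
        if name in available:
            return name
    return None


if __name__ == "__main__":
    # A device that offers narrowband AMR, AMR Wideband, and Opus.
    print(pick_encoding({"AMR", "AMR_WB", "OGG_OPUS"}))
```

Returning `None` when only narrowband AMR is on offer is a design choice in this sketch: it forces the caller to decide explicitly whether to fall back to a codec the guidance does not recommend.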
IBM is more cavalier about the situation and oddly recommends only a sample depth, not a particular codec, sample rate, or bitrate:
With Speech to Text, you can safely use lossy compression to maximize the amount of audio that you can send to the service with a recognition request. Because the dynamic range of the human voice is more limited than, say, music, speech can accommodate a bit rate that is much lower than other types of audio. For speech recognition, IBM® recommends that you use 16 bits per sample for your audio and employ a format that compresses the audio data.
Although I am but a mere developer and can't speak directly about use cases in the field, I do think having a CD-quality, archival option is important in general—so thumbs up to the following:
PS: @Tino_Kreutzer, your 64 vs. 192 kbps test uses AC3, not AAC. That said, I can definitely hear the difference between 64 and 320 kbps AAC at a 32 kHz sampling rate. Here's a brief speech sample I recorded as 44.1 kHz WAV on a Pixel XL (first-generation, using this app) and then encoded with ffmpeg using:
For fun, I also subtracted (by inverting and adding) the decoded 64 kbps waveform from the decoded 320 kbps waveform, yielding a difference that gives a sense of the content lost by encoding at a lower bitrate: 64 kbps 32 khz -subtracted from- 320 kbps 32 khz.zip (2.1 MB).
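For anyone who wants to repeat the subtraction, here is a rough sketch of the invert-and-add step on decoded 16-bit PCM samples. It is not the exact procedure I used. The function name `difference_signal` is made up for illustration, and the sketch assumes both decoded signals share a sample rate and are already time-aligned (lossy encoders can add a small delay, so alignment may need to be checked first).

```python
from array import array

INT16_MIN, INT16_MAX = -32768, 32767


def difference_signal(reference, degraded):
    """Subtract `degraded` from `reference`, sample by sample.

    Subtraction is done as "invert and add": each degraded sample is
    negated and summed with the matching reference sample. The result is
    clipped to the signed 16-bit range and truncated to the shorter input.
    Assumes the two inputs are time-aligned sequences of 16-bit samples.
    """
    n = min(len(reference), len(degraded))
    out = array("h")
    for i in range(n):
        mixed = reference[i] + (-degraded[i])  # invert, then add
        out.append(max(INT16_MIN, min(INT16_MAX, mixed)))
    return out


if __name__ == "__main__":
    hi_rate = [100, -200, 32767]   # stand-in for decoded 320 kbps samples
    lo_rate = [40, -250, -32768]   # stand-in for decoded 64 kbps samples
    print(list(difference_signal(hi_rate, lo_rate)))
```

Whatever remains in the output is the content that differs between the two encodes. Identical inputs produce silence.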
PPS: Do we get to dodge the concerns expressed earlier about codec patents and licensing because the encoding is handled by the Android OS and therefore not our problem?