PESQ Perceptual Audio Testing with APx

Created on 2012-09-28 20:23:00

Perceptual audio tests measure how people perceive sound quality. They are useful both for evaluating small differences in high-quality music and for evaluating larger differences in lower-quality voice material; this article focuses on the latter, using PESQ (Perceptual Evaluation of Speech Quality).

Perceptual voice quality tests are especially valuable for devices that are bandwidth- and/or bitrate-limited and whose codecs significantly alter the sound. In these cases, many compromises must be made, and perceptual tests help to determine which balance of compromises will lead to the best voice intelligibility. In fact, with some of these devices it can be difficult to get consistent or comparable results using standard sine wave tests of electrical properties, such as frequency response and distortion. Ultimately, a combination of perceptual and electrical audio tests gives the most complete picture of the performance of these devices.

Subjective vs. Objective Perceptual Tests

In subjective perceptual measurements, a group of people is assembled and asked to judge the sound quality of various audio clips, typically on a scale from 1 to 5. Meaningful results require a carefully selected, representative panel of listeners and careful control over the listening environment. Averaging all the individual scores gives the Mean Opinion Score (MOS).
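The averaging step can be sketched in a few lines of Python. This is only an illustration: the function name and the listener ratings are hypothetical, and the 1-to-5 scale is assumed to be the Absolute Category Rating scale of ITU-T P.800 (5 = excellent, 4 = good, 3 = fair, 2 = poor, 1 = bad).

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Return the Mean Opinion Score: the arithmetic mean of listener ratings.

    Ratings are assumed to use a 1 (bad) to 5 (excellent) scale.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    for r in ratings:
        if not 1 <= r <= 5:
            raise ValueError(f"rating {r} is outside the 1-5 scale")
    return mean(ratings)


# Hypothetical ratings from eight listeners for a single clip
ratings = [4, 3, 4, 5, 3, 4, 4, 3]
print(mean_opinion_score(ratings))  # → 3.75
```

In a real listening test, each clip would be rated by many more listeners, and the MOS would usually be reported together with a confidence interval.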

It’s expensive and time-consuming to test with real people, and results vary from group to group. We can overcome these limitations by using tools that incorporate algorithms based on psychoacoustic modeling, which seeks to correlate measurable impairments in the audio with listeners’ opinion scores. Testing with these tools brings other advantages, such as the ability to make small adjustments to a design and quickly observe the results, or to do perceptual testing on a production line, both of which would be essentially impossible with groups of people. These measurements are classified as objective because they are unaffected by listener temperament or the conditions of a listening test, and their results are repeatable.

Signal path: perceptual model, comparison of reference and test signals, cognitive model, and total quality figure.

Fig 1  The underlying concept for perceptual measurement (courtesy of OPTICOM GmbH).


PESQ¹ is one such perceptual quality measurement tool. It was developed by OPTICOM GmbH in Germany and forms the basis of ITU-T Recommendation P.862. It is specifically designed for testing voice quality on low-bandwidth devices, such as mobile phones and smartphones. When used in this context with appropriate test signals, its scores correlate very highly with those obtained from human listeners.

Voice quality testing may be done with a reference signal (full reference, or FR) or without one (no reference, or NR). In full-reference testing, the measurement tool compares an original recording with a copy that has been degraded by the system under test. In no-reference testing, no reference signal is available, and the tool computes a score from the degraded signal alone. No-reference testing is strongly talker-dependent and requires a large sample of voices to achieve accuracy similar to full-reference testing. PESQ uses the full-reference method to evaluate voice quality, as shown in the block diagram below.

Signal path: level align, input filter, time align and equalize, auditory transform, disturbance processing, cognitive modeling, and prediction of perceived speech quality.

Fig 2  PESQ block diagram (courtesy of OPTICOM GmbH).
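The time-alignment stage has to estimate how much later the degraded signal arrives than the reference; this is also the basis of the delay results reported by the PESQ measurement. The procedure in P.862 is considerably more elaborate than what follows, so treat this only as a minimal sketch of the idea: it uses plain cross-correlation over a limited lag range, and the function name, test signal, and sample rate are hypothetical.

```python
def estimate_delay(reference, degraded, max_lag):
    """Return the lag, in samples, that maximizes the cross-correlation
    between `reference` and `degraded`.

    A positive result means the degraded signal arrives later than the
    reference. Only lags from -max_lag to +max_lag are searched.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Sum of products over the samples where the shifted signals overlap
        score = 0.0
        for n, x in enumerate(reference):
            m = n + lag
            if 0 <= m < len(degraded):
                score += x * degraded[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


sample_rate = 8000  # Hz, hypothetical narrowband capture
reference = [0.0, 1.0, -0.5, 0.8, -1.0, 0.3, 0.0, 0.6, -0.7, 0.2]
# Simulated "degraded" signal: delayed by 5 samples and attenuated by about 6 dB
degraded = [0.0] * 5 + [0.5 * x for x in reference]

lag = estimate_delay(reference, degraded, max_lag=8)
print(lag, "samples =", 1000 * lag / sample_rate, "ms")  # → 5 samples = 0.625 ms
```

Because the correlation is a sum of products, the attenuation does not change which lag wins, only the height of the peak.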

PESQ in APx500

AP’s PESQ implementation adds two new APx measurements (PESQ and PESQ Average) to deliver comprehensive results:

The PESQ measurement returns the overall perceptual quality, in MOS or PESQ units, after each voice sample is played, along with a quality vs. time display that helps pinpoint specific problems such as clipped words or dropouts. Additional results show average delay, delay vs. time, and the reference and acquired waveforms.

The PESQ Average measurement allows you to run a collection of different voice samples (as recommended in ITU-T P.862) and then display the resulting overall score.
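Raw PESQ scores (roughly -0.5 to 4.5) are not on the same scale as subjective MOS. ITU-T P.862.1 defines a logistic function that maps them onto a MOS-like scale (MOS-LQO), and P.862.2 specifies a similar mapping for wideband; this is presumably the relationship between the "MOS" and "PESQ" units mentioned above, though the article does not say so. The sketch below applies those mappings and averages several clips in the spirit of the PESQ Average measurement. Whether APx averages raw scores or mapped values is not stated, so the averaging policy, the function names, and the clip scores are assumptions.

```python
import math

# Logistic mapping from raw PESQ score to MOS-LQO:
#   MOS-LQO = 0.999 + 4 / (1 + exp(-a * raw + b))
# (a, b) from ITU-T P.862.1 (narrowband) and P.862.2 (wideband).
MAPPING_COEFFS = {
    "narrowband": (1.4945, 4.6607),
    "wideband": (1.3669, 3.8224),
}


def pesq_to_mos_lqo(raw_score, mode="narrowband"):
    """Map a raw PESQ score (about -0.5 to 4.5) onto the MOS-LQO scale."""
    a, b = MAPPING_COEFFS[mode]
    return 0.999 + 4.0 / (1.0 + math.exp(-a * raw_score + b))


def average_mos_lqo(raw_scores, mode="narrowband"):
    """Average several clips' results (hypothetical policy: map each
    clip to MOS-LQO first, then take the arithmetic mean)."""
    if not raw_scores:
        raise ValueError("at least one score is required")
    return sum(pesq_to_mos_lqo(s, mode) for s in raw_scores) / len(raw_scores)


# Hypothetical raw PESQ scores from four different voice clips
clip_scores = [3.2, 3.6, 2.9, 3.4]
print(round(pesq_to_mos_lqo(4.5), 2))  # best possible narrowband result
print(round(average_mos_lqo(clip_scores), 2))
```

Note that even a perfect raw score of 4.5 maps to noticeably less than 5, which mirrors the fact that listeners rarely award the maximum rating.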

Both measurements let you analyze the entire signal, or only the active speech or silence portions. When 16 kHz sample rate voice clips are played, you can choose either PESQ Narrowband (ITU-T P.862) or PESQ Wideband (ITU-T P.862.2) analysis.
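The article does not describe how APx separates active speech from silence. A common, simple approach is to classify short frames by their RMS level, and the sketch below does exactly that; the 20 ms frame size, the -40 dBFS threshold, the function name, and the test signal are hypothetical choices, not APx or P.862 parameters.

```python
import math


def split_active_and_silence(samples, frame_len, threshold_db=-40.0):
    """Classify consecutive, non-overlapping frames as active speech or silence.

    Samples are assumed to be floats in [-1.0, 1.0], so levels are in dBFS.
    Returns two lists of frame start indices. A trailing partial frame is ignored.
    """
    active, silence = [], []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        level_db = 20 * math.log10(rms) if rms > 0 else float("-inf")
        if level_db > threshold_db:
            active.append(start)
        else:
            silence.append(start)
    return active, silence


sample_rate = 8000
frame_len = 160  # 20 ms at 8 kHz (hypothetical frame size)
# Hypothetical test signal: 20 ms of digital silence, 20 ms of a 400 Hz tone
# at -20 dBFS peak, then 20 ms of a very low-level signal at -60 dBFS
signal = (
    [0.0] * frame_len
    + [0.1 * math.sin(2 * math.pi * 400 * n / sample_rate) for n in range(frame_len)]
    + [0.001] * frame_len
)
active, silence = split_active_and_silence(signal, frame_len)
print("active frames start at", active)    # → [160]
print("silent frames start at", silence)   # → [0, 320]
```

A fixed threshold like this is crude; practical voice activity detectors usually adapt to the background level and smooth decisions so that short pauses between words are not labeled as silence.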

Entire screen with voice file list and MOS score.

Fig 3  PESQ Average measurement in APx500.

PESQ is fully integrated into the APx500 measurement navigator in the same way as the existing electrical measurements, giving it access to the same automation, statistics, and reporting features. PESQ measurements can also be combined with electrical measurements to build comprehensive automated test suites. All standard and optional I/O modules may be used, including unbalanced/balanced analog, unbalanced/balanced/optical digital, serial digital, Bluetooth, HDMI, and PDM.

1 PESQ® is a registered trademark of OPTICOM GmbH, Germany. Used under license.