Mean opinion score (MOS) is a test that has been used for decades in telephone networks to obtain the human user's view of the quality of the network. Historically, as the word "opinion" in its name implies, MOS was a subjective measurement in which listeners sat in a "quiet room" and scored call quality as they perceived it. Per ITU-T Recommendation P.800: "The talker should be seated in a quiet room with volume between 30 and 120 m³ and a reverberation time less than 500 ms (preferably in the range 200-300 ms). The room noise level must be below 30 dBA with no dominant peaks in the spectrum." For VoIP, the measurement is more objective: the score is instead calculated from the performance of the IP network over which the call is carried.
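One widely used way to compute such a score is the E-model of ITU-T Recommendation G.107, which combines network impairments into a transmission rating factor R and then maps R to an estimated MOS. The text above does not say which calculation is meant, so the following Python sketch only illustrates that final mapping. The function name is hypothetical, and deriving R from delay, packet loss and codec is out of scope here.

```python
def r_factor_to_mos(r: float) -> float:
    """Map an E-model rating factor R to an estimated MOS.

    Uses the R-to-MOS conversion given in ITU-T G.107. How R itself is
    obtained from network measurements is not shown (assumed to be given).
    """
    if r <= 0:
        return 1.0   # worst case: the estimate bottoms out at 1
    if r >= 100:
        return 4.5   # best case: the estimate saturates at 4.5, not 5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


if __name__ == "__main__":
    # Illustrative R values only; they are not measurements from the article.
    for r in (50, 70, 80, 93.2):
        print(f"R = {r:5.1f} -> estimated MOS {r_factor_to_mos(r):.2f}")
```

Note that a score obtained this way is an estimate of what listeners would report, not an average of actual listener ratings.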
In multimedia (audio, voice telephony, or video), especially when codecs are used to reduce the bandwidth requirement (for example, of a digitized voice connection from the standard 64 kbit/s PCM), the MOS provides a numerical indication of the perceived quality of received media from the users' perspective after compression and/or transmission. The MOS is expressed as a single number in the range 1 to 5, where 1 is the lowest perceived audio quality, and 5 is the highest.
The MOS is generated by averaging the results of a set of standard, subjective tests where a number of listeners rate the audio quality of test sentences read aloud by both male and female speakers over the communications medium being tested. A listener is required to give each sentence a rating using the following rating scheme:
MOS | Quality | Impairment |
5 | Excellent | Imperceptible |
4 | Good | Perceptible but not annoying |
3 | Fair | Slightly annoying |
2 | Poor | Annoying |
1 | Bad | Very annoying |
The MOS is the arithmetic mean of all the individual scores and can range from 1 (worst) to 5 (best).
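The averaging step can be sketched in a few lines of Python. This is a minimal illustration, assuming each listener's rating is an integer from 1 to 5 as in the scheme above; the function name and the sample ratings are made up for the example.

```python
def mean_opinion_score(ratings):
    """Return the arithmetic mean of individual listener ratings (1 to 5)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        # Each individual score must be one of the five scale points.
        if rating not in (1, 2, 3, 4, 5):
            raise ValueError(f"rating out of range: {rating!r}")
    return sum(ratings) / len(ratings)


if __name__ == "__main__":
    # Hypothetical ratings from eight listeners for one test condition.
    sample = [4, 5, 3, 4, 4, 3, 5, 4]
    print(f"MOS = {mean_opinion_score(sample):.2f}")
```

Because the result is a mean, it is usually not a whole number even though every individual rating is.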
Compressor/decompressor (codec) systems and digital signal processing (DSP) are commonly used in voice communications and can be configured to conserve bandwidth, but there is a trade-off between voice quality and bandwidth conservation. The best codecs provide the most bandwidth conservation while producing the least degradation of voice quality. Bandwidth can be measured quantitatively, but voice quality requires human interpretation, although estimates of voice quality can be made by automatic test systems.
As an example, the following are mean opinion scores measured for one implementation of each of several codecs:
Codec | Data Rate [kbit/s] | MOS |
G.711 (ISDN) | 64 | 4.1 |
iLBC | 15.2 | 4.14 |
AMR | 12.2 | 4.14 |
G.729 | 8 | 3.92 |
G.723.1 r63 | 6.3 | 3.9 |
GSM EFR | 12.2 | 3.8 |
G.726 ADPCM | 32 | 3.85 |
G.729a | 8 | 3.7 |
GSM FR | 12.2 | 3.5 |
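The trade-off between bandwidth and quality can be explored with the figures in the table above. The following Python sketch picks the highest-scoring codec that fits a given bandwidth budget. The helper name `best_codec_within` and the tie-break in favour of the lower data rate are assumptions made for this example, and the scores are those of the one implementation tabulated above, not general properties of the codecs.

```python
from typing import NamedTuple, Optional


class Codec(NamedTuple):
    name: str
    rate_kbits: float  # data rate in kbit/s
    mos: float         # mean opinion score from the table above


# Values copied from the table above (one implementation only).
CODECS = [
    Codec("G.711 (ISDN)", 64.0, 4.1),
    Codec("iLBC", 15.2, 4.14),
    Codec("AMR", 12.2, 4.14),
    Codec("G.729", 8.0, 3.92),
    Codec("G.723.1 r63", 6.3, 3.9),
    Codec("GSM EFR", 12.2, 3.8),
    Codec("G.726 ADPCM", 32.0, 3.85),
    Codec("G.729a", 8.0, 3.7),
    Codec("GSM FR", 12.2, 3.5),
]


def best_codec_within(budget_kbits: float) -> Optional[Codec]:
    """Return the highest-MOS codec whose data rate fits the budget.

    Ties on MOS are broken in favour of the lower data rate (an assumption
    of this sketch). Returns None if no codec fits.
    """
    candidates = [c for c in CODECS if c.rate_kbits <= budget_kbits]
    if not candidates:
        return None
    return max(candidates, key=lambda c: (c.mos, -c.rate_kbits))


if __name__ == "__main__":
    for budget in (64, 16, 10, 7, 5):
        choice = best_codec_within(budget)
        label = choice.name if choice else "none fits"
        print(f"budget {budget:>2} kbit/s -> {label}")
```

A real selection would also weigh factors this table does not cover, such as delay, robustness to packet loss, and processing cost.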