PIKA TECHNOLOGIES INC.
HMP Software

Performance of Host-Based Media Processing

 

The viability of host-based software in providing media processing capabilities continues to expand the possibilities for voice application development. Over time, we will see a larger percentage of applications deployed on host-based rather than board-based systems. This trend is driven primarily by the relentless increase in processor speed and capabilities. Host-based media processing is also a natural fit for VoIP-based applications, since these applications can now be delivered as software-only solutions.

Vendors of host-based media processing implementations face a number of technical challenges in providing a viable solution to telephony application developers. The main challenges are:

1. System Capacity: optimizing media processing algorithms to take advantage of the computing power now available on standard desktop platforms.
2. Robustness: preventing other applications running on the platform and the media processing from interfering with each other when competing for CPU resources.
3. Latency: ensuring that the latency introduced by the media processing does not affect the perceived quality of the audio signal being processed.

This white paper discusses each of these challenges in detail and explains how PIKA’s host-based media processing implementation meets them.

Defining System Capacity

The capacity of a system is the maximum number of active media processing channels (such as play, record, DTMF detection, and echo cancellation) that can be supported by the application on a platform. The greater the capacity of a single platform, the lower the cost per port for an application, and the greater the value the application provides to a customer. To increase the capacity of an application, the media processing must be as efficient as possible.

The factors that affect the system capacity of host-based media processing applications on a platform are:

• The acceptable percentage of CPU capacity that can be dedicated to media processing.
• The types of media processing that are performed on each active channel.
• The specifications (CPU speed and architecture, operating system, amount of RAM, cache size, and NIC speed) of the platform running the application.

PIKA has performed extensive performance tests with a variety of media processing applications executed on a wide range of platform configurations. Figure 1 lists a representative sample of applications that PIKA has benchmarked and the media processing that was performed on each channel for the duration of the test.

Application | Media Processing Performed
Gateway | VoIP interface (using G.711 or G.729 codec); echo cancellation (12 ms tail length); PSTN interface
IVR | VoIP interface (using G.711 or G.729 codec) or PSTN interface; play for all channels for 70% of the call duration; DTMF detection
Conference | VoIP interface (using G.711 or G.729 codec) or PSTN interface; conference; DTMF detection

Figure 1: Benchmark Applications Media Processing Functions

Figure 2 lists the system capacities measured, using a medium-powered AMD platform and a higher-powered Intel platform, for each of the applications on different operating systems.

AllOnHost 2.0 Benchmark System Capacity
Number of Channels at 60% CPU Utilization

Intel platform: Dual Xeon Nocona with 2 x 3.0 GHz, 1 GB RAM, 2 MB L2 cache, 1 GB NIC
AMD platform: Dual Opteron 242 Sledgehammer with 2 x 1.6 GHz, 1 GB RAM, 1 MB L2 cache, 1 GB NIC

Application | Intel, Windows XP | Intel, SuSe 10 | AMD, Windows XP | AMD, SuSe 10
G.711 Gateway | 290 (1) | 160 | 160 | 130
G.729 Gateway | 140 | 60 | 100 | 50
PSTN IVR | 490 | 530 | 350 | 460
G.711 IVR | 430 | 430 | 280 | 320
G.729 IVR | 170 | 80 | 140 | 70
PSTN Conference | 540 | 600 | 380 | 510
G.711 Conference | 540 | 590 | 300 | 350
G.729 Conference | 170 | 80 | 150 | 70

Note: (1) The CPU utilization for different numbers of channels for this application is shown in Figure 3.

Figure 2: System Capacities for Applications on Intel and AMD Platforms Using Windows and SuSe

To see how well PIKA’s host-based media processing implementation scales, let us examine the CPU utilization for the G.711 gateway application under the Windows operating system. Figure 3 shows the CPU utilization for this application with different numbers of active channels.

 illustration here

Figure 3: System Capacity for a G.711 Gateway Application on a Dual Xeon Windows Platform

This graph shows the linearity of the CPU utilization. Other benchmark applications displayed similar linear increases in CPU utilization with increases in the number of active channels. The linearity of the measurements indicates two important facts:

1. The host-based media processing CPU utilization can be confidently predicted for different channel densities.
2. As the speed of the processors grows, the channel density supported by applications will grow at the same rate.
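
The first point can be illustrated with a minimal sketch. It takes the single reference point from Figure 2 (290 G.711 gateway channels at 60% CPU utilization on the Intel platform under Windows XP) and assumes utilization grows linearly from zero with the number of channels. The zero intercept and the function names are assumptions made for illustration; the measured curve in Figure 3 may include a fixed baseline load.

```python
import math

# Reference point from Figure 2: G.711 gateway, Intel platform, Windows XP.
REF_CHANNELS = 290
REF_UTILIZATION = 0.60


def cpu_utilization(channels: int) -> float:
    """Estimate CPU utilization (0.0 to 1.0) for a channel count.

    Assumes a straight line through the origin and the reference point.
    """
    return REF_UTILIZATION * channels / REF_CHANNELS


def max_channels(target_utilization: float) -> int:
    """Estimate how many channels fit under a target CPU utilization."""
    estimate = REF_CHANNELS * (target_utilization / REF_UTILIZATION)
    # The small epsilon guards against floating-point results such as 144.999...
    return math.floor(estimate + 1e-9)


print(max_channels(0.30))                 # channels at half the reference load
print(f"{cpu_utilization(200):.0%}")      # estimated utilization for 200 channels
```

An estimate like this is only a starting point; the measured densities for a specific platform should take precedence.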

Please contact PIKA Customer Support (Phone: +1-613-591-1555, Email: Support@pikatech.com) for assistance in determining the expected channel densities supported by a specific deployment platform for a specific set of media processing applications.

 Robustness of Host-Based Processing

When implementing a host-based media processing solution there are two key robustness objectives:

1. Host-based media processing must receive sufficient CPU capacity to perform all the required functions in real-time.
2. Other applications running on the same platform must regularly receive sufficient CPU capacity to perform all their required activities without noticeable deterioration of performance or noticeable pauses in execution.

Ideally, the CPU would be partitioned so that the host-based media processing receives a specific proportion of the CPU capacity and other applications receive the rest. For example, on a 3.2 GHz processor, the host-based media processing could be guaranteed 25% of the CPU capacity, and other applications would perform as if they were running on a dedicated 2.4 GHz platform.
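
The arithmetic behind this example can be written out as a short sketch. The function name is hypothetical, and the model treats clock speed as a simple proxy for CPU capacity, which is a simplification.

```python
def effective_clock_ghz(clock_ghz: float, media_share: float) -> float:
    """Clock speed left for other applications after reserving a CPU share.

    media_share is the fraction of CPU capacity reserved for media
    processing (0.25 means 25%).
    """
    if not 0.0 <= media_share <= 1.0:
        raise ValueError("media_share must be between 0 and 1")
    return clock_ghz * (1.0 - media_share)


# 25% of a 3.2 GHz processor reserved for media processing.
print(f"{effective_clock_ghz(3.2, 0.25):.1f} GHz")
```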

 Why is CPU partitioning important? If there is not strong partitioning between host-based media processing and application CPU utilization, a number of problems arise. If the host-based media processing does not receive sufficient CPU capacity, on a regular basis, it is not able to process all the media in real-time. The quality of the audio will deteriorate and sections of the audio may be dropped, causing distortion and choppiness in the audio signal. If other applications do not receive sufficient CPU capacity on a regular basis, their performance becomes slow, choppy, and non-responsive. In extreme cases, the platform may not respond to mouse movements or keystrokes.

Windows and Linux are not real-time operating systems and, as such, are not designed to easily partition CPU usage. They have no built-in mechanisms to ensure that a process does not monopolize the CPU to the detriment of other processes.

PIKA’s solution is a real-time microkernel that acts as a firewall between applications and media processing. The microkernel allows non-real-time operating systems to serve the real-time demands of processing voice media without allowing the processor to be monopolized. The PIKA host-based media processing implementation partitions the available CPU capacity so that in every tick, the host-based media processing function execution is guaranteed to be allotted sufficient CPU capacity in real-time while ensuring that other processes running on the same platform receive sufficient CPU capacity to run smoothly and provide good performance.

PIKA’s microkernel is designed to work on single-processor platforms and to balance the load across CPUs on hyper-threaded or multiprocessor platforms.

 To test PIKA’s microkernel and host-based media processing, a number of formal and informal tests were executed in conjunction with media processing activity. No combination of commercial applications or CPU load had any effect on the quality of the media processed audio.

PIKA performed the following formal tests, on both Windows and Linux operating systems, to verify the robustness of its microkernel:

• Test the interference from and with user-level processes (normal applications).
• Test the interference from and with kernel-level processes (device drivers, such as NIC drivers, and disk drivers).
• Test competition for PCI bus resources.

To test interference between media processing and user-level processes:

1. A media processing application was set up to provide the continuous playing of a recorded message to a large conference with 120 members.
2. A phone was used to connect to the conference to monitor the audio quality.
3. A base-line measure of the CPU utilization was taken. 
4. A normal-priority user application was executed that consumed a continually increasing percentage of CPU capacity.
5. As the user application consumed more and more CPU capacity, the quality of the audio recording received from the conference was monitored.

 Results: There was no change to the quality of the audio, even when the user application saturated the CPU utilization at 100%.
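
A load generator of the kind used in step 4 could look like the following sketch. It is not PIKA's test tool; the function names, the 100 ms duty-cycle period, and the ten-step ramp are assumptions chosen for illustration. Each call loads a single core, so saturating a multi-core platform would need one such process per core.

```python
import time


def burn_cpu(load: float, duration_s: float, period_s: float = 0.1) -> float:
    """Keep one core busy for roughly `load` (0.0 to 1.0) of each period.

    Alternates busy-waiting and sleeping for `duration_s` seconds and
    returns the total time spent busy-waiting, in seconds.
    """
    busy_total = 0.0
    end = time.perf_counter() + duration_s
    while time.perf_counter() < end:
        start = time.perf_counter()
        busy_until = start + load * period_s
        while time.perf_counter() < busy_until:
            pass  # spin to consume CPU
        busy_total += time.perf_counter() - start
        idle = period_s - (time.perf_counter() - start)
        if idle > 0:
            time.sleep(idle)  # yield the CPU for the rest of the period
    return busy_total


def ramp_schedule(steps: int) -> list[float]:
    """Evenly spaced load fractions ending at 1.0 (full saturation)."""
    return [i / steps for i in range(1, steps + 1)]


def ramp_load(steps: int = 10, step_duration_s: float = 5.0) -> None:
    """Step the load upward while the audio quality is monitored."""
    for load in ramp_schedule(steps):
        print(f"target load: {load:.0%}")
        burn_cpu(load, step_duration_s)
```

Calling `ramp_load()` steps the target load from 10% to 100% over about 50 seconds.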

To test interference between media processing and kernel-level processes, a similar test was performed, only this time the test process consumed CPU capacity at the kernel level.

Results: The results were identical to the previous test with no change to the quality of the audio, even when the kernel application saturated the CPU utilization at 100%.

To test competition for PCI bus resources, a Vmetro board was installed on the test platform. This board flooded the PCI bus with data, causing congestion on the PCI bus.

Results: Again, there was no change to the quality of the audio being received from the conference.

Finally, informal tests were performed that more accurately simulated real application processing. The following tasks were performed on the test platform while a conferencing application was executing:

• Search for the word “cow” in all files on the hard drive. This is a disk I/O and CPU intensive application that generates a large number of interrupts.
• Copy a large file to a network drive. This function causes a large number of interrupts and network traffic.
• Play an MP3 streaming audio file. This function causes interrupts from the audio card. The quality of the audio heard is very sensitive to the application being starved for CPU capacity.
• Compile source code. Compiling code is a CPU intensive activity.

With these functions running, the recording played to the conference was monitored. 

Results: As with the other tests, there was no change to the quality of the audio being received from the conference. There was also no deterioration in the quality of the streaming audio file being played.

We can see from the above testing that PIKA’s microkernel has succeeded in partitioning the CPU and in isolating the media processing and other applications running on the same platform.

Measuring Latency

In simple terms, latency is the length of time from when you say something until the person on the other end of the phone line hears what you said. The perceived quality of the audio heard in a call is highly dependent on latency. Typically, the latency for PSTN switching and networks, including PIKA’s AllOnHost TDM switching, is very low, on the order of 5 ms, so developers of pure PSTN applications are rarely concerned with latency. On the other hand, elements of a VoIP network (phones, switches, and gateways) generally add significant latency to the audio path; therefore, applications using VoIP must be aware of the total latency of the audio path to ensure that it does not exceed acceptable limits.

How much latency is acceptable? There are two classes of applications that should be considered when determining the acceptable amount of latency: terminating applications such as IVRs, and switching applications such as PBXs, conference bridges, and gateways.

Terminating applications typically imply human interaction with a computer. The application records audio data, plays announcements, and detects DTMF tones or speech generated by the caller. Studies have shown that as long as the application responds within 500 ms, the caller will perceive a good quality connection.

Switching applications typically imply human-to-human interaction. The ITU-T specification G.114 defines three audio quality regions for latency in human-to-human calls.

Latency Range (ms) | Audio Quality
0 to 150 | Acceptable for most applications
150 to 400 | Marginally acceptable - impacts the quality of application
Above 400 | Unacceptable

Figure 5: G.114 Latency Guidelines for Switching Applications
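
The three regions in Figure 5 can be expressed as a small lookup, sketched below. The function name is hypothetical, and because the ranges in the table share their endpoints, assigning exactly 150 ms and exactly 400 ms to the lower region is an assumption.

```python
def classify_latency(latency_ms: float) -> str:
    """Map a one-way latency in milliseconds to a Figure 5 quality region."""
    if latency_ms < 0:
        raise ValueError("latency cannot be negative")
    if latency_ms <= 150:
        return "acceptable"
    if latency_ms <= 400:
        return "marginally acceptable"
    return "unacceptable"


for sample_ms in (60, 120, 250, 450):
    print(sample_ms, classify_latency(sample_ms))
```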

PIKA measured the latency of several connection types using its AllOnHost (host-based) media processing. The results of these measurements are shown in Figure 6. For comparison, the latency measured between two good quality IP phones is also listed.

Equipment Configuration | Measured Latency Range (ms)
TDM switching using AllOnHost – Analog phone to analog phone | <5
Good IP phone directly to good IP phone | 50 to 60
IP Gateway using AllOnHost – Analog phone to IP phone | 50 to 60
Mixed conferencing using AllOnHost – Analog phone to IP phone | 50 to 60
IP conferencing using AllOnHost – IP phone to IP phone | 105 to 120
IP transcoding using AllOnHost – G.711 IP phone to G.729 IP phone | 105 to 120

Figure 6: AllOnHost Latency Measurement

These values are valid for both G.711 and G.729 codecs. All tests used 20 ms packets and good quality IP phones. The measurements were performed using a switched LAN and locally-connected IP and analog phones.

Note that the latency of the AllOnHost IP gateway is identical to the latency measured between two good quality IP phones and that the latency added by the AllOnHost IP gateway is the same as the latency of a hardware-based IP gateway.

For some applications, the network latency must also be considered when determining the overall latency the callers experience. Figure 7 lists the range of latencies that can be expected, as well as the typical latency for a number of network distances. For comparison, values for PSTN latency for different distances are also given.

Network | Connection | Expected Latency Range (ms) | Typical Latency (ms)
PSTN | Local | 0.5 to 4 | 2
PSTN | National long distance | 2 to 70 | 12
PSTN | International long distance (excluding satellite) | 2 to 150 | 20
IP | Switched LAN | 0.1 to 2 | <1
IP | Metropolitan WAN | 2 to 50 | 20
IP | National WAN | 2 to 150 | 50

Figure 7: Network Connection Latency

To determine the expected latency of an application, take the equipment latency measured for that type of application from Figure 6 and add the network latency, from Figure 7, that the audio signal will encounter as it passes from the speaker to the listener. To demonstrate this, consider two examples: an IP PBX and a conference server.

IP PBX – In this application, shown in Figure 8, the IP PBX provides connectivity between the PSTN network and local and remote IP phones.

When determining the latency for this application, the latency for three different types of connection must be considered:

• Between PSTN and local IP phones (such as A and B)
• Between PSTN and remote IP phones (such as A and C)
• Between local and remote IP phones (such as B and C)

Figure 9 lists the connection type, the connection latency, the networks traversed, the network latency, and the total latency for each type of connection.

The latency for each type of connection is within the 150 ms limit although the connections to the remote IP phone are at the upper end of the acceptable range. Any additional latency introduced by the host-based media processing would cause the perceived quality of the connection to deteriorate.


 Figure 8: IP PBX Application Architecture

Between Phones | Equipment Configuration | Equipment Latency (ms) | Networks Traversed | Network Latency (ms) | Total Latency (ms)
PSTN and Local IP | IP Gateway | 60 | PSTN-LAN | 2+1 | 63
PSTN and Remote IP | IP Gateway | 60 | PSTN-LAN-WAN-LAN | 2+1+50+1 | 114
Local and Remote IP | IP Phone to IP Phone | 60 | LAN-WAN-LAN | 1+50+1 | 112

Figure 9: IP PBX Latency Estimates
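
The totals in Figure 9 follow from adding the equipment latency to the latency of each network segment traversed. The sketch below reproduces that calculation. The dictionary and function names are illustrative. The per-segment values are the ones Figure 9 uses: the upper end of the measured equipment range from Figure 6, the typical local PSTN and National WAN values from Figure 7, and the switched LAN figure of "<1" rounded up to 1 ms.

```python
# Equipment latency in ms (upper end of the ranges in Figure 6).
EQUIPMENT_MS = {
    "IP gateway": 60,
    "IP phone to IP phone": 60,
}

# Per-segment network latency in ms, as used in Figure 9.
NETWORK_MS = {
    "PSTN": 2,   # typical local PSTN latency
    "LAN": 1,    # switched LAN, "<1" rounded up
    "WAN": 50,   # typical National WAN latency
}


def total_latency_ms(equipment: str, networks: list[str]) -> int:
    """Equipment latency plus the latency of every network segment traversed."""
    return EQUIPMENT_MS[equipment] + sum(NETWORK_MS[n] for n in networks)


connections = {
    "PSTN and Local IP": ("IP gateway", ["PSTN", "LAN"]),
    "PSTN and Remote IP": ("IP gateway", ["PSTN", "LAN", "WAN", "LAN"]),
    "Local and Remote IP": ("IP phone to IP phone", ["LAN", "WAN", "LAN"]),
}

for name, (equipment, networks) in connections.items():
    print(f"{name}: {total_latency_ms(equipment, networks)} ms")
```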

Conference Server – In this application, shown in Figure 10, the conference server provides connectivity between phones from the PSTN network and local IP phones.

 When determining the latency for this application, the latency for three different types of connections must be considered:

• Between PSTN phones (such as A and B)
• Between PSTN and IP phones (such as A and C)
• Between IP phones (such as C and D)

Figure 11 lists the connection type, the connection latency, the networks traversed, the network latency, and the total latency for each type of connection.

The latency for each type of connection is within the 150 ms limit although the connection between IP phones is at the upper end of the acceptable range. Any additional latency introduced by the host-based media processing would cause the perceived quality of the connection to deteriorate.

 Figure 10: Conference Server Application Architecture

Between Phones | Equipment Configuration | Equipment Latency (ms) | Networks Traversed | Network Latency (ms) | Total Latency (ms)
PSTN | PSTN conferencing | 4 | PSTN-PSTN | 2+2 | 8
PSTN and IP | Mixed conferencing | 60 | LAN-LAN | 1+1 | 62
IP | IP conferencing | 120 | LAN-LAN | 1+1 | 122

 

Figure 11: Conference Server Latency Estimate
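
To see how close each connection type sits to the 150 ms limit, the remaining headroom can be computed from the totals in Figure 11, as in the sketch below. The function name is hypothetical. The last line considers a case that is not part of Figure 11: an IP conference participant reached over a National WAN, using the typical 50 ms value from Figure 7.

```python
LIMIT_MS = 150  # upper bound of the "acceptable" region in Figure 5

# Total latency per connection type, in ms, from Figure 11.
CONFERENCE_TOTALS_MS = {
    "PSTN": 8,
    "PSTN and IP": 62,
    "IP": 122,
}


def headroom_ms(total_ms: int, limit_ms: int = LIMIT_MS) -> int:
    """Latency budget left before the limit; negative means over the limit."""
    return limit_ms - total_ms


for name, total in CONFERENCE_TOTALS_MS.items():
    print(f"{name}: {headroom_ms(total)} ms of headroom")

# Hypothetical case, not in Figure 11: add a typical National WAN hop (50 ms).
print(f"IP over National WAN: {headroom_ms(122 + 50)} ms of headroom")
```

A negative result for the hypothetical WAN case shows why the IP-to-IP conference path leaves little room for additional delay.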

Summary: Overcoming the Challenges

This white paper has shown that there are significant challenges to implementing a host-based media processing solution. Each of these challenges must be overcome to produce a viable software-only telephony application. PIKA’s microkernel solution ensures that:

• The capacity of applications based on PIKA’s AllOnHost (host-based) media processing can accommodate up to 600 active channels. This is sufficient capacity for cost-effective small-sized to medium-sized applications.

• There is no interference between the AllOnHost media processing and other applications executing on the platform. The media processing is guaranteed to receive sufficient CPU capacity to perform the required functions in real time.

 • The latency added by the host-based media processing is small enough that applications can achieve latency below the 150 ms required for good quality human-to-human conversation. 

 

 

 
 