2. Data Communication Networks

Unlike voice communication networks, data communication networks are designed to let computers and peripherals exchange data with one another. Vending machines, robots, cameras, and home appliances have also been linked to networks, with varying degrees of usefulness. The Internet is composed of many different types of computers, operating systems, network protocols, and data links. For this type of project it is important to understand the structure of the Internet and how that structure affects audio data as it travels from one computer to another. Real-time applications in particular require an understanding of how data packets are transmitted and received, how delays are introduced and handled, and how much overhead different protocols add to the data.


2.1 The Structure of the Internet

Between September and December of 1969, four computers at UCLA, the University of California at Santa Barbara, the University of Utah, and the Stanford Research Institute were connected to form the Arpanet (Advanced Research Projects Agency Network). This was the beginning of the Internet. Today the Internet includes over 16.1 million computers that distribute information through interactive TCP/IP services such as the file transfer protocol (FTP), Telnet, and the World Wide Web [1]. An estimated 50 million users of more than 15 million computers can access that information. While the original four-computer Arpanet was fairly simple, the structure of today's Internet is far more complex. In general, it is built from two main types of networks: local area and wide area.


2.1.1 Local Area Networks (LANs)

Computers that are connected together and located in the same room, building, or geographical area form a Local Area Network (LAN). For these computers to "talk" to one another they must speak the same language, or protocol. This is usually done by installing a network card in each computer that supports the desired protocol. Small LANs typically use the Ethernet protocol, so each computer has an Ethernet card. In an Ethernet bus configuration the cards are connected to one another in a chain, and one end of the chain is connected to a bridge (B) (Figure 3). Each floor or building might have one or more LANs, each with its own bridge. The bridges are connected together to form a site-wide (backbone) LAN and forward traffic between the individual LANs (if LANs of different types are connected, routers are used instead). This allows computers at a site such as a university or a company to communicate. One or more file servers (F), printers, CD-ROM drives, and other devices may be connected to the LAN and shared by everyone on the network, allowing users to share both equipment and data.



Figure 3: Site-wide backbone forms larger LAN



2.1.2 Wide Area Networks (WANs)

When two or more computers are connected over a large geographical distance, the resulting network is known as a wide area network (WAN) (Figure 4). A WAN can also be used to interconnect LANs. One end of a site-wide LAN is connected to a gateway (G/W), also known as a router or, more generally, as an intermediate system (IS) or an interworking unit (IWU). The gateway is connected to the WAN, which typically uses a Public-switched Data Network (PSDN) to transfer data.



Figure 4: WANs utilizing the PSDN


The PSDN allows a WAN to connect to a number of other gateways, and the gateways resolve any protocol differences between the LANs. The technical term for connecting all of these different networks (LANs and WANs) is internetworking, so a network of networks is called an internetwork, or internet for short. This generic, lowercase "internet" should not be confused with "the Internet" discussed earlier.


2.1.3 The Public-switched Data Network

Before Public-switched Data Networks (PSDNs), data was transmitted over the Public-switched Telephone Network (PSTN). The PSTN, however, provides only a small amount of bandwidth. For this reason, many companies and organizations built their own PSDNs by leasing high-speed lines from the telephone companies. These high-speed digital backbones operate at rates from 1.544 Mbps (a T1 link) to 274.176 Mbps (a T4 link). Other interconnection methods include microwave and satellite links. Leasing these high-speed links is expensive, so they typically terminate at gateways located at businesses, universities, Internet access providers, and the phone company's central offices. By creating a PSDN, a company can link all of its LANs together into a company-wide WAN.

Users can gain access to the Internet in two ways: through a dedicated connection or through a dial-up connection. LANs that are connected to the Internet use dedicated connections, and many people have access to such LANs at work. Few people, however, have high-speed digital lines running to their homes. Instead, they use a dial-up connection to an Internet service provider (ISP), which has a high-speed dedicated connection to the Internet.


2.1.4 Dial-up Connections

Most dial-up users rely on modems to establish a data connection across the analog Public-switched Telephone Network (PSTN). Alternatives such as ISDN, ADSL, and cable modems offer high-speed digital connections, but these services are still unavailable or expensive in many areas, and it will be some time before they become commonplace. For now, most home users are limited by the Plain Old Telephone Service (POTS) and its restrictions.

Every data link that connects two computers has a maximum speed at which it can transfer data. The PSTN's analog telephone lines were designed to carry speech, so they pass only audio frequencies in the range of 400 to 3400 Hz, a bandwidth of 3000 Hz [5]. A digital bit stream containing long runs of 1s or 0s produces frequencies below 400 Hz, which the line cannot carry. A special device must therefore be used to keep the data signal within the 3 kHz band. This device is known as a modulator/demodulator, or modem.
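To see why long runs of identical bits fall below the telephone passband, consider a simplified, unmodulated (baseband) signal whose bit pattern alternates a run of n ones with a run of n zeros. One cycle of the resulting square wave spans 2n bit times, so its fundamental frequency is the bit rate divided by 2n. The helper name `fundamental_hz` and the 1200 bps rate are illustrative assumptions.

```python
def fundamental_hz(bitrate_bps, run_length):
    """Fundamental frequency of a baseband square wave that alternates
    `run_length` 1s with `run_length` 0s at the given bit rate."""
    # One full cycle (a run of 1s plus a run of 0s) spans 2 * run_length bit times.
    return bitrate_bps / (2 * run_length)

for run in (1, 2, 4, 8):
    print(run, fundamental_hz(1200, run))
```

At 1200 bps, runs of two identical bits already put the fundamental at 300 Hz, below the 400 Hz lower edge, and an unbroken run of 1s approaches a constant (0 Hz) level that the line cannot pass at all.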

A modem modulates the digital signal onto an analog carrier frequency, and at the receiving end another modem demodulates the signal to recover the digital data. The speed at which this can be done depends on a number of factors, such as the type of data link and the amount of noise present. With frequency modulation (FM), or frequency-shift keying (FSK), the 3000 Hz bandwidth translates to a practical maximum signaling rate of 1200 baud; since each FSK symbol carries one bit, this equals a bit rate of 1200 bps. Modem manufacturers have used techniques such as phase modulation and amplitude modulated-phase shift keying (AM-PSK), which encode several bits in each symbol, to achieve higher speeds. Currently the fastest modems operate at 33.6 kbps in both directions or 56 kbps in one direction (downstream). Most computer users, however, are still using 28.8 kbps or 14.4 kbps modems.
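The gain from phase and amplitude techniques comes from packing several bits into each symbol: the bit rate equals the baud rate times log2 of the number of distinct signal states. The sketch below is idealized (real modems also spend some capacity on error-correcting coding), and `modem_bitrate` and the example state counts are illustrative assumptions rather than the parameters of any particular modem standard.

```python
import math

def modem_bitrate(baud, signal_states):
    """Idealized bit rate for a modem sending `baud` symbols per second, each
    symbol chosen from `signal_states` distinct tone/phase/amplitude states."""
    bits_per_symbol = math.log2(signal_states)
    if not bits_per_symbol.is_integer():
        raise ValueError("signal_states should be a power of two")
    return int(baud * bits_per_symbol)

print(modem_bitrate(1200, 2))    # 1200: two-tone FSK, 1 bit per symbol
print(modem_bitrate(1200, 4))    # 2400: four phases, 2 bits per symbol
print(modem_bitrate(2400, 16))   # 9600: 16 phase/amplitude states, 4 bits per symbol
```

The signaling rate stays within what the 3 kHz channel tolerates; only the number of distinguishable states per symbol grows, which in turn demands a cleaner, less noisy line.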


2.1.5 Transmission Control Protocol / Internet Protocol (TCP/IP)

The Internet operates on a protocol suite called TCP/IP (transmission control protocol / Internet protocol). Any computer directly connected to the Internet must be able to "speak this language" in order to communicate with other computers on the Internet. Each such computer is assigned an IP address that uniquely identifies it. Like postal addresses, IP addresses are needed to route packets to the correct computer. This poses a problem for dial-up users, because they establish only a temporary connection to the Internet and do not have their own IP address. Protocols such as the serial line Internet protocol (SLIP) and the newer point-to-point protocol (PPP), however, let dial-up users run TCP/IP over a telephone connection using a temporary IP address drawn from a pool of addresses owned by their access provider. In this way they can send and receive TCP/IP data just as if their computer were directly connected to the Internet, allowing them to use services such as the WWW, FTP, and Telnet.
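An access provider's bank of addresses can be pictured as a small pool that lends an address to each caller and takes it back when the caller hangs up. This sketch is hypothetical: the `AddressPool` class and its method names are invented for illustration, and 192.0.2.0/29 is a block reserved for documentation, not a real provider's range. Only the standard `ipaddress` module is assumed.

```python
import ipaddress

class AddressPool:
    """Hypothetical pool of IP addresses lent to dial-up callers."""

    def __init__(self, cidr):
        # hosts() skips the network and broadcast addresses of the block.
        self.free = list(ipaddress.ip_network(cidr).hosts())
        self.leased = {}

    def connect(self, user):
        if not self.free:
            raise RuntimeError("no addresses left in the pool")
        addr = self.free.pop(0)
        self.leased[user] = addr
        return addr

    def hang_up(self, user):
        # The address returns to the pool for the next caller.
        self.free.append(self.leased.pop(user))

pool = AddressPool("192.0.2.0/29")
a = pool.connect("alice")   # 192.0.2.1
b = pool.connect("bob")     # 192.0.2.2
pool.hang_up("alice")       # 192.0.2.1 becomes available again
```

Because an address is only borrowed for the length of a call, a provider can serve more customers than it has addresses, as long as not everyone is connected at once.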

TCP/IP uses a technique known as packet switching. Groups of bytes are assembled into IP datagrams (self-contained packets). Each datagram contains routing information (the source and destination IP addresses), a header checksum (to help detect errors), the data itself, and other information. A single datagram can be at most 65,535 bytes long (about 64 KB), header included. A 128 KB e-mail message would therefore need at least three datagrams, because each one must also carry its own header.
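As a rough check on datagram counts, the sketch below divides a message among maximum-size IPv4 datagrams. It assumes a 20-byte IPv4 header with no options and ignores the TCP header; `datagrams_needed` is a hypothetical helper. Because every datagram spends part of its 65,535 bytes on its header, a 128 KB (131,072-byte) message slightly overflows two datagrams.

```python
import math

MAX_DATAGRAM = 65_535   # largest IPv4 datagram, header included (16-bit length field)
IP_HEADER = 20          # IPv4 header without options

def datagrams_needed(message_bytes, max_datagram=MAX_DATAGRAM, header=IP_HEADER):
    """Minimum number of datagrams needed to carry a message of the given size."""
    payload_per_datagram = max_datagram - header
    return math.ceil(message_bytes / payload_per_datagram)

print(datagrams_needed(128 * 1024))         # 3
print(datagrams_needed(128 * 1024, 1500))   # 89 if datagrams are limited to 1,500 bytes
```

The second call shows why, in practice, a message usually travels in many more datagrams than the 64 KB limit suggests: links such as Ethernet limit each frame to 1,500 bytes of data.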

As packets travel across the Internet they pass through a number of switches (routers). Each switch looks at the destination IP address in a packet and forwards it along the best available path, using routing tables that are updated continually as network conditions change. Each datagram should therefore follow a short path to its destination. If a link is broken or congested, the switches, gateways, and routers stop sending packets over that link. This makes TCP/IP resilient: it can detect when a link is down or crowded and route around the trouble spot. This is exactly what the Department of Defense wanted: a network that could keep communicating even if part of it were destroyed in a war.
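Path selection can be illustrated with Dijkstra's shortest-path algorithm, the computation that link-state routing protocols such as OSPF use to build their tables. This is a simplified sketch, not how any particular router is implemented: the four-router topology and link costs are hypothetical, and real routers learn the topology by exchanging routing messages. Deleting a link shows traffic being routed around the break.

```python
import heapq

def shortest_path(links, src, dst):
    """Dijkstra's algorithm over an undirected graph given as {(a, b): cost}.
    Returns (total_cost, path) or None if dst is unreachable."""
    graph = {}
    for (a, b), cost in links.items():
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    queue = [(0, src, [src])]          # (cost so far, node, path taken)
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == dst:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, cost in graph.get(node, []):
            if nxt not in visited:
                heapq.heappush(queue, (dist + cost, nxt, path + [nxt]))
    return None

links = {("A", "B"): 1, ("B", "D"): 1, ("A", "C"): 2, ("C", "D"): 2}
print(shortest_path(links, "A", "D"))   # (2, ['A', 'B', 'D'])
del links[("B", "D")]                   # the B-D link goes down
print(shortest_path(links, "A", "D"))   # (4, ['A', 'C', 'D'])
```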

This flexibility, however, can also cause packets from the same transmission to travel different paths, since a router may find a better path after the first packets have been sent, and the packets may then arrive out of order. The receiving computer reassembles them in the correct order, but this is a problem for sequentially ordered audio and video data if some of the necessary packets have not arrived by the time they are to be played.
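The interaction between out-of-order arrival and playback can be sketched with a simple playout schedule. The sketch assumes, hypothetically, that sender and receiver share a clock, that one packet is sent every `interval_ms` milliseconds, and that each packet is played a fixed `playout_delay_ms` after it was sent; `late_packets` is an illustrative name.

```python
def late_packets(arrivals_ms, interval_ms, playout_delay_ms):
    """Sequence numbers whose packets arrive after their scheduled playout time.

    Packet `seq` is assumed to be sent at seq * interval_ms and played at
    seq * interval_ms + playout_delay_ms on the receiver's clock.
    """
    late = []
    for seq, arrived in sorted(arrivals_ms.items()):
        deadline = seq * interval_ms + playout_delay_ms
        if arrived > deadline:
            late.append(seq)
    return late

# Packet 3 overtakes packet 2 on a faster path; packet 2 misses its slot.
arrivals = {0: 50, 1: 70, 3: 110, 2: 150}
print(late_packets(arrivals, interval_ms=20, playout_delay_ms=100))   # [2]
```

Arriving out of order is harmless for packet 3, which is simply held until its turn; packet 2 is the problem because it arrives 10 ms after its deadline. A longer playout delay would rescue it at the cost of more latency.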


2.1.6 TCP vs. UDP data transmission

TCP is a connection-oriented protocol that provides reliable data transfers. It is connection-oriented because it establishes a connection with (verifies the presence of) another machine before sending data. The service it offers is known as the reliable stream transport service [5]. The word stream is used in the sense that all user data is treated as two simultaneous streams, one incoming and one outgoing. TCP passes data to the IP layer in units known as segments. Segments may be small, as in applications such as UNIX talk where single characters are transmitted, or large, as in a file transfer. For large segments the transmitting computer waits until a buffer fills before adding header information. Large segments therefore pose a problem for real-time audio, because the time spent waiting for the buffer to fill may be long enough to interrupt the audio stream.
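How long the sender waits depends on the buffer size and on how fast the audio source produces data. This back-of-the-envelope sketch assumes a hypothetical 8 kbps audio stream and buffer sizes chosen only for illustration, and it ignores any mechanism that would send a partially filled buffer early; `buffer_fill_delay_s` is an illustrative name.

```python
def buffer_fill_delay_s(buffer_bytes, bitrate_bps):
    """Seconds a sender waits for `buffer_bytes` of audio produced at `bitrate_bps`."""
    return buffer_bytes * 8 / bitrate_bps

print(buffer_fill_delay_s(4096, 8000))   # about 4.1 s before the first byte is even sent
print(buffer_fill_delay_s(512, 8000))    # about 0.5 s
```

For a low-rate audio stream, even a modest buffer adds seconds of delay, which is why large segments suit file transfers far better than live audio.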

TCP/IP transfers are reliable because TCP retransmits data that is lost or arrives out of order, using a go-back-N style of error control: when the incoming packet sequence gets out of order, the missing frame and the frames sent after it (up to N frames) are retransmitted. While error correction is extremely important for files such as bank statements or binary programs, it creates another problem for real-time audio data. If an audio packet is lost, by the time it has been retransmitted it is often too late to use. On a noisy connection many packets may be lost, so bandwidth is spent retransmitting packets that are no longer useful instead of delivering the real-time packets that are needed, further reducing the already limited bandwidth. This problem can be alleviated by using a buffer to store packets: if the retransmitted packets arrive before the rest of the buffer is depleted, there will be no break in the audio signal. When the bandwidth required by the audio stream equals that of the communication path, however, there is no spare capacity with which to repopulate the buffer. The audio stream's bandwidth must therefore be set low enough to refill the buffer in a reasonable amount of time.
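The refill argument can be made concrete. After a dropout, only the link capacity left over beyond the audio stream's own rate is available for rebuilding the buffer, so the refill time grows without bound as the two rates approach each other. The sketch assumes constant rates and ignores header overhead and retransmitted data; `buffer_refill_s` and the example rates are hypothetical.

```python
import math

def buffer_refill_s(buffer_s, audio_bps, link_bps):
    """Time to rebuild `buffer_s` seconds of audio after the buffer drains.

    Only the spare capacity (link_bps - audio_bps) can go toward refilling,
    because the link must also keep delivering the audio being played.
    """
    spare = link_bps - audio_bps
    if spare <= 0:
        return math.inf        # the buffer can never be repopulated
    return buffer_s * audio_bps / spare

print(buffer_refill_s(2, 16_000, 28_800))   # 2.5 s
print(buffer_refill_s(2, 24_000, 28_800))   # 10.0 s
print(buffer_refill_s(2, 28_800, 28_800))   # inf
```

Raising the audio rate from 16 to 24 kbps on a 28.8 kbps link quadruples the refill time, which is the trade-off that forces the audio bandwidth to be fixed well below the link speed.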

The user datagram protocol (UDP) is a best-effort (connectionless) service. It does not establish a connection with another machine before sending data, and it does not detect or retransmit lost packets (its optional checksum only allows corrupted datagrams to be discarded). This makes it well suited to real-time audio and video schemes that can mask errors to some degree. It does, however, pose a problem when many packets are lost, because the decoder must somehow deal with the missing data. Methods for dealing with packet loss include repeating previous packets and other masking techniques.
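Repeating the previous packet is the simplest of the loss-handling methods mentioned in the paragraph. In this minimal sketch, each packet is a list of samples and a lost packet is marked `None`; `conceal_losses` is a hypothetical helper, and falling back to silence before the first good packet arrives is an assumption.

```python
def conceal_losses(packets, frame_len):
    """Replace each lost packet (None) with a copy of the last good packet.
    Before any packet has arrived, silence (all zeros) is used instead."""
    out = []
    last = [0] * frame_len
    for pkt in packets:
        if pkt is None:
            out.append(list(last))   # repeat the previous audio
        else:
            out.append(pkt)
            last = pkt
    return out

received = [[1, 2], None, [3, 4], None, None]
print(conceal_losses(received, 2))   # [[1, 2], [1, 2], [3, 4], [3, 4], [3, 4]]
```

Repetition works for an isolated loss, but a burst of losses turns into an audible stutter, which is where more elaborate masking techniques become worthwhile.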


2.2 Digital Audio

Audio files pose one major problem when they are sent across data communication networks: they consume a large amount of bandwidth. Stereo CD-quality audio, for example, requires 1.41 Mbps of bandwidth, roughly 49 times the 28.8 kbps provided by a modem. Two factors determine the bandwidth consumed by an audio file: the sample rate and the quantization level.
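The 1.41 Mbps figure is the product of sample rate, bits per sample, and number of channels. A minimal sketch of that arithmetic, with `pcm_bitrate` as an illustrative name:

```python
def pcm_bitrate(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

cd = pcm_bitrate(44_100, 16, 2)
print(cd)             # 1411200
print(cd / 28_800)    # 49.0
```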


2.2.1 Sample Rate

To preserve all of the frequencies in the range of human hearing (20 Hz to 20 kHz), an analog-to-digital converter must sample the analog signal at a rate of at least 40 kHz. This follows from the Nyquist-Shannon sampling theorem, which states that the sample rate must be at least twice the highest frequency to be captured; here, 40 kHz is twice 20 kHz. Sampling at less than twice the highest frequency produces aliasing distortion, in which frequencies above half the sample rate reappear as false, lower frequencies. In practice, sampling rates of 44.1 kHz or 48 kHz are used.
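Aliasing can be illustrated numerically: a tone above half the sample rate produces exactly the same samples as a lower tone, so the two cannot be told apart after conversion. This sketch folds a frequency into the range 0 to fs/2 and then confirms, for a 40 kHz sample rate, that cosines at 25 kHz and 15 kHz give identical samples. It assumes ideal sampling of a pure tone; `alias_frequency` is a hypothetical helper.

```python
import math

def alias_frequency(f_hz, fs_hz):
    """Frequency at which a pure tone of f_hz appears after sampling at fs_hz."""
    f = f_hz % fs_hz
    return fs_hz - f if f > fs_hz / 2 else f

print(alias_frequency(25_000, 40_000))   # 15000: a 25 kHz tone masquerades as 15 kHz
print(alias_frequency(30_000, 44_100))   # 14100
print(alias_frequency(10_000, 44_100))   # 10000: below fs/2, unchanged

# The sampled values really are indistinguishable:
fs = 40_000
same = all(
    math.isclose(math.cos(2 * math.pi * 25_000 * n / fs),
                 math.cos(2 * math.pi * 15_000 * n / fs), abs_tol=1e-9)
    for n in range(100)
)
print(same)   # True
```

This is why converters place an anti-aliasing filter before the sampler: once a 25 kHz component has been sampled at 40 kHz, nothing downstream can separate it from a genuine 15 kHz tone.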


2.2.2 Quantization Level

As an analog signal is sampled, the amplitude of each sample must be stored. Any number of bits can be used to represent the amplitude, and the number of bits determines the number of quantization levels: n bits provide 2^n levels. Samples that fall between two quantization levels are assigned the closest level. For example, the second sample shown in Figure 5 has a value lying halfway between two quantization levels but must be assigned a value of 1. This process is known as quantizing, and the difference between the actual analog value and the chosen quantization level is called quantization error [6].
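A minimal sketch of a uniform quantizer like the 2-bit system in Figure 5. It assumes amplitudes are expressed in units of one quantization step, that exact halfway values round up, and that out-of-range values are clipped. The figure's own rule for the halfway sample is not stated, so that rounding convention is an assumption, and `quantize` is a hypothetical name.

```python
import math

def quantize(x, bits):
    """Quantize amplitude x (in quantization steps) to the nearest of the
    2**bits levels 0 .. 2**bits - 1. Returns (level, quantization error)."""
    top = 2 ** bits - 1
    level = math.floor(x + 0.5)       # round to nearest; exact halves round up
    level = min(max(level, 0), top)   # values outside the range are clipped
    return level, x - level

for x in (0.3, 1.2, 2.7, 3.9):
    level, err = quantize(x, 2)
    print(f"{x:.1f} -> level {level}, error {err:+.1f}")
```

Within the range the error never exceeds half a step; the 3.9 input shows a second, larger error source, clipping, when the signal exceeds the top level.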



Figure 5: Quantization error shown on a 2-bit system


Quantization error produces audible quantization distortion; the more bits used in the sampling process, the less audible the distortion. Analog equipment performance is often judged by its signal-to-noise ratio (S/N) in decibels, that is, how loud the audio signal is compared to the level of noise present in the signal. A good signal-to-noise ratio is 90 dB. A similar measurement called the signal-to-error ratio (S/E) is used for digital equipment; it measures the signal level relative to the quantization error. Although they are not the same, both the S/N and S/E ratios measure the level of the signal relative to something undesirable, such as noise or distortion. For this reason, the S/E ratio of digital equipment is often referred to as its S/N ratio.

In general, each additional bit used to represent the signal adds another 6 dB to the S/E ratio:

Signal-to-error ratio = 6.02 n + 1.76 dB (where n = the number of bits) [6]

Thus, four bits give an S/E ratio of about 26 dB, eight bits about 50 dB, and sixteen bits about 98 dB. The average person can hear tape hiss 60 dB below the audio level [7], so it is important to use enough bits to push the noise floor below audibility, especially in soft passages. Most of the digital audio industry has agreed that 16 to 20 bits is adequate.
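The rule of thumb can be tabulated directly from the formula, and inverted to find the word length needed for a target ratio. The formula assumes a full-scale sine wave; `signal_to_error_db` and `bits_for` are illustrative names.

```python
import math

def signal_to_error_db(bits):
    """Theoretical S/E ratio of an n-bit quantizer for a full-scale sine wave."""
    return 6.02 * bits + 1.76

def bits_for(target_db):
    """Smallest word length whose S/E ratio reaches target_db."""
    return math.ceil((target_db - 1.76) / 6.02)

for n in (4, 8, 16, 20):
    print(n, round(signal_to_error_db(n), 2))
print(bits_for(90))   # 15
```

Matching the 90 dB S/N of good analog equipment therefore takes at least 15 bits, consistent with the industry's choice of 16 to 20.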


2.2.3 Bandwidth Considerations

With 2 bytes (16 bits) generated 44,100 times per second for each channel, it doesn't take long to produce a very large audio file. A one-minute stereo song, for example, generates a file of 10.58 million bytes (about 10.09 MB, counting a megabyte as 1,048,576 bytes). A 74-minute compact disc requires about 783.2 million bytes (about 747 MB) of storage. This is too large to store casually on the 1.2 to 1.6 GB hard drives that are standard in computers today.
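The storage figures can be checked with a few lines of arithmetic. This sketch assumes 16-bit stereo PCM at 44.1 kHz with no file header and counts a megabyte as 2**20 bytes; `pcm_bytes` is a name chosen for illustration.

```python
def pcm_bytes(seconds, sample_rate_hz=44_100, bytes_per_sample=2, channels=2):
    """Size in bytes of uncompressed PCM audio."""
    return seconds * sample_rate_hz * bytes_per_sample * channels

MIB = 1024 * 1024   # one megabyte counted as 2**20 bytes

minute = pcm_bytes(60)
disc = pcm_bytes(74 * 60)
print(minute, round(minute / MIB, 2))   # 10584000 10.09
print(disc, round(disc / MIB, 1))       # 783216000 746.9
```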

More importantly, these large files are a problem when they are transferred from computer to computer over standard telephone lines. With a sample rate of 44.1 kHz and 16 bits per sample, stereo audio produces a real-time signal of 1,411,200 bps. In addition, every transmission protocol adds header bits to each packet for routing purposes. This amount of bandwidth rules out any possibility of playing a raw PCM sound file through a 28.8 kbps modem in real time.
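Header overhead can be estimated from the header sizes. This sketch assumes audio carried in UDP over IPv4, with a 20-byte IP header (no options) and an 8-byte UDP header, and it ignores link-layer framing such as PPP; `wire_bitrate` and the payload sizes are hypothetical.

```python
IP_HEADER = 20   # bytes, IPv4 header without options
UDP_HEADER = 8   # bytes

def wire_bitrate(audio_bps, payload_bytes, header_bytes=IP_HEADER + UDP_HEADER):
    """Bit rate on the link once every `payload_bytes` of audio carries a header."""
    return audio_bps * (payload_bytes + header_bytes) / payload_bytes

print(wire_bitrate(1_411_200, 1024))   # 1449787.5
print(wire_bitrate(8_000, 64))         # 11500.0
```

Large packets keep the overhead small (under 3% for 1,024-byte payloads), but they take longer to fill; small packets cut that delay while spending a much larger share of a slow link on headers (over 40% for 64-byte payloads).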


