As reported on Wired.
BY CADE METZ
Image: Flickr/HarshWCAM3.
One early morning in 2011, somewhere behind the curtain at the world’s most popular social network, a Facebook engineer pressed a single button and brought down the entire operation.
This unnamed engineer didn’t necessarily make a mistake. He just decided to run the kind of software task the social networking giant runs all the time. He ran a “Hadoop job,” a way of analyzing data. The trouble is that Facebook analyzes data generated by hundreds of millions of people. This data is stored across thousands of machines inside the company’s data centers, and when you analyze it, all those servers must talk to each other.
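To see why one analysis job can swamp a network, it helps to sketch the "shuffle" at the heart of a Hadoop-style job: every server holding a slice of the data may need to ship intermediate results to every server doing the aggregation. The figures below are invented purely for illustration, not drawn from Facebook's systems.

    # Toy model of a MapReduce-style shuffle. Every map server may need to
    # send part of its output to every reduce server, so the number of
    # server-to-server flows grows with the product of the two.
    # All numbers here are invented for illustration.
    MAP_SERVERS = 2000
    REDUCE_SERVERS = 500
    INTERMEDIATE_GB_PER_MAPPER = 1.0

    flows = MAP_SERVERS * REDUCE_SERVERS
    shuffled_gb = MAP_SERVERS * INTERMEDIATE_GB_PER_MAPPER

    print(f"{flows:,} simultaneous server-to-server flows")        # 1,000,000
    print(f"{shuffled_gb:,.0f} GB crossing the internal network")  # 2,000 GB

None of that traffic ever touches the public internet. It all lands on the links between servers inside the data center, which is exactly where the older network designs were weakest.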
According to Facebook man Donn Lee — who recalled the incident at a conference last spring — that one Hadoop job flooded the company’s computer network with so much traffic, the rest of the operation nearly ground to a halt. “I remember this morning very well,” Lee said. “It brought down Facebook — or severely crippled it.”
Lee — then a Facebook network engineer — was trying to show how much has changed with the computer networks that power the web’s biggest operations. In the past, most network traffic streamed back and forth between a server and the people out there on the internet trying to visit a webpage. But nowadays — with the rise of increasingly large and complex operations like Facebook, Google, and Amazon — there’s far more traffic bouncing around inside the data center, from server to server, and the traditional networking gear used by these net giants wasn’t meant to handle it all.
‘I remember this morning very well. It brought down Facebook — or severely crippled it.’
As a result, networks are changing. Companies like Facebook and Google are moving to higher-speed networking hardware, and they're revamping the topology of their networks to accommodate the extra traffic moving between servers. But these improvements only go so far. Networking gurus like Donn Lee are also looking toward a new breed of networking gear inside the data center — gear that sends data as beams of light.
Yes, some internet data already travels as light. This is called optical networking. Standard electrical signals are converted to photons and then sent racing down lines of glass fiber. But typically, this happens over connections that move information between data centers, and if it does happen inside the data center, it happens sparingly. The next step is to rebuild data center networks with an eye toward optics, pairing traditional electrical networking switches with optical switches that can significantly speed the transfer of data from server to server.
“If we can do this, the scaling properties of this kind of hybrid network — how the network scales up to accommodate more data traffic — is very attractive,” says George Papen, an optical networking researcher at the University of California, San Diego. “We’re not there yet, but we’re closer than we were.”
Papen is part of a large UCSD team that has already built a pair of test networks that demonstrate such optical switching, and this effort — typically referred to as Helios — is funded by Google, among other tech giants. One of the project’s primary researchers, Amin Vahdat, is now on leave at Google, where he’s actively exploring similar research, and another member of the team, Nathan Farrington, has joined the staff at Facebook.
According to Papen, Helios is still a long way from driving live data centers. But across the country, in Cambridge, Massachusetts, a startup known as Plexxi recently introduced an optical networking switch that seeks to remake the data center, and though this technology is quite different from Helios, it has the same basic goal.
“Photonic switching can be such a powerful thing. If you can keep things in the optical domain — as opposed to the electronic switching domain — there is a built-in performance advantage,” says Plexxi CEO Dave Husak. “We’re both trying to harness that effect.”
Helios Goes Back to the Future
It only makes sense that Google would tap Amin Vahdat to reinvent its data centers. He did it once before.
Traditionally, networks were hierarchies. You stuffed servers into racks, and you connected these servers to networking switches sitting at the top of the rack. Then you plugged these “top-of-rack switches” into another tier of faster networking gear — and you connected that tier to a third that was faster still. By the time you got to the network “core,” you were running enormously expensive networking hardware at speeds well beyond the switches sitting at the top of the rack.
You needed that extra speed in the core to accommodate all the traffic that was coming from the rest of the network — or so we thought. What Amin Vahdat and his co-researchers showed is that a hierarchical structure is the wrong way to go. You could run your network much more efficiently if you used relatively cheap networking gear that operated at one common speed.
“It was a revolution,” says George Papen. “Before this, people were building their data center networking like wide-area telecom networks. But Amin’s group realized this wasn’t cost-effective and they showed that you could build them in a completely different way.”
This uniform networking setup is known as a “fat tree” design, and it’s now commonplace among the big web operations. It’s part of the reason companies like Google have moved away from expensive gear from the likes of Cisco in favor of low-cost hardware acquired directly from manufacturers in Asia. But the Helios project — which Vahdat also played a role in — seeks to make even bigger changes.
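The appeal is easy to see with a little arithmetic. In the canonical three-tier fat tree that Vahdat's group described, a network built entirely from identical k-port switches, all running at one speed, can wire up k³/4 servers at full bandwidth. A quick sketch, with the port count chosen only as an example:

    # Server and switch counts for a classic three-tier fat tree built from
    # identical k-port switches (the design popularized by Vahdat's group).
    # k = 48 is just an example port count, not a figure from the article.
    def fat_tree(k):
        edge = agg = k * (k // 2)   # k pods, each with k/2 edge and k/2 aggregation switches
        core = (k // 2) ** 2        # core layer
        hosts = k ** 3 // 4         # each edge switch serves k/2 servers
        return hosts, edge + agg + core

    hosts, switches = fat_tree(48)
    print(hosts, switches)   # 27648 servers from 2880 identical 48-port switches

Every layer uses the same commodity switch, which is part of what lets companies like Google swap pricey core routers for cheap hardware bought straight from manufacturers in Asia.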
The basic idea is to build a network that’s part electrical and part optical. Much of this network would continue to operate like existing electrical networks, moving data as electrons across copper wires and through silicon, but it would also be smart enough to shuttle certain traffic between servers using optical switches.
Today, some networks are already using optical lines to move data from server to switch — or between switches. But once the photons reach the switches themselves, they’re always converted back to electrons. With Helios, the idea is to build a true optical network — where the actual switching is optical — and then use this to remove some of the burden from your electrical network.
In a way, this project goes back to the future. Today’s networks use what’s called packet-switching to move data to and fro, breaking information into tiny messages before sending them across the wire. This is what made the internet possible. But the optical portion of the Helios project uses circuit-switching, establishing a dedicated connection between two end points. This is how an old-school phone network operates.
“Looking at every single packet inside a data center is not a very efficient use of your resources,” Papen says. “If you can figure out, even partially, where the traffic is going, and you don’t have to look at every single header on every single packet, you can create a dedicated circuit and ship a lot of the data — or shunt it — and not have it go through a packet-switched network.”
Papen compares this to a system that would solve Los Angeles automobile gridlock by magically laying temporary bridges between certain parts of the city — on the fly, wherever they’re needed. “On a moment-by-moment basis, you want to be able to drop the bridge down, shunt the traffic that’s congested, and, at a future date, move that bridge somewhere else, where other traffic is congested,” he says.
‘If you can figure out, even partially, where the traffic is going, and you don’t have to look at every single header on every single packet, you can create a dedicated circuit and ship a lot of the data — or shunt it — and not have it go through a packet-switched network.’
The setup is particularly attractive because an optical circuit-switched network is far more flexible than a traditional design. A traditional networking switch is built for a particular data rate: 10 Gigabits per second, 40 Gbps, etc. But an optical switch is different. “The circuit is a pipe and it doesn’t care what the data rate is. It’s rate-agnostic,” he says. “You can run almost any data-rate across it, and that’s very attractive — as you can imagine — as data centers continue to scale out.”
Though this setup is still quite a long way from real world data centers — at least from where Papen is sitting — he believes it will eventually come to fruition. “The real trick is to figure out the optimal partitioning,” he says. “Which traffic do you send along the network as it exists now and which do you offload or shunt to a circuit-switched network?” Then there’s the problem of cost. Optical hardware is more expensive than electrical gear, though the costs are coming down.
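For a feel of what that partitioning decision looks like, here is a deliberately naive sketch: route the biggest server-to-server flows over a limited pool of optical circuits, and leave the rest on the packet-switched network. It illustrates the idea Papen describes, not the actual scheduling logic inside Helios.

    # Naive partitioning rule: the biggest flows get the optical circuits,
    # everything else stays on the electrical, packet-switched network.
    # This is an illustration of the idea, not the Helios algorithm.
    def partition(flows, circuits_available, threshold_gb=10):
        """flows maps (source, destination) pairs to expected gigabytes."""
        biggest_first = sorted(flows.items(), key=lambda kv: kv[1], reverse=True)
        optical, packet = [], []
        for pair, gigabytes in biggest_first:
            if gigabytes >= threshold_gb and len(optical) < circuits_available:
                optical.append(pair)     # set up a dedicated light path
            else:
                packet.append(pair)      # leave it on the existing network
        return optical, packet

    demo = {("rack1", "rack9"): 120, ("rack2", "rack3"): 0.4, ("rack5", "rack9"): 45}
    print(partition(demo, circuits_available=2))

The hard part, as Papen says, is knowing where the traffic is going well enough to make that call without inspecting every packet.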
There's always the possibility that Vahdat and Google have taken this research closer to reality, but Papen stresses that even he has no insight into what Google is doing. "Even though I have optics friends inside large data centers," he says, "I have no knowledge of what they're doing."
For Google, its most important competitive advantage is the design of its internal infrastructure, and it keeps the particulars hidden even from the outside researchers it’s funding. Vahdat did not respond to an interview request, and Google’s public relations arm declined to discuss the company’s optical-networking research.
But Google isn’t the only one exploring the future of optical switching. There’s also Facebook, Cisco, IBM, and now Plexxi.
Optics in the Heavens
Whereas Helios splits the network in two — one half driven by electrical switches and the other by optical hardware — Plexxi combines the electrical and the optical in a single switch. This device was officially unveiled late last year, and at least one company — a cloud operation known as CloudSigma — is using the switch inside live data centers.
You string these switches together in a ring, and though they continue to move some of the data using electrical means, you can also create direct optical connections between particular end-points — end-points where you're exchanging unusually large amounts of data. A single optical line connects this ring of switches. But within that line, you can use different wavelengths of light to establish connections between two specific switches, and these connections operate without interruption from data streaming across the rest of the network.
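In rough terms, the trick is wavelength-division multiplexing: many colors of light share the one fiber ring, and a chatty pair of switches gets a color to itself. The sketch below is a generic illustration of that idea, with invented names; it is not Plexxi's actual allocation logic.

    # Generic wavelength-division idea: switches share one fiber ring, and
    # each pair that needs a direct path gets its own wavelength channel,
    # so its traffic never contends with the rest of the ring.
    # Switch names and channel labels are invented for illustration.
    from itertools import count

    def assign_wavelengths(heavy_pairs):
        """Map each heavy-talking pair of switches to a distinct wavelength."""
        channel = count(1)
        return {pair: f"lambda-{next(channel)}" for pair in heavy_pairs}

    heavy = [("switch1", "switch4"), ("switch2", "switch6")]
    print(assign_wavelengths(heavy))
    # {('switch1', 'switch4'): 'lambda-1', ('switch2', 'switch6'): 'lambda-2'}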
Much like Papen, Plexxi CEO Dave Husak uses a highway analogy when describing the technology. “With a conventional network, you’re stuck sending data where the wires go. It’s called a highway network. You go where the asphalt is,” he says. “With Plexxi, you can, in a way, make your own asphalt. If it turns out two places are going to talk to each other a lot, we can create optical lanes that directly connect them together.”
‘If it turns out two places are going to talk to each other a lot, we can create optical lanes that directly connect them together.’
But Plexxi doesn't use circuit switching. In moving to the optical realm, it sticks with packet-switching. And it doesn't redirect traffic on the fly. It provides a software controller that lets you set up these optical paths based on the sort of applications you're running.
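It's easy to imagine what such a policy might look like in practice: the operator declares which applications talk heavily to one another, and the controller turns those declarations into optical paths. The names and structure below are invented for illustration and are not Plexxi's actual interface.

    # Hypothetical controller policy: declare which applications exchange a
    # lot of data, and derive direct optical paths for the busiest ones.
    # Field names, rack labels, and the priority scheme are all invented;
    # this does not reflect Plexxi's real configuration format.
    policies = [
        {"app": "hadoop-analytics", "endpoints": ("rack12", "rack30"), "priority": "high"},
        {"app": "nightly-backup",   "endpoints": ("rack07", "rack41"), "priority": "low"},
    ]

    def plan_optical_paths(policies):
        """Give high-priority, chatty applications their own optical lanes."""
        return [p["endpoints"] for p in policies if p["priority"] == "high"]

    print(plan_optical_paths(policies))   # [('rack12', 'rack30')]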
Plexxi’s switches are about twice as expensive as an ordinary networking switch: about $70,000 versus $35,000. But according to Alexander Ivanov, the head of networking at CloudSigma, you make up that cost in other ways. The company can run its network with fewer networking switches, and this network is more adept at handling the massive amounts of traffic streaming back and forth within the company’s data centers — aka “east-west traffic,” as opposed to the “north-south traffic” that moves in and out of the data center.
The rise of east-west traffic, you see, is an issue not only for Google, Facebook, and so many other companies that rely on distributed data software along the lines of Hadoop, but also for CloudSigma, Amazon, and other “cloud” services — services that provide computing power to a world of outside customers. Cloud operations require the same heavy communication between servers.
As all these operations continue to grow, the limitations of electrical networking will only cause more problems. The bigger the data centers, the longer the distance between servers, and as these distances lengthen, electrical connections become less reliable and, well, more of a hassle. Optical networking is surely the answer. The question is how quickly it will arrive.
Additional reporting by Robert McMillan