By Michael V. Copeland
After taking one of the first Internet companies — EarthWeb — public in 1998, Nova Spivack joined some friends at a weedy airstrip deep inside the new Russia for a trip into Earth’s stratosphere.
Having space travel on your resume is de rigueur for Internet entrepreneurs these days, but this was 1999, and not even the Russian pilots were sure how the flight would turn out. As Spivack was being strapped into a MiG-25 and prepped for his trip at Mach 3, about 20 miles straight up, he looked around for an ejection button or lever in case things went south.
There wasn’t one. “‘Don’t worry about eet,’ the pilot told me,” says Spivack, mustering his best Russian accent. “At the speed you will be going, even if you could eject, first your body would explode into vapor, then the vapor would freeze into ice crystals, and then the crystals would burn up on reentry.”
With that, they taxied down the runway for a quick ride to the edge of space.
Spivack returned in one piece ready to launch more startups, but the image of his body exploded into ice crystals and skittering into the stratosphere never left him. And in fact, it’s not a bad metaphor — in reverse — for what his newest venture is trying to do.
If you think of the World Wide Web as a cloud of largely undifferentiated information, the mission of the company he’s about to unveil, Radar Networks, is to take that cloud and impose order on it. Not just any order, but a very special kind known to experts by one of the hottest buzzwords in computer science today: the semantic Web.
For all the wonders that today’s Web can deliver to your fingertips — the Norwegian word for ice cream, a seat on the next flight to Paris, the best price for a Clash CD — it has a fundamental flaw.
It’s basically a compendium of billions of text documents designed to be read by humans. You can search it for keywords, but the results aren’t much use until you sort through them to find the page that has the info you want.
To take the Web to the next level — to move from Web 2.0 to Web 3.0 — the information in those documents will have to be turned into data that a machine can read and evaluate on its own. Only then will computers be able to take over tasks we now do by hand: find the nearest restaurant, book the best flight, buy the cheapest CD.
Think of it as the difference between two dimensions and three dimensions. “People will see the Web start to become smarter,” Spivack says. “Eventually it will have some reasoning capabilities built into it.”
We’ll get to how that happens in a bit. For Spivack, however, the semantic Web begins now with the data engine and user applications he and his team are prepping for launch — and ends somewhere in the future with artificially intelligent software agents handling all the online drudgery of your business and professional life.
Radar Networks isn’t the only company exploring the potential of the semantic Web. It’s a disruptive technology with the power to unseat today’s Internet titans — especially search engine giants like Google and Yahoo — and it’s being vigorously pursued by startups like Garlik, Metaweb Technologies, Powerset, and ZoomInfo, as well as big corporations like Citigroup, Eli Lilly, Kodak, Oracle, and Google and Yahoo themselves.
One estimate pegs the market for products and services stemming from semantic Web technologies at $50 billion by 2010, up from about $7 billion today.
But for all the entrepreneurs ready to spin gold out of the semantic Web, there are as many skeptics convinced that it’s a pipe dream — a fancy name for a problem that will never be fully solved. Spivack, with the confidence of a man who has been to space without a safety net, is determined to prove them wrong.
Radar Networks is housed in a renovated warehouse not far from the ballpark where the San Francisco Giants play. Inside, massive redwood timbers span the high ceilings alongside thick clusters of data cable. A Nintendo Wii and a shiny new De’Longhi espresso machine are the only outward signs of anything being done here but mind-bending work.
There are 20 people at the company now, but there’s space for 50, and with just a bit less than $10 million in venture funding, Spivack and his senior executives are busy hiring.
The background of the Radar team includes deep expertise in statistics, bioinformatics, and artificial intelligence. Radar’s chief architect, Jim Wissner, is a Java ace. Chris Jones, director of products and operations, is a design and user interface whiz. CTO Lew Tucker got his start by mapping neural transmitters in the brain. Tucker and Spivack go back to the late ’80s, when they both worked for Danny Hillis at Thinking Machines.
Given all the firepower assembled at Radar Networks, you get the sense that this is not your typical Web startup. And it’s not. The task the company has set for itself — bringing the power of the semantic Web to the Internet — is not easy to describe. Even the man who invented the Web, Tim Berners-Lee, needs a little room to explain why it’s important.
The term “semantic Web” first gained prominence in a 2001 article by Berners-Lee and two coauthors, James Hendler and Ora Lassila, in Scientific American. In it they described software agents roaming across the Web, making travel arrangements and doctor’s appointments and muting the stereo when the telephone rings.
It was a great vision, but it couldn’t be achieved with today’s Internet.
For the semantic Web to work, online information needs to be made readable by machines. Services like Google do a great job of sifting through all those webpages, but it’s up to people to recognize the things they want when they see them in the results. It’s also up to people to combine information to, say, plan a long-overdue ski trip.
The Web just isn’t very smart yet; one webpage is the same as any other. It might have a higher Google ranking, but there’s no distinction based on meaning.
The semantic Web in the Berners-Lee vision acts more like a series of connected databases, where all information resides in a structured form. Within that structure is a layer of description that adds meaning that the computer can understand. (“Semantics” is the branch of linguistics concerned with meaning.)
On the semantic Web, a person — Nova Spivack, for example — isn’t just a name that comes up on webpages when you google him. He’s a fully described object endowed with certain well-defined properties: a date of birth, a job title, a home address, specific hobbies, the fact that he is the grandson of legendary management thinker Peter Drucker.
People on the semantic Web have unambiguous connections to the places they work, the people they’re related to, their friends, their calendars, and the things they’re interested in. Being able to connect those properties in seen and unseen ways is what gives the semantic Web its power.
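To make the idea of a "fully described object" concrete, here is a minimal sketch in Python. It assumes each fact is stored as a three-part statement (subject, property, value), which is the general shape RDF uses. The property names such as `grandsonOf` and `worksAt` are invented for illustration, and real RDF would identify things with URIs rather than plain strings. Nothing here reflects how Radar Networks actually stores its data.

```python
# Toy triple store: every fact is a (subject, predicate, object) statement.
# Predicate names are hypothetical; real RDF would use URIs instead of strings.
TRIPLES = [
    ("Nova Spivack", "type", "Person"),
    ("Nova Spivack", "grandsonOf", "Peter Drucker"),
    ("Nova Spivack", "worksAt", "Radar Networks"),
    ("Nova Spivack", "interestedIn", "skiing"),
    ("Lew Tucker", "type", "Person"),
    ("Lew Tucker", "worksAt", "Radar Networks"),
    ("Radar Networks", "type", "Company"),
    ("Radar Networks", "locatedIn", "San Francisco"),
]


def match(subject=None, predicate=None, obj=None):
    """Return every triple that agrees with the fields that were given."""
    return [
        (s, p, o)
        for (s, p, o) in TRIPLES
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


def coworkers(person):
    """Follow worksAt links out to the employer and back to other people."""
    employers = {o for (_, _, o) in match(subject=person, predicate="worksAt")}
    return sorted(
        s
        for (s, _, o) in match(predicate="worksAt")
        if o in employers and s != person
    )


print(coworkers("Nova Spivack"))  # ['Lew Tucker']
```

The "unseen" connection is the point: nobody recorded that Spivack and Tucker are coworkers, yet a program can work it out because both are unambiguously linked to the same company.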
Consider this scenario: Say you want to arrange a dinner at an upcoming conference. Today you might go through your address book and ping folks by e-mail to see who’s attending. Then you probably send out e-mail invitations to dinner. You go back and forth with the group on the place and time, somehow you all agree, and then somebody makes a reservation. Files fly back and forth, with humans at the center.
In the semantic Web, your software agent will “know” in advance what’s involved in arranging a dinner. Instead of you sending out a flurry of e-mails, the agent could cull the conference attendees and make a list of potential invitees.
It might also look through your address book to see which of your friends live in the city where the conference is being held. Once a list of potential dinner guests has been approved by you, the agent would negotiate the date and time with everyone else’s agents via a calendar database, pick a restaurant from another database based on availability and your personal preferences, make the reservation, and send out directions. In a GPS-enabled world, it could even let you know how far a guest who is running late has to go.
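The negotiation step in that scenario can be sketched in a few lines of Python, under some loud assumptions: every guest's agent publishes a set of free evenings, restaurants publish the evenings on which they have open tables, and your preferences are an ordered list of cuisines. All names, dates, and restaurants below are hypothetical, and a real agent would have to handle far messier data.

```python
from datetime import date

# Hypothetical free evenings published by each person's agent.
FREE_EVENINGS = {
    "you": {date(2007, 10, 1), date(2007, 10, 2), date(2007, 10, 3)},
    "guest_a": {date(2007, 10, 2), date(2007, 10, 3)},
    "guest_b": {date(2007, 10, 3), date(2007, 10, 4)},
}

# Hypothetical restaurant database with structured availability.
RESTAURANTS = [
    {"name": "Restaurant A", "cuisine": "sushi", "open_tables": {date(2007, 10, 2)}},
    {"name": "Restaurant B", "cuisine": "italian", "open_tables": {date(2007, 10, 3)}},
    {"name": "Restaurant C", "cuisine": "sushi", "open_tables": {date(2007, 10, 3)}},
]


def earliest_common_evening(calendars):
    """Return the first evening on which everyone is free, or None."""
    if not calendars:
        return None
    common = set.intersection(*calendars.values())
    return min(common) if common else None


def pick_restaurant(restaurants, evening, preferred_cuisines):
    """Pick an available restaurant, favoring cuisines earlier in the list."""
    available = [r for r in restaurants if evening in r["open_tables"]]
    if not available:
        return None

    def rank(restaurant):
        cuisine = restaurant["cuisine"]
        if cuisine in preferred_cuisines:
            return preferred_cuisines.index(cuisine)
        return len(preferred_cuisines)  # unlisted cuisines go last

    return min(available, key=rank)["name"]


evening = earliest_common_evening(FREE_EVENINGS)
print(evening, pick_restaurant(RESTAURANTS, evening, ["sushi", "italian"]))
```

The logic is trivial once the calendars and the restaurant listings exist as structured data. That is the article's point: the hard part is getting the data into that form, not writing the agent.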
Of course, it’s been six years since Berners-Lee put his vision out there, and you still can’t get that sort of service. Tagging is a start, and services like Flickr offer a sort of crude Web 2.0 version of the semantic Web. Google Base is another stab at bringing semantic technologies to the wider Web, serving as a place where anyone can enter data and have it searched, but it doesn’t use the semantic approach from start to finish.
Bringing a true semantic Web to the world is a chicken-and-egg problem. Until there’s enough data rendered in computer-readable form (Resource Description Framework, or RDF, is the leading standard) with enough metadata attached to it to make it meaningful, nobody is going to be able to create any interesting services.
The agents of the semantic Web need the raw ingredients before they can make their soufflés.
But you can do some interesting things within subsets of the Web. Large pharmaceutical companies like Eli Lilly have been experimenting with adding a semantic layer on top of their drug discovery databases to help scientists see connections between drug molecules and diseases.
Amazon.com is keen on using semantic technologies to help customers search its databases. Kodak wants semantic tagging to help photographers organize their snapshots online. The CIA has been loading its databases of overseas phone taps into semantic “mills” to make it easier to sift for connections between people, places, and incidents — hoping to spot terror threats before it’s too late.
“But how do you make this thing really useful for ordinary people?” asks Radar CTO Tucker. “Not everyone is a CIA analyst.”
Spivack’s answer grew out of conversations he had with Drucker in the summer of 2001, about four years before the professor’s passing. “We would meet for two hours a day and talk about organizations vs. organisms,” Spivack says. Drucker was particularly interested in what he called the intelligence of organizations. “My grandfather helped me think about group minds,” Spivack says. “How groups get more intelligent, and how connections play into that.”
Since bringing the semantic Web into the world is a chicken-and-egg proposition, Radar Networks has built both the chicken and the egg. The chicken is the underlying engine the team has created that not only turns data into simple but meaningful digital objects via RDF but also scales up to hold hundreds of millions of objects that can be searched, swapped, and connected to one another. The egg is the user application that rides on top of it all.
The first consumer app Radar plans to launch is a sort of personal data organizer. It will allow you to bring in e-mail, contacts, photos, video, music (anything digital, really) from anywhere on the Web, turn it into RDF, and access it in one place.
Semantic tags are added manually, or automatically if the item is a photo from Flickr or a video from YouTube. “We add a new level of order to connect and interact with these things at a higher level than is possible today,” Spivack says. “We are letting you build a little semantic Web for your project, your group, or your interest.”
When it’s done, it should be like the best wiki you’ve ever used. To illustrate, Spivack flips open his computer and pulls up his own Radar-enabled page. On it are groups of people he knows and interests he’s pursuing, including the space industry, alternative energy, physics, Internet-related technology, and skiing. In each of these categories are objects that Spivack has collected and tagged or, for topics shared with other people, objects that the whole group has collected and tagged.
In the skiing topic, for example, Spivack has posed a question: Where should we go skiing? One of the responses is Alta, Utah. When Spivack clicks on that item, the Radar engine goes out and finds all the things in the Radar Networks database related to Alta. It “knows” that Alta, in this case, refers to a place (as opposed to the Spanish word for “high”), so there are hotel suggestions. There are also photos, videos, trail maps, and comments from people in his group who have skied there before.
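How might software "know" which Alta is meant? One simple approach, sketched below in Python, is to tag each sense of a label with a type and let the kind of question decide which type is wanted. This is an illustrative guess, not a description of the Radar engine: the `SENSES`, `RELATED`, and `EXPECTED_TYPE` tables and the rule that a "where" question expects a place are all assumptions.

```python
# Hypothetical senses of one label, each tagged with a type.
SENSES = {
    "alta": [
        {"type": "Place", "description": "Alta, Utah"},
        {"type": "Word", "description": "Spanish for 'high'"},
    ],
}

# Hypothetical items attached to a typed object, keyed by (label, type).
RELATED = {
    ("alta", "Place"): [
        "hotel suggestions",
        "photos",
        "videos",
        "trail maps",
        "comments",
    ],
}

# Assumed rule: the kind of question determines the type of a valid answer.
EXPECTED_TYPE = {"where": "Place", "what does it mean": "Word"}


def resolve(label, question_kind):
    """Return the sense of `label` whose type fits the question, or None."""
    wanted = EXPECTED_TYPE[question_kind]
    for sense in SENSES.get(label.lower(), []):
        if sense["type"] == wanted:
            return sense
    return None


def related_items(label, question_kind):
    """List what is attached to the resolved sense of `label`."""
    sense = resolve(label, question_kind)
    if sense is None:
        return []
    return RELATED.get((label.lower(), sense["type"]), [])


print(resolve("Alta", "where")["description"])
print(related_items("Alta", "where"))
```

A keyword search has no equivalent of the `type` field, which is why it cannot tell the ski resort from the adjective.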
In a sense, what Radar allows Spivack to do is build a database around any question, project, or interest he may have and then start looking at it from different perspectives: cost, distance from San Francisco, snow conditions in March, nearby restaurants, what his friends liked about a particular resort.
And if they liked Alta, what other places did they like? “You start to see new ways to look at the information,” Spivack says. “What gets me excited is what we can do when we have billions of objects and 10 million people using them.”
For that to happen, of course, people need to start adding their own digital stuff to the mix. The digital life organizer is the bait Spivack and his team are using to try to draw them in. The team will also open Radar Networks to outside developers to write their own applications. Those might involve travel, food, or a better way to manage large projects.
Radar hopes to be the engine powering all that, providing a massive, meaning-filled Web of data that can be infinitely poked and prodded and leveraged. The company will make its money from advertising and premium subscriptions; the basic service will be free.
But don’t expect a sci-fi software agent that takes care of your every whim. Spivack is quick to say that’s not what Radar is launching. “Those people who think we will be offering HAL 9000 when this goes public in October will be disappointed,” he says. “We’ve had the problem of overpromising in this industry; a lot of us who were working on semantic Web technologies early on saw the potential and got a little excited. It has taken much longer to realize than we thought. One thing Web 2.0 has taught everybody is that simpler is better. Find something useful and iterate on that.”
Tom Coates, whose day job at Yahoo involves working on just these issues, thinks the Web 2.0 crowd is already taking care of the problem. He points to tagging and microformats that add some of the same metadata to webpages that semantic technologies offer.
“I call it the dirty semantic Web,” Coates says from his London office. “It may not be the pristine Berners-Lee view of the world, but it is headed in the right direction.”
On a lark, Coates and a colleague created a site called Astronewsology that demonstrates the power of a semantic approach by combining news reports and horoscopes. Using it, you can search the news by the sign Capricorn and see whether that day’s horoscope had any bearing on what happened to people born between Dec. 22 and Jan. 20. Coates’s point is that you can extract meaning from the data without adopting the exacting standards proposed by Berners-Lee.
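A mashup of that kind needs very little machinery once one structured field is available. The Python sketch below assumes each news item carries the birthday of the person it is about. The headlines and birthdays are invented, and only the Capricorn date range quoted above is used. It is not the code behind Astronewsology.

```python
from datetime import date


def is_capricorn(birthday):
    """True for birthdays from Dec. 22 through Jan. 20, the range quoted above."""
    return (birthday.month == 12 and birthday.day >= 22) or (
        birthday.month == 1 and birthday.day <= 20
    )


# Hypothetical news items; the birthday is the one piece of metadata we need.
NEWS = [
    {"headline": "Person A opens a restaurant", "subject_birthday": date(1970, 1, 5)},
    {"headline": "Person B wins a race", "subject_birthday": date(1982, 7, 14)},
    {"headline": "Person C publishes a book", "subject_birthday": date(1965, 12, 25)},
]


def capricorn_headlines(items):
    """Keep only the stories whose subject was born under Capricorn."""
    return [item["headline"] for item in items if is_capricorn(item["subject_birthday"])]


print(capricorn_headlines(NEWS))
```

No formal ontology is involved, which is Coates's argument: a single reliable field, joined against another data set, is enough to produce a new view of the news.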
Things get even more interesting when the data starts to become interconnected.
“It’s in the combination that the real power of this comes out,” Coates says. “The mashup is an early example of the Web that is to come. Semantic technologies have not taken off as much as we’d hoped because people are finding more utility in other Web 2.0 technologies at the moment. The goal is the most important thing: reusable, repurposable, and reconnectable data. How we get there is not as important.”
The shift to a semantic Web is still in its very early days. Spivack envisions a time line of five to seven years. But the shift is clearly under way. James Hendler, one of the coauthors of that seminal Scientific American article, sees the same dynamics he did when the Web was first forming.
“Those of us who were involved saw little islands of the Web being created,” he says. “To most people, the Web seemed to happen overnight, because they hadn’t seen the first six to eight years of effort. We’re in that early phase of the semantic Web.”
Radar Networks, Google Base, and even Flickr are the first islands to pop into public view. Larger islands are being formed by corporations and government agencies. Many more will rise.
Spivack is counting on those islands to eventually coalesce. That’s when the potential becomes reality. That’s when we can all kick back and let our software agents go out and bring some order to the chaos of our digital lives.