Mining social networks can boost efficiency, innovation and, as IBM’s Dr. Ching-Yung Lin has shown, prove the value of your connections.
By Quinn Norton
Dr. Ching-Yung Lin treats numbers like they’re people, because in his work, they are. A research scientist at the IBM T. J. Watson Research Center in New York, Lin is the founder and lead of SmallBlue, one of the world’s largest social analytics projects. His lab is the 410,000-strong global city of IBM.
SmallBlue is a program that builds a topography of all employees from work-based data that volunteers share with the system. The e-mails, wikis, blogs and instant messages (IM) of volunteers aren’t available to the whole company, but the analysis creates an interface to a network anyone in the company can use. With that data, Lin can map far more employees than the volunteers alone; for non-volunteers, the accuracy of their placement and areas of interest increases with their social proximity to volunteers. This IBM map of relational expertise allows people to be found the same way subjects are findable on Google, with more of an awareness of what you want than knowledge of where to look.
On the user side, SmallBlue gives you a personalized page with information the system earmarks for you from services such as Whisper, a kind of Twitter-meets-Del.icio.us used internally at IBM. From there, a search on local or global expertise by keyword brings up the person who SmallBlue believes is most suitable to answer your query, and provides you the social path to that person. In the age of the disappearing and distributed office, it’s important to break down the barriers to information flow.
"Sixty percent of IBM workers work at home," Lin says. "As long as you can find the guy [you need] you can just send an IM."
If you’re a volunteer, that information is fed back into the system. The path you follow, the expertise, everything is implicit, built out of those e-mails, IM conversations, posts and so on.
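One way to picture the "social path" SmallBlue shows a searcher is as a shortest chain of working relationships between two people. The sketch below is only an illustration under that assumption; the article does not say how SmallBlue actually computes its paths. The names in `graph` and the function `social_path` are hypothetical.

```python
from collections import deque

# Hypothetical contact graph: each person maps to the colleagues they communicate with.
graph = {
    "you": ["raj", "mei"],
    "raj": ["you", "lena"],
    "mei": ["you"],
    "lena": ["raj", "expert"],
    "expert": ["lena"],
}

def social_path(graph, start, target):
    """Return the shortest chain of contacts from start to target, or None."""
    if start == target:
        return [start]
    previous = {start: None}  # who introduced each person we have reached
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for contact in graph.get(person, ()):
            if contact in previous:
                continue  # already reached by a path at least as short
            previous[contact] = person
            if contact == target:
                # Walk the introductions backwards to rebuild the path.
                path = [contact]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(contact)
    return None  # no chain of contacts connects the two people

path = social_path(graph, "you", "expert")
```

A breadth-first search is used here because it finds the chain with the fewest introductions, which seems the most natural reading of a "path to that person."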
Lin followed an improbable path to social network analysis. In college, he studied electrical engineering, but engineering left unaddressed his fascination with understanding other people. He studied sociology on his own time, and found himself living social change during a vital moment in Taiwanese history.
"As an undergraduate, I spent a lot of time working on street demonstrations. It was the turning point of Taiwan from one-party rule to democracy," Lin says.
He left Taiwan for his doctoral work at Columbia University in New York. There he worked on image and signal processing and analysis, and was hired by IBM straight out of university to continue his work, which came to include video as well. Lin invented ways of guessing at the content of images and videos in order to categorize them, inferring meaning from contextual clues. It’s a subtle and still largely unsolved problem in computer science—not only can’t we figure out how to teach computers to infer meaning, we don’t really understand how we do it ourselves.
While working on the video processing problem in 2003, Lin attended an internal presentation on IBM’s problem with finding its own expertise. The company’s sales and consulting departments often needed to find people to help them, but they had no idea how. Everyone knew the institutional knowledge was there somewhere, but buried in IBM: the institution’s unknown knowns. The problem intrigued Lin, whose work had centered on finding the implicit in images and videos. This was about finding the implicit in crowds.
"People are much more complicated than even the media," he says. "The human part is more interesting."
Like many researchers who bridged the technical and the social, Lin tore into the Enron e-mails after their release as part of a federal investigation. They offered a look into the inner workings of a social group with one of the most studied fates since the advent of the Internet. The Enron e-mails represented a before and after: you could compare your predictions of what was going on in the e-mail to what later came out in the courts and the press. It was the first social network analysis that was like a math book with answers you could check at the end. Lin’s analysis of the roles of people revealed in their e-mails during the California energy crisis matched well with the roles later revealed by the media. It was small, but it was a proof of concept.
IBM needed better ways of talking to itself. One of its early efforts was a social network called Blue Pages, but the program languished with the problems of self-reporting. People didn’t want to bother, and when they bothered, they were too busy to subsequently update their Blue Page. Lin believed the techniques developed out of the Enron e-mails could help, but instead of looking for malfeasance, he would use the same mathematics to find expertise, interests and relationships, both between people and between topics. Lin decided to build a prototype of a data-mining-derived social network.
He started working with data from two volunteer co-workers and himself. The prototype of SmallBlue modeled a social network based on all three people’s e-mails, IMs and other communications. But it described much more than three people—the communications those three had with other people allowed Lin to start building profiles of others—and how they interconnected among themselves.
For example, being in on an e-mail thread would set up social relationships between everyone listed on the e-mail, and other e-mails with high co-occurrence of certain people would strengthen the social bonds between those people in SmallBlue. Keywords in those e-mails told SmallBlue what those people worked on together, and then the system could guess as to their interests and areas of expertise. The further someone was from a volunteer, the less the network could pinpoint their relationships and interests, but even with just three people the prototype was good enough to persuade IBM’s Global Business Services to fund the expansion of SmallBlue, and Lin quit image processing to work on social network mining full time.
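The co-occurrence idea described above can be sketched in a few lines. This is a simplified illustration, not SmallBlue's actual code: the message records, the names, and the function `build_network` are all hypothetical, and real keyword extraction would be far more involved than a ready-made keyword list.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical sample data: each message lists everyone on the thread plus a few keywords.
messages = [
    {"people": ["ana", "bo", "cy"], "keywords": ["video", "indexing"]},
    {"people": ["ana", "bo"], "keywords": ["video", "codec"]},
    {"people": ["bo", "dee"], "keywords": ["sales"]},
]

def build_network(messages):
    """Count how often pairs of people share a message and which keywords each person sees."""
    tie_strength = Counter()       # (person_a, person_b) -> number of shared messages
    topics = defaultdict(Counter)  # person -> keyword counts
    for msg in messages:
        people = sorted(set(msg["people"]))
        # Everyone listed on the same message gets a slightly stronger tie.
        for a, b in combinations(people, 2):
            tie_strength[(a, b)] += 1
        # Keywords hint at what those people work on.
        for person in people:
            topics[person].update(msg["keywords"])
    return tie_strength, topics

tie_strength, topics = build_network(messages)
```

In this toy data, "ana" and "bo" share two messages, so their tie is stronger than the one between "ana" and "cy", and "video" becomes the leading guess for what "ana" works on.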
Using data from 12,000 volunteers in 72 countries over the past three years, Lin has constructed a map that tells the story in broad outlines (some broader than others) of the 410,000 employees in the global city that is IBM. But how accurately can 12,000 describe 410,000?
"It depends on where those people are situated," says Charles Armstrong, CEO of the social analytics company Trampoline Systems. "It’s remarkable how much you can get from a small sample, but the data will be very thin for a large portion of the people they can map from that."
Right now SmallBlue picks up data from IBM volunteers, describing not only themselves in extreme detail, but the information and the people they touch. SmallBlue has run through more than 20 million e-mails and IMs and 2 million blog and database entries and chews through more every day. It is the largest social network dataset publicly known to exist.
Analyzing social patterns reveals a lot about the ecosystem of a social network. For instance, based on social network analysis techniques, mobile carriers in China are analyzing traffic to spot text spammers. If you text many people who don’t text you back and don’t text each other, that doesn’t look like a natural community—it looks like spam. The same technique makes it hard to game Lin’s system, where spamming is replaced by trying to look like an expert on a subject in SmallBlue.
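The spam pattern described above, many recipients who neither reply nor talk to each other, can be expressed as a simple heuristic. The sketch below is an assumption-laden illustration, not the method used by any carrier or by SmallBlue: the function `looks_like_spammer`, its thresholds, and the sample data are all hypothetical.

```python
from itertools import combinations

# Hypothetical traffic log: (sender, recipient) pairs.
texts = [
    ("spam", "u1"), ("spam", "u2"), ("spam", "u3"), ("spam", "u4"),
    ("f", "g"), ("f", "h"), ("f", "i"),
    ("g", "f"), ("h", "f"), ("g", "h"),
]

def looks_like_spammer(sender, texts, min_recipients=3,
                       max_reply_rate=0.1, max_density=0.1):
    """Flag senders whose recipients rarely reply and rarely text each other."""
    sent = set(texts)
    recipients = {to for frm, to in sent if frm == sender}
    if len(recipients) < min_recipients:
        return False  # too little traffic to judge
    # Share of recipients who ever texted the sender back.
    replies = sum(1 for r in recipients if (r, sender) in sent)
    reply_rate = replies / len(recipients)
    # Share of recipient pairs that text each other in either direction.
    pairs = list(combinations(sorted(recipients), 2))
    linked = sum(1 for a, b in pairs if (a, b) in sent or (b, a) in sent)
    density = linked / len(pairs) if pairs else 0.0
    return reply_rate <= max_reply_rate and density <= max_density

flagged = [s for s in ("spam", "f", "g") if looks_like_spammer(s, texts)]
```

The same two measurements would make it hard to pose as an expert: someone who broadcasts about a topic without anyone writing back, to people who do not otherwise work together, does not resemble a natural community around that topic.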
Soon after expanding the project, Lin’s team ran into jurisdictional issues. Although SmallBlue opened data to all employees, gathering that kind of data and turning it over to the corporation for analysis raised alarms and was difficult to implement. Legal authorities in Northern Europe hold that it is impossible to be sure that data volunteers are completely free of coercion in the workplace. As a result, IBMers there can consume SmallBlue but cannot contribute their own data to it.
"The solution for Northern Europe was that they could use our system, but they could not donate their data," Lin says.
Despite not being volunteers, people under restricted jurisdiction are generally, according to Lin, pretty well described. Lin’s work demonstrates the problem with a privacy law written in 1995, before online social networks were yet a twinkle in the Web’s eye, much less talking points for politicians in Brussels. Once you can algorithmically draw the outline around a person it’s not too hard to fill in the gaps.
"If we know something about your neighbors, we can find out something about you," Lin explains.
Neighbors in this case don’t have to be geographic—they can be colleagues in your field or even just acquaintances from other IBM offices.
Lin got together with colleagues from Massachusetts Institute of Technology and New York University to see what else they could glean from the volunteer data. They decided to take a first stab at putting a price tag on social capital. To do this they looked at a constrained set of people—IBM consultants—and measured them against their billable hours. Even with this very specific sample, which can’t really be generalized much further than "IBM consultants," the result was powerful. Lin found that every work connection was worth, on average, US$948—more if the subject’s network was diverse, less if the network was tight knit.
What made this group easy to quantify in this first study on social capital was the billable hours database, but without a doubt there are more rich pickings within SmallBlue.
"If you use just traditional sociological methodology you just collect details on people," Lin says. "Maybe we will push social science in another way, because we have a larger scale."
Armstrong agrees. "It becomes on-tap sociology…something that can underpin ongoing strategic thinking rather than a one-off information grab," he says.
In the future, Lin hopes to create an economics of social networks.
"Right now the financial capital and the human capital are easier to understand," Lin says. "But there is a human network capacity: this is something new."
QUINN NORTON is a freelance journalist best known for covering intellectual property, science and technology.