Web 2.0: Under Construction

Contents

Scale, by Thomas Claburn

Content Management, by Thomas Claburn

Security, by Charles Babcock

Lightweight Development, by Charles Babcock

The User Experience, by Aaron Ricadela

Communities, by Aaron Ricadela

Interactive Timeline: A Brief History Of Web 2.0

So how do you create interactive Web sites that play in this world? And what's the business model behind them? Business and technology leaders, including Amazon.com's Jeff Bezos, Microsoft's Ray Ozzie and Debra Chrapaty, Google's Eric Schmidt, Salesforce.com's Marc Benioff, and Skype's Niklas Zennstrom, will take up those subjects this week at the Web 2.0 Conference in San Francisco. This much we know: Web 2.0 requires a software and server infrastructure and an IT architecture that are very different from those behind Web sites of the past. Following is a discussion of six such areas: scale, content management, security, development techniques, user experience, and community. Web 2.0 companies now have more options and technologies for dealing with them all.

SCALE

Few people launching a Web site expect millions or tens of millions of visitors right away. But it happens. Ask YouTube, which had 20.8 million unique U.S. visitors in September, up from 114,000 a year earlier, according to Internet metrics company comScore Networks.

For most sites, network and IT infrastructure doesn't matter as much as you might think. True, there's no business without scalable, available IT resources--but that's where Web 2.0 companies start, not what sets them apart. Despite the sophisticated and massive tech infrastructure behind Google Video, YouTube proved more popular, prompting the deal for Google to acquire its former competitor. Infrastructure is the ante. A winning hand requires something more: innovation, community dynamic, mojo.

In fact, Web 2.0 startups can get out of the gate without their own data centers. Online retailer Amazon is selling slices of its infrastructure to startups that need help getting their computing up to speed. While that infrastructure--servers, operating systems, database software, network connections--is critical, it doesn't add much to the customer experience, says Adam Selipsky, VP of product management and developer relations for Amazon Web Services. And it can be a resource drain: Companies spend about 70% of their resources building and maintaining an IT foundation, Selipsky says.

In contrast to the first wave of Web expansion, when buying million-dollar servers and getting big fast was the mantra, today's 2.0 startups aren't caught up in the nuts and bolts of computing. "We definitely don't see low-level data centers as being core to our business or value proposition," says Don MacAskill, CEO and co-founder of online photo-sharing site SmugMug, which uses Amazon's S3 storage service, a massive array of storage devices linked via storage management software, to augment homegrown infrastructure.

"It's pretty simple, because Amazon does the heavy lifting for us that involves replicating the files among many data centers and storage media," says Chris MacAskill, president and co-founder of SmugMug.

With 18 employees, SmugMug handles 180,000 paying customers and 115 million photos. "We really think our value proposition is the customer experience," Don MacAskill says. "That includes our Web user interface and our customer service."

Cute pictures, and they're sitting on an array of commodity storage devices operated by Amazon.com

Scaling customer service may be more difficult than scaling servers, Don MacAskill says. SmugMug has space right next to YouTube in one Silicon Valley data center, and both companies spend time on many of the same infrastructure challenges, like servers and redundant, self-healing file systems. "It's kind of silly that we're all reinventing the wheel rather than someone coming along and commoditizing it," he says.

The basic stuff is a commodity. David Dudas, co-founder and CTO of online video-editing site Eyespot, points to inexpensive, powerful Intel-based servers; bandwidth from multiple providers available at a fraction of the cost of a few years ago; disk storage that's inexpensive and dense (requiring less data center space); and open source software that includes a free, enterprise-class operating system (Fedora Linux), relational database (MySQL), Web server (Apache), and application framework (Ajax).

Eyespot's advantage is its ability to combine pieces into a scalable online video-editing platform. "All the inexpensive hardware in the world will do you no good if you don't know how to put it together correctly, or if your systems break down or grind to a halt at the 50-million-user mark because of poor architecture," Dudas says.

That's a challenge. One of the keys is putting together a system of IT components--such as servers, databases, routers--that can grow independently of one another. Another key is understanding that different media-serving functions--streaming, image serving, Web page serving, databases, and so on--have different resource requirements.

The rent-a-data-center approach to Web 2.0 infrastructure may work for only so long. At some point, "we need to build our own systems," says Arik Czerniak, co-founder and CEO of Metacafe, the third-fastest-growing site on the Net between August and September. Metacafe had 16.6 million unique users and 492 million page views worldwide in September, comScore says. "At this scale, it's a huge technical challenge to make sure your site is up and running," Czerniak says.

Metacafe designed its own software infrastructure--the services, template libraries, metrics, versioning, and monitoring infrastructures--says co-founder and chief product officer Eyal Hertzog. At the same time, the online video site relies on content delivery network Limelight Networks to cache files for more efficient delivery and on Web hosting company RackSpace for server hosting. Metacafe uses the LAMP (Linux, Apache, MySQL, PHP) software stack.

For Metacafe, scaling efficiently means running the site on several hundred servers rather than several thousand. Says Czerniak, "If we were still using the same technology we were using on day one, we would probably need 10,000 servers."

-- Thomas Claburn

CONTENT MANAGEMENT

For sites whose main reason for being is to gather content, package it, and deliver it to millions of people, the challenge is to figure out the best way to manage all those files. Companies may need to develop their own approaches because the architecture of Web 2.0 interactivity--tagging, rating, uploading--isn't well supported in commercial content management systems. "Scaling is the big question when it comes to user-generated content," says Jesse James Garrett, director of user experience strategy and a founding partner of Web design firm Adaptive Path.

The reason existing content management infrastructures don't work for these Web 2.0 companies is "their definition of content management was completely outside what the vendors were considering when they created their software," Garrett says. Most enterprise content management systems were designed to handle documents, spreadsheets, databases, and other conventional types of files--not photos, video, or online communities.

Photo-sharing site SmugMug has been adding 300,000 to 500,000 image files a day, CEO Don MacAskill says. The company's content management system isn't especially sophisticated, he says, just "a small bit of glue. Not many lines of code." His main concern is large amounts of bulletproof storage, which the company gets from Amazon's S3 storage service, in conjunction with a customer-friendly user interface and Amazon support. The "glue" serves to prevent data loss when file-writing operations fail.
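What might such "glue" look like? The following is a minimal Python sketch, not SmugMug's actual code, of one common way to guard against failed writes: retry the upload and accept it only when the checksum reported by the store matches the local one. The `put` callable, the `UploadError` class, and the function name are hypothetical stand-ins for a real storage client.

```python
import hashlib
import time


class UploadError(Exception):
    """Raised when a file could not be stored after all attempts."""


def store_with_retry(put, key, data, attempts=3, delay_seconds=0.0):
    """Call put(key, data) until the store confirms it received the bytes.

    `put` is a hypothetical callable standing in for a remote storage
    client. It is assumed to return the hex SHA-256 digest of the bytes
    the store actually received, and to raise OSError on I/O failure.
    Returns the number of attempts used.
    """
    expected = hashlib.sha256(data).hexdigest()
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            if put(key, data) == expected:
                return attempt
            last_error = UploadError("checksum mismatch")
        except OSError as exc:  # simulated network or disk failure
            last_error = exc
        if attempt < attempts:
            time.sleep(delay_seconds)  # pause before trying again
    raise UploadError(f"gave up on {key!r} after {attempts} attempts") from last_error
```

The caller keeps its local copy until `store_with_retry` returns, so a write that fails on every attempt surfaces as an error rather than as a silently missing photo.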

Handling this much content and nearly half a billion page views a month is a huge technical challenge for Metacafe

What makes content management more difficult for many Web 2.0 companies is the need to deal with user-generated material; everything those companies do revolves around data and its management. Before storing the files it receives, SmugMug does a lot of work on them, such as making sure they're the right color space, extracting information that may be used as captions and keyword tags, and making copies in various sizes that can be fetched from the disk for quick display, says Chris MacAskill. After that, Amazon replicates the files among data centers and storage.
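The "copies in various sizes" step can be illustrated with a little arithmetic. This Python sketch assumes three display sizes (the 100-, 400-, and 800-pixel edges are invented for illustration, not SmugMug's settings) and computes the dimensions of each copy while preserving the aspect ratio and never enlarging the original.

```python
def scaled_sizes(width, height, max_edges=(100, 400, 800)):
    """Return {max_edge: (w, h)} for each display size.

    Each copy keeps the original aspect ratio, fits its longest side
    within max_edge, and is never larger than the original image.
    """
    longest = max(width, height)
    sizes = {}
    for edge in max_edges:
        scale = min(1.0, edge / longest)  # a scale of 1.0 means "keep as is"
        sizes[edge] = (
            max(1, round(width * scale)),
            max(1, round(height * scale)),
        )
    return sizes
```

Generating these copies once at upload time trades storage for speed: every later page view reads a ready-made file instead of resizing on demand.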

The challenge for Metacafe is dealing with massive amounts of video as well as data gathered from users and developers. That means choosing the right content delivery network, tracking buffering times around the globe, and doing lots of development work to track page loads and the stress placed on databases. "Metacafe is really different in terms of the sheer amounts of data that we mine to bring results to our users," CEO Czerniak says.

The company uses open source software across the board for production and development. It uses a wiki to manage its development cycle and as its main knowledge management tool, chief product officer Hertzog says. "Every idea and thought are written to the wiki and reviewed and edited by people from the company," he says. "Once an idea is accepted, we continue to spec it, design it, and write the test plans on it."

Content management is difficult for Web 2.0 companies no matter how you look at it. But the good news is that people are learning. In the late 1990s, a lot of sites hit a wall because they couldn't scale, Garrett says. "We've learned a lot as an industry in the last five years," he says, "about how to build applications from the start with the flexibility that's going to be able to sustain a mass audience."

-- Thomas Claburn

SECURITY

As companies upgrade their Web sites with the latest interactive technologies, they'll find the sites both offer a greater opportunity to attract and retain users and pose a greater danger of security breaches inside the firewall. Ajax, with its use of JavaScript, lets a writer create programs that automatically execute when loaded into a visitor's browser window. JavaScript is just the most prominent of browser-ready scripting languages that can launch malicious code back toward a server. Others include Microsoft's Visual Basic Scripting Edition (VBScript) and JScript, Microsoft's answer to JavaScript based on the ECMAScript standard. They also include Adobe's ActionScript, another ECMAScript look-alike, which runs in the browser window on the ubiquitous Flash player, already installed on 98% to 99% of Internet clients.

Asynchronous JavaScript, which is part of Ajax, is what powers Google Maps, tracking the user's cursor on a map grid and sending the information back to an Internet server. In effect, the JavaScript program is telling the server, "He's moving north. Send more map data on what's north of his present location."

Such interactive features are an ongoing threat because they contain hazards that can be minimized but not eliminated. And just minimizing them takes discipline by developers who may not have the experience to know when they're getting into trouble. Ajax applications can run lots of scripted code on the server side and in browsers, opening vulnerabilities hackers can exploit in the databases with which the apps communicate.

Even disciplined developers can fall prey. A year ago, social networking site MySpace hosted a new profile by a user called Samy. Included in his posting was a hidden JavaScript worm that would infect the browser of any MySpace user who came to Samy's profile and replicate itself in that user's profile. In one sense, the result was merely playful: Samy's goal was to post the line "Samy is my hero" in the "Heroes" section of as many MySpace users as possible.

Does this MySpace profile contain a JavaScript worm?

But the infection spread quickly. Within 20 hours, the JavaScript worm had infected a million MySpace users. As it continued to build, the artificial traffic being generated by the worm's actions brought MySpace servers to their knees. MySpace has declined to comment, but it was reported on Slashdot that the company had to shut down its site temporarily to get rid of the infection.

That's one example of why Web 2.0 developers must think about security from the beginning. A big danger of Web 2.0 technologies arises when they call for users to put responses into forms or data fields. Developers may seek a particular response, such as a name or a ZIP code, but too few Web sites carefully validate the input. "At the client, there's no control over what gets actually input. It's all under the user's control," says Bryan Sullivan, development manager for SPI Dynamics, a security software company.
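Because the client can send anything, the check has to happen on the server. A minimal Python sketch of allow-list validation for the ZIP code case follows; the function name and the exact pattern (five digits, optionally followed by a hyphen and four more) are assumptions made for illustration.

```python
import re

# Allow-list pattern: describe what is permitted, reject everything else.
ZIP_PATTERN = re.compile(r"[0-9]{5}(?:-[0-9]{4})?")


def validate_zip(value):
    """Return the trimmed ZIP code, or raise ValueError if it is malformed."""
    candidate = value.strip()
    if not ZIP_PATTERN.fullmatch(candidate):
        raise ValueError("not a U.S. ZIP code")
    return candidate
```

Note that the function rejects bad input rather than trying to clean it up, which avoids having to anticipate every trick an attacker might hide in the field.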

David Wagner, assistant professor of computer science at the University of California at Berkeley, warns that there are 1,001 ways to hide JavaScript in an HTML page, in a wiki, or on a MySpace or Yahoo Mail type of site. "If you caught 1,000 of them, you're still out of luck," Wagner says. "The bad guys have the advantage." Yahoo's Web mail servers were invaded by the Yamanner worm that a user uploaded last spring.

If a knowledgeable user types a SQL statement into an address field and the application passes that input to the database unchecked, the statement can execute against an available database back on the server, a maneuver known as SQL injection. If a MySpace user loads his personal page on the MySpace Web server with a JavaScript worm, that worm will execute in the browser window of visitors who inspect his content. MySpace has taken steps to block a repeat of the Samy worm, but malware writers undoubtedly will try something different next time.
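Two widely used defenses against these attacks can be sketched in a few lines of Python using only the standard library. The table, column, and function names here are invented for illustration, and this is not a description of what MySpace or any site named above actually does. The first function uses a parameterized query so user input is treated as data, not SQL; the second escapes user text before it is placed in a page so the browser displays it instead of running it.

```python
import html
import sqlite3


def find_names_by_city(conn, city):
    """Look up users by city with a parameterized query.

    The ? placeholder passes `city` to the database as a value, so any
    SQL typed into the field is compared as plain text, never executed.
    """
    rows = conn.execute(
        "SELECT name FROM users WHERE city = ? ORDER BY name", (city,)
    )
    return [row[0] for row in rows]


def render_hero(text):
    """Wrap user-supplied text in a list item, escaping markup characters.

    html.escape turns <, >, &, and quotes into entities, so a <script>
    tag in the text is shown on the page rather than run by the browser.
    """
    return "<li>" + html.escape(text) + "</li>"
```

Escaping on output is a baseline rather than a cure-all. As Wagner's warning suggests, sites that let users post some HTML face a much harder filtering problem.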

Unlike worms that preceded it, the Samy worm wasn't limited to one operating system. It was a cross-platform worm, like Ajax on the Web, and it could be launched by Apple Macs, Linux workstations, or Windows PCs. It was silent, captured user information, and gave no warning that the user was being infected and would infect others. Warns Sullivan, "Imagine an Ajax worm on a bank site."

-- Charles Babcock

LIGHTWEIGHT DEVELOPMENT

Two popular options--Ruby and Flash--are similar to Ajax, the lightweight, browser-based combination of JavaScript and XML on which Google Maps and other interactive sites are based. Unlike Ajax, which is relatively new, Ruby and Flash are Web site building technologies with mature toolsets.

Backchannelmedia built its site using Ruby on Rails, a specialized Web platform with its own lightweight programming language, Ruby. The site gives customers rapid access to a massive database of the results of TV commercials. Advertisers can match up TV viewer 800-number call-in orders with where their ad played in different markets throughout the day.

Backchannelmedia CIO Madeleine Noland says the firm of 25 employees had the skills to use Java, Visual Studio .Net, Ruby, and PHP when it decided a year ago to rebuild its customer interaction service, DRTV Research, using Ruby on Rails. The service was built in two months, rather than the nine months it would have taken if the company had used Java, says Jason Toy, director of technology, and it required only one-tenth the lines of code Java would have needed. The service adds 2.5 million ad records a day to the database while maintaining a capability to deliver a million different Web page results based on user queries.

Nike kicked off a new store built on Flash, just like its sneakers

NikeStore, another interactive site, was built with Adobe Systems' Flash, a multimedia engine that runs Macromedia's ActionScript inside the browser window. (Macromedia was acquired by Adobe.) NikeStore was launched in early September as a retail site employing the latest capabilities for shopper interaction, says John Mayo-Smith, CTO of Nike's site building agent, R/GA. For example, as a visitor's cursor moves over headings such as "Men" or "Kids," a drop-down menu of products appropriate to the category appears. Clicking on a featured item transforms the window surrounding it into a chance to view it in different colors and with related products. The transformations take place nearly instantaneously, without any changes to the surrounding page.

The shopper can personalize footwear at NikeID, a separate site that's tightly integrated into NikeStore, and put the selection in a shopping cart. It's as if everything sought by the shopper is being brought to the page he's on, rather than sending him to another page. In addition, the back button works throughout the site, which isn't common at many retail sites. "The shopper's experience," Mayo-Smith says, "lines up much better with people's expectations."

-- Charles Babcock

THE USER EXPERIENCE

Few companies have spent as much time thinking about the online user experience as Microsoft. MSN, in the online content business since 1995, remains one of the Internet's most popular sites. But it's iTunes that people go to for music, not Urge or any of the other partnerships Microsoft engaged in with makers of iPod knock-offs. People "Google" for information; they don't "MSN" for it. And it's Google Maps and Google Earth that own the find-your-way franchise, not Microsoft's Virtual Earth. So far, at least, Microsoft's sites aren't a destination for the tastemakers of today's younger generation.

Recognizing its problems, Microsoft plans some renovations. The company will spend $500 million this fiscal year to develop Internet search engines and other software that can compete with products from Google and Yahoo. The price tag includes the cost of building new data center capacity to host upcoming consumer and commercial software. Its new Zune music player and music-shopping site are scheduled to launch next week. This week, Microsoft's online mapping software--a popular application that's closely tied to the success of its search engine--gets a major upgrade that casts its maps of the United States into eye-pleasing 3-D.

If it's a hit, the new site could help Microsoft build a following on par with Google Earth, which has won rave reviews for its ability to let users zoom around the globe, soar over landmarks from the Grand Canyon to the Pyramids of Egypt, or swoop down to a neighborhood to find the nearest Starbucks.

What technology will Microsoft use to improve the Virtual Earth experience? Two acquisitions from earlier this year offer a clue. Microsoft bought Vexcel, which uses a technique called photogrammetry that can create 3-D images of cities and countries from aerial photographs. It also acquired Massive, a software company that lets sponsors inject ads into video games. Microsoft already licenses Virtual Earth APIs to Best Buy, Expedia, and others, as Google and Yahoo do for their mapping software. That gives partners a way to inject these applications with a dose of their own innovation--a decidedly Web 2.0 characteristic.

Google came out with a new version of Google Earth in September, and it has some new tricks as well. They include descriptions of world landmarks from Discovery Networks and National Geographic, information on campgrounds and trails from the National Park Service, and the ability to record HDTV movies of users' flyover sessions. These may turn out to be the kinds of advances that thrill users, letting them see things they never could before. In this battle, fast, detailed, and fun count for a lot.

-- Aaron Ricadela

COMMUNITIES

But the concept of online communities can be a red herring. They're seldom the "we're all in this together" gathering places the term conjures up. "I always hated the word 'community.' It's one of those words people use so they don't have to think about what really matters," says Tim O'Reilly, CEO of O'Reilly Media, which co-produces the Web 2.0 Conference along with CMP Technology, InformationWeek's parent company.

In O'Reilly's view, some of the most successful community sites actually work in ways that run counter to the notion of harmonious Internet denizens sharing their thoughts. Amazon, for instance, boasts millions of user-written book, music, and product reviews, harnessing the smarts (and presumably, the intellectual chaff as well) of its customers. "They work and work and work their users," O'Reilly says. But is Amazon a community site?

Often, online communities--sites with features that let users find ways to talk, share, and relate to one another--end up being tangential to a site's business goals. And that, O'Reilly says, is the crux of the problem.

Take Flickr, the popular photo-sharing site owned by Yahoo. Sharing photos is great, but Flickr makes prominent use of one-way links, so I can see your photos without you knowing that I'm looking. Hardly communitarian. Craigslist, another of the Web's most popular sites, is driven by user power, but everyone's self-interested--you respond to my ad, and I benefit.

MySpace has turned communities into piles of cash, but most of its imitators have failed. Even Wikipedia, which projects the utopian notion of collective writing and editing to disseminate the world's knowledge, is dominated by a core group of contributors.

"The real question is, how can users add value to what you do?" O'Reilly says. "Community is this little tiny piece of this much bigger story."

-- Aaron Ricadela