Match Game
Imagine reading an ESPN blog post about pro golfer Tiger Woods and seeing a short ad for the San Francisco Zoo displayed on the same page. Or imagine the horror of reading a lurid news item about a headless body found in a suitcase—and then seeing an advertisement for a leading luggage manufacturer. Not the best way to win friends and influence sponsors.
It’s no secret that online ads should correspond to the pages on which they appear. After all, relevant ads ensure a better user experience and increase the likelihood of clickthroughs.
The problem, however, is that conventional ad-matching techniques do not always produce relevant matches. That’s why Yahoo! Research scientists Andrei Broder, Marcus Fontoura, Vanja Josifovski, and Lance Riedel developed a new approach, which they describe in their research paper “A Semantic Approach to Contextual Advertising.”
Previous approaches matched ads by looking for the same words or phrases in both the ad and the webpage. For instance, a news story about the Wimbledon tennis tournament could feature contextual ads containing terms like “tennis racquet” or “tennis balls.”
But what if the webpage and the ad use different words to describe the same thing? In this scenario, an ad containing the word “physician” might not be matched with a webpage about “doctors.”
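To make the limitation concrete, here is a minimal sketch of pure keyword matching in Python. The scoring rule (the share of an ad's words that also appear on the page) and the function names are illustrative assumptions, not Yahoo!'s production matcher.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_score(page: str, ad: str) -> float:
    """Assumed scoring rule: fraction of the ad's distinct words found on the page."""
    ad_words = tokenize(ad)
    if not ad_words:
        return 0.0
    return len(tokenize(page) & ad_words) / len(ad_words)


page = "Wimbledon tennis: the champion switched to a lighter tennis racquet"
print(keyword_score(page, "Tennis racquet sale"))  # 0.666... (2 of the 3 ad words appear on the page)
print(keyword_score("Find a doctor near you", "Physician referrals"))  # 0.0
```

The second call shows the synonym problem: a page about doctors and an ad about physicians share no words, so a purely lexical matcher scores the pair as unrelated.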
To rectify the problem, Yahoo! researchers are developing a mechanism that matches ads based not only on keywords but also on broader concepts that go beyond specific words. “Basically, the crucial idea we want to convey is that words matter, but understanding what the page is about matters more,” says Broder.
The researchers created a technology that classifies both the page and the ads into a large tree of topics, or taxonomy, and then uses how close their topics sit in that tree as a factor in the ad-ranking formula. Essentially, the classifier maps words from both the ad and the webpage to specific nodes in the tree, and those nodes are then used to find better matches. For example, if a page is about the sport of curling and contains the words “Lake Tahoe,” ads for ski packages to Lake Tahoe would still rank highly, because curling and skiing are both winter sports.
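The ranking idea can be sketched in a few lines of Python. Everything here is hypothetical: the tiny taxonomy, the distance-based proximity score, and the blending weight `alpha` are assumptions chosen for illustration, and pages and ads are assumed to have already been classified into topics (the step the classifier performs in the paper).

```python
# Hypothetical taxonomy: each topic maps to its parent (None marks the root).
PARENT = {
    "Root": None,
    "Sports": "Root",
    "Winter Sports": "Sports",
    "Curling": "Winter Sports",
    "Skiing": "Winter Sports",
    "Tennis": "Sports",
    "Shopping": "Root",
    "Luggage": "Shopping",
}


def path_to_root(topic):
    """List the topic followed by each of its ancestors up to the root."""
    path = []
    while topic is not None:
        path.append(topic)
        topic = PARENT[topic]
    return path


def tree_distance(a, b):
    """Number of edges between two topics, measured through their lowest common ancestor."""
    path_a, path_b = path_to_root(a), path_to_root(b)
    ancestors_b = set(path_b)
    for steps_a, node in enumerate(path_a):
        if node in ancestors_b:
            return steps_a + path_b.index(node)
    raise ValueError("topics are not in the same tree")


def score(keyword_sim, page_topic, ad_topic, alpha=0.5):
    """Blend keyword similarity with taxonomy proximity; alpha is an assumed weight."""
    proximity = 1.0 / (1.0 + tree_distance(page_topic, ad_topic))
    return alpha * keyword_sim + (1 - alpha) * proximity


# A curling page that mentions Lake Tahoe; both ads match those words equally well.
ads = [
    ("Lake Tahoe luggage outlet", "Luggage", 0.4),
    ("Lake Tahoe ski packages", "Skiing", 0.4),
]
ranked = sorted(ads, key=lambda ad: score(ad[2], "Curling", ad[1]), reverse=True)
print([name for name, _, _ in ranked])  # ['Lake Tahoe ski packages', 'Lake Tahoe luggage outlet']
```

Because both ads share the words “Lake Tahoe,” keyword similarity alone cannot separate them; the shorter tree distance between Curling and Skiing (two edges, versus five to Luggage) puts the ski ad first.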
Forming the right matches is a critical element of contextual advertising. So is speed. Humans can glance at a webpage and know almost immediately what it’s about. Not so for computers. But Yahoo! researchers are trying to change that too.
In their paper titled “Just-in-Time Contextual Advertising,” Aris Anagnostopoulos, Andrei Broder, Evgeniy Gabrilovich, Vanja Josifovski, and Lance Riedel outline summarization techniques that can extract short but informative excerpts—usually no more than 500 bytes of data—that are representative of the entire page content.
Why is this important? Because when a user views a page, the ad selection engine has only a couple hundred milliseconds to provide the ads. In most cases, that deadline is too tight to fetch and analyze the page online. Instead, pages are fetched and analyzed offline, and the results are used the next time ads are served for that page.
This approach works well for static content pages that are displayed repeatedly. However, a significant amount of the Web is not static. Some pages are dynamic by definition, constantly adding and changing content, such as news sites, forums, and blogs.
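One way to picture the offline/online split is a lookup that prefers a precomputed analysis and falls back to just-in-time analysis when none exists, or when the stored result is too old to trust for a fast-changing page. This is a sketch under stated assumptions: the `offline_results` store, the one-hour `MAX_AGE_SECONDS` limit, and the pluggable `analyze` function are all hypothetical.

```python
import time
from typing import Callable, Optional

MAX_AGE_SECONDS = 3600  # assumed freshness limit for an offline result

# Hypothetical offline store: URL -> (topic, Unix time when the page was analyzed).
offline_results: dict[str, tuple[str, float]] = {}


def page_topic(
    url: str,
    page_text: str,
    analyze: Callable[[str], str],
    now: Optional[float] = None,
) -> tuple[str, str]:
    """Return (topic, source): reuse a fresh offline result, else analyze just in time."""
    now = time.time() if now is None else now
    entry = offline_results.get(url)
    if entry is not None and now - entry[1] <= MAX_AGE_SECONDS:
        return entry[0], "offline"
    # Missing or stale: a dynamic page must be analyzed while it is being served.
    return analyze(page_text), "just-in-time"


def toy_classifier(text: str) -> str:
    """Stand-in for a real topic classifier."""
    return "Winter Sports" if "curling" in text.lower() else "Other"


offline_results["https://example.com/tennis-guide"] = ("Tennis", 1_000.0)
print(page_topic("https://example.com/tennis-guide", "", toy_classifier, now=2_000.0))
# ('Tennis', 'offline')
print(page_topic("https://example.com/live-blog", "Curling final, live", toy_classifier, now=2_000.0))
# ('Winter Sports', 'just-in-time')
```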
In all these cases, ads need to be matched to the page while it is being served to the end user, with extremely limited time allotted for content analysis. To speed up the process, Yahoo! researchers have developed an algorithm that carefully selects just 500 bytes of data, or fewer than 100 words, for analysis, rather than grinding through the whole page, which could easily be hundreds of kilobytes.
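A minimal sketch of the budgeting step follows. It is a simple stand-in, not the selection algorithm from the paper: it assumes the title and headings are the most informative fragments, takes them first, and then fills the rest of a 500-byte budget with body sentences in page order. The function name and the fragment ordering are assumptions.

```python
def summarize(title: str, headings: list[str], body: str, budget: int = 500) -> str:
    """Greedily collect page fragments, in an assumed order of importance,
    until the UTF-8 byte budget is used up."""
    # Naive sentence split on periods; the periods themselves are dropped.
    sentences = [s.strip() for s in body.split(".") if s.strip()]
    fragments = [title, *headings, *sentences]
    parts: list[str] = []
    used = 0
    for fragment in fragments:
        # Count the joining space for every fragment after the first.
        size = len(fragment.encode("utf-8")) + (1 if parts else 0)
        if used + size > budget:
            continue  # skip fragments that would overflow; a shorter one may still fit
        parts.append(fragment)
        used += size
    return " ".join(parts)


body = " ".join(f"Sentence number {i} about curling." for i in range(100))
summary = summarize("Curling at Lake Tahoe", ["Results", "Schedule"], body)
print(len(summary.encode("utf-8")))  # never more than 500
```

Measuring the budget in UTF-8 bytes rather than characters keeps the limit honest for pages with non-ASCII text.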
“We call this just-in-time because we don’t do any analysis ahead of time,” says Gabrilovich. “We only do it at the exact moment when we need to match ads. We quickly summarize the page, and then analyze that short summary, not the entire page.”
So far, the results have been encouraging. Experimental findings confirm that employing only a small portion of the page text can yield highly relevant ads, and the quality of summary-based ad matching is competitive with using the full page.