Wednesday, November 26, 2008

Genetic Algorithms in Online Advertising?

During my visit to Weebly, there were several things that impressed me about the Weebly company and operation.

One item was an application service for online advertising, SnapAd.com. Due to an NDA, I couldn’t talk about it at the time. But, the product is now launched, and it is very impressive both in terms of technology and potential for online advertising.

The SnapAds service starts with a base population of ad variations and then uses a genetic algorithm, driven by click-through data, to select the key characteristics of each ad and evolve toward the top-performing ad, which is composed of the best elements of each of the ad variations. Think of it as A/B testing on steroids.
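To make the idea concrete, here is a minimal sketch of a genetic algorithm over ad variations. It is not SnapAds' implementation, which I have not seen in code form. The ad elements, the APPEAL table that simulates user clicks, and all function names are assumptions made up for illustration.

```python
import random

# Hypothetical ad elements; none of these come from SnapAds itself.
HEADLINES = ["Free Trial", "Save 20%", "New Arrivals"]
IMAGES = ["photo", "logo", "cartoon"]
COLORS = ["blue", "green", "red"]
GENES = [HEADLINES, IMAGES, COLORS]

# Assumed click appeal of each element, used only to simulate users.
APPEAL = {"Free Trial": 0.03, "Save 20%": 0.05, "New Arrivals": 0.02,
          "photo": 0.04, "logo": 0.01, "cartoon": 0.02,
          "blue": 0.01, "green": 0.03, "red": 0.02}


def simulated_ctr(ad, rng, impressions=2000):
    """Show the ad to simulated users and return the observed click-through rate."""
    click_probability = sum(APPEAL[element] for element in ad)
    clicks = sum(rng.random() < click_probability for _ in range(impressions))
    return clicks / impressions


def crossover(parent_a, parent_b, rng):
    """Build a child ad by taking each element from one parent at random."""
    return tuple(rng.choice(pair) for pair in zip(parent_a, parent_b))


def mutate(ad, rng, rate=0.1):
    """Occasionally swap an element for a random alternative."""
    return tuple(rng.choice(options) if rng.random() < rate else element
                 for element, options in zip(ad, GENES))


def evolve(generations=15, population_size=12, seed=0):
    """Evolve a population of ads, using observed click-through as fitness."""
    rng = random.Random(seed)
    population = [tuple(rng.choice(options) for options in GENES)
                  for _ in range(population_size)]
    for _ in range(generations):
        # Rank the variations by their observed click-through rate.
        ranked = sorted(population, key=lambda ad: simulated_ctr(ad, rng),
                        reverse=True)
        survivors = ranked[:population_size // 2]
        children = []
        while len(survivors) + len(children) < population_size:
            parent_a, parent_b = rng.sample(survivors, 2)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = survivors + children
    return max(population, key=lambda ad: simulated_ctr(ad, rng))


if __name__ == "__main__":
    print(evolve())
```

In a fielded system the simulated clicks would be replaced by real click-through counts, which are noisy, so each generation needs enough impressions per variation before ranking.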

During my visit, I was super impressed with the demo, and SnapAds is one of the first fielded systems in the online marketing area that I have seen that makes use of a genetic algorithm.

I have always had an interest in these optimization algorithms for application in the Web search area, with some of my first work being on simulated annealing.

Congrats to David Rusenko, Dan Veltri, and Chris Fanini, the Weebly founders, and to Greg Dingle, the lead developer of SnapAds.

Daehee Park has an interesting blog posting about SnapAds.com.

In addition to maintaining an excellent blog about IT matters, Daehee was on one of the winning teams in the 2008 Google Online Marketing Challenge.

Tuesday, November 25, 2008

Yahoo! Grid-computing Software Transforming (or at least improving) the Way Data Is Analyzed

Yahoo! has an open source, grid-computing data-mining program (Hadoop) that they are using to improve the relevance of the ads that Yahoo! shows on the Internet by analyzing the company's massive flow of data (over 10 terabytes a day), all on the fly.

Developed primarily by Doug Cutting (from Apple, Excite, and Nutch), Hadoop is based on Google’s MapReduce, but it is an open source version. Universities and companies (including IBM, Google, Amazon, Facebook, and Intel) are using Hadoop. Yahoo! first used Hadoop to build their Web index.
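For readers new to the MapReduce model, here is a toy word count that walks through the map, shuffle, and reduce steps in a single Python process. It is only an illustration of the programming model. It does not use Hadoop's actual API (Hadoop itself is written in Java), and the function names are my own.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run the three steps over a list of documents and return word totals."""
    pairs = [pair for document in documents for pair in map_phase(document)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["ads search ads", "search engine"]))
```

The appeal of the model is that the map and reduce functions are independent per document and per key, so a framework can spread them across many machines.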

It will be really interesting to see if this technology is transformational in terms of Yahoo! business aspects. Regardless, methods and technology to analyze Web data are an ongoing issue. I’ve leveraged methods such as time series analysis and binary trees. All are computing and time intensive, so efforts like Hadoop could be a major step forward.

Read the full article on Yahoo! Hadoop

Monday, November 24, 2008

Search Engine Evaluation

There was an interesting discussion recently on the SIGIR Listserv. The discussion centered on evaluation and the various viewpoints were interesting and reflect the nuances of search engine evaluation. This discussion (and most other discussions on evaluation) comes down to how one views evaluation – from a system or people perspective.

The standard metrics, in academia, for evaluating information searching systems are precision and recall. Of the two, precision generally trumps recall in Web searching systems.
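As a quick refresher, precision is the fraction of retrieved results that are relevant, and recall is the fraction of relevant documents that were retrieved. A small sketch, with made-up document IDs and function names of my own choosing:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved results."""
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


def precision_at_k(ranked, relevant, k):
    """Precision over only the top k results, the part Web searchers tend to view."""
    top = ranked[:k]
    if not top:
        return 0.0
    relevant_set = set(relevant)
    return sum(1 for doc in top if doc in relevant_set) / len(top)


if __name__ == "__main__":
    ranked = ["d1", "d2", "d3", "d4"]   # hypothetical result list
    relevant = ["d1", "d3", "d7"]       # hypothetical relevance judgments
    print(precision_recall(ranked, relevant))
    print(precision_at_k(ranked, relevant, 2))
```

Precision at k is one common way to reflect the Web setting, where the full set of relevant documents is unknown and recall is hard to measure.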

In practice, precision is effective for algorithmically evaluating functioning systems. One can run A / B tests to evaluate one algorithmic approach relative to another using a test collection with relevance judgments or click throughs for a fielded system.
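For the fielded-system case, one simple way to compare click-through rates from an A/B test is a two-proportion z-test. This is a sketch under the usual large-sample assumptions, not a claim about how any particular search engine runs its tests, and the function name and counts are made up.

```python
import math


def two_proportion_z(clicks_a, views_a, clicks_b, views_b):
    """Return (z, two-sided p-value) for the difference in click-through rates."""
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    if standard_error == 0:
        # No clicks at all (or all clicks): the two arms cannot be distinguished.
        return 0.0, 1.0
    z = (rate_b - rate_a) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: algorithm A versus algorithm B.
    print(two_proportion_z(100, 1000, 150, 1000))
```

A small p-value suggests the difference in click-through is unlikely to be chance alone, though it says nothing about whether users were actually satisfied.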

Throw in some real users, though, and the situation gets messy quickly. For example, there are scenarios where the precision can be poor but the user satisfaction with system performance is good (e.g., situations where a system returns some really high quality results along with some trash, or fact-finding situations where the system returns the ‘right’ answer).

There is really no good solution to this dichotomy – instead, it’s usually best to determine the goals of the evaluation. If you are interested in algorithmic performance, then precision is for you. If you are interested in users’ view of performance, you might need precision but certainly other measures as well, such as sufficiency (i.e., did the system provide a sufficient amount of quality information), interface design, speed, and content collection.

If you are moving a system from the lab to production, then you certainly have to deal with system performance. A system that has high precision but takes too long (i.e., a few seconds) to return results is a poor system regardless of any precision score.

There are many angles to take on evaluation, including evaluating search engines for ecommerce searching, investigating the effects of logos, and contextual help features.

Friday, November 21, 2008

Privacy, Weblogs, and User Tracking

Lawmakers are investigating online data collection issues, again, with concerns on privacy. News flash – want something private, don’t use a computer and certainly not a network of any type. There is even some good for society that can come of online logging and tracking.

However, I am very supportive of companies being open about the data they collect and what they do with it. Some online advertising companies are using an online tactic working with Internet service providers (ISPs) to log Internet users' activities across multiple Websites.

Some privacy advocates and lawmakers are arguing that the practice should be regulated because it gives ad companies unprecedented access to Internet users' movements. Ad companies say they do not collect personally identifiable information.

The government’s position -- or at least Rep. Edward Markey's (D-Mass.), chairman of the Energy and Commerce Subcommittee on Telecommunications and the Internet – is that:

(a) Internet tracking across different Websites should take place only with customers' consent, and
(b) the Internet should be governed by the same privacy regulations as those that apply to telephone and cable services.

Totally agree with (a) – although it may be difficult for ISPs to stop the logging. Maybe they could just not store it.

However, couldn’t disagree more with (b). Each communication medium has its own expectation of privacy. Face-to-face communication is push and pull. Cable services are primarily push. Telephone systems are primarily pull. Web is push and pull. In other words, an individual has a variety and range of options for engagement and for distributing information. Therefore, the privacy issues vary across mediums.

For years, many Internet companies have been tracking users' searches and activities to link advertisements to a user’s interests. The practice of third-party companies collaborating with ISPs to track users' online activities across the Web is sort of new, although the capability has been there for a while.

I am not concerned with these companies tracking my movements on their own site. It is their stuff, as long as they are up front about it. Tracking across multiple Websites is a different matter though, and companies need to provide the option to opt out or use another service.

Thursday, November 20, 2008

Sponsored Search Theory Book

I have, finally, started writing an academic book on the theoretical underpinnings of sponsored search (a.k.a., keyword advertising, pay-per-click advertising).

Sponsored search refers to a range of techniques that link advertisements (i.e., sponsored results) with terms in queries or to Websites. The Google AdWords and Yahoo! Sponsored Search platforms implement the most common form of sponsored search. These platforms display sponsored results alongside organic listings on search engine results pages.

Sponsored search is one of the most revolutionary innovations to happen to search in more than a decade. Although there are a lot of ‘how to’ books, there are no books (that I know of) that deal with the theory, critical elements, and foundational aspects of sponsored search.

So, I am writing one that focuses on these aspects – the theory that makes sponsored search work. I am going to avoid as completely as possible any empirical findings or specific discussions of technology platforms. A book on sponsored search fundamentals is a project that I am really excited about and have been meaning to do for a long time.

I'm maintaining a special blog on the book at Understanding Sponsored Search. Always open to feedback and suggestions.

Wednesday, November 19, 2008

Web Analytics References

I am co-editor of a book, Handbook of Research on Weblog Analysis, that deals with methods to analyze data and content from search logs, transaction logs, blogs, Websites, and listservs.

Along with the 25 chapters in the book and a glossary of more than 250 key terms, there is an extensive reference section that contains most of the relevant literature dealing with Web search and transaction log analysis.

Such a reference list can be a big help to anyone doing method and research work in the transaction log analysis area.

Download the complete list of references for Weblog Analysis

If you want the whole book, your library/organization can order it and get the complete contents online for free. That makes it good for method courses or a group of folks that need access to the book. You can recommend this book to your librarian using this simple online form.

Tuesday, November 18, 2008

Home activity recognition (a.k.a., Google Watch)

Google researcher Bill Schilit, along with academics Jeonghwa Yang from the Georgia Institute of Technology and David McDonald from the University of Washington, published a paper in IEEE Computer on home activity recognition, which is a proposal for a system that would track people's activities at home via network interactions.

The idea is that via home monitoring, technology could become an active assistant, doing things like reminding people to perform forgotten tasks, helping them remember information, or encouraging them to act more safely or healthily.

Naturally, folks pointed out privacy questions, including data protection, data access, and legal aspects. The paper proposes types of data that could be collected from devices attached to a home network, similar to a Web search log that records the use of a search engine.

The paper is kind of interesting, but a little behind the times in terms of what is already out there. Microsoft has been working on smart home technologies for several years. Those RFID tags embedded in products? There is already a network there to do tracking of individuals as they cart products around. It is just a small step to link other technology to these RFID tags. My Pocket PC can control technology in the home (and has been able to do so for years).

So, I am failing to see what is new (on the research front). In terms of implementation, of course, there are hundreds of nifty things that still need to be done.

Here is the full article on home activity recognition

Monday, November 17, 2008

Weblog Analysis Terminology

I co-edited a book, Handbook of Research on Weblog Analysis, that deals with methods for analyzing content from search logs, transaction logs, blogs, Websites, and listservs.

Among the many good chapters in the book by an international range of authors, there is an extensive glossary of 250+ terms relating to Website log, search log, blog, listserv, discussion board, Web analytics, and transaction log analysis. The glossary can certainly be a help to anyone doing methodology and research work in this and related fields.

See the glossary of Weblog Analysis terms

If you want the whole book, your library can order it and get the complete contents online for free. That makes it good for method courses or a group of folks that need access to it. You can recommend this book to your librarian using this simple online form.

Sunday, November 16, 2008

Journalists Using Facebook for Scoops

As Facebook becomes more mainstream, naturally folks are looking for ways to leverage it for commercial benefit, along with it being a neat place to virtually hang out. One occupation that is using Facebook is journalism.

Why are journalists doing so? Probably for the same reason that human resource folks hang out on LinkedIn or news organizations all have Twitter accounts. It helps them do their job better. Here is an interesting story about a journalist who joined Facebook and got a story within 20 minutes of signing up.

The journalist noticed some interesting activity on the Facebook page of a bioethicist columnist, like a curious status line about having the worst month ever, a job that ended abruptly, and references to lawyers. This got the journalist curious. He started digging and scooped a story.

Have you noticed all the realtors who are on Facebook (and LinkedIn and Twitter) now? Similar deal – these social networking sites are good places to get the jump when folks start moving, companies start downsizing, or organizations start hiring.

These social networking sites are changing the way people find and share information. I am doing research on the effect of Twitter on brands, and I am finding that folks use Twitter to find and share information.

Read the complete Journalists Using Facebook for Scoops article.

Wednesday, November 12, 2008

Rimm-Kaufman, Search Engine Marketing Firm



Had a site visit to Rimm-Kaufman, a search engine marketing firm in Albemarle County / Charlottesville, Va.

Interesting that sometimes one doesn’t know the expertise in one's own backyard, as I was not familiar with Rimm-Kaufman until a story appeared in our local paper, the Daily Progress.

I have a serious interest in search engine marketing, especially in the keyword advertising area, so I really enjoy seeing what is going on at the front lines. I view this area not as advertising, but another form of information searching.

The company was founded by Alan Rimm-Kaufman and George Michie, and the company does a focused brand of search engine marketing. They seem to have a special expertise in cross channel marketing.

What I found most interesting at the firm was the focus on research. Alan has a PhD from MIT, is a regular speaker at industry events, is married to an academic, and is an affiliate professor at the University of Virginia (UVA) business colleges, McIntire School of Commerce and Darden Graduate School of Business Administration.

George has degrees in physics and math from Macalester College and a master’s degree in political science from UVA. George also lectures at the McIntire School of Commerce.

So, with this academic background, there is a high level of interest in research, testing, evaluation, and learning. Really innovative and a special outlook for a firm in this sector.

The company is in a good place geographically. With the educational facilities of UVA and James Madison nearby, there are three business schools within a stone's throw of the company. And, there is no other competition in the area, so employee turnover is minimal.

Really educational visit for me and much appreciated.

Plus, they had this nice welcome sign out for me! :-)