[2008 feb 25]
I spend way too much time reading stupid blogs. With a lot of them, I'd really like a summary feed. To take a particularly egregious example, look at Gizmodo, which gets 50 or 60 posts per day. Most of it isn't terribly interesting (to me) – Nokia introduces a rhinestone-encrusted version of one of their cell phones, or whatever. I'm guessing I'd be perfectly happy to see four or five of those posts a day (on average). On top of that, Engadget has a similar traffic level, and is mostly (but not entirely) the same content as Gizmodo, so what I really want is four or five posts from both of those.

On a somewhat less frivolous note, the same issues apply to RSS feeds from news sites. A lot of what gets reported on BBC's "News Front Page" feed isn't particularly interesting (to me) – I couldn't care less about cricket scores. (And if I see one more article about Britney Spears, Paris Hilton, or OJ Simpson, I think I'll cry.) And there's quite a bit of overlap between, say, the BBC world news and the NYT world news. (Though in this case, it might be interesting to see both the BBC and the NYT version of the article.)

So I want something that merges multiple RSS feeds together, producing one feed with possibly multiple links per article (one to the BBC report, one to the NYT); and then I want something that filters out the uninteresting bits. I suppose some sort of Bayesian filtering (à la spam filters) might work, at least for the second part. I think I've seen research papers go by describing methods for automatically generating summaries of articles – that might be something else to look into. I also wonder if one of those "offshore personal assistant" services could do this kind of thing. Clipping services have been around forever (though my only experience was at a previous employer who subscribed to a clipping service which apparently just scanned for a few keywords and then blindly forwarded everything that matched – great for some applications, but I want something more sophisticated here).
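
Here's a rough sketch of what the merge-then-filter idea might look like in Python. It assumes the feeds have already been fetched and parsed into (source, title, link) tuples. The names merge_items and InterestFilter, the word-overlap test for "same article", and the 0.5 threshold are all made up for illustration; a real version would need something smarter than comparing title words.

```python
import math
import re
from collections import Counter

def tokens(text):
    """Lowercase word set; a crude stand-in for real text normalization."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Overlap between two token sets: 0.0 (disjoint) to 1.0 (identical)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def merge_items(items, threshold=0.5):
    """Group (source, title, link) items whose titles look alike.

    Each group keeps the first title seen plus one link per source.
    The 0.5 threshold is a guess, not a tuned value.
    """
    groups = []
    for source, title, link in items:
        words = tokens(title)
        for group in groups:
            if jaccard(words, group["words"]) >= threshold:
                group["links"][source] = link
                break
        else:
            # No existing group matched, so start a new one.
            groups.append({"title": title, "words": words,
                           "links": {source: link}})
    return groups

class InterestFilter:
    """Tiny naive Bayes classifier over title words, spam-filter style."""

    def __init__(self):
        self.counts = {True: Counter(), False: Counter()}
        self.docs = {True: 0, False: 0}

    def train(self, title, interesting):
        """Record one title that was marked interesting (or not)."""
        self.counts[interesting].update(tokens(title))
        self.docs[interesting] += 1

    def is_interesting(self, title):
        """Compare log-probabilities of the two classes for this title."""
        vocab = len(set(self.counts[True]) | set(self.counts[False])) or 1
        total_docs = self.docs[True] + self.docs[False]
        scores = {}
        for label in (True, False):
            # Log prior and word likelihoods, both with add-one smoothing.
            score = math.log((self.docs[label] + 1) / (total_docs + 2))
            n = sum(self.counts[label].values())
            for word in tokens(title):
                score += math.log((self.counts[label][word] + 1) / (n + vocab))
            scores[label] = score
        return scores[True] > scores[False]
```

The idea would be to run merge_items over everything pulled from the feeds, then keep only the groups whose title passes is_interesting, training the filter on whatever gets clicked or skipped. Whether title words alone carry enough signal is an open question.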

There are some interesting issues here. I don't want to just scan for keywords, positive or negative. If some celebutard gets arrested for drunk driving yet again, I really don't need to see it. On the other hand, if s/he does something spectacularly interesting (hey, it's theoretically possible), I suppose I'd want to know (to keep up with the pop culture references, if nothing else). More generally, it would be nifty to have some way of estimating how "new and different" something is (maybe easier for gadgets than for news articles?).
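
One very crude way to put a number on "new and different" is to compare each incoming title against the titles already seen and report how far it is from its nearest match. The sketch below does that with plain word overlap; the novelty function and its scoring are assumptions for illustration only, and word overlap is exactly the sort of keyword-level signal that probably isn't good enough on its own.

```python
import re

def tokens(text):
    """Lowercase word set for a title."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def novelty(title, seen_titles):
    """Return 1.0 for a title unlike anything seen, 0.0 for an exact repeat.

    The score is 1 minus the best word-overlap (Jaccard) match
    against every previously seen title.
    """
    words = tokens(title)
    best = 0.0
    for old in seen_titles:
        old_words = tokens(old)
        union = words | old_words
        if union:
            best = max(best, len(words & old_words) / len(union))
    return 1.0 - best
```

A "yet another drunk driving arrest" headline would score low against last month's nearly identical headline, while a headline with mostly unfamiliar words would score high, which at least points in the right direction.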

Tags: internet