New, faster in-house content analysis tool
March 8th, 2006, 1:07
Shortly after we posted the "We've been Techcrunched" post, we fixed the slowness of our in-house content analysis tool. We didn't announce it right away because, well, yesterday was a really busy day on all counts.
In terms of speed, the improvements are amazing. We tested a 20-post feed (200 words per post on average) with our previous engine, and it took about 75 seconds to process. It also put quite a bit of stress on the server: nothing significant for a single feed, but it would be if we had to process 100 feeds simultaneously.
Then we tested the same feed with our new engine, and it took less than 9 seconds!! That includes the time to connect to the feed and fetch it, which on average takes between 1 and 4 seconds, depending on how fast the other server responds. OK, now that's a lot better. In fact, we believe that, time-wise, it beats Yahoo's API, although to be fair, Yahoo's API also does things that our content analysis tool doesn't.
We had to make one small sacrifice, though: as we analyze the content and select terms, we now ignore all words of three characters or fewer unless they contain at least one capital letter. In other words, if ZoomClouds updates your blog/feed and finds the word "drm" 25 times, it will still ignore it (unless, of course, you've added that word to your list of "wanted terms"). However, if it finds DRM, Drm, dRM, etc. (you get the picture), it will consider it. One- and two-letter words were already ignored before, and they still are.
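For the curious, here's a rough sketch of how a short-word rule like this could be expressed. It's not our actual engine code, just an illustration under a few assumptions: the tokenizer (runs of ASCII letters and digits), the helper names keep_word and candidate_terms, case-insensitive matching of wanted terms, and the idea that the wanted-terms exception only rescues three-letter words (the case described above) are all hypothetical.

```python
import re

# Hypothetical tokenizer: treat runs of ASCII letters/digits as words.
# The post does not say how ZoomClouds actually splits text into words.
WORD_RE = re.compile(r"[A-Za-z0-9]+")


def keep_word(word: str, wanted: set[str]) -> bool:
    """Return True if a word survives the short-word filter.

    `wanted` is assumed to already be lowercased.
    """
    if len(word) <= 2:
        # One- and two-letter words are always ignored.
        return False
    if len(word) == 3:
        # Assumption: a "wanted term" rescues a three-letter word regardless of case.
        if word.lower() in wanted:
            return True
        # Otherwise a three-letter word survives only if it has a capital letter.
        return any(c.isupper() for c in word)
    # Longer words pass this filter (other analysis steps are not modeled here).
    return True


def candidate_terms(text: str, wanted_terms: set[str]) -> list[str]:
    """Tokenize text and keep only words that pass the short-word filter."""
    wanted = {t.lower() for t in wanted_terms}
    return [w for w in WORD_RE.findall(text) if keep_word(w, wanted)]


print(candidate_terms("drm DRM Drm dRM the Google is a podcast", set()))
```

With no wanted terms, this prints ['DRM', 'Drm', 'dRM', 'Google', 'podcast']: "drm", "the", "is" and "a" are dropped. Adding "drm" to the wanted terms would keep the lowercase "drm" as well.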
Other than that, the quality of our content analysis is exactly the same as before, but now it's over 8 times faster. Why didn't we do this before launch??? Ah...
By the way, would you be interested in knowing the three most popular terms in ZoomClouds clouds so far? The winner is blog, followed by Google and then podcast. That's interesting :-)