Birth of an Index
Popdex came about because an inflated technological ego thought that a current-events news and link spider could be built in a few days. A month later, after many Mountain Dew-filled nights and weekends, an index was born.
That first version didn't add anything “new” to what was already out there, so the next step was incorporating the popularity of linking sites into the rankings. If websites A and B are extremely popular and both link to site C, then site C is given more weight in the rankings than a site whose links come from sites with fewer inbound links.
Scores are recomputed hourly and the rankings are updated to match. The highest possible score is 100 (like a percentage). I call this technology PopScore.
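To make the idea concrete, here is a rough sketch in Python of one way a score like this could be computed. It is not the actual PopScore formula, which isn't spelled out here. The function name pop_scores, the "1 + inbound links of the linking site" weighting, and the scaling against the top-scoring site are all assumptions made for illustration.

```python
from collections import defaultdict


def pop_scores(links):
    """Score sites from (source, target) link pairs on a 0-100 scale.

    Illustrative assumption: each inbound link is worth 1 plus the number
    of distinct sites linking to the source, so links from popular sites
    count for more. This is a guess at the idea, not the real formula.
    """
    inbound = defaultdict(set)
    for source, target in links:
        if source != target:  # ignore self-links
            inbound[target].add(source)

    # Popularity of a site = number of distinct sites linking to it.
    popularity = {site: len(sources) for site, sources in inbound.items()}

    # Raw score = sum of the weights of all linking sites.
    raw = {
        target: sum(1 + popularity.get(source, 0) for source in sources)
        for target, sources in inbound.items()
    }

    top = max(raw.values(), default=0)
    if top == 0:
        return {}
    # Scale so the best-ranked site gets exactly 100.
    return {site: round(100 * value / top, 1) for site, value in raw.items()}


if __name__ == "__main__":
    example_links = [
        ("x1", "A"), ("x2", "A"), ("x3", "A"),  # A is popular
        ("y1", "B"), ("y2", "B"),               # B is fairly popular
        ("A", "C"), ("B", "C"),                 # C is linked by A and B
        ("z", "D"),                             # D is linked by an unknown site
    ]
    for site, score in sorted(pop_scores(example_links).items()):
        print(site, score)
```

In this toy data set, C ends up on top because its two links come from well-linked sites, while D, with a single link from an unlinked site, lands at the bottom.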
The technology
The site runs on the LAMP stack (Linux, Apache, MySQL, Perl and PHP). 100% pure Open Source, baby! The crawler is written in Perl, and some of the interfaces to MySQL are done in PHP, with XML as the message format.
The architecture
I wanted a distributed, scalable architecture. The primary client crawler, which grabs URLs and checks them for updates, is written in Perl. All clients reach the database through a single centralized interface, a PHP web endpoint. That script creates sessions and hands out URLs so the clients crawl them in an orderly fashion.
The client crawler has been tested with Perl on Linux and Windows, so I can spread the load across any number of machines, as long as each one has a Perl interpreter installed. It should also save on bandwidth costs once I get a real web host!
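For the curious, the session-and-dispatch idea can be sketched roughly like this. The real endpoint is a PHP script backed by MySQL. The Python version below keeps everything in memory, and the class name Dispatcher, the batch size, and the XML element names (batch, url) are made up for illustration.

```python
import uuid
import xml.etree.ElementTree as ET
from collections import deque


class Dispatcher:
    """Hands each URL to exactly one crawler session (in-memory sketch)."""

    def __init__(self, urls):
        self.pending = deque(urls)  # URLs nobody has been given yet
        self.assigned = {}          # session id -> URLs handed to that session

    def open_session(self):
        """Register a new crawler client and return its session id."""
        session_id = uuid.uuid4().hex
        self.assigned[session_id] = []
        return session_id

    def next_batch(self, session_id, size=2):
        """Return up to `size` unassigned URLs as an XML message."""
        if session_id not in self.assigned:
            raise KeyError("unknown session: " + session_id)
        batch = ET.Element("batch", session=session_id)
        for _ in range(min(size, len(self.pending))):
            url = self.pending.popleft()
            self.assigned[session_id].append(url)
            ET.SubElement(batch, "url").text = url
        return ET.tostring(batch, encoding="unicode")


if __name__ == "__main__":
    dispatcher = Dispatcher([
        "http://a.example/",
        "http://b.example/",
        "http://c.example/",
    ])
    first = dispatcher.open_session()
    second = dispatcher.open_session()
    print(dispatcher.next_batch(first))
    print(dispatcher.next_batch(second))
```

Because every URL leaves the pending queue as soon as it is handed out, two clients never get the same URL, which is one simple reading of "orderly". A real version would also need to handle clients that disappear before finishing their batch.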
I feel the need, the need for speed
The PHP pages that serve the main content are compiled into static HTML every hour (for now), so the site should be wicked fast. Popular searches are cached too: their results are served as static HTML and refreshed often. That’s about it!
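The hourly compile-to-static-HTML trick boils down to something like the sketch below. The site itself does this with PHP. This Python version, including the function name cached_page, the render callback, and the one-hour max_age default, is only an illustration of the idea.

```python
import os
import time
from pathlib import Path

MAX_AGE_SECONDS = 3600  # "every hour (for now)"


def cached_page(path, render, max_age=MAX_AGE_SECONDS, now=None):
    """Return the static HTML at `path`, rebuilding it when it is stale.

    `render` is a hypothetical callback that builds the page the slow way
    (database queries and so on) and returns the HTML as a string.
    """
    path = Path(path)
    now = time.time() if now is None else now

    # Fresh enough: serve the static file without touching the database.
    if path.exists() and now - path.stat().st_mtime < max_age:
        return path.read_text(encoding="utf-8")

    # Stale or missing: rebuild, then swap the new file into place so
    # readers never see a half-written page.
    html = render()
    tmp = Path(str(path) + ".tmp")
    tmp.write_text(html, encoding="utf-8")
    os.replace(tmp, path)
    return html
```

A cron job or the first visitor after the hour is up would trigger the rebuild, and everyone else just gets the file from disk.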
I must make a disclaimer that this site is just a side hobby. I am happily employed and cannot devote more attention to this site than to my employer (gotta pay the bills, right?). I graduated with a BS in Computer Science from Washington University in St. Louis.
Contact info
Feedback? Flames? Suggestions? Send them to me at:
popdex (at) gmail.com