In a nutshell, URL Catcher is an irssi plugin that watches channels for URLs and then stores them in a database.
The irssi script is written in Perl and uses DBI to connect to a database of your choice (I'm using MySQL).
Settings are stored in an XML file and read in by XML::Simple.
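To make the watch-and-store flow concrete, here is a minimal sketch of the idea. It is not the plugin's code: the real script is Perl using DBI, while this uses Python with an in-memory SQLite database so it can be run on its own. The `urls` table, its columns, the regular expression and the function names are all assumptions made for illustration.

```python
import re
import sqlite3

# Assumption: a deliberately simple pattern; the real script may match URLs differently.
URL_PATTERN = re.compile(r"https?://[^\s<>\"]+")


def extract_urls(message):
    """Return every URL found in one channel message, in order of appearance."""
    # Strip punctuation that commonly trails a URL at the end of a sentence.
    return [u.rstrip(".,;:!?)") for u in URL_PATTERN.findall(message)]


def open_database(path=":memory:"):
    """Open the database and create the (hypothetical) urls table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS urls ("
        "id INTEGER PRIMARY KEY, channel TEXT, nick TEXT, url TEXT, "
        "seen_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def store_urls(conn, channel, nick, message):
    """Insert one row per URL found in the message; return how many were stored."""
    urls = extract_urls(message)
    conn.executemany(
        "INSERT INTO urls (channel, nick, url) VALUES (?, ?, ?)",
        [(channel, nick, url) for url in urls],
    )
    conn.commit()
    return len(urls)
```

In the plugin, something like `store_urls` would be called from the handler irssi runs for each public message. Storing one row per URL keeps a line with several links simple to query later.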
The web UI is written in PHP and is VERY much a work in progress.
Come look at the source and ticket tracker at http://dev.urlcatcher.org/
The TODO list (this might be a little long, as the ideas keep coming in, mostly from no_pants):
- Pagination for Raw Data page. Sort of done - raw data is now displayed 1 day at a time.
- Proper modularisation
- User login
- Tags - added by logged in users
- When recording a URL, fetch the target's title (if there is one) - how should this work for multiple URLs on one line?
- Fetch a copy of images when URL has the extension jpg/jpeg/png/gif/bmp/svg
- With YouTube URLs, fetch a copy of the image displayed before the video is played
- Fetch a copy of PDF and Flash files?
- Workflow for copyrighted material to be removed upon request
- Correct the order nicks (and other fields) are listed in the raw filter
- Set up SVN repo.
- Get other people interested and involved :)
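The fetch-related items in the list above (titles, images, PDF and Flash files) all start with deciding what kind of thing a URL points at. Here is a rough sketch of one way to do that, again in Python for illustration rather than the plugin's Perl. The function names, the category labels, and the reading of "flash files" as `.swf` are assumptions. It handles several URLs on one line by treating each URL independently, which is only one possible answer to that open question.

```python
import posixpath
from urllib.parse import urlparse

# Extensions taken from the TODO item on images.
IMAGE_EXTENSIONS = {"jpg", "jpeg", "png", "gif", "bmp", "svg"}
# Assumption: "flash files" means .swf.
DOCUMENT_EXTENSIONS = {"pdf", "swf"}


def classify_url(url):
    """Return 'image', 'document', or 'page' based on the URL path's extension."""
    # Use only the path so query strings such as ?size=large are ignored.
    path = urlparse(url).path
    ext = posixpath.splitext(path)[1].lstrip(".").lower()
    if ext in IMAGE_EXTENSIONS:
        return "image"
    if ext in DOCUMENT_EXTENSIONS:
        return "document"
    # Anything else is treated as a page whose title could be fetched.
    return "page"


def plan_fetches(urls):
    """Pair each URL with what would be fetched for it, one entry per URL."""
    return [(url, classify_url(url)) for url in urls]
```

An extension check is cheap but only a guess. Checking the Content-Type header of the response would be more reliable, at the cost of a request per URL.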
If you are interested in running the code yourself or would like to help code on this project, please contact me.