Sunday, October 24, 2010

Proxies proxies everywhere and all for free

SEO Site Tools has gone into an intensive development kick, working to get everything ready for the v3.0 framework upgrade and more features like export and reporting, as well as more metrics and integration with some other services (Google Analytics, Webmaster Tools). But some of this data can't be gathered by the extensions, and IP-based API rate limits restrict how much data you can pull. That's where proxies come in: they let you masquerade as other IPs from other places... I need a big list of proxies to do the dirty work, and by modifying a few open source projects I've built my list to 1505 open proxies in just a couple of days. Here's how my dealie works...

First I took an open source uptime monitor and created my own plugin to test HTTP proxies using cURL and PHP.

Here's my proxy checker plugin code...

I found it could only connect out on port 80 (a firewall, I guess), so I built my script to check only the proxies it finds on valid ports.
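For anyone who wants to try the same idea without the uptime monitor, here is a minimal Python sketch of a port-filtered proxy check. It is not the PHP/cURL plugin described above: the function names, the `ALLOWED_PORTS` set, and the `http://example.com/` test URL are all assumptions made for illustration.

```python
import http.client
import urllib.request
from typing import Iterable, List, Tuple

Proxy = Tuple[str, int]

# Assumption: outbound connections only work on port 80, as described above.
ALLOWED_PORTS = {80}


def reachable(proxies: Iterable[Proxy], allowed_ports=ALLOWED_PORTS) -> List[Proxy]:
    """Keep only the proxies that listen on a port we can connect out to."""
    return [(host, port) for host, port in proxies if port in allowed_ports]


def check_proxy(host: str, port: int,
                test_url: str = "http://example.com/",  # hypothetical test target
                timeout: float = 10.0) -> bool:
    """Return True if a plain HTTP GET routed through the proxy answers 200."""
    handler = urllib.request.ProxyHandler({"http": f"http://{host}:{port}"})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(test_url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Refused connections, timeouts, HTTP errors and garbage replies
        # all count as "proxy is down".
        return False
```

A typical run would call `reachable()` on the scraped candidates first and only then spend a timeout on `check_proxy()` for each survivor.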

Here's a shot of my dashboard:
1505 proxies up what what
I changed the code that runs the template so I can pass an API parameter to pull an XML or JSON proxy list and status. I also hacked in some code so my proxy scraper can add items (plus password protection)... Oh, here's a sample return from my proxy tester API (it returns XML for simple parsing).
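As a rough illustration of consuming an XML proxy list like that, here is a small Python sketch. The XML layout is invented for the example: the `proxies`, `proxy`, `host`, `port` and `status` element names are assumptions, not the real schema of the API, and the addresses are documentation-only IPs.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload: made up for this sketch, NOT actual API output.
SAMPLE = """<proxies>
  <proxy><host>192.0.2.10</host><port>80</port><status>up</status></proxy>
  <proxy><host>192.0.2.11</host><port>80</port><status>down</status></proxy>
</proxies>"""


def live_proxies(xml_text):
    """Return (host, port) pairs for every entry whose status is 'up'."""
    root = ET.fromstring(xml_text)
    result = []
    for node in root.findall("proxy"):
        if node.findtext("status") == "up":
            result.append((node.findtext("host"), int(node.findtext("port"))))
    return result
```

Calling `live_proxies(SAMPLE)` keeps only the first entry, since the second one is marked down.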
That lets me use simplexml_load_string to test and see what level each proxy is. They go something like:

  1. Elite Proxy, connection looks like a regular client
  2. Anonymous Proxy, no IP is forwarded but the target site could still tell it's a proxy
  3. Transparent Proxy, IP is forwarded and the target site would be able to tell it's a proxy

They are graded by how and what headers are returned by the proxy (I'll upload my proxy judge code later if anyone wants it). To make the process even easier I decided to use some crowdsourcing techniques: I made a script to scrape proxies from text using regex, and then made a page to test proxies. I test the proxies and add the good ones to my database :) I also found a few proxy lists on Google and set up a cron job to scrape their proxies every few hours...
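The scrape-then-grade steps above can be sketched roughly as follows, in Python rather than the author's PHP. This is not the proxy judge code mentioned above. It assumes a judge page that echoes back the request headers it received through the proxy, and the `PROXY_HEADERS` set is my guess at the usual giveaway headers, not the author's actual rules.

```python
import re
from typing import Dict, List, Tuple

# Matches things that look like "ip:port" anywhere in free text.
PROXY_RE = re.compile(r"\b((?:\d{1,3}\.){3}\d{1,3}):(\d{1,5})\b")

# Assumption: headers that commonly reveal a proxy is in the path.
PROXY_HEADERS = {"via", "x-forwarded-for", "forwarded", "proxy-connection"}


def extract_proxies(text: str) -> List[Tuple[str, int]]:
    """Pull unique, plausible (host, port) pairs out of arbitrary text."""
    found: List[Tuple[str, int]] = []
    for host, port in PROXY_RE.findall(text):
        octets_ok = all(int(part) <= 255 for part in host.split("."))
        candidate = (host, int(port))
        if octets_ok and 0 < candidate[1] <= 65535 and candidate not in found:
            found.append(candidate)
    return found


def grade_proxy(seen_headers: Dict[str, str], real_ip: str) -> str:
    """Grade a proxy from the headers the judge page saw.

    seen_headers: request headers as received by the (hypothetical) judge.
    real_ip: our own public IP, to spot it leaking through.
    """
    lowered = {name.lower(): value for name, value in seen_headers.items()}
    # Simple substring match: our IP showing up in any header is a leak.
    if any(real_ip in value for value in lowered.values()):
        return "transparent"
    if PROXY_HEADERS.intersection(lowered):
        return "anonymous"
    return "elite"
```

The cron job would then feed each scraped page through `extract_proxies()` and only store the entries that pass a live check and get a grade.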

If you have any questions, feel free to bother me... that's what I'm here for.
