Re: Canadian Postal Code list

Posted by Daniel Haran on Sep 23, 2008; 10:35pm
URL: http://civicaccess.416.s1.nabble.com/Canadian-Postal-Code-list-tp1258p1263.html

On Tue, Sep 23, 2008 at 5:46 PM, Tracey P. Lauriault <[hidden email]> wrote:
> Someone was asking me what page scraping means.  Could you explain - in sorta
> lay person terms?

Scraping is a way to extract structured information from websites.
Let's use my next project as an example.

The list of 813,358 postal codes is now public. I am writing software
that will go to a political party's website, submit the form to 'find
your candidate' and save the resulting page. Then I'll write another
small bit of software that reads each page, finds the electoral
district id, and outputs a single line:
<postal_code>,<district_id>

813,358 pages, one resulting file with as many lines.
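
A minimal sketch of that second step, in Python, might look like the following. It assumes the saved result page holds the district id in markup like <span id="district">...</span>; that pattern, the function names, and the sample postal code and id are all made up for illustration, and a real party website will need its own pattern. Fetching the pages is left out.

```python
import csv
import io
import re

# Hypothetical markup: assume the result page contains something like
# <span id="district">24039</span>. A real site will differ.
DISTRICT_RE = re.compile(r'id="district"[^>]*>\s*(\d+)\s*<')


def extract_district_id(html):
    """Return the district id found in a saved result page, or None."""
    match = DISTRICT_RE.search(html)
    return match.group(1) if match else None


def write_rows(pages, out):
    """Write one <postal_code>,<district_id> line per page.

    pages: iterable of (postal_code, html) pairs.
    out:   a writable text file object.
    """
    writer = csv.writer(out, lineterminator="\n")
    for postal_code, html in pages:
        district_id = extract_district_id(html)
        if district_id is not None:  # skip pages where nothing was found
            writer.writerow([postal_code, district_id])


# Made-up sample page standing in for one saved result.
sample = [("H2X1Y4", '<p>Your riding: <span id="district">24039</span></p>')]
buf = io.StringIO()
write_rows(sample, buf)
print(buf.getvalue(), end="")
```

Pages where the pattern does not match are skipped rather than written with an empty id, so they can be re-fetched or inspected by hand later.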

Because of the large number of requests, compiling the data can take a
very long time. Even at one page per second, it would take about 9.4
days to get this data file.
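
The 9.4 days figure is just the page count divided by the number of seconds in a day. As a quick check in Python (the function name and the rate parameter are mine, for illustration only):

```python
PAGES = 813_358                  # postal codes, one request each
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400


def days_needed(pages, seconds_per_page=1.0):
    """Total time in days when requests are made one after another."""
    return pages * seconds_per_page / SECONDS_PER_DAY


print(round(days_needed(PAGES), 1))  # 9.4
```

Changing seconds_per_page shows how the total scales: at two seconds per page the same job takes twice as long.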

I hope that helps... I may be in too deep to offer a good lay person's
explanation :)

d.