Canadian Postal Code list


Canadian Postal Code list

Daniel Haran
Hi folks,

It took about 2 hours of work, but I have a list of 813358 postal
codes. That must include some old ones.

If anyone wants a copy, it's available with either:
  http://s3.amazonaws.com/danielharan/postal_codes.txt.gz
  http://s3.amazonaws.com/danielharan/postal_codes.txt.gz?torrent

Now the work on scraping the corresponding EDIDs can begin.

Cheers,

d.


Re: Canadian Postal Code list

Tracey P. Lauriault
Someone was asking me what page scraping means.  Could you explain - in sorta lay person terms?

cheers
t

On Tue, Sep 23, 2008 at 4:33 PM, Daniel Haran <[hidden email]> wrote:
Hi folks,

It took about 2 hours of work, but I have a list of 813358 postal
codes. That must include some old ones.

If anyone wants a copy, it's available with either:
 http://s3.amazonaws.com/danielharan/postal_codes.txt.gz
 http://s3.amazonaws.com/danielharan/postal_codes.txt.gz?torrent

Now the work on scraping the corresponding EDIDs can begin.

Cheers,

d.
_______________________________________________
CivicAccess-discuss mailing list
[hidden email]
http://lists.pwd.ca/mailman/listinfo/civicaccess-discuss



--
Tracey P. Lauriault
613-234-2805
https://gcrc.carleton.ca/confluence/display/GCRCWEB/Lauriault

Re: Canadian Postal Code list

Daniel Haran
On Tue, Sep 23, 2008 at 5:46 PM, Tracey P. Lauriault <[hidden email]> wrote:
> Someone was asking me what page scraping means.  Could you explain - in sorta
> lay person terms?

Scraping is a way to extract structured information from websites.
Let's use my next project as an example.

The list of 813,358 postal codes is now public. I am writing software
that will go to a political party's website, submit the form to 'find
your candidate' and save the resulting page. Then I'll write another
small bit of software that reads each page, finds the electoral
district id, and outputs a single line:
<postal_code>,<district_id>

813,358 pages, one resulting file with as many lines.
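
As a rough illustration of the two steps described above (fetch a page per postal code, then pull the district id out of it), here is a minimal Python sketch. The lookup URL, the form field name, and the HTML pattern are all assumptions made up for the example; the thread does not name the real site or its markup.

```python
import csv
import re
import urllib.parse
import urllib.request
from typing import Iterable, Optional, TextIO, Tuple

# Hypothetical form endpoint and field name (not taken from the thread).
LOOKUP_URL = "https://example.org/find-your-candidate"
FORM_FIELD = "postal_code"

# Hypothetical markup: assumes the result page contains something like
# <span class="district-id">12345</span>.
DISTRICT_RE = re.compile(r'class="district-id"[^>]*>\s*(\d+)\s*<')


def fetch_page(postal_code: str) -> str:
    """Step 1: submit the form for one postal code and return the HTML."""
    data = urllib.parse.urlencode({FORM_FIELD: postal_code}).encode("ascii")
    with urllib.request.urlopen(LOOKUP_URL, data=data, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract_district_id(html: str) -> Optional[str]:
    """Step 2: find the electoral district id in a saved page, if present."""
    match = DISTRICT_RE.search(html)
    return match.group(1) if match else None


def write_rows(pages: Iterable[Tuple[str, str]], out: TextIO) -> int:
    """Write one '<postal_code>,<district_id>' line per page that parsed."""
    writer = csv.writer(out, lineterminator="\n")
    written = 0
    for postal_code, html in pages:
        district_id = extract_district_id(html)
        if district_id is not None:
            writer.writerow([postal_code, district_id])
            written += 1
    return written
```

Keeping the fetch and the parse as separate functions mirrors the "save the page, then read it" split: if the pattern turns out to be wrong, the saved pages can be re-parsed without requesting them again.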

Because of the large number of requests, compiling the data can take a
very long time. Even at one page per second, it would still take about
9.4 days to get this data file.
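
The 9.4-day figure is just the page count divided by the number of seconds in a day. A small Python sketch of that arithmetic, where the function name is only illustrative:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def crawl_days(pages: int, pages_per_second: float) -> float:
    """Days needed to fetch `pages` pages at a steady request rate."""
    return pages / pages_per_second / SECONDS_PER_DAY


# 813,358 pages at one page per second
print(round(crawl_days(813_358, 1.0), 1))  # 9.4
```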

I hope that helps... I may be in too deep to offer a good lay person's
explanation :)

d.


Re: Canadian Postal Code list

Russell McOrmond-2
In reply to this post by Daniel Haran

Daniel Haran wrote:
> Hi folks,
>
> It took about 2 hours of work, but I have a list of 813358 postal
> codes. That must include some old ones.

   The number of codes in the August 2007 PCFRF database is 813,666, and
the February 2005 version had 801,340.

   That suggests to me that what you have is pretty darn complete!

--
  Russell McOrmond, Internet Consultant: <http://www.flora.ca/>
  Please help us tell the Canadian Parliament to protect our property
  rights as owners of Information Technology. Sign the petition!
  http://www.digital-copyright.ca/petition/ict/

  "The government, lobbied by legacy copyright holders and hardware
   manufacturers, can pry my camcorder, computer, home theatre, or
   portable media player from my cold dead hands!"


Re: Canadian Postal Code list

Russell McOrmond-2
In reply to this post by Tracey P. Lauriault
Tracey P. Lauriault wrote:
> Someone was asking me what page scraping means.  Could you explain - in
> sorta lay person terms?

   A computer automated cut-and-paste where what page you go to is
automated, and what piece of information you try to learn from the
resulting page is automated.



What page scraping means

Robin Millette
On Tue, 23 Sep 2008 19:56:55 -0400,
Russell McOrmond <[hidden email]> wrote:

> Tracey P. Lauriault wrote:
> > Someone was asking me what page scraping means.  Could you explain - in
> > sorta lay person terms?
>
>    A computer automated cut-and-paste where what page you go to is
> automated, and what piece of information you try to learn from the
> resulting page is automated.

When it's _really_ automated, it's called a feed or a microformat. It's called scraping because it usually also involves manual labor to get the job done right, as HTML pages are often modified with no regard for their semantic value.

--
Robin


Re: What page scraping means

Russell McOrmond-2
Robin Millette wrote:

> On Tue, 23 Sep 2008 19:56:55 -0400, Russell McOrmond
> <[hidden email]> wrote:
>
>> Tracey P. Lauriault wrote:
>>> Someone was asking me what page scraping means.  Could you explain -
>>> in sorta lay person terms?
>> A computer automated cut-and-paste where what page you go to is
>> automated, and what piece of information you try to learn from the
>> resulting page is automated.
>
> When it's _really_ automated, it's called a feed or a microformat.
> It's called scraping because it usually also involves manual labor to
> get the job done right, as HTML pages are often modified with no
> regard for their semantic value.


   Aren't definitions fun?  The difference in my mind between a
feed/microformat and 'scraping' is whether the relevant output format
was designed to be human readable (i.e. HTML) or machine readable (XML,
CSV, etc).  Whether there is manual labour is unrelated in my mind.

   It is often called "screen scraping" from back in the days when a
screen of information was drawn, and then we tried to pull information
from that screen based on the location of information.   Re-interpreting
HTML like we are doing here is a bit different, but we are still talking
about taking a page intended to be read by a human (rendered by a
browser) and instead interpreting it as data to feed into a
program/database/etc.
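
To make the distinction concrete, here is a small Python sketch that reads the same record from a machine-readable feed and from a human-readable page. Both inputs, the sample values, and the `id="district"` attribute are invented for illustration; no real site's markup is implied.

```python
import csv
import io
from html.parser import HTMLParser

# The same hypothetical record, published two ways.
CSV_FEED = "postal_code,district_id\nA1A1A1,12345\n"
HTML_PAGE = """
<html><body>
  <h1>Find your candidate</h1>
  <p>Your riding number is <b id="district">12345</b>.</p>
</body></html>
"""


def from_feed(text):
    """Machine-readable input: the format itself names each field."""
    row = next(csv.DictReader(io.StringIO(text)))
    return row["district_id"]


class DistrictScraper(HTMLParser):
    """Human-readable input: we must know where on the page the value sits."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.district_id = None

    def handle_starttag(self, tag, attrs):
        # Assumed location of the value: the element with id="district".
        if dict(attrs).get("id") == "district":
            self._inside = True

    def handle_data(self, data):
        if self._inside and self.district_id is None:
            self.district_id = data.strip()

    def handle_endtag(self, tag):
        self._inside = False


def from_page(html):
    parser = DistrictScraper()
    parser.feed(html)
    return parser.district_id
```

The feed reader keeps working if the publisher restyles the site, while the scraper breaks as soon as the element it looks for is moved or renamed, much as location-based screen scraping broke when a screen layout changed.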



Re: Canadian Postal Code list

Tracey P. Lauriault
In reply to this post by Russell McOrmond-2
Thanks gang!
