Hi folks,
It took about 2 hours of work, but I have a list of 813,358 postal codes. That must include some old ones. If anyone wants a copy, it's available from either:

http://s3.amazonaws.com/danielharan/postal_codes.txt.gz
http://s3.amazonaws.com/danielharan/postal_codes.txt.gz?torrent

Now the work on scraping the corresponding EDIDs can begin.

Cheers,
d.
Someone was asking me what page scraping means. Could you explain - in sorta lay person terms?
cheers
t

On Tue, Sep 23, 2008 at 4:33 PM, Daniel Haran <[hidden email]> wrote:
> Hi folks,

--
Tracey P. Lauriault
613-234-2805
https://gcrc.carleton.ca/confluence/display/GCRCWEB/Lauriault
On Tue, Sep 23, 2008 at 5:46 PM, Tracey P. Lauriault <[hidden email]> wrote:
> Someone was asking me what page scraping means. Could you explain - in
> sorta lay person terms?

Scraping is a way to extract structured information from websites. Let's use my next project as an example.

The list of 813,358 postal codes is now public. I am writing software that will go to a political party's website, submit the form to 'find your candidate' and save the resulting page. Then I'll write another small bit of software that reads each page, finds the electoral district id, and outputs a single line:

<postal_code>,<district_id>

813,358 pages, one resulting file with as many lines.

Because of the large number of requests, compiling the data can take a very long time. Even fetching one page per second, it would still take 9.4 days to get this data file.

I hope that helps... I may be in too deep to offer a good lay person's explanation :)

d.
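To make the two steps above concrete, here is a rough Python sketch. The endpoint URL, the form field name, and the pattern used to find the district id are all invented placeholders; a real party site would have its own URL, field names, and page layout.

    import csv
    import re
    import time
    import urllib.parse
    import urllib.request

    # Hypothetical search endpoint and page pattern; a real party site
    # has its own URL, form fields, and markup.
    SEARCH_URL = "http://example-party.ca/find-your-candidate"
    DISTRICT_RE = re.compile(r"district_id=(\d+)")

    def fetch_page(postal_code):
        """Step 1: submit the 'find your candidate' form, save the page."""
        data = urllib.parse.urlencode({"postal_code": postal_code}).encode()
        with urllib.request.urlopen(SEARCH_URL, data) as resp:
            return resp.read().decode("utf-8", errors="replace")

    def extract_district_id(html):
        """Step 2: find the electoral district id in a saved page."""
        match = DISTRICT_RE.search(html)
        return match.group(1) if match else ""

    with open("postal_codes.txt") as codes, \
         open("postal_code_districts.csv", "w", newline="") as out:
        writer = csv.writer(out)
        for line in codes:
            code = line.strip()
            writer.writerow([code, extract_district_id(fetch_page(code))])
            time.sleep(1)  # 1 request/s: 813,358 s / 86,400 s per day = 9.4 days

The one-second sleep is what makes the scraper polite and what makes it slow: running several scrapers in parallel shortens the wait but is harder on the party's server.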
Daniel Haran wrote:
> Hi folks,
>
> It took about 2 hours of work, but I have a list of 813,358 postal
> codes. That must include some old ones.

The number of codes in the August 2007 PCFRF database is 813,666, and in the February 2005 version it is 801,340. That suggests to me that what you have is pretty darn complete!

--
Russell McOrmond, Internet Consultant: <http://www.flora.ca/>
Please help us tell the Canadian Parliament to protect our property rights as owners of Information Technology. Sign the petition!
http://www.digital-copyright.ca/petition/ict/

"The government, lobbied by legacy copyright holders and hardware manufacturers, can pry my camcorder, computer, home theatre, or portable media player from my cold dead hands!"
Tracey P. Lauriault wrote:
> Someone was asking me what page scraping means. Could you explain - in
> sorta lay person terms?

A computer-automated cut-and-paste, where which page you go to is automated, and which piece of information you try to learn from the resulting page is automated.

--
Russell McOrmond, Internet Consultant: <http://www.flora.ca/>
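In code, that automated cut-and-paste can be as small as this sketch (Python; both the URL and the pattern are invented for illustration):

    import re
    import urllib.request

    # Automated "go to a page": fetch it the way a browser would.
    url = "http://example.com/find-your-candidate?postal_code=K1A0A1"
    html = urllib.request.urlopen(url).read().decode("utf-8")

    # Automated "cut and paste": pull out the one piece of
    # information we want from the page.
    match = re.search(r"District:\s*(\d+)", html)
    print(match.group(1) if match else "not found")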
On Tue, 23 Sep 2008 19:56:55 -0400, Russell McOrmond <[hidden email]> wrote:
> Tracey P. Lauriault wrote:
> > Someone was asking me what page scraping means. Could you explain - in
> > sorta lay person terms?
>
> A computer-automated cut-and-paste, where which page you go to is
> automated, and which piece of information you try to learn from the
> resulting page is automated.

When it's _really_ automated, it's called a feed or a microformat. It's called scraping because it usually also involves manual labour to get the job done right, as HTML pages are often modified with no regard to their semantic value.

--
Robin
Robin Millette wrote:
> On Tue, 23 Sep 2008 19:56:55 -0400, Russell McOrmond
> <[hidden email]> wrote:
>> Tracey P. Lauriault wrote:
>>> Someone was asking me what page scraping means. Could you explain -
>>> in sorta lay person terms?
>> A computer-automated cut-and-paste, where which page you go to is
>> automated, and which piece of information you try to learn from the
>> resulting page is automated.
>
> When it's _really_ automated, it's called a feed or a microformat.
> It's called scraping because it usually also involves manual labour to
> get the job done right, as HTML pages are often modified with no
> regard to their semantic value.

Aren't definitions fun. The difference in my mind between a feed/microformat and 'scraping' is whether the relevant output format was designed to be human readable (i.e., HTML) or machine readable (XML, CSV, etc.). Whether there is manual labour involved is unrelated in my mind.

It is often called "screen scraping" from back in the days when a screen of information was drawn and we then tried to pull information from that screen based on where it appeared. Re-interpreting HTML like we are doing here is a bit different, but we are still talking about taking a page intended to be read by a human (rendered by a browser) and instead interpreting it as data: input to a program/database/etc.

--
Russell McOrmond, Internet Consultant: <http://www.flora.ca/>
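The human-readable/machine-readable split shows up directly in code. A contrived contrast in Python (both the two-line feed and the HTML snippet are invented examples, not real sources):

    import csv
    import io
    import re

    # Machine readable: the format was designed for programs,
    # so the structure is the data.
    feed = "K1A0A1,35075\nH0H0H0,24001\n"
    for postal_code, district_id in csv.reader(io.StringIO(feed)):
        print(postal_code, district_id)

    # Human readable: a program has to infer the data from presentation
    # markup, which breaks whenever the page design changes.
    page = '<td class="riding">District: <b>35075</b></td>'
    match = re.search(r"District:\s*<b>(\d+)</b>", page)
    print(match.group(1) if match else "layout changed?")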
Thanks gang!
--
Tracey P. Lauriault