Scraping election results..

Russell McOrmond-2

   Curious if anyone is taking
http://webinfo.parl.gc.ca/MembersOfParliament/MainCandidatesCompleteList.aspx?TimePeriod=Current&Language=E
and turning it into a database?  I'm otherwise going to spend some time
on it.

   The link to the riding has the electoral district in the resulting
page (as part of the link to elections Canada)

   The link to the winning candidates points to full information for
those who are incumbents (email, websites, etc).

--
  Russell McOrmond, Internet Consultant: <http://www.flora.ca/>
  Please help us tell the Canadian Parliament to protect our property
  rights as owners of Information Technology. Sign the petition!
  http://www.digital-copyright.ca/petition/ict/

  "The government, lobbied by legacy copyright holders and hardware
   manufacturers, can pry my camcorder, computer, home theatre, or
   portable media player from my cold dead hands!"

Re: Scraping election results..

Daniel Haran
Robin scraped some candidate data:
http://cancan.waglo.com/dataface/

For elected candidates, I wrote
http://rubyforge.org/projects/mp-ca-scraper/ last year, which may
still be useful.

Re: Scraping election results..

Russell McOrmond-2

Daniel Haran wrote:
> Robin scraped some candidate data:
> http://cancan.waglo.com/dataface/
>
> For elected candidates, I wrote
> http://rubyforge.org/projects/mp-ca-scraper/ last year, which may
> still be useful.

   I hate to admit this in public, but I've never managed to get a Rails
application running.  In my case I don't want to run a web application,
I want to extract the data and import it into an SQL database.

   The page format for
http://webinfo.parl.gc.ca/MembersOfParliament/MainCandidatesCompleteList.aspx?TimePeriod=Current&Language=E 
is similar to
http://webinfo.parl.gc.ca/MembersOfParliament/MainMPsCompleteList.aspx?TimePeriod=Historical&Language=E 
, except for the column "Election Result" which says "Defeated" or
"Elected".

   Not to get into any language wars/etc, but I may look into writing
something in either PHP or Perl for those of us old folks who can't
figure Ruby/Rails out.

Re: Scraping election results..

Robin Millette
On Wed, 15 Oct 2008 11:02:40 -0400,
Russell McOrmond <[hidden email]> wrote:

> I want to extract the data and import it into an SQL database.

I can put an hour into this right away. It will take my mind off our brand new minority government. I'll do it in PHP, parsing the HTML as XML (thanks to Tidy) to make it easier.
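
For readers who want to try the same idea outside PHP, here is a minimal Python sketch of "parse the tidied HTML as XML". It is not Robin's code. The sample table and every column other than "Election Result" are assumptions made for illustration, and real Tidy output usually carries an XHTML namespace that would need handling.

```python
import xml.etree.ElementTree as ET

# Hypothetical, already-tidied XHTML fragment standing in for the real page.
# Only the "Election Result" column is taken from the thread; the rest is invented.
SAMPLE_XHTML = """
<table>
  <tr><th>Name</th><th>Constituency</th><th>Province</th><th>Election Result</th></tr>
  <tr><td><a href="/mp?id=1">Candidate A</a></td><td>Riding One</td><td>Ontario</td><td>Elected</td></tr>
  <tr><td>Candidate B</td><td>Riding One</td><td>Ontario</td><td>Defeated</td></tr>
</table>
"""

def parse_rows(xhtml):
    """Return one list of cell texts per data row of the table."""
    root = ET.fromstring(xhtml)
    rows = []
    for tr in root.iter("tr"):
        cells = tr.findall("td")
        if not cells:  # the header row uses <th>, so it has no <td> cells
            continue
        # itertext() also collects text nested inside <a> elements
        rows.append(["".join(td.itertext()).strip() for td in cells])
    return rows

rows = parse_rows(SAMPLE_XHTML)
# Keep only the winners, using the last column as the result flag
elected = [r for r in rows if r[-1] == "Elected"]
```

The point of tidying first is that a strict XML parser can then be used instead of hand-written string matching.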

>    The page format for
> http://webinfo.parl.gc.ca/MembersOfParliament/MainCandidatesCompleteList.aspx?TimePeriod=Current&Language=E 
> is similar to
> http://webinfo.parl.gc.ca/MembersOfParliament/MainMPsCompleteList.aspx?TimePeriod=Historical&Language=E 
> , except for the column "Election Result" which says "Defeated" or
> "Elected".

--
Robin

Re: Scraping election results..

Jennifer Bell
In reply to this post by Russell McOrmond-2
There's a tab separated file on Elections Canada already:
http://enr.elections.ca/

But it lists all candidates, with a vote count.  I've written a little script that parses it and figures out the top candidate per riding.
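
For illustration only, a script of that kind might look like the Python sketch below. This is not the actual script mentioned above, and the column names (riding, candidate, votes) are assumptions; the real Elections Canada file layout is not described in this thread.

```python
import csv
import io

# Hypothetical sample data; the real file's columns may differ.
SAMPLE_TSV = (
    "riding\tcandidate\tvotes\n"
    "Riding One\tCandidate A\t12000\n"
    "Riding One\tCandidate B\t15000\n"
    "Riding Two\tCandidate C\t9000\n"
    "Riding Two\tCandidate D\t8500\n"
)

def top_candidates(tsv_text):
    """Map each riding to the (candidate, votes) pair with the most votes."""
    winners = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        votes = int(row["votes"])
        best = winners.get(row["riding"])
        # Keep the first candidate seen, or replace on a strictly higher count
        if best is None or votes > best[1]:
            winners[row["riding"]] = (row["candidate"], votes)
    return winners

winners = top_candidates(SAMPLE_TSV)
```

To run it on a downloaded file, the text passed to `top_candidates` would come from reading that file rather than from the sample string.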

Interestingly, the file says there have been no (0) spoiled ballots in the entire country.  Either we're a very well-behaved people, or there's a reporting issue.

Jennifer


Re: Scraping election results..

Robin Millette
In reply to this post by Russell McOrmond-2
On Wed, 15 Oct 2008 10:38:26 -0400,
Russell McOrmond <[hidden email]> wrote:

>
>    Curious if anyone is taking
> http://webinfo.parl.gc.ca/MembersOfParliament/MainCandidatesCompleteList.aspx?TimePeriod=Current&Language=E
> and turning it into a database?  I'm otherwise going to spend some time
> on it.

http://rym.waglo.com/elections-2008-canada-results.csv

I didn't include URLs, perhaps I should have...

PHP code is here:
http://rym.waglo.com/elections-2008-canada-results.phps

--
Robin

Re: Scraping election results..

Robin Millette
In reply to this post by Jennifer Bell
On Wed, 15 Oct 2008 08:34:23 -0700 (PDT),
Jennifer Bell <[hidden email]> wrote:

> There's a tab separated file on Elections Canada already:
> http://enr.elections.ca/
>
> But it lists all candidates, with a vote count.  I've done a little script that parses it and figured out the top candidate per riding.

That looks good. Who knew we could expect so much from our own government? :)

I should have peeked before wasting 30 minutes on scraping, oh well.

--
Robin

Re: Scraping election results..

Robin Millette
In reply to this post by Jennifer Bell
On Wed, 15 Oct 2008 08:34:23 -0700 (PDT),
Jennifer Bell <[hidden email]> wrote:

> There's a tab separated file on Elections Canada already:
> http://enr.elections.ca/

I incorporated that into Cancan and mentioned it on my blog:
http://rym.waglo.com/wordpress/2008/10/15/results-des-elections-sur-cancan/

Not pretty, yet. But hey ;)

--
Robin


Re: Scraping election results..

Russell McOrmond-2
In reply to this post by Robin Millette
Robin Millette wrote:
> I didn't include URLs, perhaps I should have...

   Thanks for the sample -- I've not seen SimpleXMLElement used before.

Here is the foreach() loop, grabbing the URLs.

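// $z is presumably the list of table rows (<tr>) built earlier in Robin's script.
foreach ($z as $a) {
    $data = array();
    $data[] = trim((string)$a->td[0], "\n");
    // Second cell: the link text, then the link's href attribute
    $data[] = trim((string)$a->td[1]->a, "\n");
    $temp = $a->td[1]->a->attributes();
    $data[] = trim((string)$temp['href']);
    // Third cell: the link text, then the link's href attribute
    $data[] = trim((string)$a->td[2]->a, "\n");
    $temp = $a->td[2]->a->attributes();
    $data[] = trim((string)$temp['href']);
    $data[] = trim((string)$a->td[3], "\n");
    $data[] = trim((string)$a->td[4], "\n");

    // quotes() is the CSV-quoting helper from Robin's script (not shown here).
    $data2 = array_map('quotes', $data);
    echo join(',', $data2) . "\n";
}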
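
The quotes() helper is not shown in this message, so how it escapes embedded quotes or commas is unknown. As a hedged alternative for the output step, here is a small Python sketch that leaves the quoting to the standard csv module. The sample row is invented for illustration.

```python
import csv
import io

def rows_to_csv(rows):
    """Serialize rows to CSV text, quoting every field."""
    buf = io.StringIO()
    # QUOTE_ALL wraps every field in double quotes; embedded quotes are doubled
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical row: name, link text, href, result
sample = [["Candidate A", 'Riding "One"', "/riding?id=1", "Elected"]]
csv_text = rows_to_csv(sample)
```

Letting a CSV library do the escaping avoids broken rows when a name or riding contains a comma or a double quote.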


   Enough playing around for today.  What I want is something handy
that I can point at pages like this, which will automatically fetch the
pages behind the URLs and parse them as well, adding data such as the
EDID, email, phone and website information for MPs.
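
A possible first step toward that, sketched in Python under stated assumptions: resolve each scraped href against the list page's base URL, then pull an identifier out of the link's query string. The page name `Riding.aspx`, the parameter names `Key` and `ED`, and the example.org link are all hypothetical; the real links would need to be inspected first. Fetching the pages themselves is left out so the sketch runs offline.

```python
from urllib.parse import urljoin, urlparse, parse_qs

# Base of the list pages discussed in this thread
BASE_URL = "http://webinfo.parl.gc.ca/MembersOfParliament/"

def absolute_link(href, base=BASE_URL):
    """Resolve a possibly relative href against the list page's base URL."""
    return urljoin(base, href)

def query_value(url, name):
    """Return the first value of query parameter `name`, or None if absent."""
    values = parse_qs(urlparse(url).query).get(name)
    return values[0] if values else None

# Hypothetical href as it might appear in a scraped row
link = absolute_link("Riding.aspx?Key=123&Language=E")
key = query_value(link, "Key")

# Hypothetical outbound link carrying an electoral district ID
edid = query_value("http://example.org/district?ED=35001", "ED")
```

Each resolved link could then be fetched and parsed the same way as the list page, with the extracted values added as extra columns.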

Re: Scraping election results..

Ilona Dougherty
Hey all,

Just a note to say that if anyone can give me a sense of how Apathy  
is Boring might be able to use these election results tools in our  
work, I would be really happy to promote that this info is available.

Let me know what you think might be relevant to our audience.

Ilona


Re: Scraping election results..

Russell McOrmond-2
Ilona Dougherty wrote:
> Hey all,
>
> Just a note to say that if anyone can give me a sense of how Apathy is
> Boring might be able to use these election results tools in our work, I
> would be really happy to promote that this info is available.


   Later on, the poll-by-poll results will be available.  I really want
to have someone look at the polling stations that serve University
students and see if voter turnout is as bad as we are told (i.e. the
lowest of any definable demographic in Canada).

   I bet there are ridings where a high University turnout could have
changed the outcome -- if only students could be convinced to get
involved and that their voice matters!
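
As a rough sketch of the arithmetic involved, in Python and with made-up figures: turnout is votes cast divided by registered electors, and a riding is only a candidate for "could have changed" if the number of non-voters exceeds the winner's margin. The function names and all numbers are illustrative assumptions, and the second check is a necessary condition only, not a prediction.

```python
def turnout(votes_cast, registered):
    """Turnout as a fraction of registered electors (0.0 if none registered)."""
    return votes_cast / registered if registered else 0.0

def could_flip(winner_votes, runner_up_votes, registered, votes_cast):
    """True if the number of non-voters exceeds the winning margin."""
    margin = winner_votes - runner_up_votes
    non_voters = registered - votes_cast
    return non_voters > margin

# Made-up figures for a single campus poll, purely illustrative
campus_turnout = turnout(votes_cast=180, registered=600)

# Made-up riding totals: a 200-vote margin with 32,000 non-voters
flip = could_flip(winner_votes=15000, runner_up_votes=14800,
                  registered=80000, votes_cast=48000)
```

With real poll-by-poll data, the same two calculations could be run per polling station and per riding.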

