All right, since a few people said they would actually use this, I decided to try extracting the list of MPs by postal code. Tools like the Lobby module in Drupal try to extract the information from the page, an arduous task since the web page is a *mess*. Since I already have the list of MPs, I only needed to extract the email addresses. If I have time this weekend, it will go into the mp-scraper, with a REST interface for anyone to use.

Here's the code for you geeks:

    require 'rubygems'
    require 'hpricot'
    require 'open-uri'

    #postal_code = 'A1A 1A1'.gsub(/ /, '')
    postal_code = 'H1T4C6'.gsub(/ /, '')
    doc = Hpricot(open('http://www.parl.gc.ca/information/about/people/house/PostalCode.asp?Language=E&txtPostalCode=' + postal_code))
    # Each <h4> mentioning "Parliament" is followed by the MP's contact block; skip any without a parl.gc.ca address
    emails = (doc/"h4").select { |e| e.innerHTML =~ /Parliament/ }.collect { |e|
      m = e.next_sibling && e.next_sibling.innerHTML.match(/(\S+@parl\.gc\.ca)/)
      m && m[1]
    }.compact
    mps = Mp.find_all_by_email(emails)  # Mp: Rails model holding the already-scraped MP list

Sample output on H1T4C6:

    >> emails
    => ["[hidden email]", "[hidden email]", "[hidden email]", "[hidden email]", "[hidden email]"]

Cheerio,
Daniel.

--
Change the world one loan at a time - visit Kiva.org to find out how
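The snippet above only strips spaces from the postal code before building the URL. A slightly more defensive variant might also upcase the input and reject anything not in the letter-digit-letter digit-letter-digit form before it is sent to parl.gc.ca. This is a sketch: `normalize_postal_code` is a hypothetical helper, not part of mp-scraper, and it checks the format only, not whether the code actually exists.

```ruby
# Hypothetical helper (not part of mp-scraper): normalize user input
# before building the lookup URL. Checks the A1A1A1 pattern only.
CANADIAN_POSTAL_CODE = /\A[A-Z]\d[A-Z]\d[A-Z]\d\z/

def normalize_postal_code(input)
  code = input.to_s.upcase.gsub(/\s+/, '')  # 'h1t 4c6' -> 'H1T4C6'
  code =~ CANADIAN_POSTAL_CODE ? code : nil # nil means "reject"
end

puts normalize_postal_code('h1t 4c6').inspect # => "H1T4C6"
puts normalize_postal_code('12345').inspect   # => nil
```

Returning nil rather than raising lets a web front end simply show "invalid postal code" without ever hitting the remote site.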
MP search by postal code is now live:
http://lokobo.com:3000/

e.g.:
http://lokobo.com:3000/mps;search?postal_code=A1A1A1
http://lokobo.com:3000/mps;search?postal_code=H1T4C6

Results are 'cached': a copy of each search result is stored in a database to avoid having to retrieve the information again. Cached searches are about 100 times faster, so the more people use this service, the better it is for everyone :)

Please let me know if you use it and/or encounter any problems. Thanks!

-Daniel.
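The caching described above is essentially a cache-aside lookup: check the stored copy first and only scrape parl.gc.ca on a miss. A minimal sketch, assuming an in-memory Hash in place of the real database table and a block in place of the scraper; `PostalCodeCache` is a hypothetical name, not the lokobo.com code.

```ruby
# Cache-aside sketch: a Hash stands in for the database table and the
# block given to .new stands in for the slow parl.gc.ca scrape.
class PostalCodeCache
  attr_reader :misses

  def initialize(&fetcher)
    @store   = {}      # postal_code => scraped result
    @fetcher = fetcher # called only on a cache miss
    @misses  = 0
  end

  def lookup(postal_code)
    key = postal_code.upcase.delete(' ')
    @store.fetch(key) do
      @misses += 1
      @store[key] = @fetcher.call(key) # remember the result for next time
    end
  end
end

cache = PostalCodeCache.new { |code| ["mp-for-#{code}@parl.gc.ca"] }
cache.lookup('H1T 4C6') # miss: calls the fetcher
cache.lookup('h1t4c6')  # hit: served from the stored copy
puts cache.misses       # => 1
```

Entries never expire in this sketch, so a stored result would go stale if a riding's MP changed; a real cache would need some expiry or refresh policy.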
Fabulous work, Daniel!
Not that there are a lot of political reporters who are handy with a database, but for those of us who are, this should be a helpful little dataset for all sorts of projects as we approach a possible federal election.

Thanks!

On 3/24/07, Daniel Haran <[hidden email]> wrote:
> MP search by postal code is now live:

--
David Akin
-------------------
http://www.davidakin.com
Daniel: superb. Is the database open (& legal!) for all to use? ...
David: the idea (I hope) is that other citizen groups etc. can build on that database to make more user-friendly tools. The 1st step is getting the data available in an open format; the 2nd step is using it in creative & useful ways.

On Mar 24, 2007, at 4:15 PM, David Akin wrote:
> Fabulous work, Daniel!
>
> _______________________________________________
> CivicAccess-discuss mailing list
> [hidden email]
> http://civicaccess.ca/mailman/listinfo/civicaccess-discuss_civicaccess.ca
On 3/24/07, Hugh McGuire <[hidden email]> wrote:
> Daniel: superb. Is the database open (& legal!) for all to use? ...

Thanks! As far as I'm concerned, yes. As for the government, I hope so. :)

The list of MPs clearly ought to be in the public domain. I'm only caching results from the postal code lookups, and not attempting to build a complete database, so I believe I'm in the clear for that too.

Daniel.
> I'm only
> caching results from the postal code lookups, and not attempting to
> build a complete database, so I believe I'm in the clear for that too.

OK, so the project to build an open, free database of ridings v postal codes is still desirable?
>> Daniel: superb. Is the database open (& legal!) for all to use? ...
>>
> Thanks! As far as I'm concerned, yes. As for the government, I hope so.
>
> :)
>
> The list of MPs clearly ought to be in the public domain. I'm only
> caching results from the postal code lookups, and not attempting to
> build a complete database, so I believe I'm in the clear for that too.
>

It would be nice if we could use this data, but I don't think that this is legal. Given that the data costs thousands to acquire, scraping it from their website would not be considered fair play. Perhaps we should contact the site and ask for a clarification on their use restrictions.

In actuality, we should be lobbying for the release of this data under more accessible licensing terms from the people that make it. This was our original intent, no?

Cory.
On 3/24/07, Cory Horner <[hidden email]> wrote:
> It would be nice if we could use this data, but I don't think that this
> is legal. Given that the data costs thousands to acquire, scraping it
> from their website would not be considered fair play. Perhaps we should
> contact the site and ask for a clarification on their use restrictions.

I tend to think this is OK. The data itself isn't copyrighted, only its disposition, grouping and layout are, and we're not using those. I would be comfortable using this new database and sharing it with anyone. I would even suggest making it public domain for now, and taking some more time to come up with a licence for it.

> In actuality, we should be lobbying for the release of this data under
> more accessible licensing terms from the people that make it. This was
> our original intent, no?

I think Canadians can do both. If we can get it officially, then great. It's also why we need to explore licencing issues a bit more.

--
Robin 'oqp' Millette : http://rym.waglo.com/
Bande-Passante : http://bande-passante.info/
SQIL 2007 : http://2007.sqil.info/
Hugh McGuire wrote:
>> I'm only
>> caching results from the postal code lookups, and not attempting to
>> build a complete database, so I believe I'm in the clear for that too.
> OK so the project to build an open, free database of ridings v postal
> codes is still desirable?

Yes, this is still needed. Screen scraping is a very ugly kludge that doesn't solve the underlying technical, legal or political problems. Elections Canada deliberately tries to break screen scraping, and has randomly changed the method they use over recent years to kill screen scraping tools (Drupal's Lobby module, the ECTOOLS tool that I used in the past, etc.).

http://sourceforge.net/projects/campaigntoolz/

ECTools was a PHP system which used XML-RPC to separate a caching server, which did the screen scraping, from a small client, which did a database lookup. The idea was to have many sites using the client, and one site running the caching server. This never really worked well, as the screen scraping kept breaking and it was very hard to keep it up-to-date.

http://campaigntoolz.cvs.sourceforge.net/campaigntoolz/ectools/

The parl.gc.ca site won't work during the election, or at least it was shut down in the past. This approach would still work if there were a secondary database that converted the incumbent MP link into an Electoral District, which could then be used as the index against the current candidates database.

Expect parl.gc.ca to follow elections.ca in shutting things down if screen scraping becomes common. We need this information to be released directly with a clear open license so that it can be shared and imported without the problems that screen scraping has (i.e. it may work this moment and be dead a minute from now).

While IANAL, I believe this screen scraping is a clear copyright infringement, but one where the copyright holder is quite unlikely to sue for infringement.
The bad politics of Elections Canada or the Library of Parliament suing someone for screen scraping this data could even be a win for us, as the data would then be made public legally.

One-time screen scraping, like the collection of the contact and other information for MPs, is different in that we can scrape, verify the data, and publish the results, as has been done. We don't need to rely on the parl.gc.ca site letting us in tomorrow, as we already have the relevant information today.

Note: During elections, Elections Canada releases a database of all candidates, which is what makes sites that list candidates so easy. It is unfortunate that parl.gc.ca doesn't already do this for sitting MPs (more reliable than screen scraping), and that the postal code database has not yet been released.

Some interesting stuff from Elections Canada to be aware of, especially if we are heading into an election (possibly over the Clean Air Act, since the budget won't be an issue):

Final List of Confirmed Candidates – 39th General Election (this is updated live during the election, and can be imported directly into a database)
http://www.elections.ca/content.asp?section=pas&document=index&dir=39ge/loc&lang=e&textonly=false

Here is a tool I wrote to allow people to browse this database (with website/email contacts added by our community during the election):
http://www.digital-copyright.ca/election2006/candidates

Official Voting Results of the 39th General Election – Poll-by-Poll Results – Raw Data
http://www.elections.ca/scripts/resval/ovr_39ge.asp?prov=&lang=e

...

--
Russell McOrmond, Internet Consultant: <http://www.flora.ca/>
Please help us tell the Canadian Parliament to protect our property rights as owners of Information Technology. Sign the petition!
http://www.digital-copyright.ca/petition/ict/

"The government, lobbied by legacy copyright holders and hardware manufacturers, can pry my camcorder, computer, home theatre, or portable media player from my cold dead hands!"
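The "secondary database" idea amounts to using the electoral district as a join key between the incumbent-MP lookup and the importable candidates list. A sketch under loud assumptions: the CSV headers (`district`, `candidate`, `party`), the sample rows and the `INCUMBENT_DISTRICT` mapping are all made up, and the real Elections Canada file layout may well differ.

```ruby
require 'csv'

# Made-up sample in an ASSUMED layout; the real Elections Canada
# candidates file has its own columns.
CANDIDATES_CSV = <<~DATA
  district,candidate,party
  District A,Candidate 1,Party X
  District A,Candidate 2,Party Y
  District B,Candidate 3,Party X
DATA

# Hypothetical "secondary database": incumbent MP -> electoral district.
INCUMBENT_DISTRICT = {
  'Incumbent One' => 'District A',
  'Incumbent Two' => 'District B'
}.freeze

# Index candidate names by electoral district (the join key).
def candidates_by_district(csv_text)
  index = Hash.new { |h, k| h[k] = [] }
  CSV.parse(csv_text, headers: true).each do |row|
    index[row['district']] << row['candidate']
  end
  index
end

# Incumbent MP (e.g. from a postal code lookup) -> district -> candidates.
def candidates_for_incumbent(mp_name, index)
  district = INCUMBENT_DISTRICT[mp_name]
  district ? index.fetch(district, []) : []
end

index = candidates_by_district(CANDIDATES_CSV)
p candidates_for_incumbent('Incumbent One', index) # => ["Candidate 1", "Candidate 2"]
```

Keying everything on the district rather than the MP means the candidate side can be reloaded from each new official release without touching the incumbent mapping.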
I just tried it and it is wonderful!

The only issue I can think of is the look, feel and findability. General or novice web users may not find this service and may not be able to relate to the tool because of how it looks. It is perfect for most of us on this list but perhaps not for the general public. I can see this tool being very powerful, particularly if there are elections coming up.

I will send it around to some advocacy groups who may also find it useful.

Cheers
t

Daniel Haran wrote:
> MP search by postal code is now live:
Hi all,
Thanks all for the great feedback.

The site as is only demonstrates data access and redistribution in various formats. My hope is that others will use it in their projects and offer a foretaste of the civic benefits we could get from freeing this data. Please get in touch with me if you need help integrating this, or have any questions.

There are a few ways in which this data can be obtained legally.

-> Government finally stops duplicating its work and releases the data into the public domain.

By far the best alternative, although it's going to take an organized (and louder) political effort to get them to notice us.

-> Canada Post could allow derivative products.

Ages ago I inquired with Canada Post about using one of their data products and was told they did not see a web service as redistribution, which significantly changes the licensing cost. Would someone check with them? A polygon file of all the postal codes would be enough to generate the correspondence database, since the MP polygons are already public domain.

-> StatsCan allows a web service to operate under a single license.

Just as Canada Post didn't count a web service as redistribution, maybe they'll compromise.

---

My preference would be for government to wake up and smell the roses. In the meantime, if enough groups were to rely on a web service, perhaps we could obtain a common license as a stop-gap measure.

While others pursue rigorously legal avenues, I'm happy to work with code and reveal the grey areas. I believe both roles are needed at this stage.

Daniel.

On 3/26/07, Tracey P. Lauriault <[hidden email]> wrote:
> i just tried and it is wonderful!
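The claim that a postal-code polygon file plus the public riding polygons would be enough to generate the correspondence database can be illustrated with a ray-casting point-in-polygon test. This sketch assumes one representative point per postal code and uses toy coordinates and hypothetical riding names; it is not based on any actual Canada Post or Elections Canada file.

```ruby
# Ray-casting point-in-polygon test. A polygon is an array of [x, y]
# vertices (e.g. [longitude, latitude]); the last vertex connects back
# to the first.
def point_in_polygon?(point, polygon)
  x, y = point
  inside = false
  j = polygon.length - 1
  polygon.each_with_index do |(xi, yi), i|
    xj, yj = polygon[j]
    # Does a horizontal ray from the point cross the edge (j -> i)?
    if (yi > y) != (yj > y) &&
       x < (xj - xi) * (y - yi) / (yj - yi).to_f + xi
      inside = !inside
    end
    j = i
  end
  inside
end

# Hypothetical inputs: toy riding boundaries and one representative
# point per postal code (not real data).
RIDINGS = {
  'Riding West' => [[0, 0], [5, 0], [5, 10], [0, 10]],
  'Riding East' => [[5, 0], [10, 0], [10, 10], [5, 10]]
}.freeze
POSTAL_POINTS = { 'A1A1A1' => [2, 3], 'B2B2B2' => [7, 8] }.freeze

# postal code => every riding whose polygon contains its point
correspondence = POSTAL_POINTS.transform_values do |pt|
  RIDINGS.select { |_, poly| point_in_polygon?(pt, poly) }.keys
end
p correspondence
```

A postal-code area that straddles a boundary can fall in more than one riding; a single representative point hides that, so a real build would more likely intersect the polygons themselves. Points lying exactly on an edge are also ambiguous with ray casting.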
I understand better! Thanks!
I sent it out to some advocacy groups this morning and we'll see what they do and say!

Daniel Haran wrote:
> Hi all,
>
> Thanks all for the great feedback.
Robin Millette wrote:
> I tend to think this is ok. The data isn't copyright itself, only
> it's disposition and grouping and layout, and we're not using that.

Please be careful here. If you take source material that is under copyright and manipulate it such that it is no longer the same work (remix, etc.), this doesn't mean that the new work is no longer a copyright infringement.

In Canada, "original" databases are protected under copyright, but the Federal Court of Appeal has held that "non-original" databases are not protected. There was discussion as part of the Section 92 report and the 2001 consultation about whether non-original databases should also receive the copyright monopoly.

http://strategis.ic.gc.ca/epic/site/crp-prda.nsf/en/rp00872e.html#A1_4

Database protection was not a big part of the process, given that larger issues such as legal protection for TPMs and other digital issues were seen as more critical.

The lines aren't always black-and-white, although I think in this case, if there were a court case, we would easily lose, as the cache and the use of the screen scraping isn't even a remixing but a simple use of the Crown Copyright database of postal codes to MPs. IANAL, TINLA, but I want people to be very careful and not blindly believe that screen scraping and remixing avoids any copyright questions.

> I would be comfortable using this new database and sharing it with
> anyone, I would suggest making it public domain for now even and take
> some more time to come up with a licence to use.

Dedicating things to the public domain will also make it easier to win in the "court of public opinion", where applying a strong license (such as a CopyLeft/ShareAlike) could backfire.
Daniel Haran wrote:
> Ages ago I inquired with Canada Post about using one of their data
> products and was told they did not see a web service as
> redistribution, which significantly changes the licensing cost.

I am wondering if someone has the time to do the footwork to check with Statistics Canada on the PCFRF file. If it turns out that a web service would not be a problem, then I can talk to the people at Make Poverty History about setting up a web service for this type of thing. I'd envision some XML-RPC type of service, similar to the ECTOOLS scripts at http://sourceforge.net/projects/campaigntoolz/ . They may be willing to do it in exchange for credit for running the web service, which can hopefully drive more 'human' traffic to their site that would then generate more letters to MPs, etc.

(I would do it, but I'm a bit over-booked for the next little while.)
On 3/26/07, Russell McOrmond <[hidden email]> wrote:
> The lines aren't always black-and-white, although I think in this case
> if there was a court case that we would easily loose as the cache and
> the use of the screen scraping isn't even a remixing but a simple use of
> the Crown Copyright database of postal-code to MPs. IANAL, TINLA, but I
> want people to be very careful and not blindly believe that screen
> scraping and remixing avoids any copyright questions.

Oh, I support 100% the notion that laws should be changed, etc. Scraping is just a stopgap measure. I really think we need to do both, that is, to get the word out about why it's important we have legitimate access to this data. That's the main mission of COACID, no?
absolutely!
Tracey P. Lauriault
Geomatics and Cartographic Research Centre
Department of Geography and Environmental Studies
Carleton University
Ottawa (ON) K1S 5B6 Canada
[hidden email]
https://gcrc.carleton.ca/confluence/display/GCRCWEB/Lauriault

On Tue Mar 27 11:16 , 'Robin Millette' sent:
On 3/26/07, Russell McOrmond <russell@...> wrote:
Robin Millette wrote:
> Oh, I support 100% the notion that laws should be changed, etc.
> Scraping is just a stopgap measure. I really think we need to do both,
> that is, to get the word out why it's important we have legitimate
> access to this data. That's the main mission of COACID, no?

Some of us are not in a position to make use of stopgap measures that possibly push the legal envelope. For instance, I think I would have a hard time going to the coalition members of Make Poverty History with a proposal to use data that might (or might not) infringe Crown Copyright. Given that many of their members receive government funding, they wouldn't be interested in participating in pushing that envelope.

In the case of Elections Canada releasing their version of the postal code-->EDID mapping, no laws need to change. This is a simple policy decision on their part to make this database consistent with the public releases they are already doing for things like detailed election results (after elections) and candidate lists (during elections).

Note that I believe the PCFRF product from Statistics Canada may be a distraction from our goal. Whether or not Statistics Canada has this data as part of a larger data product shouldn't discourage Elections Canada from freely releasing this information.

BTW: Those who wanted to investigate possible inconsistencies in the data between government agencies may want to look at the additional PCFRF query results I added to http://www.digital-copyright.ca/node/1607#comment-1667

I provide examples of postal codes that map to multiple electoral districts. There are 2 of these that have 6 matches, 11 that have 5 matches, and I provide a sample of 10 that have between 2 and 4 matches.
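Finding postal codes that map to more than one electoral district, as in the PCFRF results above, is a group-and-count job. A sketch over made-up rows; the real PCFRF layout and field names are not reproduced here, and the postal codes and EDIDs below are placeholders.

```ruby
# Made-up rows standing in for a postal code -> electoral district
# (EDID) file; a postal code can appear on several rows.
ROWS = [
  ['A1A1A1', 10001], ['A1A1A1', 10002], ['A1A1A1', 10003],
  ['B2B2B2', 20001], ['B2B2B2', 20002],
  ['C3C3C3', 30001], ['C3C3C3', 30001] # duplicate row, same district
].freeze

# postal code => distinct districts it maps to
districts = Hash.new { |h, k| h[k] = [] }
ROWS.each { |code, edid| districts[code] |= [edid] }

# Keep only postal codes with more than one district, then count how
# many codes have each number of matches (largest first).
ambiguous = districts.select { |_, eds| eds.size > 1 }
histogram = ambiguous.values.map(&:size).tally.sort.reverse.to_h

histogram.each { |matches, count| puts "#{count} postal code(s) with #{matches} matches" }
# prints:
# 1 postal code(s) with 3 matches
# 1 postal code(s) with 2 matches
```

De-duplicating districts per postal code first (the `|=` union) keeps repeated rows from inflating the match counts.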