I just had a chance to re-read the Postal Codes page on the wiki*.
Michael, you've done some great work; the letter seems very compelling now!
http://civicaccess.ca/wiki/PostalCodes

What tangible forms of support are we asking for from political parties (and the others listed)? What organization, if any, should spearhead this?

-Daniel.

* That was prompted by makepovertyhistory's latest action alert, which could have been made more effective by personalizing it with MP names and phone numbers. You can see a copy of it here:
http://www.makepovertyhistory.ca/e/take-action/e-alerts/2007-03-05.html
Off-topic, but might be relevant to some other later project...

Daniel Haran wrote:
> * That was prompted by makepovertyhistory's latest action alert, which
> could have been made more effective by personalizing with MP names and
> phone numbers. You can see a copy of it here:
> http://www.makepovertyhistory.ca/e/take-action/e-alerts/2007-03-05.html

Sometimes it isn't a CivicAccess issue, but other timing/funding issues that cause these things. There are other reasons why that specific e-alert pointed to parl.gc.ca rather than using the already purchased postal-code --> EDID database.

We have 308 MPs (plus the Minister for Public Works) with information about each, and I don't know of any group that is maintaining a table of information about these MPs. While we at MPH keep the name and email address updated in our database, we didn't keep the phone number or constituency office updated, which is one of the things we were wanting people to look up.

Does anyone know of a structured WIKI, something that would allow a group to collaboratively maintain a table of information?

--
Russell McOrmond, Internet Consultant: <http://www.flora.ca/>
Please help us tell the Canadian Parliament to protect our property rights as owners of Information Technology. Sign the petition!
http://www.digital-copyright.ca/petition/ict/

"The government, lobbied by legacy copyright holders and hardware manufacturers, can pry my camcorder, computer, home theatre, or portable media player from my cold dead hands!"
Hello,
I made some progress trying to get some information from parl.gc.ca. The following may only be of interest to techies...

After extracting the list of MP codes from
http://webinfo.parl.gc.ca/MembersOfParliament/MainMPsCompleteList.aspx?TimePeriod=Current&Language=E
I tried getting more information from individual MP pages. Two scraping kits later, after REXML choked on XPath queries and various other tech horrors (take a look at the source... __VIEWSTATE weighs in at 8k, even tidy can't parse it, etc), I decided to resort to Dapper. E.g.:

http://webinfo.parl.gc.ca/MembersOfParliament/ProfileMP.aspx?Key=78902&Language=E
=>
http://www.dapper.net/RunDapp?dappName=CanadianMPdetails&v=1&variableArg_0=78902

I'll use some regexps to clean up what I couldn't get Dapper to extract, and publish the whole as a db and a RESTful web service so no one else ever need go through this.

Let me know if this is useful and/or if anything is missing.

-Daniel.

--
Change the world one loan at a time - visit Kiva.org to find out how
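For anyone who wants to see the "extract the list of MP codes" step concretely, here is a minimal Python sketch. It assumes the list page links to each profile with a `ProfileMP.aspx?Key=...` href, as the example URL in the message suggests. The HTML fragment, the placeholder member names and the helper names `extract_mp_keys` and `profile_url` are all made up for illustration; this is not the code Daniel ran (his message mentions REXML, a Ruby library, and Dapper).

```python
import re
from html import unescape

# Hypothetical fragment of the MP list page; the real markup may differ.
SAMPLE_HTML = """
<a href="ProfileMP.aspx?Key=78902&amp;Language=E">Member A</a>
<a href="ProfileMP.aspx?Key=78365&amp;Language=E">Member B</a>
<a href="ProfileMP.aspx?Key=78902&amp;Language=E">Member A (repeated link)</a>
"""

# URL shape taken from the example profile link in the message.
PROFILE_URL = (
    "http://webinfo.parl.gc.ca/MembersOfParliament/"
    "ProfileMP.aspx?Key={key}&Language=E"
)

# Capture the numeric Key parameter of every profile link.
KEY_PATTERN = re.compile(r"ProfileMP\.aspx\?Key=(\d+)")


def extract_mp_keys(html):
    """Return the distinct Key values found in profile links, in page order."""
    keys = []
    for key in KEY_PATTERN.findall(unescape(html)):
        if key not in keys:
            keys.append(key)
    return keys


def profile_url(key):
    """Build the profile page URL for one MP key."""
    return PROFILE_URL.format(key=key)


if __name__ == "__main__":
    for key in extract_mp_keys(SAMPLE_HTML):
        print(key, profile_url(key))
```

A regular expression sidesteps the HTML parsing problems described above because it never has to parse the page, only find the links. The trade-off is that it breaks silently if the link format changes.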
In case the non computer geeks are feeling left out - here's a mini translation.
In Daniel's email - "scraping":
"Screen scraping is a technique in which a computer program extracts text data from the display output of another program.... The program doing the scraping is called a screen scraper... There are a number of synonyms for screen scraping, including: Data scraping, data extraction, web scraping, page scraping, web page wrapping and HTML scraping (the last four being specific to scraping web pages)."

It basically means creating a program that automatically grabs websites, extracts their information and puts it in a database, so that you can do more useful stuff with it. HowdtheyVote does that with the parliamentary Hansards.

Daniel also referred to Dappit / Dapper. It's a web service that tries to make it easier to make screen scrapers.

Regexp means "regular expressions". It's geek talk for sorting through files of text to look for certain things.

So Daniel is talking about downloading the webpages from Parliament, and searching through them for specific information which he then stores in a database.

This info is all on the Tech page on our wiki:
http://civicaccess.ca/wiki/Tech

There's some other stuff that he's talking about, but it's a bit out of my area, and it's not absolutely necessary for everyone to understand all of it, as long as we get the gist of using tools like scrapers to collect / liberate civic info.

Thanks to Daniel for sharing this. Non-techies - please don't be scared off. We need your experience + expertise if we're ever going to get anywhere with this stuff.
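To make "regular expressions" a little more concrete, here is a small Python sketch of the kind of pattern that could pull phone numbers out of scraped page text. The sample text and phone numbers are made up, the pattern only covers common North American formats, and the names `PHONE_PATTERN` and `find_phone_numbers` are illustrative rather than anything from Daniel's scraper.

```python
import re

# Matches numbers written like (613) 555-0123, 613-555-0123 or 613.555.0123.
# The three groups capture area code, exchange and line number.
PHONE_PATTERN = re.compile(r"\(?(\d{3})\)?[\s.-]*(\d{3})[\s.-]*(\d{4})")


def find_phone_numbers(text):
    """Return every phone number found in text, normalised to NNN-NNN-NNNN."""
    return ["-".join(match.groups()) for match in PHONE_PATTERN.finditer(text)]


# Made-up text standing in for part of a scraped MP page.
SAMPLE_TEXT = "Constituency office  Telephone: (613) 555-0123  Fax: 613.555.0199"

if __name__ == "__main__":
    for number in find_phone_numbers(SAMPLE_TEXT):
        print(number)
```

The pattern describes the shape of a phone number instead of its position on the page, which is why the same few lines can be run over every MP's page.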
Hi all,
Thanks to Michael for translating what I did :) Here's the initial result:
http://lokobo.com:3000/mps

Could Russell and others let me know if there is anything obviously missing (Party affiliations aren't being recognized and the addresses are a mess)? In what format should I make the database available? Is anyone going to use this?

The main objective I had was to get constituency phone numbers for each MP, since that could have been useful for Make Poverty History. I've no affiliation - it just bugs me to see an advocacy group not being quite as effective as they could because information is disorganized.

And boy, is it EVER disorganized. One MP, Deepak Obhrai, doesn't even have his constituency phone listed on the government website:
http://webinfo.parl.gc.ca/MembersOfParliament/ProfileMP.aspx?Key=78365&Language=E

I have a small number of technical issues I'd like to resolve. However the software that did the scraping and runs the website and web service is in the public domain on rubyforge:
http://rubyforge.org/projects/mp-ca-scraper/
(stats aren't updated on the main page, but the source IS there:)
svn checkout svn://rubyforge.org/var/svn/mp-ca-scraper/trunk scraper

Web pages are available for human and computer consumption:
http://lokobo.com:3000/mps
http://lokobo.com:3000/mps.xml

MP pages are indexed by EDID - Electoral District ID:
http://lokobo.com:3000/mps/35049
http://lokobo.com:3000/mps/35049.xml

Cheers,
Daniel.
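As a rough idea of how another program might consume the per-EDID XML pages, here is a short Python sketch. The URL shapes come from the links in the message; everything else is an assumption. In particular, the XML element names in `SAMPLE_XML` are hypothetical (the message does not describe the actual schema), the member name and phone number are made up, and `mp_url` and `parse_mp` are illustrative helper names.

```python
import xml.etree.ElementTree as ET

BASE_URL = "http://lokobo.com:3000/mps"


def mp_url(edid=None, xml=False):
    """Build the URL for the full list (no edid) or for one electoral district."""
    url = BASE_URL if edid is None else f"{BASE_URL}/{edid}"
    return url + ".xml" if xml else url


# Hypothetical response body; the real service's element names may differ.
SAMPLE_XML = """<mp>
  <edid>35049</edid>
  <name>Example Member</name>
  <constituency-phone>613-555-0123</constituency-phone>
</mp>"""


def parse_mp(xml_text):
    """Turn one MP record into a dict mapping element name to its text."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


if __name__ == "__main__":
    print(mp_url(35049, xml=True))
    print(parse_mp(SAMPLE_XML))
```

In real use the XML text would come from fetching `mp_url(edid, xml=True)`, for example with `urllib.request.urlopen`, instead of the inline sample.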