
Re: postal codes campaign - next steps?

Posted by Michael Lenczner on Mar 08, 2007; 8:56pm
URL: http://civicaccess.416.s1.nabble.com/postal-codes-campaign-next-steps-tp932p939.html

In case the non-computer-geeks are feeling left out - here's a mini translation.

In the email below - "scraping":
"Screen scraping is a technique in which a computer program extracts
text data from the display output of another program.... The program
doing the scraping is called a screen scraper...   There are a number
of synonyms for screen scraping, including: Data scraping, data
extraction, web scraping, page scraping, web page wrapping and HTML
scraping (the last four being specific to scraping web pages)."

It basically means creating a program that automatically grabs web
pages, extracts their information, and puts it in a database, so
that you can do more useful stuff with it.
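
For the curious, here is a rough sketch of the "extracting" part in
Python. The page fragment, the names in it, and the CellScraper /
scrape_rows helpers are all made up for illustration; real pages (like
the ones on parl.gc.ca) are far messier, and this is not the code
Daniel used.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; names and ridings are placeholders.
SAMPLE_HTML = """
<table>
  <tr><td>Jane Example</td><td>Sample Riding</td></tr>
  <tr><td>John Placeholder</td><td>Another Riding</td></tr>
</table>
"""


class CellScraper(HTMLParser):
    """Collects the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, each a list of cell strings
        self._row = None        # the row currently being read, if any
        self._in_cell = False   # True while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td" and self._in_cell:
            # Trim whitespace once the whole cell has been read.
            self._row[-1] = self._row[-1].strip()
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._in_cell = False

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._in_cell:
            self._row[-1] += data


def scrape_rows(html):
    """Return the table cells found in an HTML string, row by row."""
    scraper = CellScraper()
    scraper.feed(html)
    scraper.close()
    return scraper.rows


rows = scrape_rows(SAMPLE_HTML)
for name, riding in rows:
    print(name, "|", riding)
```

The point is only that the program walks through the page's markup and
keeps the bits of text it cares about as rows of data.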

HowdtheyVote does that with the parliamentary Hansards.

Daniel also referred to Dappit / Dapper.  It's a web service that
tries to make it easier to make screen scrapers.

Regexp means "regular expressions".  It's geek talk for patterns that
let a program sort through files of text to look for certain things.
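
As a small illustration, this Python sketch uses one regular expression
to pick phone numbers out of a block of text. The text, the numbers,
and the pattern are invented for the example; a real pattern would need
to be adjusted to whatever the actual pages look like.

```python
import re

# Made-up text standing in for something copied off a web page.
TEXT = """
Constituency office: (613) 555-0101
Hill office: 613-555-0199
Fax: not listed
"""

# Three digits (optionally in brackets), an optional space or dash,
# three more digits, a dash, then four digits.
PHONE = re.compile(r"\(?\d{3}\)?[ -]?\d{3}-\d{4}")


def find_phone_numbers(text):
    """Return every substring of text that looks like a phone number."""
    return PHONE.findall(text)


numbers = find_phone_numbers(TEXT)
print(numbers)
```

The pattern describes the shape of what you are looking for, and the
program returns every piece of text that fits that shape.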

So Daniel is talking about downloading the webpages from Parliament,
and searching through them for specific information which he then
stores in a database.
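
Putting the pieces together, a sketch of that "search, then store" step
might look like the following. Everything here is an assumption made
for illustration: the page contents, the member ids, the table layout,
and the build_database helper. It uses SQLite (which ships with
Python) and skips the downloading step so that it runs without a
network connection.

```python
import re
import sqlite3

# Hypothetical downloaded pages, keyed by a made-up member id.
PAGES = {
    "1001": "<h1>Jane Example</h1><p>Telephone: 613-555-0101</p>",
    "1002": "<h1>John Placeholder</h1><p>Telephone: 613-555-0199</p>",
}

# Patterns for the two facts we want from each page.
NAME = re.compile(r"<h1>(.*?)</h1>")
PHONE = re.compile(r"Telephone:\s*([\d-]+)")


def build_database(pages):
    """Search each page for a name and phone number and store them."""
    db = sqlite3.connect(":memory:")  # throwaway in-memory database
    db.execute(
        "CREATE TABLE members (id TEXT PRIMARY KEY, name TEXT, phone TEXT)"
    )
    for member_id, html in pages.items():
        name = NAME.search(html)
        phone = PHONE.search(html)
        db.execute(
            "INSERT INTO members VALUES (?, ?, ?)",
            (
                member_id,
                name.group(1) if name else None,    # None if not found
                phone.group(1) if phone else None,
            ),
        )
    db.commit()
    return db


db = build_database(PAGES)
result = db.execute("SELECT name, phone FROM members ORDER BY id").fetchall()
for name, phone in result:
    print(name, phone)
```

Once the information is in a table like this, anyone can query it
without having to touch the original web pages again.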

This info is all on the Tech page on our wiki:
 http://civicaccess.ca/wiki/Tech

There's some other stuff that he's talking about - but it's a bit out
of my area.  And it's not absolutely necessary for everyone to
understand all of it - as long as we get the gist of using tools like
scrapers to collect / liberate civic info.

Thanks to Daniel for sharing this.  Non-techies - please don't be
scared off.  We need your experience + expertise if we're ever going
to get anywhere with this stuff.



On 3/8/07, Daniel Haran <[hidden email]> wrote:

> Hello,
>
> I made some progress trying to get some information from parl.gc.ca.
> The following may only be of interest to techies...
>
> After extracting the list of MP codes from
> http://webinfo.parl.gc.ca/MembersOfParliament/MainMPsCompleteList.aspx?TimePeriod=Current&Language=E
>
> I tried getting more information from individual MP pages. Two
> scraping kits later, after REXML choked on XPath queries and various
> other tech horrors (take a look at the source... __VIEWSTATE weighs in
> at 8k, even tidy can't parse it, etc), I decided to resort to Dapper.
> E.g.:
>
> http://webinfo.parl.gc.ca/MembersOfParliament/ProfileMP.aspx?Key=78902&Language=E
> =>
> http://www.dapper.net/RunDapp?dappName=CanadianMPdetails&v=1&variableArg_0=78902
>
> I'll use some regexps to clean up what I couldn't get Dapper to
> extract, and publish the whole as a db and a RESTful web service so no
> one else ever need go through this.
>
> Let me know if this is useful and/or if anything is missing.
>
> -Daniel.
>
> On 3/6/07, Russell McOrmond <[hidden email]> wrote:
> >
> >
> > Off-topic, but might be relevant to some other later project...
> >
> > Daniel Haran wrote:
> > > * That was prompted by makepovertyhistory's latest action alert, which
> > > could have been made more effective by personalizing with MP names and
> > > phone numbers. You can see a copy of it here:
> > > http://www.makepovertyhistory.ca/e/take-action/e-alerts/2007-03-05.html
> >
> >    Sometimes it isn't a CivicAccess issue, but other timing/funding
> > issues that cause these things.  There are other reasons why that
> > specific e-alert pointed to parl.gc.ca rather than using the already
> > purchased postal-code --> EDID database.
> >
> >
> >    We have 308 MPs (plus the Minister for Public Works) with information
> > about each, and I don't know of any group that is maintaining a table of
> > information about these MPs.   While we at MPH keep the name and email
> > address updated in our database, we didn't keep the phone number or
> > constituency office updated which is one of the things we were wanting
> > people to look up.
> >
> >
> >    Does anyone know of a structured WIKI, something that would allow a
> > group to collaboratively maintain a table of information?
> >
> > --
> >   Russell McOrmond, Internet Consultant: <http://www.flora.ca/>
> >   Please help us tell the Canadian Parliament to protect our property
> >   rights as owners of Information Technology. Sign the petition!
> >   http://www.digital-copyright.ca/petition/ict/
> >
> >   "The government, lobbied by legacy copyright holders and hardware
> >    manufacturers, can pry my camcorder, computer, home theatre, or
> >    portable media player from my cold dead hands!"
> >
> > _______________________________________________
> > CivicAccess-discuss mailing list
> > [hidden email]
> > http://civicaccess.ca/mailman/listinfo/civicaccess-discuss_civicaccess.ca
> >
>
>
> --
> Change the world one loan at a time - visit Kiva.org to find out how