CivicAccess

Toronto Sun: Toronto’s data open but almost useless

William Wolfe-Wylie

Toronto Sun: Toronto’s data open but almost useless

Hi everyone,

I have to say, I'm very happy that this piece is drawing so much
interesting and well-thought-out discussion. Ultimately that's what
every writer hopes for. But, as Karl invited me, I thought I'd weigh
in after reading all your points.
I think there are some heavy cultural biases infiltrating the discussion
from your side and my side. You're coming at open data from the
perspective of developers and engaged community members who have clear
ideas about the evolution and development of open data and open
government policies.

I'm coming at it as a technically-savvy-but-not-a-developer-guy who is
currently training a lot of journalists (read: extraordinarily
non-techie people) on how to use open data sites to write stories
about how their communities are run, are being developed and to ask
questions about those processes. This is a group of people for whom
shape files are completely foreign. CSV is even a stretch for these
guys. KML? What's that mean?

I think James came the closest to interpreting my article the way I
had intended. You guys are able to use this data. My skill level might
— and I emphasize "might" — allow me to write a small application that
would read one of Toronto's XML files and display that information in
a cute way. And it would take a lot of self-teaching to produce that
app. My colleagues consider me the uber-nerd who knows everything
there is to know about computers. The vast majority of Torontonians
are at the skill level of my colleagues.

Toronto's open data is admirable, and I mention that at several points
in the article. That they're re-shaping the mentality of city
departments to observe open data principles is nothing short of a
miracle in many cases. But there's a difference between making raw
data available and respecting the rights of people like my colleagues
to ask questions about their city.

I agree with you all, after reading your discussion, that my
comparison of accessible data through KML and CSV formats is an
inappropriate apples-and-oranges comparison in the context of true
open data. What I had meant to suggest was that the simultaneous
publication of the raw data useful to your community, along with more
easily accessible formats more appropriate for my community, would
make the information more universally readable to every
politically-curious citizen, not just those with computer training.
The more brains we have looking at data, after all, the better for
everyone.

I also agree with you that an increasing part of government's role in
society is community building. Part of that process will be preventing
communities from being isolated from each other. In this case, techies
need to be speaking to non-techies so they can listen to each other's
needs. As Michael noted, as well, "...particularly in Canada, where
there's no significant institutional/nonprofit presence around open
data..." Until that happens, I believe government should fill that
role and, if necessary, run basic conversions on the data before it's
published so the average person can interact with it. That was the
point of the article: that it is possible to build bridges that meet
the needs of both communities.

Thanks again for your thoughts; it's been enlightening.

William

----
William Wolfe-Wylie
@wolfewylie

Heather Morrison-2

Re: Toronto Sun: Toronto’s data open but almost useless

This discussion has indeed been enlightening; thanks to William Wolfe-
Wylie for his column and post to the list.

When a discussion like this brings out two perspectives, that doesn't
mean that one is right and the other wrong, or even that one is more
important than the other. We need governments to move towards open
data (both releasing what they have and planning on future data-
gathering with a view to open sharing). We also need governments to
provide information in a way that is accessible to the public.

Hiding data that is not readable for most of us helps no one.
Releasing it without making it readable for the public helps to
illustrate the potential for government information that is actually
understandable by the public. Seeing the possibility and the gap with
the reality is a great way to begin to inspire change for something
better. William's article is a great start on this path.

best,

Heather Morrison
http://pages.cmns.sfu.ca/heather-morrison/
[hidden email]

Tracey P. Lauriault

Re: Toronto Sun: Toronto’s data open but almost useless

Wholly great!

William, the first question that came to mind was: what stories do the people you are working with want to tell with data, and what data, in what formats, do they need to tell those stories? For instance, journalists may not want to use the KML data anyway, as it will not help them tell their stories irrespective of format.

William, do you think your gang can use xls files? What if there were instructions on how to use .csv etc.? What could make it easier?

In terms of shape files, there is not much to be done, as mapping has its own particular technologies that are just not yet user-friendly; they are getting better, but there is still a long way to go. Journalists do not become GIS experts; it takes time to do that, just as journalists do not become engineers. Sometimes people do both, but rarely. Even the NYTimes folks get it wrong sometimes, and they have an entire visualization team working. There are some data that you just cannot remove from their technological context; GIS files are among those, and potentially many of the KML files.

I work with close to 1000 users based in community groups, school boards, shelters, municipalities, etc., doing community-based local research using demographic data (http://www.communitydata-donneescommunautaires.ca/). In their case, .xls and Beyond2020, which is a format StatCan uses, work for them. B2020 has its issues; however, for complex multidimensional data it is a relatively easy tool to use. It is not open source, but it is free. Some of the 1000 are super users and will put data into other statistical analysis packages, while the rest will remain in the realm of descriptive stats and make great newsletters with charts and graphs in them. Different skills, different needs, different data and different tools.

Some of the NGOs do mapping, but very few, as none of the mapping software is easy, it is expensive, and some geography knowledge is required to make useful and reliable maps, just as some skill is required to write a good investigative news piece. Most of the groups do not work in open source environments, as the support is just not there for them yet, especially on the mapping side. They do work with standards-based web stuff, though. None would use KML data, but they can use .csv and join those files into their ESRI, MAPINFO or GeoClip systems if they have the necessary base maps. Community groups know what .csv files are, and journalists can learn that in a second.

There are some interesting public health initiatives that are building charts and mapping tools into their data sharing and information dissemination platforms: EMIS in Montreal (http://emis.santemontreal.qc.ca/) and CommunityView Collaboration in Saskatoon (http://www.communityview.ca/). They create information products suited to the needs of their networks and users, not always open source, but interoperable, and because of StatCan licencing they cannot give all their data away unless there is some value added. The open data folks might not consider these to be open data products, but I do consider them to be accessible and incredibly useful public policy products. Their mandate is to inform their network and the public about social inequity and health and, hopefully, to change behaviour. They therefore take more of an information dissemination and storytelling approach than a raw data approach, although EMIS has more raw data available. In both cases they have super limited budgets, and we all know how highly regarded social policy is in Canada, so they do the best they can with the resources they have. They have thousands of active users.

People give the Wellbeing Toronto initiative (http://map.toronto.ca/wellbeing/) a hard time; however, it is meeting the needs of those it was intended for: the public, public policy officials, and community groups who want packaged information, who cannot work with raw data but who do need information disseminated this way to inform their work. Again, open data people do not do social policy work, as it takes time and skill to do that work, and the people who do it cannot be expected to be technically savvy either. Different specializations, different skills, needs and expertise. Also, because not all of the data in the system is City of Toronto data, they are bound by the data sharing agreements and licenses of data contributors. It is just the sad reality in Canada that our public institutions are so restrictive with social, demographic and health data.

You are doing a great service by helping people out. The UK Guardian already had tech-savvy people in house, but that is not the case for journalism in Canada generally. We at the Community Data Consortium spend lots of time doing capacity building, and some NGOs are doing great community mapping work, like the Social Planning Network of Ontario (http://ganis.spno.ca/), which built GIS capacity in 14 Ontario Social Planning Councils. Your journalists may want to touch base with them and collaborate on telling local stories. The Hamilton Spectator did a great Code Red series that included stories, maps, charts and other data, and collaborated with the Social Planning Council of Hamilton, who had the data and the skills to help them out.

The Data Liberation Initiative (http://www.statcan.gc.ca/dli-ild/dli-idd-eng.htm) folks in research libraries in Canada are another great resource for students and faculty who have a university in their town. Again, they are restricted by StatCan in what they can disseminate, but they do have lots of capacity building tools on their site.

Also, give a shout or email me and I can connect you with lots of local people doing social policy analysis with data who could help you out.

Cheers

Tracey

On Thu, Jul 7, 2011 at 10:15 PM, Heather Morrison <[hidden email]> wrote:

_______________________________________________
CivicAccess-discuss mailing list
[hidden email]
http://lists.pwd.ca/mailman/listinfo/civicaccess-discuss

--
Tracey P. Lauriault
613-234-2805

http://traceyplauriault.ca/

David Eaves

Re: Toronto Sun: Toronto’s data open but almost useless

In reply to this post by William Wolfe-Wylie

William - think it is great you wrote that article. Thank you for
commenting here as well.

Would love to hear more about the training you do for other journalists.

cheers,
dave

On 11-07-07 6:53 PM, William Wolfe-Wylie wrote:


William Wolfe-Wylie

Re: Toronto Sun: Toronto’s data open but almost useless

Thanks for the interest and responses, everyone. Journalists, as I'm
sure many of you are aware, are notorious for thinking they know more
about many subjects than they do (just read most economics journalism
for good examples). Data is no exception to the rule. Journalists are
knowledgeable in many fields, but experts in few. My goal is to make
them knowledgeable in reading and analyzing city data files so they
can start to take advantage of this enormous resource in their
research and story idea generation.

Tracey, to answer your question: CSV and XLS are formats that they can
easily become familiar with and that's where we're starting our
efforts. Making a journalist GIS-competent is an enormous challenge
and one that I don't think we can start to tackle while newsrooms are
as frantic as they are. But what we can do is make the information
visible so that journalists — and everyday citizens — can start to ask
questions around that information (and yes, I recognize that it's at
this point in the data transformation process that we cross over into
information).

Journalists, who for most of the last century have been tasked with
organizing information for broad consumption, are now finding
themselves out of their element in the new tech-driven world. My job
is to try to ease that burden a little bit. We're starting with the
basics: Storify, Twitter, Facebook and search/curation tools
associated with those platforms. From there we're teaching them about
Fusion Tables, Google's Refine tool, Yahoo's Pipes tool and other
pre-packaged tools that require slightly more skill. Maybe one day
we'll get them using tools like QGIS to work with shape files as well.

All of this is a challenge. We're dealing with people who can't
remember their email passwords and think BlackBerry OS4 is cutting
edge. That's detrimental to our readers in many ways, makes their
research more difficult and prevents them from being able to ask
hard-hitting questions. We don't have time to wait for developers to
do the analyzing for us, we need to be able to grab it, read it, ask
about it, and write about it.

I'd be interested to hear from people on this list what the main
concerns are with converting shapefiles to KML. What kind of
information is lost in the conversion? How detrimental is that? What
about converting XML files that are commonly found on open data sites
to CSV? Or, again, am I falling into an apples-and-oranges comparison?

Thanks again, everyone.

William

On Fri, Jul 8, 2011 at 12:21 AM, David Eaves <[hidden email]> wrote:

> William - think it is great you wrote that article. Thank you for commenting
> here as well.
>
> Would love to hear more about the training you do for other journalists.
>
> cheers,
> dave

Karl Dubost

Re: Toronto Sun: Toronto’s data open but almost useless

William,

On 8 Jul 2011, at 20:50, William Wolfe-Wylie wrote:
> Tracey, to answer your question: CSV and XLS are formats that they can
> easily become familiar with and that's where we're starting our
> efforts. Making a journalist GIS-competent is an enormous challenge
> and one that I don't think we can start to tackle while newsrooms are
> as frantic as they are.

This "issue" is not only happening in the press world but, in fact, a bit everywhere across the spectrum, to different degrees. The jobs are just called something a bit different in each domain. Basically, with the rise of computing and easy collection of data comes the question of "What's next?" or "What do we do with these data?".

I do not think journalists should necessarily become data scientists or programmers in order to analyze data. Some will, for sure. I believe more in hiring people with these skills. NYT and Guardian did it (with certainly bigger budgets).

Last year, last.fm (music) was looking for a "data griot" (I love the term)
http://anti-mega.com/antimega/2010/07/04/griotism

Flickr (photos) has positions open for "Data engineers"
http://www.flickr.com/jobs/data_engineer/

Etsy (indie marketplace) has positions for "Data Analysis Engineer"
http://www.etsy.com/careers/job_description.php?job_id=o1kFVfwF

All of these companies are "tech" companies with a massive load of data.
These need competencies.

Press organizations and cities will need these competent people too.
As usual, the most tech/data savvy will explore the field. Some people will change their career path, but eventually, with more understanding that this needs dedicated resources, budgets will be created in the city for opening such positions. And sincerely, that is good. Though it will take time, pain, and a lot of seduction ;)

--
Karl Dubost
Montréal, QC, Canada
http://www.la-grange.net/karl/

James McKinney

Re: Toronto Sun: Toronto’s data open but almost useless

In reply to this post by William Wolfe-Wylie

On Fri, Jul 8, 2011 at 8:50 PM, William Wolfe-Wylie
<[hidden email]> wrote:
> I'd be interested to hear from people on this list what the main
> concerns are with converting shapefiles to KML. What kind of
> information is lost in the conversion? How detrimental is that? What
> about converting XML files that are commonly found on open data sites
> to CSV? Or, again, am I falling into an apples-and-oranges comparison?

Not all XML can be converted to CSV. However, some can be, like at the
following link:
http://ville.montreal.qc.ca/pls/portal/portalcon.contrevenants_recherche?p_mot_recherche=,tous,
In that file, there is a list of <contrevenant> elements within a top-level
<contrevenants> tag. Each <contrevenant> is a record, or row in CSV.
Within each, there is only one level of nesting. Each item at that
level can be converted to a CSV column. More complex XML files, e.g.
one in which tags have attributes, could not be converted to CSV
easily or at all.
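
To make the flat case above concrete, here is a minimal sketch in Python using
only the standard library. The <contrevenants> and <contrevenant> names come
from the Montreal file; the child fields (nom, adresse, montant) and the sample
values are made up for illustration and are not the real file's fields. It
assumes exactly one level of nesting and no attributes, as described above.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data: the child tags and values are invented for
# illustration; only the <contrevenants>/<contrevenant> names are real.
SAMPLE_XML = """<contrevenants>
  <contrevenant>
    <nom>Example A</nom>
    <adresse>1 Main St</adresse>
    <montant>500</montant>
  </contrevenant>
  <contrevenant>
    <nom>Example B</nom>
    <adresse>2 Side St</adresse>
    <montant>750</montant>
  </contrevenant>
</contrevenants>"""


def flat_xml_to_csv(xml_text, record_tag):
    """Convert XML records with one level of nesting into CSV text."""
    root = ET.fromstring(xml_text)
    records = root.findall(record_tag)

    # Each distinct child tag becomes a column, in first-seen order.
    columns = []
    for record in records:
        for child in record:
            if child.tag not in columns:
                columns.append(child.tag)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Each record becomes one row; missing fields are left empty.
        writer.writerow({c.tag: (c.text or "").strip() for c in record})
    return out.getvalue()


csv_text = flat_xml_to_csv(SAMPLE_XML, "contrevenant")
print(csv_text)
```

Attributes and deeper nesting are silently ignored by this sketch, which is
exactly where the "not easily or at all" cases begin.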

For what I'm interested in, most shapefiles I encounter can be
converted to KML without detriment. However, if I want to, for
example, merge two shapefiles to find all police stations (from one
file) that fall within the boundaries of a municipality (within
another file), I need to use tools that work on shapefiles.
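
As a rough illustration of what such a tool does underneath, here is a sketch
of a point-in-polygon test (the ray-casting method) in plain Python. The
boundary and station coordinates are hypothetical, and real work would read the
actual shapefiles with a GIS library and deal with projections, holes and
multi-part boundaries, none of which this handles.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices. Points exactly on an edge
    are not treated specially in this sketch.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray going right from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical municipal boundary (a simple square) and station locations.
municipality = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
stations = {
    "Station 1": (2.0, 3.0),
    "Station 2": (12.0, 5.0),
    "Station 3": (9.5, 9.5),
}

inside_stations = sorted(
    name
    for name, (x, y) in stations.items()
    if point_in_polygon(x, y, municipality)
)
print(inside_stations)
```

The point is only that the question "which of these points fall inside that
boundary" needs the geometry itself, which is what shapefile-aware tools work
with.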

Karl Dubost

(Open) Data jobs market

In reply to this post by Karl Dubost

And related to what I was saying about the data jobs market

On 8 Jul 2011, at 21:06, Karl Dubost wrote:
> Press organizations and cities will need these competent people too.
> As usual, the most tech/data savvy will explore the field. Some people will change their career path, but eventually, with more understanding that this needs dedicated resources, budgets will be created in the city for opening such positions. And sincerely, that is good. Though it will take time, pain, and a lot of seduction ;)

Spotted this today

In Jobs for Data Scientists Explode Across The Market

In what's likely just the beginning of a long-term
story, job listings indexed by employment search
engine Indeed.com indicate that market demand for
data scientists and people capable of working with
"big data" took a huge leap over the last year.
David Smith of Revolution Analytics performed
several related queries and posted the results
today on his company's blog.

More at http://www.readwriteweb.com/archives/jobs_for_data_scientists_explode_across_the_market.php

--
Karl Dubost
Montréal, QC, Canada
http://www.la-grange.net/karl/

Jonathan Brun-2

Re: (Open) Data jobs market

At 39 messages in this thread, this has got to be the longest one on Civic Access.

I just got around to reading it, sorry.

The one point that I would like to voice an inane internet opinion on is:

Developers do not have hearts of gold; in fact, they want hearts made of gold to mount on their wall above their 27-inch Apple Cinema Display, which is why they work on for-profit projects and not on open data.

In Montreal, there has been one person who has consistently volunteered his time on open data. About 4 others have put in a good number of hours. And about 40 people donated 1-2 days at our Hackathons. Montreal is a big city with lots of amazing developers, but as we have seen from the output here and elsewhere, it takes groups like MySociety, the Sunlight Foundation and the New York Times to build apps and visualizations that will not be very profitable. Until we have that in Canada and in Montreal, we will be stuck with a lot of unused open data, as indicated by the original article.

Have a great summer everyone!

P.S. Montreal Hackathon in September or October, stay tuned, we'll have stickers!

MontrealOuvert.net
jonathanbrun.com

On 2011-07-21, at 7:29 AM, Karl Dubost wrote:
