This is pretty cool:
"Soda vs. Pop with Twitter"
http://blog.echen.me/2012/07/06/soda-vs-pop-with-twitter/

However, one of the problems with using Twitter data is that the
license explicitly prohibits the collection of data for redistribution.
This means that if you collect some tweets, do some analysis on them,
and publish a paper, you cannot make your tweet collection available
to others.
This has significantly impacted the research area that I work in
(natural language processing and information retrieval) to the extent
that research corpora (text data sets) have been yanked after Twitter
told the researchers to cease and desist.
These two corpora have been yanked:
1 - http://snap.stanford.edu/data/twitter7.html
2 - http://homepages.inf.ed.ac.uk/miles/papers/socmed10.pdf

A few corpora are still available, but they require a signed
agreement that you will not redistribute them
( http://trec.nist.gov/data/tweets/tweets2011-agreement.pdf ).
So while there are various instances of very useful big data out
there, most of them are in the hands of private companies that do not
have open data policies. They all have much more data on most of us
than any single government.
Related: How Recent Changes to Twitter's Terms of Service Might Hurt
Academic Research,
http://www.readwriteweb.com/archives/how_recent_changes_to_twitters_terms_of_service_mi.php

-Glen
--
http://zzzoot.blogspot.com/