Using Twitter as a source for regional linguistic data (and R for the analysis)
Posted by Glen Newton on Jul 20, 2012; 10:43pm URL: http://civicaccess.416.s1.nabble.com/Using-Twitter-as-a-source-for-regional-linguistic-data-and-R-for-the-analysis-tp4900.html
However, one of the problems with using Twitter data is that the
license explicitly prohibits collecting data for redistribution.
This means that if you collect some tweets, do some analysis on them,
and publish a paper, you cannot make your tweet collection available
to others.
This has significantly affected the research area that I work in
(natural language processing and information retrieval), to the extent
that research corpora (i.e., text data sets) have been yanked after
Twitter told the researchers to cease and desist.
So while there are various instances of very useful big data out
there, most of them are in the hands of private companies that do not
have open data policies. These companies all have much more data on
most of us than any single government.