Individuals publish their genetic data as open source, releasing all rights to them

Pierrot Péladeau
Another instance of a combination of trends in some circles:

    * the publishing of one's own medical data
    * the application of open data principles to one's own data
    * the crowd sourcing of research, and
    * some faith in the generally beneficial net effects of information
      technologies (wonders if many Arabs or Chinese would do the same)

Pierrot

------------------------------------------------------------------------


    Open Sourcing My Genetic Data

By Manu Sporny <http://manu.sporny.org/author/manusporny/> on *February 12,
2011* in Genetics <http://manu.sporny.org/category/genetics/>, with 29
comments <http://manu.sporny.org/2011/public-domain-genome/#comments>
Permalink: <http://manu.sporny.org/2011/public-domain-genome/>

http://manu.sporny.org/2011/public-domain-genome/

Today, I published all of my known genetic data as open source and
released all my rights to the data
<https://github.com/msporny/dna/raw/master/README>. Roughly 1 million of
my genetic markers are now in the public domain. I believe that I’m one
of the first people in the world to commit my genetic data into a
decentralized source control system
<https://github.com/msporny/dna> [ed: orta
<https://github.com/orta/dna> was the first]. The first reactions that I
received when I told some of my friends that I was going to do this were
a combination of shock and skepticism.

/“Why would you do something like that?”/
/“Aren’t you afraid that somebody is going to use that against you?”/
/“What if your healthcare provider got a hold of that? They’d love to
look through it in order to deny you for some pre-existing condition!”/
/“Ugh, I’d never want to know that sort of stuff about myself!”/
/“What if somebody clones you!?”/

I’ve thought long and hard about each of those questions and the many
more that you ask yourself before publishing this sort of personal data.
There are large privacy implications in doing this. However, speaking
solely for myself, I think the benefits outweigh the drawbacks. I’ll
explain my thought process behind each of those questions in a separate
blog post.

However, the result of that thought process is that I’m releasing my
genetic data today – that’s what I’d like to focus on in this blog post.
So, let’s explore exactly what this data is and how I hope people that
write software will use it.


    Your Genetic Code

There is a website called 23andme.com <http://www.23andme.com/> that is in
the business of analyzing your DNA. To become a member of the service,
you pay a fee, they send you a test tube, you spit in the test tube and
send it back to them. They then take your spit and place it onto
something called a *genotyping beadchip*. In this particular case, my
spit was placed onto the /Illumina OmniExpress Plus Genotyping Beadchip/.
This particular chip is capable of detecting around one million genetic
markers. These markers are called *single-nucleotide polymorphisms*, or
SNPs <http://en.wikipedia.org/wiki/Single-nucleotide_polymorphism>
(pronounced ‘snip’) for short.

In combination, these SNPs can tell you quite a bit about your genetic
makeup: things such as your eye color, hair color, hair curl, whether
you are at an increased risk for diabetes, where your ancestors came
from, or even whether you’re resistant to HIV or have the type of
muscles that would make you a good sprinter.

There are around 10 million SNPs in the human genome; the Illumina chip
can currently analyze around 1 million of them (966,977 – to be exact).
Of those roughly 1 million pieces of data, all of science only knows
what around 14,515 of them do. Of the SNPs that we know about, we’re
still shaky about all of the things that many of them affect – we’re not
so sure about what the data is telling us. On the 23andme site, they
only list around 160 SNPs and their effect on you. This means that of
the raw data I’m publishing today, science still doesn’t know what
952,462 of these markers do. Talk about a treasure trove of information,
just waiting to be unlocked! As science marches steadily onward, we’ll
learn more about each one of those 952,462 markers and how they affect
how we are born, grow, live and die.
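
The count of unexplained markers follows directly from the figures
quoted above. A quick check of the arithmetic, using only numbers from
this post (the variable names are just for illustration):

```python
# Counts quoted in the post.
snps_on_chip = 966_977     # markers the Illumina chip reads
snps_understood = 14_515   # markers whose effect is known

# Markers in the raw data whose effect is still unknown.
snps_unexplained = snps_on_chip - snps_understood

# Share of the chip's markers that are understood at all.
share_understood = snps_understood / snps_on_chip

print(snps_unexplained)            # 952462
print(f"{share_understood:.1%}")   # 1.5%
```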

One of the best features of 23andme is that they allow you to download
your entire genetic profile from the Illumina chip in a raw,
non-proprietary format. This is very big news for people that are
capable programmers. It means that for the first time in history, there
is an inexpensive service that can extract, decode and export your
genetic information to a non-proprietary file format.
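
A minimal sketch of reading such a file in Python. The post does not
describe the layout, so this assumes the commonly seen one: comment
lines starting with "#", followed by tab-separated rows of rsid,
chromosome, position and genotype. The names Marker and
parse_raw_genome, and the sample rows, are made up for illustration;
check the header of a real export before relying on the column order.

```python
import io
from typing import Dict, Iterable, NamedTuple


class Marker(NamedTuple):
    rsid: str
    chromosome: str
    position: int
    genotype: str


def parse_raw_genome(lines: Iterable[str]) -> Dict[str, Marker]:
    """Index markers by rsid, skipping comment and blank lines."""
    markers: Dict[str, Marker] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Assumed column order: rsid, chromosome, position, genotype.
        rsid, chromosome, position, genotype = line.split("\t")
        markers[rsid] = Marker(rsid, chromosome, int(position), genotype)
    return markers


# Two made-up rows standing in for a real export.
SAMPLE = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tAA\n"
    "rs0000002\tX\t2000\tCT\n"
)

genome = parse_raw_genome(io.StringIO(SAMPLE))
print(len(genome))                    # 2
print(genome["rs0000002"].genotype)   # CT
```

With a real export, the same function can be handed an open file
object instead of the in-memory sample.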


    Commit-ment

As an open source software developer, there are certain commits that you
make to a public source code repository that leave you feeling better
about the state of the world. This was certainly one of them for me:

msporny@tao:~/work/dna$ git add ManuSporny-genome.txt
msporny@tao:~/work/dna$ git commit -a
[master a08b027] Added my genome into source control.
  1 files changed, 966992 insertions(+), 0 deletions(-)
  create mode 100644 ManuSporny-genome.txt

Doing that made me realize how quickly we’re narrowing in on some of the
most debilitating human diseases. It gave me hope that our children may
enjoy a far better quality of healthcare than we do today. Most of all,
it gave me hope that we will be able to better help the nurses, doctors
and medical researchers as a society – more than with just money, but
with our time, expertise and energy. That commit sent chills up my spine
– to me, it symbolized a brighter future for all of us.

So, now that all of us can get a hold of that data, what can we do with it?


    Analyzing your Genetic Data

23andme does a great job giving you reports on research that they’re
confident of. For example, my risk of Age-related Macular Degeneration
is 13.4%. The average is 7% – which means that I’m about
1.91 times more likely than the average person to start losing my
eyesight as a result of old age. This makes sense as one of my
grandparents has a bad case of age-related macular degeneration. There
are around 160 of these types of reports that you get with your 23andme
data, but what if you want to dive deeper into your genetic code?
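
The 1.91 figure is simply the ratio of the two percentages quoted
above. A small sketch of that calculation (relative_risk is a
hypothetical helper name, not something 23andme provides):

```python
def relative_risk(personal_risk: float, average_risk: float) -> float:
    """Ratio of one person's risk to the population average."""
    return personal_risk / average_risk


# Percentages quoted in the post: 13.4% personal, 7% average.
ratio = relative_risk(13.4, 7.0)
print(f"{ratio:.2f}")  # 1.91
```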

Code is code, whether it is 1s and 0s or As, Gs, Cs and Ts. Analyzing code
and data is something that many computer scientists do quite often and
quite well. Think of the amount of data that Facebook, Google and
Twitter deal with on a daily basis. Think about how quickly you can
search over a trillion documents on Google (less than a second in most
cases).

Personally, I was expecting the same sort of instant searching and
analysis functionality on 23andme. It’s just not there. Don’t get me
wrong, 23andme is a great service and if this kind of stuff interests
you, you should definitely get a kit right now. The kits go on sale
twice a year. I got my spit analyzed for $150 total – it’s a deal, any
way that you look at it. That and you get instant access to your raw
data – that’s the best part.

However, searching through your raw data on 23andme sucks. Remember,
there are only about 160 reports on the 23andme site, but there are over
14,515 SNPs that are known. If you want to find out more than just the
160 reports that 23andme has, there is this great website out there
called SNPedia.com <http://www.snpedia.com/>. SNPedia is basically the
Wikipedia of genetic information.

Keep in mind there are usually many SNPs that come into play for traits
like eye color, hair color or certain types of cancer, or where your
ancestors came from. 23andme does the heavy lifting for most of their
reports, but there are many SNPs that they don’t show you in their
reports. So, if you want to find out about anything that is not on the
23andme site, you have to manually search for the SNPs you’re looking
for on SNPedia. To make this even more difficult, SNPs have fairly
opaque names like rs1815739 <http://www.snpedia.com/index.php/Rs1815739>.

If you are looking for more than 1 SNP, it can take a long time. You
have to first look up the original SNP that interests you on SNPedia.
Once you have the original marker on the screen, it might link to
upwards of 10 additional SNPs that affect the trait you’re researching.
You have to manually type in each SNP one-by-one into the 23andme site,
click “Search”, write down the sequence for that SNP, such as “GG” or
“AA” and repeat this process for as many SNPs as you’re looking for.
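
Once the raw data is on your own machine, that one-by-one routine can
be a single batch lookup. A minimal sketch, assuming the genome has
already been loaded into a dictionary mapping each rsid to its
genotype; the genotypes below are made up, and lookup_genotypes is a
hypothetical name:

```python
from typing import Dict, Iterable


def lookup_genotypes(genome: Dict[str, str],
                     rsids: Iterable[str]) -> Dict[str, str]:
    """Return the genotype for each requested rsid.

    Markers that are not in the data are reported as "--".
    """
    return {rsid: genome.get(rsid, "--") for rsid in rsids}


# Made-up genotypes for illustration only.
genome = {"rs1815739": "CT", "rs0000001": "GG", "rs0000002": "AA"}

# The list of markers collected from a SNPedia page, for example.
wanted = ["rs1815739", "rs0000002", "rs9999999"]

results = lookup_genotypes(genome, wanted)
for rsid, genotype in results.items():
    print(rsid, genotype)
```

Reporting missing markers as "--" rather than raising an error keeps
the output complete, since no chip reads every SNP you might ask about.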


    What can Web Programmers do for Genetics?

Manually searching for these markers is unnecessarily time consuming.
Doing stuff like this is why we have computers – they’re good at
computing! Your genetic data fits in 25 Megabytes of memory – a tiny,
tiny fraction of the tiniest USB thumb-drive. This genetic data is the
equivalent of 5 MP3 songs, a small website, or 5-7 high resolution
digital photos. You can type a Google search for “eye color” and get
back a result in less than a second after searching /the entire
Internet/. Why can’t you do that for your genetic data?

I think programmers, especially Web programmers, can do better. That’s
the driving reason that I’m releasing this data into the public domain.
I’d like to see an open source website that can search SNPedia in the
blink of an eye – just like Google Instant does. If I type in “blood
type” it should tell me all of the things it can find out about my blood
type. If I type in “eyes”, it should be able to tell me everything that
it knows about me concerning macular degeneration, eye color, etc. There
is a lot of data out there on SNPedia; we just need a nice, personalized
interface to work with it.
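
A rough sketch of the core of such a search: match the typed words
against trait descriptions, then show the person's own genotype for
each hit. Everything here is a stand-in. The annotations are invented
rather than taken from SNPedia, the genotypes are made up, and search
is a hypothetical function name; a real site would need actual
annotation data and a proper index.

```python
from typing import Dict, List, Tuple

# Hypothetical annotations (rsid -> trait description), invented here.
ANNOTATIONS: Dict[str, str] = {
    "rs0000001": "eye color",
    "rs0000002": "age-related macular degeneration (eyes)",
    "rs0000003": "blood type",
}

# Made-up genotypes standing in for one person's raw data.
GENOME: Dict[str, str] = {
    "rs0000001": "GG",
    "rs0000002": "AC",
    "rs0000003": "TT",
}


def search(query: str,
           annotations: Dict[str, str],
           genome: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (rsid, trait, genotype) for each trait containing the query."""
    needle = query.lower()
    return [
        (rsid, trait, genome.get(rsid, "--"))
        for rsid, trait in sorted(annotations.items())
        if needle in trait.lower()
    ]


hits = search("eye", ANNOTATIONS, GENOME)
for hit in hits:
    print(hit)
```

A plain substring match over a few thousand descriptions is fast
enough to run on every keystroke, which is what an instant-search
interface needs.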

That’s just one idea, though. There are thousands of other ideas hidden
away out there on the Web. One of them may be hiding in that beautiful
brain of yours. I hope that you will share this story with other people
that may be interested in helping us to reduce suffering in the world. I
hold great hope for this new technology – we are primed for some amazing
health-related advances in our lifetime. If you know how to program,
design or write – you can help. You can start by blogging or tweeting
about this post, or you can:

Download Manu Sporny’s genetic data. <https://github.com/msporny/dna>


Re: Individuals publish their genetic data as open source, releasing all rights to them

catherine
It is interesting to note that 23andMe was co-founded by Anne Wojcicki,
who is married to Sergey Brin of Google. Also of interest, Google invested
a total of 11.9 million dollars in 23andMe.

http://en.wikipedia.org/wiki/23andMe


--
Catherine Roy
http://www.catherine-roy.net



On Sun, February 13, 2011 11:54 am, Pierrot Péladeau wrote:

> Another instance of a combination of trends in some circles :
>
>     * the publishing of one's own medical data
>     * the application of open data principles to one's own data
>     * the crowd sourcing of research, and
>     * some faith in the generally beneficial net effects of information
>       technologies (wonders if many Arabs or Chinese would do the same)
>
> Pierrot
>
> _______________________________________________
> CivicAccess-discuss mailing list
> http://lists.pwd.ca/mailman/listinfo/civicaccess-discuss
>