
Re: [ccp4bb] New human genome policy - please read.


Subject: Re: New human genome policy - please read.
From: Warren DeLano warren {- at -} DELSCI {- dot -} COM
Date: 2009-04-01

You know Kevin,

April Fools notwithstanding, your idea actually makes good sense in a
tiny-url sort of way. There would of course be collisions, and thus,
need for a global disambiguation registry, but society could do a whole
lot worse than something like:

http://prot.seq.db/3fc28e91d74b39ec/a6

(translated: protein sequence hash #3fc28e91d74b39ec, registry index
#a6)

as a way of unambiguously storing, referring to, and retrieving known
sequences.

The URL, when requested, would of course simply return the registered
sequence. Keeping the scope extremely narrow like that would be the key
to the registry's success: just "natural 20" sequences with no
annotations.

Optimal details might differ of course (CRC64 is suboptimal for ASCII
sequences), but as a general concept, I do think you're on to something
powerful here...

Cheers,
Warren
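
As a rough illustration of the registry idea above (not something
specified anywhere in this thread), the sketch below hashes a "natural
20" sequence to a 64-bit hex name, disambiguates collisions with a
per-hash registry index, and returns the bare sequence on lookup. The
class and function names are hypothetical, prot.seq.db is only the
placeholder host from the message, writing the index in hex is an
assumption, and a truncated SHA-256 stands in for CRC64 because Python's
standard library has no ready-made CRC64 function.

```python
import hashlib

NATURAL_20 = frozenset("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids
BASE_URL = "http://prot.seq.db"  # placeholder host from the message


def sequence_hash(seq: str) -> str:
    """Return a 64-bit name: the first 16 hex digits of SHA-256.

    Assumption: truncated SHA-256 is swapped in for the CRC64 in the thread.
    """
    return hashlib.sha256(seq.encode("ascii")).hexdigest()[:16]


class SequenceRegistry:
    """Hypothetical in-memory registry: hash -> list of registered sequences."""

    def __init__(self):
        self._by_hash = {}

    def register(self, seq: str) -> str:
        """Store a sequence (if new) and return its URL."""
        seq = seq.strip().upper()
        if not seq or not set(seq) <= NATURAL_20:
            raise ValueError("only non-empty 'natural 20' sequences are accepted")
        name = sequence_hash(seq)
        bucket = self._by_hash.setdefault(name, [])
        if seq not in bucket:
            bucket.append(seq)  # a colliding sequence gets the next index
        return f"{BASE_URL}/{name}/{bucket.index(seq):x}"

    def resolve(self, url: str) -> str:
        """Return the registered sequence for a URL, with no annotations."""
        name, index = url.rstrip("/").rsplit("/", 2)[-2:]
        return self._by_hash[name][int(index, 16)]


if __name__ == "__main__":
    registry = SequenceRegistry()
    url = registry.register("MKTAYIAKQR")  # made-up example sequence
    print(url, "->", registry.resolve(url))
```

Keeping the index local to each hash means the global registry only has
to arbitrate between sequences that actually collide.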

> -----Original Message-----
> From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of
> Kevin Cowtan
> Sent: Wednesday, April 01, 2009 5:02 AM
> To: CCP4BB@JISCMAIL.AC.UK
> Subject: Re: [ccp4bb] New human genome policy - please read.
>
> Why molecular weight? That's just arbitrary.
>
> There is a simple way of referring to proteins which avoids any
> ambiguity - by its sequence. When referring to a protein, we should use
> its sequence as an identifier. No ambiguity.
>
> Now, I understand that some smart people in America are now solving
> proteins of more than a dozen aa in length. For these, quoting the whole
> sequence could be a bit long. Fortunately this is a solved problem: all
> we need to do is quote a CRC64 hash of the ASCII representation of the
> protein sequence. This gives a name space big enough that we can name
> about 4 billion proteins before the probability of a name clash becomes
> significant.
>
>
> James Stroud wrote:
> > I think actually *naming* the proteins would be too extreme. Even the
> > current alpha-numeric system is overwrought. I liked it better when we
> > just called proteins "p75" or "p105". For instance, how many proteins in
> > the human genome are 75 kD, anyway? My guess is not enough to make the
> > situation ambiguous in any catastrophic way.
>
>
>
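
The "about 4 billion" figure quoted above matches the usual birthday
estimate for a 64-bit name space. The sketch below evaluates that
approximation, p ~= 1 - exp(-n(n-1)/(2N)) with N = 2**64. It assumes an
ideal hash whose outputs are uniformly distributed, which CRC64 over
ASCII sequences may not be; the function name is hypothetical.

```python
import math


def clash_probability(n_names: int, bits: int = 64) -> float:
    """Approximate chance that at least two of n_names share a hash value.

    Assumes hash outputs are uniform over 2**bits values (an idealisation).
    """
    space = 2 ** bits
    # Birthday approximation; expm1 keeps very small probabilities accurate.
    return -math.expm1(-n_names * (n_names - 1) / (2 * space))


if __name__ == "__main__":
    for n in (10**6, 10**8, 2**32):
        print(f"{n:>13,d} names -> p = {clash_probability(n):.3g}")
```

Under that assumption the clash probability at 2**32 names (about 4.3
billion) comes out near 0.39, so a clash is already more likely than not
shortly beyond that point.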


ProteinCrystallography.org: Copyright 2006-2010 by Quid United Ltd