Description of protein
Proteins are large biological molecules with molecular weights of up to a few million Daltons. For convenience, protein weight is usually given in thousands of Daltons, or kiloDaltons (kDa). Proteins are made of amino acids linked into linear chains, called polypeptide chains. Neighbouring amino acids are joined by peptide bonds, each formed between the carboxyl group of one amino acid and the amino group of the next. A protein consists of one or several polypeptide chains. The sequence of each polypeptide chain is defined by a gene through the genetic code. Only 20 standard amino acids occur in living organisms. Sometimes these amino acids are chemically modified after protein synthesis. The total number of different proteins that can be built from 20 amino acids is enormous. For example, a sequence of only 10 amino acids can be arranged in 20^10 different ways, which is approximately 10^13, or 10 trillion, different sequences.
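The arithmetic behind this estimate is easy to check. The following minimal sketch counts the possible sequences for a chain of a given length, assuming that every position can independently hold any of the 20 standard amino acids. The function name count_sequences is only illustrative.

```python
STANDARD_AMINO_ACIDS = 20  # number of standard amino acids


def count_sequences(length: int) -> int:
    """Number of distinct sequences of `length` residues (illustrative helper)."""
    if length < 0:
        raise ValueError("length must be non-negative")
    # Each of the `length` positions can hold any of the 20 amino acids.
    return STANDARD_AMINO_ACIDS ** length


print(count_sequences(10))  # 10240000000000, i.e. about 10^13
```

The count grows by a factor of 20 with every added residue, which is why even short chains give an enormous number of possible sequences.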
Chains of 2 to 100 amino acids, with molecular weights of up to about 10 kDa, are usually called peptides. Longer polypeptides are classified as proteins. Other classifications appeal to the conformational stability of the chain: peptides can adopt many different conformations and switch between them at random, whereas proteins are structurally rigid, with one preferred conformation. These classifications are not strict and serve only as a guideline.
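As a rough illustration of the length-based guideline, the sketch below classifies a chain by its number of residues and estimates its weight. It assumes an average residue mass of about 110 Da, which is a common rule of thumb and not a value taken from this text; the function names are hypothetical. With this assumption, 100 residues correspond to about 11 kDa, close to the 10 kDa limit quoted above.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # assumption: rule-of-thumb average residue mass
PEPTIDE_MAX_RESIDUES = 100       # upper limit for peptides quoted in the text


def estimate_mass_kda(n_residues: int) -> float:
    """Approximate molecular weight in kDa from the number of residues."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


def classify_chain(n_residues: int) -> str:
    """Apply the length guideline: 2 to 100 residues is a peptide, longer is a protein."""
    if n_residues < 2:
        raise ValueError("a polypeptide chain needs at least 2 residues")
    return "peptide" if n_residues <= PEPTIDE_MAX_RESIDUES else "protein"


for n in (10, 100, 300):
    print(n, classify_chain(n), f"~{estimate_mass_kda(n):.1f} kDa")
```

Because the classification itself is only a guideline, the threshold is kept as a named constant that can be adjusted.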
It is almost impossible to estimate the total number of different proteins in nature. For example, about 3000 different proteins are known in the E. coli cell alone.
History of protein study
The word "protein" comes from the Greek proteios, which means "of primary importance". The name was introduced by Jöns Jacob Berzelius in 1838 for large organic compounds with almost identical empirical formulas. It was chosen because the compounds studied appeared to be primitive, yet very important for animal nutrition.
The next crucial step in protein research was made by James B. Sumner, who showed in 1926 that enzymes can be isolated and crystallized.
In 1955 Sir Frederick Sanger determined the complete amino acid sequence of the first protein, insulin. This was the first proof that proteins have a specific structure.
Starting in 1958, the three-dimensional structures of myoglobin and haemoglobin were solved by Sir John Cowdery Kendrew and Max Perutz, respectively. Both structures were solved by X-ray diffraction analysis.
Protein in living organisms
Generally speaking, proteins do almost everything in living cells. All functions of a living organism depend on proteins, and each protein or group of proteins is responsible for its own specific function. This is why proteins make up about half of the dry weight of a bacterial cell.
Classification by protein functions
Proteins carry out many different functions in the living cell, and they can be classified on the basis of these functions. Very often a protein has several functions and could be placed into more than one group, but it is usually still possible to assign each protein to one main group.
Enzymes - proteins that catalyze chemical and biochemical reactions inside and outside the living cell. This is probably the largest and most important group of proteins. Enzymes are responsible for all metabolic reactions in living cells. Well-known examples include DNA and RNA polymerases, dehydrogenases etc.
Hormones - proteins that regulate many processes in organisms. Hormones are usually quite small and can be classified as peptides. The best-known protein hormones are insulin, growth hormone, lipotropin, prolactin etc. Many protein hormones are precursors of peptide hormones, such as endorphins, enkephalins etc. This group can be extended to include all protein venoms.
Transport proteins - proteins that transport or store other chemical compounds and ions. Some of them are well known: cytochrome c - electron transport; haemoglobin and myoglobin - oxygen transport and storage; albumin - fatty acid transport in the blood stream etc. Transmembrane protein channels can be classified as transport proteins as well.
Immunoglobulins or antibodies - proteins involved in the immune response of the organism, where they neutralize large foreign molecules that may be part of an infection. Sometimes antibodies can act as enzymes. This group is sometimes treated as part of a larger group of protective proteins, which also includes lymphocyte antigen-recognizing receptors, antiviral agents such as interferon, and tumor necrosis factor (TNF). Blood clotting proteins, such as fibrin and thrombin, should probably be classified as protective proteins as well.
Structural proteins - proteins that maintain the structure of other biological components, such as cells and tissues. Collagen, elastin, α-keratin, sclerotin and fibroin are involved in forming the body of the whole organism. Bacterial proteoglycans and viral coat proteins also belong to this group. No other functions of these proteins are currently known.
Motor proteins - proteins that convert chemical energy into mechanical energy. Actin and myosin are responsible for muscular motion. Sometimes it is difficult to draw a strict line between structural and motor proteins.
Receptors - proteins responsible for detecting a signal and translating it into another type of signal. Sometimes these proteins are active only in complex with low molecular weight compounds. A very well-known member of this protein family is rhodopsin, a light-detecting protein. Many receptors are transmembrane proteins.
Signalling proteins - proteins involved in the signal translation process. They usually change conformation significantly in the presence of signalling molecules. These proteins can act as enzymes. Other proteins in this group, usually small, can interact with receptors. GTPases are a classical example of this group.
Storage proteins - proteins that contain energy, which can be released during metabolic processes in the organism. Egg ovalbumin and milk casein are such proteins. Almost all proteins can be digested and used as a source of energy and building material by other organisms.
Classification of proteins by location in the living cell
Proteins can also be classified by their location in the living cell. On this basis, all proteins fall into four main groups.
Membrane or transmembrane proteins - these proteins are located within the lipid bilayer of the cell membrane. They can be completely or partially buried in the membrane.
Internal proteins - these proteins are located inside the living cell, and all their functions are related to intracellular needs.
External or secreted proteins - these proteins function outside the cell that produced them. This type of protein is more common in multicellular organisms.
Virus proteins - these proteins are present only in viruses, usually as a coat for the viral particle.
Classification of proteins by posttranslational modification
After translation, some proteins undergo posttranslational modification. These modifications can involve many different kinds of change. Again, this classification splits proteins into overlapping groups.
Native proteins - these proteins are not changed after translation.
Glycoproteins - these proteins are modified by covalent attachment of linear or branched oligosaccharides.
Cleaved proteins - the polypeptide chain of these proteins is cleaved into two or more pieces.
Proteins with disulphide bonds - in these proteins, pairs of cysteines are linked to each other by an S-S or disulphide bond (disulphide bridge).
Protein complexes - some proteins form complexes of a homo- or hetero- nature, that is, from identical or from different protein molecules.
Chemically modified proteins - in these proteins some residues are chemically modified by covalent bonding to other chemical compounds.
Prions - these proteins fold incorrectly during translation, or change their conformation straight after translation.
Protein structure organisation (primary, secondary, tertiary and quaternary)
The structural organisation of a protein can be divided into four levels.
Primary structure or protein sequence. The amino acid sequence of the polypeptide chain defines the primary structure of the protein. DNA (or RNA in some viruses) encodes the primary structure, and this is comprehensive information for the structure and functions of the protein.
Secondary structure. The main conformational parameters of each amino acid residue are the values of its PHI and PSI angles. These angles almost completely define the conformation of the polypeptide main chain. For certain values of these angles the main chain adopts specially classified conformations, such as the alpha-helix or the beta-strand. The other main feature of these conformations is local stabilisation by hydrogen bonds. Together, these conformations are classified as the protein secondary structure.
Tertiary structure, protein 3D structure or protein fold. The tertiary structure completely defines the structural organisation of the protein molecule in three dimensions.
Quaternary structure. Several protein molecules interact to form a protein complex, and the structure of this complex is defined as the quaternary structure.
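The PHI and PSI angles mentioned under secondary structure are dihedral (torsion) angles, each defined by four consecutive main-chain atoms: PHI by C(i-1), N(i), CA(i), C(i) and PSI by N(i), CA(i), C(i), N(i+1). A minimal sketch of how such an angle could be computed from atomic coordinates is given below. The function name dihedral and the test coordinates are illustrative assumptions and do not come from any particular crystallographic package.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four points (x, y, z)."""
    b0 = _sub(p0, p1)          # vector from the second atom back to the first
    b1 = _sub(p2, p1)          # central bond, the rotation axis
    b2 = _sub(p3, p2)          # vector from the third atom to the fourth
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Remove the components along the central bond, leaving the parts
    # of b0 and b2 that lie in the plane perpendicular to it.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates: the outer atoms lie on the same side (cis), angle 0
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
# Fourth atom rotated by a quarter turn about the central bond
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

For a real protein the four points would be the coordinates of the main-chain atoms listed above, taken from the solved structure.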
Analysis of protein
Proteins can be analysed in vitro and in vivo. In vitro analysis is performed in a well-controlled environment, usually with purified protein. In contrast, in vivo studies are performed directly in the living organism; they can show the role of the protein in the living cell, but are much more difficult to perform. Studies of purified proteins are more accurate and methodically easier, but the absence of the original environment can sometimes hide key properties of the protein under study.
In this protein crystallography guide we mainly describe protein structure determination by crystallographic methods, and briefly describe all the necessary preliminary steps.
To date, more than 10000 (ten thousand) protein structures have been solved at atomic detail. Millions of mutations allow us to investigate ways of changing protein properties in almost any direction. Computer simulations allow us to model protein behaviour over reasonably long times. But despite these great achievements, a few crucial questions remain unanswered.
Protein structure or protein folding. The structure of a protein is ultimately defined by its primary structure, or amino acid sequence. At the moment there are no theories or computational techniques that allow us to predict the fold of a new protein from its sequence.
Protein evolution. How proteins developed during the evolution of organisms is unclear.
Protein crystallisation. It is impossible to predict the conditions under which a new protein will crystallize.
Protein science is waiting for answers to these questions. Of course, there are many more blank spots in our knowledge of protein structure and function, but these questions are crucial.