CoV2D BrowserTM

Parikh vectors
6VJM_1 1ZVF_1 8TQB_1 Letter Amino acid
38 7 44 R Arginine
53 14 45 E Glutamic acid
51 14 64 V Valine
10 5 33 C Cysteine
46 12 48 K Lysine
16 4 21 M Methionine
60 5 75 S Serine
48 12 61 G Glycine
80 13 85 L Leucine
28 12 39 P Proline
48 10 54 T Threonine
19 3 15 W Tryptophan
50 5 62 A Alanine
30 12 41 N Asparagine
30 12 40 D Aspartic acid
29 6 33 Q Glutamine
14 6 17 H Histidine
48 11 59 I Isoleucine
37 7 48 F Phenylalanine
27 6 37 Y Tyrosine
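
The counts in this table can be recomputed from any of the sequences listed below. A minimal sketch, assuming the Parikh vector is simply the per-letter count over the 20 amino-acid letters; the name `parikh_vector` and the constant `ALPHABET` (which follows the row order of the table) are illustrative and not taken from the site:

```python
from collections import Counter

# One-letter codes in the row order of the table above (illustrative choice).
ALPHABET = "REVCKMSGLPTWANDQHIFY"

def parikh_vector(seq, alphabet=ALPHABET):
    """Return the number of occurrences of each alphabet letter in seq."""
    counts = Counter(seq)
    return [counts[a] for a in alphabet]

# First ten residues of the 1ZVF_1 sequence, used here as a short demo input.
prefix = "AMFNTTPINI"
vec = parikh_vector(prefix)
print({a: n for a, n in zip(ALPHABET, vec) if n})
```

Applied to a full sequence, the entries of the vector sum to the sequence length, which is a quick consistency check against the Length column further down.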

>6VJM_1|Chain A|Gamma-aminobutyric acid type B receptor subunit 1|Homo sapiens (9606)
>1ZVF_1|Chains A, B|3-hydroxyanthranilate 3,4-dioxygenase|Saccharomyces cerevisiae (4932)
>8TQB_1|Chains A, B|Metabotropic glutamate receptor 3|Rattus norvegicus (10116)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6VJM , Knot 296 762 0.86 40 308 687
GSERRAVYIGALFPMSGGWPGGQACQPAVEMALEDVNSRRDILPDYELKLIHHDSKCDPGQATKYLYELLYNDPIKIILMPGCSSVSTLVAEAARMWNLIVLSYGSSSPALSNRQRFPTFFRTHPSATLHNPTRVKLFEKWGWKKIATIQQTTEVFTSTLDDLEERVKEAGIEITFRQSFFSDPAVPVKNLKRQDARIIVGLFYETEARKVFCEVYKERLFGKKYVWFLIGWYADNWFKIYDPSINCTVDEMTEAVEGHITTEIVMLNPANTRSISNMTSQEFVEKLTKRLKRHPEETGGFQEAPLAYDAIWALALALNKTSGGGGRSGVRLEDFNYNNQTITDQIYRAMNSSSFEGVSGHVVFDASGSRMAWTLIEQLQGGSYKKIGYYDSTKDDLSWSKTDKWIGGSPPADQTLVIKTFRFLSQKLFISVSVLSSLGIVLAVVCLSFNIYNSHVRYIQNSQPNLNNLTAVGCSLALAAVFPLGLDGYHIGRNQFPFVCQARLWLLGLGFSLGYGSMFTKIWWVHTVFTKKEEKKEWRKTLEPWKLYATVGLLVGMDVLTLAIWQIVDPLHRTIETFAKEEPKEDIDVSILPQLEHCSSRKMNTWLGIFYGYKGLLLLLGIFLAYETKSVSTEKINDHRAVGMAIYNVAVLCLITAPVTMILSSQQDAAFAFASLAIVFSSYITLVVLFVPKMRRLITRGEWQSEAQDTMKTGSSTNNNEEEKSRLLEKENRELEKIIAEKEERVSELRHQLQSRLEVLFQ
1ZVF , Knot 85 176 0.83 40 138 171
AMFNTTPINIDKWLKENEGLLKPPVNNYCLHKGGFTVMIVGGPNERTDYHINPTPEWFYQKKGSMLLKVVDETDAEPKFIDIIINEGDSYLLPGNVPHSPVRFADTVGIVVEQDRPGGENDKIRWYCSHCRQVVHESELQMLDLGTQVKEAILDFENDVEKRTCFHCKTLNYARPQ
8TQB , Knot 346 921 0.85 40 329 840
MKMLTRLQILMLALFSKGFLLSLGDHNFMRREIKIEGDLVLGGLFPINEKGTGTEECGRINEDRGIQRLEAMLFAIDEINKDNYLLPGVKLGVHILDTCSRDTYALEQSLEFVRASLTKVDEAEYMCPDGSYAIQENIPLLIAGVIGGSYSSVSIQVANLLRLFQIPQISYASTSAKLSDKSRYDYFARTVPPDFYQAKAMAEILRFFNWTYVSTVASEGDYGETGIEAFEQEARLRNICIATAEKVGRSNIRKSYDSVIRELLQKPNARVVVLFMRSDDSRELIAAANRVNASFTWVASDGWGAQESIVKGSEHVAYGAITLELASHPVRQFDRYFQSLNPYNNHRNPWFRDFWEQKFQCSLQNKRNHRQVCDKHLAIDSSNYEQESKIMFVVNAVYAMAHALHKMQRTLCPNTTKLCDAMKILDGKKLYKEYLLKINFTAPFNPNKGADSIVKFDTFGDGMGRYNVFNLQQTGGKYSYLKVGHWAETLSLDVDSIHWSRNSVPTSQCSDPCAPNEMKNMQPGDVCCWICIPCEPYEYLVDEFTCMDCGPGQWPTADLSGCYNLPEDYIKWEDAWAIGPVTIACLGFLCTCIVITVFIKHNNTPLVKASGRELCYILLFGVSLSYCMTFFFIAKPSPVICALRRLGLGTSFAICYSALLTKTNCIARIFDGVKNGAQRPKFISPSSQVFICLGLILVQIVMVSVWLILETPGTRRYTLPEKRETVILKCNVKDSSMLISLTYDVVLVILCTVYAFKTRKCPENFNEAKFIGFTMYTTCIIWLAFLPIFYVTSSDYRVQTTTMCISVSLSGFVVLGCLFAPKVHIVLFQPQKNVVTHRLHLNRFSVSGTATTYSQSSASTYVPTVCNGREVLDSTTSSLAGLVPRGSAAAKSAWSHPQFEKGGGSGGGSGGGSWSHPQFEK
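
The normalized column can be checked from the LZ and Length columns. The sketch below also includes a phrase counter, assuming the LZ-complexity is the number of phrases in a left-to-right parsing in the style of Lempel and Ziv (1976); the site's exact variant is not stated on this page, so `lz76` is an assumption and may not reproduce the tabulated LZ values. Both function names are illustrative.

```python
import math

def lz76(s):
    """Number of phrases in a left-to-right LZ76-style parsing of s."""
    i, phrases, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text
        # (the copy source is allowed to overlap the phrase itself).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized(lz, n, alphabet_size=20):
    """LZ(w) / (n / log_A n), the fourth column of the table."""
    return lz / (n / math.log(n, alphabet_size))

# Textbook binary example, parsed as 0|001|10|100|1000|101.
print(lz76("0001101001000101"))
# Ratios recomputed from the tabulated LZ and Length values.
print(round(normalized(296, 762), 2), round(normalized(85, 176), 2))
```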

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
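
These definitions translate directly into code. A minimal sketch, assuming a subword is a contiguous block of the given length; the names `subwords` and `p` are illustrative:

```python
def subwords(w, k):
    """P_w(k): the set of distinct length-k contiguous blocks of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def p(w, k):
    """p_w(k): the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

w = "ABAAB"
print([p(w, k) for k in (1, 2, 3)])
```

For a word of length \(n\) there are at most \(n-k+1\) subwords of length \(k\), so \(p_w(k)\) is bounded both by that and by the number of possible words of length \(k\) over the alphabet.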

\(|P_{f(6VJM_1)}(2) \setminus P_{f(1ZVF_1)}(2)|=196\), \(|P_{f(1ZVF_1)}(2) \setminus P_{f(6VJM_1)}(2)|=26\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
100001101111111011111101001110111001000001110001011000000011010001001100011011111100010011101101101111001000111000001101100010101001001011001110011010000011000100100010011101010001100111110010000101111110000100110010000111000111111101001101001010001001001101010001111011000010010000110010001000100011100111100111111111000011110011010010000001000100110000101101011101010011101100101100001100000000101000001111011100011100101100011101011001111111101010100001001000010100101110011111111111010011000111100101111111101101011001111001100000000100010110101011111110110111101101100010011000100010101110100000001001111101001111111111100000100001000011111100111101101110111000001111110111110001011111110100110010100010001001000000000000110000001001110000010010001000101110
Pair \(Z_2\) Length of longest common subword
6VJM_1,1ZVF_1 222 4
6VJM_1,8TQB_1 97 4
1ZVF_1,8TQB_1 235 5
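
Both columns of this table can be sketched in a few lines. The code assumes that \(Z_k\) is the size of the symmetric difference of the two subword sets, as defined above, and that the last column measures the longest contiguous block shared by the two sequences; the latter is an assumption, made because values this small cannot be lengths of non-contiguous common subsequences (the Parikh vectors alone force longer ones). Function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct length-k contiguous blocks of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """|P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword(x, y):
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block that ended just before both positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z("ABAAB", "BABB", 2))
print(longest_common_subword("GSERRAVY", "XXERRAZZ"))
```

The dynamic program takes time proportional to the product of the two lengths, which is unproblematic for sequences of a few hundred residues.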

Newick tree

 
[
	1ZVF_1:12.97,
	[
		6VJM_1:48.5,8TQB_1:48.5
	]:80.47
]
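
The tree above is displayed with square brackets; standard Newick syntax uses parentheses and a terminating semicolon. A minimal sketch of the conversion, assuming the nesting and branch lengths shown above; the `(label_or_children, length)` pair representation and the name `to_newick` are illustrative choices:

```python
def to_newick(node):
    """Serialize a (label_or_children, branch_length) pair to Newick text."""
    label, length = node
    if isinstance(label, list):
        # Internal node: join the serialized children inside parentheses.
        label = "(" + ",".join(to_newick(child) for child in label) + ")"
    return label if length is None else f"{label}:{length}"

# The root has no branch length of its own, hence None.
tree = ([("1ZVF_1", 12.97),
         ([("6VJM_1", 48.5), ("8TQB_1", 48.5)], 80.47)], None)
newick = to_newick(tree) + ";"
print(newick)
```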

Let d denote the Otu–Sayood distance \(d\).
Let d1 denote the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{938}{\log_{20} 938}-\frac{176}{\log_{20}176})\approx 209\), where \(938 = 762 + 176\) is the combined length of the two sequences.
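
The arithmetic in this estimate can be checked numerically. A minimal sketch that takes the two constants 0.85 and 0.8 and the lengths 938 and 176 exactly as they appear in the formula; the function and parameter names are illustrative:

```python
import math

def log20(x):
    """Logarithm to base 20, the size of the amino-acid alphabet."""
    return math.log(x) / math.log(20)

def expected_distance(n_total, n_small, c1=0.85, c2=0.8):
    """c1 * c2 * (n_total / log20(n_total) - n_small / log20(n_small))."""
    return c1 * c2 * (n_total / log20(n_total) - n_small / log20(n_small))

value = expected_distance(938, 176)
print(round(value, 1))
```

The result falls between 209 and 210, consistent with the figure quoted above when truncated to an integer.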
Status Protein1 Protein2 d d1/2
Query variables 6VJM_1 1ZVF_1 270 164.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2. \end{cases} \]
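
The interpolation can be illustrated with the query row above. A minimal sketch, assuming min and max refer to the smaller and larger of the two one-sided terms (in Otu and Sayood's definitions these are the complexity increments \(c(xy)-c(x)\) and \(c(yx)-c(y)\)). The smaller term is not listed on this page; the value 59 used below is inferred from \(2 \cdot 164.5 - 270\) and is not a reported figure.

```python
def delta(alpha, a, b):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Table values for 6VJM_1 vs 1ZVF_1: d = 270 and d1/2 = 164.5.
# 'small' is inferred (2 * 164.5 - 270), not read from the page.
small, large = 59, 270
print(delta(0, small, large))    # alpha = 0 recovers d
print(delta(0.5, small, large))  # alpha = 1/2 recovers d1/2
```

Setting \(\alpha = 1\) would give the minimum alone, which is the other end of the same one-parameter family.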