CoV2D Browser™

Parikh vectors
7ZGK_1 2CEI_1 6MMC_1 Letter Amino acid
32 13 11 A Alanine
36 11 4 Q Glutamine
4 1 1 W Tryptophan
28 7 4 R Arginine
40 16 8 E Glutamic acid
33 6 9 I Isoleucine
24 6 3 F Phenylalanine
23 12 2 N Asparagine
3 3 1 C Cysteine
46 7 5 G Glycine
12 5 0 M Methionine
49 12 12 S Serine
73 6 10 V Valine
23 9 5 Y Tyrosine
28 15 4 D Aspartic acid
10 10 2 H Histidine
56 22 7 L Leucine
39 12 9 K Lysine
38 3 1 P Proline
48 7 6 T Threonine
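A Parikh vector records how many times each letter occurs in a word. The following is a minimal Python sketch, assuming the 20-letter amino-acid alphabet in the row order of the table above; the names `ALPHABET` and `parikh_vector` are illustrative and not part of the CoV2D code.

```python
from collections import Counter

# The 20 amino-acid letters, in the row order of the table above.
ALPHABET = "AQWREIFNCGMSVYDHLKPT"

def parikh_vector(seq):
    """Return a dict mapping each alphabet letter to its count in seq."""
    counts = Counter(seq)
    return {letter: counts.get(letter, 0) for letter in ALPHABET}

# Sequence 6MMC_1 as listed on this page.
seq_6mmc = "SSQQVWKLVIITEEILLKKVSKIIKEAGASGYTVLAAAGEGSRNVRSTGEPSVSHAYSNIKFEVLTASRELADQIQDKVVAKYFDDYSCITYISTVEALRAHKF"

vec = parikh_vector(seq_6mmc)
print(vec["W"], vec["M"])  # tryptophan and methionine counts for 6MMC_1
```

The entries of a Parikh vector always sum to the length of the word, which gives a quick sanity check on each column.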

>7ZGK_1|Chain A|Complement C3 beta chain|Homo sapiens (9606)
>2CEI_1|Chain A|FERRITIN HEAVY CHAIN|HOMO SAPIENS (9606)
>6MMC_1|Chain A|Carbon regulatory PII-like protein SbtB|Cyanobium sp. PCC 7001 (180281)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7ZGK , Knot 246 645 0.82 40 268 586
SPMYSIITPNILRLESEETMVLEAHDAQGDVPVTVTVHDFPGKKLVLSSEKTVLTPATNHMGNVTFTIPANREFKSEKGRNKFVTVQATFGTQVVEKVVLVSLQSGYLFIQTDKTIYTPGSTVLYRIFTVNHKLLPVGRTVMVNIENPEGIPVKQDSLSSQNQLGVLPLSWDIPELVNMGQWKIRAYYENSPQQVFSTEFEVKEYVLPSFEVIVEPTEKFYYIYNEKGLEVTITARFLYGKKVEGTAFVIFGIQDGEQRISLPESLKRIPIEDGSGEVVLSRKVLLDGVQNPRAEDLVGKSLYVSATVILHSGSDMVQAERSGIPIVTSPYQIHFTKTPKYFKPGMPFDLMVFVTNPDGSPAYRVPVAVQGEDTVQSLTQGDGVAKLSINTHPSQKPLSITVRTKKQELSEAEQATRTMQALPYSTVGNSNNYLHLSVLRTELRPGETLNVNFLLRMDRAHEAKIRYYTYLIMNKGRLLKAGRQVREPGQDLVVLPLSITTDFIPSFRLVAYYTLIGASGQREVVADSVWVDVKDSCVGSLVVKSGQSEDRQPVPGQQMTLKIEGDHGARVVLVAVDKGVFVLNKKNKLTQSKIWDVVEKADIGCTPGSGKDYAGVFSDAGLTFTSSSGQQTAQRAELQCPQPAA
2CEI , Knot 87 183 0.82 40 137 177
MTTASTSQVRQNYHQDSEAAINRQINLELYASYVYLSMSYYFDRDDVALKNFAKYFLHQSHEEREHAEKLMKLQNQRGGRIFLQDIQKPDCDDWESGLNAMECALHLEKNVNQSLLELHKLATDKNDPHLCDFIETHYLNEQVKAIKELGDHVTNLRKMGAPESGLAEYLFDKHTLGDSDNES
6MMC , Knot 52 104 0.77 38 86 102
SSQQVWKLVIITEEILLKKVSKIIKEAGASGYTVLAAAGEGSRNVRSTGEPSVSHAYSNIKFEVLTASRELADQIQDKVVAKYFDDYSCITYISTVEALRAHKF
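The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. The sketch below takes the \(\mathrm{LZ}(w)\) and \(n\) values from the table as given; the helper name `normalized_lz` is hypothetical, and truncation to two decimals is an assumption that appears to match the displayed values.

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_a(n), where a is the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# (LZ(w), n) pairs copied from the table above.
rows = {"7ZGK": (246, 645), "2CEI": (87, 183), "6MMC": (52, 104)}

for code, (lz, n) in rows.items():
    ratio = normalized_lz(lz, n)
    # Assumption: the page truncates (rather than rounds) to two decimals.
    print(code, math.floor(ratio * 100) / 100)
```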

Let \(P_w(k)\) be the set of distinct subwords (contiguous intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
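The definitions above translate directly into a set comprehension. This is a minimal sketch of the subword set and its cardinality for a given subword length; the function names are illustrative.

```python
def subwords(w, k):
    """The set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """The number of distinct length-k subwords of w."""
    return len(subwords(w, k))

print(sorted(subwords("ABAB", 2)))    # ['AB', 'BA']
print(subword_complexity("ABAB", 2))  # 2
```

Building the full set costs memory proportional to the number of distinct subwords, which is not a concern for sequences of a few hundred residues.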

\(|P_{f(7ZGK_1)}(2) \setminus P_{f(2CEI_1)}(2)|=170\), \(|P_{f(2CEI_1)}(2) \setminus P_{f(7ZGK_1)}(2)|=39\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be the numerator of an LZ76-style (set-of-subwords) Jaccard distance for \(P(k)\). For this pair, \(Z_2=170+39=209\).

Hydrophobic-polar version of Sequence 1:
011001101011010000011101001010111010100111001110000011011000110101011100010000100011010101100110011110100101110000010011001100110100011111001110100101111000010000011111101011011011010101000001001100010100011101011101000100100001101010101101001010111111100100010110010011100101011100011101100101001110010101011100100110100011111001001010001001011111011111001010110011111010001001001011101010001000110101000000100100100010111000110000010101100010110010101110100100101000001110010110110010011001111110100011101011100011110100011100111010000110111001000000111100101010100110111111001111100000100001101100101100110100011110011101000010001001010010111
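The hydrophobic-polar string above is consistent with mapping the residues A, F, G, I, L, M, P, V, W to 1 and all other residues to 0. That grouping is an inference from comparing the string with Sequence 1, not a definition stated on this page, so the sketch below should be read as an assumption; `HYDROPHOBIC` and `hp_string` are illustrative names.

```python
# Assumption: residues encoded as 1 (hydrophobic), inferred from the string above.
HYDROPHOBIC = set("AFGILMPVW")

def hp_string(seq):
    """Encode each residue as '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

# First ten residues of Sequence 1 (7ZGK_1).
print(hp_string("SPMYSIITPN"))  # 0110011010
```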
Pair \(Z_2\) Length of longest common subword
7ZGK_1,2CEI_1 209 3
7ZGK_1,6MMC_1 200 4
2CEI_1,6MMC_1 153 4
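Both columns of this table can be computed from sets of subwords. The sketch below implements \(Z_k\) as defined above and, for the last column, assumes the quantity meant is the length of the longest common contiguous subword (values of 3 and 4 are too small for a non-contiguous subsequence of sequences this long). Function names are illustrative.

```python
def subwords(w, k):
    """The set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x, y, k):
    """Size of the symmetric difference of the length-k subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword(x, y):
    """Length of the longest contiguous subword shared by x and y."""
    best = 0
    for k in range(1, min(len(x), len(y)) + 1):
        if subwords(x, k) & subwords(y, k):
            best = k
        else:
            # No common subword of length k implies none of length k + 1.
            break
    return best

print(z_k("ABAB", "ABBA", 2))                  # 1 (only "BB" is unshared)
print(longest_common_subword("ABAB", "ABBA"))  # 2
```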

Newick tree

 
(7ZGK_1:10.52,(6MMC_1:76.5,2CEI_1:76.5):33.02);

Let \(d\) be the Otu-Sayood distance \(d\).
Let \(d_1\) be the Otu-Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
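As a sketch of how these two distances can be computed, the code below assumes the unnormalized definitions \(d(x,y)=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1(x,y)=c(xy)-c(x)+c(yx)-c(y)\), with \(c\) the LZ76 complexity. The parsing convention used here (each new component is the shortest prefix not already occurring earlier, with overlapping copies allowed) is one common reading of LZ76 and may not reproduce the numbers on this page exactly; all function names are illustrative.

```python
def lz76(s):
    """Number of components in an LZ76-style exhaustive parsing of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the component while it can be copied from earlier text.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def otu_sayood(x, y):
    """Return (d, d1) under the assumed unnormalized definitions."""
    a = lz76(x + y) - lz76(x)  # new components y adds after x
    b = lz76(y + x) - lz76(y)  # new components x adds after y
    return max(a, b), a + b

print(lz76("0001101001000101"))    # 6: 0|001|10|100|1000|101
print(otu_sayood("AAAA", "BBBB"))  # (1, 2)
```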
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{828}{\log_{20} 828}-\frac{183}{\log_{20}183}\right)\approx 179\), where \(828=645+183\) is the combined length of the two sequences.
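The arithmetic in this estimate can be checked directly. The sketch below takes the factors 0.85 and 0.8 as given and uses the two sequence lengths from the table (645 and 183, which sum to 828); the helper name `n_over_log` is hypothetical.

```python
import math

def n_over_log(n, alphabet_size=20):
    """n / log_a(n), the growth scale for LZ complexity on a letters."""
    return n / math.log(n, alphabet_size)

n1, n2 = 645, 183  # lengths of 7ZGK_1 and 2CEI_1
expected = 0.85 * 0.8 * (n_over_log(n1 + n2) - n_over_log(n2))
print(round(expected, 1))  # roughly 179.5, shown on the page as 179
```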
Status Protein1 Protein2 d d1/2
Query variables 7ZGK_1 2CEI_1 228 145.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
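A small numerical check of this interpolation, assuming that min and max refer to the smaller and larger of the two one-sided quantities behind \(d\) and \(d_1\). Under that reading, the table values \(d=228\) and \(d_1/2=145.5\) for the pair 7ZGK_1, 2CEI_1 imply a smaller term of \(2(145.5)-228=63\). The function name `delta` is illustrative.

```python
def delta(alpha, a, b):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    lo, hi = min(a, b), max(a, b)
    return alpha * lo + (1 - alpha) * hi

# Larger term from the table (d = 228); smaller term 63 derived as above.
print(delta(0, 63, 228))    # 228, i.e. d
print(delta(0.5, 63, 228))  # 145.5, i.e. d1 / 2
```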