CoV2D Browser™

Parikh vectors
8DYE_1 3ADE_1 1TVL_1 Letter Amino acid
52 20 38 A Alanine
19 16 17 N Asparagine
27 12 30 I Isoleucine
51 22 32 L Leucine
14 9 5 M Methionine
25 16 21 P Proline
38 23 26 R Arginine
32 16 25 D Aspartic acid
18 12 0 C Cysteine
40 14 35 E Glutamic acid
26 4 21 K Lysine
20 4 31 F Phenylalanine
36 19 27 T Threonine
5 6 5 W Tryptophan
25 18 10 Y Tyrosine
43 28 22 V Valine
23 10 17 Q Glutamine
64 35 36 G Glycine
21 11 22 H Histidine
32 23 34 S Serine
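
The Parikh vector of a sequence simply counts how often each letter occurs. A minimal sketch of that count, assuming the letter order of the table rows above; the names `AMINO_ACIDS` and `parikh_vector` are illustrative and not taken from the CoV2D code:

```python
from collections import Counter

# Letter order follows the rows of the table above (an assumption for display only).
AMINO_ACIDS = "ANILMPRDCEKFTWYVQGHS"

def parikh_vector(sequence: str, alphabet: str = AMINO_ACIDS) -> list[int]:
    """Return the number of occurrences of each alphabet letter in the sequence."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]

# Toy input: the first ten residues of the 3ADE_1 sequence.
toy = "MTLHKPTQAV"
vec = parikh_vector(toy)
print(dict(zip(AMINO_ACIDS, vec)))
```

The entries of a Parikh vector always sum to the length of the sequence, which gives a quick sanity check on each column of the table.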

>8DYE_1|Chain A|Succinate dehydrogenase [ubiquinone] flavoprotein subunit, mitochondrial|Homo sapiens (9606)
>3ADE_1|Chain A|Kelch-like ECH-associated protein 1|Mus musculus (10090)
>1TVL_1|Chain A|protein YTNJ|Bacillus subtilis (1423)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8DYE , Knot 245 611 0.85 40 272 572
QYPVVDHEFDAVVVGAGGAGLRAAFGLSEAGFNTACVTKLFPTRSHTVAAQGGINAALGNMEEDNWRWHFYDTVKGSDWLGDQDAIHYMTEQAPAAVVELENYGMPFSRTEDGKIYQRAFGGQSLKFGKGGQAHRCCCVADRTGHSLLHTLYGRSLRYDTSYFVEYFALDLLMENGECRGVIALCIEDGSIHRIRAKNTVVATGGYGRTYFSCTSAHTSTGDGTAMITRAGLPCQDLEFVQFHPTGIYGAGCLITEGCRGEGGILINSQGERFMERYAPVAKDLASRDVVSRSMTLEIREGRGCGPEKDHVYLQLHHLPPEQLATRLPGISETAMIFAGVDVTKEPIPVLPTVHYNMGGIPTNYKGQVLRHVNGQDQIVPGLYACGEAACASVHGANRLGANSLLDLVVFGRACALSIEESCRPGDKVPPIKPNAGEESVMNLDKLRFADGSIRTSELRLSMQKSMQNHAAVFRVGSVLQEGCGKISKLYGDLKHLKTFDRGMVWNTDLVETLELQNLMLCALQTIYGAEARKESRGAHAREDYKVRIDEYDYSKPIQGQQKKPFEEHWRKHTLSYVDVGTGKVTLEYRPVIDKTLNEADCATVPPAIRSY
3ADE , Knot 129 318 0.78 40 183 276
MTLHKPTQAVPCRAPKVGRLIYTAGGYFRQSLSYLEAYNPSNGSWLRLADLQVPRSGLAGCVVGGLLYAVGGRNNSPDGNTDSSALDCYNPMTNQWSPCASMSVPRNRIGVGVIDGHIYAVGGSHGCIHHSSVERYEPERDEWHLVAPMLTRRIGVGVAVLNRLLYAVGGFDGTNRLNSAECYYPERNEWRMITPMNTIRSGAGVCVLHNCIYAAGGYDGQDQLNSVERYDVETETWTFVAPMRHHRSALGITVHQGKIYVLGGYDGHTFLDSVECYDPDSDTWSEVTRMTSGRSGVGVAVTMEPCRKQIDQQNCTCY
1TVL , Knot 187 454 0.84 38 226 427
MSLTRADFIQFGAMIHGVGGTTDGWRHPDVDPSASTNIEFYMKKAQTAEKGLFSFIFIADGLFISEKSIPHFLNRFEPITILSALASVTKNIGLVGTFSTSFTEPFTISRQLMSLDHISGGRAGWNLVTSPQEGAARNHSKSNLPEHTERYEIAQEHLDVVRGLWNSWEHDAFIHNKKTGQFFDQAKLHRLNHKGKYFQVEGPLNIGRSKQGEPVVFQAGSSETGRQFAAKNADAIFTHSNSLEETKAFYADVKSRAADEGRDPSSVRIFPGISPIVADTEEEAEKKYREFAELIPIENAVTYLARFFDDYDLSVYPLDEPFPDIGDVGKNAFQSTTDRIKREAKARNLTLREVAQEMAFPRTLFIGTPERVASLIETWFNAEAADGFIVGSDIPGTLDAFVEKVIPILQERGLYRQDYRGGTLRENLGLGIPQHQSVLHSSHHEGGSHHHHHH
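
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. The sketch below only redoes that arithmetic and does not compute LZ-complexity itself; `normalized_lz` is a hypothetical helper name, and the alphabet size 20 is read off the logarithm base in the table header.

```python
import math

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Ratio LZ(w) / (n / log_b n), with b the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# (LZ-complexity, length) pairs copied from the table above.
rows = {"8DYE": (245, 611), "3ADE": (129, 318), "1TVL": (187, 454)}

ratios = {code: normalized_lz(lz, n) for code, (lz, n) in rows.items()}
for code, value in ratios.items():
    print(code, f"{value:.4f}")
```

The computed ratios agree with the table to two decimal places if the table values are truncated rather than rounded, which appears to be the case here.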

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with the 4-symbol Protein Data Bank code \(c\).
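
As a small illustration of these definitions, the sketch below collects the distinct length-\(n\) subwords of a word with a sliding window. The function names `subwords` and `subword_complexity` are illustrative assumptions, not names used by the site.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w: str, n: int) -> int:
    """p_w(n): the number of distinct subwords of length n in w."""
    return len(subwords(w, n))

toy = "AABAB"
print(sorted(subwords(toy, 2)))    # ['AA', 'AB', 'BA']
print(subword_complexity(toy, 2))  # 3
```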

\(|P_{f(8DYE_1)}(2) \setminus P_{f(3ADE_1)}(2)|=135\) and \(|P_{f(3ADE_1)}(2) \setminus P_{f(8DYE_1)}(2)|=46\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) numerator of the Jaccard distance for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
00111000101111111111101111100111001010011100000111011101111010000101010001010011100011001000111111010001111000001010001111001011011010000011000100110010100100000011001110111001000111110100101001010001110110100010000100001010111001111000101101010110111011001001011111000100110001111001100011000101010010101100001010100111001100111100011111110100011111101000111110000101100101000111110101011010101100111001101111101011010000011001111010110001101001011010100001010100010001111011011001010100101010010010011110001100101001110110010110100000110100000101000000011010000110001000010010110101010001110001001001011111000
Pair \(Z_2\) Length of longest common subword (contiguous)
8DYE_1,3ADE_1 181 4
8DYE_1,1TVL_1 148 4
3ADE_1,1TVL_1 169 4
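
The quantity \(Z_k\) is the size of the symmetric difference of the two subword sets. A minimal sketch on toy strings, assuming plain Python sets; the function names are illustrative only:

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy example: only "BB" separates the two sets of 2-letter subwords.
print(z_distance("ABAB", "ABBA", 2))  # 1
```

For the pair 8DYE_1, 3ADE_1 the two one-sided differences quoted above, 135 and 46, add up to the tabulated \(Z_2 = 181\).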

Newick tree

 
(3ADE_1:91.62,(8DYE_1:74,1TVL_1:74):17.62);

Let d be the Otu-Sayood distance \(d\).
Let d1 be the Otu-Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{929 }{\log_{20} 929}-\frac{318}{\log_{20}318})\approx 164\), where \(929=611+318\) is the combined length of the two sequences.
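
The arithmetic behind this estimate can be checked directly. In the sketch below the factors 0.85 and 0.8 and the lengths 929 and 318 are copied from the formula above; the helper name `capacity` is a hypothetical label for \(n/\log_{20} n\).

```python
import math

def capacity(n: int, alphabet_size: int = 20) -> float:
    """n / log_b(n), the normalizing term used for LZ-complexity on this page."""
    return n / math.log(n, alphabet_size)

# Constants copied from the formula in the text.
expected = 0.85 * 0.8 * (capacity(929) - capacity(318))
print(round(expected, 2))
```

The result lies between 164 and 165, in line with the value 164 quoted in the text.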
Status Protein1 Protein2 d d1/2
Query variables 8DYE_1 3ADE_1 208 154
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]
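
Read literally, the formula says that \(d\) is the larger of two underlying quantities and \(d_1/2\) is their average. Under that reading, the query values \(d=208\) and \(d_1/2=154\) reported above determine both quantities. This is a sketch of the interpolation only; the names `delta`, `smaller` and `larger` are illustrative assumptions.

```python
def delta(alpha: float, smaller: float, larger: float) -> float:
    """Interpolate between the max (alpha = 0) and the min (alpha = 1)."""
    return alpha * smaller + (1 - alpha) * larger

# Values from the query row above: d = 208 and d1/2 = 154.
d, half_d1 = 208, 154
larger = d                 # alpha = 0 gives the max
smaller = 2 * half_d1 - d  # since (min + max) / 2 = d1 / 2

print(smaller, larger)              # 100 208
print(delta(0, smaller, larger))    # 208
print(delta(0.5, smaller, larger))  # 154.0
```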