CoV2D BrowserTM

Parikh vectors
7TEV_1 6GAP_1 1IUE_1 Letter Amino acid
32 20 4 A Alanine
14 5 1 F Phenylalanine
17 31 6 S Serine
18 1 6 Y Tyrosine
19 24 5 T Threonine
6 0 0 W Tryptophan
24 14 10 D Aspartic acid
35 20 4 G Glycine
27 16 6 I Isoleucine
43 32 8 L Leucine
23 4 2 P Proline
30 17 4 V Valine
19 19 3 R Arginine
6 0 7 C Cysteine
7 13 3 Q Glutamine
6 4 1 M Methionine
18 17 5 N Asparagine
27 11 13 E Glutamic acid
8 9 2 H Histidine
25 4 8 K Lysine
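The Parikh vector of a sequence simply counts how often each letter occurs. A minimal sketch follows, assuming the sequences are plain one-letter amino acid strings; the names `AMINO_ACIDS` and `parikh_vector` are illustrative and not taken from the site.

```python
from collections import Counter

# Letters in the row order of the table above (illustrative constant).
AMINO_ACIDS = "AFSYTWDGILPVRCQMNEHK"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return the letter counts of `sequence`, one entry per alphabet letter."""
    counts = Counter(sequence)
    # Counter returns 0 for letters that never occur.
    return [counts[letter] for letter in alphabet]

if __name__ == "__main__":
    print(parikh_vector("AAFW"))
```

When every letter of the sequence belongs to the alphabet, the entries sum to the sequence length; in the table above the 7TEV_1 column sums to 404 and the 1IUE_1 column to 98.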

>7TEV_1|Chains A, B, C|Ornithine aminotransferase, mitochondrial|Homo sapiens (9606)
>6GAP_1|Chains A, B, C|Outer capsid protein sigma-1|Mammalian orthoreovirus 3 Dearing (10886)
>1IUE_1|Chains A, B|FERREDOXIN|Plasmodium falciparum (36329)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7TEV , Knot 170 404 0.84 40 225 384
GPPTSDDIFEREYKYGAHNYHPLPVALERGKGIYLWDVEGRKYFDFLSSYSAVNQGHCHPKIVNALKSQVDKLTLTSRAFYNNVLGEYEEYITKLFNYHKVLPMNTGVEAGETACKLARKWGYTVKGIQKYKAKIVFAAGNFWGRTLSAISSSTDPTSYDGFGPFMPGFDIIPYNDLPALERALQDPNVAAFMVEPIQGEAGVVVPDPGYLMGVRELCTRHQVLFIADEIQTGLARTGRWLAVDYENVRPDIVLLGKALSGGLYPVSAVLCDDDIMLTIKPGEHGSTYGGNPLGCRVAIAALEVLEEENLAENADKLGIILRNELMKLPSDVVTAVRGKGLLNAIVIKETKDWDAWKVCLRLRDNGLLAKPTHGDIIRFAPPLVIKEDELRESIEIINKTILSF
6GAP , Knot 106 261 0.75 36 136 235
MGSSHHHHHHSSGLVPRGSHMASSKGLESRVSALEKTSQIHSDTILRITQGLDDANKRIIALEQSRDDLVASVSDAQLAISRLESSIGALQTVVNGLDSSVTQLGARVGQLETGLAELRVDHDNLVARVDTAERNIGSLTTELSTLTLRVTSIQADFESRISTLERTAVTSAGAPLSIRNNRMTMGLNDGLTLSGNNLAIRLPGNTGLNIQNGGLQFRFNTDQFQIVNNNLTLKTTVFDSINSRIGATEQSYVASAVTPLR
1IUE , Knot 53 98 0.82 38 82 94
AFYNITLRTNDGEKKIECNEDEYILDASERQNVELPYSCRGGSCSTCAAKLVEGEVDNDDQSYLDEEQIKKKYILLCTCYPKSDCVIETHKEDELHDM
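The normalized column divides the LZ-complexity by \(n/\log_{20} n\). The sketch below assumes the LZ76 exhaustive-history variant of Lempel-Ziv complexity; the page does not say which variant it uses, so `lz76_complexity` is an assumption and may not reproduce the tabulated counts. The function names are illustrative.

```python
import math

def lz76_complexity(w):
    """Number of components in the LZ76 exhaustive history of w (assumed variant)."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the component while it already occurs (overlap allowed)
        # in the text preceding its last symbol.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_{alphabet_size}(n), as in the table header."""
    return lz / (n / math.log(n, alphabet_size))

if __name__ == "__main__":
    print(normalized_lz(170, 404))
```

For the 7TEV row, `normalized_lz(170, 404)` is about 0.843, which the table lists as 0.84.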

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
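As a sketch of these definitions, the set of length-\(k\) subwords can be collected with a sliding window. The names `subwords` and `subword_complexity` are illustrative. The tabulated \(p_w(1)\) values (40, 36, 38) exceed the 20-letter alphabet, so the page may use a different counting convention for that column than the plain set cardinality computed here.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Cardinality of the set of length-k subwords of w."""
    return len(subwords(w, k))

if __name__ == "__main__":
    print(sorted(subwords("ABAB", 2)))
    print(subword_complexity("ABAB", 2))
```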

\(|P_{f(7TEV_1)}(2) \setminus P_{f(6GAP_1)}(2)|=135\), \(|P_{f(6GAP_1)}(2) \setminus P_{f(7TEV_1)}(2)|=46\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
11100001100000011000011111100101101101010001011000011001000101101100010010100011000111000001001100001111001101100100110011001011000010111111011100101100000100001111111110111000111100110010111111011010111111011011110010000011111001001110010111100001010111110110111011011100001110101100100011011100111111011000011001001111100011011001101101011101111000001011010101000111101001011011111110000100010110001101
Pair \(Z_2\) Length of longest common subword (substring)
7TEV_1,6GAP_1 181 4
7TEV_1,1IUE_1 193 5
6GAP_1,1IUE_1 154 3
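The \(Z_2\) column is the size of the symmetric difference of the two sets of length-2 subwords. A minimal sketch, with the illustrative names `subwords` and `z_distance`:

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

if __name__ == "__main__":
    # "ABBA" contains the subword "BB", which "ABAB" lacks.
    print(z_distance("ABAB", "ABBA", 2))
```

For the pair 7TEV_1, 6GAP_1 the two terms are 135 and 46, which gives the tabulated value 181.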

Newick tree

 
(7TEV_1:98.44,(6GAP_1:77,1IUE_1:77):21.44);

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{665}{\log_{20} 665}-\frac{261}{\log_{20} 261}\right)\approx 112.\)
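The arithmetic in this estimate can be checked directly. The sketch below takes the constants 0.85 and 0.8 as given on the page and notes that 665 = 404 + 261 is the combined length of the two sequences; the function names are illustrative.

```python
import math

def lz_scale(n, alphabet_size=20):
    """The scale n / log_{alphabet_size}(n) used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

def expected_distance(n_concat, n_single, c1=0.85, c2=0.8):
    """c1 * c2 * (scale of the concatenation minus scale of one sequence)."""
    return c1 * c2 * (lz_scale(n_concat) - lz_scale(n_single))

value = expected_distance(404 + 261, 261)

if __name__ == "__main__":
    print(value)
```

The result is slightly below 113, and the page reports its integer part, 112.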
Status Protein1 Protein2 d d1/2
Query variables 7TEV_1 6GAP_1 144 115
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d, & \alpha=0,\\ d_1/2, & \alpha=1/2. \end{cases} \]
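The interpolation can be sketched as follows. The arguments `a` and `b` are hypothetical names for the two quantities whose minimum and maximum appear in the formula; it is an assumption here that they are the two LZ-complexity increments obtained by concatenating the sequences in either order, as in the Otu–Sayood construction.

```python
def delta(alpha, a, b):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

def d(a, b):
    """The alpha = 0 case: the larger of the two quantities."""
    return max(a, b)

def d1(a, b):
    """Twice the alpha = 1/2 case: the sum of the two quantities."""
    return a + b

if __name__ == "__main__":
    # Values consistent with the table row: d = 144 and d1/2 = 115.
    print(delta(0, 86, 144), delta(0.5, 86, 144))
```

With the tabulated \(d=144\) and \(d_1/2=115\), the implied minimum is \(2\cdot 115-144=86\).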