CoV2D Browser™

Parikh vectors
6YGU_1 4XNJ_1 1HMV_1 Letter Amino acid
3 14 19 W Tryptophan
15 12 16 R Arginine
4 0 2 C Cysteine
13 38 42 I Isoleucine
8 19 6 M Methionine
18 40 19 S Serine
4 49 26 A Alanine
8 9 36 Q Glutamine
14 14 47 E Glutamic acid
13 16 61 K Lysine
16 9 24 D Aspartic acid
17 43 34 G Glycine
8 6 8 H Histidine
4 36 13 F Phenylalanine
15 22 38 P Proline
11 10 20 N Asparagine
18 61 49 L Leucine
6 23 39 T Threonine
5 19 22 Y Tyrosine
19 43 39 V Valine
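A Parikh vector simply counts how often each letter occurs in a sequence. The following is a minimal sketch of that count, assuming the 20-letter order used in the table above; the function name `parikh_vector` and the short demo input are illustrative and not taken from the page.

```python
from collections import Counter

# Letter order copied from the table above (W, R, C, ..., V).
ALPHABET = "WRCIMSAQEKDGHFPNLTYV"


def parikh_vector(sequence: str) -> list[int]:
    """Return the number of occurrences of each letter, in ALPHABET order."""
    counts = Counter(sequence)
    # Counter returns 0 for letters that never occur (e.g. C in 4XNJ_1).
    return [counts[letter] for letter in ALPHABET]


# Hypothetical demo input: the first ten residues of 6YGU_1.
demo = "MDEALIKDYH"
print(dict(zip(ALPHABET, parikh_vector(demo))))
```

Applied to a full sequence, the entries should sum to its length; the 6YGU_1 column above sums to 219, for instance.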

>6YGU_1|Chains A, C|ATP dependent RNA helicase (Dob1)-like protein|Chaetomium thermophilum (209285)
>4XNJ_1|Chain A|Di-or tripeptide:H+ symporter|Streptococcus thermophilus LMG 18311 (264199)
>1HMV_1|Chains A, C, E, G|HIV-1 REVERSE TRANSCRIPTASE (SUBUNIT P66)|Human immunodeficiency virus 1 (11676)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6YGU , Knot 102 219 0.83 40 153 209
MDEALIKDYHSIREQIDQYTKDMVLVMQHPTNCVKYINPGRLMHVVTSDGTDFGWGVIINFYERRPERNNPNPGWSPQESYVVEVLLRLSSDSGSVDSKLKDNQCIPAGIAPVTQKNDPGRWEVVPCLLSCMHGLSQIKLHVPDKKSGGSMDDPETRRRVGKSLLEVQRRFEDGIPHMDPIENMHIRDVEFKKLLRKIEVLESRLVANPLHNSGGSGGS
4XNJ , Knot 192 483 0.82 38 210 441
MEDKGKTFFGQPLGLSTLFMTEMWERFSYYGMRAILLYYMWFLISTGDLHITRATAASIMAIYASMVYLSGTIGGFVADRIIGARPAVFWGGVLIMLGHIVLALPFGASALFGSIILIIIGTGFLKPNVSTLVGTLYDEHDRRRDAGFSIFVFGINLGAFIAPLIVGAAQEAAGYHVAFSLAAIGMFIGLLVYYFGGKKTLDPHYLRPTDPLAPEEVKPLLVKVSLAVAGFIAIIVVMNLVGWNSLPAYINLLTIVAIAIPVFYFAWMISSVKVTSTEHLRVVSYIPLFIAAVLFWAIEEQGSVVLATFAAERVDSSWFPVSWFQSLNPLFIMLYTPFFAWLWTAWKKNQPSSPTKFAVGLMFAGLSFLLMAIPGALYGTSGKVSPLWLVGSWALVILGEMLISPVGLSVTTKLAPKAFNSQMMSMWFLSSSVGSALNAQLVTLYNAKSEVAYFSYFGLGSVVLGIVLVFLSKRIQGLMQGVE
1HMV , Knot 222 560 0.83 40 252 513
PISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEKEGKISKIGPENPYNTPVFAIKKKDSTKWRKLVDFRELNKRTQDFWEVQLGIPHPAGLKKKKSVTVLDVGDAYFSVPLDEDFRKYTAFTIPSINNETPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPFKKQNPDIVIYQYMDDLYVGSDLEIGQHRTKIEELRQHLLRWGLTTPDKKHQKEPPFLWMGYELHPDKWTVQPIVLPEKDSWTVNDIQKLVGKLNWASQIYPGIKVRQLCKLLRGTKALTEVIPLTEEAELELAENREILKEPVHGVYYDPSKDLIAEIQKQGQGQWTYQIYQEPFKNLKTGKYARMRGAHTNDVKQLTEAVQKITTESIVIWGKTPKFKLPIQKETWETWWTEYWQATWIPEWEFVNTPPLVKLWYQLEKEPIVGAETFYVDGAANRETKLGKAGYVTNKGRQKVVPLTNTTNQKTELQAIYLALQDSGLEVNIVTDSQYALGIIQAQPDKSESELVNQIIEQLIKKEKVYLAWVPAHKGIGGNEQVDKLVSAGIRKIL
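The normalized column divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). The sketch below recomputes that ratio from the tabulated values and also includes one possible LZ76-style (exhaustive history) parse. Which Lempel-Ziv variant the page actually uses is not stated, so `lz76_complexity` is an assumption and is not claimed to reproduce the \(\mathrm{LZ}(w)\) column.

```python
import math


def lz76_complexity(s: str) -> int:
    """Count components of an LZ76-style exhaustive-history parse.

    Assumption: each component is the shortest prefix of the remaining text
    that has not occurred earlier (overlapping occurrences allowed).
    """
    i, components, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend while the candidate already occurs, starting before position i.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) / (n / log_alphabet_size(n))."""
    return lz / (n / math.log(n, alphabet_size))


# (LZ(w), n) pairs copied from the table above.
for code, lz, n in [("6YGU", 102, 219), ("4XNJ", 192, 483), ("1HMV", 222, 560)]:
    print(code, normalized_lz(lz, n))
```

The three ratios come out near 0.838, 0.820 and 0.837, which agrees with the tabulated 0.83, 0.82, 0.83 if the page truncates to two decimals rather than rounding.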

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).
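As an illustration of the definition, the number of distinct length-\(k\) subwords (the quantity tabulated as \(p_w(1)\), \(p_w(2)\), \(p_w(3)\)) can be computed with a set comprehension. This is a minimal sketch; the helper names `subwords` and `subword_complexity` are hypothetical, and the demo input is only a short prefix of 6YGU_1.

```python
def subwords(w: str, k: int) -> set[str]:
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w: str, k: int) -> int:
    """Return the number of distinct length-k subwords of w."""
    return len(subwords(w, k))


# Hypothetical demo input: the first ten residues of 6YGU_1.
demo = "MDEALIKDYH"
for k in (1, 2, 3):
    print(k, subword_complexity(demo, k))
```

A word of length \(n\) has at most \(n-k+1\) distinct subwords of length \(k\), which gives a quick sanity check on any computed value.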

\(|P_{f(6YGU_1)}(2) \setminus P_{f(4XNJ_1)}(2)|=64\), \(|P_{f(4XNJ_1)}(2) \setminus P_{f(6YGU_1)}(2)|=121\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\); for example, \(Z_2(f(6YGU_1),f(4XNJ_1))=64+121=185\).

Hydrophobic-polar version of Sequence 1 (6YGU_1):
100111000001000100000011111001000100101101101100010011111110100001000010111010000110111010000101000100000111111110000011010111011001011001010110000110100100000110011010001001110101100101001010011001011000111011000110110
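The page does not state which residues count as hydrophobic. Comparing the opening of the binary string with the opening of the 6YGU_1 sequence suggests that A, F, G, I, L, M, P, V and W map to 1 and everything else to 0. The sketch below encodes that inferred rule; the set `HYDROPHOBIC` is therefore an assumption, not a documented setting of the browser.

```python
# Assumption: hydrophobic set inferred by aligning the start of the binary
# string above with the start of the 6YGU_1 sequence.
HYDROPHOBIC = set("AFGILMPVW")


def hp_encode(sequence: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)


# First twenty residues of 6YGU_1.
print(hp_encode("MDEALIKDYHSIREQIDQYT"))
```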
Pair \(Z_2\) Length of longest common subword
6YGU_1,4XNJ_1 185 3
6YGU_1,1HMV_1 199 4
4XNJ_1,1HMV_1 160 4
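Both columns of this table can be sketched in a few lines. The code below computes \(Z_k\) as the size of the symmetric difference of the two length-\(k\) subword sets, and treats the last column as the length of the longest common contiguous subword, which is an assumption based on the small values (3 and 4). Function names and demo inputs are illustrative only.

```python
def subwords(w: str, k: int) -> set[str]:
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x: str, y: str, k: int) -> int:
    """Return |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def longest_common_subword(x: str, y: str) -> int:
    """Length of the longest contiguous block shared by x and y (row-wise DP)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block that ended just before both positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


# Hypothetical demo inputs: ten-residue prefixes of 6YGU_1 and 4XNJ_1.
x, y = "MDEALIKDYH", "MEDKGKTFFG"
print(z_distance(x, y, 2), longest_common_subword(x, y))
```

Running the same functions on the full FASTA sequences is the natural way to check the tabulated values.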

Newick tree

 
(6YGU_1:10.85,(4XNJ_1:80,1HMV_1:80):20.85);

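Let \(d\) be the Otu--Sayood distance \(d\), the maximum of the two one-sided complexity differences.
Let \(d_1\) be the Otu--Sayood distance \(d_1\), the sum of the same two differences. (This makes the 4TYN sequence AAAAAA a close match...)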
Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{702}{\log_{20} 702}-\frac{219}{\log_{20} 219}\right)\approx 135\), where \(702=219+483\) is the combined length of the two sequences.
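The arithmetic in this estimate can be checked directly. The sketch below takes the factors 0.85 and 0.8 from the formula as given (the page does not explain them) and assumes that 702 is the combined length 219 + 483 of the two sequences; the helper name `scaled_length` is hypothetical.

```python
import math


def scaled_length(n: int, alphabet_size: int = 20) -> float:
    """Return n / log_alphabet_size(n)."""
    return n / math.log(n, alphabet_size)


n1, n2 = 219, 483  # lengths of 6YGU_1 and 4XNJ_1 from the table above
# Factors 0.85 and 0.8 are copied from the formula; their origin is not stated.
estimate = 0.85 * 0.8 * (scaled_length(n1 + n2) - scaled_length(n1))
print(round(estimate))
```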
Status Protein1 Protein2 d d1/2
Query variables 6YGU_1 4XNJ_1 171 125.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2. \end{cases} \]
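The interpolation can be tried on the query pair 6YGU_1, 4XNJ_1. In the sketch below the two one-sided values 171 and 80 are not printed on the page; they are back-solved from the reported \(d=171\) and \(d_1/2=125.5\) (so \(d_1=251\)) and should be read as an assumption. The function name `delta` is illustrative.

```python
def delta(alpha: float, a: float, b: float) -> float:
    """Return alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


# Assumption: one-sided values inferred from d = 171 and d1 = 2 * 125.5 = 251.
one_sided = (171, 80)

print(delta(0, *one_sided))    # corresponds to d
print(delta(0.5, *one_sided))  # corresponds to d1 / 2
```

Setting \(\alpha=0\) keeps only the larger value, while \(\alpha=1/2\) averages the two, which is why the second case equals \(d_1/2\).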