CoV2D Browser™

Parikh vectors
4AOS_1 6AHE_1 2LQL_1 Letter Amino acid
40 14 11 E Glutamic acid
12 14 2 K Lysine
6 9 3 M Methionine
22 13 3 F Phenylalanine
27 8 8 P Proline
33 17 7 S Serine
34 11 3 T Threonine
54 24 4 G Glycine
11 5 4 H Histidine
12 2 1 W Tryptophan
20 7 3 Y Tyrosine
45 20 5 V Valine
37 17 1 D Aspartic acid
26 14 4 I Isoleucine
18 9 2 N Asparagine
2 3 8 C Cysteine
10 7 13 Q Glutamine
39 24 6 L Leucine
59 37 16 A Alanine
42 12 9 R Arginine
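
Each column of the table above is a Parikh vector: the number of occurrences of each letter in the sequence, ignoring order. As a minimal sketch (the names `AMINO_ACIDS` and `parikh_vector` are illustrative, not part of the CoV2D site), such a vector could be computed as follows:

```python
from collections import Counter

# The 20 standard amino-acid one-letter codes (illustrative constant).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return a dict mapping each letter of the alphabet to its count."""
    counts = Counter(sequence)
    # Counter returns 0 for letters that never occur.
    return {letter: counts[letter] for letter in alphabet}

# First ten residues of 2LQL_1, used only as a short demonstration input.
prefix = "GSHMQAALEV"
vector = parikh_vector(prefix)
print({letter: n for letter, n in vector.items() if n > 0})
```

Applied to a full sequence, the counts sum to the sequence length, which gives a quick consistency check against the Length column further down.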

>4AOS_1|Chain A|STEROID MONOOXYGENASE|RHODOCOCCUS RHODOCHROUS (1829)
>6AHE_1|Chains A, B, C, D|Enoyl-[acyl-carrier-protein] reductase [NADH]|Acinetobacter baumannii ATCC 19606 = CIP 70.34 = JCM 6841 (575584)
>2LQL_1|Chain A|Coiled-coil-helix-coiled-coil-helix domain-containing protein 5|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
4AOS , Knot 216 549 0.82 40 246 506
MNGQHPRSVVTAPDATTGTTSYDVVVVGAGIAGLYAIHRFRSQGLTVRAFEAASGVGGVWYWNRYPGARCDVESIDYSYSFSPELEQEWNWSEKYATQPEILAYLEHVADRFDLRRDIRFDTRVTSAVLDEEGLRWTVRTDRGDEVSARFLVVAAGPLSNANTPAFDGLDRFTGDIVHTARWPHDGVDFTGKRVGVIGTGSSGIQSIPIIAEQAEQLFVFQRSANYSIPAGNVPLDDATRAEQKANYAERRRLSRESGGGSPHRPHPKSALEVSEEERRAVYEERWKLGGVLFSKAFPDQLTDPAANDTARAFWEEKIRAVVDDPAVAELLTPKDHAIGAKRIVTDSGYYETYNRDNVELVDLRSTPIVGMDETGIVTTGAHYDLDMIVLATGFDAMTGSLDKLEIVGRGGRTLKETWAAGPRTYLGLGIDGFPNFFNLTGPGSPSVLANMVLHSELHVDWVADAIAYLDARGAAGIEGTPEAVADWVEECRNRAEASLLNSANSWYLGANIPGRPRVFMPFLGGFGVYREIITEVAESGYKGFAILEG
6AHE , Knot 119 267 0.83 40 169 252
MTQGLLAGKRFLIAGVASKLSIAYGIAQALHREGAELAFTYPNEKLKKRVDEFAEQFGSKLVFPCDVAVDAEIDNAFAELAKHWDGVDGVVHSIGFAPAHTLDGDFTDVTDRDGFKIAHDISAYSFVAMARAAKPLLQARQGCLLTLTYQGSERVMPNYNVMGMAKASLEAGVRYLASSLGVDGIRVNAISAGPIRTLAASGIKSFRKMLDANEKVAPLKRNVTIEEVGNAALFLCSPWASGITGEILYVDAGFNTVGMSQSMMDDE
2LQL , Knot 59 113 0.82 40 89 108
GSHMQAALEVTARYCGRELEQYGQCVAAKPESWQRDCHYLKMSIAQCTSSHPIIRQIRQACAQPFEAFEECLRQNEAAVGNCAEHMRRFLQCAEQVQPPRSPATVEAQPLPAS
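
The page does not say which Lempel-Ziv variant produces the LZ-complexity column. The sketch below assumes the exhaustive-history parsing of Lempel and Ziv (1976), and the function names are hypothetical. It also shows the normalization \(\mathrm{LZ}(w)/(n/\log_{20} n)\) from the table header.

```python
import math

def lz76_complexity(w):
    """Number of components in the LZ76 exhaustive-history parsing of w.

    Assumption: this is the variant meant by the page; it is not confirmed.
    """
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the component while it can be copied from an occurrence
        # that starts before position i (overlap with the component allowed).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_b(n), with b the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# Values taken from the 4AOS row of the table: LZ(w) = 216, n = 549.
print(round(normalized_lz(216, 549), 3))
```

For the 4AOS row this gives roughly 0.83, so the tabulated 0.82 appears to be truncated rather than rounded.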

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).
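
As an illustration of these definitions, the following sketch collects the distinct length-\(k\) subwords of a word and counts them. The names `subwords` and `subword_complexity` are illustrative only.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Cardinality of the set of distinct length-k subwords of w."""
    return len(subwords(w, k))

# Small demonstration on a toy word rather than a protein sequence.
word = "abab"
print(sorted(subwords(word, 2)), subword_complexity(word, 2))
```

If `k` exceeds the length of the word, the range is empty and the complexity is 0.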

\(|P_{f(4AOS_1)}(2) \setminus P_{f(6AHE_1)}(2)|=114\), \(|P_{f(6AHE_1)}(2) \setminus P_{f(4AOS_1)}(2)|=37\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\), that is, the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\).

Hydrophobic-polar version of Sequence 1:
101001001101101001000001111111111101100100011010110110111111010001110001001000001010100010100001001011101001100101000101000100111000110101000010010101111111110010011101100101011001011001101010011111010011001111100100111100010001111011100100100010010000100001110100101001101000000110000101111110011100100111000101110001011100111101101000111100110001000000000101101000111110001110011000101111101101101010010111011001000111110001111101110110101110101110111000101011101110101011111010101110110000001010110010010111011101011111111111000110011001001111101
Pair \(Z_2\) Length of longest common subsequence
4AOS_1,6AHE_1 151 4
4AOS_1,2LQL_1 219 4
6AHE_1,2LQL_1 168 4
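
The \(Z_2\) column is the sum of the two set differences of length-2 subwords; for the pair 4AOS_1, 6AHE_1 this is 114 + 37 = 151, matching the first row. A minimal sketch of that computation, with hypothetical function names, might look like this:

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Size of the symmetric difference of the length-k subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    # |P_x(k) \ P_y(k)| + |P_y(k) \ P_x(k)|
    return len(px - py) + len(py - px)

# Toy words: "abba" contains "bb", which "abab" lacks; nothing else differs.
print(z_distance("abab", "abba", 2))
```

The function is symmetric in its two arguments and returns 0 exactly when both words have the same set of length-\(k\) subwords.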

Newick tree

 
[
	2LQL_1:10.91,
	[
		4AOS_1:75.5,6AHE_1:75.5
	]:28.41
]

Let d be the Otu–Sayood distance d.
Let d1 be the Otu–Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{816}{\log_{20} 816}-\frac{267}{\log_{20} 267}\right)\approx 150\), where \(816=549+267\) is the length of the two sequences concatenated.
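
The arithmetic behind this estimate can be checked directly. The sketch below only re-evaluates the stated formula; the function name and parameter names are illustrative, and the factors 0.85 and 0.8 are taken from the text without further interpretation.

```python
import math

def expected_distance(n_concat, n_single, factor=0.85 * 0.8, base=20):
    """Evaluate factor * (n_concat/log_b(n_concat) - n_single/log_b(n_single))."""
    term_concat = n_concat / math.log(n_concat, base)
    term_single = n_single / math.log(n_single, base)
    return factor * (term_concat - term_single)

# 816 = 549 + 267 (lengths of 4AOS_1 and 6AHE_1); 267 is the length of 6AHE_1.
print(round(expected_distance(816, 267), 1))
```

The result lands close to 150, in line with the rough estimate quoted above.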
Status Protein1 Protein2 d d1/2
Query variables 4AOS_1 6AHE_1 187 136
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
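
The interpolation can be made concrete with a short sketch. It assumes, as in the usual Otu–Sayood definitions, that min and max range over the two directional terms \(c(SQ)-c(S)\) and \(c(QS)-c(Q)\); the page does not state this explicitly. The function name is hypothetical, and the value 85 is not shown on the page: it is inferred from the tabulated \(d=187\) and \(d_1/2=136\) as \(2\cdot 136-187\).

```python
def delta(alpha, a, b):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Directional terms for the pair 4AOS_1, 6AHE_1.
# 187 is the tabulated d; 85 is an inferred value (assumption, see above).
larger, smaller = 187, 85

print(delta(0, smaller, larger))    # alpha = 0 recovers d
print(delta(0.5, smaller, larger))  # alpha = 1/2 recovers d1/2
```

With \(\alpha=0\) only the larger term survives, and with \(\alpha=1/2\) the result is the average of the two terms, which is why the table reports \(d_1/2\) rather than \(d_1\).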