CoV2D Browser™

Parikh vectors
4ZNP_1 1YKF_1 5JVT_1 Letter Amino acid
0 20 6 D Aspartic acid
0 26 4 I Isoleucine
0 21 6 P Proline
0 9 6 S Serine
0 13 5 T Threonine
0 3 5 Q Glutamine
0 10 4 H Histidine
0 4 3 W Tryptophan
0 37 2 V Valine
0 14 6 R Arginine
0 10 5 N Asparagine
0 23 8 L Leucine
0 15 4 M Methionine
0 14 4 F Phenylalanine
21 35 7 A Alanine
17 4 1 C Cysteine
0 21 6 E Glutamic acid
24 43 8 G Glycine
0 24 8 K Lysine
0 6 6 Y Tyrosine
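
The counts in the table above can be reproduced with a short sketch. It assumes the Parikh vector is simply the number of occurrences of each of the 20 amino-acid letters, with any other symbol (such as U in the RNA sequence of 4ZNP_1) ignored; the function name `parikh_vector` is illustrative and not taken from the CoV2D code.

```python
from collections import Counter

# Letter order copied from the table above.
ALPHABET = "DIPSTQHWVRNLMFACEGKY"

def parikh_vector(sequence, alphabet=ALPHABET):
    """Return {letter: count} for every letter of `alphabet`.

    Symbols outside the alphabet (assumption: e.g. U in RNA) are ignored.
    """
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in alphabet}

# Sequence of 4ZNP_1 as listed on this page.
seq_4znp = "GGGAUACAGGACUGGCGGAUUAGUGGGAAACCACGUGGACUGUAUCCGAAAAAAAGCCGACCGCCUGGGCAUC"

vector = parikh_vector(seq_4znp)
# Show only the nonzero entries.
print({letter: n for letter, n in vector.items() if n})  # {'A': 21, 'C': 17, 'G': 24}
```

Only A, C and G are nonzero for 4ZNP_1 because those are the only letters the RNA alphabet shares with the amino-acid alphabet.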

>4ZNP_1|Chains A, B|pfI Riboswitch|synthetic construct (32630)
>1YKF_1|Chains A, B, C, D|NADP-DEPENDENT ALCOHOL DEHYDROGENASE|Thermoanaerobacter brockii (29323)
>5JVT_1|Chains A, D, G|Friend leukemia integration 1 transcription factor|Homo sapiens (9606)
Protein code \(c\) | LZ-complexity \(\mathrm{LZ}(w)\) | Length \(n=\lvert w\rvert\) | \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) | \(p_w(1)\) | \(p_w(2)\) | \(p_w(3)\) | Sequence \(w=f(c)\)
4ZNP | 23 | 73 | 0.45 | 8 | 16 | 42 | GGGAUACAGGACUGGCGGAUUAGUGGGAAACCACGUGGACUGUAUCCGAAAAAAAGCCGACCGCCUGGGCAUC
1YKF | 145 | 352 | 0.80 | 40 | 192 | 332 | MKGFAMLSIGKVGWIEKEKPAPGPFDAIVRPLAVAPCTSDIHTVFEGAIGERHNMILGHEAVGEVVEVGSEVKDFKPGDRVVVPAITPDWRTSEVQRGYHQHSGGMLAGWKFSNVKDGVFGEFFHVNDADMNLAHLPKEIPLEAAVMIPDMMTTGFHGAELADIELGATVAVLGIGPVGLMAVAGAKLRGAGRIIAVGSRPVCVDAAKYYGATDIVNYKDGPIESQIMNLTEGKGVDAAIIAGGNADIMATAVKIVKPGGTIANVNYFGEGEVLPVPRLEWGCGMAHKTIKGGLCPGGRLRMERLIDLVFYKRVDPSKLVTHVFRGFDNIEKAFMLMKDKPKDLIKPVVILA
5JVT | 55 | 104 | 0.81 | 40 | 92 | 101 | GPHMPGSGQIQLWQFLLELLSDSANASCITWEGTNGEFKMTDPDEVARRWGERKSKPNMNYDKLSRALRYYYDKNIMTKVHGKRYAYKFDFHGIAQALQPHPTE
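
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be checked from the tabulated LZ-complexity and length alone. The sketch below takes those two values as given (it does not recompute LZ-complexity) and assumes the page truncates, rather than rounds, to two decimals; the helper names are illustrative.

```python
import math

def lz_ratio(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_b n), with b = alphabet_size."""
    return lz / (n / math.log(n, alphabet_size))

def truncate2(x):
    """Drop everything after the second decimal place (assumed display rule)."""
    return math.floor(x * 100) / 100

# (LZ(w), n) pairs copied from the table above.
rows = {"4ZNP": (23, 73), "1YKF": (145, 352), "5JVT": (55, 104)}

for code, (lz, n) in rows.items():
    print(code, truncate2(lz_ratio(lz, n)))
```

Under these assumptions the three ratios come out as 0.45, 0.80 and 0.81, matching the table.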

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the FASTA sequence of the entry with 4-symbol Protein Data Bank code \(c\).
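
A minimal sketch of these definitions, assuming subwords are contiguous and are grouped by a fixed length `k` (the function names are illustrative, not taken from the CoV2D code):

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in the word w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))

# Small example: "abab" contains only "ab" and "ba" as length-2 subwords.
print(subword_complexity("abab", 2))  # 2

# Sequence of 4ZNP_1 as listed on this page: all 16 two-letter words
# over {A, C, G, U} occur in it.
seq_4znp = "GGGAUACAGGACUGGCGGAUUAGUGGGAAACCACGUGGACUGUAUCCGAAAAAAAGCCGACCGCCUGGGCAUC"
print(subword_complexity(seq_4znp, 2))  # 16
```

Using a set makes duplicates collapse automatically, so the count is the number of distinct windows rather than the number of positions.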

\(|P_{f(4ZNP_1)}(2) \setminus P_{f(1YKF_1)}(2)|=10\), \(|P_{f(1YKF_1)}(2) \setminus P_{f(4ZNP_1)}(2)|=186\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\), that is, the size of the symmetric difference of the two subword sets.

Hydrophobic-polar version of Sequence 1: 1111010111100110111001101111110010101110010100011111111100110010001110100
Pair \(Z_2\) Length of longest common subsequence
4ZNP_1,1YKF_1 196 3
4ZNP_1,5JVT_1 108 1
1YKF_1,5JVT_1 186 3
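
The \(Z_k\) column can be recomputed as the size of a symmetric difference of subword sets. The sketch below assumes contiguous length-`k` subwords and uses the two sequences exactly as listed on this page; `subwords` and `z_distance` are illustrative names.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in the word w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

seq_4znp = "GGGAUACAGGACUGGCGGAUUAGUGGGAAACCACGUGGACUGUAUCCGAAAAAAAGCCGACCGCCUGGGCAUC"
seq_1ykf = "MKGFAMLSIGKVGWIEKEKPAPGPFDAIVRPLAVAPCTSDIHTVFEGAIGERHNMILGHEAVGEVVEVGSEVKDFKPGDRVVVPAITPDWRTSEVQRGYHQHSGGMLAGWKFSNVKDGVFGEFFHVNDADMNLAHLPKEIPLEAAVMIPDMMTTGFHGAELADIELGATVAVLGIGPVGLMAVAGAKLRGAGRIIAVGSRPVCVDAAKYYGATDIVNYKDGPIESQIMNLTEGKGVDAAIIAGGNADIMATAVKIVKPGGTIANVNYFGEGEVLPVPRLEWGCGMAHKTIKGGLCPGGRLRMERLIDLVFYKRVDPSKLVTHVFRGFDNIEKAFMLMKDKPKDLIKPVVILA"

# One-sided difference: length-2 subwords of 4ZNP_1 that never occur in 1YKF_1.
print(len(subwords(seq_4znp, 2) - subwords(seq_1ykf, 2)))  # 10

# Full symmetric-difference count; the table above lists 196 for this pair.
print(z_distance(seq_4znp, seq_1ykf, 2))
```

Dividing `z_distance` by the size of the union of the two sets would give the usual Jaccard distance; the page reports only the numerator.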

Newick tree

 
[
	1YKF_1:10.81,
	[
		4ZNP_1:54,5JVT_1:54
	]:51.81
]

Let \(d\) and \(d_1\) be the Otu–Sayood distances \(d\) and \(d_1\), respectively. (The distance \(d_1\) makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{425 }{\log_{20} 425}-\frac{73}{\log_{20}73})\approx 108.\)
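
The arithmetic in this estimate can be checked directly. The sketch below takes the factors 0.85 and 0.8 and the lengths 425 and 73 from the line above without interpreting them (425 happens to equal 352 + 73, the combined length of the two query sequences); `scaled_length` is an illustrative name.

```python
import math

def scaled_length(n, alphabet_size=20):
    """Return n / log_b n, with b = alphabet_size."""
    return n / math.log(n, alphabet_size)

# Constants copied from the expression above.
estimate = 0.85 * 0.8 * (scaled_length(425) - scaled_length(73))
print(round(estimate))  # 108
```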
Status Protein1 Protein2 d d1/2
Query variables 4ZNP_1 1YKF_1 141 81
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2 \end{cases} \]
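
The interpolation can be illustrated with the query values tabulated above (\(d=141\), \(d_1/2=81\)). The sketch assumes that min and max refer to the smaller and larger of two one-sided quantities whose maximum is \(d\) and whose sum is \(d_1\); under that assumption the smaller quantity is \(2\cdot 81-141=21\), a derived value that is not shown on the page. The function name `delta` is illustrative.

```python
def delta(a, b, alpha):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

d = 141        # tabulated d for the pair 4ZNP_1, 1YKF_1
half_d1 = 81   # tabulated d1/2 for the same pair

# Assumption: d is the larger one-sided quantity and d1 is the sum of both,
# so the smaller one is d1 - d.
smaller = 2 * half_d1 - d

print(delta(d, smaller, 0))    # 141
print(delta(d, smaller, 0.5))  # 81.0
```

Setting `alpha` to 0 keeps only the maximum, while 1/2 averages the two quantities, which is why the two cases recover \(d\) and \(d_1/2\).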
