CoV2D Browser™

Parikh vectors
9HKE_1 6XHH_1 2PKD_1 Letter Amino acid
33 5 7 N Asparagine
36 16 5 L Leucine
11 2 2 M Methionine
5 5 1 W Tryptophan
14 11 7 R Arginine
2 2 0 C Cysteine
27 18 6 E Glutamic acid
29 13 5 G Glycine
25 8 5 K Lysine
18 9 6 S Serine
20 8 12 T Threonine
19 13 10 V Valine
15 16 4 Q Glutamine
21 11 2 F Phenylalanine
22 12 7 A Alanine
23 9 7 D Aspartic acid
6 3 2 H Histidine
23 11 9 I Isoleucine
16 5 6 P Proline
6 5 8 Y Tyrosine
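
A Parikh vector simply counts how often each letter occurs in a word. The following is a minimal sketch of how the 2PKD_1 column could be recomputed from the sequence listed further down the page; the function name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative and not part of the CoV2D code.

```python
from collections import Counter

# The 20 standard one-letter amino acid codes (illustrative constant).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parikh_vector(sequence: str) -> dict[str, int]:
    """Return the number of occurrences of each amino acid letter."""
    counts = Counter(sequence)
    # Letters that never occur are reported with a count of 0.
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}


# 2PKD_1 sequence as listed on this page.
seq_2pkd = (
    "MKDSEIFTVNGILGESVTFPVNIQEPRQVKIIAWTSKTSVAYVTPGDSETAPVVTVTHRNYYERIHALGP"
    "NYNLVISDLRMEDAGDYKADINTQADPYTTTKRYNLQIYRR"
)
vector = parikh_vector(seq_2pkd)
print(vector)
```

The entries of the vector always sum to the length of the sequence, which gives a quick consistency check against the Length column.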

>9HKE_1|Chain A|Flavin-dependent monooxygenase|Escherichia coli (562)
>6XHH_1|Chains A, B|JSC1_58120g3|[Leptolyngbya] sp. JSC-1 (1487953)
>2PKD_1|Chains A, B, C, D, E, F|SLAM family member 5|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
9HKE , Knot 159 371 0.84 40 221 358
NLLSDKNVAIIGGGPVGLTMAKLLQQNGIDVSVYERDNDREARIFGGTLDLHKGSGQEAMKKAGLLQTYYDLALPMGVNIADEKGNILSTKNVKPENRFDNPEINRNDLRAILLNSLENDTVIWDRKLVMLEPGKKKWTLTFENKPSETADLVIIANGGMSKVRKFVTDTEVEETGTFNIQADIHHPEVNCPGFFQLCNGNRLMAAHQGNLLFANPNNNGALHFGISFKTPDEWKNQTQVDFQNRNSVVDFLLKEFSDWDERYKELIRVTSSFVGLATRIFPLGKSWKSKRPLPITMIGDAAHLMPPFAGQGVNSGLMDALILSDNLTNGKFNSIEEAIENYEQQMFIYGKEAQEESTQNEIEMFKPDFTF
6XHH , Knot 91 182 0.86 40 139 178
MEQALNRVITKIRQVSDLESIFSTTTQEVRRLFGIERVTIYKFREDYFGDFITESEAGGWRKLVGSGWEDPYLNEHQGGRFQQNQPFVVDDIYLGETIWEEGKFNLQKPKRPLTDCHIEALESFEVKSCAVVAIFQGQKLWGLLSAFQNSAPRHWDEAEVQLLMRVADQLGVAIQQAEYLAQ
2PKD , Knot 60 111 0.84 38 93 107
MKDSEIFTVNGILGESVTFPVNIQEPRQVKIIAWTSKTSVAYVTPGDSETAPVVTVTHRNYYERIHALGPNYNLVISDLRMEDAGDYKADINTQADPYTTTKRYNLQIYRR
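
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and Length columns alone. The sketch below does this for the three rows of the table; the function name `normalized_lz` is illustrative. The computed values come out slightly above the tabulated ones, which suggests (as an inference, not something the page states) that the table truncates to two decimals rather than rounding.

```python
import math


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) / (n / log_k n) for a word of length n over k letters."""
    return lz / (n / math.log(n, alphabet_size))


# (code, LZ-complexity, length) taken from the table above.
rows = [("9HKE", 159, 371), ("6XHH", 91, 182), ("2PKD", 60, 111)]
for code, lz, n in rows:
    print(code, round(normalized_lz(lz, n), 4))
```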

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
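
As an illustration of these definitions, here is a minimal sketch that builds \(P_w(n)\) as a Python set and takes its size to get \(p_w(n)\). The function names `subwords` and `subword_complexity` are illustrative. The sketch follows the definition as written and is not guaranteed to reproduce every tabulated value, since the page's own counting conventions are not spelled out.

```python
def subwords(w: str, n: int) -> set[str]:
    """Return P_w(n): the set of distinct length-n contiguous subwords of w."""
    # If n exceeds len(w) the range is empty and so is the set.
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_complexity(w: str, n: int) -> int:
    """Return p_w(n), the cardinality of P_w(n)."""
    return len(subwords(w, n))


word = "abab"
print(sorted(subwords(word, 2)), subword_complexity(word, 2))
```

For a word of length \(m\), \(p_w(n)\) can never exceed \(m-n+1\), the number of positions at which a subword of length \(n\) can start.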

\(|P_{f(9HKE_1)}(2) \setminus P_{f(6XHH_1)}(2)|=125\), \(|P_{f(6XHH_1)}(2) \setminus P_{f(9HKE_1)}(2)|=43\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
01100001111111111101101100011010100000000101111010100101001100111100000111111101100010110000101000100101000010111100100001110001111011000101010001000101111101110010011000010001010101010010100111101001001111001011110100011101110100100100000101000001101110010010000001101000111110011111001000011110111011011111110110011101111000100101001001100000011101001000000001011010101
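
The page does not state which residues count as hydrophobic. Aligning the binary string with Sequence 1 suggests that A, F, G, I, L, M, P, V and W map to 1 and every other residue maps to 0; that set is an inference from the data shown here, not a documented convention. Under that assumption, the encoding can be sketched as follows (`HYDROPHOBIC` and `hp_encode` are illustrative names).

```python
# Assumed hydrophobic set, inferred by aligning the binary string with Sequence 1.
HYDROPHOBIC = frozenset("AFGILMPVW")


def hp_encode(sequence: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)


# First 20 residues of 9HKE_1 as listed on this page.
prefix = "NLLSDKNVAIIGGGPVGLTM"
print(hp_encode(prefix))
```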
Pair \(Z_2\) Length of longest common subword
9HKE_1,6XHH_1 168 4
9HKE_1,2PKD_1 180 4
6XHH_1,2PKD_1 146 3
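
Both columns of this table can be sketched in a few lines. \(Z_k\) is the size of the symmetric difference of the two subword sets, so for the first pair it is \(125+43=168\). The small values in the last column (3 or 4) are what one would expect for the longest common contiguous subword of two protein sequences, so the sketch assumes that reading. All function names are illustrative.

```python
def subwords(w: str, k: int) -> set[str]:
    """Return the set of distinct length-k contiguous subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x: str, y: str, k: int) -> int:
    """Return Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    # The symmetric difference contains exactly the subwords found in one word only.
    return len(subwords(x, k) ^ subwords(y, k))


def longest_common_subword_length(x: str, y: str) -> int:
    """Length of the longest contiguous block shared by x and y (dynamic programming)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block that ended just before both positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


print(z_distance("abab", "abba", 2), longest_common_subword_length("abcde", "xbcdy"))
```

The dynamic program uses time proportional to the product of the two lengths, which is negligible for sequences of a few hundred residues.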

Newick tree

 
(
	9HKE_1:91.25,
	(
		6XHH_1:73,2PKD_1:73
	):18.25
);

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)(\frac{553 }{\log_{20} 553}-\frac{182}{\log_{20}182})\approx 107\), where \(553=371+182\) is the combined length of the two sequences.
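
The arithmetic behind this estimate can be checked directly. The sketch below evaluates the expression as written, taking the factors 0.85 and 0.8 from the formula without further interpretation; the function name `expected_distance` is illustrative.

```python
import math


def expected_distance(total_len: int, single_len: int, alphabet_size: int = 20) -> float:
    """Evaluate (0.85)(0.8)(N / log_k N - m / log_k m) with k = alphabet_size."""
    def scaled(n: int) -> float:
        return n / math.log(n, alphabet_size)

    return 0.85 * 0.8 * (scaled(total_len) - scaled(single_len))


# 553 = 371 + 182 is the combined length of 9HKE_1 and 6XHH_1.
print(round(expected_distance(371 + 182, 182)))
```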
Status Protein1 Protein2 d d1/2
Query variables 9HKE_1 6XHH_1 134 99
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
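
The interpolation above can be made concrete with a short sketch. It rests on two assumptions that the page does not confirm: that the complexity \(c\) is the number of phrases in the LZ76 exhaustive-history parsing, and that \(\min\) and \(\max\) are taken over the two Otu–Sayood terms \(c(SQ)-c(S)\) and \(c(QS)-c(Q)\). With those assumptions, \(\alpha=0\) gives \(d\) and \(\alpha=1/2\) gives \(d_1/2\). All function names are illustrative, and the output is not guaranteed to match the values reported in the table.

```python
def lz76(w: str) -> int:
    """Number of phrases in the LZ76 (exhaustive history) parsing of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while it still occurs starting at an earlier position;
        # the search window ends one symbol before the phrase's own end.
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        phrases += 1
        i += length
    return phrases


def otu_sayood_terms(s: str, q: str) -> tuple[int, int]:
    """Return (c(SQ) - c(S), c(QS) - c(Q)), assuming c is LZ76 complexity."""
    return lz76(s + q) - lz76(s), lz76(q + s) - lz76(q)


def delta(s: str, q: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max over the two terms."""
    a, b = otu_sayood_terms(s, q)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


s, q = "abab", "abba"
print(delta(s, q, 0), 2 * delta(s, q, 0.5))  # d and d_1 under the stated assumptions
```

Under this reading, \(d=134\) and \(d_1/2=99\) for the query pair would correspond to the two terms 134 and 64.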