CoV2D Browser™

Parikh vectors
3BNH_1 1QNS_1 6PHJ_1 Letter Amino acid
42 8 0 E Glutamic acid
30 18 2 L Leucine
11 2 1 M Methionine
18 10 0 P Proline
11 15 1 W Tryptophan
20 18 2 Y Tyrosine
10 8 0 C Cysteine
30 34 1 G Glycine
25 29 4 S Serine
25 33 3 T Threonine
58 10 1 K Lysine
16 16 2 F Phenylalanine
45 28 1 A Alanine
17 5 2 R Arginine
16 28 1 N Asparagine
32 18 3 D Aspartic acid
16 7 1 H Histidine
22 13 0 I Isoleucine
22 24 1 V Valine
19 20 3 Q Glutamine
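A Parikh vector records only how often each letter occurs and ignores order. Here is a minimal Python sketch, assuming each sequence is a plain string of one-letter amino-acid codes. `LETTERS` and `parikh` are illustrative names, and `LETTERS` follows the row order of the table above:

```python
from collections import Counter

# One-letter amino-acid codes, in the row order of the Parikh-vector table.
LETTERS = "ELMPWYCGSTKFARNDHIVQ"

def parikh(seq: str) -> list[int]:
    """Return the Parikh vector of seq: the count of each letter in LETTERS."""
    counts = Counter(seq)
    return [counts[a] for a in LETTERS]

glucagon = "HSQGTFTSDYSKYLDSRRAQDFVQWLMNT"  # the 6PHJ_1 sequence
print(parikh(glucagon))  # matches the 6PHJ_1 column of the table
```

The entries of each Parikh vector sum to the sequence length (485, 344 and 29 here).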

>3BNH_1|Chain A|Cytochrome c-552|Wolinella succinogenes (844)
>1QNS_1|Chain A|ENDO-1,4-B-D-MANNANASE|TRICHODERMA REESEI (51453)
>6PHJ_1|Chain A|Glucagon|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
3BNH 191 485 0.81 20 243 444
SNINEREKERVALNKTAHSQGIEGKAMSEEWARYYPRQFDSWKKTKESDNITDMLKEKPALVVAWAGYPFSKDYNAPRGHYYALQDNINTLRTGAPVDGKTGPLPSACWTCKSPDVPRIIEQDGELEYFTGKWAKYGDEIVNTIGCYNCHDDKSAELKSKVPYLDRGLSAAGFKTFAESTHQEKRSLVCAQCHVEFYFKKTEWKDDKGVDKTAMVVTLPWSKGISTEQMEAYYDEINFADWTHGISKTPMLKAQHPDWELYKTGIHGQKGVSCADCHMPYTQEGAVKYSDHKVGNPLDNMDKSCMNCHRESEQKLKDIVKQKFERKEFLQDIAFDNIGKAHLETGKAMELGATDAELKEIRTHIRHAQWRADMAIAGHGSFFHAPEEVLRLLASGNEEAQKARIKLVKVLAKYGAIDYVAPDFETKEKAQKLAKVDMEAFIAEKLKFKQTLEQEWKKQAIAKGRLNPESLKGVDEKSSYYDKTKK
1QNS 144 344 0.81 20 198 337
ASSFVTISGTQFNIDGKVGYFAGTNCYWCSFLTNHADVDSTFSHISSSGLKVVRVWGFNDVNTQPSPGQIWFQKLSATGSTINTGADGLQTLDYVVQSAEQHNLKLIIPFVNNWSDYGGINAYVNAFGGNATTWYTNTAAQTQYRKYVQAVVSRYANSTAIFAWELGNEPRCNGCSTDVIVQWATSVSQYVKSLDSNHLVTLGDEGLGLSTGDGAYPYTYGEGTDFAKNVQIKSLDFGTFHLYPDSWGTNYTWGNGWIQTHAAACLAAGKPCVFEEYGAQQNPCTNEAPWQTTSLTTRGMGGDMFWQWGDTFANGAQSNSDPYTVWYNSSNWQCLVKNHVDAIN
6PHJ 20 29 0.77 16 28 27
HSQGTFTSDYSKYLDSRRAQDFVQWLMNT
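The LZ-complexity and ratio columns can be sketched in Python as follows. This assumes the common exhaustive-history LZ76 parse. The page's own implementation may differ in edge cases, and `lz76` and `normalized_lz` are hypothetical names. On 6PHJ_1 this sketch parses H|S|Q|G|T|F|TS|D|Y|SK|YL|DS|R|RA|QD|FV|QW|LM|N|T, which is 20 phrases, and gives a ratio of about 0.775, shown in the table as 0.77:

```python
import math

def lz76(w: str) -> int:
    """Count the phrases in an LZ76 (exhaustive history) parse of w.

    Each new phrase is the shortest prefix of the unparsed remainder that does
    not already occur in the text read so far (overlap allowed); a final
    phrase that is only a copy still counts.
    """
    n = len(w)
    i = 0          # start of the current phrase
    phrases = 0
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from earlier text.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(w: str) -> float:
    """LZ(w) / (n / log_20 n); assumes len(w) >= 2 so the logarithm is nonzero."""
    n = len(w)
    return lz76(w) / (n / math.log(n, 20))

glucagon = "HSQGTFTSDYSKYLDSRRAQDFVQWLMNT"  # 6PHJ_1
print(lz76(glucagon), normalized_lz(glucagon))
```

The ratio compares LZ(w) with n / log_20 n, the order of growth expected for a random sequence over a 20-letter alphabet.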

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
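These definitions translate directly into set comprehensions. Here is a minimal Python sketch in which `subwords` and `p` are hypothetical names standing for \(P_w(n)\) and \(p_w(n)\):

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the set of distinct length-n subwords (contiguous intervals) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def p(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|, the number of distinct length-n subwords."""
    return len(subwords(w, n))

glucagon = "HSQGTFTSDYSKYLDSRRAQDFVQWLMNT"  # 6PHJ_1
print([p(glucagon, n) for n in (1, 2, 3)])
```

For 6PHJ_1, every length-2 and length-3 subword is distinct, so \(p_w(2)=28\) and \(p_w(3)=27\), the maximum possible for a word of length 29.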

\(|P_{f(3BNH_1)}(2) \setminus P_{f(1QNS_1)}(2)|=114\) and \(|P_{f(1QNS_1)}(2) \setminus P_{f(3BNH_1)}(2)|=69\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\). Thus \(Z_2(f(3BNH_1),f(1QNS_1))=114+69=183\).
Hydrophobic-polar version of Sequence 1 (3BNH_1):
00100000001110001000110101100011000100100100000000100110001111111110110000011010001100010010011110100111101010000101101100010100101011001001100110000000001010001101001101111001100000000011010001010100001000011000111101110011000010100001011010011000111010010101000110100110010001100001110000001101100100001000000000100110001000011001110011010100101101110010100100010010101011111010110110011011101000100101011011100111001110100000100110101011110010100010001000111010101001011000000000000
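The hydrophobic-polar string replaces each residue with 1 (hydrophobic) or 0 (polar). The page does not state which residues it treats as hydrophobic. The set below, A, F, G, I, L, M, P, V, W, is an assumption inferred from the string shown: it reproduces the start of that string and places C and Y among the polar residues. Here is a minimal Python sketch with the hypothetical name `hp`:

```python
# Assumption: hydrophobic residues inferred from the displayed HP string;
# every other one-letter code is treated as polar.
HYDROPHOBIC = frozenset("AFGILMPVW")

def hp(seq: str) -> str:
    """Map an amino-acid sequence to its hydrophobic (1) / polar (0) string."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

start_3bnh = "SNINEREKERVALNKTAHSQGIEGKAMSEEWARYYPRQFD"  # first 40 residues of 3BNH_1
print(hp(start_3bnh))
```

Because the output is again a word, the same \(P_w(n)\), \(Z_k\) and LZ machinery applies to it over the binary alphabet.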
Pair \(Z_2\) Length of longest common subword (substring)
3BNH_1,1QNS_1 183 3
3BNH_1,6PHJ_1 231 4
1QNS_1,6PHJ_1 194 3
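Both columns of the table above can be computed from the definitions. Here is a minimal Python sketch with the hypothetical helpers `z` and `longest_common_subword`. The second uses the textbook dynamic-programming recurrence for the longest common contiguous substring:

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x: str, y: str, k: int) -> int:
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword(x: str, y: str) -> int:
    """Length of the longest contiguous subword shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length of x[:i-1], y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

glucagon = "HSQGTFTSDYSKYLDSRRAQDFVQWLMNT"  # 6PHJ_1
print(z("HSQG", "SQGT", 2), longest_common_subword("KTAHSQGIEG", glucagon))
```

For example, the 3BNH_1 fragment KTAHSQGIEG shares the subword HSQG with 6PHJ_1, which is consistent with the length 4 in the 3BNH_1,6PHJ_1 row.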

Newick tree

 
(
	6PHJ_1:11.24,
	(
		3BNH_1:91.5,1QNS_1:91.5
	):19.74
);
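Path lengths between leaves can be read directly off such a tree. Here is a minimal Python sketch with the hypothetical helpers `parse_newick` and `patristic`, applied to the tree written in standard parenthesized Newick. It handles only unquoted names and branch lengths, not comments or quoted labels:

```python
def parse_newick(text: str) -> dict:
    """Parse names and branch lengths into nested {"name", "length", "children"} dicts."""
    s = "".join(text.split()).rstrip(";")  # drop all whitespace and the final ";"
    pos = 0

    def node() -> dict:
        nonlocal pos
        children = []
        if s[pos] == "(":
            pos += 1                       # consume "("
            children.append(node())
            while s[pos] == ",":
                pos += 1                   # consume ","
                children.append(node())
            pos += 1                       # consume ")"
        start = pos
        while pos < len(s) and s[pos] not in ",():":
            pos += 1
        name = s[start:pos]
        length = 0.0
        if pos < len(s) and s[pos] == ":":
            pos += 1
            start = pos
            while pos < len(s) and s[pos] not in ",()":
                pos += 1
            length = float(s[start:pos])
        return {"name": name, "length": length, "children": children}

    return node()

def patristic(tree: dict, a: str, b: str) -> float:
    """Sum of branch lengths on the path between leaves a and b."""
    paths = {}

    def walk(n: dict, path: list, depth: float) -> None:
        depth += n["length"]
        path = path + [(id(n), depth)]     # (node identity, distance from root)
        if not n["children"]:
            paths[n["name"]] = path
        for c in n["children"]:
            walk(c, path, depth)

    walk(tree, [], 0.0)
    pa, pb = paths[a], paths[b]
    lca_depth = 0.0
    for (ia, da), (ib, _) in zip(pa, pb):  # shared prefix = common ancestors
        if ia != ib:
            break
        lca_depth = da
    return pa[-1][1] + pb[-1][1] - 2 * lca_depth

TREE = parse_newick("(6PHJ_1:11.24,(3BNH_1:91.5,1QNS_1:91.5):19.74);")
print(patristic(TREE, "3BNH_1", "1QNS_1"))
```

The 3BNH_1 to 1QNS_1 path is 91.5 + 91.5 = 183, the same value as their \(Z_2\) in the table. The tree-building method is not stated here, so this sketch only reads distances off the given tree.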

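Let \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\) be the Otu--Sayood distance \(d\).
Let \(d_1(x,y)=\bigl(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\bigr)+\bigl(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\bigr)\) be the Otu--Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)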
Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{829}{\log_{20} 829}-\frac{344}{\log_{20}344}\right)\approx 131\).
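The estimate is plain arithmetic and is easy to re-check. In the sketch below, reading 829 as 485 + 344, the combined length of 3BNH_1 and 1QNS_1, is an interpretation. The factors 0.85 and 0.8 are taken as given from the text:

```python
import math

def log20(n: float) -> float:
    """Logarithm base 20 (one symbol per amino acid)."""
    return math.log(n) / math.log(20)

# 829 = 485 + 344, presumably |3BNH_1| + |1QNS_1|; 0.85 and 0.8 taken as given.
estimate = 0.85 * 0.8 * (829 / log20(829) - 344 / log20(344))
print(round(estimate))  # 131
```

The unrounded value is about 131.3, which lies between the observed \(d_1/2=142\) and \(d=166\) only in order of magnitude, as a rough estimate should.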
Status Protein1 Protein2 d d1/2
Query variables 3BNH_1 1QNS_1 166 142
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min\{a,b\} + (1-\alpha) \max\{a,b\}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2, \end{cases} \qquad a=\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ b=\mathrm{LZ}(yx)-\mathrm{LZ}(y). \]
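The interpolation can be checked on toy inputs. Here is a minimal Python sketch that assumes the Otu–Sayood distances are built from the increments \(a=\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(b=\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), with \(d=\max\{a,b\}\) and \(d_1=a+b\). It also assumes the same exhaustive-history LZ76 parse as a stand-in for the page's implementation. All function names are hypothetical:

```python
def lz76(w: str) -> int:
    """Phrase count of an LZ76 exhaustive-history parse (overlap allowed)."""
    n, i, phrases = len(w), 0, 0
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from earlier text.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def increments(x: str, y: str) -> tuple[int, int]:
    """a = LZ(xy) - LZ(x) and b = LZ(yx) - LZ(y)."""
    return lz76(x + y) - lz76(x), lz76(y + x) - lz76(y)

def delta(x: str, y: str, alpha: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    a, b = increments(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

def d(x: str, y: str) -> int:
    return max(increments(x, y))   # equals delta(x, y, 0)

def d1(x: str, y: str) -> int:
    a, b = increments(x, y)
    return a + b                   # so delta(x, y, 0.5) == d1(x, y) / 2

print(d("AAAA", "AB"), d1("AAAA", "AB"), delta("AAAA", "AB", 0.5))
```

Here LZ(AAAAAB) = 2 and LZ(ABAAAA) = 4, so \(a=0\), \(b=2\), \(d=2\) and \(d_1/2=1\). Because \(\min+\max=a+b\), the two cases of \(\delta\) hold for any pair.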