CoV2D Browser™

Parikh vectors
6QVS_1 1ZGN_1 2QNL_1 Letter Amino acid
8 2 7 H Histidine
17 12 8 K Lysine
18 10 9 S Serine
6 2 3 W Tryptophan
15 12 5 Y Tyrosine
11 8 8 R Arginine
14 8 11 N Asparagine
15 18 7 G Glycine
6 4 0 C Cysteine
14 14 6 V Valine
17 7 5 I Isoleucine
25 32 27 L Leucine
18 11 8 P Proline
16 9 8 T Threonine
11 15 15 A Alanine
12 13 10 Q Glutamine
15 10 9 E Glutamic acid
17 13 8 D Aspartic acid
7 2 3 M Methionine
13 7 5 F Phenylalanine
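
A Parikh vector records how often each letter occurs in a sequence. The following is a minimal sketch, assuming the 20 one-letter amino acid codes in the row order of the table above. The name `parikh_vector` and the toy input are illustrative and are not taken from the CoV2D code.

```python
from collections import Counter

# Row order of the table above (one-letter amino acid codes).
ALPHABET = "HKSWYRNGCVILPTAQEDMF"

def parikh_vector(sequence, alphabet=ALPHABET):
    """Return the letter counts of `sequence`, one entry per letter of `alphabet`."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]

# Toy input (hypothetical, not one of the three proteins).
example = parikh_vector("GIKFSAEALR")
```

Letters that do not occur get a count of 0, as in the Cysteine entry for 2QNL_1.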

>6QVS_1|Chains A, B|Beta-galactoside alpha-2,6-sialyltransferase 1|Homo sapiens (9606)
>1ZGN_1|Chains A, B|Glutathione S-transferase P|Homo sapiens (9606)
>2QNL_1|Chain A|Uncharacterized protein|Cytophaga hutchinsonii ATCC 33406 (269798)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6QVS , Knot 126 275 0.85 40 193 271
GIKFSAEALRCHLRDHVNVSMVEVTDFPFNTSEWEGYLPKESIRTKAGPWGRCAVVSSAGSLKSSQLGREIDDHDAVLRFNGAPTANFQQDVGTKTTIRLMNSQLVTTEKRFLKDSLYNEGILIVWDPSVYHSDIPKWYQNPDYNFFNNYKTYRKLHPNQPFYILKPQMPWELWDILQEISPEEIQPNPPSSGMLGIIIMMTLCDQVDIYEFLPSKRKTDVCYYYQKFFDSACTMGAYHPLLYEKNLVKHLNQGTDEDIYLLGKATLPGFRTIHC
1ZGN , Knot 97 209 0.82 40 137 200
PPYTVVYFPVRGRCAALRMLLADQGQSWKEEVVTVETWQEGSLKASCLYGQLPKFQDGDLTLYQSNTILRHLGRTLGLYGKDQQEAALVDMVNDGVEDLRCKYISLIYTNYEAGKDDYVKALPGQLKPFETLLSQNQGGKTFIVGDQISFADYNLLDLLLIHEVLAPGCLDAFPLLSAYVGRLSARPKLKAFLASPEYVNLPINGNGKQ
2QNL , Knot 78 162 0.81 38 120 157
GMHTQEALFVRLALDAWNTQSSRTDKLIQSLSNEALAVETAPGRNSGTYLLGHLTAVHDAMLPLLELGDTLYPQLAPVFIQNPDKSGLEKPEINDLRLYWSLVQERLANQFNQLQPADWFNKHAAISREDFLKEPHRNKLSVLINRTNHMAYHLGQLAYLKK
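
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. This is a small sketch using the values copied from the table. The helper name `normalized_lz` is hypothetical, and truncation to two decimals is only an assumption about how the page displays the ratio.

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) / (n / log_b n), with b the alphabet size (20 amino acids)."""
    return lz / (n / math.log(n, alphabet_size))

# (LZ-complexity, length) pairs copied from the table above.
rows = {"6QVS": (126, 275), "1ZGN": (97, 209), "2QNL": (78, 162)}
ratios = {code: normalized_lz(lz, n) for code, (lz, n) in rows.items()}

# Assumed display rule: truncate (not round) to two decimals.
truncated = {code: math.floor(r * 100) / 100 for code, r in ratios.items()}
```

Under that assumption the truncated values agree with the 0.85, 0.82 and 0.81 shown in the table.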

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
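
As an illustration of these definitions, here is a minimal sketch that reads "subword" as a contiguous block of letters. The function names and the toy words are assumptions for the example, and the code has not been checked against the table values.

```python
def subwords(w, n):
    """P_w(n): the set of distinct length-n subwords (contiguous blocks) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """p_w(n): the number of distinct length-n subwords of w."""
    return len(subwords(w, n))

# Toy word: "abcab" has the length-2 subwords ab, bc, ca.
toy = subword_complexity("abcab", 2)
```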

\(|P_{f(6QVS_1)}(2) \setminus P_{f(1ZGN_1)}(2)|=111\), \(|P_{f(1ZGN_1)}(2) \setminus P_{f(6QVS_1)}(2)|=55\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
11010101100010001010110100111000010101100010001111100111001101000011001000011101011101010001100001011000110000011000100011111101010000110100010001100000000101001101101011101101100101001010110011111111101000101001110000001000000110010011100111000011001001000010111010111100100
Pair \(Z_2\) Length of longest common subsequence
6QVS_1,1ZGN_1 166 4
6QVS_1,2QNL_1 183 4
1ZGN_1,2QNL_1 147 3
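
\(Z_k\) is the size of the symmetric difference of the two subword sets, so for the first pair \(Z_2 = 111 + 55 = 166\). Below is a minimal sketch on toy words. The names `z_k` and `jaccard_distance` are hypothetical, and dividing by the size of the union is the usual Jaccard normalization rather than something the page states.

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous blocks) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x, y, k):
    """|P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|: size of the symmetric difference."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def jaccard_distance(x, y, k):
    """Z_k divided by the size of the union (assumed normalization)."""
    union = subwords(x, k) | subwords(y, k)
    return z_k(x, y, k) / len(union) if union else 0.0

# Toy words: P("abab", 2) = {ab, ba}, P("abba", 2) = {ab, bb, ba}.
toy_z = z_k("abab", "abba", 2)
```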

Newick tree

 
[
	6QVS_1:91.50,
	[
		1ZGN_1:73.5,2QNL_1:73.5
	]:18.00
]
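
The branch lengths above are consistent with complete-linkage clustering on the \(Z_2\) values, with each merge placed at half the linkage distance: \(147/2 = 73.5\), \(183/2 = 91.5\), and \(91.5 - 73.5 = 18\). This is an observation about the numbers, not a statement of how the page builds its tree. The sketch below makes that assumption explicit. The function name is hypothetical, and it emits standard parenthesized Newick rather than the bracketed layout shown above.

```python
from itertools import combinations

def complete_linkage_newick(dist):
    """dist maps frozenset({a, b}) to the distance between leaves a and b."""
    leaves = sorted({leaf for pair in dist for leaf in pair})
    # Each cluster is (member set, Newick text, height of its root).
    clusters = [(frozenset([leaf]), leaf, 0.0) for leaf in leaves]

    def linkage(c1, c2):
        # Complete linkage: largest leaf-to-leaf distance between clusters.
        return max(dist[frozenset((a, b))] for a in c1[0] for b in c2[0])

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda p: linkage(*p))
        height = linkage(a, b) / 2  # assumed: merge at half the distance
        text = f"({a[1]}:{height - a[2]:.2f},{b[1]}:{height - b[2]:.2f})"
        clusters = [c for c in clusters if c is not a and c is not b]
        clusters.append((a[0] | b[0], text, height))
    return clusters[0][1] + ";"

# Z_2 values copied from the table of pairs.
z2 = {
    frozenset({"6QVS_1", "1ZGN_1"}): 166,
    frozenset({"6QVS_1", "2QNL_1"}): 183,
    frozenset({"1ZGN_1", "2QNL_1"}): 147,
}
tree = complete_linkage_newick(z2)
# → (6QVS_1:91.50,(1ZGN_1:73.50,2QNL_1:73.50):18.00);
```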

Let \(d\) and \(d_1\) denote the Otu–Sayood distances of the same names. (The latter makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{484}{\log_{20} 484}-\frac{209}{\log_{20} 209}\right)\approx 79.7\).
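
The arithmetic in this estimate can be checked directly. Note that \(484 = 275 + 209\) is the combined length of 6QVS_1 and 1ZGN_1. The sketch below takes the factors 0.85 and 0.8 exactly as printed, and the helper name `n_over_log` is hypothetical.

```python
import math

def n_over_log(n, base=20):
    """n / log_base(n), the normalizing term used in the estimate."""
    return n / math.log(n, base)

# Factors 0.85 and 0.8 are taken as printed on the page.
expected = 0.85 * 0.8 * (n_over_log(484) - n_over_log(209))
```

With these printed factors the value comes out near 79.79, which agrees with the displayed 79.7 if the page truncates to one decimal.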
Status Protein1 Protein2 d d1/2
Query variables 6QVS_1 1ZGN_1 103 88.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
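
A minimal sketch of \(\delta\), assuming that min and max range over the two complexity increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), as in Otu and Sayood's definitions of \(d\) and \(d_1\), and assuming a plain LZ76 exhaustive-history parse. The page does not state which parse variant it uses and the function names are hypothetical, so the output is not guaranteed to reproduce the table.

```python
def lz76(s):
    """Number of components in the LZ76 exhaustive-history parse of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the component while it can be copied from earlier text.
        # The copy may overlap the component, but not its last symbol.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def delta(x, y, alpha):
    """alpha * min + (1 - alpha) * max of the two complexity increments."""
    a = lz76(x + y) - lz76(x)
    b = lz76(y + x) - lz76(y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Toy words: alpha = 0 gives d, alpha = 1/2 gives d_1 / 2.
d = delta("ab", "aaaa", 0)
half_d1 = delta("ab", "aaaa", 0.5)
```

In the toy example, appending "ab" to the repetitive word "aaaa" adds no new component, so the smaller increment is 0. This shows how a low-complexity sequence can pull \(d_1\) down.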