CoV2D BrowserTM

Parikh vectors
6XWN_1 9IVQ_1 6LZG_1 Letter Amino acid
46 10 32 G Glycine
55 11 60 L Leucine
18 8 34 S Serine
14 7 28 Y Tyrosine
12 7 32 D Aspartic acid
34 3 22 I Isoleucine
18 7 34 K Lysine
20 8 27 F Phenylalanine
18 5 27 P Proline
15 8 18 R Arginine
2 7 16 H Histidine
25 5 32 T Threonine
54 15 31 V Valine
17 6 21 M Methionine
3 0 20 W Tryptophan
47 9 38 A Alanine
10 11 39 N Asparagine
0 1 8 C Cysteine
9 8 30 Q Glutamine
13 5 47 E Glutamic acid
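A Parikh vector simply counts how often each letter occurs in a sequence. The following is a minimal sketch of that count, not the site's own code; the names `AMINO_ACIDS` and `parikh_vector` are illustrative, and the letter order is taken from the rows of the table above.

```python
from collections import Counter

# One-letter amino acid codes, in the row order of the table above.
AMINO_ACIDS = "GLSYDIKFPRHTVMWANCQE"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return the number of occurrences of each alphabet letter in sequence."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]

# Toy input: the first eight residues of the 9IVQ_1 sequence.
prefix = "GASMVMEK"
print(dict(zip(AMINO_ACIDS, parikh_vector(prefix))))
```

The entries of a Parikh vector sum to the sequence length, which gives a quick consistency check against the Length column further down.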

>6XWN_1|Chains A, B, C|Proton/glutamate symporter, SDF family|Thermococcus kodakarensis (strain ATCC BAA-918 / JCM 12380 / KOD1) (69014)
>9IVQ_1|Chains A, B, C, D, F[auth E], G[auth F], H[auth G], I[auth H], K[auth M], L[auth N], M[auth O], N[auth P], P[auth Q], Q[auth R], R[auth S], S[auth T]|Ras GTPase-activating protein-binding protein 1|Homo sapiens (9606)
>6LZG_1|Chain A|Angiotensin-converting enzyme 2|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6XWN , Knot 165 430 0.77 38 195 391
MGKSLLRRYLDYPVLWKILWGLVLGAVFGLIAGHFGYAGAVKTYIKPFGDLFVRLLKMLVMPIVLASLVVGAASISPARLGRVGVKIVVYYLATSAMAVFFGLIVGRLFNVGANVNLGSGTGKAIEAQPPSLVQTLLNIVPTNPFASLAKGEVLPVIFFAIILGIAITYLMNRNEERVRKSAETLLRVFDGLAEAMYLIVGGVMQYAPIGVFALIAYVMAEQGVRVVGPLAKVVGAVYTGLFLQIVITYFILLKVFGIDPIKFIRKAKDAMITAFVTRSSSGTLPVTMRVAEEEMGVDKGIFSFTLPLGATINMDGTALYQGVTVLFVANAIGHPLTLGQQLVVVLTAVLASIGTAGVPGAGAIMLAMVLQSVGLDLTPGSPVALAYAMILGIDAILDMGRTMVNVTGDLAGTVIVAKTEKELDESKWIS
9IVQ , Knot 70 141 0.82 38 116 137
GASMVMEKPSPLLVGREFVRQYYTLLNQAPDMLHRFYGKNSSYVHGGLDSNGKPADAVYGQKEIHRKVMSQNFTNCHTKIRHVDAHATLNDGVVVQVMGLLSNNNQALRRFMQTFVLAPEGSVANKFYVHNDIFRYQDEVF
6LZG , Knot 246 596 0.88 40 288 569
STIEEQAKTFLDKFNHEAEDLFYQSSLASWNYNTNITEENVQNMNNAGDKWSAFLKEQSTLAQMYPLQEIQNLTVKLQLQALQQNGSSVLSEDKSKRLNTILNTMSTIYSTGKVCNPDNPQECLLLEPGLNEIMANSLDYNERLWAWESWRSEVGKQLRPLYEEYVVLKNEMARANHYEDYGDYWRGDYEVNGVDGYDYSRGQLIEDVEHTFEEIKPLYEHLHAYVRAKLMNAYPSYISPIGCLPAHLLGDMWGRFWTNLYSLTVPFGQKPNIDVTDAMVDQAWDAQRIFKEAEKFFVSVGLPNMTQGFWENSMLTDPGNVQKAVCHPTAWDLGKGDFRILMCTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKSIGLLSPDFQEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKDQWMKKWWEMKREIVGVVEPVPHDETYCDPASLFHVSNDYSFIRYYTRTLYQFQFQEALCQAAKHEGPLHKCDISNSTEAGQKLFNMLRLGKSEPWTLALENVVGAKNMNVRPLLNYFEPLFTWLKDQNKNSFVGWSTDWSPYA
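The normalized column divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). Here is a sketch of both quantities, assuming the Lempel-Ziv (1976) exhaustive-history parsing; the page does not say which LZ variant it uses, so `lz76_complexity` is not guaranteed to reproduce the tabulated LZ values. Both function names are illustrative.

```python
import math

def lz76_complexity(w):
    """Number of phrases in an LZ76-style parsing of w.

    Each phrase is the shortest prefix of the remaining text that does not
    already occur starting at an earlier position (overlaps allowed).
    """
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it can be copied from earlier text.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) / (n / log_b n) for an alphabet of size b."""
    return lz / (n / math.log(n, alphabet_size))

# Normalization applied to the tabulated values for 6XWN (LZ = 165, n = 430).
print(round(normalized_lz(165, 430), 3))
```

The normalization step depends only on the tabulated LZ value and length, so it can be checked against the table independently of the parsing variant.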

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
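As a small illustration, reading \(P_w(n)\) as the set of distinct length-\(n\) subwords of \(w\), the set and its cardinality can be sketched as follows. The function names are illustrative.

```python
def subwords(w, n):
    """P_w(n): the set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """p_w(n): the number of distinct subwords of length n in w."""
    return len(subwords(w, n))

# Toy input: the first eight residues of the 9IVQ_1 sequence.
w = "GASMVMEK"
print([subword_complexity(w, n) for n in (1, 2, 3)])
```

For a word of length \(|w|\) there are at most \(|w|-n+1\) subwords of length \(n\), so \(p_w(n)\) can never exceed that bound.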

\(|P_{f(6XWN_1)}(2) \setminus P_{f(9IVQ_1)}(2)|=129\), \(|P_{f(9IVQ_1)}(2) \setminus P_{f(6XWN_1)}(2)|=50\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
1100110001001111011111111111111110110111100010111011101101111111110111111010110110111011100110011111111111011011101011010101101011011001101110011101101011111111111111100110000001000100110110111011011111110011111111110111001101111110111110011110111001111011110110110010011101110000010111010110001110011101011111010101011001101111101110110110011111011110110111111111111111001110101101111101111110111011001101010111011110000010000110
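The quantity \(Z_k(x,y)\) defined above is the size of the symmetric difference of the two subword sets. This sketch computes it, together with the corresponding Jaccard distance obtained by dividing by the size of the union. The function names are illustrative, and the choice of denominator is an assumption, since the page only defines the numerator.

```python
def subwords(w, k):
    """P_w(k): the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def jaccard_distance(x, y, k):
    """Z_k(x, y) divided by |P_x(k) union P_y(k)| (assumed denominator)."""
    union = subwords(x, k) | subwords(y, k)
    return z(x, y, k) / len(union) if union else 0.0

# Toy example: P("AAB", 2) = {AA, AB}, P("ABB", 2) = {AB, BB}.
print(z("AAB", "ABB", 2), jaccard_distance("AAB", "ABB", 2))
```

For the pair 6XWN_1, 9IVQ_1 the two one-sided counts stated above, 129 and 50, add up to \(Z_2 = 179\).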
Pair \(Z_2\) Length of longest common substring
6XWN_1,9IVQ_1 179 4
6XWN_1,6LZG_1 161 4
9IVQ_1,6LZG_1 230 4
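A value of 4 for sequences of these lengths suggests that the last column counts the longest common contiguous subword rather than a gapped subsequence. Under that assumption, the length can be sketched with standard dynamic programming. The function name is illustrative.

```python
def longest_common_substring_length(x, y):
    """Length of the longest contiguous word occurring in both x and y."""
    best = 0
    # prev[j] = length of the longest common suffix of the previous prefix
    # of x and y[:j].
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Toy example: the shared word is "MVM".
print(longest_common_substring_length("GASMVMEK", "STIMVMQ"))
```

This runs in time proportional to the product of the two lengths, which is fine for protein sequences of a few hundred residues.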

Newick tree

 
[
	9IVQ_1:10.52,
	[
		6XWN_1:80.5,6LZG_1:80.5
	]:29.02
]

Let d be the Otu–Sayood distance d.
Let d1 be the Otu–Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{571}{\log_{20} 571}-\frac{141}{\log_{20}141}\right)\approx 125\), where \(571=430+141\) is the combined length of the two query sequences.
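The arithmetic behind this estimate can be checked directly. In this sketch the parameter names `c1` and `c2` are placeholders for the factors 0.85 and 0.8, whose meaning the page does not explain. The function names are illustrative.

```python
import math

def length_scale(n, alphabet_size=20):
    """n / log_b n, the normalizing scale used for LZ complexity."""
    return n / math.log(n, alphabet_size)

def expected_distance(n_long, n_short, c1=0.85, c2=0.8):
    """c1 * c2 * (n_long / log_20 n_long - n_short / log_20 n_short)."""
    return c1 * c2 * (length_scale(n_long) - length_scale(n_short))

# 571 = 430 + 141 (lengths of 6XWN_1 and 9IVQ_1), 141 = length of 9IVQ_1.
print(round(expected_distance(571, 141)))
```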
Status Protein1 Protein2 d d1/2
Query variables 6XWN_1 9IVQ_1 152 101.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
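The case distinction above is consistent with reading \(d\) as the larger, and \(d_1\) as the sum, of the two complexity increments \(c(xy)-c(x)\) and \(c(yx)-c(y)\). This sketch is written under that assumption, with an LZ76-style phrase count standing in for \(c\); the page's exact complexity variant is not stated. All function names are illustrative.

```python
def lz76_complexity(w):
    """Number of phrases in an LZ76-style exhaustive-history parsing of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def increments(x, y, c=lz76_complexity):
    """The two increments c(xy) - c(x) and c(yx) - c(y)."""
    return c(x + y) - c(x), c(y + x) - c(y)

def delta(x, y, alpha):
    """alpha * min + (1 - alpha) * max of the two increments."""
    a, b = increments(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Toy inputs: the first eight residues of 9IVQ_1 and of 6LZG_1.
x, y = "GASMVMEK", "STIEEQAK"
print(delta(x, y, 0), delta(x, y, 0.5))  # d and d1/2 under this reading
```

Under the same reading, the tabulated values d = 152 and d1/2 = 101.5 for the pair 6XWN_1, 9IVQ_1 would correspond to a smaller increment of \(2(101.5)-152=51\).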