CoV2D Browser™

Parikh vectors
9FHO_1 1YJR_1 4NKW_1 Letter Amino acid
28 3 28 D Aspartic acid
8 3 4 C Cysteine
9 4 19 H Histidine
38 7 58 L Leucine
13 3 25 P Proline
15 6 32 S Serine
19 4 25 T Threonine
19 4 28 A Alanine
2 0 7 W Tryptophan
30 5 38 K Lysine
8 2 12 M Methionine
16 1 28 N Asparagine
17 1 23 F Phenylalanine
13 2 8 Y Tyrosine
20 6 29 V Valine
21 6 25 E Glutamic acid
19 0 20 Q Glutamine
28 6 29 G Glycine
16 9 36 I Isoleucine
17 3 20 R Arginine
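A Parikh vector simply records how often each letter occurs in a sequence. The following is a minimal sketch of that count, not the site's own code; the names `TABLE_ORDER` and `parikh_vector` are hypothetical, and the letter order is copied from the rows of the table above.

```python
from collections import Counter

# Letter order copied from the rows of the table above (an illustrative choice).
TABLE_ORDER = "DCHLPSTAWKMNFYVEQGIR"

def parikh_vector(sequence, alphabet=TABLE_ORDER):
    """Return the number of occurrences of each alphabet letter in sequence."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]

# 1YJR_1 sequence as listed on this page.
seq_1yjr = (
    "MGDGVLELVVRGMTCASCVHKIESSLTKHRGILYCSVALATNKAHIKYDPEIIGPRDIIHTIESLGFEPSLVKIE"
)
vector = parikh_vector(seq_1yjr)
print(dict(zip(TABLE_ORDER, vector)))
```

The entries of the vector sum to the sequence length, which gives a quick consistency check against the Length column further down the page.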

>9FHO_1|Chain A|arginine kinase|Dermatophagoides pteronyssinus (6956)
>1YJR_1|Chain A|Copper-transporting ATPase 1|Homo sapiens (9606)
>4NKW_1|Chains A, B, C, D|Steroid 17-alpha-hydroxylase/17,20 lyase|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
9FHO , Knot 156 356 0.85 40 216 340
MVDQAVLDKLEAGYQKLQASADCHSLLKKYLTRQVLDACKNRKTQMGATLLDVVQSGFENLDSGVGLYAPDAESYTLFKELFDPVIDDYHKGFKPTDKHPPTDFGDVNTLCNVDPDNQFVISTRVRCGRSLQGYPFNPCLTEAQYKEMEDKVKGQLNSFEGELKGTYYPLLGMDKATQQQLIDDHFLFKEGDRFLQAANACRYWPVGRGIFHNDNKTFLIWCNEEDHLRIISMQKGGDLKQVFSRLINGVNHIEKKLPFSRDDRLGFLTFCPTNLGTTIRASVHIKLPKLAADRKKLEEIAGKYNLQVRGTAGEHTESVGGVYDISNKRRMGLTEYQAVKEMQDGILELIKIEKSM
1YJR , Knot 42 75 0.80 36 65 71
MGDGVLELVVRGMTCASCVHKIESSLTKHRGILYCSVALATNKAHIKYDPEIIGPRDIIHTIESLGFEPSLVKIE
4NKW , Knot 203 494 0.85 40 249 474
MAKKTGAKYPKSLLSLPLVGSLPFLPRHGHMHNNFFKLQKKYGPIYSVRMGTKTTVIVGHHQLAKEVLIKKGKDFSGRPQMATLDILSNNRKGIAFADSGAHWQLHRRLAMATFALFKDGDQKLEKIICQEISTLCDMLATHNGQSIDISFPVFVAVTNVISLICFNTSYKNGDPELNVIQNYNEGIIDNLSKDSLVDLVPWLKIFPNKTLEKLKSHVKIRNDLLNKILENYKEKFRSDSITNMLDTLMQAKMNSDNGNAGPDQDSELLSDNHILTTIGDIFGAGVETTTSVVKWTLAFLLHNPQVKKKLYEEIDQNVGFSRTPTISDRNRLLLLEATIREVLRLRPVAPMLIPHKANVDSSIGEFAVDKGTEVIINLWALHHNEKEWHQPDQFMPERFLNPAGTQLISPSVSYLPFGAGPRSCIGEILARQELFLIMAWLLQRFDLEVPDDGQLPSLEGIPKVVFLIDSFKVKIKVRQAWREAQAEGSTHHHH

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
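Read literally, \(P_w(k)\) is the set of all length-\(k\) windows of \(w\), and \(p_w(k)\) is its size. The sketch below follows that reading; the function names are hypothetical, and the values it produces need not match the table if the site counts under a different convention.

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Cardinality of the set of distinct length-k subwords of w."""
    return len(subwords(w, k))

# Toy example: "ABAB" has the length-2 subwords AB and BA.
print(sorted(subwords("ABAB", 2)))
```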

\(|P_{f(9FHO_1)}(2) \setminus P_{f(1YJR_1)}(2)|=172\), \(|P_{f(1YJR_1)}(2) \setminus P_{f(9FHO_1)}(2)|=21\).

Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
11001110010110001010100001100010001101000000011101101100110010011110110100001100110111000001101000011001101001001010001110001001001010110101001000010001010100101010100011111001000011000111001001101101000111101110000001111000000101101001101001100110110010001110000011110101001100101010101101110000100111000101010110000011110010000011100001100100111011010001
Pair \(Z_2\) Length of longest common substring
9FHO_1,1YJR_1 193 3
9FHO_1,4NKW_1 181 3
1YJR_1,4NKW_1 224 3
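The quantity \(Z_k(x,y)\) is the size of the symmetric difference of the two subword sets; for the first pair, \(172+21=193\). A minimal sketch under that definition follows; `subwords` and `z_distance` are hypothetical names, not the site's code.

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """|P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)| for words x and y."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy example: only the subword BB separates the two sets.
print(z_distance("ABAB", "ABBA", 2))  # 1
```

Dividing by the size of the union of the two sets would turn this numerator into a Jaccard distance.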

Newick tree

 
(
	1YJR_1:10.81,
	(
		9FHO_1:90.5,4NKW_1:90.5
	):18.31
);
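In standard Newick notation, subtrees are grouped with parentheses, a branch length follows a colon, and the string ends with a semicolon. The sketch below serializes the tree above into that form. The nested-tuple representation and the name `to_newick` are assumptions made for illustration, and the branch lengths are copied from this page rather than recomputed.

```python
def _format(node):
    """Format one node given as (label_or_children, branch_length)."""
    label, length = node
    if isinstance(label, list):
        # Internal node: join the formatted children inside parentheses.
        text = "(" + ",".join(_format(child) for child in label) + ")"
    else:
        text = label
    return text if length is None else f"{text}:{length}"

def to_newick(root):
    """Serialize a nested (label_or_children, length) tree to a Newick string."""
    return _format(root) + ";"

# Hypothetical in-memory form of the tree shown above; the root has no length.
tree = ([("1YJR_1", 10.81), ([("9FHO_1", 90.5), ("4NKW_1", 90.5)], 18.31)], None)
print(to_newick(tree))
```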

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{431}{\log_{20} 431}-\frac{75}{\log_{20}75}\right)\approx 109\).
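The sketch below shows one way these quantities could be computed. It rests on three assumptions: the LZ-complexity is the phrase count of a Lempel-Ziv (1976) exhaustive-history parse (the page does not state its exact variant); \(d\) is the larger and \(d_1\) the sum of the two differences \(c(SQ)-c(S)\) and \(c(QS)-c(Q)\), as we read Otu and Sayood's definitions; and all function names are hypothetical. It also reproduces the rough estimate above, where \(431=356+75\) is the combined length of the two sequences.

```python
import math

def lz76_complexity(w):
    """Number of phrases in a Lempel-Ziv (1976) exhaustive-history parse of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from text that
        # starts before position i (overlapping copies are allowed).
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(c, n, alphabet_size=20):
    """LZ(w) / (n / log_alphabet_size(n)) for complexity c and length n."""
    return c / (n / math.log(n, alphabet_size))

def otu_sayood(s, q, c=lz76_complexity):
    """Return (d, d1), assuming d is the max and d1 the sum of the differences."""
    a = c(s + q) - c(s)
    b = c(q + s) - c(q)
    return max(a, b), a + b

# The rough estimate quoted above.
estimate = 0.85 * 0.8 * (431 / math.log(431, 20) - 75 / math.log(75, 20))
print(round(estimate))  # 109
```

For example, `lz76_complexity("0001101001000101")` parses the word as 0, 001, 10, 100, 1000, 101 and returns 6.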
Status Protein1 Protein2 d d1/2
Query variables 9FHO_1 1YJR_1 140 82.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]