CoV2D Browser™

Parikh vectors
2JDI_1 7MAK_1 1VAV_1 Letter Amino acid
49 3 14 A Alanine
32 4 24 R Arginine
5 1 3 H Histidine
48 12 23 L Leucine
14 2 6 F Phenylalanine
0 2 5 W Tryptophan
13 4 8 N Asparagine
24 5 16 Q Glutamine
32 4 9 E Glutamic acid
39 12 10 I Isoleucine
31 7 2 K Lysine
17 6 13 P Proline
33 0 23 S Serine
29 4 12 D Aspartic acid
2 2 0 C Cysteine
49 13 20 G Glycine
10 2 2 M Methionine
26 8 12 T Threonine
16 1 10 Y Tyrosine
41 7 10 V Valine
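
As a minimal sketch of how one column of the table above can be recomputed, the following counts each amino-acid letter in a sequence. The function name `parikh_vector` and the constant `AMINO_ACIDS` are hypothetical; the letter order simply follows the rows of the table.

```python
from collections import Counter

# Letter order taken from the rows of the Parikh vector table (assumption:
# any fixed order of the 20 letters would do equally well).
AMINO_ACIDS = "ARHLFWNQEIKPSDCGMTYV"

def parikh_vector(seq, alphabet=AMINO_ACIDS):
    """Return the number of occurrences of each alphabet letter in seq."""
    counts = Counter(seq)
    return [counts[a] for a in alphabet]

# Sequence of 7MAK_1 as listed on this page.
seq_7mak = ("PQITLWKRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYD"
            "QILIEICGHKAIGTVLVGPTPVNVIGRNLLTQIGCTLNF")

vec = parikh_vector(seq_7mak)
print(dict(zip(AMINO_ACIDS, vec)))
```

The entries of a Parikh vector always sum to the sequence length, which gives a quick consistency check against the Length column further down.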

>2JDI_1|Chains A, B, C|ATP SYNTHASE SUBUNIT ALPHA HEART ISOFORM|BOS TAURUS (9913)
>7MAK_1|Chains A, B|Protease|Human immunodeficiency virus 1 (11676)
>1VAV_1|Chains A, B|Alginate lyase PA1167|Pseudomonas aeruginosa (208964)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
2JDI , Knot 205 510 0.83 38 229 476
QKTGTAEVSSILEERILGADTSVDLEETGRVLSIGDGIARVHGLRNVQAEEMVEFSSGLKGMSLNLEPDNVGVVVFGNDKLIKEGDIVKRTGAIVDVPVGEELLGRVVDALGNAIDGKGPIGSKARRRVGLKAPGIIPRISVREPMQTGIKAVDSLVPIGRGQRELIIGDRQTGKTSIAIDTIINQKRFNDGTDEKKKLYCIYVAIGQKRSTVAQLVKRLTDADAMKYTIVVSATASDAAPLQYLAPYSGCSMGEYFRDNGKHALIIYDDLSKQAVAYRQMSLLLRRPPGREAYPGDVFYLHSRLLERAAKMNDAFGGGSLTALPVIETQAGDVSAYIPTNVISITDGQIFLETELFYKGIRPAINVGLSVSRVGSAAQTRAMKQVAGTMKLELAQYREVAAFAQFGSDLDAATQQLLSRGVRLTELLKQGQYSPMAIEEQVAVIYAGVRGYLDKLEPSKITKFENAFLSHVISQHQALLGKIRTDGKISEESDAKLKEIVTNFLAGFEA
7MAK , Knot 52 99 0.80 38 79 94
PQITLWKRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAIGTVLVGPTPVNVIGRNLLTQIGCTLNF
1VAV , Knot 99 222 0.80 38 139 212
PDLSTWNLTIPQGRPAITISTSQLQRDYRSDYFQRTADGIRFWVPVNGSHTRNSEFPRSELRETLSSGRPYNWRYARADNWLEATLRIEAVPSTRRMIIGQIHSDGSNSGQAAPLVKLLYQLRLDQGRVQALVRERPDDGGTRAYTLMDGIPLGQPFSYRIGVSRSGLLSVSVNGSALEQQLDPQWAYQGLYFKAGLYLQDNRGPSSEGGRATFSELRVSHQ
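
The LZ-complexity and normalized columns above can be illustrated with a short sketch. It assumes the classical Lempel-Ziv (1976) exhaustive-history parsing, in which each new phrase is the shortest prefix of the remaining text that has not yet occurred starting at an earlier position. The page does not state which variant it uses, so the phrase counts produced here need not reproduce the table exactly. The names `lz76` and `normalized_lz` are hypothetical.

```python
import math

def lz76(w):
    """Number of phrases in an LZ76-style exhaustive parsing of w (assumed variant)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while w[i:i+length] already occurs starting
        # at some position before i (overlap with the phrase is allowed).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(complexity, n, alphabet_size=20):
    """Ratio LZ(w) / (n / log_b n) with b the alphabet size."""
    return complexity * math.log(n, alphabet_size) / n

# Plugging the reported values for 2JDI (LZ = 205, n = 510) into the ratio:
print(normalized_lz(205, 510))
```

With the reported values for 2JDI the ratio falls between 0.83 and 0.84, in line with the tabulated 0.83.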

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).
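
A minimal sketch of these definitions, reading the argument of \(P_w\) as the subword length: the set of subwords is collected by sliding a window over the word. The function names `subwords` and `subword_complexity` are hypothetical.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct subwords of length k in w."""
    return len(subwords(w, k))

print(subword_complexity("abcab", 2))  # subwords: ab, bc, ca
```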

\(|P_{f(2JDI_1)}(2) \setminus P_{f(7MAK_1)}(2)|=168\) and \(|P_{f(7MAK_1)}(2) \setminus P_{f(2JDI_1)}(2)|=18\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\); for example, \(Z_2(f(2JDI_1),f(7MAK_1))=168+18=186\).

Hydrophobic-polar version of Sequence 1:
000101010011000111100010100010110110111010110010100110100110110101010011111110001100101100011110111100111011011101101011110010001110111111010100110011011001111101000111100001000111001100001001000000100101111000001101100100101100011101010011110011100100110010001001111000100011100010111001110010110110100011001101001111101011111000110101011001101001011100011001101110111010011011000110011101010110000111110110010110001100110100110010001111000111101110101001010010010011100110000111101000101000001010011001111101
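
The hydrophobic-polar string above can be approximated by a simple letter-to-bit mapping. The page does not list its hydrophobic set; the set used below (G, A, V, I, L, M, F, P mapped to 1) is inferred by aligning the start of the binary string with the start of the 2JDI sequence, and W is added as an assumption because 2JDI contains no tryptophan to check against. The names `HYDROPHOBIC` and `hp_encode` are hypothetical.

```python
# Inferred from the first residues of 2JDI and the binary string on this page;
# "W" is an assumption (it does not occur in 2JDI).
HYDROPHOBIC = set("GAVILMFP") | {"W"}

def hp_encode(seq):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

# First 40 residues of 2JDI_1 as listed on this page.
prefix = "QKTGTAEVSSILEERILGADTSVDLEETGRVLSIGDGIAR"
print(hp_encode(prefix))
```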
Pair \(Z_2\) Length of longest common subword (substring)
2JDI_1,7MAK_1 186 3
2JDI_1,1VAV_1 160 4
7MAK_1,1VAV_1 150 3
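
The \(Z_k\) column is the size of the symmetric difference of the two subword sets, so it also equals \(p_x(k)+p_y(k)-2|P_x(k)\cap P_y(k)|\). For the first pair this agrees with the tabulated values: \(229+79-2\cdot 61=186\), where 61 is the number of shared 2-letter subwords implied by \(229-168=79-18\). A minimal sketch, with hypothetical function names:

```python
def subword_set(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y): size of the symmetric difference of the length-k subword sets."""
    px, py = subword_set(x, k), subword_set(y, k)
    return len(px - py) + len(py - px)

print(z_distance("abab", "abba", 2))  # only "bb" is not shared
```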

Newick tree

 
(
	2JDI_1:90.31,
	(
		1VAV_1:75,7MAK_1:75
	):15.31
);

Let \(d\) denote the Otu–Sayood distance \(d\).
Let \(d_1\) denote the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{609}{\log_{20} 609}-\frac{99}{\log_{20}99}\right)\approx 149\), where \(609=510+99\) is the combined length of the two query sequences.
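
The arithmetic in this estimate can be checked directly. The sketch below only evaluates the expression as written; the helper name `length_scale` is hypothetical, and the factors 0.85 and 0.8 are taken from the text without further interpretation.

```python
import math

def length_scale(n, base=20):
    """n / log_base(n), the normalizing quantity used for LZ-complexity."""
    return n / math.log(n, base)

estimate = 0.85 * 0.8 * (length_scale(609) - length_scale(99))
print(estimate)
```

The value lies between 149 and 150, consistent with the figure quoted above.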
Status Protein1 Protein2 d d1/2
Query variables 2JDI_1 7MAK_1 185 110

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]
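
A small sketch of this interpolation, under the assumption that min and max refer to the smaller and larger of two directed quantities (in Otu and Sayood's definitions these would be the complexity increments \(c(xy)-c(x)\) and \(c(yx)-c(y)\); the page itself does not spell this out). The function name `delta` and its arguments are hypothetical. The value 35 used below is not reported on the page: it is what the tabulated \(d=185\) and \(d_1/2=110\) would imply for the smaller quantity if the formula holds exactly, since \(2\cdot 110-185=35\).

```python
def delta(a, b, alpha):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    lo, hi = min(a, b), max(a, b)
    return alpha * lo + (1 - alpha) * hi

# alpha = 0 recovers the maximum (d); alpha = 1/2 recovers the mean (d_1 / 2).
print(delta(35, 185, 0), delta(35, 185, 0.5))
```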