CoV2D Browser™

Parikh vectors
7RSV_1 7XNZ_1 3PMG_1 Letter Amino acid
26 29 25 R Arginine
28 6 26 N Asparagine
42 27 35 D Aspartic acid
36 6 19 Q Glutamine
40 32 30 E Glutamic acid
31 29 24 P Proline
4 5 4 W Tryptophan
33 55 49 A Alanine
7 4 5 C Cysteine
77 35 41 L Leucine
46 17 38 K Lysine
29 19 30 T Threonine
38 53 38 V Valine
26 58 50 G Glycine
18 13 10 H Histidine
29 20 45 I Isoleucine
16 12 11 M Methionine
22 8 31 F Phenylalanine
40 35 35 S Serine
24 15 15 Y Tyrosine
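Each column of the table is a Parikh vector, the count of every amino-acid letter in one chain; the columns sum to the chain lengths 612, 478 and 561. A minimal sketch of that count follows. The row order is copied from the table, and the names `parikh_vector` and `ROW_ORDER` are illustrative, not taken from the CoV2D code.

```python
from collections import Counter

# Row order of the Parikh-vector table (the 20 standard amino acids).
ROW_ORDER = "RNDQEPWACLKTVGHIMFSY"


def parikh_vector(seq: str) -> dict:
    """Return the number of occurrences of each amino-acid letter in seq."""
    counts = Counter(seq)
    return {aa: counts.get(aa, 0) for aa in ROW_ORDER}


if __name__ == "__main__":
    v = parikh_vector("HHHHHHGENLYFQ")  # opening residues of 7RSV_1
    print(v["H"], v["G"], sum(v.values()))  # prints: 6 1 13
```

For a sequence made only of standard letters, the entries sum to its length.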

>7RSV_1|Chains A, B|Phosphatidylinositol 3-kinase catalytic subunit type 3|Homo sapiens (9606)
>7XNZ_1|Chains A, B, C, D|Putative cystathionine beta-synthase Rv1077|Mycobacterium tuberculosis H37Rv (83332)
>3PMG_1|Chains A, B|Phosphoglucomutase-1|Oryctolagus cuniculus (9986)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7RSV , Knot 245 612 0.85 40 271 564
HHHHHHGENLYFQGSDHDLKPNAATRDQLNIIVSYPPTKQLTYEEQDLVWKFRYYLTNQEKALTKFLKCVNWDLPQEAKQALELLGKWKPMDVEDSLELLSSHYTNPTVRRYAVARLRQADDEDLLMYLLQLVQALKYENFDDIKNGLEPTKKDSQSSVSENVSNSGINSAEIDSSQIITSPLPSVSSPPPASKTKEVPDGENLEQDLCTFLISRACKNSTLANYLYWYVIVECEDQDTQQRDPKTHEMYLNVMRRFSQALLKGDKSVRVMRSLLAAQQTFVDRLVHLMKAVQRESGNRKKKNERLQALLGDNEKMNLSDVELIPLPLEPQVKIRGIIPETATLFKSALMPAQLFFKTEDGGKYPVIFKHGDDLRQDQLILQIISLMDKLLRKENLDLKLTPYKVLATSTKHGFMQFIQSVPVAEVLDTEGSIQNFFRKYAPSENGPNGISAEVMDTYVKSCAGYCVITYILGVGDRHLDNLLLTKTGKLFHIDFGYILGRDPKPLPPPMKLNKEMVEGMGGTQSEQYQEFRKQCYTAFLHLRRYSNLILNLFSLMVDANIPDIALEPDKTVKKVQDKFRLDLSDEEAVHYMQSLIDESVHALFAAVVEQIH
7XNZ , Knot 188 478 0.80 40 211 428
MARIAQHISELIGGTPLVRLNSVVPDGAGTVAAKVEYLNPGGSSKDRIAVKMIEAAEASGQLKPGGTIVEPTSGNTGVGLALVAQRRGYKCVFVCPDKVSEDKRNVLIAYGAEVVVCPTAVPPHDPASYYSVSDRLVRDIDGAWKPDQYANPEGPASHYVTTGPEIWADTEGKVTHFVAGIGTGGTITGAGRYLKEVSGGRVRIVGADPEGSVYSGGAGRPYLVEGVGEDFWPAAYDPSVPDEIIAVSDSDSFDMTRRLAREEAMLVGGSCGMAVVAALKVAEEAGPDALIVVLLPDGGRGYMSKIFNDAWMSSYGFLRSRLDGSTEQSTVGDVLRRKSGALPALVHTHPSETVRDAIGILREYGVSQMPVVGAEPPVMAGEVAGSVSERELLSAVFEGRAKLADAVSAHMSPPLRMIGAGELVSAAGKALRDWDALMVVEEGKPVGVITRYDLLGFLSEGAGRRKLAAALEHHHHHH
3PMG , Knot 228 561 0.85 40 257 527
VKIVTVKTKAYPDQKPGTSGLRKRVKVFQSSTNYAENFIQSIISTVEPAQRQEATLVVGGDGRFYMKEAIQLIVRIAAANGIGRLVIGQNGILSTPAVSCIIRKIKAIGGIILTASHNPGGPNGDFGIKFNISNGGPAPEAITDKIFQISKTIEEYAICPDLKVDLGVLGKQQFDLENKFKPFTVEIVDSVEAYATMLRNIFDFNALKELLSGPNRLKIRIDAMHGVVGPYVKKILCEELGAPANSAVNCVPLEDFGGHHPDPNLTYAADLVETMKSGEHDFGAAFDGDGDRNMILGKHGFFVNPSDSVAVIAANIFSIPYFQQTGVRGFARSMPTSGALDRVANATKIALYETPTGWKFFGNLMDASKLSLCGEESFGTGSDHIREKDGLWAVLAWLSILATRKQSVEDILKDHWHKFGRNFFTRYDYEEVEAEGATKMMKDLEALMFDRSFVGKQFSANDKVYTVEKADNFEYHDPVDGSVSKNQGLRLIFADGSRIIFRLSGTGSAGATIRLYIDSYEKDNAKINQDPQVMLAPLISIALKVSQLQERTGRTAPTVIT
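The page does not say which Lempel-Ziv variant produces the LZ-complexity column. As an assumption, the sketch below counts phrases in the exhaustive-history LZ76 parsing and computes the normalized ratio \(\mathrm{LZ}(w)/(n/\log_{20} n)\); it has not been checked against the LZ values in the table. For the 7RSV row the ratio is \(245/(612/\log_{20} 612)\approx 0.857\), shown as 0.85.

```python
import math


def lz76(w: str) -> int:
    """Phrase count of the Lempel-Ziv (1976) exhaustive-history parsing of w.

    Each phrase is the shortest block starting at position i that cannot be
    copied from a start position before i (overlapping copies allowed); the
    last phrase may be cut short by the end of w.
    """
    i, c, n = 0, 0, len(w)
    while i < n:
        k = 1
        # w[i:i+k] can be copied from an earlier start iff it occurs in w[:i+k-1].
        while i + k <= n and w[i:i + k] in w[:i + k - 1]:
            k += 1
        c += 1  # count the phrase w[i:i+k]
        i += k
    return c


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) / (n / log_a n), as in the ratio column of the table."""
    return lz / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    print(lz76("0001101001000101"))  # 6 phrases: 0|001|10|100|1000|101
    print(normalized_lz(245, 612))  # 7RSV_1 row
```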

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with four-character Protein Data Bank code \(c\).
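In code, \(P_w(n)\) is a set comprehension over the length-\(n\) windows of \(w\), and \(p_w(n)\) is its size. A short sketch; the function names `subwords` and `p` are hypothetical.

```python
def subwords(w: str, n: int) -> set:
    """P_w(n): the distinct length-n subwords (contiguous intervals) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def p(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))


if __name__ == "__main__":
    print(sorted(subwords("ABAB", 2)))  # ['AB', 'BA']
    print([p("ABCA", n) for n in (1, 2, 3)])  # [3, 3, 2]
```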

\(|P_{f(7RSV_1)}(2) \setminus P_{f(7XNZ_1)}(2)|=111\) and \(|P_{f(7XNZ_1)}(2) \setminus P_{f(7RSV_1)}(2)|=51\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\); for example, \(Z_2(f(7RSV_1),f(7XNZ_1))=111+51=162\).

Hydrophobic-polar version of Sequence 1 (7RSV_1):
000000100101010000101011000010111001100010000001110100010000011001100101011001001101110101101000101100000010100011101001000011101101101100001001001101000000001000100011001010000110011101001111000001101001000100111001000001100101011100000000000100001010110010011101000101100111100011001101101100001000000001011110000101001011111101010101111001011001111101110000110011110010010000111011011001100001010101001110000011101100111101100010100110001100011011010110001000110011001111100010011100010110101101110010111111010001101111000000001000000111010000011101101110101101110100010010001010100001100100110001011111110010
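The page does not state which residues count as hydrophobic. The set {A, F, G, I, L, M, P, V, W} reproduces the first 100 positions of the 7RSV_1 hydrophobic-polar string (a stretch that contains all 20 residue letters). The sketch below uses that set as an assumption; it has not been checked against the full string.

```python
# ASSUMPTION: hydrophobic ("1") residues, inferred from the opening of the
# 7RSV_1 hydrophobic-polar string; every other letter is mapped to "0" (polar).
HYDROPHOBIC = frozenset("AFGILMPVW")


def hp_string(seq: str) -> str:
    """Hydrophobic-polar encoding: '1' for hydrophobic residues, '0' otherwise."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)


if __name__ == "__main__":
    first40 = "HHHHHHGENLYFQGSDHDLKPNAATRDQLNIIVSYPPTKQ"  # start of 7RSV_1
    print(hp_string(first40))  # 0000001001010100001010110000101110011000
```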
Pair \(Z_2\) Length of longest common substring
7RSV_1,7XNZ_1 162 6
7RSV_1,3PMG_1 122 4
7XNZ_1,3PMG_1 146 4
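The following sketch covers \(Z_k\), the corresponding Jaccard distance, and the last column. A occurs 33 times in 7RSV_1 and 55 times in 7XNZ_1, so the two chains share a (non-contiguous) common subsequence of length at least 33. The values 6 and 4 must therefore be lengths of longest common contiguous subwords, and the sketch computes that quantity. As a consistency check on the first row, \(p_w(2)=271\) and \(211\) give \(271-111=211-51=160\) shared 2-subwords, hence a Jaccard distance of \(162/322\approx 0.503\). The function names are hypothetical.

```python
def subwords(w: str, k: int) -> set:
    """P_w(k): distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z(x: str, y: str, k: int) -> int:
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k(x, y) divided by the size of the union of P_x(k) and P_y(k)."""
    union = subwords(x, k) | subwords(y, k)
    return z(x, y, k) / len(union) if union else 0.0


def longest_common_substring(x: str, y: str) -> int:
    """Length of the longest contiguous subword shared by x and y (O(|x||y|) DP)."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: longest common suffix of x[:i-1] and y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    print(z("ABAB", "ABBA", 2))  # 1: only "BB" is not shared
    print(longest_common_substring("XABCDY", "ZABCDW"))  # 4 ("ABCD")
```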

Newick tree

 
(
	7XNZ_1:81.77,
	(
		7RSV_1:61,3PMG_1:61
	):20.77
);
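You can read distances directly off the tree's branch lengths. The path between 7RSV_1 and 3PMG_1 has length 61 + 61 = 122, the same as their \(Z_2\) value, while 7XNZ_1 lies 81.77 + 20.77 + 61 = 163.54 from each of the other two. The page does not say how the tree was built. The sketch below is a minimal parser written for this example, not a library API: it handles standard Newick syntax only (parentheses, commas, branch lengths, a trailing semicolon) and sums branch lengths between leaves.

```python
import re
from itertools import combinations

PUNCT = {"(", ")", ",", ":", ";"}


def parse_newick(s: str):
    """Parse a minimal Newick string into (children, name, branch_length) tuples.

    Handles names, branch lengths, parentheses and commas only (no quoted
    labels or comments)."""
    tokens = re.findall(r"[(),:;]|[^(),:;\s]+", s)
    pos = 0

    def peek() -> str:
        return tokens[pos] if pos < len(tokens) else ";"

    def node():
        nonlocal pos
        children = []
        if peek() == "(":
            pos += 1
            children.append(node())
            while peek() == ",":
                pos += 1
                children.append(node())
            pos += 1  # consume ")"
        name = ""
        if peek() not in PUNCT:
            name = tokens[pos]
            pos += 1
        length = 0.0
        if peek() == ":":
            length = float(tokens[pos + 1])
            pos += 2
        return (children, name, length)

    return node()


def leaf_distances(tree) -> dict:
    """Sum of branch lengths on the path between every pair of leaves."""
    paths = {}  # leaf name -> (child indices from root, depth at each step)

    def walk(node, idxs, depths):
        children, name, _ = node
        if not children:
            paths[name] = (idxs, depths)
        for i, child in enumerate(children):
            walk(child, idxs + (i,), depths + [depths[-1] + child[2]])

    walk(tree, (), [0.0])
    dist = {}
    for a, b in combinations(sorted(paths), 2):
        (ia, da), (ib, db) = paths[a], paths[b]
        k = 0  # shared steps from the root = depth index of the lowest common ancestor
        while k < min(len(ia), len(ib)) and ia[k] == ib[k]:
            k += 1
        dist[(a, b)] = da[-1] + db[-1] - 2 * da[k]
    return dist


if __name__ == "__main__":
    tree = parse_newick("(7XNZ_1:81.77,(7RSV_1:61,3PMG_1:61):20.77);")
    for pair, d in leaf_distances(tree).items():
        print(pair, round(d, 2))
```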

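Let \(d\) be the Otu–Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\,\mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\).
Let \(d_1\) be the Otu–Sayood distance \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). (This makes the 4TYN sequence AAAAAA a close match...)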
Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{1090}{\log_{20} 1090}-\frac{478}{\log_{20}478}\right)\approx 159\), where \(1090=612+478\) is the length of the concatenated sequence.
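The rough estimate is plain arithmetic, reproduced below. It evaluates to a little under 160 (about 159.7). The helper name `n_over_log` is illustrative.

```python
import math


def n_over_log(n: int, alphabet_size: int = 20) -> float:
    """n / log_a n, the normalizing term from the LZ table."""
    return n / math.log(n, alphabet_size)


# 1090 = 612 + 478, the length of 7RSV_1 and 7XNZ_1 concatenated.
estimate = 0.85 * 0.8 * (n_over_log(612 + 478) - n_over_log(478))

if __name__ == "__main__":
    print(f"{estimate:.2f}")
```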
Status Protein1 Protein2 d d1/2
Query variables 7RSV_1 7XNZ_1 199 176

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min\{A,B\} + (1-\alpha) \max\{A,B\}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2, \end{cases} \]
where \(A=\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(B=\mathrm{LZ}(yx)-\mathrm{LZ}(y)\).
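The following sketch builds \(d\), \(d_1\) and \(\delta\) on an LZ76 phrase count. The phrase count is an assumption, since the page does not specify its Lempel-Ziv variant, so these values need not reproduce the 199 and 176 in the table. The function names are hypothetical.

```python
def lz76(w: str) -> int:
    """Phrase count of the LZ76 exhaustive-history parsing (overlapping copies allowed)."""
    i, c, n = 0, 0, len(w)
    while i < n:
        k = 1
        # Extend the phrase while w[i:i+k] can be copied from an earlier start.
        while i + k <= n and w[i:i + k] in w[:i + k - 1]:
            k += 1
        c += 1
        i += k
    return c


def os_terms(x: str, y: str) -> tuple:
    """A = LZ(xy) - LZ(x) and B = LZ(yx) - LZ(y)."""
    return lz76(x + y) - lz76(x), lz76(y + x) - lz76(y)


def delta(x: str, y: str, alpha: float) -> float:
    """alpha * min(A, B) + (1 - alpha) * max(A, B)."""
    a, b = os_terms(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


def d(x: str, y: str) -> int:
    return max(os_terms(x, y))  # equals delta with alpha = 0


def d1(x: str, y: str) -> int:
    return sum(os_terms(x, y))  # d1 / 2 equals delta with alpha = 1/2


if __name__ == "__main__":
    x, y = "MARIAQHISELIGG", "VKIVTVKTKAYPDQ"  # opening residues of 7XNZ_1 and 3PMG_1
    print(d(x, y), d1(x, y), delta(x, y, 0), delta(x, y, 0.5))
```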