CoV2D BrowserTM

Parikh vectors
4UCR_1 4UVJ_1 7KIY_1 Letter Amino acid
3 10 46 M Methionine
21 12 44 P Proline
15 24 105 S Serine
11 12 103 Y Tyrosine
21 11 41 R Arginine
23 33 98 E Glutamic acid
19 7 54 G Glycine
21 37 90 I Isoleucine
25 22 65 A Alanine
3 3 16 C Cysteine
13 9 45 Q Glutamine
7 12 47 H Histidine
14 37 139 K Lysine
20 23 117 N Asparagine
19 30 89 D Aspartic acid
41 54 140 L Leucine
13 27 108 F Phenylalanine
19 20 86 T Threonine
2 6 9 W Tryptophan
14 17 63 V Valine
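
The counts in this table can be recomputed from a sequence with a few lines of Python. This is a minimal sketch: the names `ALPHABET` and `parikh_vector` are illustrative and do not come from the page, and the toy input is just the first 20 residues of the 4UCR_1 sequence.

```python
from collections import Counter

# Letter order copied from the table above.
ALPHABET = "MPSYREGIACQHKNDLFTWV"

def parikh_vector(w: str) -> dict[str, int]:
    """Return the number of occurrences of each amino-acid letter in w."""
    counts = Counter(w)  # a Counter returns 0 for letters that never occur
    return {letter: counts[letter] for letter in ALPHABET}

# Toy input: the first 20 residues of the 4UCR_1 sequence.
v = parikh_vector("MTNIQTQLDNLRKTLRQYEY")
print(v["T"], v["L"], sum(v.values()))
```

The entries of a Parikh vector always sum to the length of the word, which is a quick sanity check on a column of the table.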

>4UCR_1|Chain A|DNA LIGASE|HAEMOPHILUS INFLUENZAE (727)
>4UVJ_1|Chains A, B|COHESIN SUBUNIT SCC3|SACCHAROMYCES CEREVISIAE (4932)
>7KIY_1|Chain A|Cytoadherence linked asexual protein 3|Plasmodium falciparum (5833)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
4UCR , Knot 138 324 0.82 40 194 306
MTNIQTQLDNLRKTLRQYEYEYHVLDNPSVPDSEYDRLFHQLKALELEHPEFLTSDSPTQRVGAKPLSGFSQIRHEIPMLSLDNAFSDAEFNAFVKRIEDRLILLPKPLTFCCEPKLDGLAVSILYVNGELTQAATRGDGTTGEDITANIRTIRNVPLQLLTDNPPARLEVRGEVFMPHAGFERLNKYALEHNEKTFANPRNAAAGSLRQLDPNITSKRPLVLNAYGIGIAEGVDLPTTHYARLQWLKSIGIPVNPEIRLCNGADEVLGFYRDIQNKRSSLGYDIDGTVLKINDIALQNELGFISKAPRWAIAYKFPAQEELTL
4UVJ , Knot 164 406 0.80 40 196 375
MDSVKEIVLPLFYDLLNAASIESADILCPLLESFITFSLDDWISIGYETELKKITDKTIKAFMDSTIGNSKVDMKYDIFAKFIHHIHHFEKKELQEKFLNQIATLKIHLKKFLQEKMDPNNSRDDYKDLTCSLYELYINKLTILGRDYPIEVDEELLQLFLNNFVSRIPIMFQDFDDSTAQEINFKMLVLLATWNLEKWREIIEKVRDYENSISKDLRSVWKPIAAIIGRLNTLVISLAATNETFENINSLFYLKWSACTSLMDIIVAIKIFELKLPADATTWRYSMSEQFPFYLHDNASKVLLKIFLYLESLFAKQVDVQLERVADEDANLNDLPETGFFENIETEFLLFTVKLKGLMKLNILDERFASRVALNKEKLGPLFKKIVDDTIMENPEPNKKHHHHHH
7KIY , Knot 517 1505 0.83 40 341 1245
MVSFFKTPIIIFFFLLCLNEKVLCSINENENLGENKNENANVNTPENLNKLLNEYDNIEQLKSMIGNDELHKNLTILEKLILESLEKDKLKYPLLKQGTEQLIDISKFNKKNITDADDETYIIPTVQSSFHDIVKYEHLIKEQSIEIYNSDISDKIKKKIFIVRTLKTIKLMLIPLNSYKQNNDLKSALEELNNVFTNKEAQKESSPIGDHGTFFRKLLTHVRTIKENEDIENKGETLILGDNKIDVMNSNDFFFTTNSNVKFMENLDDITNQYGLGLINHLGPHLIALGHFTVLKLALKNYKNYFEAKSIKFFSWQKILEFSMSDRFKVLDMMCDHESVYYSEKKRRKTYLKVDRSNTSMECNILEYLLHYFNKYQLEIIKTTQDTDFDLHGMMEHKYIKDYFFSFMCNDPKECIIYHTNQFKKEANEENTFPEQEEPNRQISAFNLYLNYYYFMKRYSSYGVKKTLYVHLLNLTGLLNYDTRSYVTSLYLPGYYNAVEMSFTEEKEFSKLFESLIQCIEKCHSDQARQISKDSNLLNDITKCDLCKGAFLYSNMKFDEVPSMLQKFYLYLTKGLKIQKVSSLIKTLDIYQDYSNFLSHDINWYTFLFLFRLTSFKEISKKNVAEAMYLNIKDEDTFNKTIVTNYWYPSPIKKYYTLYVRKHIPNNLVDELEKLMKSGTLEKMKKSLTFLVHVNSFLQLDFFHQLNEPPLGLPRSYPLSLVLEHKFKEWMDSSPAGFYFSNYQNPYVRKDLHDKVLSQKFEPPKMNQWNKVLKSLIECAYDMYFEQRHVKNLYKYHNIYNINNKLMLMRDSIDLYKTHFDDVLFFADIFNMRKYMTATPVYKKVKDRVYHTLHSITGNSVNFYKYGIIYGFKVNKEILKEVVDELYSIYNFNTDIFTDTSFLQTVYLLFRRIEETYRTQRRDDKISVNNVFFMNVANNYSKLNKEEREIEIHNSMASRYYAKTMFAAFQMLFSTMLSNNVDNLDKAYGLSENIQVATSTSAFLTFAYVYNGSIMDSVTNSLLPPYAKKPITQLKYGKTFVFSNYFMLASKMYDMLNYKNLSLLCEYQAVASANFYSAKKVGQFIGRKFLPITTYFLVMRISWTHAFTTGQHLIAAFDPLNTNTSPKPNGGSGIYKSPESFFFTHALAAEASKYLFFYFFTNLYLDAYKSFPGGFGPAIKEQTQHVQEQTYERKPSVHSFNRNFFMELANGFMYAFCFFAISQMYAYFENINFYITSNFRFLDRYYGVFNKYFINYARIKLKEITSDLLIKYEREAYLSMKKYGYLGEVIAARLSPKDKIMNYVHETNDDVMSNLRRYDMENAFKNKMSTYVDDFAFFDDCGKNEQFLNERCDYCPVIEEVEETELFTTTGDKNTNKTTEIKKQTSTYIDTEKMNEADSADSDDEKDSDTPDNELMIARFHGAGHHHHHHHHHHDYKDDDDKGLVPRGSAAAAYPYDVPDYASAWSHPQFEKGGGSGGGSGGSAWSHPQFEKGPDRKAAVSHWQQ
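
As a rough illustration of the LZ-complexity and ratio columns, the sketch below counts phrases in an exhaustive-history (LZ76) parsing and applies the normalization \(n/\log_{20} n\). It assumes the LZ76 variant; the page does not say which parsing produced the tabulated values, so this code is not guaranteed to reproduce them. The function names are illustrative.

```python
import math

def lz76_complexity(w: str) -> int:
    """Number of phrases in the exhaustive-history (LZ76) parsing of w."""
    n, i, phrases = len(w), 0, 0
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from earlier text
        # (the copy may overlap the phrase, but must start before position i).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) / (n / log_20 n), the ratio column of the table."""
    return lz / (n / math.log(n, alphabet_size))

print(lz76_complexity("0001101001000101"))  # classic example word
print(normalized(138, 324))                 # values from the 4UCR row
```

This direct implementation is slow on long inputs but adequate for sequences of a few thousand residues.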

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
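
These definitions translate directly into set operations. The following is a minimal sketch with illustrative function names, reading "subword" as a contiguous block of letters; the exact convention behind the tabulated \(p_w\) values is not stated on the page, so the output may differ from the table.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the set of distinct length-k contiguous subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """p_w(k): the cardinality of P_w(k)."""
    return len(subwords(w, k))

# Toy word, not taken from the page.
print(sorted(subwords("ABAB", 2)))
print([subword_complexity("ABAB", k) for k in (1, 2, 3)])
```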

\(|P_{f(4UCR_1)}(2) \setminus P_{f(4UVJ_1)}(2)|=84\), \(|P_{f(4UVJ_1)}(2) \setminus P_{f(4UCR_1)}(2)|=86\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
100100010010001000000001100101100000011001011010010110000100011101101100100011110100110010101110010001111101101000101011110110101010011001010010010101001001110110001110101010111101110010001100000011010011110100101010000111101011111011011000010101100111110101010011001111000100000011001010110100111000111100110111100111000101
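
The hydrophobic-polar string can be produced by mapping each residue to 1 or 0. The page does not state which residues count as hydrophobic; the set used below (A, F, G, I, L, M, P, V, W) is an assumption inferred by comparing the displayed string with the 4UCR_1 sequence, and the names are illustrative.

```python
# Assumption: residues mapped to 1, inferred from the string shown above.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(w: str) -> str:
    """Replace hydrophobic residues by '1' and all other residues by '0'."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in w)

# First 20 residues of the 4UCR_1 sequence.
print(hp_encode("MTNIQTQLDNLRKTLRQYEY"))
```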
Pair \(Z_2\) Length of longest common substring
4UCR_1,4UVJ_1 170 3
4UCR_1,7KIY_1 169 4
4UVJ_1,7KIY_1 167 6
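
Both columns of this table can be sketched in Python. The last column is read here as the longest common contiguous block of residues, since a value such as 6 matches the HHHHHH run shared by 4UVJ_1 and 7KIY_1; that reading, like the function names, is an assumption rather than something the page states.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the set of distinct length-k contiguous subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_substring_length(x: str, y: str) -> int:
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length ending at y[j-1]
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Toy words, not taken from the page.
print(z("ABAB", "ABBA", 2))
print(longest_common_substring_length("ABAB", "ABBA"))
```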

Newick tree

 
[
	4UCR_1:85.16,
	[
		7KIY_1:83.5,4UVJ_1:83.5
	]:1.66
]
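
The bracketed tree above can be serialized to a standard parenthesized Newick string. In this sketch the nested-tuple representation and the name `to_newick` are hypothetical; only the labels and branch lengths are taken from the page.

```python
def to_newick(node) -> str:
    """Serialize a (label_or_children, branch_length) pair to Newick text."""
    label, length = node
    if isinstance(label, str):
        text = label  # leaf
    else:
        text = "(" + ",".join(to_newick(child) for child in label) + ")"
    return text if length is None else f"{text}:{length}"

# Labels and branch lengths copied from the tree above; the root has no length.
tree = ([("4UCR_1", 85.16),
         ([("7KIY_1", 83.5), ("4UVJ_1", 83.5)], 1.66)], None)
newick = to_newick(tree) + ";"
print(newick)
```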

Let d be the Otu–Sayood distance d.
Let d1 be the Otu–Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{730}{\log_{20} 730}-\frac{324}{\log_{20}324}\right)\approx 111\), where \(730=324+406\) is the combined length of 4UCR_1 and 4UVJ_1.
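
The arithmetic behind this estimate can be checked directly. The sketch below only evaluates the displayed expression; the helper name `lz_scale` is illustrative, and the factors 0.85 and 0.8 are taken from the formula as given.

```python
import math

def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """n / log_20(n), the scale used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

# 730 and 324 are the lengths appearing in the formula above.
estimate = 0.85 * 0.8 * (lz_scale(730) - lz_scale(324))
print(round(estimate))
```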
Status Protein1 Protein2 d d1/2
Query variables 4UCR_1 4UVJ_1 138 126.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2. \end{cases} \]
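
The interpolation can be tried numerically. The sketch below reads the case split as \(d=\max\) and \(d_1=\min+\max\), so that \(\alpha=1/2\) gives \(d_1/2\). The value 115 is not shown on the page; it is back-computed from the Query variables row (138 and 126.5) under that reading, and the function name is illustrative.

```python
def delta(a: float, b: float, alpha: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Assumption: max = d = 138, and min = 2 * 126.5 - 138 = 115 (derived, not listed).
larger, smaller = 138, 115
print(delta(larger, smaller, 0))    # alpha = 0 recovers d
print(delta(larger, smaller, 0.5))  # alpha = 1/2 recovers d1 / 2
```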