CoV2D Browser™

Parikh vectors
6GSG_1 5FYF_1 5ULS_1 Letter Amino acid
9 8 8 W Tryptophan
18 41 34 R Arginine
39 28 35 G Glycine
12 25 69 K Lysine
11 20 18 M Methionine
32 19 38 T Threonine
26 33 49 D Aspartic acid
6 1 3 C Cysteine
28 28 54 S Serine
19 8 24 Y Tyrosine
15 17 22 Q Glutamine
8 6 17 H Histidine
18 24 41 I Isoleucine
31 41 62 L Leucine
16 29 42 V Valine
30 29 36 A Alanine
21 21 28 N Asparagine
9 35 61 E Glutamic acid
13 26 30 F Phenylalanine
22 31 20 P Proline

>6GSG_1|Chain A|Catechol oxidase|Aspergillus oryzae (strain ATCC 42149 / RIB 40) (510516)
>5FYF_1|Chains A, B|CYTOCHROME P450|MARINOBACTER HYDROCARBONOCLASTICUS (2743)
>5ULS_1|Chains A, B|Endoplasmin|Canis lupus familiaris (9615)
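
The Parikh vector of a sequence records how often each letter occurs, so each column of the table above sums to the length of its sequence (383 for 6GSG_1). A minimal sketch of the count, assuming the standard 20-letter amino acid alphabet; the names `AMINO_ACIDS` and `parikh_vector` are illustrative and not taken from the site:

```python
from collections import Counter

# The 20 standard amino acid letters (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return a dict mapping each letter of the alphabet to its count in sequence."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in alphabet}

# First ten residues of the 6GSG_1 sequence, used as a small example.
example = parikh_vector("AATATLPTTA")
```

Letters outside the alphabet are ignored by this sketch, so the counts sum to the sequence length only when every residue is one of the 20 letters.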
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6GSG , Knot 162 383 0.83 40 232 367
AATATLPTTASSSTAVASSQLDQLANFAYNVTTDSVAGGSESKRGGCTLQNLRVRRDWRAFSKTQKKDYINSVLCLQKLPSRTPAHLAPGARTRYDDFVATHINQTQIIHYTGTFLAWHRYFIYEFEQALRDECSYTGDYPYWNWGADADNMEKSQVFDGSETSMSGNGEYIPNQGDIKLLLGNYPAIDLPPGSGGGCVTSGPFKDYKLNLGPAALSLPGGNMTAAANPLTYNPRCMKRSLTTEILQRYNTFPKIVELILDSDDIWDFQMTMQGVPGSGSIGVHGGGHYSMGGDPGRDVYVSPGDTAFWLHHGMIDRVWWIWQNLDLRKRQNAISGTGTFMNNPASPNTTLDTVIDLGYANGGPIAMRDLMSTTAGPFCYVYL
5FYF , Knot 191 470 0.83 40 242 451
MPTLPRTFDDIQSRLINATSRVVPMQRQIQGLKFLMSAKRKTFGPRRPMPEFVETPIPDVNTLALEDIDVSNPFLYRQGQWRAYFKRLRDEAPVHYQKNSPFGPFWSVTRFEDILFVDKSHDLFSAEPQIILGDPPEGLSVEMFIAMDPPKHDVQRSSVQGVVAPKNLKEMEGLIRSRTGDVLDSLPTDKPFNWVPAVSKELTGRMLATLLDFPYEERHKLVEWSDRMAGAASATGGEFADENAMFDDAADMARSFSRLWRDKEARRAAGEEPGFDLISLLQSNKETKDLINRPMEFIGNLTLLIVGGNDTTRNSMSGGLVAMNEFPREFEKLKAKPELIPNMVSEIIRWQTPLAYMRRIAKQDVELGGQTIKKGDRVVMWYASGNRDERKFDNPDQFIIDRKDARNHMSFGYGVHRCMGNRLAELQLRILWEEILKRFDNIEVVEEPERVQSNFVRGYSRLMVKLTPNS
5ULS , Knot 268 691 0.84 40 275 631
MGSSHHHHHHSSGLVPRGSHMVVQREEEAIQLDGLNASQIRELREKSEKFAFQAEVNRMMKLIINSLYKNKEIFLRELISNASDALDKIRLISLTDENALAGNEELTVKIKCDKEKNLLHVTDTGVGMTREELVKNLGTIAKSGTSEFLNKMTEAQEDGQSTSELIGQFGVGFYSAFLVADKVIVTSKHNNDTQHIWESDSNEFSVIADPRGNTLGRGTTITLVLKEEASDYLELDTIKNLVKKYSQFINFPIYVWSSKTGGGGKTVWDWELMNDIKPIWQRPSKEVEDDEYKAFYKSFSKESDDPMAYIHFTAEGEVTFKSILFVPTSAPRGLFDEYGSKKSDYIKLYVRRVFITDDFHDMMPKYLNFVKGVVDSDDLPLNVSRETLQQHKLLKVIRKKLVRKTLDMIKKIADEKYNDTFWKEFGTNIKLGVIEDHSNRTRLAKLLRFQSSHHPSDITSLDQYVERMKEKQDKIYFMAGSSRKEAESSPFVERLLKKGYEVIYLTEPVDEYCIQALPEFDGKRFQNVAKEGVKFDESEKTKESREAIEKEFEPLLNWMKDKALKDKIEKAVVSQRLTESPCALVASQYGWSGNMERIMKAQAYQTGKDISTNYYASQKKTFEINPRHPLIKDMLRRVKEDEDDKTVSDLAVVLFETATLRSGYLLPDTKAYGDRIERMLRLSLNIDPDAK

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
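
As an illustration of these definitions, the sketch below collects the distinct contiguous subwords of a given length and counts them. The function names are hypothetical, and the length argument is called `k` here to avoid a clash with the sequence length.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in the word w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct subwords of length k in w."""
    return len(subwords(w, k))

# Small example: "abab" has the length-2 subwords "ab" and "ba".
example_set = subwords("abab", 2)
example_count = subword_complexity("abab", 2)
```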

\(|P_{f(6GSG_1)}(2) \setminus P_{f(5FYF_1)}(2)|=81\), \(|P_{f(5FYF_1)}(2) \setminus P_{f(6GSG_1)}(2)|=91\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
11010110010000111000100110110010000111100000110010010100010110000000010011010011000110111110000001110010000110001011110001100100110000000100101011101001000011010000101010011001010111100111011110111010011100001011111101111010111011000100100010001100000110110111000011010101011110101110111000111011001010110011110011100111110010100000110101011001101000100110110101111110011000111100101
Pair \(Z_2\) Length of longest common subword
6GSG_1,5FYF_1 172 4
6GSG_1,5ULS_1 171 4
5FYF_1,5ULS_1 123 4
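
The two columns of this table can be sketched as follows. `z_distance` is the size of the symmetric difference of the two sets of length-k subwords, so for the first pair it would be 81 + 91 = 172. The last column is read here as the length of the longest shared contiguous subword, which is an assumption about how the site computes it. All function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y): subwords of length k found in exactly one of x and y."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x, y):
    """Length of the longest contiguous block shared by x and y (row-by-row DP)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        curr = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block ending at the previous positions.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best
```

The dynamic program takes time proportional to the product of the two lengths, which is small for sequences of a few hundred residues.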

Newick tree

 
[
	6GSG_1:92.43,
	[
		5ULS_1:61.5,5FYF_1:61.5
	]:30.93
]
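
The bracketed tree above can be written as a standard parenthesized Newick string. The sketch below assumes a simple nested representation, with a leaf given as (name, branch length) and an internal node as (list of children, branch length); this representation and the function name are illustrative, not the site's own.

```python
def to_newick(node):
    """Serialize a nested (label_or_children, length) node as Newick text."""
    label_or_children, length = node
    if isinstance(label_or_children, list):
        text = "(" + ",".join(to_newick(child) for child in label_or_children) + ")"
    else:
        text = label_or_children
    # The root has no branch length, marked here by None.
    return text if length is None else f"{text}:{length}"

tree = (
    [
        ("6GSG_1", 92.43),
        ([("5ULS_1", 61.5), ("5FYF_1", 61.5)], 30.93),
    ],
    None,
)
newick = to_newick(tree) + ";"
```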

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{853}{\log_{20} 853}-\frac{383}{\log_{20} 383}\right)\approx 126\).
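
The arithmetic behind this estimate can be checked directly. Here 853 = 383 + 470 is the combined length of 6GSG_1 and 5FYF_1, and \(n/\log_{20} n\) is the same normalizer used in the table of LZ-complexities. The function and parameter names in this sketch are hypothetical, and the two factors 0.85 and 0.8 are simply taken from the formula as given.

```python
import math

def lz_normalizer(n, alphabet_size=20):
    """n / log_b(n) for an alphabet of size b."""
    return n / math.log(n, alphabet_size)

def expected_distance(n_concat, n_single, factor_a=0.85, factor_b=0.8):
    """Scaled difference of the normalizers for the concatenation and one sequence."""
    return factor_a * factor_b * (lz_normalizer(n_concat) - lz_normalizer(n_single))

value = expected_distance(853, 383)  # roughly 126
```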
Status Protein1 Protein2 d d1/2
Query variables 6GSG_1 5FYF_1 162 146.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
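
A minimal sketch of how these quantities might be computed, under two assumptions that are not confirmed by this page. First, the complexity is an exhaustive-history LZ76 parse; the site's exact variant may differ. Second, the distances follow the usual Otu–Sayood definitions, \(d(x,y)=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1(x,y)=c(xy)-c(x)+c(yx)-c(y)\), which agree with the case distinction above: \(\alpha=0\) gives the maximum and \(\alpha=1/2\) gives the midpoint \(d_1/2\). All function names are illustrative.

```python
def lz76_complexity(s):
    """Number of components in an exhaustive-history LZ76 parse of s (assumed variant)."""
    n, i, count = len(s), 0, 0
    while i < n:
        length = 1
        # Grow the candidate while it already occurs starting at an earlier position.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def complexity_gains(x, y, c=lz76_complexity):
    """Return (c(xy) - c(x), c(yx) - c(y))."""
    return c(x + y) - c(x), c(y + x) - c(y)

def otu_sayood(x, y):
    """Return the pair (d, d1) under the assumed definitions."""
    a, b = complexity_gains(x, y)
    return max(a, b), a + b

def delta(x, y, alpha):
    """alpha * min + (1 - alpha) * max of the two complexity gains."""
    a, b = complexity_gains(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)
```

With these definitions, `delta(x, y, 0)` equals \(d\) and `delta(x, y, 0.5)` equals \(d_1/2\). The substring search makes this parse slow on long inputs, so it is meant for illustration on short sequences only.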