CoV2D Browser™

Parikh vectors
8BSH_1 7GYE_1 2DSW_1 Letter Amino acid
62 41 32 A Alanine
11 3 5 C Cysteine
52 26 25 T Threonine
30 23 22 R Arginine
46 29 24 D Aspartic acid
35 21 13 E Glutamic acid
38 30 27 G Glycine
79 30 34 L Leucine
16 4 4 M Methionine
48 13 15 P Proline
62 22 25 S Serine
46 17 15 N Asparagine
37 15 12 Q Glutamine
39 19 21 F Phenylalanine
40 29 19 V Valine
18 6 9 H Histidine
44 27 16 I Isoleucine
31 28 19 K Lysine
6 1 8 W Tryptophan
28 10 16 Y Tyrosine
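
A Parikh vector simply counts how often each letter occurs in a word. The following is a minimal sketch of that count, not the site's own code; the function name `parikh_vector` and the constant `AMINO_ACIDS` are hypothetical, and the letter order is copied from the rows of the table above.

```python
from collections import Counter

# The 20 amino-acid letters in the row order of the table above.
AMINO_ACIDS = "ACTRDEGLMPSNQFVHIKWY"

def parikh_vector(sequence: str) -> dict[str, int]:
    """Return the number of occurrences of each amino-acid letter."""
    counts = Counter(sequence)
    # Letters absent from the sequence get an explicit count of 0.
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}

# Toy input: the first 11 letters of the 8BSH_1 sequence.
example = parikh_vector("MDFETNEDING")
```

Applied to a full FASTA sequence, the same call would give one column of the table, assuming the sequence contains only the 20 standard letters.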

>8BSH_1|Chain A|Protein transport protein SEC23|Saccharomyces cerevisiae (4932)
>7GYE_1|Chain A|Heat shock 70 kDa protein 1A|Homo sapiens (9606)
>2DSW_1|Chain A|Chitinase-3-like protein 1|Ovis aries (9940)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8BSH , Knot 295 768 0.85 40 302 697
MDFETNEDINGVRFTWNVFPSTRSDANSNVVPVGCLYTPLKEYDELNVAPYNPVVCSGPHCKSILNPYCVIDPRNSSWSCPICNSRNHLPPQYTNLSQENMPLELQSTTIEYITNKPVTVPPIFFFVVDLTSETENLDSLKESIITSLSLLPPNALIGLITYGNVVQLHDLSSETIDRCNVFRGDREYQLEALTEMLTGQKPTGPGGAASHLPNAMNKVTPFSLNRFFLPLEQVEFKLNQLLENLSPDQWSVPAGHRPLRATGSALNIASLLLQGCYKNIPARIILFASGPGTVAPGLIVNSELKDPLRSHHDIDSDHAQHYKKACKFYNQIAQRVAANGHTVDIFAGCYDQIGMSEMKQLTDSTGGVLLLTDAFSTAIFKQSYLRLFAKDEEGYLKMAFNGNMAVKTSKDLKVQGLIGHASAVKKTDANNISESEIGIGATSTWKMASLSPYHSYAIFFEIANTAANSNPMMSAPGSADRPHLAYTQFITTYQHSSGTNRIRVTTVANQLLPFGTPAIAASFDQEAAAVLMARIAVHKAETDDGADVIRWLDRTLIKLCQKYADYNKDDPQSFRLAPNFSLYPQFTYYLRRSQFLSVFNNSPDETAFYRHIFTREDTTNSLIMIQPTLTSFSMEDDPQPVLLDSISVKPNTILLLDTFFFILIYHGEQIAQWRKAGYQDDPQYADFKALLEEPKLEAAELLVDRFPLPRFIDTEAGGSQARFLLSKLNPSDNYQDMARGGSTIVLTDDVSLQNFMTHLQQVAVSGQA
7GYE , Knot 164 394 0.83 40 210 374
GPLGSMAKAAAIGIDLGTTYSCVGVFQHGKVEIIANDQGNRTTPSYVAFTDTERLIGDAAKNQVALNPQNTVFDAKRLIGRKFGDPVVQSDMKHWPFQVINDGDKPKVQVSYKGETKAFYPEEISSMVLTKMKEIAEAYLGYPVTNAVITVPAYFNDSQRQATKDAGVIAGLNVLRIINEPTAAAIAYGLDRTGKGERNVLIFDLGGGTFDVSILTIDDGIFEVKATAGDTHLGGEDFDNRLVNHFVEEFKRKHKKDISQNKRAVRRLRTACERAKRTLSSSTQASLEIDSLFEGIDFYTSITRARFEELCSDLFRSTLEPVEKALRDAKLDKAQIHDLVLVGGSTRIPKVQKLLQDFFNGRDLNKSINPDEAVAYGAAVQAAILIKSTRAAAS
2DSW , Knot 157 361 0.85 40 218 345
YKLICYYTSWSQYREGDGSCFPDAIDPFLCTHVIYSFANISNNEIDTWEWNDVTLYDTLNTLKNRNPKLKTLLSVGGWNFGPERFSAIASKTQSRRTFIKSVPPFLRTHGFDGLDLAWLYPGRRDKRHLTTLVKEMKAEFIREAQAGTEQLLLSAAVSAGKIAIDRGYDIAQISRHLDFISLLTYDFHGAWRQTVGHHSPLFAGNEDASSRFSNADYAVSYMLRLGAPANKLVMGIPTFGRSFTLASSKTDVGAPVSGPGVPGRFTKEKGILAYYEICDFLHGATTHRFRDQQVPYATKGNQWVAYDDQESVKNKARYLKNRQLAGAMVWALDLDDFRGTFCGQNLTFPLTSAVKDVLAEV
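
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be checked from the LZ-complexity and length columns. The sketch below does that, and also includes one common way to count LZ76 phrases (the exhaustive-history parsing of Lempel and Ziv, 1976). Whether the site uses exactly this parsing variant is an assumption; the function names `lz76_phrase_count` and `normalized_lz` are hypothetical.

```python
import math

def lz76_phrase_count(w: str) -> int:
    """Count phrases in an LZ76-style parsing of w.

    Each phrase is the shortest prefix of the remaining text that does
    not occur starting at an earlier position (overlap is allowed).
    """
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it can be copied from an earlier start.
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) / (n / log_b n) for alphabet size b."""
    return lz / (n / math.log(n, alphabet_size))

# Values taken from the table above (LZ-complexity, length).
ratios = [normalized_lz(295, 768), normalized_lz(164, 394), normalized_lz(157, 361)]
```

With the table's values this gives roughly 0.85, 0.83 and 0.85, matching the normalized column to two decimals.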

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
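
As an illustration of these definitions, here is a small sketch that collects the distinct subwords of a given length and counts them. The helper names `subwords` and `subword_complexity` are hypothetical, and the sketch has not been checked against the values in the table above.

```python
def subwords(w: str, k: int) -> set[str]:
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """Return the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

# Toy word: its length-2 subwords are AA, AB, BA (AB occurs twice).
toy = "AABAB"
toy_p2 = subword_complexity(toy, 2)
toy_p3 = subword_complexity(toy, 3)
```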

\(|P_{f(8BSH_1)}(2) \setminus P_{f(7GYE_1)}(2)|=117\), \(|P_{f(7GYE_1)}(2) \setminus P_{f(8BSH_1)}(2)|=25\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
101000001011010101110000010001111101001100000101110011100110000110100110100001001100000011100001000011101000010010001101111111110100000010010001100101111011111100101101001000010000110100000101100110100101111110011011001011010011111001010100110010100101111001101010110110111010000111011111011101111111000100110000010000100000100100011001110100101111000011100100100001111110011001110000101110000101011101011100000101011110101100001001000011111000101101010000111101100110001110111010010110001100000001000101001100111110111110100011111110111001000011011011000110100001000000100101110101010100010000110110001000110001100000000111101010010100010111100101010011110011111100100110100110000100101011100101011011100111101100011100101110010100000011011001110001010011001001110101
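
The quantity \(Z_k(x,y)\) is the size of the symmetric difference of the two subword sets. A minimal sketch follows, with hypothetical function names `subwords` and `z_distance`; it is shown on toy words rather than the full protein sequences.

```python
def subwords(w: str, k: int) -> set[str]:
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Return |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy example: P_x(2) = {AA, AB, BA}, P_y(2) = {AB, BB, BA}.
toy_z2 = z_distance("AABAB", "ABBA", 2)

# The two one-sided counts quoted in the text add up to the tabulated Z_2.
z2_8bsh_7gye = 117 + 25
```

For the pair 8BSH_1, 7GYE_1 the two one-sided counts 117 and 25 sum to 142, which is the \(Z_2\) value listed for that pair.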
Pair \(Z_2\) Length of longest common subword (contiguous)
8BSH_1,7GYE_1 142 4
8BSH_1,2DSW_1 160 4
7GYE_1,2DSW_1 156 3
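
The small values in the last column (3 or 4) suggest a longest common contiguous subword rather than a subsequence with gaps, since two proteins of several hundred residues would normally share a much longer gapped subsequence. Under that assumed reading, the length can be computed with a standard dynamic program. The function name `longest_common_subword_length` is hypothetical.

```python
def longest_common_subword_length(x: str, y: str) -> int:
    """Return the length of the longest contiguous block shared by x and y."""
    best = 0
    # prev[j] = length of the common suffix of the previous prefix of x and y[:j].
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the match that ended one position earlier in both words.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Toy example: the shared block is "BCD".
toy_lcs = longest_common_subword_length("ABCDE", "XBCDY")
```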

Newick tree

 
(2DSW_1:81.50,(8BSH_1:71,7GYE_1:71):10.50);

Let \(d\) denote the Otu--Sayood distance \(d\).
Let \(d_1\) denote the Otu--Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{1162}{\log_{20} 1162}-\frac{394}{\log_{20} 394}\right)\approx 201.\)
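
The arithmetic behind this estimate can be reproduced as follows. The factors 0.85 and 0.8 are taken verbatim from the formula above, 1162 is the combined length 768 + 394 of the two sequences, and the helper name `n_over_log` is hypothetical.

```python
import math

def n_over_log(n: int, base: int = 20) -> float:
    """Return n / log_base(n), the scale used to normalize LZ-complexity."""
    return n / math.log(n, base)

combined_length = 768 + 394  # lengths of 8BSH_1 and 7GYE_1

# Expected distance: scaled growth in n / log_20 n when the shorter
# sequence (length 394) is extended to the concatenation (length 1162).
expected = 0.85 * 0.8 * (n_over_log(combined_length) - n_over_log(394))
```

Evaluating `expected` gives a value slightly above 201, in line with the figure quoted in the text.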
Status Protein1 Protein2 d d1/2
Query variables 8BSH_1 7GYE_1 258 191.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2 \end{cases} \]
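
To make the interpolation concrete, here is a small sketch that evaluates \(\delta\) for the query pair. It assumes that min and max refer to the two one-sided terms whose maximum is \(d\) and whose sum is \(d_1\). Under that assumption the smaller term is \(2(191.5) - 258 = 125\), which is an inference from the table and not a value reported on the page. The function name `delta` is hypothetical.

```python
def delta(a: float, b: float, alpha: float) -> float:
    """Return alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Tabulated values for the pair 8BSH_1, 7GYE_1.
d = 258.0
half_d1 = 191.5

# Assumption: d = max and d1 = min + max, so min = d1 - d.
larger = d
smaller = 2 * half_d1 - d  # inferred, not reported on the page

delta_at_0 = delta(smaller, larger, 0.0)     # recovers d
delta_at_half = delta(smaller, larger, 0.5)  # recovers d1 / 2
```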