CoV2D Browser™

Parikh vectors
3SUC_1 8ZKT_1 9AZI_1 Letter Amino acid
52 49 1 T Threonine
37 55 0 D Aspartic acid
16 43 0 Q Glutamine
41 52 0 E Glutamic acid
21 35 3 H Histidine
45 191 3 L Leucine
5 14 0 M Methionine
48 126 2 A Alanine
9 19 2 C Cysteine
63 21 0 I Isoleucine
5 35 0 W Tryptophan
31 29 1 K Lysine
27 49 1 F Phenylalanine
21 76 0 P Proline
78 131 2 S Serine
28 26 1 Y Tyrosine
33 102 5 R Arginine
57 14 2 N Asparagine
94 94 1 G Glycine
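As a quick check on the table, the following Python sketch recomputes a Parikh vector, that is, the letter counts of a sequence with order ignored. The constant `AMINO_ACIDS` is an assumption that simply copies the row order above, and the example input is the 9AZI_1 sequence listed further down.

```python
from collections import Counter

# Assumed ordering: copies the row order of the table above (one-letter codes).
AMINO_ACIDS = "TDQEHLMACIWKFPSYRNGV"


def parikh_vector(seq: str) -> list[int]:
    """Count each amino acid in seq, reported in the fixed order AMINO_ACIDS."""
    counts = Counter(seq)
    return [counts[a] for a in AMINO_ACIDS]


nine_azi = "YSCNVCGKAFVLSRHLNRHLRVHRRAT"  # sequence of 9AZI_1
vector = parikh_vector(nine_azi)
print(vector)       # [1, 0, 0, 0, 3, 3, 0, 2, 2, 0, 0, 1, 1, 0, 2, 1, 5, 2, 1, 3]
print(sum(vector))  # 27, the sequence length
```

The printed vector is the 9AZI_1 column of the table, and its sum equals the sequence length.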
61 100 3 V Valine

>3SUC_1|Chain A|Preneck appendage protein|Bacillus phage phi29 (10756)
>8ZKT_1|Chain A|Polycystin-1|Homo sapiens (9606)
>9AZI_1|Chain A|Designed Zinc finger protein 5.3|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
3SUC 288 772 0.82 20 266 675
HHHHHHFADLVIQVIDELKQFGVSVKTYGAKGDGVTDDIRAFEKAIESGFPVYVPYGTFMVSRGIKLPSNTVLTGAGKRNAVIRFMDSVGRGESLMYNENVTTGNENIFLSSFTLDGNNKRLGQGISGIGGSRESNLSIRACHNVYIRDIEAVDCTLHGIDITCGGLDYPYLGDGTTAPNPSENIWIENCEATGFGDDGITTHHSQYINILNCYSHDPRLTANCNGFEIDDGSRHVVLSNNRSKGCYGGIEIKAHGDAPAAYNISINGHMSVEDVRSYNFRHIGHHAATAPQSVSAKNIVASNLVSIRPNNKRGFQDNATPRVLAVSAYYGVVINGLTGYTDDPNLLTETVVSVQFRARNCSLNGVVLTGFSNSENGIYVIGGSRGGDAVNISNVTLNNSGRYGVSIGSGIENVSITNISGIGDGINSPVALVSTINSNPEISGLSSIGYPTVARVAGTDYNDGLTLFNGAFRASTTSSGKIHSEGFIMGSTSGCEASVSKSGVLTSSSSKTSSERSLIAGSSTSEAKGTYNTILGSLGAVADEQFAALISASQSRASGNHNLILSSYGINTTGSYKVNGGFEKINWELDSLNGRIKARDTVTGGNTWSDFAQYFESLGGQVIETGYLVTLEKGKIRKAEKGEKIIGVISETAGFVLGESSFEWQGAVLKNEFGGIIYEEVTTEDGVKFKRPLPNPDFDPNKNYIPRSQRREWHVVGLLGQIAVRIDETVKQGHSIDAVGGVATDGDNFIVQEITTPYTKEKGYGVAIVLVK
8ZKT 435 1261 0.82 20 306 980
MDYKDDDDKGASLFVPPSHVRFVFPEPTADVNYIVMLTCAVCLVTYMVMAAILHKLDQLDASRGRAIPFCGQRGRFKYEILVKTGWGRGSGTTAHVGIMLYGVDSRSGHRHLDGDRAFHRNSLDIFRIATPHSLGSVWKIRVWHDNKGLSPAWFLQHVIVRDLQTARSAFFLVNDWLSVETEANGGLVEKEVLAASDAALLRFRRLLVAELQRGFFDKHIWLSIWDRPPRSRFTRIQRATCCVLLICLFLGANAVWYGAVGDSAYSTGHVSRLSPLSVDTVAVGLVSSVVVYPVYLAILFLFRMSRSKVAGSPSPTPAGQQVLDIDSCLDSSVLDSSFLTFSGLHAEQAFVGQMKSDLFLDDSKSLVCWPSGEGTLSWPDLLSDPSIVGSNLRQLARGQAGHGLGPEEDGFSLASPYSPAKSFSASDEDLIQQVLAEGVSSPAPTQDTHMETDLLSSLSSTPGEKTETLALQRLGELGPPSPGLNWEQPQAARLSRTGLVEGLRKRLLPAWCASLAHGLSLLLVAVAVAVSGWVGASFPPGVSVAWLLSSSASFLASFLGWEPLKVLLEALYFSLVAKRLHPDEDDTLVESPAVTPVSARVPRVRPPHGFALFLAKEEARKVKRLHGMLRSLLVYMLFLLVTLLASYGDASCHGHAYRLQSAIKQELHSRAFLAITRSEELWPWMAHVLLPYVHGNQSSPELGPPRLRQVRLQEALYPDPPGPRVHTCSAAGGFSTSDYDVGWESPHNGSGTWAYSAPDLLGAWSWGSCAVYDSGGYVQELGLSLEESRDRLRFLQLHNWLDNRSRAVFLELTRYSPAVGLHAAVTLRLEFPAAGRALAALSVRPFALRRLSAGLSLPLLTSVCLLLFAVHFAVAEARTWHREGRWRVLRLGAWARWLLVALTAATALVRLAQLGAADRQWTRFVRGRPRRFTSFDQVAQLSSAARGLAASLLFLLLVKAAQQLRFVRQWSVFGKTLCRALPELLGVTLGLVVLGVAYAQLAILLVSSCVDSLWSVAQALLVLCPGTGLSTLCPAESWHLSPLLCVGLWALRLWGALRLGAVILRWRYHALRGELYRPAWEPQDYEMVELFLRRLRLWMGLSKVKEFRHKVRFEGMEPLPSRSSRGSKVSPDVPPPSAGSDASHPSTSSSQLDGLSVSLGRLGTRCEPEPSRLQAVFEALLTQFDRLNQATEDVYQLEQQLHSLQGRRSSRAPAGSSRGPSPGLRPALPSRLARASRGVDLATGPSRTPLRAKNKVHPSST
9AZI 17 27 0.69 13 24 24
YSCNVCGKAFVLSRHLNRHLRVHRRAT
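The LZ-complexity column can be reproduced under the assumption that \(\mathrm{LZ}(w)\) counts the components of the Lempel–Ziv (1976) exhaustive history: each component is the longest stretch of the remaining text that can be copied from earlier text, plus one new letter. The sketch below uses plain substring search for clarity rather than the faster Kaspar–Schuster scan, and its function names are illustrative. On 9AZI_1 it gives 17, matching the table.

```python
import math


def lz76_complexity(w: str) -> int:
    """Number of components in the Lempel-Ziv (1976) exhaustive history of w."""
    n = len(w)
    i = 0
    components = 0
    while i < n:
        length = 1
        # Grow the component while it can still be copied from earlier text;
        # the copy may overlap the component itself, hence w[:i + length - 1].
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        # The component ends with the first letter that cannot be copied,
        # or is the fully reproducible tail of w if we ran past the end.
        components += 1
        i += length
    return components


def normalized_lz(w: str, alphabet_size: int = 20) -> float:
    """LZ(w) / (n / log_k n) as in the ratio column; requires len(w) >= 2."""
    n = len(w)
    return lz76_complexity(w) / (n / math.log(n, alphabet_size))


w = "YSCNVCGKAFVLSRHLNRHLRVHRRAT"  # f(9AZI_1)
print(lz76_complexity(w), round(normalized_lz(w), 2))  # 17 0.69
```

The quadratic substring search is fine for proteins of a few thousand residues; for much longer inputs a linear-time scan would be preferable.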

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-character Protein Data Bank code \(c\).
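A direct reading of these definitions in Python follows; the helper names are illustrative. In the 9AZI_1 sequence the length-2 subwords RH and HL each occur twice among its 26 windows, which is why \(p_w(2)=26-2=24\).

```python
def distinct_subwords(w: str, n: int) -> set[str]:
    """P_w(n): the distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_count(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(distinct_subwords(w, n))


w = "YSCNVCGKAFVLSRHLNRHLRVHRRAT"  # f(9AZI_1)
print([subword_count(w, n) for n in (2, 3)])  # [24, 24]
```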

\(|P_{f(3SUC_1)}(2) \setminus P_{f(8ZKT_1)}(2)|=45\) and \(|P_{f(8ZKT_1)}(2) \setminus P_{f(3SUC_1)}(2)|=85\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\); thus \(Z_2(f(3SUC_1),f(8ZKT_1))=45+85=130\).

Hydrophobic-polar version of Sequence 1 (3SUC_1):
0000001101110110010011101000110101100010110011001111011010111001101100011011100011101100110100110000100100011100101010000110110111100000101010001010010110001011010011100101101001101000111000010111001100000001011000000101010001101001000111000000100111010101011110010101010100100001001100110110010100111001101010000110001010111101001111011010000101100011010101000010111101100000110111100110110100101000100110110110010100101110110011111001000101011001101011011100000110110111010000010100011111000100101000111000000000000111100000101000011101111100011111010000101000111000110001000101110010101001010101000101100100110010011101100101101001010010010011111000111111000101011110001111100010000110100111010101000011000000101111110111010001001001011111100100111001001000001011111110
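The quantity \(Z_k(x,y)\) is the size of the symmetric difference of the two subword sets; dividing it by the size of their union gives the Jaccard distance. A minimal sketch with illustrative function names, shown on toy words rather than the protein data:

```python
def distinct_subwords(w: str, k: int) -> set[str]:
    """P_w(k): the distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) - P_y(k)| + |P_y(k) - P_x(k)|."""
    px, py = distinct_subwords(x, k), distinct_subwords(y, k)
    return len(px - py) + len(py - px)


def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k divided by the size of the union of the subword sets (0.0 if both are empty)."""
    union = distinct_subwords(x, k) | distinct_subwords(y, k)
    return z_distance(x, y, k) / len(union) if union else 0.0


print(z_distance("ABCD", "BCDE", 2), jaccard_distance("ABCD", "BCDE", 2))  # 2 0.5
```

Here \(P(2)\) is {AB, BC, CD} for the first word and {BC, CD, DE} for the second, so one subword lies on each side of the difference and the union has four elements.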
Pair \(Z_2\) Length of longest common substring (contiguous subword)
3SUC_1,8ZKT_1 130 4
3SUC_1,9AZI_1 254 3
8ZKT_1,9AZI_1 286 3
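The last column is read here as the length of the longest common substring (contiguous subword); a common subsequence of sequences this long would be far longer than 3 or 4 letters. The standard dynamic-programming sketch below runs in \(O(|x|\,|y|)\) time, roughly a million steps for 3SUC_1 against 8ZKT_1; the function name is illustrative.

```python
def longest_common_substring(x: str, y: str) -> int:
    """Length of the longest contiguous subword shared by x and y."""
    best = 0
    # prev[j] = length of the longest common suffix of x[:i-1] and y[:j]
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Extend the common suffix ending at x[i-1], y[j-1].
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


print(longest_common_substring("ABCDEF", "XBCDY"))  # 3 (the shared block "BCD")
```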

Newick tree

 
(
	9AZI_1:15.58,
	(
		3SUC_1:65,8ZKT_1:65
	):86.58
);
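In standard Newick syntax (parentheses, terminated by `;`) the tree reads `(9AZI_1:15.58,(3SUC_1:65,8ZKT_1:65):86.58);`. The sketch below is a small hypothetical parser, not a library API, that sums branch lengths on the path between two leaves. 3SUC_1 and 8ZKT_1 come out \(65+65=130\) apart, the same as their \(Z_2\) value.

```python
import re

PUNCT = set("(),:;")


def parse_newick(text: str) -> dict:
    """Parse a Newick string into nested dicts with keys name, length, children."""
    tokens = re.findall(r"[(),:;]|[^(),:;\s]+", text)  # whitespace is skipped
    pos = 0

    def node() -> dict:
        nonlocal pos
        current = {"name": "", "length": 0.0, "children": []}
        if tokens[pos] == "(":
            pos += 1  # consume "("
            current["children"].append(node())
            while tokens[pos] == ",":
                pos += 1  # consume ","
                current["children"].append(node())
            pos += 1  # consume ")"
        if pos < len(tokens) and tokens[pos] not in PUNCT:
            current["name"] = tokens[pos]
            pos += 1
        if pos < len(tokens) and tokens[pos] == ":":
            current["length"] = float(tokens[pos + 1])
            pos += 2
        return current

    return node()


def leaf_paths(tree: dict, depth: float = 0.0, path: tuple = ()):
    """Yield (leaf name, root-to-leaf path of (node id, cumulative depth))."""
    depth += tree["length"]
    path = path + ((id(tree), depth),)
    if not tree["children"]:
        yield tree["name"], path
    for child in tree["children"]:
        yield from leaf_paths(child, depth, path)


def patristic_distance(tree: dict, a: str, b: str) -> float:
    """Sum of branch lengths on the path between leaves a and b."""
    paths = dict(leaf_paths(tree))
    pa, pb = paths[a], paths[b]
    shared = 0  # number of common ancestors, root included
    while shared < min(len(pa), len(pb)) and pa[shared][0] == pb[shared][0]:
        shared += 1
    lca_depth = pa[shared - 1][1]  # depth of the lowest common ancestor
    return pa[-1][1] + pb[-1][1] - 2 * lca_depth


tree = parse_newick("(9AZI_1:15.58,(3SUC_1:65,8ZKT_1:65):86.58);")
print(round(patristic_distance(tree, "3SUC_1", "8ZKT_1"), 2))  # 130.0
```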

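Let \(d\) be the Otu--Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\).
Let \(d_1\) be the Otu--Sayood distance \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). (This makes the 4TYN sequence AAAAAA a close match...)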
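A sketch of both distances follows, assuming \(\mathrm{LZ}\) is the LZ76 component count behind the LZ-complexity column and the usual Otu–Sayood forms, in which \(d\) is the larger and \(d_1\) the sum of the two increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). The function names are illustrative, and the toy inputs are not from the table.

```python
def lz76_complexity(w: str) -> int:
    """Number of components in the LZ76 exhaustive history of w."""
    i, components = 0, 0
    while i < len(w):
        length = 1
        # Extend while w[i:i+length] can be copied from earlier text (overlap allowed).
        while i + length <= len(w) and w[i:i + length] in w[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components


def otu_sayood(x: str, y: str) -> tuple[int, int]:
    """Return (d, d1) built from the two one-sided LZ increments."""
    gain_xy = lz76_complexity(x + y) - lz76_complexity(x)
    gain_yx = lz76_complexity(y + x) - lz76_complexity(y)
    return max(gain_xy, gain_yx), gain_xy + gain_yx


print(otu_sayood("AB", "CD"))  # (2, 4)
```

A word compared with itself scores \(d=1\) and \(d_1=2\), because the appended copy is a single reproducible component.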
A rough estimate of the expected distance, where \(2033=772+1261\) is the combined length of the two sequences, is \((0.85)(0.8)\left(\frac{2033}{\log_{20} 2033}-\frac{772}{\log_{20}772}\right)\approx 307.\)
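The arithmetic of this estimate, with the constants 0.85 and 0.8 taken as given and the lengths of 3SUC_1 and 8ZKT_1 from the table:

```python
import math

n_x, n_y = 772, 1261  # lengths of 3SUC_1 and 8ZKT_1
n_xy = n_x + n_y      # 2033, length of the concatenation


def lz_scale(n: int) -> float:
    """n / log_20 n, the normalizing term from the LZ ratio column."""
    return n / math.log(n, 20)


estimate = 0.85 * 0.8 * (lz_scale(n_xy) - lz_scale(n_x))
print(round(estimate))  # 307
```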
Status Protein1 Protein2 \(d\) \(d_1/2\)
Query variables 3SUC_1 8ZKT_1 382 309.5
Was not able to put for \(d\)
Was not able to put for \(d_1\)

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
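Reading min and max as taken over the two one-sided increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\) (an assumption based on the Otu–Sayood definitions), the table's \(d=382\) and \(d_1/2=309.5\) for 3SUC_1 and 8ZKT_1 pin the increments down as 382 and \(2(309.5)-382=237\). The sketch below, with an illustrative function name, interpolates between them:

```python
def delta(gain_xy: float, gain_yx: float, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two one-sided LZ increments."""
    low, high = min(gain_xy, gain_yx), max(gain_xy, gain_yx)
    return alpha * low + (1 - alpha) * high


# Increments recovered from the table: d = 382 and 2 * 309.5 - 382 = 237.
print(delta(382, 237, 0.0), delta(382, 237, 0.5))  # 382.0 309.5
```

Setting \(\alpha=1\) would instead return the smaller increment, 237.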