CoV2D Browser™

Parikh vectors
2BZN_1 8IVD_1 9ASH_1 Letter Amino acid
17 4 56 I Isoleucine
31 46 51 S Serine
26 36 60 A Alanine
16 24 57 D Aspartic acid
38 68 35 G Glycine
16 13 36 F Phenylalanine
1 5 6 W Tryptophan
29 24 28 V Valine
10 11 40 Y Tyrosine
16 24 40 R Arginine
6 13 51 N Asparagine
8 40 6 C Cysteine
11 18 26 Q Glutamine
15 14 11 H Histidine
25 35 76 L Leucine
19 21 41 T Threonine
22 23 51 E Glutamic acid
24 25 57 K Lysine
11 4 22 M Methionine
10 33 9 P Proline
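As a rough illustration of the table above: the Parikh vector of a sequence records how many times each letter occurs in it. A minimal Python sketch follows; the function name `parikh_vector` and the toy input are illustrative assumptions, not part of the CoV2D code.

```python
from collections import Counter

# The 20 amino-acid letters, in the row order of the table above.
AMINO_ACIDS = "ISADGFWVYRNCQHLTEKMP"

def parikh_vector(seq: str) -> dict[str, int]:
    """Return the number of occurrences of each amino-acid letter in seq."""
    counts = Counter(seq)
    # Letters that never occur get an explicit 0 so every vector has 20 entries.
    return {aa: counts.get(aa, 0) for aa in AMINO_ACIDS}

# Toy input (hypothetical, not one of the sequences on this page).
vec = parikh_vector("MGSSHHHHHH")
print(vec["H"], vec["S"], vec["W"])
```

Applying the same function to each FASTA sequence would give one column of the table per protein.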

>2BZN_1|Chains A, B, C, D, E, F, G, H|GMP REDUCTASE 2|HOMO SAPIENS (9606)
>8IVD_1|Chains A, B, C, D|Insulin-like growth factor-binding protein 7,Complement component C1q receptor|Homo sapiens (9606)
>9ASH_1|Chain A|CRISPR system single-strand-specific deoxyribonuclease Cas10/Csm1 (subtype III-A)|Lactococcus lactis subsp. lactis (1360)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
2BZN , Knot 148 351 0.82 40 202 334
MGSSHHHHHHSSGLVPRGSLDFKDVLLRPKRSTLKSRSEVDLTRSFSFRNSKQTYSGVPIIAANMDTVGTFEMAKVLCKFSLFTAVHKHYSLVQWQEFAGQNPDCLEHLAASSGTGSSDFEQLEQILEAIPQVKYICLDVANGYSEHFVEFVKDVRKRFPQHTIMAGNVVTGEMVEELILSGADIIKVGIGPGSVCTTRKKTGVGYPQLSAVMECADAAHGLKGHIISDGGCSCPGDVAKAFGAGADFVMLGGMLAGHSESGGELIERDGKKYKLFYGMSSEMAMKKYAGGVAEYRASEGKTVEVPFKGDVEHTIRDILGGIRSTCTYVGAAKLKELSRRTTFIRVTQQVN
8IVD , Knot 191 481 0.81 40 224 435
DYKDDDDADTCGPCEPASCPPLPPLGCLLGETRDACGCCPMCARGEGEPCGGGGAGRGYCAPGMECVKSRKRRKGKAGAAAGGPGVSGVCVCKSRYPVCGSDGTTYPSGCQLRAASQRAESRGEKAITQVSKGTCEQGGSGGSSGGSGSGDTEAVVCVGTACYTAHSGKLSAAEAQNHCNQNGGNLATVKSKEEAQHVQRVLAQLLRREAALTARMSKFWIGLQREKGKCLDPSLPLKGFSWVGGGEDTPYSNWHKELRNSCISKRCVSLLLDLSQPLLPSRLPKWSEGPCGSPGSPGSNIEGFVCKFSFKGMCRPLALGGPGQVTYTTPFQTTSSSLEAVPFASAANVACGEGDKDETQSHYFLCKEKAPDVFDWGSSGPLCVSPKYGCNFNNGGCHQDCFEGGDGSFLCGCRPGFRLLDDLVTCASRNPCSSSPCRGGATCVLGPHGKNYTCRCPQGYQLDSSQLDCVDVDHHHHHHHA
9ASH , Knot 288 759 0.84 40 279 702
MDKINLVCGSLLADIGKIIYRGTSERAKHSKLGGDFIKSFEQFRNTELTDCIRYHHAQEITSVKSNKEKNSLFYITYIADNISSGMDRRKDLEEGAEGFNWDKKVALGSVFNVLNEKEKGRQNYSYPFVARTRIKEEPLNFPTATQNQYTTSYYDGLITDMKTILQRLKPDKEHINSLLQMMESLWSYVPSSTDKNQLVDISLYDHSRTTAAIASAIYDYFQAENITDYQKELFDYNATEFYDKNAFLMMNFDMSGVQNFIYNISGSKALKSLRARSFYLDMLLEYISDNLLEKLELSRANILYVGGGHAYLLLANTNKTKAILSDFEHDLKTWFLDKFKIDLYVAMAYTEVSANDLMNHNGHYRDIYRRLSQKTSAKKANRYTAEEILNLNHQGTENARECRECKRSDLLIEEDDICEICDSLQKVSRDLTRENIFVIANEGVLDMPFGKKMSALSYSQADKLKKSNAEVQIYAKNISEIGQNLMTRIDMGDYTYRSDFHEMLEEVEVGINRLGVLRADVDNLGQAFINGIPDDYLSISRTATFSRAMSRFFKNYLNQLLAEKSYKINVIYAGGDDLFMIGAWQDILDFSIVLKQKFADFTQNKLSISAGIGMFREKYPVARMASLTGDLEDAAKDYKPDERAVQATKNAVTLFDATNVFSWDTLENDIFVKLDAITKNFEKLDETGKAFIYRLIDLLRGVNENQQINIARLAYTLSRMEEKIGKTFAQELYNWANADRKTLIMALEIYILKTRERAA
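The normalized column in the table above divides the LZ-complexity by \(n/\log_{20} n\). The sketch below shows one way to compute both quantities in Python. It assumes the Lempel--Ziv 1976 (exhaustive history) phrase count as the complexity measure; the page does not state which variant it uses, so the function `lz76_complexity` is an assumption and may not reproduce the tabulated values exactly.

```python
import math

def lz76_complexity(w: str) -> int:
    """Count phrases in the Lempel-Ziv 1976 parsing of w (assumed variant)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it can be copied from text that starts
        # earlier (the copy source may overlap the phrase itself).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) / (n / log_b n) for alphabet size b."""
    return lz / (n / math.log(n, alphabet_size))

# Classic textbook example: parses as 0|001|10|100|1000|101.
print(lz76_complexity("0001101001000101"))
# Ratio for the tabulated values LZ = 148, n = 351 of 2BZN.
print(normalized_lz(148, 351))
```

The second call uses only the numbers already printed in the table, so it checks the normalization rather than the parsing.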

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
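The subword sets and their cardinalities can be sketched in a few lines of Python. The names `subwords` and `subword_complexity` are illustrative, and the parameter `k` stands for the subword length.

```python
def subwords(w: str, k: int) -> set[str]:
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """Return the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

# Toy input (hypothetical): "abab" has the length-2 subwords "ab" and "ba".
print(sorted(subwords("abab", 2)))
print(subword_complexity("abab", 2))
```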

\(|P_{f(2BZN_1)}(2) \setminus P_{f(8IVD_1)}(2)|=70\), \(|P_{f(8IVD_1)}(2) \setminus P_{f(2BZN_1)}(2)|=92\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
110000000000111101010100111010000100000101000101000000001111111010011010110110010110110000011010011100100100111001010001001001101110100101011010000110110010001100011110110101100111011011011111101000000011101010111001011011010110011000110110111111011111111110000110110001000011011000111000111110001001001011101010001001111100000011110100100000110100010
Pair \(Z_2\) Length of longest common subword
2BZN_1,8IVD_1 162 6
2BZN_1,9ASH_1 159 4
8IVD_1,9ASH_1 183 4
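Both columns of the table above can be sketched in Python. \(Z_k\) is the size of the symmetric difference of the two length-\(k\) subword sets. The last column is read here as the length of the longest common contiguous subword, which is an assumption based on the small values shown. All function names are illustrative, and the inputs are toy strings rather than the sequences on this page.

```python
def subwords(w: str, k: int) -> set[str]:
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x: str, y: str, k: int) -> int:
    """Return |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)| (symmetric difference size)."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword_length(x: str, y: str) -> int:
    """Length of the longest contiguous block shared by x and y (dynamic programming)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Extend the common block that ended at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z("abab", "abba", 2))
print(longest_common_subword_length("HHHHHHSS", "DHHHHHHHHA"))
```

Since \(Z_k\) is symmetric in its arguments, one value per unordered pair suffices, as in the table.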

Newick tree

 
(8IVD_1:88.59,(2BZN_1:79.5,9ASH_1:79.5):9.09);

Let \(d\) denote the Otu--Sayood distance \(d\).
Let \(d_1\) denote the Otu--Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{832 }{\log_{20} 832}-\frac{351}{\log_{20}351})\approx 130.\)
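The arithmetic in this estimate can be checked with a short Python sketch. The factors 0.85 and 0.8 are taken from the formula as given; 832 appears to be the combined length 351 + 481 of the two query sequences, which is an inference from the lengths listed earlier. The helper name `n_over_log` is illustrative.

```python
import math

def n_over_log(n: int, base: int = 20) -> float:
    """Return n / log_base(n)."""
    return n / math.log(n, base)

n1, n2 = 351, 481  # lengths of 2BZN_1 and 8IVD_1
expected = 0.85 * 0.8 * (n_over_log(n1 + n2) - n_over_log(n1))
print(round(expected))  # 130
```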
Status Protein1 Protein2 d d1/2
Query variables 2BZN_1 8IVD_1 159 138
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
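The interpolation above can be sketched in Python as follows. The sketch assumes the usual Otu--Sayood definitions, namely that the two terms are \(c(xy)-c(x)\) and \(c(yx)-c(y)\) for a complexity function \(c\), with \(d\) their maximum and \(d_1\) their sum. The complexity function used here is a Lempel--Ziv 1976 phrase count, which is also an assumption, and all names and inputs are illustrative.

```python
def lz76_complexity(w: str) -> int:
    """Count phrases in the Lempel-Ziv 1976 parsing of w (assumed variant)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def otu_sayood_terms(x: str, y: str, c=lz76_complexity) -> tuple[int, int]:
    """Return (c(xy) - c(x), c(yx) - c(y)) for complexity function c."""
    return c(x + y) - c(x), c(y + x) - c(y)

def delta(min_term: float, max_term: float, alpha: float) -> float:
    """Return alpha * min + (1 - alpha) * max."""
    return alpha * min_term + (1 - alpha) * max_term

# Toy inputs (hypothetical, not the sequences on this page).
a, b = otu_sayood_terms("MGSSHHHHHHSSGLVPRGS", "DYKDDDDADTCGPCEPASC")
d = delta(min(a, b), max(a, b), 0)           # alpha = 0 gives d = max
half_d1 = delta(min(a, b), max(a, b), 0.5)   # alpha = 1/2 gives d_1 / 2
print(d, half_d1)
```

With \(\alpha=0\) only the larger term survives, and with \(\alpha=1/2\) the result is the average of the two terms, matching the two cases in the display.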