CoV2D Browser™

Parikh vectors
8SLW_1 6YFO_1 5NXM_1 Letter Amino acid
29 1 1 M Methionine
25 1 7 W Tryptophan
67 8 19 D Aspartic acid
50 2 11 Q Glutamine
78 4 13 E Glutamic acid
69 11 22 G Glycine
26 7 10 N Asparagine
156 11 26 L Leucine
61 5 12 F Phenylalanine
54 14 12 T Threonine
83 12 17 V Valine
68 6 7 R Arginine
53 5 24 K Lysine
57 3 17 P Proline
76 16 17 S Serine
28 3 8 Y Tyrosine
91 12 13 A Alanine
20 0 1 C Cysteine
34 1 12 H Histidine
44 9 9 I Isoleucine
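A Parikh vector simply counts how often each letter occurs in a sequence. The following minimal sketch recomputes the 6YFO_1 column of the table; the names `TABLE_ORDER`, `parikh_vector` and `SEQ_6YFO` are illustrative and not part of the site.

```python
from collections import Counter

# Letters in the row order used by the table above.
TABLE_ORDER = "MWDQEGNLFTVRKPSYACHI"


def parikh_vector(sequence: str, alphabet: str = TABLE_ORDER) -> list[int]:
    """Return the number of occurrences of each alphabet letter in sequence."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]


# 6YFO_1 sequence as listed on this page.
SEQ_6YFO = (
    "SIIGSSIKTGATSASITGGSDITFALTGQTVTNGLNVSVSEDTDYRTRRNATFKSRVPTVVNGNYSKGKN"
    "EVVFVIPMSLDSGETVFNSVRIALEIHPALASASVKDLRLIGAQLLTDADYDSFWTLGALA"
)

vector = parikh_vector(SEQ_6YFO)
print(dict(zip(TABLE_ORDER, vector)))
```

The entries of the vector sum to the sequence length, which gives a quick consistency check against the length column further down the page.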

>8SLW_1|Chains A, B, C, D|Transient receptor potential cation channel subfamily M member 5|Rattus norvegicus (10116)
>6YFO_1|Chains AA[auth AB], AB[auth AC], AC[auth AD], A[auth AA], BA[auth AF], BB[auth AG], BC[auth AH], B[auth AE], CA[auth AJ], CB[auth AK], CC[auth AL], C[auth AI], DA[auth AN], DB[auth AO], DC[auth AP], D[auth AM], EA[auth AR], EB[auth AS], EC[auth AT], E[auth AQ], FA[auth AV], FB[auth AW], FC[auth AX], F[auth AU], GA[auth AZ], GB[auth BA], GC[auth BB], G[auth AY], HA[auth BD], HB[auth BE], HC[auth BF], H[auth BC], IA[auth BH], IB[auth BI], IC[auth BJ], I[auth BG], JA[auth BL], JB[auth BM], JC[auth BN], J[auth BK], KA[auth BP], KB[auth BQ], KC[auth BR], K[auth BO], LA[auth BT], LB[auth BU], LC[auth BV], L[auth BS], MA[auth BX], MB[auth BY], M[auth BW], NA[auth CA], NB[auth CB], N[auth BZ], OA[auth CD], OB[auth CE], O[auth CC], PA[auth CG], PB[auth CH], P[auth CF], QA[auth CJ], QB[auth CK], Q[auth CI], RA[auth CM], RB[auth CN], R[auth CL], SA[auth CP], SB[auth CQ], S[auth CO], TA[auth CS], TB[auth CT], T[auth CR], UA[auth CV], UB[auth CW], U[auth CU], VA[auth CY], VB[auth CZ], V[auth CX], WA[auth DB], WB[auth DC], W[auth DA], XA[auth DE], XB[auth DF], X[auth DD], YA[auth DH], YB[auth DI], Y[auth DG], ZA[auth DK], ZB[auth DL], Z[auth DJ]|coat protein|Leviviridae sp. (2027243)
>5NXM_1|Chain A|Carbonic anhydrase 2|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8SLW , Knot 420 1169 0.84 40 335 1012
MDYKDDDDKLEMPMAQSSCPGSPPDTGDGWEPVLCKGEVNFGGSGKKRSKFVKVPSNVAPSMLFELLLTEWHLPAPNLVVSLVGEERLFAMKSWLRDVLRKGLVKAAQSTGAWILTSALHVGLARHVGQAVRDHSLASTSTKVRVVAIGMASLDRILHRQLLDGVQAQEDTPIHYPADEGSTQGPLCPLDSNLSHFILVEPGTLGSGNDGLAELQLSLEKHISQQRTGYGGTSSIQIPVLCLLVNGDPSTLERMSRAVEQAAPWLILAGSGGIADVLAALVGQPHLLVPQVTEKQFREKFPSECFSWEAIVHWTELLQNIAAHPHLLTVYDFEQEGSEDLDTVILKALVKACKSHSRDAQDYLDELKLAVAWDRVDIAKSEIFNGDVEWKSCDLEEVMTDALVSNKPDFVRLFVDSGADMAEFLTYGRLQQLYHSVSPKSLLFELLERKHEEGRLTLAGLGAQQTRELPVGLPAFSLHEVSRVLKDFLHDACRGFYQDGRRMEERGPPKRPAGQKWLPDLSRKSEDPWRDLFLWAVLQNRYEMATYFWAMGREGVAAALAACKIIKEMSHLEKEAEVARTMREAKYEQLALDLFSECYSNSEDRAFALLVRRNHSWSRTTCLHLATEADAKAFFAHDGVQAFLTKIWWGDMATGTPILRLLGAFTCPALIYTNLISFSEDAPQRMDLEDLQEPDSLDMEKSFLCSHGGQLEKLTEAPRAPGDLGPQAAFLLTRWRKFWGAPVTVFLGNVVMYFAFLFLFSYVLLVDFRPPPQGPSGSEVTLYFWVFTLVLEEIRQGFFTNEDTRLVKKFTLYVEDNWNKCDMVAIFLFIVGVTCRMVPSVFEAGRTVLAIDFMVFTLRLIHIFAIHKQLGPKIIIVERMMKDVFFFLFFLSVWLVAYGVTTQALLHPHDGRLEWIFRRVLYRPYLQIFGQIPLDEIDEARVNCSLHPLLLDSSASCPNLYANWLVILLLVTFLLVTNVLLMNLLIAMFSYTFQVVQGNADMFWKFQRYHLIVEYHGRPALAPPFILLSHLSLVLKQVFRKEAQHKQQHLERDLPDPVDQKIITWETVQKENFLSTMEKRRRDSEEEVLRKTAHRVDLIAKYIGGLREQEKRIKCLESQANYCMLLLSSMTDTLAPGGTYSSSQNCGRRSQPASARDREYLEAGLPHSDT
6YFO , Knot 63 131 0.78 38 100 127
SIIGSSIKTGATSASITGGSDITFALTGQTVTNGLNVSVSEDTDYRTRRNATFKSRVPTVVNGNYSKGKNEVVFVIPMSLDSGETVFNSVRIALEIHPALASASVKDLRLIGAQLLTDADYDSFWTLGALA
5NXM , Knot 110 258 0.79 40 174 247
HHWGYGKHNGPEHWHKDFPIAKGERQSPVDIDTHTAKYDPSLKPLSVSYDQATSLRILNNGHAFNVEFDDSQDKAVLKGGPLDGTYRLIQFHFHWGSLDGQGSEHTVDKKKYAAELHLVHWNTKYGDFGKAVQQPDGLAVLGIFLKVGSAKPGLQKVVDVLDSIKTKGKSADFTNFDPRGLLPESLDYWTYPGSLTTPPLLECVTWIVLKEPISVSSEQVLKFRKLNFNGEGEPEELMVDNWRPAQPLKNRQIKASFK
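The LZ-complexity and normalized-ratio columns can be sketched as follows. This assumes that \(\mathrm{LZ}(w)\) is the number of phrases in the Lempel-Ziv (1976) exhaustive-history parsing; the page does not state which variant it uses, so the function below may not reproduce the tabulated values exactly. The function names are illustrative.

```python
import math


def lz76_complexity(w: str) -> int:
    """Number of phrases in the LZ76 exhaustive-history parsing of w (assumed variant)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while it still occurs starting at an earlier position;
        # the earlier copy is allowed to overlap the phrase itself.
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        phrases += 1
        i += length
    return phrases


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_20(n), as in the fourth column of the table."""
    return lz / (n / math.log(n, alphabet_size))


# Classic textbook example: 0 | 001 | 10 | 100 | 1000 | 101 gives 6 phrases.
print(lz76_complexity("0001101001000101"))
# Normalized ratio computed from the tabulated LZ-complexity and length of 8SLW.
print(normalized_lz(420, 1169))
```

The normalization divides by \(n/\log_{20} n\) because the table works over the 20-letter amino acid alphabet.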

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
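As an illustration of these definitions, here is a minimal sketch that computes \(P_w(n)\) and \(p_w(n)\) for a toy word. The helper names `subwords` and `subword_complexity` are illustrative.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_complexity(w: str, n: int) -> int:
    """p_w(n): the number of distinct subwords of length n in w."""
    return len(subwords(w, n))


w = "abracadabra"
profile = [subword_complexity(w, n) for n in (1, 2, 3)]
print(profile)  # [5, 7, 7]
```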

\(|P_{f(8SLW_1)}(2) \setminus P_{f(6YFO_1)}(2)|=239\), \(|P_{f(6YFO_1)}(2) \setminus P_{f(8SLW_1)}(2)|=4\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
10000000010111100001101100101101110010101110100000110110011101110111001011110111011100011110011001100111011000111110011011110011011000011000001011111110100110001101101000011001100100011101100010011110110110100111010101000100000101100010111101110101001001001100111111111011110111111101011110100001000110001010111010011001110101101001000100010011101110100000001000100101111100101100011010101000010011001110001011011100110110110010100100010100111011000000101011111100000111111110100100110011001001100010010001110011100111010000001100111111100000110011111001111111100110010010001011001001000011101100000000001111110000010000010110010101111001101110011110110101110111110011110001101000110010100100100101000110001101001001101110111011111001001111110111101110111111100111101011101101001010111101110010011100000011001010100010000111111111110001110110110011110111101011011110001110111100110011111111011111011000111010010101110011001010111011100100101000101111000100101010111111110111100111101111110001011010101110100001110001011111111110010111001100010000001000110110001101001000011001000000000011000100101110011110000001001000100011110010001111100000000100001101000001011110000
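The hydrophobic-polar string above maps each residue to 1 (hydrophobic) or 0 (polar). The page does not state which residues it treats as hydrophobic; the set used in this sketch is an assumption inferred from the opening residues of 8SLW_1 and the matching digits, and the assignment of I (isoleucine) is a guess that those residues do not cover.

```python
# Assumed hydrophobic class, inferred from the start of the string above
# (not stated on the page); every other residue is treated as polar.
HYDROPHOBIC = set("AFGILMPVW")


def hp_encode(sequence: str) -> str:
    """Map each residue to '1' if assumed hydrophobic, else '0'."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)


# First 50 residues of 8SLW_1 as listed on this page.
prefix = "MDYKDDDDKLEMPMAQSSCPGSPPDTGDGWEPVLCKGEVNFGGSGKKRSK"
encoded = hp_encode(prefix)
print(encoded)
```

Under this assumed classification the output agrees with the first 50 digits of the string shown above.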
Pair \(Z_2\) Length of longest common substring
8SLW_1,6YFO_1 243 4
8SLW_1,5NXM_1 195 4
6YFO_1,5NXM_1 168 4
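The \(Z_2\) column is the size of the symmetric difference of the two subword sets; for example, \(239+4=243\) for the pair 8SLW_1, 6YFO_1. A minimal sketch on toy words follows; dividing by the size of the union gives the full Jaccard distance. The function names are illustrative.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    return len(subwords(x, k) ^ subwords(y, k))


def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k(x, y) divided by |P_x(k) union P_y(k)|; 0.0 if both sets are empty."""
    px, py = subwords(x, k), subwords(y, k)
    union = px | py
    return len(px ^ py) / len(union) if union else 0.0


# P_x(2) = {ab, ba}, P_y(2) = {ab, bb, ba}: only "bb" is unshared.
print(z_distance("abab", "abba", 2))
print(jaccard_distance("abab", "abba", 2))
```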

Newick tree

 
[
	8SLW_1:11.58,
	[
		5NXM_1:84,6YFO_1:84
	]:33.58
]

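Let \(d\) be the Otu–Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\,\mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\).
Let \(d_1\) be the Otu–Sayood distance \(d_1(x,y)=(\mathrm{LZ}(xy)-\mathrm{LZ}(x))+(\mathrm{LZ}(yx)-\mathrm{LZ}(y))\). (This makes the 4TYN sequence AAAAAA a close match...)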
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{1300 }{\log_{20} 1300}-\frac{131}{\log_{20}131})\approx 314.\)
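The arithmetic in this estimate can be checked directly. The sketch below takes the factors 0.85 and 0.8 and the lengths 1300 (which equals 1169 + 131) and 131 from the formula as given; the function names are illustrative.

```python
import math


def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """n / log_20(n), the normalizing scale used for LZ-complexity on this page."""
    return n / math.log(n, alphabet_size)


def rough_expected_distance(n_concat: int, n_short: int,
                            c1: float = 0.85, c2: float = 0.8) -> float:
    """Evaluate c1 * c2 * (n_concat / log_20 n_concat - n_short / log_20 n_short)."""
    return c1 * c2 * (lz_scale(n_concat) - lz_scale(n_short))


estimate = rough_expected_distance(1300, 131)
print(estimate)
```

The result lies between 314 and 315, in line with the value quoted above.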
Status Protein1 Protein2 d d1/2
Query variables 8SLW_1 6YFO_1 393 217.5
Was not able to put for d
Was not able to put for d1
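A minimal sketch of how the two Otu–Sayood distances can be computed from a complexity function follows. It assumes LZ76 exhaustive-history complexity as that function, which may differ in detail from what the site uses, so it is not guaranteed to reproduce the values in the table. The function names are illustrative.

```python
def lz76_complexity(w: str) -> int:
    """Number of phrases in the LZ76 exhaustive-history parsing of w (assumed variant)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it occurs starting at an earlier position.
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        phrases += 1
        i += length
    return phrases


def otu_sayood(x: str, y: str, c=lz76_complexity) -> tuple[int, int]:
    """Return (d, d1): the max and the sum of the two complexity gains."""
    gain_xy = c(x + y) - c(x)  # new phrases needed for y once x is known
    gain_yx = c(y + x) - c(y)  # new phrases needed for x once y is known
    return max(gain_xy, gain_yx), gain_xy + gain_yx


print(otu_sayood("abab", "cdcd"))  # (2, 4)
print(otu_sayood("aaaa", "aaaa"))  # (0, 0)
```

Appending a sequence to a copy of itself adds no new phrases in this toy case, which is why identical inputs get distance zero.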

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
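The interpolation can be checked against the query row above. This sketch assumes that min and max denote the smaller and larger of the two one-sided complexity gains, a reading consistent with the two cases shown. From \(d=393\) and \(d_1/2=217.5\) the gains are then 393 and \(2\cdot 217.5-393=42\). The function name `delta` is illustrative.

```python
def delta(alpha: float, gain_a: float, gain_b: float) -> float:
    """alpha * min + (1 - alpha) * max of the two gains (assumed reading of the formula)."""
    lo, hi = min(gain_a, gain_b), max(gain_a, gain_b)
    return alpha * lo + (1 - alpha) * hi


# Gains derived from the query row: max = d = 393, min = 2 * 217.5 - 393 = 42.
print(delta(0.0, 42, 393))  # 393.0, the distance d
print(delta(0.5, 42, 393))  # 217.5, the distance d1 / 2
```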