CoV2D BrowserTM

Parikh vectors
6VMB_1 8QUZ_1 7LTM_1 Letter Amino acid
19 6 22 D Aspartic acid
39 12 42 G Glycine
3 4 12 H Histidine
41 10 22 I Isoleucine
13 3 8 M Methionine
31 4 47 T Threonine
37 13 23 V Valine
15 8 46 N Asparagine
1 1 5 C Cysteine
43 20 19 E Glutamic acid
0 7 12 W Tryptophan
31 17 23 R Arginine
50 22 35 L Leucine
20 6 17 K Lysine
12 8 27 F Phenylalanine
17 8 34 P Proline
30 8 41 S Serine
18 5 28 Y Tyrosine
54 17 24 A Alanine
33 9 31 Q Glutamine
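
The Parikh vector of a sequence records how often each letter occurs, so its entries sum to the sequence length (for 6VMB_1 the first column above sums to 507). A minimal sketch of the computation follows; the function name `parikh_vector` and the alphabet ordering `AMINO` are illustrative choices, not taken from the CoV2D code.

```python
from collections import Counter

# The 20 standard amino-acid letters (ordering here is arbitrary).
AMINO = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(w: str, alphabet: str = AMINO) -> dict:
    """Return the number of occurrences in w of each letter of the alphabet."""
    counts = Counter(w)
    # Counter returns 0 for letters that do not occur in w.
    return {letter: counts[letter] for letter in alphabet}

# Toy example (not one of the proteins above).
vec = parikh_vector("MATIRADEIS")
print(vec["A"], vec["I"], vec["W"])
print(sum(vec.values()))
```

For a sequence over the 20 standard letters, the entries add up to `len(w)`.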

>6VMB_1|Chains A, B, C|ATP synthase subunit alpha, chloroplastic|Spinacia oleracea (3562)
>8QUZ_1|Chains A, B|Chlorite Dismutase|Cyanothece sp. PCC 7425 (395961)
>7LTM_1|Chains AA[auth 2], AB[auth 3], A[auth 1], BA[auth 5], BB[auth 6], B[auth 4], CA[auth 8], CB[auth A], C[auth 7], DA[auth C], DB[auth D], D[auth B], E, EA[auth F], EB[auth G], FA[auth I], FB[auth J], F[auth H], GA[auth L], GB[auth M], G[auth K], HA[auth O], HB[auth P], H[auth N], IA[auth R], I[auth Q], JA[auth T], J[auth S], KA[auth V], K[auth U], LA[auth X], L[auth W], MA[auth Z], M[auth Y], NA[auth b], N[auth a], OA[auth d], O[auth c], PA[auth f], P[auth e], QA[auth h], Q[auth g], RA[auth j], R[auth i], SA[auth l], S[auth k], TA[auth n], T[auth m], UA[auth p], U[auth o], VA[auth r], V[auth q], WA[auth t], W[auth s], XA[auth v], X[auth u], YA[auth x], Y[auth w], ZA[auth z], Z[auth y]|Capsid protein|Adeno-associated virus - 8 (202813)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6VMB , Knot 197 507 0.80 38 217 473
MATIRADEISKIIRERIEGYNREVKVVNTGTVLQVGDGIARIHGLDEVMAGELVEFEEGTIGIALNLESNNVGVVLMGDGLMIQEGSSVKATGRIAQIPVSEAYLGRVINALAKPIDGRGEITASESRLIESPAPGIMSRRSVYEPLQTGLIAIDAMIPVGRGQRELIIGDRQTGKTAVATDTILNQQGQNVICVYVAIGQKASSVAQVVTNFQERGAMEYTIVVAETADSPATLQYLAPYTGAALAEYFMYRERHTLIIYDDLSKQAQAYRQMSLLLRRPPGREAYPGDVFYLHSRLLERAAKLSSLLGEGSMTALPIVETQAGDVSAYIPTNVISITDGQIFLSADLFNAGIRPAINVGISVSRVGSAAQIKAMKKVAGKLKLELAQFAELEAFAQFASDLDKATQNQLARGQRLRELLKQPQSAPLTVEEQVMTIYTGTNGYLDSLELDQVRKYLVELRTYVKTNKPEFQEIISSTKTFTEEAEALLKEAIQEQMERFLLQEQA
8QUZ , Knot 87 188 0.80 40 137 184
GPGYQDPNNRYSFIGGRTGQWQVVKIRNVLGPGLQLVEKVNILNGAVAEIPLDSAWRLQGFASNIRYAIRTELEALQAVQPMLNRAEAILAVLIPIKKSAQWWEMAQDERRDIFERESHHTAVGLEYLPGVARRLLHCRDLGEEFDFLTWFEFAPEHSSAFNELLLRMRASKEWEYVEREVEVWLKRL
7LTM , Knot 209 518 0.84 40 260 494
DGVGSSSGNWHCDSTWLGDRVITTSTRTWALPTYNNHLYKQISNGTSGGATNDNTYFGYSTPWGYFDFNRFHCHFSPRDWQRLINNNWGFRPKRLSFKLFNIQVKEVTQNEGTKTIANNLTSTIQVFTDSEYQLPYVLGSAHQGCLPPFPADVFMIPQYGYLTLNNGSQAVGRSSFYCLEYFPSQMLRTGNNFQFTYTFEDVPFHSSYAHSQSLDRLMNPLIDQYLYYLSRTQTTSNGRGVTLGFSQGGPNTMANQAKNWLPGPCYRQQRVSTYPLQNNNSNFAWTAGTKYHLNGRNSLANPGIAMATHKDDEERFFPSNGILIFGKQNAARDNADYSDVMLTSEEEIKTTNPVATEEYGIVADNGQTQTTAPQIGTVNSQGALPGMVWQNRDVYLQGPIWAKIPHTDGNFHPSPLMGGFGLKHPPPQILIKNTPVPADPRSTFNGDKLNSFITQYSTGQVSVEIEWELQKENSKRWNPEIQYTSNYYKSTSVDFAVNTEGVYSEPRPIGTRYLTRNL
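
The LZ-complexity and normalized ratio columns above can be sketched as follows. This assumes \(\mathrm{LZ}(w)\) is the number of components in the exhaustive Lempel-Ziv (1976) parsing; whether the site uses exactly this variant is an assumption, and the names `lz76_complexity` and `normalized_lz` are illustrative.

```python
import math

def lz76_complexity(w: str) -> int:
    """Number of components in the exhaustive (LZ76) parsing of w."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the component while it can be copied from earlier text;
        # the copied occurrence may overlap the component itself.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_q(n), with q the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# Classic example: 0|001|10|100|1000|101 has 6 components.
print(lz76_complexity("0001101001000101"))
# Ratio for the 6VMB row (LZ = 197, n = 507).
print(normalized_lz(197, 507))
```

With the tabulated values for 6VMB the ratio falls between 0.80 and 0.81, consistent with the 0.80 shown in the table.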

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
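
As a minimal sketch of these definitions (the names `subwords` and `subword_count` are illustrative, not from the site), the set \(P_w(k)\) can be collected by sliding a window of length \(k\) over \(w\):

```python
def subwords(w: str, k: int) -> set:
    """P_w(k): the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_count(w: str, k: int) -> int:
    """p_w(k): the cardinality of P_w(k)."""
    return len(subwords(w, k))

# Toy example (not one of the proteins above).
print(sorted(subwords("ABAB", 2)))
print(subword_count("ABAB", 1))
```

Since a word of length \(|w|\) has only \(|w|-k+1\) windows of length \(k\), \(p_w(k)\) can never exceed that number.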

\(|P_{f(6VMB_1)}(2) \setminus P_{f(8QUZ_1)}(2)|=124\), \(|P_{f(8QUZ_1)}(2) \setminus P_{f(6VMB_1)}(2)|=44\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
110101001001100010100001011001011011011101011001111011010010111110100001111111011110010010101011011100101101101110110101010100001100111111000010011001111101111110100011110000100111000110001001101011110010011011001000111000111100100110100111001111100110000001110001000101000101110011100101101101000110011010011101010111110001101010110011010010111010110111011101110100110110101100111010101101101011101100100100001101001001100100111010001101001001010010100100011010001000010100110000010001011100110001001110001
Pair \(Z_2\) Length of longest common subword
6VMB_1,8QUZ_1 168 4
6VMB_1,7LTM_1 149 3
8QUZ_1,7LTM_1 201 3
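
The two columns of this table can be sketched as below. \(Z_k\) follows the definition given earlier. For the last column, values as small as 3 and 4 suggest a longest common contiguous subword rather than a subsequence with gaps; that reading is an assumption, and the function names are illustrative.

```python
def subwords(w: str, k: int) -> set:
    """P_w(k): distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x: str, y: str) -> int:
    """Length of the longest contiguous block shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, 1):
            if a == b:
                # Extend the common suffix ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Toy examples (not the proteins above).
print(z_distance("ABAB", "BABB", 2))
print(longest_common_subword_length("ABCDE", "XBCDY"))
```

For the first pair, the two set differences reported earlier (124 and 44) add up to the tabulated \(Z_2=168\).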

Newick tree

 
(8QUZ_1:97.91,(6VMB_1:74.5,7LTM_1:74.5):23.41);

Let \(d\) and \(d_1\) be the Otu–Sayood distances of the same names. (This makes the 4TYN sequence AAAAAA a close match...)
A rough expected distance is \((0.85)(0.8)(\frac{695 }{\log_{20} 695}-\frac{188}{\log_{20}188})\approx 143.\)
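
The arithmetic in this estimate can be checked directly. Note that 695 = 507 + 188, the combined length of the two query sequences. The helper name `lz_scale` is illustrative; the factors 0.85 and 0.8 are taken from the formula as given.

```python
import math

def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """n / log_q(n), the normalizing term used for LZ-complexity."""
    return n / math.log(n, alphabet_size)

# Difference between the scale of the concatenation (507 + 188 = 695)
# and the scale of the shorter sequence (188), times the two factors.
expected = 0.85 * 0.8 * (lz_scale(695) - lz_scale(188))
print(round(expected))
```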
Status Protein1 Protein2 d d1/2
Query variables 6VMB_1 8QUZ_1 178 122
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
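
A minimal sketch of this interpolation, assuming the usual Otu–Sayood definitions in terms of a complexity function \(c\): \(d(x,y)=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1(x,y)=(c(xy)-c(x))+(c(yx)-c(y))\), so that min and max above refer to these two increments. The function name `delta` and the input numbers are hypothetical and are not values from this page.

```python
def delta(c_x: int, c_y: int, c_xy: int, c_yx: int, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two complexity increments."""
    inc_xy = c_xy - c_x  # extra components needed for y after seeing x
    inc_yx = c_yx - c_y  # extra components needed for x after seeing y
    lo, hi = min(inc_xy, inc_yx), max(inc_xy, inc_yx)
    return alpha * lo + (1 - alpha) * hi

# Hypothetical complexities: c(x)=10, c(y)=8, c(xy)=15, c(yx)=14.
print(delta(10, 8, 15, 14, alpha=0))    # plays the role of d
print(delta(10, 8, 15, 14, alpha=0.5))  # plays the role of d1/2
```

With \(\alpha=0\) only the larger increment survives, and with \(\alpha=1/2\) the result is the average of the two increments, which is half their sum.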