CoV2D Browser™

Parikh vectors
6YFH_1 6LKG_1 2VVS_1 Letter Amino acid
8 10 40 D Aspartic acid
1 0 5 C Cysteine
0 12 13 H Histidine
3 0 16 W Tryptophan
4 9 53 E Glutamic acid
15 14 36 T Threonine
4 10 35 Y Tyrosine
8 15 46 N Asparagine
11 3 40 G Glycine
11 17 60 L Leucine
8 12 62 K Lysine
6 4 17 M Methionine
10 4 39 P Proline
22 6 50 A Alanine
6 6 28 R Arginine
5 14 40 Q Glutamine
7 12 43 I Isoleucine
4 9 28 F Phenylalanine
10 11 38 S Serine
13 4 48 V Valine
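
A Parikh vector simply counts how often each letter occurs in a sequence. The following minimal sketch shows one way to compute it; the function name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative assumptions, and the letter order is copied from the rows of the table above.

```python
from collections import Counter

# Letter order taken from the rows of the Parikh table above.
AMINO_ACIDS = "DCHWETYNGLKMPARQIFSV"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return the number of occurrences of each alphabet letter in `sequence`."""
    counts = Counter(sequence)
    # Counter returns 0 for letters that never occur.
    return [counts[letter] for letter in alphabet]

# First 24 residues of the 6YFH_1 sequence, used only as a small illustration.
prefix = "PAMTNIVLRDDQTSVATKTLIPIV"
vector = parikh_vector(prefix)
print(dict(zip(AMINO_ACIDS, vector)))
```

The entries of a Parikh vector always sum to the sequence length, which gives a quick sanity check on any column of the table.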

>6YFH_1|Chains AA[auth AB], AB[auth AC], AC[auth AD], A[auth AA], BA[auth AF], BB[auth AG], BC[auth AH], B[auth AE], CA[auth AJ], CB[auth AK], CC[auth AL], C[auth AI], DA[auth AN], DB[auth AO], DC[auth AP], D[auth AM], EA[auth AR], EB[auth AS], EC[auth AT], E[auth AQ], FA[auth AV], FB[auth AW], FC[auth AX], F[auth AU], GA[auth AZ], GB[auth BA], GC[auth BB], G[auth AY], HA[auth BD], HB[auth BE], HC[auth BF], H[auth BC], IA[auth BH], IB[auth BI], IC[auth BJ], I[auth BG], JA[auth BL], JB[auth BM], JC[auth BN], J[auth BK], KA[auth BP], KB[auth BQ], KC[auth BR], K[auth BO], LA[auth BT], LB[auth BU], LC[auth BV], L[auth BS], MA[auth BX], MB[auth BY], M[auth BW], NA[auth CA], NB[auth CB], N[auth BZ], OA[auth CD], OB[auth CE], O[auth CC], PA[auth CG], PB[auth CH], P[auth CF], QA[auth CJ], QB[auth CK], Q[auth CI], RA[auth CM], RB[auth CN], R[auth CL], SA[auth CP], SB[auth CQ], S[auth CO], TA[auth CS], TB[auth CT], T[auth CR], UA[auth CV], UB[auth CW], U[auth CU], VA[auth CY], VB[auth CZ], V[auth CX], WA[auth DB], WB[auth DC], W[auth DA], XA[auth DE], XB[auth DF], X[auth DD], YA[auth DH], YB[auth DI], Y[auth DG], ZA[auth DK], ZB[auth DL], Z[auth DJ]|coat protein|Leviviridae sp. (2027243)
>6LKG_1|Chains A[auth B], D|Sensor protein kinase HptS|Staphylococcus aureus (strain NCTC 8325) (93061)
>2VVS_1|Chain A|O-GLCNACASE BT_4395|BACTEROIDES THETAIOTAOMICRON VPI-5482 (226186)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6YFH , Knot 75 156 0.81 38 118 153
PAMTNIVLRDDQTSVATKTLIPIVSDGNMSVWRENAANVPIDGQIKLTGQWERMKDGTYRLNAKLEVPVMETAGAGGAYVAPPKVAYKVTASLTLYAPSRSTIADRANAMKMLSAVLCGADATAGTTLSPQSVTGDAWKNSALPFVFGFINQAFPT
6LKG , Knot 83 172 0.82 36 130 168
STIHQHVDESQSSLHHTEKQIQTFSTQHNNSFQELDLTNHHDVTATKRELLKLIHQQPATLYYELSGPNQFITNNYEHLNTKNMYLFSTHQLKFKNSTYMLKIYMANTPRLSEIKKDNRQFALIVDQYDNILYANDDRFTIGEKYRPQQFGFMNESVKLNHADHRLIIYKDI
2VVS , Knot 288 737 0.86 40 304 683
MKNNKIYLLGACLLCAVTTFAQNVSLQPPPQQLIVQNKTIDLPAVYQLNGGEEANPHAVKVLKELLSGKQSSKKGMLISIGEKGDKSVRKYSRQIPDHKEGYYLSVNEKEIVLAGNDERGTYYALQTFAQLLKDGKLPEVEIKDYPSVRYRGVVEGFYGTPWSHQARLSQLKFYGKNKMNTYIYGPKDDPYHSAPNWRLPYPDKEAAQLQELVAVANENEVDFVWAIHPGQDIKWNKEDRDLLLAKFEKMYQLGVRSFAVFFDDISGEGTNPQKQAELLNYIDEKFAQVKPDINQLVMCPTEYNKSWSNPNGNYLTTLGDKLNPSIQIMWTGDRVISDITRDGISWINERIKRPAYIWWNFPVSDYVRDHLLLGPVYGNDTTIAKEMSGFVTNPMEHAESSKIAIYSVASYAWNPAKYDTWQTWKDAIRTILPSAAEELECFAMHNSDLGPNGHGYRREESMDIQPAAERFLKAFKEGKNYDKADFETLQYTFERMKESADILLMNTENKPLIVEITPWVHQFKLTAEMGEEVLKMVEGRNESYFLRKYNHVKALQQQMFYIDQTSNQNPYQPGVKTATRVIKPLIDRTFATVVKFFNQKFNAHLDATTDYMPHKMISNVEQIKNLPLQVKANRVLISPANEVVKWAAGNSVEIELDAIYPGENIQINFGKDAPCTWGRLEISTDGKEWKTVDLKQKESRLSAGLQKAPVKFVRFTNVSDEEQQVYLRQFVLTIEKK
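
The page does not state which Lempel-Ziv variant produces the \(\mathrm{LZ}(w)\) column. As a sketch under that caveat, the code below implements the LZ76 exhaustive-history parsing and the normalization \(\mathrm{LZ}(w)/(n/\log_{20} n)\) from the table header. The names `lz76_phrases` and `normalized_lz` are illustrative, and the phrase counts this parser returns for the sequences above may differ from the tabulated values if the page uses a different variant.

```python
import math

def lz76_phrases(w):
    """Split `w` into LZ76 phrases (exhaustive-history parsing)."""
    phrases = []
    i, n = 0, len(w)
    while i < n:
        length = 1
        # Extend the candidate phrase while it can be copied from an
        # occurrence that starts strictly before position i (overlap allowed).
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        # Slicing truncates at the end of w for a final, incomplete phrase.
        phrases.append(w[i:i + length])
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """Return lz / (n / log_b n) with b = alphabet_size."""
    return lz / (n / math.log(n, alphabet_size))

# Small binary illustration of the parsing.
example = lz76_phrases("0001101001000101")
print(example, len(example))

# Normalization applied to the tabulated values for 6YFH (LZ = 75, n = 156).
print(round(normalized_lz(75, 156), 2))
```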

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
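
To make the definition concrete, here is a minimal sketch that collects the distinct length-\(k\) subwords of a word with a sliding window and counts them. The function names `subwords` and `subword_complexity` are illustrative assumptions, not names used by the page.

```python
def subwords(w, k):
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Return the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

# Toy example: "ABAB" has the 2-subwords AB and BA only.
print(sorted(subwords("ABAB", 2)))
print([subword_complexity("ABAB", k) for k in (1, 2, 3)])
```

A word of length \(n\) has at most \(n-k+1\) distinct subwords of length \(k\), so the count saturates quickly as \(k\) grows.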

\(|P_{f(6YFH_1)}(2) \setminus P_{f(6LKG_1)}(2)|=73\), \(|P_{f(6LKG_1)}(2) \setminus P_{f(6YFH_1)}(2)|=85\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
111001110000001100011111001010110001101110101010101001001000101010111100111111011110110010101010110000110010110110111011010110010100101011000111111111001110
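
The page does not say which residues count as hydrophobic. The sketch below assumes the set A, F, G, I, L, M, P, V, W, which was inferred by comparing the start of the 6YFH_1 sequence with the start of the binary string above; the treatment of any residue outside that comparison (for example H, which does not occur in 6YFH_1) is a guess. The names `HYDROPHOBIC` and `hp_encode` are illustrative.

```python
# Assumed hydrophobic set, inferred from the example string; not stated by the page.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence, hydrophobic=HYDROPHOBIC):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in hydrophobic else "0" for residue in sequence)

# First 24 residues of 6YFH_1.
prefix = "PAMTNIVLRDDQTSVATKTLIPIV"
print(hp_encode(prefix))
```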
Pair \(Z_2\) Length of longest common subword
6YFH_1,6LKG_1 158 3
6YFH_1,2VVS_1 212 4
6LKG_1,2VVS_1 222 4
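
The quantity \(Z_k(x,y)\) is the size of the symmetric difference of the two subword sets; for the first pair, \(73+85=158\) matches the table. A minimal sketch, with the illustrative names `subwords` and `z_distance`:

```python
def subwords(w, k):
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Return |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    # The symmetric difference holds the subwords found in exactly one word.
    return len(px ^ py)

# Toy example: only "BB" separates the two sets of 2-subwords.
print(z_distance("ABAB", "ABBA", 2))
```

Dividing `z_distance(x, y, k)` by the size of the union of the two sets would give the corresponding Jaccard distance.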

Newick tree

 
(2VVS_1:11.72,(6YFH_1:79,6LKG_1:79):37.72);

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{328}{\log_{20} 328}-\frac{156}{\log_{20} 156}\right)=52.4\).
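
The arithmetic in the expected-distance estimate can be checked directly. In this sketch the factors 0.85 and 0.8 are taken from the formula as given, and reading 328 as the combined length \(156+172\) of the two query sequences is an assumption; the helper name `length_scale` is illustrative.

```python
import math

def length_scale(n, base=20):
    """Return n / log_base(n), the normalizer used in the LZ table."""
    return n / math.log(n, base)

n1, n2 = 156, 172  # lengths of 6YFH_1 and 6LKG_1
# Assumption: 328 in the formula is n1 + n2.
expected = 0.85 * 0.8 * (length_scale(n1 + n2) - length_scale(n1))
print(round(expected, 1))
```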
Status Protein1 Protein2 d d1/2
Query variables 6YFH_1 6LKG_1 65 62

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
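
The interpolation can be sketched as a one-line function. Here `a` and `b` stand for the two quantities whose minimum and maximum are combined; the source does not name them, so they are placeholders. The inputs 59 and 65 are hypothetical, chosen because they are the pair consistent with the tabulated \(d=65\) and \(d_1/2=62\) under this formula.

```python
def delta(a, b, alpha):
    """Return alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Hypothetical inputs: alpha = 0 gives the maximum, alpha = 1/2 the average.
print(delta(59, 65, 0))
print(delta(59, 65, 0.5))
```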