CoV2D BrowserTM

Parikh vectors
7TGW_1 1BNE_1 8WIS_1 Letter Amino acid
10 0 6 M Methionine
75 4 21 F Phenylalanine
64 3 16 P Proline
76 7 31 A Alanine
83 6 29 N Asparagine
80 10 51 G Glycine
49 3 33 E Glutamic acid
23 2 11 H Histidine
74 8 37 I Isoleucine
100 7 30 L Leucine
92 9 43 T Threonine
46 6 19 R Arginine
30 2 12 C Cysteine
60 4 15 Q Glutamine
90 4 33 V Valine
61 9 22 D Aspartic acid
91 8 39 S Serine
63 8 32 K Lysine
10 3 6 W Tryptophan
54 7 17 Y Tyrosine
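A Parikh vector simply counts how often each letter occurs in a sequence. The following is a minimal sketch of that count, not the page's own implementation. The names `AMINO_ACIDS` and `parikh_vector` are hypothetical, and the letter order is alphabetical rather than the row order of the table above.

```python
from collections import Counter

# The 20 standard one-letter amino acid codes (alphabetical order is an
# arbitrary choice here; the table above lists them in a different order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parikh_vector(sequence: str) -> dict[str, int]:
    """Return the number of occurrences of each amino acid letter."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}


if __name__ == "__main__":
    # First ten residues of the 1BNE_1 sequence, used only as a toy input.
    fragment = "AQVINTFDGV"
    vector = parikh_vector(fragment)
    print({letter: n for letter, n in vector.items() if n > 0})
```

The entries of a Parikh vector sum to the length of the sequence, which is a convenient sanity check against the Length column further down.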

>7TGW_1|Chains A, B, C|Spike glycoprotein|Severe acute respiratory syndrome coronavirus 2 (2697049)
>1BNE_1|Chains A, B, C|BARNASE|Bacillus amyloliquefaciens (1390)
>8WIS_1|Chains A, B, C|Hemagglutinin|Wuhan asiatic toad influenza virus (2116482)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7TGW , Knot 441 1231 0.85 40 330 1083
QCVNLTTRTQLPPAYTNSFTRGVYYPDKVFRSSVLHSTQDLFLPFFSNVTWFHVISGTNGTKRFDNPVLPFNDGVYFASIEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKVCEFQFCNDPFLDHKNNKSWMESEFRVYSSANNCTFEYVSQPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPIIVREPEDLPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWTAGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKCTLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFDEVFNATRFASVYAWNRKRISNCVADYSVLYNLAPFFTFKCYGVSPTKLNDLCFTNVYADSFVIRGDEVRQIAPGQTGNIADYNYKLPDDFTGCVIAWNSNKLDSKVSGNYNYLYRLFRKSNLKPFERDISTEIYQAGNKPCNGVAGFNCYFPLRSYGFRPTYGVGHQPYRVVVLSFELLHAPATVCGPKKSTNLVKNKCVNFNFNGLKGTGVLTESNKKFLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTNTSNQVAVLYQGVNCTEVPVAIHADQLTPTWRVYSTGSNVFQTRAGCLIGAEYVNNSYECDIPIGAGICASYQTQTKSHRRARSVASQSIIAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPVSMTKTSVDCTMYICGDSTECSNLLLQYGSFCTQLKRALTGIAVEQDKNTQEVFAQVKQIYKTPPIKYFGGFNFSQILPDPSKPSKRSPIEDLLFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFKGLTVLPPLLTDEMIAQYTSALLAGTITSGWTFGAGPALQIPFPMQMAYRFNGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTPSALGKLQDVVNHNAQALNTLVKQLSSKFGAISSVLNDIFSRLDPPEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTTAPAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNTFVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYEQYIKGSGYIPEAPRDGQAYVRKDGEWVLLSTFLGHHHHHH
1BNE , Knot 59 110 0.84 38 89 105
AQVINTFDGVADYLQTYHKLPDNYITKSEAQALGWVASKGNLCDVAPGKSIGGDIFSNREGKLPGKSGRTWREADINYTCGFRNSDRILYSSDWLIYKTTDHYQTFTKIR
8WIS , Knot 207 503 0.85 40 247 477
NQICIGKAIKPINGTVETVSRMAKVTGMKKVGGERMQKICAKGEQIHDSSSACGIVSHHLKQEGCDFPFLLNKPKFATTGPMNTSTTGFNFYLTEKAKSWMNITWRVLGENKDFGDNLVEKYGESGATSEGATLKNYYWYVPTAKPGPVVYEKLAECTGTIYYGALLSDAEAGYIAVTGRNVTERWDVRFTGSSESSISFSGPKQSPMEEYIIKSVRSSVDTVRNIIILDSGRVKKGETFSISLSSGAVVIPTIFCDGDFAVTPQVQIDKDCASDCHSAYGSFPNGSSFIIHHSVHTVGSCPPSILRNFDVIDGYEATWEETKQSRGFFGAILGFFTGGIQGAIDGWYGVTNHDTGKGTAADQTSTQKAVEAITNKLNEAIENGNQRYNQLYGLARTQAELLGNLGKEVNDLRLETFTEFIRLETILVNTRIIEEHQAIGSKKKEEVKRLLGPNALDLGNGCFNLTHTCDSNCVNSISRGTYTRENYIHNVTLAGTPKIDGVV
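The normalized column divides the LZ-complexity by \(n/\log_{20} n\). The sketch below counts phrases in a Lempel-Ziv (1976) exhaustive parsing and applies that normalization. Whether the page uses exactly this parsing variant is an assumption, and the function names are hypothetical.

```python
import math


def lz76_complexity(w: str) -> int:
    """Number of phrases in a Lempel-Ziv (1976) exhaustive parsing of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the current phrase while it also occurs starting at some
        # earlier position (the earlier occurrence may overlap the phrase).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def normalized_ratio(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_a(n), with a = 20 for amino acid sequences."""
    return lz / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    # Tabulated (LZ-complexity, length) pairs from the rows above.
    for code, lz, n in [("7TGW", 441, 1231), ("1BNE", 59, 110), ("8WIS", 207, 503)]:
        print(code, round(normalized_ratio(lz, n), 2))
```

Applied to the tabulated pairs (441, 1231), (59, 110) and (207, 503), `normalized_ratio` rounds to 0.85, 0.84 and 0.85, in agreement with the table.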

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
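As an illustration of these definitions, here is a minimal sketch that collects the distinct contiguous subwords of a fixed length and counts them. The function names are hypothetical, and the sketch follows the definition as stated rather than the page's own code.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w: str, k: int) -> int:
    """Cardinality of the set of distinct length-k subwords of w."""
    return len(subwords(w, k))


if __name__ == "__main__":
    toy = "ABAB"
    print(sorted(subwords(toy, 2)), subword_complexity(toy, 2))
```

A word of length \(n\) has at most \(n-k+1\) distinct subwords of length \(k\), so for 1BNE (length 110) the tabulated values 89 and 105 are bounded by 109 and 108.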

\(|P_{f(7TGW_1)}(2) \setminus P_{f(1BNE_1)}(2)|=246\), \(|P_{f(1BNE_1)}(2) \setminus P_{f(7TGW_1)}(2)|=5\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
0010100000111100001001100100110001100000111111001011011010010001001111100110110100001101111100100000011110010011101001010001110000000110001010001000010010011110101001010010011100101010100000111100100110110110111011111010010011110000101100001101111100110101001110000010100110011011000000100101001100000101010001101101001011001101001101011000010001100011001111101000110100100101001010011101001001111001011000001100101011110000100010100001001100001011000100010011001001111100011100011010011100100111101011011101011000001100001010101101011100000011110011001100001100100101101010011110110110000001111001100001111101001010101000100110001101111001000000011111110100000000000100110001110010111000110000011110010101000111101000010001010100000001110010100010011011110000000011101001000111001111010011101001000011001110010110111100010011011100110100101101111110001110000111110100110111111101111101100101111000110000011100100111010001000101110100110001011001100100011110011001100101101010100110101001000100011011010101011100100011100001010101001101100110111110100111000010011110001010110011110010011100001001011000001101000111111000100110101001000100010000010101101011010110100010010011001000110100110000010101011011001010100010111100111000000
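The hydrophobic-polar string can be reproduced by a letter-by-letter mapping. The page does not state which residues it treats as hydrophobic. The set used below (A, F, G, I, L, M, P, V, W) is an assumption inferred by aligning the opening and closing residues of Sequence 1 with the binary string above. The names `HYDROPHOBIC` and `hp_encode` are hypothetical.

```python
# Assumed hydrophobic residues, inferred from the string above rather than
# taken from any documentation; every other residue is treated as polar.
HYDROPHOBIC = set("AFGILMPVW")


def hp_encode(sequence: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)


if __name__ == "__main__":
    # First ten and last sixteen residues of the 7TGW_1 sequence.
    print(hp_encode("QCVNLTTRTQ"))
    print(hp_encode("EWVLLSTFLGHHHHHH"))
```

Under this assumed classification, cysteine and tyrosine count as polar. Other hydrophobicity scales classify them differently, so the set would need adjusting for other conventions.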
Pair \(Z_2\) Length of longest common subword (contiguous)
7TGW_1,1BNE_1 251 4
7TGW_1,8WIS_1 117 4
1BNE_1,8WIS_1 208 3
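The first row is consistent with the counts given earlier, since \(246+5=251\). Both columns can be sketched as follows. This assumes the last column counts contiguous matches: the values 3 and 4 are too small for gapped subsequences (glycine alone occurs 10 times in 1BNE_1 and 51 times in 8WIS_1). The function names are hypothetical.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x: str, y: str, k: int) -> int:
    """Size of the symmetric difference of the length-k subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def longest_common_substring_length(x: str, y: str) -> int:
    """Length of the longest contiguous block shared by x and y.

    Dynamic programming over match lengths ending at each pair of
    positions; O(len(x) * len(y)) time, O(len(y)) memory.
    """
    best = 0
    previous = [0] * (len(y) + 1)
    for a in x:
        current = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                current[j] = previous[j - 1] + 1
                best = max(best, current[j])
        previous = current
    return best


if __name__ == "__main__":
    print(z_distance("ABAB", "ABBA", 2))
    print(longest_common_substring_length("ABCDE", "XBCDY"))
```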

Newick tree

 
[
	1BNE_1:12.72,
	[
		7TGW_1:58.5,8WIS_1:58.5
	]:70.22
]
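The tree above is shown with square brackets, whereas standard Newick notation uses parentheses and a terminating semicolon. The following sketch serializes the same nested structure to a standard Newick string. The tuple layout and the names `format_node` and `to_newick` are illustrative assumptions, not the page's data model.

```python
def format_node(node) -> str:
    """Serialize (name, length) leaves and (children, length) internal nodes."""
    label_or_children, length = node
    if isinstance(label_or_children, str):
        return f"{label_or_children}:{length}"
    inner = ",".join(format_node(child) for child in label_or_children)
    return f"({inner}):{length}"


def to_newick(root_children) -> str:
    """Serialize the children of an unlabelled root into a Newick string."""
    return "(" + ",".join(format_node(child) for child in root_children) + ");"


if __name__ == "__main__":
    # Branch lengths copied from the tree shown above.
    tree = [
        ("1BNE_1", 12.72),
        ([("7TGW_1", 58.5), ("8WIS_1", 58.5)], 70.22),
    ]
    print(to_newick(tree))
```

The branch length 58.5 shared by 7TGW_1 and 8WIS_1 is half of their \(Z_2\) value, 117.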

Let d be the Otu–Sayood distance d.
Let d1 be the Otu–Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{1341 }{\log_{20} 1341}-\frac{110}{\log_{20}110})\approx 331\), where \(1341=1231+110\) is the combined length of the two sequences.
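The arithmetic in this estimate can be checked directly. The sketch below only evaluates the expression as written. The parameter names `ratio` and `factor` for the constants 0.85 and 0.8 are hypothetical labels, not terms used by the page.

```python
import math


def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """The quantity n / log_a(n) used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)


def expected_distance(n_long: int, n_short: int,
                      ratio: float = 0.85, factor: float = 0.8) -> float:
    """Evaluate ratio * factor * (n_long/log n_long - n_short/log n_short)."""
    return ratio * factor * (lz_scale(n_long) - lz_scale(n_short))


if __name__ == "__main__":
    # 1341 and 110 are the lengths appearing in the expression above.
    print(expected_distance(1341, 110))
```

The result lies between 331 and 332, matching the stated figure once the fractional part is dropped.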
Status Protein1 Protein2 d d1/2
Query variables 7TGW_1 1BNE_1 421 227
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]
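To make the interpolation concrete, the sketch below assumes the usual unnormalized Otu–Sayood definitions: \(d=\max\{c(SQ)-c(S),\,c(QS)-c(Q)\}\) and \(d_1=(c(SQ)-c(S))+(c(QS)-c(Q))\), where \(c\) is an LZ-complexity. These definitions are not stated on this page, so treat them, the LZ76 parsing variant, and all function names as assumptions.

```python
def lz76_complexity(w: str) -> int:
    """Number of phrases in a Lempel-Ziv (1976) exhaustive parsing of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it also occurs starting at an earlier position.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def complexity_gains(s: str, q: str) -> tuple[int, int]:
    """Extra phrases needed for q after s, and for s after q."""
    gain_sq = lz76_complexity(s + q) - lz76_complexity(s)
    gain_qs = lz76_complexity(q + s) - lz76_complexity(q)
    return gain_sq, gain_qs


def delta(s: str, q: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two complexity gains."""
    a, b = complexity_gains(s, q)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


def otu_sayood(s: str, q: str) -> tuple[int, int]:
    """Return (d, d1) under the assumed definitions (max and sum of gains)."""
    a, b = complexity_gains(s, q)
    return max(a, b), a + b


if __name__ == "__main__":
    s, q = "ABABABABCD", "CDCDAB"  # toy inputs, not real proteins
    d, d1 = otu_sayood(s, q)
    print(d, d1 / 2, delta(s, q, 0), delta(s, q, 0.5))
```

With \(\alpha=0\) the interpolation returns the larger gain, which is \(d\). With \(\alpha=1/2\) it returns the mean of the two gains, which is \(d_1/2\).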