CoV2D Browser™

Parikh vectors
2YPF_1 1BIG_1 3OPF_1 Letter Amino acid
2 1 4 W Tryptophan
2 1 13 Y Tyrosine
44 2 9 T Threonine
19 1 52 R Arginine
17 6 1 C Cysteine
85 3 8 Q Glutamine
44 2 62 P Proline
1 2 20 F Phenylalanine
26 2 16 S Serine
13 2 4 N Asparagine
39 0 36 E Glutamic acid
37 0 6 H Histidine
1 2 2 M Methionine
105 0 80 L Leucine
24 6 8 K Lysine
84 2 39 V Valine
108 0 50 A Alanine
11 1 16 D Aspartic acid
66 4 63 G Glycine
30 0 5 I Isoleucine
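
As a minimal sketch, a column of the table above can be recomputed by counting letters in the corresponding sequence. The names `ALPHABET` and `parikh_vector` are illustrative and not part of the CoV2D site; the letter order is simply the row order of the table, and the sequence is the 1BIG_1 sequence listed further down the page.

```python
from collections import Counter

# One-letter codes in the row order of the table above (illustrative constant).
ALPHABET = "WYTRCQPFSNEHMLKVADGI"


def parikh_vector(w: str) -> list[int]:
    """Return the number of occurrences of each letter of ALPHABET in w."""
    counts = Counter(w)
    return [counts[a] for a in ALPHABET]


seq_1big = "QFTDVKCTGSKQCWPVCKQMFGKPNGKCMNGKCRCYS"

# Print one "count letter" pair per line, as in the 1BIG_1 column.
for letter, count in zip(ALPHABET, parikh_vector(seq_1big)):
    print(count, letter)
```

The counts sum to the sequence length, 37, which gives a quick consistency check against the Length column.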

>2YPF_1|Chain A|AVRBS3|XANTHOMONAS CAMPESTRIS (339)
>1BIG_1|Chain A|TOXIN BMTX1|Mesobuthus martensii (34649)
>3OPF_1|Chains A, B, C|Putative uncharacterized protein TTHA0988|Thermus thermophilus (300852)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
2YPF , Knot 100 758 0.29 40 128 196
VDLRTLAYSQQQQEKIKPKVRSTVAQHHEALVAHAFTHAHIVALSQHPAALATVAVKYQDMIAALPEATHEAIVAVAKQWSGARALEALLTVAGELRGPPLQLDTGQLLKIAKRGGVTAVEAVHAWRNALTGAPLNLTPEQVVAIASHDGGKQALETVQRLLPVLCQAHGLTPQQVVAIASNGGGKQALETVQRLLPVLCQAHGLTPEQVVAIASNIGGKQALETVQALLPVLCQAHGLTPQQVVAIASNGGGKQALETVQRLLPVLCQAHGLTPEQVVAIASNIGGKQALETVQALLPVLCQAHGLTPEQVVAIASNIGGKQALETVQALLPVLCQAHGLTPEQVVAIASNIGGKQALETVQALLPVLCQAHGLTPEQVVAIASHDGGKQALETVQRLLPVLCQAHGLTPEQVVAIASHDGGKQALETVQRLLPVLCQAHGLTPQQVVAIASNGGGKQALETVQRLLPVLCQAHGLTPEQVVAIASNIGGKQALETVQALLPVLCQAHGLTPEQVVAIASNIGGKQALETVQALLPVLCQAHGLTPEQVVAIASHDGGKQALETVQRLLPVLCQAHGLTPEQVVAIASHDGGKQALETVQRLLPVLCQAHGLTPEQVVAIASHDGGKQALETVQRLLPVLCQAHGLTPQQVVAIASNGGGKQALETVQRLLPVLCQAAGLTPEQVVAIASHDGGKQALETVQRLLPVLCQAHGLTPQQVVAIASNGGGRPALESIVAQLSRPDPGSSAALEHHHHHH
1BIG , Knot 22 37 0.71 30 30 33
QFTDVKCTGSKQCWPVCKQMFGKPNGKCMNGKCRCYS
3OPF , Knot 182 494 0.76 40 164 410
MVRGFYLRFGEGVSEEANRRALALAEALLRAPPPGLLDAVPAYGVLYLEYDPRRLSRGRLLRLLKGLPQERAEEGRVVEIPVRYDGEDLPEVASRLGLSLEAVKALHQKPLYRVYALGFTPGFPFLAEVEPALRLPRKPHPRPRVPAHAVAVAGVQTGIYPLPSPGGWNLLGTSLVAVYDPHRETPFLLRPGDRVRFLEAEGPTPPEPRPLELLPEEPRLPALLVEEPGLMDLVVDGGRFLGGHLGLARSGPLDAPSARLANRLVGNGAGAPLLEFAYKGPVLTALRDLVAAFAGYGFVALLEGEEIPPGQSFLWPRGKTLRFRPRGPGVRGYLAVAGGLEVRPFLGSASPDLRGRIGRPLWAGDVLGLEALRPVRPGRAFPQRPLPEAFRLRLLPGPQFAGEAFRALCSGPFRVARADRVGVELLGPEVPGGEGLSEPTPLGGVQVPPSGRPLVLLADKGSLGGYAKPALVDPRDLWLLGQARPGVEIHFTSG
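
The LZ-complexity column can be illustrated with a short sketch of the Lempel-Ziv (1976) exhaustive-history parse. This assumes the page counts a trailing incomplete component as one component; under that assumption the function below gives 22 for the 1BIG sequence, matching the table, but the site's actual implementation may differ. The function names are illustrative.

```python
import math


def lz76_complexity(w: str) -> int:
    """Number of components in the LZ76 exhaustive-history parse of w."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the candidate while w[i:i+length] also occurs starting at an
        # earlier position (overlap with the current component is allowed).
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        count += 1      # w[i:i+length] is the next component (possibly cut off at the end)
        i += length
    return count


def normalized_lz(w: str, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_alphabet_size(n), as in the table header."""
    n = len(w)
    return lz76_complexity(w) / (n / math.log(n, alphabet_size))


seq_1big = "QFTDVKCTGSKQCWPVCKQMFGKPNGKCMNGKCRCYS"
print(lz76_complexity(seq_1big), normalized_lz(seq_1big))
```

The same parse applied to the classic binary example `0001101001000101` yields the six components 0, 001, 10, 100, 1000, 101.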

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).
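
A minimal sketch of these definitions, reading \(P_w(n)\) as the set of distinct length-\(n\) subwords. The helper names `subwords` and `subword_complexity` are illustrative, not taken from the site.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w: str, k: int) -> int:
    """p_w(k): the number of distinct length-k subwords of w."""
    return len(subwords(w, k))


seq_1big = "QFTDVKCTGSKQCWPVCKQMFGKPNGKCMNGKCRCYS"
print(subword_complexity(seq_1big, 2), subword_complexity(seq_1big, 3))
```

For the 1BIG sequence this gives 30 distinct subwords of length 2 and 33 of length 3, in agreement with the \(p_w(2)\) and \(p_w(3)\) entries in the table.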

\(|P_{f(2YPF_1)}(2) \setminus P_{f(1BIG_1)}(2)|=118\) and \(|P_{f(1BIG_1)}(2) \setminus P_{f(2YPF_1)}(2)|=20\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\), so that \(Z_2(f(2YPF_1),f(1BIG_1))=118+20=138\).

Hydrophobic-polar version of Sequence 1:
10100110000000010101000110000111101100101111000111110111000011111101000111111001011011011101110101111010010110110011101101101100110111101010011111000110011001001111100101101001111100111001100100111110010110100111110011100110010111111001011010011111001110011001001111100101101001111100111001100101111110010110100111110011100110010111111001011010011111001110011001011111100101101001111100011001100100111110010110100111110001100110010011111001011010011111001110011001001111100101101001111100111001100101111110010110100111110011100110010111111001011010011111000110011001001111100101101001111100011001100100111110010110100111110001100110010011111001011010011111001110011001001111100111101001111100011001100100111110010110100111110011101110011101001011001110000000
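
The hydrophobic-polar string can be sketched as a letter-by-letter recoding. The page does not state which residues it treats as hydrophobic; the set below is an assumption inferred by aligning the binary string with Sequence 1 (for example, V, L, A, I, P, M, F, W and G map to 1, while C and Y map to 0), and the names `HYDROPHOBIC` and `hp_encode` are illustrative.

```python
# Assumed hydrophobic residues, inferred from the binary string above;
# every other amino acid is treated as polar.
HYDROPHOBIC = set("AFGILMPVW")


def hp_encode(w: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in w)


# First 20 residues of the 2YPF sequence.
prefix_2ypf = "VDLRTLAYSQQQQEKIKPKV"
print(hp_encode(prefix_2ypf))
```

Applied to the first 20 residues of Sequence 1, this reproduces the first 20 symbols of the string, 10100110000000010101.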
Pair \(Z_2\) Length of longest common subsequence
2YPF_1,1BIG_1 138 2
2YPF_1,3OPF_1 138 4
1BIG_1,3OPF_1 170 2
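
The \(Z_k\) column can be sketched directly from its definition as the size of the symmetric difference of two subword sets. The function names and the toy words are illustrative; the full protein sequences are omitted here for brevity.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


# Toy example: P_x(2) = {AB, BA}, P_y(2) = {BA, AB, BB}, so only BB is unshared.
print(z_distance("ABAB", "BABB", 2))
```

Because the two set differences are disjoint, \(Z_k\) equals \(p_x(k)+p_y(k)\) minus twice the number of shared subwords; for the first pair this reads \(128+30-2\cdot 10=138\).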

Newick tree

 
[
	3OPF_1:80.02,
	[
		2YPF_1:69,1BIG_1:69
	]:11.02
]

Let d be the Otu–Sayood distance d.
Let d1 be the Otu–Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{795 }{\log_{20} 795}-\frac{37}{\log_{20}37})=221.\)
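
The arithmetic in this estimate can be checked with a few lines. The constants 0.85 and 0.8 are taken from the formula as given, without further interpretation; note that 795 = 758 + 37 is the sum of the two sequence lengths. The helper name `lz_scale` is illustrative.

```python
import math


def lz_scale(n: int, base: int = 20) -> float:
    """n / log_base(n), the normalizing quantity used in the formula."""
    return n / math.log(n, base)


expected = 0.85 * 0.8 * (lz_scale(795) - lz_scale(37))
print(expected)
```

The result is a little above 221, so the value shown on the page appears to be truncated rather than rounded.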
Status Protein1 Protein2 d d1/2
Query variables 2YPF_1 1BIG_1 89 51.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]
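
A small sketch of this interpolation, assuming as in the display that d is the larger of two directional quantities and d1 is their sum. The table only reports d = 89 and d1/2 = 51.5 for the query pair; the smaller directional value 14 is derived from those two numbers under that assumption and is not stated on the page. The function name `delta` is illustrative.

```python
def delta(alpha: float, a: float, b: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


# Derived values for the pair 2YPF_1, 1BIG_1:
# max = d = 89 and min + max = d1 = 2 * 51.5 = 103, hence min = 14.
smaller, larger = 14, 89

print(delta(0, smaller, larger))    # the case alpha = 0, i.e. d
print(delta(0.5, smaller, larger))  # the case alpha = 1/2, i.e. d1/2
```

Intermediate values of \(\alpha\) interpolate linearly between the two distances.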