CoV2D BrowserTM

Parikh vectors
1IPJ_1 1TBY_1 6XHZ_1 Letter Amino acid
22 14 26 A Alanine
33 10 6 N Asparagine
37 3 7 E Glutamic acid
8 1 10 H Histidine
0 5 1 W Tryptophan
33 6 3 Q Glutamine
44 9 14 L Leucine
26 2 4 F Phenylalanine
31 6 14 S Serine
21 5 1 K Lysine
0 2 2 M Methionine
12 5 9 Y Tyrosine
21 8 6 D Aspartic acid
0 8 6 C Cysteine
19 11 37 G Glycine
26 5 3 I Isoleucine
29 14 13 R Arginine
21 2 11 P Proline
10 5 19 T Threonine
23 9 24 V Valine
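A Parikh vector records how many times each letter occurs in a word, so a table like the one above can be reproduced by counting letters in each FASTA sequence. The following is a minimal sketch; the function name `parikh_vector` and the alphabet ordering are assumptions made for illustration and are not taken from the CoV2D code.

```python
from collections import Counter

# The 20 standard one-letter amino acid codes (ordering is arbitrary here).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(w, alphabet=AMINO_ACIDS):
    """Return a dict mapping each letter of the alphabet to its count in w."""
    counts = Counter(w)
    return {letter: counts.get(letter, 0) for letter in alphabet}

# Toy example: the all-alanine word of length 6.
v = parikh_vector("AAAAAA")
print(v["A"], v["W"])
```

The entries of a Parikh vector sum to the length of the word, which gives a quick sanity check against the Length column reported for each sequence.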

>1IPJ_1|Chains A, B, C|BETA-CONGLYCININ, BETA CHAIN|Glycine max (3847)
>1TBY_1|Chain A|HUMAN LYSOZYME|Homo sapiens (9606)
>6XHZ_1|Chain A|N4: hypothetical protein|Streptomyces monomycini (371720)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
1IPJ , Knot 171 416 0.82 34 204 386
LKVREDENNPFYLRSSNSFQTLFENQNGRIRLLQRFNKRSPQLENLRDYRIVQFQSKPNTILLPHHADADFLLFVLSGRAILTLVNNDDRDSYNLHPGDAQRIPAGTTYYLVNPHDHQNLKIIKLAIPVNKPGRYDDFFLSSTQAQQSYLQGFSHNILETSFHSEFEEINRVLLGEEEEQRQQEGVIVELSKEQIRQLSRRAKSSSRKTISSEDEPFNLRSRNPIYSNNFGKFFEITPEKNPQLRDLDIFLSSVDINEGALLLPHFNSKAIVILVINEGDANIELVGIKEQQQKQKQEEEPLEVQRYRAELSEDDVFVIPAAYPFVVNATSNLNFLAFGINAENNQRNFLAGEKDNVVRQIERQVQELAFPGSAQDVERLLKKQRESYFVDAQPQQKEEGSKGRKGPFPSILGALY
1TBY , Knot 68 130 0.84 40 108 128
KVFERCELARTLKRLGMDGYRGISLANWMCLAKWESGYNTRATNYNAGDRSTDYGIFQINSRLWCNDGKTPGAVNACHLSCSALLQDNIADAVACAKRVVRDPQGIRAWVAWRNRCQNRDVRQYVQGCGV
6XHZ , Knot 90 216 0.74 40 122 194
MAPTAVRGGNVLFSASGRCTVGFNATKGGTYYAIMEGRCVGGARDWYADAARTVHVGVTEAVRYPGDDYAVIRYTNTAVSYPGEIDLGGGRYLDVTGAARPVVGQSVCLPGATTGRHCGRVEAVNVSVNHPEGTVSGLVRTSACTEPGTAAGRPAVSGSTAVGLALGGGGNCASGGTTYLQPVLPALAAFGLTLHGSSLEVLFQGPGGSSHHHHHH
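The page does not say which Lempel-Ziv variant is behind the \(\mathrm{LZ}(w)\) column. As a hedged sketch, the code below assumes the LZ76 exhaustive-history parsing (phrase counting) and computes the normalized ratio \(\mathrm{LZ}(w)/(n/\log_{20} n)\) from the table header. The names `lz76_complexity` and `normalized_lz` are hypothetical.

```python
import math

def lz76_complexity(s):
    """Number of phrases in the LZ76 exhaustive-history parsing of s (assumed variant)."""
    i, phrases, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text;
        # the copy may overlap the phrase itself, as in LZ76.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_{alphabet_size}(n)."""
    return lz / (n / math.log(n, alphabet_size))

# Ratio for the 1IPJ row, using the tabulated LZ = 171 and n = 416.
print(normalized_lz(171, 416))
```

For the three rows above, this ratio lies between the tabulated value and the tabulated value plus 0.01, which is consistent with the table truncating to two decimals.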

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
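Reading \(P_w(n)\) as the set of distinct contiguous subwords of length \(n\), both \(P_w(n)\) and \(p_w(n)\) can be computed with a sliding window. This is a small sketch under that reading; the function names `subwords` and `subword_count` are hypothetical.

```python
def subwords(w, n):
    """Set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_count(w, n):
    """Cardinality of the set of distinct length-n subwords of w."""
    return len(subwords(w, n))

# Toy example: ABAB has exactly two distinct subwords of length 2.
print(sorted(subwords("ABAB", 2)), subword_count("ABAB", 2))
```

A word of length \(|w|\) has at most \(|w|-n+1\) distinct subwords of length \(n\), so that bound can serve as a quick check on any computed value.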

\(|P_{f(1IPJ_1)}(2) \setminus P_{f(1TBY_1)}(2)|=138\), \(|P_{f(1TBY_1)}(2) \setminus P_{f(1IPJ_1)}(2)|=42\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\); for this pair, \(Z_2=138+42=180\).

Hydrophobic-polar version of Sequence 1:
10100000011010000010011000010101100100001010010000110100010011110010101111110101110110000000001011010011110000110100000101101111100110000111000010000101100011000100010010011110000000001111010000100100010000000100000110100001100001101101010001010010111001010011111101000111111100101010111100000000000011010000101000011111110111101000101111110100000011110000110010001001111101001001100000001101010000010010011110111110
Pair \(Z_2\) Length of longest common subword
1IPJ_1,1TBY_1 180 3
1IPJ_1,6XHZ_1 172 4
1TBY_1,6XHZ_1 152 3
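The quantity \(Z_k(x,y)\) is the size of the symmetric difference of the two length-\(k\) subword sets. A minimal sketch follows, with hypothetical function names `subwords` and `Z`.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def Z(x, y, k):
    """|P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|, the symmetric-difference size."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy example: ABAB and BABB differ only in the subword BB.
print(Z("ABAB", "BABB", 2))
```

For the pair 1IPJ_1, 1TBY_1 the two one-sided counts reported on this page are 138 and 42, and their sum 180 agrees with the first row of the table.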

Newick tree

 
[
	1IPJ_1:91.68,
	[
		6XHZ_1:76,1TBY_1:76
	]:15.68
]

Let d and d1 be the Otu--Sayood distances of the same names. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)(\frac{546}{\log_{20} 546}-\frac{130}{\log_{20}130})\approx 122\), where \(546=416+130\) is the combined length of the two query sequences.
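The arithmetic in the estimate above can be checked directly. The sketch below only evaluates the stated expression; the factors 0.85 and 0.8 are taken from the text as given, and the helper name `log20` is an assumption.

```python
import math

def log20(x):
    """Logarithm to base 20, the size of the amino acid alphabet."""
    return math.log(x, 20)

# (0.85)(0.8)(546 / log_20 546 - 130 / log_20 130)
expected = 0.85 * 0.8 * (546 / log20(546) - 130 / log20(130))
print(round(expected))  # 122
```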
Status Protein1 Protein2 d d1/2
Query variables 1IPJ_1 1TBY_1 149 98.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
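The interpolation \(\delta\) can be written as a one-line function. In this sketch, `a` and `b` are hypothetical names for the two quantities whose minimum and maximum appear in the formula; reading them as the two one-sided complexity increments of the Otu--Sayood construction is an assumption, not something this page states. The example values 149 and 48 are likewise inferred from the tabulated d = 149 and d1/2 = 98.5 under that reading.

```python
def delta(a, b, alpha):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Assumed increments, chosen so that max = 149 and (min + max) / 2 = 98.5.
a, b = 149, 48
print(delta(a, b, 0))    # alpha = 0 gives the maximum, matching d
print(delta(a, b, 0.5))  # alpha = 1/2 gives the average, matching d1/2
```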