CoV2D Browser™

Parikh vectors
1LAN_1 5TSV_1 9QGF_1 Letter Amino acid
29 15 21 I Isoleucine
25 16 26 T Threonine
7 4 3 W Tryptophan
33 15 25 V Valine
14 15 18 Q Glutamine
37 16 20 E Glutamic acid
42 18 40 G Glycine
8 6 17 H Histidine
11 10 2 M Methionine
52 20 35 A Alanine
21 11 20 R Arginine
22 10 18 N Asparagine
24 7 18 D Aspartic acid
39 18 35 L Leucine
28 9 17 S Serine
9 4 15 Y Tyrosine
7 4 1 C Cysteine
33 11 9 K Lysine
20 4 12 F Phenylalanine
23 18 23 P Proline
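A Parikh vector counts how many times each letter occurs in a word and ignores the order of the letters. The following is a minimal Python sketch. It assumes each sequence is a plain uppercase one-letter string. The letter order follows the rows of the table, and `parikh_vector` is a hypothetical helper name, not part of any site API.

```python
from collections import Counter

# Letter order of the table rows above (I, T, W, ..., P).
AMINO_ACIDS = "ITWVQEGHMARNDLSYCKFP"


def parikh_vector(w: str) -> list[int]:
    """Return the number of occurrences of each amino-acid letter in w, in table-row order.

    Letters outside the 20 standard ones (e.g. X) are not counted.
    """
    counts = Counter(w)
    return [counts[a] for a in AMINO_ACIDS]


vec = parikh_vector("TKGLVLGIYS")  # first 10 residues of 1LAN_1
print(dict(zip(AMINO_ACIDS, vec)))
```

Each column of the table sums to the corresponding chain length (484, 231 and 375). So every residue of these three chains is one of the 20 standard letters.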

>1LAN_1|Chain A|LEUCINE AMINOPEPTIDASE|Bos taurus (9913)
>5TSV_1|Chains A, B|HIV-1 CA protein|Human immunodeficiency virus type 1 group M subtype B (isolate NY5) (11698)
>9QGF_1|Chains A, B|All1865 protein|Nostoc sp. PCC 7120 = FACHB-418 (103690)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
1LAN , Knot 202 484 0.86 40 248 455
TKGLVLGIYSKEKEEDEPQFTSAGENFNKLVSGKLREILNISGPPLKAGKTRTFYGLHEDFPSVVVVGLGKKTAGIDEQENWHEGKENIRAAVAAGCRQIQDLEIPSVEVDPCGDAQAAAEGAVLGLYEYDDLKQKRKVVVSAKLHGSEDQEAWQRGVLFASGQNLARRLMETPANEMTPTKFAEIVEENLKSASIKTDVFIRPKSWIEEQEMGSFLSVAKGSEEPPVFLEIHYKGSPNASEPPLVFVGKGITFDSGGISIKAAANMDLMRADMGGAATICSAIVSAAKLDLPINIVGLAPLCENMPSGKANKPGDVVRARNGKTIQVDNTDAEGRLILADALCYAHTFNPKVIINAATLTGAMDIALGSGATGVFTNSSWLWNKLFEASIETGDRVWRMPLFEHYTRQVIDCQLADVNNIGKYRSAGACTAAAFLKEFVTHPKWAHLDIAGVMTNKDEVPYLRKGMAGRPTRTLIEFLFRFSQ
5TSV , Knot 106 231 0.83 40 158 223
PIVQNLQGQMVHQCISPRTLNAWVKVVEEKAFSPEVIPMFSALSCGATPQDLNTMLNTVGGHQAAMQMLKETINEEAAEWDRLHPVHAGPIAPGQMREPRGSDIAGTTSTLQEQIGWMTHNPPIPVGEIYKRWIILGLNKIVRMYSPTSILDIRQGPKEPFRDYVDRFYKTLRAEQASQEVKNAATETLLVQNANPDCKTILKALGPGATLEEMMTACQGVGGPGHKARVL
9QGF , Knot 155 375 0.81 40 206 354
MGHHHHHHGSGSTNINLFSSYQLGELELPNRIVMAPLTRSRAGEGNVPHQLNAIYYGQRASAGLIIAEATQVTPQGQGYPHTPGIHSPEQVAGWKLVTDTVHQQGGRIFLQLWHVGRISHPDLQPDGGLPVAPSAIAPKGEVLTYEGKKPYVTPRALDTSEIPAIVEQYRQGAANALAAGFDGVEIHAANGYLIDQFLRDGTNQRTDEYGGAIENRARLLLEVTEAITSVWDSQRVGVRLSPLTTLNGCVDSHPLETFGYVAQALNRFNLSYLHIFEAIDADIRHGGTVVPTSHLRDRFTGTLIVNGGYTREKGDTVIANKAADLVAFGTLFISNPDLPERLEVNAPLNQADPKTFYGGGEKGYTDYPFLAVANK
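The page does not say which LZ76 parsing produces the \(\mathrm{LZ}(w)\) column. The sketch below assumes the standard Kaspar-Schuster algorithm, which counts the phrases of the Lempel-Ziv 1976 exhaustive-history parsing. It also computes the normalized ratio \(\mathrm{LZ}(w)/(n/\log_{20} n)\) from the table header. Both function names are hypothetical.

```python
import math


def lz76_complexity(s: str) -> int:
    """Number of phrases in the LZ76 parsing of s (Kaspar-Schuster algorithm)."""
    n = len(s)
    if n <= 1:
        return n
    # c: phrases so far, l: start of current phrase,
    # i: candidate copy start in the history, k: current match length.
    c, l, i, k, k_max = 1, 1, 0, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            # The current phrase can still be copied from position i.
            k += 1
            if l + k > n:
                c += 1  # last phrase runs to the end of s
                break
        else:
            k_max = max(k, k_max)
            i += 1
            if i == l:
                # No earlier start extends further: close the phrase.
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) / (n / log_alphabet_size(n)), the ratio column of the table."""
    return lz / (n / math.log(n, alphabet_size))


print(lz76_complexity("0001101001000101"))  # 0.001.10.100.1000.101 -> 6
print(normalized_lz(202, 484))  # 1LAN_1
```

For the three rows, the ratios come out to about 0.861, 0.834 and 0.818. The 0.81 shown for 9QGF therefore appears to be truncated to two decimals rather than rounded.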

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with four-character Protein Data Bank code \(c\).
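The definitions of \(P_w(n)\) and \(p_w(n)\) translate directly into set comprehensions over the length-\(n\) windows of \(w\). Below is a minimal sketch. The helper names `subwords` and `p` are hypothetical.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the distinct length-n intervals (contiguous subwords) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def p(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))


w = "ABAB"
print(sorted(subwords(w, 2)))  # ['AB', 'BA']
print([p(w, n) for n in range(1, 5)])  # [2, 2, 2, 1]
```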

\(|P_{f(1LAN_1)}(2) \setminus P_{f(5TSV_1)}(2)|=135\) and \(|P_{f(5TSV_1)}(2) \setminus P_{f(1LAN_1)}(2)|=45\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \); this is an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\). For example, \(Z_2(f(1LAN_1),f(5TSV_1))=135+45=180\).

Hydrophobic-polar version of Sequence 1 (1LAN_1):
0011111100000000010100110010011010100110101111011000010110001101111111000111000001001000101111110001001011010101010101110111111000001000001110101010000011001111101001100110011001010011011000100101000111010011000011011011010001111101000101010011111110110100111010111010110101111101001110110101110111111100011010100110110100100101000010101111011001001010111011010111011110110111000011100110101001001101111000000110001101001100001110011111001100101101011111000001101001111010001101110100
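Because \(|A\setminus B|+|B\setminus A|\) is the size of the symmetric difference of \(A\) and \(B\), \(Z_k\) can be computed with one set operation. Dividing it by \(|P_x(k)\cup P_y(k)|\) gives the Jaccard distance. The sketch also includes a hydrophobic-polar map. The hydrophobic set {A, F, G, I, L, M, P, V, W} was inferred by comparing the string above with the 1LAN_1 sequence: it reproduces the first 180 characters, but that the site uses exactly this set throughout is an assumption. All function names are hypothetical.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct contiguous subwords of length k."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z(x: str, y: str, k: int) -> int:
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    return len(subwords(x, k) ^ subwords(y, k))


def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k(x, y) divided by the size of the union of P_x(k) and P_y(k)."""
    union = subwords(x, k) | subwords(y, k)
    return z(x, y, k) / len(union) if union else 0.0


# Assumption: these residues map to 1 (hydrophobic); all others map to 0 (polar).
HYDROPHOBIC = set("AFGILMPVW")


def hp(w: str) -> str:
    """Hydrophobic-polar (binary) version of an amino-acid sequence."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in w)


print(z("ABAB", "ABBA", 2))  # 1 (only "BB" is unshared)
print(hp("TKGLVLGIYS"))  # 0011111100
```

For 1LAN_1 and 5TSV_1 with \(k=2\), the tables give \(248-135=113\) shared pairs. The union therefore has \(135+45+113=293\) elements, and the Jaccard distance is \(180/293\approx 0.61\).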
Pair \(Z_2\) Length of longest common subword
1LAN_1,5TSV_1 180 4
1LAN_1,9QGF_1 150 4
5TSV_1,9QGF_1 168 3
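The last column is read here as the length of the longest common contiguous subword, that is, the largest \(k\) for which \(P_x(k)\cap P_y(k)\) is nonempty. This reading is an assumption. A non-contiguous common subsequence would be much longer: by the Parikh table, 1LAN_1 and 5TSV_1 share at least 20 A's. Below is a minimal dynamic-programming sketch with a hypothetical function name.

```python
def longest_common_subword(x: str, y: str) -> int:
    """Length of the longest contiguous subword shared by x and y (O(|x||y|) time)."""
    best = 0
    # prev[j]: length of the longest common suffix of x[:i-1] and y[:j].
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


print(longest_common_subword("ABCDEF", "XBCDY"))  # 3 ("BCD")
```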

Newick tree

 
(
	5TSV_1:90.71,
	(
		1LAN_1:75,9QGF_1:75
	):15.71
);

Let \(d\) and \(d_1\) denote the distances \(d\) and \(d_1\) of Otu and Sayood. (Using \(d_1\) makes the 4TYN sequence AAAAAA a close match...)
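The sketch below assumes the usual Otu--Sayood definitions. With \(c(\cdot)\) the LZ76 complexity, let \(a=c(xy)-c(x)\) and \(b=c(yx)-c(y)\). Then \(d=\max(a,b)\) and \(d_1=a+b\), which is consistent with the \(\delta\) formula at the end of this section. Whether the site uses exactly this LZ76 variant is an assumption, and `otu_sayood` is a hypothetical name.

```python
def lz76_complexity(s: str) -> int:
    """Number of phrases in the LZ76 parsing of s (Kaspar-Schuster algorithm)."""
    n = len(s)
    if n <= 1:
        return n
    c, l, i, k, k_max = 1, 1, 0, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:
                c += 1
                break
        else:
            k_max = max(k, k_max)
            i += 1
            if i == l:
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def otu_sayood(x: str, y: str) -> tuple[int, int]:
    """Return (d, d1): d = max(a, b) and d1 = a + b for the one-sided gains a, b."""
    a = lz76_complexity(x + y) - lz76_complexity(x)
    b = lz76_complexity(y + x) - lz76_complexity(y)
    return max(a, b), a + b


print(otu_sayood("0", "1"))  # (1, 2)
```

Appending text never removes phrases, so \(a,b\ge 0\). Consequently \(d\le d_1\le 2d\).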
A rough estimate of the expected distance is \((0.85)(0.8)\left(\frac{715}{\log_{20} 715}-\frac{231}{\log_{20} 231}\right)\approx 135.\)
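The estimate is plain arithmetic and can be checked directly. 715 appears to be \(484+231\), the length of the concatenation of 1LAN_1 and 5TSV_1. The factor 0.85 is close to the normalized LZ ratios in the complexity table. The meaning of the factor 0.8 is not stated, so the sketch only reproduces the numbers. The helper name is hypothetical.

```python
import math


def n_over_log(n: int, base: int = 20) -> float:
    """n / log_base(n), the normalizer used in the LZ ratio column."""
    return n / math.log(n, base)


# 715 = 484 + 231 (assumed: length of the concatenated pair).
estimate = 0.85 * 0.8 * (n_over_log(715) - n_over_log(231))
print(estimate)
```

The result is about 135.15, which matches the stated value of 135.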
Status Protein 1 Protein 2 \(d\) \(d_1/2\)
Query variables 1LAN_1 5TSV_1 175 126
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
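Reading min and max as taken over two one-sided quantities \(a\) and \(b\), with \(d=\max(a,b)\) and \(d_1=a+b\), the two cases follow directly. \(\alpha=0\) gives \(\max(a,b)=d\), and \(\alpha=1/2\) gives \((\min+\max)/2=(a+b)/2=d_1/2\). This reading of the site's definitions is an assumption. Under it, the table's \(d=175\) and \(d_1/2=126\) imply \(\min(a,b)=2\cdot 126-175=77\). The values below are those implied numbers, and `delta` is a hypothetical name.

```python
def delta(a: float, b: float, alpha: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


# Hypothetical one-sided values implied by d = 175 and d1/2 = 126 in the table.
a, b = 175, 77
print(delta(a, b, 0.0))  # 175.0 (= d = max(a, b))
print(delta(a, b, 0.5))  # 126.0 (= d1 / 2 = (a + b) / 2)
```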