CoV2D Browser™

Parikh vectors
3JTX_1 4HJA_1 5TQN_1 Letter Amino acid
24 15 61 S Serine
13 7 46 T Threonine
14 11 37 Y Tyrosine
15 7 40 N Asparagine
6 4 4 C Cysteine
18 8 30 F Phenylalanine
27 5 51 P Proline
24 9 46 D Aspartic acid
44 10 85 L Leucine
20 9 49 K Lysine
27 7 52 V Valine
27 5 53 G Glycine
7 4 25 H Histidine
21 9 54 I Isoleucine
5 4 15 M Methionine
7 3 14 W Tryptophan
36 2 57 A Alanine
21 10 37 R Arginine
17 4 27 Q Glutamine
23 10 56 E Glutamic acid
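
Each column above is the Parikh vector of one sequence: the number of occurrences of each letter, ignoring order. A minimal Python sketch follows, using the letter order of the table rows and the first ten residues of 4HJA_1 as a toy input. The name `parikh_vector` is illustrative and not taken from the site.

```python
from collections import Counter

# Letter order copied from the rows of the table above.
LETTERS = "STYNCFPDLKVGHIMWARQE"

def parikh_vector(w: str) -> list[int]:
    """Return the letter counts of w, in the order given by LETTERS."""
    counts = Counter(w)
    return [counts[a] for a in LETTERS]

prefix = "MSDSFSLLSQ"  # first ten residues of 4HJA_1
vec = parikh_vector(prefix)
print(dict(zip(LETTERS, vec)))
```

The entries of a Parikh vector always sum to the length of the word, which gives a quick consistency check on a full column.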

>3JTX_1|Chains A, B|aminotransferase|Neisseria meningitidis Z2491 (122587)
>4HJA_1|Chain A|Protection of telomeres protein 1|Schizosaccharomyces pombe (284812)
>5TQN_1|Chains A, B|Seed linoleate 13S-lipoxygenase-1|Glycine max (3847)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
3JTX , Knot 169 396 0.85 40 223 383
GMNTLLKQLKPYPFARLHEAMQGISAPEGMEAVPLHIGEPKHPTPKVITDALTASLHELEKYPLTAGLPELRQACANWLKRRYDGLTVDADNEILPVLGSREALFSFVQTVLNPVSDGIKPAIVSPNPFYQIYEGATLLGGGEIHFANCPAPSFNPDWRSISEEVWKRTKLVFVCSPNNPSGSVLDLDGWKEVFDLQDKYGFIIASDECYSEIYFDGNKPLGCLQAAAQLGRSRQKLLMFTSLSKRSNVPGLRSGFVAGDAELLKNFLLYRTYHGSAMSIPVQRASIAAWDDEQHVIDNRRLYQEKFERVIPILQQVFDVKLPDASFYIWLKVPDGDDLAFARNLWQKAAIQVLPGRFLARDTEQGNPGEGYVRIALVADVATCVKAAEDIVSLYR
4HJA , Knot 71 143 0.82 40 116 139
MSDSFSLLSQITPHQRCSFYAQVIKTWYSDKNFTLYVTDYTENELFFPMSPYTSSSRWRGPFGRFSIRCILWDEHDFYCRNYIKEGDYVVMKNVRTKIDHLGYLECILHGDSAKRYNMSIEKVDSEEPELNEIKSRKRLYVQN
5TQN , Knot 319 839 0.85 40 305 777
MFSAGHKIKGTVVLMPKNELEVNPDGSAVDNLNAFLGRSVSLQLISATKADAHGKGKVGKDTFLEGINTSLPTLGAGESAFNIHFEWDGSMGIPGAFYIKNYMQVEFFLKSLTLEAISNQGTIRFVCNSWVYNTKLYKSVRIFFANHTYVPSETPAPLVEYREEELKSLRGNGTGERKEYDRIYDYDVYNDLGNPDKSEKLARPVLGGSSTFPYPRRGRTGRGPTVTDPNTEKQGEVFYVPRDENLGHLKSKDALEIGTKSLSQIVQPAFESAFDLKSTPIEFHSFQDVHDLYEGGIKLPRDVISTIIPLPVIKELYRTDGQHILKFPQPHVVQVSQSAWMTDEEFAREMIAGVNPCVIRGLEEFPPKSNLDPAIYGDQSSKITADSLDLDGYTMDEALGSRRLFMLDYHDIFMPYVRQINQLNSAKTYATRTILFLREDGTLKPVAIELSLPHSAGDLSAAVSQVVLPAKEGVESTIWLLAKAYVIVNDSCYHQLMSHWLNTHAAMEPFVIATHRHLSVLHPIYKLLTPHYRNNMNINALARQSAINANGIIETTFLPSKYSVEMSSAVYKNWVFTDQALPADLIKRGVAIKDPSTPHGVRLLIEDYPYAADGLEIWAAIKTWVQEYVPLYYARDDDVKNDSELQHWWKEAVEKGHGDLKDKPWWPKLQTLEDLVEVCLIIIWIASALHAAVNFGQYPYGGLIMNRPTASRRLLPEKGTPEYEEMINNHEKAYLRTITSKLPTLISLSVIEILSTHASDEVYLGQRDNPHWTSDSKALQAFQKFGNKLKEIEEKLVRRNNDPSLQGNRLGPVQLPYTLLYPSSEEGLTFRGIPNSISI
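
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the tabulated \(\mathrm{LZ}(w)\) and \(n\). The sketch below does that, and also includes a simple exhaustive-history LZ76 parser. Which LZ variant the site uses is not stated here, so `lz76_complexity` is an assumption and is not claimed to reproduce the tabulated complexities; both function names are illustrative.

```python
import math

def lz76_complexity(w: str) -> int:
    """Count components of an exhaustive-history (LZ76-style) parsing.

    Assumption: each component is the shortest prefix of the remaining
    text that does not occur starting at an earlier position.
    """
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the candidate while it already occurs starting before i.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized(lz: int, n: int, alphabet: int = 20) -> float:
    """LZ(w) divided by n / log_alphabet(n)."""
    return lz / (n / math.log(n, alphabet))

# (code, LZ(w), n) taken from the table above.
for code, lz, n in [("3JTX", 169, 396), ("4HJA", 71, 143), ("5TQN", 319, 839)]:
    print(code, round(normalized(lz, n), 2))
```

On the classic binary example `0001101001000101` the parser yields the six components 0, 001, 10, 100, 1000, 101.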

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
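
As a small illustration of these definitions, the following Python sketch collects the distinct length-\(k\) subwords of a word and counts them. The names `subwords` and `subword_complexity` are illustrative, and the input is only the first ten residues of 4HJA_1.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the set of distinct contiguous subwords of length k."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """p_w(k): the number of distinct subwords of length k."""
    return len(subwords(w, k))

prefix = "MSDSFSLLSQ"  # first ten residues of 4HJA_1
print(subword_complexity(prefix, 1), subword_complexity(prefix, 2))
```

For a word of length \(n\), \(p_w(k)\) can never exceed \(n-k+1\), the number of starting positions.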

\(|P_{f(3JTX_1)}(2) \setminus P_{f(4HJA_1)}(2)|=150\), \(|P_{f(4HJA_1)}(2) \setminus P_{f(3JTX_1)}(2)|=43\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\). For the pair above, \(Z_2=150+43=193\).

Hydrophobic-polar version of Sequence 1:
110011001010111010011011011011011110110100101011001101010010001101111010010101100000110101000111111000111011001101100110111101011001001101111101011001110101010010001100001111001001010110101100110100001111100000001010100111010111011000001111001000001111001111101011001110000010110111001011110000011000010000100111110011010110101011101101001111001100111011110111000001011010101111101100101100110100

Pair \(Z_2\) Length of longest common substring
3JTX_1,4HJA_1 193 3
3JTX_1,5TQN_1 146 5
4HJA_1,5TQN_1 233 4
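
The \(Z_k\) column is the size of the symmetric difference of two subword sets. A minimal Python sketch of that computation follows; the function names are illustrative and the two inputs are short toy words, not the full sequences.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the set of distinct contiguous subwords of length k."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy words: two overlapping windows of the start of 4HJA_1.
x, y = "MSDSFS", "SFSLLS"
print(z(x, y, 2))
```

Because the two set differences are added, \(Z_k\) is symmetric in its arguments, and it is zero exactly when the two words have the same set of length-\(k\) subwords.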

Newick tree

 
(
	4HJA_1:11.10,
	(
		3JTX_1:73,5TQN_1:73
	):43.10
);

Let d be the Otu–Sayood distance d.
Let d1 be the Otu–Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{539}{\log_{20} 539}-\frac{143}{\log_{20} 143}\right)\approx 115.9.\)
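
The arithmetic in this estimate can be checked directly. In the sketch below the factors 0.85 and 0.8 are taken as given from the text, and 539 is read as 396 + 143, the combined length of the two query sequences; that reading is an assumption. The helper name `baseline` is illustrative.

```python
import math

def baseline(n: int, alphabet: int = 20) -> float:
    """n / log_alphabet(n), the normalizer used for LZ complexity."""
    return n / math.log(n, alphabet)

# 539 = 396 + 143 (assumed: combined length of 3JTX_1 and 4HJA_1).
estimate = 0.85 * 0.8 * (baseline(396 + 143) - baseline(143))
print(round(estimate, 1))
```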
Status Protein1 Protein2 d d1/2
Query variables 3JTX_1 4HJA_1 148 99.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2 \end{cases} \]
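
The interpolation can be checked numerically against the query row for 3JTX_1 and 4HJA_1, where d = 148 and d1/2 = 99.5. Reading d as the max term and d1 as min + max, which is what the two cases of the formula imply, the min term would be 2(99.5) - 148 = 51. That derived value is an inference made for this sketch, not a number reported by the page.

```python
def delta(alpha: float, a: float, b: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

d, half_d1 = 148, 99.5      # values from the query row
hi = d                      # alpha = 0 gives the max term
lo = 2 * half_d1 - d        # inferred min term (assumption: d1 = min + max)

print(delta(0, lo, hi), delta(0.5, lo, hi))
```

Values of \(\alpha\) between 0 and 1/2 interpolate linearly between the two distances.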