CoV2D BrowserTM

Parikh vectors
4DKP_1 2WSD_1 9JBX_1 Letter Amino acid
30 30 54 I Isoleucine
11 16 56 F Phenylalanine
21 31 89 V Valine
14 26 84 A Alanine
13 32 45 D Aspartic acid
29 33 58 T Threonine
7 9 8 W Tryptophan
7 26 28 Y Tyrosine
28 32 58 G Glycine
26 25 58 K Lysine
20 42 134 L Leucine
7 8 22 M Methionine
17 46 60 P Proline
19 29 79 S Serine
36 23 26 N Asparagine
16 4 25 C Cysteine
21 35 65 E Glutamic acid
9 23 26 H Histidine
7 28 56 R Arginine
15 15 48 Q Glutamine
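
Each column of the table above is a Parikh vector: the number of occurrences of every letter in one sequence. The following is a minimal sketch of that count, not the site's own code; the names `AMINO_ACIDS` and `parikh_vector` are illustrative, and the letter order is simply copied from the rows of the table.

```python
from collections import Counter

# Letter order copied from the rows of the table (an illustrative choice).
AMINO_ACIDS = "IFVADTWYGKLMPSNCEHRQ"


def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return the occurrence count of each alphabet letter in `sequence`."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]


# First ten residues of the 4DKP_1 sequence, used as a small example.
prefix = "VWKDADTTLF"
vector = parikh_vector(prefix)
print(dict(zip(AMINO_ACIDS, vector)))
```

The entries of a Parikh vector always add up to the length of the sequence, which gives a quick consistency check on a full column.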

>4DKP_1|Chains A, B[auth C]|clade A/E 93TH057 HIV-1 gp120 core|HUMAN IMMUNODEFICIENCY VIRUS TYPE 1 (11686)
>2WSD_1|Chain A|SPORE COAT PROTEIN A|BACILLUS SUBTILIS (1423)
>9JBX_1|Chain A|Endoplasmic reticulum transmembrane helix translocase|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
4DKP , Knot 149 353 0.82 40 203 330
VWKDADTTLFCASDAKAHETEVHNVWATHACVPTDPNPQEIHLENVTENFNMWKNNMVEQMQEDVISLWDQSLQPCVKLTGGSVIKQACPKISFDPIPIHYCTPAGYVILKCNDKNFNGTGPCKNVSSVQCTHGIKPVVSTQLLLNGSLAEEEIIIRSENLTNNAKTIIVHLNKSVEINCTRPSNGGSGSGGDIRKAYCEINGTKWNKVLKQVTEKLKEHFNNKTIIFQPPSGGDLEITMHSFNCRGEFFYCNTTQLFNNTCIGNETMKGCNGTITLPCKIKQIINMWQGTGQAMYAPPIDGKINCVSNITGILLTRDGGANNTSNETFRPGGGNIKDNWRSELYKYKVVQIE
2WSD , Knot 212 513 0.86 40 271 487
MTLEKFVDALPIPDTLKPVQQSKEKTYYEVTMEECTHQLHRDLPPTRLWGYNGLFPGPTIEVKRNENVYVKWMNNLPSTHFLPIDHTIHHSDSQHEEPEVKTVVHLHGGVTPDDSDGYPEAWFSKDFEQTGPYFKREVYHYPNQQRGAILWYHDHAMALTRLNVYAGLVGAYIIHDPKEKRLKLPSDEYDVPLLITDRTINEDGSLFYPSAPENPSPSLPNPSIVPAFCGETILVNGKVWPYLEVEPRKYRFRVINASNTRTYNLSLDNGGDFIQIGSDGGLLPRSVKLNSFSLAPAERYDIIIDFTAYEGESIILANSAGCGGDVNPETDANIMQFRVTKPLAQKDESRKPKYLASYPSVQHERIQNIRTLKLAGTQDEYGRPVLLLNNKRWHDPVTETPKVGTTEIWSIINPTRGTHPIHLHLVSFRVLDRRPFDIARYQESGELSYTGPAVPPPPSEKGWKDTIQAHAGEVLRIAATFGPYSGRYVWHCHALEHEDYDMMRPMDITDPHK
9JBX , Knot 395 1079 0.85 40 323 941
GHWSVHAHCALTCTPEYDPSKATFVKVVPTPNNGSTELVALHRNEGEDGLEVLSFEFQKIKYSYDALEKKQFLPVAFPVGNAFSYYQSNRGFQEDSEIRAAEKKFGSNKAEMVVPDFSELFKERATAPFFVFQVFCVGLWCLDEYWYYSVFTLSMLVAFEASLVQQQMRNMSEIRKMGNKPHMIQVYRSRKWRPIASDEIVPGDIVSIGRSPQENLVPCDVLLLRGRCIVDEAMLTGESVPQMKEPIEDLSPDRVLDLQADSRLHVIFGGTKVVQHIPPQKATTGLKPVDSGCVAYVLRTGFNTSQGKLLRTILFGVKRVTANNLETFIFILFLLVFAIAAAAYVWIEGTKDPSRNRYKLFLECTLILTSVVPPELPIELSLAVNTSLIALAKLYMYCTEPFRIPFAGKVEVCCFDKTGTLTSDSLVVRGVAGLRDGKEVTPVSSIPVETHRALASCHSLMQLDDGTLVGDPLEKAMLTAVDWTLTKDEKVFPRSIKTQGLKIHQRFHFASALKRMSVLASYEKLGSTDLCYIAAVKGAPETLHSMFSQCPPDYHHIHTEISREGARVLALGYKELGHLTHQQAREVKREALECSLKFVGFIVVSCPLKADSKAVIREIQNASHRVVMITGDNPLTACHVAQELHFIEKAHTLILQPPSEKGRQCEWRSIDGSIVLPLARGSPKALALEYALCLTGDGLAHLQATDPQQLLRLIPHVQVFARVAPKQKEFVITSLKELGYVTLMCGDGTNDVGALKHADVGVALLANAPERVVERRRRPRDSPTLSNSGIRATSRTAKQRSGLPPSEEQPTSQRDRLSQVLRDLEDESTPIVKLGDASIAAPFTSKLSSIQCICHVIKQGRCTLVTTLQMFKILALNALILAYSQSVLYLEGVKFSDFQATLQGLLLAGCFLFISRSKPLKTLSRERPLPNIFNLYTILTVMLQFFVHFLSLVYLYREAQARSPEKQEQFVDLYKEFEPSLVNSTVYIMAMAMQMATFAINYKGPPFMESLPENKPLVWSLAVSLLAIIGLLLGSSPDFNSQFGLVDIPVEFKLVIAQVLLLDFCLALLADRVLQFFLG
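
The LZ-complexity column and its normalization by \(n/\log_{20} n\) can be sketched as follows. This assumes the 1976 Lempel-Ziv exhaustive-history parsing; the page does not say which variant it uses, so the counts may differ from the tabulated ones. The function names are illustrative.

```python
import math


def lz76_complexity(w):
    """Number of phrases in the Lempel-Ziv (1976) exhaustive parsing of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while w[i:i+length] also occurs starting at an
        # earlier position (the earlier copy may overlap the phrase itself).
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        phrases += 1
        i += length
    return phrases


def normalized_complexity(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_b(n), with b the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))


# Classic binary example, parsed as 0 | 001 | 10 | 100 | 1000 | 101.
print(lz76_complexity("0001101001000101"))
# Ratio for the tabulated values LZ(w) = 149 and n = 353.
print(normalized_complexity(149, 353))
```

For the 4DKP row the ratio comes out close to the 0.82 shown in the table.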

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
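
As a small illustration of these definitions, the sketch below collects the contiguous subwords of a given length and counts them. It reads "subword" as a contiguous substring of length \(n\); the function names are illustrative and the output has not been checked against the table values.

```python
def subwords(w, n):
    """Set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_complexity(w, n):
    """Cardinality of the set of distinct length-n subwords of w."""
    return len(subwords(w, n))


# Small example on the first six residues of the 4DKP_1 sequence.
print(sorted(subwords("VWKDAD", 2)))
print(subword_complexity("VWKDAD", 2))
```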

\(|P_{f(4DKP_1)}(2) \setminus P_{f(2WSD_1)}(2)|=60\), \(|P_{f(2WSD_1)}(2) \setminus P_{f(4DKP_1)}(2)|=128\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\); for example, \(Z_2(f(4DKP_1),f(2WSD_1))=60+128=188\).

Hydrophobic-polar version of Sequence 1:
11001000110100101000010011100101100101001010010001011000110010001101100010101010110110010101010111100001110111000000101011000100100001101110001110101100011100001000100111010001010000100110101101001000101001001100100010001000011101101101010100100010110000001100001100010100101011001001101101010110111101010010010111100011100000001011110100010001000011010
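
The hydrophobic-polar string can be reproduced with a simple letter-to-bit map. The page does not state which residues it treats as hydrophobic; the set used below was inferred by comparing the start of the 4DKP sequence with the start of the binary string, so it is an assumption rather than the site's documented rule.

```python
# Residues mapped to 1. Inferred from the displayed output, not documented
# by the page, so treat this set as an assumption.
HYDROPHOBIC = set("AFGILMPVW")


def hydrophobic_polar(sequence, hydrophobic=HYDROPHOBIC):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in hydrophobic else "0" for residue in sequence)


# First twenty residues of the 4DKP_1 sequence.
print(hydrophobic_polar("VWKDADTTLFCASDAKAHET"))
```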
Pair \(Z_2\) Length of longest common subword
4DKP_1,2WSD_1 188 4
4DKP_1,9JBX_1 180 4
2WSD_1,9JBX_1 132 6
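
The \(Z_k\) column is the size of the symmetric difference of two subword sets. A minimal sketch, assuming subwords are contiguous substrings of length \(k\) (the helper names are illustrative):

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


# Toy example: "abba" has the extra subword "bb", so the distance is 1.
print(z_distance("abab", "abba", 2))
```

Dividing \(Z_k(x,y)\) by the size of the union of the two sets would give the Jaccard distance itself; the table reports only the numerator.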

Newick tree

 
[
	4DKP_1:99.19,
	[
		9JBX_1:66,2WSD_1:66
	]:33.19
]
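
The tree above is written with square brackets. For use in tools that expect standard Newick syntax, it can be rewritten with parentheses and a terminating semicolon. The following is a sketch of that conversion; the helper name `to_newick` is illustrative.

```python
import re

# The tree exactly as displayed above.
BRACKET_TREE = """
[
    4DKP_1:99.19,
    [
        9JBX_1:66,2WSD_1:66
    ]:33.19
]
"""


def to_newick(bracket_tree):
    """Rewrite a square-bracket tree as a standard Newick string."""
    compact = re.sub(r"\s+", "", bracket_tree)
    return compact.replace("[", "(").replace("]", ")") + ";"


print(to_newick(BRACKET_TREE))
```

The branch lengths put every leaf at the same depth, since \(66+33.19=99.19\).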

Let d be the Otu–Sayood distance d.
Let d1 be the Otu–Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{866}{\log_{20} 866}-\frac{353}{\log_{20}353})=138.\)
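
The arithmetic in the expected-distance formula can be checked directly. The sketch below uses the constants exactly as they appear in the formula (0.85, 0.8, 866 and 353) without interpreting them; `scaled_length` is an illustrative name.

```python
import math


def scaled_length(n, alphabet_size=20):
    """n / log_b(n), the normalizing quantity used for LZ-complexity."""
    return n / math.log(n, alphabet_size)


expected = 0.85 * 0.8 * (scaled_length(866) - scaled_length(353))
print(round(expected))  # 138
```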
Status Protein1 Protein2 d d1/2
Query variables 4DKP_1 2WSD_1 180 149.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
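
As a worked instance of the interpolation, the query row can be read as \(d=180\) and \(d_1/2=149.5\). Under the formula above, \(d\) is the max term and \(d_1/2\) is the average of min and max, so the min term is \(2(149.5)-180=119\). The sketch below evaluates \(\delta\) for a few values of \(\alpha\); the variable names are illustrative.

```python
def delta(alpha, smaller, larger):
    """delta = alpha * min + (1 - alpha) * max."""
    return alpha * smaller + (1 - alpha) * larger


d = 180.0        # query value of d, i.e. delta at alpha = 0
half_d1 = 149.5  # query value of d1/2, i.e. delta at alpha = 1/2

larger = d
smaller = 2 * half_d1 - d  # recovered min term

for alpha in (0.0, 0.5, 1.0):
    print(alpha, delta(alpha, smaller, larger))
```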