CoV2D Browser™

Parikh vectors
1JSY_1 7OPK_1 7ZMH_1 Letter Amino acid
9 26 7 Q Glutamine
5 28 8 M Methionine
27 28 16 T Threonine
37 44 34 V Valine
7 7 3 C Cysteine
36 72 13 E Glutamic acid
23 52 27 G Glycine
15 48 41 I Isoleucine
13 29 20 Y Tyrosine
23 83 25 A Alanine
17 34 11 N Asparagine
42 70 65 L Leucine
35 64 6 K Lysine
16 32 28 F Phenylalanine
0 12 6 W Tryptophan
25 57 9 R Arginine
31 56 6 D Aspartic acid
10 32 1 H Histidine
30 63 16 P Proline
17 46 36 S Serine
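Each column above is a Parikh vector: the number of occurrences of each letter in the sequence. The entries of a column therefore sum to the sequence length (418, 883 and 378). Below is a minimal Python sketch. `parikh_vector` and `AMINO_ACIDS` are hypothetical names, and the letter order is copied from the table.

```python
from collections import Counter

# One-letter amino-acid codes in the row order of the table above.
AMINO_ACIDS = "QMTVCEGIYANLKFWRDHPS"

def parikh_vector(w: str) -> dict[str, int]:
    """Count how often each amino-acid letter occurs in w (0 if absent)."""
    counts = Counter(w)
    return {a: counts[a] for a in AMINO_ACIDS}

if __name__ == "__main__":
    v = parikh_vector("MGDKGTRVFKK")  # first 11 residues of 1JSY_1
    print(v["K"], v["G"], sum(v.values()))  # 3 2 11
```

Applied to a full sequence, this should give one column of the table.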

>1JSY_1|Chain A|Bovine arrestin-2 (full length)|Bos taurus (9913)
>7OPK_1|Chain A|5'-3' exoribonuclease|Chaetomium thermophilum (strain DSM 1495 / CBS 144.50 / IMI 039719) (759272)
>7ZMH_1|Chain A[auth 1]|NADH-ubiquinone oxidoreductase chain 1|Chaetomium thermophilum var. thermophilum DSM 1495 (759272)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
1JSY , Knot 172 418 0.82 38 210 392
MGDKGTRVFKKASPNGKLTVYLGKRDFVDHIDLVEPVDGVVLVDPEYLKERRVYVTLTCAFRYGREDLDVLGLTFRKDLFVANVQSFPPAPEDKKPLTRLQERLIKKLGEHAYPFTFEIPPNLPCSVTLQPGPEDTGKACGVDYEVKAFCAENLEEKIHKRNSVRLVIRKVQYAPERPGPQPTAETTRQFLMSDKPLHLEASLDKEIYYHGEPISVNVHVTNNTNKTVKKIKISVRQYADICLFNTAQYKCPVAMEEADDTVAPSSTFCKVYTLTPFLANNREKRGLALDGKLKHEDTNLASSTLLREGANREILGIIVSYKVKVKLVVSRGGLLGDLASSDVAVELPFTLMHPKPKEEPPHREVPEHETPVDTNLIELDTNDDDIVFEDFARQRLKGMKDDKEEEEDGTGSPRLNDR
7OPK , Knot 329 883 0.84 40 301 795
MGIPAAFRWLSNKYPKIISPVVEERPIVMPDGTEIPVDATRPNPNGEEFDNLYLDMNGIVHPCSHPEDKPAPKDEEEMMIEIFKYTDRIVKMVRPRKILMIAVDGVAPRAKMNQQRSRRFRAAQEAKEKEEEKKQLLKMLRKEKGSNMQEEPLETVVKKAFDSNSITPGTPFMDILAASLRYWCAYKLNTDPAWAKLKVIISDATVPGEGEHKIMEFIRSQRSSPEHNPNTRHVIYGLDADLIMLGLATHEPHFRVLREDVFFQEAKARLCKLCGQKGHDERSCKGEAKQKQGEFDEKDHAQPLKPFIWLHVSILREYLAAELEVPNLPFRWDLERAIDDWVFLCFFVGNDFLPHLPALEIRENGIDTLTAIWKDNLPIMGGYLTKDGHVDLERAQYILNGLAKQEDAIFRRRREVEERREANAKRRKLNQQGAHAKGAADSHAGKSGRKHVPEAAGPLPGMALFPITNPPPPAITHDMVMKGRSVDQANLANKSAASVLKSQIQSMMAQKAATNANGAEKDVSADGTTTAPASALGKRKAELIEEDAATNTDTDSVTDGTGSDNEGPVDTVRLWEEGYADRYYEQKFKVDPKDIEFRHKVGRAYAEGLAWVLQYYYQGCPSWEWFYPYHYAPFAADFVDLAKMEIKFEKGRISRPFEQLMSVLPAASRHAIPEVYHDLMTDPNSPIIDFYPEEFEIDLNGKKMAWQGVALLPFIEMPRLLAAMKEREHLLSEEDRARNEPGFDVLLISDAHPGLYEDITSHFYSKKQGAPKFKLNPRRSDGLAGKVEKIEGYVPHGSLVYPLARNSMPDVDYDRSITVRYIMPSSAHQHKSMLLRGVKLPPPALSRSDIEIIRSKAKNAGRSYGGAPLRNNYNSLEHHHHHH
7ZMH , Knot 152 378 0.79 40 176 334
MSYSQTINSLVEVVLVLVPSLVGIAYVTVGERKTMGSMQRRLGPNAVGIYGLLQAFADALKLLLKEYVGPTQANLVLFFLGPVITLIFSLLGYAVIPYGPGLAVNDLSTGILYMLAVSSLATYGILLAGWSANSKYAFLGSLRSTAQLISYELVLSSSILLVIMLSGSLSLTVIVESQRAIWYILPLLPVFIIFFIGSVAETNRAPFDLAEAESELVSGFMTEHAAVIFVFFFLAEYGSIVLMCILTSILFLGGYLLISLLDIIYNNLLSWIVIGKYIIFIFPFWGPVFIDLGLYEIISYLYNAPTVEGSFYGLSLGVKTSILIFVFIWTRASFPRIRFDQLMSFCWTVLLPILFALIVLVPCILYSFNIFPVNISLL
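The LZ column counts components in a Lempel–Ziv (1976) parsing, and the ratio column divides that count by \(n/\log_{20} n\). The sketch below uses Lempel and Ziv's exhaustive-history parsing. The page does not state its exact counting convention, so the hypothetical helper `lz76_complexity` may not reproduce the LZ column exactly. The normalization in `normalized_lz`, however, can be checked directly against the table.

```python
import math

def lz76_complexity(w: str) -> int:
    """Number of components in the Lempel-Ziv (1976) exhaustive parsing of w.

    A component ends at the first symbol that makes it impossible to copy the
    component from earlier text (overlapping copies allowed); the final
    component may end early when w runs out.
    """
    n, i, c = len(w), 0, 0
    while i < n:
        l = 1
        # Extend the component while w[i:i+l] still occurs in w[:i+l-1].
        while i + l <= n and w[i:i + l] in w[:i + l - 1]:
            l += 1
        c += 1
        i += l
    return c

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) / (n / log_k n), with k the alphabet size (20 for proteins)."""
    return lz * math.log(n, alphabet_size) / n

if __name__ == "__main__":
    # Lempel and Ziv's example parses as 0|001|10|100|1000|101.
    print(lz76_complexity("0001101001000101"))  # 6
    print(f"{normalized_lz(172, 418):.3f}")     # 0.829 (1JSY row)
```

For 1JSY, \(172\log_{20}418/418\approx 0.829\), and for 7ZMH the ratio is about 0.797. The table shows 0.82 and 0.79, so the displayed ratios appear to be truncated rather than rounded.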

Let \(P_w(n)\) be the set of distinct subwords (contiguous intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence for the 4-character Protein Data Bank code \(c\).
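The definitions of \(P_w(n)\) and \(p_w(n)\) translate directly into a set comprehension. Below is a sketch with hypothetical helper names.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the distinct length-n contiguous subwords of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))

if __name__ == "__main__":
    print(sorted(subwords("ABABA", 2)))      # ['AB', 'BA']
    print(subword_complexity("ABABA", 3))    # 2
```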

\(|P_{f(1JSY_1)}(2) \setminus P_{f(7OPK_1)}(2)|=26\) and \(|P_{f(7OPK_1)}(2) \setminus P_{f(1JSY_1)}(2)|=117\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\). This is an LZ76-style (set-of-subwords) numerator of the Jaccard distance for \(P(k)\). For example, \(Z_2(f(1JSY_1),f(7OPK_1))=26+117=143\).

Hydrophobic-polar version of Sequence 1:
1100100110010101010101100011001011011011111010010000101010011001000101111010001111010011111000011001000110011001011010111011001010111000101011000101101001000100000101110010011001110101000001110001101010100010001011010101000000010010101000101011001000011110010001110001001001011110000001111010100000011000110011000111111000101011100111110110001110111011010100011000110000110001101000000111001100010110000000001010101000
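\(Z_k\) is the size of the symmetric difference of two subword sets. Dividing it by \(|P_x(k)\cup P_y(k)|\) gives the Jaccard distance. Below is a sketch in which all helper names are hypothetical.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the distinct length-k contiguous subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) - P_y(k)| + |P_y(k) - P_x(k)| (symmetric difference size)."""
    return len(subwords(x, k) ^ subwords(y, k))

def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k(x, y) divided by the size of the union; 0.0 if both sets are empty."""
    union = subwords(x, k) | subwords(y, k)
    return z_distance(x, y, k) / len(union) if union else 0.0

if __name__ == "__main__":
    print(z_distance("ABAB", "ABBA", 2))  # 1 (only "BB" is unshared)
```

With the counts above, the union for 1JSY_1 and 7OPK_1 has \(p(2)+117=210+117=327\) elements. The Jaccard distance for that pair is therefore \(143/327\approx 0.44\).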
Pair \(Z_2\) Length of longest common substring (contiguous subword)
1JSY_1,7OPK_1 143 5
1JSY_1,7ZMH_1 168 4
7OPK_1,7ZMH_1 183 4
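The last column measures contiguous matches. Leucine alone occurs 42 times in 1JSY_1 and 70 times in 7OPK_1, so any common (non-contiguous) subsequence could already have length at least 42; the values 4 and 5 must therefore be longest common substring lengths. The sketch below is a standard dynamic-programming version (the name is hypothetical) and runs in \(O(|x|\,|y|)\) time.

```python
def longest_common_substring_length(x: str, y: str) -> int:
    """Length of the longest string occurring contiguously in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: longest common suffix of x[:i-1] and y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

if __name__ == "__main__":
    print(longest_common_substring_length("XABCDY", "ZABCDW"))  # 4
```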

Newick tree

(
	7ZMH_1:92.63,
	(
		1JSY_1:71.5,7OPK_1:71.5
	):21.13
);
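The page does not say how this tree was built. As a hedged illustration, the sketch below implements UPGMA (average linkage), one common way to obtain such a tree; `upgma` is a hypothetical helper. Applied to the \(Z_2\) values above, it gives the same topology and the 71.5 leaf branches for 1JSY_1 and 7OPK_1 (since \(143/2=71.5\)). It places the root at height 87.75 rather than 92.63, however, so the page presumably used other distances or another linkage for the final merge.

```python
def upgma(dist: dict[tuple[str, str], float]) -> str:
    """Rooted ultrametric tree by UPGMA; dist maps unordered leaf pairs to distances.

    Returns a Newick string. Cluster labels are their own Newick subtrees.
    """
    def key(a: str, b: str) -> tuple[str, str]:
        return (a, b) if a < b else (b, a)

    d = {key(a, b): v for (a, b), v in dist.items()}
    clusters = {x: (0.0, 1) for pair in d for x in pair}  # label -> (height, leaf count)
    while len(clusters) > 1:
        # Merge the closest pair of clusters at half their distance.
        (a, b), dab = min(d.items(), key=lambda kv: kv[1])
        ha, na = clusters.pop(a)
        hb, nb = clusters.pop(b)
        h = dab / 2
        label = f"({a}:{h - ha:g},{b}:{h - hb:g})"
        # Distance from the merged cluster: leaf-count-weighted average.
        for c in clusters:
            d[key(label, c)] = (na * d[key(a, c)] + nb * d[key(b, c)]) / (na + nb)
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        clusters[label] = (h, na + nb)
    (tree,) = clusters
    return tree + ";"

if __name__ == "__main__":
    z2 = {("1JSY_1", "7OPK_1"): 143, ("1JSY_1", "7ZMH_1"): 168, ("7OPK_1", "7ZMH_1"): 183}
    print(upgma(z2))  # ((1JSY_1:71.5,7OPK_1:71.5):16.25,7ZMH_1:87.75);
```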

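Let \(d\) be the Otu–Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\).
Let \(d_1\) be the Otu–Sayood distance \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). (This makes the 4TYN sequence AAAAAA a close match...)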
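Both distances are built from the two increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\): \(d\) takes the larger one and \(d_1\) their sum. These definitions agree with the \(\delta\) formula at the end of this section. The sketch below repeats the exhaustive-parsing `lz76_complexity` helper so that it runs on its own. That helper is an assumption and may count components differently from the page, so its numbers need not match the table. `otu_sayood` is likewise a hypothetical name.

```python
def lz76_complexity(w: str) -> int:
    """Number of components in the Lempel-Ziv (1976) exhaustive parsing of w."""
    n, i, c = len(w), 0, 0
    while i < n:
        l = 1
        # Extend the component while w[i:i+l] still occurs in w[:i+l-1].
        while i + l <= n and w[i:i + l] in w[:i + l - 1]:
            l += 1
        c += 1
        i += l
    return c

def otu_sayood(x: str, y: str) -> tuple[int, int]:
    """Return (d, d1) from the increments LZ(xy) - LZ(x) and LZ(yx) - LZ(y)."""
    gain_xy = lz76_complexity(x + y) - lz76_complexity(x)
    gain_yx = lz76_complexity(y + x) - lz76_complexity(y)
    return max(gain_xy, gain_yx), gain_xy + gain_yx

if __name__ == "__main__":
    print(otu_sayood("ab", "cd"))  # (2, 4)
```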
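Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{1301}{\log_{20} 1301}-\frac{418}{\log_{20}418}\right)\approx 228\), where \(1301=418+883\) is the length of the concatenation of 1JSY_1 and 7OPK_1.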
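The estimate can be checked in a few lines. The factors 0.85 and 0.8 are taken from the text as given, and `lz_scale` is a hypothetical helper for \(n/\log_{20} n\).

```python
import math

def lz_scale(n: int, k: int = 20) -> float:
    """n / log_k n, the normalizing factor used for LZ complexity above."""
    return n / math.log(n, k)

n1, n2 = 418, 883  # lengths of 1JSY_1 and 7OPK_1
expected = 0.85 * 0.8 * (lz_scale(n1 + n2) - lz_scale(n1))

if __name__ == "__main__":
    print(f"{expected:.2f}")  # 228.50
```

The unrounded value is about 228.50, which the text reports as 228.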
Status Protein1 Protein2 \(d\) \(d_1/2\)
Query variables 1JSY_1 7OPK_1 289 210
Was not able to put for d
Was not able to put for d1

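In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)], with \(\min\) and \(\max\) taken over the two increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\),
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]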
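\(\delta\) interpolates between the two increments. The table above gives \(d=289\) (the larger increment) and \(d_1/2=210\), so the smaller increment is \(2\cdot 210-289=131\). The sketch below recovers both table entries from these numbers; `delta` is a hypothetical helper.

```python
def delta(gain_xy: float, gain_yx: float, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two LZ increments."""
    lo, hi = sorted((gain_xy, gain_yx))
    return alpha * lo + (1 - alpha) * hi

if __name__ == "__main__":
    # Increments for (1JSY_1, 7OPK_1) inferred from d = 289 and d1/2 = 210.
    print(delta(289, 131, 0.0), delta(289, 131, 0.5))  # 289.0 210.0
```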