CoV2D BrowserTM

Parikh vectors
6MFW_1 9FXV_1 6IYH_1 Letter Amino acid
48 21 3 N Asparagine
73 24 6 G Glycine
33 17 2 M Methionine
45 19 6 F Phenylalanine
60 21 15 T Threonine
83 20 10 V Valine
62 13 1 Q Glutamine
56 24 4 P Proline
13 10 1 W Tryptophan
47 11 4 Y Tyrosine
64 25 3 R Arginine
41 8 9 H Histidine
59 34 11 K Lysine
84 25 16 A Alanine
74 25 8 D Aspartic acid
7 11 1 C Cysteine
94 44 5 E Glutamic acid
77 28 7 I Isoleucine
143 48 16 L Leucine
47 33 14 S Serine
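Each column of the table above counts how often each amino-acid letter occurs in one sequence. A minimal sketch of that count follows; the names `AMINO_ACIDS` and `parikh_vector` are illustrative and do not come from the CoV2D code.

```python
from collections import Counter

# The 20 standard one-letter amino-acid codes (illustrative constant).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parikh_vector(sequence: str) -> dict[str, int]:
    """Return the number of occurrences of each amino-acid letter."""
    counts = Counter(sequence)
    # Report every letter of the alphabet, including those that never occur.
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}


# First 16 residues of 6MFW_1, used only as a small example.
example = parikh_vector("GAMGRILFLTTFMSKG")
print(example["G"], example["L"], sum(example.values()))
```

The entries of a Parikh vector sum to the sequence length, which is consistent with the table: the three columns sum to 1210, 461 and 142.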

>6MFW_1|Chain A|Linear gramicidin synthase subunit A|Brevibacillus parabrevis (54914)
>9FXV_1|Chain A|Polymerase acidic protein|Influenza A virus (A/California/07/2009(H1N1)) (641809)
>6IYH_1|Chain A|Alpha chain|Acipenser persicus (61968)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6MFW , Knot 436 1210 0.85 40 335 1056
GAMGRILFLTTFMSKGNKVVRYLESLHHEVVICQEKVHAQSANLQEIDWIVSYAYGYILDKEIVSRFRGRIINLHPSLLPWNKGRDPVFWSVWDETPKGVTIHLIDEHVDTGDILVQEEIAFADEDTLLDCYNKANQAIEELFIREWENIVHGRIAPYRQTAGGTLHFKADRDFYKNLNMTTVRELLALKRLCAEPKRGEKPIDKTFHQLFEQQVEMTPDHVAVVDRGQSLTYKQLNERANQLAHHLRGKGVKPDDQVAIMLDKSLDMIVSILAVMKAGGAYVPIDPDYPGERIAYMLADSSAAILLTNALHEEKANGACDIIDVHDPDSYSENTNNLPHVNRPDDLVYVMYTSGSTGLAKGVMIEHHNLVNFCEWYRPYFGVTPADKALVYSSFSFDGSALDIFTHLLAGAALHIVPSERKYDLDALNDYCNQEGITISYLPTGAAEQFMQMDNQSFRVVITGGDVLKKIERNGTYKLYNGYGPTECTIMVTMFEVDKPYANIPIGKPIDRTRILILDEALALQPIGVAGELFIVGEGLGRGYLNRPELTAEKFIVHPQTGERMYRTGDRARFLPDGNIEFLGRLDNLVKIRGYRIEPGEIEPFLMNHPLIELTTVLAKEQADGRKYLVGYYVAPEEIPHGELREWLGNDLPDYMIPTYFVHMKAFPLTANGKVDRRALPDVQADAELLGEDYVAPTDELEQQLAQVWSHVLGIPQMGIDDHFLERGGDSIKVMQLIHQLKNIGLSLRYDQLFTHPTIRQLKRLLTEQKQVSLEPLRELDEQAEYETSAVEKRMYIIQQQDVESIAYNVVYTINFPLTVDTEQIRVALEQLVLRHEGLRSTYHMRGDEIVKRIVPRAELSFVRQTGEEESVQSLLAEQIKPFDLAKAPLLRAGVIETADKKVLWFDSHHILLDGLSKSILARELQALLGQQVLSPVEKTYKSFARWQNEWFASDEYEQQIAYWKTLLQGELPAVQLPTKKRPPQLTFDGAIQMYRVNPEITRKLKATAAKHDLTLYMLMLTIVSIWLSKMNSDSNQVILGTVTDGRQHPDTRELLGMFVNTLPLLLSIDHEESFLHNLQQVKAKLLPALQNQYVPFDKILEAARVKREGNRHPLFDVMFMMQGAPETELESNMHHINAGISKFDLTLEVLERENGLNIVFEYNTHLFDEGMILRMVAQFEHLLLQAVHGLDQQVKRFELVAAAENLYFQ
9FXV , Knot 191 461 0.84 40 242 442
SIEPFLRTTPRPLRLPDGPLCHQRSKFLLMDALKLSIEDPSHEGEGIPLYDAIKCMKTFFGWKEPNIVKPHEKGINPNYLMAWKQVLAELQDIENEEKIPRTKNMKRTSQLKWALGENMAPEKVDFDDCKDVGDLKQYDSDEPEPRSLASWVQNEFNKACELTDSSWIELDEIGEDVAPIEHIASMRRNYFTAEVSHCRATEYIMKGVYINTALLNASCAAMDDFQLIPMISKCRTKEGRRKTNLYGFIIKGRSHLRNDTDVVNFVSMEFSLTDPRLEPHKWEKYCVLEIGDMLLRTAIGQVSRPMFLYVRTNGTSKIKMKWGMEMRRCLLQSLQQIESMIEAESSVKEKDMTKEFFENKSETWPIGESPRGVEEGSIGKVCRTLLAKSVFNSLYASPQLEGFSAESRKLLLIVQALRDNLEPGTFDLGGLYEAIEECLINDPWVLLNASWFNSFLTHALK
6IYH , Knot 67 142 0.78 40 103 137
SLTSADKSHVKSIWSKASGKAEELGAEALGRMLEVFPNTKTYFSHYADLSVSSGQVHTHGKKILDAITTAVNHIDDITGTMTALSTLHAKTLRVDPANFKILSHTILVVLALYFPADFTPEVHLACDKFLASVSHTLATKYR
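The normalized column divides the LZ-complexity by \(n/\log_{20} n\). The sketch below computes that ratio, together with one common exhaustive-history formulation of LZ76 complexity. Whether the browser uses exactly this LZ76 variant is an assumption, so the function is not claimed to reproduce the tabulated values 436, 191 and 67. The function names are illustrative.

```python
import math


def lz76_complexity(w: str) -> int:
    """Number of components in an LZ76-style exhaustive-history parsing.

    Assumption: each component is the shortest prefix of the remaining
    text that has not occurred earlier (overlaps allowed); a trailing
    incomplete component is counted as one.
    """
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend while the candidate already occurs in the text before its last letter.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) / (n / log_b n) for an alphabet of size b."""
    return lz / (n / math.log(n, alphabet_size))


for lz, n in [(436, 1210), (191, 461), (67, 142)]:
    print(round(normalized_lz(lz, n), 3))
```

With the tabulated values the ratio comes out at about 0.854, 0.848 and 0.781. This matches the printed 0.85, 0.84 and 0.78 if the page truncates to two decimals rather than rounding.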

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
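As a small illustration of these definitions, the sketch below collects the distinct length-\(n\) subwords of a word and counts them. The function names are illustrative, not taken from the site.

```python
def subwords(w: str, n: int) -> set[str]:
    """Return P_w(n): the set of distinct contiguous subwords of length n."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_complexity(w: str, n: int) -> int:
    """Return p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))


print(sorted(subwords("abab", 2)))    # ['ab', 'ba']
print(subword_complexity("abab", 3))  # 2
```

A word of length \(m\) has at most \(m-n+1\) distinct subwords of length \(n\). For 6IYH this bounds \(p_w(3)\) by 140, in line with the tabulated 137.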

\(|P_{f(6MFW_1)}(2) \setminus P_{f(9FXV_1)}(2)|=126\), \(|P_{f(9FXV_1)}(2) \setminus P_{f(6MFW_1)}(2)|=33\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
1111011110011001001100100100011100001010010100101110010101100011001010110101011110010011110110001011010110001001011100011110000110000010011001110010011010111000011101010100010001010010011110010101001001100010011000101010011110010010000100010011001010110100011111000101110111110111101110100110011011100011111001100001011001101001000000000110100100110110001001110111100001101001001011101100111000101010110110011111110111000000101100000001101001101110011010000101110110110010001000100101100001110110100101011110110000111100111101111110111110111010100101010011101001001000100101110101011101001101010010110101111001110100111000101000111001110011010100111001100111001101011110101010001110101010111000111000100011011001111101110001100110010110110010011101000011001010010011000001010110010001000001100010110000100110011001011101000010111001110001100000101001100111010101100010000100111001011011011110111100100011110000111011000111001011110011011000000110100011100000001101001101011110110000110101011101001010100010101100010101111011011100100000011110100100010000111111001111101000001100100101011111000011100110110100010001110111110111000100010010111001010101100001101110000011001111011101001110110110001001011111001010
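The page does not state which residues count as hydrophobic. Comparing the start of the binary string with the start of the 6MFW sequence suggests that G, A, V, L, I, M, F, W and P map to 1 and every other residue maps to 0. The sketch below encodes that inferred rule; the set `HYDROPHOBIC` is an assumption checked only against the first residues, not a documented definition.

```python
# Assumed hydrophobic class, inferred from the leading residues of 6MFW_1.
HYDROPHOBIC = set("GAVLIMFWP")


def hp_encode(sequence: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)


# First 16 residues of 6MFW_1.
print(hp_encode("GAMGRILFLTTFMSKG"))  # 1111011110011001
```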
Pair \(Z_2\) Length of longest common substring
6MFW_1,9FXV_1 159 5
6MFW_1,6IYH_1 238 3
9FXV_1,6IYH_1 203 3
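Both columns of this table can be sketched directly. \(Z_k\) is the size of the symmetric difference of the two length-\(k\) subword sets. The last column appears to measure the longest contiguous subword shared by the two sequences; for example, IEPFL occurs in both 6MFW_1 and 9FXV_1. The function names are illustrative, and the contiguous reading of the last column is an assumption.

```python
def subwords(w: str, k: int) -> set[str]:
    """Return the set of distinct contiguous subwords of length k."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x: str, y: str, k: int) -> int:
    """Return Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def longest_common_substring_length(x: str, y: str) -> int:
    """Length of the longest contiguous subword common to x and y (O(|x||y|) DP)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common run ending at the previous letters.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


print(z_distance("abab", "abba", 2))                          # 1
print(longest_common_substring_length("EIEPFLM", "SIEPFLR"))  # 5
```

For the first pair, the two one-sided counts 126 and 33 add up to the tabulated \(Z_2=159\).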

Newick tree

 
(
	6IYH_1:11.17,
	(
		6MFW_1:79.5,9FXV_1:79.5
	):39.67
);

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{1671}{\log_{20} 1671}-\frac{461}{\log_{20} 461}\right)\approx 305\), where \(1671=1210+461\) is the combined length of the two sequences.
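The arithmetic behind this estimate can be checked with a short sketch. The factors 0.85 and 0.8 are taken from the formula as given, since the page does not explain them. The function and parameter names are illustrative.

```python
import math


def expected_distance(n_concat: int, n_single: int,
                      c1: float = 0.85, c2: float = 0.8,
                      alphabet_size: int = 20) -> float:
    """Evaluate c1 * c2 * (N / log_b N - m / log_b m)."""
    def scaled(n: int) -> float:
        # n / log_b n, the same normalizer used for the LZ-complexity ratio.
        return n / math.log(n, alphabet_size)

    return c1 * c2 * (scaled(n_concat) - scaled(n_single))


# 1671 = 1210 + 461 (6MFW_1 followed by 9FXV_1); 461 is the length of 9FXV_1.
print(expected_distance(1671, 461))
```

The value is roughly 305.6, so the printed 305 corresponds to dropping the fractional part.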
Status Protein1 Protein2 d d1/2
Query variables 6MFW_1 9FXV_1 388 267

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2 \end{cases} \]
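A minimal numerical check of this interpolation, under an assumption the page does not spell out: that "min" and "max" are the smaller and larger of two one-sided quantities, with \(d\) equal to the larger and \(d_1\) equal to their sum. Under that reading, the query row \(d=388\), \(d_1/2=267\) implies a smaller quantity of \(2\cdot 267-388=146\). The function name is illustrative.

```python
def delta(alpha: float, a: float, b: float) -> float:
    """Return alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


# Values implied by the query row under the stated assumption.
smaller, larger = 146, 388
print(delta(0, smaller, larger))    # 388, the value of d
print(delta(0.5, smaller, larger))  # 267.0, the value of d1/2
```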