_ <span class="ot">-></span> endField <span class="fu">>></span> return <span class="st">""</span></code></pre>
<p>A field starts with a field name, which is recorded. It is followed by multiple spaces. Since I am only interested in the version and gene symbol, more information is extracted if the field name is VERSION or FEATURES; otherwise an empty string is returned.</p>
<pre class="sourceCode literate haskell"><code class="sourceCode haskell">fieldName <span class="fu">=</span> many1 upper
-endField <span class="fu">=</span> manyTill anyChar (try separator <span class="fu"><|></span> try eof)
-separator <span class="fu">=</span> newline <span class="fu">>></span> notFollowedBy (char <span class="ch">' '</span>)</code></pre>
+endField <span class="fu">=</span> <span class="kw">do</span>
+ skipMany ((noneOf <span class="st">"\n"</span>) <span class="fu"><|></span> try continuation)
+ separator
+ return <span class="st">""</span>
+continuation <span class="fu">=</span> newline <span class="fu">>></span> lookAhead (char <span class="ch">' '</span>) <span class="fu">>></span> return <span class="ch">'\n'</span>
+separator <span class="fu">=</span> newline <span class="fu">>></span> notFollowedBy (char <span class="ch">' '</span>) <span class="fu">>></span> return <span class="ch">'\n'</span></code></pre>
<p>A field name is upper case. Fields continue with any character until a separator or the end of the file is reached. A separator is a newline character not followed by a space.</p>
<pre class="sourceCode literate haskell"><code class="sourceCode haskell">getVersionNumber <span class="fu">=</span> <span class="kw">do</span>
<span class="kw">let</span> versionNumberChar <span class="fu">=</span> oneOf <span class="fu">$</span> <span class="st">"NXRM_"</span> <span class="fu">++</span> [<span class="ch">'0'</span><span class="fu">..</span><span class="ch">'9'</span>] <span class="fu">++</span> <span class="st">"."</span>
NM_030615.2 KIF25
real 0m0.036s
-user 0m0.028s
-sys 0m0.012s</code></pre>
-<p>The Haskell parser is unfortunately 100 times slower.</p>
+user 0m0.020s
+sys 0m0.020s</code></pre>
+<p>The Haskell parser is unfortunately 50 times slower.</p>
<pre><code>$ time cat refSeqIdSymbol.testdata.gb | ./refSeqIdSymbol | head
NM_025073.2 SIKE1
NM_181712.4 KANK4
NM_005355.3 KIF25
NM_030615.2 KIF25
-real 0m4.963s
-user 0m4.808s
-sys 0m0.176s</code></pre>
+real 0m1.515s
+user 0m1.500s
+sys 0m0.036s</code></pre>
an empty string is returned.
> fieldName = many1 upper
-> endField = manyTill anyChar (try separator <|> try eof)
-> separator = newline >> notFollowedBy (char ' ')
+> endField = do
+> skipMany ((noneOf "\n") <|> try continuation)
+> separator
+> return ""
+> continuation = newline >> lookAhead (char ' ') >> return '\n'
+> separator = newline >> notFollowedBy (char ' ') >> return '\n'
A field name is upper case. Fields continue with any character until a
separator or the end of the file is reached. A separator is a newline
NM_030615.2 KIF25
real 0m0.036s
-user 0m0.028s
-sys 0m0.012s
+user 0m0.020s
+sys 0m0.020s
~~~~~
-The Haskell parser is unfortunately 100 times slower.
+The Haskell parser is unfortunately 50 times slower.
~~~~~
$ time cat refSeqIdSymbol.testdata.gb | ./refSeqIdSymbol | head
NM_005355.3 KIF25
NM_030615.2 KIF25
-real 0m4.963s
-user 0m4.808s
-sys 0m0.176s
+real 0m1.515s
+user 0m1.500s
+sys 0m0.036s
~~~~~