From: Charles Plessy
Date: Tue, 23 Dec 2014 09:34:34 +0000 (+0900)
Subject: Three times faster by using "skipMany" and reducing the calls to "try".
X-Git-Url: https://source.charles.plessy.org/?a=commitdiff_plain;h=a67fa69dba2df4eff1c2a24db8d25a138fba5f5c;p=source%2F.git

Three times faster by using "skipMany" and reducing the calls to "try".
---

diff --git a/Haskell/refSeqIdSymbol.html b/Haskell/refSeqIdSymbol.html
index 8ea95200..8eb67b41 100644
--- a/Haskell/refSeqIdSymbol.html
+++ b/Haskell/refSeqIdSymbol.html
@@ -65,8 +65,12 @@ parseGbRecord r = case parse gbR
 _ -> endField >> return ""

A field starts with a field name, which is recorded. It is followed by multiple spaces. Since I am only interested in the version and the gene symbol, more information is extracted if the field name is VERSION or FEATURES; otherwise an empty string is returned.

fieldName = many1 upper
-endField  = manyTill anyChar (try separator <|> try eof)
-separator = newline >> notFollowedBy (char ' ')
+endField = do
+  skipMany ((noneOf "\n") <|> try continuation)
+  separator
+  return ""
+continuation = newline >> lookAhead (char ' ') >> return '\n'
+separator = newline >> notFollowedBy (char ' ') >> return '\n'

A field name is upper case. Fields continue with any character until a separator or the end of the file is reached. A separator is a newline character not followed by a space.

getVersionNumber = do
   let versionNumberChar = oneOf $ "NXRM_" ++ ['0'..'9'] ++ "."
@@ -122,9 +126,9 @@ NM_005355.3     KIF25
 NM_030615.2     KIF25
 
 real    0m0.036s
-user    0m0.028s
-sys     0m0.012s
-
-The Haskell parser is unfortunately 100 times slower.
+user    0m0.020s
+sys     0m0.020s
+
+The Haskell parser is unfortunately 50 times slower.

$ time cat refSeqIdSymbol.testdata.gb | ./refSeqIdSymbol | head
 NM_025073.2     SIKE1
 NM_181712.4     KANK4
@@ -137,6 +141,6 @@ NM_182482.2     BAGE2
 NM_005355.3     KIF25
 NM_030615.2     KIF25
 
-real    0m4.963s
-user    0m4.808s
-sys     0m0.176s
+real    0m1.515s
+user    0m1.500s
+sys     0m0.036s
diff --git a/Haskell/refSeqIdSymbol.lhs b/Haskell/refSeqIdSymbol.lhs
index abacdd65..8537a010 100644
--- a/Haskell/refSeqIdSymbol.lhs
+++ b/Haskell/refSeqIdSymbol.lhs
@@ -137,8 +137,12 @@ field name was VERSION or FEATURES, more information is extracted, otherwise an
 empty string is returned.
 
 > fieldName = many1 upper
-> endField  = manyTill anyChar (try separator <|> try eof)
-> separator = newline >> notFollowedBy (char ' ')
+> endField = do
+>   skipMany ((noneOf "\n") <|> try continuation)
+>   separator
+>   return ""
+> continuation = newline >> lookAhead (char ' ') >> return '\n'
+> separator = newline >> notFollowedBy (char ' ') >> return '\n'
 
 A field name is upper case.  Fields continue with any character until a
 separator or the end of the file is reached.  A separator is a newline
@@ -241,11 +245,11 @@ NM_005355.3     KIF25
 NM_030615.2     KIF25
 
 real    0m0.036s
-user    0m0.028s
-sys     0m0.012s
+user    0m0.020s
+sys     0m0.020s
 ~~~~~
 
-The Haskell parser is unfortunately 100 times slower.
+The Haskell parser is unfortunately 50 times slower.
 
 ~~~~~
 $ time cat refSeqIdSymbol.testdata.gb | ./refSeqIdSymbol | head
@@ -260,7 +264,7 @@ NM_182482.2     BAGE2
 NM_005355.3     KIF25
 NM_030615.2     KIF25
 
-real    0m4.963s
-user    0m4.808s
-sys     0m0.176s
+real    0m1.515s
+user    0m1.500s
+sys     0m0.036s
 ~~~~~
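
Note on the change to endField: the sketch below puts the removed and the added definitions side by side so they can be run on the same input. It is only an illustration under stated assumptions: the type signatures, the names endFieldOld, separatorOld and remainingAfter, and the sample string are not part of the commit, and the real program may wire these parsers differently. The likely reason for the speed-up is that manyTill attempts its terminator (two "try" branches) at every character and accumulates a list that is then discarded, while skipMany consumes ordinary characters directly and only reaches "try" at a newline.

```haskell
module Main where

import Text.Parsec
import Text.Parsec.String (Parser)

-- Removed definitions, renamed with an "Old" suffix for this comparison only.
separatorOld :: Parser ()
separatorOld = newline >> notFollowedBy (char ' ')

-- Tries the terminator at every position and collects the skipped characters.
endFieldOld :: Parser String
endFieldOld = manyTill anyChar (try separatorOld <|> try eof)

-- Added definitions, as in the commit (type signatures are an addition).
continuation :: Parser Char
continuation = newline >> lookAhead (char ' ') >> return '\n'

separator :: Parser Char
separator = newline >> notFollowedBy (char ' ') >> return '\n'

-- Skips characters without building a list; 'try' is only reached at a newline.
endField :: Parser String
endField = do
  skipMany (noneOf "\n" <|> try continuation)
  separator
  return ""

-- Hypothetical helper: run a parser and return the input left unconsumed.
remainingAfter :: Parser a -> String -> Either ParseError String
remainingAfter p = parse (p >> getInput) "(sample)"

main :: IO ()
main = do
  -- Made-up field body with one continuation line, followed by a new field.
  let sample = "  first line\n   continued\nNEXT  rest\n"
  print (remainingAfter endFieldOld sample)
  print (remainingAfter endField sample)
```

On this sample both parsers stop at the start of the NEXT field. One difference to keep in mind: the old endField also accepted end of input without a final newline (through "try eof"), whereas the new one always requires a newline before it can finish.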