Quantcast
Viewing all articles
Browse latest Browse all 186

Why is MatchQ faster than StringMatchQ in testing string matches?

Suppose I have myStrs, a list of one million alphabets (i.e., 26 million single-character strings):

charRange = CharacterRange["A", "Z"];myStrs = Flatten[ConstantArray[charRange, 10^6]]; // AbsoluteTiming{0.258808, Null}ScientificForm[Length[myStrs], DigitBlock -> 3](* OUTPUT: *)(* {0.258808, Null} *)(* 26,000,000 *)

I wish to create a function that will return True if a string is "A" or "B", and False. I have come up with several methods for doing this:

Some of these methods have different forms: operator and/or listable forms.

(* METHOD 1: equality testing (Equal, ==) with Or (||) *)(* METHOD 1a, mapped *)Prepend[AbsoluteTiming[output1a = Map[# == "A" || # == "B" &, myStrs];],"METHOD 1a (==, mapped)"](* METHOD 2: StringMatchQ with Alternatives (|) *)(* METHOD 2a: mapped *)Prepend[AbsoluteTiming[output2a = Map[StringMatchQ[#, "A" | "B"] &, myStrs];],"METHOD 2a (StringMatchQ, mapped)"](* METHOD 2b: operator form *)Prepend[AbsoluteTiming[output2b = StringMatchQ["A" | "B"][myStrs];],"METHOD 2b (StringMatchQ, operator form)"](* METHOD 2b: listable form *)Prepend[AbsoluteTiming[output2c = StringMatchQ[myStrs, "A" | "B"];],"METHOD 2c (StringMatchQ, listable form)"](* METHOD 3: MatchQ with Alternatives (|) *)(* METHOD 3a: mapped *)Prepend[AbsoluteTiming[output3a = Map[MatchQ[#, "A" | "B"] &, myStrs];],"METHOD 3a (MatchQ, mapped)"](* METHOD 4: MemberQ *)(* METHOD 4a: MemberQ, mapped *)Prepend[AbsoluteTiming[output4a = Map[MemberQ[{"A", "B"}, #] &, myStrs];],"METHOD 4a (MemberQ, mapped)"]

Before presenting the output, I will show that the output lists are all identical:

(* Check that the resulting lists are all identical. *)(output1a == output2a) && (output2a == output2b) && (output2b == output2c)&& (output2c == output3a) && (output3a == output4a)(* Tally the output. *)Tally[output1a](* OUTPUT: *)(* True *)(* {26 000 000, {True, True, False, False, False, False, False, False, False, False}} *)(* {{True, 2 000 000}, {False, 24 000 000}} *)

Here are the timings, as measured by AbsoluteTiming, that I obtained on different systems and Mathematica versions:

Mathematica 14.0.0 on Windows 11

  • Processor: 13th Gen Intel(R) Core(TM) i7-13700, 2.10 GHz (x86-64; released Q1 2023)
  • Total cores: 16; performance-core base frequency: 2.10 GHz
  • $ProcessorCount: 16
(* Mathematica 14.0.0 on 64-bit Windows 11 Enterprise *)(* OUTPUT: *)(* {"METHOD 1a (==, mapped)", 10.65, Null} *)(* {"METHOD 2a (StringMatchQ, mapped)", 17.2123, Null} *)(* {"METHOD 2b (StringMatchQ, operator form)", 1.20552, Null} *)(* {"METHOD 2c (StringMatchQ, listable form)", 1.20461, Null} *)(* {"METHOD 3a (MatchQ, mapped)", 5.91739, Null} *)(* {"METHOD 4a (MemberQ, mapped)", 8.67486, Null} *)

Mathematica 13.3.1 on Windows 11

  • Processor: 13th Gen Intel(R) Core(TM) i7-13700, 2.10 GHz (x86-64; released Q1 2023)
  • Total cores: 16; performance-core base frequency: 2.10 GHz
  • $ProcessorCount: 16
(* Mathematica 13.3.1 on 64-bit Windows 11 Enterprise *)(* Processor: 13th Gen Intel(R) Core(TM) i7-13700, 2.10 GHz *)(* OUTPUT: *)(* {"METHOD 1a (==, mapped)", 12.2645, Null} *)(* {"METHOD 2a (StringMatchQ, mapped)", 19.5481, Null} *)(* {"METHOD 2b (StringMatchQ, operator form)", 1.23778, Null} *)(* {"METHOD 2c (StringMatchQ, listable form)", 1.2151, Null} *)(* {"METHOD 3a (MatchQ, mapped)", 7.78893, Null} *)(* {"METHOD 4a (MemberQ, mapped)", 9.70204, Null} *)

Mathematica 9.0 on Windows 7

  • Processor: 4th Gen Intel(R) Core(TM) i7-4790, 4.00 GHz (x86-64; released Q2 2014)
  • Total cores: 4; processor base frequency: 4.00 GHz
  • $ProcessorCount: 4
(* Mathematica 9.0 on 64-bit Windows 7 Pro *)(* (Mathematica 9.0 does not include the operator form for StringMatchQ,  so I cannot use Method 2b on Mathematica 9.) *)(* Processor: 4th Gen Intel(R) Core(TM) i7-4790, 4.00 GHz *)(* OUTPUT: *)(* {"METHOD 1a (==, mapped)", 19.562671, Null} *)(* {"METHOD 2a (StringMatchQ, mapped)", 33.438184, Null} *)(* {"METHOD 2c (StringMatchQ, listable form)", 3.348192, Null} *)(* {"METHOD 3a (MatchQ, mapped)", 23.965319, Null} *)(* {"METHOD 4a (MemberQ, mapped)", 19.509194, Null} *)

Summary of timings

  • Mathematica 14.0.0 & 13.3.1: Method 2c ~ 2b < 3a < 4a < 1a < 2a

    • In other words, 2c(StringMatchQ,listable)~2b(StringMatchQ,operator)<<3a(MatchQ,mapped)<4a(MemberQ,mapped)<1a(==,mapped)<2a(StringMatchQ,mapped)
  • Mathematica 9.0: Method 2c < 4a ~ 1a < 3a < 2a

    • In other words, 2c(StringMatchQ,listable)<4a(MemberQ,mapped)~1a(==,mapped)<3a(MatchQ,mapped)<2a(StringMatchQ,mapped)

Overall summary

  • Method 2c(StringMatchQ,listable) had the best performance in all Mathematica versions tested: 1.20461, 1.2151, and 3.348192 seconds for versions 14.0.0, 13.3.1, and 9.0, respectively.
  • Method 2a(StringMatchQ,mapped) had the poorest performance in all Mathematica versions tested: seconds for versions 14.0.0, 13.3.1, and 9.0, respectively.

My question

Why is Method 3a(MatchQ,mapped) significantly faster than Method 2a(StringMatchQ,mapped)?

  • v14: 5.91739 vs. 17.2123 sec; v13.3.1: 7.78893 vs. 19.5481; and v9: 23.965319 vs. 33.438184. In other words, 2a(MatchQ,mapped) is ~2.9x (v14), ~2.5x (v13.3.1), and ~1.4x (v9) faster than 3a(StringMatchQ,mapped), I think.
  • This surprises me because I would've thought that StringMatchQ would be more optimized for strings than MatchQ in the case that StringMatchQ is NOT asked to do any "preprocessing" on the strings (e.g., implicitly, IgnoreCase -> False and SpellingCorrection -> False in StringMatchQ in the present case).
  • Without being able to look "under the hood," so to speak, of the Wolfram language, it's difficult to say with certainty why MatchQ is so much faster than StringMatchQ. But does anyone have any insight or potential explanations?

Take-home message

  • It seems to me that a take-home message from this exercise is that "listability" in StringMatchQ is highly optimized. (Interestingly, however, StringMatchQ does not appear to actually have attribute Listable: Attributes[StringMatchQ] returns only {Protected} in the Mathematica versions that I tested. Hence my quotation marks around "listability.") I suspect that "listability" in other Wolfram language built-in functions are similarly well optimized.
  • Are there any other glaring take-home messages in the results above?

Viewing all articles
Browse latest Browse all 186

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>