Suppose I have myStrs
, a list of one million alphabets (i.e., 26 million single-character strings):
charRange = CharacterRange["A", "Z"];myStrs = Flatten[ConstantArray[charRange, 10^6]]; // AbsoluteTiming{0.258808, Null}ScientificForm[Length[myStrs], DigitBlock -> 3](* OUTPUT: *)(* {0.258808, Null} *)(* 26,000,000 *)
I wish to create a function that will return True
if a string is "A"
or "B"
, and False
. I have come up with several methods for doing this:
- using equality testing (
Equal
,==
) StringMatchQ
;MatchQ
; andMemberQ
Some of these methods have different forms: operator and/or listable forms.
(* METHOD 1: equality testing (Equal, ==) with Or (||) *)(* METHOD 1a, mapped *)Prepend[AbsoluteTiming[output1a = Map[# == "A" || # == "B" &, myStrs];],"METHOD 1a (==, mapped)"](* METHOD 2: StringMatchQ with Alternatives (|) *)(* METHOD 2a: mapped *)Prepend[AbsoluteTiming[output2a = Map[StringMatchQ[#, "A" | "B"] &, myStrs];],"METHOD 2a (StringMatchQ, mapped)"](* METHOD 2b: operator form *)Prepend[AbsoluteTiming[output2b = StringMatchQ["A" | "B"][myStrs];],"METHOD 2b (StringMatchQ, operator form)"](* METHOD 2b: listable form *)Prepend[AbsoluteTiming[output2c = StringMatchQ[myStrs, "A" | "B"];],"METHOD 2c (StringMatchQ, listable form)"](* METHOD 3: MatchQ with Alternatives (|) *)(* METHOD 3a: mapped *)Prepend[AbsoluteTiming[output3a = Map[MatchQ[#, "A" | "B"] &, myStrs];],"METHOD 3a (MatchQ, mapped)"](* METHOD 4: MemberQ *)(* METHOD 4a: MemberQ, mapped *)Prepend[AbsoluteTiming[output4a = Map[MemberQ[{"A", "B"}, #] &, myStrs];],"METHOD 4a (MemberQ, mapped)"]
Before presenting the output, I will show that the output lists are all identical:
(* Check that the resulting lists are all identical. *)(output1a == output2a) && (output2a == output2b) && (output2b == output2c)&& (output2c == output3a) && (output3a == output4a)(* Tally the output. *)Tally[output1a](* OUTPUT: *)(* True *)(* {26 000 000, {True, True, False, False, False, False, False, False, False, False}} *)(* {{True, 2 000 000}, {False, 24 000 000}} *)
Here are the timings, as measured by AbsoluteTiming
, that I obtained on different systems and Mathematica versions:
Mathematica 14.0.0 on Windows 11
- Processor: 13th Gen Intel(R) Core(TM) i7-13700, 2.10 GHz (x86-64; released Q1 2023)
- Total cores: 16; performance-core base frequency: 2.10 GHz
$ProcessorCount
: 16
(* Mathematica 14.0.0 on 64-bit Windows 11 Enterprise *)(* OUTPUT: *)(* {"METHOD 1a (==, mapped)", 10.65, Null} *)(* {"METHOD 2a (StringMatchQ, mapped)", 17.2123, Null} *)(* {"METHOD 2b (StringMatchQ, operator form)", 1.20552, Null} *)(* {"METHOD 2c (StringMatchQ, listable form)", 1.20461, Null} *)(* {"METHOD 3a (MatchQ, mapped)", 5.91739, Null} *)(* {"METHOD 4a (MemberQ, mapped)", 8.67486, Null} *)
Mathematica 13.3.1 on Windows 11
- Processor: 13th Gen Intel(R) Core(TM) i7-13700, 2.10 GHz (x86-64; released Q1 2023)
- Total cores: 16; performance-core base frequency: 2.10 GHz
$ProcessorCount
: 16
(* Mathematica 13.3.1 on 64-bit Windows 11 Enterprise *)(* Processor: 13th Gen Intel(R) Core(TM) i7-13700, 2.10 GHz *)(* OUTPUT: *)(* {"METHOD 1a (==, mapped)", 12.2645, Null} *)(* {"METHOD 2a (StringMatchQ, mapped)", 19.5481, Null} *)(* {"METHOD 2b (StringMatchQ, operator form)", 1.23778, Null} *)(* {"METHOD 2c (StringMatchQ, listable form)", 1.2151, Null} *)(* {"METHOD 3a (MatchQ, mapped)", 7.78893, Null} *)(* {"METHOD 4a (MemberQ, mapped)", 9.70204, Null} *)
Mathematica 9.0 on Windows 7
- Processor: 4th Gen Intel(R) Core(TM) i7-4790, 4.00 GHz (x86-64; released Q2 2014)
- Total cores: 4; processor base frequency: 4.00 GHz
$ProcessorCount
: 4
(* Mathematica 9.0 on 64-bit Windows 7 Pro *)(* (Mathematica 9.0 does not include the operator form for StringMatchQ, so I cannot use Method 2b on Mathematica 9.) *)(* Processor: 4th Gen Intel(R) Core(TM) i7-4790, 4.00 GHz *)(* OUTPUT: *)(* {"METHOD 1a (==, mapped)", 19.562671, Null} *)(* {"METHOD 2a (StringMatchQ, mapped)", 33.438184, Null} *)(* {"METHOD 2c (StringMatchQ, listable form)", 3.348192, Null} *)(* {"METHOD 3a (MatchQ, mapped)", 23.965319, Null} *)(* {"METHOD 4a (MemberQ, mapped)", 19.509194, Null} *)
Summary of timings
Mathematica 14.0.0 & 13.3.1: Method 2c ~ 2b < 3a < 4a < 1a < 2a
- In other words, 2c(
StringMatchQ,listable
)~2b(StringMatchQ,operator
)<<3a(MatchQ,mapped
)<4a(MemberQ,mapped
)<1a(==,mapped
)<2a(StringMatchQ,mapped
)
- In other words, 2c(
Mathematica 9.0: Method 2c < 4a ~ 1a < 3a < 2a
- In other words, 2c(
StringMatchQ,listable
)<4a(MemberQ,mapped
)~1a(==,mapped
)<3a(MatchQ,mapped
)<2a(StringMatchQ,mapped
)
- In other words, 2c(
Overall summary
- Method 2c(
StringMatchQ,listable
) had the best performance in all Mathematica versions tested: 1.20461, 1.2151, and 3.348192 seconds for versions 14.0.0, 13.3.1, and 9.0, respectively. - Method 2a(
StringMatchQ,mapped
) had the poorest performance in all Mathematica versions tested: seconds for versions 14.0.0, 13.3.1, and 9.0, respectively.
My question
Why is Method 3a(MatchQ,mapped
) significantly faster than Method 2a(StringMatchQ,mapped
)?
- v14: 5.91739 vs. 17.2123 sec; v13.3.1: 7.78893 vs. 19.5481; and v9: 23.965319 vs. 33.438184. In other words, 2a(
MatchQ,mapped
) is ~2.9x (v14), ~2.5x (v13.3.1), and ~1.4x (v9) faster than 3a(StringMatchQ,mapped
), I think. - This surprises me because I would've thought that
StringMatchQ
would be more optimized for strings thanMatchQ
in the case thatStringMatchQ
is NOT asked to do any "preprocessing" on the strings (e.g., implicitly,IgnoreCase -> False
andSpellingCorrection -> False
inStringMatchQ
in the present case). - Without being able to look "under the hood," so to speak, of the Wolfram language, it's difficult to say with certainty why
MatchQ
is so much faster thanStringMatchQ
. But does anyone have any insight or potential explanations?
Take-home message
- It seems to me that a take-home message from this exercise is that "listability" in
StringMatchQ
is highly optimized. (Interestingly, however,StringMatchQ
does not appear to actually have attributeListable
:Attributes[StringMatchQ]
returns only{Protected}
in the Mathematica versions that I tested. Hence my quotation marks around "listability.") I suspect that "listability" in other Wolfram language built-in functions are similarly well optimized. - Are there any other glaring take-home messages in the results above?