Channel: Active questions tagged string-manipulation - Mathematica Stack Exchange

Extract doi from a text string


I have a large list of bibliometric records, each including a DOI, like this example:

stringdata = {{"name surname, 2207, journal name, vol5, p1233, doi 10.1016/j.enpol.2125.13.027"}, {"name surname, 2007, journal name, vol20, p33, doi [doi 10.1167/0020886984272851, 10.1167/0020886984272851]"}};

Note that the second row contains two DOIs (they are often, but not always, identical). Now I want to extract the DOI for each row by using StringCases like this:

doiextr1 = StringCases[stringdata[[1]], Shortest[ "10." ~~ doiprefix__ ~~ "/"] ~~ doisuffix___ ~~ WordBoundary :> StringInsert[doiprefix, "10.", 1] <> "/" <> doisuffix]

which yields {{"10.1016/j.enpol.2125.13.027"}} for the first row

and

doiextr2 = StringCases[stringdata[[2]], Shortest[ "10." ~~ doiprefix__ ~~ "/"] ~~ doisuffix___ ~~ ", " :> StringInsert[doiprefix, "10.", 1] <> "/" <> doisuffix]

which yields {{"10.1167/0020886984272851"}} for the second row.

My question is: I am looking for code that can handle both rows with a single pattern. For the second row, it should return just the first DOI. Is there a way to combine WordBoundary and ", " in StringCases?

many thx
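
One possible approach, sketched here under assumptions and not taken from the original post: rather than combining WordBoundary and ", ", match the DOI with a regular expression whose suffix stops at whitespace, a comma, or a closing bracket, and pass 1 as the third argument of StringCases so that only the first match in each string is kept. The name doiextr and the assumption that a DOI prefix consists of 4 to 9 digits are illustrative choices, not part of the question.

(* assumption: a DOI is "10." + a 4-9 digit prefix + "/" + a suffix containing no whitespace, "," or "]" *)
(* Flatten turns the nested one-element rows into a flat list of strings; the final 1 keeps only the first match per string *)
doiextr = StringCases[#, RegularExpression["10\\.\\d{4,9}/[^\\s,\\]]+"], 1] & /@ Flatten[stringdata]

For the example data this should return one DOI per row, namely 10.1016/j.enpol.2125.13.027 and 10.1167/0020886984272851. A suffix that legitimately contains a comma would be cut short by this pattern, and a trailing full stop after a DOI at the end of a sentence would be included, so the character class may need adjusting for real data.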

