Channel: Active questions tagged string-manipulation - Mathematica Stack Exchange

How to extract citations from a document


I would like to extract citations from a document. I can use the following:

txt = "Lorem ipsum dolor sit amet Name0 (2000), consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum (Name1, 2017). Name2 et al. (2022) lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat (Name3a Name3b, 2013). Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua (Name4a and Name4b, 1983; Name5, 1979), Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur (Name6 et al., 2015; Name7 et al., 2011).";
dates = ToString /@ Sort@Select[ToExpression /@ StringCases[txt, x : NumberString], # < 10^4 && # > 10^3 &];
DeleteDuplicates[StringJoin[StringPart[txt, Range[# - 30, #2]]] & @@@ Flatten[StringPosition[txt, #] & /@ dates, 1]]

to extract a selection from the text that contains the desired terms. Here is an example of the output I am currently getting:

{"me4a and Name4b, 1983; Name5, 1979", "na aliqua (Name4a and Name4b, 1983", "m ipsum dolor sit amet Name0 (2000", "6 et al., 2015; Name7 et al., 2011", "odo consequat (Name3a Name3b, 2013", "nulla pariatur (Name6 et al., 2015", "t anim id est laborum (Name1, 2017", " (Name1, 2017). Name2 et al. (2022"}

How should I refine or rewrite this to get:

{"Name0 (2000)", "(Name1, 2017)", "Name2 et al. (2022)", "Name3a Name3b, 2013)", "(Name4a and Name4b, 1983", "Name5, 1979", "Name6 et al., 2015", "Name7 et al., 2011"}

I thought of just doing StringCases[txt, Shortest["(" ~~ a : __ ~~ ")"] -> a] but this doesn't give the desired results either.
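One possible direction, offered only as a sketch and not as an accepted answer: match the author names and the year together with a single regular expression, instead of cutting a fixed 30-character window before each year. The pattern below assumes that every author name starts with a capital letter, that multiple authors are joined by a space or by "and", that "et al." may follow, and that the year is exactly four digits written either as ", 2017" or as " (2000)". The name citationPattern is made up for this illustration, and the snippet expects txt to be defined as above.

```mathematica
(* Hypothetical helper name; the pattern encodes the assumptions stated above. *)
citationPattern = RegularExpression[
   "[A-Z]\\w+(?:\\s+(?:and\\s+)?[A-Z]\\w+)*(?:\\s+et al\\.)?(?:,\\s*\\d{4}|\\s*\\(\\d{4}\\))"];

(* Capitalised words such as "Lorem" or "Duis" are not returned, because they
   are not directly followed by a comma-plus-year or a parenthesised year. *)
StringCases[txt, citationPattern]

(* For the sample txt this should give:
   {"Name0 (2000)", "Name1, 2017", "Name2 et al. (2022)", "Name3a Name3b, 2013",
    "Name4a and Name4b, 1983", "Name5, 1979", "Name6 et al., 2015", "Name7 et al., 2011"} *)
```

Unlike the desired list above, this returns the parenthetical citations without their enclosing parentheses. Real documents with initials, hyphenated or lowercase-prefixed surnames (for example "van der Waals"), or years such as "2013a" would need a more permissive pattern.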

