Finding SHORTEST Text
Posted: 2023-03-17 14:46:30
I feel as if I ought to know this, but I just can't make it work.
I have a list of bibliographic references that I need to clean up. Every entry has an abbreviation at the beginning, followed by a colon and space, then the actual entry. Here are three example lines:
Chambers: E. K. Chambers, _English Literature at the Close of the Middle Ages_, Oxford, 1945, 1947
Chaucer/Benson: Larry D. Benson, general editor, _The Riverside Chaucer_, third edition, Houghton Mifflin, 1987
Dobson/Taylor: R. B. Dobson and J. Taylor, _Rymes of Robyn Hood: An Introduction to the English Outlaw_, University of Pittsburg Press, 1976
What I want to eliminate is everything up to the first colon (the abbreviation, the colon, and the space after it), leaving the rest. So the above should become
E. K. Chambers, _English Literature at the Close of the Middle Ages_, Oxford, 1945, 1947
Larry D. Benson, general editor, _The Riverside Chaucer_, third edition, Houghton Mifflin, 1987
R. B. Dobson and J. Taylor, _Rymes of Robyn Hood: An Introduction to the English Outlaw_, University of Pittsburg Press, 1976
So what I tried to do is change
\n[^\n\f]+\: +
that is, (return)(any text)(colon)(one or more spaces)
to just plain
\n
that is, (return).
This of course works for the first two lines, but the third line comes back as
An Introduction to the English Outlaw_, University of Pittsburg Press, 1976
It took all text to the last colon in the line, not the first colon.
How do I get the shortest match, not the longest?
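For anyone who wants to reproduce this outside my editor, here is a minimal sketch in Python. It assumes Python's `re` module treats the pattern the same way my editor's find/replace does, which I have not verified. The variable names are just for illustration. It shows the greedy pattern above, plus two variants that might give the shortest match: a lazy quantifier (`+?`), and a character class that also excludes the colon.

```python
import re

# Two of the sample entries, each preceded by a return as in the original search.
text = (
    "\nChambers: E. K. Chambers, _English Literature at the Close of the "
    "Middle Ages_, Oxford, 1945, 1947"
    "\nDobson/Taylor: R. B. Dobson and J. Taylor, _Rymes of Robyn Hood: "
    "An Introduction to the English Outlaw_, University of Pittsburg Press, 1976"
)

# The pattern from the post: "+" is greedy, so it runs to the LAST colon+space.
greedy = re.sub(r"\n[^\n\f]+: +", "\n", text)

# Variant 1 (assumes lazy quantifiers are supported): "+?" stops at the FIRST colon+space.
lazy = re.sub(r"\n[^\n\f]+?: +", "\n", text)

# Variant 2: exclude the colon from the character class, so the run cannot cross a colon.
no_colon = re.sub(r"\n[^\n\f:]+: +", "\n", text)

for label, result in (("greedy", greedy), ("lazy", lazy), ("no_colon", no_colon)):
    print(label)
    print(result)
```

The second variant does not depend on lazy quantifiers at all, so it may be the safer choice if the editor's pattern syntax differs from Python's.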