RegEx, can someone explain, please
Posted: 2022-03-09 09:47:46
On the Scrivener forum, a user wanted to search their project for the word "house" and, in the same search, for "garden", "gardens", "gardener", etc., but without finding "household" and the like. The obvious answer is RegEx, and since I'm not fluent in it, I turned to NWP as the easiest way to build the expression. I created a test sentence: "They had a huge house, with a household staff of 20, and 4 acres of garden managed by a team of ten gardeners."
Another poster suggested the RegEx (\bhouse\b|\bgarden\b), which would only find "garden", not the derivatives. I used Powerfind to set up "0 or more lower case characters" and then switched the search to Powerfind Pro to turn it into code, which gave me \p{Lower}*. So my final Powerfind Pro expression was (\bhouse\b|\bgarden\p{Lower}*\b). That performed the search perfectly on my sentence in both NWP and Scrivener. However…
Wanting to know more about it, I looked up the relevant sections in the NWP manual which told me that:
\b is the code for "Backspace", yet in my expression the pairs of \b act as word boundaries, essentially equivalent to "Whole word". The manual lists \m as the beginning of a word, and from the examples the end of a word seems to be \M, though the latter is not explained anywhere.
I couldn't find any reference to \p in the manual, though the {Lower} part is self-explanatory. The manual gives [[:lower:]] as the code for finding any lower case alphabetical character.
So, could someone please explain why my RegEx works in both apps? Rewriting it to follow the NWP manual also works in NWP (of course!), but it doesn't work in Scrivener.
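For anyone who wants to reproduce the behaviour outside either app, here is a minimal sketch in Python. It rests on one assumption: Python's built-in re module does not understand \p{Lower}, so [a-z]* stands in for \p{Lower}* here. It says nothing about which regex engine NWP or Scrivener uses; it only shows how the two expressions behave on my test sentence in an engine where \b is a word boundary.

```python
import re

# The test sentence from the post.
sentence = ("They had a huge house, with a household staff of 20, "
            "and 4 acres of garden managed by a team of ten gardeners.")

# The expression another poster suggested: whole words only.
narrow = re.compile(r"(\bhouse\b|\bgarden\b)")

# My expression, with [a-z]* standing in for \p{Lower}*
# (an assumption: Python's re module has no \p{...} classes).
wide = re.compile(r"(\bhouse\b|\bgarden[a-z]*\b)")

narrow_hits = [m.group(0) for m in narrow.finditer(sentence)]
wide_hits = [m.group(0) for m in wide.finditer(sentence)]

print(narrow_hits)  # whole-word matches only
print(wide_hits)    # also picks up derivatives of "garden"
```

In this sketch the first expression skips both "household" and "gardeners", while the second still skips "household" but finds "gardeners" as well.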

Mark