PowerFind Pro

Everything you’ve learned about finding and replacing text with Normal Find and PowerFind also applies to PowerFind Pro.

PowerFind Pro uses metacharacters to represent the wild cards and other specialized feature characters used in the Find & Replace window. While PowerFind uses bubbles to represent the special feature characters, PowerFind Pro uses text that you type from the keyboard (or choose from the menus) in one-, two-, and three-character combinations to represent these same special characters and more. PowerFind Pro metacharacters enable even more elaborate Find/Replace options. When PowerFind Pro is the active method you can still choose any of the commands from the Find & Replace window’s menus; however, Nisus Writer Pro displays them as actual textual metacharacters, not as the bubble expressions.

The metacharacters Nisus Writer Pro uses for Find and Replace expressions in PowerFind Pro are based on GREP (often called a “Global Regular Expression Parser”), a feature of UNIX® operating systems. Even if you do not have a programming background or a familiarity with UNIX, with a little practice you’ll be surprised how easily you can learn to manipulate text with PowerFind Pro and extend your use of Nisus Writer Pro from an elaborate electronic typewriter to a full-featured writing and text-manipulating environment.

You can study the examples already discussed in the section “Examples of putting PowerFind to use” beginning on page 472, but here as well, simply choose PowerFind Pro (regex) instead of the menu command: Search Type > PowerFind.

The following examples illustrate some of PowerFind Pro’s capabilities and provide useful procedures that you can adapt. The instructions assume the Find & Replace window is open and that you chose the menu command: Search Type > PowerFind Pro (regex).

Exercises, or examples of putting PowerFind Pro to use

Find a seven-digit phone number

This expression finds a seven-digit phone number that includes a hyphen.

Enter \d\d\d-\d\d\d\d in the Find box.

\d is the metacharacter that represents any numeric (0 through 9) character.

- represents the - (hyphen) character.
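The same pattern can be tried outside Nisus Writer Pro. The sketch below uses Python's `re` module as a stand-in (it is a different regex engine, so treat it as an approximation), and the sample sentence is made up for illustration.

```python
import re

# Hypothetical sample text, not taken from the manual.
sample = "Call 481-1477 or 555-0123, not 12-3456."

# \d matches one digit (0 through 9); the hyphen matches itself.
seven_digit = re.compile(r"\d\d\d-\d\d\d\d")

found = seven_digit.findall(sample)
print(found)
```

Only the two strings with three digits, a hyphen, and four digits are reported; "12-3456" has too few digits before the hyphen.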

Find any number of trailing spaces or tabs at the end of paragraphs

Use this expression to clean up electronic files that contain extra characters.

Enter [[:blank:]]+$ in the Find box.

[[:blank:]]+$ is a metacharacter statement made up of the following PowerFind Pro metacharacters:

[[:blank:]] is a space or tab character.

+ is equivalent to 1+ on the Repeat menu.

$ is the position at the end of a paragraph.
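As a rough illustration, the cleanup can be sketched in Python's `re` module. Two assumptions apply: Python does not support `[[:blank:]]`, so `[ \t]` is substituted, and `re.MULTILINE` is needed so that `$` matches at the end of every line and not only at the end of the text. The sample text is hypothetical.

```python
import re

# Hypothetical text with trailing spaces and tabs before each line break.
sample = "First paragraph.  \t\nSecond paragraph.\t\nThird paragraph."

# [ \t] stands in for [[:blank:]]; + means one or more; $ is the line end.
trailing_blanks = re.compile(r"[ \t]+$", re.MULTILINE)

# Replacing each match with nothing removes the trailing blanks.
cleaned = trailing_blanks.sub("", sample)
print(repr(cleaned))
```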

Find the invisible null (ASCII (Unicode) 0) character

Enter \0 in the Find box.

\0 is the metacharacter that represents the invisible null character (ASCII (Unicode) code 0).

Find repeated groups of characters

Here’s how to find 121212, as well as 1212 and 12, as in the phrase, “Send 12 of part number 121212 to room 1212 at building 412 in Troy, NY 12180.” With Whole Word turned on, Nisus Writer Pro would find the first three numbers, 12, 121212, and 1212. With Whole Word turned off, Nisus Writer Pro would find the 12 in 412 and 12180 as well.

Enter (12)+ in the Find box.

( and ) are equivalent to the parentheses commands in the Match menu.

12 is the expression that you want to find (you can replace this with any other expression you want to find).

+ is the “one or more times” character.
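The difference between the two Whole Word settings can be approximated in Python's `re` module. This is only a sketch: Python has no Whole Word option, so the word-boundary assertion `\b` on both sides of the pattern is used here as an assumed equivalent.

```python
import re

sentence = ("Send 12 of part number 121212 to room 1212 "
            "at building 412 in Troy, NY 12180.")

# Approximates Whole Word turned off: (12)+ may match inside longer numbers.
any_position = [m.group(0) for m in re.finditer(r"(12)+", sentence)]

# Approximates Whole Word turned on: \b requires a word edge on both sides.
whole_word = [m.group(0) for m in re.finditer(r"\b(12)+\b", sentence)]

print(any_position)
print(whole_word)
```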

Find any set of characters

Suppose you want to search your document for all punctuation marks.

Enter [,.;?!]+ in the Find box.

[ and ] the brackets stand for a user-defined wild card, that is whatever appears within them. In this case Nisus Writer Pro interprets the punctuation marks literally.

+ is equivalent to 1+ on the Repeat menu.

Characters with special meaning

The characters listed here have special meaning in PowerFind Pro expressions. To use one of these characters literally in an expression, precede it with a backslash “\”. This includes the backslash itself (so if you want to find a backslash, you need to have “\\” in the Find box.)

Characters with Special Meaning

Name Appearance Meaning

backslash \ Changes character meaning

caret ^ Represents the beginning of paragraph position

dollar sign $ Matches the end of the paragraph position

asterisk * Expression that precedes occurs zero or more times

plus + Expression that precedes occurs one or more times

question mark ? Expression that precedes occurs zero times or once

period . Any character not including the Return

brackets [ and ] For enclosing user-defined wild cards

vertical line | “Or”: finds the expression either before or after this character. By itself it finds anything, but selects nothing.

Table 30
Characters with special meaning in PowerFind Pro

As you work with PowerFind Pro, keep in mind that the meanings of most special characters depend on their context.

Modifier characters

Use the modifier characters backslash and colon to change how Nisus Writer Pro interprets what follows them.

\ the backslash

In PowerFind Pro, the backslash changes the meaning of the character or characters that follow it. For example, the character n is not a metacharacter and has no special meaning. However, \n represents the New Line character in an expression.

Parenthesized expressions

Use parentheses in PowerFind Pro Find expressions in the same manner as you do with PowerFind with these additional guidelines.

The ( ) characters in PowerFind Pro correspond to the Capture expression found in the Match menu in PowerFind and explained on page 463. For example, the search for (my expression) (where my expression represents some text such as the airplane) continues as if the ( and ) were not present. Nisus Writer Pro then remembers the matched expression, that is, the Found or Captured# expression that fits the parameters of my expression. To refer to the first parenthesized expression again in the Find what expression or in the Replace with expression, use the metacharacter \1; this is the PowerFind Pro equivalent of PowerFind’s Captured1.

Use parentheses to include repeated characters in one expression, then append a repeat metacharacter such as +. For example, the expression (AB)+ finds AB occurring one or more times in succession (AB, ABAB, ABABAB, and so on).

You can refer to strings of text you have found in any Find expression even within that same Find expression, but you must first place that segment of the Find expression in parentheses in order to refer to it later. For example the text:
ABCDEFABGHABabcdABabcdEF
has only five patterns repeated in its twenty-four characters.


Figure 422
Repeated text patterns

Nisus Writer Pro can find each of these patterns with the Find expression

(AB)([[:upper:]]+?)(EF)\1([[:upper:]]+?)\1([[:lower:]]+?)\1\5\3

Rearranging the Replace expression (and adding various formatting options):

\2\4\5\1\2\3\3\4\1\5\5\2\1\4

arrives at this result:

CDGHabcdABCDEFEFGHABabcdabcdCDABGH
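A minimal sketch of the same capture-and-rearrange idea, using Python's `re` module as a stand-in for PowerFind Pro (it is not the engine Nisus Writer Pro uses). `[A-Z]` and `[a-z]` are assumed ASCII substitutes for `[[:upper:]]` and `[[:lower:]]`, which Python's `re` does not support, and the formatting options are left out.

```python
import re

text = "ABCDEFABGHABabcdABabcdEF"

# Five captures: \1=AB, \2=CD, \3=EF, \4=GH, \5=abcd.
# Back references inside the pattern require the earlier text to repeat.
find = r"(AB)([A-Z]+?)(EF)\1([A-Z]+?)\1([a-z]+?)\1\5\3"

# The replacement reuses the captures in a new order.
replace = r"\2\4\5\1\2\3\3\4\1\5\5\2\1\4"

result = re.sub(find, replace, text)
print(result)
```

The printed string is the rearranged result shown above.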

Nisus Writer Pro can ignore, or not “capture” some of the expressions it finds.

The metacharacters “(?:” and “)”, used in the PowerFind expression for AnyWord, (?:\m\w+\M), have exactly the same function as ( and ) except that they do not create a replace expression (i.e. \1, \2, \3, etc.). These allow you to search for a particular string of text without creating a corresponding Replace with expression. You can use these for grouping other metacharacters together to apply a single modifier to them. For example, in the pattern in Figure 422 you can, as an alternative, decide not to capture certain strings and rearrange the replace pattern as illustrated in Figure 423.


Figure 423
Uncaptured text patterns

(AB)(?:[[:upper:]]+?)(EF)\1([[:upper:]]+?)\1([[:lower:]]+?)\1\4\2

Rearranging the Replace expression (and adding various formatting options):

\2\4\1\2\3\3\4\1\2\1\4

arrives at this result:

EFabcdABEFGHGHabcdABEFABabcd
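The effect of the uncaptured group on numbering can be checked with a short sketch in Python's `re` module (again a stand-in, with `[A-Z]` and `[a-z]` assumed in place of the POSIX classes). Because `(?:...)` does not create a back reference, EF becomes \2, GH becomes \3, and abcd becomes \4.

```python
import re

text = "ABCDEFABGHABabcdABabcdEF"

# (?:[A-Z]+?) still matches CD, but it is not numbered.
find = r"(AB)(?:[A-Z]+?)(EF)\1([A-Z]+?)\1([a-z]+?)\1\4\2"
replace = r"\2\4\1\2\3\3\4\1\2\1\4"

captures = re.fullmatch(find, text).groups()
result = re.sub(find, replace, text)
print(captures)
print(result)
```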

Neither of these tasks may seem particularly useful for normal word processing. However, you can use Nisus Writer Pro as a text processor to manipulate raw text.

As illustrated here, you can replace a wide variety of formatting in a Replace with expression. As explained on page 446 you can find literal text with specific formatting. You can also search for expressions using metacharacters where formatting is assigned to the first character in the Find what expression. However, you cannot search for an expression using metacharacters that uses multiple formats. In this instance Nisus Writer Pro removes the formatting it cannot search for and then runs the search.

Pre-defined wild cards

The following metacharacters are predefined wild cards. Each metacharacter finds a character from a given set. You can choose some of these metacharacters from the Wild Card menu available in the Find & Replace window. Wild cards are only meaningful when you use them to construct the Find expression. If you enter them in the replacement pattern, Nisus Writer Pro interprets them literally because they do not represent a unique match.

Wild Card What it Does

[[:alpha:]] Finds any lowercase or uppercase character; the equivalent of the command AnyLetter ([A-Za-z]) in the Wild Card menu, plus all modified alphabetics. This finds any character Unicode considers an alphabetic.

[[:alpha:]_] Finds any alphabetic, diacritical (“modified alphabetic”), or underscore character “_” (ASCII (Unicode) 95 _). This finds any character Unicode considers an alphabetic plus the underscore character as used in “file_names” on DOS machines.

[^[:alpha:]] Finds any non-alphabetic character including the underscore character “_” (ASCII 95 _).

[^[:alpha:]_] Finds any non-alphabetic character excluding the underscore character “_” (ASCII 95 _).

[[:blank:]] Finds either a space (ASCII (Unicode) 32) or a tab (ASCII (Unicode) 9); same as [\s\t].

\d Finds any digit from 0 through 9; same as menu command AnyDigit ([0-9]) from the Wild Card menu.

[[:xdigit:]] Finds any digit or alphabetic character from a to f; covers the ranges 0-9 a-f A-F; use to find hexadecimal numerals.

[[:lower:]] Finds any lowercase alphabetic character from a to z plus modified alphabetics; the equivalent of the command LowercaseLetter ([a-z]) in the Wild Card menu.

[[:alnum:]] Finds any alphanumeric character; the equivalent of the command AnyLetterOrDigit ([A-Za-z0-9]) in the Wild Card menu.

\w Finds any alphanumeric or underscore character; same as [[:alnum:]] but includes the underscore character.

[[:upper:]] Finds any uppercase alphabetic character in the range A to Z; the equivalent of the command UppercaseLetter ([A-Z]) in the Wild Card menu.

\W Finds any non-alphanumeric character (excluding a Return and an underscore); opposite of \w.

[[:punct:]] Finds any “punctuation” character such as ( [ ; _ @ # % & * \ ? ! } ] ).

User-defined wild cards

PowerFind Pro also allows the user to define wild cards.

There are two ways to define wild cards. One way is to list all the characters that the wild card matches between brackets []. For example, the expression [abc] is a wild card that matches any of the three letters a, b, or c and nothing else.

Another way is to list all the characters the wild card does not match between brackets and a caret [^]. Here the caret means “not in the set.” For example, [^abc] is a wild card that finds any character that is not a, b, c, or Return. To include the Return character in the set, use the expression [[^abc]||[\n]].

[∑] (any character from a set ∑ excluding Return)

[∑] can define a range, an enumerated set, or a combination of both. To define a range use a hyphen “-” between the start of the range and the end of the range. For example, the wild card [a-f] represents any character in the alphabetic range from a through f. If you use the brackets to define a set that includes any digit, you can use \d or other escapes inside the character class. For example, [a-z\d] matches all lowercase letters and digits. Similarly, [\d\s] finds all digits and white space characters. The range assumes the order of increasing ASCII (Unicode) codes. To include a space, enter its hexadecimal representation, \x20.

Be sure to uniquely define each specified character, which means you cannot use certain wild cards. A backslash \ preceding any other character causes Nisus Writer Pro to interpret the character literally. For example, to use any of the characters ], [, -, ^ and \ in a set, precede it with \. Enter \ twice to include the \ character itself.

Nisus Writer Pro interprets all characters inside the brackets [] literally. For example, a search for [:a] matches a colon or an a.

[^∑] (any character not from a set ∑)

[^∑] finds any character not in the set ∑. For example, [^a-z] finds any character that is not a lowercase alphabetic, and the pattern [^!-~] finds any character whose ASCII (Unicode) code is not in the range 33 to 126. A particularly useful application of this metacharacter is for finding columns of text; the expression [^\t]+\t finds anything that is not a tab (which appears one or more times) followed by a tab.
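To see how a negated set picks out columns, here is a small sketch in Python's `re` module, used as an approximate stand-in for PowerFind Pro. One known difference is an assumption worth stating: in Python a negated set also matches a line break, so the sample row is kept on a single line. The row content is hypothetical.

```python
import re

# Hypothetical tab-separated row.
row = "Name\tQuantity\tPrice"

# One or more non-tab characters followed by a tab: each column but the last.
cells = re.findall(r"[^\t]+\t", row)

# Any character that is not a lowercase letter from a to z.
non_lower = re.findall(r"[^a-z]", "abC1d")

print(cells)
print(non_lower)
```

The last column is not reported because no tab follows it.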

Characters with a unique match

These metacharacters usually match a character that does not print. Because these characters are a unique match, you can use them in Find and Replace expressions.

Character Finds

\0 (null) Null character (ASCII (Unicode) code 0)

\b Backspace character (ASCII (Unicode) code 8)

\t Tab character (ASCII (Unicode) code 9); press to insert an actual tab character in an expression

\n Newline (macOS end of line character)

\v Vertical Tab

\f Page breaks or a form feed character (ASCII (Unicode) code 12); use \f as a replace expression to remove all page breaks created by choosing the menu command: Insert > Page Break as well as all the various section breaks inserted using the menu command: Insert > Section Break

\s Any White Space character

Repeat characters

The plus, asterisk, and question mark characters: +  *  ? signal the repeat of the previous character or parenthesized expression. Repeat characters follow a character or parenthesized expression.

When searching backward, Nisus Writer Pro finds the shortest sequence.

Character What it Does

+ Finds one or more occurrences of whatever that character or expression matches; the equivalent of the command 1+ on the Repeat menu

* Finds zero or more occurrences of whatever that character or expression matches; the equivalent of the command 0+ on the Repeat menu (“it may or may not be there”)

? Finds zero or one occurrence of whatever that character or expression matches; the equivalent of the command 0 or 1 on the Repeat menu; you can also use minus inside brackets to signify a range

??, +?, *? Finds the shortest match allowed from each of the three repeat characters.
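The difference between a plain repeat character and its shortest-match form is easiest to see side by side. The sketch below uses Python's `re` module, which follows the same convention for `+` and `+?`; the sample text is made up.

```python
import re

sample = '"one" and "two"'

# .+ takes the longest run it can, so it spans from the first quote to the last.
longest = re.findall(r'".+"', sample)

# .+? takes the shortest run that still lets the match succeed.
shortest = re.findall(r'".+?"', sample)

print(longest)
print(shortest)
```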

Match characters

Matching characters using PowerFind Pro works much like matching characters using PowerFind. The main differences are the metacharacters used and the added flexibility PowerFind Pro offers.

Character What it Does

( to ) Defines a replaceable parenthesized expression that can be recalled by numerical order; the equivalent of the various Capture commands in the Match menu

\1 to \20 Finds parenthesized expressions 1 to 20; the equivalent of the commands Captured1 through Captured10 and the menu command: Match > OtherCaptured

(?: to ) Defines a parenthesized expression that does not create a “back reference”.

PowerFind Pro find expressions

Expression Finds

\(([^(]|\n)+\) Any text between parentheses

["“].+?["”] Any text between quotes

^[[:alpha:]]+ Any alphabetic word at the beginning of a paragraph

\m(\w+)\s\1\M Any two consecutive duplicate “words”, which don’t have to be alphabetic and can be separated by a Tab or a Return

([\x20\t])\1+ Two or more spaces or tabs in a row; to remove extra blank characters use \1 in the Replace expression

[\x20\t]+$ Any number of spaces or tabs that end a paragraph; to remove trailing blanks at the end of a paragraph, leave the Replace box blank

^ Beginning of a paragraph that contains text or images

(["”])([.,]) A quotation mark followed by a period or a comma; conventionally the comma and the period are enclosed within quotation marks—to rearrange these punctuation marks and the quotation mark use \2\1 in the Replace expression

([;:])(["”]) A semicolon or a colon followed by a quotation mark; conventionally the colon and semicolon aren’t enclosed within quotation marks. To rearrange these punctuation marks and the quotation mark use \2\1 in the Replace expression

\x20+(?=\.) Any trailing blanks before a period

[G-Z] Finds any uppercase character except A through F
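As one worked case from the list above, the duplicate-word expression can be sketched in Python's `re` module. Python has no `\m` and `\M`, so the word-boundary assertion `\b` is assumed as a substitute for both; the sample sentence is hypothetical.

```python
import re

sample = "This is is a test of the the system."

# (\w+) captures a word, \s matches one white space character,
# and \1 requires the same word to appear again.
duplicate_words = re.compile(r"\b(\w+)\s\1\b")

found = [m.group(0) for m in duplicate_words.finditer(sample)]
print(found)
```

The "is" inside "This" is not reported because no word boundary precedes it.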

PowerFind Pro replace expressions

If a metacharacter or expression has a unique match then it can be used in constructing replacement patterns. If a metacharacter is not allowed in a particular context, Nisus Writer Pro will not let you select it from the menu in PowerFind. No such restriction applies when using PowerFind Pro.

Definitions in Nisus Writer Pro

Item What it Defines

Alphabetic Characters All characters in the range Aa through Zz plus all the modified alphabetics (those that European languages use). Modified alphabetics are also called diacriticals, sometimes called “delayed strike” characters.

Alphanumeric Characters All alphabetics and digits 0 through 9.

Control Characters ASCII (Unicode) code characters 0-12 and 14-31 (these are “non-printing” characters).

Word Any combination of alphanumeric characters (from as few as one character to extremely long strings) surrounded by non-alphanumeric characters.

Advanced exercises, or more examples of putting PowerFind Pro to use

These examples explain useful expressions in PowerFind Pro. You can use them as helpful tools for preparing documents as well as for developing more complex user-defined expressions.

Swap the sequence of words

To swap any two consecutive words, even if they are separated by a Return or a tab, use the Find expression (\m\w+\M)(\s+)(\m\w+\M) and the Replace expression \3\2\1. In this example, “report status” replaces “status report.” Here’s what the characters mean:

Character What it Means

(\m\w+\M) Finds a whole word (as defined above)

(\s+) Finds any sequence of blank characters such as spaces, tabs, or Returns

\3 Represents the third parenthesized expression which is the second of the two words found

\2 Finds whatever the second parenthesized expression (\s+) matches

\1 Finds the first parenthesized expression which is the first of the two words
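The swap can be sketched in Python's `re` module as a rough equivalent. The assumption here is that `\b` replaces both `\m` and `\M`, which Python does not support; `count=1` limits the replacement to the first pair of words.

```python
import re

# Word, white space, word: three captures so each can be reused separately.
swap = re.compile(r"(\b\w+\b)(\s+)(\b\w+\b)")

# \3\2\1 writes the second word, the original white space, then the first word.
swapped = swap.sub(r"\3\2\1", "status report", count=1)
tabbed = swap.sub(r"\3\2\1", "status\treport", count=1)

print(swapped)
print(repr(tabbed))
```

Capturing the white space as its own group is what preserves a tab or Return between the swapped words.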

Find any and/or all words that begin and end with specified characters

You can find any or all words that begin and end with characters you specify, for example all words that share the same prefix and suffix (such as “preparation” and “prestidigitation”). The example \m(a)([^\s]*)(d)\M finds every word that begins with “a” and ends with “d” from “ad” through “and” to “ampersand.”

Character What it Means

( and ) Expression delimiters, useful if you later want to manipulate the three segments of the words

\m Finds the beginning of a word

a Not a metacharacter, but here it represents the character or string of characters that must appear at the beginning of the word

[^\s]* Finds any string of characters that are not white space (the middle of the word, which doesn’t even need to exist)

d Not a metacharacter, but here it represents the character or string of characters that must appear at the end of the word
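A short sketch of this search in Python's `re` module, with `\b` assumed in place of `\m` and `\M`. The sample sentence is hypothetical, and the search is case-sensitive, so “Add” is not matched.

```python
import re

sample = "Add an ad and an ampersand, then stand aside."

# Three groups: the opening "a", an optional middle, and the closing "d".
pattern = re.compile(r"\b(a)([^\s]*)(d)\b")

words = [m.group(0) for m in pattern.finditer(sample)]
middles = [m.group(2) for m in pattern.finditer(sample)]

print(words)
print(middles)
```

For “ad” the middle group is empty, which shows that the middle of the word does not need to exist.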

Change multiple periods to ellipses

The following expressions find a sequence of two or more periods that follows an alphabetic or a space and replace it with an ellipsis “…”.

1. In the Find what text box enter: ([[:alpha:]]|\s)(\.\.+).

2. In the Replace with text box enter: \1….

Make sure your spaces follow the punctuation

Fast typists frequently press the Space Bar before typing their punctuation. The next expression finds one or more spaces that precede any punctuation and places them after the punctuation.

1. In the Find what text box enter: (\s)([[:punct:]]).

2. In the Replace with text box enter: \2\1.

Make sure your punctuation appears inside quotation marks

Common American practice is to have commas and periods appear inside of quotation marks. The following expression finds these punctuation marks that appear outside quotes and places them inside the quotes.

1. In the Find what text box enter: (”)(\.|,).

2. In the Replace with text box enter: \2\1.
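The two steps above can be sketched in Python's `re` module, which accepts this Find expression and this Replace expression unchanged. The sample sentence is made up, and Python is only a stand-in for PowerFind Pro.

```python
import re

sample = "She said “yes”, then “no”."

# \1 is the closing quotation mark, \2 is the period or comma after it.
# Writing \2\1 puts the punctuation back in front of the quotation mark.
fixed = re.sub(r"(”)(\.|,)", r"\2\1", sample)

print(fixed)
```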

If you want all punctuation to appear inside quotation marks use this expression:

1. In the Find what text box enter: (”)([[:punct:]]).

2. In the Replace with text box enter: \2\1.

3. In the Find what text box enter: (\s)(…).

4. In the Replace with text box enter: \2.

Replace two or more spaces with one space

People who learned to type using typewriters learned to type two spaces at the end of each sentence. This was important because the typewriter’s characters were in a monospaced font (each character had the same width). Typing two spaces at the end of each sentence helped the reader to scan the text. On the computer, most fonts are “fractional” and the old requirement for two spaces after a period no longer applies.

The following expression replaces two or more spaces with one space.

1. In the Find what text box enter: ([\x20][\x20]+).

2. In the Replace with text box enter: \x20.
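A minimal sketch of the same replacement in Python's `re` module. One difference is an assumption of this sketch: Python does not accept `\x20` in a replacement string, so a literal space is used there, while the Find expression is unchanged. The sample text is hypothetical.

```python
import re

sample = "One.  Two.   Three. Four."

# A space followed by one or more further spaces, replaced by one space.
single_spaced = re.sub(r"[\x20][\x20]+", " ", sample)

print(single_spaced)
```

Single spaces are left alone because the expression needs at least two spaces in a row.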

While you may not see them, you may have typed many spaces at the end of paragraphs. Though this does not affect the meaning of your text, it may affect the formatting. The following expression removes “trailing spaces” (those spaces that hang at the end of a line before ASCII (Unicode) 10).

1. In the Find what text box enter: ([\x20])(\n).

2. In the Replace with text box enter: \2.

Find ten-digit phone numbers

You may have received files that have many phone numbers in them in multiple formats and you need to find them to make them all match. Alternatively, you may have a file with a phone number in it, but you don’t remember what the number is and you need to find it. To find phone numbers, with or without area codes, use this Find expression:

In the Find what text box enter:
(1[\x20\t]*-?[\x20\t]*)?(\(?\d\d\d\)?)?[\x20\t]*-?[\x20\t]*\d\d\d[\x20\t]*-?[\x20\t]*\d\d\d\d.

The above expression finds phone numbers in any of these forms (notice the presence or absence of spaces and hyphens)

1-(858) 481 - 1477

1-(858) 481 1477

1-858-481-1477

18584811477

(858)-481 - 1477

(858) 481-1477

858-481-1477

858 481-1477

8584811477

481-1477

481 1477

4811477
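The list above can be checked mechanically. The sketch below runs the Find expression unchanged in Python's `re` module (a stand-in for PowerFind Pro, not the same engine) and tests each listed form with `fullmatch`, which requires the whole string to match. The counter-example "12-34" is an added assumption for contrast.

```python
import re

# The Find expression from above, split across two lines for readability.
phone = re.compile(
    r"(1[\x20\t]*-?[\x20\t]*)?(\(?\d\d\d\)?)?"
    r"[\x20\t]*-?[\x20\t]*\d\d\d[\x20\t]*-?[\x20\t]*\d\d\d\d"
)

samples = [
    "1-(858) 481 - 1477", "1-(858) 481 1477", "1-858-481-1477",
    "18584811477", "(858)-481 - 1477", "(858) 481-1477",
    "858-481-1477", "858 481-1477", "8584811477",
    "481-1477", "481 1477", "4811477",
]

# True only if every listed form matches in full.
all_match = all(phone.fullmatch(s) is not None for s in samples)

# A string with too few digits does not match.
short_match = phone.fullmatch("12-34")

print(all_match, short_match)
```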

