Powerfind ninja needed for search question

sethgodin
Posts: 27
Joined: 2006-12-14 09:39:15

Powerfind ninja needed for search question

Post by sethgodin »

I have a 50-page document that contains 100 blog posts. Each blog post begins with a headline, which is a sentence without a period. For whatever reason, the headline is repeated for each post.

The second one is formatted with a style, so I can easily find them all and delete them all, leaving every post with just one headline, not the duplicate.

BUT

I'd like the headline that remains to be bolded.

Some of the things I've imagined but have been unable to figure out how to do:

Find every sentence that doesn't end with a period (series of words followed by a carriage return)

Find every unperioded sentence that is followed by something with style HEADER3

Find any string of words that is followed by a carriage return and then the identical string of words...

Am I thinking about this the wrong way?

Thanks for your help! Nisus is amazing.

(It won't let me attach it, so here it is in case my description isn't helpful: https://www.dropbox.com/s/5yf3kxmlj31ee ... s.rtf?dl=0 )
phspaelti
Posts: 1313
Joined: 2007-02-07 00:58:12
Location: Japan

Re: Powerfind ninja needed for search question

Post by phspaelti »

Generally, all of your ideas are possible approaches, though some are easier to implement than others.

Since you have the headlines already with a style you can just keep those and delete the ones before them. So find paragraphs that are followed by an identical paragraph. In Powerfind this will look like this:
Find Preceding Duplicate Paragraphs.jpg
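For readers who cannot see the screenshot, the same idea can be sketched as a plain regular expression. This is only an illustration in Python's `re` module, not the actual PowerFind expression from the image. The sample text and the variable names are invented for the example.

```python
import re

# Invented sample: each headline appears twice, then the post body follows.
text = (
    "First headline\n"
    "First headline\n"
    "Body of the first post.\n"
    "Second headline\n"
    "Second headline\n"
    "Body of the second post.\n"
)

# A whole paragraph plus its return, but only when a lookahead confirms
# that the very next paragraph is identical. The lookahead consumes nothing,
# so the second (styled) copy is left in place.
preceding_duplicate = re.compile(r"^(.+)\n(?=\1$)", re.MULTILINE)

cleaned = preceding_duplicate.sub("", text)
print(cleaned)
```

The trailing `$` inside the lookahead matters: without it, a paragraph would also count as a duplicate when the next paragraph merely starts with the same words.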
philip
phspaelti
Posts: 1313
Joined: 2007-02-07 00:58:12
Location: Japan

Re: Powerfind ninja needed for search question

Post by phspaelti »

Also one other approach you can use in similar cases:
  1. Select the styled paragraphs
  2. Replace the selected text with—or just insert—a special marker of some kind
  3. Find the paragraph followed by the special marker
This method can be useful when the styled paragraph does not have some otherwise recognizable pattern. It’s also easier than trying to do the same via styled search.
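The three steps above can be sketched outside Nisus as follows. This is a rough Python model under stated assumptions: the document is represented as invented (text, style) pairs, `MARKER` is a hypothetical string chosen because it does not occur in the text, and the style name HEADER3 is taken from the original question.

```python
import re

MARKER = "@@HEAD@@"  # hypothetical marker; must not occur anywhere in the document

# Invented stand-in for a styled document: (paragraph text, style name or None).
paragraphs = [
    ("First headline", None),
    ("First headline", "HEADER3"),
    ("Body of the first post.", None),
    ("Second headline", None),
    ("Second headline", "HEADER3"),
    ("Body of the second post.", None),
]

# Steps 1 and 2: insert the marker at the start of every styled paragraph.
marked = "\n".join(
    (MARKER + text) if style == "HEADER3" else text
    for text, style in paragraphs
)

# Step 3: find each paragraph that is immediately followed by the marker.
before_marker = re.compile(r"^(.+)\n(?=" + re.escape(MARKER) + ")", re.MULTILINE)
found = before_marker.findall(marked)
print(found)
```

Once the unwanted paragraphs have been dealt with, the marker itself can be removed with an ordinary find and replace.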
philip
sethgodin
Posts: 27
Joined: 2006-12-14 09:39:15

Re: Powerfind ninja needed for search question

Post by sethgodin »

super helpful!

thanks, will try it now.
adryan
Posts: 561
Joined: 2014-02-08 12:57:03
Location: Australia

Re: Powerfind ninja needed for search question

Post by adryan »

G'day, sethgodin et al

Philip has provided excellent solutions for you.

Another would be to delete the first paragraph following each Section Break — if blog posts were separated by Section Breaks.

Yet another would be to make use of any date stamping that might be associated with a post.

But allow me to address the broader issue raised by your query about whether you are thinking about it in the right way.

The general approach would be to formulate a Find string that finds either or both of the paragraphs in question. It's then a simple exercise to delete the one you don't want.

You've been trying to do this, so you're on the right track, but some of your candidates could lead to misadventure.

For example: "Find every sentence that doesn't end with a period (series of words followed by a carriage return)". The assumption here is that all such sentences occur only in blog headings. It's very easy to omit a concluding period at the end of an ordinary (non-heading) paragraph. One might even have been there originally but then been deleted inadvertently during some subsequent cut-and-paste maneuver or the like. Even in your own posting here, not every paragraph ends with a punctuation mark. So this particular approach is risky.
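A small sketch of that risk, using Python's `re` module rather than Nisus itself. The sample paragraphs are invented. The pattern simply asks for a paragraph whose last character is not a period.

```python
import re

# Invented sample: only the first line is a real headline.
text = (
    "A headline without a period\n"
    "An ordinary paragraph that ends properly.\n"
    "An ordinary paragraph whose period got lost\n"
    "Is a question a headline?\n"
)

# Any paragraph whose final character is something other than a period.
no_period = re.compile(r"^.*[^.\n]$", re.MULTILINE)

hits = no_period.findall(text)
for hit in hits:
    print(hit)
```

Three of the four paragraphs are found here, although only one is a headline, which is why this approach needs a document you already know to be clean.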

Another example: "Find any string of words that is followed by a carriage return and then the identical string of words…". In your present situation, I would only do this when the string consisted of an entire paragraph, because otherwise it's not inconceivable that you would find (and delete) the wrong thing.

The aim is to find something that is consistent throughout your document and specific enough to find all the sought occurrences and no unsought ones. This can indeed be a real challenge at times.

The key is to know your document. And not just what it should contain, but what it actually does contain. Only then can you tailor grep expressions that will do what you want. (I include PowerFind expressions here.)

With a 50-page document, it would be prudent to attack it with the power of grep only after making a duplicate, in case of mishap. I might add that it is playing with fire to rely on Undo (Cmd-Z) to bail you out of trouble. Any number of things could go astray and prevent you from backtracking. 50 pages can be a lot to unscramble or rewrite.

Cheers,
Adrian
MacBook Pro (M1 Pro, 2021)
macOS Ventura
Nisus Writer user since 1996
adryan
Posts: 561
Joined: 2014-02-08 12:57:03
Location: Australia

Re: Powerfind ninja needed for search question

Post by adryan »

G'day, sethgodin et al

Another thought…

If a document such as this comes from the Web, it’s worth looking at the source code to see if it contains features that could be useful in formulating the Find string. I’m thinking here about such things as link anchors. One paragraph in each pair may have them, while the other may not.

Cheers,
Adrian
sethgodin
Posts: 27
Joined: 2006-12-14 09:39:15

Re: Powerfind ninja needed for search question

Post by sethgodin »

Thanks Adrian

the original method worked great. The breakthrough for me was discovering that a paragraph according to Nisus doesn't need punctuation.

I hadn't expected that.

this board is super helpful. Thank you all.
adryan
Posts: 561
Joined: 2014-02-08 12:57:03
Location: Australia

Re: Powerfind ninja needed for search question

Post by adryan »

G'day, sethgodin et al

It's probably worth pointing out here that, regardless of whether there is a concluding punctuation mark or not, not everything that looks like a paragraph is in fact a paragraph (in the sense recognized by the Find & Replace system).

A line that is shorter than others may end in a soft return or a tab character rather than a paragraph return. In such circumstances there may be clues (in the line spacing, for example) that suggest no ordinary paragraph return is in play. Soft returns are very common in text derived from documents on the Net (including email). All should be revealed when you have Show Invisibles in operation.

Text found by searching for AnyParagraph will not stop at the soft return but will go on to include subsequent text until a paragraph return or end of document is reached. A trailing paragraph return character is not included in the Found text unless you include it in the Find string.
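The behaviour described above can be imitated in Python as a rough analogy. Two assumptions are involved: U+2028 (LINE SEPARATOR) is used as a stand-in for a soft return, and Python's `re` engine is not the engine Nisus uses, so this only illustrates the principle.

```python
import re

SOFT_RETURN = "\u2028"  # assumption: stand-in for a soft return

# Invented sample: the first "paragraph" looks like two lines on screen.
text = "Short line" + SOFT_RETURN + "continues here\nNext paragraph\n"

# Paragraph contents only: the match runs through the soft return
# and stops before the paragraph return.
contents = re.findall(r"^.+$", text, re.MULTILINE)

# Including the return in the expression makes it part of the found text.
with_return = re.findall(r"^.+\n", text, re.MULTILINE)

print(len(contents), len(with_return))
```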

Cheers,
Adrian
phspaelti
Posts: 1313
Joined: 2007-02-07 00:58:12
Location: Japan

Re: Powerfind ninja needed for search question

Post by phspaelti »

"Paragraph" as used here is a term of art. If you look "under the hood", you can see that it amounts to any repetition of 'text' characters, i.e., characters excluding the return/newline. So it puts no conditions on punctuation. Paragraph in this sense is what computer types usually call "line", except that the Nisus wildcard requires a positive specification of at least one character.

Nisus has over time rearranged its wildcards to try to bring them closer to what non-technical people might consider the intuitive definition. While I understand this, it can sometimes lead to surprising results. Meanwhile the Powerfind bubbles can sometimes obscure what is actually going on. So it can be helpful to become literate in Powerfind Pro.

NB: The reason I put 'text' character in quotes is that the Powerfind Pro/regex for text character is usually the period ('.'). This would be equivalent to '[^\n]', i.e., "anything that is not a return/newline". But the Nisus wildcard "text character" is defined as '[^\n\f]', which means "anything that is not a newline or a (page) break". In traditional regex, page breaks are text characters!

NB2: Note that since the Nisus wildcard "Paragraph" is '(?:^.+$)' under the hood, it too will include breaks as 'text' characters of the paragraph! So if you construct your own paragraph using wildcards [Start of Paragraph][AnyText][End of Paragraph] you will not get the same result if your document contains breaks.
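The difference between the two definitions can be seen in a short Python sketch. The expressions are the ones quoted above, but the engine is Python's `re`, so how Nisus anchors [Start of Paragraph] and [End of Paragraph] around a break may differ. The sample text is invented, with `\f` standing for a page break.

```python
import re

# Invented sample: a page break (\f) sits inside the first paragraph.
text = "Before break\fAfter break\nNext paragraph"

# '.' treats the break as an ordinary text character.
dot_paragraphs = re.findall(r"^.+$", text, re.MULTILINE)

# A hand-built paragraph from '[^\n\f]' cannot cross the break, so the
# anchored version skips the first paragraph entirely in this engine.
strict_paragraphs = re.findall(r"^[^\n\f]+$", text, re.MULTILINE)

# Without anchors, the same class splits the text at the break.
text_runs = re.findall(r"[^\n\f]+", text)

print(dot_paragraphs)
print(strict_paragraphs)
print(text_runs)
```

Whatever the exact behaviour in Nisus, the point stands that the two constructions stop agreeing as soon as a break appears in the document.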
philip
adryan
Posts: 561
Joined: 2014-02-08 12:57:03
Location: Australia

Re: Powerfind ninja needed for search question

Post by adryan »

G'day, Philip et al

I had noted the PowerFind Pro expression, but I confess I don't quite understand why it doesn't just begin with the caret. Enlightenment is sought.

Cheers,
Adrian
phspaelti
Posts: 1313
Joined: 2007-02-07 00:58:12
Location: Japan

Re: Powerfind ninja needed for search question

Post by phspaelti »

adryan wrote: 2020-12-11 23:57:31
I had noted the PowerFind Pro expression, but I confess I don't quite understand why it doesn't just begin with the caret. Enlightenment is sought.

You mean why the expression is wrapped with '(?: … )'?

This creates a non-capturing unit for the paragraph. But now that I think about it, I'm having a hard time thinking of a case when this would be useful for this expression.

In principle this could be useful if you wanted to follow the wildcard with a repeat mark of some kind. But to successfully match two or more paragraphs in a row you would need to have the return/newline as part of the expression. Since this expression specifically excludes the newline, and also forces the match to the start and end of the paragraph, a repetition symbol will never be useful. If you wanted to match two or more paragraphs, you would need to follow this wildcard with a newline and then wrap the whole thing in parentheses before using the repeat. I'm not sure how many people would be able to figure that out without looking at the Powerfind Pro code.

This demonstrates what I was trying to say above. While such pre-combined wildcards may seem user-friendly they are also quite likely to lead to frustration unless you can parse the underlying expression.

I think Nisus would be better off calling this "Any Paragraph Contents", or something like that, and providing another wildcard "Any Paragraph" that would basically amount to this: '(?:^.+\n)'. That could at least be followed by a repeat symbol and actually find something.
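The repetition problem can be checked with a few lines of Python. The expressions are the ones discussed in this post, run through Python's `re` module on an invented three-paragraph sample. The variable names are made up for the example.

```python
import re

text = "one\ntwo\nthree\n"

# Repeating the contents-only unit fails: after the first paragraph the
# position is just before the return, where '^' cannot match.
contents_twice = re.search(r"(?:^.+$){2}", text, re.MULTILINE)

# Following the unit with a newline and wrapping the whole thing works.
wrapped_twice = re.search(r"(?:(?:^.+$)\n){2}", text, re.MULTILINE)

# The proposed "Any Paragraph" form can be repeated directly.
with_return_twice = re.search(r"(?:^.+\n){2}", text, re.MULTILINE)

print(contents_twice)
print(wrapped_twice.group(0) == with_return_twice.group(0))
```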
philip
adryan
Posts: 561
Joined: 2014-02-08 12:57:03
Location: Australia

Re: Powerfind ninja needed for search question

Post by adryan »

G'day, Philip et al

Thanks for the detailed explanation, Philip.

Even so, something was not gelling for me. So I looked really hard at your post and again at the Find & Replace dialog box. It finally dawned on me that it's a colon, not a period, that follows the question mark. Just not obvious with that particular juxtaposition of characters and the font size I usually use. No wonder I had trouble understanding the expression and couldn't find anything in the User Guide to explicate what was going on!

So now what you say makes perfect sense.

I generally just use "^.+\n", so sethgodin's query has forced me to examine all this in more detail.

Thanks again for more food for thought, Philip.

I've now got a very difficult decision to make: Submit a feature request to ban the use of colons as modifier characters in grep expressions, or change my font size in the F/R dialog box? I suppose I could just use Normal Find, but that smacks of overreaction to me….

Cheers,
Adrian