GREP

levelbest
Posts: 8
Joined: 2018-04-04 06:29:10

Post by levelbest » 2018-12-28 10:42:32

I am doing a lot of work cleaning up an old PDF file. The scan was very dirty, so the rendered text has a lot of junk in it. I am finding periods in random places inside words. Commas and other characters also turn up inside words.

Starting with the extra random periods, I am looking for a GREP expression that says: if a period sits between two text characters, delete it. In this document the only legitimate use of a period is at the end of a sentence. My thinking is that a character, a period, and a space is not the same as a period between two characters.

Looking at the end of a line probably isn't going to help, because there are so many broken-up text blocks that I can't define a line consistently at this point. So, basically, I am looking for a way in GREP to find a period that is surrounded by text characters, while ignoring a text character followed by a period and a space.

There are several options in the find dialog. I tried experimenting, and my first attempt, "(AnyText).(AnyText)", deleted almost all of my text. I undid this, of course. Then I thought I had better ask here; I might learn something.

I have had Nisus since the 80s. I have not used the grep features in a very long time. I can't find a good primer on learning GREP. Used to be the Nisus owners manual was nice to sit with and study.
Last edited by levelbest on 2018-12-28 14:05:24, edited 1 time in total.

Vanceone
Posts: 136
Joined: 2013-05-03 07:06:31

Re: GREP

Post by Vanceone » 2018-12-28 11:18:04

You don't need to use GREP, thankfully. You can use Powerfind instead.

The key concept is capturing the text you want.

So, try this "Find" expression: (Capture((AnyTextCharacter))).(Capture()(AnyTextCharacter)())

What I did was insert a "Capture" block from the "match" submenu. I then put the AnyTextCharacter wildcard inside the capture block. After that capture block, without spaces, I put a period, then repeated the process. This find expression means: find any text character, then a period, then another text character.

The Replace block just had Captured1 followed by Captured2, without the period. Try it out; it worked for me.
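
For readers who think in plain regular expressions, the capture idea above can be sketched in Python's re module. This is only an illustration under assumptions: Python's regex flavor is not Nisus Writer's, \w (word character) stands in for the wildcard bubble, and the function name is made up for this example.

```python
import re

# Two capture groups, one on each side of a literal period.
# \w is an assumed stand-in for the wildcard used in the Find expression.
CAPTURE_PATTERN = re.compile(r"(\w)\.(\w)")

def remove_periods_with_captures(text: str) -> str:
    """Put both captured characters back and leave the period out."""
    # \1 is the first capture, \2 the second.
    return CAPTURE_PATTERN.sub(r"\1\2", text)

print(remove_periods_with_captures("The qu.ick brown fox. It ran."))
# prints: The quick brown fox. It ran.
```

Two caveats with this sketch: \w also matches digits, so "3.14" would become "314", and because each match consumes the character after the period, a word like "a.b.c" needs a second pass (one pass gives "ab.c").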

levelbest
Posts: 8
Joined: 2018-04-04 06:29:10

Re: GREP

Post by levelbest » 2018-12-28 12:52:20

Thanks, it's not quite working for me. Perhaps I have not understood your example?

Trying:
FIND: (Capture((AnyTextCharacter))).(Capture()(AnyTextCharacter)())
REPLACE: (Capture((AnyTextCharacter)))(Capture()(AnyTextCharacter)())

Hitting Replace All

Getting error beep as in, nothing to find.

Same results when trying:
FIND: (Capture((AnyTextCharacter))).(Capture()(AnyTextCharacter)())
REPLACE: Capture1Capture2

FIND: (Capture((AnyTextCharacter))).(Capture()(AnyTextCharacter)())
REPLACE: "Capture1""Capture2"

Note: In both cases I am leaving no space as I am trying to erase any random periods in words.

If you meant use a variable as in Capture1, Capture2, I am not understanding this step.
Does the expression (Capture((AnyTextCharacter))) create a variable named Capture1?
Does using the expression twice, create a second variable named Capture2, again without declaring it?

phspaelti
Posts: 997
Joined: 2007-02-07 00:58:12
Location: Japan

Re: GREP

Post by phspaelti » 2018-12-29 06:24:56

Hello levelbest,
first let me make one important fix to Vance's basic idea. You shouldn't use the AnyTextCharacter wildcard. Spaces count as text characters and you are seeking to eliminate periods not followed by space. So you should probably use AnyWordCharacter instead.

Now let's talk about the whole capture thing. It is indeed the case that the captures you put in the Find box result in a series of numbered variables. It is also true that if you use Captured1 multiple times in the Replace box, that part will be duplicated. Using Capture brackets in the Replace box is not going to work at all; they serve no purpose there. What Vance was trying to do is capture the character before and the character after the period and then keep them in the replace expression. So he used two pairs of capture brackets, the first around the first text wildcard and the second around the second. The first bracket pair creates the variable Captured1 and the second creates Captured2. I'm not sure why this didn't work for you.
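
As a rough parallel in ordinary regex terms, here is how numbered captures behave in Python's re module. This is a sketch only; Python's groups are assumed to be analogous to Captured1 and Captured2, not identical to the Nisus implementation.

```python
import re

pattern = re.compile(r"(\w)\.(\w)")

# Each pair of parentheses creates a numbered group, in order of appearance.
match = pattern.search("wo.rd")
first, second = match.group(1), match.group(2)
print(first, second)  # prints: o r

# Using group 1 twice in the replacement duplicates that character.
duplicated = pattern.sub(r"\1\1\2", "wo.rd")
print(duplicated)  # prints: woord
```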

Now I would actually recommend a different approach. First note that you are really distinguishing periods only by what follows: WordCharacter => delete, space (or other?) => keep. Instead of Capture brackets you can use the FollowedBy (or NotFollowedBy) brackets. This will spare you from needing to deal with the capture variables.
Delete_periods.png
That should do it.
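
In plain regex notation the FollowedBy idea corresponds to a lookahead. Below is a minimal sketch in Python's re module, assuming that \w approximates the word-character wildcard; the function name is hypothetical and this is not the exact expression from the screenshot.

```python
import re

# A literal period, matched only when a word character follows.
# (?=\w) checks the next character without including it in the match.
PERIOD_BEFORE_WORD_CHAR = re.compile(r"\.(?=\w)")

def remove_periods_followed_by_word_char(text: str) -> str:
    """Delete every period that is directly followed by a word character."""
    return PERIOD_BEFORE_WORD_CHAR.sub("", text)

print(remove_periods_followed_by_word_char("The qu.ick br.o.wn fox. It ran."))
# prints: The quick brown fox. It ran.
```

Since the lookahead does not consume the following character, back-to-back stray periods such as "br.o.wn" are cleaned in a single pass. The trade-off is that a real sentence end with a missing space ("end.Next") would lose its period too.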
philip

Isus43
Posts: 2
Joined: 2019-01-01 03:16:30

Re: GREP

Post by Isus43 » 2019-01-02 05:37:15

I am quite new to Nisus, and of course to PowerFind.
For now I just want to search by font size or by bold/italic; other parameters are not important at the moment.

So what is the "syntax" for, e.g., searching for words with font size 15?

Thank you!

martin
Official Nisus Person
Posts: 4483
Joined: 2002-07-11 17:14:10
Location: San Diego, CA

Re: GREP

Post by martin » 2019-01-04 14:14:02

Isus43 wrote:
2019-01-02 05:37:15
For now I just want to search by font size or by bold/italic; other parameters are not important at the moment.
If you want to search by formatting there are a few tools in Nisus Writer to help you. I'll explain a few different options:

The Formatting Examiner palette
This is a new palette in Nisus Writer Pro 3 that makes searching by formatting very simple. Here's what you do:

1. Select any piece of text that has the desired formatting, eg: font size 15.
2. In the palette's formatting list select the formatting you want to search for (you can select one or more from the list).
3. Click the button with the gear icon to show a menu and use one of the Find commands:
palette.png

Formatting Sensitive Search Expressions
You can use PowerFind (or regex) and the find panel's "Formatting Sensitive" option to match all text with a particular set of attributes. Here's how:

1. Enter a search expression that can match the text characters you're interested in.
If you want to match most regular text (words, numbers, and punctuation) then the (AnyText) bubble works nicely.
2. Select the search expression in the "Find what" field.
3. Use the main menus to apply the desired formatting. This will automatically engage the "Formatting Sensitive" option.
4. Click the find panel's search buttons to select the matching text.
find.png

Find By Formatting Macros
One last option is the macros under the menu Macros > Find. The "Select By Font" macro will let you pick from all fonts in the document. The "Select By Attributes" macro lets you pick from a big list of formatting, as applied to the selected document text.


I hope this helps!

Isus43
Posts: 2
Joined: 2019-01-01 03:16:30

Re: GREP

Post by Isus43 » 2019-01-06 02:20:07

Thank you martin!
