Strange things happen when opening a PDF document into NWP

kkatzmar
Posts: 4
Joined: 2014-12-02 12:29:50

Strange things happen when opening a PDF document into NWP

Post by kkatzmar »

I have thousands of documents from previous word processors that I converted into PDFs so that I can search the entire text with DEVONThink, or Spotlight. When I open an old PDF document into NWP, some very strange things happen:

---about one in ten lines, but not all, have lost the spacing between words. Other lines are fine, but I spend hours reinserting spaces between words in random lines. Very odd.
---even odder: NWP will not fully display words that have a "ff" or an "fi" in them, such as "first" or "effect." I have to go through the document and type in the missing "ff" and "fi". This seems like a terribly specialized bug.

I didn't expect to preserve paragraph formatting or other niceties when going from PDF to NWP, but these bugs seem like weirdnesses beyond the pale of weird. Has anyone else run into this? Anyone found a solution? Exorcism, perhaps?
xiamenese
Posts: 543
Joined: 2006-12-08 00:46:44
Location: London or Exeter, UK

Re: Strange things happen when opening a PDF document into N

Post by xiamenese »

kkatzmar wrote:I have thousands of documents from previous word processors that I converted into PDFs so that I can search the entire text with DEVONThink, or Spotlight. When I open an old PDF document into NWP, some very strange things happen:

---about one in ten lines, but not all, have lost the spacing between words. Other lines are fine, but I spend hours reinserting spaces between words in random lines. Very odd.
---even odder: NWP will not fully display words that have a "ff" or an "fi" in them, such as "first" or "effect." I have to go through the document and type in the missing "ff" and "fi". This seems like a terribly specialized bug.

I didn't expect to preserve paragraph formatting or other niceties when going from PDF to NWP, but these bugs seem like weirdnesses beyond the pale of weird. Has anyone else run into this? Anyone found a solution? Exorcism, perhaps?
It might help diagnosis if you said how you "open an old PDF document" in NWP: do you open it with the "Open" command in the 'File' menu, or do you copy the text in the PDF and then paste it into an NWP document? If the latter, do you just use "Paste" or do you use "Paste Text Only"? Do any of those methods make a difference?

On the second problem, I suspect that the font you are using in NWP doesn't have glyphs for ligatures like 'ff' and 'fi'. The ligatures have code points of their own, entirely separate from those of 'f' and 'i', so, if I'm right, they aren't displaying because the font in use in NWP has no glyph at those code points. Try selecting the whole text and changing the font to one that does have the ligatures.
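
To illustrate what I mean by separate code points, here is a minimal Python sketch. It has nothing to do with NWP itself (the function name `expand_ligatures` is my own invention); it only shows that the 'fi' and 'ff' ligatures are single Unicode characters, and how text extracted from a PDF could be converted back to plain letters, assuming the ligatures arrived as the standard Unicode presentation forms:

```python
import unicodedata

# Latin ligatures from the Unicode "Alphabetic Presentation Forms" block,
# mapped to the plain letters they stand for.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}

def expand_ligatures(text: str) -> str:
    """Replace each ligature character with its plain-letter spelling."""
    for ligature, plain in LIGATURES.items():
        text = text.replace(ligature, plain)
    return text

sample = "\ufb01rst e\ufb00ect"           # "first effect" spelled with ligatures
print(unicodedata.name(sample[0]))       # the first character is one ligature, not 'f' + 'i'
print(len(sample), len(expand_ligatures(sample)))
print(expand_ligatures(sample))
```

An explicit table is used here rather than `unicodedata.normalize("NFKC", text)`, which would also expand the ligatures but rewrites many other characters as well (superscripts, full-width forms and so on).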

Mark
martin
Official Nisus Person
Posts: 5227
Joined: 2002-07-11 17:14:10
Location: San Diego, CA

Re: Strange things happen when opening a PDF document into N

Post by martin »

Mark's suggestion to try different ways of getting your PDF content into NWP is a good one. Opening the PDF in Preview and using copy-paste might prove more effective than opening the PDF directly in NWP. That's because opening a PDF file directly in NWP extracts only the text that the system (OS X) provides "for free" to all applications.

It's very nice of Apple to provide some PDF import to all apps, but the quality can be lacking. It really varies greatly depending on the PDF. The text from some PDFs comes through very nicely, while others leave a lot to be desired, especially PDFs created by a scan or OCR. I've also seen text encoding troubles depending on the fonts used in the PDF, or if the PDF was ever resaved by Apple's Preview. For example, adding just a simple red circle or comment to a PDF can disrupt some of the encoded characters. The text might appear correctly on screen, but the underlying character codes are no longer valid for operations like copy-paste.
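
If you want to check a batch of extracted text for this kind of damage before fixing it by hand, something like the rough Python sketch below could help. It is only a heuristic of my own and not anything NWP or OS X does: the function name `suspicious_lines` and the 25-character threshold are assumptions. It flags lines containing an implausibly long "word" (a hint that spaces were lost) and lines containing the Unicode replacement character or private-use characters (a hint that the PDF's character codes did not map to real text).

```python
import unicodedata

def suspicious_lines(text, max_word_len=25):
    """Return (line_number, reason) pairs for lines that look damaged.

    max_word_len is an arbitrary threshold; tune it for your documents.
    """
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        # A very long run without whitespace suggests the spaces were dropped.
        if any(len(word) > max_word_len for word in line.split()):
            findings.append((number, "very long word, spaces may be missing"))
        # U+FFFD or private-use ("Co") characters suggest a broken encoding.
        if any(ch == "\ufffd" or unicodedata.category(ch) == "Co" for ch in line):
            findings.append((number, "undecodable or private-use character"))
    return findings

extracted = (
    "A normal line of text.\n"
    "Thislinehaslostallofitsspacingbetweenwords here\n"
    "bro\ue000ken"
)
for number, reason in suspicious_lines(extracted):
    print(number, reason)
```

This only points at lines worth a look; it can't restore the missing spaces or characters.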