PDF anomaly with Chinese

xiamenese
Posts: 543
Joined: 2006-12-08 00:46:44
Location: London or Exeter, UK

Post by xiamenese »

An anomaly that came to light through a post on the Scrivener forum. A Chinese user found s/he couldn't copy and paste Chinese text from a PDF compiled from Scrivener. I did a bit of detective work on it and found that this was true when running Catalina 10.15.7, but not when running Big Sur 11.2.2. So I experimented further and found that the same is true for Chinese printed to PDF by NWP under 10.15.7. In other words, this is an Apple bug that they have largely solved with Big Sur. I'm posting this here so anyone using other CJK or non-Roman languages on 10.15.7 (or perhaps earlier versions of macOS) can check whether the same is true for them (if it is of any relevance!).

To this end I attach a zip containing: (1) pdf_test.rtf; (2) pdf_test.pdf, a PDF printed from NWP under 10.15.7; (3) pdf_test_2.pdf, a short file printed from NWP after import from DOCX, to see if that made any difference; (4) pdf_test_BS.pdf, the longer text printed from NWP running under 11.2.2.

If you open the three PDFs in Preview, select and copy the text and paste it into a TextEdit file, the first two will show there is something there, but the Chinese is not visible. The third (11.2.2) shows the Chinese text, though there seems to be a problem with the font in the heading: pasted from the 10.15.7 PDFs the heading is not displayed, and pasted from the 11.2.2 PDF my version of TextEdit doesn't recognise the font! You can try it yourself, if you're interested, using the RTF.
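
For anyone who would rather not judge the pasted result by eye, here is a minimal Python sketch of the same check. It assumes you have already copied the text out of Preview and put it into a string; the function name summarise_pasted_text and the choice of character categories are only my illustration, not anything provided by Preview, TextEdit or NWP. It counts how many characters are CJK ideographs and how many are control, format, private-use or unassigned code points, which often display as nothing at all.

```python
import unicodedata

# Main CJK ideograph blocks (assumption: good enough for a quick check,
# not an exhaustive list of every CJK-related block).
CJK_RANGES = (
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
    (0x20000, 0x2FFFF),  # Supplementary Ideographic Plane
)

# Unicode general categories that usually have no visible glyph:
# control, format, private use, unassigned.
INVISIBLE_CATEGORIES = {"Cc", "Cf", "Co", "Cn"}


def is_cjk(ch: str) -> bool:
    """True if the character falls in one of the ranges above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)


def summarise_pasted_text(text: str) -> dict:
    """Hypothetical helper: count CJK, invisible and other characters.

    Whitespace is skipped so line breaks do not count as invisible.
    """
    counts = {"cjk": 0, "invisible": 0, "other": 0}
    for ch in text:
        if ch.isspace():
            continue
        if is_cjk(ch):
            counts["cjk"] += 1
        elif unicodedata.category(ch) in INVISIBLE_CATEGORIES:
            counts["invisible"] += 1
        else:
            counts["other"] += 1
    return counts
```

If the text pasted from a PDF gives a cjk count of zero and a sizeable invisible count, that would match what I see with the 10.15.7 files; a healthy cjk count would match the 11.2.2 one. Other kinds of breakage (such as pasting the wrong visible characters) would need a comparison against the original RTF instead.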

I think this will be true of many other programs based on the Apple TextKit.
PDF_trial.zip
:)

Mark
Vanceone
Posts: 211
Joined: 2013-05-03 07:06:31

Post by Vanceone »

Interesting. I'm on Mojave and all of your results are true under that OS; copying and pasting from your PDFs shows that only your last one, made under 11.2.2, will copy any Chinese text.

When I Export to PDF from Nisus (latest version) under Mojave I get a PDF of the Chinese, but copying and pasting from it results in a different form of gibberish from what you have. So it's broken in Mojave too, just differently broken. Glad to see Apple may have fixed it in Big Sur.
xiamenese
Posts: 543
Joined: 2006-12-08 00:46:44
Location: London or Exeter, UK

Post by xiamenese »

Mmm. It’s a really weird one. The OP in the Scrivener forum is working on a thesis/dissertation (not sure which as I believe US English reverses the usage in comparison with UK English, so as s/he is Chinese …) so presumably will be sending PDFs of papers from which fellow researchers may want to quote!

Thanks for confirming. My M1 MBA is a replacement for my sadly defunct 17" MBP running High Sierra. :)

Mark
martin
Official Nisus Person
Posts: 5227
Joined: 2002-07-11 17:14:10
Location: San Diego, CA

Post by martin »

Thanks for sharing the wisdom of your experiences, Mark!

You're right that this is a long-standing Apple bug that affects certain fonts and languages when generating PDF data. Trying to copy (or find) text in the PDF does not work properly because the underlying text was incorrectly encoded. It's a nefarious bug because the PDF looks just fine on screen and in printouts.

I'm glad to hear that Apple finally made efforts to fix this in Big Sur. We reported the problem to them ages ago.
xiamenese
Posts: 543
Joined: 2006-12-08 00:46:44
Location: London or Exeter, UK

Post by xiamenese »

Thank you Martin.

:)

Mark