Actually, a bit more perusing of the result from the Nisus Macro Language Reference clearly shows that the disagreement is indeed due to the treatment of tables. Each cell in each table is its own text object, and when Nisus searches through all these objects the results come back in a highly jumbled order. This order is a type of "hash" ordering. That is, the text objects of the document are not in document order, but rather in an internal order that allows Nisus to work faster.
If it were truly important to get this "exactly right" you would have to first order all the words by document order, i.e., in such a way that all the words in a table were ordered between the last word before the table and the first word after the table. Depending on your needs, you might have to do the same for footnotes as well.
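To illustrate the reordering described above, here is a minimal sketch in Python rather than in the Nisus Macro Language. It assumes each search hit can be described by three hypothetical values that are not part of any Nisus API: `anchor` (the position in the main text where the hit, or the table containing it, sits), `cell` (0 for main text, 1 and up for table cells in reading order), and `offset` (the position inside that text object).

```python
from typing import NamedTuple, List


class Hit(NamedTuple):
    anchor: int   # hypothetical: position in the main text (for a cell: where its table sits)
    cell: int     # hypothetical: 0 for main text, 1..n for table cells in reading order
    offset: int   # position of the word inside its own text object
    word: str


def in_document_order(hits: List[Hit]) -> List[Hit]:
    """Sort jumbled hits so table words fall between the words around the table."""
    return sorted(hits, key=lambda h: (h.anchor, h.cell, h.offset))


# Hits arrive in an arbitrary internal order, as described in the post.
jumbled = [
    Hit(20, 0, 20, "after"),
    Hit(12, 2, 0, "cell2"),
    Hit(0, 0, 0, "before"),
    Hit(12, 1, 0, "cell1"),
]
ordered_words = [h.word for h in in_document_order(jumbled)]
```

The same composite key would work for footnotes, provided each footnote can be tied to the position of its reference in the main text.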
How to display Character Count without multiple clicks
Re: How to display Character Count without multiple clicks
Would it be getting greedy to ask if this Document Statistics macro could be extended to do Word/Character Count in all open documents?
It's fantastic that the macro now counts text within tables!
Japan Rich
Re: How to display Character Count without multiple clicks
I'm too lazy right now to work out any better interface. This piles all the info for all the files into a table in a new document.
Attachment: Document Statistics (All Open Docs).nwm (4.61 KiB)
philip
Re: How to display Character Count without multiple clicks
phspaelti wrote: The deviation where the macro selects 3 words instead of 2 is actually a "bug".
I'm very pleased to say that, with your help, I think I have now got exactly what I wanted.
I discovered that my "Lorem Ipsum" dummy text (the beginning of Conrad's Heart of Darkness), which I have always used for testing, contained several Acute Accents #180 [00B4], and they were, so it seems, responsible for the deviation I mentioned. The text I used contained, for example, "other´s" instead of "other's". The Statistics Palette counts "other´s" as 3 words!
I am glad that I asked, because otherwise I would never have noticed this.
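For anyone who wants to check a test text for the same problem, here is a small Python sketch (not Nisus macro code). It replaces an acute accent (U+00B4) with a plain apostrophe only when it sits between two word characters, as in "other´s". The function name is made up for this example.

```python
import re

ACUTE = "\u00b4"  # ACUTE ACCENT, the character found in the test text


def fix_acute_apostrophes(text: str) -> str:
    """Replace U+00B4 with ' when it stands between two word characters."""
    return re.sub(r"(?<=\w)\u00b4(?=\w)", "'", text)


sample = "the other\u00b4s boat"
fixed = fix_acute_apostrophes(sample)
remaining = fixed.count(ACUTE)
```

An accent that stands alone (for example, surrounded by spaces) is left untouched, so the replacement stays conservative.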
I added a few lines to your macro, and now it works perfectly for me.
Thanks again, Philip.
Code:
# This macro asks how many words should be selected from the beginning of the frontmost document
$my_choice = Prompt Options "How many words should I select from the beginning of the document?", "", "OK", "12000", "6500", "1500", "800", "650", "300", "150"
# Another way to ask the user how many words (s)he wants
#$my_choice = Prompt Input 'How many words do you want from the beginning of the document?', 'Enter the number you want…', 'OK', '6500'
$doc = Document.active
# Collect every run of word characters (regex \w+) in the document
$words = $doc.text.findAll @Text<\w+>, 'E'
if $words.count > $my_choice
    # Select the first $my_choice words
    $words6500 = $words.subarrayInRange Range.new(0,$my_choice)
    $doc.setSelection $words6500
else
    Prompt 'This document has only ' & $words.count & ' words'
End
# Turn the word selection into one continuous selection that runs from the
# start of the document to the end of the last selected word, and color it red
$loc1 = 1
Select End
$loc2 = Selection Location
$lengd = $loc2-$loc1
Set selection $loc1,$lengd
Red
Select End
Re: How to display Character Count without multiple clicks
phspaelti wrote: I'm too lazy right now to work out any better interface.
This is absolutely wonderful, Philip. Regarding the macro Document Statistics (All Open Docs):
These two lines at the end make it perhaps a bit easier for the eye.
Table:Fit to Contents
Table:Align Cells:Center
Re: How to display Character Count without multiple clicks
Many thanks! The table output works nicely with the two added lines.
I realize that what I meant was to SUM the total words in all open files.
It was easy enough to copy/paste into an Excel sheet to add up the output of words from all open files, but I wonder if a Nisus macro could also add up the words counted from multiple open files.
Apologies in advance, but I also realized a different problem that would be wonderful to have solved:
Is it possible to display a count of only English words (single-byte characters) when a document contains a table or tables with separate columns for both English and Japanese (double-byte characters)?
At the moment I need to select by hand the English column to get a count.
This can get laborious when a document has multiple tables and the adding to a total has to be done by hand.
I realize I have already asked way too much, but perhaps there are other people who can also benefit!
Japan Rich
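The summing step asked about above is simple in principle. As a rough illustration in Python (not a Nisus macro), the sketch below counts words per document with the same `\w+` pattern used earlier in the thread and then adds the counts up. The document names and texts are invented for the example.

```python
import re
from typing import Dict, Tuple


def word_count(text: str) -> int:
    """Count runs of word characters, like the \\w+ search in the macro."""
    return len(re.findall(r"\w+", text))


def total_words(docs: Dict[str, str]) -> Tuple[Dict[str, int], int]:
    """Return the per-document counts and their grand total."""
    per_doc = {name: word_count(text) for name, text in docs.items()}
    return per_doc, sum(per_doc.values())


# Invented stand-ins for the text of each open document.
open_docs = {"first.rtf": "one two", "second.rtf": "three four five"}
per_doc, total = total_words(open_docs)
```

Note that `\w+` treats an unbroken run of Japanese characters as a single word, so a count like this is only meaningful for the space-separated parts of a document.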
- martin
- Official Nisus Person
- Posts: 5230
- Joined: 2002-07-11 17:14:10
- Location: San Diego, CA
- Contact:
Re: How to display Character Count without multiple clicks
JapanRich wrote: Is it possible to display a count of only English words (single-byte characters) when a document contains a table or tables with separate columns for both English and Japanese (double-byte characters)?
At the moment I need to select by hand the English column to get a count.
Perhaps Philip will oblige, since he is so very generous in sharing his time and macro talents, but here's another idea: use two macros to accomplish this task. The first macro would select just Japanese or English text, and then Philip's macro will give you stats for the selection.
I don't know much about Japanese, but here's an attempt to provide a macro that selects just Japanese or English text. The regular expressions (regex) might need some adjusting. I tried to include all the Japanese character ranges (eg: Hiragana, Katakana, Han, etc) but I don't doubt that I missed something. The macro also treats some neutral characters (like numbers) next to Japanese text as though they were also Japanese.
I should also say there's no easy technical solution to select just English, since Latin characters form the basis for so many languages. There's no sure way for a macro to know the language of a word comprised of Latin characters, eg: "hello" and "hallo" use nearly the same characters but come from different languages. This could be resolved if your documents consistently have correctly applied language attributes, but instead this macro just gives the option to select all Japanese or all Non-Japanese.
I hope that helps!
Attachment: Select Japanese or Inverse.nwm (3.71 KiB)
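The idea of separating Japanese from non-Japanese text with character ranges can be sketched in Python (this is not the content of the attached macro). The ranges below are an assumption chosen for the example: the Hiragana block, the Katakana block, and the main CJK Unified Ideographs block. Real documents may need more (full-width forms, punctuation, extension blocks). The Latin pattern only recognizes unaccented ASCII letters and, as the post says, cannot tell English from any other language written in Latin letters.

```python
import re
from typing import List

# Assumed ranges: Hiragana block, Katakana block, CJK Unified Ideographs block.
JAPANESE_RUN = re.compile(r"[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF]+")
# ASCII-only words, allowing an internal apostrophe as in "don't".
LATIN_WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def japanese_runs(text: str) -> List[str]:
    """Return each unbroken run of characters from the assumed Japanese ranges."""
    return JAPANESE_RUN.findall(text)


def latin_word_count(text: str) -> int:
    """Count words made only of ASCII letters."""
    return len(LATIN_WORD.findall(text))


sample = "Hello world こんにちは 世界 don't"
runs = japanese_runs(sample)
latin_count = latin_word_count(sample)
```

Applied to a two-column table, counting with the Latin pattern alone would skip the Japanese column without any manual selection.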
Re: How to display Character Count without multiple clicks
Hello Martin,
I would have to say that this macro wouldn't make me very happy with what it selects. The main problem is that it is missing character U+30FC. (I also find the attempt to capture "nearby punctuation" not very successful.)
But this story with U+30FC is something I really don't get. Why is this character not included in the \p{Katakana} wildcard? Who decided this? Is this part of the Unicode specification? The character seems to be inside the Katakana block, which makes sense, since that is its only valid purpose.
NWP has the script blocks and for "Katakana" it tries to make up for the situation by providing (?:\p{Katakana}|(?<=\p{Katakana})\u30FC) which works well enough. Meanwhile for "Hiragana" it provides (?:\p{Hiragana}|(?<=\p{Hiragana})\u30FC) which doesn't really seem justified, since U+30FC isn't used for Hiragana. (Except maybe in unusual contexts like Manga. But such marginal contexts could be adequately covered by more technical solutions.) Of course I wouldn't be surprised if many Japanese use U+30FC for some idiotic purposes such as instead of Western hyphens, or for decoration, etc.
As it stands the \p{Katakana} wildcard is completely inadequate to capture actual Katakana.
Anyhow, here is the macro that I use to select Japanese. It's just one line, but has a lot of lines of explanation.
philip
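On the U+30FC question raised above: in the Unicode Character Database this character is named KATAKANA-HIRAGANA PROLONGED SOUND MARK and, to the best of my knowledge, carries Script=Common, with Hiragana and Katakana listed only under Script_Extensions. That would explain why a script-based \p{Katakana} skips it. The Python sketch below imitates the lookbehind workaround quoted in the post. Python's `re` module has no \p{...} classes, so the class `KATA` is an explicit approximation of the Katakana script within the Katakana block, not the engine's real definition.

```python
import re
import unicodedata

# Approximation of \p{Katakana} inside the Katakana block: U+30A0, U+30FB and
# U+30FC are left out because they are not assigned to the Katakana script.
KATA = r"[\u30A1-\u30FA\u30FD-\u30FF]"

# Katakana only: the long-vowel mark breaks every word that contains it.
plain = re.compile(KATA + "+")
# Workaround from the post: also accept U+30FC right after a Katakana character.
with_mark = re.compile(r"(?:" + KATA + r"|(?<=" + KATA + r")\u30FC)+")

word = "コーヒー"
plain_hits = plain.findall(word)
mark_hits = with_mark.findall(word)
mark_name = unicodedata.name("\u30fc")
```

With the plain class the word is split at each long-vowel mark, while the lookbehind version keeps it whole. A mark that directly follows another mark would still not match, which is one limit of this workaround.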
- martin
Re: How to display Character Count without multiple clicks
Thanks for sharing your Japanese text selection macro Philip. I'm sure your macro's Japanese character coverage is much better than my own macro's, as I have no knowledge of the language. I've incorporated your character ranges into a modified version of my original macro, in case anyone finds it useful.
Attachment: Select All Japanese or Inverse.nwm (3.72 KiB)