Why does the Split object surround everything with quotation marks?

Get help using and writing Nisus Writer Pro macros.
waltzmn
Posts: 39
Joined: 2013-05-05 12:52:00

Why does the Split object surround everything with quotation marks?

Post by waltzmn » 2019-12-23 07:03:25

Good people,

I'm working on a macro to take data from one CSV file, reformat it, and export it as another CSV file. And learning how much I didn't know about the macro language while I'm at it. :-)

To load the data from the first CSV file, I'm using the .split command:

Code:

$TheFields = $MyCurrentRecord.split(',')

This takes a line of data delimited by commas and turns it into a set of items in $TheFields.

The relevant code isn't very good, but I'll post it anyway so you can see the context:

Code:

Document.setActive $TempDoc
Select Document Start		# Make sure we start splitting at the beginning of the document!

# Now start running through the paragraphs

While Select Next Paragraph
	# There IS a next paragraph to select
	Copy
	$MyCurrentRecord = Read Clipboard
	$TheFields = $MyCurrentRecord.split(',')
	Document.setActive $FinalDoc
	Type text $TheFields
	Type newline
	Document.setActive $TempDoc
End

I'm sure people can tell me a better way than the Copy and Read Clipboard steps (and I will accept help :-)), but that's merely inefficient. What is not working is the .split step. It appears to split the fields correctly but then put quote marks around them. So, for example, if I started with this record in $TempDoc,

Field1,Field2,"Field 3. with quotes",Field 4

What I get out in $FinalDoc is

"Field1","Field2",""Field 3. with quotes"","Field 4"

I could theoretically live with the quotes around Field1, Field2, and Field 4 -- it's valid CSV -- but not the excess quotes around Field 3. I could surely figure out some replace commands to fix this, but that strikes me as non-optimal. Is there a way to make the split command NOT put quotes around the fields? Or is this just a quirk of using the Type command to list $TheFields?

phspaelti
Posts: 1060
Joined: 2007-02-07 00:58:12
Location: Japan

Re: Why does the Split object surround everything with quotation marks?

Post by phspaelti » 2019-12-23 09:04:59

Hello Bob,
Just to answer your immediate question: it isn't the .split command that is putting quotes around things, it's the Type text command. To put it more accurately, Type text creates a string context. Your variable $TheFields is an array, so in a string context it outputs the string representation of the array, which puts quotes around each value and separates the values with commas. If you want to output the array as comma-separated values without quotes, you can use the join command.

Code:

Type text $TheFields.join(',')

But if you're doing this, then I wonder why you are splitting them in the first place.
You could, of course, join the values back together with other separators, e.g., a space, a return, a tab, etc. Alternatively you might process the values of the array one at a time in some way. Not sure what you are trying to achieve here.
philip
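The coercion described above can be mimicked outside Nisus. The sketch below is Python, not Nisus macro code, and `naive_array_repr` is a hypothetical helper that models the behavior as described in this thread: wrap every element in double quotes, separate with commas, and do not escape quotes already inside an element. Under that assumption it reproduces the doubled quotes around Field 3 exactly, while a plain join gives the original line back.

```python
def naive_array_repr(items):
    """Hypothetical model of the array-to-string coercion described above:
    each element is wrapped in double quotes and the elements are separated
    by commas. Quotes already inside an element are NOT escaped, which is
    why a field that already had quotes ends up with doubled ones."""
    return ",".join('"' + item + '"' for item in items)


record = 'Field1,Field2,"Field 3. with quotes",Field 4'
fields = record.split(",")  # plain split, like $MyCurrentRecord.split(',')

# Roughly what "Type text $TheFields" produced in the original post
print(naive_array_repr(fields))
# What "Type text $TheFields.join(',')" produces: the original line again
print(",".join(fields))
```

Because `record` contains no commas inside its quoted field, splitting and re-joining is lossless here; that stops being true for general CSV, as discussed later in the thread.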


Re: Why does the Split object surround everything with quotation marks?

Post by phspaelti » 2019-12-23 09:24:37

Now let's address the efficiency issue. Your current code keeps switching back and forth between two documents using the GUI, and it collects the paragraph info one paragraph at a time. A much more efficient method is to locate the paragraphs in $TempDoc first, then assemble the info, then switch to $FinalDoc and output all the info in one go.
For this purpose I'm going to ignore what you do with the paragraphs in the middle, and just focus on the above steps:

Code:

# Get all the (non-empty) paragraphs in $TempDoc
# NB: the 'paragraphs' created by the find command are text selection objects
$paragraphs = $TempDoc.find '^.+$', 'Ea'

# Process the paragraphs one at a time and save the results in an output array
$output = Array.new
foreach $paragraph in $paragraphs
    $paraText = $paragraph.subtext
    # do stuff with the paragraph text here
    
    # then put the result in the output array
    $output.push $paraText
end

# Output the processed paragraphs in $FinalDoc at the end
$FinalDoc.insertAtIndex $FinalDoc.text.length, $output.join("\n")

Obviously there are many ways to tailor each of these steps, depending on what you are trying to do. For example, if $FinalDoc does not exist yet, you can just create it at the same time, like this:

Code:

# Output the processed paragraphs in a new document
Document.newWithText $output.join("\n")
philip
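For readers more at home in a general-purpose language, here is the same gather-then-emit shape as a Python sketch. It is only an analogue: `re.finditer` with `re.MULTILINE` stands in for `$TempDoc.find '^.+$', 'Ea'`, a plain string stands in for the document, and the "do stuff" step is a placeholder that leaves each paragraph unchanged.

```python
import re


def process_paragraphs(source_text):
    # Find every non-empty paragraph in one pass; "^.+$" cannot match an
    # empty line because ".+" needs at least one character.
    paragraphs = [m.group(0) for m in re.finditer(r"^.+$", source_text, re.MULTILINE)]

    output = []
    for para_text in paragraphs:
        # do stuff with the paragraph text here (placeholder: unchanged)
        output.append(para_text)

    # Emit everything at once, like $output.join("\n")
    return "\n".join(output)


temp_doc = "a,b,c\n\nd,e,f\n"
final_doc = process_paragraphs(temp_doc)
print(final_doc)
```

The point of the design is the same as in the Nisus version: one search, one loop over plain values, and a single write at the end instead of switching documents for every record.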


Re: Why does the Split object surround everything with quotation marks?

Post by phspaelti » 2019-12-23 09:54:23

And one more point: You mention "valid CSV". If the stuff you are trying to split into fields is CSV, note that the .split command will not give you the correct result in the general case, since fields that contain commas are protected by quotes. But I'm sure you know that already. So if you want to split CSV, you may want to use a different method altogether. The most efficient approach in Nisus (since the macro language is not really procedural) is to use the .find command on the relevant text object. This again creates an array of text selections from which you can get the substrings (or subtexts if you prefer, though CSV is usually unformatted). If you write the find expression cleverly, you can also get rid of the protective quotes at the same time.
philip
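As an illustration of the kind of "clever" find expression meant here, the following Python sketch matches one CSV field per match and captures quoted fields without their protective quotes, turning a doubled `""` back into a single quote. The pattern is written for Python's `re` module; whether the same expression works unchanged in Nisus's find is an assumption that has not been checked.

```python
import re

# Each match is one field, preceded by start-of-line or a comma.
# Group 1: contents of a quoted field (outer quotes excluded, "" = literal quote).
# Group 2: an unquoted field (any run of non-commas, possibly empty).
FIELD_RE = re.compile(r'(?:^|,)(?:"((?:[^"]|"")*)"|([^,]*))')


def split_csv_line(line):
    fields = []
    for m in FIELD_RE.finditer(line):
        if m.group(1) is not None:
            # Quoted field: drop the protective quotes, unescape doubled quotes
            fields.append(m.group(1).replace('""', '"'))
        else:
            fields.append(m.group(2))
    return fields


print(split_csv_line('Field1,Field2,"Field 3, with a comma",Field 4'))
```

Unlike a plain split on commas, the comma inside the quoted third field stays part of that field.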


Re: Why does the Split object surround everything with quotation marks?

Post by waltzmn » 2019-12-23 12:01:43

Many thanks for all the help!

On 2019-12-23 12:04:59, phspaelti wrote:
But if you're doing this, then I wonder why you are splitting them in the first place.

To explain why I was writing code that is, frankly, silly (reading a set of fields then writing them again)... I don't intend to do that. :-) This is the initial stage of what is going to be a very long macro. (It probably should have been done in Perl, but we decided not to, for complicated reasons.) When it is done, it will take the data from one CSV file and arrange it into another CSV file with different fields. If it really matters, the input data is a "household": one or more people all listed in one record. We have to take all the people in each household and create a record for each. What starts as one record in the input file might be as many as six in the output file!
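The household-to-person step might look roughly like this, sketched in Python rather than Nisus macro code. The field layout is entirely hypothetical (two shared household fields followed by up to six first-name/last-name pairs) and the sample data is made up; the real input file will differ, so `SHARED_FIELDS` and `PERSON_WIDTH` are placeholders to adjust.

```python
# Hypothetical layout: the first SHARED_FIELDS values describe the household,
# followed by repeated groups of PERSON_WIDTH values, one group per person.
SHARED_FIELDS = 2
PERSON_WIDTH = 2


def explode_household(fields):
    shared = fields[:SHARED_FIELDS]
    people = fields[SHARED_FIELDS:]
    records = []
    for i in range(0, len(people), PERSON_WIDTH):
        person = people[i:i + PERSON_WIDTH]
        # Skip empty person slots so a household yields only real people
        if any(value.strip() for value in person):
            records.append(shared + person)
    return records


household = ["H17", "12 Elm St", "Ann", "Lee", "Bo", "Lee", "", ""]
for rec in explode_household(household):
    print(",".join(rec))
```

Each output record repeats the shared household fields, so one input line becomes one line per person.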

So what I'm really trying to do is get the data from the first CSV file into an array, so that I can test various fields and write the proper output CSV. I'm still on the first part of that, trying to get the data into an array. So your answer is very helpful; the .split was working, and I was misinterpreting it.

On 2019-12-23 09:54:23, phspaelti wrote:
And one more point: You mention "valid CSV". If the stuff you are trying to split into fields is CSV you might note that the .split command will not give you the correct result in the general case, since quoted fields will be used to protect commas within fields. But I'm sure you know that already.
That problem I'd already taken care of; the commas inside quotes have been changed to another character. So the only commas in the input string are those that separate the fields in the CSV. Regular expressions I can do; my problem is figuring out the NWP macro language without a complete reference for people who learned the wrong sort of programming. :-)
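The pre-processing described here (swapping commas inside quotes for another character so that a plain split is safe) can be sketched in Python as a small loop that tracks whether it is currently inside quotes. This is an illustrative analogue, not the poster's actual code; the placeholder character `\x1f` is an assumption, and any character that never occurs in the data would do. The fields keep their protective quotes.

```python
PLACEHOLDER = "\x1f"  # hypothetical: any character guaranteed absent from the data


def protect_quoted_commas(line):
    out = []
    in_quotes = False
    for ch in line:
        if ch == '"':
            # A doubled "" toggles twice, so escaped quotes leave the state unchanged
            in_quotes = not in_quotes
        if ch == "," and in_quotes:
            out.append(PLACEHOLDER)
        else:
            out.append(ch)
    return "".join(out)


def split_protected(line):
    # After protection, every remaining comma is a field separator
    fields = protect_quoted_commas(line).split(",")
    # Optionally restore the original commas inside each field
    return [f.replace(PLACEHOLDER, ",") for f in fields]


print(split_protected('Field1,"Field 2, with comma",Field 3'))
```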
If you write the find expression cleverly you can also get rid of the protective quotes at the same time.
I'm working on it. :-) Some other parts of my code are already better than they would have been because of your help. Some of your other suggestions here use tricks I hadn't learned yet -- e.g. using join instead of type -- but you have given me much to work with. Again, thanks!
