Converting MMD syntax to RTF

ptram
Posts: 280
Joined: 2007-10-21 14:59:09

Converting MMD syntax to RTF

Post by ptram »

Hi,

I wonder whether anyone here uses an external text editor with MultiMarkdown syntax (the one that marks italics with *asterisks* or _underscores_ and uses "#" as a heading marker). I do: since Apple has declined to release a netbook, a PC netbook with a full-screen editor is my most-used mobile word processor.

http://fletcherpenney.net/multimarkdown/

Because I use italics heavily, what I miss is a quick way to convert all the structural and formatting markup into actual formatting (and the reverse, although I need that less often). A macro would help a lot. Maybe Martin is listening and has some spare time...

Best regards,
Paolo
Kino
Posts: 400
Joined: 2008-05-17 04:02:32

Re: Converting MMD syntax to RTF

Post by Kino »

I hope these macros work for you.

Code:

# Enclose Italic text between * and *

$prefix = Cast to String '*'
$suffix = Cast to String '*'

Require Pro Version 1.3
$doc = Document.active
if $doc == undefined
	exit  # because there is no open document
end
$doc = $doc.copy  # duplicate the document and work on the copy

# gather selections in Italic
$sels = Array.new
foreach $text in $doc.allTexts
	if $text.length
		$i = 0
		while $i < $text.length
			$attr = $text.displayAttributesAtIndex $i
			$range = $text.rangeOfDisplayAttributesAtIndex $i
			if $attr.italic == true
				$sel = TextSelection.new $text, $range
				$sels.appendValue $sel
			end
			$i = $range.bound
		end
	end
end

# merge continuous selections
$i = $sels.count - 1
while $i > 0
	if $sels[$i].text.isSameObject $sels[$i-1].text
		if $sels[$i].location == $sels[$i-1].bound
			$sels[$i-1].length += $sels[$i].length
			$sels.removeValueAtIndex $i
		end
	end
	$i -= 1
end

# insert $prefix and $suffix
foreach $sel in reversed $sels
	$sel.text.insertAtIndex $sel.bound, $suffix
	$sel.text.insertAtIndex $sel.location, $prefix
end

Code:

# Convert *Italic text* to real Italic

$command = ':Format:Italic'
$doc = Document.active
if $doc == undefined
	exit  # because there is no open document
end
$doc = $doc.copy  # duplicate the document and work on the copy

$numFound = Replace All '\*([^*]+)\*', '\1', 'E'
# $numFound = Replace All '_([^_]+)_', '\1', 'E'  # use this if you prefer _Italic text_

if $numFound > 0
	$check = Menu State $command
	if $check != 1
		Menu $command
	end
end
As for header markers, I don't quite understand what they are or how you use them. Are they header text for each section? Do your documents already have section breaks?

If I'm not mistaken, attribute-sensitive matching is broken for the Find command (u option). It works for the Replace command (U option), but I don't like it because it is hard for me to distinguish "." (AnyCharacter) in italic from "." in regular style.

Edit (30 minutes later): No, attribute-sensitive find is not broken. It misbehaved only until I restarted Nisus Writer Pro; before the restart, Find All '.+', 'Eu' did not work.
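
For readers who want to check the italic pattern outside Nisus, here is a minimal Python sketch of the same idea. It is not part of Kino's macros: the function names are hypothetical, and it only models text as (text, is_italic) runs rather than touching a real document. It reuses the macro's expression `\*([^*]+)\*` and the `*` prefix/suffix.

```python
import re

# Same pattern as the macro's Replace All: text enclosed by two asterisks.
ITALIC = re.compile(r"\*([^*]+)\*")


def split_italic_runs(text):
    """Return (run_text, is_italic) pairs with the asterisks removed."""
    runs = []
    pos = 0
    for match in ITALIC.finditer(text):
        if match.start() > pos:
            runs.append((text[pos:match.start()], False))
        runs.append((match.group(1), True))
        pos = match.end()
    if pos < len(text):
        runs.append((text[pos:], False))
    return runs


def mark_italic_runs(runs, prefix="*", suffix="*"):
    """Reverse direction: wrap every italic run in prefix and suffix."""
    return "".join(
        prefix + run + suffix if italic else run for run, italic in runs
    )
```

Because `[^*]+` cannot cross another asterisk, each match covers exactly one marked span. For the `_underscore_` variant, the pattern would become `_([^_]+)_`, as in the commented-out line of the macro.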
ptram
Posts: 280
Joined: 2007-10-21 14:59:09

Re: Converting MMD syntax to RTF

Post by ptram »

Kino,

That's fantastic - thank you! After the changes needed to adapt them to the Italian localization, they work perfectly. Now I have a valuable tool for exchanging documents between Nisus and any text editor.

As for "headers", I should have written "headings", sorry. I guess it is clear, at this point, that they are the equivalent of heading levels in Nisus. So,

# = Heading 1
## = Heading 2

and so on.

Paolo
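
The heading convention described above (one "#" per heading level) can be sketched in Python as follows. This is only an illustration outside Nisus: the function names are hypothetical, and the "Heading N" style names are taken from the mapping in the post, not from any particular template.

```python
import re

# One or more "#" at the start of the line, optional blanks, then the title.
HEADING = re.compile(r"^(#+)[ \t]*(.*)$")


def parse_heading(line):
    """Return (level, title) for a heading line, or None for body text."""
    match = HEADING.match(line)
    if match is None:
        return None
    return len(match.group(1)), match.group(2)


def style_name_for(line, base="Heading"):
    """Map a heading line to a style name such as 'Heading 2'."""
    parsed = parse_heading(line)
    if parsed is None:
        return None
    return f"{base} {parsed[0]}"
```

The level is simply the number of leading "#" characters, so "#" gives Heading 1, "##" gives Heading 2, and so on.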
Kino
Posts: 400
Joined: 2008-05-17 04:02:32

Re: Converting MMD syntax to RTF

Post by Kino »

ptram wrote:
# = Heading 1
## = Heading 2

and so on.

Ah, I got it.

The macros below identify the TOC level of each style object by parsing the document's RTF code. This imposes a severe limitation on paragraph style names: names consisting only of 0-9, A-Z, a-z and spaces (e.g. "Heading 1", "Chapter", "Section") cause no problem, and some ASCII symbols are fine, but other symbols and all non-ASCII characters (e.g. accented letters) are not.

A while ago (after the release of the current 1.3), I sent feedback requesting a macro command that returns the TOC level of a style object. So hopefully, in a future version of Nisus Writer Pro, it will be possible to write macros for this kind of purpose more easily and without this limitation.

Code:

# Paragraph styles to Heading tags

$level = Cast to String '#'
$sep = Cast to String ' '

Require Pro Version 1.3
$doc = Document.active
if $doc == undefined
	exit  # because there is no open document
end
# $doc = $doc.copy  # duplicate the document and work on the copy

$paraStyles = Hash.new
foreach $style in $doc.paragraphStyles
	$paraStyles{$style.name} = $style
end
$TOCStyles = Array.new
$rtf = $doc.text
$rtf = Encode RTF $rtf
$rtf = Cast to String $rtf
$i = 1

while $i
	$exp = '(?<=\x5Ctcl' & $i
	$exp &= '\x20)[^;]+(?=;})'
	$found = $rtf.find $exp, 'E'
	if $found != undefined
		$TOCStyles.appendValue $paraStyles{$found.substring}
		$i += 1
	else
		$i = 0
	end
end

if ! $TOCStyles.count
	exit 'No style having TOC level found, exit...'
end

$sels = $doc.text.findAll '^(?=\p{Any})', 'E', '-am'
foreach $sel in reversed $sels
	$attr = $sel.text.attributesAtIndex $sel.location
	$i = $TOCStyles.indexOfValue $attr.paragraphStyle
	if $i > -1
		$tag = $level
		while $i > 0
			$tag &= $level
			$i -= 1
		end
		$tag &= $sep
		$sel.text.insertAtIndex $sel.location, $tag
	end
end

Code:

# Heading tags to Paragraph styles

Require Pro Version 1.3
$doc = Document.active
if $doc == undefined
	exit  # because there is no open document
end
# $doc = $doc.copy  # duplicate the document and work on the copy

$paraStyles = Hash.new
foreach $style in $doc.paragraphStyles
	$paraStyles{$style.name} = $style
end
$TOCStyles = Array.new
$rtf = $doc.text
$rtf = Encode RTF $rtf
$rtf = Cast to String $rtf
$i = 1

while $i
	$exp = '(?<=\x5Ctcl' & $i
	$exp &= '\x20)[^;]+(?=;})'
	$found = $rtf.find $exp, 'E'
	if $found != undefined
		$TOCStyles.appendValue $paraStyles{$found.substring}
		$i += 1
	else
		$i = 0
	end
end

if ! $TOCStyles.count
	exit 'No style having TOC level found, exit...'
end

$i = 0
while $i < $TOCStyles.count
	$sels = $doc.text.findAll "^##{$i}[\t\x20]*(?!#)", 'E'
	if $sels.count > 0
		Push Target Selection $sels
			$TOCStyles[$i].apply
		Pop Target Selection
		foreach $sel in reversed $sels
			$sel.text.deleteInRange $sel.range
		end
	end
	$i += 1
end
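
The TOC-level lookup used by both macros above can be tried outside Nisus with a short Python sketch. It builds the same expression as the macro (the text between `\tclN ` and `;}`) for N = 1, 2, ... and stops at the first level that is not found. The function name and the RTF fragment in the usage note are hypothetical; real RTF exported by Nisus may contain additional control words.

```python
import re


def toc_style_names(rtf):
    """Collect style names for TOC levels 1, 2, ... until a level is missing."""
    names = []
    level = 1
    while True:
        # Same shape as the macro's expression: text between "\tclN " and ";}".
        pattern = r"(?<=\\tcl" + str(level) + r" )[^;]+(?=;\})"
        match = re.search(pattern, rtf)
        if match is None:
            break
        names.append(match.group(0))
        level += 1
    return names
```

With an assumed stylesheet fragment such as `{\stylesheet{\s1\tcl1 Heading 1;}{\s2\tcl2 Heading 2;}{\s3 Body;}}`, the function returns `["Heading 1", "Heading 2"]`. The style-name limitation mentioned earlier presumably follows from this approach: a name that RTF stores in escaped form would come back escaped and would no longer match the style name used as the hash key.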
ptram
Posts: 280
Joined: 2007-10-21 14:59:09

Re: Converting MMD syntax to RTF

Post by ptram »

Kino,

These new macros work as fantastically as the other two. Thank you very much!

Paolo
jb
Posts: 92
Joined: 2007-11-09 15:27:25

Re: Converting MMD syntax to RTF

Post by jb »

These macros are very helpful. Thank you, Kino.

The only trouble I have is that the two Paragraph macros seem to rely on having TOC styles in place. That is, text marked with # and ## does not acquire the paragraph styles if they do not also have TOC styles.

For me, though, this just provokes another question.

I wonder whether it would be possible to produce a macro that could take marked-up ‘plain text’ and convert those markings into paragraph styles, without relying on TOC.
For example, if I’m writing in an environment that doesn’t transmit paragraph styles when the text is brought into Nisus (formatting can be preserved, but not styles, I think), it might be possible to get around this with a macro. No?

Perhaps if I knew more about MMD I’d know the answer here. I know it handles headings (as we see) but I don’t know if it can do the same with, say, block quotation. Most useful, I think, would be some way to mark a paragraph in the plain text version and then have a macro that could convert those markings into whatever paragraph style one wants. This would convert plain text into a Nisus document with styles. Or am I dreaming?
Kino
Posts: 400
Joined: 2008-05-17 04:02:32

Re: Converting MMD syntax to RTF

Post by Kino »

jb wrote:I wonder whether it would be possible to produce a macro that could take marked-up ‘plain text’ and convert those markings into paragraph styles, without relying on TOC.
It is possible to write such a macro but I’m afraid it would not make the operation easier. A macro cannot create a style object from nothing and it cannot apply a style that the document does not have. A solution would be to embed the style in the macro file and paste it. For example, you can use the following macro to add "Block Quote" style to the frontmost document.

Code:

# You have to apply "Block Quote" style on
#     Insert Attributed Text "\n***"
# near the end of the macro.

$doc = Document.active
if $doc == undefined  # if there is no open document
	exit
end

# check if the frontmost document has already "Block Quote" style
$hasBlockQuoteStyle = false
foreach $style in $doc.paragraphStyles
	if $style.name == 'Block Quote'
		$hasBlockQuoteStyle = true
		break
	end
end

if $hasBlockQuoteStyle == false
	Select Document End
	Insert Attributed Text "\n***"
	Delete
end
But I think you need many other styles in addition to "Block Quote" when you format a document created by another application. Then, isn’t it easier to import styles via Insert → New Style → Import from Style Library... or by copying and pasting them manually from another Nisus document before starting formatting it? Most likely the imported styles include "Block Quote" and styles having TOC settings. Also using Style Library, you can easily keep the consistency of your collection of styles.
jb wrote:Perhaps if I knew more about MMD I’d know the answer here. I know it handles headings (as we see) but I don’t know if it can do the same with, say, block quotation. Most useful, I think, would be some way to mark a paragraph in the plain text version and then have a macro that could convert those markings into whatever paragraph style one wants. This would convert plain text into a Nisus document with styles. Or am I dreaming?
I don’t know about MMD either. However, it is possible to write such a macro regardless of the markup format/syntax. For example, if you use <StyleName> ... </StyleName> (e.g. <Block Quote> ... </Block Quote>, <Emphatic> ... </Emphatic>, etc.), the following macro will do the job, assuming you have already imported the styles corresponding to those tags.

Code:

Select Document Start
while Find Next '<([^>]+)>\p{Any}+?</\1>', 'E-W$'
	Menu $1
end

Replace All '</?[^>]+>', '', 'E'  # remove all tags
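
As an illustration outside Nisus, the two expressions in this macro can be exercised with a small Python sketch. The function names are hypothetical, and the sketch only reports which style name wraps which text; applying a style is something only the macro can do inside a document.

```python
import re

# <StyleName> ... </StyleName>, where \1 forces the closing tag to match.
TAGGED = re.compile(r"<([^>]+)>(.+?)</\1>", re.DOTALL)


def styled_spans(text):
    """Return (style_name, content) pairs for each tagged span."""
    return [(m.group(1), m.group(2)) for m in TAGGED.finditer(text)]


def strip_tags(text):
    """Remove every opening and closing tag, as the final Replace All does."""
    return re.sub(r"</?[^>]+>", "", text)
```

Two caveats apply to this sketch: matches do not overlap, so a tag nested inside another tagged span is not reported separately, and `strip_tags` removes anything written between angle brackets, whether or not it names a style.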
ptram
Posts: 280
Joined: 2007-10-21 14:59:09

Re: Converting MMD syntax to RTF

Post by ptram »

My preferred solution for TOC styles is to open a template and copy all its styles to the converted document. So, my entire process is this:

1. Apply the "Convert _italic_ to real italic" macro to the original text document. This command also opens a copy of the document, which is the one I work on from then on.
2. Open the preferred template, copy all its styles from the Style Sheet view, and paste them into the work document.
3. Apply the "Convert # Headings to Paragraph styles" macro to the work document. (I didn't enable this macro to create a copy of the document, so that windows do not proliferate.)
4. Save the work document in RTF.

Paolo
jb
Posts: 92
Joined: 2007-11-09 15:27:25

Re: Converting MMD syntax to RTF

Post by jb »

Kino wrote:... A macro cannot create a style object from nothing and it cannot apply a style that the document does not have.
Sorry. This is the part I missed. Yes, it is much easier to use the Style Library (or copy/paste into a new doc based on a template).
Kino wrote: I don’t know about MMD either. However, it is possible to write such a macro regardless of the markup format/syntax. For example, if you use <StyleName> ... </StyleName> (e.g. <Block Quote> ... </Block Quote>, <Emphatic> ... </Emphatic>, etc.), the following macro will do the job, assuming you have already imported the styles corresponding to those tags.

Code:

Select Document Start
while Find Next '<([^>]+)>\p{Any}+?</\1>', 'E-W$'
	Menu $1
end

Replace All '</?[^>]+>', '', 'E'  # remove all tags

This is great. Many thanks.

Now....
Since this last macro is short and seems relatively simple—for you, anyway (!)—I dare to ask:
Is it possible to do this in reverse?
Kino
Posts: 400
Joined: 2008-05-17 04:02:32

Re: Converting MMD syntax to RTF

Post by Kino »

jb wrote:Is it possible to do this in reverse?
Yeah, we can do something like this.

Code:

$stylesToIgnore = Hash.new 'Note Reference', 1, 'Note Reference in Note', 1, 'Comment', 1, 'Table Cell', 1

$doc = Document.active
if $doc == undefined  # if no document is open...
	exit
end


# Insert tags for Paragraph styles

$sels = $doc.text.findAll '^[^\f]+$', 'En', '-am'
foreach $sel in reversed $sels
	$attr = $sel.text.attributesAtIndex $sel.location
	$para = $attr.paragraphStyleName
	if $para != ''
		if $stylesToIgnore{$para} == undefined
			$openingTag = Cast to String "<$para>"
			$closingTag = Cast to String "</$para>"
			$sel.text.insertAtIndex $sel.bound, $closingTag
			$sel.text.insertAtIndex $sel.location, $openingTag
		end
	end
end


# Insert tags for Character styles

$charForSels = Hash.new
foreach $text in $doc.allTexts
	if $text.length
		$i = 0
		while $i < $text.length
			$attr = $text.attributesAtIndex $i
			$range = $text.rangeOfAttributesAtIndex $i
			$char = $attr.characterStyleName
			if $char != ''
				if $stylesToIgnore{$char} == undefined
					$sel = TextSelection.new $text, $range
					$charForSels{$sel} = $char
				end
			end
			$i = $range.bound
		end
	end
end

$sels = $charForSels.keys
$sels.sort

foreach $sel in reversed $sels
	$char = $charForSels{$sel}
	$openingTag = Cast to String "<$char>"
	$closingTag = Cast to String "</$char>"
	$sel.text.insertAtIndex $sel.bound, $closingTag
	$sel.text.insertAtIndex $sel.location, $openingTag
end
If you want the macro to ignore the Normal style, add it at the end of the very first command, so that it reads:

Code:

$stylesToIgnore = Hash.new 'Note Reference', 1, 'Note Reference in Note', 1, 'Comment', 1, 'Table Cell', 1, 'Normal', 1
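
The role of the ignore list can be shown with a small Python sketch, again outside Nisus. It assumes the document has already been reduced to (text, style_name) runs, which is a simplification of what the macro gathers from the document; `tag_runs` is a hypothetical name, and the default set copies the style names from the macro's first line.

```python
# Style names copied from the macro's $stylesToIgnore hash.
STYLES_TO_IGNORE = frozenset(
    {"Note Reference", "Note Reference in Note", "Comment", "Table Cell"}
)


def tag_runs(runs, ignore=STYLES_TO_IGNORE):
    """Wrap each styled run in <Style>...</Style> unless the style is ignored.

    runs is a list of (text, style_name) pairs; an empty style_name
    means the run has no style and is copied unchanged.
    """
    parts = []
    for text, style in runs:
        if style and style not in ignore:
            parts.append(f"<{style}>{text}</{style}>")
        else:
            parts.append(text)
    return "".join(parts)
```

Adding 'Normal' to the set, for example with `STYLES_TO_IGNORE | {"Normal"}`, has the same effect as extending the hash in the macro's first command: runs in that style are left untagged.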
jb
Posts: 92
Joined: 2007-11-09 15:27:25

Re: Converting MMD syntax to RTF

Post by jb »

Kino, You are a Wizard.
Thank you.
ptram
Posts: 280
Joined: 2007-10-21 14:59:09

Re: Converting MMD syntax to RTF

Post by ptram »

Kino,

I found a small problem when using the "Paragraph styles to Heading tags" with numbered headings (Nidified style). It just seems not to work.

Not a major problem, since I can simply go to the Style Sheet view, select all the affected heading styles, remove the numbering style, and run the macro.

Paolo
Kino
Posts: 400
Joined: 2008-05-17 04:02:32

Re: Converting MMD syntax to RTF

Post by Kino »

ptram wrote:I found a small problem when using the "Paragraph styles to Heading tags" with numbered headings (Nidified style). It just seems not to work.
The macro works only for paragraph styles associated with TOC levels. I don’t know what “Nidified style” is, but perhaps it is a kind of numbered list? Unfortunately, no macro command for getting the level of a list item is available. Or is the problem something else? If so, please post a sample document (zipped), randomized with Martin’s Redact macro if necessary.
http://nisus.com/files/GetMacroRedirect.php?file=Redact
ptram
Posts: 280
Joined: 2007-10-21 14:59:09

Re: Converting MMD syntax to RTF

Post by ptram »

Kino,

I sent you a personal message. I suspect it is some problem with overlapping between numbering styles and TOC levels.

Paolo
Kino
Posts: 400
Joined: 2008-05-17 04:02:32

Re: Converting MMD syntax to RTF

Post by Kino »

Thanks a lot for the sample file. Please test this one which seems to work fine with it.

Code:

# Paragraph styles to Heading tags (rev. 1)

$level = Cast to String '#'
$sep = Cast to String ' '

Require Pro Version 1.3
$doc = Document.active
if $doc == undefined
	exit  # because there is no open document
end

# Uncomment the line below by removing "#" if you want the macro to duplicate the document and work on a copy
# $doc = $doc.copy

# Make list item ordinary text
Select Document End
while Select Previous List Item
	$sel = $doc.textSelection
	$sel.text.replaceInRange $sel.range, $sel.substring
end

# Get all paragraph style objects in the document
$paraStyles = Hash.new
foreach $style in $doc.paragraphStyles
	$paraStyles{$style.name} = $style
end

$TOCStyles = Array.new
$rtf = $doc.text
$rtf = Encode RTF $rtf
$rtf = Cast to String $rtf
$i = 1

# Get paragraph styles associated with TOC levels
while $i
	$exp = '(?<=\x5Ctcl' & $i
	$exp &= '\x20)[^;]+(?=;})'
	$found = $rtf.find $exp, 'E-i'
	if $found != undefined
		$name = $found.substring
		$name.findAndReplace '^\x5Cls\d+\x20', '', 'E-i' # remove list item tag if any
		$TOCStyles.appendValue $paraStyles{$name}
		$i += 1
	else
		$i = 0
	end
end

if ! $TOCStyles.count
	exit 'No style having TOC level found, exit...'
end

# Get TextSelection objects for all paragraph starts
$sels = $doc.text.findAll '^(?=\p{Any})', 'E', '-am'

# Insert Heading tags
foreach $sel in reversed $sels
	$attr = $sel.text.attributesAtIndex $sel.location
	$i = $TOCStyles.indexOfValue $attr.paragraphStyle
	if $i > -1
		$tag = $level
		while $i > 0
			$tag &= $level
			$i -= 1
		end
		$tag &= $sep
		$sel.text.insertAtIndex $sel.location, $tag
	end
end