=head1 NAME MKDoc::Text::Structured - Another Text to HTML module =head1 SYNOPSIS my $text = some_structured_text(); my $html = MKDoc::Text::Structured::process ($text); =head1 SUMMARY L is a library which allows simple syntaxic text construct to be turned into HTML. These constructs are the ones you would be using when writing a text email or newsgroup message. L follows the KISS philosophy. Comparing with similar modules which try to implement as many HTML constructs as possible, this module is incredibly conservative. =cut package MKDoc::Text::Structured; use MKDoc::Text::Structured::Factory; use strict; use warnings; our $Text = ''; our $VERSION = 0.83; sub process { my $text = shift; # Mac + DOS carriage returns -> bang! $text =~ s/\r\n/\n/gs; $text =~ s/\r/\n/gs; # Trailing spaces -> fizzle! # (except when the line is made only of trailing spaces...) $text = join "\n", map { chomp(); s/\s+$// unless (/^\s*$/); $_; } split /\n/, $text; my @lines = split /\n/, $text; my @result = (); my $current = undef; while (scalar @lines) { my $line = shift (@lines); $current ||= MKDoc::Text::Structured::Factory->new ($line); $current || next; if ($current->is_ok ($line)) { $current->add_line ($line); } else { push @result, $current->process(); unshift (@lines, $line); $current = undef; } } push @result, $current->process() if ($current); return join "\n", @result; } 1; __END__ =head1 Block level elements =head2 P Paragraphs are defined by blocks of text separated by one or more empty lines. The text: This is a paragraph, until it meets an empty line. This is another paragraph. Would become:

This is a paragraph, until it meets an empty line.

This is another paragraph.

=head2 H1, H2, H3 Headlines are really just like a paragraph, except that they have the following syntax: The text: ========== Headline 1 ========== Headline 2 ========== Headline 3 ---------- Would become:

Headline 1

Headline 2

Headline 3

The advantage in treating headlines just like paragraph is that multi-line headlines are no problem. Also, it means you can use *strong* and _emphasized_ within a headline (see STRONG and EM sections). =head2 PRE Pre-formatted text looks like a paragraph, except that it must be indented with at least one space character. The text: This is a paragraph, until it meets an empty line. But this is pre-formatted text. Hey Hey Ho Ho! This is another paragraph. Would become:

This is a paragraph, until it meets an empty line.

But this is pre-formatted text.
  Hey  Hey Ho  Ho!

This is another paragraph.

Again, you can use *strong* and _emphasized_ within pre-formatted text (see STRONG and EM sections). =head1 Inline Elements =head2 STRONG The text: This is *strong text* Would become:

This is strong text

Note 1: The star character will act as a 'strong' marker only when: - The "opening" star is preceded by whitespace or carriage return, - The "closing" star is followed by whitespace or carriage return, or punctuation immediately followed by whitespace or carriage return. In other words, you can write 3*3*2 = 18 safely. The module tries to follow the DWIM ("Do What I Mean") philosophy as much as possible. Note 2: This can only work within one block level element. It will not work across paragraphs or lists (See UL, LI and OL, LI sections). Example 1: * Hello, *I will not * be bold* * but * *I will be* Example 2: This is a paragraph. *Nothing in this paragraph is going to be bold. Nor in this one*. =head2 EM The text: This is _emphasized text_ Would become:

This is emphasized text

Same notes as for bold / strong text also applied for emphasized text. =head2 Entity substitution Characters that would otherwise be interpreted as XML are encoded. i.e. &, < and > become & < and > Additionally some standard typed versions of special characters are substituted with a richer and better-looking HTML entity: -- surrounded by whitespace becomes — - surrounded by whitespace becomes – ... becomes … (tm) becomes ™ (r) becomes ® (c) becomes © x between numbers becomes × '' surrounding text becomes ‘ ’ "" surrounding text becomes “ ” =head1 Nested Structures =head2 BLOCKQUOTE Quoted text is text that starts with a 'greater than' character and followed by a space on each line. > > Hey, that's pretty cool! > Well, sort-of I think it's pretty cool... Would become:

Hey, that's pretty cool!

Well, sort-of

I think it's pretty cool...

=head2 UL, LI Ordered lists and unordered lists can be constructed and nested: The text: * An item * Another item * Headlines work too ================== I can write *paragraphs within lists*. And even _pre-formatted text_! - Also, I can have sub-lists - That's no problem - Notice that '*' and '-' have the same meaning. It's just syntaxic sugar, really :-) Would become: =head2 OL, LI Un-ordered lists and unordered lists can be constructed and nested: The text: 1. An item 2. Another item 3. Headlines work too ================== * An un-ordered list * Can be nested * It should all work nicely. Would become:
  1. An item

  2. Another item

  3. Headlines work too

    • An un-ordered list

    • Can be nested

    • It should all work nicely.

=head1 Hyperlinks This module uses L to locate URIs such as http://mkdoc.com/ and turn them into clickable links. Add rel="nofollow" attributes to tags like so: local $MKDoc::Text::Structured::Inline::NoFollow = 1; Additionally, once the XHTML fragment is produced, you could use L to hyperlink it against a glossary of hyperlinks. =head1 Smilies Basic smilies such as :-) and :-( are wrapped in a CSS class: :-) :-( =head1 Long Words Long words are split up into fragments separated by spaces if the length exceeds a 78 character default. Change the default length using a package variable: local $MKDoc::Text::Structured::Inline::LongestWord = 12; Disable this fuctionality by setting a value of 0. =head1 AUTHOR Copyright 2003 - MKDoc Holdings Ltd. Author: Jean-Michel Hiver This module is free software and is distributed under the same license as Perl itself. Use it at your own risk. =head1 SEE ALSO MKDoc: http://www.mkdoc.com/ Help us open-source MKDoc. Join the mkdoc-modules mailing list: mkdoc-modules@lists.webarch.co.uk =cut