Regular Expressions in Writer PDF
http://wiki.openoce.org/wiki/Documentation/How_To...
Contents
1 Introduction
2 Where regular expressions may be used in OOo
3 A simple example
4 The least you need to know about regular expressions
5 How regular expressions are applied in OpenOffice.org
6 Literal characters
7 Special characters
8 Single character match . ?
9 Repeating match + * {m,n}
10 Positional match ^ $ \< \>
11 Alternative matches | [...]
12 POSIX bracket expressions [:alpha:] [:digit:] etc.
13 Grouping (...) and backreferences \x $x
14 Tabs, newlines, paragraphs \t \n $
15 Hexadecimal codes \xXXXX
16 The 'Replace with' box \t \n & $1 $2
17 Troubleshooting OOo regular expressions
18 Tips and Tricks
Introduction
In simple terms, regular expressions are a clever way to find & replace text (similar to 'wildcards'). Regular expressions can be both powerful and complex, and it is easy for inexperienced users to make mistakes. We describe the use of OpenOffice.org regular expressions aiming to be clear enough for the novice, while detailing the aspects that can cause confusion to more experienced users. A typical use for regular expressions is in finding text in a Writer document; for instance to locate all occurrences of man or woman in your document, you could search using a regular expression which would find both words. Regular expressions are very common in some areas of computing, and are often known as regex or regexp. Not all regex are the same - so reading the relevant manual is sensible.
You should check the status of the regular expression option each time you bring up the dialog, as it defaults to 'off'.
A simple example
If you have little or no experience of regular expressions, you may find it easiest to study them in Writer rather than, say, Calc. In Writer, bring up the Find and Replace dialog from the Edit menu. On the dialog, choose More Options and tick the Regular Expressions box. In the Search box enter r.d - the dot here means 'any single character'. Clicking the Find All button will now find all the places where an r is followed by another character followed by a d, for instance 'red' or 'hotrod' or 'bride' or 'your dog' (this last example is r followed by a space followed by d - the space is a character). If you type xxx into the Replace with box, and click the Replace All button, these become 'xxx', 'hotxxx', 'bxxxe', 'youxxxog'. That may not be very useful, but it shows the principle. We'll continue to use the Find and Replace dialog to explain in more detail.
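The same search can also be tried outside Writer. The sketch below uses Python's re module purely as an illustration (an assumption of this example: it is not the engine OpenOffice.org uses, although the pattern r.d behaves the same way in both) and applies it to the four sample strings.

```python
import re

# 'r', then any single character, then 'd' - the pattern typed in the Search box
pattern = re.compile(r"r.d")

samples = ["red", "hotrod", "bride", "your dog"]

# Rough equivalent of typing xxx in 'Replace with' and clicking Replace All
replaced = [pattern.sub("xxx", s) for s in samples]
print(replaced)
```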
OpenOffice.org regular expressions appear to divide the text to be searched into portions and examine each portion separately. In Writer, text appears to be divided into paragraphs. For example x.*z will not match x at the end of a paragraph with z beginning the next paragraph (x.*z means x then any or no characters then z). Paragraphs seem to be treated separately (although we discuss some special cases at the end of this HowTo).
In addition Writer considers each table cell and each text frame separately. Text frames are examined after all the other text / table cells on all pages have been examined. In the Find & Replace dialog, regular expressions may be used in the Search for box. In general they may not be used in the Replace with box. The exceptions are discussed later.
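The paragraph-by-paragraph behaviour can be imitated in a short sketch. It assumes Python's re module and a made-up document held as a list of strings, one per paragraph; both are illustrative choices, not a description of how Writer is implemented.

```python
import re

# Hypothetical document: each list item stands for one Writer paragraph
paragraphs = [
    "a box marked x",           # x near the end of this paragraph...
    "z is the last letter",     # ...and z at the start of the next one
    "x to z in one paragraph",
]

pattern = re.compile(r"x.*z")

# Search every paragraph on its own, so a match can never span two paragraphs
matches = []
for paragraph in paragraphs:
    found = pattern.search(paragraph)
    if found:
        matches.append(found.group(0))

print(matches)
```

Only the third paragraph produces a match, because the x and the z of the first two paragraphs are never examined together.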
Literal characters
If your regular expression contains characters other than the so-called 'special characters' . ^ $ * + ? \ [ ( { | then those characters are matched literally. For example: red matches red, redraw and Freddie. OpenOffice.org allows you to choose whether you care if a character is 'UPPER CASE' or 'lower case'. If you tick the box to 'match case' on the Find and Replace dialog, then red will not match Red or FRED; if you un-tick that box then the case is ignored and both will be matched.
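As a rough illustration of literal matching and the 'match case' box, the sketch below again assumes Python's re module; its re.IGNORECASE flag stands in for un-ticking 'match case'.

```python
import re

words = ["red", "redraw", "Freddie", "Red", "FRED"]

# 'Match case' ticked: the letters r, e, d must appear exactly as typed
case_sensitive = [w for w in words if re.search(r"red", w)]

# 'Match case' un-ticked: upper and lower case are treated alike
case_insensitive = [w for w in words if re.search(r"red", w, re.IGNORECASE)]

print(case_sensitive)
print(case_insensitive)
```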
Special characters
The special characters are . ^ $ * + ? \ [ ( { | They have special meanings in a regular expression, as we're about to describe.
If you wish to match one of these characters literally, place a backslash '\' before it. For example: to match $100 use \$100 - the \$ is taken to mean $ .
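A small sketch of escaping, assuming Python's re module for illustration. The helper re.escape is a Python convenience with no counterpart in the Find and Replace dialog; it is shown only because it produces the same backslashed form.

```python
import re

text = "The ticket costs $100 today"

# Unescaped, $ keeps its special meaning, so the literal text is not found
unescaped = re.search(r"$100", text)

# With a backslash, \$ is taken to mean a literal dollar sign
escaped = re.search(r"\$100", text)

# re.escape adds the backslash for you
auto_escaped = re.escape("$100")

print(unescaped, escaped.group(0), auto_escaped)
```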
For example: 'r.*d' matches 'red' but in Writer if your paragraph is actually 'The referee showed him the red card again' the match found is 'referee showed him the red card' - that is, the first 'r' and the last possible 'd'. Regular expressions are greedy by nature. You may specify how many times you wish the match to be repeated, with curly brackets { }. For example a{1,4}rgh! will match argh!, aargh!, aaargh! and aaaargh! - in other words between 1 and 4 a's then rgh!. Also note that a{3}rgh! will match precisely 3 a's, i.e. aaargh!, and a{2,}rgh! (with a comma) will match at least 2 a's, for example aargh! and aaaaaaaargh!.
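Greedy matching and the curly-bracket counts can be checked with a short sketch, assuming Python's re module for illustration. re.fullmatch is used for the counts so that each test string must consist of the whole pattern and nothing else; a plain search would also find aaaargh! inside a longer run of a's.

```python
import re

sentence = "The referee showed him the red card again"

# Greedy: from the first 'r' to the last possible 'd'
greedy = re.search(r"r.*d", sentence).group(0)

# Between 1 and 4 a's, then rgh!
one_to_four = [bool(re.fullmatch(r"a{1,4}rgh!", s))
               for s in ["argh!", "aaaargh!", "aaaaargh!"]]

# Exactly 3 a's
exactly_three = [bool(re.fullmatch(r"a{3}rgh!", s))
                 for s in ["aargh!", "aaargh!"]]

# At least 2 a's
two_or_more = [bool(re.fullmatch(r"a{2,}rgh!", s))
               for s in ["argh!", "aargh!", "aaaaaaaargh!"]]

print(greedy, one_to_four, exactly_three, two_or_more)
```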
There is much confusion in the OpenOffice.org community about these. The Help itself is also far from clear. There are a number of 'POSIX bracket expressions' (sometimes called 'POSIX character classes') available in OpenOffice.org regular expressions, of the form [:classname:] which allow a match with any of the characters in that class. For instance [:digit:] stands for any of the digits 0123456789.
These (by definition) may only appear inside the square brackets of an alternative match - so a valid syntax would be [abc[:digit:]], which should match a, b, c, or any digit 0-9. A correct syntax to match just any one digit would be [[:digit:]]. Unfortunately this does not work as it should! The correct syntax does not work at all, but currently an incorrect syntax ([:digit:]) will actually match a digit, as long as it is outside the square brackets of an alternative match. (Obviously this is unsatisfactory, and is the subject of issue 64368 (http://qa.openoce.org/issues/show_bug.cgi?id=64368) ).
The POSIX bracket expressions available are listed below. Note that the exact definition of each depends on locale - for example in a different language other characters may be considered 'alphabetic letters' in [:alpha:]. The meanings given here apply generally to English-speaking locales (and do not take into account any Unicode issues).
[:digit:] stands for any of the digits 0123456789. This is equivalent to 0-9.
[:space:] should stand for any whitespace character, including tab; however as currently implemented it stands simply for a space character. Note that the Help is currently misleading here. (This is the subject of issue 41706 (http://qa.openoce.org/issues/show_bug.cgi?id=41706) ).
[:print:] should stand for any printable character; however as currently implemented it does not match the single quote nor the double quote characters (and some others). It matches space, but does not match tab (this latter is expected/defined behaviour). (This is the subject of issue 83290 (http://qa.openoce.org/issues/show_bug.cgi?id=83290) ).
[:cntrl:] stands for a control character. As far as a user is concerned, OpenOffice.org documents have very few control characters; tab and hard_line_break are both matched, but paragraph_mark is not.
[:alpha:] stands for a letter (including a letter with an accent). For example in the phrase (often used in English, and here given with accents as in the original language) 'déjà vu' all 6 letters will match.
[:alnum:] stands for a character that satisfies either [:alpha:] or [:digit:].
[:lower:] stands for a lowercase letter (including a letter with an accent). The case matching does not work unless the Match case box is ticked; if this box is not ticked this expression is equivalent to [:alpha:].
[:upper:] stands for an uppercase letter (including a letter with an accent). The case matching does not work unless the Match case box is ticked; if this box is not ticked this expression is equivalent to [:alpha:].
There seems to be little consistency in any implementation of POSIX bracket expressions (OOo or elsewhere). One approach is simply to use straightforward character classes - so instead of [[:digit:]] you use [0-9] for example.
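The 'straightforward character class' approach can be sketched as follows, assuming Python's re module for illustration (Python's re has no POSIX bracket expressions at all, so plain classes are the only option there). The second part shows one caveat of the substitution: a plain range such as [a-z] does not include accented letters, unlike the [:alpha:] behaviour described above.

```python
import re

# Plain character class in place of [abc[:digit:]]
abc_or_digit = re.compile(r"[abc0-9]")
found = abc_or_digit.findall("Cab 42, xyz")

# Caveat: [a-z] covers only the unaccented letters a to z
phrase = "d\u00e9j\u00e0 vu"   # 'déjà vu' written with explicit escapes
plain_letters = re.findall(r"[a-z]", phrase)

print(found, plain_letters)
```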
For example: (1..) in the 'Search for' box and \$$1 in the 'Replace with' box replaces '100' with '$100', and '150' with '$150'. $0 in the 'Replace with' box replaces with the entire text found.
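For comparison, here is the same replacement as a sketch in Python's re module (an illustrative assumption, not OOo syntax). Note the differences in the replacement text: Python writes the backreference as \1 rather than $1, a dollar sign needs no backslash there, and the entire match is written \g<0> rather than $0.

```python
import re

prices = "100 and 150"

# Group the three characters starting with 1, then put a dollar sign in front
with_dollar = re.sub(r"(1..)", r"$\1", prices)

# The entire text found, wrapped in brackets
whole_match = re.sub(r"1..", r"[\g<0>]", prices)

print(with_dollar)
print(whole_match)
```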
\n will match a newline (Shift-Enter) if it is entered in the Search box. In this context it is simply treated like a character, and can be replaced by say a space, or nothing. The regular expression red\n will match red followed by a newline character - and if replaced simply by say blue the newline will also be replaced. The regular expression red$ will match 'red' when it is followed by a newline. In this case, replacing with 'blue' will only replace 'red' - and will leave the newline intact. red\ngreen will match 'red' followed by a newline followed by 'green'; replacing with say 'brown' will remove the newline. However neither red.green nor red.*green will match here - the dot . does not match newline.
$ on its own will match a paragraph mark - and can be replaced by say a 'space', or indeed nothing, in order to merge two paragraphs together. Note that red$ will match 'red' at the end of a paragraph, and if you replace it with say a space, you simply get a space where 'red' was - and the paragraphs are unaffected - the paragraph mark is not replaced. It may help to regard $ on its own as a special syntax, unique to OOo.
^$ will match an empty paragraph, which can be replaced by say nothing, in order to remove the empty paragraph. Note that ^red$ matches a paragraph with only 'red' in it - replacing this with nothing leaves an empty paragraph; the paragraph marks at either end are not replaced. It may help to regard ^$ on its own as a special syntax, unique to OOo. Unfortunately, because OOo has taken over this syntax, it seems you cannot use ^$ to find empty cells in a table (nor empty Calc cells).
If you wish to replace every newline with a paragraph mark, firstly you will search for \n with Find All to select the newlines. Then in the Replace box you enter \n, which in the Replace box stands for a paragraph mark; then choose Replace. This is somewhat bizarre, but at least now you know. Note that \r is interpreted as a literal 'r', not a carriage return.
To replace paragraph marks - as used to give lines a certain length in some html documents, for instance - with "normal" automatically wrapped lines and paragraphs, the following 3 steps should help. Don't forget to choose More Options and tick the Regular Expressions box for this procedure.
1. So as not to lose "normal" paragraph marks at the end of "normal" paragraphs, replace two consecutive paragraph marks using a sequence of characters not occurring anywhere else in the text, like "*****" to replace an empty paragraph - this makes it easy to find and reinstate later. You do this by putting ^$ in the Find box and "*****" in the Replace box. (If you're only dealing with a limited chunk of text, don't forget to check "current selection only" under "more options" in the Find and Replace box.)
2. Search for the remaining line-end paragraph marks by putting $ in the Find box. To replace the mark with a "space" just type a space in the Replace box.
3. Now that the text is ready for normal line-wrapping, put back the "normal" paragraph marks by typing "*****" in the Find box and \n in the Replace box. (Remember to check "current selection only" where appropriate!)
Before you try this, create a test document to practise on. This is a good sequence to make into a macro. You can find macro suggestions on this OOo forum page: "replacing hard paragraphs" (http://www.oooforum.org/forum/viewtopic.phtml?t=3641) . (This procedure also helps deal indirectly with line-break problems.)
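The three-step procedure can be sketched outside Writer. The example below is only an approximation under stated assumptions: it uses Python's re module, it represents each paragraph mark as a "\n" character in a plain string, and the function name unwrap_paragraphs is made up for this illustration. The final tidy-up of stray spaces is an extra step that is not part of the procedure above.

```python
import re

PLACEHOLDER = "*****"  # must not occur anywhere else in the text


def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines while keeping empty-line paragraph breaks."""
    # Step 1: mark every empty paragraph (the ^$ search) with the placeholder
    marked = re.sub(r"^$", PLACEHOLDER, text, flags=re.MULTILINE)
    # Step 2: replace each remaining paragraph mark with a space
    joined = marked.replace("\n", " ")
    # Step 3: turn the placeholder back into a paragraph mark
    restored = joined.replace(PLACEHOLDER, "\n")
    # Extra tidy-up (not in the original steps): drop spaces around the marks
    return re.sub(r" *\n *", "\n", restored)


wrapped = "The quick\nbrown fox\n\njumps over\nthe dog"
unwrapped = unwrap_paragraphs(wrapped)
print(unwrapped)
```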
\t inserts a tab, replacing the text found.
\n inserts a paragraph mark, replacing the text found. This may be unexpected, because \n in the 'Search for' box means 'newline'! In some operating systems it is possible to use unicode input to directly type a newline character (U+000A) in the 'Replace with' box, providing a workaround, but this is not universal.
$1, $2, etc are backreferences, which (from OOo2.4) insert text groups found. See under Grouping and backreferences. $0 inserts the entire text found.
& also inserts the entire text found. For example if you searched for bird|berry, you would find either 'bird' or 'berry'; now to replace with black& would give you either 'blackbird' or 'blackberry'.
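The & example can be reproduced in a sketch using Python's re module, which is assumed here only for illustration. Python has no & shorthand in replacement text; the entire match is written \g<0> instead.

```python
import re

text = "bird berry"

# Search for bird|berry and replace with 'black' followed by the text found
blackened = re.sub(r"bird|berry", r"black\g<0>", text)

print(blackened)
```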
finds duplicate words separated by spaces (note that there is a space before each ])
\<[:alpha:]*\>
finds any word in the whole document (note: the Regular Expressions check box must be ticked)
\<[1-9][0-9]*\>
finds most email addresses (there is no perfect regular expression - this is a practical solution)
See Also
The ICU regular expression package (http://www.icu-project.org/userguide/regexp.html) , a candidate to replace the existing OOo regular expression engine (see: Regexp).
Example regular expressions (http://www.OOoNinja.com/2007/12/exampleregular-expressions-for-writer.html) (OpenOffice.org Ninja)
Backreferences in substitutions (http://www.OOoNinja.com/2007/12/backreferences-in-replacements-new.html) (OpenOffice.org Ninja)
Guide to regular expressions in OpenOffice.org (http://www.oooninja.com/2007/12/powerful-text-matching-with-regular.html) (OpenOffice.org Ninja)
Searching and replacing paragraph returns (carriage returns), tabs, and other special characters (http://openoce.blogs.com/openoce/2009/11/searchingand-replacing-paragraph-returns-carriage-returns-tabs-and-other-specialcharacters-in-open.html) (Solveig Haugland's blog)
Retrieved from "http://wiki.openoce.org/w/index.php?title=Documentation/How_Tos/Regular_Expressions_in_Writer&oldid=153756"
Categories: Documentation/Reference, Documentation/How Tos/Writer
This page was last modified at 22:14 on 23 December 2009.