<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>utilities &#8211; anthro{dendum}</title>
	<atom:link href="/tag/utilities/feed/" rel="self" type="application/rss+xml" />
	<link>/</link>
	<description></description>
	<lastBuildDate>Fri, 06 Apr 2018 02:55:33 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.5.5</generator>

<image>
	<url>/wp-content/uploads/2017/11/cropped-brackets-ico-file-32x32.png</url>
	<title>utilities &#8211; anthro{dendum}</title>
	<link>/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Text-laundering (Working With Text 3)</title>
		<link>/2018/01/28/text-laundering-working-with-text-3/</link>
		
		<dc:creator><![CDATA[Kerim]]></dc:creator>
		<pubDate>Mon, 29 Jan 2018 03:58:08 +0000</pubDate>
				<category><![CDATA[How to]]></category>
		<category><![CDATA[Tools We Use]]></category>
		<category><![CDATA[apps]]></category>
		<category><![CDATA[how-to]]></category>
		<category><![CDATA[mobile apps]]></category>
		<category><![CDATA[regex]]></category>
		<category><![CDATA[regular expressions]]></category>
		<category><![CDATA[text files]]></category>
		<category><![CDATA[tools we use]]></category>
		<category><![CDATA[utilities]]></category>
		<category><![CDATA[working with text]]></category>
		<guid isPermaLink="false">https://anthrodendum.org/?p=602</guid>

					<description><![CDATA[Ever copy and paste something that should be a solid paragraph of text, only to have it end up looking like a mess? You could fix it using Regular Expressions or, if you prefer not to muddle around with code, use one of the many tools out there that can automate this kind of text cleanup for you.]]></description>
										<content:encoded><![CDATA[<p>Ever copy and paste a block of text that should be one solid paragraph, like this:</p>
<blockquote><p>
  Consuetudium lectorum Mirum est notare. Eodem modo typi qui nunc nobis videntur parum clari fiant sollemnes in futurum? Assum Typi non habent claritatem insitam est usus legentis in iis. Claritatem Investigationes demonstraverunt lectores legere me lius quod ii legunt saepius Claritas est etiam. Nam liber tempor cum soluta. Est etiam processus dynamicus qui.
</p></blockquote>
<p>only to have it end up looking like this?</p>
<blockquote><p>
  Consuetudium lectorum Mirum est notare.<br />
  Eodem modo typi qui nunc nobis videntur parum clari fiant sollemnes in futurum? Assum Typi non habent claritatem insitam est usus<br />
  legentis in iis.<br />
  Claritatem Investigationes demonstraverunt lectores legere me lius quod ii legunt saepius Claritas est etiam. Nam liber tempor cum soluta. Est etiam<br />
  processus dynamicus qui.
</p></blockquote>
<p>Most word processors have a command that lets you see invisible markers like spaces (usually represented as a faint dot “•”) and what are still quaintly called “carriage returns” or “line feeds” (generally shown by the symbols “¶” or “↵”).<sup id="fnref-602-1"><a href="#fn-602-1" class="jetpack-footnote">1</a></sup> If you turn that feature on, you will see far too many of these return symbols in the text above. It might seem that the solution is to find all those returns and replace them with spaces, but then you would have no paragraphs at all in your document. What you want to do is replace the mid-paragraph returns while leaving the ones between paragraphs alone.</p>
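<p>If you want to see the same thing outside a word processor, a few lines of Python can produce a rough version of this “show invisibles” view. This is only an illustrative sketch: <code>show_invisibles</code> is a made-up name, and the markers it uses (the dot and pilcrow mentioned above, plus an arrow for tabs) are conventional choices, not what any particular program prints.</p>
<pre><code class="language-python">import re


def show_invisibles(text):
    """Return text with spaces, tabs and line breaks made visible.

    Hypothetical helper for illustration, not part of any app.
    """
    # Spaces become a dot and tabs become an arrow.
    text = text.replace(" ", "•").replace("\t", "→")
    # Mark each line break with a pilcrow. Windows ends lines with CR+LF,
    # older Mac software used a bare CR, and Unix-like systems (including
    # macOS) use a bare LF; CR+LF is listed first so it counts as one break.
    # A real newline is kept after the mark so the output wraps as the text did.
    return re.sub(r"\r\n|\r|\n", "¶\n", text)


sample = "Consuetudium lectorum Mirum est notare.\r\nEodem modo typi\nlegentis in iis."
print(show_invisibles(sample))</code></pre>
<p>A ¶ that is followed on the next line by more text, rather than by another ¶, is usually one of the unwanted mid-paragraph returns.</p>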
<p>Using Regular Expressions (RegEx), as <a href="https://anthrodendum.org/2018/01/24/regex-101-working-with-text-2/">discussed in the last post in this series</a>, what we want to do is search for every return (or line feed) that is neither preceded nor followed by another return (or line feed). In addition, since some paragraphs are separated not by a blank line but by a tab or a run of spaces at the start of the new paragraph, we want to leave those breaks alone as well. I find that <a href="https://regex101.com/r/zshq1Q/1">the following search</a> works pretty well for me: <code>(?&lt;=[^\r\n\t ][^\r\n])\R(?=[^\r\n][^\r\n\t ])</code> It is easy to find patterns like this in online forums (<a href="https://stackoverflow.com/questions/10464735/remove-single-line-breaks-keep-empty-lines">which is where I found this one</a>), saving you the trouble of reinventing the wheel.</p>
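<p>Python’s built-in <code>re</code> module does not support <code>\R</code>, so the search above cannot be pasted into a Python script as-is. The sketch below is an adaptation, not the original search: line breaks are spelled out as <code>\r\n|\r|\n</code> (which covers the common line endings but not the rarer characters <code>\R</code> also matches), and the two characters the lookbehind checks are captured and written back in the replacement instead, which gives the same result. The names <code>SINGLE_BREAK</code> and <code>unwrap</code> are invented for this example.</p>
<pre><code class="language-python">import re

# Adapted from the search above; SINGLE_BREAK and unwrap are hypothetical names.
# Python's re module has no \R, so breaks are written as \r\n|\r|\n.
SINGLE_BREAK = re.compile(
    r"([^\r\n\t ][^\r\n])"     # the two characters before the break, captured;
                               # the first must not be a space or tab
    r"(?:\r\n|\r|\n)"          # one line break: CR+LF, bare CR, or bare LF
    r"(?=[^\r\n][^\r\n\t ])"   # next line must start with two characters, the
                               # second not a space, tab, or line break
)


def unwrap(text):
    """Turn single line breaks into spaces, keeping paragraph breaks."""
    # \1 puts the two captured characters back before the new space.
    return SINGLE_BREAK.sub(r"\1 ", text)


messy = ("Consuetudium lectorum Mirum est notare.\n"
         "Eodem modo typi qui nunc\n"
         "nobis videntur.\n"
         "\n"
         "Nam liber tempor cum soluta.")
print(unwrap(messy))</code></pre>
<p>As written, the pattern only protects paragraphs whose indent is at least two characters wide: a new paragraph that begins with a single tab or a single space is joined to the line before it, so you may need to adjust the lookahead for your own texts.</p>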
<p>If you prefer not to muddle around with code, there are a number of tools out there that can automate this kind of text cleanup for you. On macOS, my favorite is the package of <a href="http://www.devontechnologies.com/products/freeware.html">free WordService menu extensions</a> from DEVONtechnologies. These extensions work with the built-in &#8220;Services&#8221; menu that pops up on macOS whenever you control-click on selected text. The package offers a number of useful commands: changing the capitalization of the selected text (e.g. turning “THE APPLE” into “The Apple” or “The apple”), reformatting line breaks (or removing them altogether), and giving you statistics such as the word or character count of the selection.</p>
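<p>For anyone who would rather script these clean-ups, the sketch below shows rough Python equivalents of the kinds of commands described above. It is not how WordService itself works, and the function names are invented for illustration.</p>
<pre><code class="language-python">def title_case(text):
    """Capitalize each word, e.g. "THE APPLE" becomes "The Apple"."""
    # str.capitalize() per word avoids str.title()'s habit of capitalizing
    # after apostrophes ("don't" would become "Don'T").
    return " ".join(word.capitalize() for word in text.split(" "))


def sentence_case(text):
    """Capitalize only the first character, e.g. "THE APPLE" becomes "The apple"."""
    return text[:1].upper() + text[1:].lower()


def remove_line_breaks(text):
    """Join all lines into one paragraph, dropping blank lines."""
    return " ".join(line.strip() for line in text.splitlines() if line.strip())


def text_stats(text):
    """Count words (runs of non-whitespace) and characters (spaces included)."""
    return {"words": len(text.split()), "characters": len(text)}


print(title_case("THE APPLE"))      # The Apple
print(sentence_case("THE APPLE"))   # The apple
print(text_stats("THE APPLE"))      # {'words': 2, 'characters': 9}</code></pre>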
<p>Considering that WordService is free and does pretty much the same thing, you might not want to spend $45 on <a href="https://www.unmarked.com/textsoap/">TextSoap</a>, but if you already have a subscription to the <a href="https://setapp.com/">Setapp</a> bundle of macOS apps, TextSoap is included. Another option is <a href="http://sociomedia.com/textwell/">Textwell</a>, which works on both macOS and iOS and can do much more than just clean text. It has some built-in tools, much like those offered in WordService, but (if you aren’t afraid of tweaking the JavaScript in the example code) you can also make your own actions. I really like that these can be synced between the desktop and iOS. <a href="https://www.apimac.com/ios/cleantext/">Clean Text for iOS</a> is even easier to use, but less customizable. Since I don’t use Windows, Linux, or Android, I’ll leave it to others to recommend their favorite text cleanup tools for those platforms in the comments.</p>
<hr />
<h3>List of posts in this series</h3>
<ul>
<li><a href="https://anthrodendum.org/2018/01/18/free-your-mind-the-text-will-follow-working-with-text-1/">Free Your Mind, the Text Will Follow (Working With Text 1)</a></li>
<li><a href="https://anthrodendum.org/2018/01/24/regex-101-working-with-text-2/">RegEx 101 (Working With Text 2)</a></li>
<li><a href="https://anthrodendum.org/2018/01/28/text-laundering-working-with-text-3/">Text-laundering (Working With Text 3)</a></li>
<li><a href="https://anthrodendum.org/2018/02/22/lazy-powerpoint-working-with-text-4/">Lazy PowerPoint (Working With Text 4)</a></li>
<li><a href="https://anthrodendum.org/2018/04/05/roll-your-own-qda-working-with-text-5/">Roll Your Own QDA (Working With Text 5)</a></li>
</ul>
<div class="footnotes">
<hr />
<ol>
<li id="fn-602-1">
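Strictly speaking, carriage returns (CR) and line feeds (LF) are different characters, and operating systems use them differently (Windows ends each line with CR followed by LF, while macOS and other Unix-like systems use LF alone), but those differences aren’t important for this post.&#160;<a href="#fnref-602-1">&#8617;</a>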
</li>
</ol>
</div>
<div class="saboxplugin-wrap" itemtype="http://schema.org/Person" itemscope itemprop="author"><div class="saboxplugin-tab"><div class="saboxplugin-gravatar"><img alt='Kerim' src='http://0.gravatar.com/avatar/3f733bd06413af380fcd122e4be08dc4?s=100&#038;d=retro&#038;r=g' srcset='http://0.gravatar.com/avatar/3f733bd06413af380fcd122e4be08dc4?s=200&#038;d=retro&#038;r=g 2x' class='avatar avatar-100 photo' height='100' width='100' itemprop="image"/></div><div class="saboxplugin-authorname"><a href="/author/admin_kerim3916/" class="vcard author" rel="author"><span class="fn">Kerim</span></a></div><div class="saboxplugin-desc"><div itemprop="description"><p><a href="http://kerim.oxus.net/">P. Kerim Friedman</a> is a professor in the Department of Ethnic Relations and Cultures at National Dong Hwa University in Taiwan. His research explores language revitalization efforts among indigenous Taiwanese, looking at the relationship between language ideology, indigeneity, and political economy. An ethnographic filmmaker, he co-produced the Jean Rouch award-winning documentary, &#8216;Please Don&#8217;t Beat Me, Sir!&#8217; about a street theater troupe from one of India&#8217;s Denotified and Nomadic Tribes (DNTs).</p>
</div></div><div class="saboxplugin-web sab-web-position"><a href="http://kerim.oxus.net/" target="_self" >kerim.oxus.net/</a></div><div class="clearfix"></div></div></div>
<p><a href="/2018/01/28/text-laundering-working-with-text-3/" rel="nofollow">Source</a></p>]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
