<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule">

<channel>
	<title>techencoder &#187; UTF-8</title>
	<atom:link href="http://techencoder.com/index.php/tag/utf-8/feed/" rel="self" type="application/rss+xml" />
	<link>http://techencoder.com</link>
	<description>Technical ideas in a human readable format</description>
	<lastBuildDate>Wed, 01 Feb 2012 17:19:30 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<creativeCommons:license>http://creativecommons.org/licenses/by/3.0/us/</creativeCommons:license>
		<item>
		<title>Batch file error: ∩╗┐is not recognized</title>
		<link>http://techencoder.com/index.php/2009/10/batch-file-formatting/</link>
		<comments>http://techencoder.com/index.php/2009/10/batch-file-formatting/#comments</comments>
		<pubDate>Thu, 22 Oct 2009 13:15:47 +0000</pubDate>
		<dc:creator>r.claypool</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[ANSI]]></category>
		<category><![CDATA[Batch Files]]></category>
		<category><![CDATA[quick tip]]></category>
		<category><![CDATA[UTF-8]]></category>

		<guid isPermaLink="false">http://techencoder.com/?p=903</guid>
		<description><![CDATA[Is your batch file reporting an error like this? Open the file in Notepad++, select &#8216;Format -&#62; Encode in ANSI&#8217; and save. Hope that helps someone.  Happy Programming!]]></description>
			<content:encoded><![CDATA[<p>Is your batch file reporting an error like this?</p>
<div id="attachment_905" class="wp-caption aligncenter" style="width: 610px"><img class="size-full wp-image-905" title="cmd-prompt-error" src="http://techencoder.com/wp-content/uploads/2009/10/cmd-prompt-error.png" alt="'∩╗┐ECHO' is not recognized as an internal or external command, operable program or batch file." width="600" height="180" /><p class="wp-caption-text">&#39;∩╗┐ECHO&#39; is not recognized as an internal or external command, operable program or batch file.</p></div>
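<p>The three odd characters are the UTF-8 byte order mark (BOM), the bytes EF BB BF, which some editors write at the start of a file. cmd.exe does not expect them, treats them as part of the first command, and displays them as &#8216;∩╗┐&#8217; under code page 437 (the default US console code page). To remove them, open the file in <a title="Notepad ++ Home Page" href="http://notepad-plus.sourceforge.net/uk/site.htm">Notepad++</a>, select &#8216;Format -&gt; Encode in ANSI&#8217; and save.</p>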
<p><img class="aligncenter size-full wp-image-910" title="encode-file-in-ansi" src="http://techencoder.com/wp-content/uploads/2009/10/encode-file-in-ansi.png" alt="encode-file-in-ansi" width="600" height="300" /></p>
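<p>If you have many batch files to fix, a script may be quicker than opening each one in Notepad++. The following is a minimal Python sketch, not a definitive tool: it checks for the UTF-8 BOM bytes (EF BB BF) and rewrites the file without them. The helper names <code>strip_bom</code> and <code>fix_batch_file</code> are hypothetical. Note that the result is UTF-8 without a BOM, which is byte-for-byte identical to ANSI only when the file contains nothing but ASCII characters.</p>

```python
import codecs
from pathlib import Path

# cmd.exe under code page 437 shows the three BOM bytes as these characters.
RENDERED_BOM = codecs.BOM_UTF8.decode("cp437")  # '∩╗┐'


def strip_bom(data):
    """Return the bytes without a leading UTF-8 byte order mark, if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def fix_batch_file(path):
    """Rewrite a file in place without its UTF-8 BOM.

    Returns True if a BOM was found and removed, False if the file was already clean.
    """
    p = Path(path)
    original = p.read_bytes()
    cleaned = strip_bom(original)
    if cleaned != original:
        p.write_bytes(cleaned)
        return True
    return False
```

<p>For example, <code>fix_batch_file("build.bat")</code> (a placeholder file name) would return True on the first run and False on any later run, because the BOM is already gone.</p>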
<p>Hope that helps someone.  Happy Programming!</p>
]]></content:encoded>
			<wfw:commentRss>http://techencoder.com/index.php/2009/10/batch-file-formatting/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>ASCII, EBCDIC and UTF-8</title>
		<link>http://techencoder.com/index.php/2009/01/character-encoding/</link>
		<comments>http://techencoder.com/index.php/2009/01/character-encoding/#comments</comments>
		<pubDate>Wed, 14 Jan 2009 18:06:39 +0000</pubDate>
		<dc:creator>r.claypool</dc:creator>
				<category><![CDATA[ASCII]]></category>
		<category><![CDATA[EBCDIC]]></category>
		<category><![CDATA[MIME Types]]></category>
		<category><![CDATA[Unicode]]></category>
		<category><![CDATA[UTF-8]]></category>

		<guid isPermaLink="false">http://techencoder.com/?p=8</guid>
		<description><![CDATA[For a blog named techencoder, it seems appropriate to start with a post about a few important character encodings that our computers use. I&#8217;ll dive into higher-level abstractions and upcoming technologies soon, but a basic understanding of this topic is something every programmer should have. Background &#8230; Not too long ago, hardware resources were [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: left;">For a blog named techencoder, it seems appropriate to start with a post about a few important character encodings that our computers use.   I&#8217;ll dive into higher level abstractions and upcoming technologies soon, but a basic understanding in this topic is something <a title="The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" href="http://www.joelonsoftware.com/articles/Unicode.html" target="_blank">every programmer should have</a>.</p>
<p><a title="Unicode characters" href="http://en.wikipedia.org/wiki/Character_encoding#Modern_encoding_model" target="_blank"><img class="aligncenter size-full wp-image-31" title="Unicode characters" src="http://techencoder.com/wp-content/uploads/2009/01/unicode1.png" alt="Unicode characters" width="600" height="173" /></a></p>
<p style="text-align: left;"><strong>Background  &#8230;</strong></p>
<p style="text-align: left;">Not too long ago, hardware resources were orders of magnitude more expensive than they are today, and data structures could not afford to waste them. This explains why early character encodings used only 7 or 8 bits per character, even though 128 or 256 characters are not enough to write in all the world&#8217;s languages. That limit was a sensible trade-off when memory, bandwidth and storage were expensive, but today we have enough hardware resources to use a modern encoding that is not so restrictive. Let&#8217;s briefly look at two legacy encodings that you should know something about and a third that I recommend using for the foreseeable future.</p>
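<p>To put the storage argument in numbers, here is a small Python sketch comparing how many bytes the same English sentence takes in ASCII, UTF-8 and UTF-32. UTF-32 is included only as a point of comparison: it is a fixed-width Unicode encoding that spends four bytes on every character. The sample text is arbitrary.</p>

```python
text = "Hello, world"  # 12 characters of plain English

# Byte cost of the same text in three encodings.
sizes = {
    "ascii": len(text.encode("ascii")),          # 1 byte per character
    "utf-8": len(text.encode("utf-8")),          # 1 byte per ASCII character
    "utf-32-le": len(text.encode("utf-32-le")),  # 4 bytes per character, no BOM
}

for name, size in sizes.items():
    print(f"{name:10} {size} bytes")
```

<p>This prints 12, 12 and 48 bytes. For English text, UTF-8 costs nothing extra over ASCII, which is one reason it could replace ASCII without the fourfold cost of a fixed-width encoding.</p>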
<p style="text-align: left;"><strong>The Three Amigos &#8230;</strong></p>
<p style="text-align: left;">EBCDIC, ASCII and UTF-8 provide 256, 128 and 1,112,064 possible characters respectively. EBCDIC code pages use 8 bits per character and ASCII uses 7, while UTF-8 can encode every Unicode code point from U+0000 to U+10FFFF except the 2,048 surrogates reserved for UTF-16. Suffice it to say there is enough space for English, Chinese, <a title="Klingon language institute" href="http://www.kli.org/" target="_blank">Klingon</a> and every other language you might encounter.</p>
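<p>A short Python sketch shows what that space looks like in practice: UTF-8 is variable-width, using one byte for ASCII characters and up to four for everything else, and it can encode every Unicode code point except the surrogates. The sample characters are arbitrary choices for illustration.</p>

```python
import sys

# One sample character per script, written as escapes so they are unambiguous.
samples = {
    "latin_a": "A",            # ASCII letter A
    "e_acute": "\u00e9",       # é (Latin-1 Supplement)
    "cyrillic_zhe": "\u0416",  # Ж (Cyrillic)
    "cjk_zhong": "\u4e2d",     # 中 (CJK Unified Ideographs)
    "emoji": "\U0001F600",     # 😀 (outside the Basic Multilingual Plane)
}

# UTF-8 is variable-width: ASCII stays at 1 byte, other characters take 2 to 4.
byte_lengths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}
print(byte_lengths)

# The Unicode code space runs from U+0000 to sys.maxunicode (U+10FFFF).
code_points = sys.maxunicode + 1   # 1,114,112
surrogates = 0xDFFF - 0xD800 + 1   # 2,048, reserved for UTF-16 and not valid in UTF-8
print(code_points - surrogates)    # 1112064
```

<p>The first print shows <code>{'latin_a': 1, 'e_acute': 2, 'cyrillic_zhe': 2, 'cjk_zhong': 3, 'emoji': 4}</code>.</p>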
<ul style="text-align: left;">
<li> <strong><a title="EBCDIC" href="http://en.wikipedia.org/wiki/EBCDIC" target="_blank">EBCDIC</a> (1963) is a rare find outside IBM mainframes.</strong> The <a title="IBM System 360" href="http://en.wikipedia.org/wiki/IBM_360" target="_blank">System/360</a> series was the first to use this encoding, and subsequent IBM machines have continued to use it <em>internally</em>. Their hardware or software translates EBCDIC to another encoding when exchanging data with other systems, so EBCDIC is essentially dead to most of us.</li>
<li><strong><a title="ASCII" href="http://en.wikipedia.org/wiki/Ascii" target="_blank">ASCII</a> (1963) is a widely used standard, but it is dated.</strong> Until the middle of 2008, it was the dominant encoding on the Internet, and some older or poorly implemented programs still (wrongly) assume that files are encoded in ASCII without checking the file&#8217;s <a title="MIME Types" href="http://en.wikipedia.org/wiki/MIME#Content-Type" target="_blank">Content-Type</a>. UTF-8 and many other modern encodings are backward compatible with ASCII, so this is not a problem as long as the text is plain English. Non-English and multilingual documents, however, contain characters that ASCII cannot represent, so they will usually not render correctly when read as ASCII.
<div class="mceTemp" style="text-align: left;">
<dl id="attachment_13" class="wp-caption alignright" style="width: 325px;">
<dt class="wp-caption-dt"><a title="Google: Moving to Unicode 5.1" href="http://googleblog.blogspot.com/2008/05/moving-to-unicode-51.html" target="_blank"><img class="size-full wp-image-13" title="Google Graph: Unicode on the Web" src="http://techencoder.com/wp-content/uploads/2009/01/unicode.png" alt="Unicode and Universal Character Set were standardized in the early 1990s and produced several different encodings. The one most in use today is UTF-8." width="315" height="334" /></a></dt>
</dl>
</div>
</li>
<li><strong><a title="UTF-8" href="http://en.wikipedia.org/wiki/UTF-8" target="_blank">UTF-8</a> (1993) is another widely used international standard. </strong>It is based on the <a title="Unicode" href="http://en.wikipedia.org/wiki/Unicode" target="_blank">Unicode</a> standard and is quickly replacing ASCII (see graph, right). It can encode every character in Unicode, which covers most of the world&#8217;s writing systems, and it is the encoding you should use whenever possible. The Internet Engineering Task Force (IETF) is one among many organizations that <a title="IETF standard" href="http://tools.ietf.org/html/rfc2277" target="_blank">recommend it</a>.</li>
</ul>
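<p>The differences between the three encodings are easy to see from Python, which ships a codec for <code>cp037</code>, one common EBCDIC code page (US/Canada). This is a minimal sketch; real EBCDIC systems use several code pages, and cp037 is only an example.</p>

```python
# cp037 is one common EBCDIC code page (US/Canada); Python ships a codec for it.
word = "HELLO"

ebcdic_bytes = word.encode("cp037")
ascii_bytes = word.encode("ascii")
utf8_bytes = word.encode("utf-8")

print(ebcdic_bytes.hex())         # c8c5d3d3d6: same letters, different byte values
print(ascii_bytes.hex())          # 48454c4c4f
print(utf8_bytes == ascii_bytes)  # True: UTF-8 matches ASCII byte for byte

# A gateway has to translate EBCDIC before another system can read the text.
translated = ebcdic_bytes.decode("cp037").encode("utf-8")
print(translated == ascii_bytes)  # True

# Text outside ASCII's 128 characters cannot be encoded as ASCII at all.
try:
    "caf\u00e9".encode("ascii")
except UnicodeEncodeError as err:
    print("not ASCII at position", err.start)  # not ASCII at position 3
```

<p>The EBCDIC bytes show why translation is required when a mainframe talks to anything else, while the UTF-8 comparison shows why plain English text passes through ASCII-only programs unharmed.</p>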
<p style="text-align: left;"><strong>Action Plan &#8230;</strong></p>
<p style="text-align: left;">There are hundreds of other character encodings around the world, but a little knowledge of these three is probably all you will need. Check out the links in this article and keep these things in mind:</p>
<ul style="text-align: left;">
<li>Your programs should read and honor the declared <a title="MIME Types" href="http://en.wikipedia.org/wiki/MIME#Content-Type" target="_blank">Content-Type</a> of input.</li>
<li>Your programs should default to UTF-8 output.</li>
<li>Your programs should explicitly state the encoding they used, for example with a charset parameter in the file&#8217;s Content-Type header (<code>Content-Type: text/html; charset=UTF-8</code>).</li>
</ul>
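<p>The three rules above can be sketched in Python. This is a minimal illustration under a few assumptions: the helper names are hypothetical, it borrows the standard library&#8217;s <code>email.message.Message</code> to parse the charset parameter of a Content-Type value, and it falls back to UTF-8 when no charset is declared.</p>

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset declared in a Content-Type value, or a default."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the value and returns failobj if absent.
    return msg.get_content_charset(failobj=default)


def decode_input(raw, content_type):
    """Rule 1: honor the declared encoding instead of assuming ASCII."""
    return raw.decode(charset_from_content_type(content_type))


def encode_output(text):
    """Rules 2 and 3: write UTF-8 and declare it explicitly."""
    header = "Content-Type: text/plain; charset=UTF-8"
    return header, text.encode("utf-8")


incoming = "caf\u00e9".encode("iso-8859-1")  # bytes from a Latin-1 source
text = decode_input(incoming, "text/plain; charset=ISO-8859-1")
header, body = encode_output(text)
print(header)  # Content-Type: text/plain; charset=UTF-8
print(body)    # b'caf\xc3\xa9'
```

<p>Decoding with the declared charset first and encoding to UTF-8 last keeps the text itself in one internal form, so only the edges of the program need to know about encodings.</p>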
<p>Do you have anything to add?  Just leave a comment.</p>
<p>Happy Programming!</p>
]]></content:encoded>
			<wfw:commentRss>http://techencoder.com/index.php/2009/01/character-encoding/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

