Many web designers today pay little attention to character sets and language in their designs. A character set defines a collection of symbols that map to the characters of languages across the globe. The purpose of this blog post is to help designers understand the importance of character encoding and HTML entities in web design.

Why should we as web developers care about character sets or character encoding? Character encoding is the technical method by which letters, digits and symbols are represented as numerical values that a computer system can understand. It is an abstract concept. For instance, ‘an uppercase Latin A’ is a different character from ‘a lowercase Latin a’, and each has its own number. When a web document or an HTML document is saved, it contains HTML code and is saved with a particular character encoding. The job of the web developer is to declare which character encoding is used so that the browser can understand and interpret the page.
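
To make the idea of "characters as numbers" concrete, here is a minimal Python sketch. The helper name `describe` is made up for this post; it returns the Unicode code point of a character and the bytes a chosen encoding produces for it.

```python
def describe(ch, encoding="utf-8"):
    """Return (code point, encoded bytes) for a single character.

    `describe` is a hypothetical helper written for illustration only.
    """
    return ord(ch), ch.encode(encoding)


upper = describe("A")                       # uppercase Latin A
lower = describe("a")                       # lowercase Latin a
e_acute_utf8 = describe("\u00e9")           # e with acute accent, UTF-8
e_acute_latin1 = describe("\u00e9", "iso-8859-1")
```

The same character keeps the same code point, but the bytes written to disk depend on the encoding chosen when the file is saved.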

The character encoding declared or assumed for a web page may not be able to represent every special character. In that case a symbol such as the copyright sign cannot be written directly and has to be spelled out with characters the encoding does understand, such as c, o, p and y in the entity &copy;. The majority of websites nowadays use some form of Unicode. ISO-8859-1 and UTF-8 are different character encodings, but most web servers are configured with one of the two. UTF-8 is the modern character encoding for documents of all types. It also covers punctuation characters, symbols and mathematical notation. Using UTF-8 is the easiest way to get Unicode content working properly online, and its use has grown substantially in recent years. Developers also often ask about the difference between UTF-8 and UTF-16. UTF-16 can produce smaller files for text dominated by East Asian scripts, because those characters take two bytes in UTF-16 and three in UTF-8. For pages that are mostly ASCII markup, UTF-8 is usually the smaller of the two.
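
The file-size question is easy to check for yourself. The sketch below compares byte counts for a piece of ASCII markup and a short Japanese string. The function name `encoded_sizes` and the sample strings are assumptions chosen for illustration, and the byte order mark is left out by using the `utf-16-le` codec.

```python
def encoded_sizes(text):
    """Return the number of bytes `text` needs in UTF-8 and UTF-16 (no BOM)."""
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le")}


markup = "<p>Hello</p>"             # 12 ASCII characters
japanese = "\u65e5\u672c\u8a9e"     # three CJK characters

markup_sizes = encoded_sizes(markup)      # UTF-8 is smaller here
japanese_sizes = encoded_sizes(japanese)  # UTF-16 is smaller here
```

Because HTML tags and attributes are ASCII, a real page usually contains a lot of one-byte characters in UTF-8 even when its visible text is not in a Latin script.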

Why should you care about character encoding?

If the character encoding declared by the web developer does not match the actual character encoding of the HTML page, browsers may render the page and the characters within it incorrectly. Even if everything looks perfect in your testing browser, other browsers may interpret a page with an incorrectly declared encoding in an entirely different way.

Even search engines are likely to get confused. A search engine reads the web page in the declared character encoding rather than the encoding it was actually saved in, so it may not read the content accurately.
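
Here is a small Python sketch of what goes wrong when the declared encoding and the saved encoding disagree. The word used and the function name `read_as` are only examples; the point is that the same bytes produce different text depending on how they are decoded.

```python
def read_as(saved_bytes, declared_encoding):
    """Decode bytes the way a reader would if it trusted the declared encoding."""
    return saved_bytes.decode(declared_encoding)


original = "caf\u00e9"                   # the word "cafe" with an accented e
saved = original.encode("utf-8")         # the file is actually saved as UTF-8

correct = read_as(saved, "utf-8")        # matches the original text
garbled = read_as(saved, "iso-8859-1")   # accented e turns into two wrong characters
```

The garbled result is one character longer than the original, because the two bytes UTF-8 uses for the accented letter are each read as a separate ISO-8859-1 character.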

How will you choose the right character encoding?

A character encoding that is supported by the majority of the browsers:

Some browsers may be unable to read certain encodings, either because they are older versions or because they do not have the settings required to display them properly. For instance, a browser may not have support enabled for encodings that cover non-Roman characters used by other languages, because its user never needs to read those languages. If the web page is served in such a character encoding and users do not have that support in their browser, they will have trouble reading the page. It is therefore best to use a common encoding that is widely supported by the majority of browsers.

The characters that are used often:

Do not choose a character encoding simply because of a few characters scattered through the page, such as en dashes, symbols or special spaces. Base the decision on the language the page is written in and the special characters that language requires.

The limitation of the web page editor:

Although most editors can handle a wide variety of character encodings, it is better to stick to the most common ones.

ISO-8859-1 for other western languages:

For other western languages such as Spanish, French, German, Swedish or Italian, ISO-8859-1 works quite well. Specialised encodings also exist for Hebrew, Arabic and East Asian scripts.

In a nutshell, Unicode UTF-8 is a very versatile character encoding to choose.
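
One rough way to see how far an encoding reaches is to try encoding sample text with it. The sketch below does that in Python. The helper `can_encode` and the sample words are assumptions made for this post, not part of any standard tool.

```python
def can_encode(text, encoding):
    """Return True if every character in `text` exists in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


samples = {
    "spanish": "Espa\u00f1ol",              # n with tilde
    "french": "Fran\u00e7ais",              # c with cedilla
    "german": "Gr\u00fc\u00dfe",            # u umlaut and sharp s
    "hebrew": "\u05e9\u05dc\u05d5\u05dd",   # a Hebrew word
    "euro": "\u20ac5",                      # euro sign, not part of ISO-8859-1
}

latin1_ok = {name: can_encode(text, "iso-8859-1") for name, text in samples.items()}
utf8_ok = {name: can_encode(text, "utf-8") for name, text in samples.items()}
```

Every sample encodes in UTF-8, while ISO-8859-1 handles only the western European words.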

Correctly name the character encoding:

Character encoding names are not case sensitive, but they must be spelled exactly, so make sure that you note the correct name of the chosen character encoding, for example UTF-8 or ISO-8859-1.
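
If you want a quick sanity check on a name before putting it in a page, Python's codec registry can serve as a rough stand-in. This is only an approximation: browsers follow their own list of encoding labels, and the helper name `canonical_name` is made up for this sketch.

```python
import codecs


def canonical_name(label):
    """Return Python's canonical codec name for `label`, or None if unknown."""
    try:
        return codecs.lookup(label).name
    except LookupError:
        return None


known = canonical_name("UTF-8")              # a recognised name
same = canonical_name("utf-8")               # Python ignores case here
misspelled = canonical_name("not-a-charset") # not recognised
```

A result of `None` is a hint that the name is misspelled or at least unusual, and worth double-checking before it goes live.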

HTML entities and tips for HTML encoding:

HTML entities are an important companion to character encoding. They are codes you write into the HTML that display symbols and other characters within the text. An entity can be written by name, such as &copy;, or by the number assigned to the character, such as &#169;. The numbered form is called a Numeric Character Reference (NCR).
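
Entities and numeric references can be converted in both directions with Python's standard library. In the sketch below, `html.unescape` is a real standard-library function, while `to_numeric_references` is a small helper written for this post that relies on the built-in `xmlcharrefreplace` error handler.

```python
import html


def to_numeric_references(text):
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Named, decimal and hexadecimal forms all decode to the copyright sign.
decoded = html.unescape("&copy; &#169; &#xA9;")

# Going the other way: a copyright sign becomes a decimal reference.
ascii_safe = to_numeric_references("\u00a9 2024")
```

The ASCII-only output can be saved in any common encoding without the symbol being damaged, which is the main practical reason to use references.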

Web developers should set up their main HTML page with a default character encoding. There is a meta tag designed specifically for declaring the character encoding. Set it in the document head, or in the header template of your theme if you use a system such as WordPress, so that it is included on all pages.

Another option is to use a meta tag with http-equiv="Content-Type". This form is, however, longer to write.
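
The two meta tag forms are shown below as strings inside a short Python sketch that reads the declared encoding back out of the markup. The class `MetaCharsetFinder`, the function `find_meta_charset` and both sample pages are assumptions written for this post. It is a simplified reader built on the standard `html.parser` module, not a reproduction of how browsers detect encodings.

```python
from html.parser import HTMLParser

SHORT_FORM = '<head><meta charset="utf-8"><title>Demo</title></head>'
LONG_FORM = ('<head><meta http-equiv="Content-Type" '
             'content="text/html; charset=ISO-8859-1"></head>')


class MetaCharsetFinder(HTMLParser):
    """Remember the first encoding declared in a meta tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attributes = dict(attrs)
        if attributes.get("charset"):
            # Short form: <meta charset="...">
            self.charset = attributes["charset"].strip().lower()
        elif (attributes.get("http-equiv") or "").lower() == "content-type":
            # Long form: the charset is a parameter inside the content value.
            for part in (attributes.get("content") or "").split(";"):
                name, _, value = part.strip().partition("=")
                if name.lower() == "charset" and value:
                    self.charset = value.strip().lower()


def find_meta_charset(markup):
    finder = MetaCharsetFinder()
    finder.feed(markup)
    return finder.charset
```

Calling `find_meta_charset(SHORT_FORM)` and `find_meta_charset(LONG_FORM)` returns the declared name in lower case for each form, and `None` when a page declares nothing.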

What’s next after you have chosen the right character encoding?

Once you have chosen the encoding you will use, it is important to ensure that this information is passed to browsers and to search engines.

Web pages are delivered using the Hypertext Transfer Protocol (HTTP). A browser sends its requests through HTTP and the server responds to the browser via HTTP. Typically, the response is divided into two parts: the header and the body. The header provides information about the body, and the body contains the requested resource.

For HTML, the web server uses the Content-Type header to send encoding information.
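
As an illustration, the sketch below pulls the encoding out of a Content-Type header value using Python's standard `email.message` module, which understands this header syntax. The function name `charset_from_content_type` and the sample header values are assumptions for this post.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the charset parameter of a Content-Type value, or None."""
    message = Message()
    message["Content-Type"] = header_value
    return message.get_content_charset()


with_charset = charset_from_content_type("text/html; charset=UTF-8")
without_charset = charset_from_content_type("text/html")
```

When the header carries no charset parameter, the result is `None`, and the browser has to fall back on other clues in the page.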

There is an HTML equivalent of this HTTP header that states the encoding when the page is viewed offline. It is added by inserting a META element in the head section of the document. However, it is still important to set up the web server correctly, as a real HTTP header will always override a META element.
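
The order of precedence described above can be sketched in a few lines of Python. This is a deliberate simplification: it ignores other signals a browser may consider, and both the function name `effective_encoding` and the `windows-1252` fallback are assumptions chosen for illustration.

```python
def effective_encoding(http_charset, meta_charset, fallback="windows-1252"):
    """Pick the encoding a reader would use: HTTP header first, then META.

    `fallback` is an assumed default for pages that declare nothing.
    """
    for candidate in (http_charset, meta_charset):
        if candidate:
            return candidate.strip().lower()
    return fallback


online = effective_encoding("UTF-8", "ISO-8859-1")   # header wins
offline = effective_encoding(None, "ISO-8859-1")     # only META is available
undeclared = effective_encoding(None, None)          # assumed fallback
```

The first case shows why a misconfigured server is a problem: the page's own META element is never consulted when the header already names an encoding.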

For XML and XHTML, the encoding should be specified in the XML declaration at the top of the file. In this case, the Content-Type header should not contain any encoding information.
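
Here is a minimal Python sketch of an XML declaration carrying the encoding, using the standard `xml.etree.ElementTree` module (the `xml_declaration` argument needs Python 3.8 or later). The element name, the sample text and the helper `serialize_with_declaration` are assumptions for this example.

```python
import xml.etree.ElementTree as ET


def serialize_with_declaration(text, encoding="utf-8"):
    """Build a one-element document and serialize it with an XML declaration."""
    root = ET.Element("p")
    root.text = text
    return ET.tostring(root, encoding=encoding, xml_declaration=True)


document = serialize_with_declaration("caf\u00e9")
first_line = document.split(b"\n", 1)[0]    # the XML declaration itself
round_trip = ET.fromstring(document).text   # the parser honours the declaration
```

The first line of the output is the declaration naming the encoding, and parsing the bytes again gives back the original text.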

Using the correct character encoding is all about usability, both for designers and for the users who view the content.
