What are diacritics?
Have you ever noticed a funky symbol in your catalog that looks like a diamond with a question mark in the middle? Here’s an example:
Gabriel Garc�ia M�arquez
Are you wondering what this symbol means and why it appears in your records? This is a diacritic behaving badly. Instead of the diamond with the question mark, you are probably supposed to be seeing an acute accent, an umlaut, or any number of other diacritics (marks most commonly found in non-English languages).
If you see these diacritics gone wrong in your catalog, it probably happened when the record was imported.
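If you're curious what's happening under the hood, here is a minimal Python sketch (not part of Koha, just an illustration) of how that symbol, the Unicode replacement character U+FFFD, gets produced when bytes saved in one encoding are read as another:

```python
# In MARC-8, a combining accent is stored *before* the letter it modifies,
# so "García" lands on disk as the bytes b"Garc\xe2ia" (0xE2 = acute accent).
marc8_bytes = b"Garc\xe2ia M\xe2arquez"

# Reading those same bytes as if they were UTF-8 fails on the 0xE2 byte, and
# the decoder substitutes U+FFFD -- the diamond with the question mark.
print(marc8_bytes.decode("utf-8", errors="replace"))
# -> Garc�ia M�arquez
```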
Cataloging
The problem above arises when you export records from OCLC in one encoding and then import them into Koha as another. The encoding must be the same on both ends. There are a couple of different options available for character encoding; the two most commonly used in Koha are:
- MARC-8
- UTF-8
To find out more about these character encodings, see the resources section below.
When you import records into Koha from Tools > Stage MARC Records for Import, you can choose what type of encoding to use: MARC-8 or UTF-8. In Koha, UTF-8 is the default import encoding.
Below is a screenshot of importing records into Koha, where the encoding option lives:

The Koha manual has more info on importing MARC records.
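Not sure what encoding a file of records uses before you stage it? One quick check (a rough sketch, not a Koha feature) is to look at position 09 of the MARC leader, which is "a" for UTF-8/Unicode records and blank for MARC-8. Here's what that looks like in Python, using a hypothetical file name:

```python
def leader_encoding(path):
    """Report which encoding the MARC leader claims the file uses."""
    with open(path, "rb") as f:
        leader = f.read(24)              # the MARC leader is the first 24 bytes
    # Leader position 09 is 'a' for UCS/Unicode (UTF-8) and blank for MARC-8.
    return "UTF-8" if leader[9:10] == b"a" else "MARC-8"

print(leader_encoding("oclc_export.mrc"))  # hypothetical file name
```

Keep in mind the leader only reports what the record claims; a mislabeled file can still cause trouble.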
Next, double-check your OCLC export settings and verify that the default export encoding is also UTF-8. To modify your export settings in OCLC and get to the Export Options screen:
- On the General tab, click Admin.
- At the Preferences screen, click Export Options.
- When you're done modifying these preferences, select "Save My Default" so this encoding is used every time you export.
Please see the OCLC manual for more info on exporting MARC records.
Remember: the important thing is that the encoding matches! If it’s UTF-8 coming out of OCLC, it also needs to be UTF-8 going into Koha. You must use the same encoding all through the pipeline. Otherwise, it’s like you’re asking your computer to use a French dictionary to translate Russian words, and the � symbol peppered throughout your catalog can be interpreted as, “What IS this character, and what am I supposed to do with it?!”
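The mismatch bites in the other direction too. Here's another tiny sketch of the "wrong dictionary" problem: the same UTF-8 bytes read back correctly only when the reader also uses UTF-8 (Latin-1 stands in here for "any other encoding"):

```python
utf8_bytes = "Márquez".encode("utf-8")   # the name stored as UTF-8 bytes

print(utf8_bytes.decode("utf-8"))        # Márquez  -- encodings match
print(utf8_bytes.decode("latin-1"))      # MÃ¡rquez -- wrong "dictionary"
```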
If you are seeing bad diacritics and your library does not get records from OCLC, check with the vendor you receive records from and find out what encoding they use when sending your records.
If your data contains special characters or diacritics, make sure your file is encoded in UTF-8. Otherwise, the special characters will not be imported correctly.
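If you want to verify that before staging, a simple sketch like this will tell you whether a file decodes cleanly as UTF-8 (again, the file name is hypothetical):

```python
def is_valid_utf8(path):
    """Return True if the whole file decodes cleanly as UTF-8."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(is_valid_utf8("vendor_records.mrc"))  # hypothetical file name
```

One caveat: a MARC-8 file that happens to contain only plain ASCII (no diacritics at all) will also pass this test, since ASCII is a subset of UTF-8.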
Resources
What is MARC-8?
The MARC-8 encoding standard was developed in 1968 specifically for libraries, alongside the first use of the MARC format.
What is UTF-8?
Much later, in 1993, the UTF-8 encoding standard was released. UTF-8 supports every character in the Unicode character set, which includes over 110,000 characters covering 100 scripts. UTF-8 supports far more characters than MARC-8 and has become the dominant character encoding for the internet.
How Do I Prevent Z39.50 Records from Coming in with Bad Diacritics?
As explained above, bad diacritics pop up when a record encoded in one format is brought into Koha as a different encoding.
Fortunately, in a Z39.50 search, you can preview the MARC record and see that an encoding mismatch has occurred, as in this example Pokémon graphic novel:
Take note of which Z39.50 target this record came from.
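If you've already downloaded or staged some of these records, you can also spot the mismatch programmatically: any U+FFFD replacement character in the decoded text is a giveaway. A small sketch, with a hypothetical preview line:

```python
def has_bad_diacritics(record_text: str) -> bool:
    """True if the text contains the U+FFFD replacement character (�)."""
    return "\ufffd" in record_text

# Hypothetical preview line from a Z39.50 result
preview = "100 1# $a Garc\ufffdia M\ufffdarquez"
if has_bad_diacritics(preview):
    print("Encoding mismatch -- check this target's Encoding setting.")
```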
Under Koha Administration > Additional Parameters, select Z39.50/SRU servers. You should now see a list of your current Z39.50 targets. Find the target whose records had bad diacritics, and under the Actions menu in the far-right column of the results, choose Edit. About halfway down the list of configurations is Encoding.

Most targets' encoding will be either utf-8 or MARC-8. If the target was set to utf-8 when you spotted the bad diacritics, switch it to MARC-8 and save; if it was set to MARC-8, switch it to utf-8.
Navigate back to your Z39.50 search and look up the same record. The diacritics should now display and import correctly.