Tag Archives: collation

SQL Server’s Binary Collations

I was reading up on database engine Always Encrypted feature while preparing for 70-473 exam, and bump into these binary collations which I somehow never heard of before.

While configure Always Encrypted one of the choices you have to made is whether to use Deterministic or Randomized encryption (and you have to know the differences very well for the exam). One of the caveats when using Deterministic encryption is that it have to use a column collation with a binary2 sort order for character columns. More specifically documentation states that: Deterministic encryption requires a column to have one of the binary2 collations. If you will be using SSMS Encrypt Columns wizard it will be converting your column collation into binary2 case sensitive collation.

These statement required me to investigate the topic of binary collation a little bit.

First of all you may have different collation settings on a SQL Server instance level (i.e. on its system databases) on your databases and on specific columns and expression level.

To list all the collations available on your instance of SQL Server you can issue the following SQL statement:

On Azure SQL Database you will get back 3955 rows or possible collations. Understanding collation requires you to understand set of related terms, such as Collation, Locale, Code page, Sort order.  You should also know  that there are three major sets of collations available to you:

  • Windows collations
  • Binary collations
  • SQL Server collations

These collation groups sort data differently. In the past my standard answer/explanation about Windows VS SQL collation was that Windows one is more frequently updated, more compatible and hence more preferable over SQL one. Technically speaking it is more about how sorting works, but as per MSFT documentation: “SQL Server supports supports a limited number (<80) of collations called SQL Server collations which were developed before SQL Server supported Windows collations. SQL Server collations are still supported for backward compatibility, but should not be used for new development work.” So what I was saying/writing seems to be correct.

I won’t be covering all the details and differences related to these sets of collations as I only want to focus here on Binary collations which are requirement for Always Encrypted Deterministic encryption.

Binary collations sort data based on the sequence of coded values that are defined by the locale and data type. They are case sensitive. A binary collation in SQL Server defines the locale and the ANSI code page that is used. This enforces a binary sort order. Because they are relatively simple, binary collations help improve application performance. For non-Unicode data types, data comparisons are based on the code points that are defined in the ANSI code page. For Unicode data types, data comparisons are based on the Unicode code points. For binary collations on Unicode data types, the locale is not considered in data sorts. For example, Latin_1_General_BIN and Japanese_BIN yield identical sorting results when they are used on Unicode data.

There are two types of binary collations in SQL Server; the older BIN collations and the newer BIN2 collations. In a BIN2 collation all characters are sorted according to their code points. In a BIN collation only the first character is sorted according to the code point, and remaining characters are sorted according to their byte values. (Because the Intel platform is a little endian architecture, Unicode code characters are always stored byte-swapped.)

So for  Always Encrypted Deterministic encryption any of the collations returned by the query below will do:

This leaves you with 133 collations to choose from 🙂 Generally speaking BIN and BIN2 collations use different sorting algorithms and BIN2 is more preferable in general, not only for Always Encrypted Deterministic encryption columns. Another interesting question is why we have BIN/BIN2 collations for different languages? Like Arabic_BIN2, French_BIN2 etc. The reason is that each of those uses different code page for encoding the characters sorting in the varchar type so linguistic collation type is very important and comes into play only for varchar data as this will be sorted based on the language selected (this is not applicable to nvarchar where it has no difference).

All collations which are not binary collation are linguistic collations. For example, Latin1_General_CI_AS is a linguistic collation and it uses a sorting algorithm compatible with several of English language and many Western European languages. Please don’t be confused by the name of Latin1_General, as it actually can sort all Unicode characters defined in Unicode 3.2 characters set and it can also sort many other languages correctly as well (if the language has no sorting conflict with the latin1_general sorting rule).

Binary collations have better performance than linguistic collations, and that is the main advantage of using then. A binary collation is always case sensitive and accent sensitive. BIN2 collation is generally more preferable than BIN collation.

Facebooktwittergoogle_plusredditpinterestlinkedinmail

K2 Five collation requirement changes

In the past I’ve wrote some blog post promoting documented K2 collation requirement. With release K2 Five this requirement changed and I guess I have to mention it on my blog as essentially it makes my old blog posts about required K2 database collation incorrect.

So recently (with release of K2 Five) all K2 documentation was updated and states that our requited collation now is “SQL_Latin1_General_CP1_CI_AS“. Where to find this information?

In K2 Installation and Configuration Guide you can find “SQL and Reporting Services Operational Requirements” section which says that:

– Case-sensitive databases are NOT supported.

– The following collation setting is required for the K2 database: SQL_Latin1_General_CP1_CI_AS

And as usually you can find the same information in K2 Product Compatibility, Integration and Support matrix:

What is good about this change is that SQL_Latin1_General_CP1_CI_AS will be default collation if you installing SQL Server on top of Windows Server which has been installed with US location/language settings – so at least some people will meet this requirement by accident.

What is bad is that collation requirement was just silently changed in documentation with release of K2 Five without any explanations. According to my current knowledge collation which was mentioned in documentation before was a requirement only for pre-4.6.11 versions of K2. So in case you are doing new installation of K2 4.6.11 or newer make sure that your SQL Server instance provisioned with SQL_Latin1_General_CP1_CI_AS collation.

Somewhat mixed blessing is change which was made to K2 Five installer to enforce this collation: what it does at the moment is just enforces this collation on K2 DB level while ignoring SQL Server instance level collation. So in case you not provisioned SQL Server instance with the right collation you will get an errors post installation and will be forced to change SQL Server instance level collation to fix this. That’s something that I hope will be corrected in K2 installer in the future so that it warns you about wrong instance level collation issues before you start your installation.

Conclusion: read vendor documentation carefully before doing your installation even if you did it many times before 🙂

Update (Feb 2018): Current guidelines are:

– For upgrade installations be sure to make SQL server instance and K2 DB collation are the same (in case you moving K2 DB onto new server)

– For clean installs I would suggest stick to Latin1_General_CP1_CI_AS (especially if you do 4.7 install). I will probably write more detailed post to explain this position later

Facebooktwittergoogle_plusredditpinterestlinkedinmail

K2 database collation requirement – finally we have it stated in the right place

If you read my old blog post on Installing SQL Server instance for K2 blackpearl you probably aware that K2 database requires very specific SQL Server instance collation in case you care to be in supported state. The main problem was that this requirement has been mentioned in quite obscure place which no sane person even reach in endless quest for knowledge 🙂 That original requirement location was quite close to that joke about ginormous EULA and vendor injecting a sentence which says: “If you really read through this EULA till this place please give us a call to claim your 1000$ reward”…

There is no doubts that K2 is very flexible and versatile platform but when it comes to K2 database collation you only have a Hobson’s choice. Many product vendors have similar requirements for their back-end databases and there is nothing wrong with that, but problem we had in case of K2 was that a lot of people failed to realize that only one collation is required and supported, i.e. Hobson’s choice was poorly presented to them 🙂

Finally K2 blackpearl compatibility matrix was updated this month to reflect this requirement and I really hope this will clarify K2 collation requirement once and for all. We all agree that at least compatibility matrix is something we all read before rushing into installation or upgrade, right? 😉

So navigate to K2 blackpearl Compatibility Matrix page > SQL Server section notes and… lo and behold this:

blackpearl collation requirement

I hope this will help people to avoid collation related issues from now on and we are all clear that:

Latin1_General_CI_AS collation is required on the SQL server instance hosting the K2 database

Facebooktwittergoogle_plusredditpinterestlinkedinmail

Changing SQL collation for deployed instance without reinstall (almost)

IMPORTANT UPDATE: If you use K2 4.6.11 or newer you have to have SQL_Latin1_General_CP1_CI_AS collation on a SQL Server instance which hosts K2 DB (more details)

Collation matters. Period. If you not sure just wait till you get an incident or issue which will teach you that it is. Some apps require you to have specific collation for their back end DB. For example for K2 you need to have  Latin1_General_CI_AS collation (see my earlier post on that _ ).

Nonetheless it can often be the case that your RTFM/Prerequisites_Check moment (I wonder if I can copyright/trademark this term? I heard that Michael Buffer trademarked his “let’s get ready to rumble” phrase _ ) happens after SQL server instance has been already provisioned. For newly created (clean install) K2 database you just need to delete database which was created on instance with wrong collation and allow K2 Setup Manager to create it again after you deal with your instance collation. There is an MSDN article detailing that _ but I will outline the procedure below.

Aforementioned MSDN article says that you have to have installation media to perform this operation, but in fact you can get away with Setup Bootstrap folder which should present on your server in location similar to one below (may vary from version to version, see details _ )

C:\Program Files\Microsoft SQL Server\120\Setup Bootstrap\SQLServer2014

So steps are the following:

0. In case you have any non system databases attached to this instance take a note of database(s) file location, frequently it is just default SQL instance data folder common to all databases. Save this path or paths to notepad. 

1. Locate setup bootstrap folder for your SQL server installation, open CMD or PS window (don’t forget “dot net sourcing” thing for the latter):

2. Execute the following command:

Once this operation completes  master, model, msdb, and tempdb system databases are rebuilt (dropped and re-created) with collation you specified in the command. Obviously all your modifications to these databases are lost (if you made any) and essentially it is sort of reinstall/repair type of operation just much quicker and without need to wade through SQL Setup GUI.

3. In case there were non system databases attached to this SQL instance (see step 0) they all are detached now. See my other blog post explaining how to attach all the databases from specific folder

Facebooktwittergoogle_plusredditpinterestlinkedinmail

How to: quickly check SQL server collation settings

Just a quick note on how to quickly check Microsoft SQL server collation without wading through SQL Server Management Studio GUI. Go to CMD and execute the following:

\n\n\n[code language=”sql” light=”true”]\nsqlcmd -q "SELECT CONVERT (varchar, SERVERPROPERTY(‘collation’));"\n[/code]\n\n

Alternatively you may use sp_helpsort stored procedure, though when run via sqlcmd it gives a bit strange output if run with sqlcmd

\n\n\n[code language=”sql” light=”true”]\nsqlcmd -q "EXECUTE sp_helpsort;"\n[/code]\n\n

On the boxes with SQL server installed path to sqlcmd executable added to PATH variable so you can fire it off without specifying full path. I guess you may also check corresponding key in registry which should exist there I believe (at least for server collation) 🙂

Also you may view collation settings on databases:

\n\n\n[code language=”sql” light=”true”]\nsqlcmd -q "SELECT name, collation_name FROM sys.databases;"\n[/code]\n\n

Reference: MSDN: View Collation Information

Facebooktwittergoogle_plusredditpinterestlinkedinmail