Azure AI Language – Language Detection of Content in Optimizely CMS

In this article, I show how the language detection feature of the Azure AI Language service can identify the language of the content in an Optimizely CMS content type when that content is published. This lets us verify whether the content matches the language branch of the page being published.

Language detection is a cloud-hosted feature of Azure AI Language, which uses machine learning models to help developers build applications that work with written text. Given a piece of text, it returns the name of the detected language, its ISO 639-1 code and a confidence score. It covers a wide range of languages, variants, dialects, and regional and cultural languages.
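To make the request and response concrete, the sketch below builds a request body for the Azure AI Language REST endpoint (`POST {endpoint}/language/:analyze-text` with `kind` set to `LanguageDetection`). It then parses the `detectedLanguage` fields (`name`, `iso6391Name`, `confidenceScore`) out of a response. This is a minimal sketch that assumes that documented JSON shape. The NuGet package may call the service differently, for example through the .NET SDK. The helper names `build_request`, `parse_response` and `format_result` are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectedLanguage:
    """One detection result: language name, ISO 639-1 code and confidence score."""
    name: str
    iso6391_name: str
    confidence_score: float


def build_request(texts: list[str]) -> dict:
    """Build a LanguageDetection request body with one document per text (ids start at "1")."""
    return {
        "kind": "LanguageDetection",
        "analysisInput": {
            "documents": [
                {"id": str(i), "text": text} for i, text in enumerate(texts, start=1)
            ]
        },
    }


def parse_response(body: str) -> list[DetectedLanguage]:
    """Extract the detected language of every successfully analysed document."""
    payload = json.loads(body)
    return [
        DetectedLanguage(
            name=doc["detectedLanguage"]["name"],
            iso6391_name=doc["detectedLanguage"]["iso6391Name"],
            confidence_score=doc["detectedLanguage"]["confidenceScore"],
        )
        for doc in payload["results"]["documents"]
    ]


def format_result(result: DetectedLanguage) -> str:
    """Mirror the console line format used in this article (":g" prints 1.0 as "1")."""
    return (
        f"Language: {result.name}, ISO-6391: {result.iso6391_name}, "
        f"Confidence Score: {result.confidence_score:g}"
    )
```

Documents the service could not analyse are reported under `results.errors`. This sketch simply ignores them.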

To add language detection to Optimizely CMS, install the “Patel.AzureAILanguage.Optimizely” NuGet package. It is available from both the Optimizely NuGet feed and the public NuGet feed.

Once the package is installed and the required setup is complete, activate the functionality by adding a boolean property decorated with the [TextAnalyticsAllowed] attribute to the Start Page content type. This is described in the TextAnalyticsAllowed Documentation.

Next, create one or more string properties decorated with the [DetectLanguage] attribute. These can be added to any Optimizely CMS content type that implements IContent. Further details are available in the Language Detection Documentation.

With these properties in place, the CMS detects the language of the content when it is published. The two scenarios below show the language detection feature being triggered.
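The publish-time flow described above can be sketched as follows. Detection runs only when the start page's [TextAnalyticsAllowed] property is enabled, and only the non-empty values of properties marked [DetectLanguage] are sent for analysis. This is a language-neutral illustration, not the package's implementation. `ContentItem`, `detect_on_publish` and the injected `detect` callable are hypothetical stand-ins. In practice `detect` would call Azure AI Language; here it can be stubbed for testing.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ContentItem:
    """Hypothetical stand-in for an Optimizely content item being published."""
    language_branch: str                 # e.g. "en"
    properties: dict[str, str]           # property name -> text value
    # Names of the properties that carry the [DetectLanguage] attribute.
    detect_language_properties: set[str] = field(default_factory=set)


def detect_on_publish(
    item: ContentItem,
    text_analytics_allowed: bool,              # value of the start page's [TextAnalyticsAllowed] property
    detect: Callable[[list[str]], list[str]],  # returns one ISO 639-1 code per text
) -> list[str]:
    """Return the detected language code of each flagged, non-empty property."""
    if not text_analytics_allowed:
        return []  # feature switched off on the start page
    # Sort the property names so the order of results is deterministic.
    texts = [
        item.properties[name]
        for name in sorted(item.detect_language_properties)
        if item.properties.get(name, "").strip()
    ]
    return detect(texts) if texts else []
```

Injecting the detector keeps the gating logic separate from the network call. Empty properties are skipped so no request is spent on them.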

In the first scenario, a single language that differs from the current language branch (English) is identified.

Response from the API, as logged to the console:

Language Detection operation has completed
Language: French, ISO-6391: fr

In this example, French text was entered on a page in the English language branch. This triggers an error message in the CMS that alerts the editor to the language mismatch.
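The single-language check reduces to comparing the detected ISO 639-1 code with the page's language branch. The sketch below assumes branch codes may be region-specific (such as "en-GB"), so it compares only the neutral language part. The function name and the wording of the error message are hypothetical and not taken from the package.

```python
from typing import Optional


def check_single_language(branch_code: str, detected_name: str, detected_code: str) -> Optional[str]:
    """Return an error message if the detected language differs from the branch, else None."""
    # "en-GB" -> "en", so a region-specific branch still matches an ISO 639-1 code.
    branch = branch_code.split("-")[0].lower()
    if detected_code.lower() == branch:
        return None
    return (
        f"The content appears to be in {detected_name} ({detected_code}), "
        f"but this page is in the '{branch_code}' language branch."
    )
```

For the scenario above, `check_single_language("en", "French", "fr")` returns a message naming French, while an English result returns `None` and publishing proceeds.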

In the second scenario, multiple languages that differ from the current language branch (English) are identified.

Response from the API, as logged to the console:

Language Detection operation of a page has completed
Number of properties used for Language Detection is : 8
Language: English, ISO-6391: en, Confidence Score: 0.98
Language: English, ISO-6391: en, Confidence Score: 0.96
Language: English, ISO-6391: en, Confidence Score: 0.99
Language: French, ISO-6391: fr, Confidence Score: 1
Language: English, ISO-6391: en, Confidence Score: 0.96
Language: Italian, ISO-6391: it, Confidence Score: 1
Language: English, ISO-6391: en, Confidence Score: 0.96
Language: German, ISO-6391: de, Confidence Score: 0.99

In this example, the language detection feature has found French, Italian, and German text on a page in English. An error message is displayed in the CMS to inform the user of this issue.
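When several properties are analysed, the per-property results have to be reduced to the distinct languages that do not match the branch. The sketch below does this in the order the languages were first detected. The optional `min_confidence` threshold is a hypothetical addition; the article does not say whether the package filters on confidence. `Detection` and `foreign_languages` are illustrative names.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    name: str
    iso6391_name: str
    confidence_score: float


def foreign_languages(
    detections: list[Detection], branch_code: str, min_confidence: float = 0.0
) -> list[str]:
    """List distinct non-branch languages, in the order they were first detected."""
    branch = branch_code.split("-")[0].lower()  # "en-GB" -> "en"
    found: list[str] = []
    for d in detections:
        # Skip results in the branch language and (optionally) low-confidence results.
        if d.iso6391_name.lower() == branch or d.confidence_score < min_confidence:
            continue
        if d.name not in found:
            found.append(d.name)
    return found
```

Fed the eight results from the console output above with branch "en", this returns `["French", "Italian", "German"]`, which are the three languages the CMS reports.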