Saturday, August 01, 2020

SOLR: Non-English (Non-Latin) Characters in Field Names

The SOLR documentation states the following requirement for field names:


name
The name of the field. Field names should consist of alphanumeric or underscore characters only and not start with a digit. . . .
While working on a dictionary website, I created JSON documents whose field names were in Hindi. After indexing the data, I was surprised to see that every character of those field names had been replaced with an underscore, e.g. the field name शब्द became ____. Going by the SOLR documentation, शब्द should have been allowed as a field name, since it consists only of alphabetic characters.

It looks like the SOLR developers assumed that only the 26 letters of the Latin script count as alphabetic characters. Stating this assumption explicitly in the documentation would have been helpful.

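On closer inspection of the solrconfig.xml file, I found the following configuration, which replaces anything in a field name that is not a Latin alphanumeric character, underscore, hyphen, or dot with an underscore while indexing. The cause is that \w in a Java regular expression matches only ASCII letters, digits, and underscore by default.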

<updateProcessor class="solr.FieldNameMutatingUpdateProcessorFactory" name="field-name-mutating">
   <str name="pattern">[^\w-\.]</str>
   <str name="replacement">_</str>
</updateProcessor>
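
To see the effect outside SOLR, here is a minimal Java sketch that applies the same pattern to a field name. It assumes the processor behaves like a plain regex replace-all; the class name DefaultPatternDemo and the helper mutate are hypothetical and not part of SOLR. The Hindi field name from this post is written with Unicode escapes so the source compiles regardless of file encoding.

```java
public class DefaultPatternDemo {
    // Pattern copied from the solrconfig.xml snippet above.
    static final String DEFAULT_PATTERN = "[^\\w-\\.]";

    // Hypothetical helper: imitates the processor with String.replaceAll.
    // Assumption: SOLR applies the pattern as an ordinary replace-all.
    static String mutate(String fieldName) {
        return fieldName.replaceAll(DEFAULT_PATTERN, "_");
    }

    public static void main(String[] args) {
        // The Hindi field name from the post: four UTF-16 code units.
        String hindi = "\u0936\u092C\u094D\u0926";

        // By default, \w covers only [a-zA-Z_0-9], so each of the four
        // Devanagari code units is replaced individually.
        System.out.println(mutate(hindi));      // prints ____

        // ASCII names pass through untouched.
        System.out.println(mutate("word_1"));   // prints word_1
    }
}
```

The four underscores match what showed up in the index: one per code unit of the original name.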

Changing the pattern for FieldNameMutatingUpdateProcessorFactory to something like the following allows SOLR to accept non-Latin characters in field names.

   <str name="pattern">[\s]</str>

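[\s] replaces only whitespace, which is too permissive for real-life use: every other character, including punctuation and symbols, passes through unchanged. The pattern should be restricted further, to the limited set of characters that one intends to allow in field names.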
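
As one possible way to tighten it, the Java sketch below compares the whitespace-only pattern with a stricter one built from Unicode categories. The stricter pattern is my own assumption, not something from the SOLR documentation: it keeps letters (\p{L}), combining marks (\p{M}), digits (\p{N}), underscore, hyphen, and dot, and replaces everything else. The combining marks matter for Devanagari, because the virama in शब्द is a mark, not a letter. The class name FieldNamePatternDemo and the helper mutate are hypothetical.

```java
import java.util.regex.Pattern;

public class FieldNamePatternDemo {
    // The permissive pattern from this post: replaces whitespace only.
    static final Pattern WHITESPACE_ONLY = Pattern.compile("[\\s]");

    // Hypothetical stricter pattern (an assumption, not a SOLR default):
    // replace anything that is not a Unicode letter, combining mark,
    // digit, underscore, hyphen, or dot.
    static final Pattern UNICODE_WORD =
            Pattern.compile("[^\\p{L}\\p{M}\\p{N}_\\-.]");

    // Hypothetical helper: replace every match with an underscore.
    static String mutate(Pattern pattern, String fieldName) {
        return pattern.matcher(fieldName).replaceAll("_");
    }

    public static void main(String[] args) {
        // The Hindi field name from the post, as Unicode escapes.
        String hindi = "\u0936\u092C\u094D\u0926";
        String messy = "price$ (USD)";

        // Both patterns leave the Hindi name unchanged.
        System.out.println(mutate(WHITESPACE_ONLY, hindi).equals(hindi)); // prints true
        System.out.println(mutate(UNICODE_WORD, hindi).equals(hindi));    // prints true

        // Only the stricter pattern cleans up symbols and brackets.
        System.out.println(mutate(WHITESPACE_ONLY, messy)); // prints price$_(USD)
        System.out.println(mutate(UNICODE_WORD, messy));    // prints price___USD_
    }
}
```

If a pattern like this suits your data, the regex (with single backslashes) would go into the pattern element of the update processor; test it against your own field names before relying on it.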
