Channel: SQL Server Database Engine forum

[FullText] Indexer seems to cut up some words at accent position

I have a full-text catalog defined on 2 VARCHAR columns of a single table. It works well except for a few words where indexing seems to cut the term at the position of an accented character. It doesn't happen for all accented words, only a minority of them. As a result, querying on these exact terms returns no results, while querying on the accent-stripped fragments does.

select mycolumn from mytable where Id = 2028

mycolumn
------------------------------
<P>Anaïs et Alizé</P>

select * from sys.dm_fts_index_keywords_by_document(DB_ID(), OBJECT_ID('mytable')) where display_term like 'Anaïs%' or display_term like 'Alizé%'

keyword    display_term    column_id    document_id    occurrence_count
--------------------------------------------------------------------------------------------------
(no results)

select * from sys.dm_fts_index_keywords_by_document(DB_ID(), OBJECT_ID('mytable')) where display_term like 'Ana%' or display_term like 'Aliz%'

keyword    display_term    column_id    document_id    occurrence_count
-------------------------------------------------------------------------------------------------------------------------
0x0061006C0069007A    aliz    22    20259    1    <== note the truncated word here
0x0061006E0061    ana    22    20259    1    <== and here

I tried both repopulating the index manually and rebuilding it, to no avail.
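A manual repopulation and a catalog rebuild would typically look something like the sketch below. The catalog name `mycatalog` is a placeholder assumed for illustration (the real name is not given here), and the accent sensitivity option simply restates the setting listed in the environment details.

```sql
-- Assumption: 'mycatalog' is a hypothetical name standing in for the real catalog.

-- Restart a full population of the full-text index on the table.
ALTER FULLTEXT INDEX ON mytable START FULL POPULATION;

-- Rebuild the whole catalog, explicitly keeping it accent-insensitive.
ALTER FULLTEXT CATALOG mycatalog REBUILD WITH ACCENT_SENSITIVITY = OFF;

-- Check that population has finished (0 = idle) and confirm the
-- accent sensitivity the catalog actually ended up with (0 = insensitive).
SELECT FULLTEXTCATALOGPROPERTY('mycatalog', 'PopulateStatus')    AS populate_status,
       FULLTEXTCATALOGPROPERTY('mycatalog', 'AccentSensitivity') AS accent_sensitive;
```

Querying sys.dm_fts_index_keywords_by_document only after populate_status returns to 0 avoids reading a half-built index.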

The odd thing is that we have a similar database on another server, with the same data and exactly the same full-text configuration, and there the complete words do appear among the terms referenced by the index:

select * from sys.dm_fts_index_keywords_by_document(DB_ID(), OBJECT_ID('mytable')) where display_term like 'Anaïs%' or display_term like 'Alizé%'

keyword    display_term    column_id    document_id    occurrence_count
---------------------------------------------------------------------------------------------------------------------
0x0061006C0069007A0065    alize    22    20259    1
0x0061006E006100690073    anais    22    20259    1
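One way to narrow this down might be to run the word breaker directly on both servers with sys.dm_fts_parser and compare the output. This is only a sketch: it assumes 1036 is the LCID of the French word breaker in use, and it needs sysadmin membership. Note that it parses a Unicode literal, while the index is populated from VARCHAR data, so a correct result here would not rule out a code page conversion problem during population.

```sql
-- Assumptions: LCID 1036 = French; stoplist_id 0 = system stoplist;
-- last argument 0 = accent-insensitive, matching the catalog setting.
SELECT keyword, display_term, special_term, occurrence
FROM sys.dm_fts_parser(N'"Anaïs et Alizé"', 1036, 0, 0);
```

If the faulty server returns `ana` and `aliz` here as well, the word breaker itself is the likely suspect. If it returns `anais` and `alize`, the difference is more likely in how the column data reaches the indexer.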


Environment details:

- Microsoft SQL Server 2008 (SP3) - 10.0.5828.0 (X64)  Standard Edition (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1)
- Server collation: SQL_Latin1_General_CP1_CI_AI
- Catalog accent sensitive: false
- Language for word breaker: French for both columns
- Catalog track changes: Automatic
- Catalog stoplist: SYSTEM
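To check that the two servers really share these settings, the full-text metadata can be dumped on each side and compared. The query below is a minimal sketch using the catalog views available in SQL Server 2008; the second statement assumes LCID 1036 (French) and lists the registered word breaker component, whose version and path may differ between machines.

```sql
-- Per-column full-text settings, plus index and catalog options, for mytable.
SELECT c.name                      AS column_name,
       c.collation_name,
       ic.language_id,
       l.name                      AS language_name,
       i.change_tracking_state_desc,
       i.stoplist_id,
       cat.name                    AS catalog_name,
       cat.is_accent_sensitivity_on
FROM sys.fulltext_index_columns AS ic
JOIN sys.columns AS c
  ON c.object_id = ic.object_id AND c.column_id = ic.column_id
JOIN sys.fulltext_indexes AS i
  ON i.object_id = ic.object_id
JOIN sys.fulltext_catalogs AS cat
  ON cat.fulltext_catalog_id = i.fulltext_catalog_id
LEFT JOIN sys.fulltext_languages AS l
  ON l.lcid = ic.language_id
WHERE ic.object_id = OBJECT_ID('mytable');

-- Assumption: 1036 is the French LCID. Lists the word breaker registered for it.
EXEC sp_help_fulltext_system_components 'wordbreaker', 1036;
```

The column collation is included because the server collation listed above is not necessarily the one applied to the indexed VARCHAR columns.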

Has anyone experienced anything like this?

Thanks for your help.





