I have a full-text catalog defined on 2 VARCHAR columns of a single table. It works well except for a few words where the indexing seems to cut the term at the position of an accented character. It doesn't happen for all accented words, only for a minority of them. As a result, querying on these exact terms returns no results, but querying on the accent-stripped fragments does work.
select mycolumn from mytable where Id = 2028
mycolumn
------------------------------
<P>Anaïs et Alizé</P>
select * from sys.dm_fts_index_keywords_by_document(DB_ID(), OBJECT_ID('mytable')) where display_term like 'Anaïs%' or display_term like 'Alizé%'
keyword                 display_term  column_id  document_id  occurrence_count
----------------------  ------------  ---------  -----------  ----------------
(no results)
select * from sys.dm_fts_index_keywords_by_document(DB_ID(), OBJECT_ID('mytable')) where display_term like 'Ana%' or display_term like 'Aliz%'
keyword                 display_term  column_id  document_id  occurrence_count
----------------------  ------------  ---------  -----------  ----------------
0x0061006C0069007A      aliz          22         20259        1                 <== note the truncated word here
0x0061006E0061          ana           22         20259        1                 <== and here
I tried both repopulating the index manually and rebuilding it, to no avail.
The odd thing is that we have a similar database on another server, with the same data and exactly the same full-text configuration, and there the complete words do appear among the terms referenced by the index:
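For reference, the repopulation and rebuild were done with statements along these lines. This is only a sketch: the catalog name MyCatalog is a placeholder, not the real name.

```sql
-- Placeholder name: MyCatalog stands in for the actual full-text catalog.

-- Restart a full population of the full-text index on the table.
ALTER FULLTEXT INDEX ON mytable START FULL POPULATION;

-- Rebuild the whole catalog, keeping it accent-insensitive.
ALTER FULLTEXT CATALOG MyCatalog REBUILD WITH ACCENT_SENSITIVITY = OFF;
```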
select * from sys.dm_fts_index_keywords_by_document(DB_ID(), OBJECT_ID('mytable')) where display_term like 'Anaïs%' or display_term like 'Alizé%'
keyword                 display_term  column_id  document_id  occurrence_count
----------------------  ------------  ---------  -----------  ----------------
0x0061006C0069007A0065  alize         22         20259        1
0x0061006E006100690073  anais         22         20259        1
Environment details:
- Microsoft SQL Server 2008 (SP3) - 10.0.5828.0 (X64) Standard Edition (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1)
- Server collation: SQL_Latin1_General_CP1_CI_AI
- Catalog accent sensitive: false
- Language for word breaker: French for both columns
- Catalog track changes: Automatic
- Catalog stoplist: SYSTEM
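One comparison that might help narrow this down is to run the same diagnostic queries on both servers and diff the output. The sketch below assumes that the French word breaker corresponds to LCID 1036 and that the login is a member of sysadmin (required by sys.dm_fts_parser). As far as I understand, sys.dm_fts_parser feeds the string straight to the word breaker, so it does not exercise whatever conversion is applied to the VARCHAR column content during indexing.

```sql
-- 1. How does the French word breaker (assumed LCID 1036) split the sample text?
--    Arguments: query string, LCID, stoplist id (0 = system stoplist),
--    accent sensitivity (0 = insensitive, matching the catalog setting).
SELECT keyword, display_term, special_term, source_term
FROM sys.dm_fts_parser(N'"Anaïs et Alizé"', 1036, 0, 0);

-- 2. Which word breaker components (path, version) are registered on this instance?
EXEC sp_help_fulltext_system_components 'wordbreaker';

-- 3. Which language is actually assigned to each full-text indexed column?
SELECT c.name AS column_name, fic.language_id
FROM sys.fulltext_index_columns AS fic
JOIN sys.columns AS c
  ON c.object_id = fic.object_id
 AND c.column_id = fic.column_id
WHERE fic.object_id = OBJECT_ID('mytable');
```

If the first query returns the full terms on both servers while the index still contains truncated ones, the difference would more likely lie in how the column data reaches the word breaker than in the word breaker itself.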
Has anyone experienced anything like this?
Thanks for your help