Had a “strange” problem last week that really stumped me for longer than it should have.
My colleague had run a query to list some data and was getting approximately 200,000 rows back. We were investigating why a unique constraint was being violated so I took his query and did a count distinct to try and find the duplicates. I got 30,000 distinct values.
This really confused me, I was expecting only a handfull of duplicates, if any. Then I noticed that the first query already WAS a distinct list !
select distinct col1_id, col2_id from tablea
select count(distinct col1_id + ‘-‘ + col2_id) from tablea
I concatenated the fields together so I am counting a single field, I included the separator so that I could differentiate 1-11 from 11-1.
See if you can spot what the problem is before you read on.
Datatype Conversion !
As an ex-SQL Trainer my mantra always was, and always has been ALWAYS do explicit datatype conversion.
SQLServer tries to derive the datatype from the first field, in this case both col1_id and col2_id are integers so it tries to do an arithmetic operation rather than the character conversion that I was expecting.
How on earth, can it convert ‘-‘ to an integer? ‘-0’ of course ! So what it is doing is col1_id + -0 + col2_id. I proved this with:
select distinct col1_id, col2_id, col1_id + ‘-‘ + col2_id
Which gave me:
10, 3, 7
Of course what I should have done was:
select convert(varchar, col1_id) + ‘-‘ + convert(varchar, col2_id) from tablea
Lesson learned AGAIN ! Always explicitly convert your datatypes !