Had a “strange” problem last week that really stumped me for longer than it should have.

My colleague had run a query to list some data and was getting approximately 200,000 rows back. We were investigating why a unique constraint was being violated so I took his query and did a count distinct to try and find the duplicates. I got 30,000 distinct values.

This really confused me, I was expecting only a handfull of duplicates, if any. Then I noticed that the first query already WAS a distinct list !

select distinct col1_id, col2_id from tablea

select count(distinct col1_id + ‘-’ + col2_id) from tablea

I concatenated the fields together so I am counting a single field, I included the separator so that I could differentiate 1-11 from 11-1.

See if you can spot what the problem is before you read on.

Datatype Conversion !

As an ex-SQL Trainer my mantra always was, and always has been ALWAYS do explicit datatype conversion.

SQLServer tries to derive the datatype from the first field, in this case both col1_id and col2_id are integers so it tries to do an arithmetic operation rather than the character conversion that I was expecting.

How on earth, can it convert ‘-’ to an integer? ‘-0′ of course ! So what it is doing is col1_id + -0 + col2_id. I proved this with:

select distinct col1_id, col2_id, col1_id + ‘-’ + col2_id

Which gave me:

10, 3, 7

Of course what I should have done was:

select convert(varchar, col1_id) + ‘-’ + convert(varchar, col2_id) from tablea

Lesson learned AGAIN ! Always explicitly convert your datatypes !

Share and Enjoy:
  • Digg
  • del.icio.us
  • Facebook
  • Google
  • Live
  • Reddit
  • Slashdot
  • StumbleUpon
  • Technorati
  • TwitThis
  • LinkedIn

One Response

subscribe to comments rss or leave a trackback
  1. Ross,

    Just reading that gave me a headache!! Hope you are all well.

    Lee

Leave A Comment

subscribe to our news feeds
Already a member, login below or join us here
USERNAME
PASSWORD