When a revision isn’t a revision
So I got the DTS package that will import the SAT files all finished and started importing files. I had one file from Oct. ‘05, two from Nov. ‘05, one from Dec. ‘05 and one from this month. After importing them, I ran a query against the table where the scores are stored and discovered that there were many duplicates.
I couldn’t understand what was going on until I started looking more closely at the files sent from College Board (the administrators of the SAT test). It seems that they include the same records in different files. In fact, the second disk from Nov. 05 was all duplicates that were supposed to be revision scores (i.e. there was a mistake so this one corrected it).
Here’s the kicker. According to the data map documentation for the SAT files, if a record is a revision, the letter “Z” is supposed to show up in a particular location/field. That field is one imported into the temporary table. If that code shows up, one of the steps in the DTS package updates the records in the final table instead of inserting that row. None of the records on the second disk from Nov. ‘05 were marked with the letter “Z”, indicating it was a revision. They simply just sent the corrected scores but didn’t flag them.
I ended up having to add another query to my DTS package that would delete records from the final score table if the student number and test date matched the records in the temporary table. Once I did that, I could import records and no duplicates showed up. The extra step didn’t affect the speed at which the files import. It still pulls in and stores over 1,000 records in a little over a two seconds.
28.Feb.06
General, Microsoft SQL
You can leave a response, or trackback from your own site.






















