Posted on March 13, 2017
Greedy Test Case Algorithm in a SQL Stored Proc
Here’s a straightforward problem: I have a table with a lot of fields in it (in this case, several tables — new Fact and Dimension tables in a star schema data warehouse, but, you know, any wide table will do).
I want to extract a few real world test records that exercise the entire table… a “covering set” of test cases… so if I have 100 columns, and record A has non-zero, non-null values in columns 1-50 and record B has good values in columns 51-100, then I only need to test those two records. How great is that?!
Ok, I should probably BUILD test cases, but I like using real data since there are always unseen business rules lurking about. Anyway, this is a pretty basic math problem: Select the minimal number of objects from the set of rows where the union of the viable (non-null, non-zero) columns across the subset covers all possible columns.
There’s some code below. Â Note that it is very bad code. Â I use the wrong scope on global temporary tables, I don’t do lots of checking of things, I generate SQL and execute it, I debug with print statements. Â It is also formatted poorly, but that’s actually more of a wordpress/plugin issue than anything else.
But it’s mine, and I love it…
To use, do something like:
exec etl.DQFindCompleteTestCases 'Facts', 'MyFactTable', 'MyFactTablePrimaryKey'
Which will output records with the MyFactTablePrimaryKey ID and the number of new columns handled by that corresponding record.
Note that the procedure will not go into an infinite loop if there is a column that has all NULLs or all zeros or something. Â Found out I needed that the hard way. Â Oops!
ALTER PROCEDURE [ETL].[DQFindCompleteTestCases](@SchemaName [nvarchar](128), @TableName [nvarchar](128), @IDFieldName [nvarchar](128)) /* Created 2017/02/22 by Chip Lynch Inputs: @DatabaseName.@SchemaName.@TableName identifies a table for which all numeric type columns will be searched for non-zero non-null values. A short (not provably minimal) list of records, identified by @IDFieldName, will be returned so that the collection of those records contains non-zero or non-null values in ALL of the available columns. ToDo: Add more robust schedule to rules when runtimes become onerous. */ AS BEGIN SET NOCOUNT ON /* Validate inputs */ IF (@SchemaName IS NULL OR @TableName IS NULL OR @IDFieldName IS NULL) BEGIN RAISERROR ('Incorrect parameters.', 15, 1) END /* Create tables and indexes as needed */ create table #TempGoodIDs ( CandidateID varchar(4000) ); select sac.name as column_name, sac.system_type_id, sac.user_type_id into #TempMyColumns from sys.all_columns sac join sys.all_objects sao on sac.object_id = sao.object_id join sys.schemas ss on sao.schema_id = ss.schema_id join sys.types st on sac.system_type_id = st.system_type_id where ss.name = @SchemaName and sao.name = @TableName and st.name in ('tinyint', 'smallint', 'int', 'real', 'money', 'float', 'decimal', 'numeric', 'smallmoney', 'bigint') and st.name != @IDFieldName -- Just in case ; DECLARE @ColumnsRemaining int; select @ColumnsRemaining = count(1) from #TempMyColumns; DECLARE @MySQL nvarchar(max); DECLARE @MyNotNullList varchar(max); DECLARE @MyNEZeroList varchar(max); DECLARE @MySumList varchar(max); DECLARE @MyGoodColumnList varchar(max); DECLARE @MyCandidateColumn varchar(128); DECLARE @MyGoodCount int; DECLARE @MyRecordCount int; WHILE(@ColumnsRemaining > 0) BEGIN -- Build SQL to select the record with the most <> 0 fields from the list: --select @MyNotNullList = stuff( ( -- select ' IS NOT NULL OR ' + tmc.column_name -- from #TempMyColumns tmc -- for XML PATH('') -- ), 1, 17, '') + ' IS NOT NULL' select @MyNEZeroList = stuff( ( select ' <> 0 OR ' + tmc.column_name from #TempMyColumns tmc for XML PATH(''), type ).value('.','nvarchar(max)'), 1, 9, '') + ' <> 0' print @MyNEZeroList select @MySumList = stuff( ( select '+ CASE WHEN ' + tmc.column_name + ' <> 0 THEN 1 ELSE 0 END ' from #TempMyColumns tmc for XML PATH(''), type ).value('.','nvarchar(max)'), 1, 1, '') if object_id('tempdb..##TempBestID') IS NOT NULL DROP TABLE ##TempBestID set @MySQL = 'select top 1 ' + @IDFieldName + ' as MyId, ' + @MySumList + ' as MyGoodColCount ' + ' into ##TempBestID ' + ' from ' + @SchemaName + '.' + @TableName + ' where (' + @MyNEZeroList + ') ' + --' AND ' + @MyNotNullList + ' order by 2 desc;' print len(@MySQL) print 'select top 1 ' + @IDFieldName + ' as MyId, ' + @MySumList + ' as MyGoodColCount ' print ' into ##TempBestID ' print ' from ' + @SchemaName + '.' + @TableName print ' where (' + @MyNEZeroList + ') ' --' AND ' + @MyNotNullList + print ' order by 2 desc;' exec sp_executesql @MySQL select * from ##TempBestID SELECT @MyRecordCount = @@ROWCOUNT if @MyRecordCount = 0 BEGIN print 'No new useful records found.' set @ColumnsRemaining = 0 END -- insert into #TempGoodIDs select cast(MyId as varchar) from ##TempBestID DECLARE MyColumnCursor CURSOR FOR select tmc.column_name from #TempMyColumns tmc OPEN MyColumnCursor FETCH NEXT FROM MyColumnCursor into @MyCandidateColumn WHILE (@@FETCH_STATUS = 0) AND (@ColumnsRemaining > 0) BEGIN -- print @MyCandidateColumn if object_id('tempdb..##TempColumnFound') IS NOT NULL DROP TABLE ##TempColumnFound -- select * from tempdb.sys.tables select @MySQL = 'select count(1) as MyCount into ##TempColumnFound from ' + @SchemaName + '.' + @TableName + ' where ' + @IDFieldName + ' = ' + cast(MyID as varchar) + ' and ' + @MyCandidateColumn + ' <> 0 ' + ' and ' + @MyCandidateColumn + ' IS NOT NULL ;' from ##TempBestID; -- This will exist at runtime exec sp_executesql @MySQL select @MyGoodCount = MyCount from ##TempColumnFound; if @MyGoodCount > 0 BEGIN select @ColumnsRemaining = @ColumnsRemaining - @MyGoodCount; -- Which should be an if statement... delete from #TempMyColumns where #TempMyColumns.column_name = @MyCandidateColumn -- print 'Deleting ' + @MyCandidateColumn + ' - Columns Remaining = ' + cast(@ColumnsRemaining as varchar); END FETCH NEXT FROM MyColumnCursor into @MyCandidateColumn END CLOSE MyColumnCursor DEALLOCATE MyColumnCursor END END