Is this possible to select a particular column in the way that select each word in each of the rows with its word count. For example, if I have a table like this:
ID Description
1 Hello World
2 Hello Buddy
3 Buddy List
I need to write a select statement that should return following result:
DECLARE @tmp TABLE (
ID INT IDENTITY(1,1)
, [Description] VARCHAR(50)
)
INSERT INTO @tmp ([Description])
SELECT 'Hello World' UNION ALL
SELECT 'Hello Buddy' UNION ALL
SELECT 'Buddy List'
SELECT ItemData AS Word, Count(*) AS [Count]
FROM @tmp AS t
CROSS APPLY dbo.fn_SplitStringToTable(t.[Description], ' ')
GROUP BY ItemData
Last edited by CVMichael; Dec 14th, 2011 at 12:13 PM.
Thanks a lot for the great code. I have following SQL with respect to your suggestion:
Code:
DECLARE @tmp TABLE (
ID INT IDENTITY(1,1)
, [Description] TEXT
)
INSERT INTO @tmp ([Description])
SELECT [Name] from IdeaSheet WHERE ForumID = 'F89BC50B-19F2-496C-9CF5-A5FB6DF0EABE' UNION ALL
SELECT [Description] from IdeaSheet WHERE ForumID = 'F89BC50B-19F2-496C-9CF5-A5FB6DF0EABE' UNION ALL
SELECT [Insight] from IdeaSheet WHERE ForumID = 'F89BC50B-19F2-496C-9CF5-A5FB6DF0EABE' UNION ALL
SELECT [Support] from IdeaSheet WHERE ForumID = 'F89BC50B-19F2-496C-9CF5-A5FB6DF0EABE'
SELECT ItemData AS Word, Count(*) AS [Count]
FROM @tmp AS t
CROSS APPLY dbo.fn_SplitStringToTable(t.[Description], ' ')
GROUP BY ItemData
When I run it, it gives an error The statement terminated. The maximum recursion 100 has been exhausted before statement completion.
I added OPTION (MAXRECURSION 32767); at the end of your function but still it reaches the maximum recursion limit.
In my table, the column [Name] is VARCHAR(50) and other three columns are NTEXT, so what should I use? I am using SQL Server 2005. I have very long strings in each of the NTEXT columns.
String functions don't work on (N)TEXT, and should not be used for string search & manipulation. This is why (N)VARCHAR(MAX) was introduced in SQL Server 2005.
I'll change the data type and it will take me a little longer because I need to change all the stored procedures as well. Lets make the SQL work on VARCHAR(50). When I run the following SQL, I am still getting the same error.
Code:
DECLARE @tmp TABLE (
ID INT IDENTITY(1,1)
, [Description] VARCHAR(50)
)
INSERT INTO @tmp ([Description])
SELECT [Name] from IdeaSheet WHERE ForumID = 'F89BC50B-19F2-496C-9CF5-A5FB6DF0EABE'
SELECT ItemData AS Word, Count(*) AS [Count]
FROM @tmp AS t
CROSS APPLY dbo.fn_SplitStringToTable(t.[Description], ' ')
GROUP BY ItemData
OK, I need a string that gives that error, so I can test on my computer and figure out where the problem is.
To start, can you select only a few records at once, to see if you get an error?
Something like
Code:
INSERT INTO @tmp ([Description])
SELECT TOP 10 [Name] from IdeaSheet WHERE ForumID = 'F89BC50B-19F2-496C-9CF5-A5FB6DF0EABE'
If that does not give you an error, then "TOP 20" (or more), until you get an error, then if you can check what string(s) you have that give you the error?
I don't know how else to do it because it's working fine on my side...
Or... what if you do something like:
Code:
INSERT INTO @tmp ([Description])
SELECT 'akldjf kjasdf kjasdfkl ashkd fk a rsdfklj asdfklj'
That can't be right... are you sure you did not make any modifications to my function?
Here it is again: (I added an extra IF statement to check string length before execution...)
Code:
ALTER FUNCTION [dbo].[fn_SplitStringToTable]
(
@DataList NVARCHAR(MAX)
, @Separator NVARCHAR(MAX)
)
RETURNS @tbl TABLE (
RowIndex INT PRIMARY KEY
, FromPos INT
, ToPos INT
, ItemData NVARCHAR(MAX)
)
AS
BEGIN
-- SELECT * FROM dbo.fn_SplitStringToTable('123,43,5465,6788,1231,111', ',')
DECLARE @LenSep INT
SET @LenSep = DATALENGTH(@Separator) / 2
IF @LenSep > 0 AND DATALENGTH(@DataList) > 0 BEGIN
; WITH res (RowIndex, FromPos, ToPos) AS (
SELECT CAST(1 AS INT) AS RowIndex
, CAST(1 AS INT) AS FromPos
, CAST(CHARINDEX(@Separator, @DataList + @Separator) AS INT) AS ToPos
UNION ALL
SELECT CAST(RowIndex + 1 AS INT) AS RowIndex
, CAST(res.ToPos + @LenSep AS INT) AS FromPos
, CAST(CHARINDEX(@Separator, @DataList + @Separator, ToPos + @LenSep) AS INT) AS ToPos
FROM res
WHERE CHARINDEX(@Separator, @DataList + @Separator, ToPos + @LenSep) > 0
)
INSERT INTO @tbl
SELECT res.*, SUBSTRING(@DataList, FromPos, ToPos - FromPos) AS ItemData
FROM res
END
RETURN
END
Also, just to double check, what version number shows for the database? (See attached)
I just copied your function and it worked. I only added OPTION (MAXRECURSION 32767); in your function but if replacing the function solved the issue then I must had something wrong in implementing your function.
I updated the code in the CodeBank to the one I posted here (even though I did not see if there is any mistake), and I also added "OPTION (MAXRECURSION 0)" to remove the recursion limit.
Sorry to disturb you again, I have a little problem in implementing, not related to your function but writing SQL over it. I have the following SQL statement:
Code:
DECLARE @tmp TABLE (
ID INT IDENTITY(1,1)
, [Description] VARCHAR(max)
)
INSERT INTO @tmp ([Description])
SELECT dbo.RemoveSpecialCharacters([Name]) from IdeaSheet WHERE ForumID = 'F89BC50B-19F2-496C-9CF5-A5FB6DF0EABE' UNION ALL
SELECT dbo.RemoveSpecialCharacters([Description]) from IdeaSheet WHERE ForumID = 'F89BC50B-19F2-496C-9CF5-A5FB6DF0EABE' UNION ALL
SELECT dbo.RemoveSpecialCharacters([Insight]) from IdeaSheet WHERE ForumID = 'F89BC50B-19F2-496C-9CF5-A5FB6DF0EABE' UNION ALL
SELECT dbo.RemoveSpecialCharacters([Support]) from IdeaSheet WHERE ForumID = 'F89BC50B-19F2-496C-9CF5-A5FB6DF0EABE'
update @tmp set [Description] = dbo.RemoveSpaces([Description])
SELECT ItemData AS Word, Count(*) AS [Count]
FROM @tmp AS t
CROSS APPLY dbo.fn_SplitStringToTable(t.[Description], ' ')
GROUP BY ItemData
I have one function called RemoveSpecialCharacters that just removed special characters from the string like comma, question mark, dot etc. I have another function RemoveSpaces as below:
Code:
alter FUNCTION dbo.RemoveSpaces(@Data VARCHAR(MAX))
RETURNS VARCHAR(MAX)
WITH EXECUTE AS CALLER
AS
BEGIN
SET @Data = LTRIM(RTRIM(@Data));
SET @Data = REPLACE(@Data,' ',' ');
SET @Data = REPLACE(@Data,' ',' ');
SET @Data = REPLACE(@Data,' ',' ');
SET @Data = REPLACE(@Data,' ',' ');
SET @Data = REPLACE(@Data,' ',' ');
SET @Data = REPLACE(@Data,' ',' ');
SET @Data = REPLACE(@Data,' ',' ');
SET @Data = REPLACE(@Data,' ',' ');
SET @Data = REPLACE(@Data,' ',' ');
SET @Data = REPLACE(@Data,' ',' ');
RETURN(@Data);
END;
When I run my query, it gives me correct resultset including some blank entries at the top and some words are shown with spaces at the start. This way a same word with and without space is listed two times hence giving incorrect value for me to work further. My resultset is attached:
Would you please help me in avoiding all kinds of spaces whether they are in the start of resultset or spaces before or after any of the words?
usamaalam, just put LTRIM and RTRIM when returning the string
Code:
alter FUNCTION dbo.RemoveSpaces(@Data VARCHAR(MAX))
RETURNS VARCHAR(MAX)
WITH EXECUTE AS CALLER
AS
BEGIN
SET @Data = LTRIM(RTRIM(@Data));
SET @Data = REPLACE(@Data,' ',' ');
SET @Data = REPLACE(@Data,' ',' ');
SET @Data = REPLACE(@Data,' ',' ');
SET @Data = REPLACE(@Data,' ',' ');
SET @Data = REPLACE(@Data,' ',' ');
SET @Data = REPLACE(@Data,' ',' ');
SET @Data = REPLACE(@Data,' ',' ');
SET @Data = REPLACE(@Data,' ',' ');
SET @Data = REPLACE(@Data,' ',' ');
SET @Data = REPLACE(@Data,' ',' ');
RETURN(LTRIM(RTRIM(@Data)));
END;