Why is a temp table a more efficient solution to the Halloween Problem than an eager spool?When converting a...
Does the US government have any planning in place to ensure there's no shortages of food, fuel, steel and other commodities?
"Starve to death" Vs. "Starve to the point of death"
Has any human ever had the choice to leave Earth permanently?
What makes papers publishable in top-tier journals?
Square Root Distance from Integers
Does the ditching switch allow an A320 to float indefinitely?
What can I do to encourage my players to use their consumables?
What does an unprocessed RAW file look like?
Is there a file that always exists and a 'normal' user can't lstat it?
How do you funnel food off a cutting board?
Why didn't Tom Riddle take the presence of Fawkes and the Sorting Hat as more of a threat?
How can the probability of a fumble decrease linearly with more dice?
Why do all the books in Game of Thrones library have their covers facing the back of the shelf?
Why is the size of a directory either 0 or 4096?
Plausible reason for gold-digging ant
After checking in online, how do I know whether I need to go show my passport at airport check-in?
Why maximum length of IP, TCP, UDP packet is not suit?
Memory usage: #define vs. static const for uint8_t
Why does 0.-5 evaluate to -5?
How is it possible that the folder is there yet isnt in the same time?
When obtaining gender reassignment/plastic surgery overseas, is an emergency travel document required to return home?
Non-Cancer terminal illness that can affect young (age 10-13) girls?
What game did these black and yellow dice come from?
Is there a way to not have to poll the UART of an AVR?
Why is a temp table a more efficient solution to the Halloween Problem than an eager spool?
When converting a table valued function to inline, why do I get a lazy spool?SHOWPLAN does not display a warning but “Include Execution Plan” does for the same queryMost efficient way to insert rows into a temp table in a stored procedureIs the eager spool operator useful for this delete from a clustered columnstore?Insert into temp table is taking longer than temp variableClustered index scan appears to be costed too low with row count spoolThe actual number of row of Lazy Spool is hugeSql query performance measure IO vs TIMEPlan changes to include Eager Spool causes the query to run slowerWhat's a pathological case where a bitmap filter would not allow the PROBE(Field, IN-ROW) semijoin reduction optimization?
Consider the following query that inserts rows from a source table only if they aren't already in the target table:
INSERT INTO dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (TABLOCK)
SELECT maybe_new_rows.ID
FROM dbo.A_HEAP_OF_MOSTLY_NEW_ROWS maybe_new_rows
WHERE NOT EXISTS (
SELECT 1
FROM dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR halloween
WHERE maybe_new_rows.ID = halloween.ID
)
OPTION (MAXDOP 1, QUERYTRACEON 7470);
One possible plan shape includes a merge join and an eager spool. The eager spool operator is present to solve the Halloween Problem:
On my machine, the above code executes in about 6900 ms. Repro code to create the tables is included at the bottom of the question. If I'm dissatisfied with performance I might try to load the rows to be inserted into a temp table instead of relying on the eager spool. Here's one possible implementation:
DROP TABLE IF EXISTS #CONSULTANT_RECOMMENDED_TEMP_TABLE;
CREATE TABLE #CONSULTANT_RECOMMENDED_TEMP_TABLE (
ID BIGINT,
PRIMARY KEY (ID)
);
INSERT INTO #CONSULTANT_RECOMMENDED_TEMP_TABLE WITH (TABLOCK)
SELECT maybe_new_rows.ID
FROM dbo.A_HEAP_OF_MOSTLY_NEW_ROWS maybe_new_rows
WHERE NOT EXISTS (
SELECT 1
FROM dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR halloween
WHERE maybe_new_rows.ID = halloween.ID
)
OPTION (MAXDOP 1, QUERYTRACEON 7470);
INSERT INTO dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (TABLOCK)
SELECT new_rows.ID
FROM #CONSULTANT_RECOMMENDED_TEMP_TABLE new_rows
OPTION (MAXDOP 1);
The new code executes in about 4400 ms. I can get actual plans and use Actual Time Statistics™ to examine where time is spent at the operator level. Note that asking for an actual plan adds significant overhead for these queries so totals will not match the previous results.
╔═════════════╦═════════════╦══════════════╗
║ operator ║ first query ║ second query ║
╠═════════════╬═════════════╬══════════════╣
║ big scan ║ 1771 ║ 1744 ║
║ little scan ║ 163 ║ 166 ║
║ sort ║ 531 ║ 530 ║
║ merge join ║ 709 ║ 669 ║
║ spool ║ 3202 ║ N/A ║
║ temp insert ║ N/A ║ 422 ║
║ temp scan ║ N/A ║ 187 ║
║ insert ║ 3122 ║ 1545 ║
╚═════════════╩═════════════╩══════════════╝
The query plan with the eager spool seems to spend significantly more time on the insert and spool operators compared to the plan that uses the temp table.
Why is the plan with the temp table more efficient? Isn't an eager spool mostly just an internal temp table anyway? I believe I am looking for answers that focus on internals. I'm able to see how the call stacks are different but can't figure out the big picture.
I am on SQL Server 2017 CU 11 in case someone wants to know. Here is code to populate the tables used in the above queries:
DROP TABLE IF EXISTS dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR;
CREATE TABLE dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR (
ID BIGINT NOT NULL,
PRIMARY KEY (ID)
);
INSERT INTO dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (TABLOCK)
SELECT TOP (20000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
CROSS JOIN master..spt_values t3
OPTION (MAXDOP 1);
DROP TABLE IF EXISTS dbo.A_HEAP_OF_MOSTLY_NEW_ROWS;
CREATE TABLE dbo.A_HEAP_OF_MOSTLY_NEW_ROWS (
ID BIGINT NOT NULL
);
INSERT INTO dbo.A_HEAP_OF_MOSTLY_NEW_ROWS WITH (TABLOCK)
SELECT TOP (1900000) 19999999 + ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master..spt_values t1
CROSS JOIN master..spt_values t2;
sql-server sql-server-2017 database-internals
add a comment |
Consider the following query that inserts rows from a source table only if they aren't already in the target table:
INSERT INTO dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (TABLOCK)
SELECT maybe_new_rows.ID
FROM dbo.A_HEAP_OF_MOSTLY_NEW_ROWS maybe_new_rows
WHERE NOT EXISTS (
SELECT 1
FROM dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR halloween
WHERE maybe_new_rows.ID = halloween.ID
)
OPTION (MAXDOP 1, QUERYTRACEON 7470);
One possible plan shape includes a merge join and an eager spool. The eager spool operator is present to solve the Halloween Problem:
On my machine, the above code executes in about 6900 ms. Repro code to create the tables is included at the bottom of the question. If I'm dissatisfied with performance I might try to load the rows to be inserted into a temp table instead of relying on the eager spool. Here's one possible implementation:
DROP TABLE IF EXISTS #CONSULTANT_RECOMMENDED_TEMP_TABLE;
CREATE TABLE #CONSULTANT_RECOMMENDED_TEMP_TABLE (
ID BIGINT,
PRIMARY KEY (ID)
);
INSERT INTO #CONSULTANT_RECOMMENDED_TEMP_TABLE WITH (TABLOCK)
SELECT maybe_new_rows.ID
FROM dbo.A_HEAP_OF_MOSTLY_NEW_ROWS maybe_new_rows
WHERE NOT EXISTS (
SELECT 1
FROM dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR halloween
WHERE maybe_new_rows.ID = halloween.ID
)
OPTION (MAXDOP 1, QUERYTRACEON 7470);
INSERT INTO dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (TABLOCK)
SELECT new_rows.ID
FROM #CONSULTANT_RECOMMENDED_TEMP_TABLE new_rows
OPTION (MAXDOP 1);
The new code executes in about 4400 ms. I can get actual plans and use Actual Time Statistics™ to examine where time is spent at the operator level. Note that asking for an actual plan adds significant overhead for these queries so totals will not match the previous results.
╔═════════════╦═════════════╦══════════════╗
║ operator ║ first query ║ second query ║
╠═════════════╬═════════════╬══════════════╣
║ big scan ║ 1771 ║ 1744 ║
║ little scan ║ 163 ║ 166 ║
║ sort ║ 531 ║ 530 ║
║ merge join ║ 709 ║ 669 ║
║ spool ║ 3202 ║ N/A ║
║ temp insert ║ N/A ║ 422 ║
║ temp scan ║ N/A ║ 187 ║
║ insert ║ 3122 ║ 1545 ║
╚═════════════╩═════════════╩══════════════╝
The query plan with the eager spool seems to spend significantly more time on the insert and spool operators compared to the plan that uses the temp table.
Why is the plan with the temp table more efficient? Isn't an eager spool mostly just an internal temp table anyway? I believe I am looking for answers that focus on internals. I'm able to see how the call stacks are different but can't figure out the big picture.
I am on SQL Server 2017 CU 11 in case someone wants to know. Here is code to populate the tables used in the above queries:
DROP TABLE IF EXISTS dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR;
CREATE TABLE dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR (
ID BIGINT NOT NULL,
PRIMARY KEY (ID)
);
INSERT INTO dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (TABLOCK)
SELECT TOP (20000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
CROSS JOIN master..spt_values t3
OPTION (MAXDOP 1);
DROP TABLE IF EXISTS dbo.A_HEAP_OF_MOSTLY_NEW_ROWS;
CREATE TABLE dbo.A_HEAP_OF_MOSTLY_NEW_ROWS (
ID BIGINT NOT NULL
);
INSERT INTO dbo.A_HEAP_OF_MOSTLY_NEW_ROWS WITH (TABLOCK)
SELECT TOP (1900000) 19999999 + ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master..spt_values t1
CROSS JOIN master..spt_values t2;
sql-server sql-server-2017 database-internals
What happens if you useMERGE
to insert non-existent rows?
– Michael Kutz
7 hours ago
Spool has to work with bags not just sets, and even with sets there's no guarantee it knows any keys. Note that in your temp table, you not only asserted it was a set but also provided an index supporting that, by declaring the primary key.
– Damien_The_Unbeliever
43 mins ago
add a comment |
Consider the following query that inserts rows from a source table only if they aren't already in the target table:
INSERT INTO dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (TABLOCK)
SELECT maybe_new_rows.ID
FROM dbo.A_HEAP_OF_MOSTLY_NEW_ROWS maybe_new_rows
WHERE NOT EXISTS (
SELECT 1
FROM dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR halloween
WHERE maybe_new_rows.ID = halloween.ID
)
OPTION (MAXDOP 1, QUERYTRACEON 7470);
One possible plan shape includes a merge join and an eager spool. The eager spool operator is present to solve the Halloween Problem:
On my machine, the above code executes in about 6900 ms. Repro code to create the tables is included at the bottom of the question. If I'm dissatisfied with performance I might try to load the rows to be inserted into a temp table instead of relying on the eager spool. Here's one possible implementation:
DROP TABLE IF EXISTS #CONSULTANT_RECOMMENDED_TEMP_TABLE;
CREATE TABLE #CONSULTANT_RECOMMENDED_TEMP_TABLE (
ID BIGINT,
PRIMARY KEY (ID)
);
INSERT INTO #CONSULTANT_RECOMMENDED_TEMP_TABLE WITH (TABLOCK)
SELECT maybe_new_rows.ID
FROM dbo.A_HEAP_OF_MOSTLY_NEW_ROWS maybe_new_rows
WHERE NOT EXISTS (
SELECT 1
FROM dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR halloween
WHERE maybe_new_rows.ID = halloween.ID
)
OPTION (MAXDOP 1, QUERYTRACEON 7470);
INSERT INTO dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (TABLOCK)
SELECT new_rows.ID
FROM #CONSULTANT_RECOMMENDED_TEMP_TABLE new_rows
OPTION (MAXDOP 1);
The new code executes in about 4400 ms. I can get actual plans and use Actual Time Statistics™ to examine where time is spent at the operator level. Note that asking for an actual plan adds significant overhead for these queries so totals will not match the previous results.
╔═════════════╦═════════════╦══════════════╗
║ operator ║ first query ║ second query ║
╠═════════════╬═════════════╬══════════════╣
║ big scan ║ 1771 ║ 1744 ║
║ little scan ║ 163 ║ 166 ║
║ sort ║ 531 ║ 530 ║
║ merge join ║ 709 ║ 669 ║
║ spool ║ 3202 ║ N/A ║
║ temp insert ║ N/A ║ 422 ║
║ temp scan ║ N/A ║ 187 ║
║ insert ║ 3122 ║ 1545 ║
╚═════════════╩═════════════╩══════════════╝
The query plan with the eager spool seems to spend significantly more time on the insert and spool operators compared to the plan that uses the temp table.
Why is the plan with the temp table more efficient? Isn't an eager spool mostly just an internal temp table anyway? I believe I am looking for answers that focus on internals. I'm able to see how the call stacks are different but can't figure out the big picture.
I am on SQL Server 2017 CU 11 in case someone wants to know. Here is code to populate the tables used in the above queries:
DROP TABLE IF EXISTS dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR;
CREATE TABLE dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR (
ID BIGINT NOT NULL,
PRIMARY KEY (ID)
);
INSERT INTO dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (TABLOCK)
SELECT TOP (20000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
CROSS JOIN master..spt_values t3
OPTION (MAXDOP 1);
DROP TABLE IF EXISTS dbo.A_HEAP_OF_MOSTLY_NEW_ROWS;
CREATE TABLE dbo.A_HEAP_OF_MOSTLY_NEW_ROWS (
ID BIGINT NOT NULL
);
INSERT INTO dbo.A_HEAP_OF_MOSTLY_NEW_ROWS WITH (TABLOCK)
SELECT TOP (1900000) 19999999 + ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master..spt_values t1
CROSS JOIN master..spt_values t2;
sql-server sql-server-2017 database-internals
Consider the following query that inserts rows from a source table only if they aren't already in the target table:
INSERT INTO dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (TABLOCK)
SELECT maybe_new_rows.ID
FROM dbo.A_HEAP_OF_MOSTLY_NEW_ROWS maybe_new_rows
WHERE NOT EXISTS (
SELECT 1
FROM dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR halloween
WHERE maybe_new_rows.ID = halloween.ID
)
OPTION (MAXDOP 1, QUERYTRACEON 7470);
One possible plan shape includes a merge join and an eager spool. The eager spool operator is present to solve the Halloween Problem:
On my machine, the above code executes in about 6900 ms. Repro code to create the tables is included at the bottom of the question. If I'm dissatisfied with performance I might try to load the rows to be inserted into a temp table instead of relying on the eager spool. Here's one possible implementation:
DROP TABLE IF EXISTS #CONSULTANT_RECOMMENDED_TEMP_TABLE;
CREATE TABLE #CONSULTANT_RECOMMENDED_TEMP_TABLE (
ID BIGINT,
PRIMARY KEY (ID)
);
INSERT INTO #CONSULTANT_RECOMMENDED_TEMP_TABLE WITH (TABLOCK)
SELECT maybe_new_rows.ID
FROM dbo.A_HEAP_OF_MOSTLY_NEW_ROWS maybe_new_rows
WHERE NOT EXISTS (
SELECT 1
FROM dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR halloween
WHERE maybe_new_rows.ID = halloween.ID
)
OPTION (MAXDOP 1, QUERYTRACEON 7470);
INSERT INTO dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (TABLOCK)
SELECT new_rows.ID
FROM #CONSULTANT_RECOMMENDED_TEMP_TABLE new_rows
OPTION (MAXDOP 1);
The new code executes in about 4400 ms. I can get actual plans and use Actual Time Statistics™ to examine where time is spent at the operator level. Note that asking for an actual plan adds significant overhead for these queries so totals will not match the previous results.
╔═════════════╦═════════════╦══════════════╗
║ operator ║ first query ║ second query ║
╠═════════════╬═════════════╬══════════════╣
║ big scan ║ 1771 ║ 1744 ║
║ little scan ║ 163 ║ 166 ║
║ sort ║ 531 ║ 530 ║
║ merge join ║ 709 ║ 669 ║
║ spool ║ 3202 ║ N/A ║
║ temp insert ║ N/A ║ 422 ║
║ temp scan ║ N/A ║ 187 ║
║ insert ║ 3122 ║ 1545 ║
╚═════════════╩═════════════╩══════════════╝
The query plan with the eager spool seems to spend significantly more time on the insert and spool operators compared to the plan that uses the temp table.
Why is the plan with the temp table more efficient? Isn't an eager spool mostly just an internal temp table anyway? I believe I am looking for answers that focus on internals. I'm able to see how the call stacks are different but can't figure out the big picture.
I am on SQL Server 2017 CU 11 in case someone wants to know. Here is code to populate the tables used in the above queries:
DROP TABLE IF EXISTS dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR;
CREATE TABLE dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR (
ID BIGINT NOT NULL,
PRIMARY KEY (ID)
);
INSERT INTO dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (TABLOCK)
SELECT TOP (20000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
CROSS JOIN master..spt_values t3
OPTION (MAXDOP 1);
DROP TABLE IF EXISTS dbo.A_HEAP_OF_MOSTLY_NEW_ROWS;
CREATE TABLE dbo.A_HEAP_OF_MOSTLY_NEW_ROWS (
ID BIGINT NOT NULL
);
INSERT INTO dbo.A_HEAP_OF_MOSTLY_NEW_ROWS WITH (TABLOCK)
SELECT TOP (1900000) 19999999 + ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master..spt_values t1
CROSS JOIN master..spt_values t2;
sql-server sql-server-2017 database-internals
sql-server sql-server-2017 database-internals
asked 8 hours ago
Joe ObbishJoe Obbish
21.1k33083
21.1k33083
What happens if you useMERGE
to insert non-existent rows?
– Michael Kutz
7 hours ago
Spool has to work with bags not just sets, and even with sets there's no guarantee it knows any keys. Note that in your temp table, you not only asserted it was a set but also provided an index supporting that, by declaring the primary key.
– Damien_The_Unbeliever
43 mins ago
add a comment |
What happens if you useMERGE
to insert non-existent rows?
– Michael Kutz
7 hours ago
Spool has to work with bags not just sets, and even with sets there's no guarantee it knows any keys. Note that in your temp table, you not only asserted it was a set but also provided an index supporting that, by declaring the primary key.
– Damien_The_Unbeliever
43 mins ago
What happens if you use
MERGE
to insert non-existent rows?– Michael Kutz
7 hours ago
What happens if you use
MERGE
to insert non-existent rows?– Michael Kutz
7 hours ago
Spool has to work with bags not just sets, and even with sets there's no guarantee it knows any keys. Note that in your temp table, you not only asserted it was a set but also provided an index supporting that, by declaring the primary key.
– Damien_The_Unbeliever
43 mins ago
Spool has to work with bags not just sets, and even with sets there's no guarantee it knows any keys. Note that in your temp table, you not only asserted it was a set but also provided an index supporting that, by declaring the primary key.
– Damien_The_Unbeliever
43 mins ago
add a comment |
1 Answer
1
active
oldest
votes
This is what I call Manual Halloween Protection.
You can find an example of it being used with an update statement in my article Optimizing Update Queries. One has to be a bit careful to preserve the same semantics, for example by locking the target table against all concurrent modifications while the separate queries execute, if that is relevant in your scenario.
Why is the plan with the temp table more efficient? Isn't an eager spool mostly just an internal temp table anyway?
A spool has some of the characteristics of a temporary table, but the two are not exact equivalents. In particular, a spool is essentially a row-by-row unordered insert to a b-tree structure. It does benefit from locking and logging optimizations, but does not support bulk load optimizations.
Consequently, one can often get better performance by splitting the query in a natural way: Bulk loading the new rows into a temporary table or variable, then performing an optimized insert (without explicit Halloween Protection) from the temporary object.
Making this separation also allows you extra freedom to tune the read and write portions of the original statement separately.
As a side note, it is interesting to think about how the Halloween Problem might be addressed using row versions. Perhaps a future version of SQL Server will provide that feature in suitable circumstances.
As Michael Kutz alluded to in a comment, you could also explore the possibility of exploiting the hole-filling optimization to avoid explicit HP. One way to achieve this for the demo is to create a unique index (clustered if you like) on the ID
column of A_HEAP_OF_MOSTLY_NEW_ROWS
.
CREATE UNIQUE INDEX i ON dbo.A_HEAP_OF_MOSTLY_NEW_ROWS (ID);
With that guarantee in place the optimizer can use hole-filling and rowset sharing:
MERGE dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (SERIALIZABLE) AS HICETY
USING dbo.A_HEAP_OF_MOSTLY_NEW_ROWS AS AHOMNR
ON AHOMNR.ID = HICETY.ID
WHEN NOT MATCHED BY TARGET
THEN INSERT (ID) VALUES (AHOMNR.ID);
While interesting, you will still be able to achieve better performance in many cases by employing carefully-implemented Manual Halloween Protection.
AHOMNR begat HICETY.
– Erik Darling
4 hours ago
add a comment |
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f230722%2fwhy-is-a-temp-table-a-more-efficient-solution-to-the-halloween-problem-than-an-e%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
This is what I call Manual Halloween Protection.
You can find an example of it being used with an update statement in my article Optimizing Update Queries. One has to be a bit careful to preserve the same semantics, for example by locking the target table against all concurrent modifications while the separate queries execute, if that is relevant in your scenario.
Why is the plan with the temp table more efficient? Isn't an eager spool mostly just an internal temp table anyway?
A spool has some of the characteristics of a temporary table, but the two are not exact equivalents. In particular, a spool is essentially a row-by-row unordered insert to a b-tree structure. It does benefit from locking and logging optimizations, but does not support bulk load optimizations.
Consequently, one can often get better performance by splitting the query in a natural way: Bulk loading the new rows into a temporary table or variable, then performing an optimized insert (without explicit Halloween Protection) from the temporary object.
Making this separation also allows you extra freedom to tune the read and write portions of the original statement separately.
As a side note, it is interesting to think about how the Halloween Problem might be addressed using row versions. Perhaps a future version of SQL Server will provide that feature in suitable circumstances.
As Michael Kutz alluded to in a comment, you could also explore the possibility of exploiting the hole-filling optimization to avoid explicit HP. One way to achieve this for the demo is to create a unique index (clustered if you like) on the ID
column of A_HEAP_OF_MOSTLY_NEW_ROWS
.
CREATE UNIQUE INDEX i ON dbo.A_HEAP_OF_MOSTLY_NEW_ROWS (ID);
With that guarantee in place the optimizer can use hole-filling and rowset sharing:
MERGE dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (SERIALIZABLE) AS HICETY
USING dbo.A_HEAP_OF_MOSTLY_NEW_ROWS AS AHOMNR
ON AHOMNR.ID = HICETY.ID
WHEN NOT MATCHED BY TARGET
THEN INSERT (ID) VALUES (AHOMNR.ID);
While interesting, you will still be able to achieve better performance in many cases by employing carefully-implemented Manual Halloween Protection.
AHOMNR begat HICETY.
– Erik Darling
4 hours ago
add a comment |
This is what I call Manual Halloween Protection.
You can find an example of it being used with an update statement in my article Optimizing Update Queries. One has to be a bit careful to preserve the same semantics, for example by locking the target table against all concurrent modifications while the separate queries execute, if that is relevant in your scenario.
Why is the plan with the temp table more efficient? Isn't an eager spool mostly just an internal temp table anyway?
A spool has some of the characteristics of a temporary table, but the two are not exact equivalents. In particular, a spool is essentially a row-by-row unordered insert to a b-tree structure. It does benefit from locking and logging optimizations, but does not support bulk load optimizations.
Consequently, one can often get better performance by splitting the query in a natural way: Bulk loading the new rows into a temporary table or variable, then performing an optimized insert (without explicit Halloween Protection) from the temporary object.
Making this separation also allows you extra freedom to tune the read and write portions of the original statement separately.
As a side note, it is interesting to think about how the Halloween Problem might be addressed using row versions. Perhaps a future version of SQL Server will provide that feature in suitable circumstances.
As Michael Kutz alluded to in a comment, you could also explore the possibility of exploiting the hole-filling optimization to avoid explicit HP. One way to achieve this for the demo is to create a unique index (clustered if you like) on the ID
column of A_HEAP_OF_MOSTLY_NEW_ROWS
.
CREATE UNIQUE INDEX i ON dbo.A_HEAP_OF_MOSTLY_NEW_ROWS (ID);
With that guarantee in place the optimizer can use hole-filling and rowset sharing:
MERGE dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (SERIALIZABLE) AS HICETY
USING dbo.A_HEAP_OF_MOSTLY_NEW_ROWS AS AHOMNR
ON AHOMNR.ID = HICETY.ID
WHEN NOT MATCHED BY TARGET
THEN INSERT (ID) VALUES (AHOMNR.ID);
While interesting, you will still be able to achieve better performance in many cases by employing carefully-implemented Manual Halloween Protection.
AHOMNR begat HICETY.
– Erik Darling
4 hours ago
add a comment |
This is what I call Manual Halloween Protection.
You can find an example of it being used with an update statement in my article Optimizing Update Queries. One has to be a bit careful to preserve the same semantics, for example by locking the target table against all concurrent modifications while the separate queries execute, if that is relevant in your scenario.
Why is the plan with the temp table more efficient? Isn't an eager spool mostly just an internal temp table anyway?
A spool has some of the characteristics of a temporary table, but the two are not exact equivalents. In particular, a spool is essentially a row-by-row unordered insert to a b-tree structure. It does benefit from locking and logging optimizations, but does not support bulk load optimizations.
Consequently, one can often get better performance by splitting the query in a natural way: Bulk loading the new rows into a temporary table or variable, then performing an optimized insert (without explicit Halloween Protection) from the temporary object.
Making this separation also allows you extra freedom to tune the read and write portions of the original statement separately.
As a side note, it is interesting to think about how the Halloween Problem might be addressed using row versions. Perhaps a future version of SQL Server will provide that feature in suitable circumstances.
As Michael Kutz alluded to in a comment, you could also explore the possibility of exploiting the hole-filling optimization to avoid explicit HP. One way to achieve this for the demo is to create a unique index (clustered if you like) on the ID
column of A_HEAP_OF_MOSTLY_NEW_ROWS
.
CREATE UNIQUE INDEX i ON dbo.A_HEAP_OF_MOSTLY_NEW_ROWS (ID);
With that guarantee in place the optimizer can use hole-filling and rowset sharing:
MERGE dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (SERIALIZABLE) AS HICETY
USING dbo.A_HEAP_OF_MOSTLY_NEW_ROWS AS AHOMNR
ON AHOMNR.ID = HICETY.ID
WHEN NOT MATCHED BY TARGET
THEN INSERT (ID) VALUES (AHOMNR.ID);
While interesting, you will still be able to achieve better performance in many cases by employing carefully-implemented Manual Halloween Protection.
This is what I call Manual Halloween Protection.
You can find an example of it being used with an update statement in my article Optimizing Update Queries. One has to be a bit careful to preserve the same semantics, for example by locking the target table against all concurrent modifications while the separate queries execute, if that is relevant in your scenario.
Why is the plan with the temp table more efficient? Isn't an eager spool mostly just an internal temp table anyway?
A spool has some of the characteristics of a temporary table, but the two are not exact equivalents. In particular, a spool is essentially a row-by-row unordered insert to a b-tree structure. It does benefit from locking and logging optimizations, but does not support bulk load optimizations.
Consequently, one can often get better performance by splitting the query in a natural way: Bulk loading the new rows into a temporary table or variable, then performing an optimized insert (without explicit Halloween Protection) from the temporary object.
Making this separation also allows you extra freedom to tune the read and write portions of the original statement separately.
As a side note, it is interesting to think about how the Halloween Problem might be addressed using row versions. Perhaps a future version of SQL Server will provide that feature in suitable circumstances.
As Michael Kutz alluded to in a comment, you could also explore the possibility of exploiting the hole-filling optimization to avoid explicit HP. One way to achieve this for the demo is to create a unique index (clustered if you like) on the ID
column of A_HEAP_OF_MOSTLY_NEW_ROWS
.
CREATE UNIQUE INDEX i ON dbo.A_HEAP_OF_MOSTLY_NEW_ROWS (ID);
With that guarantee in place the optimizer can use hole-filling and rowset sharing:
MERGE dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (SERIALIZABLE) AS HICETY
USING dbo.A_HEAP_OF_MOSTLY_NEW_ROWS AS AHOMNR
ON AHOMNR.ID = HICETY.ID
WHEN NOT MATCHED BY TARGET
THEN INSERT (ID) VALUES (AHOMNR.ID);
While interesting, you will still be able to achieve better performance in many cases by employing carefully-implemented Manual Halloween Protection.
answered 6 hours ago
Paul White♦Paul White
52.5k14279454
52.5k14279454
AHOMNR begat HICETY.
– Erik Darling
4 hours ago
add a comment |
AHOMNR begat HICETY.
– Erik Darling
4 hours ago
AHOMNR begat HICETY.
– Erik Darling
4 hours ago
AHOMNR begat HICETY.
– Erik Darling
4 hours ago
add a comment |
Thanks for contributing an answer to Database Administrators Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f230722%2fwhy-is-a-temp-table-a-more-efficient-solution-to-the-halloween-problem-than-an-e%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
What happens if you use
MERGE
to insert non-existent rows?– Michael Kutz
7 hours ago
Spool has to work with bags not just sets, and even with sets there's no guarantee it knows any keys. Note that in your temp table, you not only asserted it was a set but also provided an index supporting that, by declaring the primary key.
– Damien_The_Unbeliever
43 mins ago