J'ai une grande table dans une base de données SQl Server 2008, elle contient environ 570 millions d'enregistrements. Chaque jour, nous exécutons un travail par lots qui prend un fichier d'environ 200 000 enregistrements de transactions, effectue un regroupement et une somme par rapport à ces données et les insère dans la grande table.
Récemment, j'ai expérimenté la modification de l'index cluster de la grande table en une colonne d'identité int, ce qui a réduit l'insertion de 3 heures à une heure, mais je suis toujours perplexe quant à la raison pour laquelle cette requête simple devrait prendre autant de temps à s'exécuter (indépendamment de la taille de la table)
Voici le tableau avec 570 millions de lignes
CREATE TABLE [dbo].[POINTS_EARNED](
[POINTS_EARNED_ID]int identity not null,
[CARD_ID] [int] NOT NULL,
[CYCLE_ID] [int] NOT NULL,
[POINTS_CODE] [int] NOT NULL,
[NO_POINTS] [int] NULL,
[ACCOUNT_ID] [int] NOT NULL,
[CREATED_DATE] [datetime] NULL,
[CREATED_BY] [varchar](20) NULL,
[LAST_MODIFIED_DATE] [datetime] NULL,
[LAST_MODIFIED_BY] [varchar](20) NULL,
[DELETED] [bit] NULL,
CONSTRAINT [PK_POINTS_EARNED] PRIMARY KEY CLUSTERED
(
[POINTS_EARNED_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
Il a également des contraintes (valeurs par défaut et clés étrangères) et des index, et je me demande si ce sont eux qui causent le problème.
Le SQL réel qui prend une heure à exécuter est:
insert into points_earned (
card_id,
cycle_id,
points_code,
no_points,
account_id
)
select pe.card_id, pe.cycle_id, pe.points_code, sum(pe.no_points),pe.account_id
from #points_earned pe
group by pe.card_id, pe.cycle_id, pe.points_code,pe.account_id
et la table temporaire #points_earned a environ 200 000 lignes et a la structure suivante (sans index)
create table #points_earned (
card_id int,
cycle_id int,
points_code int,
card_type varchar(5),
no_points int,
account_id int
)
Donc, je voudrais quelques opinions sur si je devrais
Mettre à jour - comme demandé un peu plus d'informations - L'instruction select s'exécute sans l'insertion en 2 secondes, donc cela ne semble pas être le problème, donc probablement pas besoin de s'inquiéter d'indexer le temp table
Les index, le déclencheur (de mise à jour) et les contraintes sont:
CREATE NONCLUSTERED INDEX [IDX_CYCLE_ID] ON [dbo].[POINTS_EARNED]
(
[CYCLE_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_ACCOUNT_ID] ON [dbo].[POINTS_EARNED]
(
[ACCOUNT_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_ACCOUNT_ID_POINTS_CODE] ON [dbo].[POINTS_EARNED]
(
[ACCOUNT_ID] ASC,
[POINTS_CODE] ASC
)
INCLUDE ( [CARD_ID],
[CYCLE_ID],
[NO_POINTS]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [RELATION_151_FK] ON [dbo].[POINTS_EARNED]
(
[CARD_ID] ASC,
[CYCLE_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [RELATION_152_FK] ON [dbo].[POINTS_EARNED]
(
[POINTS_CODE] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
/****** Object: Trigger [update_points_earned] Script Date: 09/13/2013 13:20:54 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TRIGGER [dbo].[update_points_earned] ON [dbo].[POINTS_EARNED]
FOR UPDATE
AS
BEGIN
UPDATE points_earned
SET Last_Modified_By = USER,
Last_Modified_Date = GETDATE()
FROM
points_earned t,
inserted i
WHERE
t.card_id = i.card_id AND
t.cycle_id = i.cycle_id AND
t.points_code = i.points_code AND
t.account_id = i.account_id
END
GO
/****** Object: Default [DF_POINTS_EARNED_ACCOUNT_ID] Script Date: 09/13/2013 13:20:54 ******/
ALTER TABLE [dbo].[POINTS_EARNED] ADD CONSTRAINT [DF_POINTS_EARNED_ACCOUNT_ID] DEFAULT ((0)) FOR [ACCOUNT_ID]
GO
/****** Object: Default [DF_POINTS_EARNED_CREATED_DATE] Script Date: 09/13/2013 13:20:54 ******/
ALTER TABLE [dbo].[POINTS_EARNED] ADD CONSTRAINT [DF_POINTS_EARNED_CREATED_DATE] DEFAULT (getdate()) FOR [CREATED_DATE]
GO
/****** Object: Default [DF_POINTS_EARNED_CREATED_BY] Script Date: 09/13/2013 13:20:54 ******/
ALTER TABLE [dbo].[POINTS_EARNED] ADD CONSTRAINT [DF_POINTS_EARNED_CREATED_BY] DEFAULT (user_name()) FOR [CREATED_BY]
GO
/****** Object: Default [DF_POINTS_EARNED_DELETED] Script Date: 09/13/2013 13:20:54 ******/
ALTER TABLE [dbo].[POINTS_EARNED] ADD CONSTRAINT [DF_POINTS_EARNED_DELETED] DEFAULT ((0)) FOR [DELETED]
GO
/****** Object: ForeignKey [FK_POINTS_E_REFERENCE_CYCLE_CA] Script Date: 09/13/2013 13:20:54 ******/
ALTER TABLE [dbo].[POINTS_EARNED] WITH CHECK ADD CONSTRAINT [FK_POINTS_E_REFERENCE_CYCLE_CA] FOREIGN KEY([CARD_ID], [CYCLE_ID])
REFERENCES [dbo].[CYCLE_CARD] ([CARD_ID], [CYCLE_ID])
GO
ALTER TABLE [dbo].[POINTS_EARNED] CHECK CONSTRAINT [FK_POINTS_E_REFERENCE_CYCLE_CA]
GO
/****** Object: ForeignKey [FK_POINTS_E_REFERENCE_POINTS_C] Script Date: 09/13/2013 13:20:54 ******/
ALTER TABLE [dbo].[POINTS_EARNED] WITH NOCHECK ADD CONSTRAINT [FK_POINTS_E_REFERENCE_POINTS_C] FOREIGN KEY([POINTS_CODE])
REFERENCES [dbo].[POINTS_CODE] ([POINTS_CODE])
GO
ALTER TABLE [dbo].[POINTS_EARNED] CHECK CONSTRAINT [FK_POINTS_E_REFERENCE_POINTS_C]
GO
/****** Object: ForeignKey [FK_POINTS_EARNED_REF_ACCOUNT] Script Date: 09/13/2013 13:20:54 ******/
ALTER TABLE [dbo].[POINTS_EARNED] WITH NOCHECK ADD CONSTRAINT [FK_POINTS_EARNED_REF_ACCOUNT] FOREIGN KEY([ACCOUNT_ID])
REFERENCES [dbo].[ACCOUNT] ([ACCOUNT_ID])
GO
ALTER TABLE [dbo].[POINTS_EARNED] CHECK CONSTRAINT [FK_POINTS_EARNED_REF_ACCOUNT]
Edit 2, plan de requête pour l'instruction d'insertion
|--Sequence
|--Index Insert(OBJECT:([Progressive_Points].[dbo].[POINTS_EARNED].[IDX_CYCLE_ID]), SET:([POINTS_EARNED_ID1040] = [Progressive_Points].[dbo].[POINTS_EARNED].[POINTS_EARNED_ID],[CYCLE_ID1041] = [Progressive_Points].[dbo].[POINTS_EARNED].[CYCLE_ID]) WITH ORDERED PREFETCH)
| |--Sort(ORDER BY:([Progressive_Points].[dbo].[POINTS_EARNED].[CYCLE_ID] ASC, [Progressive_Points].[dbo].[POINTS_EARNED].[POINTS_EARNED_ID] ASC))
| |--Table Spool
| |--Clustered Index Insert(OBJECT:([Progressive_Points].[dbo].[POINTS_EARNED].[PK_POINTS_EARNED]), SET:([Progressive_Points].[dbo].[POINTS_EARNED].[CARD_ID] = RaiseIfNullInsert([tempdb].[dbo].[#points_earned].[card_id] as [pe].[card_id]),[Progressive_Points].[dbo].[POINTS_EARNED].[CYCLE_ID] = RaiseIfNullInsert([tempdb].[dbo].[#points_earned].[cycle_id] as [pe].[cycle_id]),[Progressive_Points].[dbo].[POINTS_EARNED].[POINTS_CODE] = RaiseIfNullInsert([tempdb].[dbo].[#points_earned].[points_code] as [pe].[points_code]),[Progressive_Points].[dbo].[POINTS_EARNED].[NO_POINTS] = [Expr1006],[Progressive_Points].[dbo].[POINTS_EARNED].[ACCOUNT_ID] = RaiseIfNullInsert([tempdb].[dbo].[#points_earned].[account_id] as [pe].[account_id]),[Progressive_Points].[dbo].[POINTS_EARNED].[POINTS_EARNED_ID] = [Expr1007],[Progressive_Points].[dbo].[POINTS_EARNED].[CREATED_DATE] = [Expr1008],[Progressive_Points].[dbo].[POINTS_EARNED].[CREATED_BY] = [Expr1009],[Progressive_Points].[dbo].[POINTS_EARNED].[DELETED] = [Expr1010],[Progressive_Points].[dbo].[POINTS_EARNED].[LAST_MODIFIED_DATE] = NULL,[Progressive_Points].[dbo].[POINTS_EARNED].[LAST_MODIFIED_BY] = NULL) WITH UNORDERED PREFETCH)
| |--Compute Scalar(DEFINE:([Expr1008]=getdate(), [Expr1009]=CONVERT_IMPLICIT(varchar(20),user_name(),0), [Expr1010]=(0)))
| |--Compute Scalar(DEFINE:([Expr1007]=getidentity((1243867498),(8),NULL)))
| |--Top(ROWCOUNT est 0)
| |--Parallelism(Gather Streams)
| |--Compute Scalar(DEFINE:([Expr1006]=CASE WHEN [Expr1062]=(0) THEN NULL ELSE [Expr1063] END))
| |--Hash Match(Aggregate, HASH:([pe].[card_id], [pe].[cycle_id], [pe].[points_code], [pe].[account_id]), RESIDUAL:([tempdb].[dbo].[#points_earned].[card_id] as [pe].[card_id] = [tempdb].[dbo].[#points_earned].[card_id] as [pe].[card_id] AND [tempdb].[dbo].[#points_earned].[cycle_id] as [pe].[cycle_id] = [tempdb].[dbo].[#points_earned].[cycle_id] as [pe].[cycle_id] AND [tempdb].[dbo].[#points_earned].[points_code] as [pe].[points_code] = [tempdb].[dbo].[#points_earned].[points_code] as [pe].[points_code] AND [tempdb].[dbo].[#points_earned].[account_id] as [pe].[account_id] = [tempdb].[dbo].[#points_earned].[account_id] as [pe].[account_id]) DEFINE:([Expr1062]=COUNT_BIG([tempdb].[dbo].[#points_earned].[no_points] as [pe].[no_points]), [Expr1063]=SUM([tempdb].[dbo].[#points_earned].[no_points] as [pe].[no_points])))
| |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([pe].[card_id], [pe].[cycle_id], [pe].[points_code], [pe].[account_id]))
| |--Clustered Index Scan(OBJECT:([tempdb].[dbo].[#points_earned] AS [pe]))
|--Index Insert(OBJECT:([Progressive_Points].[dbo].[POINTS_EARNED].[IX_ACCOUNT_ID]), SET:([POINTS_EARNED_ID1042] = [Progressive_Points].[dbo].[POINTS_EARNED].[POINTS_EARNED_ID],[ACCOUNT_ID1043] = [Progressive_Points].[dbo].[POINTS_EARNED].[ACCOUNT_ID]) WITH ORDERED PREFETCH)
| |--Sort(ORDER BY:([Progressive_Points].[dbo].[POINTS_EARNED].[ACCOUNT_ID] ASC, [Progressive_Points].[dbo].[POINTS_EARNED].[POINTS_EARNED_ID] ASC))
| |--Table Spool
|--Assert(WHERE:(CASE WHEN [Expr1050] IS NULL THEN (0) ELSE NULL END))
| |--Nested Loops(Left Semi Join, OUTER REFERENCES:([Progressive_Points].[dbo].[POINTS_EARNED].[ACCOUNT_ID], [Expr1068]) WITH UNORDERED PREFETCH, DEFINE:([Expr1050] = [PROBE VALUE]))
| |--Index Insert(OBJECT:([Progressive_Points].[dbo].[POINTS_EARNED].[IX_ACCOUNT_ID_POINTS_CODE]), SET:([POINTS_EARNED_ID1044] = [Progressive_Points].[dbo].[POINTS_EARNED].[POINTS_EARNED_ID],[CARD_ID1045] = [Progressive_Points].[dbo].[POINTS_EARNED].[CARD_ID],[CYCLE_ID1046] = [Progressive_Points].[dbo].[POINTS_EARNED].[CYCLE_ID],[POINTS_CODE1047] = [Progressive_Points].[dbo].[POINTS_EARNED].[POINTS_CODE],[NO_POINTS1048] = [Progressive_Points].[dbo].[POINTS_EARNED].[NO_POINTS],[ACCOUNT_ID1049] = [Progressive_Points].[dbo].[POINTS_EARNED].[ACCOUNT_ID]) WITH UNORDERED PREFETCH)
| | |--Table Spool
| |--Clustered Index Seek(OBJECT:([Progressive_Points].[dbo].[ACCOUNT].[PK_ACCOUNT]), SEEK:([Progressive_Points].[dbo].[ACCOUNT].[ACCOUNT_ID]=[Progressive_Points].[dbo].[POINTS_EARNED].[ACCOUNT_ID]) ORDERED FORWARD)
|--Assert(WHERE:(CASE WHEN [Expr1054] IS NULL THEN (0) ELSE NULL END))
| |--Nested Loops(Left Semi Join, OUTER REFERENCES:([Progressive_Points].[dbo].[POINTS_EARNED].[CARD_ID], [Progressive_Points].[dbo].[POINTS_EARNED].[CYCLE_ID], [Expr1070]) WITH UNORDERED PREFETCH, DEFINE:([Expr1054] = [PROBE VALUE]))
| |--Index Insert(OBJECT:([Progressive_Points].[dbo].[POINTS_EARNED].[RELATION_151_FK]), SET:([POINTS_EARNED_ID1051] = [Progressive_Points].[dbo].[POINTS_EARNED].[POINTS_EARNED_ID],[CARD_ID1052] = [Progressive_Points].[dbo].[POINTS_EARNED].[CARD_ID],[CYCLE_ID1053] = [Progressive_Points].[dbo].[POINTS_EARNED].[CYCLE_ID]) WITH ORDERED PREFETCH)
| | |--Sort(ORDER BY:([Progressive_Points].[dbo].[POINTS_EARNED].[CARD_ID] ASC, [Progressive_Points].[dbo].[POINTS_EARNED].[CYCLE_ID] ASC, [Progressive_Points].[dbo].[POINTS_EARNED].[POINTS_EARNED_ID] ASC))
| | |--Table Spool
| |--Row Count Spool
| |--Index Seek(OBJECT:([Progressive_Points].[dbo].[CYCLE_CARD].[IDX_NCLST_CARD_ID_CYCLE_ID]), SEEK:([Progressive_Points].[dbo].[CYCLE_CARD].[CARD_ID]=[Progressive_Points].[dbo].[POINTS_EARNED].[CARD_ID] AND [Progressive_Points].[dbo].[CYCLE_CARD].[CYCLE_ID]=[Progressive_Points].[dbo].[POINTS_EARNED].[CYCLE_ID]) ORDERED FORWARD)
|--Assert(WHERE:(CASE WHEN [Expr1057] IS NULL THEN (0) ELSE NULL END))
|--Merge Join(Left Semi Join, MANY-TO-MANY MERGE:([Progressive_Points].[dbo].[POINTS_EARNED].[POINTS_CODE])=([Progressive_Points].[dbo].[POINTS_CODE].[POINTS_CODE]), RESIDUAL:([Progressive_Points].[dbo].[POINTS_EARNED].[POINTS_CODE]=[Progressive_Points].[dbo].[POINTS_CODE].[POINTS_CODE]))
|--Index Insert(OBJECT:([Progressive_Points].[dbo].[POINTS_EARNED].[RELATION_152_FK]), SET:([POINTS_EARNED_ID1055] = [Progressive_Points].[dbo].[POINTS_EARNED].[POINTS_EARNED_ID],[POINTS_CODE1056] = [Progressive_Points].[dbo].[POINTS_EARNED].[POINTS_CODE]) WITH ORDERED PREFETCH)
| |--Sort(ORDER BY:([Progressive_Points].[dbo].[POINTS_EARNED].[POINTS_CODE] ASC, [Progressive_Points].[dbo].[POINTS_EARNED].[POINTS_EARNED_ID] ASC))
| |--Table Spool
|--Index Scan(OBJECT:([Progressive_Points].[dbo].[POINTS_CODE].[POINTS_CODES_PK]), ORDERED FORWARD)
OK, voici ce que je ferais:
Vérifiez si vous avez besoin des deux index [IX_ACCOUNT_ID_POINTS_CODE] et [IX_ACCOUNT_ID] car ils peuvent être redondants.
Avant de faire le INSERT
, désactivez le déclencheur et supprimez les clés étrangères.
Faites le INSERT
en définissant les champs normalement définis par le déclencheur et en vous assurant que les valeurs de la colonne FK sont valides.
Réactivez le déclencheur et recréez les clés étrangères AVEC NOCHECK.
Je laisserais les index allumés pendant que vous insérez moins de 0,2% du nombre total de lignes, il est donc probablement plus rapide de les mettre à jour en place plutôt que de les supprimer et de les reconstruire.
au lieu de supprimer 200k lignes de la table massive en une seule fois, essayez de la fragmenter. Par exemple.:
while (1=1)
begin
delete top(1000) from #points_earned
output deleted.* into points_earned
if @@rowcount=0 break
end
Concernant votre TABLE, il y a 3 considérations qui affectent vos performances pour chaque enregistrement que vous ajoutez: (1) vos index (2) votre déclencheur (3) vos clés étrangères
Si vous pouvez vous le permettre, appliquez l'architecture appropriée pour votre TABLE, comme PARTITION TABLE et PARTITION INDEXES dans les lecteurs SAS SAS appropriés).
Sinon, dans une situation similaire, avec des dizaines de milliers d'enregistrements mis à jour chaque minute, j'utilise la technique de BULK/INSERT, avec une autre TABLE (ex. [POINTS_EARNED_TMP] sur une base de données distincte dans la même instance (*).
Ajoutez l'enregistrement avec le déclencheur [POINTS_EARNED_TMP]. Ensuite, à partir d'un autre BULK, vous définissez votre procédure sans déclencheurs et BULK/INSERT vos données de [POINTS_EARNED_TMP] à [POINTS_EARNED] (y compris l'UTILISATEUR et la mise à jour de la DATE). Au moins, les performances de déclenchement sont évitées et le #TMP au sein de la même instance est également évité.
(*) L'utilisation d'une autre base de données est principalement pour des raisons de maintenance. BULK donne jusqu'à présent des résultats incroyables par rapport à INSERT TO.