Developers and the DBA almost came to blows over a 500-million-row table that wouldn't budge

Recently, the company needed to change the primary key type of a number of old tables across the board. Historically these tables always used int for the primary key; now it has to become bigint.
While going through the tables, I found one with 500 million rows in it. I had investigated and optimized several slow SQL queries before, but queries against this table had never shown up in the slow SQL list, which rather broke my assumptions; it had caused no problems in daily use.
Later I also found quite a few tables with 30 to 80 million rows, each with a fairly normal number of columns (around 10), all working fine. So don't blindly believe the claims online that any table over 10 or 20 million rows has to be split or sharded.
Avoid premature optimization; dealing with problems when they actually appear is the better approach, because preparations made in advance can easily turn out to be wasted effort, emotion, and energy.
So, back to this table with 500 million rows.
The execution time of the MODIFY statement exceeded the threshold set by our self-built SQL platform (the platform automatically closes the connection if a single SQL statement runs for more than 2 hours).
And unlike data changes, the MODIFY statement cannot be split into pieces; it can only be executed as a single statement.
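For context, the type change itself is a single DDL statement along these lines. This is only a sketch: the table and column names are made up for illustration, and I'm assuming MySQL. On a 500-million-row table a statement like this rebuilds the entire table, which is why it blew past the 2-hour limit.

```sql
-- Hypothetical table and column names; `id` is assumed to be the auto-increment primary key.
ALTER TABLE order_flow
  MODIFY COLUMN id BIGINT NOT NULL AUTO_INCREMENT;
```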
The alternative was to bypass the SQL platform and have the DBA execute it directly, but it was already late, so that had to wait until the next day.
The next day I analyzed the table and found that the older data was no longer needed and could be archived: the 2021 data could be moved to another table, leaving only the 2022 data.
The table has an index on its time column.
I checked the 2021 data: roughly 300 million rows. After deleting it, the MODIFY should take well under half the original time. Besides, this table was due to be archived anyway; I had simply forgotten to do it this year (which also shows I had never hit a slow-query problem on it).
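Sizing the old data is just a range count over the time index. A minimal sketch, with the time column name assumed for illustration:

```sql
-- Assumed time column `created_at`; counts the 2021 rows that can be archived.
SELECT COUNT(*)
FROM order_flow
WHERE created_at >= '2021-01-01'
  AND created_at <  '2022-01-01';
```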
So the plan was: first archive the data, i.e. INSERT the 2021 rows into a new table, then DELETE the 2021 rows from the original table, and finally run the MODIFY to change the type.
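In its naive, single-statement form (same assumed names as above), the whole plan is just a handful of statements:

```sql
-- Step 1: archive the 2021 rows into a new table with the same structure.
CREATE TABLE order_flow_2021 LIKE order_flow;

INSERT INTO order_flow_2021
SELECT * FROM order_flow
WHERE created_at >= '2021-01-01' AND created_at < '2022-01-01';

-- Step 2: remove the archived rows from the original table.
DELETE FROM order_flow
WHERE created_at >= '2021-01-01' AND created_at < '2022-01-01';

-- Step 3: run the ALTER ... MODIFY shown earlier on the now much smaller table.
```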
The INSERT INTO and DELETE statements themselves were simple, but because of the data volume and to avoid long transactions, the DBA asked us to split the SQL into smaller statements for him to execute.
At the time I thought: does even the splitting have to be done by the developers? Are DBAs just execution machines that run whatever they are handed?
Fine. I split the INSERT into 100 statements and the DELETE into 100 statements and handed them to the DBA.
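As an illustration of what such a split can look like (the window boundaries below are invented, not the real ones), each of the 100 pieces covers a slice of 2021:

```sql
-- Archive batch N: a few days' worth of 2021 data at a time.
INSERT INTO order_flow_2021
SELECT * FROM order_flow
WHERE created_at >= '2021-01-01' AND created_at < '2021-01-05';

-- Matching delete batch N for the same window.
DELETE FROM order_flow
WHERE created_at >= '2021-01-01' AND created_at < '2021-01-05';

-- ...repeated ~100 times with successive windows, so no single
-- transaction touches hundreds of millions of rows.
```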
In the evening the DBA ran a batch of them, but several of the DELETEs failed. He asked me whether any requests might still be modifying the table.
I said that was impossible: this table is essentially an append-only log table, and we were deleting 2021 data, which nothing in the business could still be touching. But the errors were real, so I looked at them: lock wait timeout.
At the time I wondered why there would be a lock wait timeout when no business operation could possibly be working on the 2021 data now.
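The error in question is MySQL's "Lock wait timeout exceeded; try restarting transaction". For anyone hitting the same thing, one way to see who is blocking whom while it is happening (assuming MySQL 5.7+ with the sys schema available) is:

```sql
-- Shows each blocked statement together with the statement holding the lock it waits on.
SELECT waiting_query, blocking_query, wait_age
FROM sys.innodb_lock_waits;
```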
I later discovered that the DBA was executing multiple DELETE statements in parallel.
So I told the DBA in the group chat that it was probably because he was executing multiple DELETEs in parallel: they were contending with each other, and since each DELETE removes a lot of rows, the lock wait timed out.
As long as each DELETE is restricted to its own ID range, there should be no conflict between them.
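A sketch of what "restricted to its own ID range" means here, with invented boundaries and assuming `id` is the primary key: each DELETE only locks rows in its own primary-key range, so non-overlapping batches should not block one another even if a few of them end up running at the same time.

```sql
-- Batch 1 and batch 2 lock disjoint primary-key ranges within the 2021 data.
DELETE FROM order_flow
WHERE created_at < '2022-01-01' AND id >= 1       AND id < 3000000;

DELETE FROM order_flow
WHERE created_at < '2022-01-01' AND id >= 3000000 AND id < 6000000;
```

Running the batches one after another is safer still, since serial execution sidesteps lock contention entirely.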



