Discussion:
max() on primary key very slow
r***@gmx.de
2012-02-03 15:25:53 UTC
Permalink
Hello,

SQL databases are always good for a surprise when it comes to the
performance of simple statements.

I have a table with the column 'ID' as BIGINT unique primary key.

The table has about 4,000,000 entries; the ID counts up without gaps.

A simple

select max(ID) from T

takes about 14 seconds to complete.

Execution plan:

PLAN (T NATURAL)

So Firebird seems to plan a full table scan. (Hello, anyone at home?
I have a unique index on that column ;-) )

Some databases do so much better here: the same statement on a 100%
identical Derby database completes immediately, as it does on Oracle 10g XE.

Am I missing something? Any suggestions?

Best regards

Marc
Matthias Hanft
2012-02-03 18:10:44 UTC
Permalink
Post by r***@gmx.de
I have a table with the column 'ID' as BIGINT unique primary key.
select max(ID) from T
So Firebird seems to plan a full table scan. (Hello, anyone at home?
I have a unique index on that column ;-) )
As far as I know, you need a DESC index for MAX queries.
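In Firebird DDL that would look like the following sketch (the index name is my own choice; `T` and `ID` are the names from the original post):

```sql
-- A descending index stores the highest key first, so MAX(ID)
-- can be answered by reading just the leftmost index entry.
CREATE DESCENDING INDEX IDX_T_ID_DESC ON T (ID);

-- SELECT MAX(ID) FROM T should now use the index
-- instead of PLAN (T NATURAL).
```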

Matt
Thomas Steinmaurer
2012-02-03 18:44:33 UTC
Permalink
Post by r***@gmx.de
SQL databases are always good for a surprise when it comes to the
performance of simple statements.
I have a table with the column 'ID' as BIGINT unique primary key.
The table has about 4000000 entries, the ID counts up without gaps.
A simple
select max(ID) from T
takes about 14 seconds to complete.
PLAN (T NATURAL)
So Firebird seems to plan a full table scan. (Hello, anyone at home?
I have a unique index on that column ;-) )
Some databases do so much better here: the same statement on a 100%
identical Derby database completes immediately, as it does on Oracle 10g XE.
Am I missing something? Any suggestions?
You need a DESCENDING index on the primary key column to speed up a MAX
operation, but I wonder what you are doing with that value. Hopefully
not incrementing it by 1 and using that as the next primary key value?
--
With regards,
Thomas Steinmaurer (^TS^)
Firebird Technology Evangelist

http://www.upscene.com/
http://www.firebirdsql.org/en/firebird-foundation/
Ann Harrison
2012-02-03 20:07:44 UTC
Permalink
Post by r***@gmx.de
SQL databases are always good for a surprise when it comes to the
performance of simple statements.
Particularly true when you switch between record locking and MVCC.
Post by r***@gmx.de
I have a table with the column 'ID' as BIGINT unique primary key.
The table has about 4000000 entries, the ID counts up without gaps.
A simple
       select max(ID) from T
takes about 14 seconds to complete.
       PLAN (T NATURAL)
So Firebird seems to plan a full table scan. (Hello, anyone at home?
I have a unique index on that column ;-) )
Yes, there is someone at home. The problem is that in an MVCC system,
the MAX value in the index may belong to a record that's not
appropriate for your transaction. Firebird indexes cannot be
traversed backward and can change between reads.
Post by r***@gmx.de
Some databases do so much better here: the same statement on a 100%
identical Derby database completes immediately, as it does on Oracle 10g XE.
Right. And neither of those has index entries for transactions with
different snapshots of the database.

If you actually need to find the MAX often, create a descending index.
If you're using the MAX to figure out what the next id should be, use
generators (aka sequences).
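A minimal sketch of the generator approach (all names here are made up; syntax as in Firebird 2.x, where CREATE GENERATOR and CREATE SEQUENCE are equivalent):

```sql
-- Create the generator once.
CREATE GENERATOR GEN_T_ID;

-- Fetch the next value explicitly...
SELECT GEN_ID(GEN_T_ID, 1) FROM RDB$DATABASE;

-- ...or assign it automatically in a BEFORE INSERT trigger.
SET TERM ^ ;
CREATE TRIGGER T_BI FOR T
ACTIVE BEFORE INSERT
AS
BEGIN
  IF (NEW.ID IS NULL) THEN
    NEW.ID = GEN_ID(GEN_T_ID, 1);
END^
SET TERM ; ^
```

Unlike MAX(), generator values are handed out outside transaction control, so two concurrent inserts can never receive the same value.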

Good luck,

Ann
Post by r***@gmx.de
Am I missing something? Any suggestions?
Best regards
Marc
------------------------------------
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Visit http://www.firebirdsql.org and click the Resources item
on the main (top) menu.  Try Knowledgebase and FAQ links !
Also search the knowledgebases at http://www.ibphoenix.com
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Yahoo! Groups Links
bigmarcman2000
2012-02-10 16:48:35 UTC
Permalink
Thank you for your answers.

First of all, I have been using Firebird since version 1.0 in many different projects and have always been pleased with its performance and stability.

I don't want to complain about Firebird; I was just a bit surprised to find a performance problem where I never expected one (and did not understand it), especially compared to other databases.

In my current project, I'm developing middleware, using an SQL database for persistence. I try to support many different databases and like to keep the code as common as possible. (That's why I try not to add procedures and triggers.)

Currently I'm using Derby, Oracle, Firebird and MS SQL; that's why I can easily compare the performance of these DBs in different areas.

I'm using identical indices on all these databases, and none of the others shows any delay on that max() function. (Even with MS SQL and Oracle using an MVCC design.)

Indeed, I'm using the max() function to create IDs for new elements, but since this is done by a synchronized method within the middleware, it just needs to be done once on the first connect. But this call for two tables needs up to 60 seconds on a 3 GHz i7 quad-core with the database file on an SSD drive. This slows down the first login to the system.

Knowing nothing about the internal structure of an index in Firebird, I still believe that getting the maximum value from a column with an ascending index should not need a full table scan.

Never mind, using a descending index does speed up the process as you suggested.

I found the following note in the Firebird documentation:

'If you define a descending constraint-enforcing index on a primary or unique key, be sure to make any foreign keys referencing it descending as well.'

Are the following statements correct?

create table a(
id int not null,
primary key (id) using desc index ai_pk
)

create table b(
ref int not null
)

alter table b add constraint bc_fk1 foreign key (ref) references a(id) on delete cascade using desc index bi_fk1


Does a descending index have any performance drawbacks when inserting rows with ascending IDs?

Thanks !

Marc
Post by Ann Harrison
Post by r***@gmx.de
SQL databases are always good for a surprise when it comes to the
performance of simple statements.
Particularly true when you switch between record locking and MVCC.
Post by r***@gmx.de
I have a table with the column 'ID' as BIGINT unique primary key.
The table has about 4000000 entries, the ID counts up without gaps.
A simple
       select max(ID) from T
takes about 14 seconds to complete.
       PLAN (T NATURAL)
So Firebird seems to plan a full table scan. (Hello, anyone at home?
I have a unique index on that column ;-) )
Yes, there is someone at home. The problem is that in an MVCC system,
the MAX value in the index may belong to a record that's not
appropriate for your transaction. Firebird indexes cannot be
traversed backward and can change between reads.
Post by r***@gmx.de
Some databases do so much better here: the same statement on a 100%
identical Derby database completes immediately, as it does on Oracle 10g XE.
Right. And neither of those has index entries for transactions with
different snapshots of the database.
If you actually need to find the MAX often, create a descending index.
If you're using the MAX to figure out what the next id should be, use
generators (aka sequences).
Good luck,
Ann
Post by r***@gmx.de
Am I missing something? Any suggestions?
Best regards
Marc
Ann Harrison
2012-02-10 17:23:48 UTC
Permalink
Marc,
Post by bigmarcman2000
Currently I'm using Derby, Oracle, Firebird and MS SQL; that's why I can easily compare the performance of these DBs in different areas.
I'm using identical indices on all these databases, and none of the others shows any delay on that max() function. (Even with MS SQL and Oracle using an MVCC design.)
Right. The problem with MAX and ascending indexes is a combination of
the Firebird MVCC implementation and the Firebird index access
strategy. Firebird indexes are imprecise because the index includes
all versions of a record, not just the most recent. Oracle and MS SQL
use back versions of index pages for older transactions, which avoid
the "whoops it's there but not in your view" problem at the cost of
keeping old copies of index pages around. Firebird avoids holding
exclusive locks on sections of index when they are being split by
allowing readers to move forward, but not backward, through an index.
Yes, it might be possible to make an unqualified MAX faster by reading
to the end, finding a qualified row (Hurray!) or if not, starting at
the top and reading forward to the next lower value, and finding a
qualifying row or if not, starting again at the top and reading, and
starting and reading and... You see the problem. Could be done,
would be faster, is a lot of code, and encourages doing something
that's better done with sequences/generators.
Post by bigmarcman2000
Indeed, I'm using the max() function to create IDs for new elements, but since this is done by a synchronized method within the middleware, it just needs to be done once on the first connect. But this call for two tables needs up to 60 seconds on a 3 GHz i7 quad-core with the database file on an SSD drive. This slows down the first login to the system.
Most of the databases you name now make some variation on sequences
available - that would be a better solution than relying on the MAX
function. The real problem with using MAX to set the basis for a
unique identifier is that in a concurrent environment, it's not
reliable.

Best regards,

Ann
bigmarcman2000
2012-02-13 08:49:26 UTC
Permalink
Hello Ann,

thanks for your answer.
Post by Ann Harrison
use back versions of index pages for older transactions, which avoid
the "whoops it's there but not in your view" problem at the cost of
keeping old copies of index pages around. Firebird avoids holding
exclusive locks on sections of index when they are being split by
allowing readers to move forward, but not backward, through an index.
Yes, it might be possible to make an unqualified MAX faster by reading
to the end, finding a qualified row (Hurray!) or if not, starting at
the top and reading forward to the next lower value, and finding a
qualifying row or if not, starting again at the top and reading, and
starting and reading and... You see the problem. Could be done,
would be faster, is a lot of code, and encourages doing something
that's better done with sequences/generators.
Still, I don't fully understand the problem. min() runs very fast. What if the first (min) value was deleted in some pending transaction? In that case, min() should have the same problem as max().

I think I'll give a descending index a try, since I have no sorting in my application.

Are the following statements correct for a descending index which is referenced as a foreign key in a second table:

create table a(
id int not null,
primary key (id) using desc index ai_pk
)

create table b(
ref int not null
)

alter table b add constraint bc_fk1 foreign key (ref) references a(id) on delete cascade using desc index bi_fk1


Does a descending index have any performance drawbacks when inserting rows with ascending IDs?

Thanks !

Marc

P.S. By the way, that MVCC design seems to work out very nicely, since I noticed a lot of deadlock problems with concurrent access to a Derby database, which does not support it.
Ann Harrison
2012-02-13 23:35:37 UTC
Permalink
Post by bigmarcman2000
Still, I don't fully understand the problem. min() runs very fast. What if the first (min) value was deleted in some pending transaction? In that case, min() should have the same problem as max().
Getting to the next min record reads the index from left to right,
which works fine. Getting the max value from a descending index is
also fast because it reads the index from left to right.
The problem is that reading from right to left (the high end in an
ascending index, the low end in a descending one) is not allowed.
Post by bigmarcman2000
I think I'll give a descending index a try, since I have no sorting in my application.
create table a(
 id int not null,
 primary key (id) using desc index ai_pk
)
create table b(
 ref int not null
 )
alter table b add constraint bc_fk1 foreign key (ref) references a(id) on delete cascade using desc index bi_fk1
Does a descending index have any performance drawbacks when inserting rows with ascending IDs?
I think your syntax looks right. You may not get optimal index
density when inserting records with ascending keys into a descending
index.

I don't quite understand your complaint about round trips to get a
sequence value - you can define the field to have a default value
created by a sequence/generator or just put the gen-id into the insert
and return the value. And it has the pleasant side-effect of working
under concurrent load.
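Spelled out, the second variant Ann describes could look like this (the generator name `gen_a_id` is hypothetical; the table is the `a` from Marc's DDL, and INSERT ... RETURNING needs Firebird 2.0 or later):

```sql
-- One round trip: generate the key, insert the row,
-- and get the assigned key back in the same statement.
INSERT INTO a (id)
VALUES (GEN_ID(gen_a_id, 1))
RETURNING id;
```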

Cheers,

Ann

Woody
2012-02-10 18:24:03 UTC
Permalink
Post by bigmarcman2000
Thank you for your answers.
First of all, I'm using Firebird since version 1.0 in many different
projects and was always pleased with the performance and stability.
I don't want to complain about Firebird; I was just a bit surprised to
find a performance problem where I never expected one (and did not
understand it), especially compared to other databases.
In my current project, I'm developing middleware, using an SQL database
as persistence. I try to support many different databases and like to keep
the code as common as possible. (That's why I try not to add procedures
and triggers)
Since you control all the access, why not create an ID table that holds the
next ID for each table instead of relying on the speed of MAX? It's no more
dangerous in a multi-user environment and as a matter of fact, given the
proper attention to locking, is much safer and easier to deal with. I do
this with many of my relations and have never had any problems with multiple
users. It also eliminates any speed issues no matter which database you use.
:)

Woody (TMW)
bigmarcman2000
2012-02-13 08:41:00 UTC
Permalink
Hello Woody,
Post by Woody
Since you control all the access, why not create an ID table that holds the
next ID for each table instead of relying on the speed of MAX? It's no more
dangerous in a multi-user environment and as a matter of fact, given the
proper attention to locking, is much safer and easier to deal with. I do
this with many of my relations and have never had any problems with multiple
users. It also eliminates any speed issues no matter which database you use.
:)
First I did exactly that. But doing some benchmarks on mass inserts, I found that the bottleneck is the number of SQL statements per second. If you do a mass insert, you would need twice the number of statements: one to read/increment the ID table, one for the insert, which means half the throughput. Besides, that's also true for generators, since you need a statement to read the value.

I now use a centralized synchronized method that reads the max values on startup and then simply counts up the IDs on each call with no further database access. This has worked for years without any problem. I just did not know that max() takes that long on Firebird with my ascending indices, which slows down the application startup time.

Thanks

Marc
Mark Rotteveel
2012-02-11 17:17:16 UTC
Permalink
Post by bigmarcman2000
In my current project, I'm developing middleware, using an SQL database for persistence. I try to support many different databases and like to keep the code as common as possible. (That's why I try not to add procedures and triggers.)
If you look at ORM libraries like Hibernate, they use the strategy
pattern to create IDs: depending on the features or performance
characteristics of a database, you select a different strategy. See
section 5.1.2.2 in
http://docs.jboss.org/hibernate/core/4.0/manual/en-US/html/mapping.html#mapping-declaration-id

Mark
--
Mark Rotteveel
Leyne, Sean
2012-02-10 19:55:17 UTC
Permalink
Woody,
Post by Woody
Since you control all the access, why not create an ID table that holds the
next ID for each table instead of relying on the speed of MAX? It's no more
dangerous in a multi-user environment and as a matter of fact, given the
proper attention to locking, is much safer and easier to deal with. I do this
with many of my relations and have never had any problems with multiple
users. It also eliminates any speed issues no matter which database you use.
:)
That approach could introduce transaction concurrency issues.

As Ann has pointed out, generators/sequences are a much better approach.


Sean
Woody
2012-02-10 20:20:57 UTC
Permalink
Post by Leyne, Sean
Woody,
Post by Woody
Since you control all the access, why not create an ID table that holds the
next ID for each table instead of relying on the speed of MAX? It's no more
dangerous in a multi-user environment and as a matter of fact, given the
proper attention to locking, is much safer and easier to deal with. I do this
with many of my relations and have never had any problems with multiple
users. It also eliminates any speed issues no matter which database you use.
:)
That approach could introduce transaction concurrency issues.
As Ann has pointed out, generators/sequences are a much better approach.
Of course they are a better idea, but not all databases support them. I was
merely offering a solution that is independent of any database design. Just
a different way of solving the problem.

Woody (TMW)
Thomas
2012-02-10 20:49:36 UTC
Permalink
Post by Woody
Of course they are a better idea, but not all databases support them.
The number of DBMSs that do not support sequences is extremely small. I think MySQL and Sybase are the only ones left without them. Even the next SQL Server version will have them.
Kjell Rilbe
2012-02-11 17:05:32 UTC
Permalink
Post by Woody
Post by Leyne, Sean
As Ann has pointed out, generators/sequences are a much better approach.
Of course they are a better idea, but not all databases support them. I was
merely offering a solution that is independent of any database design. Just
a different way of solving the problem.
If you want to avoid sequences, which may have different syntax on
different servers, may I suggest a separate table:

create table "PK" (
"TableName" varchar(30) not null primary key,
"LastUsedId" int not null default 0
)

Enter one row in this table for each table you need a PK "generator"
for. To acquire a key, start your transaction with:

update "PK"
set "LastUsedId" = "LastUsedId" + 1
where "TableName" = :TableName

Then

select "LastUsedId"
from "PK"
where "TableName" = :TableName

The update makes sure the record is locked against updates from other
transactions; you can then safely read the new key value with a select.
You can encapsulate the update + select in a stored procedure if you
like, but I guess you don't want that.

This will work on any DB with proper transaction handling, provided
you don't use an inappropriate transaction isolation level for the
key-fetching transaction.

Furthermore, you can increment by more than one if you want to acquire
several keys at a time.

Regards,
Kjell
--
--------------------------------------
Kjell Rilbe
DataDIA AB
E-post: ***@datadia.se
Telefon: 08-761 06 55
Mobil: 0733-44 24 64






Mark Rotteveel
2012-02-11 17:08:28 UTC
Permalink
Post by Kjell Rilbe
Enter one row in this table for each table you need a PK "generator"
update "PK"
set "LastUsedId" = "LastUsedId" + 1
where "TableName" = :TableName
Then
select "LastUsedId"
from "PK"
where "TableName" = :TableName
You could UPDATE ... RETURNING "LastUsedId" (or explicitly
NEW."LastUsedId") to get it in one go.
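That is, the two statements collapse into one (RETURNING on a singleton UPDATE works from Firebird 2.1 on; other servers may differ):

```sql
-- Lock, increment, and read the new key in a single round trip.
UPDATE "PK"
SET "LastUsedId" = "LastUsedId" + 1
WHERE "TableName" = :TableName
RETURNING "LastUsedId"
```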
--
Mark Rotteveel


Ann Harrison
2012-02-11 18:05:55 UTC
Permalink
Post by Kjell Rilbe
If you want to avoid sequences that perhaps have different syntax for
create table "PK" (
  "TableName" varchar(30) not null primary key,
  "LastUsedId" int not null default 0
)
Enter one row in this table for each table you need a PK "generator"
update "PK"
set "LastUsedId" = "LastUsedId" + 1
where "TableName" = :TableName
Then
select "LastUsedId"
from "PK"
where "TableName" = :TableName
Just be sure to do that in a separate transaction with a retry on
deadlock/update conflict. That's exactly the strategy used to create
a gateway record that prevents concurrent updates to a table.

Good luck,

Ann