MongoDB vs. PostgreSQL performance with 5.5 million rows / documents

Can someone help me compare these queries and explain why the PostgreSQL query runs in just under 2000 ms, while the MongoDB aggregate query takes almost 9000 ms and sometimes as long as 130K ms?

PostgreSQL 9.3.2 on x86_64-apple-darwin, compiled by i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.9.00), 64-bit

PostgreSQL query

SELECT locomotive_id,
       SUM(date_trunc('second', datetime) - date_trunc('second', prevDatetime)) AS utilization_time
FROM bpkdmp
WHERE datetime >= '2013-7-26 00:00:00.0000'
  AND datetime <= '2013-7-26 23:59:59.9999'
GROUP BY locomotive_id
ORDER BY locomotive_id

MongoDB query

db.bpkdmp.aggregate([
   {
      $match : {
         datetime : { $gte : new Date(2013,6,26, 0, 0, 0, 0), $lt : new Date(2013,6,26, 23, 59, 59, 9999) }
      }
   },
   {
      $project: {
         locomotive_id : "$locomotive_id",
         loco_time : { $subtract : ["$datetime", "$prevdatetime"] }, 
      }
   },
   {
      $group : {
         _id : "$locomotive_id",
         utilization_time : { $sum : "$loco_time" }
      }
   },
   {
      $sort : {_id : 1}
   }
])
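
One detail of the $match stage is worth checking: the upper bound passes 9999 as the milliseconds argument, which is outside the 0 to 999 range, so JavaScript rolls it over into the next day. That would be consistent with the "$lt" value ending in 00:08.999 in the MongoDB explain output further down. The sketch below is only an illustration; it uses UTC so the printed value does not depend on the machine's time zone (the original query uses local time), and dayMatchStage is a hypothetical helper name, not part of the original query.

```javascript
// Assumption: UTC is used here only so the printed value is the same on any
// machine; the query in the question builds its dates in local time.
function upperBoundAsWritten() {
  // Month 6 is July (months are 0-based). 9999 ms is out of range and rolls
  // over: 23:59:59 plus 9999 ms is 00:00:08.999 on the following day.
  return new Date(Date.UTC(2013, 6, 26, 23, 59, 59, 9999));
}

// Hypothetical helper: builds a half-open [start, nextDay) $match stage
// covering exactly one local calendar day.
function dayMatchStage(year, monthIndex, day) {
  const start = new Date(year, monthIndex, day);
  const end = new Date(year, monthIndex, day + 1); // midnight of the next day
  return { $match: { datetime: { $gte: start, $lt: end } } };
}

console.log(upperBoundAsWritten().toISOString()); // 2013-07-27T00:00:08.999Z

const stage = dayMatchStage(2013, 6, 26);
console.log(stage.$match.datetime.$gte.toString());
console.log(stage.$match.datetime.$lt.toString());
```

With a half-open range the upper bound is simply midnight of the next day, so there is no need to guess how many fractional digits the last instant of the day should carry. The extra few seconds do not explain the gap in run time, but they may account for part of the small difference in row counts between the two explain outputs.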

Both the PostgreSQL table and the MongoDB collection are indexed on datetime: 1 and locomotive_id: 1.

These queries are being tested on an iMac with a 2TB hybrid drive and 16GB of memory. I have received comparable results on a Windows 7 machine with 8GB of memory and a 256GB SSD.

Thanks!

** Update: I am posting the EXPLAIN (BUFFERS, ANALYZE) results after my question was posted

"Sort  (cost=146036.84..146036.88 rows=19 width=24) (actual time=2182.443..2182.457 rows=152 loops=1)"
"  Sort Key: locomotive_id"
"  Sort Method: quicksort  Memory: 36kB"
"  Buffers: shared hit=13095"
"  ->  HashAggregate  (cost=146036.24..146036.43 rows=19 width=24) (actual time=2182.144..2182.360 rows=152 loops=1)"
"        Buffers: shared hit=13095"
"        ->  Bitmap Heap Scan on bpkdmp  (cost=12393.84..138736.97 rows=583942 width=24) (actual time=130.409..241.087 rows=559529 loops=1)"
"              Recheck Cond: ((datetime >= '2013-07-26 00:00:00'::timestamp without time zone) AND (datetime <= '2013-07-26 23:59:59.9999'::timestamp without time zone))"
"              Buffers: shared hit=13095"
"              ->  Bitmap Index Scan on bpkdmp_datetime_ix  (cost=0.00..12247.85 rows=583942 width=0) (actual time=127.707..127.707 rows=559529 loops=1)"
"                    Index Cond: ((datetime >= '2013-07-26 00:00:00'::timestamp without time zone) AND (datetime <= '2013-07-26 23:59:59.9999'::timestamp without time zone))"
"                    Buffers: shared hit=1531"
"Total runtime: 2182.620 ms"

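As a rough reading aid for the plan above, the inclusive "actual time" totals can be turned into approximate per-node times by subtracting each child's total from its parent's. This is a minimal sketch, assuming the plan is a single chain of nodes (as it is here) and ignoring the timing overhead that EXPLAIN ANALYZE itself adds; the numbers are copied from the output above and the function name exclusiveTimes is made up for this illustration.

```javascript
// Inclusive totals in milliseconds, copied from the EXPLAIN output above,
// ordered from the innermost node to the outermost.
const plan = [
  { node: "Bitmap Index Scan", totalMs: 127.707 },
  { node: "Bitmap Heap Scan", totalMs: 241.087 },
  { node: "HashAggregate", totalMs: 2182.360 },
  { node: "Sort", totalMs: 2182.457 },
];

// Each node's total includes its child, so subtract the previous entry's
// total to approximate the time spent in the node itself.
function exclusiveTimes(nodes) {
  return nodes.map((n, i) => ({
    node: n.node,
    selfMs: Number((n.totalMs - (i > 0 ? nodes[i - 1].totalMs : 0)).toFixed(3)),
  }));
}

for (const { node, selfMs } of exclusiveTimes(plan)) {
  console.log(`${node}: ${selfMs} ms`);
}
```

Read this way, roughly 1.9 of the 2.2 seconds go to the HashAggregate step (the date_trunc calls and the interval sum over about 560K rows), and every buffer access is a shared hit, so the PostgreSQL run appears to be CPU-bound rather than waiting on disk.
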
** Update: Mongo explain:

Explain from MongoDB

{
"serverPipeline" : [
    {
        "query" : {
            "datetime" : {
                "$gte" : ISODate("2013-07-26T04:00:00Z"),
                "$lt" : ISODate("2013-07-27T04:00:08.999Z")
            }
        },
        "projection" : {
            "datetime" : 1,
            "locomotive_id" : 1,
            "prevdatetime" : 1,
            "_id" : 1
        },
        "cursor" : {
            "cursor" : "BtreeCursor datetime_1",
            "isMultiKey" : false,
            "n" : 559572,
            "nscannedObjects" : 559572,
            "nscanned" : 559572,
            "nscannedObjectsAllPlans" : 559572,
            "nscannedAllPlans" : 559572,
            "scanAndOrder" : false,
            "indexOnly" : false,
            "nYields" : 1,
            "nChunkSkips" : 0,
            "millis" : 988,
            "indexBounds" : {
                "datetime" : [
                    [
                        ISODate("2013-07-26T04:00:00Z"),
                        ISODate("2013-07-27T04:00:08.999Z")
                    ]
                ]
            },
            "allPlans" : [
                {
                    "cursor" : "BtreeCursor datetime_1",
                    "n" : 559572,
                    "nscannedObjects" : 559572,
                    "nscanned" : 559572,
                    "indexBounds" : {
                        "datetime" : [
                            [
                                ISODate("2013-07-26T04:00:00Z"),
                                ISODate("2013-07-27T04:00:08.999Z")
                            ]
                        ]
                    }
                }
            ],
            "oldPlan" : {
                "cursor" : "BtreeCursor datetime_1",
                "indexBounds" : {
                    "datetime" : [
                        [
                            ISODate("2013-07-26T04:00:00Z"),
                            ISODate("2013-07-27T04:00:08.999Z")
                        ]
                    ]
                }
            },
            "server" : "Michaels-iMac.local:27017"
        }
    },
    {
        "$project" : {
            "locomotive_id" : "$locomotive_id",
            "loco_time" : {
                "$subtract" : [
                    "$datetime",
                    "$prevdatetime"
                ]
            }
        }
    },
    {
        "$group" : {
            "_id" : "$locomotive_id",
            "utilization_time" : {
                "$sum" : "$loco_time"
            }
        }
    },
    {
        "$sort" : {
            "sortKey" : {
                "_id" : 1
            }
        }
    }
],
"ok" : 1
}

performance mongodb postgresql Mike A

For the PostgreSQL query, please show the EXPLAIN (BUFFERS, ANALYZE) output. Also, the PostgreSQL version. (I have voted to move this to dba.SE.)

Craig Ringer

... and info on the MongoDB plan? docs.mongodb.org/manual/reference/method/cursor.explain

Craig Ringer

Although it is hard to escape the NoSQL hype, traditional RDBMSs are better and far more mature at aggregates any day. NoSQL databases are optimized for primary-key indexing and retrieval by key, not for this kind of query.

Alexandros

I may have left out a small detail. There are over 200 fields in each document. This was a direct import from a PostgreSQL database. Many of the field values are null. I remember that MongoDB did not particularly like null values. I did another import with <20 fields of relevant data, and query performance was much better. I am getting <3000 ms on a machine with 8GB of memory and a slower HD. I am going to start a new test on a much more powerful machine shortly.

Mike A

A MongoDB index {datetime: 1, prevdatetime: 1} should perform better than the current index, because MongoDB is filtering on datetime and prevdatetime. It would reduce the number of documents that need to be scanned.

gosok


Answers: