Ticket: Question about optimizing DataGrab speed when importing CSVs
Add-on / Version: DataGrab 4.0.4
Jason Roxz, Jul 12, 2022
Question: Is there a way to speed up DataGrab to perform more like a direct SQL write?
- Yesterday I ran my first CSV import of 7966 entries (the first entries ever added to this fresh EE install)
- Brand new EE install with only Pro and DataGrab installed
- Brand new dedicated server with 0 load
- Each entry with 20 plain text fields (including title, url_title)
- All fields under 50 characters and most with 3 or fewer characters
- Tested with a 200-entry batch size
22:06:25 07/11/2022 Added 200 entries
22:06:25 07/11/2022 Clearing all cache
22:06:25 07/11/2022 Begin Importing [u7Nk3e]
22:09:53 07/11/2022 Added 200 entries
22:09:53 07/11/2022 Clearing all cache
22:09:53 07/11/2022 Begin Importing [bR4cp9]
22:10:22 07/11/2022 Added 200 entries
22:10:22 07/11/2022 Clearing all cache
22:10:22 07/11/2022 Begin Importing [y2KY6g]
22:13:33 07/11/2022 Added 200 entries
22:13:33 07/11/2022 Clearing all cache
22:13:33 07/11/2022 Begin Importing [D2f4pK]
22:16:48 07/11/2022 Added 200 entries
22:16:48 07/11/2022 Clearing all cache
22:16:49 07/11/2022 Begin Importing [wRt70b]
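To read the batch timings off the log excerpt above, here is a minimal Python sketch. It assumes each line is `HH:MM:SS MM/DD/YYYY message` and that an "Added" line closes the most recent "Begin Importing" line. The helper name `batch_durations` is made up for illustration and is not part of DataGrab.

```python
from datetime import datetime

# The log excerpt from the ticket, copied verbatim.
LOG = """\
22:06:25 07/11/2022 Added 200 entries
22:06:25 07/11/2022 Clearing all cache
22:06:25 07/11/2022 Begin Importing [u7Nk3e]
22:09:53 07/11/2022 Added 200 entries
22:09:53 07/11/2022 Clearing all cache
22:09:53 07/11/2022 Begin Importing [bR4cp9]
22:10:22 07/11/2022 Added 200 entries
22:10:22 07/11/2022 Clearing all cache
22:10:22 07/11/2022 Begin Importing [y2KY6g]
22:13:33 07/11/2022 Added 200 entries
22:13:33 07/11/2022 Clearing all cache
22:13:33 07/11/2022 Begin Importing [D2f4pK]
22:16:48 07/11/2022 Added 200 entries
22:16:48 07/11/2022 Clearing all cache
22:16:49 07/11/2022 Begin Importing [wRt70b]
"""


def batch_durations(log_text):
    """Return seconds elapsed between each 'Begin Importing' and the next 'Added'."""
    durations = []
    started = None
    for line in log_text.splitlines():
        parts = line.split(maxsplit=2)
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        stamp = datetime.strptime(f"{parts[0]} {parts[1]}", "%H:%M:%S %m/%d/%Y")
        message = parts[2]
        if message.startswith("Begin Importing"):
            started = stamp
        elif message.startswith("Added") and started is not None:
            durations.append((stamp - started).total_seconds())
            started = None
    return durations


durations = batch_durations(LOG)
print(durations)  # [208.0, 29.0, 191.0, 195.0]
```

In this excerpt, three of the four complete batches take a little over three minutes each for 200 entries, and one finishes in 29 seconds. The first "Added" line and the last "Begin Importing" line have no partner in the excerpt, so they are ignored.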
In all, the import took nearly 12 minutes – far too long for my live data use case.
I need near-instant deletes, overwrites, and updates for these 7,500 to 8,500 entries every 2 minutes to provide live data on my site, just like a regular SQL write.
Am I spinning my wheels trying to make DataGrab do something it can’t?
Should I be working out how to make EE work with an external database instead?
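To put the gap in numbers, here is a rough back-of-the-envelope sketch in Python. It uses only the figures stated in the post (7966 entries in "nearly 12 minutes", and 7500 to 8500 entries every 2 minutes) and treats 12 minutes as exact, so the observed rate is approximate.

```python
def entries_per_second(entries, seconds):
    """Average throughput for a given entry count and wall-clock time."""
    return entries / seconds


# Observed: 7966 entries in roughly 12 minutes (figure taken from the post).
observed = entries_per_second(7966, 12 * 60)

# Required: the whole set refreshed inside a 2-minute window.
required_low = entries_per_second(7500, 2 * 60)
required_high = entries_per_second(8500, 2 * 60)

# How much faster the import would need to run at the high end.
speedup = required_high / observed

print(f"observed  ~{observed:.1f} entries/s")
print(f"required  {required_low:.1f} to {required_high:.1f} entries/s")
print(f"speedup   ~{speedup:.1f}x")
```

On those assumptions the import would need to run roughly six times faster than the reported run just to finish inside the 2-minute window, before leaving any headroom.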
Jul 13, 2022
Hi, Jason. To answer your question: no, there is no way to make DataGrab do direct SQL writes, and honestly I would be suspicious of any third-party module that worked that way. Bypassing the provided ORM/models would make such an add-on unstable and difficult to support.
If you need near-instant database updates for 7,000 entries every 2 minutes, I suggest looking into a more robust pub/sub delivery system such as RabbitMQ or Amazon SQS. I’m not even sure I’d trust raw SQL queries running in batches of 200 every 2 minutes for that kind of workload.
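As a very rough illustration of the queue-based idea, here is a Python sketch that uses the standard library's `queue.Queue` as an in-process stand-in for a real broker. It does not use RabbitMQ or Amazon SQS; a real setup would go through that service's own client library, which is not shown here. The `run_demo` function, the `store` dict, and the sample fields are all hypothetical.

```python
import queue
import threading


def run_demo(updates):
    """Publish (entry_id, fields) updates to a queue and apply them in a consumer thread."""
    q = queue.Queue()
    store = {}  # hypothetical stand-in for whatever serves the live data

    def consumer():
        while True:
            item = q.get()
            if item is None:  # sentinel: the producer has finished
                break
            entry_id, fields = item
            store[entry_id] = fields  # later updates overwrite earlier ones

    worker = threading.Thread(target=consumer)
    worker.start()

    # Producer side: publish each change as it arrives instead of
    # re-importing the whole data set on a timer.
    for update in updates:
        q.put(update)
    q.put(None)

    worker.join()
    return store


result = run_demo([
    (1, {"score": "3"}),
    (2, {"score": "0"}),
    (1, {"score": "4"}),
])
print(result)  # {1: {'score': '4'}, 2: {'score': '0'}}
```

The point of the design is that the producer only publishes entries that changed, and the consumer applies them as they arrive, rather than deleting and rewriting all 7,500 to 8,500 entries every cycle.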
Jul 13, 2022
Ok, rats. Thank you for the thorough reply.