Ticket: Question about optimizing DataGrab speed when importing CSVs

Status: Resolved
Add-on / Version: DataGrab 4.0.4
Severity:
EE Version: 6.3.4

Jason Roxz

Jul 12, 2022

Greetings friends.

Question: Is there a way to speed up DataGrab to perform more like a direct SQL write?

- Yesterday I ran my first CSV import of 7966 entries (the first entries ever added to this fresh EE install)
- Brand new EE install with only Pro and DataGrab installed
- Brand new dedicated server with 0 load
- Each entry with 20 plain text fields (including title, url_title)
- All fields under 50 characters and most with 3 or fewer characters
- Tested with a 200 entry batch size

Log excerpt:
===============
...
22:06:25 07/11/2022 Added 200 entries
22:06:25 07/11/2022 Clearing all cache
22:06:25 07/11/2022 Begin Importing [u7Nk3e]
...
22:09:53 07/11/2022 Added 200 entries
22:09:53 07/11/2022 Clearing all cache
22:09:53 07/11/2022 Begin Importing [bR4cp9]
...
22:10:22 07/11/2022 Added 200 entries
22:10:22 07/11/2022 Clearing all cache
22:10:22 07/11/2022 Begin Importing [y2KY6g]
...
22:13:33 07/11/2022 Added 200 entries
22:13:33 07/11/2022 Clearing all cache
22:13:33 07/11/2022 Begin Importing [D2f4pK]
...
22:16:48 07/11/2022 Added 200 entries
22:16:48 07/11/2022 Clearing all cache
22:16:49 07/11/2022 Begin Importing [wRt70b]
...
===============

In all, the import took nearly 12 minutes – far too long for my live data use case.

I need near-instant delete/overwrite/updates for these 7500 - 8500 entries every 2 minutes to provide live data on my site – just like a regular SQL write. 
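
To put that gap in numbers, here is a rough back-of-envelope sketch in Python. It uses only the figures quoted in this post (7966 entries, roughly 12 minutes observed, a 2-minute refresh window). The variable and function names are illustrative and not part of DataGrab or EE.

```python
# Back-of-envelope throughput comparison using figures from the post.
# The 12-minute total is approximate ("nearly 12 minutes").
ENTRIES = 7966
OBSERVED_SECONDS = 12 * 60   # observed import duration
TARGET_SECONDS = 2 * 60      # desired refresh window


def entries_per_second(entries: int, seconds: float) -> float:
    """Average number of entries written per second."""
    return entries / seconds


observed_rate = entries_per_second(ENTRIES, OBSERVED_SECONDS)
required_rate = entries_per_second(ENTRIES, TARGET_SECONDS)
speedup_needed = required_rate / observed_rate

print(f"observed: {observed_rate:.1f} entries/s")
print(f"required: {required_rate:.1f} entries/s")
print(f"speedup needed: {speedup_needed:.1f}x")
```

In other words, the import would need to sustain roughly 66 entries per second instead of roughly 11, about a sixfold speedup, before any delete or overwrite work is counted.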

Am I spinning my wheels trying to make DataGrab do something it can’t? 
Should I be working out how to make EE work with an external database instead?

Thanks.

#1

BoldMinded (Brian)

Hi, Jason. To answer your question: no, there is no way to change it to do direct SQL writes, and honestly I would be suspicious of any third-party module that tried to work this way. Not using the provided ORM/models would make such an add-on unstable and difficult to support.

https://docs.boldminded.com/datagrab/faqs#is-datagrab-4-is-slower-than-previous-versions

If you need near-instant database updates for 7000 entries every 2 minutes, I suggest you look into a more robust pub/sub delivery system such as RabbitMQ or Amazon SQS. I’m not even sure I’d trust raw SQL queries to run in batches of 200 every 2 minutes for that kind of workload.

#2

Jason Roxz

Ok, rats. Thank you for the thorough reply.
