Ticket: DataGrab seems to process header row, even with “Use first row as title” enabled

Status Resolved
Add-on / Version DataGrab 4.2.1
Severity
EE Version 6.4.3

Paul Larson

Jan 05, 2023

My data file looks like this:

ITEM_NO LONG_DESCR CATEG_COD WEB_PRICE IS_PARENT PARENT_ITEM DIM_1 QTY_AVAIL WEB_DESC
1080BIRMINGHAM Brunswick 8’ Birmingham PLTBL   BRU500 0.0000 16299.0000     Birmingham Pool Table   birmingham-pool-table
1080PURSUIT   Brunswick 8’ Pursuit   PLTBL   BRU500 210.0000     7299.0000     Pursuit Pool Table     pursuit-pool-table
1090CENTENNIAL Brunswick 9’ Centennial Pool Table     PLTBL   BRU500 210.0000     13999.0000     Centennial Pool Table   centennial-pool-table

My import settings are:

https://www.dropbox.com/s/tkard4xgwyo7gki/2023-01-05_11-43-14.png?dl=0
https://www.dropbox.com/s/w7on7fl9xqyemqy/2023-01-05_11-44-15.png?dl=0

When I run the CLI import:

-bash-4.2$ php /var/www/vhosts/mysite.com/httpdocs/system/ee/eecli.php import:run --import_id=59 && tail -n 15 system/user/cache/DataGrab-import.log
Starting: Pool Table Parent…
0%
25%
50%
75%
Import Complete
17:45:31 01/05/2023 Import #59 Started
17:45:31 01/05/2023 Import #59 Initialized
17:45:31 01/05/2023 Begin Importing [ITEM_NO]
17:45:32 01/05/2023 [ITEM_NO] already exists. Updating.
17:45:33 01/05/2023 Import error with entry ITEM_NO with the field_id_90 field: This field must contain only numbers.
17:45:33 01/05/2023 Begin Importing [ITEM_NO]
17:45:33 01/05/2023 [ITEM_NO] already exists. Updating.
17:45:34 01/05/2023 Import error with entry ITEM_NO with the field_id_90 field: This field must contain only numbers.
17:45:35 01/05/2023 Begin Importing [ITEM_NO]
17:45:35 01/05/2023 [ITEM_NO] already exists. Updating.
17:45:36 01/05/2023 Import error with entry ITEM_NO with the field_id_90 field: This field must contain only numbers.
17:45:36 01/05/2023 Begin Importing [ITEM_NO]
17:45:36 01/05/2023 [ITEM_NO] already exists. Updating.
17:45:37 01/05/2023 Import error with entry ITEM_NO with the field_id_90 field: This field must contain only numbers.
-bash-4.2$

I would think ITEM_NO would be skipped as it’s line 1.

When I manually remove the header row, the output is:

Starting: Pool Table Parent…
0%
33%
67%
Import Complete
17:48:11 01/05/2023 Begin Importing [1080BIRMINGHAM]
17:48:11 01/05/2023 Appending new row(s) to Grid
17:48:11 01/05/2023 [1080BIRMINGHAM] already exists. Updating.
17:48:13 01/05/2023 Calling after_channel_entry_save() hook.
17:48:13 01/05/2023 Updated 1 entries
17:48:13 01/05/2023 Begin Importing [1080BIRMINGHAM]
17:48:13 01/05/2023 Appending new row(s) to Grid
17:48:13 01/05/2023 [1080BIRMINGHAM] already exists. Updating.
17:48:15 01/05/2023 Calling after_channel_entry_save() hook.
17:48:15 01/05/2023 Updated 2 entries
17:48:15 01/05/2023 Begin Importing [1080BIRMINGHAM]
17:48:15 01/05/2023 Appending new row(s) to Grid
17:48:15 01/05/2023 [1080BIRMINGHAM] already exists. Updating.
17:48:17 01/05/2023 Calling after_channel_entry_save() hook.
17:48:17 01/05/2023 Updated 3 entries

That is, smooth sailing once I manually remove the header.

#1

Paul Larson

Note: my files are tab-delimited, just in case the DataGrab code has different sections for delimiter types.
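
For comparison, here is a minimal Python sketch of the expected behavior: reading a tab-delimited file and dropping the first row before anything is imported. This is not DataGrab's code (DataGrab is a PHP add-on); the function name `read_rows` and the shortened three-column sample are assumptions made for illustration only.

```python
import csv
import io


def read_rows(text, skip_header=True, delimiter="\t"):
    """Return the data rows of delimited text, optionally dropping row 1."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    rows = [row for row in reader if row]  # ignore completely blank lines
    return rows[1:] if skip_header else rows


# Shortened, hypothetical sample based on the first three columns of the file.
sample = (
    "ITEM_NO\tLONG_DESCR\tCATEG_COD\n"
    "1080BIRMINGHAM\tBrunswick 8' Birmingham\tPLTBL\n"
    "1080PURSUIT\tBrunswick 8' Pursuit\tPLTBL\n"
)

for row in read_rows(sample):
    print(row[0])
```

With the header skipped, only the two item numbers are printed and ITEM_NO never reaches the import step.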

#2

BoldMinded (Brian)

I believe this is fixed in the build I sent in the other ticket. Closing this ticket.

#3

Paul Larson

See image:

https://www.dropbox.com/s/fzdp1iiym5frvep/2023-01-06_08-50-18.png?dl=0

CLI still does not seem to honor header row with new build.

Data file is:

ITEM_NO LONG_DESCR CATEG_COD ITEM_VEND_NO WEIGHT WEB_PRICE WEB_DESC url_title
1080BIRMINGHAM Brunswick 8’ Birmingham PLTBL BRU500 0.0000 16299.0000 Birmingham Pool Table birmingham-pool-table
1080PURSUIT Brunswick 8’ Pursuit PLTBL BRU500 210.0000 7299.0000 Pursuit Pool Table pursuit-pool-table
1090CENTENNIAL Brunswick 9’ Centennial Pool Table PLTBL BRU500 210.0000 13999.0000 Centennial Pool Table centennial-pool-table

When I manually delete header row:

-bash-4.2$ cat DataGrab-import.log
14:55:28 01/06/2023 Import #59 Started
14:55:28 01/06/2023 Import #59 Initialized
14:55:28 01/06/2023 Begin Importing [1080BIRMINGHAM]
14:55:28 01/06/2023 [1080BIRMINGHAM] already exists. Updating.
14:55:31 01/06/2023 Calling after_channel_entry_save() hook.
14:55:31 01/06/2023 Updated 1 entries
14:55:31 01/06/2023 Begin Importing [1080BIRMINGHAM]
14:55:31 01/06/2023 [1080BIRMINGHAM] already exists. Updating.
14:55:32 01/06/2023 Calling after_channel_entry_save() hook.
14:55:32 01/06/2023 Updated 2 entries
14:55:32 01/06/2023 Begin Importing [1080BIRMINGHAM]
14:55:32 01/06/2023 [1080BIRMINGHAM] already exists. Updating.
14:55:34 01/06/2023 Calling after_channel_entry_save() hook.
14:55:34 01/06/2023 Updated 3 entries

Compared to when the header row is present (note the repeated ITEM_NO):

-bash-4.2$ cat DataGrab-import.log
14:59:23 01/06/2023 Import #59 Started
14:59:23 01/06/2023 Import #59 Initialized
14:59:23 01/06/2023 Begin Importing [ITEM_NO]
14:59:23 01/06/2023 [ITEM_NO] already exists. Updating.
14:59:25 01/06/2023 Import error with entry ITEM_NO with the field_id_90 field: This field must contain only numbers.
14:59:25 01/06/2023 Begin Importing [ITEM_NO]
14:59:25 01/06/2023 [ITEM_NO] already exists. Updating.
14:59:26 01/06/2023 Import error with entry ITEM_NO with the field_id_90 field: This field must contain only numbers.
14:59:26 01/06/2023 Begin Importing [ITEM_NO]
14:59:26 01/06/2023 [ITEM_NO] already exists. Updating.
14:59:27 01/06/2023 Import error with entry ITEM_NO with the field_id_90 field: This field must contain only numbers.
14:59:27 01/06/2023 Begin Importing [ITEM_NO]
14:59:27 01/06/2023 [ITEM_NO] already exists. Updating.
14:59:29 01/06/2023 Import error with entry ITEM_NO with the field_id_90 field: This field must contain only numbers.
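
A quick way to spot this kind of repetition is to tally the titles that appear in the "Begin Importing [...]" lines of the log. The following is a rough Python sketch, assuming the log format shown in this ticket; the function name `titles_in_log` is hypothetical and the embedded log is a short excerpt of the lines above.

```python
import re
from collections import Counter

# Matches the title between the brackets of a "Begin Importing [...]" line.
BEGIN = re.compile(r"Begin Importing \[(.+)\]$")


def titles_in_log(log_text):
    """Count how many times each entry title starts importing."""
    counts = Counter()
    for line in log_text.splitlines():
        match = BEGIN.search(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


# Excerpt of the log lines quoted in this ticket.
log = """\
14:59:23 01/06/2023 Import #59 Started
14:59:23 01/06/2023 Begin Importing [ITEM_NO]
14:59:23 01/06/2023 [ITEM_NO] already exists. Updating.
14:59:25 01/06/2023 Begin Importing [ITEM_NO]
"""

print(titles_in_log(log))
```

Any title with a count above 1, or a title that matches a column name such as ITEM_NO, suggests the header row is being processed as data.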

#4

BoldMinded (Brian)

How are you running this? CLI? ACT? Or the CP interface?

If it’s throwing the POST error message, then it’s doing what it should do. Your POST array is not empty so it thinks you’re trying to POST data somehow.

#5

BoldMinded (Brian)

You might have to log or var_dump the POST array where it is throwing that error. Just look for that error message in the code.

#6

Paul Larson

Running via CLI.

#7

BoldMinded (Brian)

Let’s focus on this issue first. The other ticket is mentioning the heading row too and it’s confusing following both tickets.

Can you share a small sample of your import file? I just need 3-5 rows of the CSV. Share it on Dropbox or somewhere I can download the file; don’t paste it in the comments, as I don’t want the formatting to get messed up.

I’m running via CLI and my debugging output is:

No header row in the output.

20:57:19 01/06/2023 Import #27 Started
20:57:19 01/06/2023 Import #27 Initialized
20:57:19 01/06/2023 Begin Importing [3/4" Steel Banding Seals/Clips]
20:57:19 01/06/2023 [3/4" Steel Banding Seals/Clips] already exists. Updating.
20:57:19 01/06/2023 Calling after_channel_entry_save() hook.
20:57:19 01/06/2023 Updated 1 entries
20:57:19 01/06/2023 Begin Importing [U_WebProductName]
20:57:19 01/06/2023 [U_WebProductName] already exists. Updating.
20:57:20 01/06/2023 Calling after_channel_entry_save() hook.
20:57:20 01/06/2023 Updated 2 entries
20:57:20 01/06/2023 Begin Importing [U_WebProductName]
20:57:20 01/06/2023 [U_WebProductName] already exists. Updating.
20:57:20 01/06/2023 Calling after_channel_entry_save() hook.
20:57:20 01/06/2023 Updated 3 entries
20:57:20 01/06/2023 Begin Importing [U_WebProductName]
20:57:20 01/06/2023 [U_WebProductName] already exists. Updating.
20:57:20 01/06/2023 Calling after_channel_entry_save() hook.
20:57:20 01/06/2023 Updated 4 entries
20:57:20 01/06/2023 Begin Importing [U_WebProductName]
20:57:20 01/06/2023 [U_WebProductName] already exists. Updating.
20:57:20 01/06/2023 Calling after_channel_entry_save() hook.
20:57:20 01/06/2023 Updated 5 entries
20:57:20 01/06/2023 Begin Importing [U_WebProductName]
20:57:20 01/06/2023 [U_WebProductName] already exists. Updating.
20:57:21 01/06/2023 Calling after_channel_entry_save() hook.
20:57:21 01/06/2023 Updated 6 entries
20:57:21 01/06/2023 Begin Importing [U_WebProductName]
20:57:21 01/06/2023 [U_WebProductName] already exists. Updating.
20:57:21 01/06/2023 Calling after_channel_entry_save() hook.
20:57:21 01/06/2023 Updated 7 entries
20:57:21 01/06/2023 Begin Importing [U_WebProductName]
20:57:21 01/06/2023 [U_WebProductName] already exists. Updating.
20:57:21 01/06/2023 Calling after_channel_entry_save() hook.
20:57:21 01/06/2023 Updated 8 entries

Command run is: php system/ee/eecli.php import:run --id=27

Settings: https://www.dropbox.com/s/ztgyqh7kpibas1z/ticket-2560-settings.png?dl=0

#8

BoldMinded (Brian)

I posted the command I was running in the previous comment. What command are you running? Are you trying to import everything in a single command, or are you batching it in multiple different import commands?

#9

Paul Larson

Comment has been marked private.

#10

BoldMinded (Brian)

I think I see what is causing the issue. I’ll try to have a fix this weekend or early next week.

#11

BoldMinded (Brian)

Comment has been marked private.

#12

Paul Larson

Comment has been marked private.

#13

Paul Larson

Comment has been marked private.

#14

BoldMinded (Brian)

I’m not worried about the status right now; we need to focus on one issue at a time. Does that build correctly import the rows, regardless of the status, and skip the CSV heading row correctly?

#15

BoldMinded (Brian)

Do you see the duplicate item in the log when importing through the CP?

Have you tried deleting the entries and reimporting?

Have you clicked the reset import button in the CP to ensure the import is not stuck in a broken import state?

#16

Paul Larson

* Do you see the duplicate item in the log when importing through the CP?

No, logging is correct when run from CP:

* Have you tried deleting the entries and reimporting?

Yes. I made a test entry in the import file.

With header row present in file:
CLI (with ‘skip first row’ enabled): it does not create a test entry when I add a row to the import file.
CP/Web interface: it DOES create the new entry.

With header row NOT present in file:
CLI: same behavior (doesn’t create new entry).
CP/Web: DOES create new entry.

* Have you clicked the reset import button in the CP to ensure the import is not stuck in a broken import state?

It’s never been stuck, so haven’t had the need.

* Does that build correctly import the rows, regardless of the status, and skip the csv heading row correctly?

Almost.

CLI: Seems to correctly skip header row when import is set to do so.

In the log, though, it doesn’t reference the LAST line in the file.

CP: Also seems to skip header properly.

In the log, it DOES reference the last line in the import file, unlike CLI.

Making a test field value change in import file, CLI does not update (presumably because row is skipped) but CP properly updates the field.

So it seems that in both cases the header is skipped: it isn’t processed, and the log makes no mention of “ITEM_NO” (which is in the header row).

But it does seem that the CLI then doesn’t process the last row, while the CP does everything correctly with respect to the header row settings.

#17

Paul Larson

(Tweet-version of that is: Header good, both CP/CLI. But CLI then doesn’t process LAST row while CP does everything fine)
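
To make the symptom concrete, here is a purely hypothetical Python sketch of one way a "header skipped, last row dropped" result can arise: the header is removed from the row list and then subtracted from the row count a second time. This is an assumption for illustration only; nothing in this thread confirms that DataGrab's CLI code works this way, and both function names are invented.

```python
def rows_processed_buggy(lines, skip_header=True):
    """Hypothetical bug: the header is accounted for twice."""
    rows = lines[1:] if skip_header else lines  # header already removed here
    limit = len(rows) - 1 if skip_header else len(rows)  # ...and subtracted again
    return rows[:limit]


def rows_processed_fixed(lines, skip_header=True):
    """Header removed once; every remaining row is processed."""
    return lines[1:] if skip_header else lines


lines = ["ITEM_NO", "1080BIRMINGHAM", "1080PURSUIT", "1090CENTENNIAL"]

print(rows_processed_buggy(lines))
print(rows_processed_fixed(lines))
```

In the first function the header is gone but 1090CENTENNIAL is never reached, which matches the shape of the CLI behavior described above; the second processes all three data rows.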

#18

BoldMinded (Brian)

Comment has been marked private.

#19

BoldMinded (Brian)

If the previous build fixes the import count for you, then let’s move on to the status issue. It’s still not clear to me what the issue is.

What is the default status of the channel when a new entry is imported?

Should it always import at that status and never change it, or, if the status of the entry has been changed manually, should the import respect that change?

Can you share a screenshot of your entire import settings page? I’ve only seen parts of it and I need to see the status related settings.

#20

BoldMinded (Brian)

There is a new “Update Status?” option at the bottom of the settings, make sure it’s set to “Create Only” - the description of that field describes what it does. If I had to guess it’s set to “Create or Update” right now.

#21

BoldMinded (Brian)

Comment has been marked private.

#22

Paul Larson

Comment has been marked private.

#23

BoldMinded (Brian)

Comment has been marked private.

#24

Paul Larson

Comment has been marked private.
