Ticket: DataGrab seems to process header row, even with “Use first row as title” enabled
Status | Resolved |
Add-on / Version | DataGrab 4.2.1 |
Severity | |
EE Version | 6.4.3 |
Paul Larson
Jan 05, 2023
My data file looks like this:
ITEM_NO LONG_DESCR CATEG_COD WEB_PRICE IS_PARENT PARENT_ITEM DIM_1 QTY_AVAIL WEB_DESC
1080BIRMINGHAM Brunswick 8’ Birmingham PLTBL BRU500 0.0000 16299.0000 Birmingham Pool Table birmingham-pool-table
1080PURSUIT Brunswick 8’ Pursuit PLTBL BRU500 210.0000 7299.0000 Pursuit Pool Table pursuit-pool-table
1090CENTENNIAL Brunswick 9’ Centennial Pool Table PLTBL BRU500 210.0000 13999.0000 Centennial Pool Table centennial-pool-table
~
~
~
~
My import settings are:
https://www.dropbox.com/s/tkard4xgwyo7gki/2023-01-05_11-43-14.png?dl=0
https://www.dropbox.com/s/w7on7fl9xqyemqy/2023-01-05_11-44-15.png?dl=0
~
~
~
When I run the CLI import:
-bash-4.2$ php /var/www/vhosts/mysite.com/httpdocs/system/ee/eecli.php import:run --import_id=59 && tail -n 15 system/user/cache/DataGrab-import.log
Starting: Pool Table Parent…
0%
25%
50%
75%
Import Complete
17:45:31 01/05/2023 Import #59 Started
17:45:31 01/05/2023 Import #59 Initialized
17:45:31 01/05/2023 Begin Importing [ITEM_NO]
17:45:32 01/05/2023 [ITEM_NO] already exists. Updating.
17:45:33 01/05/2023 Import error with entry ITEM_NO with the field_id_90 field: This field must contain only numbers.
17:45:33 01/05/2023 Begin Importing [ITEM_NO]
17:45:33 01/05/2023 [ITEM_NO] already exists. Updating.
17:45:34 01/05/2023 Import error with entry ITEM_NO with the field_id_90 field: This field must contain only numbers.
17:45:35 01/05/2023 Begin Importing [ITEM_NO]
17:45:35 01/05/2023 [ITEM_NO] already exists. Updating.
17:45:36 01/05/2023 Import error with entry ITEM_NO with the field_id_90 field: This field must contain only numbers.
17:45:36 01/05/2023 Begin Importing [ITEM_NO]
17:45:36 01/05/2023 [ITEM_NO] already exists. Updating.
17:45:37 01/05/2023 Import error with entry ITEM_NO with the field_id_90 field: This field must contain only numbers.
-bash-4.2$
~
~
~
—————————
I would think ITEM_NO would be skipped as it’s line 1.
When I manually remove the header row, the output is:
Starting: Pool Table Parent…
0%
33%
67%
Import Complete
17:48:11 01/05/2023 Begin Importing [1080BIRMINGHAM]
17:48:11 01/05/2023 Appending new row(s) to Grid
17:48:11 01/05/2023 [1080BIRMINGHAM] already exists. Updating.
17:48:13 01/05/2023 Calling after_channel_entry_save() hook.
17:48:13 01/05/2023 Updated 1 entries
17:48:13 01/05/2023 Begin Importing [1080BIRMINGHAM]
17:48:13 01/05/2023 Appending new row(s) to Grid
17:48:13 01/05/2023 [1080BIRMINGHAM] already exists. Updating.
17:48:15 01/05/2023 Calling after_channel_entry_save() hook.
17:48:15 01/05/2023 Updated 2 entries
17:48:15 01/05/2023 Begin Importing [1080BIRMINGHAM]
17:48:15 01/05/2023 Appending new row(s) to Grid
17:48:15 01/05/2023 [1080BIRMINGHAM] already exists. Updating.
17:48:17 01/05/2023 Calling after_channel_entry_save() hook.
17:48:17 01/05/2023 Updated 3 entries
~
~
~
That is, smooth sailing once I manually remove the header.
Paul Larson
Note: my files are tab delimited, just in case the DataGrab code has different sections for delimiter types.
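For reference, the behavior expected from the "first row is a header" setting can be sketched in a few lines. This is a minimal illustration in Python, not DataGrab's actual code (DataGrab is a PHP add-on); the function name `read_rows` and the trimmed three-column sample are assumptions made for the example.

```python
import csv
import io


def read_rows(text, skip_header=True, delimiter="\t"):
    """Return the data rows of delimited text, optionally dropping row 1."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    rows = [row for row in reader if row]  # ignore blank lines
    return rows[1:] if skip_header else rows


# Trimmed, tab-delimited sample based on the data file in this ticket
# (only three of the columns are kept, for brevity).
sample = (
    "ITEM_NO\tLONG_DESCR\tCATEG_COD\n"
    "1080BIRMINGHAM\tBrunswick 8' Birmingham\tPLTBL\n"
    "1080PURSUIT\tBrunswick 8' Pursuit\tPLTBL\n"
)

for row in read_rows(sample):
    print(row[0])  # ITEM_NO should never appear here
```

With the header skipped, only the two item numbers are printed; the behavior reported above corresponds to `ITEM_NO` being treated as if it were a data row.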
BoldMinded (Brian)
I believe this is fixed in the build I sent in the other ticket. Closing this ticket.
Paul Larson
See image:
https://www.dropbox.com/s/fzdp1iiym5frvep/2023-01-06_08-50-18.png?dl=0
CLI still does not seem to honor header row with new build.
Data file is:
ITEM_NO LONG_DESCR CATEG_COD ITEM_VEND_NO WEIGHT WEB_PRICE WEB_DESC url_title
1080BIRMINGHAM Brunswick 8’ Birmingham PLTBL BRU500 0.0000 16299.0000 Birmingham Pool Table birmingham-pool-table
1080PURSUIT Brunswick 8’ Pursuit PLTBL BRU500 210.0000 7299.0000 Pursuit Pool Table pursuit-pool-table
1090CENTENNIAL Brunswick 9’ Centennial Pool Table PLTBL BRU500 210.0000 13999.0000 Centennial Pool Table centennial-pool-table
~
~
~
When I manually delete header row:
~
~
~
-bash-4.2$ cat DataGrab-import.log
14:55:28 01/06/2023 Import #59 Started
14:55:28 01/06/2023 Import #59 Initialized
14:55:28 01/06/2023 Begin Importing [1080BIRMINGHAM]
14:55:28 01/06/2023 [1080BIRMINGHAM] already exists. Updating.
14:55:31 01/06/2023 Calling after_channel_entry_save() hook.
14:55:31 01/06/2023 Updated 1 entries
14:55:31 01/06/2023 Begin Importing [1080BIRMINGHAM]
14:55:31 01/06/2023 [1080BIRMINGHAM] already exists. Updating.
14:55:32 01/06/2023 Calling after_channel_entry_save() hook.
14:55:32 01/06/2023 Updated 2 entries
14:55:32 01/06/2023 Begin Importing [1080BIRMINGHAM]
14:55:32 01/06/2023 [1080BIRMINGHAM] already exists. Updating.
14:55:34 01/06/2023 Calling after_channel_entry_save() hook.
14:55:34 01/06/2023 Updated 3 entries
~
~
~
Compared to when the header row is present (note repeat of ITEM_NO)
-bash-4.2$ cat DataGrab-import.log
14:59:23 01/06/2023 Import #59 Started
14:59:23 01/06/2023 Import #59 Initialized
14:59:23 01/06/2023 Begin Importing [ITEM_NO]
14:59:23 01/06/2023 [ITEM_NO] already exists. Updating.
14:59:25 01/06/2023 Import error with entry ITEM_NO with the field_id_90 field: This field must contain only numbers.
14:59:25 01/06/2023 Begin Importing [ITEM_NO]
14:59:25 01/06/2023 [ITEM_NO] already exists. Updating.
14:59:26 01/06/2023 Import error with entry ITEM_NO with the field_id_90 field: This field must contain only numbers.
14:59:26 01/06/2023 Begin Importing [ITEM_NO]
14:59:26 01/06/2023 [ITEM_NO] already exists. Updating.
14:59:27 01/06/2023 Import error with entry ITEM_NO with the field_id_90 field: This field must contain only numbers.
14:59:27 01/06/2023 Begin Importing [ITEM_NO]
14:59:27 01/06/2023 [ITEM_NO] already exists. Updating.
14:59:29 01/06/2023 Import error with entry ITEM_NO with the field_id_90 field: This field must contain only numbers.
BoldMinded (Brian)
How are you running this? CLI? ACT? Or the CP interface?
If it’s throwing the POST error message, then it’s doing what it should do. Your POST array is not empty so it thinks you’re trying to POST data somehow.
BoldMinded (Brian)
You might have to log or var_dump the POST array where it is throwing that error. Just look for that error message in the code.
Paul Larson
Running via CLI.
BoldMinded (Brian)
Let’s focus on this issue first. The other ticket is mentioning the heading row too and it’s confusing following both tickets.
Can you share a small sample of your import file? I just need 3-5 rows of the CSV. Share it on Dropbox or somewhere I can download the file. Please don’t paste it in the comments; I don’t want the formatting to get messed up.
I’m running via CLI and my debugging output is:
No header row in the output.
Command run is: php system/ee/eecli.php import:run --id=27
Settings: https://www.dropbox.com/s/ztgyqh7kpibas1z/ticket-2560-settings.png?dl=0
BoldMinded (Brian)
I posted the command I was running in the previous comment. What command are you running? Are you trying to import everything in a single command, or are you batching it across multiple import commands?
Paul Larson
Comment has been marked private.
BoldMinded (Brian)
I think I see what is causing the issue. I’ll try to have a fix this weekend or early next week.
BoldMinded (Brian)
Comment has been marked private.
Paul Larson
Comment has been marked private.
Paul Larson
Comment has been marked private.
BoldMinded (Brian)
I’m not worried about the status right now, we need to focus on one issue at a time. Does that build correctly import the rows, regardless of the status, and skip the csv heading row correctly?
BoldMinded (Brian)
Do you see the duplicate item in the log when importing through the CP?
Have you tried deleting the entries and reimporting?
Have you clicked the reset import button in the CP to ensure the import is not stuck in a broken import state?
Paul Larson
* Do you see the duplicate item in the log when importing through the CP?
No, logging is correct when run from CP:
* Have you tried deleting the entries and reimporting?
Yes. I made a test entry in the import file.
With header row present in file:
CLI: with ‘skip first row’ enabled, it does not create a test entry when I add a row to the import file.
CP/Web interface: it DOES create the new entry.
With header row NOT present in file:
CLI: same behavior (doesn’t create new entry).
CP/Web: DOES create new entry.
* Have you clicked the reset import button in the CP to ensure the import is not stuck in a broken import state?
It’s never been stuck, so haven’t had the need.
* Does that build correctly import the rows, regardless of the status, and skip the csv heading row correctly?
Almost.
CLI: Seems to correctly skip header row when import is set to do so.
In the log, though, it doesn’t reference the LAST line in the file.
CP: Also seems to skip header properly.
In the log, it DOES reference the last line in the import file, unlike CLI.
Making a test field value change in import file, CLI does not update (presumably because row is skipped) but CP properly updates the field.
So it seems, in both cases the header is skipped in that it isn’t processed, and no log mention of “ITEM_NO” (which is a header row).
But, it does seem the CLI, then, doesn’t process the last row while the CP seems to do everything correctly with respect to header row settings.
Paul Larson
(Tweet-version of that is: Header good, both CP/CLI. But CLI then doesn’t process LAST row while CP does everything fine)
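A quick way to confirm whether a run skipped a row is to compare the number of data rows in the import file with the number of "Begin Importing" lines in the log. The sketch below is a hypothetical helper, not part of DataGrab; it assumes the log format shown earlier in this ticket and that the log text covers a single run. It counts rather than matches titles, because the title shown in the log can repeat.

```python
import re

# Matches log lines such as "14:55:28 01/06/2023 Begin Importing [1080BIRMINGHAM]"
BEGIN_RE = re.compile(r"Begin Importing \[(.+?)\]")


def count_data_rows(file_text, has_header=True):
    """Count non-blank lines, excluding the header row if there is one."""
    lines = [ln for ln in file_text.splitlines() if ln.strip()]
    return len(lines) - 1 if has_header and lines else len(lines)


def count_import_attempts(log_text):
    """Count how many rows the importer reported starting on."""
    return len(BEGIN_RE.findall(log_text))


def missing_rows(file_text, log_text, has_header=True):
    """Positive result: that many data rows were never attempted."""
    return count_data_rows(file_text, has_header) - count_import_attempts(log_text)


# Trimmed sample file (two columns only) and log lines taken from this ticket.
file_text = (
    "ITEM_NO\tCATEG_COD\n"
    "1080BIRMINGHAM\tPLTBL\n"
    "1080PURSUIT\tPLTBL\n"
    "1090CENTENNIAL\tPLTBL\n"
)
log_text = (
    "14:55:28 01/06/2023 Import #59 Started\n"
    "14:55:28 01/06/2023 Begin Importing [1080BIRMINGHAM]\n"
    "14:55:31 01/06/2023 Begin Importing [1080BIRMINGHAM]\n"
    "14:55:32 01/06/2023 Begin Importing [1080BIRMINGHAM]\n"
)

print(missing_rows(file_text, log_text))
```

A result of 0 means every data row was at least attempted; a result of 1 after a CLI run would match the "last row not processed" symptom described above.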
BoldMinded (Brian)
Comment has been marked private.
BoldMinded (Brian)
If the previous build fixes the import count for you, then let’s move on to the status issue. It’s still not clear to me what the issue is.
What is the default status of the channel when a new entry is imported?
Should it always import at that status and never change it, or, if the status of the entry has been changed manually, should the import respect that change?
Can you share a screenshot of your entire import settings page? I’ve only seen parts of it and I need to see the status related settings.
BoldMinded (Brian)
There is a new “Update Status?” option at the bottom of the settings, make sure it’s set to “Create Only” - the description of that field describes what it does. If I had to guess it’s set to “Create or Update” right now.