Importing large CSV files (PHP)
Problems with large CSV files:
* Memory overflow
* Timeout

Timeouts can be avoided with the following strategies:
# Split the import into several runs
# More efficient code
# Run PHP from the console (e.g. via a cron job), see the sketch below
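
A minimal sketch of the console approach, assuming the import is a standalone script (the file name, the ';' delimiter and the crontab entry are placeholders): on the CLI, PHP's max_execution_time defaults to 0, so the usual web-server timeout simply does not apply.

<syntaxhighlight lang="php">
// import_cli.php: meant to be started from the console, e.g.  php import_cli.php
// Hypothetical crontab entry:  0 3 * * * /usr/bin/php /var/www/import_cli.php
set_time_limit(0);                         // CLI runs are already unlimited; harmless otherwise

$handle = fopen('data/import.csv', 'r');   // placeholder file name
if ($handle === false) {
    exit("Could not open CSV file\n");
}
while (($row = fgetcsv($handle, 0, ';')) !== false) {
    // insert $row into the database here
}
fclose($handle);
</syntaxhighlight>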
Memory overflows can be prevented like this:
* Do not load all records into memory (e.g. not into one big array)
* Process the data line by line or in chunks instead (which simply takes longer), see the sketch below
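
The "split import" and "chunk-wise processing" points can be combined into a resumable import: each run only handles a fixed number of rows and stores the current file position, so repeated runs (browser refresh or cron) work through the file piece by piece. This is only a rough sketch; the file names, the chunk size of 5000 and the offset file are assumptions, and concurrent runs are not handled.

<syntaxhighlight lang="php">
// Hypothetical chunked import: each run processes at most $chunkSize rows,
// then remembers the byte offset so the next run can continue from there.
$csvFile    = 'data/big.csv';      // placeholder
$offsetFile = 'data/big.offset';   // placeholder
$chunkSize  = 5000;

$offset = is_file($offsetFile) ? (int) file_get_contents($offsetFile) : 0;

$handle = fopen($csvFile, 'r');
if ($handle === false) {
    exit("Could not open CSV file\n");
}
fseek($handle, $offset);                 // continue where the previous run stopped
if ($offset === 0) {
    fgetcsv($handle, 0, ';');            // first run: skip the header row
}

$processed = 0;
while ($processed < $chunkSize && ($row = fgetcsv($handle, 0, ';')) !== false) {
    // insert/update $row in the database here
    $processed++;
}

if (feof($handle)) {
    if (is_file($offsetFile)) {
        unlink($offsetFile);             // finished: next run starts from the top again
    }
} else {
    file_put_contents($offsetFile, ftell($handle));
}
fclose($handle);
</syntaxhighlight>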
Practical approaches
== Import the CSV directly via a MySQL statement ==
https://stackoverflow.com/questions/7318768/process-very-big-csv-file-without-timeout-and-memory-error
<syntaxhighlight lang="php">
$sql = "LOAD DATA LOCAL INFILE '/path/to/file.csv'
        REPLACE INTO TABLE table_name
        FIELDS TERMINATED BY ','
        ENCLOSED BY '\"'
        LINES TERMINATED BY '\r\n'
        IGNORE 1 LINES";
$result = $mysqli->query($sql);
</syntaxhighlight>
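
For this to work, the LOCAL capability has to be allowed on both sides: the MySQL server variable local_infile must be enabled, and, depending on the PHP version, the php.ini setting mysqli.allow_local_infile as well. On the client the option can be set before connecting; a sketch with placeholder credentials:

<syntaxhighlight lang="php">
// Enable LOCAL INFILE on the mysqli client before connecting (placeholder credentials).
$mysqli = mysqli_init();
$mysqli->options(MYSQLI_OPT_LOCAL_INFILE, true);
if (!$mysqli->real_connect('localhost', 'db_user', 'db_pass', 'db_name')) {
    exit('Connect failed: ' . mysqli_connect_error());
}

$sql = "LOAD DATA LOCAL INFILE '/path/to/file.csv'
        REPLACE INTO TABLE table_name
        FIELDS TERMINATED BY ','
        ENCLOSED BY '\"'
        LINES TERMINATED BY '\r\n'
        IGNORE 1 LINES";

if ($mysqli->query($sql) === false) {
    // typical causes: local_infile disabled on the server, or a wrong file path
    echo "Import failed: " . $mysqli->error . "\n";
} else {
    echo "Imported rows: " . $mysqli->affected_rows . "\n";
}
</syntaxhighlight>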
=== Example (ProcessWire) ===
With almost 30,000 records the import took only a few milliseconds here!
<syntaxhighlight lang="php">
$table = 'dekra_plz';
if(wire()->db->query("SHOW TABLES LIKE '$table'")->num_rows < 1){ // no table ?
    $messageQueue[] = "Create Table $table";
    // create...
    $sql = "CREATE TABLE $table (
        plz varchar(16) NOT NULL,
        akz varchar(8),
        bl_id int(4) UNSIGNED NOT NULL,
        nl_id int(4) UNSIGNED NOT NULL,
        as_id int(4) UNSIGNED NOT NULL,
        gsg_id int(4) UNSIGNED NOT NULL)";
    $result = wire()->db->query($sql);
}
$sql = "LOAD DATA LOCAL INFILE 'data/plz.csv'
        REPLACE INTO TABLE $table
        FIELDS TERMINATED BY ';'
        ENCLOSED BY '\"'
        LINES TERMINATED BY '\r\n'
        IGNORE 1 LINES";
$result = wire()->db->query($sql);
</syntaxhighlight>
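
As an optional sanity check, the number of imported rows can be counted right after the LOAD DATA statement. This is only a sketch and assumes that wire()->db behaves like mysqli, i.e. that query() returns a mysqli_result:

<syntaxhighlight lang="php">
// Optional check (assumption: mysqli-style result object with fetch_assoc()).
$row = wire()->db->query("SELECT COUNT(*) AS cnt FROM $table")->fetch_assoc();
$messageQueue[] = "Imported rows in $table: " . $row['cnt'];
</syntaxhighlight>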
== Read the CSV line by line ==
https://stackoverflow.com/questions/7318768/process-very-big-csv-file-without-timeout-and-memory-error
One commenter also recommends using fgets() instead of fgetcsv(); in his example that reduced the runtime from 8 minutes to 3 minutes (a sketch of that variant follows at the end of this section).
From the linked answer: "I've used fgetcsv to read a 120 MB CSV in a stream-wise manner. It reads line by line, and I then inserted every line into a database. That way only one line is held in memory on each iteration. The script still needed 20 minutes to run. Maybe I'll try Python next time… Don't try to load a huge CSV file into an array, that would really consume a lot of memory."
<syntaxhighlight lang="php">
// WDI_GDF_Data.csv (120.4MB) are the World Bank collection of development indicators:
// http://data.worldbank.org/data-catalog/world-development-indicators
if(($handle = fopen('WDI_GDF_Data.csv', 'r')) !== false)
{
    // get the first row, which contains the column-titles (if necessary)
    $header = fgetcsv($handle);

    // loop through the file line-by-line
    while(($data = fgetcsv($handle)) !== false)
    {
        // resort/rewrite data and insert into DB here

        // try to use conditions sparingly here, as those will cause slow-performance
        // I don't know if this is really necessary, but it couldn't harm;
        // see also: http://php.net/manual/en/features.gc.php
        unset($data);
    }
    fclose($handle);
}
</syntaxhighlight>
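
A sketch of the fgets() variant mentioned above: read each raw line with fgets() and split it with str_getcsv(). Whether this is actually faster depends on the PHP version and the data, and unlike fgetcsv() it does not cope with line breaks inside quoted fields; the file name is taken from the example above.

<syntaxhighlight lang="php">
// fgets() + str_getcsv() variant of the line-by-line import.
if (($handle = fopen('WDI_GDF_Data.csv', 'r')) !== false) {
    $header = str_getcsv(fgets($handle));   // first line: column titles

    while (($line = fgets($handle)) !== false) {
        $data = str_getcsv($line);          // split the raw line into its fields
        // resort/rewrite $data and insert into the DB here
    }
    fclose($handle);
}
</syntaxhighlight>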