Importing Large CSV Files (PHP)
Problems with large CSV files
- Memory overflow
- Timeout
Timeouts can be avoided with the following strategies:
- Split the import into several runs
- Efficient code
- Start PHP from the console (e.g. via a cron job); a sketch follows below
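A minimal sketch of the console variant, assuming a hypothetical script path; when PHP runs from the command line, max_execution_time is usually 0 (unlimited) by default, so long imports are not cut off:

// import.php – hypothetical standalone import script, started from the console,
// e.g. nightly via cron:  0 2 * * * php /path/to/import.php
set_time_limit(0);                 // make the missing time limit explicit
ini_set('memory_limit', '512M');   // raise the memory limit only as far as needed

// ... actual import code, see the approaches below ...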
Memory overflows can be prevented like this:
- Do not load all records into memory (e.g. not into an array)
- Instead, process the data line by line or in chunks (it simply takes longer); see the sketch after this list
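One way to do the chunk-wise variant is sketched below, assuming a hypothetical file path and a hypothetical helper insertChunk() that writes one batch to the database; only $chunkSize rows are held in memory at a time:

$chunkSize = 500;                           // rows per batch – tune to the available memory
$chunk = [];
$handle = fopen('/path/to/file.csv', 'r');  // hypothetical path
while (($row = fgetcsv($handle)) !== false) {
    $chunk[] = $row;
    if (count($chunk) === $chunkSize) {
        insertChunk($chunk);                // hypothetical helper, e.g. one multi-row INSERT
        $chunk = [];                        // drop the batch before reading the next one
    }
}
if ($chunk !== []) {
    insertChunk($chunk);                    // flush the remaining rows
}
fclose($handle);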
Practical approaches
Importing the CSV directly via MySQL
$sql = "LOAD DATA LOCAL INFILE '/path/to/file.csv' REPLACE INTO TABLE table_name FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES"; $result = $mysqli->query($sql);
Reading the CSV line by line
https://stackoverflow.com/questions/7318768/process-very-big-csv-file-without-timeout-and-memory-error
I've used fgetcsv to read a 120MB CSV in a stream-wise manner. It reads line by line, and each line is then inserted into a database, so only one line is held in memory on each iteration. The script still needed about 20 minutes to run. Maybe I'll try Python next time. Don't try to load a huge CSV file into an array; that really would consume a lot of memory.
// WDI_GDF_Data.csv (120.4MB) is the World Bank collection of development indicators:
// http://data.worldbank.org/data-catalog/world-development-indicators
if (($handle = fopen('WDI_GDF_Data.csv', 'r')) !== false)
{
    // get the first row, which contains the column titles (if necessary)
    $header = fgetcsv($handle);

    // loop through the file line by line
    while (($data = fgetcsv($handle)) !== false)
    {
        // resort/rewrite data and insert into DB here

        // try to use conditions sparingly here, as those will cause slow performance

        // I don't know if this is really necessary, but it couldn't harm;
        // see also: http://php.net/manual/en/features.gc.php
        unset($data);
    }
    fclose($handle);
}
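The placeholder inside the loop leaves the actual database write open. A possible sketch for that step, assuming a hypothetical target table indicators with three matching columns; the statement is prepared once and re-bound for every row:

// prepare once, before the while loop (hypothetical table and column names)
$stmt = $mysqli->prepare('INSERT INTO indicators (country, indicator, value) VALUES (?, ?, ?)');

// inside the loop, in place of "resort/rewrite data and insert into DB here":
$stmt->bind_param('ssd', $data[0], $data[1], $data[2]);  // 's' = string, 'd' = double
$stmt->execute();

// after the loop, next to fclose($handle):
$stmt->close();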