Importing large CSV files (PHP)

Problems with large CSV files

  • Memory Overflow
  • Timeout

Timeouts can be avoided with the following strategies:

  1. Split the import into several runs (see the sketch after this list)
  2. Efficient code
  3. Run PHP from the console (e.g. via a cron job)
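
The split import mentioned in point 1 can be sketched as a script that only handles a fixed number of rows per run and remembers where it stopped, so that a cron job (point 3) can call it repeatedly until the whole file is processed. This is only a minimal sketch; the file names, the chunk size and the importRow() helper are placeholders, not part of the original article.

// Minimal split-import sketch: process at most $maxRows rows, then save the file position for the next run.
$csvFile    = 'big.csv';
$offsetFile = 'big.csv.offset';   // stores the byte position reached in the previous run
$maxRows    = 5000;               // rows per invocation, chosen well below the timeout

$offset = is_file($offsetFile) ? (int) file_get_contents($offsetFile) : 0;

$handle = fopen($csvFile, 'r');
fseek($handle, $offset);          // continue where the last run stopped (header handling omitted)

$rows = 0;
while ($rows < $maxRows && ($data = fgetcsv($handle)) !== false) {
    importRow($data);             // placeholder for the actual insert/update of one record
    $rows++;
}

// remember the current position for the next cron run
file_put_contents($offsetFile, (string) ftell($handle));
fclose($handle);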

Memory overflows can be prevented like this:

  1. Do not load all records into memory (e.g. not into an array)
  2. Instead, process the file line by line or in small chunks (this simply takes longer; see the batch sketch below)
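
Chunk-wise processing could look like the following sketch: rows are collected into a small batch and written with one multi-row INSERT, so only the current batch is kept in memory. The table name, the columns and the existing $mysqli connection are assumptions made for the illustration.

$batchSize = 500;
$batch     = [];

$handle = fopen('file.csv', 'r');
fgetcsv($handle);                     // skip the header line

while (($data = fgetcsv($handle)) !== false) {
    // collect the escaped values of one row for the current chunk
    $batch[] = sprintf("('%s', '%s')",
        $mysqli->real_escape_string($data[0]),
        $mysqli->real_escape_string($data[1]));

    if (count($batch) === $batchSize) {
        $mysqli->query('INSERT INTO table_name (col_a, col_b) VALUES ' . implode(',', $batch));
        $batch = [];                  // free the chunk before reading the next one
    }
}

// write the remaining rows of the last, possibly smaller chunk
if ($batch) {
    $mysqli->query('INSERT INTO table_name (col_a, col_b) VALUES ' . implode(',', $batch));
}
fclose($handle);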

Solution approaches in practice

Importing CSV directly via MySQL

https://stackoverflow.com/questions/7318768/process-very-big-csv-file-without-timeout-and-memory-error

$sql = "LOAD DATA LOCAL INFILE '/path/to/file.csv' 
  REPLACE INTO TABLE table_name FIELDS TERMINATED BY ',' 
  ENCLOSED BY '\"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES";
$result = $mysqli->query($sql);
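
Note that LOAD DATA LOCAL INFILE only works if local_infile is allowed on the server and on the client connection. A minimal sketch of the connection setup (host, user, password and database are placeholders):

// MYSQLI_OPT_LOCAL_INFILE has to be set before the connection is opened
$mysqli = mysqli_init();
$mysqli->options(MYSQLI_OPT_LOCAL_INFILE, true);
$mysqli->real_connect('localhost', 'user', 'password', 'database');
// ... afterwards run the LOAD DATA LOCAL INFILE query from above on this connection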

Reading CSV line by line

https://stackoverflow.com/questions/7318768/process-very-big-csv-file-without-timeout-and-memory-error
I've used fgetcsv to read a 120 MB CSV in a stream-wise manner. It reads line by line, and I then inserted every line into a database. That way only one line is held in memory on each iteration. The script still needed 20 minutes to run. Maybe I'll try Python next time… Don't try to load a huge CSV file into an array; that really would consume a lot of memory.

// WDI_GDF_Data.csv (120.4MB) is the World Bank collection of development indicators:
// http://data.worldbank.org/data-catalog/world-development-indicators
if(($handle = fopen('WDI_GDF_Data.csv', 'r')) !== false)
{
    // get the first row, which contains the column-titles (if necessary)
    $header = fgetcsv($handle);

    // loop through the file line-by-line
    while(($data = fgetcsv($handle)) !== false)
    {
        // re-sort/rewrite the data and insert it into the DB here
        // (use conditions sparingly in this loop, as they slow it down)

        // freeing the row is probably not strictly necessary, but it cannot hurt;
        // see also: http://php.net/manual/en/features.gc.php
        unset($data);
    }
    fclose($handle);
}
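
The "insert into DB here" step could, for example, use a prepared statement that is prepared once and then executed for every row, so no query string has to be built per line. The table and column names below are assumptions for the illustration, not taken from the original answer.

// Sketch: prepare once, then bind and execute for every CSV row (the schema is assumed).
$stmt = $mysqli->prepare('INSERT INTO indicators (country, indicator, year_value) VALUES (?, ?, ?)');

if (($handle = fopen('WDI_GDF_Data.csv', 'r')) !== false)
{
    $header = fgetcsv($handle);                      // skip the column titles
    while (($data = fgetcsv($handle)) !== false)
    {
        // pick the columns that match the table; the indexes are only an example
        $stmt->bind_param('sss', $data[0], $data[2], $data[4]);
        $stmt->execute();
    }
    fclose($handle);
}
$stmt->close();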