Importing large CSV files with PHP
Problems with large CSV files
- Memory overflow
- Timeout
Timeouts can be avoided with the following strategies:
- Split the import into several runs
- Efficient code
- Start PHP from the console (e.g. via a cronjob); a minimal sketch follows this list
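A minimal sketch of the console strategy (the script path and cron schedule are placeholders). set_time_limit() only matters for the web SAPI; the CLI has no execution time limit by default:

// lift the execution time limit when the script is not started from the console
if (php_sapi_name() !== 'cli') {
    set_time_limit(0); // 0 = no limit; some shared hosts ignore this
}

// example cronjob entry: run the import every night at 2:00
// 0 2 * * * /usr/bin/php /path/to/import.php >> /var/log/csv-import.log 2>&1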
Memory overflows can be prevented like this:
- Do not load all records into memory (e.g. not into an array)
- Instead, process the file line by line or in small chunks (which simply takes longer); a sketch follows below
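A minimal sketch of chunked processing, assuming a semicolon-separated file and a hypothetical insertBatch() helper that turns one chunk into a single multi-row INSERT:

$handle    = fopen('data/plz.csv', 'r');
$batch     = [];
$batchSize = 500; // rows per chunk; memory use stays roughly constant

while (($row = fgetcsv($handle, 0, ';')) !== false) {
    $batch[] = $row;
    if (count($batch) >= $batchSize) {
        insertBatch($batch); // hypothetical helper: builds one multi-row INSERT
        $batch = [];         // release the chunk before reading the next one
    }
}
if ($batch) insertBatch($batch); // remaining rows
fclose($handle);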
Solution approaches in practice
Import the CSV directly via a MySQL statement
$sql = "LOAD DATA LOCAL INFILE '/path/to/file.csv' REPLACE INTO TABLE table_name FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES"; $result = $mysqli->query($sql);
Example (ProcessWire)
With almost 30,000 records, the import here took only a few milliseconds!
$table = 'dekra_plz';
if(wire()->db->query("SHOW TABLES LIKE '$table'")->num_rows < 1){ // no table ?
$messageQueue[] = "Create Table $table";
// create...
$sql = "CREATE TABLE $table (
plz varchar(16) NOT NULL,
akz VARCHAR(8),
bl_id int(4) UNSIGNED NOT NULL,
nl_id int(4) UNSIGNED NOT NULL,
as_id int(4) UNSIGNED NOT NULL,
gsg_id int(4) UNSIGNED NOT NULL)";
$result = wire()->db->query($sql);
}
$sql = "LOAD DATA LOCAL INFILE 'data/plz.csv'
REPLACE INTO TABLE $table
FIELDS TERMINATED BY ';'
ENCLOSED BY '\"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES";
$result = wire()->db->query($sql);
Read the CSV line by line
https://stackoverflow.com/questions/7318768/process-very-big-csv-file-without-timeout-and-memory-error
The answer there describes the approach: "I've used fgetcsv to read a 120 MB CSV in a stream-wise manner. It reads line by line, and every line is then inserted into the database, so only one line is held in memory per iteration. The script still needed 20 minutes to run. Maybe I'll try Python next time... Don't try to load a huge CSV file into an array; that really would consume a lot of memory."
A commenter there also recommends using fgets instead of fgetcsv; in his example that reduced the run time from 8 minutes to 3 minutes (see the sketch after the code block below).
// WDI_GDF_Data.csv (120.4MB) are the World Bank collection of development indicators:
// http://data.worldbank.org/data-catalog/world-development-indicators
if(($handle = fopen('WDI_GDF_Data.csv', 'r')) !== false)
{
// get the first row, which contains the column-titles (if necessary)
$header = fgetcsv($handle);
// loop through the file line-by-line
while(($data = fgetcsv($handle)) !== false)
{
// resort/rewrite data and insert into DB here
// try to use conditions sparingly here, as those will cause slow-performance
// I don't know if this is really necessary, but it couldn't harm;
// see also: http://php.net/manual/en/features.gc.php
unset($data);
}
fclose($handle);
}
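A minimal sketch of the fgets() variant mentioned above, using the same file as in the example. The speed-up comes from skipping the CSV parsing that fgetcsv() does, so splitting with a plain explode() only works as long as no field contains quoted separators:

if (($handle = fopen('WDI_GDF_Data.csv', 'r')) !== false)
{
    // first line: column titles
    $header = explode(',', rtrim((string) fgets($handle)));

    // read each line as a raw string and split it manually
    while (($line = fgets($handle)) !== false)
    {
        $data = explode(',', rtrim($line));
        // resort/rewrite $data and insert it into the DB here
        unset($data);
    }
    fclose($handle);
}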