Working with Large CSV Files in Python from Scratch

5 Techniques

Ramses Alexander Coraspe Valdez
12 min read · Dec 21, 2022
Large CSV files

You have probably tried to open a large .csv file on your computer, only to have it freeze to the point of needing a restart. Loading a file of 2 GB or more entirely into memory is still a heavy task for most machines. This article explains and demonstrates some techniques I have had to use to process very large CSV files from scratch:

Estimating the number of rows

Knowing the number of rows in your CSV file in advance helps you choose a partitioning strategy, that is, how to split the file. The code below estimates the row count of a very large CSV file: it samples the first few kilobytes, counts the newlines in that sample, and extrapolates using the total file size:

Row count estimation
import mmap
import os

class Csvnrows(object):

    @staticmethod
    def estimate_csv_rows(filename, header=True):
        count_rows = 0
        with open(filename, mode="r", encoding="ISO-8859-1") as file_obj:
            with mmap.mmap(file_obj.fileno(), length=0, access=mmap.ACCESS_READ) as map_file:
                # Sample only the first 8 KiB of the file
                buffer = map_file.read(1 << 13)
                file_size = os.path.getsize(filename)
                # Assume rows in the sample are representative of the whole file
                sample_rows = buffer.count(b"\n")
                if sample_rows == 0:
                    return count_rows
                avg_row_size = len(buffer) / sample_rows
                # Extrapolate: total size divided by the average row size
                count_rows = int(file_size / avg_row_size)
        return count_rows - 1 if header else count_rows
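You can then call the estimator on any large file; a minimal usage sketch, where sales_2022.csv is only a hypothetical file name:

# "sales_2022.csv" is a hypothetical file name; point this at your own large CSV
approx_rows = Csvnrows.estimate_csv_rows("sales_2022.csv", header=True)
print(f"Estimated number of rows: {approx_rows:,}")

Because the result is an extrapolation from a small sample, treat it as an approximation, good enough for sizing partitions, not for exact record counts.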


