Importing GeoIP2 and GeoLite2 databases to MySQL

This guide will show you how to import GeoIP2 or GeoLite2 databases into MySQL so that they can be easily queried and manipulated on your server.

Importing a CSV database consists of downloading the database, extracting it, creating tables to hold the data, and indexing those tables for faster querying.

Download and extract the database

Begin by making sure you have downloaded the latest version of the GeoIP2 or GeoLite2 database that you want to import. You can download databases through your account portal. CSV format databases are shipped as a single zip file. See our CSV database documentation for details about the zip structure and contents of the archive. For the purposes of this tutorial, we will be using the GeoIP2 City CSV file, but you can get information about any of our CSV format databases and adapt the following instructions accordingly.

Keep your database updated

If you are importing databases for use in an ongoing manner, you will want to automate the process of downloading and extracting the CSV files so that you can be sure that your database is always up to date.

Extract the CSV files

Once you have downloaded the database, unpack the zip file in your desired directory. If you are importing the GeoIP2 City database, you will have a number of files. In this tutorial we will be working with the following files:

  • GeoIP2-City-Blocks-IPv4.csv
  • GeoIP2-City-Blocks-IPv6.csv
  • GeoIP2-City-Locations-en.csv

If you are working in a language other than English, you can choose the appropriate Locations file from the zip archive. For example, if you want to load Chinese location names into MySQL, you can use GeoIP2-City-Locations-zh-CN.csv instead of GeoIP2-City-Locations-en.csv. Our documentation on CSV format databases includes a list of all location files currently included in GeoIP2 and GeoLite2 databases.

Create a table for network data

First we create a table to hold the network information contained in GeoIP2-City-Blocks-IPv4.csv and GeoIP2-City-Blocks-IPv6.csv.

Converting the network field

You can find the full description of the schema for these files in the section on Blocks files for GeoIP2 and GeoLite2 CSV databases. If you are working with a different database, you can find the Blocks file schema for the appropriate database, and adapt the table to meet that structure.

The network field of the database uses CIDR notation. Unfortunately MySQL doesn't offer any capabilities to work with data in that format, so we'll first have to convert the networks into something else that we can easily query later on. We chose to represent networks as a pair of IP addresses that are the first and last address in the network, respectively. We will convert this field so that both of these IP addresses will be represented as hexadecimal numbers.

We can use the database conversion tool to convert this field to hexadecimal numbers. Once you have downloaded the program and installed it to the same directory where you have extracted your CSV files, you can run it:

Schema

We can now create a table called geoip2_networkto hold the data we just converted. We'll represent the IP addresses using the type varbinary(16), which will be large enough to represent 128-bit (16 byte) IPv6 addresses.

Note that we're adding two separate indexes for network_start and network_end. If we used a composite index over both columns, we wouldn't be able to speed up the queries we'll be using later on.

Load data into the network table

We can now import the contents of GeoIP2-City-Blocks-IPv4.csv and GeoIP2-City-Blocks-IPv6.csv into the table we just created.

We'll start by loading the converted IPv6 data:

We can load the converted IPv4 data in the same way:

Test our table by querying it

With everything loaded we're now ready to look up an IP address in the database. As we're representing IP addresses as varbinary(16), we'll first have to convert the textual representation of the IP address we're interested in to that same type using MySQL's built-in inet6_aton function.

While that yields the correct results, we notice that the query performed so poorly that it will not be practical in many applications.

Order the table for faster searching

One way of speeding up the query is the addition of order by network_end like so:

While that addresses the performance concern we've had with the previous query, this query will still perform poorly for addresses not contained in our GeoIP2 database:

Break up our query for faster searching

We can work around this problem by breaking our query up into two parts such that MySQL will be able to use the indexes we've created more efficiently:

Using that construct gives us good query performance for all addresses, regardless of whether or not the GeoIP2 database contains any information about them. Depending on your application, you might want to consider encapsulating this complexity/verbosity in a function. Alternatively, MySQL also offers spatial data types that can be used to achieve similar performance while allowing for queries to be expressed more naturally.

Optional: Create a table for locations data

If postal_code, latitude, longitude, and accuracy_radius are everything we're interested in we'd be done at this point and our application would be easily able to query what it needs. However, GeoIP2 databases provide additional location information. Note the geoname_id field in our network table. This field can be used to look up additional information about the geolocation from the Locations files we downloaded before. We'll load this data into MySQL next.

Locations table schema

We start by creating a table as before. As with the Blocks files, the schema for GeoIP2 and GeoLite2 City locations files can be found in the CSV section of the database documentation.

We'll name this table geoip2_location:

Load data into the locations table

We then populate our geoip2_location table from a Locations CSV file. In this case we will be populating the table from GeoIP2-City-Locations-en.csv. Using the file with the -en suffix will give us the names of geolocations in English:

Note that there's a number of different Locations files available. The other files with different language suffixes contain localized versions of the -en data in different languages for some of the geoname_ids. Depending on your application's needs you may decide to import additional Locations files into localized tables. For example, you could load GeoIP2-City-Locations-en.csv into a table called geoip2_location-en, and load GeoIP2-City-Locations-zh-CN.csv into a table called geoip2_location-zh. You could then query whichever locations table you needed for English or Chinese respectively.

Query our tables

We can now use our geoip2_location table to resolve the geoname_ids provided by our geoip2_network table. For example:

Here we were only interested in English results, but we can adjust our join condition if we were interested in different or additional languages.

Note how a left outer join is used. This is because additional location information might not be available for any given row of our geoip2_network table. For example, some IP addresses cannot be resolved to a city or subdivision. With the left join we'd still receive latitude, longitude, and accuracy_radius as a result of our query if available, while an inner join would result in zero rows if no additional location information was available.

In addition to the geoname_id column that provides location information for a network, there's also registered_country_geoname_id and represented_country_geoname_id, which provide location information about the country in which the ISP has registered the network, and the country which is represented by users of the IP address, respectively. The location data for both can be included by additional joins:

This page was last updated on July 29, 2021.