Importing GeoIP2 and GeoLite2 databases to PostgreSQL

This guide will show you how to import GeoIP2 or GeoLite2 databases into PostgreSQL so that they can be easily queried and manipulated on your server.

Importing a CSV database consists of downloading the database, extracting it, creating tables to hold the data, and indexing those tables for faster querying.

Download and extract the database

Begin by making sure you have downloaded the latest version of the GeoIP2 or GeoLite2 database that you want to import. You can download databases through your account portal. CSV format databases are shipped as a single zip file. See our CSV database documentation for details about the zip structure and contents of the archive. For the purposes of this tutorial, we will be using the GeoIP2 City CSV file, but you can get information about any of our CSV format databases and adapt the following instructions accordingly.

Keep your database updated

If you are importing databases for use in an ongoing manner, you will want to automate the process of downloading and extracting the CSV files so that you can be sure that your database is always up to date.

Extract the CSV files

Once you have downloaded the database, unpack the zip file in your desired directory. If you are importing the GeoIP2 City database, you will have a number of files. In this tutorial we will be working with the following files:

  • GeoIP2-City-Blocks-IPv4.csv
  • GeoIP2-City-Blocks-IPv6.csv
  • GeoIP2-City-Locations-en.csv

If you are working in a language other than English, you can choose the appropriate Locations file from the zip archive. For example, if you want to load Chinese location names into PostgreSQL, you can use GeoIP2-City-Locations-zh-CN.csv instead of GeoIP2-City-Locations-en.csv. Our documentation on CSV format databases includes a list of all location files currently included in GeoIP2 and GeoLite2 databases.

Create a table for network data

First we create a table to hold the network information contained in GeoIP2-City-Blocks-IPv4.csv and GeoIP2-City-Blocks-IPv6.csv.

Network table schema

You can find the full description of the schema for these files in the section on Blocks files for GeoIP2 and GeoLite2 CSV databases. If you are working with a different database, you can find the Blocks file schema for the appropriate database, and adapt the table to meet that structure.

Please note that we may add additional columns to the CSV version of our GeoIP databases at any time. Ideally, you will design any automated integration to check the columns of the CSV and build your table schema based on the actual columns, or remove columns from the CSV that you don’t need before importing.

The specific method for comparing the columns in the CSV file against the table schema will depend on the language you’re working with and how you wish to handle additional data.

In this tutorial we will be creating a table to hold our network data called geoip2_network. Note that we will be using PostgreSQL’s cidr type for the network column. This will allow us to to query the database more easily.

 1create table geoip2_network (
 2  network cidr not null,
 3  geoname_id int,
 4  registered_country_geoname_id int,
 5  represented_country_geoname_id int,
 6  is_anonymous_proxy bool,
 7  is_satellite_provider bool,
 8  postal_code text,
 9  latitude numeric,
10  longitude numeric,
11  accuracy_radius int,
12  is_anycast bool
13);

Load data into the network table

We can now import the contents of GeoIP2-City-Blocks-IPv4.csv and GeoIP2-City-Blocks-IPv6.csv into the table we just created:

1\copy geoip2_network from 'GeoIP2-City-Blocks-IPv4.csv' with (format csv, header);
1\copy geoip2_network from 'GeoIP2-City-Blocks-IPv6.csv' with (format csv, header);

If there are errors such as missing data for column or extra data after last expected column, alter the table to meet the CSV file structure.

Test our table by querying it

Remember that the table we created uses PostgreSQL’s cidr type for the network column. This built-in type represents IPv4 and IPv6 networks and provides operations to (among other things) determine if a given host address is contained in a network. For example:

1select '192.168.0.0/24'::cidr >>= '192.168.0.1'; -- true
2select '192.168.0.0/24'::cidr >>= '192.168.1.1'; -- false

We can use that to look up the geoip2_network entry for a given IP address like so:

1select * from geoip2_network where network >>= '214.0.0.0';

Which gives a result of:

 1-[ RECORD 1 ]------------------+-------------
 2network                        | 214.0.0.0/16
 3geoname_id                     | 6252001
 4registered_country_geoname_id  | 6252001
 5represented_country_geoname_id |
 6is_anonymous_proxy             | f
 7is_satellite_provider          | f
 8postal_code                    |
 9latitude                       | 37.7510
10longitude                      | -97.8220
11accuracy_radius                | 1000
12is_anycast                     | f
13
14Time: 227.970 ms

Index our table for faster searching

We can now look up an individual IP address’s latitude, longitude, and accuracy_radius. To improve our query speed, we’ll create an index using PostgreSQL’s GiST indexing method. For historical reasons, we also have to specify the inet_ops operator class:

1create index on geoip2_network using gist (network inet_ops);

With that change we observe that the same query runs a lot faster:

1select * from geoip2_network net where network >>= '214.0.0.0';

Which gives a result of:

 1-[ RECORD 1 ]------------------+-------------
 2network                        | 214.0.0.0/16
 3geoname_id                     | 6252001
 4registered_country_geoname_id  | 6252001
 5represented_country_geoname_id |
 6is_anonymous_proxy             | f
 7is_satellite_provider          | f
 8postal_code                    |
 9latitude                       | 37.7510
10longitude                      | -97.8220
11accuracy_radius                | 1000
12is_anycast                     | f
13
14Time: 9.869 ms

You can now geolocate IP addresses using latitude, longitude, and accuracy_radius. For more information about how to understand the accuracy_radius field, see the article on geolocation accuracy our knowledge base.

Optional: Create a table for locations data

If postal_code, latitude, longitude, and accuracy_radius are everything we’re interested in we’d be done at this point and our application would be easily able to query what it needs. However, GeoIP2 databases provide additional location information. Note the geoname_id field in our network table. This field can be used to look up additional information about the geolocation from the Locations files we downloaded before. We’ll load this data into PostgreSQL next.

Locations table schema

We start by creating a table as before. As with the Blocks files, the schema for GeoIP2 and GeoLite2 City locations files can be found in the CSV section of the database documentation.

We’ll name this table geoip2_location:

 1create table geoip2_location (
 2  geoname_id int not null,
 3  locale_code text not null,
 4  continent_code text,
 5  continent_name text,
 6  country_iso_code text,
 7  country_name text,
 8  subdivision_1_iso_code text,
 9  subdivision_1_name text,
10  subdivision_2_iso_code text,
11  subdivision_2_name text,
12  city_name text,
13  metro_code int,
14  time_zone text,
15  is_in_european_union bool not null,
16  primary key (geoname_id, locale_code)
17);

Load data into the locations table

We then populate our geoip2_location table from a Locations CSV file. In this case we will be populating the table from GeoIP2-City-Locations-en.csv. Using the file with the -en suffix will give us the names of geolocations in English:

1\copy geoip2_location from 'GeoIP2-City-Locations-en.csv' with (format csv, header);

Note that there’s a number of different Locations files available. The other files with different language suffixes contain localized versions of the -en data in different languages for some of the geoname_ids. Depending on your application’s needs you may decide to import additional Locations files into localized tables. For example, you could load GeoIP2-City-Locations-en.csv into a table called geoip2_location-en, and load GeoIP2-City-Locations-zh-CN.csv into a table called geoip2_location-zh. You could then query whichever locations table you needed for English or Chinese respectively.

Query our tables

We can now use our geoip2_location table to resolve the geoname_ids provided by our geoip2_network table. For example:

1select latitude, longitude, accuracy_radius, continent_name, country_name, subdivision_1_name, city_name
2from geoip2_network net
3left join geoip2_location location on (
4  net.geoname_id = location.geoname_id
5  and location.locale_code = 'en'
6)
7where network >>= '214.0.0.0';

Which gives a result of:

1-[ RECORD 1 ]------+--------------
2latitude           | 37.7510
3longitude          | -97.8220
4accuracy_radius    | 1000
5continent_name     | North America
6country_name       | United States
7subdivision_1_name |
8city_name          |

Here we were only interested in English results, but we can adjust our join condition if we were interested in different or additional languages.

Note how a left outer join is used. This is because additional location information might not be available for any given row of our geoip2_network table. For example, some IP addresses cannot be resolved to a city or subdivision. With the left join we’d still receive latitude, longitude, and accuracy_radius as a result of our query if available, while an inner join would result in zero rows if no additional location information was available.

In addition to the geoname_id column that provides location information for a network, there’s also registered_country_geoname_id and represented_country_geoname_id, which provide location information about the country in which the ISP has registered the network, and the country which is represented by users of the IP address, respectively. The location data for both can be included by additional joins:

 1select latitude, longitude, accuracy_radius,
 2       location.continent_name as location_continent_name,
 3       location.country_name as location_country_name,
 4       location.subdivision_1_name as location_subdivision_1_name,
 5       location.city_name as location_city_name,
 6       registered_country.continent_name as registered_country_continent_name,
 7       registered_country.country_name as registered_country_country_name,
 8       represented_country.continent_name as represented_country_continent_name,
 9       represented_country.country_name as represented_country_country_name
10from geoip2_network net
11left join geoip2_location location on (
12  net.geoname_id = location.geoname_id
13  and location.locale_code = 'en'
14)
15left join geoip2_location registered_country on (
16  net.registered_country_geoname_id = registered_country.geoname_id
17  and registered_country.locale_code = 'en'
18)
19left join geoip2_location represented_country on (
20  net.represented_country_geoname_id = represented_country.geoname_id
21  and represented_country.locale_code = 'en'
22)
23where network >>= '214.0.0.0';

Which gives a result of:

 1-[ RECORD 1 ]----------------------+--------------
 2latitude                           | 37.7510
 3longitude                          | -97.8220
 4accuracy_radius                    | 1000
 5location_continent_name            | North America
 6location_country_name              | United States
 7location_subdivision_1_name        |
 8location_city_name                 |
 9registered_country_continent_name  | North America
10registered_country_country_name    | United States
11represented_country_continent_name |
12represented_country_country_name   |