Importing GeoIP2 and GeoLite2 databases to PostgreSQL
This guide will show you how to import GeoIP2 or GeoLite2 databases into PostgreSQL so that they can be easily queried and manipulated on your server.
Importing a CSV database consists of downloading the database, extracting it, creating tables to hold the data, and indexing those tables for faster querying.
Download and extract the database
Begin by making sure you have downloaded the latest version of the GeoIP2 or GeoLite2 database that you want to import. You can download databases through your account portal. CSV format databases are shipped as a single zip file. See our CSV database documentation for details about the zip structure and contents of the archive. For the purposes of this tutorial, we will be using the GeoIP2 City CSV file, but you can get information about any of our CSV format databases and adapt the following instructions accordingly.
Keep your database updated
If you are importing databases for use in an ongoing manner, you will want to automate the process of downloading and extracting the CSV files so that you can be sure that your database is always up to date.
Extract the CSV files
Once you have downloaded the database, unpack the zip file in your desired directory. If you are importing the GeoIP2 City database, you will have a number of files. In this tutorial we will be working with the following files:
GeoIP2-City-Blocks-IPv4.csv
GeoIP2-City-Blocks-IPv6.csv
GeoIP2-City-Locations-en.csv
If you are working in a language other than English, you can choose the
appropriate Locations
file from the zip archive. For example, if you want to
load Chinese location names into PostgreSQL, you can use
GeoIP2-City-Locations-zh-CN.csv
instead of GeoIP2-City-Locations-en.csv
. Our
documentation on CSV format databases includes a
list of all location files currently included in GeoIP2 and GeoLite2 databases.
Create a table for network data
First we create a table to hold the network information contained in
GeoIP2-City-Blocks-IPv4.csv
and GeoIP2-City-Blocks-IPv6.csv
.
Network table schema
You can find the full description of the schema for these files in
the section on Blocks
files for GeoIP2 and GeoLite2 CSV databases.
If you are working with a different database, you can find the Blocks
file
schema for the appropriate database, and adapt the
table to meet that structure.
Please note that we may add additional columns to the CSV version of our GeoIP databases at any time. Ideally, you will design any automated integration to check the columns of the CSV and build your table schema based on the actual columns, or remove columns from the CSV that you don't need before importing.
The specific method for comparing the columns in the CSV file against the table schema will depend on the language you're working with and how you wish to handle additional data.
In this tutorial we will be creating a table to hold our network data called
geoip2_network
. Note that we will be using PostgreSQL's
cidr
type
for the network
column. This will allow us to to query the database more
easily.
Load data into the network table
We can now import the contents of GeoIP2-City-Blocks-IPv4.csv
and
GeoIP2-City-Blocks-IPv6.csv
into the table we just created:
If there are errors such as missing data for column
or
extra data after last expected column
, alter the table to meet the CSV file
structure.
Test our table by querying it
Remember that the table we created uses PostgreSQL's
cidr type for
the network
column. This built-in type represents IPv4 and IPv6 networks and
provides
operations to
(among other things) determine if a given host address is contained in a
network. For example:
We can use that to look up the geoip2_network
entry for a given IP address
like so:
Which gives a result of:
Index our table for faster searching
We can now look up an individual IP address's latitude
, longitude
, and
accuracy_radius
. To improve our query speed, we'll create an index using
PostgreSQL's
GiST indexing
method. For historical reasons, we also have to specify the
inet_ops
operator class:
With that change we observe that the same query runs a lot faster:
Which gives a result of:
You can now geolocate IP addresses using latitude
, longitude
, and
accuracy_radius
. For more information about how to understand the
accuracy_radius
field, see the
article on geolocation accuracy our knowledge base.
Optional: Create a table for locations data
If postal_code
, latitude
, longitude
, and accuracy_radius
are everything
we're interested in we'd be done at this point and our application would be
easily able to query what it needs. However, GeoIP2 databases provide additional
location information. Note the geoname_id
field in our network table. This
field can be used to look up additional information about the geolocation from
the Locations
files we downloaded before. We'll load this data into PostgreSQL
next.
Locations table schema
We start by creating a table as before. As with the Blocks
files, the schema
for GeoIP2 and GeoLite2 City locations files can be found in the
CSV section of the database documentation.
We'll name this table geoip2_location
:
Load data into the locations table
We then populate our geoip2_location
table from a Locations
CSV file. In
this case we will be populating the table from GeoIP2-City-Locations-en.csv
.
Using the file with the -en
suffix will give us the names of geolocations in
English:
Note that there's a number of different Locations
files available. The other
files with different language suffixes contain localized versions of the -en
data in different languages for some of the geoname_id
s. Depending on your
application's needs you may decide to import additional Locations
files into
localized tables. For example, you could load GeoIP2-City-Locations-en.csv
into a table called geoip2_location-en
, and load
GeoIP2-City-Locations-zh-CN.csv
into a table called geoip2_location-zh
. You
could then query whichever locations table you needed for English or Chinese
respectively.
Query our tables
We can now use our geoip2_location
table to resolve the geoname_id
s provided
by our geoip2_network
table. For example:
Which gives a result of:
Here we were only interested in English results, but we can adjust our join condition if we were interested in different or additional languages.
Note how a left outer join is used. This is because additional location
information might not be available for any given row of our geoip2_network
table. For example, some IP addresses cannot be resolved to a city or
subdivision. With the left join we'd still receive latitude
, longitude
, and
accuracy_radius
as a result of our query if available, while an inner join
would result in zero rows if no additional location information was available.
In addition to the geoname_id
column that provides location information for a
network, there's also registered_country_geoname_id
and
represented_country_geoname_id
, which provide location information about the
country in which the ISP has registered the network, and the country which is
represented by users of the IP address, respectively. The location data for both
can be included by additional joins:
Which gives a result of:
This page was last updated on September 12, 2024.