Importing Graph from Files

You can use the following instructions to create a graph from the files. This document explains

information about the current branch that includes the functions to load graphs from files
explanation of the functions that enable the creation of graphs from files
the structure of CSV files that load functions as input, do and do not.
A simple source code example to load countries and cities from the files.

User can load graph in two steps

Load Vertices in the first step
Load Edges in the second step

User must create graph and labels before loading data from files

Load Graph functions

Following are the details about the functions to create vertices and edges from the file.

Function load_labels_from_file is used to load vertices from the CSV files.

load_labels_from_file('<graph name>', 
                      '<label name>',
                      '<file path>')

By adding the fourth parameter user can exclude the id field. *** Use this when there is no id field in the file***

load_labels_from_file('<graph name>', 
                      '<label name>',
                      '<file path>', 
                      false)

Function load_edges_from_file can be used to load edges from the CSV file. Please see the file structure in the following.

Note: make sure that ids in the edge file are identical to ones that are in vertices files.

load_edges_from_file('<graph name>',
                    '<label name>',
                    '<file path>');

Explanation about the CSV format

Following is the explanation about the structure for CSV files for vertices and edges.

A CSV file for nodes shall be formatted as following;

field name	Field description
id	it shall be the first column of the file and all values shall be a positive integer. This is an optional field when `id_field_exists` is false. However, it should be present when `id_field_exists` is not set to false.
Properties	all other columns contains the properties for the nodes. Header row shall contain the name of property

Similarly, a CSV file for edges shall be formatted as follows

field name	Field description
start_id	node id of the node from where the edge is stated. This id shall be present in nodes.csv file.
start_vertex_type	class of the node
end_id	end id of the node at which the edge shall be terminated
end_vertex_type	Class of the node
properties	properties of the edge. the header shall contain the property name

Example files can be viewed at regress/age_load/data

Example SQL script

Load and create graph

LOAD 'age';

SET search_path TO ag_catalog;
SELECT create_graph('agload_test_graph');

Create label Country and load vertices from csv file. *** Note this CSV file has id field ***

SELECT create_vlabel('agload_test_graph','Country');
SELECT load_labels_from_file('agload_test_graph',
                             'Country',
                             'age/regress/age_load/data/countries.csv');

Create label City and load vertices from csv file. *** Note this CSV file has id field ***

SELECT create_vlabel('agload_test_graph','City');
SELECT load_labels_from_file('agload_test_graph',
                             'City', 
                             'age/regress/age_load/data/cities.csv');

Create label has_city and load edges from csv file.

SELECT create_elabel('agload_test_graph','has_city');
SELECT load_edges_from_file('agload_test_graph', 'has_city',
     'age/regress/age_load/data/edges.csv');

Check if the graph has been loaded properly

SELECT table_catalog, table_schema, table_name, table_type
FROM information_schema.tables
WHERE table_schema = 'agload_test_graph';

SELECT COUNT(*) FROM agload_test_graph."Country";
SELECT COUNT(*) FROM agload_test_graph."City";
SELECT COUNT(*) FROM agload_test_graph."has_city";

SELECT COUNT(*) FROM cypher('agload_test_graph', $$MATCH(n) RETURN n$$) as (n agtype);
SELECT COUNT(*) FROM cypher('agload_test_graph', $$MATCH (a)-[e]->(b) RETURN e$$) as (n agtype);

Creating vertices without id field in the file.

Create label Country2 and load vertices from csv file. *** Note this CSV file has no id field ***

SELECT create_vlabel('agload_test_graph','Country2');
SELECT load_labels_from_file('agload_test_graph',
                             'Country2',
                             'age/regress/age_load/data/countries.csv', 
                             false);

Create label City2 and load vertices from csv file. *** Note this CSV file has no id field ***

SELECT create_vlabel('agload_test_graph','City2');
SELECT load_labels_from_file('agload_test_graph',
                             'City2',
                             'age/regress/age_load/data/cities.csv', 
                             false);

Check if the graph has been loaded properly and perform difference analysis between ids created automatically and picked from the files.
Labels Country and City were created with id field in the file
Labels Country2 and City2 were created with no id field in the file.

SELECT COUNT(*) FROM agload_test_graph."Country2";
SELECT COUNT(*) FROM agload_test_graph."City2";

SELECT id FROM agload_test_graph."Country" LIMIT 10;
SELECT id FROM agload_test_graph."Country2" LIMIT 10;

SELECT * FROM cypher('agload_test_graph', $$MATCH(n:Country {iso2 : 'BE'})
    RETURN id(n), n.name, n.iso2 $$) as ("id(n)" agtype, "n.name" agtype, "n.iso2" agtype);
SELECT * FROM cypher('agload_test_graph', $$MATCH(n:Country2 {iso2 : 'BE'})
    RETURN id(n), n.name, n.iso2 $$) as ("id(n)" agtype, "n.name" agtype, "n.iso2" agtype);

SELECT * FROM cypher('agload_test_graph', $$MATCH(n:Country {iso2 : 'AT'})
    RETURN id(n), n.name, n.iso2 $$) as ("id(n)" agtype, "n.name" agtype, "n.iso2" agtype);
SELECT * FROM cypher('agload_test_graph', $$MATCH(n:Country2 {iso2 : 'AT'})
    RETURN id(n), n.name, n.iso2 $$) as ("id(n)" agtype, "n.name" agtype, "n.iso2" agtype);

SELECT drop_graph('agload_test_graph', true);