Copyright 2023 Mark Rotteveel and all contributing authors under the Apache 2.0 license.

1. Introduction

The “Firebird External Table Generator” or ext-table-gen is a commandline tool to convert RFC 4180 CSV files and CSV files with custom formats to Firebird external tables (a binary format for Firebird external table data).

External tables are a good way for bulk-loading data into Firebird. Unfortunately, external tables use a fixed-width binary format[1], and not a (more) standard format like CSV. It is not always easy to create an appropriate external table file, and this is where ext-table-gen can be used.

At a high level, ext-table-gen provides the following features:

  • Derive a CHAR-based external table definition (i.e. a CREATE TABLE statement and a configuration for ext-table-gen) from a CSV file

  • Transform a CSV file to an external table file (either based on the CSV file itself, or based on a configuration file)

By default, ext-table-gen derives tables with only CHAR columns. This makes the generated file essentially a fixed-width text format. However, you can modify the configuration file to make ext-table-gen generate columns of different types, in which case the file becomes a binary format. ext-table-gen supports the following types of columns:

  • String types:

    • char

  • Integral number types:

    • smallint

    • integer

    • bigint

    • int128

  • Exact numeric types (fixed point):

    • numeric

    • decimal

  • Datetime types:

    • date

    • time (without time zone)

    • timestamp (without time zone)

This user manual provides examples of use, and serves as reference documentation for the commandline options and XML format.

The project itself is hosted on GitHub: https://github.com/mrotteveel/ext-table-gen.

2. Installation

2.1. Requirements

The “Firebird External Table Generator” or ext-table-gen requires Java 17. We have not tested it on higher versions, but expect it to work on higher versions as well. We will only fix issues reported against Java versions which are in support according to the OpenJDK project, at the time of writing this — August 2023, that means Java 17 and Java 20.

2.2. Distribution package

The ext-table-gen is distributed in a zip:

  • ext-table-gen-2.0.zip — for Linux, macOS and Windows

The layout of the archive is as follows:

  • ext-table-gen-2.0 — root directory with application files

    • bin — contains executable files ext-table-gen (sh script) and ext-table-gen.bat (Windows Batch script)

    • docs — contains documentation, including this manual

    • lib — contains the application code and other Java dependencies

2.3. Installing

We gloss over describing OS-specific configuration like installing Java, configuring the PATH or JAVA_HOME, etc. Improvements are always welcome through pull requests.

2.3.1. Extract archive

Use your favourite zip or unzip tool to extract the archive to a user-accessible directory.

Windows

When copying or extracting to C:\Program Files\ or C:\Program Files (x86)\, be aware that the archive tool must be run as administrator. You can also use Explorer to open ZIP files, and copy or extract the ext-table-gen-2.0 directory from the zip.

Linux/macOS

It may be necessary to make bin/ext-table-gen explicitly executable with chmod +x bin/ext-table-gen.

2.3.2. Setting environment variables

To be able to call ext-table-gen, Java 17 or higher must be on the PATH (the bin directory of the Java installation), or alternatively, you need to set JAVA_HOME to point to the Java 17 installation directory (not its bin directory, but the directory which contains the bin directory).

If you don’t have Java installed, install it using your package manager, sdkman (Linux/macOS), or download it, for example from Eclipse Adoptium.

If you want to call ext-table-gen without specifying the full path, you need to add its bin directory to the PATH.

For information on setting environment variables, see What are PATH and other environment variables, and how can I set or use them?

2.3.3. Verify installation

To verify if ext-table-gen is installed and can be run, run ext-table-gen from the shell terminal or Command Prompt or Powershell. If you added ext-table-gen to the PATH, you can try from any directory, otherwise go to the bin directory of ext-table-gen.

> ext-table-gen
Missing option(s): --csv-file=CSV, --table-file=FILE, or --config-in=FILE
Usage: ext-table-gen ...

Above output of the usage instruction was truncated for brevity.

If you get another error than “Missing option(s): …​”, the installation is not correct.

2.4. Firebird configuration

The default Firebird configuration disallows use of external tables, or more specifically, it doesn’t allow Firebird to read or write external table files from any location.

To enable access, you need to configure the ExternalFileAccess in firebird.conf (globally) or in — Firebird 3.0 and higher — databases.conf (per database). The ExternalFileAccess setting accepts Restrict followed by a semicolon separated list of paths where Firebird can read and write external table files.

The user running the Firebird database server needs to have read and write access to these directories. The user(s) who need to be able to create external table files using ext-table-gen also need read and write access. On Linux, there is generally a common group (e.g. firebird) which can be assigned these rights, or consider creating a separate group (e.g. fb_external_tables) and assign the user running Firebird (e.g. user firebird) and the users creating external tables to this group.

Security risks of external tables

Enabling ExternalFileAccess introduces security risks if not managed carefully. All files in the directories (and subdirectories) listed in ExternalFileAccess which are readable by the Firebird process can be read this way, and files which are writable can be modified (append only), and files can be created in directories which are writable.

If you have a compromised database or Firebird server, it would be possible to exfiltrate the contents of those files with a carefully crafted external table (or tables), and risks files that are writable by the Firebird server user to be modified (only by appending), or additional files to be introduced on your filesystem in directories where the server has write access.

To reduce these risks, follow these guidelines

  1. The directories added to ExternalFileAccess should be exclusively used for external tables, and not contain any other files (including database files and backups).

  2. Where possible, use database-specific folders, configured in databases.conf for that specific database.

  3. Do not use a directory which is a subdirectory of any directory listed in DatabaseAccess.

    Be aware that DatabaseAccess defaults to Full, and you should restrict it to specific directories, or even None to only allow access to databases through aliases defined in databases.conf.

  4. Never configure ExternalFileAccess as Full, as this can expose all files readable by the Firebird server, modify existing files or create new files anywhere the server user has access (see also above).

For example, set it to

ExternalFileAccess through firebird.conf
DatabaseAccess = Restrict C:\FirebirdData\databases
# or even
# DatabaseAccess = None

ExternalFileAccess = Restrict C:\FirebirdData\exttables

Or for restricting it per database:

No ExternalFileAccess through firebird.conf
DatabaseAccess = Restrict C:\FirebirdData\databases
# or even
# DatabaseAccess = None

ExternalFileAccess = None
ExternalFileAccess through databases.conf, per database
db_one = C:\FirebirdData\databases\db_one.fdb
{
    ExternalFileAccess = Restrict C:\FirebirdData\exttables\db_one
}

db_two = C:\FirebirdData\databases\db_two.fdb
{
    ExternalFileAccess = Restrict C:\FirebirdData\exttables\db_two
}

3. Usage Examples

This chapter illustrates how to use ext-table-gen by showing basic usage examples.

The author primarily uses Windows, so most examples are based on Windows. In most cases, the only relevant differences are file paths.

Examples may use the Windows Command Prompt line continuation ^. For PowerShell, use `. For (Linux) shells, use \.

3.1. General notes

ext-table-gen by defaults reads CSV files in RFC 4180 format, with or without a header line. The commandline and XML file support configuring a different quote character and separator for the RFC 4180 parser (com.opencsv.RFC4180Parser), and configuring a different parser with more options (com.opencsv.CSVParser). See Customizing the CSV Format for more information.

CSV files are assumed to be encoded in UTF-8. The character set can be explicitly configured using --csv-charset=CHARSET, where CHARSET is a Java character set name (e.g. --csv-charset=windows-1252).

The external table definition derived from a CSV file uses only CHAR columns, and the default character set is ISO8859_1. This can be overridden by specifying --column-encoding=ENCODING, where ENCODING is a Firebird character set name (e.g. column-encoding=WIN1252). Alternatively, an existing configuration file can be edited to change the column encoding. The configuration file allows you to configure the character set per column, and also allows you to use data types other than CHAR.

Given the overhead of using UTF8 columns (e.g. a CHAR(1000) CHARACTER SET UTF8 column will occupy 4000 bytes in an external table), we recommend using a single-byte character set where possible, and only defining columns as UTF8 when absolutely necessary.

Naming external table files

There is no common naming convention for external table files. In this manual, we use the dat extension, as that is a generally accepted extension for opaque data.

3.2. Generating an external table from CSV

The following demonstrates how you can use ext-table-gen to derive an external table definition and generate an external table file matching that definition.

This example uses a very simple CSV file, shown below.

Example CSV file, persons.csv
ID,Lastname,Firstname,Email
1,Doe,John,john@example.org
2,Doe,Jane,jane@example.org
3,Deer,Jason,jason@example.org
4,Deer,Jillian,jillian@example.org

The minimum necessary commandline to create an external table file is to use:

ext-table-gen --csv-file=persons.csv --table-file=persons.dat

However, this will not show you the external table that was derived from the CSV file, so it is not very useful.

Instead, also specify the --config-out option to create a configuration file:

ext-table-gen --csv-file=C:\FirebirdData\csv\persons.csv ^
  --table-file=C:\FirebirdData\exttables\persons.dat ^
  --config-out=C:\FirebirdData\csv\persons.xml

We’re specifying absolute paths so the output of running this command matches the XML file shown below. It is possible to use relative paths, but those will also appear as relative paths in the configuration file.

Above command will create the following configuration file:

persons.xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<extTableGenConfig xmlns="https://www.lawinegevaar.nl/xsd/ext-table-gen-1.0.xsd" schemaVersion="2.0">
    <externalTable name="DEFAULT_EXTERNAL_TABLE_NAME" byteOrder="LITTLE_ENDIAN">
        <columns>
            <column name="ID">
                <char length="1" encoding="ISO8859_1"/>
            </column>
            <column name="Lastname">
                <char length="4" encoding="ISO8859_1"/>
            </column>
            <column name="Firstname">
                <char length="7" encoding="ISO8859_1"/>
            </column>
            <column name="Email">
                <char length="19" encoding="ISO8859_1"/>
            </column>
            <endColumn type="LF"/>
        </columns>
        <tableFile path="C:\FirebirdData\exttables\persons.dat" overwrite="false"/>
    </externalTable>
    <tableDerivation columnEncoding="ISO8859_1" endColumnType="LF"/>
    <csvFile path="C:\FirebirdData\csv\persons.csv" charset="UTF-8" headerRow="true">
        <rfc4180CsvParser/>
    </csvFile>
    <informational>
        <ddl>create table "DEFAULT_EXTERNAL_TABLE_NAME" external file 'C:\FirebirdData\exttables\persons.dat' (
  "ID" char(1) character set ISO8859_1,
  "Lastname" char(4) character set ISO8859_1,
  "Firstname" char(7) character set ISO8859_1,
  "Email" char(19) character set ISO8859_1,
  "LF" char(1) character set ASCII default _ASCII x'0a'
);
</ddl>
    </informational>
</extTableGenConfig>

This configuration file shows the definition of the external table and its columns, including length and character set. By default, an extra column, LF, is added with a linefeed to make the generated external table file somewhat human-readable. This can be configured with --end-column={LF|CRLF|NONE}.

It also shows the DDL necessary to create the external table matching the definition. Given we didn’t specify an explicit table name (--table-name=NAME), the name DEFAULT_EXTERNAL_TABLE_NAME is used.

ext-table-gen also created the following persons.dat

persons.dat
1Doe John   john@example.org   
2Doe Jane   jane@example.org   
3DeerJason  jason@example.org  
4DeerJillianjillian@example.org

By default, ext-table-gen will not overwrite existing external table files or configuration files. This needs to be explicitly enabled with --overwrite-table-file and/or --overwrite-config.

3.3. Using configuration

Generating an external table file using the instructions of the previous section, Generating an external table from CSV, is simple and easy, but has some limitations and downsides:

  1. The datatype is always CHAR, while using a different data type may be more efficient (both from a storage perspective, and in loading it into Firebird)

  2. The maximum lengths of columns is derived from the CSV data. If you want to reuse the external table without dropping and recreating it, with data files derived from CSV files with the same layout, but columns have longer (or shorter!) maximum length, this will not work.

    For example, we want to process a persons.csv file where the ID column has two or more digits, or someone has a lastname Rotteveel, or everyone has an email address shorter than 19 characters, then Firebird will read the file incorrectly using the previous definition.

  3. All columns will use the same character set. For Western Europe and North America, using ISO8859_1 is generally sufficient, but if you have persons from — for example — Eastern Europe or China and their names use characters from their region, those characters will be replaced by a ?. Using UTF8 would be a catch-all solution, but given its downsides (wider columns, especially if a lot of data is Latin-1 or even plain ASCII), we recommend to only use it if you really need it, and only for those columns that need it.

These limitations can be addressed by modifying the configuration file and using it for subsequent calls to ext-table-gen.

We’ll demonstrate this separately for length, character set, and using a different column type.

3.3.1. Changing column lengths

The length of a column can be modified by editing the length attribute of the char element of a column.

For example, from length 1

<column name="ID">
    <char length="1" encoding="ISO8859_1"/>
</column>

to length 3

<column name="ID">
    <char length="3" encoding="ISO8859_1"/>
</column>

Below, we modified the XML generated in section Generating an external table from CSV, slightly increasing the lengths of each column[2].

Modified persons.xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<extTableGenConfig xmlns="https://www.lawinegevaar.nl/xsd/ext-table-gen-1.0.xsd" schemaVersion="2.0">
    <externalTable name="DEFAULT_EXTERNAL_TABLE_NAME" byteOrder="LITTLE_ENDIAN">
        <columns>
            <column name="ID">
                <char length="3" encoding="ISO8859_1"/>
            </column>
            <column name="Lastname">
                <char length="10" encoding="ISO8859_1"/>
            </column>
            <column name="Firstname">
                <char length="10" encoding="ISO8859_1"/>
            </column>
            <column name="Email">
                <char length="25" encoding="ISO8859_1"/>
            </column>
            <endColumn type="LF"/>
        </columns>
        <tableFile path="C:\FirebirdData\exttables\persons.dat" overwrite="true"/>
    </externalTable>
    <tableDerivation columnEncoding="ISO8859_1" endColumnType="LF"/>
    <csvFile path="C:\FirebirdData\csv\persons.csv" charset="UTF-8" headerRow="true">
        <rfc4180CsvParser/>
    </csvFile>
    <informational>
        <ddl>create table "DEFAULT_EXTERNAL_TABLE_NAME" external file 'C:\FirebirdData\exttables\persons.dat' (
  "ID" char(1) character set ISO8859_1,
  "Lastname" char(4) character set ISO8859_1,
  "Firstname" char(7) character set ISO8859_1,
  "Email" char(19) character set ISO8859_1,
  "LF" char(1) character set ASCII default _ASCII x'0a'
);
</ddl>
    </informational>
</extTableGenConfig>

We also set the attribute overwrite to true of tableFile to simplify the next commandline. The --overwrite-table-file option only works if --table-file is also specified, it doesn’t override the overwrite config for the file specified in the XML. We didn’t change the DDL, as we’ll demonstrate later how to regenerate it.

To use the configuration file, use:

ext-table-gen --config-in=C:\FirebirdData\csv\persons.xml

With the same persons.csv as the original CSV file, the persons.dat is now:

persons.dat with wider columns
1  Doe       John      john@example.org         
2  Doe       Jane      jane@example.org         
3  Deer      Jason     jason@example.org        
4  Deer      Jillian   jillian@example.org      

Compared to the original persons.dat, you’ll notice that the columns are wider.

Keep in mind that external table files are a binary data format. The examples up-to-now use only CHAR columns with the same character set, so these examples effectively behave as a fixed width text format.

This changes once we start using multiple character sets, or other data types.

3.3.2. Regenerating the configuration file

In the previous section, Changing column lengths, we modified the configuration XML, but didn’t touch the DDL. We can of course modify the DDL manually, but we can also regenerate the configuration XML using --config-out. Be aware, the external table file specified in the configuration file will also be recreated as part of this command.

For example, building on the previous example:

ext-table-gen --config-in=C:\FirebirdData\csv\persons.xml ^
  --config-out=C:\FirebirdData\csv\persons.xml --overwrite-config

Here we specify the same file for --config-in and --config-out, so we need to specify --overwrite-config, otherwise the file is not overwritten.

The updated persons.xml is now:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<extTableGenConfig xmlns="https://www.lawinegevaar.nl/xsd/ext-table-gen-1.0.xsd" schemaVersion="2.0">
    <externalTable name="DEFAULT_EXTERNAL_TABLE_NAME" byteOrder="LITTLE_ENDIAN">
        <columns>
            <column name="ID">
                <char length="3" encoding="ISO8859_1"/>
            </column>
            <column name="Lastname">
                <char length="10" encoding="ISO8859_1"/>
            </column>
            <column name="Firstname">
                <char length="10" encoding="ISO8859_1"/>
            </column>
            <column name="Email">
                <char length="25" encoding="ISO8859_1"/>
            </column>
            <endColumn type="LF"/>
        </columns>
        <tableFile path="C:\FirebirdData\exttables\persons.dat" overwrite="true"/>
    </externalTable>
    <tableDerivation columnEncoding="ISO8859_1" endColumnType="LF"/>
    <csvFile path="C:\FirebirdData\csv\persons.csv" charset="UTF-8" headerRow="true">
        <rfc4180CsvParser/>
    </csvFile>
    <informational>
        <ddl>create table "DEFAULT_EXTERNAL_TABLE_NAME" external file 'C:\FirebirdData\exttables\persons.dat' (
  "ID" char(3) character set ISO8859_1,
  "Lastname" char(10) character set ISO8859_1,
  "Firstname" char(10) character set ISO8859_1,
  "Email" char(25) character set ISO8859_1,
  "LF" char(1) character set ASCII default _ASCII x'0a'
);
</ddl>
    </informational>
</extTableGenConfig>

As you can see the DDL is now updated to match the definition in externalTable.

3.3.3. Changing column character set

Taking the initial example of Generating an external table from CSV, say we also need to import someone with the Czech name Eliška[3]:

persons.csv with Czech name
ID,Lastname,Firstname,Email
1,Doe,John,john@example.org
2,Doe,Jane,jane@example.org
3,Deer,Jason,jason@example.org
4,Deer,Jillian,jillian@example.org
5,Deer,Eliška,eliska@example.org

If we try to generate the external table file with the original configuration:

ext-table-gen --config-in=C:\FirebirdData\csv\persons.xml

We now get:

persons.dat with replacement character
1Doe John   john@example.org   
2Doe Jane   jane@example.org   
3DeerJason  jason@example.org  
4DeerJillianjillian@example.org
5DeerEli?ka eliska@example.org 

As you can see, the š has been replaced by a ?[4], because š cannot be encoded in ISO-8859-1. To be able to import the right character, we need to use a different encoding. In our simplified example, we could change the column to ISO8859_2 or WIN1250 (which supports Czech codepoints like š), but we’ll use UTF8 to also demonstrate the effects of using a multibyte character set.

In our example, we’ll only modify Firstname to use UTF8:

persons.xml modified for UTF8
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<extTableGenConfig xmlns="https://www.lawinegevaar.nl/xsd/ext-table-gen-1.0.xsd" schemaVersion="2.0">
    <externalTable name="DEFAULT_EXTERNAL_TABLE_NAME" byteOrder="LITTLE_ENDIAN">
        <columns>
            <column name="ID">
                <char length="1" encoding="ISO8859_1"/>
            </column>
            <column name="Lastname">
                <char length="4" encoding="ISO8859_1"/>
            </column>
            <column name="Firstname">
                <char length="7" encoding="UTF8"/>
            </column>
            <column name="Email">
                <char length="19" encoding="ISO8859_1"/>
            </column>
            <endColumn type="LF"/>
        </columns>
        <tableFile path="C:\FirebirdData\exttables\persons.dat" overwrite="true"/>
    </externalTable>
    <tableDerivation columnEncoding="ISO8859_1" endColumnType="LF"/>
    <csvFile path="C:\FirebirdData\csv\persons.csv" charset="UTF-8" headerRow="true">
        <rfc4180CsvParser/>
    </csvFile>
    <informational>
        <ddl>create table "DEFAULT_EXTERNAL_TABLE_NAME" external file 'C:\FirebirdData\exttables\persons.dat' (
  "ID" char(1) character set ISO8859_1,
  "Lastname" char(4) character set ISO8859_1,
  "Firstname" char(7) character set ISO8859_1,
  "Email" char(19) character set ISO8859_1,
  "LF" char(1) character set ASCII default _ASCII x'0a'
);
</ddl>
    </informational>
</extTableGenConfig>

Running:

ext-table-gen --config-in=C:\FirebirdData\csv\persons.xml

We now get:

persons.dat
1Doe John                        john@example.org   
2Doe Jane                        jane@example.org   
3DeerJason                       jason@example.org  
4DeerJillian                     jillian@example.org
5DeerEliška                     eliska@example.org 

As you can see, for the first four rows, the Firstname column now seems to occupy 28 characters, while in the fifth row it only seems to occupy 27 characters. This is because the external table file is a fixed-width binary format, not a fixed-width text format.

The Firstname column is now defined as <char length="7" encoding="UTF8"/>, and in UTF-8, š is encoded in two bytes. UTF-8 encodes characters in 1 to 4 bytes, and Firebird requires the column to be 4 * 7 = 28 bytes (that is, the maximum possible length), and superfluous bytes need to be populated with byte 0x20 (space character).

The effect is that Jillian (which is 7 characters and 7 bytes), is followed by 21 spaces (total: 28 bytes), while Eliška (6 characters, but 7 bytes) is also followed by 21 spaces (total: 28 bytes).

If you try to trick the server, by manually editing the contents of such a column to have more than 7 characters (upto the length of 28 bytes), it will not work and Firebird will report a truncation error for values longer than 7 characters.

Again, it is important to keep in mind, that by using two different and incompatible character sets in the external table file, it is now truly a binary file. In our example it just happens to work OK when shown as text, because all other characters are in ASCII, which is a subset of both ISO-8859-1 and UTF-8. If some other column had ISO-8859-1 characters beyond 0x7f, they would have been rendered as Unicode replacement characters (because the AsciiDoctor tool used to create this manual also reads our example data as UTF-8).

3.3.4. Using a different column data type

The default configuration file derived from a CSV file only uses CHAR columns. It can be more efficient — both in terms of storage space and for performance of loading the data into Firebird — to use a different datatype.

In this example, we’ll change the data type of the ID column from a CHAR(1)

<column name="ID">
    <char length="1" encoding="ISO8859_1"/>
</column>

to a SMALLINT

<column name="ID">
    <smallint/>
</column>

For a full list of possible data types and their attributes, see datatype.

If you change the data type to anything other than char, the generated file becomes truly binary. Having an end-column makes less sense for this type of file, so we recommend removing it.

Taking the original example from Generating an external table from CSV, modifying it like above, and then regenerating the file as described in Regenerating the configuration file results in the following file:

persons.xml with ID changed to smallint, and no end-column
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<extTableGenConfig xmlns="https://www.lawinegevaar.nl/xsd/ext-table-gen-1.0.xsd" schemaVersion="2.0">
    <externalTable name="DEFAULT_EXTERNAL_TABLE_NAME" byteOrder="LITTLE_ENDIAN">
        <columns>
            <column name="ID">
                <smallint/>
            </column>
            <column name="Lastname">
                <char length="4" encoding="ISO8859_1"/>
            </column>
            <column name="Firstname">
                <char length="7" encoding="ISO8859_1"/>
            </column>
            <column name="Email">
                <char length="19" encoding="ISO8859_1"/>
            </column>
        </columns>
        <tableFile path="C:\FirebirdData\exttables\persons.dat" overwrite="false"/>
    </externalTable>
    <tableDerivation columnEncoding="ISO8859_1" endColumnType="LF"/>
    <csvFile path="C:\FirebirdData\csv\persons.csv" charset="UTF-8" headerRow="true">
        <rfc4180CsvParser/>
    </csvFile>
    <informational>
        <ddl>create table "DEFAULT_EXTERNAL_TABLE_NAME" external file 'C:\FirebirdData\exttables\persons.dat' (
  "ID" smallint,
  "Lastname" char(4) character set ISO8859_1,
  "Firstname" char(7) character set ISO8859_1,
  "Email" char(19) character set ISO8859_1
);
</ddl>
    </informational>
</extTableGenConfig>

For data types other than char the byteOrder is important for determining how to write out data. Absence of byteOrder will use AUTO, but when writing the configuration file, it will report the byte-order of the system that created the file. Specify byteOrder="AUTO" to use the endianness of the system where ext-table-gen is run. However, if you generate the table file on a little-endian machine, while your Firebird server runs on a big-endian machine (or vice-versa), you’ll need to set the byteOrder to match the endianness of the Firebird server.

See externalTable for more information on byteOrder.

3.3.5. Using a different data type conversion

Each data type has a default conversion from the CSV string value to a Firebird data type value. This default conversion is not always suitable. For example, a CSV with hexadecimal string values for integers cannot be read as an integer, because the default conversion uses radix 10.

To address this, the data type elements in the configuration have an optional element, converter, to specify a conversion using a converter step. We’ll provide examples for each converter step below.

parseBigDecimal

The converter step parseBigDecimal can be used to specify a locale for converter CSV values to exact numeric values.

For example, to convert a number with . as the grouping symbol and , as the decimal separator (e.g. "12.345,67"), use the following data type definition:

<decimal precision="9" scale="2">
    <converter>
        <parseBigDecimal locale="nl-NL"/>
    </converter>
</decimal>

The allowed range of values is restricted by the enclosing data type and its precision (or more precisely, the backing datatype derived from the precision).

parseDatetime

The converter step parseDatetime can be used to specify a different pattern and locale for converting CSV values to datetime values.

For example, to convert a date in d MMMM yyyy format in Dutch (e.g. 15 juli 2023), use the following data type definition:

<date>
    <converter>
        <parseDatetime pattern="d MMMM yyyy" locale="nl-NL"/>
    </converter>
</date>
parseIntegralNumber

The converter step parseIntegralNumber can be used to specify a different radix for converting CSV values to integral numbers.

For example, to convert integer values in radix 16 instead of radix 10, use the following data type definition:

<integer>
    <converter>
        <parseIntegralNumber radix="16"/>
    </converter>
</integer>

The allowed range of values is restricted by the enclosing data type.

3.4. Recommended workflow

The previous examples are all that — we think — you’ll need to create usable external table files.

For one-shot imports, the second example from Generating an external table from CSV can be sufficient. You may need to use --column-encoding if you need something other than ISO8859_1 for the external table columns, and --csv-charset if the CSV file is not encoded in ASCII or UTF-8.

For repeated imports, or where multiple character sets or other data types are needed, we recommend the following steps:

  1. Generate a config file using Generating an external table from CSV,

  2. Increase columns lengths to expected maximums (see Changing column lengths),

  3. If needed, change the encoding of columns (see Changing column character set),

  4. Determine if there are more appropriate data types for columns and change the type if needed (see Using a different column data type),

  5. Determine if a custom converter is needed (see Using a different data type conversion),

  6. Finally, regenerate the config file (see Regenerating the configuration file) for up-to-date DDL.

Use the DDL from the last step to create the external table, and use the external table file it generated to verify the definition by querying the external table from Firebird.

From that point on, the configuration file can be used to transform a CSV file to an external table file.

For example, assuming the tableFile in the configuration has attribute overwrite set to true, and you always want to read the configured file from csvFile:

ext-table-gen --config-in=C:\FirebirdData\csv\persons.xml

If a configuration contains relative paths, they are resolved against the current working directory.

If you need to import CSV files with a different name, you can use:

ext-table-gen --config-in=C:\FirebirdData\csv\persons.xml ^
  --csv-file=C:\FirebirdData\csv\persons-20230622.csv

If a CSV file changes incompatibly (e.g. too wide columns, or different number of columns) or has inconsistent column counts, the import will fail, but the external table file will have been overwritten with valid row data before the failing row.

3.5. Customizing the CSV Format

By default, ext-table-gen reads CSV files in RFC 4180 format.

It is possible to configure the RFC 4180 parser with the following options:

The CHAR option accepts any single character value, a character mnemonic, or a Unicode escape.

You can also select a different CSV parser (default is RFC_4180, or the parser specified in --config-in):

The custom parser supports more options:

For example, to import a custom CSV with single quotes, tab as a separator and # as the escape symbol.

On the commandline:

ext-table-gen --csv-file=persons.csv --table-file=persons.dat ^
  --csv-parser=CUSTOM --csv-quote-char=APOS
  --csv-separator=TAB --csv-escape-char=#

In the configuration file:

<csvFile path="C:\FirebirdData\csv\persons.csv" charset="UTF-8" headerRow="true">
  <customCsvParser quoteChar="APOS" separator="TAB" escapeChar="#"/>
</csvFile>

4. Reference

This chapter describes the various configuration options. First it describes the commandline options, followed by the XML configuration file format.

4.1. Commandline options

ext-table-gen provides a number of commandline configuration options to configure its behaviour. We discuss the options here grouped by functional area.

4.1.1. General options

Help (-h, --help)

The help option (-h or --help) displays the general usage summary of ext-table-gen and then exits. The general usage summary is also printed for command-line errors.

Version (-V, --version)

The version option (-V or --version) prints version information and then exits.

4.1.2. CSV file configuration

CSV file (--csv-file)

The CSV file option (--csv-file=FILE) specifies the path of the CSV file. FILE is the absolute or relative path to the file. Relative paths are resolved against the current working directory. By default, the file will be parsed as RFC 4180 format, but this can be configured with the CSV parser (--csv-parser) and other commandline options.

We recommend specifying absolute paths, as the configuration file will store the path as specified, which means that if a relative path is used, and the configuration file is then used from a different location, it may attempt to read a non-existent or wrong file.

This option can override the XML config of /extTableGenConfig/csvFile[@path].

CSV character set (--csv-charset)

The CSV character set option (--csv-charset=CHARSET) specifies the character set of the CSV file. CHARSET is any valid Java character set name or alias. The default is UTF-8.

This option can override the XML config of /extTableGenConfig/csvFile[@charset] if explicitly specified.

CSV header (--[no-]csv-header)

The CSV header boolean option (--[no-]csv-header) specifies if the first row of the CSV file is a header (--csv-header, the default), or data (--no-csv-header). The default is --csv-header.

If the first row is a header, and the external table is derived, the values in the first row are used as column names. If an existing external table configuration is used, the header row is skipped.

If the first row is not a header, and the external table is derived, columns are assigned a name in the form of COLUMN_idx, where idx is the 1-based index of the column (first is COLUMN_1, second is COLUMN_2, etc.).

This option can override the XML config of /extTableGenConfig/csvFile[@headerRow] if explicitly specified.

CSV parser (--csv-parser)

The CSV parser option (--csv-parser={RFC_4180|CUSTOM}) configures the CSV parser to use. The default is RFC_4180.

Possible options are:

RFC_4180

selects RFC 4180 parser (implementation: com.opencsv.RFC4180Parser)

CUSTOM

selects the “custom” parser (implementation: com.opencsv.CSVParser)

This option can override the XML config of /extTableGenConfig/csvFile/customCsvParser or /extTableGenConfig/csvFile/rfc4180CsvParser if explicitly specified. Applicable configuration properties from the overridden parser will be retained unless explicitly specified on the commandline.

CSV Quote Character (--csv-quote-char)

The CSV quote character option (--csv-parser=CHAR) configures the quote character of the CSV parser. The default is double quote (").

CHAR can be a single character, a character mnemonic or a Unicode escape.

For the RFC 4180 parser, the quote character is used to quote values (e.g. "value"value), or to escape itself (e.g. "quote ""in"" value"quote "in" value). For the “custom” parser, the quote character is only used to quote values.

This option can override the XML config of /extTableGenConfig/csvFile/customCsvParser[@quoteChar] or /extTableGenConfig/csvFile/rfc4180CsvParser[@quoteChar] if explicitly specified.

CSV Separator (--csv-separator)

The CSV separator option (--csv-separator=CHAR) configures the separator character of the CSV parser. The default is comma (,).

CHAR can be a single character, a character mnemonic or a Unicode escape.

The separator is the character used to separate columns in a row.

This option can override the XML config of /extTableGenConfig/csvFile/customCsvParser[@separator] or /extTableGenConfig/csvFile/rfc4180CsvParser[@separator] if explicitly specified.

CSV Escape Character (--csv-escape-char)

The CSV escape character option (--csv-escape-char=CHAR) configures the escape character of the “custom” CSV parser. The default — for the “custom” parser — is back-slash (\).

CHAR can be a single character, a character mnemonic or a Unicode escape.

This option is ignored for the RFC 4180 parser, as that uses the quote character for escaping.

This option can override the XML config of /extTableGenConfig/csvFile/customCsvParser[@escapeChar] if explicitly specified.

CSV Ignore Leading White Space (--[no-]ignore-leading-white-space)

The CSV ignore leading white space Boolean option (--[no-]ignore-leading-white-space) configures if the “custom” CSV parser ignores leading spaces before a (quoted) value. The default — for the “custom” parser — is true (--ignore-leading-white-space).

This option is ignored for the RFC 4180 parser.

This option can override the XML config of /extTableGenConfig/csvFile/customCsvParser[@ignoreLeadingWhiteSpace] if explicitly specified.

CSV Ignore Quotations (--[no-]ignore-quotations)

The CSV ignore quotations Boolean option (--[no-]ignore-quotations) configures if the “custom” CSV parser ignores quotations. The default — for the “custom” parser — is false (--no-ignore-quotations).

This option is ignored for the RFC 4180 parser.

This option can override the XML config of /extTableGenConfig/csvFile/customCsvParser[@ignoreQuotations] if explicitly specified.

CSV Strict Quotes (--[no-]strict-quotes)

The CSV strict quotes Boolean option (--[no-]strict-quotes) configures the “strict quotes” feature of the “custom” CSV parser (ignore characters outside quotes). The default — for the “custom” parser — is false (--no-strict-quotes).

This option is ignored for the RFC 4180 parser.

This option can override the XML config of /extTableGenConfig/csvFile/customCsvParser[@strictQuotes] if explicitly specified.

4.1.3. External table file configuration

Table file path (--table-file)

The table file path option (--table-file=FILE) specifies the path of the external table file. FILE is the absolute or relative path to the file. Relative paths are resolved against the current working directory.

We recommend specifying absolute paths, as the configuration file will store the path as specified, which means that if a relative path is used, and the configuration file is then used from a different location, it may attempt to write to the wrong location or file.

This option can override the XML config of /extTableGenConfig/externalTable/tableFile[@path]. If specified, the XML config of /extTableGenConfig/externalTable/tableFile[@overwrite] is ignored, and defaults to false if --overwrite-table-file is not specified.

Overwrite table file (--[no-]overwrite-table-file)

The overwrite table file boolean option (--[no-]overwrite-table-file) specifies if an existing external table file can be overwritten (--overwrite-table-file) or not overwritten (--no-overwrite-table-file, the default). The default is --no-overwrite-table-file.

This option can only be specified if --table-file is specified. In other words, it does not override the XML config.

We only allow this override if also specifying the external table file, so you’re aware what file you’re going to overwrite.

4.1.4. Table configuration and derivation

Table derivation options configure how ext-table-gen derives the external table definition.

Apart from --table-name and --byte-order, these options only have effect when a new external table definition is generated (see also Table derivation mode (--table-derivation-mode)).

Table name (--table-name)

The table name option (--table-name=NAME) specifies the name of the external table. NAME is the table name. The value will be converted to a quoted identifier. When no name is specified through --table-name or the XML file, the table name DEFAULT_EXTERNAL_TABLE_NAME is used.

This option can override the XML config of /extTableGenConfig/externalTable[@name] if explicitly specified.

Byte order (--byte-order)

The byte order option (--byte-order=ORDER) specifies the byte order of the external table. ORDER can have value:

BIG_ENDIAN

big-endian

LITTLE_ENDIAN

little-endian

AUTO

automatically select based on the byte order of the runtime platform

The default is the effective value of AUTO (that is, either BIG_ENDIAN or LITTLE_ENDIAN). Explicitly specifying AUTO will store AUTO in the configuration file.

This option can override the XML config of /extTableGenConfig/externalTable[@byteOrder] if explicitly specified.

Column encoding (--column-encoding)

The column encoding option (--column-encoding=ENCODING) specifies the column encoding to use for all columns when deriving a table. ENCODING is the Firebird name of a supported Firebird character set. Default value is ISO8859_1.

This option can override the XML config of /extTableGenConfig/tableDerivation[@columnEncoding] if explicitly specified. It does not override the column encoding of existing columns (unless you use an explicit Table derivation mode (--table-derivation-mode)).

Unsupported character sets

Not all Firebird character sets are supported.

  • OCTETS and NONE are not supported, as we need to know which character set to use to encode string data when writing the external table file, so we require that a “real” character set is specified as the column encoding.

  • The character set UNICODE_FSS is not supported, as we can’t support this variant of UTF-8 with the standard Java character sets.

  • Other multibyte character sets (SJIS_0208, EUCJ_0208, KSC_5601, BIG_5, GB_2312, GBK, CP943C, GB18030) are not supported as we’re not sure how to handle padding for these character sets (support may be added in the future after we’ve experimented with this).

  • Character sets which have no equivalent Java character set are also not supported (e.g. NEXT, CYRL).

End column type (--end-column)

The end column type option (--end-column=TYPE) specifies if an extra column is added with a linebreak, and if so, what linebreak. TYPE can have value:

LF

linefeed, a.k.a. LF, \n, or 0x0a

CRLF

carriage return + linefeed, a.k.a CR/LF, \r\n, or 0x0d0a

NONE

no end column

The default is LF.

An end column is not necessary, but can improve the “human readability” of the external table file by starting each row on a new line. If this is no concern for you, use --end-column=NONE to reduce file size and overhead of writing and reading this column.

This option can override the XML config of /extTableGenConfig/tableDerivation[@endColumnType] if explicitly specified. It does not override the current end-column of /extTableGenConfig/extTable/columns/endColumn (unless you use an explicit Table derivation mode (--table-derivation-mode)).

Table derivation mode (--table-derivation-mode)

The table derivation mode option (--table-derivation-mode=MODE) specifies when an external table definition should be derived from a CSV file. MODE can have value:

INCOMPLETE

Only derive the external table when the current configuration is incomplete.

ALWAYS

Always derive the external table, even if the current configuration defines a valid external table.

NEVER

Never derive the external table, always use the current configuration. Fails if the current configuration is incomplete or invalid.

Specifying NEVER on the commandline is technically possible, but rarely makes sense, except maybe to make the behaviour explicit.

The formal default is INCOMPLETE, but this is only applied when --config-in is not specified, and without an existing configuration, INCOMPLETE and ALWAYS behave the same.

When --config-in is specified, the default is NEVER. This default was chosen to avoid accidentally deriving a new table configuration and creating an external table file which does not match the existing external table used by Firebird server (e.g. because the config file was modified incorrectly, or because ext-table-gen incorrectly or due to implementation changes suddenly considers the configuration as incomplete or invalid). The assumption is that if you have an existing configuration file, it is complete and correct. If that is not the case, or if you want to derive the external table definition anew, you must explicitly declare this with --table-derivation-mode.

4.1.5. Configuration file management

Configuration input (--config-in)

The configuration input (--config-in=FILE) specifies the path of the configuration file to read. FILE is the absolute or relative path to the file to read.

The configuration input is used as the initial configuration, and can be modified by the other commandline options. If --config-in is not specified, the default values are applied in the absence of a commandline option.

Configuration output (--config-out)

The configuration output (--config-out=FILE) specifies the path of the configuration XML file to write. FILE is the absolute or relative path to the file to read.

If --config-out is not specified, no configuration file is written. By default, an existing configuration file is not overwritten. Specify --overwrite-config to overwrite an existing configuration file.

If the configuration file exists and --overwrite-config is not specified, this is reported on the standard error output. The export of the CSV to an external table will proceed as normal.

When both --config-in and --config-out are specified without other options (except --overwrite-config), generally a file with the same configuration will be written, except for:

  1. New configuration options introduced in ext-table-gen after the configuration input was created

  2. The content of the informational element is always regenerated, so if it was outdated (e.g. see Regenerating the configuration file), or ext-table-gen changed how the DDL or other information is generated, it may change.

When both --config-in and --config-out are specified with other options, the values of those options are reflected in the new configuration file.

Except for the table name (/extTableGenConfig/externalTable[@name]) and table file (/extTableGenConfig/externalTable/tableFile), the externalTable element will not change unless --table-derivation-mode=ALWAYS is specified, or the existing configuration is considered invalid or incomplete and --table-derivation-mode=INCOMPLETE is specified.

Overwrite configuration (--[no-]overwrite-config)

The overwrite configuration boolean option (--[no-]overwrite-config) specifies whether the configuration output (--config-out) can be overwritten if it exists (--overwrite-config) or not overwritten (--no-overwrite-config, the default). The default is --no-overwrite-config.

This option can only be specified if --config-out is specified.

The option --no-overwrite-config is technically unnecessary, but can be specified for explicitness (and exists for consistency with other boolean options).

4.2. XML configuration

The XML configuration file format uses an XSD to define its structure. This XSD can be found in XML schemas. This section follows the structure of the XSD.

The root of the XML is extTableGenConfig.

Attributes

None

Elements

4.2.1. externalTable

The externalTable element defines the name, columns and table file of the external table. The columns it contains are used by ext-table-gen to validate the columns in the input CSV file (e.g. number of columns, length of column values), and format the output of the external table file, and generate the DDL of the table.

Attributes
name

Name of the table (used for the DDL; required)

byteOrder

Byte order of the table (optional) Possible values:

BIG_ENDIAN

big-endian

LITTLE_ENDIAN

little-endian

AUTO

automatically select based on the byte order of the runtime platform

If the attribute is absent, AUTO is used.

Elements
columns

The element columns is a sequence of one or more column elements, defining the columns of the external table.

Attributes

None

Elements, in order
column

The column element defines a normal data column.

Attributes
name

Name of the column (used for the DDL; required)

Elements
  • Element of type datatype — required

datatype

A datatype is a placeholder for an element corresponding to a specific Firebird data type.

Attributes

None, though subtypes may define attributes

Elements, in order

At this time, ext-table-gen supports the following column data types:

bigint

The bigint elements represents the Firebird datatype BIGINT.

Attributes

None

Elements
char

The char element represents the Firebird datatype CHAR.

Attributes
length

Integer in the range [1, 32767] with length in Unicode codepoints (required).
The XSD does not validate maximum length for multibyte character sets (e.g. for encoding="UTF8" the actual maximum length would be 8191, but this is not enforced).

encoding

Encoding (character set) of the column, using Firebird character set names (required).
The XSD uses an enum-type, encoding, with supported names.

Elements
date

The date element represents the Firebird datatype DATE. The Firebird DATE type only accepts values between 0001-01-01 and 9999-12-31.

The default conversion parses the string using the ISO 8601 calendar date format (pattern yyyy-MM-dd, or named pattern ISO_LOCAL_DATE).

Attributes

None

Elements
decimal

The decimal element represents the Firebird datatype DECIMAL.

The default conversion parses the string using the rules described in the constructor BigDecimal(String). Contrary to the parseBigDecimal converter, this converter can also parse strings in scientific notation.

Attributes
precision

Integer in the range [1, 38] for the precision in decimal digits (required).
Precision greater than 19 requires Firebird 4.0 and higher.

scale

Integer in the range [0, scale] for the scale in decimal digits (required).
The XSD does not enforce that scale must be less than or equal to precision.

roundingMode

Rounding mode applied when reducing the scale of a parsed number with more digits after the decimal point than allowed by scale (optional, defaults to HALF_UP).
The XSD uses an enum-type, roundingMode, with supported names.
Possible values are:

CEILING

Round towards positive infinity.

DOWN

Round towards zero.

FLOOR

Round towards negative infinity.

HALF_DOWN

Round towards “nearest neighbor” unless both neighbors are equidistant, in which case round down.

HALF_EVEN

Round towards the “nearest neighbor” unless both neighbors are equidistant, in which case, round towards the even neighbor.

HALF_UP

(default) Round towards “nearest neighbor” unless both neighbors are equidistant, in which case round up.

UNNECESSARY

Asserts that the requested operation has an exact result, hence no rounding is necessary.
Using this rounding mode will reject values with more non-zero digits after the decimal point than scale.

UP

Round away from zero.

The precision is used only to determine the backing datatype, and is not used to enforce length limits. Just like in Firebird itself, the allowed range of values is determined by the backing datatype. For DECIMAL, that means:

1 <= precision <= 9

INTEGER, unscaled range [-2147483648, 2147483647]

10 <= precision <= 18

BIGINT, unscaled range [-9223372036854775808, 9223372036854775807]

19 <= precision <= 38

INT128, unscaled range [-170141183460469231731687303715884105728, 170141183460469231731687303715884105727]

Elements
int128

The int128 element represents the Firebird datatype INT128.

Attributes

None

Elements
integer

The integer element represents the Firebird datatype INTEGER.

Attributes

None

Elements
numeric

The numeric element represents the Firebird datatype NUMERIC.

The default conversion parses the string using the rules described in the constructor BigDecimal(String). Contrary to the parseBigDecimal converter, this converter can also parse strings in scientific notation.

Attributes
precision

Integer in the range [1, 38] for the precision in decimal digits (required).
Precision greater than 19 requires Firebird 4.0 and higher.

scale

Integer in the range [0, scale] for the scale in decimal digits (required).
The XSD does not enforce that scale must be less than or equal to precision.

roundingMode

Rounding mode applied when reducing the scale of a parsed number with more digits after the decimal point than allowed by scale (optional, defaults to HALF_UP).
The XSD uses an enum-type, roundingMode, with supported names.
Possible values are:

CEILING

Round towards positive infinity.

DOWN

Round towards zero.

FLOOR

Round towards negative infinity.

HALF_DOWN

Round towards “nearest neighbor” unless both neighbors are equidistant, in which case round down.

HALF_EVEN

Round towards the “nearest neighbor” unless both neighbors are equidistant, in which case, round towards the even neighbor.

HALF_UP

(default) Round towards “nearest neighbor” unless both neighbors are equidistant, in which case round up.

UNNECESSARY

Asserts that the requested operation has an exact result, hence no rounding is necessary.
Using this rounding mode will reject values with more non-zero digits after the decimal point than scale.

UP

Round away from zero.

The precision is used only to determine the backing datatype, and is not used to enforce length limits. Just like in Firebird itself, the allowed range of values is determined by the backing datatype. For NUMERIC, that means:

1 <= precision <= 4

SMALLINT, unscaled range [-32768, 32767]

5 <= precision <= 9

INTEGER, unscaled range [-2147483648, 2147483647]

10 <= precision <= 18

BIGINT, unscaled range [-9223372036854775808, 9223372036854775807]

19 <= precision <= 38

INT128, unscaled range [-170141183460469231731687303715884105728, 170141183460469231731687303715884105727]

Elements
smallint

The smallint element represents the Firebird datatype SMALLINT.

Attributes

None

Elements
time

The time elements represents the Firebird datatype TIME [WITHOUT TIME ZONE^]. The Firebird TIME type only accepts values between 00:00:00.0000 and 23:59:59.9999.

Attributes

None

Elements
timestamp

The timestamp elements represents the Firebird datatype TIMESTAMP [WITHOUT TIME ZONE^]. See also the limitations of date and time.

For additional flexibility, the time part is not required. For example if you use a parseDatetime with a pattern which only provides date information, but doesn’t provide time information (e.g. yyyy-MM-dd), it will save the date with time set to 00:00:00.0000.

Attributes

None

Elements
converter

The converter element — if present — defines a conversion from the CSV string value to a Firebird data type value. If absent, a default conversion will be applied, specific to the column data type.

Attributes

None

Elements
  • One of the converter steps listed below — required

The name converter step may seem to imply that a converter can consist of multiple steps chained together. That is currently not supported, but it is something that may be introduced in a future version.

In the actual program code, a converter step is also called a converter, but in this manual we wanted to prevent overloading the term as it already refers to the converter element.

parseBigDecimal

Parses a string using the Java NumberFormat instance for the specified locale to a BigDecimal.

This converter step is only valid in decimal and numeric. Contrary to the default converter of numeric and decimal, this converter does not parse strings in scientific notation.

Attributes
locale

Locale as a BCP 47 language tag (required)

Elements

None

parseDatetime

Parses a string using the specified pattern and locale to a datetime value.

This converter step is only valid in date, time, and timestamp.

Attributes
pattern

Datetime pattern, or one of the pre-defined formatter names (required)

The pattern format uses the syntax described in java.time.format.DateTimeFormatter. The pre-defined formatter names are the constants listed under Field Summary in DateTimeFormatter.

In addition, the following custom pre-defined formatter names are available:

  • SQL_TIMESTAMP — basically the same as ISO_LOCAL_DATE_TIME, but uses a space as the separator instead of T.

locale

Locale as a BCP 47 language tag (optional)

Generally only relevant to parse datetime strings with localized names for months and days.

Elements

None

parseIntegralNumber

Parses a string using a specified radix to an integral number of the type of the enclosing datatype.

This converter step is only valid in bigint, integer, int128, and smallint.

Attributes
radix

Radix for parsing a string to an integral number. Default 10, range [2, 36].

Elements

None

endColumn

The endColumn elements represents an extra column which is added to the external table to introduce a linebreak after each row in the external table file. This linebreak is primarily intended to make the file more “human-readable”, and is not needed for Firebird itself.

Attributes
type

Type of linebreak (required)
Possible values:

LF

Linefeed (\n or 0x0a)

CRLF

Carriage return and linefeed (\r\n or 0x0d0a)

There is no value equivalent to --end-column=NONE. Instead, that is represented by absence of the endColumn element.

Elements

None

tableFile

The element tableFile specifies the external table file.

Attributes
path

Path of the external table file (required).
If a relative path, it is resolved against the current working directory of ext-table-gen. It is recommended to use absolute paths.

overwrite

Boolean specifying if ext-table-gen is allowed to overwrite an existing file (optional; defaults to false)

Elements

None

4.2.2. tableDerivation

The element tableDerivation stores the configuration used to derive an external table. This is only used when ext-table-gen is used with --table-derivation-mode=ALWAYS, or --table-derivation-mode=INCOMPLETE and the current configuration is considered incomplete or invalid.

Be aware, it does not necessarily represent the configuration used to derive the current table described by externalTable, but instead the last configuration stored using --config-out.

Attributes
columnEncoding

Encoding (character set) to apply to columns when deriving an external table (optional).
The XSD uses an enum-type, encoding, with supported names.

endColumnType

Type of linebreak (optional)
Possible values:

LF

Linefeed (\n or 0x0a)

CRLF

Carriage return and linefeed (\r\n or 0x0d0a)

NONE

No end column

Absence of the attribute implies LF.

Elements

None

The commandline option Table derivation mode (--table-derivation-mode) is intentionally not represented in the configuration file. If a configuration file is specified (using --config-in), ext-table-gen defaults to NEVER, and needs to be explicitly overridden by the commandline to derive the external table.

4.2.3. csvFile

The element csvFile specifies the CSV file to use, and how to read it.

Attributes
path

Path of the CSV file (required).
If a relative path, it is resolved against the current working directory of ext-table-gen. It is recommended to use absolute paths.

charset

Character set of the CSV file (required).
Expects a Java character set name or alias. As available character sets depend on the JVM and its classpath, this is not restricted by the XSD.

headerRow

Boolean specifying if the first row of the CSV file is a header row or a data row (optional; defaults to true)

Elements
rfc4180CsvParser

The element rfc4180CsvParser specifies use of the RFC 4180 parser (implementation: com.opencsv.RFC4180Parser) and its configuration.

Attributes
quoteChar

Quote character (optional, defaults to ").
Value can be a single character, a character mnemonic or a Unicode escape.

separator

Separator (optional, defaults to ,).
Value can be a single character, a character mnemonic or a Unicode escape.

Elements

None

customCsvParser

The element customCsvParser specifies use of the “custom” parser (implementation: com.opencsv.CSVParser) and its configuration.

Attributes
quoteChar

Quote character (optional, defaults to ").
Value can be a single character, a character mnemonic or a Unicode escape.

separator

Separator (optional, defaults to ,).
Value can be a single character, a character mnemonic or a Unicode escape.

escapeChar

Escape character (optional, defaults to \).
Value can be a single character, a character mnemonic or a Unicode escape.

ignoreLeadingWhiteSpace

Ignore leading white space before (quoted) value (optional, defaults to true). Boolean value.

ignoreQuotations

Ignores quotes (optional, defaults to false). Boolean value.

strictQuotes

Strict quotes, ignores values outside of quotes (optional, defaults to false). Boolean value.

Elements

None

4.2.4. informational

The element informational contains data which is informational only, and does not represent configuration used by ext-table-gen. Be aware that if the configuration file was changed outside ext-table-gen, the contents of informational may be out-of-sync. See Regenerating the configuration file on how to regenerate the config file.

Attributes

None

Elements
  • ddl — required

ddl

The ddl element contains the Firebird CREATE TABLE statement derived from externalTable.

Attributes

None

Elements

None

4.3. Common Configuration

The following definitions are shared between commandline and XML configuration

4.3.1. Character Mnemonics

Character mnemonics are aliases for characters. They can be used for characters that are unprintable (e.g. the tab character) or are hard to use on the commandline or XML (e.g. the double quote character).

Supported mnemonics are:

TAB

horizontal tab (Unicode U+0009)

SPACE

space (Unicode U+0020)

QUOT

quotation mark/double quote (", Unicode U+0022)

APOS

apostrophe/single quote (', Unicode U+0027)

GRAVE

grave accent/back tick (`, Unicode U+0060)

If you’re missing a mnemonic, use Unicode Escape.

If you think we need to add another mnemonic, please create an issue on https://github.com/mrotteveel/ext-table-gen.

4.3.2. Unicode Escape

The Unicode escape can be used to represent any Unicode character supported by UTF-16[5].

Its format is:

<escape> =
    U+<h><h><h><h>
  | u+<h><h><h><h>

<h> = hexadecimal digit, one of 0-9, A-F, a-f

For example:

U+20AC -- € (EURO SIGN)

Appendices

Appendix A: XML schemas

The following is the XML schema used by ext-table-gen for its configuration files.

A.1. ext-table-gen-1.0.xsd

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!-- SPDX-FileCopyrightText: Copyright 2023 Mark Rotteveel -->
<!-- SPDX-License-Identifier: Apache-2.0 -->
<!-- NOTE: Updates to /xs:schema[@version] also need to be reflected in ConfigMapper.CURRENT_SCHEMA_VERSION -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="https://www.lawinegevaar.nl/xsd/ext-table-gen-1.0.xsd"
           xmlns="https://www.lawinegevaar.nl/xsd/ext-table-gen-1.0.xsd"
           elementFormDefault="qualified"
           xmlns:jaxb="https://jakarta.ee/xml/ns/jaxb"
           jaxb:version="3.0"
           version="2.0">

    <xs:element name="extTableGenConfig">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="externalTable" type="ExternalTableType"/>
                <xs:element name="tableDerivation" type="TableDerivationType" minOccurs="0"/>
                <xs:element name="csvFile" type="CsvFileType" minOccurs="0"/>
                <xs:element name="informational" type="InformationalType" minOccurs="0"/>
            </xs:sequence>
            <xs:attribute name="schemaVersion" type="xs:token" use="required"/>
        </xs:complexType>
    </xs:element>

    <xs:complexType name="ExternalTableType">
        <xs:sequence>
            <xs:element name="columns" type="ColumnListType"/>
            <xs:element name="tableFile" type="TableFileType" minOccurs="0"/>
        </xs:sequence>
        <xs:attribute name="name" type="xs:string" use="required"/>
        <xs:attribute name="byteOrder" type="ByteOrderEnum"/>
    </xs:complexType>

    <xs:complexType name="TableDerivationType">
        <xs:attribute name="columnEncoding" type="encoding"/>
        <xs:attribute name="endColumnType" type="EndColumnEnum"/>
        <!--
        NOTE: The tableDerivationMode is intentionally not part of this configuration, and will default to NEVER
        -->
    </xs:complexType>

    <xs:complexType name="ColumnListType">
        <xs:sequence>
            <xs:element name="column" type="ColumnType" maxOccurs="unbounded">
                <xs:annotation>
                    <xs:appinfo>
                        <jaxb:property name="columns"/>
                    </xs:appinfo>
                </xs:annotation>
            </xs:element>
            <xs:element name="endColumn" type="EndColumnType" minOccurs="0"/>
        </xs:sequence>
    </xs:complexType>

    <xs:complexType name="ColumnType">
        <xs:sequence>
            <xs:element ref="datatype"/>
        </xs:sequence>
        <xs:attribute name="name" type="xs:string"/>
    </xs:complexType>

    <xs:element name="datatype" type="DatatypeType" abstract="true"/>
    <xs:element name="char" type="CharType" substitutionGroup="datatype"/>
    <xs:element name="smallint" type="DatatypeType" substitutionGroup="datatype"/>
    <xs:element name="integer" type="DatatypeType" substitutionGroup="datatype"/>
    <xs:element name="bigint" type="DatatypeType" substitutionGroup="datatype"/>
    <xs:element name="int128" type="DatatypeType" substitutionGroup="datatype"/>
    <xs:element name="date" type="DatatypeType" substitutionGroup="datatype"/>
    <xs:element name="time" type="DatatypeType" substitutionGroup="datatype"/>
    <xs:element name="timestamp" type="DatatypeType" substitutionGroup="datatype"/>
    <xs:element name="numeric" type="FixedPointType" substitutionGroup="datatype"/>
    <xs:element name="decimal" type="FixedPointType" substitutionGroup="datatype"/>

    <xs:complexType name="DatatypeType">
        <xs:sequence>
            <xs:element name="converter" type="ConverterType" minOccurs="0"/>
        </xs:sequence>
    </xs:complexType>

    <xs:complexType name="CharType">
        <xs:complexContent>
            <xs:extension base="DatatypeType">
                <xs:attribute name="length" use="required">
                    <xs:simpleType>
                        <xs:restriction base="xs:int">
                            <xs:minInclusive value="1"/>
                            <xs:maxInclusive value="32767"/>
                        </xs:restriction>
                    </xs:simpleType>
                </xs:attribute>
                <xs:attribute name="encoding" type="encoding" use="required"/>
            </xs:extension>
        </xs:complexContent>
    </xs:complexType>

    <xs:simpleType name="encoding">
        <xs:annotation>
            <xs:appinfo>
                <jaxb:typesafeEnumClass map="false"/>
            </xs:appinfo>
        </xs:annotation>
        <xs:restriction base="xs:string">
            <!--Not supported-->
            <!--<xs:enumeration value="NONE"/>-->
            <!--<xs:enumeration value="OCTETS"/>-->
            <xs:enumeration value="ASCII"/>
            <!--We don't want to implement writing only max 3 bytes and correct UNICODE_FSS-->
            <!--<xs:enumeration value="UNICODE_FSS"/>-->
            <xs:enumeration value="UTF8"/>
            <!--TODO Need to know how to handle spaces-->
            <!--<xs:enumeration value="SJIS_0208"/>-->
            <!--<xs:enumeration value="EUCJ_0208"/>-->
            <xs:enumeration value="DOS737"/>
            <xs:enumeration value="DOS437"/>
            <xs:enumeration value="DOS850"/>
            <xs:enumeration value="DOS865"/>
            <xs:enumeration value="DOS860"/>
            <xs:enumeration value="DOS863"/>
            <xs:enumeration value="DOS775"/>
            <xs:enumeration value="DOS858"/>
            <xs:enumeration value="DOS862"/>
            <xs:enumeration value="DOS864"/>
            <xs:enumeration value="ISO8859_1"/>
            <xs:enumeration value="ISO8859_2"/>
            <xs:enumeration value="ISO8859_3"/>
            <xs:enumeration value="ISO8859_4"/>
            <xs:enumeration value="ISO8859_5"/>
            <xs:enumeration value="ISO8859_6"/>
            <xs:enumeration value="ISO8859_7"/>
            <xs:enumeration value="ISO8859_8"/>
            <xs:enumeration value="ISO8859_9"/>
            <xs:enumeration value="ISO8859_13"/>
            <!--TODO Need to know how to handle spaces-->
            <!--<xs:enumeration value="KSC_5601"/>-->
            <xs:enumeration value="DOS852"/>
            <xs:enumeration value="DOS857"/>
            <xs:enumeration value="DOS861"/>
            <xs:enumeration value="DOS866"/>
            <xs:enumeration value="DOS869"/>
            <xs:enumeration value="WIN1250"/>
            <xs:enumeration value="WIN1251"/>
            <xs:enumeration value="WIN1252"/>
            <xs:enumeration value="WIN1253"/>
            <xs:enumeration value="WIN1254"/>
            <!--TODO Need to know how to handle spaces-->
            <!--<xs:enumeration value="BIG_5"/>-->
            <!--<xs:enumeration value="GB_2312"/>-->
            <xs:enumeration value="WIN1255"/>
            <xs:enumeration value="WIN1256"/>
            <xs:enumeration value="WIN1257"/>
            <xs:enumeration value="KOI8R"/>
            <xs:enumeration value="KOI8U"/>
            <xs:enumeration value="WIN1258"/>
            <xs:enumeration value="TIS620"/>
        </xs:restriction>
    </xs:simpleType>

    <xs:complexType name="FixedPointType">
        <xs:complexContent>
            <xs:extension base="DatatypeType">
                <xs:attribute name="precision" use="required">
                    <xs:simpleType>
                        <xs:restriction base="xs:int">
                            <xs:minInclusive value="1"/>
                            <xs:maxInclusive value="38"/>
                        </xs:restriction>
                    </xs:simpleType>
                </xs:attribute>
                <xs:attribute name="scale" use="required">
                    <xs:simpleType>
                        <xs:restriction base="xs:int">
                            <xs:minInclusive value="0"/>
                            <xs:maxInclusive value="38"/>
                        </xs:restriction>
                    </xs:simpleType>
                </xs:attribute>
                <xs:attribute name="roundingMode" type="roundingMode" default="HALF_UP"/>
            </xs:extension>
        </xs:complexContent>
    </xs:complexType>

    <xs:simpleType name="roundingMode">
        <xs:annotation>
            <xs:appinfo>
                <jaxb:typesafeEnumClass map="false"/>
            </xs:appinfo>
        </xs:annotation>
        <xs:restriction base="xs:string">
            <xs:enumeration value="CEILING"/>
            <xs:enumeration value="DOWN"/>
            <xs:enumeration value="FLOOR"/>
            <xs:enumeration value="HALF_DOWN"/>
            <xs:enumeration value="HALF_EVEN"/>
            <xs:enumeration value="HALF_UP"/>
            <xs:enumeration value="UNNECESSARY"/>
            <xs:enumeration value="UP"/>
        </xs:restriction>
    </xs:simpleType>

    <xs:complexType name="ConverterType">
        <xs:sequence>
            <xs:element ref="converterStep"/>
        </xs:sequence>
    </xs:complexType>

    <xs:element name="converterStep" type="ConverterStepType" abstract="true"/>
    <xs:element name="parseIntegralNumber" type="ParseIntegralType" substitutionGroup="converterStep"/>
    <xs:element name="parseDatetime" type="ParseDatetimeType" substitutionGroup="converterStep"/>
    <xs:element name="parseBigDecimal" type="ParseBigDecimalType" substitutionGroup="converterStep"/>

    <xs:complexType name="ConverterStepType" abstract="true"/>

    <xs:complexType name="ParseIntegralType">
        <xs:complexContent>
            <xs:extension base="ConverterStepType">
                <xs:attribute name="radix" default="10">
                    <xs:simpleType>
                        <xs:restriction base="xs:int">
                            <xs:minInclusive value="2"/>
                            <xs:maxInclusive value="36"/>
                        </xs:restriction>
                    </xs:simpleType>
                </xs:attribute>
            </xs:extension>
        </xs:complexContent>
    </xs:complexType>

    <xs:complexType name="ParseDatetimeType">
        <xs:complexContent>
            <xs:extension base="ConverterStepType">
                <xs:attribute name="pattern" type="xs:normalizedString" use="required"/>
                <xs:attribute name="locale" type="xs:language"/>
            </xs:extension>
        </xs:complexContent>
    </xs:complexType>

    <xs:complexType name="ParseBigDecimalType">
        <xs:complexContent>
            <xs:extension base="ConverterStepType">
                <xs:attribute name="locale" type="xs:language" use="required"/>
            </xs:extension>
        </xs:complexContent>
    </xs:complexType>

    <xs:complexType name="EndColumnType">
        <xs:attribute name="type" use="required">
            <xs:simpleType>
                <xs:restriction base="EndColumnEnum">
                    <xs:enumeration value="LF"/>
                    <xs:enumeration value="CRLF"/>
                </xs:restriction>
            </xs:simpleType>
        </xs:attribute>
    </xs:complexType>

    <xs:simpleType name="EndColumnEnum">
        <xs:annotation>
            <xs:appinfo>
                <jaxb:typesafeEnumClass map="false"/>
            </xs:appinfo>
        </xs:annotation>
        <xs:restriction base="xs:string">
            <xs:enumeration value="LF"/>
            <xs:enumeration value="CRLF"/>
            <xs:enumeration value="NONE"/>
        </xs:restriction>
    </xs:simpleType>

    <xs:simpleType name="ByteOrderEnum">
        <xs:annotation>
            <xs:appinfo>
                <jaxb:typesafeEnumClass map="false"/>
            </xs:appinfo>
        </xs:annotation>
        <xs:restriction base="xs:string">
            <xs:enumeration value="BIG_ENDIAN"/>
            <xs:enumeration value="LITTLE_ENDIAN"/>
            <xs:enumeration value="AUTO"/>
        </xs:restriction>
    </xs:simpleType>

    <xs:complexType name="TableFileType">
        <xs:attribute name="path" type="xs:string" use="required"/>
        <xs:attribute name="overwrite" type="xs:boolean" default="false"/>
    </xs:complexType>

    <xs:complexType name="CsvFileType">
        <xs:sequence>
            <xs:element ref="csvParser" minOccurs="0"/>
        </xs:sequence>
        <xs:attribute name="path" type="xs:string" use="required"/>
        <xs:attribute name="charset" type="xs:string" use="required"/>
        <xs:attribute name="headerRow" type="xs:boolean" default="true"/>
    </xs:complexType>

    <xs:element name="csvParser" type="CsvParserType" abstract="true"/>
    <xs:element name="rfc4180CsvParser" type="Rfc4180CsvParserType" substitutionGroup="csvParser"/>
    <xs:element name="customCsvParser" type="CustomCsvParserType" substitutionGroup="csvParser"/>

    <xs:complexType name="CsvParserType" abstract="true">
        <xs:attribute name="quoteChar" type="xs:string"/>
        <xs:attribute name="separator" type="xs:string"/>
    </xs:complexType>

    <xs:complexType name="Rfc4180CsvParserType">
        <xs:complexContent>
            <xs:extension base="CsvParserType"/>
        </xs:complexContent>
    </xs:complexType>

    <xs:complexType name="CustomCsvParserType">
        <xs:complexContent>
            <xs:extension base="CsvParserType">
                <xs:attribute name="escapeChar" type="xs:string"/>
                <xs:attribute name="ignoreLeadingWhiteSpace" type="xs:boolean"/>
                <xs:attribute name="ignoreQuotations" type="xs:boolean"/>
                <xs:attribute name="strictQuotes" type="xs:boolean"/>
            </xs:extension>
        </xs:complexContent>
    </xs:complexType>

    <xs:complexType name="InformationalType">
        <xs:sequence>
            <xs:element name="ddl" type="xs:string"/>
        </xs:sequence>
    </xs:complexType>

</xs:schema>

1. The format is essentially the in-memory layout of rows which Firebird also uses internally
2. So the external table file will not be too wide for display purposes
3. I semi-randomly picked this name from “Czech name” on Wikipedia
4. This is default behaviour of Java when encoding characters which don’t exist in an encoding
5. this limitation is in place because the escape character must be a Java char