Copyright 2023 Mark Rotteveel and all contributing authors under the Apache 2.0 license.
1. Introduction
The “Firebird External Table Generator” or ext-table-gen is a commandline tool to convert RFC 4180 CSV files and CSV files with custom formats to Firebird external table files (the fixed-width binary format Firebird uses for external table data).
External tables are a good way for bulk-loading data into Firebird. Unfortunately, external tables use a fixed-width binary format[1], and not a (more) standard format like CSV. It is not always easy to create an appropriate external table file, and this is where ext-table-gen can be used.
At a high level, ext-table-gen provides the following features:
- Derive a CHAR-based external table definition (i.e. a CREATE TABLE statement and a configuration for ext-table-gen) from a CSV file
- Transform a CSV file to an external table file (either based on the CSV file itself, or based on a configuration file)
By default, ext-table-gen derives tables with only CHAR columns.
This makes the generated file essentially a fixed-width text format.
However, you can modify the configuration file to make ext-table-gen generate columns of different types, in which case the file becomes a binary format.
ext-table-gen supports the following types of columns:
- String types:
  - char
- Integral number types:
  - smallint
  - integer
  - bigint
  - int128
- Exact numeric types (fixed point):
  - numeric
  - decimal
- Datetime types:
  - date
  - time (without time zone)
  - timestamp (without time zone)
This user manual provides examples of use, and serves as reference documentation for the commandline options and XML format.
The project itself is hosted on GitHub: https://github.com/mrotteveel/ext-table-gen.
2. Installation
2.1. Requirements
The “Firebird External Table Generator” or ext-table-gen requires Java 17. We have not tested it on higher versions, but we expect it to work on those as well. We will only fix issues reported against Java versions which are in support according to the OpenJDK project; at the time of writing — August 2023 — that means Java 17 and Java 20.
2.2. Distribution package
ext-table-gen is distributed as a zip archive:
- ext-table-gen-2.0.zip — for Linux, macOS and Windows
The layout of the archive is as follows:
- ext-table-gen-2.0 — root directory with application files
  - bin — contains the executable files ext-table-gen (sh script) and ext-table-gen.bat (Windows batch script)
  - docs — contains documentation, including this manual
  - lib — contains the application code and other Java dependencies
2.3. Installing
We gloss over OS-specific configuration, like installing Java and setting environment variables; see the documentation of your OS or Java distribution for details.
2.3.1. Extract archive
Use your favourite zip or unzip tool to extract the archive to a user-accessible directory.
When copying or extracting to C:\Program Files\ or C:\Program Files (x86)\, be aware that the archive tool must be run as administrator.
You can also use Explorer to open ZIP files, and copy or extract the ext-table-gen-2.0 directory from the zip.
On Linux and macOS, it may be necessary to make bin/ext-table-gen explicitly executable with chmod +x bin/ext-table-gen.
2.3.2. Setting environment variables
To be able to call ext-table-gen, Java 17 or higher must be on the PATH (the bin directory of the Java installation), or alternatively, you need to set JAVA_HOME to point to the Java 17 installation directory (not its bin directory, but the directory which contains the bin directory).
If you don’t have Java installed, install it using your package manager, sdkman (Linux/macOS), or download it, for example from Eclipse Adoptium.
If you want to call ext-table-gen without specifying the full path, you need to add its bin directory to the PATH.
For information on setting environment variables, see What are PATH and other environment variables, and how can I set or use them?
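For example, on Linux or macOS you could set both variables in your shell profile. The JDK path and extraction directory below are only placeholder assumptions for this sketch; substitute the locations you actually used:

# Assumed locations; adjust to your JDK and the directory where you extracted ext-table-gen
export JAVA_HOME=/usr/lib/jvm/jdk-17
export PATH="$PATH:$JAVA_HOME/bin:/opt/ext-table-gen-2.0/bin"

On Windows, the same variables can be set via the System Properties dialog or with setx.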
2.3.3. Verify installation
To verify that ext-table-gen is installed and can be run, run ext-table-gen from a shell, Command Prompt or PowerShell.
If you added ext-table-gen to the PATH, you can try this from any directory; otherwise, go to the bin directory of ext-table-gen.

> ext-table-gen
Missing option(s): --csv-file=CSV, --table-file=FILE, or --config-in=FILE
Usage: ext-table-gen ...

The above output of the usage instructions was truncated for brevity.
If you get an error other than “Missing option(s): …”, the installation is not correct.
2.4. Firebird configuration
The default Firebird configuration disallows use of external tables, or more specifically, it doesn’t allow Firebird to read or write external table files from any location.
To enable access, you need to configure the ExternalFileAccess setting in firebird.conf (globally) or — in Firebird 3.0 and higher — in databases.conf (per database).
The ExternalFileAccess setting accepts Restrict followed by a semicolon-separated list of paths where Firebird can read and write external table files.
The user running the Firebird database server needs read and write access to these directories, and so do the users who create external table files using ext-table-gen.
On Linux, there is generally a common group (e.g. firebird) which can be assigned these rights; alternatively, consider creating a separate group (e.g. fb_external_tables) and assigning both the user running Firebird (e.g. user firebird) and the users creating external tables to this group.
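For example, a minimal ExternalFileAccess entry matching the directory used in the examples of this manual could look like this (the path is only an assumption; use the directory where you actually store external table files):

# firebird.conf (or databases.conf in Firebird 3.0 and higher)
ExternalFileAccess = Restrict C:\FirebirdData\exttables

Multiple directories can be listed, separated by semicolons. Keep in mind that changes to firebird.conf generally require a server restart to take effect.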
3. Usage Examples
This chapter illustrates how to use ext-table-gen by showing basic usage examples.
The author primarily uses Windows, so most examples are based on Windows. In most cases, the only relevant differences are file paths. Examples may use the Windows Command Prompt line continuation character (^); on Linux/macOS shells, use a backslash (\) instead.
3.1. General notes
ext-table-gen by default reads CSV files in RFC 4180 format, with or without a header line.
The commandline and XML file support configuring a different quote character and separator for the RFC 4180 parser (com.opencsv.RFC4180Parser), and configuring a different parser with more options (com.opencsv.CSVParser).
See Customizing the CSV Format for more information.
CSV files are assumed to be encoded in UTF-8.
The character set can be explicitly configured using --csv-charset=CHARSET, where CHARSET is a Java character set name (e.g. --csv-charset=windows-1252).
The external table definition derived from a CSV file uses only CHAR columns, and the default character set is ISO8859_1.
This can be overridden by specifying --column-encoding=ENCODING, where ENCODING is a Firebird character set name (e.g. --column-encoding=WIN1252).
Alternatively, an existing configuration file can be edited to change the column encoding.
The configuration file allows you to configure the character set per column, and also allows you to use data types other than CHAR.
Given the overhead of using UTF8 columns (e.g. a CHAR(1000) CHARACTER SET UTF8 column will occupy 4000 bytes in an external table), we recommend using a single-byte character set where possible, and only defining columns as UTF8 when absolutely necessary.
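For example, to read a windows-1252 encoded CSV file and derive WIN1252 columns in one go, these options can be combined (using the paths from the examples later in this chapter):

ext-table-gen --csv-file=C:\FirebirdData\csv\persons.csv ^
  --table-file=C:\FirebirdData\exttables\persons.dat ^
  --csv-charset=windows-1252 --column-encoding=WIN1252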
Naming external table files
There is no common naming convention for external table files.
In this manual, we use the .dat extension.
3.2. Generating an external table from CSV
The following demonstrates how you can use ext-table-gen to derive an external table definition and generate an external table file matching that definition.
This example uses a very simple CSV file, shown below.
ID,Lastname,Firstname,Email
1,Doe,John,john@example.org
2,Doe,Jane,jane@example.org
3,Deer,Jason,jason@example.org
4,Deer,Jillian,jillian@example.org
The minimum commandline needed to create an external table file is:
ext-table-gen --csv-file=persons.csv --table-file=persons.dat
However, this will not show you the external table that was derived from the CSV file, so it is not very useful.
Instead, also specify the --config-out option to create a configuration file:
ext-table-gen --csv-file=C:\FirebirdData\csv\persons.csv ^
  --table-file=C:\FirebirdData\exttables\persons.dat ^
  --config-out=C:\FirebirdData\csv\persons.xml
We’re specifying absolute paths so the output of running this command matches the XML file shown below. It is possible to use relative paths, but those will also appear as relative paths in the configuration file.
The above command will create the following configuration file:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<extTableGenConfig xmlns="https://www.lawinegevaar.nl/xsd/ext-table-gen-1.0.xsd" schemaVersion="2.0">
<externalTable name="DEFAULT_EXTERNAL_TABLE_NAME" byteOrder="LITTLE_ENDIAN">
<columns>
<column name="ID">
<char length="1" encoding="ISO8859_1"/>
</column>
<column name="Lastname">
<char length="4" encoding="ISO8859_1"/>
</column>
<column name="Firstname">
<char length="7" encoding="ISO8859_1"/>
</column>
<column name="Email">
<char length="19" encoding="ISO8859_1"/>
</column>
<endColumn type="LF"/>
</columns>
<tableFile path="C:\FirebirdData\exttables\persons.dat" overwrite="false"/>
</externalTable>
<tableDerivation columnEncoding="ISO8859_1" endColumnType="LF"/>
<csvFile path="C:\FirebirdData\csv\persons.csv" charset="UTF-8" headerRow="true">
<rfc4180CsvParser/>
</csvFile>
<informational>
<ddl>create table "DEFAULT_EXTERNAL_TABLE_NAME" external file 'C:\FirebirdData\exttables\persons.dat' (
"ID" char(1) character set ISO8859_1,
"Lastname" char(4) character set ISO8859_1,
"Firstname" char(7) character set ISO8859_1,
"Email" char(19) character set ISO8859_1,
"LF" char(1) character set ASCII default _ASCII x'0a'
);
</ddl>
</informational>
</extTableGenConfig>
This configuration file shows the definition of the external table and its columns, including length and character set.
By default, an extra column, LF, is added with a linefeed to make the generated external table file somewhat human-readable.
This can be configured with --end-column={LF|CRLF|NONE}.
It also shows the DDL necessary to create the external table matching the definition.
Given we didn’t specify an explicit table name (--table-name=NAME), the name DEFAULT_EXTERNAL_TABLE_NAME is used.
ext-table-gen also created the following persons.dat:

1Doe John   john@example.org   
2Doe Jane   jane@example.org   
3DeerJason  jason@example.org  
4DeerJillianjillian@example.org

By default, ext-table-gen will not overwrite existing external table files or configuration files.
This needs to be explicitly enabled with --overwrite-table-file and/or --overwrite-config.
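For example, to rerun the previous conversion and overwrite both files, add the overwrite options:

ext-table-gen --csv-file=C:\FirebirdData\csv\persons.csv ^
  --table-file=C:\FirebirdData\exttables\persons.dat ^
  --config-out=C:\FirebirdData\csv\persons.xml ^
  --overwrite-table-file --overwrite-config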
3.3. Using configuration
Generating an external table file using the instructions of the previous section, Generating an external table from CSV, is simple and easy, but has some limitations and downsides:
- The data type is always CHAR, while using a different data type may be more efficient (both from a storage perspective, and when loading the data into Firebird).
- The maximum lengths of columns are derived from the CSV data. If you want to reuse the external table without dropping and recreating it, with data files derived from CSV files with the same layout but with longer (or shorter!) maximum column lengths, this will not work.
  For example, if we want to process a persons.csv file where the ID column has two or more digits, where someone has the lastname Rotteveel, or where everyone has an email address shorter than 19 characters, Firebird will read the file incorrectly using the previous definition.
- All columns will use the same character set. For Western Europe and North America, using ISO8859_1 is generally sufficient, but if you have persons from — for example — Eastern Europe or China and their names use characters from their region, those characters will be replaced by a ?. Using UTF8 would be a catch-all solution, but given its downsides (wider columns, especially if a lot of data is Latin-1 or even plain ASCII), we recommend only using it if you really need it, and only for those columns that need it.
These limitations can be addressed by modifying the configuration file and using it for subsequent calls to ext-table-gen.
We’ll demonstrate this separately for length, character set, and using a different column type.
3.3.1. Changing column lengths
The length of a column can be modified by editing the length attribute of the char element of a column.
For example, from length 1:
<column name="ID">
<char length="1" encoding="ISO8859_1"/>
</column>
to length 3:
<column name="ID">
<char length="3" encoding="ISO8859_1"/>
</column>
Below, we modified the XML generated in section Generating an external table from CSV, slightly increasing the lengths of each column[2].
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<extTableGenConfig xmlns="https://www.lawinegevaar.nl/xsd/ext-table-gen-1.0.xsd" schemaVersion="2.0">
<externalTable name="DEFAULT_EXTERNAL_TABLE_NAME" byteOrder="LITTLE_ENDIAN">
<columns>
<column name="ID">
<char length="3" encoding="ISO8859_1"/>
</column>
<column name="Lastname">
<char length="10" encoding="ISO8859_1"/>
</column>
<column name="Firstname">
<char length="10" encoding="ISO8859_1"/>
</column>
<column name="Email">
<char length="25" encoding="ISO8859_1"/>
</column>
<endColumn type="LF"/>
</columns>
<tableFile path="C:\FirebirdData\exttables\persons.dat" overwrite="true"/>
</externalTable>
<tableDerivation columnEncoding="ISO8859_1" endColumnType="LF"/>
<csvFile path="C:\FirebirdData\csv\persons.csv" charset="UTF-8" headerRow="true">
<rfc4180CsvParser/>
</csvFile>
<informational>
<ddl>create table "DEFAULT_EXTERNAL_TABLE_NAME" external file 'C:\FirebirdData\exttables\persons.dat' (
"ID" char(1) character set ISO8859_1,
"Lastname" char(4) character set ISO8859_1,
"Firstname" char(7) character set ISO8859_1,
"Email" char(19) character set ISO8859_1,
"LF" char(1) character set ASCII default _ASCII x'0a'
);
</ddl>
</informational>
</extTableGenConfig>
We also set the overwrite attribute of tableFile to true to simplify the next commandline.
The --overwrite-table-file option only works if --table-file is also specified; it doesn’t override the overwrite config for the file specified in the XML.
We didn’t change the DDL, as we’ll demonstrate later how to regenerate it.
To use the configuration file, use:
ext-table-gen --config-in=C:\FirebirdData\csv\persons.xml
With the same persons.csv as the original CSV file, the persons.dat is now:

1  Doe       John      john@example.org         
2  Doe       Jane      jane@example.org         
3  Deer      Jason     jason@example.org        
4  Deer      Jillian   jillian@example.org      
Compared to the original persons.dat, you’ll notice that the columns are wider.
Keep in mind that external table files are a binary data format.
The examples up to now use only CHAR columns with a single-byte character set, which keeps the file readable as text.
This changes once we start using multiple character sets, or other data types.
3.3.2. Regenerating the configuration file
In the previous section, Changing column lengths, we modified the configuration XML, but didn’t touch the DDL.
We can of course modify the DDL manually, but we can also regenerate the configuration XML using --config-out.
Be aware, the external table file specified in the configuration file will also be recreated as part of this command.
For example, building on the previous example:

ext-table-gen --config-in=C:\FirebirdData\csv\persons.xml ^
  --config-out=C:\FirebirdData\csv\persons.xml --overwrite-config

Here we specify the same file for --config-in and --config-out, so we need to specify --overwrite-config, otherwise the file is not overwritten.
The updated persons.xml is now:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<extTableGenConfig xmlns="https://www.lawinegevaar.nl/xsd/ext-table-gen-1.0.xsd" schemaVersion="2.0">
<externalTable name="DEFAULT_EXTERNAL_TABLE_NAME" byteOrder="LITTLE_ENDIAN">
<columns>
<column name="ID">
<char length="3" encoding="ISO8859_1"/>
</column>
<column name="Lastname">
<char length="10" encoding="ISO8859_1"/>
</column>
<column name="Firstname">
<char length="10" encoding="ISO8859_1"/>
</column>
<column name="Email">
<char length="25" encoding="ISO8859_1"/>
</column>
<endColumn type="LF"/>
</columns>
<tableFile path="C:\FirebirdData\exttables\persons.dat" overwrite="true"/>
</externalTable>
<tableDerivation columnEncoding="ISO8859_1" endColumnType="LF"/>
<csvFile path="C:\FirebirdData\csv\persons.csv" charset="UTF-8" headerRow="true">
<rfc4180CsvParser/>
</csvFile>
<informational>
<ddl>create table "DEFAULT_EXTERNAL_TABLE_NAME" external file 'C:\FirebirdData\exttables\persons.dat' (
"ID" char(3) character set ISO8859_1,
"Lastname" char(10) character set ISO8859_1,
"Firstname" char(10) character set ISO8859_1,
"Email" char(25) character set ISO8859_1,
"LF" char(1) character set ASCII default _ASCII x'0a'
);
</ddl>
</informational>
</extTableGenConfig>
As you can see, the DDL is now updated to match the definition in externalTable.
3.3.3. Changing column character set
Taking the initial example of Generating an external table from CSV, say we also need to import someone with the Czech name Eliška[3]:
ID,Lastname,Firstname,Email
1,Doe,John,john@example.org
2,Doe,Jane,jane@example.org
3,Deer,Jason,jason@example.org
4,Deer,Jillian,jillian@example.org
5,Deer,Eliška,eliska@example.org
If we try to generate the external table file with the original configuration:
ext-table-gen --config-in=C:\FirebirdData\csv\persons.xml
We now get:
1Doe John   john@example.org   
2Doe Jane   jane@example.org   
3DeerJason  jason@example.org  
4DeerJillianjillian@example.org
5DeerEli?ka eliska@example.org 
As you can see, the š has been replaced by a ?[4], because š cannot be encoded in ISO-8859-1.
To be able to import the right character, we need to use a different encoding.
In our simplified example, we could change the column to ISO8859_2 or WIN1250 (which support Czech codepoints like š), but we’ll use UTF8 to also demonstrate the effects of using a multibyte character set.
In our example, we’ll only modify Firstname to use UTF8:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<extTableGenConfig xmlns="https://www.lawinegevaar.nl/xsd/ext-table-gen-1.0.xsd" schemaVersion="2.0">
<externalTable name="DEFAULT_EXTERNAL_TABLE_NAME" byteOrder="LITTLE_ENDIAN">
<columns>
<column name="ID">
<char length="1" encoding="ISO8859_1"/>
</column>
<column name="Lastname">
<char length="4" encoding="ISO8859_1"/>
</column>
<column name="Firstname">
<char length="7" encoding="UTF8"/>
</column>
<column name="Email">
<char length="19" encoding="ISO8859_1"/>
</column>
<endColumn type="LF"/>
</columns>
<tableFile path="C:\FirebirdData\exttables\persons.dat" overwrite="true"/>
</externalTable>
<tableDerivation columnEncoding="ISO8859_1" endColumnType="LF"/>
<csvFile path="C:\FirebirdData\csv\persons.csv" charset="UTF-8" headerRow="true">
<rfc4180CsvParser/>
</csvFile>
<informational>
<ddl>create table "DEFAULT_EXTERNAL_TABLE_NAME" external file 'C:\FirebirdData\exttables\persons.dat' (
"ID" char(1) character set ISO8859_1,
"Lastname" char(4) character set ISO8859_1,
"Firstname" char(7) character set ISO8859_1,
"Email" char(19) character set ISO8859_1,
"LF" char(1) character set ASCII default _ASCII x'0a'
);
</ddl>
</informational>
</extTableGenConfig>
Running:
ext-table-gen --config-in=C:\FirebirdData\csv\persons.xml
We now get:
1Doe John                        john@example.org   
2Doe Jane                        jane@example.org   
3DeerJason                       jason@example.org  
4DeerJillian                     jillian@example.org
5DeerEliška                     eliska@example.org 
As you can see, for the first four rows, the Firstname column now seems to occupy 28 characters, while in the fifth row it only seems to occupy 27 characters.
This is because the external table file is a fixed-width binary format, not a fixed-width text format.
The Firstname column is now defined as <char length="7" encoding="UTF8"/>, and in UTF-8, š is encoded in two bytes.
UTF-8 encodes characters in 1 to 4 bytes, so Firebird requires the column to be 4 * 7 = 28 bytes (that is, the maximum possible length), and superfluous bytes are populated with byte 0x20 (the space character).
The effect is that Jillian (which is 7 characters and 7 bytes) is followed by 21 spaces (total: 28 bytes), while Eliška (6 characters, but 7 bytes) is also followed by 21 spaces (total: 28 bytes).
If you try to trick the server by manually editing the contents of such a column to have more than 7 characters (up to the length of 28 bytes), it will not work: Firebird will report a truncation error for values longer than 7 characters.
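The byte counts above are easy to verify with a few lines of Java; this snippet only illustrates the UTF-8 encoding rules and is not part of ext-table-gen:

import java.nio.charset.StandardCharsets;

public class Utf8Length {
    public static void main(String[] args) {
        // "Jillian": 7 characters and 7 bytes (all ASCII)
        System.out.println("Jillian".getBytes(StandardCharsets.UTF_8).length); // 7
        // "Eliška": 6 characters, but 7 bytes (š is encoded in two bytes)
        System.out.println("Eliška".getBytes(StandardCharsets.UTF_8).length);  // 7
    }
}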
Again, it is important to keep in mind, that by using two different and incompatible character sets in the external table file, it is now truly a binary file. In our example it just happens to work OK when shown as text, because all other characters are in ASCII, which is a subset of both ISO-8859-1 and UTF-8. If some other column had ISO-8859-1 characters beyond 0x7f, they would have been rendered as Unicode replacement characters (because the AsciiDoctor tool used to create this manual also reads our example data as UTF-8).
3.3.4. Using a different column data type
The default configuration file derived from a CSV file only uses CHAR
columns.
It can be more efficient — both in terms of storage space and for performance of loading the data into Firebird — to use a different datatype.
In this example, we’ll change the data type of the ID column from a CHAR(1):
<column name="ID">
<char length="1" encoding="ISO8859_1"/>
</column>
to a SMALLINT:
<column name="ID">
<smallint/>
</column>
For a full list of possible data types and their attributes, see datatype.
If you change the data type to anything other than char
, the generated file becomes truly binary.
Having an end-column makes less sense for this type of file, so we recommend removing it.
Taking the original example from Generating an external table from CSV, modifying it like above, and then regenerating the file as described in Regenerating the configuration file results in the following file:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<extTableGenConfig xmlns="https://www.lawinegevaar.nl/xsd/ext-table-gen-1.0.xsd" schemaVersion="2.0">
<externalTable name="DEFAULT_EXTERNAL_TABLE_NAME" byteOrder="LITTLE_ENDIAN">
<columns>
<column name="ID">
<smallint/>
</column>
<column name="Lastname">
<char length="4" encoding="ISO8859_1"/>
</column>
<column name="Firstname">
<char length="7" encoding="ISO8859_1"/>
</column>
<column name="Email">
<char length="19" encoding="ISO8859_1"/>
</column>
</columns>
<tableFile path="C:\FirebirdData\exttables\persons.dat" overwrite="false"/>
</externalTable>
<tableDerivation columnEncoding="ISO8859_1" endColumnType="LF"/>
<csvFile path="C:\FirebirdData\csv\persons.csv" charset="UTF-8" headerRow="true">
<rfc4180CsvParser/>
</csvFile>
<informational>
<ddl>create table "DEFAULT_EXTERNAL_TABLE_NAME" external file 'C:\FirebirdData\exttables\persons.dat' (
"ID" smallint,
"Lastname" char(4) character set ISO8859_1,
"Firstname" char(7) character set ISO8859_1,
"Email" char(19) character set ISO8859_1
);
</ddl>
</informational>
</extTableGenConfig>
For data types other than char, the byteOrder attribute is important for determining how to write out data.
If byteOrder is absent, AUTO is used, but when the configuration file is written, the byte order of the system that created the file is recorded.
Specify byteOrder="AUTO" explicitly to use the endianness of the system where ext-table-gen is run.
However, if you generate the table file on a little-endian machine while your Firebird server runs on a big-endian machine (or vice versa), you’ll need to set the byteOrder to match the endianness of the Firebird server.
See externalTable for more information on byteOrder.
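To illustrate what the byte order means for the file contents, the following standalone Java sketch writes the SMALLINT value 1 in both byte orders; it is only an illustration, not code from ext-table-gen:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.HexFormat;

public class ByteOrderDemo {
    public static void main(String[] args) {
        short id = 1; // a SMALLINT value, stored in 2 bytes

        ByteBuffer little = ByteBuffer.allocate(2).order(ByteOrder.LITTLE_ENDIAN);
        little.putShort(id);
        System.out.println(HexFormat.of().formatHex(little.array())); // 0100

        ByteBuffer big = ByteBuffer.allocate(2).order(ByteOrder.BIG_ENDIAN);
        big.putShort(id);
        System.out.println(HexFormat.of().formatHex(big.array()));    // 0001
    }
}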
3.3.5. Using a different data type conversion
Each data type has a default conversion from the CSV string value to a Firebird data type value.
This default conversion is not always suitable.
For example, a CSV with hexadecimal string values for integers cannot be read as an integer
, because the default conversion uses radix 10.
To address this, the data type elements in the configuration have an optional element, converter
, to specify a conversion using a converter step.
We’ll provide examples for each converter step below.
parseBigDecimal
The converter step parseBigDecimal can be used to specify a locale for converting CSV values to exact numeric values.
For example, to convert a number with . as the grouping symbol and , as the decimal separator (e.g. "12.345,67"), use the following data type definition:
<decimal precision="9" scale="2">
<converter>
<parseBigDecimal locale="nl-NL"/>
</converter>
</decimal>
The allowed range of values is restricted by the enclosing data type and its precision (or more precisely, the backing datatype derived from the precision).
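According to the reference chapter, parseBigDecimal uses the Java NumberFormat instance for the specified locale. The following standalone sketch shows equivalent parsing for nl-NL; it is only an illustration, not ext-table-gen code:

import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

public class ParseBigDecimalDemo {
    public static void main(String[] args) throws ParseException {
        // nl-NL uses '.' as grouping symbol and ',' as decimal separator
        DecimalFormat format = (DecimalFormat) NumberFormat.getInstance(Locale.forLanguageTag("nl-NL"));
        format.setParseBigDecimal(true);
        BigDecimal value = (BigDecimal) format.parse("12.345,67");
        System.out.println(value); // 12345.67
    }
}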
parseDatetime
The converter step parseDatetime
can be used to specify a different pattern and locale for converting CSV values to datetime values.
For example, to convert a date in d MMMM yyyy
format in Dutch (e.g. 15 juli 2023
), use the following data type definition:
<date>
<converter>
<parseDatetime pattern="d MMMM yyyy" locale="nl-NL"/>
</converter>
</date>
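The pattern letters here are the familiar java.time ones. As a standalone illustration (not ext-table-gen code), the example value parses like this:

import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class ParseDatetimeDemo {
    public static void main(String[] args) {
        DateTimeFormatter formatter =
                DateTimeFormatter.ofPattern("d MMMM yyyy", Locale.forLanguageTag("nl-NL"));
        System.out.println(LocalDate.parse("15 juli 2023", formatter)); // 2023-07-15
    }
}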
parseIntegralNumber
The converter step parseIntegralNumber
can be used to specify a different radix for converting CSV values to integral numbers.
For example, to convert integer values in radix 16 instead of radix 10, use the following data type definition:
<integer>
<converter>
<parseIntegralNumber radix="16"/>
</converter>
</integer>
The allowed range of values is restricted by the enclosing data type.
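As a quick standalone illustration of radix 16 parsing (not ext-table-gen code):

public class RadixDemo {
    public static void main(String[] args) {
        // With radix 16, the CSV value "1A" is read as the integer 26
        System.out.println(Integer.parseInt("1A", 16)); // 26
        // With the default radix 10, "1A" would be rejected with a NumberFormatException
    }
}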
3.4. Recommended workflow
The previous examples are all that — we think — you’ll need to create usable external table files.
For one-shot imports, the second example from Generating an external table from CSV can be sufficient.
You may need to use --column-encoding
if you need something other than ISO8859_1
for the external table columns, and --csv-charset
if the CSV file is not encoded in ASCII or UTF-8.
For repeated imports, or where multiple character sets or other data types are needed, we recommend the following steps:
1. Generate a config file using Generating an external table from CSV,
2. Increase column lengths to expected maximums (see Changing column lengths),
3. If needed, change the encoding of columns (see Changing column character set),
4. Determine if there are more appropriate data types for columns and change the type if needed (see Using a different column data type),
5. Determine if a custom converter is needed (see Using a different data type conversion),
6. Finally, regenerate the config file (see Regenerating the configuration file) for up-to-date DDL.
Use the DDL from the last step to create the external table, and use the external table file it generated to verify the definition by querying the external table from Firebird.
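For example, using the regenerated DDL from the earlier sections, verification and a subsequent bulk load could look like this; the target table PERSONS is hypothetical and only serves as an illustration:

-- Create the external table using the DDL from the informational/ddl element
create table "DEFAULT_EXTERNAL_TABLE_NAME" external file 'C:\FirebirdData\exttables\persons.dat' (
  "ID" char(3) character set ISO8859_1,
  "Lastname" char(10) character set ISO8859_1,
  "Firstname" char(10) character set ISO8859_1,
  "Email" char(25) character set ISO8859_1,
  "LF" char(1) character set ASCII default _ASCII x'0a'
);

-- Verify the definition by querying the generated file
select * from "DEFAULT_EXTERNAL_TABLE_NAME";

-- Bulk-load the rows into a (hypothetical) normal table
insert into PERSONS ("ID", "Lastname", "Firstname", "Email")
  select "ID", "Lastname", "Firstname", "Email" from "DEFAULT_EXTERNAL_TABLE_NAME";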
From that point on, the configuration file can be used to transform a CSV file to an external table file.
For example, assuming the tableFile in the configuration has attribute overwrite set to true, and you always want to read the configured file from csvFile:
ext-table-gen --config-in=C:\FirebirdData\csv\persons.xml
If a configuration contains relative paths, they are resolved against the current working directory.
If you need to import CSV files with a different name, you can use:

ext-table-gen --config-in=C:\FirebirdData\csv\persons.xml ^
  --csv-file=C:\FirebirdData\csv\persons-20230622.csv

If a CSV file changes incompatibly (e.g. wider columns, a different number of columns, or inconsistent column counts), the import will fail, but the external table file will already have been overwritten with the valid rows preceding the failing row.
3.5. Customizing the CSV Format
By default, ext-table-gen reads CSV files in RFC 4180 format.
It is possible to configure the RFC 4180 parser with the following options:
- Quote character (used to enclose values, and to escape itself; default is ")
  - Commandline — --csv-quote-char=CHAR
  - XML — attribute quoteChar="CHAR" on element rfc4180CsvParser
- Separator (used to separate values; default is ,)
  - Commandline — --csv-separator=CHAR
  - XML — attribute separator="CHAR" on element rfc4180CsvParser
The CHAR option accepts any single character value, a character mnemonic, or a Unicode escape.
You can also select a different CSV parser (default is RFC_4180, or the parser specified in --config-in):
- Commandline: --csv-parser={RFC_4180|CUSTOM}
- XML: element rfc4180CsvParser or customCsvParser in csvFile
The custom parser supports more options:
- Quote character (used to enclose values; default is ")
  - Commandline — --csv-quote-char=CHAR
  - XML — attribute quoteChar="CHAR" on element customCsvParser
- Separator (used to separate values; default is ,)
  - Commandline — --csv-separator=CHAR
  - XML — attribute separator="CHAR" on element customCsvParser
- Escape character (used to escape characters; default is \)
  - Commandline — --csv-escape-char=CHAR
  - XML — attribute escapeChar="CHAR" on element customCsvParser
- Ignore leading white space (ignores/skips white space before a value; default is true)
  - Commandline — --[no-]ignore-leading-white-space
  - XML — attribute ignoreLeadingWhiteSpace="{true|false}" on element customCsvParser
- Ignore quotations (quotation marks are ignored; default is false)
  - Commandline — --[no-]ignore-quotations
  - XML — attribute ignoreQuotations="{true|false}" on element customCsvParser
- Strict quotes (OpenCSV strict quotes behaviour; default is false)
  - Commandline — --[no-]strict-quotes
  - XML — attribute strictQuotes="{true|false}" on element customCsvParser
For example, to import a custom CSV format which uses single quotes, tab as separator, and # as the escape symbol:
On the commandline:
ext-table-gen --csv-file=persons.csv --table-file=persons.dat ^
  --csv-parser=CUSTOM --csv-quote-char=APOS --csv-separator=TAB --csv-escape-char=#
In the configuration file:
<csvFile path="C:\FirebirdData\csv\persons.csv" charset="UTF-8" headerRow="true">
<customCsvParser quoteChar="APOS" separator="TAB" escapeChar="#"/>
</csvFile>
4. Reference
This chapter describes the various configuration options. First it describes the commandline options, followed by the XML configuration file format.
4.1. Commandline options
ext-table-gen provides a number of commandline configuration options to configure its behaviour. We discuss the options here grouped by functional area.
4.1.1. General options
4.1.2. CSV file configuration
CSV file (--csv-file
)
The CSV file option (--csv-file=FILE
) specifies the path of the CSV file.
FILE
is the absolute or relative path to the file.
Relative paths are resolved against the current working directory.
By default, the file will be parsed as RFC 4180 format, but this can be configured with the CSV parser (--csv-parser
) and other commandline options.
We recommend specifying absolute paths, as the configuration file will store the path as specified, which means that if a relative path is used, and the configuration file is then used from a different location, it may attempt to read a non-existent or wrong file.
This option can override the XML config of /extTableGenConfig/csvFile[@path]
.
CSV character set (--csv-charset
)
The CSV character set option (--csv-charset=CHARSET
) specifies the character set of the CSV file.
CHARSET
is any valid Java character set name or alias.
The default is UTF-8
.
This option can override the XML config of /extTableGenConfig/csvFile[@charset]
if explicitly specified.
CSV header (--[no-]csv-header
)
The CSV header boolean option (--[no-]csv-header
) specifies if the first row of the CSV file is a header (--csv-header
, the default), or data (--no-csv-header
).
The default is --csv-header
.
If the first row is a header, and the external table is derived, the values in the first row are used as column names. If an existing external table configuration is used, the header row is skipped.
If the first row is not a header, and the external table is derived, columns are assigned a name in the form of COLUMN_idx
, where idx is the 1-based index of the column (first is COLUMN_1
, second is COLUMN_2
, etc.).
This option can override the XML config of /extTableGenConfig/csvFile[@headerRow]
if explicitly specified.
CSV parser (--csv-parser
)
The CSV parser option (--csv-parser={RFC_4180|CUSTOM}
) configures the CSV parser to use.
The default is RFC_4180
.
Possible options are:
- RFC_4180 — selects the RFC 4180 parser (implementation: com.opencsv.RFC4180Parser)
- CUSTOM — selects the “custom” parser (implementation: com.opencsv.CSVParser)
This option can override the XML config of /extTableGenConfig/csvFile/customCsvParser
or /extTableGenConfig/csvFile/rfc4180CsvParser
if explicitly specified.
Applicable configuration properties from the overridden parser will be retained unless explicitly specified on the commandline.
CSV Quote Character (--csv-quote-char)
The CSV quote character option (--csv-quote-char=CHAR) configures the quote character of the CSV parser.
The default is double quote (").
CHAR
can be a single character, a character mnemonic or a Unicode escape.
For the RFC 4180 parser, the quote character is used to quote values (e.g. "value"
→ value
), or to escape itself (e.g. "quote ""in"" value"
→ quote "in" value
).
For the “custom” parser, the quote character is only used to quote values.
This option can override the XML config of /extTableGenConfig/csvFile/customCsvParser[@quoteChar]
or /extTableGenConfig/csvFile/rfc4180CsvParser[@quoteChar]
if explicitly specified.
CSV Separator (--csv-separator
)
The CSV separator option (--csv-separator=CHAR
) configures the separator character of the CSV parser.
The default is comma (,
).
CHAR
can be a single character, a character mnemonic or a Unicode escape.
The separator is the character used to separate columns in a row.
This option can override the XML config of /extTableGenConfig/csvFile/customCsvParser[@separator]
or /extTableGenConfig/csvFile/rfc4180CsvParser[@separator]
if explicitly specified.
CSV Escape Character (--csv-escape-char
)
The CSV escape character option (--csv-escape-char=CHAR
) configures the escape character of the “custom” CSV parser.
The default — for the “custom” parser — is back-slash (\
).
CHAR
can be a single character, a character mnemonic or a Unicode escape.
This option is ignored for the RFC 4180 parser, as that uses the quote character for escaping.
This option can override the XML config of /extTableGenConfig/csvFile/customCsvParser[@escapeChar]
if explicitly specified.
CSV Ignore Leading White Space (--[no-]ignore-leading-white-space
)
The CSV ignore leading white space Boolean option (--[no-]ignore-leading-white-space
) configures if the “custom” CSV parser ignores leading spaces before a (quoted) value.
The default — for the “custom” parser — is true
(--ignore-leading-white-space
).
This option is ignored for the RFC 4180 parser.
This option can override the XML config of /extTableGenConfig/csvFile/customCsvParser[@ignoreLeadingWhiteSpace]
if explicitly specified.
CSV Ignore Quotations (--[no-]ignore-quotations
)
The CSV ignore quotations Boolean option (--[no-]ignore-quotations
) configures if the “custom” CSV parser ignores quotations.
The default — for the “custom” parser — is false
(--no-ignore-quotations
).
This option is ignored for the RFC 4180 parser.
This option can override the XML config of /extTableGenConfig/csvFile/customCsvParser[@ignoreQuotations]
if explicitly specified.
CSV Strict Quotes (--[no-]strict-quotes
)
The CSV strict quotes Boolean option (--[no-]strict-quotes
) configures the “strict quotes” feature of the “custom” CSV parser (ignore characters outside quotes).
The default — for the “custom” parser — is false
(--no-strict-quotes
).
This option is ignored for the RFC 4180 parser.
This option can override the XML config of /extTableGenConfig/csvFile/customCsvParser[@strictQuotes]
if explicitly specified.
4.1.3. External table file configuration
Table file path (--table-file
)
The table file path option (--table-file=FILE
) specifies the path of the external table file.
FILE
is the absolute or relative path to the file.
Relative paths are resolved against the current working directory.
We recommend specifying absolute paths, as the configuration file will store the path as specified, which means that if a relative path is used, and the configuration file is then used from a different location, it may attempt to write to the wrong location or file.
This option can override the XML config of /extTableGenConfig/externalTable/tableFile[@path]
.
If specified, the XML config of /extTableGenConfig/externalTable/tableFile[@overwrite]
is ignored, and defaults to false
if --overwrite-table-file
is not specified.
Overwrite table file (--[no-]overwrite-table-file
)
The overwrite table file boolean option (--[no-]overwrite-table-file
) specifies if an existing external table file can be overwritten (--overwrite-table-file
) or not overwritten (--no-overwrite-table-file
, the default).
The default is --no-overwrite-table-file
.
This option can only be specified if --table-file
is specified.
In other words, it does not override the XML config.
We only allow this override if also specifying the external table file, so you’re aware what file you’re going to overwrite.
4.1.4. Table configuration and derivation
Table derivation options configure how ext-table-gen derives the external table definition.
Apart from --table-name
and --byte-order
, these options only have effect when a new external table definition is generated (see also Table derivation mode (--table-derivation-mode
)).
Table name (--table-name
)
The table name option (--table-name=NAME
) specifies the name of the external table.
NAME
is the table name.
The value will be converted to a quoted identifier.
When no name is specified through --table-name
or the XML file, the table name DEFAULT_EXTERNAL_TABLE_NAME
is used.
This option can override the XML config of /extTableGenConfig/externalTable[@name]
if explicitly specified.
Byte order (--byte-order
)
The byte order option (--byte-order=ORDER
) specifies the byte order of the external table.
ORDER can have value:
- BIG_ENDIAN — big-endian
- LITTLE_ENDIAN — little-endian
- AUTO — automatically select based on the byte order of the runtime platform
The default is the effective value of AUTO (that is, either BIG_ENDIAN or LITTLE_ENDIAN).
Explicitly specifying AUTO will store AUTO in the configuration file.
This option can override the XML config of /extTableGenConfig/externalTable[@byteOrder]
if explicitly specified.
Column encoding (--column-encoding
)
The column encoding option (--column-encoding=ENCODING
) specifies the column encoding to use for all columns when deriving a table.
ENCODING
is the Firebird name of a supported Firebird character set.
Default value is ISO8859_1
.
This option can override the XML config of /extTableGenConfig/tableDerivation[@columnEncoding]
if explicitly specified.
It does not override the column encoding of existing columns (unless you use an explicit Table derivation mode (--table-derivation-mode
)).
Unsupported character sets
Not all Firebird character sets are supported.
- OCTETS and NONE are not supported, as we need to know which character set to use to encode string data when writing the external table file, so we require that a “real” character set is specified as the column encoding.
- The character set UNICODE_FSS is not supported, as we can’t support this variant of UTF-8 with the standard Java character sets.
- Other multibyte character sets (SJIS_0208, EUCJ_0208, KSC_5601, BIG_5, GB_2312, GBK, CP943C, GB18030) are not supported, as we’re not sure how to handle padding for these character sets (support may be added in the future after we’ve experimented with this).
- Character sets which have no equivalent Java character set are also not supported (e.g. NEXT, CYRL).
End column type (--end-column
)
The end column type option (--end-column=TYPE
) specifies if an extra column is added with a linebreak, and if so, what linebreak.
TYPE can have value:
- LF — linefeed, a.k.a. LF, \n, or 0x0a
- CRLF — carriage return + linefeed, a.k.a. CR/LF, \r\n, or 0x0d0a
- NONE — no end column
The default is LF.
An end column is not necessary, but can improve the “human readability” of the external table file by starting each row on a new line.
If this is no concern for you, use --end-column=NONE
to reduce file size and overhead of writing and reading this column.
This option can override the XML config of /extTableGenConfig/tableDerivation[@endColumnType]
if explicitly specified.
It does not override the current end-column of /extTableGenConfig/extTable/columns/endColumn
(unless you use an explicit Table derivation mode (--table-derivation-mode
)).
Table derivation mode (--table-derivation-mode
)
The table derivation mode option (--table-derivation-mode=MODE
) specifies when an external table definition should be derived from a CSV file.
MODE can have value:
- INCOMPLETE — Only derive the external table when the current configuration is incomplete.
- ALWAYS — Always derive the external table, even if the current configuration defines a valid external table.
- NEVER — Never derive the external table, always use the current configuration. Fails if the current configuration is incomplete or invalid.
The formal default is INCOMPLETE
, but this is only applied when --config-in
is not specified, and without an existing configuration, INCOMPLETE
and ALWAYS
behave the same.
When --config-in
is specified, the default is NEVER
.
This default was chosen to avoid accidentally deriving a new table configuration and creating an external table file which does not match the existing external table used by the Firebird server (e.g. because the config file was modified incorrectly, or because ext-table-gen suddenly considers the configuration incomplete or invalid, either incorrectly or due to implementation changes).
The assumption is that if you have an existing configuration file, it is complete and correct.
If that is not the case, or if you want to derive the external table definition anew, you must explicitly declare this with --table-derivation-mode
.
4.1.5. Configuration file management
Configuration input (--config-in
)
The configuration input (--config-in=FILE
) specifies the path of the configuration file to read.
FILE
is the absolute or relative path to the file to read.
The configuration input is used as the initial configuration, and can be modified by the other commandline options.
If --config-in
is not specified, the default values are applied in the absence of a commandline option.
Configuration output (--config-out
)
The configuration output (--config-out=FILE
) specifies the path of the configuration XML file to write.
FILE
is the absolute or relative path to the file to write.
If --config-out
is not specified, no configuration file is written.
By default, an existing configuration file is not overwritten.
Specify --overwrite-config
to overwrite an existing configuration file.
If the configuration file exists and --overwrite-config
is not specified, this is reported on the standard error output.
The export of the CSV to an external table will proceed as normal.
When both --config-in
and --config-out
are specified without other options (except --overwrite-config
), generally a file with the same configuration will be written, except for:
-
New configuration options introduced in ext-table-gen after the configuration input was created
-
The content of the informational element is always regenerated, so if it was outdated (e.g. see Regenerating the configuration file), or ext-table-gen changed how the DDL or other information is generated, it may change.
When both --config-in
and --config-out
are specified with other options, the values of those options are reflected in the new configuration file.
Except for the table name (/extTableGenConfig/externalTable[@name]
) and table file (/extTableGenConfig/externalTable/tableFile
), the externalTable
element will not change unless --table-derivation-mode=ALWAYS
is specified, or the existing configuration is considered invalid or incomplete and --table-derivation-mode=INCOMPLETE
is specified.
Overwrite configuration (--[no-]overwrite-config
)
The overwrite configuration boolean option (--[no-]overwrite-config
) specifies whether the configuration output (--config-out
) can be overwritten if it exists (--overwrite-config
) or not overwritten (--no-overwrite-config
, the default).
The default is --no-overwrite-config
.
This option can only be specified if --config-out
is specified.
The option --no-overwrite-config
is technically unnecessary, but can be specified for explicitness (and exists for consistency with other boolean options).
4.2. XML configuration
The XML configuration file format uses an XSD to define its structure. This XSD can be found in XML schemas. This section follows the structure of the XSD.
The root of the XML is extTableGenConfig.
Attributes: none.
Elements:
- externalTable — required
- tableDerivation — optional
- csvFile — optional
- informational — optional
4.2.1. externalTable
The externalTable
element defines the name, columns and table file of the external table.
The columns it contains are used by ext-table-gen to validate the columns in the input CSV file (e.g. number of columns, length of column values), and format the output of the external table file, and generate the DDL of the table.
Attributes:
- name — Name of the table (used for the DDL; required)
- byteOrder — Byte order of the table (optional). Possible values: BIG_ENDIAN, LITTLE_ENDIAN, AUTO. If the attribute is absent, AUTO is used; when the configuration file is written, the byte order of the system that created the file is recorded.
columns
The element columns is a sequence of one or more column elements, defining the columns of the external table.
Attributes: none.
column
The column element defines a normal data column.
Attributes:
- name — Name of the column (used for the DDL; required)
Elements:
- Element of type datatype — required
A datatype is a placeholder for an element corresponding to a specific Firebird data type.
Attributes: none, though subtypes may define attributes.
Elements:
- converter — optional
At this time, ext-table-gen supports the following column data types:
bigint
The bigint element represents the Firebird datatype BIGINT.
Attributes: none.
Elements:
- converter — optional
char
The char element represents the Firebird datatype CHAR.
Attributes:
- length — Integer in the range [1, 32767] with the length in Unicode codepoints (required)
- encoding — Encoding (character set) of the column, using Firebird character set names (required)
Elements:
- converter — optional
date
The date element represents the Firebird datatype DATE.
The Firebird DATE type only accepts values between 0001-01-01 and 9999-12-31.
The default conversion parses the string using the ISO 8601 calendar date format (pattern yyyy-MM-dd, or named pattern ISO_LOCAL_DATE).
Attributes: none.
Elements:
- converter — optional
decimal
The decimal element represents the Firebird datatype DECIMAL.
The default conversion parses the string using the rules described in the constructor BigDecimal(String).
Contrary to the parseBigDecimal converter, this conversion can also parse strings in scientific notation.
Attributes:
- precision — Integer in the range [1, 38] for the precision in decimal digits (required)
- scale — Integer in the range [0, precision] for the scale in decimal digits (required)
- roundingMode — Rounding mode applied when reducing the scale of a parsed number with more digits after the decimal point than allowed by scale (optional)
The precision is used only to determine the backing datatype, and is not used to enforce length limits.
Just like in Firebird itself, the allowed range of values is determined by the backing datatype.
For DECIMAL, that means:
- 1 <= precision <= 9 — INTEGER
- 10 <= precision <= 18 — BIGINT
- 19 <= precision <= 38 — INT128
Elements:
- converter — optional
int128
The int128 element represents the Firebird datatype INT128.
Attributes: none.
Elements:
- converter — optional
integer
The integer element represents the Firebird datatype INTEGER.
Attributes: none.
Elements:
- converter — optional
numeric
The numeric element represents the Firebird datatype NUMERIC.
The default conversion parses the string using the rules described in the constructor BigDecimal(String).
Contrary to the parseBigDecimal converter, this conversion can also parse strings in scientific notation.
Attributes:
- precision — Integer in the range [1, 38] for the precision in decimal digits (required)
- scale — Integer in the range [0, precision] for the scale in decimal digits (required)
- roundingMode — Rounding mode applied when reducing the scale of a parsed number with more digits after the decimal point than allowed by scale (optional)
The precision is used only to determine the backing datatype, and is not used to enforce length limits.
Just like in Firebird itself, the allowed range of values is determined by the backing datatype.
For NUMERIC, that means:
- 1 <= precision <= 4 — SMALLINT
- 5 <= precision <= 9 — INTEGER
- 10 <= precision <= 18 — BIGINT
- 19 <= precision <= 38 — INT128
Elements:
- converter — optional
smallint
The smallint element represents the Firebird datatype SMALLINT.
Attributes: none.
Elements:
- converter — optional
time
The time element represents the Firebird datatype TIME [WITHOUT TIME ZONE].
The Firebird TIME type only accepts values between 00:00:00.0000 and 23:59:59.9999.
Attributes: none.
Elements:
- converter — optional
timestamp
The timestamp element represents the Firebird datatype TIMESTAMP [WITHOUT TIME ZONE].
See also the limitations of date and time.
For additional flexibility, the time part is not required.
For example, if you use a parseDatetime converter with a pattern which only provides date information, but doesn’t provide time information (e.g. yyyy-MM-dd), it will save the date with the time set to 00:00:00.0000.
Attributes: none.
Elements:
- converter — optional
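For example, to fill a timestamp column from a CSV value that only contains a date (the time will be set to 00:00:00.0000), a configuration like the following could be used; it combines timestamp with the parseDatetime converter step described below:

<timestamp>
<converter>
<parseDatetime pattern="yyyy-MM-dd"/>
</converter>
</timestamp>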
converter
The converter element — if present — defines a conversion from the CSV string value to a Firebird data type value.
If absent, a default conversion will be applied, specific to the column data type.
Attributes: none.
Elements:
- One of the converter steps listed below — required
The name converter step may seem to imply that a converter can consist of multiple steps chained together. That is currently not supported, but it is something that may be introduced in a future version. In the actual program code, a converter step is also called a converter, but in this manual we wanted to prevent overloading the term, as it already refers to the converter element.
parseBigDecimal
Parses a string using the Java NumberFormat instance for the specified locale to a BigDecimal.
This converter step is only valid in decimal and numeric.
Contrary to the default converter of numeric and decimal, this converter does not parse strings in scientific notation.
Attributes:
- locale — Locale as a BCP 47 language tag (required)
Elements: none.
parseDatetime
Parses a string using the specified pattern and locale to a datetime value.
Attributes:
- pattern — Datetime pattern, or one of the pre-defined formatter names (required). The pattern format uses the syntax described in java.time.format.DateTimeFormatter. In addition, a number of custom pre-defined formatter names are available.
- locale — Locale as a BCP 47 language tag (optional). Generally only relevant to parse datetime strings with localized names for months and days.
Elements: none.
parseIntegralNumber
Parses a string using a specified radix to an integral number of the type of the enclosing datatype.
Attributes:
- radix — Radix for parsing a string to an integral number (optional; the default is 10)
Elements: none.
endColumn
The endColumn element represents an extra column which is added to the external table to introduce a linebreak after each row in the external table file.
This linebreak is primarily intended to make the file more “human-readable”, and is not needed for Firebird itself.
Attributes:
- type — Type of linebreak (required); possible values LF and CRLF. There is no value equivalent to the commandline option NONE: to have no end column, omit the endColumn element.
Elements: none.
tableFile
The element tableFile specifies the external table file.
Attributes:
- path — Path of the external table file (required)
- overwrite — Boolean specifying if ext-table-gen is allowed to overwrite an existing file (optional; defaults to false)
Elements: none.
4.2.2. tableDerivation
The element tableDerivation
stores the configuration used to derive an external table.
This is only used when ext-table-gen is used with --table-derivation-mode=ALWAYS
, or --table-derivation-mode=INCOMPLETE
and the current configuration is considered incomplete or invalid.
Be aware, it does not necessarily represent the configuration used to derive the current table described by externalTable
, but instead the last configuration stored using --config-out
.
Attributes:
- columnEncoding — Encoding (character set) to apply to columns when deriving an external table (optional)
- endColumnType — Type of linebreak (optional); possible values LF, CRLF and NONE. Absence of the attribute implies LF.
Elements: none.
The commandline option Table derivation mode (--table-derivation-mode
) is intentionally not represented in the configuration file.
If a configuration file is specified (using --config-in
), ext-table-gen defaults to NEVER
, and needs to be explicitly overridden by the commandline to derive the external table.
4.2.3. csvFile
The element csvFile
specifies the CSV file to use, and how to read it.
Attributes:
- path — Path of the CSV file (required)
- charset — Character set of the CSV file (required)
- headerRow — Boolean specifying if the first row of the CSV file is a header row or a data row (optional; defaults to true)
Elements:
- CSV parser configuration — optional. If absent, behaves as if an empty rfc4180CsvParser is specified. One of rfc4180CsvParser or customCsvParser, described below.
rfc4180CsvParser
The element rfc4180CsvParser specifies use of the RFC 4180 parser (implementation: com.opencsv.RFC4180Parser) and its configuration.
Attributes:
- quoteChar — Quote character (optional, defaults to ")
- separator — Separator (optional, defaults to ,)
Elements: none.
customCsvParser
The element customCsvParser specifies use of the “custom” parser (implementation: com.opencsv.CSVParser) and its configuration.
Attributes:
- quoteChar — Quote character (optional, defaults to ")
- separator — Separator (optional, defaults to ,)
- escapeChar — Escape character (optional, defaults to \)
- ignoreLeadingWhiteSpace — Ignore leading white space before a (quoted) value (optional, defaults to true)
- ignoreQuotations — Ignore quotes (optional, defaults to false)
- strictQuotes — Strict quotes, ignores values outside of quotes (optional, defaults to false)
Elements: none.
4.2.4. informational
The element informational contains data which is informational only, and does not represent configuration used by ext-table-gen.
Be aware that if the configuration file was changed outside ext-table-gen, the contents of informational may be out-of-sync.
See Regenerating the configuration file on how to regenerate the config file.
Attributes: none.
Elements:
- ddl — required
ddl
The ddl element contains the Firebird CREATE TABLE statement derived from externalTable.
Attributes: none.
Elements: none.
4.3. Common Configuration
The following definitions are shared between the commandline and XML configuration.
4.3.1. Character Mnemonics
Character mnemonics are aliases for characters. They can be used for characters that are unprintable (e.g. the tab character) or are hard to use on the commandline or XML (e.g. the double quote character).
Supported mnemonics are:
- TAB — horizontal tab (Unicode U+0009)
- SPACE — space (Unicode U+0020)
- QUOT — quotation mark/double quote (")
- APOS — apostrophe/single quote (')
- GRAVE — grave accent/back tick (`)
If you’re missing a mnemonic, use a Unicode Escape. If you think we need to add another mnemonic, please create an issue on https://github.com/mrotteveel/ext-table-gen.
4.3.2. Unicode Escape
The Unicode escape can be used to represent any Unicode character supported by UTF-16[5].
Its format is:
<escape> = U+<h><h><h><h> | u+<h><h><h><h>
<h> = hexadecimal digit, one of 0-9, A-F, a-f
For example:
U+20AC -- € (EURO SIGN)
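For example, a Unicode escape can be used on the commandline to select a separator that is awkward to type or quote, such as a semicolon (U+003B):

ext-table-gen --csv-file=persons.csv --table-file=persons.dat --csv-separator=U+003B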
Appendices
Appendix A: XML schemas
The following is the XML schema used by ext-table-gen for its configuration files.
A.1. ext-table-gen-1.0.xsd
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!-- SPDX-FileCopyrightText: Copyright 2023 Mark Rotteveel -->
<!-- SPDX-License-Identifier: Apache-2.0 -->
<!-- NOTE: Updates to /xs:schema[@version] also need to be reflected in ConfigMapper.CURRENT_SCHEMA_VERSION -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="https://www.lawinegevaar.nl/xsd/ext-table-gen-1.0.xsd"
xmlns="https://www.lawinegevaar.nl/xsd/ext-table-gen-1.0.xsd"
elementFormDefault="qualified"
xmlns:jaxb="https://jakarta.ee/xml/ns/jaxb"
jaxb:version="3.0"
version="2.0">
<xs:element name="extTableGenConfig">
<xs:complexType>
<xs:sequence>
<xs:element name="externalTable" type="ExternalTableType"/>
<xs:element name="tableDerivation" type="TableDerivationType" minOccurs="0"/>
<xs:element name="csvFile" type="CsvFileType" minOccurs="0"/>
<xs:element name="informational" type="InformationalType" minOccurs="0"/>
</xs:sequence>
<xs:attribute name="schemaVersion" type="xs:token" use="required"/>
</xs:complexType>
</xs:element>
<xs:complexType name="ExternalTableType">
<xs:sequence>
<xs:element name="columns" type="ColumnListType"/>
<xs:element name="tableFile" type="TableFileType" minOccurs="0"/>
</xs:sequence>
<xs:attribute name="name" type="xs:string" use="required"/>
<xs:attribute name="byteOrder" type="ByteOrderEnum"/>
</xs:complexType>
<xs:complexType name="TableDerivationType">
<xs:attribute name="columnEncoding" type="encoding"/>
<xs:attribute name="endColumnType" type="EndColumnEnum"/>
<!--
NOTE: The tableDerivationMode is intentionally not part of this configuration, and will default to NEVER
-->
</xs:complexType>
<xs:complexType name="ColumnListType">
<xs:sequence>
<xs:element name="column" type="ColumnType" maxOccurs="unbounded">
<xs:annotation>
<xs:appinfo>
<jaxb:property name="columns"/>
</xs:appinfo>
</xs:annotation>
</xs:element>
<xs:element name="endColumn" type="EndColumnType" minOccurs="0"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="ColumnType">
<xs:sequence>
<xs:element ref="datatype"/>
</xs:sequence>
<xs:attribute name="name" type="xs:string"/>
</xs:complexType>
<xs:element name="datatype" type="DatatypeType" abstract="true"/>
<xs:element name="char" type="CharType" substitutionGroup="datatype"/>
<xs:element name="smallint" type="DatatypeType" substitutionGroup="datatype"/>
<xs:element name="integer" type="DatatypeType" substitutionGroup="datatype"/>
<xs:element name="bigint" type="DatatypeType" substitutionGroup="datatype"/>
<xs:element name="int128" type="DatatypeType" substitutionGroup="datatype"/>
<xs:element name="date" type="DatatypeType" substitutionGroup="datatype"/>
<xs:element name="time" type="DatatypeType" substitutionGroup="datatype"/>
<xs:element name="timestamp" type="DatatypeType" substitutionGroup="datatype"/>
<xs:element name="numeric" type="FixedPointType" substitutionGroup="datatype"/>
<xs:element name="decimal" type="FixedPointType" substitutionGroup="datatype"/>
<xs:complexType name="DatatypeType">
<xs:sequence>
<xs:element name="converter" type="ConverterType" minOccurs="0"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="CharType">
<xs:complexContent>
<xs:extension base="DatatypeType">
<xs:attribute name="length" use="required">
<xs:simpleType>
<xs:restriction base="xs:int">
<xs:minInclusive value="1"/>
<xs:maxInclusive value="32767"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="encoding" type="encoding" use="required"/>
</xs:extension>
</xs:complexContent>
</xs:complexType>
<xs:simpleType name="encoding">
<xs:annotation>
<xs:appinfo>
<jaxb:typesafeEnumClass map="false"/>
</xs:appinfo>
</xs:annotation>
<xs:restriction base="xs:string">
<!--Not supported-->
<!--<xs:enumeration value="NONE"/>-->
<!--<xs:enumeration value="OCTETS"/>-->
<xs:enumeration value="ASCII"/>
<!--We don't want to implement writing only max 3 bytes and correct UNICODE_FSS-->
<!--<xs:enumeration value="UNICODE_FSS"/>-->
<xs:enumeration value="UTF8"/>
<!--TODO Need to know how to handle spaces-->
<!--<xs:enumeration value="SJIS_0208"/>-->
<!--<xs:enumeration value="EUCJ_0208"/>-->
<xs:enumeration value="DOS737"/>
<xs:enumeration value="DOS437"/>
<xs:enumeration value="DOS850"/>
<xs:enumeration value="DOS865"/>
<xs:enumeration value="DOS860"/>
<xs:enumeration value="DOS863"/>
<xs:enumeration value="DOS775"/>
<xs:enumeration value="DOS858"/>
<xs:enumeration value="DOS862"/>
<xs:enumeration value="DOS864"/>
<xs:enumeration value="ISO8859_1"/>
<xs:enumeration value="ISO8859_2"/>
<xs:enumeration value="ISO8859_3"/>
<xs:enumeration value="ISO8859_4"/>
<xs:enumeration value="ISO8859_5"/>
<xs:enumeration value="ISO8859_6"/>
<xs:enumeration value="ISO8859_7"/>
<xs:enumeration value="ISO8859_8"/>
<xs:enumeration value="ISO8859_9"/>
<xs:enumeration value="ISO8859_13"/>
<!--TODO Need to know how to handle spaces-->
<!--<xs:enumeration value="KSC_5601"/>-->
<xs:enumeration value="DOS852"/>
<xs:enumeration value="DOS857"/>
<xs:enumeration value="DOS861"/>
<xs:enumeration value="DOS866"/>
<xs:enumeration value="DOS869"/>
<xs:enumeration value="WIN1250"/>
<xs:enumeration value="WIN1251"/>
<xs:enumeration value="WIN1252"/>
<xs:enumeration value="WIN1253"/>
<xs:enumeration value="WIN1254"/>
<!--TODO Need to know how to handle spaces-->
<!--<xs:enumeration value="BIG_5"/>-->
<!--<xs:enumeration value="GB_2312"/>-->
<xs:enumeration value="WIN1255"/>
<xs:enumeration value="WIN1256"/>
<xs:enumeration value="WIN1257"/>
<xs:enumeration value="KOI8R"/>
<xs:enumeration value="KOI8U"/>
<xs:enumeration value="WIN1258"/>
<xs:enumeration value="TIS620"/>
</xs:restriction>
</xs:simpleType>
<xs:complexType name="FixedPointType">
<xs:complexContent>
<xs:extension base="DatatypeType">
<xs:attribute name="precision" use="required">
<xs:simpleType>
<xs:restriction base="xs:int">
<xs:minInclusive value="1"/>
<xs:maxInclusive value="38"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="scale" use="required">
<xs:simpleType>
<xs:restriction base="xs:int">
<xs:minInclusive value="0"/>
<xs:maxInclusive value="38"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="roundingMode" type="roundingMode" default="HALF_UP"/>
</xs:extension>
</xs:complexContent>
</xs:complexType>
<xs:simpleType name="roundingMode">
<xs:annotation>
<xs:appinfo>
<jaxb:typesafeEnumClass map="false"/>
</xs:appinfo>
</xs:annotation>
<xs:restriction base="xs:string">
<xs:enumeration value="CEILING"/>
<xs:enumeration value="DOWN"/>
<xs:enumeration value="FLOOR"/>
<xs:enumeration value="HALF_DOWN"/>
<xs:enumeration value="HALF_EVEN"/>
<xs:enumeration value="HALF_UP"/>
<xs:enumeration value="UNNECESSARY"/>
<xs:enumeration value="UP"/>
</xs:restriction>
</xs:simpleType>
<xs:complexType name="ConverterType">
<xs:sequence>
<xs:element ref="converterStep"/>
</xs:sequence>
</xs:complexType>
<xs:element name="converterStep" type="ConverterStepType" abstract="true"/>
<xs:element name="parseIntegralNumber" type="ParseIntegralType" substitutionGroup="converterStep"/>
<xs:element name="parseDatetime" type="ParseDatetimeType" substitutionGroup="converterStep"/>
<xs:element name="parseBigDecimal" type="ParseBigDecimalType" substitutionGroup="converterStep"/>
<xs:complexType name="ConverterStepType" abstract="true"/>
<xs:complexType name="ParseIntegralType">
<xs:complexContent>
<xs:extension base="ConverterStepType">
<xs:attribute name="radix" default="10">
<xs:simpleType>
<xs:restriction base="xs:int">
<xs:minInclusive value="2"/>
<xs:maxInclusive value="36"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
</xs:extension>
</xs:complexContent>
</xs:complexType>
<xs:complexType name="ParseDatetimeType">
<xs:complexContent>
<xs:extension base="ConverterStepType">
<xs:attribute name="pattern" type="xs:normalizedString" use="required"/>
<xs:attribute name="locale" type="xs:language"/>
</xs:extension>
</xs:complexContent>
</xs:complexType>
<xs:complexType name="ParseBigDecimalType">
<xs:complexContent>
<xs:extension base="ConverterStepType">
<xs:attribute name="locale" type="xs:language" use="required"/>
</xs:extension>
</xs:complexContent>
</xs:complexType>
<xs:complexType name="EndColumnType">
<xs:attribute name="type" use="required">
<xs:simpleType>
<xs:restriction base="EndColumnEnum">
<xs:enumeration value="LF"/>
<xs:enumeration value="CRLF"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
</xs:complexType>
<xs:simpleType name="EndColumnEnum">
<xs:annotation>
<xs:appinfo>
<jaxb:typesafeEnumClass map="false"/>
</xs:appinfo>
</xs:annotation>
<xs:restriction base="xs:string">
<xs:enumeration value="LF"/>
<xs:enumeration value="CRLF"/>
<xs:enumeration value="NONE"/>
</xs:restriction>
</xs:simpleType>
<xs:simpleType name="ByteOrderEnum">
<xs:annotation>
<xs:appinfo>
<jaxb:typesafeEnumClass map="false"/>
</xs:appinfo>
</xs:annotation>
<xs:restriction base="xs:string">
<xs:enumeration value="BIG_ENDIAN"/>
<xs:enumeration value="LITTLE_ENDIAN"/>
<xs:enumeration value="AUTO"/>
</xs:restriction>
</xs:simpleType>
<xs:complexType name="TableFileType">
<xs:attribute name="path" type="xs:string" use="required"/>
<xs:attribute name="overwrite" type="xs:boolean" default="false"/>
</xs:complexType>
<xs:complexType name="CsvFileType">
<xs:sequence>
<xs:element ref="csvParser" minOccurs="0"/>
</xs:sequence>
<xs:attribute name="path" type="xs:string" use="required"/>
<xs:attribute name="charset" type="xs:string" use="required"/>
<xs:attribute name="headerRow" type="xs:boolean" default="true"/>
</xs:complexType>
<xs:element name="csvParser" type="CsvParserType" abstract="true"/>
<xs:element name="rfc4180CsvParser" type="Rfc4180CsvParserType" substitutionGroup="csvParser"/>
<xs:element name="customCsvParser" type="CustomCsvParserType" substitutionGroup="csvParser"/>
<xs:complexType name="CsvParserType" abstract="true">
<xs:attribute name="quoteChar" type="xs:string"/>
<xs:attribute name="separator" type="xs:string"/>
</xs:complexType>
<xs:complexType name="Rfc4180CsvParserType">
<xs:complexContent>
<xs:extension base="CsvParserType"/>
</xs:complexContent>
</xs:complexType>
<xs:complexType name="CustomCsvParserType">
<xs:complexContent>
<xs:extension base="CsvParserType">
<xs:attribute name="escapeChar" type="xs:string"/>
<xs:attribute name="ignoreLeadingWhiteSpace" type="xs:boolean"/>
<xs:attribute name="ignoreQuotations" type="xs:boolean"/>
<xs:attribute name="strictQuotes" type="xs:boolean"/>
</xs:extension>
</xs:complexContent>
</xs:complexType>
<xs:complexType name="InformationalType">
<xs:sequence>
<xs:element name="ddl" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:schema>
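To relate the schema to a concrete document, the following is a minimal, hand-written sketch of a configuration file that should validate against it. The table name, columns, and file paths are invented for illustration, the schemaVersion simply mirrors the schema’s version attribute, and an actual configuration file is normally generated by ext-table-gen itself.
<?xml version="1.0" encoding="UTF-8"?>
<extTableGenConfig xmlns="https://www.lawinegevaar.nl/xsd/ext-table-gen-1.0.xsd"
                   schemaVersion="2.0">
    <externalTable name="EXAMPLE_TABLE">
        <columns>
            <column name="ID">
                <!-- parse the CSV value as a base-10 integer -->
                <integer>
                    <converter>
                        <parseIntegralNumber radix="10"/>
                    </converter>
                </integer>
            </column>
            <column name="NAME">
                <char length="20" encoding="UTF8"/>
            </column>
            <endColumn type="LF"/>
        </columns>
        <tableFile path="/data/example_table.dat" overwrite="false"/>
    </externalTable>
    <csvFile path="/data/example.csv" charset="UTF-8" headerRow="true">
        <rfc4180CsvParser/>
    </csvFile>
</extTableGenConfig>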