The column element
Columns in C-Phrase correspond exactly to the SQL notion of a column. From the
Products
table from northwind.cphrase
, consider the column element:
<column id="unitprice" type="int" notnull="true" lex="unit price,unit prices,price,prices" units="USD"/>
More abstractly column
elements have the following DTD-based definition:
<!ELEMENT column (value|set_value|max|min|map)*> <!ATTLIST column id CDATA #REQUIRED type CDATA #REQUIRED lex CDATA #IMPLIED select CDATA #IMPLIED properties CDATA #IMPLIED notnull CDATA #IMPLIED default CDATA #IMPLIED units CDATA #IMPLIED alive CDATA #IMPLIED description CDATA #IMPLIED>
id is the exact identifier of the column in the underlying database table (e.g.
unitprice
in table products
).
type is the type of the column. In C-Phrase this may be one of:
text
, int
, real
, memo
, boolean
,
date
, timestamp
or file
.
-
text
covers all the variarations of CHAR arrays, VARCHARs, etc. -
int
covers all the variations of integers, including SERIALS. -
real
covers all the variations of floating point numbers. -
memo
represents CLOBS (Character Large Objects). -
boolean
is boolean and is literally either "true" or "false". -
date
is a standard date expression (e.g. 2019-01-20) -
timestamp
is a DATE TIME representation (e.g. 2019-01-20T12:33:32 or 2019-01-20T12:33) -
file
is actually stored as a string that names a file on the C-Phrase host's filesystem. Note the file type is only used in the stand-alone mode and the file is specifically the subdirectory./static/files
of the configuration directory.
lex is a comma separated set of phrases that are attached to the column. The first phrase gives the preferred singular form (e.g. 'unit price') while the second gives the preferred plural form (e.g. 'unit prices'). Additional phrases after the second can then be specified in any order. We say 'preferred' because C-Phrase will use the first two phrases when generating paraphrases of queries.
select determines whether the column can be selected (in the SQL sense)
via natural language. Defaulting to "true"
, the value "false"
means that select is disabled.
Note if you wish to update attribute values, this must be "true"
. Typically, foreign keys and attributes not
of interest to users will have select turned off.
properties gives a specification of what types of conditions may be posed over the column as well as how references to column values should be handled. This specification is a comma separated sequence of properties with special meanings defined below.
notnull defaults to "false" but, if "true", signals that the value can not
be NULL
in the underlying database.
default gives the SQL-based expression which expresses what the default value of the
column is. In SQLite for literals, this is just the bare value, but for expressions this must be enclosed in parentheses.
So default="2"
for attribute priority, but default="(date('now'))"
for the column posted.
units is the measure associated with a numerical value. The value of "USD" will present the numbers as currency values in US dollars.
Sub-elements
Columns may contain literal values, set values, as well as min/max values as sub-elements. There is also a special, but only rarely used, map element which builds synonyms to column values via SQL executed at load-time from the underlying database.
Column Properties
Much of the nuance in C-Phrase is captured by setting specific properties on columns. Here we describe how to set these properties and what type of behaviors they specify.
All the properties on columns are atomic and are turned on or off via +property
or
-property
respectively. For example:
<column id="date" type="date" properties="+inequalities,+value-stands-for-column,-superlatives"/>
A properties
specification is processed from left to right adding or taking away a given property.
This is all done relative to the default column properies set in the
parameters of an NLI (specifically see the Lexicon parameters).
A special keyword "none" turns off all the properties which may then be followed by a turning some properties
back on. For example:
<column id="id" type="int" properties="none,+value-stands-for-column,+equalities"/>
Condition properties
Condition properties control what type of natural language conditions can be expressed over the column. This usually determines what type of conditions can be built in the WHERE clause of the query, but it also includes whether you can group on certain columns.
- comparisons [>]; lets you specifcy simple conditions of greater than or less than on the column with a literal value over the column. For example in 'length greather than 500', 'name greater than "m", 'date after 2020-01-01'
- equalities [=] lets you specify simple equality condition on the column with literal values. For example 'id equal to 7', 'name = John Smith', 'name is "John Smith"', 'date is 2018-02-23'. Note that in the case of 'name = John Smith', the attribute name will need to be materialized (see below) for C-Phrase to recognize it. Likewise for 'name is "John Smith"', quotes will need to be turned on for the name column for C-Phrase to accept the condition.
- grouping [g] lets you specify that the column can be grouped over. For example if the column name of the table city has grouping on, then C-Phrase would recognize 'number of cities group by state name'.
- inequalities [!=] lets you specify a simple inequality condition on the column with literal values. So for example you could specify 'id not equal to 7'.
- like [l] lets you use the LIKE operator for text columns to test for case sensative text containment. Note that you must quote material to use in a LIKE condition. (e.g. 'restaurant review containing "excellent"')
- nulls [n] (e.g. 'without a picture') lets you use spefify that a column is NULL or is NOT NULL. (e.g. 'restaurant without a review', 'restaurant with a review', 'restaurant review null', etc.)
- quotes [q] lets you specify liter text values by encosing in quotes. So in the case that we did not materialize customer name we could ask for 'name is "Bob Smith"'. This is also used when we want to insert a new record in the database 'add food with name "lemon pie"'.
- sets [{}]
lets users specify a list of literals, including NULL, resulting in an
IN
orNOT IN
SQL condition. For example 'status new, open or null'. - superlatives [s] lets users specify superlative type questions over numeric or temporal columns. (e.g. 'the longest river','the earliest appointment',etc.)
- totalling [t] lets users place conditions on summing aggregations within a question. (e.g. 'customers with transactions amounts totalling more than 1000.') This property is always off by default.
The following table shows the set of condition properties applicable for each type:
NUMERIC (int,real) |
TEXT | BOOLEAN | TEMPORAL (date,timestamp) |
MEMO | FILE |
---|---|---|---|---|---|
comparisons, equalities, grouping, inequalities, nulls, sets, superlatives, totalling |
comparisons, equalities, grouping, inequalities, like, nulls, quotes, sets |
equalities, inequalities, nulls |
comparisons, equalities, grouping, inequalities, nulls, sets, superlatives |
like, nulls |
equalities, inequalities, nulls |
Reference properties
Reference properties determine how columns and their values may be referenced by the user.
- aggregation [a] The presence of this property only makes sense for numerical columns. It indicates that you can ask for things such as sums, averages, mins and maxes. (e.g. 'average population of cities')
- column-stands-for-table [C2T]
allows you to refer to the table by just mentioning this attribute.
A common use of this is when you have a fact table for sales and a date dimension table which
has
quarter
as an attribute. You would like to ask "total revenue sales per quarter" rather than needing to fully specify, "total revenue sales per date-dimension quarter". In this case quarter should have the property+column-stands-for-table
. This is an example of ellipsis in C-Phrase. This property is always off by default. - completions [c] determines if materialized values of a text column will be used in dynamic word completion in the NLI Portal. For large databases you may receive a warning if too many completions are being loaded. In such applications you should turn off completions for every column via the NLI parameters, or you should turn completions off on a column by column process.
- materialization [m] controls whether text-based column values are loaded into C-Phrase at start-up. If so, these literal values may be used by the user without quotes. However, in cases of very large databases, this property often needs to be turned off for certain columns or the administrator must run a larger host that has more RAM.
- menu [menu] makes attribute values appear in a menu under inserting or editing records in the NLI Portal. This property is always off by default.
-
plurals [pl]
adds the plural forms in for materialized values. So if there is an attribute
name
in a table for foods and a value 'taco', one can ask 'calories of tacos'. - project [p] means that the column will be projected in any table type query over the table.
By default this is always on. See for example
Eatings.id
inPIM.cphrase
for a case in which you turn off projection. This property is always on by default. - value-stands-for-column [V2C] allows you to skip naming the column when
specifying conditions. For example value-stands-for-column on the
color
column would allow you say "red boxes" instead of requiring "boxes color red". This is an example of ellipsis in C-Phrase. - value-stands-for-table [V2T]
allows values to
stand for the whole table. For example assume we have an employee table with the attribute
type
which can be either 'permanent' or 'contractor'. If the attribute type is undervalue-stands-for-table
then one can leave out the table employee in questions. Thus you could ask "contractors hired after 2020-01-01". This is an example of ellipsis in C-Phrase. This property is always off by default.
The following table shows the set of reference properties applicable for each type:
NUMERIC (int,real) |
TEXT | BOOLEAN | TEMPORAL (date,timestamp) |
MEMO | FILE |
---|---|---|---|---|---|
aggregation, column-stands-for-table, project, value-stands-for-column |
column-stands-for-table, completions, materialization, menu, plurals, project, value-stands-for-column, value-stands-for-table |
project |
column-stands-for-table, project, value-stands-for-column |
column-stands-for-table, project |
column-stands-for-table, project, value-stands-for-column |