NLI Parameters
The .py file that sits in a configuration directory specifies the parameter settings of an NLI. Parameters are grouped into six types: Logging, NLU Processing, SQL, NLI Portal, Lexicon and Open AI. Here is the initial .py file that is created for a standalone configuration:
# *** Logging *** cphrase.log_to_file = True # print log messages to .log file cphrase.log_sql = False # log all SQL queries/answers to/from database cphrase.log_nlu = False # deeply log natural language analysis cphrase.log_sql_idm = True # Log all SQL insert,deletes and updates to special file. # *** NLI Portal Settings *** cphrase.intro_message = '<p> Welcome to the NLI Portal.</p>' cphrase.batch_loader = False # Does a batch loader appear on the NLI Portal cphrase.sql_names_in_explorer = False # present SQL literals for the names of relations and attributes. cphrase.rows_displayed_per_page = 10 # The maximum number of rows displayed for each page cphrase.always_offer_csv_exports = True # always generate .csv files for table answers. cphrase.show_sql = True # Can the user inspect SQL in the NLI Portal cphrase.update_tuple = True # Does the user get the chance to update tuples in relation selects. cphrase.data_column_width_limit = 200 # Maximum size of string presented in tables. This needs to be high for HTML links. # *** SQL *** cphrase.table_limit = None # The maximum number of rows displayed in table result cphrase.list_limit = None # The maximum number of values displayed in list result cphrase.groups_limit = 200 # The limit on number of chart items cphrase.distinct_on_tuple_queries = True # Make generated SQL over tuple queries DISTINCT. cphrase.bracket_tables_and_cols = True # Should all SQL use bracketing for table and attribute names. cphrase.generate_fk_names = True # Does this system attempt to reach to generate full names for foreign keys in SQL cphrase.self_joins = True # Can the generated SQL include multiple aliased mentions of the same table. cphrase.hard_foreign_keys = True # Assume that foreign keys always hold. cphrase.allow_dangling_variables = True # Allow empty foreign key following # *** NLU Processing *** cphrase.nlu_timeout = None # The maximum number of seconds of analysis for a user utterance # -- word-ordering sensitivity cphrase.command_can_bind_to_any_table = False # SELECT *, UPDATE, INSERT and DELETE can match any table in utterance cphrase.command_can_bind_to_any_column = False # SELECT, UPDATE can match any column in utterance cphrase.operator_can_bind_to_any_table = False # NOT, EVERY, etc can match any table in utterance cphrase.operator_can_bind_to_any_column = False # GROUP BY, ORDER BY, etc. can match any attribute in utterance cphrase.value_can_bind_to_any_column = False # Values (e.g. 'red',12,etc) can match any column in utterance # *** Lexicon *** cphrase.default_column_select = True # are columns, by default, individually SELECTable in the SQL sense. cphrase.foreign_keys_selectable = False # Are foreign keys able to be selected. cphrase.plural_columns_and_tables = True # Generate default plural forms for tables and columns cphrase.lex_shadows_element = False # When elements have lexes, they shadow the underlying element (table, attribute) name. cphrase.default_focus_table = None # The default relation users are asking about. For example 'items' in e-commerce. cphrase.us_date_format = False # Uses US type date format. mm/dd/yyyy and mm/dd/yy # -- Default column condition properties cphrase.default_column_comparisons = True cphrase.default_column_equalities = True cphrase.default_column_grouping = True cphrase.default_column_inequalities = True cphrase.default_column_text_like = False cphrase.default_column_memo_like = True cphrase.default_column_nulls = True cphrase.default_column_quotes = False cphrase.default_column_sets = True cphrase.default_column_superlatives = True # -- Default column reference properties cphrase.default_column_aggregation = True cphrase.default_column_completions = True cphrase.default_column_materialization = True cphrase.default_plural_values = True # Generate plural values as a part of lexicon cphrase.default_text_value_stands_for_column = True cphrase.default_memo_value_stands_for_column = False cphrase.default_numeric_value_stands_for_column = False cphrase.default_temporal_value_stands_for_column = True cphrase.lex = {'AFTER': ['after', 'later than'], 'AGO': ['ago', 'earlier', 'in past', 'past'], 'AND': ['and'], 'ANSWER_NUMBER': ['Number:'], 'AT_LEAST_CARD': ['at least'], 'AT_MOST_CARD': ['at most'], 'AVERAGE': ['average'], 'BE': ['is', 'does', 'do'], 'BEFORE': ['before', 'earlier than'], 'BETWEEN': ['between', 'range', 'from'], 'BETWEEN_SEPERATOR': ['and', 'to', ''], 'CLAUSE_MARKERS': ['that', 'which', 'who'], 'COUNT': ['how many', 'number', 'count', 'total'], 'DATE': ['date', 'day'], 'DAY': ['days', 'day'], 'DECREASING_ORDER_BY': ['decreasing'], 'DELETE': ['delete', 'remove', 'drop', 'forget'], 'EQUAL_TO': ['equal to', '=', 'exactly'], 'EVERY': ['every', 'always'], 'EXACTLY_CARD': ['exactly'], 'EXISTS': ['exists'], 'FALSE': ['false'], 'FUTURE': ['from now', 'in future', 'future'], 'GREATER_THAN': ['more than', 'greater than', 'more', 'over', '>'], 'GROUP_BY': ['group by', 'grouped by', 'group', 'by', 'per'], 'HOUR': ['hour', 'hours'], 'INCREASING_ORDER_BY': ['increasing'], 'INSERT': ['insert', 'add'], 'LESS_CARD': ['less than'], 'LESS_THAN': ['less than', 'less', 'fewer than', 'under', '<'], 'MATCHING': ['containing', 'like', 'matching', 'contains', 'contain', 'including', 'includes'], 'MAX': ['maximum', 'max'], 'MAX_NUMERICAL': ['largest', 'greatest', 'biggest', 'highest'], 'MAX_TIME': ['latest'], 'MIN': ['minimum', 'min'], 'MINUTE': ['minutes', 'min', 'minute'], 'MIN_NUMERICAL': ['smallest', 'lowest', 'least'], 'MIN_TIME': ['earliest', 'least recent', 'oldest'], 'MONTH': ['month', 'months'], 'MORE_CARD': ['more than'], 'NOT': ['not', 'no', 'without', 'non'], 'NOT_EQUAL_TO': ['not equal to', '<>', 'not equal', 'not exactly', 'not'], 'NOT_EXACTLY_CARD': ['not exactly'], 'NOT_EXISTS': ['without', 'not', 'no'], 'NOT_NULL': ['not null', 'provided', 'defined'], 'NOT_VALUE': ['not'], 'NOW': ['now', 'just now'], 'NO_ANSWERS': ['The database does not contain:'], 'NULL': ['unknown', 'null', 'nothing', 'no value', 'without', 'missing', 'no'], 'OF': ['of'], 'OR': ['or'], 'ORDER_BY': ['order by', 'order on', 'ordered by', 'ordered on', 'sort by', 'sort on', 'sorted by', 'sorted on', 'order', 'sort'], 'STOP': ['a', 'about', 'all', 'an', 'and', 'any', 'are', 'be', 'by', 'can', 'could', 'did', 'do', 'does', 'done', 'each', 'exist', 'for', 'from', 'has', 'have', 'i', 'in', 'into', 'is', 'it', 'me', 'me', 'much', 'of', 'on', 'or', 'other', 'please', 'show', 'tell', 'to', 'that', 'the', 'them', 'there', 'those', 'through', 'umm', 'was', 'were', 'where', 'which', 'whose', 'with', 'would', 'you'], 'SUM': ['total', 'sum', 'combined'], 'TO': ['to'], 'TODAY': ['today'], 'TOMORROW': ['tomorrow'], 'TOTALLING_EXACTLY': ['totalling exactly'], 'TOTALLING_OVER': ['totalling over'], 'TOTALLING_UNDER': ['totalling under'], 'TRUE': ['true'], 'UPDATE': ['set', 'update', 'change', 'modify', 'edit'], 'WEEK': ['weeks', 'week'], 'WH': ['what', 'which', 'give', 'tell', 'list', 'show', 'select'], 'WITH': ['with'], 'YEAR': ['year', 'years'], 'YESTERDAY': ['yesterday']}
Logging
These parameters control what types of logs are built as an NLI runs. These log file can be searched and processed
in the
edit page of the Admin Interface. The parameter log_to_file
turns on logging and log_sql
and log_nlu
will add further details to the log. The parameter log_sql_idm
logs databade modifications.
When this is on, the SQL for INSERT,UPDATE and DELETE commands will be
logged to a file
that, if needed, can later be executed as a script. This is useful in the data quality use case of C-Phrase where users
fix data in the database.
NLI Portal
These parameters control aspects of the NLI Portal.
The parameter intro_message
is the HTML that is displayed in the main results area when the user first
enters the NLI portal. The parameter batch_loader
determines if the user will be able to open the batch loader
modal. This is useful if the user plans on pasting in material that they can then massage into insert command form
to build a database (see for example this video). The parameter
sql_names_in_explorer
determines if just raw table names and attributes
should be used in the explorer. rows_displayed_per_page
is the default page size in C-Phrase table pagination.
The parameter always_offer_csv_exports
lets users always be able to download results as a CSV. The
parameter show_sql
determines if the user can always inspect the generated SQL. The parameter
update_tuple
enables, if there are write permission on a table, a button that lets users edit tuples. Finally the
parameter data_column_width_limit
sets the maximum visible with of text material as values in tables.
SQL
The parameters table_limit
and
list_limit
put SQL limits on the sizes of tables and lists respectively.
For larger databases it is critical that these
be set. Otherwise, answers might be too large and queries will need to be canceled. The parameter
groups_limit
will limit the number of values on the x-axis of charts.
The parameter distinct_on_tuple_queries
will apply SQL distinct on table queries.
The parameter bracket_tables_and_cols
encloses table and column names in the SQL with the
engine specific strategy to allow for tables and columns with spaces or being SQL keywords. Although it slightly
clutters the generated SQL, we highly recommend that you keep this parameter set to true. The parameter
generate_fk_names
will instruct C-Phrase to expand selects of foreign keys on tables to join through
and display primary values if they are declared. This gives users a much more readable experience.
The parameter hard_foreign_keys
determines that on database rebuilds, foreign keys will be defined.
The parameter self_joins
lets C-Phrase search for interpretations that include more than
one mention of a given table in the resulting SQL. If self-joining is false, then C-Phrase will
not search for these more complex interpretations. The advantage of that is less likelihood of
spurious ambiguity. This is sufficient for many simple applications. Still for many others there is a need
for SQL to be self-joining. Finally, the parameter allow_dangling_variables
will allow SQL to include
dangling references to tuples that strictly speaking are always true based on foreign keys.
NLU Processing
The parameters of this type control the natural language understanding process. The parameter nlu_timeout
sets a total amount of time C-Phrase has to analyse and run a query.
You may specify the rigidity of the word ordering in the question or command. The more rigid, the less likelihood of spurious ambiguity. However, the more rigid the higher likelihood that no match is found for the user's question or command. The defaults setting is to go for rigidity. You can experiment with alternative settings, but it is likely that you will not need to change these parameters.
Lexicon
These parameters determine how C-Phrase builds its internal lexicon. The parameter default_column_select
determines if columns are selectable by default. The parameter foreign_keys_selectable
will determine
if foreign keys are selectable. The parameter plural_columns_and_tables
will attempt to default English plural endings
on automatically generated names from tables and columns. The parameter lex_shadows_element
covers
a special case when you do not want the underlying table or column name to be recognized once you specify
lexical material. This only rarely is required. The parameter default_focus_table
lets
you specify that, by default questions or commands are over a given table. This is useful in one-table domains
and lets the user skip always having to mention the table name. Finally, The parameter us_date_format
will use the US dates rather than the default
ISO Date format.
The default condition and reference properties specify the default
setting of properties on columns. These are overriden on a column by column basis in the actual
column elements (see). Finally the lex
property is
the dictionary of function words in the
underlying language. You can restrict or extend the language by editing this dictionary.