NLI Parameters

The .py file that sits in a configuration directory specifies the parameter settings of an NLI. Parameters are grouped into six types: Logging, NLU Processing, SQL, NLI Portal, Lexicon and Open AI. Here is the initial .py file that is created for a standalone configuration:


# *** Logging ***

cphrase.log_to_file = True  # print log messages to .log file
cphrase.log_sql = False  # log all SQL queries/answers to/from database
cphrase.log_nlu = False  # deeply log natural language analysis
cphrase.log_sql_idm = True # Log all SQL insert,deletes and updates to special file.


# *** NLI Portal Settings ***

cphrase.intro_message = '<p> Welcome to the NLI Portal.</p>'
cphrase.batch_loader = False # Does a batch loader appear on the NLI Portal
cphrase.sql_names_in_explorer = False # present SQL literals for the names of relations and attributes.
cphrase.rows_displayed_per_page = 10  # The maximum number of rows displayed for each page
cphrase.always_offer_csv_exports = True # always generate .csv files for table answers.
cphrase.show_sql = True # Can the user inspect SQL in the NLI Portal
cphrase.update_tuple = True # Does the user get the chance to update tuples in relation selects.
cphrase.data_column_width_limit = 200 # Maximum size of string presented in tables. This needs to be high for HTML links.


# *** SQL ***

cphrase.table_limit = None  # The maximum number of rows displayed in table result
cphrase.list_limit = None  # The maximum number of values displayed in list result
cphrase.groups_limit = 200 # The limit on number of chart items
cphrase.distinct_on_tuple_queries = True # Make generated SQL over tuple queries DISTINCT.
cphrase.bracket_tables_and_cols = True  # Should all SQL use bracketing for table and attribute names.
cphrase.generate_fk_names = True  # Does this system attempt to reach to generate full names for foreign keys in SQL
cphrase.self_joins = True  # Can the generated SQL include multiple aliased mentions of the same table.
cphrase.hard_foreign_keys = True # Assume that foreign keys always hold.
cphrase.allow_dangling_variables = True # Allow empty foreign key following


# *** NLU Processing ***

cphrase.nlu_timeout = None  # The maximum number of seconds of analysis for a user utterance

# -- word-ordering sensitivity
cphrase.command_can_bind_to_any_table = False  # SELECT *, UPDATE, INSERT and DELETE can match any table in utterance
cphrase.command_can_bind_to_any_column = False # SELECT, UPDATE can match any column in utterance
cphrase.operator_can_bind_to_any_table = False # NOT, EVERY, etc can match any table in utterance
cphrase.operator_can_bind_to_any_column = False # GROUP BY, ORDER BY, etc. can match any attribute in utterance
cphrase.value_can_bind_to_any_column = False # Values (e.g. 'red',12,etc) can match any column in utterance


# *** Lexicon ***

cphrase.default_column_select = True # are columns, by default, individually SELECTable in the SQL sense.
cphrase.foreign_keys_selectable = False  # Are foreign keys able to be selected.
cphrase.plural_columns_and_tables = True  # Generate default plural forms for tables and columns
cphrase.lex_shadows_element = False # When elements have lexes, they shadow the underlying element (table, attribute) name.
cphrase.default_focus_table = None  # The default relation users are asking about. For example 'items' in e-commerce.
cphrase.us_date_format = False # Uses US type date format. mm/dd/yyyy and mm/dd/yy

# -- Default column condition properties

cphrase.default_column_comparisons = True
cphrase.default_column_equalities = True
cphrase.default_column_grouping = True
cphrase.default_column_inequalities = True
cphrase.default_column_text_like = False
cphrase.default_column_memo_like = True
cphrase.default_column_nulls = True
cphrase.default_column_quotes = False
cphrase.default_column_sets = True
cphrase.default_column_superlatives = True

# -- Default column reference properties

cphrase.default_column_aggregation = True
cphrase.default_column_completions = True
cphrase.default_column_materialization = True
cphrase.default_plural_values = True # Generate plural values as a part of lexicon
cphrase.default_text_value_stands_for_column = True
cphrase.default_memo_value_stands_for_column = False
cphrase.default_numeric_value_stands_for_column = False
cphrase.default_temporal_value_stands_for_column = True

cphrase.lex = {'AFTER': ['after', 'later than'],
 'AGO': ['ago', 'earlier', 'in past', 'past'],
 'AND': ['and'],
 'ANSWER_NUMBER': ['Number:'],
 'AT_LEAST_CARD': ['at least'],
 'AT_MOST_CARD': ['at most'],
 'AVERAGE': ['average'],
 'BE': ['is', 'does', 'do'],
 'BEFORE': ['before', 'earlier than'],
 'BETWEEN': ['between', 'range', 'from'],
 'BETWEEN_SEPERATOR': ['and', 'to', ''],
 'CLAUSE_MARKERS': ['that', 'which', 'who'],
 'COUNT': ['how many', 'number', 'count', 'total'],
 'DATE': ['date', 'day'],
 'DAY': ['days', 'day'],
 'DECREASING_ORDER_BY': ['decreasing'],
 'DELETE': ['delete', 'remove', 'drop', 'forget'],
 'EQUAL_TO': ['equal to', '=', 'exactly'],
 'EVERY': ['every', 'always'],
 'EXACTLY_CARD': ['exactly'],
 'EXISTS': ['exists'],
 'FALSE': ['false'],
 'FUTURE': ['from now', 'in future', 'future'],
 'GREATER_THAN': ['more than', 'greater than', 'more', 'over', '>'],
 'GROUP_BY': ['group by', 'grouped by', 'group', 'by', 'per'],
 'HOUR': ['hour', 'hours'],
 'INCREASING_ORDER_BY': ['increasing'],
 'INSERT': ['insert', 'add'],
 'LESS_CARD': ['less than'],
 'LESS_THAN': ['less than', 'less', 'fewer than', 'under', '<'],
 'MATCHING': ['containing',
              'like',
              'matching',
              'contains',
              'contain',
              'including',
              'includes'],
 'MAX': ['maximum', 'max'],
 'MAX_NUMERICAL': ['largest', 'greatest', 'biggest', 'highest'],
 'MAX_TIME': ['latest'],
 'MIN': ['minimum', 'min'],
 'MINUTE': ['minutes', 'min', 'minute'],
 'MIN_NUMERICAL': ['smallest', 'lowest', 'least'],
 'MIN_TIME': ['earliest', 'least recent', 'oldest'],
 'MONTH': ['month', 'months'],
 'MORE_CARD': ['more than'],
 'NOT': ['not', 'no', 'without', 'non'],
 'NOT_EQUAL_TO': ['not equal to', '<>', 'not equal', 'not exactly', 'not'],
 'NOT_EXACTLY_CARD': ['not exactly'],
 'NOT_EXISTS': ['without', 'not', 'no'],
 'NOT_NULL': ['not null', 'provided', 'defined'],
 'NOT_VALUE': ['not'],
 'NOW': ['now', 'just now'],
 'NO_ANSWERS': ['The database does not contain:'],
 'NULL': ['unknown', 'null', 'nothing', 'no value', 'without', 'missing', 'no'],
 'OF': ['of'],
 'OR': ['or'],
 'ORDER_BY': ['order by',
              'order on',
              'ordered by',
              'ordered on',
              'sort by',
              'sort on',
              'sorted by',
              'sorted on',
              'order',
              'sort'],
 'STOP': ['a',
          'about',
          'all',
          'an',
          'and',
          'any',
          'are',
          'be',
          'by',
          'can',
          'could',
          'did',
          'do',
          'does',
          'done',
          'each',
          'exist',
          'for',
          'from',
          'has',
          'have',
          'i',
          'in',
          'into',
          'is',
          'it',
          'me',
          'me',
          'much',
          'of',
          'on',
          'or',
          'other',
          'please',
          'show',
          'tell',
          'to',
          'that',
          'the',
          'them',
          'there',
          'those',
          'through',
          'umm',
          'was',
          'were',
          'where',
          'which',
          'whose',
          'with',
          'would',
          'you'],
 'SUM': ['total', 'sum', 'combined'],
 'TO': ['to'],
 'TODAY': ['today'],
 'TOMORROW': ['tomorrow'],
 'TOTALLING_EXACTLY': ['totalling exactly'],
 'TOTALLING_OVER': ['totalling over'],
 'TOTALLING_UNDER': ['totalling under'],
 'TRUE': ['true'],
 'UPDATE': ['set', 'update', 'change', 'modify', 'edit'],
 'WEEK': ['weeks', 'week'],
 'WH': ['what', 'which', 'give', 'tell', 'list', 'show', 'select'],
 'WITH': ['with'],
 'YEAR': ['year', 'years'],
 'YESTERDAY': ['yesterday']}


Logging

These parameters control what types of logs are built as an NLI runs. These log file can be searched and processed in the edit page of the Admin Interface. The parameter log_to_file turns on logging and log_sql and log_nlu will add further details to the log. The parameter log_sql_idm logs databade modifications. When this is on, the SQL for INSERT,UPDATE and DELETE commands will be logged to a file that, if needed, can later be executed as a script. This is useful in the data quality use case of C-Phrase where users fix data in the database.

NLI Portal

These parameters control aspects of the NLI Portal. The parameter intro_message is the HTML that is displayed in the main results area when the user first enters the NLI portal. The parameter batch_loader determines if the user will be able to open the batch loader modal. This is useful if the user plans on pasting in material that they can then massage into insert command form to build a database (see for example this video). The parameter sql_names_in_explorer determines if just raw table names and attributes should be used in the explorer. rows_displayed_per_page is the default page size in C-Phrase table pagination. The parameter always_offer_csv_exports lets users always be able to download results as a CSV. The parameter show_sql determines if the user can always inspect the generated SQL. The parameter update_tuple enables, if there are write permission on a table, a button that lets users edit tuples. Finally the parameter data_column_width_limit sets the maximum visible with of text material as values in tables.

SQL

The parameters table_limit and list_limit put SQL limits on the sizes of tables and lists respectively. For larger databases it is critical that these be set. Otherwise, answers might be too large and queries will need to be canceled. The parameter groups_limit will limit the number of values on the x-axis of charts. The parameter distinct_on_tuple_queries will apply SQL distinct on table queries. The parameter bracket_tables_and_cols encloses table and column names in the SQL with the engine specific strategy to allow for tables and columns with spaces or being SQL keywords. Although it slightly clutters the generated SQL, we highly recommend that you keep this parameter set to true. The parameter generate_fk_names will instruct C-Phrase to expand selects of foreign keys on tables to join through and display primary values if they are declared. This gives users a much more readable experience. The parameter hard_foreign_keys determines that on database rebuilds, foreign keys will be defined. The parameter self_joins lets C-Phrase search for interpretations that include more than one mention of a given table in the resulting SQL. If self-joining is false, then C-Phrase will not search for these more complex interpretations. The advantage of that is less likelihood of spurious ambiguity. This is sufficient for many simple applications. Still for many others there is a need for SQL to be self-joining. Finally, the parameter allow_dangling_variables will allow SQL to include dangling references to tuples that strictly speaking are always true based on foreign keys.

NLU Processing

The parameters of this type control the natural language understanding process. The parameter nlu_timeout sets a total amount of time C-Phrase has to analyse and run a query.

You may specify the rigidity of the word ordering in the question or command. The more rigid, the less likelihood of spurious ambiguity. However, the more rigid the higher likelihood that no match is found for the user's question or command. The defaults setting is to go for rigidity. You can experiment with alternative settings, but it is likely that you will not need to change these parameters.

Lexicon

These parameters determine how C-Phrase builds its internal lexicon. The parameter default_column_select determines if columns are selectable by default. The parameter foreign_keys_selectable will determine if foreign keys are selectable. The parameter plural_columns_and_tables will attempt to default English plural endings on automatically generated names from tables and columns. The parameter lex_shadows_element covers a special case when you do not want the underlying table or column name to be recognized once you specify lexical material. This only rarely is required. The parameter default_focus_table lets you specify that, by default questions or commands are over a given table. This is useful in one-table domains and lets the user skip always having to mention the table name. Finally, The parameter us_date_format will use the US dates rather than the default ISO Date format.

The default condition and reference properties specify the default setting of properties on columns. These are overriden on a column by column basis in the actual column elements (see). Finally the lex property is the dictionary of function words in the underlying language. You can restrict or extend the language by editing this dictionary.