Aggregation Browsing Backends

Backends for browsing aggregates of various data sources

SQL

The SQL backend uses SQLAlchemy for generating queries. It supports all databases that SQLAlchemy supports, such as:

  • Drizzle
  • Firebird
  • Informix
  • Microsoft SQL Server
  • MySQL
  • Oracle
  • PostgreSQL
  • SQLite
  • Sybase

To create a SQL workspace use:

workspace = cubes.create_workspace("sql", model, url="postgres://localhost/database")
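
A slightly fuller sketch, assuming a model description in a file model.json; the file name, database URL and cube name are illustrative:

import cubes

# "model.json" and the cube name "sales" are illustrative
model = cubes.load_model("model.json")
workspace = cubes.create_workspace("sql", model,
                                   url="postgresql://localhost/database")
browser = workspace.browser(model.cube("sales"))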

Workspace

cubes.backends.sql.workspace.create_workspace(model, **options)

Create a workspace for model with configuration given in the options dictionary. This method is used by the slicer server.

The options are:

Required (one of url or engine, where engine takes precedence):

  • url - database URL in form of: backend://user:password@host:port/database
  • sqlalchemy_options - this backend accepts options for SQLAlchemy in the form: option1=value1[&option2=value2]...
  • engine - SQLAlchemy engine - either this or URL should be provided

Optional:

  • schema - default schema, where all tables are located (if not explicitly stated otherwise)
  • fact_prefix - used by the snowflake mapper to find fact table for a cube, when no explicit fact table name is specified
  • dimension_prefix - used by snowflake mapper to find dimension tables when no explicit mapping is specified
  • dimension_schema - schema where dimension tables are stored, if different from the common schema

Options for denormalized views:

  • use_denormalization - browser will use a denormalized view instead of the snowflake schema
  • denormalized_view_prefix - if denormalization is used, then this prefix is added for cube name to find corresponding cube view
  • denormalized_view_schema - schema where denormalized views are located (use this if the views are in a different schema than the fact tables, otherwise the default schema is going to be used)
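
For example, a workspace configured directly in Python (the schema and prefix values are illustrative):

from cubes.backends.sql.workspace import create_workspace

workspace = create_workspace(model,
                             url="postgresql://localhost/database",
                             schema="analytics",
                             fact_prefix="ft_",
                             dimension_prefix="dm_")
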
class cubes.backends.sql.workspace.SQLStarWorkspace(model, engine, **options)

Create a workspace. For a description of the options see create_workspace().

browser(cube, locale=None)

Returns a browser for a cube.

create_conformed_rollup(cube, dimension, level=None, hierarchy=None, schema=None, dimension_prefix=None, replace=False)

Extracts dimension values at a certain level into a separate table. The new table name is composed of dimension_prefix, the dimension name and the dimension level as a suffix. For example, a product dimension at category level with prefix dim_ will be called dim_product_category.

Attributes:

  • dimension – dimension to be extracted
  • level – grain level
  • hierarchy – hierarchy to be used
  • schema – target schema
  • dimension_prefix – prefix used for the dimension table
  • replace – if True then existing table will be replaced, otherwise an exception is raised if table already exists.
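
For example, a sketch using the product dimension mentioned above (the dimension and prefix names are illustrative):

# creates a table named dim_product_category (see the naming rule above)
dim = cube.dimension("product")
workspace.create_conformed_rollup(cube, dim, level="category",
                                  dimension_prefix="dim_", replace=True)
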
create_conformed_rollups(cube, dimensions, grain=None, schema=None, dimension_prefix=None, replace=False)

Extract multiple dimensions from a snowflake. See create_conformed_rollup() for more information. grain is a dictionary where keys are dimension names and values are levels; if a level is None then all levels of that dimension are considered.

create_cube_aggregate(cube, table_name=None, dimensions=None, required_dimensions=None, schema=None, replace=False)

Creates an aggregate table. If dimensions is None then all of the cube’s dimensions are considered.

Arguments:

  • dimensions: list of dimensions to use in the aggregated cuboid, if None then all cube dimensions are used
  • required_dimensions: list of dimensions that are required for each aggregation (for example a date dimension in most cases). The list should be a subset of dimensions.
  • aggregates_prefix: aggregated table prefix
  • aggregates_schema: schema where aggregates are stored
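
A minimal sketch (the dimension names are illustrative):

workspace.create_cube_aggregate(cube,
                                dimensions=["date", "product"],
                                required_dimensions=["date"],
                                replace=True)
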
create_denormalized_view(cube, view_name=None, materialize=False, replace=False, create_index=False, keys_only=False, schema=None)

Creates a denormalized view named view_name of a cube. If view_name is None then the view name is constructed by prepending the value of denormalized_view_prefix from the workspace options to the cube name. If no prefix is specified in the options, then the view name will be equal to the cube name.

Options:

  • materialize - whether the view is materialized (a table) or regular view
  • replace - if True then existing table/view will be replaced, otherwise an exception is raised when trying to create view/table with already existing name
  • create_index - if True then an index is created for each key attribute. Can be used only on a materialized view, otherwise an exception is raised
  • keys_only - if True then only key attributes are used in the view, all other detail attributes are ignored
  • schema - target schema of the denormalized view; if not specified, denormalized_view_schema from the options is used if present, otherwise the default workspace schema is used (the same schema as the fact table schema)
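
For example, a sketch creating a materialized, indexed view:

workspace.create_denormalized_view(cube,
                                   materialize=True,
                                   replace=True,
                                   create_index=True)
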
validate_model()

Validate the physical representation of the model. Returns a list of dictionaries with keys: type, issue, object.

Types might be: join or attribute.

The join issues are:

  • no_table - there is no table for join
  • duplicity - either table or alias is specified more than once

The attribute issues are:

  • no_table - there is no table for attribute
  • no_column - there is no column for attribute
  • duplicity - attribute is found more than once
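
For instance, the issues might be reported like this:

for issue in workspace.validate_model():
    print("%s: %s (%s)" % (issue["type"], issue["issue"], issue["object"]))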

Browser

class cubes.backends.sql.star.SnowflakeBrowser(cube, connectable=None, locale=None, metadata=None, debug=False, **options)

SnowflakeBrowser is a SQL-based AggregationBrowser implementation that can aggregate star and snowflake schemas without the need for an explicit view or a physical denormalized table.

Attributes:

  • cube - browsed cube
  • connectable - SQLAlchemy connectable object (engine or connection)
  • locale - locale used for browsing
  • metadata - SQLAlchemy MetaData object
  • debug - output SQL to the logger at INFO level
  • options - passed to the mapper and context (see their respective documentation)

Tuning:

  • include_summary - if True then a summary is included in the aggregation result. Turned on by default.
  • include_cell_count – if True then the total cell count is included in the aggregation result. Turned on by default; might be turned off for performance reasons

Limitations:

  • only one locale can be used for browsing at a time
  • locale is implemented as denormalized: one column for each language

aggregate(cell=None, measures=None, drilldown=None, attributes=None, page=None, page_size=None, order=None, include_summary=None, include_cell_count=None, **options)

Return aggregated result.

Arguments:

  • cell: cell to be aggregated
  • measures: list of measures to be considered in aggregation
  • drilldown: list of dimensions or list of tuples: (dimension, hierarchy, level)
  • attributes: list of attributes from drilled-down dimensions to be returned in the result

Query tuning:

  • include_cell_count: if True (default) then result.total_cell_count is computed as well, otherwise it will be None.
  • include_summary: if True (default) then summary is computed, otherwise it will be None

Result is paginated by page_size and ordered by order.

Number of database queries:

  • without drill-down: 1 – summary
  • with drill-down (default): 3 – summary, drilldown, total drill-down record count

Notes:

  • measures can only come from the fact table
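
A short usage sketch (the dimension name and path values are illustrative):

from cubes import Cell, PointCut

# restrict the cell to year 2012 and drill down by the date dimension
cell = Cell(browser.cube, cuts=[PointCut("date", [2012])])
result = browser.aggregate(cell, drilldown=["date"])

print(result.summary)
for row in result.cells:
    print(row)
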
fact(key_value)

Get a single fact with key key_value from cube.

Number of SQL queries: 1.

facts(cell, order=None, page=None, page_size=None)

Return all facts from cell; the result might be ordered and paginated.

Number of SQL queries: 1.
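
For example (the key value and paging are illustrative):

fact = browser.fact(123)                              # single fact by key
listing = browser.facts(cell, page=0, page_size=100)  # paginated listing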

path_details(dimension, path, hierarchy=None)

Returns details for path in dimension. Can be used for multi-dimensional “breadcrumbs” in a user interface.

Number of SQL queries: 1.
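
A hypothetical call, assuming a date dimension with year and month levels:

details = browser.path_details(cube.dimension("date"), [2012, 1])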

set_locale(locale)

Change the browser’s locale.

validate()

Validate the physical representation of the model. Returns a list of dictionaries with keys: type, issue, object.

Types might be: join or attribute.

The join issues are:

  • no_table - there is no table for join
  • duplicity - either table or alias is specified more than once

The attribute issues are:

  • no_table - there is no table for attribute
  • no_column - there is no column for attribute
  • duplicity - attribute is found more than once

values(cell, dimension, depth=None, hierarchy=None, page=None, page_size=None, order=None, **options)

Return values for dimension at the level given by depth. If depth is None, all levels are returned.

Number of database queries: 1.
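
For instance (the dimension is given by name here for illustration):

# list members of the date dimension down to the second level
for record in browser.values(cell, "date", depth=2):
    print(record)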

class cubes.backends.sql.star.QueryContext(cube, mapper, metadata, **options)

Object providing context for constructing queries. It puts together the mapper and the physical structure. The mapper is used for mapping logical to physical attributes and performing joins. metadata is a sqlalchemy.MetaData instance used for getting physical table representations.

Object attributes:

  • fact_table – the physical fact table - sqlalchemy.Table instance
  • tables – a dictionary mapping table references (schema, table) or (schema, alias) to real tables – sqlalchemy.Table instances

Note

To get results as a dictionary, you should zip() the returned rows after statement execution with:

labels = [column.name for column in statement.columns]
...
record = dict(zip(labels, row))

This is a small overhead that works around SQLAlchemy behaviour with the SQLite database: the SQLite engine does not respect dots in column names, which results in a “duplicate column name” error.
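
A fuller sketch of the pattern, assuming an SQLAlchemy engine and a statement obtained from one of the methods below (engine.execute() is the pre-2.0 SQLAlchemy API):

result = engine.execute(statement)
labels = [column.name for column in statement.columns]
for row in result:
    record = dict(zip(labels, row))
    print(record)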

aggregation_statement(cell, measures=None, attributes=None, drilldown=None)

Return a statement for summarized aggregation. whereclause is the same as the SQLAlchemy whereclause for sqlalchemy.sql.expression.select(). attributes is a list of logical references to attributes to be selected. If it is None then all attributes are used. drilldown has to be a dictionary. Use levels_from_drilldown() to prepare a correct drill-down statement.

aggregations_for_measure(measure)

Returns a list of (SQLAlchemy) aggregation functions on measure columns. The result columns are labeled as measure + _ + aggregation, for example: amount_sum or discount_min.

measure has to be an Attribute instance.

If the measure has no explicit aggregations associated with it, then sum is assumed.

boundary_condition(dim, hierarchy, path, bound, first=None)

Return a Condition tuple for a boundary condition. If bound is 1 then path is considered to be the upper bound (operators < and <= are used), otherwise path is considered the lower bound (operators > and >= are used).

column(attribute, locale=None)

Return a column object for attribute. locale is the explicit locale to be used. If not specified, then the current browsing/mapping locale is used for localizable attributes.

columns(attributes, expand_locales=False)

Returns a list of columns. If expand_locales is True, then one column per attribute locale is added.

condition_for_cell(cell)

Constructs conditions for all cuts in the cell. Returns a named tuple with keys:

  • condition - SQL conditions
  • attributes - attributes that are involved in the conditions. This should be used for join construction.
  • group_by - attributes used for GROUP BY expression

condition_for_point(dim, path, hierarchy=None)

Returns a Condition tuple (attributes, conditions, group_by) for dimension dim point at path. It is a compound condition - one equality condition for each path element, in the form: level[i].key = path[i].

denormalized_statement(attributes=None, expand_locales=False, include_fact_key=True, condition_attributes=None)

Return a statement (see the class description for more information) for a denormalized view. whereclause is the same as the SQLAlchemy whereclause for sqlalchemy.sql.expression.select(). attributes is a list of logical references to attributes to be selected. If it is None then all attributes are used. condition_attributes contains a list of attributes that are not going to be selected, but are required for the WHERE condition.

Set expand_locales to True to expand all localized attributes.

detail_statement(dimension, path, hierarchy=None)

Returns a statement for dimension details. attributes should be a list of attributes from one dimension, that is, from one branch (master-detail) of a star/snowflake.

fact_statement(key_value)

Return a statement for selecting a single fact based on key_value.

join_expression(joins, include_fact=True)

Create a partial expression on the fact table with joins that can be used as the core of a SELECT statement. joins is a list of joins returned from the mapper (most probably by Mapper.relevant_joins()).

If include_fact is True (default) then the fact table is considered the starting point. If it is False, the first detail table is considered the starting point for the joins. This might be useful when getting values of a dimension without cell restrictions.

join_expression_for_attributes(attributes, expand_locales=False, include_fact=True)

Returns a join expression for attributes.

logical_labels(columns)

Returns list of logical attribute labels from list of columns or column labels.

This method and additional internal references were added because some database dialects, such as Exasol, cannot handle dots in column names, even when quoted.

ordered_statement(statement, order, dimension_levels=None)

Returns a SQL statement ordered according to order. If the statement contains attributes that have a natural order specified, then the natural order is used, unless overridden in order.

dimension_levels is a list of considered dimension levels in the form of tuples (dimension, levels). For each level its sort key is used.

paginated_statement(statement, page, page_size)

Returns paginated statement if page is provided, otherwise returns the same statement.

range_condition(dim, hierarchy, from_path, to_path)

Return a condition for a hierarchical range (from_path, to_path). Return value is a Condition tuple.

table(schema, table_name)

Return a SQLAlchemy Table instance. If the table was already accessed, then the existing table is returned, otherwise a new instance is created.

If schema is None then browser’s default schema is used.

Note

cubes.QueryContext does not serve its original purpose anymore and will very likely be merged with the browser.

Slicer

This backend exists just for demonstrating backend development.

class cubes.backends.slicer.SlicerBrowser(cube, url, locale=None)

Demo backend browser. This backend serves just as an example of a backend; it uses another Slicer server instance to do all the work. You might use it as a template for your own browser.

Attributes:

  • cube – obligatory, but currently unused here
  • url - base url of Cubes Slicer OLAP server

Implementing Custom Backend

A custom backend is just a subclass of the cubes.AggregationBrowser class.

Requirements for the backend:

  • implement create_workspace(model, config)
  • subclass cubes.AggregationBrowser

Requirements for the AggregationBrowser.aggregate() function (see the sketch after this list):

  • it should accept a None cell and default to the whole cube
  • it should set result.cell to the cell argument
  • it should set result.measures to the measures argument
  • it should provide result.summary
  • it might fill result.total_cell_count
  • it should provide result.levels
  • result.cells is recommended to be an iterator
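
A rough sketch of such a backend module, assuming cubes.AggregationBrowser, cubes.AggregationResult and cubes.Cell behave as documented here (all other names are illustrative):

import cubes

class CustomBrowser(cubes.AggregationBrowser):
    def __init__(self, cube, locale=None, **options):
        super(CustomBrowser, self).__init__(cube)
        self.locale = locale

    def aggregate(self, cell=None, measures=None, drilldown=None, **options):
        # a None cell should default to the whole cube
        cell = cell or cubes.Cell(self.cube)

        result = cubes.AggregationResult()
        result.cell = cell
        result.measures = measures
        # compute result.summary, result.levels and result.cells here;
        # result.cells is recommended to be an iterator
        return result

class CustomWorkspace(object):
    def __init__(self, model, **options):
        self.model = model
        self.options = options

    def browser_for_cube(self, cube, locale=None):
        return CustomBrowser(cube, locale=locale)

def create_workspace(model, config):
    return CustomWorkspace(model, **config)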

Slicer and Server Integration

If the backend is intended to be used by the Slicer server, then the backend should be placed in its own module. The module should contain a function create_workspace(model, config) which returns a workspace object. config is a configuration dictionary taken from the configuration file (see below). The workspace object should implement browser_for_cube(cube, locale), which returns an AggregationBrowser subclass.

The create_workspace() can be compared to create_engine() in a database abstraction framework and the browser_for_cube() can be compared to engine.connect() to get a connection from a pool.

The configuration for create_workspace() comes from the slicer .ini configuration file, section [workspace], and is provided as a dict object.
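
For illustration, a hypothetical [workspace] section for the SQL backend (the keys correspond to the options described earlier):

[workspace]
url: postgresql://localhost/database
schema: analytics
dimension_prefix: dm_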
