Clusterpoint
Type | Private |
---|---|
Industry | Enterprise software, database software, cloud computing |
Founded | August 21, 2006 |
Founders | Gints Ernestsons, Jurgis Orups, Zigmars Rasscevskis, Oskars Viksna |
Headquarters | London, United Kingdom |
Products | Clusterpoint DBMS, Clusterpoint DBaaS, NTSS, GOL |
Website | www |

Developer(s) | Clusterpoint Ltd. |
---|---|
Initial release | 2006 |
Stable release | 4.0 / October 8, 2015 |
Development status | Active |
Written in | C, C++ |
Operating system | Cross-platform |
Available in | English |
Type | Distributed database, enterprise search, operational database, document-oriented NoSQL, XML, JSON, SQL database, cloud DBaaS |
License | Free-commercial |
Clusterpoint is a European software technology company that develops and supports the Clusterpoint database management system platform.[1][2][3]
The company was founded by software engineers[4] and is backed by venture capital.[5][6][7][8]
The Clusterpoint database is a schema-free document database that removes the complexity, scalability problems and performance limitations of relational database architecture.[9]
The Clusterpoint database eliminates the need for customers to integrate separate database, search and analytics platforms. It replaces integrated multi-platform solutions with a single platform and a single API, typically in cases where an SQL RDBMS is used together with an enterprise search engine to meet the performance and scalability needs of web and mobile applications, or where Big data and analytics tools such as Hadoop might be needed because of the sheer volume of data or large computing workloads.[10]
The first version of the Clusterpoint database was released in 2006. The most recent version, Clusterpoint 4, was released in October 2015 and includes a JavaScript computing engine and the JS/SQL query language.[11]
Clusterpoint database is a document-oriented database server platform for storing and processing XML and JSON data in a distributed fashion on large clusters of commodity hardware. Its architecture combines ACID-compliant OLTP transactions, full-text search and analytics in the same code, providing high availability, fault tolerance, data replication and security.[12][13]
Clusterpoint database allows transactions to be performed in a distributed document database model in the same way as in an SQL database. Users can perform secure real-time updates, free text search, analytical SQL querying and reporting at high velocity in very large distributed databases containing XML or JSON documents. Transactions are implemented without the consistency issues that affect many NoSQL databases and can safely run at speeds previously available only with relational databases.[14] Real-time Big data analytics, replication, load sharing and high availability are standard features of the Clusterpoint database platform.[15]
Clusterpoint database supports web-style free text search with natural language keywords and programmable relevance sorting of results. Constant and predictable search response times, with latency in milliseconds, and high-quality search results are achieved using policy-based inverted indexing and its own relevance ranking method. Version 4 supports the JS/SQL query language, in which classic SQL queries can be combined with free text search and with custom distributed computing functions written in JavaScript, all executed in a single REST API call.[16]
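The article does not describe Clusterpoint's inverted index or ranking formula in detail. As a minimal sketch of the general technique only, the Python code below builds an inverted index over a few sample documents and returns free-text matches in relevance order, assuming a plain term-frequency score. The functions `tokenize`, `build_index` and `search` are hypothetical and are not part of any Clusterpoint API.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[int, str]) -> dict[str, dict[int, int]]:
    """Inverted index: term -> {doc_id: occurrences of the term in that doc}."""
    index: dict[str, dict[int, int]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def search(index: dict[str, dict[int, int]], query: str) -> list[tuple[int, int]]:
    """Return (doc_id, score) for documents containing every query term,
    best first; the score is simply the summed term frequency."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = [index.get(term, {}) for term in terms]
    matching = set(postings[0]).intersection(*postings[1:])
    scored = [(doc_id, sum(p[doc_id] for p in postings)) for doc_id in matching]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


docs = {
    1: "Distributed document database with full text search",
    2: "Search, search, search: a text search engine",
    3: "Relational database tables",
}
index = build_index(docs)
print(search(index, "text search"))  # [(2, 5), (1, 2)]
```

A production engine would also weight rare terms more heavily (for example with TF-IDF or BM25); the sketch leaves that out to keep the index structure visible.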
For most of its history Clusterpoint has served business customers as an enterprise software vendor.[17][18][19]
Use cases
Clusterpoint database provides real-time management of business information stored as electronic XML or JSON documents. It can be used as a high-performance operational database for web and mobile database services requiring scalability, speed and strong security. The software can safely handle financial, billing, security, medical, travel, information services, e-commerce, government and municipal open data, and other data stored in industry-standard XML and JSON document formats.[20][21][22]
More generally, it suits use cases where a flexible XML or JSON document data model fits best: processing a mix of variable data, including structured data, unstructured (textual) data, semi-structured data and blobs such as images, voice and video files. The software can be used for computing tasks that require data processing with latency in the low-millisecond range in distributed databases, for instance feeding data at high speed to interactive NoSQL visualizations, online Big data analytics and safe reporting in large databases.[23]
Distinctive technology
High-speed ACID-compliant Transactions in Distributed Document Database
Clusterpoint database provides distributed, ACID-compliant transactions, including basic SQL support, in a document-model database that is massively scalable for Big data volumes. Distributed transactions, data storage, search and analytics can be performed with high performance and high availability while maintaining strong database consistency and security. This gives Clusterpoint a performance and scalability advantage over other NoSQL document databases that compromise on the security and integrity of customer data, typically providing only eventual consistency at high availability.[24]
Programmable database ranking for search relevance in Big data
Another distinguishing feature is the programmable ranking index, which can be customized through relevance rules assigned in the Document Policy configuration file, a small XML configuration file accompanying each Clusterpoint database. Database search behavior can be changed quickly by configuring ranking index rules instead of modifying software code. Ranking has become more important as the volume of data handled by current applications has exploded: users would be overwhelmed by too many unranked results, and the sheer amount of data makes it almost impossible to process queries with the traditional compute-then-sort approach.
Customer application code can be simplified by delegating most indexing and search sorting details, including ranking algorithms, to the Document Policy configuration attributes in Clusterpoint database. When customized for the needs of a particular web or mobile application, the Document Policy determines how the ranking index is organized at the physical storage level, presorting the actual index data for custom relevance algorithms. Developers can avoid most of the complex SQL programming for data sorting and grouping in their application code, and the database hardware is relieved of sorting large volumes of data for every query. Instead, the Clusterpoint ranking index delivers fast search and relevance sorting without the performance degradation characteristic of relational SQL databases.
Applied to the document database model, the ranking index method enables Clusterpoint to outperform SQL databases at search by several orders of magnitude. It addresses the information overload and latency problems of interactive web and mobile applications that process Big data. Small mobile device screens and limited network bandwidth prevent users from requesting and processing large volumes of data per query, and database search and querying must be interactive and transactional to satisfy Internet users. The Clusterpoint ranking index was designed for this computing model: it extracts the most relevant data first and returns information page by page in decreasing order of relevance. For instance, using only free text search, latency in large databases containing billions of documents will be in the millisecond range, while relevance ranking prevents the end user from being overwhelmed with too many low-quality search results. This is also a crucial design element of the distributed document database architecture: it makes the index scalable, so that it can be shared across a large cluster of servers without significant performance loss in data ingestion, free text search and access.[25]
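The idea of presorting index data so that results come out page by page in decreasing relevance can be illustrated with a small Python sketch. It assumes, hypothetically, a single static rank per document fixed at write time (standing in for the Document Policy rules, whose actual format is not shown here); `PresortedIndex` and its methods are illustrative names, not Clusterpoint code.

```python
from itertools import islice
from typing import Iterator


class PresortedIndex:
    """Toy ranking index: each posting list is kept ordered by a static
    document rank (highest first) when documents are written, so a query
    can stream matches in relevance order without sorting at query time."""

    def __init__(self) -> None:
        self.rank: dict[int, float] = {}
        self.postings: dict[str, list[int]] = {}

    def add(self, doc_id: int, terms: set[str], rank: float) -> None:
        self.rank[doc_id] = rank
        for term in terms:
            plist = self.postings.setdefault(term, [])
            plist.append(doc_id)
            # Keep the list presorted at write time (a real engine would
            # insert in place instead of re-sorting).
            plist.sort(key=lambda d: (-self.rank[d], d))

    def _matches(self, terms: list[str]) -> Iterator[int]:
        """Yield ids of documents containing all terms, best rank first."""
        lists = [self.postings.get(t, []) for t in terms]
        if not lists:
            return
        driver = min(lists, key=len)  # walk the shortest list in rank order
        # Membership sets are built per query only to keep the sketch short.
        others = [set(plist) for plist in lists if plist is not driver]
        for doc_id in driver:
            if all(doc_id in s for s in others):
                yield doc_id

    def page(self, terms: list[str], page_no: int, size: int) -> list[int]:
        """Return one page of results, stopping once that page is filled."""
        start = page_no * size
        return list(islice(self._matches(terms), start, start + size))


idx = PresortedIndex()
idx.add(1, {"nosql", "database"}, rank=0.2)
idx.add(2, {"nosql", "search"}, rank=0.9)
idx.add(3, {"nosql", "database", "search"}, rank=0.5)
idx.add(4, {"nosql"}, rank=0.7)
print(idx.page(["nosql"], page_no=0, size=2))  # [2, 4]
print(idx.page(["nosql"], page_no=1, size=2))  # [3, 1]
```

Because each posting list is already in rank order, fetching a page only walks matches until that page is full, rather than scoring and sorting the complete result set first.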
Developers can additionally fine-tune the Clusterpoint ranking index to match natural language terms in queries to the most relevant textual content in a customer database. When a distributed database is queried with free text keywords or phrases in natural language, the ranking index selects the most relevant documents in which the query matches parts of the textual content, taking into account term density, word statistics and language-specific grammatical attributes (including stemming, spelling and collation), and performs automatic self-merge joins. Very few database products support a similar type of self-merge join.[26]
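The combination of stemming and merge joins described above can be sketched as follows. This is a toy under stated assumptions: `toy_stem` is a deliberately crude suffix stripper standing in for real language-specific stemming, and a "self merge-join" is interpreted, as in the Google App Engine article cited above, as answering a multi-term query by merging several sorted posting lists taken from the same index. None of the function names come from Clusterpoint.

```python
def toy_stem(word: str) -> str:
    """Crude English suffix stripper, for illustration only; production
    engines use language-specific stemmers (e.g. Porter/Snowball)."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_stemmed_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Index stemmed terms; doc ids are appended in ascending order."""
    index: dict[str, list[int]] = {}
    for doc_id in sorted(docs):
        for term in {toy_stem(w) for w in docs[doc_id].split()}:
            index.setdefault(term, []).append(doc_id)
    return index


def merge_join(a: list[int], b: list[int]) -> list[int]:
    """Intersect two ascending doc-id lists in a single linear pass."""
    i = j = 0
    out: list[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def self_merge_join(index: dict[str, list[int]], query: str) -> list[int]:
    """Answer a multi-term query by merge-joining posting lists that all
    come from the same index."""
    lists = [index.get(toy_stem(w), []) for w in query.split()]
    if not lists:
        return []
    result = list(lists[0])  # copy so the index itself is never returned
    for plist in lists[1:]:
        result = merge_join(result, plist)
    return result


docs = {
    1: "indexing databases",
    2: "database clusters",
    3: "searching indexed databases",
    4: "search engines",
}
index = build_stemmed_index(docs)
print(self_merge_join(index, "Indexed database"))  # [1, 3]
```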
By adjusting ranking rules, customers can configure various grouping, ordering and positioning algorithms for their search results through the ranking index, to deliver the best end-user search experience. Once established for a particular database, a set of ranking configuration rules is applied and maintained automatically by Clusterpoint database whenever customer data is loaded or updated through the Clusterpoint database CRUD API commands.
Developers can use full text search as the fastest information access method in Clusterpoint databases, while retaining the ability to query the database structure with standard SQL analytics. In Clusterpoint database both methods can be combined in a single query, enabling combined analytical and search queries over mixed structured and unstructured content.
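To illustrate what a single combined query means, the Python sketch below applies a free-text condition and a structured range condition in one pass and returns an aggregate alongside the matches. It is a hypothetical illustration: `Product` and `combined_query` are invented names, and for brevity it scans a list rather than consulting an index, as a real database would.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Product:
    id: int
    description: str
    price: float


def combined_query(docs: list[Product], text: str, max_price: float
                   ) -> tuple[list[int], int, Optional[float]]:
    """One pass that applies a full-text condition AND a structured range
    condition, then aggregates, roughly like
    SELECT COUNT(*), AVG(price) WHERE <text matches> AND price <= max_price."""
    terms = text.lower().split()
    hits = [
        d for d in docs
        if all(t in d.description.lower().split() for t in terms)
        and d.price <= max_price
    ]
    avg = mean(d.price for d in hits) if hits else None
    return [d.id for d in hits], len(hits), avg


catalog = [
    Product(1, "light laptop with long battery", 899.0),
    Product(2, "gaming laptop", 1499.0),
    Product(3, "laptop bag", 49.0),
    Product(4, "wireless mouse", 19.0),
]
print(combined_query(catalog, "laptop", max_price=1000))  # ([1, 3], 2, 474.0)
```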
Clusterpoint database deployments
Clusterpoint database has been used since 2006 in production deployments by enterprise customers operating 24/7 web and mobile services. The vendor has built partnerships that provide solutions in different industry sectors, such as:
- Governance, Risk Management and Regulatory Compliance[27]
- Agile Web Software Development[28]
- Online Business Intelligence in NoSQL and Big Data[29]
- Cloud Computing Services
- Web Site Design[30]
- Cybersecurity and Lawful Intercept[31]
A public demonstration powered by Clusterpoint database, illustrating how the document data of the entire English Wikipedia and DBpedia corpora can be managed efficiently within a single consolidated database platform, is available on the website Wikisearch.net.
Competitors
Clusterpoint database technology is positioned by industry experts among other emerging NoSQL and Big data technologies with a distributed data management architecture.[32]
Platform Components
The Clusterpoint database software is written in the C and C++ programming languages and supports multi-threading, multi-core CPUs and distributed computing. The primary method of developer access to the platform's capabilities is the REST API. The database software is managed across large clusters of commodity hardware with the Clusterpoint Console application, which provides centralized administration and control of all customer databases through a single web GUI. To access Clusterpoint Console, or to download it along with the Clusterpoint database software for on-premises use, customers have to sign up for a Clusterpoint Cloud Database account on the vendor website. Sign-up is free and no credit card is required.
Architecture
Clusterpoint database has a multi-master, shared-nothing, distributed, document-oriented architecture that stores XML and JSON data.[33]
It works as a high-speed transactional (OLTP) database for XML and JSON data objects. Content can be added, updated and deleted in real time, and all changed data is indexed in real time, including full text, date, numeric and geospatial data. Index data can be read for search and analytics immediately after each document has been inserted, updated or deleted, while ACID-compliant transactions provide security and consistency. The database API also supports storage and processing of binary data as part of the document data object model.
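Real-time indexing on every write can be sketched with a toy in-memory store whose inverted index is updated synchronously on insert, update and delete, so a search issued immediately afterwards sees the change. `LiveIndex` is a hypothetical name; the sketch has no transactions, persistence or concurrency control, all of which a real ACID-compliant database needs.

```python
from collections import defaultdict


class LiveIndex:
    """Toy store that updates its inverted index synchronously on every
    write, so a search right after a write already reflects it."""

    def __init__(self) -> None:
        self.docs: dict[str, str] = {}
        self.index: dict[str, set[str]] = defaultdict(set)

    def _unindex(self, doc_id: str) -> None:
        """Remove a document's terms from the index, dropping empty entries."""
        for term in set(self.docs[doc_id].lower().split()):
            self.index[term].discard(doc_id)
            if not self.index[term]:
                del self.index[term]

    def upsert(self, doc_id: str, text: str) -> None:
        if doc_id in self.docs:  # update: drop stale postings first
            self._unindex(doc_id)
        self.docs[doc_id] = text
        for term in set(text.lower().split()):
            self.index[term].add(doc_id)

    def delete(self, doc_id: str) -> None:
        if doc_id in self.docs:
            self._unindex(doc_id)
            del self.docs[doc_id]

    def search(self, term: str) -> set[str]:
        return set(self.index.get(term.lower(), set()))


store = LiveIndex()
store.upsert("a", "Invoice paid")
store.upsert("a", "Invoice overdue")  # the update re-indexes the document
print(store.search("paid"), store.search("overdue"))  # set() {'a'}
```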
It supports a fault-tolerant hardware infrastructure with no single point of failure and can replicate the entire distributed database cluster across multiple datacenters.
Query syntax
To query a database, customers can use free text queries, XML-based query syntax, JS/SQL queries or the Clusterpoint REST API, which supports JSON.
General features
- Data is managed in open, cross-platform, industry-standard XML or JSON format using open APIs, for instance the Python API[34][35] or the JavaScript Node.js API[36]
- Data-structure-agnostic, type-rich database: handles XML or JSON documents with variable structure in a single database. Supports unstructured textual data, dates, numbers and metadata (all XML and JSON types)
- Cross-platform support: binaries are available for Linux, FreeBSD, Mac OS X and Windows. Clusterpoint database software can be compiled on other operating systems.
- Multi-master cluster software architecture: no single point of failure, any cluster node can serve as a master and run the management application
- Horizontal database scalability: scales out from a single server to a few thousand servers networked into a cluster
Access features
- The REST API is used for XML and JSON document management, search and data manipulation.
- Consistent UTF-8 encoding. Non-UTF-8 data can be saved, queried, and retrieved with a special binary data type.
- XML and JSON objects for API queries and responses: enable direct integration with other programming languages that support XML or JSON parsing; no specific client software is required
Search/query features
- Built-in rich full text search functionality, with fast and free use of keywords and phrases, result snippeting, highlighting, term proximity search and other full-text search options[37]
- Querying with term stemming, term wildcards and character position patterns for inflected and plural word forms, using automatic self merge-joins[38]
- SQL-like structured (fielded) XML queries, as in SQL SELECT ... WHERE ... statements
- Cluster-wide analytics aggregation with MIN(), MAX(), COUNT() and AVG(), as in SQL SELECT ... GROUP BY ... ORDER BY ... statements
- JS/SQL query language, which combines the familiarity of SQL with ubiquitous JavaScript and web programming skills
- Sorting of results in alphabetic, numeric, date order or according to result relevance
- Autocomplete (instant search as you type) using the actual index data
- Spell-check of query terms with alternative spelling suggestions for "Did you mean that?" functionality
- Boosting of search query terms at query time, to increase, decrease or override through the API the relevance weights or sorting rules built into the ranking index
- Dynamic data classification per query by multi-level customer defined facets with exact hit counting (examples: categories, themes, product catalogs, geographic locations etc.)
- Text-analytics driven similar content search across the entire database
- XML or JSON data structure relevance ranking by tag weighting and document relevance ranking by document rating
- Textual relevance ranking for matching search query terms to context, taking into account frequency and density of natural language terms
- Predictive calculation of expected number of results based on the actual index statistics in large size databases to optimize performance
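Faceted classification with exact hit counts can be illustrated with a short Python sketch that counts matching documents at every level of a hierarchical facet. It assumes, hypothetically, that each hit stores its facet value as a path such as `["Travel", "Europe", "Latvia"]`; `hierarchical_facets` is an illustrative name, not a Clusterpoint API.

```python
from collections import Counter


def hierarchical_facets(hits: list[dict], field: str) -> Counter:
    """Exact hit counts for every level of a hierarchical facet.
    Each hit is counted once under each prefix of its path, e.g.
    "Travel", "Travel/Europe" and "Travel/Europe/Latvia"."""
    counts: Counter = Counter()
    for doc in hits:
        path = doc.get(field, [])  # hits without the facet are skipped
        for depth in range(1, len(path) + 1):
            counts["/".join(path[:depth])] += 1
    return counts


hits = [
    {"id": 1, "category": ["Travel", "Europe", "Latvia"]},
    {"id": 2, "category": ["Travel", "Europe", "France"]},
    {"id": 3, "category": ["Travel", "Asia"]},
    {"id": 4, "category": ["Finance"]},
    {"id": 5},
]
counts = hierarchical_facets(hits, "category")
print(counts.most_common(2))  # [('Travel', 3), ('Travel/Europe', 2)]
```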
Administration/production use features
- Granular security partitioning: API users and their access rights are based on groups and permissions assigned per specific databases and API commands
- Transaction journaling, redo logs, access logs, error logs and audit logs enabled by default
- Document versioning enabled by default (preserving previous document versions for a certain time period)
- Background reindexing with automatic switchover keeps the database available during reindexing
- Online, offline and incremental database backup
- Automatic or manual synchronization of database replicas
- Multiple administrator accounts for secure multi-tenancy of different customer databases on the same hardware
- Centralized web GUI based database administration Console, including one-click configuration of clustered and replicated databases across all nodes
Automatic full database content indexing
Clusterpoint software automatically builds and maintains an index of XML and JSON document content whenever data is loaded, updated or deleted. A single database index (the ranking index) is maintained to support these types of querying:
- natural language based full text search indexing, including language-specific stemming and collation rules
- XML or JSON data structure queries (with full-text, exact match and binary match options) or Essential SQL queries for analytics
- virtual data structure search, created by aliasing the values of multiple real tags, to speed up Boolean OR queries
- ad hoc search across all database content, irrespective of the database structure
- numeric and date range search
- geospatial search by range, distance or polygon coordinates and ordering by distance from a certain point
- multi-level faceted search with automatic results classification by XML / JSON tags assigned as containing facets
- combination of any of the above database search criteria into complex nested multi-part query expressions using Boolean AND, OR, NOT logic
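Nested Boolean combinations of search criteria can be evaluated recursively over sets of matching document ids. The following is a minimal sketch, assuming a hypothetical tuple-based query representation and a precomputed term-to-ids map; it does not reflect Clusterpoint's query syntax.

```python
from typing import Union

# A query is a term, or a tuple ("AND", q1, q2, ...), ("OR", ...) or ("NOT", q).
Query = Union[str, tuple]


def evaluate(query: Query, index: dict[str, set[int]], universe: set[int]) -> set[int]:
    """Recursively evaluate a nested Boolean query over term -> doc-id sets."""
    if isinstance(query, str):
        return set(index.get(query, set()))  # copy so callers can mutate it
    op, *args = query
    if op == "AND":
        result = evaluate(args[0], index, universe)
        for sub in args[1:]:
            result &= evaluate(sub, index, universe)
        return result
    if op == "OR":
        result = set()
        for sub in args:
            result |= evaluate(sub, index, universe)
        return result
    if op == "NOT":
        return universe - evaluate(args[0], index, universe)
    raise ValueError(f"unknown operator: {op!r}")


index = {"hotel": {1, 2, 3}, "riga": {1, 4}, "cheap": {2, 3, 4}, "closed": {3}}
universe = {1, 2, 3, 4, 5}
query = ("AND", "hotel", ("OR", "riga", "cheap"), ("NOT", "closed"))
print(sorted(evaluate(query, index, universe)))  # [1, 2]
```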
Database administration
Clusterpoint database can be controlled centrally through the Clusterpoint Console application, a web GUI dashboard for controlling all database services enterprise-wide, including cluster database administration, configuration of indexing and ranking policy, secure user account management, audit and log file viewing, database backup and restore, database sharding and replication.
Each customer database is started and stopped as an isolated database server process, allowing controlled management of CPU, RAM and disk storage resources. All databases share a single networked computing and storage infrastructure.
Clusterpoint Console is used to manage the underlying hardware (cluster nodes) so that computing resources can be shared among different databases running in parallel.
Process and storage architecture
Clusterpoint database processes are safely isolated: each process runs only in its own memory address space and can access only its own local storage folder, which has the same name as the database and contains that database's XML or JSON documents, index, configuration and log files stored on the local cluster node (shard). This architecture provides elastic horizontal scale-out and cluster-wide control over the resources consumed by a particular customer database. It also prevents unauthorized access between tenant databases sharing the same hardware, with an option to fully encrypt sensitive data.
Multi-tenancy and virtualization
Clusterpoint supports secure multi-tenant database services. The platform handles safe partitioning of the runtime database environment across all CPU nodes, RAM processes and storage resources within a larger cluster, while operating databases in parallel on the same hardware. This method makes the best use of modern multi-core CPU hardware arranged in large distributed clusters.
Native multi-tenancy is the preferred method for high-performance database computing with Clusterpoint software, rather than operating-system-level virtualization or software containerization. OS-level virtualization may reduce available network bandwidth and computing resources and can create unexpected bottlenecks at the storage I/O level, which could increase application latency. Database virtualization is best used for prototyping and development, where operational performance guarantees and low latency are not the first priority.[39]
Clusterpoint Cloud Database as a Service (DBaaS) is a secure multi-tenant database platform, with isolated data for each customer account and encrypted access. Clusterpoint software does not need virtualization for safe and efficient multi-tenancy.
Multi-copy database replication
Automatic multi-copy replication of the entire database is built into the Clusterpoint database software. Replication is active, with workload sharing within a cluster. Clusterpoint supports high-performance, ACID-compliant OLTP transactions within a main cluster in a single data center, while providing failover to additional datacenters running replica clusters. Failover takes only a few seconds if communication latency between data centers is low.
Database replicas in the Clusterpoint architecture are used for automatic load balancing of search queries received through the Clusterpoint API.
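Spreading search queries across replicas can be sketched as round-robin selection that skips unavailable copies. This is one common balancing policy chosen for illustration; the article does not say which policy Clusterpoint uses, and `ReplicaBalancer` and the replica names are hypothetical.

```python
from itertools import cycle


class ReplicaBalancer:
    """Round-robin choice of a replica for read/search queries,
    skipping replicas currently marked as unavailable."""

    def __init__(self, replicas: list[str]) -> None:
        self.replicas = list(replicas)
        self.down: set[str] = set()
        self._ring = cycle(self.replicas)

    def mark_down(self, replica: str) -> None:
        self.down.add(replica)

    def mark_up(self, replica: str) -> None:
        self.down.discard(replica)

    def next_replica(self) -> str:
        # Try each replica at most once per call before giving up.
        for _ in range(len(self.replicas)):
            candidate = next(self._ring)
            if candidate not in self.down:
                return candidate
        raise RuntimeError("no replica available")


lb = ReplicaBalancer(["dc1-a", "dc1-b", "dc2-a"])
print([lb.next_replica() for _ in range(4)])  # ['dc1-a', 'dc1-b', 'dc2-a', 'dc1-a']
```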
In multi-datacenter deployments, network bandwidth between locations may become the critical issue for the Clusterpoint architecture, because of increased latency for database updates and synchronization delays between replicas, in particular if encrypted VPN networking over Internet links is used.
High-capacity links might be required for high-performance database replication between geographically separate datacenters.
Extendable server-side scripting with Lua
Lua scripting extends Clusterpoint Server functionality with custom server-side scripts. Lua scripts can implement customer-specific functions such as data aggregation, ETL tasks, metadata markup, call-backs to external programming languages through web services, real-time alerting or asynchronous triggers. Scripts can be executed before, during or after the Clusterpoint API transactions of interest: built-in, configurable server-side hooks activate Lua scripts at different stages of each Clusterpoint transaction.
Custom Lua scripts can be stored in Clusterpoint Server to work as "stored procedures".
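Clusterpoint runs these scripts in Lua; the Python sketch below only models the general before/after hook pattern described above, with hooks that add metadata, abort an insert by raising an error, or record an alert. `HookedStore`, the stage names and the example hooks are hypothetical and do not reproduce Clusterpoint's hook configuration.

```python
from typing import Callable

Hook = Callable[[dict], None]


class HookedStore:
    """Toy document store that runs registered callbacks before and after
    each insert, loosely analogous to server-side transaction hooks."""

    STAGES = ("before_insert", "after_insert")

    def __init__(self) -> None:
        self.docs: dict[str, dict] = {}
        self.hooks: dict[str, list[Hook]] = {stage: [] for stage in self.STAGES}

    def register(self, stage: str, hook: Hook) -> None:
        if stage not in self.hooks:
            raise ValueError(f"unknown stage: {stage!r}")
        self.hooks[stage].append(hook)

    def insert(self, doc_id: str, doc: dict) -> None:
        doc = dict(doc)  # hooks work on a copy of the caller's document
        for hook in self.hooks["before_insert"]:
            hook(doc)  # may add metadata, or raise to abort the insert
        self.docs[doc_id] = doc
        for hook in self.hooks["after_insert"]:
            hook(doc)  # e.g. alerting or follow-up work


alerts: list[str] = []


def require_text(doc: dict) -> None:
    if not doc.get("text"):
        raise ValueError("document has no text")


def add_word_count(doc: dict) -> None:
    doc["word_count"] = len(doc["text"].split())


def alert_on_long_text(doc: dict) -> None:
    if doc["word_count"] > 3:
        alerts.append(doc["text"])


store = HookedStore()
store.register("before_insert", require_text)
store.register("before_insert", add_word_count)
store.register("after_insert", alert_on_long_text)
store.insert("d1", {"text": "short note"})
store.insert("d2", {"text": "a much longer incident report"})
print(store.docs["d1"], alerts)
```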
Extendable server-side scripting with JavaScript computing engine
Starting with Clusterpoint database version 4, JS/SQL has been added as the main scripting engine. JS/SQL is an SQL query language that can be extended with arbitrary JavaScript user code. JavaScript can be used within WHERE, GROUP BY, ORDER BY and other SQL statement clauses, which lets users extend Clusterpoint database functionality beyond standard database and search features. For example, users can perform highly parallel computing tasks within the database, where local data storage provides the fastest possible performance, using familiar SQL syntax extended with their own JavaScript functions, all within a single JS/SQL query.
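The article does not show JS/SQL syntax, so the sketch below does not attempt to reproduce it. Instead it uses Python to illustrate the underlying idea of SQL clauses driven by user code: the WHERE, GROUP BY and ORDER BY steps are arbitrary functions supplied by the caller. `select` and the sample `orders` data are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Iterable


def select(rows: Iterable[dict],
           where: Callable[[dict], bool],
           group_by: Callable[[dict], Any],
           aggregate: Callable[[list[dict]], Any],
           order_by: Callable[[tuple[Any, Any]], Any]) -> list[tuple[Any, Any]]:
    """Evaluate a query whose WHERE, GROUP BY and ORDER BY clauses are
    arbitrary user functions, mimicking SQL extended with user code."""
    groups: dict[Any, list[dict]] = defaultdict(list)
    for row in rows:
        if where(row):
            groups[group_by(row)].append(row)
    result = [(key, aggregate(members)) for key, members in groups.items()]
    return sorted(result, key=order_by)


orders = [
    {"email": "ann@example.com", "amount": 30},
    {"email": "bob@example.org", "amount": 5},
    {"email": "cid@example.org", "amount": 25},
    {"email": "dan@example.com", "amount": 15},
    {"email": "eve@example.net", "amount": 50},
]
# Roughly: SELECT domain(email), SUM(amount) WHERE amount > 10
#          GROUP BY domain(email) ORDER BY SUM(amount) DESC
totals = select(
    orders,
    where=lambda r: r["amount"] > 10,
    group_by=lambda r: r["email"].split("@", 1)[1],  # user-defined function
    aggregate=lambda members: sum(m["amount"] for m in members),
    order_by=lambda pair: -pair[1],
)
print(totals)  # [('example.net', 50), ('example.com', 45), ('example.org', 25)]
```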
Programming language support
Clusterpoint database uses REST principles and HTTP/HTTPS messaging for client-server communication between customer applications and the Clusterpoint database server. Any client programming language or development environment that supports HTTP POST/GET messaging can connect to Clusterpoint Server directly and read, write, update, delete and search XML and JSON documents.
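As a sketch of how a client in any language can talk to an HTTP-based database server, the Python code below builds a JSON POST request with the standard library without sending it. The URL path, payload shape and use of HTTP Basic authentication are placeholders chosen for illustration; the actual endpoints and message formats are defined in the vendor's REST API documentation.

```python
import base64
import json
import urllib.request


def build_post(base_url: str, path: str, payload: dict,
               user: str, password: str) -> urllib.request.Request:
    """Build (without sending) an HTTP POST carrying a JSON body.
    The path, payload shape and Basic authentication are placeholders,
    not Clusterpoint's documented REST API."""
    body = json.dumps(payload).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/" + path.lstrip("/"),
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


req = build_post("https://db.example.com", "/hypothetical/search",
                 {"query": "database"}, user="user", password="secret")
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it; it is not called here.
```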
In versions 1.x, 2.x and 3.0, the REST API interface for JSON transforms customer data between JSON and XML, while only XML is used for internal server-side storage and processing by Clusterpoint Server.
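The JSON-to-XML transformation can be pictured with the following Python sketch. The mapping convention it uses (object keys become element names, arrays become repeated elements, top-level arrays use `<item>`) is an assumption for illustration and is not necessarily the convention Clusterpoint Server used internally.

```python
import xml.etree.ElementTree as ET
from typing import Any


def json_to_xml(value: Any, tag: str = "document") -> ET.Element:
    """Convert parsed JSON into an ElementTree element. Lists inside objects
    become repeated elements named after their key; top-level lists use
    <item>. Keys are assumed to be valid XML element names."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            items = child if isinstance(child, list) else [child]
            for item in items:
                elem.append(json_to_xml(item, key))
    elif isinstance(value, list):
        for item in value:
            elem.append(json_to_xml(item, "item"))
    elif isinstance(value, bool):
        elem.text = "true" if value else "false"
    elif value is not None:
        elem.text = str(value)
    return elem


doc = {
    "id": 7,
    "title": "Riga guide",
    "tags": ["travel", "latvia"],
    "price": {"amount": 9.5, "currency": "EUR"},
    "in_stock": True,
}
print(ET.tostring(json_to_xml(doc), encoding="unicode"))
```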
Clusterpoint Server has native client API libraries, using HTTP and the faster TCP/IP transport protocol, for the following popular programming environments:
- XML
- JSON
- REST (http / https)
- TCP/IP (wire-format drivers)
- PHP
- .NET
- Java
- Python
- JavaScript Node.js
- C, C++
Please check the vendor web site for API support in other languages.
Licensing and support
Clusterpoint offers two database licensing options based on functionality and scalability:
- Clusterpoint Enterprise - The most comprehensive DBMS product solution, delivering unlimited scalability and the highest standards of enterprise grade functionality, fulfilling the most demanding of customer requirements.
- Clusterpoint Lite - The Clusterpoint DBMS solution for smaller organisations who require high standards in basic database functionality, supported by replication on 2 servers, but for whom scalability and sharding is not an immediate operational requirement.
There are four types of on-premises licensing model available: perpetual licence, subscription licence, OEM licence and developer licence.
The vendor provides standard software maintenance and technical support on a subscription basis (on premises or in Clusterpoint Database Cloud), delivered over email, Skype or phone.[40]
Premium technical support for customers using the software in 24/7 production environments includes remote problem diagnostics and resolution based on a service-level agreement. The vendor also provides installation support, a help desk, training and partnership programs.[41][42][43]
Clusterpoint Products
- Clusterpoint DBMS - clustered NoSQL database that spreads load across multiple servers to increase performance. Clusterpoint database facilitates high parallelism of computing and distribution of data.
- GOL: Big Data SIEM Analytics tool from Clusterpark - Log, Events and Security Records Search and Analytics.[44]
- DigiBrowser: Quick SQL denormalization into NoSQL database - imports a multi-table SQL database into one Clusterpoint database using automatic denormalization.[45]
- NTSS: Network Traffic Surveillance System for Lawful Intercept - high-speed capture, storage, search and analysis of all Internet traffic for the corporate network.[46][47]
References
- "Clusterpoint Group Limited". Companies House (UK). Retrieved March 5, 2015.
- "Clusterpoint Development Center". Lursoft (LV). Retrieved March 5, 2015.
- "Clusterpoint Profile on Firmas.lv". Firmas.lv (LV). Retrieved March 5, 2015.
- "Bring the Power of Big Data to Small Businesses". Data-Informed.com (US). Retrieved July 1, 2015.
- "Imprimatur Capital About Clusterpoint". Imprimatur Capital. Retrieved March 9, 2015.
- "Clusterpoint Raises EUR1 Million From BaltCap". Privateequitywire. Retrieved June 14, 2013.
- "Clusterpoint Receives €1 Million From BaltCap". Arcticstartup.com. Retrieved June 14, 2013.
- "Latvian Database Platform Clusterpoint secures 1.25 million". Arcticstartup.com. Retrieved March 16, 2015.
- "Clusterpoint 4 Computing Engine Combines Instantly Scalable Database and Computational Power". InsideBigdata.com (US). Retrieved October 6, 2015.
- "A new document database emerges from the cloud". infoworld.com (US). Retrieved June 25, 2015.
- "Clusterpoint adds computation to NoSQL database engine". SiliconAngle.com (US). Retrieved October 8, 2015.
- "List of NOSQL Databases". Nosql-database.org. Retrieved March 9, 2015.
- "The NoSQL movement: document databases". Dataversity. Retrieved June 14, 2013.
- "Big data startups / document stores". Bigdata-startups.com. Retrieved June 14, 2013.
- "Technology Behind Clusterpoint Database". Gints Ernestsons, Founder. Retrieved March 9, 2015.
- "Fulltext search engines". Mediawiki.org. Retrieved June 14, 2013.
- "Bloomberg Company Research Profile". Bloomberg.com. Retrieved March 9, 2015.
- "Crunchbase Clusterpoint Profile". Crunchbase.com. Retrieved June 14, 2013.
- "BusinessWeek Clusterpoint Profile". Businessweek. Retrieved June 14, 2013.
- "Business Directory Use Case". Yellow Search Today. Retrieved March 4, 2015.
- "Clusterpoint Use Case In E-commerce". Exim.lv. Retrieved March 4, 2015.
- "Open Data and Public Services 2015". Garage48 Foundation. Retrieved March 9, 2015.
- "Clusterpoint and ZoomCharts". Zoomcharts.com. Retrieved March 4, 2015.
- "Developers Club NoSQL Meetup with Clusterpoint". Dev Club Riga. Retrieved April 16, 2015.
- "Top NOSQL document databases". Big Data Analytics Today. Retrieved March 9, 2015.
- "How to make a Google App Engine application searchable using self merge joins". Google, Inc. Retrieved March 9, 2015.
- "Infogov Proteus iGRC (Internet Governance and Regulatory Compliance)". Infogov Ltd (United Kingdom). Retrieved March 9, 2015.
- "Agile Web Software Development". Agile.org. Retrieved March 9, 2015.
- "Turbocharge HTML5 web applications". Ambienttech. Retrieved March 9, 2015.
- "Converting web sites to NoSQL". Rixtellab. Retrieved March 9, 2015.
- "Bit IT Solution for Network Traffic Control". Bit IT solutions. Retrieved March 9, 2015.
- "NoSQL Scaling Beyond Traditional SQL" (PDF). Intel Corp. Retrieved March 9, 2015.
- "HP Guide to NoSQL". Hewlett-Packard Corp. March 5, 2015.
- "Clusterpoint API on Github". Github.com. Retrieved March 9, 2015.
- "Python API for Clusterpoint Server". Python.org. Retrieved March 9, 2015.
- "Clusterpoint Node.js API". NPM, inc. Retrieved March 9, 2015.
- "Full Text Search Explained". Everything.Explained.At. Retrieved March 9, 2015.
- "Making you app searchable using self merge-joins". Google. Retrieved June 14, 2013.
- "The Do's and Don'ts of Virtualizing Database Servers". Network Computing. Retrieved March 9, 2015.
- "Clusterpoint DBaaS Cloud Service". Facebook. Retrieved March 9, 2015.
- "Clusterpoint DBMS by 1DataGroup". 1DataGroup. Retrieved March 9, 2015.
- "Knowledge Academy Training Course in Clusterpoint DBMS". Knowledge Academy. Retrieved March 9, 2015.
- "Big Data Meetup. Clusterpoint XML Database Engine". Meetup.com. Retrieved March 9, 2015.
- "Clusterpoint GOL - fast log data analytics & search application software". Clusterpoint. Retrieved March 4, 2016.
- "DigiBrowser: Quick SQL denormalization into NoSQL database". Datorikas Instituts DIVI. Retrieved March 4, 2015.
- "Clusterpoint NTSS Product Review". SpiceWorks, Inc. Retrieved March 9, 2015.
- "Clusterpoint Network Traffic Surveillance System". iiGrowth LLC. Retrieved March 9, 2015.