Updates from September, 2014

  • ratneshparihar 3:51 am on September 2, 2014 Permalink | Reply  

    Research on possibility of RDBMS to have performance benchmarks like no-sql Databases (part 5) 

    We are continuing our exploration of a new OLTP design (https://ratneshparihar.wordpress.com/2014/07/31/research-on-possibility-of-rdbms-to-have-performance-benchmarks-like-no-sql-databases-part-3/).

    In the last post we looked at main memory changes (https://ratneshparihar.wordpress.com/2014/08/05/research-on-possibility-of-rdbms-to-have-performance-benchmarks-like-no-sql-databases-part-4/).

    Today we will look at multi-threading.

     

     


     

    OLTP transactions are very lightweight. For example, the heaviest transaction in TPC-C reads about 400 records. In a main memory environment, the useful work of such a transaction consumes less than one millisecond on a low-end machine. In addition, most OLTP environments we are familiar with do not have “user stalls”. 

    For example, when an Amazon user clicks “buy it”, he activates an OLTP transaction which will only report back to the user when it finishes. Because of an absence of disk operations and user stalls, the elapsed time of an OLTP transaction is minimal. In such a world it makes sense to run all SQL commands in a transaction to completion with a single threaded execution model, rather than paying for the overheads of isolation between concurrently executing statements. 

    Current RDBMSs have elaborate multi-threading systems to try to fully utilize CPU and disk resources. This allows several-to-many queries to be running in parallel. Moreover, they also have resource governors to limit the multiprogramming load, so that other resources (IP connections, file handles, main memory for sorting, etc.) do not become exhausted. These features are irrelevant in a single threaded execution model. 

    No resource governor is required in a single threaded system. In a single-threaded execution model, there is also no reason to have multi-threaded data structures. Hence the elaborate code required to support, for example, concurrent B-trees can be completely removed. This results in a more reliable system, and one with higher performance. 
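
    To make the single-threaded model concrete, here is a minimal sketch in Python (the transaction and table names are hypothetical, and the store is a plain in-memory dict). Each transaction is dequeued and run to completion before the next one starts, so no locks, latches or concurrent data structures are needed:

    ```python
    from collections import deque

    class SingleThreadedExecutor:
        """Runs queued transactions to completion, one at a time.

        Because only one transaction ever touches the data, plain dicts
        suffice -- no latches or concurrent B-trees are needed.
        """

        def __init__(self):
            self.tables = {"accounts": {}}   # in-memory store: table -> {key: value}
            self.queue = deque()             # pending transactions

        def submit(self, txn, *args):
            self.queue.append((txn, args))

        def run(self):
            while self.queue:
                txn, args = self.queue.popleft()
                txn(self.tables, *args)      # run to completion; no interleaving

    # A lightweight OLTP-style transaction: touches only a few records.
    def transfer(tables, src, dst, amount):
        accounts = tables["accounts"]
        accounts[src] = accounts.get(src, 0) - amount
        accounts[dst] = accounts.get(dst, 0) + amount

    executor = SingleThreadedExecutor()
    executor.submit(transfer, "alice", "bob", 50)
    executor.submit(transfer, "bob", "carol", 20)
    executor.run()
    print(executor.tables["accounts"])  # {'alice': -50, 'bob': 30, 'carol': 20}
    ```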

    At this point, one might ask “What about long-running commands?” In real-world OLTP systems, there aren’t any, for two reasons. First, operations that appear to involve long-running transactions, such as a user inputting data for a purchase on a web store, are usually split into several transactions to keep transaction time short. In other words, good application design will keep OLTP queries small. Second, longer-running ad-hoc queries are not processed by the OLTP system; instead such queries are directed to a data warehouse system, optimized for this activity.

    There is no reason for an OLTP system to solve a non-OLTP problem. Such thinking only applies in a “one size fits all” world.

     
  • ratneshparihar 4:13 am on August 5, 2014 Permalink | Reply  

    Research on possibility of RDBMS to have performance benchmarks like no-sql Databases (part 4) 

    We are continuing our discussion of the new OLTP design from the previous post.

    ( https://ratneshparihar.wordpress.com/2014/07/31/research-on-possibility-of-rdbms-to-have-performance-benchmarks-like-no-sql-databases-part-3/ )

    Why New OLTP Design Considerations?

    Currently the choice is limited when it comes to OLTP selection (MySQL, SQL Server, Oracle). These systems have been market leaders for decades, but we know that specialized databases are beating them by two or three orders of magnitude. New OLTP engines arrive on the market every month with new designs, claiming to beat the incumbents, and this will make our job harder when selecting an OLTP vendor. Performance alone, which is what the new OLTP vendors primarily claim, can never be the sole criterion. This is why we are researching a hypothetical OLTP system with a new design. At the end of this blog series we will create a cheat sheet that analyzes the current vendors against our new design expectations.

    Today we will look into main memory.

    Main Memory

    In the late 1970s a large machine had somewhere around a megabyte of main memory. Today, several Gbytes are common and large machines are approaching 100 Gbytes. In a few years a terabyte of main memory will not be unusual. Imagine a shared-nothing grid system of 20 nodes, each with 32 Gbytes of main memory now (soon to be 100 Gbytes), costing less than $50,000. As such, any database less than a terabyte in size is capable of main memory deployment now or in the near future.

    The overwhelming majority of OLTP databases are less than 1 Tbyte in size and growing in size quite slowly. For example, it is a telling statement that TPC-C requires about 100 Mbytes per physical distribution center (warehouse). A very large retail enterprise might have 1000 warehouses, requiring around 100 Gbytes of storage, which fits our envelope for main memory deployment.
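
    As a back-of-the-envelope check on those numbers (a sketch only; the per-warehouse figure is the approximate TPC-C value quoted above, and the grid is the 20-node configuration from the previous paragraph):

    ```python
    MB_PER_WAREHOUSE = 100       # approximate TPC-C footprint per warehouse
    warehouses = 1000            # a very large retail enterprise

    database_gb = MB_PER_WAREHOUSE * warehouses / 1000
    nodes, gb_per_node = 20, 32  # the shared-nothing grid described above
    cluster_gb = nodes * gb_per_node

    print(f"database: {database_gb:.0f} GB, cluster RAM: {cluster_gb} GB")
    # database: 100 GB, cluster RAM: 640 GB -> fits comfortably in main memory
    ```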

    As such, we believe that OLTP should be considered a main memory market, if not now then within a very small number of years. Consequently, the current RDBMS vendors have disk oriented solutions for a main memory problem. In summary, 30 years of Moore’s law has antiquated the disk-oriented relational architecture for OLTP applications.

    Although there are some main memory database products on the market, such as TimesTen and solidDB, these systems inherit the baggage of legacy RDBMSs as well. This includes such features as a disk-based recovery log and dynamic locking, which, as we discuss in the following sections, impose substantial performance overheads.

    In the next post we will look at multi-threading.

     
  • ratneshparihar 5:03 am on July 31, 2014 Permalink | Reply  

    Research on possibility of RDBMS to have performance benchmarks like no-sql Databases (part 3) 

    The following diagram represents how a futuristic OLTP design might look.

    New OLTP Design

     

    In the next posts we will discuss these components one by one. For now, refer to the diagram above, which is worth a thousand words.

     
  • ratneshparihar 5:13 am on July 29, 2014 Permalink | Reply  

    Research on possibility of RDBMS to have performance benchmarks like no-sql Databases (part 2) 

    There is clear evidence now that the major RDBMSs (row and column stores) will be outperformed by specialized engines, leaving RDBMSs with the business data processing (OLTP) and hybrid markets. But new OLTP systems are also booming, claiming to beat RDBMSs at OLTP as well. Most RDBMSs are 25-year-old legacy codebases that follow “one size fits all”; they should be retired, and completely new thinking has to go into the next generation of RDBMSs.

    Specialized DB

    In the previous post we looked at some transaction and processing bottlenecks in RDBMSs (https://ratneshparihar.wordpress.com/2014/07/28/research-on-possibility-of-rdbms-to-have-performance-benchmarks-like-no-sql-databases/).

    Today we will look at single-sited transactions.

    In many OLTP workloads, every table except a single one, called the root, has exactly one join term, which is an n-1 relationship to its ancestor. Hence, the schema is a tree of 1-n relationships. We denote this class of schemas as tree schemas. Such schemas are popular; for example, customers produce orders, which have line items and fulfillment schedules. Tree schemas have an obvious horizontal partitioning over the nodes in a grid. Specifically, the root table can be range- or hash-partitioned on its primary key(s). Every descendant table can be partitioned such that all equi-joins in the tree span only a single site.

    In a tree schema, suppose every command in every transaction class has equality predicates on the primary key(s) of the root node (for example, in an e-commerce application, many commands will be rooted at a specific customer, and so will include predicates like customer_id = 27). Using the horizontal partitioning discussed above, it is clear that in this case every SQL command in every transaction is local to one site. If, in addition, every command in a given transaction is limited to that same single site, the whole transaction is single-sited. This valuable property, that every transaction can be run to completion at a single site, is the backbone of the transaction manager in an RDBMS.
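
    To make the partitioning concrete, here is a minimal sketch in Python (the table and column names are hypothetical): hash the root table's primary key to pick a site, and co-partition every descendant table on the same key so that an entire transaction stays on one node.

    ```python
    NUM_SITES = 4

    def site_for(customer_id):
        """Hash-partition on the root table's primary key."""
        return hash(customer_id) % NUM_SITES

    # Each site holds a slice of the whole tree: customers, their orders,
    # and their line items live together because they share customer_id.
    sites = [{"customers": {}, "orders": {}, "line_items": {}}
             for _ in range(NUM_SITES)]

    def new_order(customer_id, order_id, items):
        """Single-sited transaction: every statement hits one site."""
        site = sites[site_for(customer_id)]
        site["orders"][order_id] = {"customer_id": customer_id}
        for n, item in enumerate(items):
            site["line_items"][(order_id, n)] = item

    new_order(customer_id=27, order_id=1, items=["book", "pen"])
    ```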

    Single-sited transactions will remain in any futuristic RDBMS, but to include distributed processing and memory grids some changes are needed, which we will address another day.

     
  • ratneshparihar 11:40 am on July 28, 2014 Permalink | Reply
    Tags: in memory databases, No-sql, RDBMS   

    Research on possibility of RDBMS to have performance benchmarks like no-sql Databases 


     
  • ratneshparihar 11:09 am on July 28, 2014 Permalink | Reply  

    Research on possibility of RDBMS to have performance benchmarks like no-sql Databases 

    What is research?

    Research is a complete reinvention of the wheel to solve a number of problems. A tweak or change in design that solves problems with few changes to the architecture is not research. For example, adding features to an existing compiler so a language can exploit multiple cores is feature addition/versioning, but writing a new compiler for multiple cores is research.

    In this blog series we will see what a futuristic RDBMS might look like if some of these bottlenecks can be handled.

    RDBMS

     

     

    Today we will look at some bottlenecks of RDBMSs related to transactions and processing.

    If one assumes a grid of systems with main memory storage, built-in high availability, no user stalls, and useful transaction work under 1 millisecond, then the following conclusions become evident:

    1) A persistent redo log is almost guaranteed to be a significant performance bottleneck. Even with group commit, forced writes of commit records can add milliseconds to the runtime of each transaction.

    2) With redo gone, getting transactions into and out of the system is likely to be the next significant bottleneck. The overhead of JDBC/ODBC style interfaces will be onerous, and something more efficient should be used. In particular, application logic – in the form of stored procedures – should run “in process” inside the database system, avoiding the inter-process overheads implied by the traditional database client/server model (see the sketch after this list).

    3) An undo log should be eliminated wherever practical, since it will also be a significant bottleneck.

    4) Every effort should be made to eliminate the cost of traditional dynamic locking for concurrency control, which will also be a bottleneck.

    5) The latching associated with multi-threaded data structures is likely to be onerous. Given the short runtime of transactions, moving to a single-threaded execution model will eliminate this overhead at little loss in performance.

    6) One should avoid a two-phase commit protocol for distributed transactions, wherever possible, as network latencies imposed by round trip communications in 2PC often take on the order of milliseconds.
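
    To illustrate point 2 above, here is a hedged sketch in Python (the names and latency figure are purely illustrative, not from any real driver) contrasting the per-statement round trips of a client/server conversation with a single in-process stored-procedure call:

    ```python
    # Illustrative per-round-trip cost; real network/IPC latencies vary.
    ROUND_TRIP_MS = 1.0

    def client_server_new_order():
        """JDBC/ODBC style: every statement crosses the client/server boundary."""
        statements = ["BEGIN", "INSERT INTO orders ...",
                      "INSERT INTO line_items ...", "COMMIT"]
        return len(statements) * ROUND_TRIP_MS   # 4 round trips

    def stored_proc_new_order():
        """Stored-procedure style: one request; all statements run in process."""
        return 1 * ROUND_TRIP_MS                 # 1 round trip

    print(client_server_new_order(), stored_proc_new_order())  # 4.0 1.0
    ```

    With sub-millisecond transactions, those extra round trips dominate total latency, which is exactly why running the logic in process matters.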

    Our ability to remove concurrency control, commit processing and undo logging depends on several characteristics of OLTP schemas and transaction workloads, a topic we will cover in the next post.

     
    • ratneshparihar 8:24 am on August 6, 2014 Permalink | Reply

      Got a comment:
      http://www.sqlservercentral.com/Forums/Topic1599687-373-1.aspx#bm1599997
      ————————————————————————————————————————-
      The 6 points raised in the first post of the series are all a bit boring/mundane and don’t, for me, address the title (which in any case is ambiguous) at all.

      Taking the title first, it’s not clear whether you are looking at the possibility of benchmark results for RDBMS similar to those obtained for no-sql databases, or at benchmarks of similar workloads on no-sql databases and on RDBMS.

      I’ll comment on your 6 numbered points in order:

      1) Abandoning a persistent redo log means abandoning the ability to recover from system breaks, unless it means insisting that when and where a log-based system would secure redo data in the log, the system secures it in the main persistent data store. The performance penalty of securing data in the main persistent data store at an early enough point to provide an acceptable recovery capability is far greater than the penalty of securing the data in a redo log, unless the recovery requirement is extremely weak.

      2) The relational model makes no statement about what business logic can be included in the database, other than that those business rules which can be used to determine normality of schemas must be in the database. There are of course people who claim that no business rule should ever be in the database, which of course is nonsense because that would mean that no normalisation could ever be carried out. There are others who claim that no business logic other than that implemented as normalisation (in the form of keys and constraints) should ever be in the database, but that is not an RDBMS vs No-Sql distinction; the same people who make that claim about RDBMS make it also about no-sql databases. It will often be nonsense for RDBMS, since a single SQL statement carrying out a MERGE, a PIVOT, a couple of joins and a projection or two actually embodies a large amount of business logic, and the proponents of “no business logic in the database” are careful to stress that this business logic doesn’t count as business logic lest they be seen as totally inept. Principles of modular design tend to separate logic in such a way as to maximise the narrowness of inter-component interfaces, and also to ensure that components which run in different contexts are sufficiently large that the context switching overhead is not unacceptably great, and modular design is equally important in no-sql systems and in RDBMS. So I find it hard to believe that point 2 says anything at all about the difference between RDBMS and No-Sql.

      3) An undo log can be eliminated only when it is guaranteed that the system never fails, unless it is acceptable for part of a transaction to be done and the rest not; moving money from one account to another is a nice example of where this is usually not acceptable. The RDBMS view is that it is so rare for this to be acceptable that the simplicity of always having the undo log, rather than sometimes having it and sometimes not, is better than the complication of not having it in the rare cases where it is not needed (for example where no transaction ever updates more than one value in the database). A different argument can be made for writing the undo log as rarely as possible – that nothing should be written to the undo log until it is necessary to write data which may need redoing to permanent store; but that’s a considerably weaker position than your point 3, and it’s still possible that on the whole it is better to keep things simple and always log undo information.

      4) Sql Server currently provides a range of isolation levels, not all of which use traditional locking. Depending on the workload, traditional locking may deliver better or worse performance than maintaining multiple consistent views. The idea that traditional locking always costs more than the alternatives is pure nonsense.

      5) Single threaded execution will only deliver decent performance when all IO latencies are very low indeed. Asynchronous IO at some level is necessary for most workloads to run efficiently on most hardware, and asynchronous IO won’t deliver the required concurrency unless it is used to assist multi-threading. This is true whether there is a formal RDBMS or No-Sql database or just a bunch of ad-hoc files with no formal database concept.

      6) “Two-phase commit should be avoided where possible” is an overstatement; it should not be avoided when it is more efficient and/or more acceptable than the alternatives. It is always possible to avoid it, although it may or may not be efficient or acceptable to do so (for example the system could be made to hold all data at one central location, so that there would be no requirement at all for two-phase commit).

      Tom
      ————————————————————————————————————————-


      • ratneshparihar 8:55 am on August 6, 2014 Permalink | Reply

        Thank you, Tom, for providing feedback.
        Yes, the title is a little out of context, and I put it that way intentionally (it backfired, though), because I keep hearing that it will be difficult for an RDBMS to achieve the web scale that no-sql DBs are providing, but it is difficult to understand why they can’t. There are now vendors in the OLTP market claiming to have exactly that, and in order to evaluate them we have to understand what a futuristic RDBMS could handle if modern hardware (mostly memory and CPU) is cheap.

        Simply put, I am not benchmarking anything here, just putting down some thoughts on what criteria will help us decide on the next OLTP system for our future applications. SQL Server and Oracle are written on 30-year-old architectures, and a complete redesign is required if they are to compete with modern OLTP systems.

        I completely agree with your feedback on all 6 points as far as the current state of RDBMSs is concerned. But if you visualize an OLTP system running entirely in main memory, with non-OLTP work handled by a (columnar) data warehouse, then all transactions will be very small, all processing will indeed occur in main memory, there will be no need for file handles to access data pages or log files, and the whole database will be a cluster of multiple PCs.

        If you consider the above scenario and look again at the 6 points I mentioned, they might make sense. Since both distributed processing and memory are getting cheap and feasible, the whole ecosystem around the RDBMS is likely to change, and vendors have already started to build systems without long-lasting transaction logs, two-phase commits and, of course, locking.

