Our Blog

Pentaho Reporting Export Options

Jagdeesh SS

This is part of a series of blogs that provide quick tips and tricks for better usage of Pentaho. We start with this piece on improving your Pentaho reporting export options. Have a look at the screenshot below.

Pentaho Reporting

This is a snapshot of a PRPTI report exported to Excel. As you can see, the date and timestamp reserve an entire row before the column headings begin. Some consumers of the report may not want this display, both to economize screen real estate and to keep attention focused on the data itself. Now have a look at the screenshot below.

Pentaho Reporting

As you can see, the page count consumes an entire row of the display, which is again an inefficient use of real estate. What if you do not want the date, timestamp and page count displayed at all?

Pentaho Reporting


PDI best practices – Why avoid insert/update step

Sandeep Kemparaju

Consider a transformation designed to use the Insert/Update step to perform the data load on the target table. The Pentaho Kettle Insert/Update step works as follows.

Let us take an example of loading a target table. Assume there is a daily load of 100k records into a target table holding 10 million records, and every incoming row from the source is looked up against all 10 million records in the target table. This process repeats for each of the 100k input records.
Pentaho Kettle
The Insert/Update step involves multiple round trips to the database, the number depending on the commit size.

The performance of most steps depends heavily on the number of round trips and the speed of each round trip. That speed is a combination of network throughput, network latency and database performance.
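As a rough illustration (this is not Kettle itself, just a hypothetical loader sketched against an in-memory SQLite database), the commit size directly controls how many commits, and therefore how many round trips, a load generates:

```python
import sqlite3

def load(rows, commit_size):
    """Insert rows, committing every `commit_size` rows.

    Returns the number of commits issued -- a stand-in for the
    round trips that the commit-size setting causes.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT)")
    commits = 0
    for i, row in enumerate(rows, start=1):
        conn.execute("INSERT INTO target VALUES (?, ?)", row)
        if i % commit_size == 0:
            conn.commit()
            commits += 1
    conn.commit()  # flush any remaining uncommitted rows
    commits += 1
    return commits

rows = [(i, f"v{i}") for i in range(1000)]
# A commit size of 1 issues a commit per row; a commit size of
# 100 cuts the number of commits by roughly two orders of magnitude.
few = load(rows, commit_size=100)
many = load(rows, commit_size=1)
```

The same principle applies to any database: larger commit sizes amortize the per-commit round trip over more rows, at the cost of longer-held transactions.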

Find below the wiki link explaining the Insert/Update step in Pentaho Kettle.


The lookup checks for a matching record (to update) and, if none matches, performs an insert. Because this extra lookup runs against the entire set of target records for every incoming row, it slows the process down considerably.
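The per-row logic described above can be sketched as follows. This is a minimal stand-in using SQLite, not the actual step implementation; the point is that every incoming row costs one lookup query plus one write:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO target VALUES (?, ?)", [(1, "a"), (2, "b")])

def insert_update(conn, key, val):
    """Per-row behaviour of an insert/update style load: look the
    key up in the target, update on a match, otherwise insert.
    One lookup plus one write per row is the source of the slowdown."""
    cur = conn.execute("SELECT 1 FROM target WHERE id = ?", (key,))
    if cur.fetchone():
        conn.execute("UPDATE target SET val = ? WHERE id = ?", (val, key))
    else:
        conn.execute("INSERT INTO target VALUES (?, ?)", (key, val))

for key, val in [(2, "b2"), (3, "c")]:  # incoming source rows
    insert_update(conn, key, val)
conn.commit()
```

With 100k incoming rows against a 10-million-row target, that lookup query fires 100k times, which is exactly the cost the rest of this post sets out to avoid.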

This step is also slower than the regular “Table Output” step.

There is very little one can do about network latency, so reducing the number of round trips to the database is the first thing to consider. This can be accomplished by loading the lookup data into memory (a cache).

In Pentaho Kettle, most lookup steps have options to cache data; “Stream Lookup” and “Merge Join” are two such steps.
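The caching idea can be sketched in a few lines (again a hypothetical SQLite stand-in, not Kettle code): load the target keys into memory with one query, route incoming rows to insert or update lists locally, then write each list with a single batched statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO target VALUES (?, ?)", [(1, "a"), (2, "b")])

# One round trip loads the lookup data into an in-memory cache ...
existing = {row[0] for row in conn.execute("SELECT id FROM target")}

incoming = [(2, "b2"), (3, "c")]

# ... so each row is classified locally, with no per-row query.
updates = [(val, key) for key, val in incoming if key in existing]
inserts = [(key, val) for key, val in incoming if key not in existing]

# The writes then go out as two batched statements.
conn.executemany("UPDATE target SET val = ? WHERE id = ?", updates)
conn.executemany("INSERT INTO target VALUES (?, ?)", inserts)
conn.commit()
```

The trade-off is memory: the cache must hold the target keys, which is why this works best when the keys (not the full rows) fit comfortably in RAM.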

In the above scenario, design the transformation with the “Merge Join” step so the comparison is performed in a single pass. This minimizes the number of times the data in the source and target tables is processed, which means lower memory consumption and better performance. Note that both inputs must be sorted on the join keys for the “Merge Join” step to work.

Find below the wiki link explaining the Merge Join step.

