Categories: Hints and Tips, SQL, SQL Server

A SQL Tip … In-Line UNION for Previous “n” Months

Here’s an interesting requirement:

“A list of Business Units with the period (YYYYMM) and the monthname (e.g. October) against each one for the prior 3 months”.

A simple problem, but here’s my solution:

SELECT S.SETCNTRLVALUE AS [BUSINESS_UNIT],
       CONVERT(VARCHAR(6), PREVIOUS_MONTHS.MYDATE, 112) AS [YYYYMM],
       DATENAME(MONTH, PREVIOUS_MONTHS.MYDATE) AS [MONTH_NAME]
FROM PS_SET_CNTRL_TBL S,
     ( SELECT DATEADD(MONTH, -1, GETDATE()) AS [MYDATE]
       UNION ALL
       SELECT DATEADD(MONTH, -2, GETDATE())
       UNION ALL
       SELECT DATEADD(MONTH, -3, GETDATE())
     ) PREVIOUS_MONTHS
ORDER BY 1, 2

Of course, the in-line inner UNION ALL could be created as a view if it was going to be reused, but it suffices for this example. Key points to note:

  1. The use of UNION ALL to eliminate the SORT you get with UNION – we know the values are unique but the optimizer doesn’t (it probably should though)!
  2. Perform the formatting functions on the date value (CONVERT, DATENAME) in the outer query rather than creating multiple columns in the in-line SQL.
  3. Yes this is a cartesian product but that is perfectly acceptable when you know your data.
  4. This example is for the prior 3 months with reference to GETDATE() – extending to prior “n” months and a different reference date is trivial (see the sketch after this list). But be careful which table you use to get the reference date. Typically you might want the “n” prior months before (say) the MAX() date on some transactional table – if so, make sure you have an index to make finding that value as quick as possible.
  5. This is SQL Server syntax – on Oracle you would have to SELECT from DUAL etc.
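If you do need the prior “n” months, a recursive CTE avoids writing out one UNION ALL branch per month. This is just a sketch – @n and @ref_date are illustrative names, not from the original query:

-- Generate the prior @n months from a reference date (assumes @n >= 1;
-- the default MAXRECURSION limit of 100 is plenty for this purpose).
DECLARE @n INT = 6;
DECLARE @ref_date DATETIME = GETDATE();

WITH PREVIOUS_MONTHS (MONTH_OFFSET, MYDATE) AS
( SELECT 1, DATEADD(MONTH, -1, @ref_date)
  UNION ALL
  SELECT MONTH_OFFSET + 1, DATEADD(MONTH, -(MONTH_OFFSET + 1), @ref_date)
  FROM PREVIOUS_MONTHS
  WHERE MONTH_OFFSET < @n
)
SELECT S.SETCNTRLVALUE AS [BUSINESS_UNIT],
       CONVERT(VARCHAR(6), PM.MYDATE, 112) AS [YYYYMM],
       DATENAME(MONTH, PM.MYDATE) AS [MONTH_NAME]
FROM PS_SET_CNTRL_TBL S,
     PREVIOUS_MONTHS PM
ORDER BY 1, 2;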

Enjoy.

Categories: Peoplesoft, PeopleTools, Performance, SQL Server, Tools, Tuning, Utilities

_WA_Sys Stats or Hand Crafted?

SQL Server generates histogram statistics automatically (when the AUTO_CREATE_STATISTICS database option is on, as it is by default) whenever a column is referenced in a predicate and is not already the leading column of an existing histogram statistic. By design, indexes create histogram statistics automatically, so any column that is the leading column of an index will already have histogram statistics.

The autogenerated statistics are prefixed _WA_Sys_ followed by:

  • the column name (in older SQL Server versions) or the column number in hex (for SQL Server 2008 onwards) e.g. EMPLID or 00000001
  • the object_id of the table in hex e.g. 57490C28

separated by an underscore. So auto-generated statistics look like this:

  • _WA_Sys_DESCR_57490C28 or _WA_Sys_00000013_57490C28

The issue for me as a PeopleSoft Administrator/DBA is whether I should keep these statistics or recreate them as “user” statistics with more sensible names. Oddly enough, the “old style” naming convention, which included the column name, was actually better than the “hex” version we have now. After all, what is column 00000014 (decimal 20) in table X?

My personal preference is to process the system-generated statistics and recreate them as “user” statistics named after the column on each table. I have a Perl script that generates the relevant DROP/CREATE STATISTICS commands for an entire PeopleSoft database. The output looks like this:

--- Remove statistic _WA_Sys_AUDITRECNAME_5921A398 for column AUDITRECNAME on table PSRECDEFN
DROP STATISTICS [dbo].[PSRECDEFN].[_WA_Sys_AUDITRECNAME_5921A398];

--- Create User Statistic for column AUDITRECNAME (+ 1 keys) on table PSRECDEFN
CREATE STATISTICS [AUDITRECNAME] ON [dbo].[PSRECDEFN] ([AUDITRECNAME],[RECNAME]) WITH FULLSCAN;

Why do I like to do this? Well, there are a number of reasons:

  1. The naming convention is clearer to me – open the statistics tree within a table in object explorer and you get a list of column names plus the statistics created for any indexes.
  2. Once I have changed to “user” statistics I can query for any new _WA_Sys_ statistics (yes, I leave auto-create switched on) – see the query sketch after this list. This shows me any “new” query activity that accesses via previously unused predicate columns, which can help in deciding whether new indexes are required.
  3. I can ensure the existing statistics are recreated using WITH FULLSCAN, not just sampled data. For many PeopleSoft systems this is essential – data can often be very skewed and sampling can miss this on occasion.
  4. I like to add in the “key” columns of the table to the histogram to provide better density information for the optimizer.  This can be essential information when intermediate key fields have a single value e.g. EMPL_RCD in HR is typically 0 in most cases.
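Here is a minimal sketch of the kind of query I mean for point 2 – it lists any auto-created _WA_Sys_ statistics together with their table and column, using only the standard catalog views:

SELECT OBJECT_NAME(s.object_id) AS table_name,
       s.name AS stat_name,
       c.name AS column_name
FROM sys.stats s
JOIN sys.stats_columns sc
  ON sc.object_id = s.object_id AND sc.stats_id = s.stats_id
JOIN sys.columns c
  ON c.object_id = sc.object_id AND c.column_id = sc.column_id
WHERE s.auto_created = 1
  AND s.name LIKE '\_WA\_Sys\_%' ESCAPE '\'
ORDER BY table_name, stat_name;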

If you would like to discuss this in more detail, leave a comment.

Categories: Hints and Tips, SQL, SQL Server

GETDATE()-1

Ever noticed some “odd” predicates in execution plans using GETDATE()-n where “n” is the number of days? Things like:

SOME_DATETIME_FIELD >= GETDATE()-'1900-01-02 00:00:00.000'

It may look a bit odd, but it does make sense when you think about the data type precedence rules in SQL Server. In this case, as GETDATE() is a DATETIME, there is an implicit conversion of the “n” into a DATETIME. So, if we run:

SELECT CAST(1 AS DATETIME)

what do we get? ‘1900-01-02 00:00:00.000’

And yes, “2” gives ‘1900-01-03 00:00:00.000’ as you might expect.

As the subtraction of the two values is really just numeric subtraction on the underlying day count, GETDATE()-1 does work to give you “yesterday”. It’s just not quite as clear (in my view) as using:

DATEADD(day,-1,GETDATE())
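A quick sanity check, just as a sketch – both columns should show the same value, one day before now (GETDATE() is evaluated once per statement):

SELECT GETDATE() - 1 AS via_implicit_conversion,
       DATEADD(DAY, -1, GETDATE()) AS via_dateadd;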

Categories: Hints and Tips, Oracle, SQL Server, Tuning

Oracle vs SQL Server Terminology

Full Table Scan == Clustered Index Scan

In Oracle we are generally taught that a Full Table Scan is a bad thing. Whilst this isn’t always true (it depends on so many factors), the terminology used in Oracle makes it sound bad. With SQL Server we have the concept of a Clustered Index (which is essentially just the table itself), and the equivalent operation we see in the query plan is a “Clustered Index Scan”. This sounds almost like a good thing to those from an Oracle background. But it isn’t. It’s a full table scan. 🙂

Categories: Peoplesoft, PeopleTools, SQL, SQL Server, Tuning

The hidden cost of CONVERT_IMPLICIT() …

There are numerous articles on the web about CONVERT_IMPLICIT() and how it can impact the performance of queries in SQL Server. Recently I had cause to look at a SQL statement that referenced tables in two databases on the same SQL Server instance but running on different PeopleTools releases. As one system is on PeopleTools 8.49 and the other is on 8.53, there is an underlying difference in how character table columns are created – 8.49 uses CHAR(), 8.53 uses NVARCHAR().

However, when you join tables from the 8.49 system to the 8.53 system on a character column such as EMPLID, SQL Server data type precedence rules will silently add a CONVERT_IMPLICIT() around the 8.49 CHAR() column to “upgrade it” to an NVARCHAR(). Therein lies a performance issue – with large tables on the 8.49 side we see significant spills to TempDB and even TempDB exhaustion in the worst case. Execution plans are adversely affected as a result of the loss of cardinality information as well.

The fix? CAST( … AS CHAR()) around the 8.53 column, e.g.

WHERE CAST(A.EMPLID AS CHAR(11)) = B.EMPLID
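To make the shape of the problem concrete, here is a hypothetical sketch of the cross-database join – the database names (HR853, HR849) and tables are illustrative, not from the statement I was tuning:

-- A.EMPLID is NVARCHAR(11) on the 8.53 side; B.EMPLID is CHAR(11) on 8.49.
-- Without the explicit CAST, data type precedence makes SQL Server wrap
-- CONVERT_IMPLICIT(nvarchar(11), B.EMPLID) around the 8.49 column instead.
SELECT A.EMPLID, B.NAME
FROM HR853.dbo.PS_JOB A,
     HR849.dbo.PS_NAMES B
WHERE CAST(A.EMPLID AS CHAR(11)) = B.EMPLID;

Note the direction of the explicit CAST: converting the NVARCHAR side down to CHAR leaves the (large) 8.49 column untouched, so its statistics and indexes remain usable – safe here because EMPLID values are plain ASCII.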

The moral of the story, of course, is to always check your data types – especially if you are joining pre-8.5x and post-8.5x PeopleTools versions.

This is, of course, a generic issue. The Case for Specifying Data Types and Query Performance is an excellent article on the problems you might encounter using .Net or ORM database abstraction layers if you fail to specify your data types correctly.

Categories: SQL Server, Trace Flags

Trace Flag 2371

This trace flag (available in SQL Server 2008 R2 SP1 and later) alters the threshold calculation for automatic statistics updates so that larger tables do not require as many row modifications before statistics are updated, i.e. the old “20% + 500 rows” threshold no longer applies to larger tables.

Personally, I would always set this as a start-up parameter on instances hosting PeopleSoft databases as, in general, those systems have a small number of very large tables which typically also have significant columns with skewed data distributions.
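If you would rather try it before committing to a start-up parameter, it can also be enabled globally on a running instance – a quick sketch (run as sysadmin):

-- Enable trace flag 2371 for the whole instance and confirm its status.
-- For permanence, add -T2371 to the start-up parameters instead.
DBCC TRACEON (2371, -1);
DBCC TRACESTATUS (2371);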

Read more here

Categories: Databases, Peoplesoft, PeopleTools, SQL Server

PeopleSoft Integers

PeopleSoft allows you to create integer fields by specifying a number field with a given number of “Integer Positions” and zero “Decimal Positions”. Application Designer maps these definitions at table-create time to the underlying integer data types in the database. The data type the underlying table column gets depends on the number of integer positions specified. For example, on SQL Server any integer field of up to 5 digits will map to a SMALLINT in the database. Integers with 6 to 10 digits will map to an INT. Beyond that, a DECIMAL column is used.

This introduces some interesting “features” of these fields:

  • If you define a two digit unsigned integer in Application Designer, then PeopleSoft will limit the input values to 00 through 99. But the column definition in the database will allow positive and negative values in the range -32,768 to +32,767. Any attempt to insert values outside this range will elicit an Arithmetic Overflow error from the database, as you would expect (see the sketch after this list).
  • As the database can accept values above the range that Peoplesoft allows, inserts into the tables using SQL in an Application Engine could potentially create values that would not display correctly within the application itself.
  • A 5 digit integer in PeopleSoft can have values up to 99999 – but the database will error above +32,767. A nice little “gotcha” there … 🙂
  • More interestingly, if you decided to change a two digit unsigned integer field to (say) a 5 digit integer you would not need to alter the underlying table. In fact, Application Designer would not even generate a script for you as there would be nothing to change – 1 to 5 digit integers all map to a SMALLINT. If you increased the field to 6 or more integer positions then a change would be required, as the column would be mapped to an INT or even a DECIMAL in the database.
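A quick sketch of that overflow behaviour at the database level – MY_TBL and MY_INT are illustrative names, not PeopleSoft objects:

-- A two digit PeopleSoft integer arrives in the database as SMALLINT.
CREATE TABLE MY_TBL (MY_INT SMALLINT NOT NULL);
INSERT INTO MY_TBL VALUES (-12345); -- succeeds: within SMALLINT range, though PeopleSoft would never allow it
INSERT INTO MY_TBL VALUES (40000);  -- fails: Msg 220, arithmetic overflow for data type smallint
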
Categories: Configuration, Peoplesoft, Performance, SQL Server

Trace Flag 4136 and PeopleSoft

Trace flag 4136 effectively adds an OPTION(OPTIMIZE FOR UNKNOWN) to all SQL statements. This has the effect of making the optimizer ignore histogram statistics in generating execution plans.

If you have extremely skewed data distributions this can actually result in a much better execution plan – it essentially gives the impression of more “consistent” performance by ensuring that you don’t end up with an execution plan suited to only one subset of your data. But it will not necessarily be the best execution plan you could achieve.
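For comparison, this is the per-statement equivalent of what the trace flag does instance-wide – a sketch only, with an illustrative procedure and table:

-- Inside a procedure, @emplid would normally be sniffed at compile time;
-- the hint tells the optimizer to use average density instead of the
-- histogram, which is what TF 4136 does for every statement.
CREATE PROCEDURE DEMO_OPT_UNKNOWN @emplid CHAR(11)
AS
SELECT EMPLID, EFFDT
FROM PS_JOB
WHERE EMPLID = @emplid
OPTION (OPTIMIZE FOR UNKNOWN);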

Categories: Peoplesoft, SQL Server

Read Committed Snapshot and PeopleSoft

  1. Turn it on.
  2. And if you decide to create a database manually and not via the PeopleSoft scripts – remember to turn it on (the command is sketched below).
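For reference, a sketch of turning it on – HRPRD is an illustrative database name:

-- Enable Read Committed Snapshot Isolation; this needs exclusive access
-- to the database, hence the ROLLBACK IMMEDIATE clause.
ALTER DATABASE HRPRD SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;

-- Confirm the setting.
SELECT name, is_read_committed_snapshot_on FROM sys.databases WHERE name = 'HRPRD';
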
Categories: Peoplesoft, SQL Server, Tuning

Trace Flag 4199 and PeopleSoft

Whilst I have a general rule not to use SQL Server trace flags unless I actually have the underlying issue, there is one exception:

Trace flag 4199

It switches on all the Query Optimizer (QO) fixes and saves you turning on the numerous individual QO fixes one by one. This trace flag is very well documented on the web (e.g. at Microsoft) and many people consider it essential for their systems. I do too.
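If you want to test its effect on a single statement before enabling it instance-wide, the flag can also be scoped per query – a sketch, with an illustrative query (QUERYTRACEON requires sysadmin membership):

-- QUERYTRACEON applies a trace flag to this statement only.
SELECT EMPLID, EFFDT
FROM PS_JOB
OPTION (QUERYTRACEON 4199);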

One of the interesting comments in the above Microsoft article (last updated in 2012, but present since the original in 2010) is the statement that future query optimizer fixes will not have individual trace flags of their own. So you are pretty much forced to use it in preference to individual trace flags going forward.

PeopleSoft and TF 4199

In my experience, TF 4199 is essential for PeopleSoft systems. Some of the early 2005/2008 fixes covered by TF 4199 are fundamental to good query execution plans in many areas of PeopleSoft. In particular, the fixes related to Fast Forward Read Only cursors are especially relevant as PeopleSoft makes extensive use of them.

Note: Trace flag 4199 is not a panacea for all your PeopleSoft performance problems – I have seen it slow down some SQL statements as well. But typically the affected statements were user queries that were badly designed in the first place 🙂