SQL Knowledge For Software Testers
Basics of the SELECT Statement
In a relational database, data is stored in tables. An example table would relate Social Security Number, Name, and Address:
|SSN||First Name||Last Name||Address||City||State|
|512687458||Joe||Smith||83 First Street||Howard||Ohio|
|758420012||Mary||Scott||842 Vine Ave.||Losantiville||Ohio|
|102254896||Sam||Jones||33 Elm St.||Paris||New York|
|876512563||Sarah||Ackerman||440 U.S. 110||Upton||Michigan|
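Now, let’s say you want to see the address of each employee. Use the SELECT statement, naming the columns you want and the table to read them from (the table name EmployeeAddressTable is assumed here), like so:
SELECT FirstName, LastName, Address, City, State
FROM EmployeeAddressTable;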
The following are the results of your query of the database:
|First Name||Last Name||Address||City||State|
|Joe||Smith||83 First Street||Howard||Ohio|
|Mary||Scott||842 Vine Ave.||Losantiville||Ohio|
|Sam||Jones||33 Elm St.||Paris||New York|
|Sarah||Ackerman||440 U.S. 110||Upton||Michigan|
To get all columns of a table without typing all column names, use:
SELECT * FROM TableName;
Note: Each database management system (DBMS) has different methods for logging in to the database and entering SQL commands.
To further discuss the SELECT statement, the examples below use a second table, EmployeeStatisticsTable, with the columns EMPLOYEEIDNO, SALARY, BENEFITS, and POSITION.
a) Relational Operators
There are six Relational Operators in SQL, and after introducing them, we’ll see how they’re used:
|=||Equal|
|<> or !=||Not Equal|
|<||Less Than|
|>||Greater Than|
|<=||Less Than or Equal To|
|>=||Greater Than or Equal To|
The WHERE clause is used to specify that only certain rows of the table are displayed, based on the criteria described in that WHERE clause.
If you wanted to see the EMPLOYEEIDNO’s of those making at or over $50,000, use the following:
SELECT EMPLOYEEIDNO FROM EMPLOYEESTATISTICSTABLE WHERE SALARY >= 50000;
Notice that the >= (greater than or equal to) operator is used, because we want those who make more than $50,000 and those who make exactly $50,000 listed together. The query displays one EMPLOYEEIDNO for each matching row.
The WHERE description, SALARY >= 50000, is known as a condition (an operation which evaluates to True or False). The same can be done for text columns:
SELECT EMPLOYEEIDNO FROM EMPLOYEESTATISTICSTABLE
WHERE POSITION = 'Manager';
This displays the ID Numbers of all Managers.
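As a minimal sketch of the two WHERE queries above, the following runs them with Python's built-in sqlite3 module against an in-memory database. The rows are hypothetical sample data, not the tutorial's original table.

```python
import sqlite3

# In-memory database with hypothetical rows (not the tutorial's original data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEESTATISTICSTABLE "
    "(EMPLOYEEIDNO INTEGER, SALARY INTEGER, BENEFITS INTEGER, POSITION TEXT)"
)
conn.executemany(
    "INSERT INTO EMPLOYEESTATISTICSTABLE VALUES (?, ?, ?, ?)",
    [
        (10, 75000, 15000, "Manager"),
        (244, 50000, 12000, "Staff"),
        (300, 45000, 10000, "Staff"),
        (400, 32000, 7500, "Entry-Level"),
    ],
)

# Numeric condition: salary at or over $50,000.
high_paid = [row[0] for row in conn.execute(
    "SELECT EMPLOYEEIDNO FROM EMPLOYEESTATISTICSTABLE "
    "WHERE SALARY >= 50000 ORDER BY EMPLOYEEIDNO"
)]

# Text condition: the string literal uses straight single quotes.
managers = [row[0] for row in conn.execute(
    "SELECT EMPLOYEEIDNO FROM EMPLOYEESTATISTICSTABLE WHERE POSITION = 'Manager'"
)]

print(high_paid)  # [10, 244]
print(managers)   # [10]
```

The employee with a salary of exactly 50000 is included because >= is used rather than >.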
More Complex Conditions: Compound Conditions / Logical Operators
The AND operator joins two or more conditions, and displays a row only if that row’s data satisfies ALL conditions listed (i.e. all conditions hold true).
For example, to display all staff making over $40,000, use:
SELECT EMPLOYEEIDNO FROM EMPLOYEESTATISTICSTABLE
WHERE SALARY > 40000 AND POSITION = 'Staff';
The OR operator joins two or more conditions, but returns a row if ANY of the conditions listed hold true.
To see all those who make less than $40,000 or have less than $10,000 in benefits, listed together, use the following query:
SELECT EMPLOYEEIDNO FROM EMPLOYEESTATISTICSTABLE
WHERE SALARY < 40000 OR BENEFITS < 10000;
AND and OR can be combined, for example:
SELECT EMPLOYEEIDNO FROM EMPLOYEESTATISTICSTABLE
WHERE POSITION = 'Manager' AND SALARY > 60000 OR BENEFITS > 12000;
Because AND is evaluated before OR, SQL first finds the rows where the position is Manager and the salary is over $60,000. It then adds every row where the Benefits column is over $12,000, regardless of position or salary, because OR includes a row if either of its conditions is True. To override this order, group conditions with parentheses; for example, POSITION = 'Manager' AND (SALARY > 60000 OR BENEFITS > 12000) lists only Managers who meet either the salary or the benefits condition.
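The effect of evaluating AND before OR can be checked with a small experiment. This is a sketch using Python's sqlite3 module and four hypothetical rows; the helper function ids is an illustration, not part of the tutorial.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEESTATISTICSTABLE "
    "(EMPLOYEEIDNO INTEGER, SALARY INTEGER, BENEFITS INTEGER, POSITION TEXT)"
)
# Hypothetical rows chosen so the two groupings give different answers.
conn.executemany(
    "INSERT INTO EMPLOYEESTATISTICSTABLE VALUES (?, ?, ?, ?)",
    [
        (1, 70000, 15000, "Manager"),
        (2, 55000, 10000, "Manager"),
        (3, 45000, 13000, "Staff"),
        (4, 40000, 9000, "Staff"),
    ],
)

def ids(where):
    """Return the sorted employee IDs matching a WHERE condition."""
    sql = ("SELECT EMPLOYEEIDNO FROM EMPLOYEESTATISTICSTABLE "
           "WHERE " + where + " ORDER BY EMPLOYEEIDNO")
    return [row[0] for row in conn.execute(sql)]

# AND binds tighter than OR: (Manager AND salary > 60000) OR benefits > 12000.
unparenthesized = ids("POSITION = 'Manager' AND SALARY > 60000 OR BENEFITS > 12000")

# Parentheses force the OR to be evaluated first.
parenthesized = ids("POSITION = 'Manager' AND (SALARY > 60000 OR BENEFITS > 12000)")

print(unparenthesized)  # [1, 3]
print(parenthesized)    # [1]
```

Employee 3 is not a Manager but appears in the first result because the Benefits condition alone satisfies the OR.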
IN & BETWEEN
An easier method of using compound conditions uses IN or BETWEEN.
For example, if you wanted to list all managers and staff:
SELECT EMPLOYEEIDNO FROM EMPLOYEESTATISTICSTABLE
WHERE POSITION IN ('Manager', 'Staff');
or to list those making greater than or equal to $30,000, but less than or equal to $50,000, use:
SELECT EMPLOYEEIDNO FROM EMPLOYEESTATISTICSTABLE
WHERE SALARY BETWEEN 30000 AND 50000;
To list everyone not in this range, try:
SELECT EMPLOYEEIDNO FROM EMPLOYEESTATISTICSTABLE
WHERE SALARY NOT BETWEEN 30000 AND 50000;
Similarly, NOT IN lists all rows excluded from the IN list.
Additionally, NOT can be combined with AND and OR. NOT is a unary operator: it evaluates one condition and reverses its value, whereas AND and OR each combine two conditions. All NOTs are performed before any ANDs or ORs.
SQL Order of Logical Operations (each operates from left to right): NOT first, then AND, then OR.
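The IN, BETWEEN, and NOT forms described above can be tried side by side. The sketch below uses Python's sqlite3 module with hypothetical rows; ids is an illustrative helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEESTATISTICSTABLE "
    "(EMPLOYEEIDNO INTEGER, SALARY INTEGER, POSITION TEXT)"
)
# Hypothetical rows; employee 4 sits exactly on the lower BETWEEN bound.
conn.executemany(
    "INSERT INTO EMPLOYEESTATISTICSTABLE VALUES (?, ?, ?)",
    [
        (1, 60000, "Manager"),
        (2, 45000, "Staff"),
        (3, 28000, "Entry-Level"),
        (4, 30000, "Staff"),
    ],
)

def ids(where):
    """Return the sorted employee IDs matching a WHERE condition."""
    sql = ("SELECT EMPLOYEEIDNO FROM EMPLOYEESTATISTICSTABLE "
           "WHERE " + where + " ORDER BY EMPLOYEEIDNO")
    return [row[0] for row in conn.execute(sql)]

in_list = ids("POSITION IN ('Manager', 'Staff')")
not_in_list = ids("POSITION NOT IN ('Manager', 'Staff')")
in_range = ids("SALARY BETWEEN 30000 AND 50000")       # both bounds included
out_of_range = ids("SALARY NOT BETWEEN 30000 AND 50000")

# NOT applies to the single condition that follows it, before AND is evaluated.
not_then_and = ids("NOT POSITION = 'Staff' AND SALARY > 29000")
# Parentheses make NOT reverse the whole compound condition instead.
not_of_both = ids("NOT (POSITION = 'Staff' AND SALARY > 29000)")
```

With these rows, in_range is [2, 4] because BETWEEN includes its end points, and the last two queries return [1] and [1, 3] respectively.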
Going back to the address table from the first example (the one with a LastName column), say you wanted to see all people whose last names started with “S”; use a condition such as:
WHERE LASTNAME LIKE 'S%';
The percent sign (%) is used to represent any possible character (number, letter, or punctuation) or set of characters that might appear after the “S”.
To find those people whose LastName ends in “S”, use '%S'; if you want the “S” to appear anywhere in the word, try '%S%'.
The ‘%’ can be used for any characters in the same position relative to the given characters.
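The three LIKE patterns can be compared on the names from the first example table. This is a sketch using Python's sqlite3 module; the table name EMPLOYEEADDRESSTABLE and the helper last_names are assumptions. Note that SQLite's LIKE ignores case for ASCII letters by default, while other systems may be case-sensitive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed table name; only the two name columns are needed here.
conn.execute("CREATE TABLE EMPLOYEEADDRESSTABLE (FIRSTNAME TEXT, LASTNAME TEXT)")
conn.executemany(
    "INSERT INTO EMPLOYEEADDRESSTABLE VALUES (?, ?)",
    [("Joe", "Smith"), ("Mary", "Scott"), ("Sam", "Jones"), ("Sarah", "Ackerman")],
)

def last_names(pattern):
    """Return the sorted last names matching a LIKE pattern."""
    cursor = conn.execute(
        "SELECT LASTNAME FROM EMPLOYEEADDRESSTABLE "
        "WHERE LASTNAME LIKE ? ORDER BY LASTNAME",
        (pattern,),
    )
    return [row[0] for row in cursor]

starts_with_s = last_names("S%")   # "S" followed by anything
ends_with_s = last_names("%S")     # anything followed by "S"
contains_s = last_names("%S%")     # "S" anywhere, including first or last
```

Here ends_with_s contains Jones only because SQLite matches the lowercase "s"; on a case-sensitive system the pattern would need to be '%s'.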
SQL Joins
Good database design suggests that each table list data about only a single entity; detailed information can then be obtained in a relational database by using additional tables and a join.
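The examples in this section use three tables whose columns appear in the queries below: AntiqueOwners (OwnerID, OwnerLastName, OwnerFirstName), Orders (OwnerID, ItemDesired), and Antiques (SellerID, BuyerID, Item).
To begin, let’s discuss the concept of keys.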
A primary key is a column or set of columns that uniquely identifies the rest of the data in any given row. For example, in the AntiqueOwners table, the OwnerID column uniquely identifies that row. This means two things: no two rows can have the same OwnerID, and, even if two owners have the same first and last names, the OwnerID column ensures that the two owners will not be confused with each other, because the unique OwnerID column will be used throughout the database to track the owners, rather than the names.
A foreign key is a column in a table where that column is a primary key of another table, which means that any data in a foreign key column must have corresponding data in the other table where that column is the primary key.
In DBMS-speak, this correspondence is known as referential integrity. For example, in the Antiques table, both BuyerID and SellerID are foreign keys to the primary key of the AntiqueOwners table, OwnerID (for purposes of argument, one has to be an Antique Owner before one can buy or sell any items). In both tables, the ID columns identify the owners, buyers, and sellers, so all of this “ID” data refers to the people themselves without having to use their actual names.
Performing a Join
The purpose of these keys is so that data can be related across tables, without having to repeat data in every table–this is the power of relational databases.
For example, you can find the names of those who bought a chair without having to list the full name of the buyer in the Antiques table…you can get the name by relating those who bought a chair with the names in the AntiqueOwners table through the use of the OwnerID, which relates the data in the two tables.
To find the names of those who bought a chair, use the following query:
SELECT OWNERLASTNAME, OWNERFIRSTNAME
FROM ANTIQUEOWNERS, ANTIQUES
WHERE BUYERID = OWNERID AND ITEM = 'Chair';
Note the following about this query: both tables involved in the relation are listed in the FROM clause of the statement.
In the WHERE clause, first notice that the ITEM = ‘Chair’ part restricts the listing to those who have bought (and in this example, thereby own) a chair. Secondly, notice how the ID columns are related from one table to the next by use of the BUYERID = OWNERID clause. Only where ID’s match across tables and the item purchased is a chair (because of the AND), will the names from the AntiqueOwners table be listed. Because the joining condition used an equal sign, this join is called an equijoin. The result of this query is two names: Smith, Bob & Fowler, Sam.
Dot notation refers to prefixing the table names to column names, to avoid ambiguity, as follows:
SELECT ANTIQUEOWNERS.OWNERLASTNAME, ANTIQUEOWNERS.OWNERFIRSTNAME
FROM ANTIQUEOWNERS, ANTIQUES
WHERE ANTIQUES.BUYERID = ANTIQUEOWNERS.OWNERID AND ANTIQUES.ITEM = 'Chair';
As the column names are different in each table, however, this wasn’t necessary.
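The equijoin can be run end to end as follows. This is a sketch using Python's sqlite3 module; the rows are hypothetical and were chosen so that the chair buyers match the two names quoted above. The second query, written with the explicit INNER JOIN ... ON syntax, is shown only for comparison and is not used elsewhere in this tutorial.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ANTIQUEOWNERS (OWNERID INTEGER PRIMARY KEY,
                            OWNERLASTNAME TEXT, OWNERFIRSTNAME TEXT);
CREATE TABLE ANTIQUES (SELLERID INTEGER, BUYERID INTEGER, ITEM TEXT);
-- Hypothetical sample rows.
INSERT INTO ANTIQUEOWNERS VALUES
  (1, 'Jones', 'Bill'), (2, 'Smith', 'Bob'),
  (21, 'Akins', 'Jane'), (50, 'Fowler', 'Sam');
INSERT INTO ANTIQUES VALUES
  (1, 50, 'Bed'), (21, 2, 'Chair'), (1, 50, 'Chair'), (2, 21, 'Mirror');
""")

# Equijoin written as in the text: both tables in FROM, matched in WHERE.
chair_buyers = conn.execute(
    "SELECT OWNERLASTNAME, OWNERFIRSTNAME "
    "FROM ANTIQUEOWNERS, ANTIQUES "
    "WHERE BUYERID = OWNERID AND ITEM = 'Chair' "
    "ORDER BY OWNERLASTNAME"
).fetchall()

# The same join with dot notation and the explicit INNER JOIN syntax.
chair_buyers_explicit = conn.execute(
    "SELECT ANTIQUEOWNERS.OWNERLASTNAME, ANTIQUEOWNERS.OWNERFIRSTNAME "
    "FROM ANTIQUEOWNERS INNER JOIN ANTIQUES "
    "ON ANTIQUES.BUYERID = ANTIQUEOWNERS.OWNERID "
    "WHERE ANTIQUES.ITEM = 'Chair' "
    "ORDER BY ANTIQUEOWNERS.OWNERLASTNAME"
).fetchall()

print(chair_buyers)  # [('Fowler', 'Sam'), ('Smith', 'Bob')]
```

Both forms return the same rows; only the way the joining condition is written differs.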
DISTINCT and Eliminating Duplicates
Let’s say that you want to list the ID and names of only those people who have sold an antique.
Obviously, you want a list where each seller is only listed once–you don’t want to know how many antiques a person sold, just the fact that this person sold one (for counts, see the Aggregate Function section below).
This means that you will need to tell SQL to eliminate duplicate sales rows, and just list each person only once.
To do this, use the DISTINCT keyword.
First, we will need an equijoin to the AntiqueOwners table to get the detail data of the person’s LastName and FirstName. However, keep in mind that since the SellerID column in the Antiques table is a foreign key to the AntiqueOwners table, a seller will only be listed if there is a row in the AntiqueOwners table listing the ID and names. We also want to eliminate multiple occurrences of the SellerID in our listing, so we use DISTINCT. Note that DISTINCT applies to the whole selected row, that is, the combination of all columns listed, rather than only to the column it precedes.
To throw in one more twist, we will also want the list alphabetized by LastName, then by FirstName (on a LastName tie). Thus, we will use the ORDER BY clause:
SELECT DISTINCT SELLERID, OWNERLASTNAME, OWNERFIRSTNAME
FROM ANTIQUES, ANTIQUEOWNERS
WHERE SELLERID = OWNERID
ORDER BY OWNERLASTNAME, OWNERFIRSTNAME;
In this example, since everyone has sold an item, we will get a listing of all of the owners, in alphabetical order by last name.
For future reference (and in case anyone asks), this type of join is considered to be in the category of inner joins.
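To see what DISTINCT removes, the sketch below runs the seller query with and without it, using Python's sqlite3 module. The rows are hypothetical; owner 1 has sold two items, so the join produces that owner twice unless duplicates are eliminated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ANTIQUEOWNERS (OWNERID INTEGER PRIMARY KEY,
                            OWNERLASTNAME TEXT, OWNERFIRSTNAME TEXT);
CREATE TABLE ANTIQUES (SELLERID INTEGER, BUYERID INTEGER, ITEM TEXT);
-- Hypothetical sample rows; seller 1 appears in two sales.
INSERT INTO ANTIQUEOWNERS VALUES
  (1, 'Jones', 'Bill'), (2, 'Smith', 'Bob'),
  (21, 'Akins', 'Jane'), (50, 'Fowler', 'Sam');
INSERT INTO ANTIQUES VALUES
  (1, 50, 'Bed'), (1, 21, 'Chair'), (2, 1, 'Mirror'), (21, 2, 'Table');
""")

join_and_order = (
    "FROM ANTIQUES, ANTIQUEOWNERS "
    "WHERE SELLERID = OWNERID "
    "ORDER BY OWNERLASTNAME, OWNERFIRSTNAME"
)

# Without DISTINCT there is one output row per sale.
all_rows = conn.execute(
    "SELECT SELLERID, OWNERLASTNAME, OWNERFIRSTNAME " + join_and_order
).fetchall()

# With DISTINCT each (SELLERID, last name, first name) combination appears once.
sellers = conn.execute(
    "SELECT DISTINCT SELLERID, OWNERLASTNAME, OWNERFIRSTNAME " + join_and_order
).fetchall()

print(len(all_rows))  # 4
print(sellers)        # [(21, 'Akins', 'Jane'), (1, 'Jones', 'Bill'), (2, 'Smith', 'Bob')]
```

Owner 50 has not sold anything in this sample data, so that owner does not appear at all.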
Aliases & In/Sub queries
In this section, we will talk about Aliases, In and the use of sub queries, and how these can be used in a 3-table example.
First, look at this query which prints the last name of those owners who have placed an order and what the order is, only listing those orders which can be filled (that is, there is a buyer who owns that ordered item):
SELECT OWN.OWNERLASTNAME AS "Last Name", ORD.ITEMDESIRED AS "Item Ordered"
FROM ORDERS ORD, ANTIQUEOWNERS OWN
WHERE ORD.OWNERID = OWN.OWNERID
AND ORD.ITEMDESIRED IN
(SELECT ITEM
FROM ANTIQUES);
The result is listed under the column headings Last Name and Item Ordered.
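A runnable sketch of the alias query follows, using Python's sqlite3 module and hypothetical rows. It assumes the sub query selects ITEM from the ANTIQUES table, which matches the description of orders that can be filled, and it quotes the column aliases because they contain a space.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ANTIQUEOWNERS (OWNERID INTEGER PRIMARY KEY,
                            OWNERLASTNAME TEXT, OWNERFIRSTNAME TEXT);
CREATE TABLE ANTIQUES (SELLERID INTEGER, BUYERID INTEGER, ITEM TEXT);
CREATE TABLE ORDERS (OWNERID INTEGER NOT NULL, ITEMDESIRED TEXT NOT NULL);
-- Hypothetical sample rows; nobody owns a Desk, so that order cannot be filled.
INSERT INTO ANTIQUEOWNERS VALUES (2, 'Smith', 'Bob'), (21, 'Akins', 'Jane');
INSERT INTO ANTIQUES VALUES (1, 50, 'Bed'), (21, 2, 'Chair'), (2, 21, 'Table');
INSERT INTO ORDERS VALUES (2, 'Table'), (2, 'Desk'), (21, 'Chair');
""")

# ORD and OWN are table aliases; "Last Name" and "Item Ordered" are column aliases.
cursor = conn.execute(
    'SELECT OWN.OWNERLASTNAME AS "Last Name", ORD.ITEMDESIRED AS "Item Ordered" '
    "FROM ORDERS ORD, ANTIQUEOWNERS OWN "
    "WHERE ORD.OWNERID = OWN.OWNERID "
    "AND ORD.ITEMDESIRED IN (SELECT ITEM FROM ANTIQUES) "
    "ORDER BY 1"
)
headings = [column[0] for column in cursor.description]
fillable_orders = cursor.fetchall()

print(headings)         # ['Last Name', 'Item Ordered']
print(fillable_orders)  # [('Akins', 'Chair'), ('Smith', 'Table')]
```

The table aliases keep the two OWNERID columns apart, since that column name exists in both ORDERS and ANTIQUEOWNERS.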
Miscellaneous SQL Statements
Five important aggregate functions: SUM, AVG, MAX, MIN, and COUNT.
Aggregate functions summarize the results of a query, rather than listing all of the rows.
- SUM () gives the total of all the rows, satisfying any conditions, of the given column, where the given column is numeric.
- AVG () gives the average of the given column.
- MAX () gives the largest figure in the given column.
- MIN () gives the smallest figure in the given column.
- COUNT(*) gives the number of rows satisfying the conditions.
Looking at the tables at the top of the document, let’s look at three examples:
SELECT SUM(SALARY), AVG(SALARY)
FROM EMPLOYEESTATISTICSTABLE;
This query shows the total of all salaries in the table, and the average salary of all of the entries in the table.
SELECT MIN(BENEFITS)
FROM EMPLOYEESTATISTICSTABLE
WHERE POSITION = 'Manager';
This query gives the smallest figure of the Benefits column, of the employees who are Managers, which is 12500.
SELECT COUNT(*)
FROM EMPLOYEESTATISTICSTABLE
WHERE POSITION = 'Staff';
This query tells you how many employees have Staff status (3).
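The three aggregate queries can be tried as follows with Python's sqlite3 module. The nine rows are hypothetical sample data, chosen only to be consistent with the two figures quoted above (a minimum Manager benefit of 12500 and three Staff employees).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEESTATISTICSTABLE "
    "(EMPLOYEEIDNO INTEGER, SALARY INTEGER, BENEFITS INTEGER, POSITION TEXT)"
)
# Hypothetical rows consistent with the figures quoted in the text.
conn.executemany(
    "INSERT INTO EMPLOYEESTATISTICSTABLE VALUES (?, ?, ?, ?)",
    [
        (10, 75000, 15000, "Manager"),
        (105, 65000, 15000, "Manager"),
        (152, 60000, 15000, "Manager"),
        (215, 60000, 12500, "Manager"),
        (244, 50000, 12000, "Staff"),
        (300, 45000, 10000, "Staff"),
        (335, 40000, 10000, "Staff"),
        (400, 32000, 7500, "Entry-Level"),
        (441, 28000, 7500, "Entry-Level"),
    ],
)

# Each aggregate query returns a single summary row.
total_salary, average_salary = conn.execute(
    "SELECT SUM(SALARY), AVG(SALARY) FROM EMPLOYEESTATISTICSTABLE"
).fetchone()

min_manager_benefits = conn.execute(
    "SELECT MIN(BENEFITS) FROM EMPLOYEESTATISTICSTABLE WHERE POSITION = 'Manager'"
).fetchone()[0]

staff_count = conn.execute(
    "SELECT COUNT(*) FROM EMPLOYEESTATISTICSTABLE WHERE POSITION = 'Staff'"
).fetchone()[0]

print(total_salary)          # 455000
print(min_manager_benefits)  # 12500
print(staff_count)           # 3
```

The WHERE clause is applied before the aggregate, so MIN and COUNT only see the rows for the named position.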
In SQL, you might (check with your DBA) have access to create views for yourself. What a view does is let you treat the results of a query as a new, personal table that you can use in other queries, where this new table is referred to by the view name in your FROM clause.
When you access a view, the query that is defined in your view creation statement is performed (generally), and the results of that query look just like another table in the query that you wrote invoking the view.
For example, to create a view:
CREATE VIEW ANTVIEW AS SELECT ITEMDESIRED FROM ORDERS;
Now, write a query using this view as a table, where the table is just a listing of all Items Desired from the Orders table:
SELECT SELLERID
FROM ANTIQUES, ANTVIEW
WHERE ITEMDESIRED = ITEM;
This query shows all SellerID’s from the Antiques table where the Item in that table happens to appear in the Antview view, which is just all of the Items Desired in the Orders table.
The listing is generated by going through the Antique Items one-by-one until there’s a match with the Antview view. Views can be used to restrict database access, as well as, in this case, simplify a complex query.
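The sketch below creates the ANTVIEW view and queries it like a table, using Python's sqlite3 module with hypothetical rows. It also adds an order afterwards to illustrate that the view's query is run when the view is accessed, so the new row shows up without recreating the view. The helper sellers_with_orders is an illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ANTIQUES (SELLERID INTEGER, BUYERID INTEGER, ITEM TEXT);
CREATE TABLE ORDERS (OWNERID INTEGER NOT NULL, ITEMDESIRED TEXT NOT NULL);
-- Hypothetical sample rows.
INSERT INTO ANTIQUES VALUES (1, 50, 'Bed'), (2, 15, 'Table'), (15, 2, 'Chair');
INSERT INTO ORDERS VALUES (2, 'Table'), (21, 'Chair'), (15, 'Desk');
CREATE VIEW ANTVIEW AS SELECT ITEMDESIRED FROM ORDERS;
""")

def sellers_with_orders():
    """Return sorted SellerIDs whose item appears in the ANTVIEW view."""
    cursor = conn.execute(
        "SELECT SELLERID FROM ANTIQUES, ANTVIEW "
        "WHERE ITEMDESIRED = ITEM ORDER BY SELLERID"
    )
    return [row[0] for row in cursor]

before = sellers_with_orders()

# A new order for a Bed; the view picks it up on the next access.
conn.execute("INSERT INTO ORDERS VALUES (21, 'Bed')")
after = sellers_with_orders()

print(before)  # [2, 15]
print(after)   # [1, 2, 15]
```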
Creating New Tables
All tables within a database must be created at some point in time…let’s see how we would create the Orders table:
CREATE TABLE ORDERS
(OWNERID INTEGER NOT NULL,
ITEMDESIRED CHAR(40) NOT NULL);
This statement gives the table name and tells the DBMS about each column in the table.
Please note that this statement uses generic data types, and that the data types might be different, depending on what DBMS you are using. As usual, check local listings. Some common generic data types are:
- Char(x) – A column of characters, where x is a number designating the maximum number of characters allowed (maximum length) in the column.
- Integer – A column of whole numbers, positive or negative.
- Decimal(x, y) – A column of decimal numbers, where x is the maximum length in digits of the decimal numbers in this column, and y is the maximum number of digits allowed after the decimal point. The maximum (4,2) number would be 99.99.
- Date – A date column in a DBMS-specific format.
- Logical – A column that can hold only two values: TRUE or FALSE.
One other note, the NOT NULL means that the column must have a value in each row. If NULL was used, that column may be left empty in a given row.
Let’s add a column to the Antiques table to allow the entry of the price of a given Item (Parentheses optional):
ALTER TABLE ANTIQUES ADD (PRICE DECIMAL(8,2) NULL);
The data for this new column can be updated or inserted as shown later.
To insert rows into a table, do the following:
INSERT INTO ANTIQUES VALUES (21, 01, 'Ottoman', 200.00);
This inserts the data into the table, as a new row, column-by-column, in the pre-defined order.
Instead, let’s change the order and leave Price blank:
INSERT INTO ANTIQUES (BUYERID, SELLERID, ITEM)
VALUES (01, 21, 'Ottoman');
Let’s update a Price into a row that doesn’t have a price listed yet:
UPDATE ANTIQUES SET PRICE = 500.00 WHERE ITEM = 'Chair';
This sets the Price of every Chair to 500.00. As with SELECT, add more WHERE conditions, joined with AND, to limit the update to more specific rows. Additional columns may be set in the same statement by separating the assignments with commas.
Let’s delete this new row back out of the database:
DELETE FROM ANTIQUES
WHERE ITEM = 'Ottoman';
But if there is another row that contains ‘Ottoman’, that row will be deleted also. Let’s delete all rows (one, in this case) that contain the specific data we added before:
DELETE FROM ANTIQUES
WHERE ITEM = 'Ottoman' AND BUYERID = 01 AND SELLERID = 21;
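The create, alter, insert, update, and delete steps can be followed in sequence with this sketch, which uses Python's sqlite3 module. The starting Chair row is hypothetical. Two details are specific to SQLite and are assumptions relative to the text: the column is added with ADD COLUMN (SQLite does not accept the parenthesized form), and whole-number DECIMAL values come back as integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ANTIQUES "
    "(SELLERID INTEGER NOT NULL, BUYERID INTEGER NOT NULL, ITEM CHAR(40) NOT NULL)"
)
conn.execute("INSERT INTO ANTIQUES VALUES (15, 2, 'Chair')")  # hypothetical row

# Add the Price column; existing rows get NULL in it.
conn.execute("ALTER TABLE ANTIQUES ADD COLUMN PRICE DECIMAL(8,2)")

# Insert in the table's column order, then with an explicit column list
# that changes the order and leaves Price empty (NULL).
conn.execute("INSERT INTO ANTIQUES VALUES (21, 1, 'Ottoman', 200.00)")
conn.execute("INSERT INTO ANTIQUES (BUYERID, SELLERID, ITEM) VALUES (1, 21, 'Ottoman')")
ottoman_prices = [row[0] for row in conn.execute(
    "SELECT PRICE FROM ANTIQUES WHERE ITEM = 'Ottoman' ORDER BY PRICE"
)]

# rowcount reports how many rows each statement changed.
updated = conn.execute(
    "UPDATE ANTIQUES SET PRICE = 500.00 WHERE ITEM = 'Chair'"
).rowcount
deleted = conn.execute(
    "DELETE FROM ANTIQUES WHERE ITEM = 'Ottoman' AND BUYERID = 1 AND SELLERID = 21"
).rowcount

remaining = conn.execute(
    "SELECT SELLERID, BUYERID, ITEM, PRICE FROM ANTIQUES"
).fetchall()
```

In this run both Ottoman rows share the same buyer and seller, so the final DELETE removes two rows; a WHERE clause is only as specific as the columns it tests.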
Indexes allow a DBMS to access data more quickly (please note: this feature is nonstandard and may not be available on all systems). The system creates an internal data structure (the index) that speeds up the selection of rows when the selection is based on indexed columns.
This index tells the DBMS where a certain row is in the table given an indexed-column value, much like a book index tells you on what page a given word appears.
Let’s create an index for the OwnerID in the AntiqueOwners table:
CREATE INDEX OID_IDX ON ANTIQUEOWNERS (OWNERID);
Now on the names:
CREATE INDEX NAME_IDX ON ANTIQUEOWNERS (OWNERLASTNAME, OWNERFIRSTNAME);
To get rid of an index, drop it:
DROP INDEX OID_IDX;
By the way, you can also “drop” a table (careful: that means that your table is deleted). In the second index example above, NAME_IDX, the index is kept on the two columns combined; behavior in this situation can vary between systems, so check the manual before performing such an operation.
Some DBMS’s do not enforce primary keys; in other words, the uniqueness of a column is not enforced automatically. What that means is, if, for example, I tried to insert another row into the AntiqueOwners table with an OwnerID of 02, some systems will allow me to do that, even though we do not want it, because that column is supposed to be unique to that table (every row value is supposed to be different). One way to get around that is to create a unique index on the column that we want to be a primary key, to force the system to prohibit duplicates:
CREATE UNIQUE INDEX OID_IDX ON ANTIQUEOWNERS (OWNERID);
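The following sketch shows a unique index rejecting a duplicate OwnerID, using Python's sqlite3 module. The table is deliberately created without a primary key, and the rows are hypothetical. In SQLite the violation surfaces as sqlite3.IntegrityError; other systems report it differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No PRIMARY KEY declared, so nothing prevents duplicate OwnerIDs yet.
conn.execute(
    "CREATE TABLE ANTIQUEOWNERS "
    "(OWNERID INTEGER, OWNERLASTNAME TEXT, OWNERFIRSTNAME TEXT)"
)
conn.execute("INSERT INTO ANTIQUEOWNERS VALUES (2, 'Smith', 'Bob')")

# The unique index makes the DBMS enforce uniqueness on OWNERID.
conn.execute("CREATE UNIQUE INDEX OID_IDX ON ANTIQUEOWNERS (OWNERID)")

try:
    conn.execute("INSERT INTO ANTIQUEOWNERS VALUES (2, 'Jones', 'Bill')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

count_with_index = conn.execute("SELECT COUNT(*) FROM ANTIQUEOWNERS").fetchone()[0]

# After dropping the index, the same duplicate insert is accepted again.
conn.execute("DROP INDEX OID_IDX")
conn.execute("INSERT INTO ANTIQUEOWNERS VALUES (2, 'Jones', 'Bill')")
count_without_index = conn.execute("SELECT COUNT(*) FROM ANTIQUEOWNERS").fetchone()[0]
```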
GROUP BY & HAVING
One special use of GROUP BY is to associate an aggregate function (especially COUNT; counting the number of rows in each group) with groups of rows. First, assume that the Antiques table has the Price column, and each row has a value for that column. We want to see the price of the most expensive item bought by each owner. We have to tell SQL to group each owner’s purchases, and tell us the maximum purchase price:
SELECT BUYERID, MAX(PRICE)
FROM ANTIQUES
GROUP BY BUYERID;
Now, say we only want to see the maximum purchase price if it is over $1000. The HAVING clause filters whole groups, so the condition is written on the aggregate:
SELECT BUYERID, MAX(PRICE)
FROM ANTIQUES
GROUP BY BUYERID
HAVING MAX(PRICE) > 1000;
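Both grouping queries are run below with Python's sqlite3 module on hypothetical rows. The HAVING condition is written on the aggregate, MAX(PRICE), because HAVING is evaluated once per group, after the rows have been grouped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ANTIQUES (SELLERID INTEGER, BUYERID INTEGER,
                       ITEM TEXT, PRICE DECIMAL(8,2));
-- Hypothetical sample rows; buyers 2 and 50 each bought two items.
INSERT INTO ANTIQUES VALUES
  (1, 50, 'Bed', 1500), (2, 50, 'Mirror', 300),
  (15, 2, 'Chair', 500), (21, 2, 'Table', 800),
  (1, 21, 'Bookcase', 1200);
""")

# One output row per buyer, carrying that buyer's most expensive purchase.
max_by_buyer = conn.execute(
    "SELECT BUYERID, MAX(PRICE) FROM ANTIQUES "
    "GROUP BY BUYERID ORDER BY BUYERID"
).fetchall()

# HAVING keeps only the groups whose maximum is over 1000.
over_1000 = conn.execute(
    "SELECT BUYERID, MAX(PRICE) FROM ANTIQUES "
    "GROUP BY BUYERID HAVING MAX(PRICE) > 1000 ORDER BY BUYERID"
).fetchall()

print(max_by_buyer)  # [(2, 800), (21, 1200), (50, 1500)]
print(over_1000)     # [(21, 1200), (50, 1500)]
```

A WHERE clause, by contrast, would filter individual rows before grouping takes place.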
More Sub Queries
Another common usage of sub queries involves the use of operators to allow a Where condition to include the Select output of a sub query.
First, list the buyers who purchased an expensive item (the Price of the item is $100 greater than the average price of all items purchased):
SELECT BUYERID
FROM ANTIQUES
WHERE PRICE >
(SELECT AVG(PRICE) + 100
FROM ANTIQUES);
The subquery calculates the average Price, plus $100, and using that figure, a BuyerID is printed for every item costing over that figure. One could use DISTINCT BUYERID to eliminate duplicates.
List the Last Names of those in the AntiqueOwners table, ONLY if they have bought an item:
SELECT OWNERLASTNAME
FROM ANTIQUEOWNERS
WHERE OWNERID IN
(SELECT DISTINCT BUYERID
FROM ANTIQUES);
The subquery returns a list of buyers, and the Last Name is printed for an Antique Owner if and only if the Owner’s ID appears in the subquery list (sometimes called a candidate list). Note: on some DBMS’s, equals can be used instead of IN, but for clarity’s sake, since a set is returned from the subquery, IN is the better choice.
For an Update example, we know that the gentleman who bought the bookcase has the wrong First Name in the database…it should be John:
UPDATE ANTIQUEOWNERS
SET OWNERFIRSTNAME = 'John'
WHERE OWNERID =
(SELECT BUYERID
FROM ANTIQUES
WHERE ITEM = 'Bookcase');
First, the subquery finds the BuyerID for the person(s) who bought the Bookcase, then the outer query updates his First Name.
Remember this rule about sub queries: when you have a sub query as part of a WHERE condition, the Select clause in the sub query must have columns that match in number and type those in the Where clause of the outer query. In other words, if you have “WHERE ColumnName = (SELECT…);”, the Select must have only one column in it, to match the ColumnName in the outer Where clause, and they must match in type (both being integers, both being character strings, etc.).
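The three sub query examples in this section can be run together as follows, using Python's sqlite3 module. All rows are hypothetical. With these prices the average is 860, so the first query lists buyers of items costing more than 960.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ANTIQUEOWNERS (OWNERID INTEGER PRIMARY KEY,
                            OWNERLASTNAME TEXT, OWNERFIRSTNAME TEXT);
CREATE TABLE ANTIQUES (SELLERID INTEGER, BUYERID INTEGER,
                       ITEM TEXT, PRICE DECIMAL(8,2));
-- Hypothetical sample rows; owner 1 has sold items but bought none.
INSERT INTO ANTIQUEOWNERS VALUES
  (1, 'Jones', 'Bill'), (2, 'Smith', 'Bob'),
  (21, 'Akins', 'Jon'), (50, 'Fowler', 'Sam');
INSERT INTO ANTIQUES VALUES
  (1, 50, 'Bed', 1500), (2, 50, 'Mirror', 300),
  (15, 2, 'Chair', 500), (21, 2, 'Table', 800),
  (1, 21, 'Bookcase', 1200);
""")

# Comparison against a single value computed by the sub query.
expensive_buyers = [row[0] for row in conn.execute(
    "SELECT DISTINCT BUYERID FROM ANTIQUES "
    "WHERE PRICE > (SELECT AVG(PRICE) + 100 FROM ANTIQUES) "
    "ORDER BY BUYERID"
)]

# Membership in the set of values returned by the sub query.
owners_who_bought = [row[0] for row in conn.execute(
    "SELECT OWNERLASTNAME FROM ANTIQUEOWNERS "
    "WHERE OWNERID IN (SELECT DISTINCT BUYERID FROM ANTIQUES) "
    "ORDER BY OWNERLASTNAME"
)]

# Update driven by a sub query that returns one BuyerID.
conn.execute(
    "UPDATE ANTIQUEOWNERS SET OWNERFIRSTNAME = 'John' "
    "WHERE OWNERID = (SELECT BUYERID FROM ANTIQUES WHERE ITEM = 'Bookcase')"
)
bookcase_buyer_name = conn.execute(
    "SELECT OWNERFIRSTNAME FROM ANTIQUEOWNERS WHERE OWNERID = 21"
).fetchone()[0]
```

In each case the sub query selects exactly one column, matching the single column it is compared with in the outer WHERE clause.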
EXISTS & ALL
EXISTS uses a sub query as a condition, where the condition is True if the sub query returns any rows, and False if the sub query does not return any rows; this is a nonintuitive feature with few unique uses. However, if a prospective customer wanted to see the list of Owners only if the shop dealt in Chairs, try:
SELECT OWNERFIRSTNAME, OWNERLASTNAME
FROM ANTIQUEOWNERS
WHERE EXISTS
(SELECT *
FROM ANTIQUES
WHERE ITEM = 'Chair');
If there are any Chairs in the Antiques table, the subquery would return a row or rows, making the EXISTS clause true, causing SQL to list the Antique Owners. If there had been no Chairs, no rows would have been returned by the outside query.
ALL is another unusual feature, as ALL queries can usually be done with different, and possibly simpler methods; let’s take a look at an example query:
SELECT BUYERID, ITEM
FROM ANTIQUES
WHERE PRICE >= ALL
(SELECT PRICE
FROM ANTIQUES);
This will return the highest-priced item (or more than one item if there is a tie), and its buyer. The sub query returns a list of all Prices in the Antiques table, and the outer query goes through each row of the Antiques table; if a row’s Price is greater than or equal to every one of (that is, ALL) the Prices in the list, the row is listed, giving the highest-priced Item. The reason “>=” must be used rather than “>” is that the highest-priced item’s own Price is in the list, so it can only equal, never exceed, the highest price there.
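The EXISTS query can be tried with Python's sqlite3 module as sketched below, on hypothetical rows. SQLite does not support the ALL quantifier, so the sketch does not run the ALL query itself; instead it shows one of the simpler alternatives mentioned above, comparing Price with the MAX(PRICE) returned by a sub query, which selects the same highest-priced row or rows. The helper owners_if_shop_has is an illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ANTIQUEOWNERS (OWNERID INTEGER PRIMARY KEY,
                            OWNERLASTNAME TEXT, OWNERFIRSTNAME TEXT);
CREATE TABLE ANTIQUES (SELLERID INTEGER, BUYERID INTEGER,
                       ITEM TEXT, PRICE DECIMAL(8,2));
-- Hypothetical sample rows.
INSERT INTO ANTIQUEOWNERS VALUES (2, 'Smith', 'Bob'), (50, 'Fowler', 'Sam');
INSERT INTO ANTIQUES VALUES
  (2, 50, 'Bed', 1500), (50, 2, 'Chair', 500), (2, 50, 'Mirror', 300);
""")

def owners_if_shop_has(item):
    """List all owners, but only if at least one Antiques row has this item."""
    cursor = conn.execute(
        "SELECT OWNERFIRSTNAME, OWNERLASTNAME FROM ANTIQUEOWNERS "
        "WHERE EXISTS (SELECT * FROM ANTIQUES WHERE ITEM = ?) "
        "ORDER BY OWNERLASTNAME",
        (item,),
    )
    return cursor.fetchall()

with_chairs = owners_if_shop_has("Chair")   # sub query returns a row: EXISTS is true
without_pianos = owners_if_shop_has("Piano")  # sub query is empty: no owners listed

# Alternative to "PRICE >= ALL (SELECT PRICE FROM ANTIQUES)" for SQLite.
highest_priced = conn.execute(
    "SELECT BUYERID, ITEM FROM ANTIQUES "
    "WHERE PRICE = (SELECT MAX(PRICE) FROM ANTIQUES)"
).fetchall()

print(with_chairs)     # [('Sam', 'Fowler'), ('Bob', 'Smith')]
print(without_pianos)  # []
print(highest_priced)  # [(50, 'Bed')]
```

Note that the EXISTS condition does not depend on the outer row here, so it switches the whole listing on or off rather than filtering individual owners.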