Chapter 6 Discrete Structures Notes
See Notes on Relations as a starting point.
This covers some of the points in 6.1 and 6.5. In this chapter, we will talk about
relations and graphs together when the same properties apply.
Some thoughts on partially ordered sets and their use in determining prerequisites (Homework Assignment).
Some general observations on relations:
Relations are subsets of Cartesian products (the Cartesian product A X B is the set of all ordered pairs (a, b) that connect an element a of A to an element b of B).
Relations can be functions if each value in the first set (domain) is paired with exactly one value in the second set (codomain).
Relations can be looked at as graphs.
Relations can be looked at as matrices.
Relations can be viewed as having set operations (called relational algebra).
Relations can be viewed as tables.
Relational Algebra
| Operation | Set notation | SQL notation | Comments |
| union | A U B | UNION | Combines the rows of relation A with the rows of relation B. |
| intersection | A ^ B | INTERSECT (or an inner JOIN on all columns) | Keeps the rows that are in both A and B. |
| difference | A - B | EXCEPT (or LEFT JOIN ... WHERE the B key IS NULL) | Keeps the rows of A that are not in B. |
| negation | ~A | NOT | Takes the complement; in SQL, NOT negates a WHERE condition. |
| projection | π columns (A) | column list after SELECT | Selects columns from a relation. |
| selection | σ condition (A) | WHERE | Selects the rows that meet the WHERE condition. |
| join | A ⋈ B | A JOIN B ON condition | Combines the rows of A and B that match the join condition. |
| left join | A ⟕ B | A LEFT JOIN B ON condition | Keeps every row of A, combined with the rows of B that match the join condition (NULL where nothing in B matches). |
| right join | A ⟖ B | A RIGHT JOIN B ON condition | Keeps every row of B, combined with the rows of A that match the join condition (NULL where nothing in A matches). |
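The set side of the table can be tried out directly on Python sets. This is a minimal sketch, assuming relations stored as sets of tuples with a fixed column order; the relations x(id, b) and y(id, c), their data, and the helpers project, select and natural_join are hypothetical illustrations, not a library API.

```python
# Hypothetical relations stored as sets of tuples: x has columns (id, b), y has (id, c).
x = {(1, "b1"), (2, "b2"), (3, "b3")}
y = {(1, "c1"), (3, "c3"), (4, "c4")}

def project(rel, *cols):
    """Projection (pi): keep only the given column positions; duplicates collapse."""
    return {tuple(row[i] for i in cols) for row in rel}

def select(rel, condition):
    """Selection (sigma): keep only the rows that satisfy the condition."""
    return {row for row in rel if condition(row)}

def natural_join(r, s):
    """Join on the shared id column (position 0), producing (id, b, c) rows."""
    return {(a[0], a[1], b[1]) for a in r for b in s if a[0] == b[0]}

ids_x, ids_y = project(x, 0), project(y, 0)
print(sorted(ids_x | ids_y))                       # union: [(1,), (2,), (3,), (4,)]
print(sorted(ids_x & ids_y))                       # intersection: [(1,), (3,)]
print(sorted(ids_x - ids_y))                       # difference: [(2,)]
print(sorted(select(x, lambda row: row[0] > 1)))   # [(2, 'b2'), (3, 'b3')]
print(sorted(natural_join(x, y)))                  # [(1, 'b1', 'c1'), (3, 'b3', 'c3')]
```

Sets are used on purpose: a relation has no duplicate rows, which is also why a projection can end up with fewer rows than the relation it came from.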
```sql
SELECT x.id, x.b, y.c
FROM x JOIN y ON x.id = y.id;
```
Selects columns id and b from relation x and column c from relation y and combines them
into a relation (id, b, c), keeping only the rows where the id in x matches the id in y.
```sql
SELECT x.id, x.b, y.c
FROM x LEFT JOIN y ON x.id = y.id;
```
Selects the same columns and combines them into a relation (id, b, c), but keeps every
row of x: when a row of y has the same id, its c is filled in; when no row of y matches,
c is NULL.
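To see the difference between the two joins, the queries can be run against a throwaway in-memory SQLite database through Python's standard sqlite3 module. This is a sketch only: the contents of x and y are made up, and each query selects x.id, x.b and y.c so that every result row has the (id, b, c) shape described above. The last query shows the common LEFT JOIN ... IS NULL idiom for a difference.

```python
import sqlite3

# Throwaway in-memory database with hypothetical tables x(id, b) and y(id, c).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x (id INTEGER, b TEXT);
    CREATE TABLE y (id INTEGER, c TEXT);
    INSERT INTO x VALUES (1, 'b1'), (2, 'b2'), (3, 'b3');
    INSERT INTO y VALUES (1, 'c1'), (3, 'c3'), (4, 'c4');
""")

# Inner join: only ids that appear in both x and y.
inner = conn.execute(
    "SELECT x.id, x.b, y.c FROM x JOIN y ON x.id = y.id ORDER BY x.id").fetchall()

# Left join: every row of x; y.c comes back as None (SQL NULL) when y has no match.
left = conn.execute(
    "SELECT x.id, x.b, y.c FROM x LEFT JOIN y ON x.id = y.id ORDER BY x.id").fetchall()

# Difference x - y on id, written with the LEFT JOIN ... IS NULL idiom.
diff = conn.execute(
    "SELECT x.id FROM x LEFT JOIN y ON x.id = y.id WHERE y.id IS NULL").fetchall()

print(inner)  # [(1, 'b1', 'c1'), (3, 'b3', 'c3')]
print(left)   # [(1, 'b1', 'c1'), (2, 'b2', None), (3, 'b3', 'c3')]
print(diff)   # [(2,)]
```

Id 4 appears only in y, so neither join returns it; a RIGHT JOIN (or swapping the tables in the LEFT JOIN) would.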
From these examples, you can see how you can build up rather complicated
expressions to get the information that you want.
Some other SQL clauses are ORDER BY (which sorts the results), GROUP BY (which groups
rows) and HAVING (which selects the groups that meet a condition). There are basic aggregate
functions such as COUNT and SUM, and many SQL dialects have extended functions (such as those
included in MS ACCESS) that allow you to use BASIC (VBA) functions in a query.
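A small sketch of GROUP BY, HAVING, COUNT, SUM and ORDER BY working together, again using Python's built-in sqlite3 module with an in-memory database. The enroll table and its rows are hypothetical.

```python
import sqlite3

# Hypothetical enrollment table: one row per (student, course).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE enroll (student TEXT, course TEXT, credits INTEGER);
    INSERT INTO enroll VALUES
        ('ann', 'CS230', 3), ('ann', 'CS240', 3), ('ann', 'MATH115', 4),
        ('bob', 'CS230', 3), ('cal', 'CS260', 3), ('cal', 'CS280', 3);
""")

# GROUP BY makes one group per student, COUNT and SUM summarize each group,
# HAVING keeps only groups with more than one course, ORDER BY sorts the result.
rows = conn.execute("""
    SELECT student, COUNT(*) AS n_courses, SUM(credits) AS total
    FROM enroll
    GROUP BY student
    HAVING COUNT(*) > 1
    ORDER BY total DESC
""").fetchall()
print(rows)  # [('ann', 3, 10), ('cal', 2, 6)]
```

The difference between WHERE and HAVING is when they apply: WHERE filters individual rows before grouping, HAVING filters whole groups afterwards, which is why bob (one course) disappears only at the HAVING step.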
But the basics remain: what is going on here is that you are using the basic operations of relational algebra to group, order and select data and to combine data across relations. What you see in this chapter are some of the symbolic ways of expressing these ideas. Section 2 has some real examples.
A note on functional dependencies: In database design, a functional dependency means that a column in a table is determined by another column. For example, a student's SSN determines the student's name, so we say that name is functionally dependent on SSN. In database design, we would like all the non-key columns to be functionally dependent only on the key. If a non-key column depends on another non-key column, we have a transitive dependency.
A transitive dependency means that one column determines another and that column then determines a third column. For example, an SSN may determine the class the student is taking, and the class determines the room. In this relation, A (SSN) determines B (class) and B determines C (room), so A determines C. In database design, this is not a good idea, since if we remove the class from the student's schedule, we lose the information about the class's room. (In database design, we split the table into two tables: Student-Class and Class-Room. This is called putting the database in third normal form.)
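The definition of a functional dependency can be checked against the rows of a table: X determines Y when no two rows agree on X but disagree on Y. The sketch below assumes a hypothetical Student-Class-Room table with made-up SSNs and, as in the example above, one class per student; the helper name holds is illustrative. Data can only show that a dependency is violated or consistent so far; whether it must always hold is a design decision.

```python
# Hypothetical Student-Class-Room rows (made-up SSNs, one class per student).
rows = [
    {"ssn": "111", "cls": "CS230", "room": "R101"},
    {"ssn": "222", "cls": "CS230", "room": "R101"},
    {"ssn": "333", "cls": "CS260", "room": "R205"},
]

def holds(rows, lhs, rhs):
    """True if column lhs determines column rhs in these rows:
    any two rows that agree on lhs also agree on rhs."""
    seen = {}
    for r in rows:
        if r[lhs] in seen and seen[r[lhs]] != r[rhs]:
            return False
        seen[r[lhs]] = r[rhs]
    return True

print(holds(rows, "ssn", "cls"), holds(rows, "cls", "room"))  # True True
print(holds(rows, "cls", "ssn"))  # False: CS230 appears with two different SSNs

# Third-normal-form style split: project the table onto (ssn, cls) and (cls, room).
student_class = sorted({(r["ssn"], r["cls"]) for r in rows})
class_room = sorted({(r["cls"], r["room"]) for r in rows})

# Dropping student 333's class no longer loses the fact that CS260 meets in R205.
student_class = [pair for pair in student_class if pair[0] != "333"]
print(class_room)  # [('CS230', 'R101'), ('CS260', 'R205')]
```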
For the work that I do, we have extract files that have a lot of redundancy. For example, we have both the credits and the class level; however, class level is determined by credits, so we have a redundancy. Worse, the data may not be consistent. On the other hand, it is easy to get at the data in a flat file since you do not have to rejoin all the split tables. There is always a trade-off between good design and a design that people can understand and use.
Things to keep in mind:
A relation is defined as a subset of a Cartesian product. In two dimensions, this is A X B (or A X A for a relation on a single set A); in n dimensions, as we frequently see in tables, this is A1 X A2 X ... X An. A relation is a function if it maps each point a in the domain to exactly one point b in the second set. (A table with n columns is an n-ary relation.)
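Checking whether a relation is a function is a direct test of that definition. This is a minimal sketch in which the sets A and B, the relations f and g, and the helper is_function are hypothetical; it uses itertools.product from Python's standard library to build the Cartesian product A X B.

```python
from itertools import product

A = {1, 2, 3}
B = {"x", "y"}
cartesian = set(product(A, B))   # A X B: all 6 ordered pairs (a, b)

def is_function(relation, domain):
    """True if every a in domain is paired with exactly one b in the relation."""
    return all(sum(1 for (a, _) in relation if a == d) == 1 for d in domain)

f = {(1, "x"), (2, "x"), (3, "y")}   # a subset of A X B; each a is used exactly once
g = {(1, "x"), (1, "y"), (2, "x")}   # 1 has two images and 3 has none

print(len(cartesian), f <= cartesian)         # 6 True
print(is_function(f, A), is_function(g, A))   # True False
```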
Like sets, relations have the operations of union, intersection, difference and complement; relations also have the inverse and the composite. These make up what is known as relational algebra.
The composite of a relation with itself is formed the same way as the composite of functions: R^2 = R o R, where (a, c) is in R^2 when there is some b with (a, b) in R and (b, c) in R.
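The operations listed above map directly onto Python set operators for relations stored as sets of ordered pairs. A minimal sketch: R is the relation {(1,2),(2,3)} used in the matrix example, S is a made-up second relation on A, and compose is an illustrative helper that follows the convention S o R = "first R, then S".

```python
from itertools import product

A = {1, 2, 3}
R = {(1, 2), (2, 3)}
S = {(1, 2), (3, 3)}                 # hypothetical second relation on A
universe = set(product(A, A))        # A X A, the set every relation on A lives in

union = R | S                        # {(1, 2), (2, 3), (3, 3)}
intersection = R & S                 # {(1, 2)}
difference = R - S                   # {(2, 3)}
complement = universe - R            # the 7 pairs of A X A that are not in R
inverse = {(b, a) for (a, b) in R}   # {(2, 1), (3, 2)}

def compose(r, s):
    """s o r: (a, c) is included when some b has (a, b) in r and (b, c) in s."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}

R2 = compose(R, R)                   # R^2 = R o R
print(R2)                            # {(1, 3)}
```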
Relations can be represented by a matrix. For example, for the relation on the set A = {1,2,3} (from A = {1,2,3} to A = {1,2,3}) where R is defined as R = {(1,2),(2,3)}, the relation can be put in a matrix where r(ij) = 1 when (i, j) is in R and r(ij) = 0 when it is not:
| R | 1 | 2 | 3 |
| 1 | 0 | 1 | 0 |
| 2 | 0 | 0 | 1 |
| 3 | 0 | 0 | 0 |
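Building the matrix above from the set of pairs takes one comprehension. The helper to_matrix is hypothetical; it assumes the elements are listed in the order used for the rows and columns.

```python
def to_matrix(relation, elements):
    """0/1 matrix with m[i][j] = 1 exactly when (elements[i], elements[j]) is in the relation."""
    return [[1 if (a, b) in relation else 0 for b in elements] for a in elements]

A = [1, 2, 3]
R = {(1, 2), (2, 3)}
M = to_matrix(R, A)
for row in M:
    print(row)
# [0, 1, 0]
# [0, 0, 1]
# [0, 0, 0]
```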
A relation can also be defined in terms of a graph, and vice versa.
The first graph above shows that point 1 in the first set maps to point 2 in the second set, and point 2 in the first set maps to point 3 in the second set.
The second graph shows that point 1 maps to point 2 and point 2 maps to point 3. This illustrates the idea behind the transitive property: if you can go from 1 to 2 and from 2 to 3, you can go from 1 to 3, even though the pair (1,3) is not itself in R.
THIS IS THE INTERESTING POINT: if you take R^2, or R o R, what you will get is a matrix
that says that in 2 steps, you can get from 1 to 3! This is the key to using matrices as
a way to solve network graph problems.
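$$
M_{R^2} = M_R \odot M_R =
\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}
\odot
\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}
=
\begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
$$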
This shows that there is a path from 1 to 3 made up of two edges of length 1 each. This is a powerful technique since it allows us to represent graphs, relations and networks as matrices and then calculate which vertices are joined by paths of a given length. Combining the powers of the matrix also lets us determine IF a path of any length exists. This is known as the transitive closure.
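If there are n points and we form R* = R U R^2 U ... U R^n, then r*(ij) = 1 exactly when there is a path from i to j in the graph. (R^n by itself only records paths of exactly n steps; since any path can be shortened to one of at most n steps, the union of the first n powers is enough.)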
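Both ideas, the Boolean product that gives R^2 and the closure formed from the powers R, R^2, ..., R^n, can be sketched in a few lines. This follows the definitions directly rather than being an efficient method (Warshall's algorithm does the same job in O(n^3) steps); the function names are hypothetical, and the matrix is the 0/1 matrix for R = {(1,2),(2,3)}.

```python
def bool_product(a, b):
    """Boolean matrix product: c[i][j] = 1 if a[i][k] = 1 and b[k][j] = 1 for some k."""
    n = len(a)
    return [[int(any(a[i][k] and b[k][j] for k in range(n))) for j in range(n)]
            for i in range(n)]

def transitive_closure(m):
    """OR of the powers M, M^2, ..., M^n; entry (i, j) is 1 iff some path i -> j exists."""
    n = len(m)
    power = m
    closure = [row[:] for row in m]
    for _ in range(n - 1):
        power = bool_product(power, m)   # next power of M
        closure = [[closure[i][j] | power[i][j] for j in range(n)] for i in range(n)]
    return closure

M = [[0, 1, 0],
     [0, 0, 1],
     [0, 0, 0]]
print(bool_product(M, M))      # [[0, 0, 1], [0, 0, 0], [0, 0, 0]]
print(transitive_closure(M))   # [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
```

The 1 in row 1, column 3 of the closure is exactly the "you can get from 1 to 3" fact, even though R itself has no (1,3) pair.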
As we will see later with weighted graphs (those with a value associated with traveling from one vertex to another), where the elements in the matrix represent things like cost or distance, matrix multiplication of a graph gives us information about trips of length n. With a plain 0/1 matrix, the entries of the nth power count the paths of length n; with costs, a modified product that replaces multiplication by addition and addition by minimum gives the cheapest trips. If we take this out to enough powers, it gives us the minimum distance/cost to get from one point to another. This is important in areas such as network design and solving problems in linear programming / operations research.
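A sketch of that min-plus idea, under stated assumptions: costs are nonnegative, the matrix has 0 on the diagonal (staying put is free) and INF where there is no direct edge. With those assumptions, each further product allows trips with one more edge, and n - 1 edges always suffice for a cheapest path. The graph, its costs and the function names are made up for illustration, and this is not necessarily the method used later in the course.

```python
INF = float("inf")  # marks "no direct edge"

def min_plus(a, b):
    """Min-plus product: c[i][j] = min over k of a[i][k] + b[k][j]."""
    n = len(a)
    return [[min(a[i][k] + b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def shortest_paths(w):
    """Cheapest cost between every pair, assuming nonnegative costs and 0 on the diagonal.
    After each product, d covers trips with at most one more edge; n - 1 edges are enough."""
    n = len(w)
    d = w
    for _ in range(n - 2):
        d = min_plus(d, w)
    return d

# Hypothetical costs between 4 vertices (0..3).
W = [[0,   4,   1,   INF],
     [INF, 0,   INF, 1],
     [INF, 2,   0,   6],
     [INF, INF, INF, 0]]
D = shortest_paths(W)
print(D[0])  # [0, 3, 1, 4]: e.g. 0 -> 2 -> 1 -> 3 costs 1 + 2 + 1 = 4
```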
Thoughts on Partially Ordered Sets and Graphs.
The Notes on Relations contain some definitions and diagrams. What I want to do here is to jot down a few thoughts on how to look at things that I might not get a chance to say in class. If you look at the diagrams on pages 430 (building a house) and 435 (writing a program), you will see two real examples of how Hasse diagrams are used, how a topological sort can be looked at, and how minimal and maximal elements, upper bounds, least upper bounds, greatest elements, etc. can be looked at. In class, I am going to go over one of these two diagrams as a way to introduce the terminology. Below is a discussion of the homework assignment, where you are going to develop a graph showing the order for taking CS courses. Not only is this a good example of what we are covering, but it will also help you to look at your course selection over a longer horizon. Just as an example, if you need to take Computer Calculus, you know that you have to take MATH115, then MATH116 and then MATH160, so you need to get started several semesters before you plan to take MATH160.
As a side note, I have worked out these charts with many students when I did advisement, since only the immediate prereq is listed. (This is a good example of the Hasse diagram's removal of transitive pairs: if MATH115 is needed for CS230 and CS230 is needed for CS260, then MATH115 is needed for CS260; however, in the catalog and in the Hasse diagram, this transitive relation is not shown.)
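Removing the transitive pairs amounts to keeping only the covering pairs of a strict partial order: (a, b) stays only if no c sits between a and b. A minimal sketch, assuming the input already contains every implied pair (it is transitively closed); the course pairs are just the example from this note, and hasse_edges is a hypothetical helper.

```python
def hasse_edges(order):
    """Covering pairs of a strict partial order given as (before, after) pairs:
    keep (a, b) only when no c has (a, c) and (c, b) both in the order."""
    elements = {e for pair in order for e in pair}
    return {(a, b) for (a, b) in order
            if not any((a, c) in order and (c, b) in order for c in elements)}

# The example from the note, including the implied pair MATH115 -> CS260.
prereq = {("MATH115", "CS230"), ("CS230", "CS260"), ("MATH115", "CS260")}
print(sorted(hasse_edges(prereq)))
# [('CS230', 'CS260'), ('MATH115', 'CS230')]
```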
The graphs show a partial ordering in that they show what steps must be done before other steps can be started, and they also show what steps can be done in any order once a previous step has been done. This is also similar to prerequisites for courses: you have to complete the prereq before you can take the next course, and there may be situations in which you can take courses in any order once you have finished a particular course.

For example, for this course, you had to finish CS230. This is a lower bound for this course. In order to take CS230, you had to take MATH115 or equivalent, so MATH115 or equivalent is also a lower bound, since it has to be taken before this course. The greatest lower bound for this course is CS230, since having taken CS230 implies that you also took MATH115 or equivalent. (That is one of the reasons that MATH115 is not listed on the syllabus.) You can take CS280, CS240 and CS260 in any order after you finish CS230, so CS230 is a lower bound for all of these courses. CS241 has prereqs of CS260 and CS280, so it is an upper bound for these two courses. There are some courses, such as CS441 and CS480, that have no courses that follow them. These are called maximal, and there can be more than one maximal element. A course like CS260 has a greatest lower bound (CS230) but several upper bounds (CS241, CS242, CS441, etc.). There is no least upper bound in this case.

When you list out an order for taking Computer Science courses so that you always meet the prereqs, you have what is known as a total ordering built from the partial ordering diagram. There may be more than one total ordering: you can take CS230, then CS240, then CS260, or CS230, then CS260, then CS240. Since CS260 and CS240 have no relation in terms of order, they are said to be noncomparable. On the other hand, CS230 and CS260 are comparable since there is a relation between them. If every pair of courses is comparable, the ordering is a total ordering.
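The terms above can be computed from the prerequisite edges. This sketch uses a hypothetical, simplified set of edges built only from relationships mentioned in this discussion (MATH115 before CS230; CS230 before CS240, CS260 and CS280; CS260 and CS280 before CS241), so the results describe this small example, not the real catalog. The helper names are illustrative.

```python
# Hypothetical, simplified prerequisite edges (before, after).
edges = {("MATH115", "CS230"), ("CS230", "CS240"), ("CS230", "CS260"),
         ("CS230", "CS280"), ("CS260", "CS241"), ("CS280", "CS241")}
courses = {c for edge in edges for c in edge}

def precedes(a, b):
    """True if a chain of prerequisite edges leads from a to b (a must come before b)."""
    stack, seen = [a], {a}
    while stack:
        x = stack.pop()
        for u, v in edges:
            if u == x and v not in seen:
                if v == b:
                    return True
                seen.add(v)
                stack.append(v)
    return False

def comparable(a, b):
    return a == b or precedes(a, b) or precedes(b, a)

minimal = {c for c in courses if not any(v == c for _, v in edges)}  # nothing comes before
maximal = {c for c in courses if not any(u == c for u, _ in edges)}  # nothing comes after

# Lower bounds of a group: courses that precede every member.
# The greatest lower bound is the lower bound that all other lower bounds precede.
group = ["CS240", "CS260", "CS280"]
lower = {c for c in courses if all(precedes(c, g) for g in group)}
glb = [c for c in lower if all(d == c or precedes(d, c) for d in lower)]

print(sorted(minimal), sorted(maximal))                            # ['MATH115'] ['CS240', 'CS241']
print(comparable("CS240", "CS260"), comparable("CS230", "CS260"))  # False True
print(sorted(lower), glb)                                          # ['CS230', 'MATH115'] ['CS230']
```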
When you work this out, what you end up with is a Hasse diagram in which you show the relation / ordering of courses to each other. When you get a total ordering by doing a topological sort, you get an order in which to take courses. If you list out all the total orderings and look at when courses are scheduled, you can determine your optimum sequence. If none of the sequences works out well, you may want to ask the Chair to reorder when courses are given, or you may have to choose a less desirable sequence.
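A topological sort can be extended to list every total ordering: at each step, place any course whose prereqs have all been placed, then backtrack to try the other choices. This is a plain backtracking sketch (the number of orderings can grow very quickly, so it only suits small diagrams); the courses and edges are a hypothetical subset, and all_total_orders is an illustrative name.

```python
def all_total_orders(items, edges):
    """Every total ordering consistent with the (before, after) edges, found by
    repeatedly placing any item whose prerequisites are already placed."""
    results = []

    def extend(order, placed):
        if len(order) == len(items):
            results.append(list(order))
            return
        for x in sorted(items):
            if x not in placed and all(u in placed for (u, v) in edges if v == x):
                order.append(x)
                placed.add(x)
                extend(order, placed)
                order.pop()          # undo the choice and try the next candidate
                placed.remove(x)

    extend([], set())
    return results

# Hypothetical prerequisites: CS230 before CS240 and CS260; CS260 before CS241.
items = {"CS230", "CS240", "CS260", "CS241"}
edges = {("CS230", "CS240"), ("CS230", "CS260"), ("CS260", "CS241")}
for order in all_total_orders(items, edges):
    print(order)
# ['CS230', 'CS240', 'CS260', 'CS241']
# ['CS230', 'CS260', 'CS240', 'CS241']
# ['CS230', 'CS260', 'CS241', 'CS240']
```

Each printed list is one schedule that always meets the prereqs; comparing them against when courses are actually offered is the step the notes describe.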
The diagram that we get when we map out all these relations is called a Hasse diagram, or in this case a PERT (Program Evaluation and Review Technique) diagram if we let the greatest element be graduation and the least element be entering college. (In a PERT diagram, you need to exclude nonessential courses, but you may need to keep a marker for an elective.) You may want to add MATH prereqs since this will make your diagram more complete. This is closely related to CPM (the Critical Path Method). In CPM, times or distances are associated with each step, and the goal is to find the shortest total time that allows you to get from the start to the finish, with the constraint that steps that have to be done before other steps are done in that order; that shortest total time equals the length of the longest (critical) path through the diagram. CPM is also useful for identifying bottlenecks (where it may be a good idea to get additional help). Mapping out the diagram is also worth doing since it may show that a prereq is missing. Check here for some examples of graphs used as models of work activity schedules.
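A minimal CPM sketch, assuming each task has a known positive duration and the ordering constraints form a DAG. It processes tasks in topological order (Kahn's algorithm), records each task's earliest finish time, and walks back along the predecessors that set those times to recover one critical path. The tasks and durations are made up, not taken from the house-building diagram in the text.

```python
def critical_path(durations, edges):
    """Return (minimum total time, one critical path) for tasks with positive durations.
    durations: task -> duration; edges: (before, after) pairs forming a DAG."""
    succ = {t: [] for t in durations}
    indeg = {t: 0 for t in durations}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1

    start = {t: 0 for t in durations}   # earliest start time of each task
    finish, pred = {}, {}               # earliest finish, and the predecessor that set it
    ready = [t for t in durations if indeg[t] == 0]
    while ready:                        # Kahn's algorithm: take tasks with no pending prereqs
        t = ready.pop()
        finish[t] = start[t] + durations[t]
        for v in succ[t]:
            if finish[t] > start[v]:    # a later-finishing prereq pushes v's start back
                start[v] = finish[t]
                pred[v] = t
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)

    end = max(finish, key=finish.get)   # the task that finishes last
    path = [end]
    while path[-1] in pred:
        path.append(pred[path[-1]])
    return finish[end], path[::-1]

# Hypothetical tasks (durations in weeks) and ordering constraints.
durations = {"foundation": 3, "framing": 5, "plumbing": 2, "wiring": 3, "drywall": 4}
edges = [("foundation", "framing"), ("framing", "plumbing"), ("framing", "wiring"),
         ("plumbing", "drywall"), ("wiring", "drywall")]

total, path = critical_path(durations, edges)
print(total, path)  # 15 ['foundation', 'framing', 'wiring', 'drywall']
```

Plumbing is not on the critical path: it finishes a week before wiring, so a one-week delay there would not delay the whole project, while any delay along the critical path would.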