Current Catalog Description

Design of large databases; normalization; query languages (including SQL); transaction-processing protocols; query optimization; performance tuning; distributed systems. Not available to students who have credit for CSE 241. Prerequisites: CSE 17

Instructor: Hank Korth (Spring 2022)

Textbook

Silberschatz, Korth, "Database Systems Concepts", 7th Ed., McGraw Hill, 2020 ISBN 978-0078022159

References

Optional papers in the database research literature pertaining to the history of database systems posted online.

COURSE OUTCOMES

Students will have:

  1. Skill in SQL at an advanced level, including complex queries, data definition, constraint specification, coding with PL/SQL (or similar if a non-Oracle system is used), and JAVA programming with JDBC
  2. Database design using the ER Model and using relational normalization. Mathematical foundations of data dependencies and their use in relational database design
  3. Understanding at a considerable level of algorithmic detail of the internal structure of a database system including indexing, query processing, transaction management, and fault tolerance for both centralized and distributed systems
  4. Experience in designing a database for anew enterprise, implementing a design for that database in a real database system, and construction user interfaces to the database via JDBC and/or PL/SQL (or similar)

RELATIONSHIP BETWEEN COURSE OUTCOMES AND STUDENT ENABLED CHARACTERISTICS

CSE 341 substantially supports the following student enabled characteristics:

A. An ability to apply knowledge of computing and mathematics appropriate to the discipline

J. An ability to apply mathematical foundations, algorithmic principles, and computer science theory in the modeling and design of computer-based systems in a way that demonstrates comprehension of the tradeoffs involved in design choices

 

Prerequisites by Topic:

  • CSE 017 (advanced JAVA and data structures)
  • Basic data structures: trees, lists, and graphs, etc.
  • Basic sorting algorithms
  • JAVA programming up to and including classes/objects
  • A higher general degree of computer science experience suitable to permit grasping of CSE 241 concepts at a faster pace to allow deeper converge of database system internals and the mathematical foundations of database design

Major Topics Covered in the Course

Note: The topic list below closely parallels that of CSE 241. The distinction in the courses is not mainly in the topics themselves but rather in the manner in which they are treated. CSE 341 includes more formal rigor (mathematical foundations and proofs), more language details (formal database languages details (formal database languages and query optimization) and more detailed coverage of how database systems interact with broader computing system functions such as those of the operating system

  • Relational algebra and calculus
  • SQL: schema definition, queries, nested queries, updates, complex join expressions, integrity constraints, authorization, index creation, transactions
  • Entity-relationship design
  • JBDC, SQL functions, SQL stored procedures, map-reduce
  • Triggers
  • Recursive queries
  • Ranking and OLAP queries
  • Relational database design:
    • normalization and decomposition, losslessness, dependency preservation
    • proofs of properties of relational designs
    • soundness and completeness of axioms for functional dependencies
    • higher normal forms and other forms of redundancy
  • Enterprise database implementation
    • RAID disks
    • buffer management
    • index structures (both B+ tree and extendible hashing)
  • Query processing algorithms (esp. join algorithms) and query optimization
  • Transaction management
    • ACID properties
    • Recovery
      • write-ahead log, fault tolerance
      • operation logging and ARIES-style recovery
    • Concurrency
      • Widely-used concepts: serializability, two-phase locking, snapshot isolation
      • Alternative approaches: timestamp ordering, validation
  • Parallel and distributed databases (includes cloud)
    • Two-and three-phase commit
    • Distributed deadlock
    • Distributed query optimization
  • Data mining and data warehouses
  • Tuning and benchmarking

Assessment Plan for the Course

The students are given several short homework assignments (8 to 10), a major project, one or two mid-term quizzes, and a final examination. Each homework covers a single topic. The project is an end-to-end design and implementation project in which students model part of an enterprises databases (e.g. product and sales data for a retailer), create the database using SQL (on Orable or PostgreSQL), populate it with data, and use JDBC to implement several simple user interfaces to the database (due to the set of prerequisite courses, I accept command-line interfaces, but often get Web-bases ones). The quizzes and exams have a varying number of questions many of which are aimed at a specific topic but others of which test the student's understanding of the relationship among the various components of a database system and its interaction with the rest of a computing environment. Ethical and social issues are discussed in the course and an exam question ties those issues to some aspect of the course I track the performance of the students on each homework assignment, various project checkpoints, and each question on the tests and final examination.

How Data in the Course are Used to Assess Program Outcomes: (unless adequately covered already in the assessment discussion under Criterion 4)

Each semester I include the above data from the assessment plan for the course in my self-assessment of the course. This report is reviewed, in turn, by the Curriculum Committee.