COSC5336 Database Management Systems

Fall 2012

Under construction and subject to change.

The Basics of Relational Databases


 2   Data Models                                  Chapter 2
 3   The Relational Database Model                Chapter 3
 4   Entity Relationship (E-R) Modeling           Chapter 4
 5   Normalization of Database Tables             Chapter 5
 6   Structured Query Language (SQL)              Chapter 6
10   Distributed Database Management Systems      Chapter 10
12   The Data Warehouse                           Chapter 12

Data Integration: Past, Present and Future (Data Integration Projects World-Wide)

Overview of Data Integration (slides-1, slides-2)


GAV (global as view)

1. S. Chawathe et al. The TSIMMIS project: Integration of Heterogeneous Information Sources. In Proc. 10th Meeting of the Information Processing Society of Japan, 1994.

2. H. Garcia-Molina, Y. Papakonstantinou, D. Quass, A. Rajaraman, Y. Sagiv, J. Ullman, V. Vassalos, and J. Widom. The TSIMMIS approach to mediation: Data Models and Languages. Journal of Intelligent Information Systems, 1997.

3. M. Roth and P. Schwarz. Don't Scrap It, Wrap It! A Wrapper Architecture for Legacy Data Sources. VLDB 1997. Slides (Presentation)

4. L. Haas, D. Kossmann, E. L. Wimmers, and J. Yang. Optimizing Queries across Diverse Data Sources. VLDB 1997. Slides (Presentation)

5. A. Tomasic, L. Raschid, and P. Valduriez. Scaling Heterogeneous Databases and the Design of Disco. In Int. Conf. on Distributed Computing Systems, 1996. Slides (Presentation)

6. S. Cluet, C. Delobel, J. Simeon, and K. Smaga. Your Mediators Need Data Conversion. In SIGMOD Conf. on Management of Data, 1998. Slides (Presentation)

LAV (local as view)

Other suggested reading:

1. M. Friedman, A. Levy, and T. Millstein. Navigational Plans for Data Integration. In Proc. 16th National Conf. on AI, 1999. Slides



BAV (both as view)

1. P. McBrien and A. Poulovassilis. Data Integration by Bi-Directional Schema Transformation Rules. In ICDE 2003. Slides (Presentation)

2. P. McBrien and A. Poulovassilis. Defining peer-to-peer data integration using both as view rules. In Proc. Workshop on Databases, Information Systems and Peer-to-Peer Computing (at VLDB'03), 2003.

Here is a very good survey on data integration.


Algorithms for Answering Queries Using Views (Overview) (slides-1, slides-2, slides-3)

Suggested reading:

1. A. Halevy. Answering Queries Using Views: A Survey. VLDB Journal, 2001.

2. O. M. Duschka, M. R. Genesereth, and A. Y. Levy. Recursive Query Plans for Data Integration. Journal of Logic Programming, 2000. Slides

3. X. Qian. Query Folding. In ICDE 1996. Slides (Presentation)


Algorithms for Answering Queries Using Views (Bucket Algorithm) (slides)

Suggested reading:

1. A. Y. Levy, A. Rajaraman, and J. J. Ordille. Querying Heterogeneous Information Sources Using Source Descriptions. VLDB 1996.

2. A. Halevy. Answering Queries Using Views: A Survey. VLDB Journal, 2001.


Algorithms for Answering Queries Using Views (MiniCon Algorithm) (slides)

Suggested reading:

1. R. Pottinger and A. Halevy. MiniCon: A Scalable Algorithm for Answering Queries Using Views. VLDB Journal, 2000.

2. A. Halevy. Answering Queries Using Views: A Survey. VLDB Journal, 2001.


Limited Source Capabilities (slides)

Suggested reading:

1. R. Yerneni, C. Li, H. Garcia-Molina, and J. D. Ullman. Computing Capabilities of Mediators. SIGMOD Conference 1999.

2. V. Vassalos and Y. Papakonstantinou. Describing and Using Query Capabilities of Heterogeneous Sources. VLDB 1997. Slides (Presentation)

3. Y. Papakonstantinou, A. Gupta, H. Garcia-Molina, and J. Ullman. A query translation scheme for the rapid implementation of wrappers. In Proc. DOOD Conf., 1995. Slides (Presentation)

4. A. Rajaraman, Y. Sagiv, and J. Ullman. Answering Queries using Templates with Binding Patterns. In Proc. PODS Conf., 1995. Slides (Presentation)

5. H. Garcia-Molina, W. Labio, and R. Yerneni. Capability-Sensitive Query Processing on Internet Sources. ICDE 1999. Slides (Presentation)


XML Data Model / Document Type Definition (DTD) (slides)


XPath / XQuery (slides-1, slides-2)


XML-Based Data Integration (slides)

Suggested reading:

1. B. Ludäscher, Y. Papakonstantinou, and P. Velikhov. Navigation-Driven Evaluation of Virtual Mediated Views. In EDBT 2000.

2. Y. Papakonstantinou and V. Vassalos. Architecture and Implementation of an XQuery-based Information Integration Platform. In IEEE Data Eng. Bull. 25(1), 2002.

3. Y. Papakonstantinou, V. R. Borkar, et al. XML Queries and Algebra in the Enosys Integration Platform. In Data Knowl. Eng. 44(3), 2003.

4. I. Manolescu, D. Florescu, and D. Kossmann. Answering XML Queries on Heterogeneous Data Sources. VLDB 2001. Slides

5. S. Madria, K. Passi, and S. Bhowmick. An XML schema integration and query mechanism system. In Data Knowl. Eng. 65(2), 2008.



Schemaless Data Integration (slides)

Suggested reading:

1. E. Rahm, A. Thor, D. Aumueller, et al. iFuice - Information Fusion utilizing Instance Correspondence and Peer Mappings. In WebDB 2005.

2. T. Kirsten and E. Rahm. BioFuice: Mapping-Based Data Integration in Bioinformatics. In 3rd Int. Workshop on Data Integration in the Life Sciences (DILS), 2006. (Slides)


Automatic Schema Matching and Data Cleaning (slides-1, slides-2)

Suggested reading:

1. E. Rahm and P. A. Bernstein. A survey of approaches to automatic schema matching. VLDB J. 10(4): 334-350 (2001).

2. E. Rahm and H. Do. Data Cleaning: Problems and Current Approaches. IEEE Data Eng. Bull. 23(4): 3-13 (2000).

3. J. Madhavan, P. A. Bernstein, and E. Rahm. Generic Schema Matching with Cupid. VLDB 2001.

4. H. H. Do and E. Rahm. COMA - A System for Flexible Combination of Schema Matching Approaches. VLDB 2002. (Slides)

5. P. A. Bernstein, S. Melnik, M. Petropoulos, and C. Quix. Industrial-Strength Schema Matching. SIGMOD Record 33(4), 2004.

6. P. Shvaiko and J. Euzenat. A survey of schema-based matching approaches. Journal on Data Semantics, 4:146-171, Dec. 2005. (Slides)

7. A. Doan and A. Halevy. Semantic integration research in the database community: a brief survey. AI Magazine, 2005.

8. A. Gal. Why is schema matching tough and what can we do about it? SIGMOD Record 35(4), 2006.

9. A. Aboulnaga and K. E. Gebaly. MuBE: User guided source selection and schema mediation for Internet scale data integration. ICDE 2007.

10. B. He and K. Chang. Statistical Schema Matching across Web Query Interfaces. In SIGMOD Conference 2003.


Mediated Schema Generation and Normalization

1. A. Radwan, L. Popa, L. Stanoi, and A. Younis. Top-k generation of integrated schemas based on directed and weighted correspondences. In SIGMOD Conference, 2009.

2. G. Gottlob, R. Pichler, and V. Savenkov. Normalization and optimization of schema mappings. In VLDB Conference, 2009.

3. A. D. Sarma, X. Dong, and A. Halevy. Bootstrapping pay-as-you-go data integration systems. In SIGMOD Conference, 2008.

4. L. Chiticariu, P. Kolaitis, and L. Popa. Interactive generation of integrated schemas. In SIGMOD Conference, 2008.

5. R. Pottinger and P. Bernstein. Schema merging and mapping creation for relational sources. In EDBT Conference, 2008.

6. R. McCann, A. Doan, V. Varadaran, A. Kramnik, and C. Zhai. Building data integration systems: A mass collaboration approach. In WebDB 2003.


Web wrappers and form matching (slides)

Suggested reading:

1. A. H. F. Laender, B. A. Ribeiro-Neto, A. S. da Silva, and J. S. Teixeira. A Brief Survey of Web Data Extraction Tools. SIGMOD Record 31(2): 84-93 (2002). Juliana's notes (?)

2. V. Crescenzi, G. Mecca, and P. Merialdo. RoadRunner: Towards Automatic Data Extraction from Large Web Sites. VLDB 2001. If you would like to experiment with the system, it is available for download.


Peer-to-Peer Data Integration (slides, OWL)

Suggested reading:

1. W. Nejdl, W. Siberski, and M. Sintek. EDUTELLA: A P2P Networking Infrastructure Based on RDF. In WWW 2002.

2. W. Nejdl, W. Siberski, and M. Sintek. Design Issues and Challenges for RDF- and Schema-Based Peer-To-Peer Systems. SIGMOD Record 32(3): 41-46, September 2003.

3. W. Nejdl, B. Wolf, S. Staab, and J. Tane. EDUTELLA: Searching and Annotating Resources within an RDF-based P2P Network. In Semantic Web Workshop, 2002.


PeerDB (Presenter: Michael Sheets):

1. B. C. Ooi, Y. Shu, and K. L. Tan. Relational data sharing in peer-based data management systems. SIGMOD Record, 32(3), 2003.

2. W. S. Ng, B. C. Ooi, K. L. Tan, and A. Y. Zhou. PeerDB: A P2P-based system for distributed data sharing. In International Conference on Data Engineering (ICDE), 2003.


Hyperion (Presenter: Kien Tran):

1. A. Kementsietsidis, M. Arenas, and R. J. Miller. Mapping data in peer-to-peer systems: Semantics and algorithmic issues. In ACM SIGMOD, 2003.

2. M. Arenas, V. Kantere, A. Kementsietsidis, I. Kiringa, R. J. Miller, and J. Mylopoulos. The Hyperion project: From data integration to data coordination. SIGMOD Record, 32(3), 2003.



Piazza (Presenter: Hayford Osei):

1. A. Halevy, Z. Ives, P. Mork, and I. Tatarinov. Piazza: Data Management Infrastructure for Semantic Web Applications. In WWW 2003.

2. A. Halevy, Z. Ives, D. Suciu, and I. Tatarinov. Schema mediation in peer data management systems. In ICDE 2003.

3. A. Halevy, Z. Ives, D. Suciu, and I. Tatarinov. Schema mediation for large-scale semantic data sharing. VLDB J. 14:68-83, 2005.


coDB (Presenter: Himabindu Katangur):

1. E. Franconi, G. Kuper, A. Lopatenko, and I. Zaihrayeu. The coDB Robust Peer to Peer Database System. The Second Workshop on Semantics in Peer-to-Peer and Grid Computing, 2004. (Slides)

2. E. Franconi, G. Kuper, A. Lopatenko, and L. Serafini. A Robust Logical and Computational Characterisation of Peer-to-Peer Database Systems. In International Workshop on Databases, Information Systems and Peer-to-Peer Computing, 2003. (Slides)


SomeWhere (Presenter: Gangaprasad Gujjari):

1. P. Adjiman, P. Chatalic, F. Goasdoué, M. Rousset, and L. Simon. Distributed Reasoning in a Peer-to-Peer Setting: Application to the Semantic Web. In Journal of Artificial Intelligence Research, Vol. 25, pages 269-314, 2006. (Slides)


Semantic Gossiping (Presenter: Anil Nalluri):

1. K. Aberer, P. Cudre-Mauroux, and M. Hauswirth. A Framework for Semantic Gossiping. SIGMOD Record, 31(4), 2002. (Slides)


Other suggested reading:

1. D. Alvarez, A. Smukler, and A. Vaisman. Peer-To-Peer Databases for e-Science: A Biodiversity Case Study. In Brazilian Symposium on Databases, 2005.

2. P. Bernstein, F. Giunchiglia, A. Kementsietsidis, J. Mylopoulos, L. Serafini, and I. Zaihrayeu. Data Management for Peer-To-Peer Computing: A Vision. In ACM SIGMOD WebDB Workshop, 2002.

3. L. Serafini, F. Giunchiglia, J. Mylopoulos, and P. Bernstein. Local Relational Model: A Logical Formalization of Database. In CONTEXT 2003.

4. D. Calvanese, E. Damaggio, G. De Giacomo, M. Lenzerini, and R. Rosati. Semantic Data Integration in P2P Systems. In International Workshop on Databases, Information Systems and Peer-to-Peer Computing, 2003. (Slides)


SRB & iRODS (SRB overview, SRB Core Technology, iRODS Overview)


Dataspace (tutorial)

1. M. Franklin, A. Halevy, and D. Maier. From Databases to Dataspaces: A New Abstraction for Information Management. In ACM SIGMOD Record, 2005. (Slides)

2. X. Dong, A. Y. Halevy, and C. Yu. Data Integration with Uncertainties. In VLDB 2007. (Slides)


Personal data management system

1. M. Salles, J. Dittrich, et al. iTrails: Pay-as-you-go Information Integration in Dataspaces. VLDB 2007. (Slides)