Stage 7 : Inserting Records Into Relations (20 hours)
- Implement the record insertion functionality in NITCbase
- Implement the functions responsible for allocating and making use of previously unoccupied disk blocks
- Implement functions to handle runtime updates to the relation and attribute cache
Introduction
Inserting records is one of the most essential features of a DBMS. In this stage, we will implement all the methods required to start inserting tuples into an existing relation in NITCbase. Inserting a record into a relation involves three main steps:
- Iterate through the blocks of the relation and find a free slot or allocate a new block if all the blocks are full.
- Store the contents of the record in the appropriate slot in the block and update the slotmap.
- Update the relation catalog/cache to reflect the change in number of records. This entails a write-back of the updated cache values to the corresponding buffer of the catalog blocks.
A new record can be inserted using the INSERT INTO TABLE VALUES command which is handled by the Algebra Layer. The allocation of a new block is handled by the Buffer Layer.
Implementation
Allocating New Blocks
Recall that the BlockBuffer class is the parent of the RecBuffer class. Both the BlockBuffer and RecBuffer classes have two constructors each (referred to as Constructor1 and Constructor2). Thus, put together, these two classes have four constructors.
When a disk block is allocated to a relation for the first time, Constructor1 of the RecBuffer class must be invoked. This constructor takes no arguments and, in turn, invokes Constructor1 of BlockBuffer with the argument 'R' (indicating that the allocated block is to be used as a record block and not an index block; allocation of index blocks will be discussed in later stages of the roadmap). Constructor1 of BlockBuffer will iterate through the block allocation map looking for an unallocated block and return a free block, if available. The number of the allocated block will be stored in the private data field blockNum of the allocated object. (IMPORTANT NOTE: If block allocation fails, the appropriate error code will be written to the blockNum field. Care should be taken to check whether the allocation was successful before going ahead with accessing the block!)
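A minimal sketch of these two constructors is shown below. It assumes the getFreeBlock() method discussed later in this section and the REC block-type constant from the design docs; treat it as one possible shape rather than the definitive implementation.
// Buffer/BlockBuffer.cpp -- a minimal sketch of Constructor1 (assumptions as stated above)
BlockBuffer::BlockBuffer(char blockType) {
  // map the character to the block-type constant ('R' -> REC);
  // index block types ('I'/'L') are introduced in later stages
  int blockTypeConstant = (blockType == 'R') ? REC : UNUSED_BLK;

  // getFreeBlock() finds an unallocated block, initialises its header and
  // marks it in the block allocation map; on failure it returns E_DISKFULL
  int allocatedBlock = getFreeBlock(blockTypeConstant);

  // store the block number (or the error code) in the blockNum field;
  // callers must check this field before using the object
  this->blockNum = allocatedBlock;
}

// Constructor1 of RecBuffer simply asks the parent class to allocate a record block
RecBuffer::RecBuffer() : BlockBuffer('R') {}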
Recall that we have already completed the implementation of RecBuffer::Constructor2 and BlockBuffer::Constructor2 in the previous stages. If a record block was already allocated to a relation using RecBuffer::Constructor1 in the past, then RecBuffer::Constructor2 is used for subsequent accesses of the block. This constructor, as already seen, involves loading the block from the disk to the buffer (if not already present) and updating the data structures in the class StaticBuffer. Since the block number is already known in this case, it must be passed as an argument to the constructor. RecBuffer::Constructor2 in turn invokes the parent class constructor BlockBuffer::Constructor2, passing the block number as an argument, for performing the required functionality.
NITCbase requires that RecBuffer::Constructor1 be invoked only when both the conditions below hold true:
- a record is added into a relation (or a new relation is created, resulting in an insertion into the catalogs), and
- none of its already allocated blocks (if any) have a free slot to store the inserted record, implying that insertion cannot be performed without allocating a new block.
In all the previous stages, such insertion was performed only through the XFS Interface, and we never had to do a fresh block allocation for any relation. In this stage, we will implement RecBuffer::Constructor1 and BlockBuffer::Constructor1 to support record insertions into existing relations.
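As a quick illustration (a usage sketch, not part of the roadmap code), allocating a fresh record block and checking whether the allocation succeeded would look something like this. The getBlockNum() getter used here is implemented later in this stage.
// hypothetical usage sketch: allocate a new record block for a relation
RecBuffer newRecBlock;                // Constructor1: allocates an unused disk block

int ret = newRecBlock.getBlockNum();  // returns the allocated block number,
                                      // or an error code such as E_DISKFULL
if (ret == E_DISKFULL) {
  // no free blocks left on the disk; the insertion must be aborted
  // (BlockAccess::insert() below returns E_DISKFULL in this case)
}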
The methods relevant to adding this functionality are shown in the class diagram below.
NOTE: The functions are denoted with circles as follows.
🔵 -> methods that are already in their final state
🟢 -> methods that will attain their final state in this stage
🟠 -> methods that we will modify in this stage, and in subsequent stages
🟤 -> methods that we built earlier and require more work later, but will leave as is in this stage
An instance declared using Constructor1 will call the BlockBuffer::getFreeBlock() method to get a free block. This method makes use of the setHeader() method to set up the header in the newly acquired disk block and the setBlockType() method to update the type of the block in the block allocation map.
Similar to other disk data structures, the block allocation map too is loaded into memory during the runtime of the database. In the StaticBuffer class, we declare the blockAllocMap member field and update our constructor and destructor to initialise and write back the block allocation map between the disk and memory.
Buffer/StaticBuffer.cpp
// declare the blockAllocMap array
unsigned char StaticBuffer::blockAllocMap[DISK_BLOCKS];
StaticBuffer::StaticBuffer() {
// copy blockAllocMap blocks from disk to buffer (using readBlock() of Disk)
// blocks 0 to 3
/* initialise metainfo of all the buffer blocks with
dirty:false, free:true, timestamp:-1 and blockNum:-1
(you did this already)
*/
}
StaticBuffer::~StaticBuffer() {
// copy blockAllocMap blocks from buffer to disk (using writeBlock() of Disk)
/*iterate through all the buffer blocks,
write back blocks with metainfo as free:false,dirty:true
(you did this already)
*/
}
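A minimal sketch of the two blockAllocMap copy loops is given below. It assumes BLOCK_SIZE is 2048 and DISK_BLOCKS is 8192 (so the map spans disk blocks 0 to 3) and uses Disk::readBlock()/Disk::writeBlock(); treat it as one possible shape, not the definitive implementation.
// Buffer/StaticBuffer.cpp -- a sketch of the blockAllocMap copy loops
// (assumes the map occupies disk blocks 0-3, i.e. DISK_BLOCKS / BLOCK_SIZE = 4 blocks)

// inside StaticBuffer::StaticBuffer():
unsigned char buffer[BLOCK_SIZE];
for (int block = 0, mapSlot = 0; block < 4; block++) {
  // read one block of the allocation map from disk into the in-memory array
  Disk::readBlock(buffer, block);
  for (int byteIndex = 0; byteIndex < BLOCK_SIZE; byteIndex++, mapSlot++) {
    blockAllocMap[mapSlot] = buffer[byteIndex];
  }
}

// inside StaticBuffer::~StaticBuffer():
unsigned char writeBuffer[BLOCK_SIZE];
for (int block = 0, mapSlot = 0; block < 4; block++) {
  // pack the in-memory allocation map back into a block and write it to disk
  for (int byteIndex = 0; byteIndex < BLOCK_SIZE; byteIndex++, mapSlot++) {
    writeBuffer[byteIndex] = blockAllocMap[mapSlot];
  }
  Disk::writeBlock(writeBuffer, block);
}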
Buffer/BlockBuffer.cpp
Implement the following functions by looking at their respective design docs.
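If it helps to have a concrete picture, rough sketches of BlockBuffer::getFreeBlock() and BlockBuffer::setBlockType() are given below (Constructor1 was sketched above). These assume the StaticBuffer helpers getFreeBuffer(), setDirtyBit() and blockAllocMap, and the loadBlockAndGetBufferPtr() method from earlier stages; the design docs remain the authoritative reference for the exact steps and error handling.
// Buffer/BlockBuffer.cpp -- rough sketches, not the authoritative design-doc versions

int BlockBuffer::getFreeBlock(int blockType) {
  // find the first unallocated block in the block allocation map
  int freeBlock = -1;
  for (int block = 0; block < DISK_BLOCKS; block++) {
    if (StaticBuffer::blockAllocMap[block] == UNUSED_BLK) {
      freeBlock = block;
      break;
    }
  }
  if (freeBlock == -1) {
    return E_DISKFULL;
  }

  // this object now refers to the newly allocated block
  this->blockNum = freeBlock;

  // assign a buffer to the block (error handling as per the design docs)
  StaticBuffer::getFreeBuffer(freeBlock);

  // initialise the header of the new block
  struct HeadInfo header;
  header.pblock = header.lblock = header.rblock = -1;
  header.numEntries = header.numAttrs = header.numSlots = 0;
  setHeader(&header);

  // record the block's type in the block itself and in the allocation map
  setBlockType(blockType);

  return freeBlock;
}

int BlockBuffer::setBlockType(int blockType) {
  unsigned char *bufferPtr;
  int ret = loadBlockAndGetBufferPtr(&bufferPtr);
  if (ret != SUCCESS) {
    return ret;
  }

  // the first 4 bytes of every block store its type (REC, IND_INTERNAL, ...)
  *((int32_t *)bufferPtr) = blockType;

  // update the in-memory block allocation map and mark the buffer as modified
  StaticBuffer::blockAllocMap[this->blockNum] = (unsigned char)blockType;
  return StaticBuffer::setDirtyBit(this->blockNum);
}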
Handling Cache Updates
Recall that every relation catalog entry (and consequently every relation cache entry) contains a field numRecords which stores the number of records in the relation. After we insert a record, this value will have to be incremented. So far, we have only been reading from the caches. In this stage, we will implement write-back for the relation cache.
Similar to the buffer, each entry in the caches contains a dirty field which records whether that entry has been modified. If a cache entry is dirty, it will need to be written back (to the copy of the disk block in the buffer) either when the relation is closed (with the CLOSE TABLE command) or at system exit when all open relations are closed. The cache entry can be written back to the disk buffer using an instance of the RecBuffer class. Our earlier implementation did not need to do this since we did not have any operations which modified the caches.
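For reference, the relation cache entry structure (declared in the Cache Layer header files) has roughly the following shape; the dirty field below is the one being discussed. See the header file for the exact definition.
// Cache/RelCacheTable.h -- shape shown for reference only
struct RelCacheEntry {
  RelCatEntry relCatEntry;  // the cached relation catalog entry (including the number of records)
  bool dirty;               // true if the entry was modified and needs write-back
  RecId recId;              // rec-id of this relation's record in the relation catalog
  RecId searchIndex;        // last search hit, used by linear search
};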
A class diagram indicating the relevant methods in the Cache Layer is shown below.
In earlier stages, we had implemented the RelCacheTable::getRelCatEntry() function to get an entry from the relation cache. In this stage, we will implement its counterpart RelCacheTable::setRelCatEntry(), which is how we update the values stored in the relation cache during runtime. We will also implement the RelCacheTable::relCatEntryToRecord() function, which converts a struct RelCatEntry into a relation catalog entry record. This is useful when the relation is closed and the cache entries are written back to the buffer.
Cache/RelCacheTable.cpp
Implement the following functions by looking at their respective design docs.
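A sketch of what these two functions could look like is given below, assuming the RELCAT_*_INDEX constants and the RelCatEntry field names from the design docs; the exact validation steps and error codes should follow the design docs.
// Cache/RelCacheTable.cpp -- sketches of the two new functions

int RelCacheTable::setRelCatEntry(int relId, RelCatEntry *relCatBuf) {
  // validate relId and make sure the relation is actually open
  if (relId < 0 || relId >= MAX_OPEN) {
    return E_OUTOFBOUND;
  }
  if (relCache[relId] == nullptr) {
    return E_RELNOTOPEN;
  }

  // copy the new values into the cache entry and mark it as modified
  relCache[relId]->relCatEntry = *relCatBuf;
  relCache[relId]->dirty = true;

  return SUCCESS;
}

void RelCacheTable::relCatEntryToRecord(RelCatEntry *relCatEntry,
                                        union Attribute record[RELCAT_NO_ATTRS]) {
  // pack the struct fields into the record, one attribute per catalog field
  strcpy(record[RELCAT_REL_NAME_INDEX].sVal, relCatEntry->relName);
  record[RELCAT_NO_ATTRIBUTES_INDEX].nVal = relCatEntry->numAttrs;
  record[RELCAT_NO_RECORDS_INDEX].nVal = relCatEntry->numRecs;
  record[RELCAT_FIRST_BLOCK_INDEX].nVal = relCatEntry->firstBlk;
  record[RELCAT_LAST_BLOCK_INDEX].nVal = relCatEntry->lastBlk;
  record[RELCAT_NO_SLOTS_PER_BLOCK_INDEX].nVal = relCatEntry->numSlotsPerBlk;
}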
Now, we modify our OpenRelTable::closeRel() function to handle write-back for the relation cache.
Cache/OpenRelTable.cpp
int OpenRelTable::closeRel(int relId) {
// confirm that rel-id fits the following conditions
// 2 <= relId < MAX_OPEN
// does not correspond to a free slot
// (you have done this already)
/****** Releasing the Relation Cache entry of the relation ******/
if (/* RelCatEntry of the relId-th Relation Cache entry has been modified */)
{
/* Get the Relation Catalog entry from RelCacheTable::relCache
Then convert it to a record using RelCacheTable::relCatEntryToRecord(). */
/* get the rec-id of the relation's record in the relation catalog from
the cache entry (RelCacheTable::relCache[relId]->recId) */
// declaring an object of RecBuffer class to write back to the buffer
RecBuffer relCatBlock(recId.block);
// Write back to the buffer using relCatBlock.setRecord() with recId.slot
}
/****** Releasing the Attribute Cache entry of the relation ******/
// free the memory allocated in the attribute caches which was
// allocated in the OpenRelTable::openRel() function
// (because we are not modifying the attribute cache at this stage,
// write-back is not required. We will do it in subsequent
// stages when it becomes needed)
/****** Set the Open Relation Table entry of the relation as free ******/
// update `metainfo` to set `relId` as a free slot
return SUCCESS;
}
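To make the write-back step concrete, here is one possible shape for the relation-cache release portion above. It is a sketch based on the cache structures discussed earlier in this stage; follow the design docs for the exact steps.
// sketch of the relation-cache write-back inside closeRel()
if (RelCacheTable::relCache[relId]->dirty == true) {
  // convert the cached entry back into a relation catalog record
  Attribute record[RELCAT_NO_ATTRS];
  RelCacheTable::relCatEntryToRecord(&RelCacheTable::relCache[relId]->relCatEntry, record);

  // the rec-id of this relation's entry in the relation catalog is cached too
  RecId recId = RelCacheTable::relCache[relId]->recId;

  // write the record back to the buffered copy of the relation catalog block
  RecBuffer relCatBlock(recId.block);
  relCatBlock.setRecord(record, recId.slot);
}

// free the cache entry that was allocated in OpenRelTable::openRel()
free(RelCacheTable::relCache[relId]);
RelCacheTable::relCache[relId] = nullptr;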
Inserting a Record
Now that we have our block allocation and cache write-back in place, we can implement the Algebra Layer and Block Access Layer functionalities required to insert a record in a block. We also need to implement the Buffer Layer methods to update the slotmap.
A sequence diagram documenting the call sequence for a record insertion is shown below along with a class diagram with the relevant classes.
Algebra Layer
As shown in the sequence diagram above, the Frontend User Interface will parse the INSERT INTO TABLE VALUES command and call the Frontend::insert_into_table_values() function in the Frontend Programming Interface. This call is then passed along to the Algebra Layer. Hence, the implementation of the Frontend::insert_into_table_values() function only involves a call to the Algebra::insert() function.
Frontend/Frontend.cpp
int Frontend::insert_into_table_values(char relname[ATTR_SIZE], int attr_count, char attr_values[][ATTR_SIZE]) {
return Algebra::insert(relname, attr_count, attr_values);
}
The Algebra::insert() function performs some validation on the input and converts the user inputs, which are strings, into values of the union Attribute type. It then calls the BlockAccess::insert() function to handle the insertion of the record.
Algebra/Algebra.cpp
Implement the Algebra::insert() function by looking at the design docs.
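The core of Algebra::insert() is the string-to-Attribute conversion. Below is a rough sketch of that conversion step; the helper name convertToAttributes is hypothetical, the strtod-based check is just one way to validate numeric input, and the validation of relName and the attribute count against the catalogs (see the design docs) is omitted here.
// Algebra/Algebra.cpp -- a rough sketch of the conversion step (hypothetical helper)
#include <cstdlib>
#include <cstring>

int convertToAttributes(int relId, int nAttrs, char record[][ATTR_SIZE],
                        union Attribute recordValues[]) {
  for (int i = 0; i < nAttrs; i++) {
    // find the type of the i'th attribute from the attribute cache
    AttrCatEntry attrCatEntry;
    AttrCacheTable::getAttrCatEntry(relId, i, &attrCatEntry);

    if (attrCatEntry.attrType == NUMBER) {
      // the string must parse fully as a number, otherwise the types mismatch
      char *end;
      double value = strtod(record[i], &end);
      if (end == record[i] || *end != '\0') {
        return E_ATTRTYPEMISMATCH;
      }
      recordValues[i].nVal = value;
    } else {
      // STRING attributes are copied as-is
      strcpy(recordValues[i].sVal, record[i]);
    }
  }
  return SUCCESS;
}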
Buffer Layer
Before we implement the record insertion, we implement the Buffer Layer method to update the slotmap and a getter function for the BlockBuffer class to access its private blockNum member.
Buffer/BlockBuffer.cpp
Implement the following functions by looking at their respective design docs.
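For reference, sketches of the two methods are given below, assuming (as in the design docs) that the slot map begins right after the HEADER_SIZE-byte block header and that buffered access goes through loadBlockAndGetBufferPtr().
// Buffer/BlockBuffer.cpp -- sketches of the two new methods

int RecBuffer::setSlotMap(unsigned char *slotMap) {
  unsigned char *bufferPtr;
  int ret = loadBlockAndGetBufferPtr(&bufferPtr);
  if (ret != SUCCESS) {
    return ret;
  }

  // the number of slots in this block is stored in its header
  struct HeadInfo head;
  getHeader(&head);
  int numSlots = head.numSlots;

  // the slot map begins immediately after the header
  memcpy(bufferPtr + HEADER_SIZE, slotMap, numSlots);

  // mark the buffered block as modified
  return StaticBuffer::setDirtyBit(this->blockNum);
}

int BlockBuffer::getBlockNum() {
  // expose the private blockNum field (it may hold an error code such as
  // E_DISKFULL if Constructor1 failed to allocate a block)
  return this->blockNum;
}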
Block Access Layer
Now, we implement the insert functionality in the Block Access Layer. Inserting a record involves finding a free slot (or allocating a new block if there is none), setting the record in that slot, and then updating the slotmap and the catalog caches.
BlockAccess/BlockAccess.cpp
int BlockAccess::insert(int relId, Attribute *record) {
// get the relation catalog entry from relation cache
// ( use RelCacheTable::getRelCatEntry() of Cache Layer)
int blockNum = /* first record block of the relation (from the rel-cat entry)*/;
// rec_id will be used to store where the new record will be inserted
RecId rec_id = {-1, -1};
int numOfSlots = /* number of slots per record block */;
int numOfAttributes = /* number of attributes of the relation */;
int prevBlockNum = /* block number of the last element in the linked list = -1 */;
/*
Traversing the linked list of existing record blocks of the relation
until a free slot is found OR
until the end of the list is reached
*/
while (blockNum != -1) {
// create a RecBuffer object for blockNum (using appropriate constructor!)
// get header of block(blockNum) using RecBuffer::getHeader() function
// get slot map of block(blockNum) using RecBuffer::getSlotMap() function
// search for a free slot in the block 'blockNum' and store its rec-id in rec_id
// (Free slot can be found by iterating over the slot map of the block)
/* slot map stores SLOT_UNOCCUPIED if slot is free and
SLOT_OCCUPIED if slot is occupied */
/* if a free slot is found, set rec_id and discontinue the traversal
of the linked list of record blocks (break from the loop) */
/* otherwise, continue to check the next block by updating the
block numbers as follows:
update prevBlockNum = blockNum
update blockNum = header.rblock (next element in the linked
list of record blocks)
*/
}
// if no free slot is found in existing record blocks (rec_id = {-1, -1})
{
// if relation is RELCAT, do not allocate any more blocks
// return E_MAXRELATIONS;
// Otherwise,
// get a new record block (using the appropriate RecBuffer constructor!)
// get the block number of the newly allocated block
// (use BlockBuffer::getBlockNum() function)
// let ret be the return value of getBlockNum() function call
if (ret == E_DISKFULL) {
return E_DISKFULL;
}
// Assign rec_id.block = new block number(i.e. ret) and rec_id.slot = 0
/*
set the header of the new record block such that it links with
existing record blocks of the relation
set the block's header as follows:
blockType: REC, pblock: -1
lblock
= -1 (if linked list of existing record blocks was empty
i.e this is the first insertion into the relation)
= prevBlockNum (otherwise),
rblock: -1, numEntries: 0,
numSlots: numOfSlots, numAttrs: numOfAttributes
(use BlockBuffer::setHeader() function)
*/
/*
set block's slot map with all slots marked as free
(i.e. store SLOT_UNOCCUPIED for all the entries)
(use RecBuffer::setSlotMap() function)
*/
// if prevBlockNum != -1
{
// create a RecBuffer object for prevBlockNum
// get the header of the block prevBlockNum and
// update the rblock field of the header to the new block
// number i.e. rec_id.block
// (use BlockBuffer::setHeader() function)
}
// else
{
// update first block field in the relation catalog entry to the
// new block (using RelCacheTable::setRelCatEntry() function)
}
// update last block field in the relation catalog entry to the
// new block (using RelCacheTable::setRelCatEntry() function)
}
// create a RecBuffer object for rec_id.block
// insert the record into rec_id'th slot using RecBuffer.setRecord())
/* update the slot map of the block by marking the entry of the slot to
which the record was inserted as occupied */
// (i.e. store SLOT_OCCUPIED in the rec_id.slot'th entry of the slot map)
// (use RecBuffer::getSlotMap() and RecBuffer::setSlotMap() functions)
// increment the numEntries field in the header of the block to
// which record was inserted
// (use BlockBuffer::getHeader() and BlockBuffer::setHeader() functions)
// Increment the number of records field in the relation cache entry for
// the relation. (use RelCacheTable::setRelCatEntry function)
return SUCCESS;
}
Your NITCbase should now support insertion of records into an existing relation using the INSERT INTO TABLE VALUES command. In addition, you can also make use of the INSERT INTO TABLE FROM FILE command. This command is used to populate a relation using values from a CSV file. The Frontend User Interface handles the file input and repeatedly calls the Frontend::insert_into_table_values() function that you implemented.
While inserting the records, the number of records is updated in the relation cache. However, if we were to immediately run the SELECT command to read the values stored in the relation catalog, we would find that the #Records field is not up to date.
This is because, while the relation is open, all the updates made to the catalog relations are present only in the respective caches. It is only when the relation is closed that the changes are written to the buffer (and subsequently the disk). If we were to close the relation and then read the records of the relation catalog, we would see that the updated #Records value is indeed reflected in the record.
Though this is an inconsistency in the values, all runtime operations make use only of the values in the caches and never read the catalog entries from the disk, so no unexpected behaviour results.
Exercises
Q1. In an exercise of a previous stage, you created a relation Locations(name STR, capacity NUM). Insert the following values into the relation using the INSERT command in your NITCbase.
elhc, 300
nlhc, 300
eclc, 500
pits, 150
oat, 950
audi, 1000
Then, verify the insertion using the SELECT command that you implemented earlier. Additionally, run the SELECT command to read the records of the relation catalog and verify that the #Records field matches the number of records you inserted, as shown below.
SELECT * FROM Locations INTO null WHERE capacity>0;
CLOSE TABLE Locations; # required to update the relation catalog
SELECT * FROM RELATIONCAT INTO null WHERE RelName=Locations;
Q2. Using the INSERT INTO TABLE FROM FILE command, insert the values present here into the relation Events(id NUM, title STR, location STR). Then, use the SELECT command to verify that all 200 events were created.
Q3. Run the following commands in your NITCbase and ensure that you get the corresponding output for all the error conditions.
insert into Participants values (43, Ragam, ELHC) # Error: Mismatch in number of attributes
insert into Participants values (four, Ragam) # Error: Mismatch in attribute type
insert into RELATIONCAT values (test) # Error: This operation is not permitted