Author Archives: Shanoj

About Shanoj

Author: Shanoj is a data engineer and solutions architect passionate about delivering business value and actionable insights through well-architected data products. He holds several certifications across AWS, Oracle, Apache, Google Cloud, Docker, and Linux, and focuses on data engineering and analysis using SQL, Python, Big Data, RDBMS, and Apache Spark, among other technologies. He has 17+ years of experience working with various technologies in the Retail and BFS domains.

Setting the storage driver in Docker

Reference:

https://docs.docker.com/storage/storagedriver/select-storage-driver/

https://docs.docker.com/storage/storagedriver/

Linux distribution                    | Recommended storage drivers                                      | Alternative drivers
Docker Engine – Community on Ubuntu   | overlay2, or aufs (for Ubuntu 14.04 running on kernel 3.13)      | overlay¹, devicemapper², zfs, vfs
Docker Engine – Community on Debian   | overlay2 (Debian Stretch), aufs or devicemapper (older versions) | overlay¹, vfs
Docker Engine – Community on CentOS   | overlay2                                                         | overlay¹, devicemapper², zfs, vfs
Docker Engine – Community on Fedora   | overlay2                                                         | overlay¹, devicemapper², zfs, vfs

Get the current storage driver:

docker info

Set the storage driver explicitly using the daemon configuration file. This is the method that Docker recommends.

sudo vi /etc/docker/daemon.json

Add the storage driver setting to the daemon configuration file:

{
  "storage-driver": "devicemapper"
}

Restart Docker after editing the file.

sudo systemctl restart docker
sudo systemctl status docker
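Editing the file by hand as above is sufficient; as an illustrative sketch (the helper below is hypothetical, not part of Docker), the same change can be made programmatically while preserving any other keys already present in daemon.json:

```python
import json
import os

def set_storage_driver(driver, path="/etc/docker/daemon.json"):
    """Merge a storage-driver setting into daemon.json, preserving other keys."""
    config = {}
    if os.path.exists(path):
        with open(path) as f:
            config = json.load(f)
    config["storage-driver"] = driver
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
    return config

# Example (against a scratch path, not the live daemon config):
# set_storage_driver("devicemapper", path="/tmp/daemon.json")
```

Note that changing the storage driver makes existing containers and images inaccessible until you switch back, so try this against a scratch path first.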

Installing Docker on CentOS

Install the required packages; these are prerequisites for installing Docker on CentOS:

sudo yum install -y device-mapper-persistent-data lvm2

Add the Docker CE repo:

sudo yum-config-manager \
    --add-repo \
    https://download.docker.com/linux/centos/docker-ce.repo

Install the Docker CE packages and containerd.io:

sudo yum install -y docker-ce-18.09.5 docker-ce-cli-18.09.5 containerd.io

Start and enable the Docker service:

sudo systemctl start docker
sudo systemctl enable docker

Add test_user to the docker group, giving the user permission to run docker commands:

sudo usermod -a -G docker test_user

Log out and back in, then test the installation by running a simple container:

docker run hello-world

I am very excited: my book is out for sale.

Looking for Oracle RAC administration jobs?

This book provides complete coverage of Oracle RAC administration interview questions and answers. It will help you crack your interview and land your dream role as an Oracle RAC administrator, and is a perfect companion to help you stand out in today’s competitive job market.

Sections to be discussed:
Basic to advanced RAC administration interview questions
RAC installation questions
RAC upgrade/patching questions
RAC Data Guard configuration questions
RAC troubleshooting questions

390 Oracle RAC administration interview questions to get you hired as an Oracle Database RAC administrator.

Using Kafka Connect to Capture Data from a Relational Database (sqlite3)

Use any Kafka Docker image to install and start Kafka.

Reference:

https://docs.confluent.io/current/connect/userguide.html
https://github.com/bitnami/bitnami-docker-kafka
https://docs.confluent.io/3.1.1/connect/connect-jdbc/docs/sink_connector.html

JDBC driver download for SQLite3:
https://bitbucket.org/xerial/sqlite-jdbc/downloads/

  • Start Kafka.
confluent start
  • Install SQLite3.
apt-get update
apt-get install sqlite3
  • Create a New Database and Populate It with a Table and Some Data
    Create a new database called “test.db”.
root@shanoj_srv1:/# sqlite3 test.db
  • Create a new table in the SQLite database called “accounts”.
CREATE TABLE accounts (id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, 
name VARCHAR (255));
  • Insert values into the table to begin populating it.
INSERT INTO accounts(name) VALUES('sabu');
INSERT INTO accounts(name) VALUES('ronnie');
.quit
  • Stop Kafka Connect.
confluent stop connect
  • Make the necessary changes to the files below:
root@shanoj_srv1:/# vi /etc/schema-registry/connect-avro-standalone.properties
bootstrap.servers=localhost:9092
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081

# The internal converter used for offsets and config data is configurable and must be specified.
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false

# Local storage file for offset data
offset.storage.file.filename=/tmp/connect.offsets
root@shanoj_srv1:/# vi /etc/kafka-connect-jdbc/source-quickstart-sqlite.properties

# A simple example that copies all tables from a SQLite database. The first few settings are
# required for all connectors: a name, the connector class to run, and the maximum number of
# tasks to create:
name=test-source-sqlite-jdbc-autoincrement
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
# The remaining configs are specific to the JDBC source connector. In this example, we connect to a
# SQLite database stored in the file test.db, use an auto-incrementing column called 'id' to
# detect new rows as they are added, and output to topics prefixed with 'test-sqlite-jdbc-', e.g.
# a table called 'users' will be written to the topic 'test-sqlite-jdbc-users'.
connection.url=jdbc:sqlite:test.db
mode=incrementing
incrementing.column.name=id
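In incrementing mode, the connector tracks the largest value of the incrementing column it has seen and fetches only newer rows on each poll. A rough Python sketch of that polling logic against the same accounts table (the connector itself does this over JDBC; poll here is an illustrative helper):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "
    "name VARCHAR(255))"
)
conn.execute("INSERT INTO accounts(name) VALUES('sabu')")
conn.execute("INSERT INTO accounts(name) VALUES('ronnie')")

last_id = 0  # the connector persists this offset (see offset.storage.file.filename)

def poll(conn, last_id):
    """Fetch only the rows added since the last poll, like incrementing mode."""
    rows = conn.execute(
        "SELECT id, name FROM accounts WHERE id > ? ORDER BY id", (last_id,)
    ).fetchall()
    new_last = rows[-1][0] if rows else last_id
    return rows, new_last

rows, last_id = poll(conn, last_id)       # first poll sees both existing rows
conn.execute("INSERT INTO accounts(name) VALUES('rama')")
new_rows, last_id = poll(conn, last_id)   # second poll sees only 'rama'
```

Because the offset is persisted to the offsets file, polling resumes where it left off after a connector restart.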
  • Start Kafka Connect in standalone mode.
root@shanoj_srv1:/# connect-standalone -daemon /etc/schema-registry/connect-avro-standalone.properties /etc/kafka-connect-jdbc/source-quickstart-sqlite.properties
  • Verify that the connector was created.
root@shanoj_srv1:/# cat /logs/connectStandalone.out | grep -i "finished"
[2019-08-15 15:45:49,421] INFO Finished creating connector test-source-sqlite-jdbc-autoincrement (org.apache.kafka.connect.runtime.Worker:225)
[2019-08-15 15:45:49,504] INFO Source task WorkerSourceTask{id=test-source-sqlite-jdbc-autoincrement-0} finished initialization and start (org.apache.kafka.connect.runtime.WorkerSourceTask:143)
[2019-08-15 15:46:49,484] INFO Finished WorkerSourceTask{id=test-source-sqlite-jdbc-autoincrement-0} commitOffsets successfully in 6 ms (org.apache.kafka.connect.runtime.WorkerSourceTask:373)
root@shanoj_srv1:/# curl -s localhost:8083/connectors
  • Examine the Kafka topic created.
root@shanoj_srv1:/# kafka-topics --list --zookeeper localhost:2181 | grep test-sqlite-jdbc
test-sqlite-jdbc-accounts

Start a Kafka Consumer and Write New Data to the Database

  • Open a Kafka consumer.
root@shanoj_srv1:/# kafka-avro-console-consumer --new-consumer --bootstrap-server localhost:9092 --topic test-sqlite-jdbc-accounts --from-beginning

Open a new terminal session and start a shell inside the container:

root@shanoj_srv1:/# sudo docker exec -it sqlite-test /bin/bash
  • Change to the /tmp directory.
root@shanoj_srv1:/# cd /tmp
  • Access the SQLite database test.db and insert new values into the accounts table.
root@shanoj_srv1:/tmp# sqlite3 test.db
SQLite version 3.8.7.1 2014-10-29 13:59:56
Enter ".help" for usage hints.
sqlite> INSERT INTO accounts(name) VALUES('rama');
sqlite> INSERT INTO accounts(name) VALUES('lev');
sqlite> INSERT INTO accounts(name) VALUES('sriram');
sqlite> INSERT INTO accounts(name) VALUES('joby');
sqlite> INSERT INTO accounts(name) VALUES('shanoj');
sqlite>
  • Return to the previous session with the consumer and verify the data has been written.
root@ip-10-0-1-100:/# kafka-avro-console-consumer --new-consumer --bootstrap-server localhost:9092 --topic test-sqlite-jdbc-accounts --from-beginning
{"id":3,"name":{"string":"rama"}}
{"id":4,"name":{"string":"lev"}}
{"id":5,"name":{"string":"sriram"}}
{"id":6,"name":{"string":"joby"}}
{"id":7,"name":{"string":"shanoj"}}

Install and Configure PostgreSQL 9.x: RHEL/CentOS

1. Download and install the PostgreSQL repository RPM using the appropriate package manager:

~ $ rpm -Uvh https://yum.postgresql.org/9.4/redhat/rhel-7-x86_64/pgdg-centos94-9.4-3.noarch.rpm
Retrieving https://yum.postgresql.org/9.4/redhat/rhel-7-x86_64/pgdg-centos94-9.4-3.noarch.rpm
warning: /var/tmp/rpm-tmp.IZow7N: Header V4 DSA/SHA1 Signature, key ID 442df0f8: NOKEY
Preparing... ################################# [100%]
Updating / installing...
1:pgdg-redhat-repo-42.0-4 ################################# [100%]

2. Apply any necessary updates:

[root@tcox6 ~]# yum update

3. Install the PostgreSQL 9.4 server and the associated contrib modules and utilities. Once installed, run the database initialization routine before starting the database.

[root@tcox6 ~]# yum install postgresql94-server postgresql94-contrib
[root@tcox6 ~]# /usr/pgsql-9.4/bin/postgresql94-setup initdb

4. Enable the PostgreSQL 9.4 server to run on system start and then start the database server.

[root@tcox6 ~]# systemctl enable postgresql-9.4

ln -s '/usr/lib/systemd/system/postgresql-9.4.service' '/etc/systemd/system/multi-user.target.wants/postgresql-9.4.service'

[root@tcox6 ~]# systemctl start postgresql-9.4

5. Check to see if SELinux is running in enforcing mode on your system. If so, run the command below to allow external HTTP DB connections to the server through SELinux configuration.

# cat /etc/selinux/config

# This file controls the state of SELinux on the system.

# SELINUX= can take one of these three values:

#     enforcing - SELinux security policy is enforced.

#     permissive - SELinux prints warnings instead of enforcing.

#     disabled - No SELinux policy is loaded.

SELINUX=enforcing

# SELINUXTYPE= can take one of these two values:

#     targeted - Targeted processes are protected,

#     minimum - Modification of targeted policy. Only selected processes are protected.

#     mls - Multi Level Security protection.

SELINUXTYPE=targeted

# setsebool -P httpd_can_network_connect_db 1
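The manual check above can be sketched programmatically; selinux_mode below is an illustrative helper that reads the SELINUX= value from the config file's text, ignoring comments, to decide whether the setsebool step is needed:

```python
import re

def selinux_mode(config_text):
    """Return the SELINUX= mode from /etc/selinux/config text, ignoring comments."""
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = re.match(r"SELINUX=(\w+)", line)
        if m:
            return m.group(1)
    return None

sample = """\
# This file controls the state of SELinux on the system.
SELINUX=enforcing
SELINUXTYPE=targeted
"""
mode = selinux_mode(sample)
needs_boolean = mode == "enforcing"  # only then is the setsebool step required
```

In practice you would pass in the contents of /etc/selinux/config read from disk.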

6. Log in as the ‘postgres’ user and run the ‘psql’ command. Once at the database prompt, set a password for the ‘postgres’ database user.

[root@tcox6 ~]# su - postgres

Last login: Wed Sep  2 13:35:21 UTC 2015 on pts/0

-bash-4.2$ psql

psql (9.4.4)

Type "help" for help.

 

postgres=# \password postgres

Enter new password:

Enter it again:

postgres=# quit

postgres-# \q

-bash-4.2$ exit

logout

 

 

Oracle Exadata Interview Questions and Answers:

  • 1) What are the advantages of Exadata?
    The Exadata cluster delivers consistent performance while allowing for increased throughput. As load increases on the cluster, performance remains consistent by utilizing inter-instance and intra-instance parallelism.
    Simply moving to Exadata should not be expected to improve performance by itself, although in most cases it will, especially if the current database host is overloaded.
    2) What is the secret behind Exadata’s higher throughput?
    Exadata ships less data through the pipes between the storage, the database nodes, and the other nodes in the RAC cluster.
    Its ability to achieve massive parallelism by running parallel processes across all the nodes in the cluster also provides a much higher level of throughput.
    It also has much bigger pipes in the cluster, using the InfiniBand interconnect for inter-instance data block transfers at up to 5X the bandwidth of Fibre Channel networks.
    3) What are the key Hardware components?
    DB Server
    Storage Server Cells
    High-Speed InfiniBand Switch
    Cisco Switch
  • 4) What are the Key Software Features?
    Smart Scan,
    Smart Flash Cache
    Storage Index
    Exadata Hybrid Columnar Compression (EHCC)
    IORM (I/O Resource Manager)
    5) What is a Cell and Grid Disk?
    Cell and grid disks are logical components of the physical Exadata storage. A cell, or Exadata Storage Server cell, is a combination of disk drives put together to store user data. Each cell disk corresponds to a LUN (Logical Unit) which has been formatted by the Exadata Storage Server software. Typically, each cell has 12 disk drives mapped to it.
    Grid Disks are created on top of Cell Disks and are presented to Oracle ASM as ASM disks. Space is allocated in chunks from the outer tracks of the Cell disk and moving inwards. One can have multiple Grid Disks per Cell disk.
  • 6) What is IORM?
    IORM stands for I/O Resource Manager.
    It manages the I/O demand based on the configuration, with the amount of resources available. It ensures that none of the I/O cells become oversubscribed with the I/O requests. This is achieved by managing the incoming requests at a consumer group level.
    Using IORM, you can divide the I/O bandwidth between multiple databases.
    To implement IORM resource groups, consumers and plans need to be created first.
    7) What is hybrid columnar compression?
  • Hybrid Columnar compression, also called HCC, is a feature of Exadata which is used for compressing data at column level for a table.
    It creates compression data units which consist of logical grouping of columns values typically having several data blocks in it. Each data block has data from columns for multiple rows.
    This algorithm has the potential to reduce the storage used by the data and reduce disk I/O, enhancing performance for queries.
    The different types of HCC compression include:
    •  Query Low
    •  Query High
    •  Archive Low
    •  Archive High
    8) What is Flash cache?
    Each Exadata Storage Server cell contains four 96GB PCIe flash memory cards, which provide very fast access to the data stored on them.
    The flash cache reduces data access latency by retrieving data from flash memory rather than having to read it from disk. A total of 384GB of flash storage per cell is available on the Exadata appliance.
    9) What is Smart Scan?
    It is a feature of the Exadata Software which enhances the database performance many times over. It processes queries in an intelligent way, retrieving specific rows rather than the complete blocks.
    It applies filtering criteria at the storage level based on the selection criteria specified in the query.
    It also performs column projection which is a process of sending only required columns for the query back to the database host/instance.
    10) What are the Parallelism instance parameter used in Exadata?
  • The parameter PARALLEL_FORCE_LOCAL can be specified at the session level for a particular job.
    11) How do you Test performance of Exadata?
    You can use the “calibrate” commands at the cellcli command line.
    12)What are the ways to migrate onto Exadata?
  • Depending on the downtime allowed there are several options:
    Oracle DataGuard
    Traditional Export/Import
    Tablespace transportation
    Goldengate Replication after a data restore onto Exadata.
    13) What types of operations does Exadata “offload”?
    Some of the operations that are offloaded from the database host to the cell servers are:
    Predicate filtering
    Column project filtering
    Join processing
    Backups
    14) What is cellcli?
    This is the command line utility used to manage the cell storage.
    15) How do you obtain info on the cell disks?
    At the cellcli command line you can issue the “list celldisk” command.
    16) How would you create a grid disk?
    At the cellcli command line you would need to issue the “create griddisk all ..” command.
    16) What are the cellinit.ora and the cellip.ora files used for?
    These files contain the hostnames and IP addresses of all the nodes in the cluster. They are used to run commands on remote database and cell server nodes from a local host.
    17) Which package can be used to estimate the compression ratio of a table?
    DBMS_COMPRESSION
    18) What are the background services of the cell server?
    MS – Management Server
    cellsrv – Cell Server
    RS – Restart Server
    19) How many disks come within a storage cell?
    12
    20) What is the purpose of spine switch?
    Spine switch is used to connect or add more Exadata machines to the cluster.
    21) How to migrate database from normal setup to Exadata ?
    There are many methods we can use to migrate a DB to Exadata. Below are some of them:
    1. Export/Import
    2. Physical Standby
    3. Logical Standby
    4. Transportable Tablespace
    5. Transportable Database
    6. GoldenGate
    7. RMAN cold and hot backup restoration
    8. Oracle Streams
    22) Can we use flash disk as ASM disk?
    Yes
    23) Which protocol used for communication between database server and storage server?
    iDB protocol
    24) Which OSes are supported on Exadata?
    Database servers have two OS options, Linux or Solaris, which can be finalized at the time of configuration. Cell storage comes with Linux only.
    25) What is ASR?
    ASR is a tool to manage Oracle hardware. ASR stands for Auto Service Request. Whenever a hardware fault occurs, ASR automatically raises an SR in Oracle Support and sends a notification to the respective customer.
    26) How to upgrade firmware of Exadata components?
    It can be done through ILOM of DB or Cell server.
    27) Where we can define which cell storage can be used by particular database server?
    The CELLIP.ORA file contains the list of storage servers that can be accessed by the DB server.
    28) What are the Exadata Health check tools?
    1. Exachk
    2. sundiagtest
    3. oswatcher
    4. OEM 12c
    29) What is EHCC?
    EHCC is Exadata Hybrid Columnar Compression which is used to compress data in the Database.
    30) What is offloading and how does it work?
    It refers to the fact that part of the traditional SQL processing done by the database can be “offloaded” from the database layer to the storage layer.
    The primary benefit of offloading is the reduction in the volume of data that must be returned to the database server, which is one of the major bottlenecks of most large databases.
    31) What is the difference between cellcli and dcli?
    cellcli can be used on the respective cell storage only.
    dcli (Distributed Command Line Utility) can be used to run the same command across multiple storage cells as well as DB servers.
    32) What is IORM and what is its role in Exadata?
    IORM stands for I/O Resource Manager, which manages the I/O of multiple databases on the storage cells.
    33) How can we check whether Oracle best practices have been configured on Exadata?
    We can execute Exachk and verify the best-practice setup on the Exadata machine.
    34) How many networks are required in Exadata?
    1. Public/Client Network — For Application Connectivity
    2. Management Network — For Exadata H/W management
    3. Private Network — For cluster inter connectivity and Storage connectivity
    35) What is the command to enable query high compression on a table?
    SQL>alter table table_name move compress for query high;
    36) How do you take a backup of the cell storage software?
    It is not required to take a backup, as it happens automatically. Exadata uses an internal USB drive called the Cellboot flash drive to back up the software.
    37) What is the difference between write-through and write-back flashcache mode?
    1. write-through –> Flashcache is used only for reads
    2. write-back –> Flashcache is used for both reads and writes
    38) Which feature of Exadata is used to eliminate disk IO?
    Flash Cache
    39) What is the capacity of an InfiniBand port?
    40 Gbps
    40) What is the difference between high capacity and high performance disk?
    1. High capacity disk comes with more storage space and less rpm (7.5k)
    2. High Performance disk comes with less storage and high rpm (15k)
    41) When should one execute Exachk?
    Before and after any configuration change in the Database Machine.
    42) What is grid disk?
    Grid Disks are created on top of Cell Disks and are presented to Oracle ASM as ASM disks.
    Space is allocated in chunks from the outer tracks of the Cell disk and moving inwards. One can have multiple Grid Disks per Cell disk.
    43) Which network is used for RAC inter-connectivity?
    Infiniband Network
    44) What is Smart Scan?
    It is a feature of the Exadata Software which enhances the database performance many times over. It processes queries in an intelligent way, retrieving specific rows rather than the complete blocks. It applies filtering criteria at the storage level based on the selection criteria specified in the query. It also performs column projection which is a process of sending only required columns for the query back to the database host/instance.
    45) What are the Parallelism instance parameter used in Exadata?
    The parameter PARALLEL_FORCE_LOCAL can be specified at the session level for a particular job.
    46) Which statistic can be used to check the flash hit ratio at the database level?
    Cell flash cache read hits
    47) Which disk group is used to keep OCR files on Exadata?
    +DBFS_DG
    48) How many Exadata wait events are contained in the 11.2.0.3 release?
    There are 53 Exadata-specific wait events.
    49) What is the difference between DBRM and IORM?
    DBRM is the feature of database while IORM is the feature of storage server software.
    50) Which ASM parameters are responsible for Auto disk management in Exadata?
    _AUTO_MANAGE_MAX_ONLINE_TRIES — It controls maximum number of attempts to make disk Online
    _AUTO_MANAGE_EXADATA_DISKS — It control auto disk management feature
    _AUTO_MANAGE_NUM_TRIES    — It controls maximum number of attempt to perform an automatic operation
    51) How to enable Flashcache compression?
    CellCLI> ALTER CELL flashCacheCompress=true
    52) How many Exadata Storage Server Nodes are included in Exadata Database Machine X4-8?
    14 storage nodes
    53) What is client or public network in exadata?
    Client or public network is used to establish connectivity between the database and the application.
    54) What are the steps involved for initial Exadata configuration?
    Initial network preparation
    Configure Exadata servers
    Configure Exadata software
    Configure database hosts to use Exadata
    Configure ASM and database instances
    Configure ASM disk group for Exadata
    55) What is iDB protocol?
    iDB stands for intelligent database protocol. It is a network-based protocol responsible for communication between the storage cells and the database servers.
    56) What is LIBCELL?
    Libcell stands for Library Cell, which is linked with the Oracle kernel. It allows the Oracle kernel to talk to the storage server via network-based I/O instead of operating system reads and writes.
    57) Which package is used by the compression adviser utility?
    DBMS_COMPRESSION package
    58) What is the primary goal of storage index?
    Storage indexes are a feature unique to the Exadata Database Machine whose primary goal is to reduce the amount of I/O required to service I/O requests for Exadata Smart Scan.
    59) What is smart scan offloading?
    Offloading and Smart Scan are two terms that are used somewhat interchangeably. Exadata Smart
    Scan offloads processing of queries from the database server to the storage server.
    Processors on the Exadata Storage Server process the data on behalf of the database SQL query. Only the data requested in the query is returned to the database server.
    60) What is checkip and what the use of it?
    Checkip is an OS-level script containing the IP addresses and hostnames that Exadata will use during the configuration phase. It checks network readiness (such as proper DNS configuration) and verifies there is no IP duplication by pinging addresses that are not yet supposed to respond.
    61) Which script is used to reclaim the disk space of unused operating system?
    For Linux: reclaimdisks.sh
    For Solaris: reclaimdisks.pl
    62) How does the database server communicate with the storage cells?
    The database server communicates with the storage cells through the InfiniBand network.
    63) Can I have multiple cell disks for one grid disk?
    No. A cell disk can have multiple grid disks, but a grid disk cannot span multiple cell disks.
    64) How many FMods available on each flash card?
    Four FMods (Flash Modules) are available on each flash card.
    65) What is smart flash log?
    Smart flash log is a temporary storage area on the Exadata smart flash cache used to store redo log data.
    66) Which parameter is used to enable and disable the smart scan?
    cell_offload_processing
    67) How do you check the InfiniBand topology?
    We can verify the InfiniBand switch topology by executing the verify-topology script from one of the database servers.
    68) Can we use HCC in a non-Exadata environment?
    No, HCC is only available for data stored on Exadata storage servers.
    69) What is resource plan?
    It is a collection of plan directives that determine how database resources are allocated.
    70) What is DBFS?
    DBFS stands for Database File system which can be built on ASM disk group using database tablespace.
    71) What is the purpose of infiniband spine switch?
    Spine switch is used to connect multiple Exadata Database Machines.
    72) What is offload block filtering?
    Exadata storage server filters out the blocks that are not required for the incremental backup in progress so only the blocks that are required for the backup are sent to the database.
    73) Which protocol used by ASR to send notification?
    SNMP
    74) Is manual intervention possible in a storage index?
    No
    75) What are the options to update cell_flashcache for any object?
    KEEP
    DEFAULT
    NONE
    76) What is the default size of the smart flash log?
    512MB per module.
    Each storage cell has 4 modules, so 4 × 512MB = 2GB per cell.
    77) What is flash cache and how does it work?
    The flash cache is a hardware component configured in the Exadata storage cell server which delivers high performance in read and write operations.
    The primary task of the smart flash cache is to hold frequently accessed data, so that the next time the same data is required a physical read can be avoided by reading it from the flash cache.