cheatsheet2
SQL Databases
Views are named queries stored in the database. They simplify complex queries and enhance security by limiting data access.
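A minimal sketch of the view idea, using Python's built-in sqlite3 module. The employees table, its rows, and the it_staff view are hypothetical names chosen for illustration; the point is only that the view exposes a subset of rows and columns.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL);
    INSERT INTO employees (name, dept, salary) VALUES
        ('An', 'Sales', 1200), ('Binh', 'IT', 2000), ('Chi', 'IT', 1800);
    -- The view exposes names and departments but hides the salary column.
    CREATE VIEW it_staff AS
        SELECT name, dept FROM employees WHERE dept = 'IT';
""")

# Querying the view looks the same as querying a table.
view_rows = conn.execute("SELECT * FROM it_staff ORDER BY name").fetchall()
print(view_rows)
```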
Stored procedures are precompiled blocks of SQL code that reduce network traffic, centralize business logic, and improve database
security. Their drawbacks include limited portability across database systems and the need for specialized skills.
Stored functions resemble stored procedures but must return a value.
Triggers are automatically executed in response to events like insert, update, or delete operations on tables.
Indexing speeds up data retrieval by avoiding full table scans.
Partitioning divides a table into separate parts for improved query performance, data distribution, and efficient data
management.
Transactions ensure data consistency by grouping operations.
Isolation levels control how transactions interact, affecting data consistency and performance.
NoSQL Databases
NoSQL databases deviate from traditional relational database principles. They handle large volumes of data, frequent schema
changes, and distributed architectures.
NoSQL data models:
o Key-Value: Uses a hash table structure for accessing data with unique keys.
o Wide-Column: Organizes data in column families for efficient retrieval from rows with many columns.
o Graph: Represents interconnected data using nodes and edges.
o Document: Stores data in flexible, self-describing documents, often in JSON or XML format.
MongoDB is a popular document database, offering features like flexible schemas, distributed architecture, and support for
various data types.
Sharding in MongoDB distributes data across multiple machines for improved scalability and performance.
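The routing idea behind hashed sharding can be sketched in a few lines. This is not MongoDB's implementation; shard_for, the three-shard layout, and the user ids are assumptions made for illustration.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard number using a stable hash (illustrative only)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Hypothetical cluster of 3 shards, each holding a list of keys.
shards = {i: [] for i in range(3)}
for user_id in ["u1", "u2", "u3", "u4", "u5", "u6"]:
    shards[shard_for(user_id, 3)].append(user_id)

print(shards)
```

The same key always routes to the same shard, which is what lets a lookup by key go to a single machine.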
Big Data
Big data is characterized by volume, variety, and velocity.
Big data architecture consists of data sources, big data storage, and big data analytics and reporting.
Big data storage methods:
o Distributed file systems like HDFS store data across multiple machines.
o Sharding partitions data across multiple databases.
o Key-value storage systems store large quantities of small records.
Big data processing techniques:
o MapReduce is a programming model for reliable and scalable parallel computing, implemented by platforms such as Hadoop.
o Spark supports faster computation through in-memory processing using Resilient Distributed Datasets (RDDs).
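The map, shuffle, and reduce steps can be sketched on a single machine with a word count. The function names and input lines are hypothetical; a real framework would run the same phases in parallel across many nodes.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine the values for one key into a single count."""
    return key, sum(values)

lines = ["big data big ideas", "data moves fast"]
mapped = [pair for line in lines for pair in map_phase(line)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(word_counts)
```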
Streaming data arrives continuously and is handled through publish-subscribe systems like Apache Kafka.
Data analytics extracts meaningful patterns from data for prediction and decision-making.
Data warehousing centralizes data from multiple sources under a unified schema.
OLAP (Online Analytical Processing) enables interactive data analysis for summarization and viewing data in different ways.
Data mining automatically extracts useful patterns from large datasets.
These concepts and technologies are essential for managing and analyzing data in various contexts, from traditional applications to
large-scale, distributed systems.
1. Which one is not a data type in JSON?
- Date
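A quick check of this with Python's json module: a date object cannot be serialized directly, so dates usually travel as strings. The record below is hypothetical, and ISO 8601 is one common convention, not a JSON requirement.

```python
import json
from datetime import date

record = {"name": "An", "joined": date(2024, 1, 15)}  # hypothetical record

# JSON has no date type, so the encoder rejects the date object.
try:
    json.dumps(record)
    serialized_directly = True
except TypeError:
    serialized_directly = False

# Common workaround: store the date as an ISO 8601 string.
encoded = json.dumps({**record, "joined": record["joined"].isoformat()})
decoded = json.loads(encoded)
print(serialized_directly, decoded)
```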
2. The reason why the JSON format is more popular than the XML format
- It is faster to parse
3. Which one is correct for data warehousing?
- It is designed to focus on subject areas
4. When performing a correlated subquery:
- The inner query is re-evaluated for each row of the outer query, so the outer query finishes last
5. Which operator checks whether a value is NULL
- IS NULL
6. When creating foreign keys, the purpose of the ON DELETE SET NULL option is:
- Allows deleting a row in the parent table while keeping the child rows, whose foreign key values are set to NULL
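The effect of ON DELETE SET NULL can be observed with SQLite through Python's sqlite3 module. The customers and orders tables are hypothetical, and SQLite stands in for whichever DBMS the question has in mind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id) ON DELETE SET NULL
    );
    INSERT INTO customers VALUES (1, 'An');
    INSERT INTO orders VALUES (10, 1);
    -- Deleting the parent row is allowed; the child row survives.
    DELETE FROM customers WHERE id = 1;
""")

orphaned = conn.execute("SELECT id, customer_id FROM orders").fetchall()
print(orphaned)
```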
7. The problem that can occur when a transaction reads data that another transaction has not yet committed:
- Dirty Read
8. In HDFS, the default replication factor (number of copies of each block) is:
- 3
9. The number of join types in SQL is
- 4
10. Which is not a data type in JSON
- Date
11. The operation performed when an exception occurs in a transaction
- ROLLBACK
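A minimal sketch of rolling back on an exception, using sqlite3. The accounts table, the CHECK constraint, and the transfer helper are assumptions for illustration; the failing debit forces the earlier credit to be undone as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts; undo everything if any step fails."""
    try:
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint, so the credit is undone too.
        conn.rollback()
        return False

ok = transfer(conn, 1, 2, 500)
balances = conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()
print(ok, balances)
```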
12. In the CAP theorem, MongoDB can be classified into
- CP
13. In Kafka, messages are always received in order
- It depends (ordering is guaranteed only within a single partition)
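Why the answer depends can be sketched with an in-memory stand-in for a partitioned topic. This is not Kafka's API or its partitioner; partition_for, the crc32 hash, and the two-partition layout are assumptions. Messages with the same key land in one partition and keep their order, while nothing orders messages across partitions.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 2  # hypothetical topic configuration

def partition_for(key: str) -> int:
    """Stable hash so the same key always lands in the same partition."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS

partitions = defaultdict(list)
events = [
    ("alice", "login"), ("bob", "login"), ("alice", "pay"),
    ("bob", "logout"), ("alice", "logout"),
]
for key, value in events:
    partitions[partition_for(key)].append((key, value))

# Within alice's partition, her events keep their original order.
alice_events = [v for k, v in partitions[partition_for("alice")] if k == "alice"]
print(alice_events)
```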
14. Data can be updated through a view in the database
- Depends on the view creation command
15. In MySQL, the default root account can access the MySQL server from any computer
- Not allowed
16. Choose the original 3V definition of big data:
- Volume, Velocity, Variety
17. In a MySQL trigger, the expression that refers to the address value being inserted into the customer table
- NEW.address
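A small runnable sketch of the NEW row reference, using SQLite (whose trigger syntax differs slightly from MySQL's but also exposes NEW.column). The address_log table and log_address trigger are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE address_log (customer_id INTEGER, address TEXT);
    -- NEW refers to the row being inserted.
    CREATE TRIGGER log_address AFTER INSERT ON customer
    BEGIN
        INSERT INTO address_log VALUES (NEW.id, NEW.address);
    END;
    INSERT INTO customer (name, address) VALUES ('An', '12 Le Loi');
""")

logged = conn.execute("SELECT customer_id, address FROM address_log").fetchall()
print(logged)
```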
18. In the left join operation between the customers table and the orders table, the number of rows in the result is
- At least the number of rows in the customers table (exactly that number only when no customer matches more than one order)
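A caveat worth checking with data: a left join keeps every customer, but a customer with several orders appears once per order. The tables and rows below are hypothetical, chosen so that one customer has two orders and one has none.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'An'), (2, 'Binh'), (3, 'Chi');
    -- An has two orders, Binh has one, Chi has none.
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

def count(sql):
    return conn.execute(sql).fetchone()[0]

customer_count = count("SELECT COUNT(*) FROM customers")
left_count = count(
    "SELECT COUNT(*) FROM customers c LEFT JOIN orders o ON o.customer_id = c.id"
)
inner_count = count(
    "SELECT COUNT(*) FROM customers c INNER JOIN orders o ON o.customer_id = c.id"
)
print(customer_count, left_count, inner_count)
```

The left join result contains every inner join row plus one NULL-padded row for each unmatched customer.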
19. The default isolation level for MySQL InnoDB is
- REPEATABLE READ
20. Can the GROUP BY clause contain more than 1 column
- Yes
21. The inner query is executed before executing the outer query
- Depends on the situation
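The two situations can be sketched side by side. With hypothetical orders data, the first query has a non-correlated inner query that can be computed once up front; the second has a correlated inner query that refers to the outer row and therefore has to be evaluated per outer row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
    INSERT INTO orders VALUES (1, 1, 100), (2, 1, 300), (3, 2, 50), (4, 2, 70);
""")

# Non-correlated: the inner query does not mention the outer row.
above_global_avg = [row[0] for row in conn.execute("""
    SELECT id FROM orders
    WHERE amount > (SELECT AVG(amount) FROM orders)
    ORDER BY id
""")]

# Correlated: the inner query uses o.customer_id from the outer row.
above_own_avg = [row[0] for row in conn.execute("""
    SELECT id FROM orders o
    WHERE amount > (SELECT AVG(amount) FROM orders i
                    WHERE i.customer_id = o.customer_id)
    ORDER BY id
""")]
print(above_global_avg, above_own_avg)
```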
22. The letter C in ACID stands for
- Consistency
23. The letter A in ACID stands for
- Atomicity
24. Choose the DBMS that supports the wide-column model
- Cassandra
25. In the inner join operation between the customers table and the orders table, the number of rows in the result is
- Depends on the data
26. Using a subquery is better than using a join
- Depends on the situation
27. What field is required in all MongoDB documents
- _id
28. The JavaScript method that parses JSON text into an object
- JSON.parse()
29. In MongoDB, find the correct query that returns documents with age greater than 21
- db.citizens.find({age: {$gt: 21}})
30. Choose the best answer. Subqueries can be used in
- All of the above
31. The HAVING clause has the following purpose
- Filter data groups
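A short sketch of grouping by more than one column and then filtering the groups with HAVING. The sales table and the threshold of 50 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('North', 'A', 100), ('North', 'A', 150), ('North', 'B', 40),
        ('South', 'A', 70),  ('South', 'B', 30),  ('South', 'B', 20);
""")

# WHERE filters rows before grouping; HAVING filters the groups afterwards.
big_groups = conn.execute("""
    SELECT region, product, SUM(amount) AS total
    FROM sales
    GROUP BY region, product
    HAVING SUM(amount) >= 50
    ORDER BY region, product
""").fetchall()
print(big_groups)
```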
32. To optimize a query in MySQL, use a time function instead of the BETWEEN operator in the WHERE condition
- No (wrapping an indexed column in a function prevents the index from being used)
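The effect can be observed in a query plan. The sketch below uses SQLite's EXPLAIN QUERY PLAN as a stand-in for MySQL's EXPLAIN, so the plan wording is SQLite's; the orders table, the idx_orders_date index, and the plan helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT, amount INTEGER);
    CREATE INDEX idx_orders_date ON orders(order_date);
""")

def plan(sql):
    """Return the planner's description of how the query will run."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# A function applied to the column hides it from the index.
function_plan = plan(
    "SELECT * FROM orders WHERE strftime('%Y', order_date) = '2024'"
)
# A plain range on the column can be answered from the index.
range_plan = plan(
    "SELECT * FROM orders WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31'"
)
print(function_plan)
print(range_plan)
```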
33. There is a fulltext index on the description column of the products table. The query: SELECT * FROM products WHERE description LIKE '%apple%'
- Does not use the index (a fulltext index is used by MATCH ... AGAINST, not by LIKE)
34. Suppose there is a compound index on 2 columns in this order: last_name, first_name. Select the query that does not use the index
- SELECT * FROM Customers WHERE first_name = 'le'
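The leftmost-prefix behaviour can be checked with a query plan. This sketch uses SQLite rather than MySQL, so the plan text is SQLite's; the customers table, the email column, and the idx_name index are assumptions. A filter on the leading column can seek into the index, while a filter on the second column alone falls back to a scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT, email TEXT
    );
    CREATE INDEX idx_name ON customers(last_name, first_name);
""")

def plan(sql):
    """Return the planner's description of how the query will run."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

last_name_plan = plan("SELECT * FROM customers WHERE last_name = 'le'")
first_name_plan = plan("SELECT * FROM customers WHERE first_name = 'le'")
print(last_name_plan)
print(first_name_plan)
```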
35. With the same condition, the result of an outer join always includes the result of an inner join
- Yes
36. Select the concept not used in the graph database model
- Row
37. Choose a data warehouse modeling method
- Star schema
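A minimal sketch of this layout: one fact table surrounded by dimension tables, queried with a roll-up. The table names (fact_sales, dim_product, dim_date) and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables describe the "who/what/when" of each fact.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
    -- The fact table holds measures plus keys into each dimension.
    CREATE TABLE fact_sales (product_id INTEGER, date_id INTEGER, amount INTEGER);

    INSERT INTO dim_product VALUES (1, 'Fruit'), (2, 'Dairy');
    INSERT INTO dim_date VALUES (1, 2023), (2, 2024);
    INSERT INTO fact_sales VALUES (1, 1, 10), (1, 2, 20), (2, 2, 5), (1, 2, 7);
""")

# Roll the facts up by category and year, an OLAP-style summary.
rollup = conn.execute("""
    SELECT p.category, d.year, SUM(f.amount) AS total
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_date d ON d.date_id = f.date_id
    GROUP BY p.category, d.year
    ORDER BY p.category, d.year
""").fetchall()
print(rollup)
```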
38. In data warehousing, SCD stands for the phrase
- Slowly Changing Dimension
39. What is the correct format for a JSON name/value pair?
- "name" : "value"
40. Select the correct query to sort documents by name in a MongoDB collection
- db.customers.find({}).sort({name:1})