MySQL table_share總結
來自專欄我知kernel
前言
在計算機領域,緩存的思想無處不在,這是由於磁碟和內存的存在巨大性能gap以及價格gap。因此,有頻繁訪問磁碟需要的系統在設計的時候,常常會考慮到緩存的思想,也即緩存管理,table_share也是在mysql上sql層針對錶的元數據進行的緩存管理。
什麼是table_share
首先看下table_share在mysql-5.7.20中的定義:
/** This structure is shared between different table objects. There is one instance of table share per one table in the database.*/struct TABLE_SHARE{ TABLE_SHARE() {} /* Remove gcc warning */ /** Category of this table. */ TABLE_CATEGORY table_category; /* hash of field names (contains pointers to elements of field array) */ HASH name_hash; /* hash of field names */ MEM_ROOT mem_root; TYPELIB keynames; /* Pointers to keynames */ TYPELIB fieldnames; /* Pointer to fieldnames */ TYPELIB *intervals; /* pointer to interval info */ mysql_mutex_t LOCK_ha_data; /* To protect access to ha_data */ TABLE_SHARE *next, **prev; /* Link to unused shares */ /** Array of table_cache_instances pointers to elements of table caches respresenting this table in each of Table_cache instances. Allocated along with the share itself in alloc_table_share(). Each element of the array is protected by Table_cache::m_lock in the corresponding Table_cache. False sharing should not be a problem in this case as elements of this array are supposed to be updated rarely. */ Table_cache_element **cache_element; /* The following is copied to each TABLE on OPEN */ Field **field; Field **found_next_number_field; KEY *key_info; /* data of keys defined for the table */ uint *blob_field; /* Index to blobs in Field arrray*/ uchar *default_values; /* row with default values */ LEX_STRING comment; /* Comment about table */ LEX_STRING compress; /* Compression algorithm */ LEX_STRING encrypt_type; /* encryption algorithm */ const CHARSET_INFO *table_charset; /* Default charset of string fields */ MY_BITMAP all_set; /* Key which is used for looking-up table in table cache and in the list of threads temporary tables. Has the form of: "database_name table_name " + optional part for temporary tables. Note that all three table_cache_key, db and table_name members must be set (and be non-zero) for tables in table cache. They also should correspond to each other. To ensure this one can use set_table_cache() methods. */ LEX_STRING table_cache_key; LEX_STRING db; /* Pointer to db */ LEX_STRING table_name; /* Table name (for open) */ LEX_STRING path; /* Path to .frm file (from datadir) */ LEX_STRING normalized_path; /* unpack_filename(path) */ LEX_STRING connect_string; /* Set of keys in use, implemented as a Bitmap. Excludes keys disabled by ALTER TABLE ... DISABLE KEYS. */ key_map keys_in_use; key_map keys_for_keyread; ha_rows min_rows, max_rows; /* create information */ ulong avg_row_length; /* create information */ /** TABLE_SHARE version, if changed the TABLE_SHARE must be reopened. NOTE: The TABLE_SHARE will not be reopened during LOCK TABLES in close_thread_tables!!! */ ulong version; ulong mysql_version; /* 0 if .frm is created before 5.0 */ ulong reclength; /* Recordlength */ ulong stored_rec_length; /* Stored record length (no generated-only generated fields) */ plugin_ref db_plugin; /* storage engine plugin */ inline handlerton *db_type() const /* table_type for handler */ { // DBUG_ASSERT(db_plugin); return db_plugin ? plugin_data<handlerton*>(db_plugin) : NULL; } enum row_type row_type; /* How rows are stored */ enum tmp_table_type tmp_table; uint ref_count; /* How many TABLE objects uses this */ uint key_block_size; /* create key_block_size, if used */ uint stats_sample_pages; /* number of pages to sample during stats estimation, if used, otherwise 0. */ enum_stats_auto_recalc stats_auto_recalc; /* Automatic recalc of stats. */ uint null_bytes, last_null_bit_pos; uint fields; /* Number of fields */ uint stored_fields; /* Number of stored fields (i.e. without generated-only ones) */ uint rec_buff_length; /* Size of table->record[] buffer */ uint keys; /* Number of keys defined for the table*/ uint key_parts; /* Number of key parts of all keys defined for the table */ uint max_key_length; /* Length of the longest key */ uint max_unique_length; /* Length of the longest unique key */ uint total_key_length; uint uniques; /* Number of UNIQUE index */ uint null_fields; /* number of null fields */ uint blob_fields; /* number of blob fields */ uint varchar_fields; /* number of varchar fields */ uint db_create_options; /* Create options from database */ uint db_options_in_use; /* Options in use */ uint db_record_offset; /* if HA_REC_IN_SEQ */ uint rowid_field_offset; /* Field_nr +1 to rowid field */ /* Primary key index number, used in TABLE::key_info[] */ uint primary_key; uint next_number_index; /* autoincrement key number */ uint next_number_key_offset; /* autoinc keypart offset in a key */ uint next_number_keypart; /* autoinc keypart number in a key */ uint error, open_errno, errarg; /* error from open_table_def() */ uint column_bitmap_size; uchar frm_version; uint vfields; /* Number of generated fields */ bool null_field_first; bool system; /* Set if system table (one record) */ bool crypted; /* If .frm file is crypted */ bool db_low_byte_first; /* Portable row format */ bool crashed; bool is_view; bool m_open_in_progress; /* True: alloced, false: def opened */ Table_id table_map_id; /* for row-based replication */ /* Cache for row-based replication table share checks that does not need to be repeated. Possible values are: -1 when cache value is not calculated yet, 0 when table *shall not* be replicated, 1 when table *may* be replicated. */ int cached_row_logging_check; /* Storage media to use for this table (unless another storage media has been specified on an individual column - in versions where that is supported) */ enum ha_storage_media default_storage_media; /* Name of the tablespace used for this table */ char *tablespace; /* filled in when reading from frm */ bool auto_partitioned; char *partition_info_str; uint partition_info_str_len; uint partition_info_buffer_size; handlerton *default_part_db_type; ...}
從上述代碼中可以看到,一個table實例對應一個TABLE_SHARE,而裡面存儲的內容主要來自於.frm文件(.frm是mysql sql層的表的元數據信息,主要存儲了表結構的定義),相當於.frm文件的緩存。同時,一個table的不同table object會share一個一個TABLE_SHARE。
這裡的table object對應著Table_cache類型,其定義如下:
/** Cache for open TABLE objects. The idea behind this cache is that most statements dont need to go to a central table definition cache to get a TABLE object and therefore dont need to lock LOCK_open mutex. Instead they only need to go to one Table_cache instance (the specific instance is determined by thread id) and only lock the mutex protecting this cache. DDL statements that need to remove all TABLE objects from all caches need to lock mutexes for all Table_cache instances, but they are rare. This significantly increases scalability in some scenarios.*/class Table_cache{private: /** The table cache lock protects the following data: 1) m_unused_tables list. 2) m_cache hash. 3) used_tables, free_tables lists in Table_cache_element objects in this cache. 4) m_table_count - total number of TABLE objects in this cache. 5) the element in TABLE_SHARE::cache_element[] array that corresponds to this cache, 6) in_use member in TABLE object. 7) Also ownership of mutexes for all caches are required to update the refresh_version and table_def_shutdown_in_progress variables and TABLE_SHARE::version member. The intention is that any query that finds a cached table object in its designated table cache should only need to lock this mutex instance and there should be no need to lock LOCK_open. LOCK_open is still required however to create and release TABLE objects. However most usage of the MySQL Server should be able to set the cache size big enough so that the majority of the queries only need to lock this mutex instance and not LOCK_open. */ mysql_mutex_t m_lock; /** The hash of Table_cache_element objects, each table/table share that has any TABLE object in the Table_cache has a Table_cache_element from which the list of free TABLE objects in this table cache AND the list of used TABLE objects in this table cache is stored. We use Table_cache_element::share::table_cache_key as key for this hash. */ HASH m_cache; /** List that contains all TABLE instances for tables in this particular table cache that are in not use by any thread. Recently used TABLE instances are appended to the end of the list. Thus the beginning of the list contains which have been least recently used. */ TABLE *m_unused_tables; /** Total number of TABLE instances for tables in this particular table cache (both in use by threads and not in use). This value summed over all table caches is accessible to users as Open_tables status variable. */ uint m_table_count;...}
對於每個連接線程,訪問一張表時,每個線程都有自己本地的table cache,這樣可以避免每次訪問全局的table definetion cache來獲得table的元數據,以免多線程並發訪問時產生鎖等待,但是缺點是增加了一定的內存開銷。另外,每個table cache元素,除了有sql層TABLE_SHARE的內容外,還有對應storage層的handler,不同存儲引擎對應的handler不同,表示該表在存儲層需要使用對應的存儲引擎進行操作。
TABLE_SHARE管理
TABLE_SHARE使用hash表進行快速查找,資料庫名和table名作為其key,對應的名字是table_def_cache,TABLE_SHARE有大小限制,由table_definition_cache決定,因此,TABLE_SHARE使用LRU進行管理,對應的鏈表名字時oldest_unused_share。
1、查找使用:當用戶有對錶t1的查詢請求時:1)在其本地的查找有無對應的table_cache元素,2)如果沒有,調用get_table_share則到全局的table definition cache中查找有無對應的TABLE_SHARE,3)如果還沒有,則從本地的t1表對應的t1.frm讀取對TABLE_SHARE進行初始化,並插入到全局的table_def_cache中,並增加其引用計數。4)讀取到TABLE_SHARE,對應的客戶線程會將其緩存在table_cache,方便下次訪問直接使用。注意:1)如果TABLE_SHARE在第1步中找到,並且其在oldest_unused_share中,則需要將其從oldest_unused_share移除;2)如果讀取TABLE_SHARE的個數超過了table_def_cache的大小,需要從oldest_unused_share鏈表的尾部將其它表TABLE_SHARE從table_def_cache中刪除。
2、刪除失效:當用戶進行DDL操作時,會改變表的定義(也即TABLE_SHARE中的版本號會改變),也即會修改.frm文件。因此,對於.frm文件的緩存TABLE_SHARE,也需要在DDL時進行失效。因此會調用release_table_share,將表的TABLE_SHARE從table_def_cache和oldest_unused_share中刪除。當然,有時候一些操作如close table會調用release_table_share,來減少表對應的TABLE_SHARE的引用計數,如果引用計數為0了,但是版本沒發生變化,則將其放入到TABLE_SHARE尾部,並且不會將其從table_def_cache中刪除。下次,如果在使用該表,則無需磁碟上讀取。
推薦閱讀:
※MySQL訓練——Self join@sqlzoo.net
※SQL Server 相比 MySQL 有何優勢?
※python 如何連同依賴打包發布以及python的構建工具?
※使用MySQL服務一般會遇到的疑惑
※memcached plugin(handlersocket),Memcached的一點看法
TAG:MySQL |