MySQL 优化:让你的数据尽可能的小

| No Comments | No TrackBacks

年初进行后台DB版本升级:mysql 4.1.x -> mysql 5.0。有的机器是直接重新安装MySQL,然后按照dump的数据重做slave,再进行表数据校验(maatkit),一切正常。

后来服务逐渐迁移到新的MySQL5的版本后,升级原有的MySQL4.1的slave时,采取了偷懒的一种办法,直接覆盖安装MySQL,数据库文件不变,结果导致性能很差。

后跟进查原因,原来发现MySQL5.x之后的InnoDB默认行存储格式有了较大变化,下面列的就是相关资料了:

 

7.4.2. Make Your Data as Small as Possible

One of the most basic optimizations is to design your tables to take as little space on the disk as possible. This can result in huge improvements because disk reads are faster, and smaller tables normally require less main memory while their contents are being actively processed during query execution. Indexing also is a lesser resource burden if done on smaller columns.

MySQL supports many different storage engines (table types) and row formats. For each table, you can decide which storage and indexing method to use. Choosing the proper table format for your application may give you a big performance gain. See Chapter 13, Storage Engines.

You can get better performance for a table and minimize storage space by using the techniques listed here:

  • Use the most efficient (smallest) data types possible. MySQL has many specialized types that save disk space and memory. For example, use the smaller integer types if possible to get smaller tables. MEDIUMINT is often a better choice than INT because a MEDIUMINT column uses 25% less space.

  • Declare columns to be NOT NULL if possible. It makes everything faster and you save one bit per column. If you really need NULL in your application, you should definitely use it. Just avoid having it on all columns by default.

  • For MyISAM tables, if you do not have any variable-length columns (VARCHAR, TEXT, or BLOB columns), a fixed-size row format is used. This is faster but unfortunately may waste some space. See Section 13.4.3, "MyISAM Table Storage Formats". You can hint that you want to have fixed length rows even if you have VARCHAR columns with the CREATE TABLE option ROW_FORMAT=FIXED.

  • InnoDB tables use a compact storage format. In versions of MySQL earlier than 5.0.3, InnoDB rows contain some redundant information, such as the number of columns and the length of each column, even for fixed-size columns. By default, tables are created in the compact format (ROW_FORMAT=COMPACT). If you wish to downgrade to older versions of MySQL, you can request the old format with ROW_FORMAT=REDUNDANT.

    The presence of the compact row format decreases row storage space by about 20% at the cost of increasing CPU use for some operations. If your workload is a typical one that is limited by cache hit rates and disk speed it is likely to be faster. If it is a rare case that is limited by CPU speed, it might be slower.

    The compact InnoDB format also changes how CHAR columns containing UTF-8 data are stored. With ROW_FORMAT=REDUNDANT, a UTF-8 CHAR(N) occupies 3 × N bytes, given that the maximum length of a UTF-8 encoded character is three bytes. Many languages can be written primarily using single-byte UTF-8 characters, so a fixed storage length often wastes space. With ROW_FORMAT=COMPACT format, InnoDB allocates a variable amount of storage in the range from N to 3 × N bytes for these columns by stripping trailing spaces if necessary. The minimum storage length is kept as N bytes to facilitate in-place updates in typical cases.

  • The primary index of a table should be as short as possible. This makes identification of each row easy and efficient.

  • Create only the indexes that you really need. Indexes are good for retrieval but bad when you need to store data quickly. If you access a table mostly by searching on a combination of columns, create an index on them. The first part of the index should be the column most used. If you always use many columns when selecting from the table, you should use the column with more duplicates first to obtain better compression of the index.

  • If it is very likely that a string column has a unique prefix on the first number of characters, it's better to index only this prefix, using MySQL's support for creating an index on the leftmost part of the column (see Section 12.1.7, "CREATE INDEX Syntax"). Shorter indexes are faster, not only because they require less disk space, but because they also give you more hits in the index cache, and thus fewer disk seeks. See Section 7.5.2, "Tuning Server Parameters".

  • In some circumstances, it can be beneficial to split into two a table that is scanned very often. This is especially true if it is a dynamic-format table and it is possible to use a smaller static format table that can be used to find the relevant rows when scanning the table.

 

7.4.2. 使你的数据尽可能小

最基本的优化之一是使表在磁盘上占据的空间尽可能小。这能给出巨大的改进,因为磁盘读入较快,并且在查询执行过程中小表的内容被处理时占用较少的主存储器。如果在更小的列上做索引,索引也占据较少的资源。

MySQL支持许多不同的存储引擎(表类型)和行格式。对于每个表,可以确定使用哪个存储引擎和索引方法。为应用程序选择合适的表格式可以大大提高性能。参见第15章:存储引擎和表类型

可以使用下面的技术可以使表的性能更好并且使存储空间最小:

  • 尽可能地使用最有效(最小)的数据类型。MySQL有很多节省磁盘空间和内存的专业化类型。
  • 尽可能使用较小的整数类型使表更小。例如,MEDIUMINT经常比INT好一些,因为MEDIUMINT列使用的空间要少25%
  • 如果可能,声明列为NOT NULL。它使任何事情更快而且每列可以节省一位。注意如果在应用程序中确实需要NULL,应该毫无疑问使用它,只是避免 默认地在所有列上有它。
  • 对于MyISAM表,如果没有任何变长列(VARCHARTEXTBLOB),使用固定尺寸的记录格式。这比较快但是不幸地可能会浪费一些空间。参见15.1.3节,"MyISAM表的存储格式"。即使你已经用CREATE选项让VARCHARROW_FORMAT=fixed,也可以提示想使用固定长度的行。
  • MySQL/InnoDB中,InnoDB表使用更紧凑的存储格式。在以前版本的MySQL中,InnoDB记录包含一些冗余信息,例如列数目和每个列的长度,即使对于固定大小的列。默认情况,创建的表为紧凑格式(ROW_FORMAT=COMPACT)。如果想要降级旧版本的MySQL/InnoDB,可以用ROW_FORMAT=REDUNDANT要求旧的格式。
  • 紧凑InnoDB格式也改变了包含UTF-8数据的CHAR列的保存方式。在ROW_FORMAT=REDUNDANT格式中UTF-8 CHAR(n)占用3*n字节UTF-8编码的字符的最大长度是3字节。许多语言可以主要用单字节UTF-8字符来编写,固定的存储长度通常会浪费空间。通过根据需要剥离尾部的空格,ROW_FORMAT=COMPACT格式为这些列分配可变数量的n..3*n字节。最小存储长度按顺序保存为n字节,以在典型情况下帮助更新。
  • 每张表的主索引应该尽可能短。这使一行的识别容易而有效。
  • 只创建你确实需要的索引。索引对检索有好处,但是当你需要快速存储东西时就变得糟糕。如果主要通过搜索列的组合来存取一个表,对它们做一个索引。第一个索引部分应该是最常用的列。如果从表中选择时总是使用许多列,应该首先以更多的副本使用列以获得更好的索引压缩。
  • 如果很可能一个索引在头几个字符上有唯一的前缀,仅仅索引该前缀比较好。MySQL支持对一个字符列的最左边部分创建一个索引(参见13.1.4节,"CREATE INDEX语法")。更短的索引会更快,不仅因为它们占较少的磁盘空间,而且因为它们将在索引缓存中提供更多的访问,因此磁盘搜索更少。参见7.5.2节,"调节服务器参数"

·         在一些情形下,将一个经常被扫描的表分割为2个表是有益的。特别是如果它是一个动态格式的表,并且可能使用一个扫描表时能用来找出相关行的较小静态格式的表。

 

13.5.13.4. Physical Row Structure

The physical record structure for InnoDB tables is dependent on the row format specified when the table was created. For MySQL 5.1, by default InnoDB uses the COMPACT format, but the REDUNDANT format is available to retain compatibility with older versions of MySQL.

Records in InnoDB ROW_FORMAT=REDUNDANT tables have the following characteristics:

  • Each index record contains a six-byte header. The header is used to link together consecutive records, and also in row-level locking.

  • Records in the clustered index contain fields for all user-defined columns. In addition, there is a six-byte field for the transaction ID and a seven-byte field for the roll pointer.

  • If no primary key was defined for a table, each clustered index record also contains a six-byte row ID field.

  • Each secondary index record contains also all the fields defined for the clustered index key.

  • A record contains also a pointer to each field of the record. If the total length of the fields in a record is less than 128 bytes, the pointer is one byte; otherwise, two bytes. The array of these pointers is called the record directory. The area where these pointers point is called the data part of the record.

  • Internally, InnoDB stores fixed-length character columns such as CHAR(10) in a fixed-length format. InnoDB truncates trailing spaces from VARCHAR columns.

  • An SQL NULL value reserves 1 or 2 bytes in the record directory. Besides that, an SQL NULL value reserves zero bytes in the data part of the record if stored in a variable length column. In a fixed-length column, it reserves the fixed length of the column in the data part of the record. The motivation behind reserving the fixed space for NULL values is that it enables an update of the column from NULL to a non-NULL value to be done in place without causing fragmentation of the index page.

Records in InnoDB ROW_FORMAT=COMPACT tables have the following characteristics:

  • Each index record contains a five-byte header that may be preceded by a variable-length header. The header is used to link together consecutive records, and also in row-level locking.

  • The record header contains a bit vector for indicating NULL columns. The bit vector occupies (n_nullable+7)/8 bytes. Columns that are NULL will not occupy other space than the bit in this vector.

  • For each non-NULL variable-length field, the record header contains the length of the column in one or two bytes. Two bytes will only be needed if part of the column is stored externally or the maximum length exceeds 255 bytes and the actual length exceeds 127 bytes.

  • The record header is followed by the data contents of the columns. Columns that are NULL are omitted.

  • Records in the clustered index contain fields for all user-defined columns. In addition, there is a six-byte field for the transaction ID and a seven-byte field for the roll pointer.

  • If no primary key was defined for a table, each clustered index record also contains a six-byte row ID field.

  • Each secondary index record contains also all the fields defined for the clustered index key.

  • Internally, InnoDB stores fixed-length, fixed-width character columns such as CHAR(10) in a fixed-length format. InnoDB truncates trailing spaces from VARCHAR columns.

  • Internally, InnoDB attempts to store UTF-8 CHAR(n) columns in n bytes by trimming trailing spaces. In ROW_FORMAT=REDUNDANT, such columns occupy 3*n bytes. The motivation behind reserving the minimum space n is that it in many cases enables an update of the column to be done in place without causing fragmentation of the index page.

The presence of the compact row format decreases row storage space by about 20% at the cost of increasing CPU use for some operations. If your workload is a typical one that is limited by cache hit rates and disk speed it is likely to be faster. If it is a rare case that is limited by CPU speed, it might be slower.

--EOF--

No TrackBacks

TrackBack URL: http://www.guduo.net/cgi-bin/mt/mt-tb.cgi/183

Leave a comment

Pages

May 2016

Sun Mon Tue Wed Thu Fri Sat
1 2 3 4 5 6 7
8 9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
29 30 31        

About this Entry

This page contains a single entry by 谷多 published on April 13, 2009 11:52 PM.

香港产子 was the previous entry in this blog.

MySQL数据校验时需要注意的问题:mysql 4 和 mysql5对varchar的处理方式不同 is the next entry in this blog.

Find recent content on the main index or look in the archives to find all content.