UFS Overview
Devices
● External cards
● Embedded packages
● Future expansion devices such as I/O devices, cameras, and wireless
It supports data transfer with applications run on mobile phones, mobile PCs,
digital cameras, portable media players, MP3 players, and any other device that
needs mass storage and external cards.
Performance
Topology
Currently, only one device per controller is supported. The capability for multiple
devices on a single controller is planned for a future design.
Power supply
UFS uses three power supplies, including a separate power supply for I/O and
core.
Two signaling schemes are supported: Low-Speed mode with PWM signaling
and High-Speed Burst mode. Multiple gears are defined for both Low-Speed and
High-Speed modes. Signals use 8b10b line coding and have high reliability,
with a bit error rate (BER) under 10^-10.
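Because 8b10b coding carries 8 payload bits in every 10 line bits, the usable data rate of a lane is 80% of its line rate. The following is a minimal sketch of that arithmetic. The function name and the line rate in the usage note are illustrative assumptions, not values taken from the UFS or M-PHY specifications.

```c
#include <stdint.h>

/*
 * Illustrative helper (hypothetical name): payload bandwidth left after
 * 8b10b line coding, where every 8 data bits travel as a 10-bit symbol.
 * line_rate_mbps is the raw per-lane line rate in Mbit/s.
 */
static uint64_t ufs_8b10b_payload_mbps(uint64_t line_rate_mbps,
				       unsigned int lanes)
{
	return line_rate_mbps * 8u / 10u * lanes;
}
```

For example, a hypothetical lane running at 1000 Mbit/s would carry 800 Mbit/s of payload, and two such lanes would carry 1600 Mbit/s, before any protocol overhead.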
UFS architecture overview
Application layer
● Task manager – Handles command queue control tasks, for
example, abort.
● UCS – Handles commands such as read and write. It implements the SCSI
command set as the baseline protocol and can add the UFS native command
set to support extended UFS functionality.
Device manager
The device manager handles device operation and configuration tasks. Device
operation is the processing of commands, such as sleep, suspend, power down,
and other power specific operations. Device configuration is the process of
maintaining and storing descriptors that are used to query, read, and modify
device configurations.
● UDM SAP – Through this SAP, the UFS transfer protocol (UTP) layer
allows the device manager to handle device level operations and
configurations, for example, query requests for a descriptor.
● UIO SAP – The UFS interconnect (UIC) layer exposes itself through this
SAP to trigger a reset of the UIC layer and request a reset from the host.
UTP
The UTP layer generates UFS protocol information units (UPIU) to transfer
messages from the device manager or application layer. UPIUs are transferred
from the host-side UTP to the device‑side UTP.
UTP supports three SAPs:
UIC
The UIC layer is the lowest layer in the UFS architecture. It handles transport
tasks and comprises the MIPI Unified Protocol (UniPro) and MIPI M-PHY
sublayers.
● UIC_SAP – Transports UPIU from the UFS host to the UFS device
● UIO_SAP – Transports queries and control of device management
MIPI UniPro
MIPI M-PHY
The MIPI M-PHY sublayer is the physical layer. Each lane within this layer
consists of a pair of differential signals: a Tx lane and an Rx lane. A device may
support multiple lanes, and the number of lanes is always a multiple of 2.
DIN_c
RST_n Input Reset; UFS device hardware reset signal
The UTP layer uses SAM-5, a client-server model used as a general architecture
for task management. In this model, the client is the host. The target
is a logical unit (LUN) located in the UFS device, which acts as the server.
UPIUs are used to communicate task information between the host and the
target. A task is a command or a sequence of commands/requests that perform a
service. The host provides a task tag that is used for multiple messages
executing the same task. LUNs service the tasks. Each LUN has a task queue
that may have more than one task, but each LUN can only service one task at a
time. Different LUNs can service the same tasks simultaneously.
● Command UPIU
● Data out UPIU (optional)
● Data in UPIU (optional)
● Response UPIU
Data pacing
When the host issues a write command, the device can pace the data out
phase by sending the ready to transfer UPIU to the host. This UPIU also
contains an embedded transfer context that can be used to initiate a DMA
transfer on a per-packet basis at the initiating device.
Code Description
NOP out Acts as a ping from the host to the target; can be used to check for
a connection path to the device.
Command Originates in the host and is sent to a LUN within a target device;
contains a command descriptor block (CDB) and command
parameters. Represents the command phase.
Response Originates in the target and is sent back to the host; contains a
command-specific operation status and other response information.
Represents the status phase.
Data out Originates in the host and sends data from the host to the target
device; represents the data out phase.
Data in Originates in the target and sends data from the target back to the
host; represents the data in phase.
Task management request (TMR) Carries the SAM task management function from the host to the
target; standard functions are defined by the SAM-5 specification.
Additional functions are defined by UFS.
Task management response Carries the SAM task management function response from the
target back to the host.
Ready to transfer Indicates that the target has sufficient buffer space and is ready to
receive the next data out UPIU; the target can send multiple ready
to transfer UPIUs if it has the buffer space to receive multiple data out
UPIU packets. The maximum data buffer size is negotiated by the
host and target during enumeration and configuration. This UPIU
contains a DMA context and can be used to set up and trigger a
DMA action within a host controller.
Query request Originates in the host and is used to request descriptor information
data from the target; this transaction is defined outside of the
command and task management functions and is defined
exclusively by UFS.
Query response Originates in the target and sends descriptor information data back
to the host; this transaction is defined outside of the command and
task management functions and is defined exclusively by UFS.
Reject Originates in the target and is sent to the host; generated when the
target is unable to interpret or execute a UPIU. Indicates an error in
the field values of the received UPIU.
LUNs
LUNs are independent external processing units that process SCSI commands
and perform task management. A UFS device may have up to eight independent
LUNs. A LUN contains the following:
A UFS device also contains well-known LUNs. These LUNs support only a limited
number of commands, through which an application client can send a request or
receive a response that contains information for the entire device.
Figure : LUN
Well-known LUN W-LUN LUN field in UPIU Command name
Report LUNs 01h 81h Inquiry, request sense, test unit ready,
start stop unit
UFS device 50h D0h Inquiry, request sense, test unit ready,
start stop unit
The SCSI write command sequence comprises the command phase, data phase,
and response phase. A write command is initiated from the host using the
command UPIU. The device sends the ready to transfer UPIU when it is ready to
receive data. The host sends the data using the data out UPIU. The transfer
terminates when the device sends the response UPIU, which contains the status
of the write command.
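The write sequence described above can be sketched as a small model that records the order of UPIUs. This is an illustration only: the type and function names are hypothetical, and it assumes for simplicity that the host answers each ready to transfer UPIU with exactly one data out UPIU.

```c
#include <stddef.h>
#include <stdint.h>

/* UPIU types exchanged during a write; names are illustrative. */
enum upiu_type {
	UPIU_COMMAND,
	UPIU_READY_TO_TRANSFER,
	UPIU_DATA_OUT,
	UPIU_RESPONSE,
};

/*
 * Record the order of UPIUs for a write of total_len bytes when the device
 * accepts at most rtt_len bytes per ready to transfer UPIU.
 * Returns the number of UPIUs stored in seq, or 0 if rtt_len is 0 or
 * seq is too small to hold the whole sequence.
 */
static size_t ufs_write_sequence(uint32_t total_len, uint32_t rtt_len,
				 enum upiu_type *seq, size_t max)
{
	size_t n = 0;
	uint32_t remaining = total_len;

	if (rtt_len == 0 || max < 2)
		return 0;

	seq[n++] = UPIU_COMMAND;	/* host -> device: command phase */
	while (remaining > 0) {
		uint32_t chunk = remaining < rtt_len ? remaining : rtt_len;

		if (n + 3 > max)	/* room for RTT, data out, response */
			return 0;
		seq[n++] = UPIU_READY_TO_TRANSFER;	/* device -> host */
		seq[n++] = UPIU_DATA_OUT;		/* host -> device */
		remaining -= chunk;
	}
	seq[n++] = UPIU_RESPONSE;	/* device -> host: status phase */
	return n;
}
```

With these assumptions, an 8192-byte write paced in 4096-byte steps produces six UPIUs: command, ready to transfer, data out, ready to transfer, data out, and response.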
The SCSI read command sequence is initiated by the host. The host first sends a
read command with the command UPIU. The device then returns the data with the
data in UPIU and completes the sequence by sending the status with the
response UPIU.
Figure : SCSI read sequence
The MIPI UniPro is a sublayer of the UIC layer that provides basic transfer
capabilities to the UTP layer. The data plane between UFS and UniPro uses the
UniPro transport layer CPorts for data transfer. The control plane uses DME
service primitives. UniPro is composed of several layers defined in the MIPI
specification; however, from a UFS perspective, all the layers are considered one
black box.
Primitive Description
T_CO_DATA.req (message fragment, EOM) Issued by an application, this primitive tells
UniPro to send a message.
T_CO_DATA.ind (message fragment, EOM, SOM, MsgStatus) Issued by UniPro, this primitive delivers a
received message to the service. EOM
indicates that it is the last fragment. SOM
denotes the start of a message.
UniPro is configured and controlled via the control plane using the DME
primitives.
Primitive Description
The UFS host controller is responsible for managing the interface between the
host software and the UFS device. It has a wrapper that contains the following:
● UFS UTP controller (QTI specific) – Responsible for UTP layer functionality
of the UFS controller
● UniPro controller (third party) – Responsible for UIC layer functionality of
the UFS controller
● AHB interface – Used for software access and for programming and
initializing the controller; it connects as an AHB slave on the config NOC of
the chip operating at 75 MHz.
● AXI master interface – Used for data transfer to and from the system
memory; it connects as an AXI master on the system NOC operating at
200 MHz.
● Receives clocks from the global clock controller (GCC) of the chip.
● Maintains a dedicated interrupt output line that is connected to apps QGIC.
● Connects to M-PHY via standard Reference M-PHY Module Interface
(RMMI).
● M-PHY has a differential serial interface to the UFS device. This works as
a dual simplex Tx/Rx interface with two lanes in each direction.
UFS controller wrapper
The UTP controller implements a standard Host Controller Interface (HCI) for
hardware and software. For outbound transactions, the UTP controller:
● Receives inbound UPIU transactions from UniPro controller via the CPort
● Moves the received data to buffers in the system memory specified
through HCI
Register Description
layer errors.
device.
Host controller enable (HCE) When bit 0 is set to 1 by the software, the host
controller is enabled. This causes the host
controller and UniPro controller to reset. After
the host controller initialization process is
completed, the host controller sets the HCE
register to 1. When it is set to 0, the host
controller shuts down the UIC layer (UniPro
and PHY) and the attached device before
disabling itself.
UTP TMR register The UTP TMR list is an array that manages
and contains up to 8 UTMRDs.
HCI TMR FIFO The TMR FIFO is logic that ensures that
TMRs are executed in a first-in-first-out
manner. Each time a doorbell register bit is
set, its bit index is recorded in the TMR
FIFO to maintain the sequence in which these
bits are written.
IAG The host controller supports interrupt
aggregation (IAG), a process in which a single
command completion interrupt is generated for
a predefined number of command completions.
IAG is supported only for the TR list; it is
not supported for the TMR list.
interrupt aggregation.
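The TMR FIFO behavior described in the register table can be modeled as a small ring buffer of doorbell bit indexes. The sketch below is a software illustration under that assumption; the structure and function names are hypothetical and do not describe the actual hardware logic.

```c
#include <stdint.h>

#define TMR_SLOTS 8	/* the UTP TMR list holds up to 8 UTMRDs */

/* Illustrative model of the TMR FIFO: remembers the order in which
 * doorbell bits were set. */
struct tmr_fifo {
	uint8_t slot[TMR_SLOTS];
	unsigned int head;	/* index of the oldest entry */
	unsigned int count;	/* number of queued entries */
};

/* Record that doorbell bit 'bit' was set. Returns 0 on success, or -1 if
 * the bit index is out of range or the FIFO is full. */
static int tmr_fifo_push(struct tmr_fifo *f, unsigned int bit)
{
	if (bit >= TMR_SLOTS || f->count == TMR_SLOTS)
		return -1;
	f->slot[(f->head + f->count) % TMR_SLOTS] = (uint8_t)bit;
	f->count++;
	return 0;
}

/* Return the next slot to execute in first-in-first-out order,
 * or -1 if the FIFO is empty. */
static int tmr_fifo_pop(struct tmr_fifo *f)
{
	int bit;

	if (f->count == 0)
		return -1;
	bit = f->slot[f->head];
	f->head = (f->head + 1) % TMR_SLOTS;
	f->count--;
	return bit;
}
```

If doorbell bit 5 is set before bit 2, the model returns slot 5 first and slot 2 second, regardless of their bit positions.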
The UFS host driver allocates and uses TR descriptors and TMR descriptors to
communicate with the host controller hardware. The host controller system
memory has the capability to receive up to 32 TR descriptors and 8 TMR
descriptors simultaneously.
Each TR descriptor points to the command UPIU (sent from SCSI), allocates a
placeholder for the response UPIU, and contains a physical region descriptor
table (PRDT) that points to the data buffer that needs to be transferred or the
memory address for the data that would be received during the read command.
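Because at most 32 TR descriptors can be outstanding, a driver has to track which slots are busy before it queues a new request. The following is a minimal sketch of one way to do that with a 32-bit bitmap; the helper names are hypothetical and this is not the UFSHC driver's actual implementation.

```c
#include <stdint.h>

#define UTP_TR_SLOTS 32	/* up to 32 TR descriptors at a time */

/* Return the lowest free TR slot given a bitmap of busy slots, or -1 if
 * all 32 slots are in use. */
static int utp_find_free_tr_slot(uint32_t busy)
{
	for (int slot = 0; slot < UTP_TR_SLOTS; slot++) {
		if (!(busy & (UINT32_C(1) << slot)))
			return slot;
	}
	return -1;
}

/* Mark a slot busy and return the doorbell bit that corresponds to it. */
static uint32_t utp_claim_tr_slot(uint32_t *busy, int slot)
{
	uint32_t bit = UINT32_C(1) << slot;

	*busy |= bit;
	return bit;
}
```

A caller would first look up a free slot, fill in the TR descriptor for that slot, and then write the returned bit to the doorbell register.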
The data structures represent the different system memory descriptors that the
device driver uses to communicate with the host controller as defined by the
specification. The corresponding data structures used in the UFSHC driver are
also described.
struct utp_transfer_req_desc {
	/* DW 0-3 */
	struct request_desc_header header;
	/* DW 4-5 */
	__le32 command_desc_base_addr_lo;
	__le32 command_desc_base_addr_hi;
	/* DW 6 */
	__le16 response_upiu_length;
	__le16 response_upiu_offset;
	/* DW 7 */
	__le16 prd_table_length;
	__le16 prd_table_offset;
};
A UTP command descriptor contains the UPIU for the command, offset and
length of the data buffer associated with the command, and offset and length of
the PRDT.
struct utp_transfer_cmd_desc {
u8 command_upiu[ALIGNED_UPIU_SIZE];
u8 response_upiu[ALIGNED_UPIU_SIZE];
};
struct utp_task_req_desc {
	/* DW 0-3 */
	struct request_desc_header header;
	/* DW 4-11 */
	__le32 task_req_upiu[TASK_REQ_UPIU_SIZE_DWORDS];
	/* DW 12-19 */
	__le32 task_rsp_upiu[TASK_RSP_UPIU_SIZE_DWORDS];
};
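The DW comments in the structures above imply fixed descriptor sizes: 8 DWs (32 bytes) for a TR descriptor and 20 DWs (80 bytes) for a TMR descriptor. The following user-space sketch checks that arithmetic at compile time. The `_sketch` types are stand-ins that use fixed-width integers instead of the kernel's `__le` types, the 4-DW header is modeled as an opaque array, and both UPIU array sizes are assumed to be 8 DWs based on the DW ranges in the comments.

```c
#include <stdint.h>

#define TASK_REQ_UPIU_SIZE_DWORDS 8	/* DW 4-11 */
#define TASK_RSP_UPIU_SIZE_DWORDS 8	/* DW 12-19 */

/* Opaque stand-in for the 4-DW descriptor header (DW 0-3). */
struct request_desc_header_sketch {
	uint32_t dword[4];
};

struct utp_transfer_req_desc_sketch {
	struct request_desc_header_sketch header;	/* DW 0-3 */
	uint32_t command_desc_base_addr_lo;		/* DW 4 */
	uint32_t command_desc_base_addr_hi;		/* DW 5 */
	uint16_t response_upiu_length;			/* DW 6 */
	uint16_t response_upiu_offset;
	uint16_t prd_table_length;			/* DW 7 */
	uint16_t prd_table_offset;
};

struct utp_task_req_desc_sketch {
	struct request_desc_header_sketch header;	/* DW 0-3 */
	uint32_t task_req_upiu[TASK_REQ_UPIU_SIZE_DWORDS];
	uint32_t task_rsp_upiu[TASK_RSP_UPIU_SIZE_DWORDS];
};

/* Compile-time checks: 8 DWs and 20 DWs of 4 bytes each. */
_Static_assert(sizeof(struct utp_transfer_req_desc_sketch) == 8 * 4,
	       "TR descriptor should be 8 DWs");
_Static_assert(sizeof(struct utp_task_req_desc_sketch) == 20 * 4,
	       "TMR descriptor should be 20 DWs");
```

A size check of this kind is a cheap way to catch accidental padding or a missing field when a hardware-defined layout is mirrored in C.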
Device tree
The device tree is a data structure that contains properties describing the
hardware. The device tree allows the hardware properties to be referenced
without being hardcoded in the device driver. It makes the code generic and
allows for the use of different hardware without changing drivers. The UFS device
tree is located at kernel/arch/arm/boot/dts/qcom/<target name>.dtsi.
The first hardware described by the UFS device tree is the UFS PHY layer. UFS
PHY nodes describe the UFS PHY hardware macro. To bind the UFS PHY layer
with the UFS host controller, the controller node should contain a phandle
reference to the UFS PHY node.
compatible = "qcom,ufsphy";
vdda-pll-supply = <&pma8084_l12>; // LD12 is the power rail for PHY PLL and Power_Gen block
vdda-pll-max-microamp = <1000>;
status = "disabled";
UFSHC nodes are defined to describe the on-chip UFS host controller. Each
instance of the controller should have its own node.
ufs1: ufshc@0xfc594000 // Start of the node that has the physical address of the UFS controller.
vcc-supply = <&pma8084_l19>;
vccq-supply = <&pma8084_l4>;
vccq2-supply = <&pma8084_s4>; // The above 3 describe the power rails supplying vcc, vccq, and vccq2.
vcc-max-microamp = <460000>;
vccq-max-microamp = <450000>;
"ref_clk";
qcom,msm-bus,name = "ufs1";
qcom,msm-bus,num-cases = <22>;
qcom,msm-bus,num-paths = <2>;
qcom,msm-bus,vectors-KBps =
SNIP
qcom,bus-vector-names = "MIN",
UFS driver initialization starts from the platform driver. This driver reads the
device tree, parses the different device elements, allocates memory for the host
bus adaptor (HBA), and then calls driver initialization using ufshcd_init().
ufshcd_alloc_host
The ufshcd_alloc_host API allocates HBA resources. The ufs_hba data structure
defines the register addresses and points to the lower device driver, capabilities,
IRQ, voltage regulator information, and so on.
unsigned long lrb_in_use; Bitmap in which each bit shows whether the corresponding LRB is in use
● Operational
● Reset
● Error
get_variant_ops
struct ufs_hba_variant_ops {
bool, struct
ufs_pa_layer_attr *,
struct ufs_pa_layer_attr
*);
};
ufshcd_parse_clock_info
The ufshcd_parse_clock_info API reads the device tree for a list of possible
clocks and selects the name and value of the highest clock frequency.
ufshcd_parse_vcc_info
The ufshcd_parse_vcc_info API reads information from the device tree for VCC,
VCCQ, and VCCQ2 and sets it into the driver.
ufshcd_init
The following figure shows the TR and TMR descriptor lists that are created by
the initialization process.
Figure : TR and TMR descriptor lists
The following figure shows how each TR descriptor points to the command UPIU
sent from SCSI.
The SCSI host passes the command to the API, a local reference block structure
is assembled from the cmd argument, a UPIU structure is composed, and the
driver requests the doorbell for completion.
ufshcd_compose_upiu
The ufshcd_compose_upiu API takes the HBA as its first argument and the
configured LRB as its second argument. If the cmd type is SCSI, it prepares the:
Figure : ufshcd_compose_upiu
ufshcd_map_sg
Figure : ufshcd_map_sg
ufshcd_send_command
ufshcd_intr
The ufshcd_intr API is the main interrupt handler. It reads the interrupt status
register and, if any interrupt bit is set, calls the interrupt status handler.
ufshcd_sl_intr
The ufshcd_sl_intr API is called to service the interrupt routine. The interrupt
status bits are passed from ufshcd_intr() as an argument and the status is checked for
any errors. If errors are found, the driver calls the error handler
ufshcd_check_errors(). If the TR is completed, the driver calls
ufshcd_transfer_req_compl().
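The two-level interrupt flow described above can be sketched as follows. This is a simplified user-space model: the bit assignments are placeholders rather than the real interrupt status register layout, and counters stand in for the calls to ufshcd_check_errors() and ufshcd_transfer_req_compl().

```c
#include <stdint.h>

/* Placeholder bit assignments for illustration only; the real interrupt
 * status layout is defined by the host controller specification. */
#define SKETCH_INTR_TR_COMPL	(UINT32_C(1) << 0)
#define SKETCH_INTR_ERROR_MASK	(UINT32_C(1) << 1)

struct intr_counters {
	unsigned int errors_handled;
	unsigned int transfers_completed;
};

/* Second-level handler: check for errors, then for TR completion. */
static void sketch_sl_intr(struct intr_counters *c, uint32_t intr_status)
{
	if (intr_status & SKETCH_INTR_ERROR_MASK)
		c->errors_handled++;	/* stands in for ufshcd_check_errors() */
	if (intr_status & SKETCH_INTR_TR_COMPL)
		c->transfers_completed++;	/* ufshcd_transfer_req_compl() */
}

/* Top-level handler: returns 1 if an interrupt was serviced, 0 if no
 * status bit was set. */
static int sketch_intr(struct intr_counters *c, uint32_t intr_status)
{
	if (intr_status == 0)
		return 0;
	sketch_sl_intr(c, intr_status);
	return 1;
}
```

Keeping the top-level handler to a single status read and a dispatch call keeps the time spent deciding whether the interrupt belongs to this controller short.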
ufshcd_transfer_req_compl()
If the status has an error, the sense data is read from the UPIU response packet
and passed to the SCSI layer. If there is an error, the error handler is invoked.
UFS task management
ufshcd_task_req_compl
The host controller driver has two work queues that execute two different types of
error handling.
ufshcd_tmc_handler
The ufshcd_tmc_handler API invokes ufshcd_task_req_compl to read the
overall command status register. If it succeeds, it passes the response
value to the SCSI layer.
ufshcd_reset_and_restore
The ufshcd_reset_and_restore API resets the host controller and the UIC
layer if the controller encounters any errors. This API also resets the UFS
card for MSM8998 and later chipsets.
After the reset, the status of the controller is restored and it gets initialized
for new commands.
ufshcd_exception_event_handler
Updated: Aug 03, 2023 80-NN752-1 Rev: E
ufshcd_get_ee_status
The ufshcd_get_ee_status API sends a device command query to read the
device exception event status.
ufshcd_bkops_ctrl
If the received flag has the URGENT BKOPS request set, the API calls
ufshcd_bkops_ctrl to handle the error request. It sends the query to read
the BKOPS status flag.
ufshcd_enable_auto_bkops
If the value of the status is equal to or more than PERF_IMPACT,
ufshcd_enable_auto_bkops is called. This allows UFS to execute BKOPS
when needed.
ufshcd_disable_auto_bkops
If the value is less than PERF_IMPACT, ufshcd_disable_auto_bkops is
called to disable auto BKOPS. The host then signals the device to execute
BKOPS.
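The threshold decision described above reduces to a single comparison against PERF_IMPACT. The sketch below illustrates it. Only PERF_IMPACT is named in the text; the other level names, the numeric values, and the function name are assumptions made for illustration.

```c
#include <stdbool.h>

/* Status levels in increasing urgency. Only PERF_IMPACT appears in the
 * text; the remaining names and all numeric values are assumed. */
enum bkops_status_sketch {
	SKETCH_BKOPS_NO_OP = 0,
	SKETCH_BKOPS_NON_CRITICAL = 1,
	SKETCH_BKOPS_PERF_IMPACT = 2,
	SKETCH_BKOPS_CRITICAL = 3,
};

/* Returns true when auto BKOPS should be enabled (status is equal to or
 * more than PERF_IMPACT) and false when it should be disabled. */
static bool bkops_should_enable_auto(enum bkops_status_sketch status)
{
	return status >= SKETCH_BKOPS_PERF_IMPACT;
}
```

A caller would invoke ufshcd_enable_auto_bkops when the function returns true and ufshcd_disable_auto_bkops otherwise.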
Two-lane UFS host controller in MSM8998
Starting from MSM8998, the UFS host controller is configured with two Tx and
two Rx lanes. The motivation for the two-lane configuration is to achieve higher
Rx throughput.
For example, Seq Read for MSM8996 with one lane is approximately 530 MB/s,
and Seq Read for MSM8998 with two lanes is approximately 880 MB/s,
achieving an increase of approximately 66% in Rx throughput.
Configure the target to either one lane Rx, one lane Tx, or one lane for both Rx
and Tx by changing the flags in the kernel/msm-4.4/drivers/scsi/ufs/ufs-qcom.h
file. In the following example, the lane setting is changed from 2 to 1:
/* Default: two lanes in each direction */
#define UFS_QCOM_LIMIT_NUM_LANES_RX 2
#define UFS_QCOM_LIMIT_NUM_LANES_TX 2
/* Changed: one lane in each direction */
#define UFS_QCOM_LIMIT_NUM_LANES_RX 1
#define UFS_QCOM_LIMIT_NUM_LANES_TX 1
AH8 (auto-hibern8) is a hardware feature in which the controller sets the state
of the link between the host and device to hibern8. AH8 is more aggressive than
the hibern8 that is issued by the driver, so enabling AH8 improves overall power
optimization. Although AH8 is a hardware feature, the driver sets the AH8 time
in the
kernel_msm-4.4/kernel/msm-4.4/drivers/scsi/ufs/ufshcd.c
file in the following API:
ufshcd_init_hibern8_on_idle()
hba->hibern8_on_idle.delay_ms = 1
The UFS driver (ufshcd.c) registers with kernel devfreq with two structures:
If there is no activity for a given period of time, the UFS driver gates its clocks.
Clock gating time period is different at different clock frequencies, for
example:
● Time period = 10 ms when clocks are scaled to high frequency
● Time period = 50 ms when clocks are scaled to low frequency
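The frequency-dependent gating delay can be expressed as a simple lookup followed by an idle-time comparison. The sketch below uses the two example values from the list above; the function names are hypothetical and the actual delays are configurable in the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Example delays from the text: 10 ms when clocks are scaled to high
 * frequency, 50 ms when they are scaled to low frequency. */
static uint32_t clk_gate_delay_ms(bool scaled_up)
{
	return scaled_up ? 10 : 50;
}

/* True when the idle time has reached the gating delay for the current
 * clock frequency. */
static bool clk_should_gate(uint32_t idle_ms, bool scaled_up)
{
	return idle_ms >= clk_gate_delay_ms(scaled_up);
}
```

With these values, 10 ms of inactivity is enough to gate the clocks at high frequency but not at low frequency.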
Debugging
To log a history of commands and events along with their timestamp, execute
the following command from a Trace32 simulator on RAM dumps:
v.v ufs_qcom_hosts[0]->hba->cmd_log.entries[]
This debug feature enables logging UFS commands in RAM and is useful in
analyzing a sequence of commands leading to a failure.
● cmd type – Prints the type of command, for example, scsi, query, nop, and
so on
● string – For example, send or complete
● sequence number – Displays how many commands were executed before
the current command
● lun – Logical unit for this particular command
● cmd_id
● lba
● transfer_len – Size of the transaction for a command, in bytes
● tag – Current doorbell register for a command
● doorbell – Different bits of a doorbell register set when executing the
command
● idn – Used only for query idn, a value that indicates the type of data
● outstanding reqs – Bits representing number of transfer requests
● tstamp – Timestamp
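The fields listed above suggest a fixed-size array of log entries that is overwritten in a circular fashion. The following is a minimal user-space sketch of such a log. The field widths, the depth of 20 entries, and the function name are assumptions for illustration and may differ from the driver's actual structures.

```c
#include <stdint.h>

#define CMD_LOG_ENTRIES 20	/* assumed depth of the log */

struct cmd_log_entry_sketch {
	const char *str;	/* for example "scsi_send" */
	const char *cmd_type;	/* for example "scsi" */
	uint8_t lun;
	uint8_t cmd_id;
	uint64_t lba;
	uint32_t transfer_len;
	uint8_t idn;
	uint32_t doorbell;
	uint32_t outstanding_reqs;
	uint32_t seq_num;
	uint32_t tag;
	uint64_t tstamp;
};

struct cmd_log_sketch {
	struct cmd_log_entry_sketch entries[CMD_LOG_ENTRIES];
	unsigned int pos;	/* next slot to overwrite */
	uint32_t seq_num;	/* running sequence number */
};

/* Store one event, overwriting the oldest entry once the log is full.
 * Fields that are not passed in are cleared to zero. */
static void cmd_log_add(struct cmd_log_sketch *cl, const char *str,
			const char *cmd_type, uint8_t cmd_id, uint64_t tstamp)
{
	struct cmd_log_entry_sketch *e = &cl->entries[cl->pos];

	*e = (struct cmd_log_entry_sketch){0};
	e->str = str;
	e->cmd_type = cmd_type;
	e->cmd_id = cmd_id;
	e->seq_num = cl->seq_num++;
	e->tstamp = tstamp;
	cl->pos = (cl->pos + 1) % CMD_LOG_ENTRIES;
}
```

Because the log wraps, the entry at `pos` is always the oldest one, which is why the sequence number is needed to read the entries in order from a RAM dump.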
The following output is displayed in a Trace32 simulator. Here, 20 different log
entries are printed. Each log entry can be identified by its str, cmd_type, and
cmd_id values. The timestamp records when each event was logged, so the
difference between the send and completion timestamps of a command shows the
time taken for that command to complete execution.
1. (str = 0xFFFFFFA0A8E53D97 -> "dme_cmpl_2", cmd_type =
0xFFFFFFA0A8ECCAAC -> "dme", lun = 0x0, cmd_id = 0x17, lba = 0x0,
transfer_len = 0x0, idn = 0x0, doorbell = 0x0, outstanding_reqs = 0x0,
seq_num = 0xCB20, tag = 0x0, tstamp = 0x0000001B8EB168B2),
2. (str = 0xFFFFFFA0A8E53C3C -> "dme_send", cmd_type =
0xFFFFFFA0A8ECCAAC -> "dme", lun = 0x0, cmd_id = 0x18, lba = 0x0,
transfer_len = 0x0, idn = 0x0, doorbell = 0x0, outstanding_reqs = 0x0,
seq_num = 0xCB21, tag = 0x0, tstamp = 0x0000001B8F9FEA3B),
3. (str = 0xFFFFFFA0A8E53D97 -> "dme_cmpl_2", cmd_type =
0xFFFFFFA0A8ECCAAC -> "dme", lun = 0x0, cmd_id = 0x18, lba = 0x0,
transfer_len = 0x0, idn = 0x0, doorbell = 0x0, outstanding_reqs = 0x0,
seq_num = 0xCB22, tag = 0x0, tstamp = 0x0000001B8FA4570B),
4. (str = 0xFFFFFFA0A8E53A38 -> "scsi_send", cmd_type =
0xFFFFFFA0A8E50F61 -> "scsi", lun = 0x0, cmd_id = 0x2A, lba =
0x01A46458, transfer_len = 0x1000, idn = 0x0, doorbell = 0x1,
outstanding_reqs = 0x1, seq_num = 0xCB23, tag = 0x0, tstamp =
0x0000001B8FA59230),
5. (str = 0xFFFFFFA0A8E5448B -> "scsi_cmpl", cmd_type =
0xFFFFFFA0A8E50F61 -> "scsi", lun = 0x0, cmd_id = 0x2A, lba =
0x01A46458, transfer_len = 0x1000, idn = 0x0, doorbell = 0x0,
outstanding_reqs = 0x1, seq_num = 0xCB24, tag = 0x0, tstamp =
0x0000001B8FA6D33B)
To change the format of the individual fields of the log entries, select the value
for str or cmd_type, right-click, and change the format to string. This shows the
text that str or cmd_type points to.
For example, in (1), str is "dme_cmpl_2", cmd_type is "dme", and cmd_id is
0x17. Because cmd_type is dme, cmd_id here is a UIC (DME) command opcode
rather than a SCSI opcode: in the Linux UFS host controller header (ufshci.h),
0x17 is the DME hibernate enter command and 0x18 is the DME hibernate exit
command. For entries whose cmd_type is scsi, look up cmd_id in scsi_proto.h to
identify the command. The control plane communicates with the UniPro transport
layer via device management entity (DME) service primitives. The str value shows
that (1) is a DME completion event. The sequence number gives the order in which
the commands were executed; its value increments with each subsequent entry.
Some commands do not need to access LUNs or LBAs, so in (1), (2), and (3) the
values for LUN and LBA remain zero. Entries (4) and (5) are scsi_send and
scsi_cmpl events with cmd_id 0x2A, which stands for the WRITE_10 operation.
Here the logical block address has a value, which is the LBA at which the write
operation starts.
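The same analysis can be done programmatically on extracted log values. The sketch below subtracts two tstamp values and names two SCSI opcodes from scsi_proto.h. The function names are hypothetical, and the unit of the result is whatever unit the log's tstamp field uses, which is not stated here.

```c
#include <stdint.h>

/* Difference between two tstamp values, in the same unit as the log. */
static uint64_t cmd_log_elapsed(uint64_t send_tstamp, uint64_t cmpl_tstamp)
{
	return cmpl_tstamp - send_tstamp;
}

/* Name a few SCSI opcodes from scsi_proto.h; any other value is reported
 * as "unknown" in this sketch. */
static const char *scsi_opcode_name(uint8_t cmd_id)
{
	switch (cmd_id) {
	case 0x28:
		return "READ_10";
	case 0x2A:
		return "WRITE_10";
	default:
		return "unknown";
	}
}
```

For entries (4) and (5), the timestamps 0x0000001B8FA59230 and 0x0000001B8FA6D33B differ by 0x1410B (82187) timestamp units, and cmd_id 0x2A maps to WRITE_10.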
WriteBooster
Triple-level-cell NAND flash (TLC NAND) stores up to 3 bits per cell. TLC NAND
bits are logically defined, which requires complicated programming and has a
higher probability of needing error correction. Single-level-cell NAND (SLC NAND)
stores 1 bit per cell.
WriteBooster feature support is only for UFS 3.1 devices. It is enabled only when
the clocks are scaled up. ufshcd_wb_config checks whether the WriteBooster
feature is supported by using the ufshcd_wb_sup API. Bit[8] of
dExtendedUFSFeaturesSupport indicates whether the device supports the
WriteBooster feature. It also configures the LUN, which can be configured in
two ways:
● LU dedicated buffer
● Shared buffer
ufshcd_wb_ctrl – Query request that attempts to set a
WriteBooster parameter in a configuration descriptor to a value different
from zero; if that is true, WriteBooster is enabled.
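The support check on Bit[8] of dExtendedUFSFeaturesSupport amounts to a single mask test. The following is a minimal sketch of that test; the macro and function names are hypothetical and are not the driver's own identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit[8] of dExtendedUFSFeaturesSupport, as stated in the text. */
#define WB_SUPPORT_BIT	(UINT32_C(1) << 8)

/* Returns true when the device reports WriteBooster support. */
static bool wb_is_supported(uint32_t ext_ufs_features)
{
	return (ext_ufs_features & WB_SUPPORT_BIT) != 0;
}
```

A value of 0x100 therefore reports support, while 0x0FF does not, because none of the lower eight bits is the WriteBooster bit.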
Flushing
When the entire buffer for WriteBooster is consumed, data is written
to normal storage instead of the WriteBooster buffer. The device informs the
host when the WriteBooster buffer is full or nearly full with the exception event
WRITEBOOSTER_FLUSH_NEEDED. The ufshcd_wb_flush_needed event
mechanism is enabled by setting the WRITEBOOSTER_EVENT_EN bit of the
wExceptionEventControl attribute.
There are two methods for flushing data from the WriteBooster buffer to the
normal storage:
Provisioning is an event that creates and defines logical unit numbers (LUNs) on
a new card. The provisioning XML file can be found in
<target_build>\common\config\ufs\provision, where
<target_build> is the customer build ID.
HLOS kernel, HLOS filesystem, modem image, MBA, and ESP (EFI system
partition) are stored in LUN0. All peripheral image loader (PIL) images are stored
in the HLOS file system in LUN0. Well-known LUN 0x30 alternates between
LUN1 and LUN2 to provide a fail safe backup for the extended boot loader (XBL).
One-time programmable images (OTP) are stored in LUN3. The rest of the boot
chain is stored in LUN4, using the existing backup GPT mechanism for a fail safe
update.
Customers can configure the last two LUNs, but they must create all partitions at
the same time.
Qualcomm Flash Image Loader (QFIL) – A QTI tool that supports provisioning
and flashing images on UFS in a factory environment. If using a flashing tool
other than QFIL, the customer must change the block size to 4K to be compatible
with UFS.
echo 0 > /sys/bus/platform/devices/1d84000.ufshc/hibern8_on_idle_enable
● Change the auto_hibern8 time
By default, the time is set to 1 ms. The value is expressed in microseconds.
To view:
● cat /sys/bus/platform/devices/1d84000.ufshc/auto_hibern8
● Output displayed on the screen: 1000
To change the time to 100 ms:
● echo 100000 > /sys/bus/platform/devices/1d84000.ufshc/auto_hibern8
CPort CPort is a SAP on the UniPro transport layer (L4) within a device that is
used for connection-oriented data transmission
FIFO First-in-first-out
TR Transfer request
UCS UFS command set
WP Write protect
Revision history
B July 2017 Added Section 5.3, Logical block provisioning, and Section 10.5,
Two-lane UFS host controller in MSM8998
D April 2020 Added WriteBooster section; updated UFS clock scaling, UFS
clock gating, and debugging sections
E August 2023 Added Provisioning and Customization sections; updated
WriteBooster call flow