
Jan 30, 2014

Digital Design Interview Questions


General Questions on Digital Design
BCD to binary conversion

Input is the digits 2 and 8 in BCD form; convert them to the binary value of decimal 28.
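A behavioral sketch in Python (illustrative only; the interviewer expects a gate-level answer, typically the shift-and-add-3 / double-dabble method run in reverse):

```python
# Behavioral sketch: each 4-bit nibble of the input holds one decimal
# digit, e.g. 0x28 encodes the BCD digits 2 and 8.
def bcd_to_binary(bcd):
    value, weight = 0, 1
    while bcd:
        digit = bcd & 0xF          # low nibble = least significant digit
        assert digit <= 9, "invalid BCD digit"
        value += digit * weight
        weight *= 10
        bcd >>= 4
    return value

print(bcd_to_binary(0x28))  # 28
```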

Q. Design a circuit to calculate square root.


QA. The first question in an interview will usually be "tell me about yourself."

A. The candidate should be able to speak for about 5 minutes without pausing. If you are applying for a junior post, start from your 10th/12th grades and graduation projects; if you are at a senior level (4+ years of experience), start with your first company. As you gain more experience, reduce the level of detail about your earliest roles.

Build up your knowledge by sharing with others, always keep yourself market-ready, and adapt to new technology. Today it is necessary to carve one hour out of your workload to update and upgrade yourself.

Make sure you answer in one-line statements; if you tell long stories, it will leave a bad impression on the interviewer.


Q1. What is the difference between a D-flop and a T-flop?
A D-flop assigns its data input to the output at the active clock edge; a T-flop toggles its output at the active clock edge when T is 1, and holds it when T is 0.
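The two behaviors can be modeled in a few lines of Python (a behavioral sketch, not synthesizable hardware; one call to `clock()` represents one active clock edge):

```python
# Behavioral flip-flop models, illustrative only.
class DFlop:
    def __init__(self):
        self.q = 0
    def clock(self, d):
        self.q = d              # Q follows D at the edge
        return self.q

class TFlop:
    def __init__(self):
        self.q = 0
    def clock(self, t):
        if t:                   # toggle when T=1, hold when T=0
            self.q ^= 1
        return self.q
```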

Q2. How to make D-flop using gates?






Q3. How to calculate FIFO depth?
This depends on the data rates at the input and output; see the worked example in Q14.



Q4. How to make latch using gates?

Q5. How to make D-flop using latches ?




Q6. What are blocking and non-blocking statements in Verilog?
Blocking statement (=): evaluation and assignment complete before the next statement executes.
Non-blocking statement (<=): the right-hand sides of all statements in the always block are evaluated first, and then the assignments happen.
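The semantic difference is easy to demonstrate with a Python model of a two-stage shift register, q1 feeding q2 (a simulation sketch; the real Verilog event scheduler is more involved):

```python
# Python illustration of Verilog blocking (=) vs non-blocking (<=)
# semantics, one function call per clock edge.
def blocking_edge(state, d):
    # Verilog: q1 = d; q2 = q1;  -- q2 sees the NEW q1 immediately
    state["q1"] = d
    state["q2"] = state["q1"]

def nonblocking_edge(state, d):
    # Verilog: q1 <= d; q2 <= q1;  -- all RHS sampled before any update
    q1_next, q2_next = d, state["q1"]
    state["q1"], state["q2"] = q1_next, q2_next

s1 = {"q1": 0, "q2": 0}
blocking_edge(s1, 1)
print(s1)   # {'q1': 1, 'q2': 1} -- the pipeline stage is lost

s2 = {"q1": 0, "q2": 0}
nonblocking_edge(s2, 1)
print(s2)   # {'q1': 1, 'q2': 0} -- a true two-stage shift register
```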


Q7. What are combinational and sequential circuits?
Combinational - output depends only on the current inputs, with no clock.
Sequential - output depends on the clock and on stored state.

Q8. What is a glitch, how can glitches be fatal for a design, and how do you solve them?
A glitch is an unwanted, short-lived pulse produced by combinational logic; it can be fatal to a system if not taken care of. Glitches cannot be avoided entirely, but if standard design guidelines are followed (for example, sampling combinational outputs only at clock edges), they will not harm the system or design.

Q9. What is a latch? How is it different from a flip-flop?
Latches are level-triggered, while flip-flops are edge-triggered: a flip-flop transfers its input to its output only at the clock edge, whereas a latch transfers input to output whenever the clock is '1' (for an active-high latch). This means a latch's output may change multiple times, depending on the input, while the clock level is high.

Q10. What is the difference between binary encoding and one-hot encoding, and which is preferred?
In binary encoding, states are assigned consecutive binary values; for example, in a 4-state FSM the states would be assigned as below -

st1 - 2'b00
st2 - 2'b01
st3 - 2'b10
st4 - 2'b11

while in one-hot coding , states will be assigned as below -
st1 - 4'b0001
st2 - 4'b0010
st3 - 4'b0100
st4 - 4'b1000

In one-hot encoding each state has its own flop, so the number of flops is higher; in binary encoding flops are shared between states through decode logic, so fewer flops are needed.

One-hot encoding is useful when you need to run at a higher clock frequency and the FSM logic is fairly large, where the tool would otherwise build a big combinational cloud between flops.

Q11. what are the types of FSM ?
Mealy and Moore

Q12. Different ways to coding FSM
Q13. Implement AND, OR, NAND, NOR using 2:1 mux ?

Q13A. What issues will you face if you have a true combinational loop?
Ans. 1. Simulation output won't settle; the simulator can hang in an infinite loop.
2. Synthesis will not proceed, as the loop is treated as an error.


Q14. Given the following FIFO and rules, how deep does the FIFO need to be to prevent underflow or overflow?
RULES:

1) frequency(clk_A) = frequency(clk_B) / 4
2) period(en_B) = period(clk_A) * 100
3) duty_cycle(en_B) = 25%
Assume clk_B = 100MHz (10ns)
From (1), clk_A = 25MHz (40ns)
From (2), period(en_B) = 40ns * 100 = 4000ns, but due to (3) we only output for

1000ns, so for 3000ns of each enable period we do no output work. Therefore, FIFO size = 3000ns / 40ns = 75 entries.
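The arithmetic above can be checked with a short Python script:

```python
# Re-deriving the Q14 numbers from the three rules.
period_clk_b = 10                        # ns (100 MHz)
period_clk_a = 4 * period_clk_b          # rule 1 -> 40 ns (25 MHz)
period_en_b = 100 * period_clk_a         # rule 2 -> 4000 ns
output_time = 0.25 * period_en_b         # rule 3 -> 1000 ns of output
no_output_time = period_en_b - output_time        # 3000 ns idle
fifo_depth = int(no_output_time / period_clk_a)   # one write per clk_A
print(fifo_depth)  # 75
```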

Q14B. What if there is large skew between the read/write pointer bits? How do you resolve it, since one bit may arrive late, and many clock cycles could pass in that window?

Q. Write speed is 100 MHz and it can write 80 bits in 100 cycles; read speed is 80 MHz.
  * Can we have a FIFO which doesn't overflow?
     Yes.
  * How do we calculate the depth of the FIFO?
     We need to figure out the worst case ->
      In any 100 cycles, 80 bits can be written, and the writes can be back to back.
      Worst case, back to back: in the first window, no bits for the first 20 cycles and then 80 bits in the last 80 cycles; in the next window, 80 bits in the first 80 cycles and then no bits for 20 cycles.
      That is 160 bits in 160 write cycles. In that window the reader, at 80% of the write clock, can drain 128 bits, so the FIFO must store 160 - 128 = 32 entries; that is the depth.
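The same worst-case reasoning, as a quick check in Python:

```python
# Worst case: two 80-bit bursts back to back = 160 writes in 160
# write-clock cycles at 100 MHz.
write_mhz, read_mhz = 100, 80
writes = 2 * 80                              # back-to-back bursts
window_ns = writes * 1000 / write_mhz        # 1600 ns window
reads = int(window_ns * read_mhz / 1000)     # 128 bits drained
depth = writes - reads
print(depth)  # 32
```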

  

A - Functionality-wise there should be no issue, since a Gray-coded pointer changes only one bit at a time; if there is skew on the bit that is changing, it simply arrives a little later, so latency increases and the pointer movement is delayed. The data is stable by then in the same clock domain, and there is no effect in the opposite clock domain. The full and empty flags just gain some latency because of this skew.
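The single-bit-per-increment property of Gray code that this answer relies on can be verified quickly in Python:

```python
# Binary-to-Gray conversion; adjacent Gray codes (including the
# wrap-around) differ in exactly one bit, so skew on one pointer bit
# can only delay the synchronized value, never corrupt it.
def to_gray(n):
    return n ^ (n >> 1)

codes = [to_gray(i) for i in range(16)]      # 4-bit pointer example
for a, b in zip(codes, codes[1:] + codes[:1]):
    assert bin(a ^ b).count("1") == 1
print(codes[:4])  # [0, 1, 3, 2]
```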

Verilog Questions ->

Q14A. What is the difference b/w parameter and define ?

Q15. Diff b/t task and function ?

Q16. What is delta simulation time ?

Q17. What is $monitor , $strobe and $display ? how they are different to each other ?

Q18. What is sensitivity list ? how important is this in RTL coding ?

Q19. What is `timescale in Verilog ?

Q20. What is synchronous and asynchronous reset ? which one to prefer in design ?

Q21. What are non-synthesizable syntax in Verilog ?

Q22. What is formal verification ?

Q23. What is Gate Level Simulation ?

Q24. What is SDF and where and why is it used in digital flow ?

Q25. Does the hold time of a flop depend on frequency, technology, or both?
Q26. What is min hold time and max hold time?

Q27. How to fix a setup violation?

Q28. How to fix a hold violation?
Q29. What is jitter? What are the types of jitter and how do you fix it?

Q30 What is a race condition?

Q31 What is clock skew and where is it useful?

Q32. What is data skew and where is it useful?

Q33 What is the difference between $display and $write ?
These are the main system task routines for displaying information. The two tasks are identical except that $display automatically adds a newline character to the end of its output, whereas the $write task does not. Thus, if you want to print several messages on a single line, you should use the $write task.

Q34 What is the difference between $display and $strobe ?
$strobe and $display have the same syntax; the only difference is that $strobe executes after all other events in the current simulation time step have completed.

Q35 what is the difference between Mealy and Moore FSM ?

Q36 Define clock skew, negative skew , positive skew .

Q37 Design a 4-bit comparator circuit.

Q38 What are 1's complement and 2's complement?

Q39 Design a circuit which doubles the frequency of the input clock signal.

Q40 Design d-latch using 2:1 mux.

Q41 Expand the following - PLA, PAL, CPLD, FPGA

Q42 What are PLA and PAL ?

Q43 What are tie-high and tie-low cells, and where are they used?
Tie-high and tie-low cells are used to connect the gate of a transistor to either power or ground. In deep-submicron processes, if a gate is connected directly to power/ground, the transistor might be turned on/off by power or ground bounce, so the foundry's suggestion is to use tie cells for this purpose. These cells are part of the standard-cell library: cells that require Vdd connect to a tie-high cell (so tie-high is a power-supply cell), while cells that want Vss connect to a tie-low cell.

Q44 What is the difference between latch-based design and flop-based design?
Latches are level-sensitive and flip-flops are edge-sensitive. The key difference is that a latch allows time borrowing, which a traditional flop does not; that makes latch-based design more efficient. But at the same time, latch-based design is more complicated and has more min-timing issues (races).

Q45 What are high-Vt and low-Vt cells?
HVt cells are high-threshold cells: the gate voltage has to reach a higher level to switch, which means less leakage but additional delay. LVt cells have a low threshold voltage, which means leakage is higher but switching is faster. LVt cells are useful on critical paths, and HVt cells are used for power saving.

Q46 What does LEF mean?
LEF (Library Exchange Format) is an ASCII data format from Cadence Design Systems used to describe a standard-cell library. It includes the design rules for routing and the abstract layout of the cells. A LEF file contains the following,
Technology: layer, design rules, via-definitions, metal-capacitance
Macros : cell descriptions, cell dimensions, layout of pins and blockages,capacitance

Q47 How to detect positive edge of signal ?
Q48 How to detect negative edge of signal ?

Q49 What is a PRBS code and how is it generated for different data widths?
PRBS is a pseudo-random binary sequence, generated with an LFSR.
Common PRBS polynomials used in design -
PRBS7 = x^7 + x^6 + 1
PRBS15 = x^15 + x^14 + 1
PRBS23 = x^23 + x^18 + 1
PRBS31 = x^31 + x^28 + 1
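As an illustration, a PRBS7 generator modeled as a Fibonacci LFSR in Python (the tap placement shown is one common convention; hardware versions may mirror the shift direction):

```python
# PRBS7 (x^7 + x^6 + 1) as a 7-bit Fibonacci LFSR; the output
# sequence repeats every 2^7 - 1 = 127 bits.
def prbs7(seed=0x7F, nbits=254):
    state, out = seed, []
    for _ in range(nbits):
        fb = ((state >> 6) ^ (state >> 5)) & 1   # taps at x^7 and x^6
        state = ((state << 1) | fb) & 0x7F
        out.append(fb)
    return out

seq = prbs7()
print(seq[:127] == seq[127:])  # True: the period is 127 bits
```

For a parallel (multi-bit-per-clock) output, the same recurrence is simply unrolled N times per cycle.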

Q50 How can you produce parallel output from PRBS ?

Q51 What is convergence , re-convergence and divergence ?
VHDL
Q What is the difference between entity and architecture ?
Q What are the different types of modelling style of coding ?
Q What are the delays in VHDL and how to model those delays ?
Q What is the difference between concurrent and sequential statement ?
Q What is the difference between signal and variable ?

STATIC TIMING ANALYSIS

Below are the questions related to timing analysis.

Q(1). What is static timing analysis and where and why do we need in digital design flow ?
A. Static timing analysis is the analysis of the different timing paths in the design; it ensures that all four types of path meet their timing requirements.

STA is typically run after place and route, when routed net delays and clock tree information are available. The clock tree gives the real timing of the clock from source to destination, and in STA we apply set_propagated_clock so that all clock paths are treated as real, not ideal.

Q(2) What is setup/hold time ?
A. Setup time is the time for which the data must be stable before the active clock edge.
Hold time is the time for which the data must remain stable after the active clock edge.




Q(3) What happens if setup or hold (or both) get violated?
A. The flop may go into a metastable state.

Q(4) What is metastable state and how to prevent it ?
A. A metastable state is one in which the flop output is neither at level 0 nor at level 1. Any small glitch or noise at the flop may then resolve the output to a state which may or may not be correct, and as a result the design may show unexpected behavior. This can be fatal for a design (IP and SoC).

Steps to prevent metastability -
1. Use proper synchronizers (two-stage or three-stage) wherever data arrives from an asynchronous domain; synchronizers allow recovery from a metastable event.
2. Use synchronizers between clock-domain crossings to reduce the possibility of metastability.
3. Use faster flip-flops (which have a narrower metastable window).

Q(5) How to calculate maximum frequency ?
Q(6) How to calculate minimum frequency ?
Q7 What are the various timing path which should be consider while doing the timing analysis ?
1. Flop to Flop
2. Input to Flop
3. Flop to Output
4. Input to Output

Q8 What is virtual clock and why do we need it ?
A. A virtual clock is mainly used to model I/O timing specifications: it represents the clock with respect to which the input/output pads pass data.

Q9 What are the various things which impact timing of the design ?
Q10 What are the various design constraint used while doing synthesis for the design ?
create clock
clock uncertainty
input delay
output delay
false path declaration
multi cycle path declaration
input load, etc

Q11 Which Verilog constructs are not supported by synthesis tools?
A. Delays, real and time data types, force/release, fork..join.

Q12 What is body effect ?
A. The increase in threshold voltage due to an increase in the source-to-body voltage is called the body effect.
Q13 What are the various design changes you do to meet design power targets?
A-
1. Design with multiple VDDs: areas that require high performance run at a high VDD and areas that need only low performance run at a low VDD, creating voltage islands, with appropriate level shifters placed at the cross-voltage-domain boundaries.

2. Design with multiple Vt's (threshold voltages): areas that require high performance use low-Vt cells, which are fast but draw a lot of leakage current, and low-performance areas use high-Vt cells, which have low leakage numbers; incorporating this reduces leakage power.

3. Since clocks consume a large share of the power, placing optimal clock-gating cells in the design and controlling them with module enables gives a lot of power savings.

4. Since clock trees switch constantly, making sure that most of the clock buffers sit after the clock-gating cells reduces switching, and thereby power.

5. Incorporate Dynamic Voltage & Frequency Scaling (DVFS): based on the application, reduce the system's voltage and frequency whenever the application does not need to meet full performance targets.

6. Ensure, with IR-drop and ground-bounce analysis, that the design is within its specification requirements.

7. Place power switches so that leakage power can be reduced.

Q14 What is wireload model ?
A. Wire load models (WLMs) have served the chip design community since 1985. Born of the need to account for interconnect delay before routing exists, a WLM estimates a net's parasitics statistically (typically from its fanout), and has also evolved to aid in the estimation of chip area and power. However, delay estimation, the WLM's reason for existence, is where it has come to serve the chip designer most poorly.

Q15 what are the precautions to be taken in design when the chip has both digital analog portions ?
Q16 What is ECO and what are the steps to perform it , how to check ?
Q17 What is the syntax to preserve logic during synthesis, when you don't want the tool to optimize a particular module or block?
A. The set_dont_touch attribute.
Q18 What is guard ring ?
A- A guard ring (or double guard ring) in the substrate helps in shielding the critical analog circuitry from digital noise.

Q19 How to define constraint for DDR ?
A-
Q20 how to generate reports for 16 corners for hold violations ?
To avoid setup time violations:
The combinational logic between the flip-flops should be optimized for minimum delay.
Redesign the flip-flops to get a smaller setup time.
Tweak the launch flip-flop to have a better slew at its clock pin; this makes the launch flop faster, thereby helping to fix setup violations.
Play with clock skew (useful skew).

To avoid hold time violations:
Add delays (using buffers).
Add lockup latches (in cases where the hold requirement is very large, basically to avoid data slip).

Q. If you have not specified a min delay on an input/output port, what will the min delay value be?
A. If a max delay is specified, the max value is also taken as the min delay; min and max will be the same in that case.

1) Is there any example of an intentional combinational loop in a design? Can combo loops be useful? How do the synthesis tool and gate-level simulation treat a combo loop? How do we meet timing on combo loops?
2) How do we meet timing through a mux? Do we need to set case analysis on all select inputs of the mux, or does the synthesis tool do it for us?
3) Can we have useful latches in our design? Can we use them anywhere other than during DFT?
4) On the latch-borrowing concept: I have seen it written that it is for fixing hold problems. Latch borrowing increases the sampling time to half the clock period, so how does it fix a hold issue? Doesn't it mean increasing the time available for setup?
The latch borrowing I have seen is with respect to synchronous clocks with different skews. Is there an example of latch borrowing for async clocks?
5) How can we generate a divide-by-3 clock without using a negedge flop? I thought I could put an inverter on the clock and then use the inverted clock. But can we put an inverter on the clock while writing RTL? I thought we should not touch clocks at all in our design except for gating them.
6) How do we implement a 4-deep FIFO using flip-flops?
7) Are reset recovery and removal times only for flops with async resets and sync deassertion, or for reset-synchronizer flops as well? How do we take care of metastability in async flops?
8) I think we cannot have a totally glitch-free circuit, so what do we do to meet timing? If I have glitchy logic which meets timing, is that OK? Can I live with such a design, or is there a probability that my circuit may go metastable at any time?

Q.What is Synthesis ?
A. Logic synthesis is the process by which an abstract form of the desired circuit behavior, typically at register transfer level (RTL), is turned into a design implementation in terms of logic gates.

How do you ensure that your synthesis is complete and successful ?
What are the things you should check after Synthesis ?
What are the different types of Synthesis Flow ?
What to do when you are getting below errors ?

During synthesis there are three DRC checks (the logical design rule constraints: max transition, max fanout, and max capacitance).
DRC has higher priority than timing.

What are the various factors that need to be considered while choosing a technology library for a design?
When stated as 0.13μm CMOS technology, what does 0.13 represent?
What is Synthesis?
What happens when a process neither has sensitivity list nor a wait statement?
Where should you declare the index that is used in a for loop? What is its visibility?
What are the three weak strength values in IEEE 9 valued logic?
What is the difference between a transaction and an event?
What is a Moore machine? How is it different from a Mealy machine?
Assume that variable a is integer and b is natural. When are the following statements valid?
a := a + b;
b := a + 3;
What modeling technique will decompose designs hierarchically?
Do variables need time queues?
Does simulation time advance during delta cycles?
Is it true that synthesis transformations take less time at the top abstraction levels?
Is it true that synthesis transformations give refined results at the top abstraction levels?
What will a well formed case statement synthesize to?
What will happen to a design that is synthesized without any constraints?

Explain what role the Synopsys DesignWare libraries fulfill in the synthesis process.

What is the difference between a high level synthesis tool (as represented by Synopsys behavioral Compiler) versus a logic synthesis tool (as represented by Synopsys Design Compiler)?

Explain what it means for a Synopsys DesignWare component to be 'inferred' by a synthesis tool.

What are different power reduction techniques?

How do you perform Synthesis activities in Multi vt libraries?

What are the advantages of clock gating?

One circuit will be given to you, where one of the inputs X have a high toggling rate in the circuit. What steps you take to reduce the power in that given circuit?

You will be told to realize a Boolean equation. The next question is how efficient usage of power is achieved in that circuit?

Some circuit will be given to you and will be instructed to set certain timing exceptions commands on that particular path.

What is the difference in PT timing analysis during post and pre layout designs?

What do you mean by FSM states?

Draw the timing waveform for the circuit given?

What is Setup time and hold time effects on the circuit behavior while providing different situations?

What is the difference of constraints file in Pre layout and post layout?

What is SPEF? Have you used it? How you can use it?

What difference you found (or can find) in the netlist and your timing behavior, while performing timing analysis in pre layout and post layout?

What is clock uncertainty, clock skew and clock jitter?

What is the reason for skew and jitter?

What is clock tree synthesis?

What are the timing related commands with respect to clock?

In front end, you set ideal network conditions on certain pins/clocks etc. Why? In Back end how is it taken care?

Which library you have used?

What difference you (can) find in TSMC and IBM libraries?

Draw the LSSD cell structure in TSMC and IBM libraries?

Every tool has some drawbacks. What drawbacks do you find in PrimeTime?
What are the difference you find when you switch from 130nm to 90nm?

Explain the basic ASIC design flow? Where your work starts from? What is your role?
What does 90nm technology mean?

What are the issues you faced in your designs?

Perform the setup and hold check for the given circuit.

Why setup and hold required for a flop?

You had any timing buffer between synthesis and P&R? How much should be the margin?

What are the inputs for synthesis and timing analysis from RTL and P&R team? Whether any inputs for changing the scripts?

How will you fix the setup and hold violation?

What are the constraints you used for the synthesis? Who decides the constraints?

What is uncertainty?

What is false path and multi cycle path? Give examples? For given example for false path what you will do for timing analysis?

What strategies used for the power optimization for your recent project?

Why max and min capacitance required?
Capacitance is for load , which is also related to transition.

You have two different frequency for launch (say 75Mhz) and capture (say 100Mhz).
What will happen to data? Write the waveform? If hold problem what you will do?

What is Metastability? How to overcome metastability? If metastable condition exists which frequency you will use as clock- faster or slower? Why?

Have you used formality? For a given block what checks it will do? How it verifies inside the block?

If you changed the port names during the synthesis how will you inform Formality?

Why you use power compiler? What is clock gating? What are advantage and disadvantages of clock gating? Write the clock gating circuit? Explain.

How will you control the clock gating inference for block of register? Write the command for the same?

Write the total power equation? What is leakage power? Write equation for it.

For clock gated flop and non clock gated flop outputs connected to a AND gate what problem can you expect? How to avoid the problem?

Write the state machine for a sequence detector which detects "10". How will you optimize it? Write the Verilog code for the same.

What is jitter? Why does it occur? How do you account for it? What is the command for that?

What is clock latency? How to specify? What is the command for that?

What is dynamic timing analysis? What is the difference with static timing analysis? Which is accurate? Why it is accurate?

Give any example for Dynamic timing analysis? Do you know anything about GCL simulation?

What is free running clock?

What type of operating condition you consider for post layout timing analysis?

What is one-hot encoding technique? What are advantages? What are types of encoding?

Which scripting language you know?

How will you analyze the timing of the different modes in the design? How many modes did your design have? What are the clock frequencies?

What your script contains?

Write the digital circuit for the below condition: "whenever data changes from one to zero or zero to one, the circuit should generate a pulse one clock period long."
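One common answer is to register the data and XOR it with the registered copy; a behavioral Python sketch (illustrative, not the expected schematic):

```python
# Edge-change pulse generator: out = d XOR d_registered.
# The XOR is high for exactly one clock after any 0->1 or 1->0 change.
def pulse_on_change(samples):
    prev, out = 0, []
    for d in samples:
        out.append(d ^ prev)   # combinational XOR of d and its flopped copy
        prev = d               # flop captures d at the clock edge
    return out

print(pulse_on_change([0, 1, 1, 1, 0, 0]))  # [0, 1, 0, 0, 1, 0]
```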

Have you come across any design with latches? What is the problem in timing analysis if you have a latch in your design?

Have you come across any multiple clock design? What are the issues in multiple clock designs?

What you mean by synthesis strategies

Memory->

DDR3

SDRAM

SRAM

How to Implement Sin/Cos function in hardware ->

ABSTRACT -> Trigonometric functions have a wide variety of applications in real life. Sine and cosine waves in particular are very useful in medical science, signal processing, geology, electronic communication, thermal analysis, and many more fields.
Real-life applications require calculation to be as fast as possible. Hardware, due to its hardwired design, provides high-speed calculation for such applications. This paper presents a hardware design that calculates the sine and cosine of a given angle using the COordinate Rotation DIgital Computer (CORDIC) algorithm.
Keywords
CORDIC; hardware; sine; cosine
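A floating-point Python sketch of rotation-mode CORDIC (real hardware uses fixed-point arithmetic and a precomputed gain constant; this is illustrative only, for |angle| < pi/2):

```python
import math

# Rotation-mode CORDIC: iteratively rotate the vector (K, 0) toward
# the target angle using only shifts, adds, and a small arctan table;
# x converges to cos(angle) and y to sin(angle).
def cordic_sin_cos(angle, iterations=24):
    atans = [math.atan(2.0 ** -i) for i in range(iterations)]
    k = 1.0                                  # gain, precomputed in hardware
    for i in range(iterations):
        k /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = k, 0.0, angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0          # rotate toward residual angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * atans[i]
    return y, x                              # (sin, cos)

s, c = cordic_sin_cos(math.radians(30))
print(round(s, 3), round(c, 3))  # 0.5 0.866
```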

* What are complex numbers?
a + ib -> real part (a) + imaginary part (b), where i^2 = -1

* What is sin(30°)?
* cos function / tan function
Logical ->
Design a lift controller.

Universal NOR Gate

NOR gates are so-called "universal gates" that can be combined to form any other kind of logic gate.

A NOR gate is logically an inverted OR gate. By itself, it has the following truth table:

Truth Table

Input A Input B Output Q
0 0 1
0 1 0
1 0 0
1 1 0
-----------------------
NOR AS NOT
This is made by joining the inputs of a NOR gate. Since a NOR gate is an OR gate followed by a NOT gate, tying the inputs together neutralizes the OR part (A OR A = A), leaving only the NOT part.
-----------------------
NOR AS OR
The OR gate is simply a NOR gate followed by a NOT gate.
-----------------------
NOR AS AND
An AND gate gives a 1 output when both inputs are 1; a NOR gate gives a 1 output only when both inputs are 0. Therefore, an AND gate is made by inverting the inputs to a NOR gate.
-----------------------
NOR AS NAND
A NAND gate is made using an AND gate in series with a NOT gate.
-----------------------
NOR AS XOR
An XOR gate is made by connecting the output of 3 NOR gates (connected as an AND gate) and the output of a NOR gate to the respective inputs of a NOR gate. This expresses the logical formula (A AND B) NOR (A NOR B). This construction entails a propagation delay three times that of a single NOR gate.
-----------------------

NOR AS XNOR
An XNOR gate can be constructed from four NOR gates implementing the expression (A NOR N) NOR (B NOR N), where N = A NOR B. This construction has a propagation delay three times that of a single NOR gate, and uses more gates.
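All of the constructions above can be sanity-checked with a small Python model built on a single NOR primitive:

```python
# Every gate below is composed only of NOR, matching the constructions
# described in this section.
def NOR(a, b):  return 1 - (a | b)
def NOT(a):     return NOR(a, a)                    # inputs joined
def OR(a, b):   return NOT(NOR(a, b))               # NOR then NOT
def AND(a, b):  return NOR(NOT(a), NOT(b))          # inverted inputs
def NAND(a, b): return NOT(AND(a, b))               # AND then NOT
def XOR(a, b):  return NOR(AND(a, b), NOR(a, b))    # (A AND B) NOR (A NOR B)
def XNOR(a, b):
    n = NOR(a, b)
    return NOR(NOR(a, n), NOR(b, n))                # four NOR gates total

# Verify every truth table.
for a in (0, 1):
    for b in (0, 1):
        assert AND(a, b) == (a & b)
        assert OR(a, b) == (a | b)
        assert NAND(a, b) == 1 - (a & b)
        assert XOR(a, b) == (a ^ b)
        assert XNOR(a, b) == 1 - (a ^ b)
print("all NOR-based gates verified")
```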

Prime Time Questions

1) What's PrimeTime?

Answer:
PrimeTime is a full-chip static timing analysis tool that can fully analyze a multimillion-gate ASIC in a short amount of time.

2) What files are required for PrimeTime to run?

Answer:
PrimeTime needs four types of files before you can run it:
1. Netlist file: Verilog, VHDL, EDIF
2. Delay file: SPEF (Standard Parasitic Exchange Format, from StarRC or the place & route tool), SPF, SDF (Standard Delay Format)
3. Library file: DB (from library vendors)
4. Constraints file: Synopsys Design Constraints (SDC), covering the three minimum requirements: clocks, input delays, and output delays

3) Can I use script in PrimeTime?

Answer: Yes; you should use Tcl (Tool Command Language) whenever possible.


4) What does PrimeTime check?

Answer:
PrimeTime will check the following violations:
1. Setup violations: the logic is too slow compared to the clock.
With that in mind, there are several things a designer can do to fix setup violations:

Reduce the amount of buffering in the path.
Replace buffers with two inverters placed farther apart.
Reduce larger-than-normal capacitance on a book's output pin.
Increase the size of books to decrease the delay through the book.
Make sure the clock uncertainty is not too large for the technology library you are using.
Reduce the clock speed. This is a poor design technique and should be used only as a last resort.

2. Hold violations: the logic is too fast.
To fix hold violations, the designer simply needs to add more delay to the data path. This can be done by:

Adding buffers/inverter pairs/delay cells to the data path.
Decreasing the size of certain books in the data path. It is better to downsize books closer to the capture flip-flop, because there is less likelihood of affecting other paths and causing new errors.
Adding more capacitance to the output pin of books with light capacitance.

Fix setup violations first, then hold violations. If hold violations are not fixed before the chip is made, there is nothing that can be done post-fabrication to fix them, unlike setup violations, where the clock speed can be reduced.

3. Transition violations:
When a signal takes too long to transition from one logic level to another, a transition violation is reported. The violation is a function of the node's resistance and capacitance.

The designer has two simple ways to fix transition violations:

Increase the drive strength of the book to speed up the voltage swing, or decrease the capacitance and resistance by moving the source gate closer to the sink gate.
Increase the width of the route at the violating instance pin. This decreases the resistance of the route and fixes the transition violation.

4. Capacitance violations:
The capacitance on a node is a combination of the fan-out of the output pin and the capacitance of the net. This check ensures that a device does not drive more capacitance than it is characterized for.

The violation can be removed by increasing the drive strength of the book, or by buffering some of the fan-out paths to reduce the capacitance seen by the output pin.

5) What conditions are used to check setup violation?

Answer:

Worst case => setup violations
Best case => hold violations
We use worst-case delays when testing for setup violations and best-case delays when testing for hold violations.

6) How do you run PrimeTime on Unix?

[Linux] user@gmu>> pt_shell -f pt_script.tcl |& tee pt.log

Here is a sample PrimeTime script.

A total of three scripts must be created, one for each timing corner.
# ------------------------------------------------------------
# Library Declarations.
# ------------------------------------------------------------
set search_path ". /proj/timing/etc"
set link_path "*"
lappend link_path "stdCell_tt.db"
# ------------------------------------------------------------
# Read in Design
# ------------------------------------------------------------
# Read in netlist
read_file -f verilog top_level.v
# Define top level in the hierarchy
current_design "top_level"
# Combine verilog and db files and identify any errors.
link_design
# Read in SPEF file
read_parasitics -quiet -format SPEF top_level.spef.gz
# ------------------------------------------------------------
# Apply Constraints
# ------------------------------------------------------------
# Read in timing constraints
read_sdc -echo top_level.sdc
# Propagate clocks and add uncertainty to setup/hold calculations
set_propagated_clock [all_clocks]
set_clock_uncertainty 0.2 [all_clocks]
# ------------------------------------------------------------
# Time
# ------------------------------------------------------------
set_operating_conditions -min WORST -max WORST
# Register to Register
report_timing -from [all_registers -clock_pins] \
-to [all_registers -data_pins] -delay_type max \
-path_type full_clock –nosplit \
-max_paths 1 -nworst 1 \
-trans -cap -net > tc_reg2reg_setup.rpt
report_timing -from [all_registers -clock_pins] \
-to [all_registers -data_pins] -delay_type min \
-path_type full_clock –nosplit \
-max_paths 1 -nworst 1 \
-trans -cap -net > tc_reg2reg_hold.rpt
# Register to Out
report_timing -from [all_registers -clock_pins] \
-to [all_outputs] -delay_type max \
-path_type full_clock –nosplit \
-max_paths 1 -nworst 1 \
-trans -cap -net > tc_reg2out_setup.rpt
report_timing -from [all_registers -clock_pins] \
-to [all_outputs] -delay_type min \
-path_type full_clock –nosplit \
-max_paths 1 -nworst 1 \
-trans -cap -net > tc_reg2out_hold.rpt
# In to Register
report_timing -from [all_inputs] \
-to [all_registers -data_pins] \
-delay_type max \
-path_type full_clock -nosplit \
-max_paths 1 -nworst 1 -trans \
-cap -net > tc_in2reg_setup.rpt
report_timing -from [all_inputs] \
-to [all_registers -data_pins] \
-delay_type min -path_type full_clock \
-nosplit -max_paths 1 -nworst 1 \
-trans -cap -net > tc_in2reg_hold.rpt
# All Violators - Find Cap/Tran Violations
# Summary of Setup/Hold Violations
report_constraints -all_violators > tc_all_viol.rpt
# Clock Skew
report_clock_timing -type skew -verbose > tc_clockSkew.rpt
exit
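The setup and hold reports above boil down to two inequalities. A minimal numeric sketch (all delay values here are made up for illustration, and OCV, derating and CRPR are ignored):

```python
# Simplified single-cycle setup/hold slack model (illustrative numbers only)
def setup_slack(T, t_clk2q, t_comb, t_setup, skew=0.0, uncertainty=0.0):
    # data must arrive before the next capture edge (shifted by skew)
    return (T + skew - uncertainty) - (t_clk2q + t_comb + t_setup)

def hold_slack(t_clk2q, t_comb, t_hold, skew=0.0):
    # data must stay stable after the capture edge; positive skew hurts hold
    return (t_clk2q + t_comb) - (t_hold + skew)

s = setup_slack(T=5.0, t_clk2q=0.3, t_comb=3.9, t_setup=0.2, uncertainty=0.2)  # ~0.4
h = hold_slack(t_clk2q=0.3, t_comb=0.1, t_hold=0.15)                           # ~0.25
```

Note how the 0.2 ns clock uncertainty set in the script above eats directly into the setup slack, which is exactly why it is applied before timing.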

Other Topics -


UART FAQ -> 
Q1 How Rx is working ? When will you sample start bit ?
A. The receiver oversamples the line (typically at 16x the baud rate). On the falling edge of the start bit it waits 8 sample ticks (half a bit time) and re-checks the line; if it is still low, the start bit is valid and every following bit is sampled 16 ticks later, i.e. at the centre of each bit.
Q2 Why there is 16x sampling rate ?
A. Oversampling lets the receiver locate the start-bit edge accurately and sample each bit near its centre, giving immunity to short noise glitches and tolerance to a small baud-rate mismatch between transmitter and receiver.
Q3 How to match baud rate ?
A. Both ends must be configured to the same nominal baud rate, since no clock is sent on the wire. Because the receiver re-synchronises on every start bit and samples at bit centres, a mismatch of roughly 2-3% over a 10-bit frame can still be tolerated.
Q4. Format of UART frame , why stop bit is 1/1.5/2 ?
A. One start bit (low), 5 to 9 data bits (LSB first), an optional parity bit, and 1/1.5/2 stop bits (high). The stop bits guarantee a minimum idle time so the receiver can finish the current character and detect the next start edge; 1.5 and 2 stop bits date back to slow electromechanical teleprinters that needed extra recovery time.
Q5 How baud rate is generated ?
A. By dividing the system clock: divisor = f_clk / (16 x baud) for a 16x oversampled UART.
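The divisor formula can be sanity-checked quickly. A small sketch (the 50 MHz clock and 115200 baud figures are just example values, not from any particular part):

```python
# Baud-rate divisor for a 16x-oversampled UART (example: assumed 50 MHz clock)
def uart_divisor(f_clk_hz: int, baud: int, oversample: int = 16) -> int:
    return round(f_clk_hz / (oversample * baud))

div = uart_divisor(50_000_000, 115200)        # integer divisor -> 27
actual = 50_000_000 / (16 * div)              # actual baud ~ 115740
error_pct = (actual - 115200) / 115200 * 100  # ~0.47%, well inside the 2-3% budget
```

Because the divisor must be an integer, the achieved baud rate is never exact; checking that the residual error stays a few times smaller than the receiver's tolerance is part of choosing the system clock.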


I2C and SPI 
1. Can devices be added and removed while the system is running (Hot swapping) in I2C ?  ( What is hot swapping in I2C ? )
A practical example is HDMI, which has some high speed IO for the video/sound, and then has I2C for control. If you were designing a monitor, or a video-out device, you would need to support the plugging in of one or more monitors.

Hot swap can have issues with master reads in that the master will be ack-ing the read data bytes. If a slave is disconnected during a read, the master will see all 1's for the data bits. Some devices will include a checksum (ideally one where all 1's is not a valid choice) which helps to solve that corner-case.

I2C devices are addressed. If a device is hot swapped with one having the same address, there could be issues. If the master polls devices regularly, then it would be able to detect normal unplug/plug events. Likewise, some circuits might provide an interrupt to the uC when a device is plugged or unplugged.

 
2. What is the standard bus speed in I2C ?
3. How many devices can be connected in a standard I2C communication ?
4. What are the 2 roles of nodes in I2C communication ?
5. What are the modes of operation in I2C communication ?
6. What is bus arbitration ?
7. Advantages and limitations of I2C communication ?

8. How many wires are required for I2C communication ? What are the signals involved in I2C ?

9. What is START bit and STOP bit ?

10. How will the master indicate that it is either address / data ? How will it intimate to the slave that it is going to either read / write ?

11. Is it possible to have multiple masters in I2C ?

12. In write transaction, the Master monitors the last ACK and issues STOP
condition - True/False ?

13. In read transaction, the master does not acknowledge the final byte it receives and issues STOP condition - True/False ?

14. What is SPI communication ?

15. How many wires are required for SPI communication ?

16. What are the 4 logic signals specified by SPI bus ?

17. Does SPI slave acknowledge the receipt of data ?

18. SPI has higher throughput than I2C - True / False ?

19. Is it better to use I2C or SPI for data communication between a microprocessor and DSP ?

20. Is it better to use I2C or SPI for data communication from ADC ?

21. Duplex communication is possible by simultaneously using MOSI and MISO during each SPI clock cycle - True / False ?

22. Is it possible to connect SPI slaves in daisy chain ?

23. What is the role of shift register in Master and Slave devices in SPI ?

24. How will the master convey that it is stopping the transmission of data ?

25. What is bit banging ?
7. Advantages and limitations of I2C communication ?

Advantages:-
1. only two wires are required.
2. hot pluggable.
3. 7-bit addressing gives 128 addresses, of which 16 are reserved, so up to 112 devices can share the bus.
4. it is an address-based protocol, so no chip-select lines are needed.
5. multiple slaves can be addressed over the same two wires.
6. it is a multi-master protocol.

Limitation:-
1. speed is much lower compared to other protocols (standard mode is only 100 kbit/s).
2. the protocol is more complex than SPI.
3. the communication distance is limited (bus capacitance restricts wire length).
4. an acknowledgement is necessary for each byte transferred.

I2C is a byte-oriented protocol with a standard frame structure, mostly used in RTC and EEPROM applications
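A quick way to see the speed limitation: every byte on the bus costs 9 SCL clocks (8 data bits plus the mandatory ACK), so even ignoring START/STOP and address overhead the effective throughput is modest. A rough sketch:

```python
# Rough I2C effective throughput: each byte costs 9 SCL clocks (8 data + 1 ACK).
# START/STOP and address bytes are ignored here, so real throughput is lower still.
def i2c_bytes_per_sec(scl_hz: int) -> float:
    return scl_hz / 9

std = i2c_bytes_per_sec(100_000)   # standard mode: ~11.1 kB/s
fast = i2c_bytes_per_sec(400_000)  # fast mode: ~44.4 kB/s
```

Compare this with SPI, which has no per-byte ACK and commonly clocks at tens of MHz, and the "SPI has higher throughput" question below answers itself.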

43. What are the types of ethernet frame formats ? Are they compatible with each other ?
Ethernet II, IEEE 802.3, IEEE 802.2 LLC, IEEE 802.2 SNAP. The different frame types have different format and MTU values, but can coexist on the same physical medium.   

44. What is the role of LLC and MAC layer in ethernet ?
LLC interacts with the upper network layer. It is responsible for handling layer 3 protocols (mux/de-mux) and link services like reliability(error management mechanisms such as ARQ) and flow control. MAC layer interacts with the lower PHY layer. It is responsible for framing and media access control for broadcast media.

45. What is carrier sensing ?
This is a media access control protocol where the transmitter determines whether another transmission is in progress before initiating transmission.

46. What is CSMA-CA ?
Carrier sensing is done but nodes attempt to avoid collisions by transmitting only when the channel is sensed to be idle.
   
47. What is the use of preamble and FCS in Ethernet frame ?
The preamble of an Ethernet packet allows devices to synchronize their receiver clocks. The FCS is an error-detecting code (a CRC-32) appended to the frame; if the FCS calculated by the destination node does not match the FCS sent by the source node, the damaged frame is discarded.
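Since the Ethernet FCS is a CRC-32, the generate/check cycle can be sketched with Python's zlib (the 60-byte frame here is just dummy data, not a real Ethernet frame):

```python
import zlib

def fcs(frame: bytes) -> bytes:
    # Ethernet FCS is CRC-32 (IEEE 802.3), transmitted least-significant byte first
    return zlib.crc32(frame).to_bytes(4, "little")

frame = bytes(range(60))           # dummy frame contents (dst/src/type/payload)
on_wire = frame + fcs(frame)       # transmitter appends the FCS
assert fcs(on_wire[:-4]) == on_wire[-4:]   # receiver recomputes and compares
```

A common hardware shortcut is to run the frame plus its appended FCS through the same CRC circuit and check for the fixed CRC-32 residue instead of comparing the two values.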

48. What are the types of CSMA access modes ?
The types of CSMA access modes are Persistent, Non-persistent, P-persistent and O-persistent.

49. What is port mirroring ? where is port mirroring used ?
Port mirroring sends a copy of the network packets seen on one switch port to a network monitoring connection on another switch port. It is used in managed network switches.

50. What is an iterative server ?
An iterative server processes one client request at a time, in a sequential manner.


* Define setup window and hold window ?
* What is the effect of clock skew on setup and hold ?
* In a multicycle path, where do we analyze setup and where do we analyze hold ?
* How many test clock domains are there in a chip ?
* How enables of clock gating cells are taken care at the time of scan ?
* Difference between functional coverage and code coverage ?
* Does 100% code coverage means 100% functional coverage & vice versa?
* What do you mean by useful skew ?
* What is shift miss and capture miss in transition delay faults ?
* What is the structure of clock gating cell ?
* How can you say that placing clock gating cells at synthesis level will reduce the area of the design ?
* How will you decide to insert the clock gating cell on logic where data enable is going for n number of flops
* What is the concept of synchronizers ?
* What are lockup latches?
* What is the concept of power islands ?
* What does OCV, Derate and CRPR mean in STA ?
* What is dynamic power estimation ?
* On AHB bus which path would you consider for worst timing ?
* What is the difference between blocking & non-blocking statements in verilog ?
* What are the timing equations for setup and hold, with & without considering timing skew ?
* Design a XOR gate with 2-input NAND gates
* Design AND gate with 2X1 MUX
* Design OR gate with 2X1 MUX
* Design T-Flip Flop using D-Flip Flop
* In a synchronizer how you can ensure that the second stage flop is getting stabilized input?
* Design a pulse synchronizer
* Calculate the depth of a buffer whose clock ratio is 4:1 (wr clock is faster than read clock)
* Design a circuit that detects the negedge of a signal. The output of this circuit should get deasserted along with the input signal
* Design a DECODER using DEMUX
* Design a FSM for 10110 pattern recognition
* 80 writes in 100 clock cycles, 8 reads in 10 clock cycles. What is the minimum depth of FIFO?
* Why APB instead of AHB ?
* case, casex, casez if synthesized what would be the hardware
* Define monitor functions for AHB protocol checker
* What is the use of AHB split, give any application
* Can we synchronize data signals instead of control signals ?
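The FIFO-depth question in the list above (80 writes in 100 cycles, 8 reads in 10 cycles) has a standard worked answer. A sketch, assuming the worst case where all 80 writes arrive back-to-back at one write per cycle; note that some variants of this question assume two such bursts can butt up against each other, which doubles the answer:

```python
def fifo_min_depth(burst_writes: int, reads: int, read_window: int) -> int:
    # Worst case: the burst arrives back-to-back, one write per cycle,
    # so absorbing it takes burst_writes cycles.
    cycles = burst_writes
    drained = cycles * reads // read_window   # reads completed in that time
    return burst_writes - drained             # entries left stuck in the FIFO

fifo_min_depth(80, 8, 10)   # 80 written, 64 read in those 80 cycles -> depth 16
```

Average rates are equal here (0.8 per cycle both ways), which is why only the burst behaviour, not the averages, determines the depth.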


Q: What is the difference between a Verilog task and a Verilog function?
A: The following rules distinguish tasks from functions:

1. A function shall execute in one simulation time unit
A task can contain time-controlling statements.

2. A function cannot enable a task
A task can enable other tasks or functions.

3. A function shall have at least one input type argument and shall not have an output or inout type argument;
A task can have zero or more arguments of any type.

4. A function shall return a single value;
A task shall not return a value.


Q: Given the following Verilog code, what value of "a" is displayed?
always @(clk) begin
a = 0;
a <= 1;
$display(a);
end
A: This is a tricky one! Verilog scheduling semantics basically imply a four-level deep queue for the current simulation time:

1: Active Events (blocking statements)
2: Inactive Events (#0 delays, etc)
3: Non-Blocking Assign Updates (non-blocking statements)
4: Monitor Events ($display, $monitor, etc).

Since the "a = 0" is an active event, it is scheduled into the 1st "queue". The "a <= 1" is a non-blocking event, so it's placed into the 3rd queue. Finally, the display statement is placed into the 4th queue. Only events in the active queue are completed this sim cycle, so the "a = 0" happens, and then the display shows a = 0. If we were to look at the value of a in the next sim cycle, it would show 1.


Q: Given the following snippet of Verilog code, draw out the waveforms for clk and a

always @(clk) begin
a = 0;
#5 a = 1;
end
A:
10 30 50 70 90 110 130
___ ___ ___ ___ ___ ___ ___
clk ___| |___| |___| |___| |___| |___| |___| |___
a ___________________________________________________________
This obviously is not what we wanted, so to get closer, you could use "always @ (posedge clk)" instead, and you'd get
10 30 50 70 90 110 130
___ ___ ___ ___ ___ ___ ___
clk ___| |___| |___| |___| |___| |___| |___| |___
___ ___
a _______________________| |___________________| |_______
Q: What is the difference between the following two lines of Verilog code?

#5 a = b;
a = #5 b;
A:

#5 a = b; Wait five time units before doing the action for "a = b;".
The value assigned to a will be the value of b 5 time units hence.

a = #5 b; The value of b is calculated and stored in an internal temp register.
After five time units, assign this stored value to a.

Q: What is the difference between:
c = foo ? a : b; and

if (foo) c = a;
else c = b;

A:

The ? merges answers if the condition is "x", so for instance if foo = 1'bx, a = 'b10, and b = 'b11, you'd get c = 'b1x.

On the other hand, if treats Xs or Zs as FALSE, so you'd always get c = b.


Q: Using the given, draw the waveforms for the following versions of a (each version is separate, i.e. not in the same run):
reg clk;
reg a;
always #10 clk = ~clk;
(1) always @(clk) a = #5 clk;
(2) always @(clk) a = #10 clk;
(3) always @(clk) a = #15 clk;
Now, change a to wire, and draw for:
(4) assign #5 a = clk;
(5) assign #10 a = clk;
(6) assign #15 a = clk;
A:
10 30 50 70 90 110 130
___ ___ ___ ___ ___ ___ ___
clk ___| |___| |___| |___| |___| |___| |___| |___
___ ___ ___ ___ ___ ___ ___
(1)a ____| |___| |___| |___| |___| |___| |___| |_
___ ___ ___ ___ ___ ___ ___
(2)a ______| |___| |___| |___| |___| |___| |___|
(3)a __________________________________________________________
Since the #delay cancels future events when it activates, any delay over the actual 1/2 period time of the clk flatlines...
With changing a to a wire and using assign, we just accomplish the same thing...
10 30 50 70 90 110 130
___ ___ ___ ___ ___ ___ ___
clk ___| |___| |___| |___| |___| |___| |___| |___
___ ___ ___ ___ ___ ___ ___
(4)a ____| |___| |___| |___| |___| |___| |___| |_
___ ___ ___ ___ ___ ___ ___
(5)a ______| |___| |___| |___| |___| |___| |___|
(6)a __________________________________________________________
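The flatlining of (3) and (6) above is inertial-delay behaviour: a new input event cancels any pending output change that has not yet matured. A toy Python model (not a real simulator; events are (time, value) pairs) shows the effect:

```python
def inertial(events, d):
    # events: chronologically ordered (time, value) input changes
    out, pending = [], None
    for t, v in events:
        if pending and pending[0] <= t:
            out.append(pending)    # pending output change matured before new event
        pending = (t + d, v)       # new input event cancels an unmatured change
    if pending:
        out.append(pending)
    return out

clk = [(10, 1), (20, 0), (30, 1), (40, 0)]   # 20-time-unit period
inertial(clk, 5)    # delayed copy: [(15,1), (25,0), (35,1), (45,0)]
inertial(clk, 15)   # delay > half period: every change is cancelled by the next edge
```

With d=15 each scheduled change is still pending when the next clock edge arrives and replaces it, so nothing ever reaches the output until the input stops toggling, which is the flat line in the waveforms.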

  1. Explain why & how a MOSFET works
  2. Draw Vds-Ids curve for a MOSFET. Now, show how this curve changes (a) with increasing Vgs (b) with increasing transistor width (c) considering Channel Length Modulation
  3. Explain the various MOSFET Capacitances & their significance
  4. Draw a CMOS Inverter. Explain its transfer characteristics
  5. Explain sizing of the inverter
  6. How do you size NMOS and PMOS transistors to increase the threshold voltage?
  7. What is Noise Margin? Explain the procedure to determine Noise Margin
  8. Give the expression for CMOS switching power dissipation
  9. What is Body Effect?
  10. Describe the various effects of scaling
  11. Give the expression for calculating Delay in CMOS circuit
  12. What happens to delay if you increase load capacitance?
  13. What happens to delay if we include a resistance at the output of a CMOS circuit?
  14. What are the limitations in increasing the power supply to reduce delay?
  15. How does Resistance of the metal lines vary with increasing thickness and increasing length?
  16. You have three adjacent parallel metal lines. Two out of phase signals pass through the outer two metal lines. Draw the waveforms in the center metal line due to interference. Now, draw the signals if the signals in outer metal lines are in phase with each other
  17. What happens if we increase the number of contacts or via from one metal layer to the next?
  18. Draw a transistor level two input NAND gate. Explain its sizing (a) considering Vth (b) for equal rise and fall times
  19. Let A & B be two inputs of the NAND gate. Say signal A arrives at the NAND gate later than signal B. To optimize delay, of the two series NMOS inputs A & B, which one would you place near the output?
  20. Draw the stick diagram of a NOR gate. Optimize it
  21. For CMOS logic, give the various techniques you know to minimize power consumption
  22. What is Charge Sharing? Explain the Charge Sharing problem while sampling data from a Bus
  23. Why do we gradually increase the size of inverters in buffer design? Why not give the output of a circuit to one large inverter?
  24. In the design of a large inverter, why do we prefer to connect small transistors in parallel (thus increasing effective width) rather than lay out one transistor with large width?
  25. Given a layout, draw its transistor level circuit. (I was given a 3 input AND gate and a 2 input Multiplexer. You can expect any simple 2 or 3 input gates)
  26. Give the logic expression for an AOI gate. Draw its transistor level equivalent. Draw its stick diagram
  27. Why don't we use just one NMOS or PMOS transistor as a transmission gate?
  28. For a NMOS transistor acting as a pass transistor, say the gate is connected to VDD, give the output for a square pulse input going from 0 to VDD
  29. Draw a 6-T SRAM Cell and explain the Read and Write operations
  30. Draw the Differential Sense Amplifier and explain its working. Any idea how to size this circuit? (Consider Channel Length Modulation)
  31. What happens if we use an Inverter instead of the Differential Sense Amplifier?
  32. Draw the SRAM Write Circuitry
  33. Approximately, what were the sizes of your transistors in the SRAM cell? How did you arrive at those sizes?
  34. How does the size of PMOS Pull Up transistors (for bit & bit- lines) affect SRAM's performance?
  35. What's the critical path in a SRAM?
  36. Draw the timing diagram for a SRAM Read. What happens if we delay the enabling of Clock signal?
  37. Give a big picture of the entire SRAM Layout showing your placements of SRAM Cells, Row Decoders, Column Decoders, Read Circuit, Write Circuit and Buffers
  38. In a SRAM layout, which metal layers would you prefer for Word Lines and Bit Lines? Why?
  39. How can you model a SRAM at RTL Level?
  40. What's the difference between Testing & Verification?
  41. For an AND-OR implementation of a two input Mux, how do you test for Stuck-At-0 and Stuck-At-1 faults at the internal nodes? (You can expect a circuit with some redundant logic)
  42. What is Latch Up? Explain Latch Up with cross section of a CMOS Inverter. How do you avoid Latch Up? 
  1. Give two ways of converting a two input NAND gate to an inverter
  2. Given a circuit, draw its exact timing response. (I was given a Pseudo Random Signal Generator; you can expect any sequential ckt)
  3. What are set up time & hold time constraints? What do they signify? Which one is critical for estimating maximum clock frequency of a circuit?
  4. Give a circuit to divide frequency of clock cycle by two
  5. Design a divide-by-3 sequential circuit with 50% duty circle. (Hint: Double the Clock)
  6. Suppose you have a combinational circuit between two registers driven by a clock. What will you do if the delay of the combinational circuit is greater than your clock signal? (You can't resize the combinational circuit transistors)
  7. The answer to the above question is breaking the combinational circuit and pipelining it. What will be affected if you do this?
  8. What are the different Adder circuits you studied?
  9. Give the truth table for a Half Adder. Give a gate level implementation of the same.
  10. Draw a Transmission Gate-based D-Latch.
  11. Design a Transmission Gate based XOR. Now, how do you convert it to XNOR? (Without inverting the output)
  12. How do you detect if two 8-bit signals are same?
  13. How do you detect a sequence of "1101" arriving serially from a signal line? 
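The "1101" detector in the last question is a 4-state FSM. A quick Python model of the transition table, with overlapping matches allowed (a Verilog version would put the same table in a case statement):

```python
def detect_1101(bits):
    # state = length of the matched prefix of "1101" (0..3); overlaps allowed
    nxt = {(0, 0): 0, (0, 1): 1,   # saw "1"
           (1, 0): 0, (1, 1): 2,   # saw "11"
           (2, 0): 3, (2, 1): 2,   # saw "110", or extra 1s keep us at "11"
           (3, 0): 0, (3, 1): 1}   # after a match, the final 1 starts a new prefix
    state, found = 0, []
    for i, b in enumerate(bits):
        if state == 3 and b == 1:
            found.append(i)        # pattern completes at bit index i
        state = nxt[(state, b)]
    return found

detect_1101([1, 1, 0, 1, 1, 0, 1])   # matches end at indices 3 and 6
```

The overlap handling (state 3 plus a 1 going back to state 1, not 0) is exactly the detail interviewers probe for.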
what are the differences between SIMULATION and SYNTHESIS
Simulation <= verify the functionality of your design.
Synthesis <= convert your RTL description into gates for the target technology.
Simulation is used to verify the functionality of the circuit. a) Functional simulation: study of the circuit's operation independent of timing parameters and gate delays. b) Timing simulation: study including estimated delays, to verify that setup, hold and other timing requirements of devices like flip-flops are met.
Synthesis: one of the foremost back-end steps, whereby synthesizing is nothing but converting the VHDL or Verilog description into a set of primitives (equations, as in a CPLD) or components (as in an FPGA) to fit into the target technology. Basically, the synthesis tool converts the design description into equations or components.

Can you tell me the differences between latches & flipflops?
There are 2 types of circuits:
1. Combinational
2. Sequential
Latches and flip-flops both come under the category of "sequential circuits", whose output depends not only on the current inputs, but also on previous inputs and outputs.
Difference: Latches are level-sensitive, whereas flip-flops are edge-sensitive. By edge-sensitive, I mean the output changes only when there is a clock transition (from 1 to 0, or from 0 to 1).
Example: In a flip-flop, inputs arrive on the input lines at time = 2 seconds, but the output won't change immediately. At time = 3 seconds a clock transition takes place; only after that will the output change.
Flip-flops are of 2 types:
1. Positive edge triggered
2. Negative edge triggered
Other differences:
1) flip-flops take roughly twice the number of gates as latches
2) so the delay is automatically more for flip-flops
3) power consumption is also more
A latch does not need a clock signal, whereas a flip-flop always does.


What is slack?
Slack is the difference between the required arrival time and the actual arrival time of a signal along a timing path (for a setup check, roughly one clock period minus the path delay).
Slack may be +ve (timing met) or -ve (timing violated).

Equivalence between VHDL and C?
In C, a structure provides the facility to store a collection of different data types, usually accessed through pointers. In VHDL we describe hardware with direct access to memory, so instead of using a pointer to a structure member as in C, we can write an interface that stores data in memory and accesses it directly.
RTL and Behavioral
Register transfer level means data flows between registers, with combinational logic in between the end registers through which the data must flow.
Behavioral means describing how the hardware behaves, rather than its exact structure, using HDL syntax. For complex projects a mixed approach, or a more behavioral one, is better.

Hi all, can you find the difference between these two Verilog $fopen statements?


$fopen("filename");
$fopen("filename","w");

The single-argument form (Verilog-1995) returns a 32-bit multichannel descriptor (MCD) with exactly one bit set; MCDs can be bitwise-ORed together so one $fdisplay writes to several files at once. The two-argument form (added in Verilog-2001) returns a file descriptor (FD, with bit 31 set) opened in the given mode ("r", "w", "a", ...), which works with the full set of file I/O tasks such as $fgets and $fscanf but cannot be ORed.


# What is the difference between $display and $monitor and $write and $strobe?
# What is the difference between code-compiled simulator and normal simulator?
# What is the difference between wire and reg?
# What is the difference between blocking and non-blocking assignments?
# What is the significance of the Timescale directive?
# What is the difference between bit wise, unary and logical operators?
# What is the difference between task and function?
# What is the difference between casex, casez and case statements?
# Which one preferred-casex or casez?
# For what is defparam used?
# What is the difference between "= =" and "= = =" ?
# What are compiler directives like `include and `ifdef ?
# Write a verilog code to swap contents of two registers with and without a temporary register?
# What is the difference between inter statement and intra statement delay?
# What is delta simulation time?
# What is difference between Verilog full case and parallel case?
# What you mean by inferring latches?
# How to avoid latches in your design?
# Why latches are not preferred in synthesized design?
# How blocking and non blocking statements get executed?
# Which will be updated first: is it variable or signal?
# What is sensitivity list?
# If you miss sensitivity list what happens?
# In a pure combinational circuit is it necessary to mention all the inputs in the sensitivity list? If yes, why? If not, why?
# In a pure sequential circuit is it necessary to mention all the inputs in the sensitivity list? If yes, why? If not, why?
# What is general structure of Verilog code you follow?
# What are the difference between Verilog and VHDL?
# What are system tasks?
# List some of system tasks and what are their purposes?
# What are the enhancements in Verilog 2001?
# Write a Verilog code for synchronous and asynchronous reset?
# What is pli? why is it used?
# What is file I/O?
# What is difference between freeze deposit and force?
# Will case always infer priority register? If yes how? Give an example.
# What are inertial and transport delays ?
# What does `timescale 1 ns/ 1 ps' signify in a verilog code?
# How to generate sine wave using verilog coding style?
# How do you implement the bi-directional ports in Verilog HDL?
# How to write FSM in verilog?
# What is verilog case (1)?
# What are Different types of Verilog simulators available?
# What is Constrained-Random Verification ?
# How can you model a SRAM at RTL Level?
# What are different types of timing verifications?
# What is the difference between Formal verification and Logic verification?
# What is the difference between verification and validation? And what are procedures of doing the same?
# What is the difference between testing and verification? 


  • What is the frequency of the DDR / voltage
  • What is the memory size ; explain prefetch in memory context 
  • What is the Bit length for data
  • Basic protocol level DDR knowledge
  • What is absolute jitter
  • What are the types of jitter you know
  • How do you make power measurements
  • Asynchronous reset flip flop / Synchronous reset flip flop difference
  • What is a asynchronous reset D flip flop
  • How do you double the clock frequency using combinational logic
  • What do you understand by synthesis
  • What is the basic difference between ASIC and FPGA design flow
  • Blocking and non-blocking statements
  • Tools used for front end
  • PCI clock frequency
  • What is metastability
  • Delay parameters which matter for DDR ( CAS latency - what do you know about it )
  • RAS / CAS
  • Master – Slave FF
  • Add delay on FF1-FF2 D1Q-D2 path  and analyze a circuit ( a double inverter)
  • Swap the delay onto the clock line and analyze the circuit ( double inverter )
  • Delay nos. given 20 ns (double inverter) on clock skew line, 5 ns on the first FF to second FF line ; 100 ns clock period – analyze the circuit
  • 4:1 mux from 2:1 mux ABCD in order – draw truth table and prove
  • A equality comparator design – make it an inverter
  • XOR gate from NAND gate
  • Explain DDR protocol and timing
  • Ethernet packet format
  • Test setup and explain settings
  • Two critical debug you have done in your career and lessons learnt
  • Decoder design – explain address decoder how it works given x number of rows and columns draw timing and circuit
  • 8085 block diagram ( general uP concepts)
  • DRAM
  • FF can be used in memory? Why / why not ?  FF vs DRAM
  • Five skills obtained from board design / rules – best practices
  • Latch vs FF
  • VHDL code snippet
  • SR FF.
  • DDR banks
  • 100 MHz clock is used to give input – need to send out data at 200 MHz suggest circuits for this
  • DDR explanation – chip level
  • 100 MHz in from 1 PLL clock / 100 MHz out from PLL2 clock – design circuit
  • What problems will come in case (q 20 / 22)
  • FIFO design details and problems
  • some more design problems were asked to be analyzed
  • What is set up time
  • What is hold time
  • ASIC Design flow
  • Challenges in ASIC Design
  • Latch and Flip-Flop
  • Design a simple circuit for motion detector
  • Use of a decoder
  • Types of Flip Flop
  • Which is the most common flip flop used in ASIC designs
  • FF --- Combinational Logic --- FF ( Analysis of standard circuit)
  • Analysis of circuit with delays ( buffers added to clock lines)
  • How to find the maximum clock frequency of a given circuit
  • Synthesis tools and styles
  • Timing constraints to be given for ASIC design
  • What happens when you decrease the clock frequency – does setup / hold time violations at say 300MHz frequency vanish at 3 MHz
  • What all influence the delay of an element ( Flop – capacitance ?)
  • What parameters influence delay ( temperature effect on delay)
  • If input transition is faster what happens to delay of a cell
  • What do you understand by drive strength
  • High drive of a cell – correlates to what ?
  • Importance of hold time (adder can become subtractor – Function change!!)
  • How to solve set-up time violations
  • How to solve hold time violations
  • What is PRBS
  • What is the difference between single ended and differential
  • Why is PRBS needed in a tester
  • USB protocol / packet level understanding? Basics explanation
  • 80 MHz DDR – what do you understand from this
  • SDR and DDR difference and advantages
  • Test setup
  • Triplexer - why it is used in passive optical networks and what it means
  • WDM – CO – CPE
  • What do you understand by a Loopback why is it needed
  • Challenges in finding maximum clock frequency in ASIC design
  • Power estimation in chips ?
  • Why is place and route important – any understanding of the same
  • What is skew – clock skew
  • What is slew – slew rate
  • Why do you want to do verification and enter ASIC domain
  • What is jitter
  • What is cycle – cycle / period jitter. How is it estimated
  • Common i/fs in a system
  • Pulse width
  • Why is setup and hold time first needed
  • Effect of temperature on delays ( delay increases with temperature)
  • Why clock skew arises
  • What is positive and negative skew
  • Is positive skew an advantage or disadvantage - how does it help
  • What is the worst pattern that can be used to test a set of lines
  • SSN – crosstalk
  • what do you actually look for in SI
  • What do you do in a bring-up
  • What is Custom and Semi-Custom ASIC design
  • ASIC – FPGA difference ( low power is a key)
  • When a Flop is used; when a latch is used and why?
  • Why random patterns?
  • DFM?
  • Clock tree routing problems
  • Models for components
  • Buffer circuit in IOs – Pulse width distortion / duty cycle distortion why it happens performance before and after pads causes for degradation
  •  Can you explain a general verification methodology flow  
  •  Explain your verification architecture  
  •  Why do you think we need functional coverage  
  •  Can you explain e-manager coverage implementation methods you have used  
  •  DDR + problems you faced in bring up  
  •  Can you give me an FSM/code/circuit to implement code for the following waveform
  •  32 bit addr / 32 bit data / size -- map to 64 bit memory - give structure / how will you sample data for byte, word, half word, dword accesses  
  •  You have 256 MB sys memory - (insufficient, say, for your huge ASIC) how will you verify
  •  Dynamic memory  
  •  List and indexed lists  
  •  Can you explain some RISC processor architecture you know  
  •  RISC vs CISC you know from college  
  • How can specman handle semaphores  
  •  Some addressing fundamentals 
  •  Multiple threads in your env - what did you implement to run three cores simultaneously.  
  •  AXI - addressing ; 4k page boundary cross over fetches; wrapping concept ; Multiple slave out of order transaction support - waveforms as to how these transactions will be ; size / length concepts

Xilinx - >

Question on Latency , Throughput
FIFO Depth

AXI Protocol


PCIE Related FAQ ->
Q. What are the PCIe protocol extensions, and how do they improve PCIe interconnect performance?

A.The PCIe protocol extensions are primarily intended to improve interconnect latency, power and platform efficiency. These protocol extensions pave the way for better access to platform resources by various compute- and I/O-intensive applications as they interact with and through the PCIe interconnect hierarchy. There are multiple protocol extensions and enhancements being developed and they range in scope from data reuse hints, atomic operations, dynamic power adjustment mechanisms, loose transaction ordering, I/O page faults, BAR resizing and so on. Together, these protocol extensions will increase PCIe deployment leadership in emerging and future platform I/O usage models by enabling significant platform efficiencies and performance advantages.


Q.Section 4.2.4.2 - When upconfiguring a Link in the LTSSM Configuration.Linkwidth.Start state, are the Lanes which are being activated required to transmit an EIEOS first when they exit Electrical Idle?
A. No. Lanes being activated for upconfiguration are not required to align their exit of Electrical Idle with the transmission of any Symbol, Block, or Ordered Set type. Furthermore, the Lanes are not required to exit Electrical Idle before the LTSSM enters the Configuration.Linkwidth.Start state.






What is PCI Express (PCIe) 3.0? What are the requirements for this evolution of the PCIe architecture?




PCIe 3.0 is the next evolution of the ubiquitous and general-purpose PCI Express I/O standard. At 8GT/s bit rate, the interconnect performance bandwidth is doubled over PCIe 2.0, while preserving compatibility with software and mechanical interfaces. The key requirement for evolving the PCIe architecture is to continue to provide performance scaling consistent with bandwidth demand from leading applications with low cost, low power and minimal perturbations at the platform level. One of the main factors in the wide adoption of the PCIe architecture is its sensitivity to high-volume manufacturing materials and tolerances such as FR4 boards, low-cost clock sources, connectors and so on. In providing full compatibility, the same topologies and channel reach as in PCIe 2.0 are supported for both client and server configurations. Another important requirement is the manufacturability of products using the most widely available silicon process technology. For the PCIe 3.0 architecture, PCI-SIG believes a 65nm process or better will be required to optimize on silicon area and power.




Section 4.2.7.3 - PCIe 3.0 Base spec section 4.2.7.4 states that "Receivers shall be tolerant to receive and process SKP Ordered Sets at an average interval between 1180 to 1538 Symbol Times when using 8b/10b encoding and 370 to 375 blocks when using 128b/130b encoding." For 128b/130b encoding, if the Transmitter sends one SKP OS after 372 blocks and a second after 376 blocks, the average interval comes out to be 374 blocks, which falls in the valid range. So is this allowed, or must every SKP interval count fall inside the 370 to 375 blocks?




At 8 GT/s, a SKP Ordered Set must be scheduled for transmission at an interval between 370 to 375 blocks. However, the Transmitter must not transmit the scheduled SKP Ordered Set until it completes transmission of any TLP or DLLP it is sending, and sends an EDS packet. Therefore, the interval between SKP OS transmissions may not always fall within a 370 to 375 block interval.

For example, if a SKP Ordered Set remains scheduled for 6 block times before framing rules allow it to be transmitted, the interval since the transmission of the previous SKP OS may be 6 blocks longer than normal, and the interval until the transmission of the next SKP OS may be 6 Blocks shorter than normal. But the Transmitter must schedule a new SKP Ordered Set every 370 to 375 blocks, so the long-term average SKP OS transmission rate will match the scheduling rate.

Receivers must size their elastic buffers to tolerate the worst-case transmission interval between any two SKP Ordered Sets (which will depend on the Max Payload Size and the Link width), but can rely on receiving SKP Ordered Sets at a long term average rate of one SKP Ordered Set for every 370 to 375 blocks. The SKP Ordered Set interval is not checked by the Receiver.
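The scheduling-versus-transmission behavior described above can be illustrated with a small simulation. This is a sketch with made-up delay values, not spec pseudocode:

```python
def skp_transmit_times(schedule_interval: int, delays: list) -> list:
    """Blocks at which SKP Ordered Sets are actually transmitted.

    A SKP is *scheduled* every `schedule_interval` blocks (370..375 per the
    spec), but each transmission is held off by `delays[i]` blocks while an
    in-flight TLP/DLLP finishes and an EDS packet is sent.
    """
    times = []
    for i, d in enumerate(delays, start=1):
        times.append(i * schedule_interval + d)
    return times

# Scheduled every 373 blocks; the 2nd SKP is held 6 blocks by a long TLP:
tx = skp_transmit_times(373, [0, 6, 0, 0])
gaps = [b - a for a, b in zip(tx, tx[1:])]
print(gaps)                   # [379, 367, 373] -- individual gaps drift...
print(sum(gaps) / len(gaps))  # ...but the long-term average stays at 373.0
```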




Section 7.28.3 - When the maximum M-PCIe Link speed supported is 2.5 GT/s, what will be the Link speed following a reset?




The Link Speed following reset is the result of the Configuration process. During M-PCIe discovery and configuration, RRAP is used to discover M-PHY capabilities and to analyze and configure the configuration attributes accordingly. Depending on the high speeds supported by both components, the Link Speed and Rate Series may be configured for HS-G1, HS-G2 or HS-G3 and RATE A or B respectively. For this particular example, the Link Speed could be either HS-G1 or HS-G2 depending on the supported Link Speeds of the other component on the Link.




Section 4.2.6.4.2 - According to pg227 of spec, "When using 128b/130b encoding, TS1 or TS2 Ordered Sets are considered consecutive only if Symbols 6-9 match Symbols 6-9 of the previous TS1 or TS2 Ordered Set". When in Recovery.Equalization and if using 128b/130b encoding, is it required that lane/link numbers (symbol 2) match in TS1s to be considered as consecutive or is it need not match?




The Receiver is not required to check the Link and Lane numbers while in Recovery.Equalization.




Has there been a new compliance specification developed for PCIe 3.0?




For each revision of its specification, PCI-SIG develops compliance tests and related collateral consistent with the requirements of the new architecture. All of these compliance requirements are incremental in nature and build on the prior generation of the architecture. PCI-SIG anticipates releasing compliance specifications as they mature along with corresponding tests and measurement criteria. Each revision of the PCIe technology maintains its own criteria for product interoperability and admission into the PCI-SIG Integrators List.




Section 4.2.6.9 - When in the Disabled state the Upstream Port transitions to Detect when an Electrical Idle exit is detected at the receiver. Is an Electrical Idle exit required to be detected on all Lanes?




An Electrical Idle exit is required to be detected on at least one Lane.




What is equalization? How is Tx equalization different from Rx equalization? What is trainable equalization?




Equalization is a method of distorting the data signal with a transform representing an approximate inverse of the channel response. It may be applied either at the Tx, the Rx, or both. A simple form of equalization is Tx de-emphasis as specified in PCIe 1.x and PCIe 2.x, where data is sent at full swing after each polarity transition and is sent at reduced swing for all bits of the same polarity thereafter. De-emphasis reduces the low frequency energy seen by the Rx. Since channels exhibit greater loss at high frequencies, the effect of equalization is to reduce these effects. Equalization may also be used to compensate for ripples in the channel that occur due to reflections from impedance discontinuities such as vias or connectors. Equalization may be implemented using various types of algorithms; the two most common are linear (LE) and decision feedback (DFE). Linear equalization may be implemented at the Tx or the Rx, while DFE is implemented at the Rx. Trainable equalization refers to the ability to adjust the tap coefficients. Each combination of Tx, channel, and Rx will have a unique set of coefficients yielding an optimum signal-to-noise ratio. The training sequence consists of adjustments to the tap coefficients while applying a quality metric to minimize the error. The choice for the type of equalization to require in the next revision of the PCIe specifications depends largely on the interconnect channel optimizations that can be derived at the lowest cost point. It is the intent of PCI-SIG to deliver the most optimum combination of channel and silicon enhancements at the lowest cost for the most common topologies.




Section 4.2.6.6.1.3 - How can I configure the RC, if permissible, to send 4096 FTS to EP while RC transits out of L0s?




Setting the Extended Synch bit in the Link Control register of the two devices on the link will increase the number of FTS Ordered Sets to 4096, but the Extended Synch bit is used only for testing purposes.




Will PCIe 3.0 products be compatible with existing PCIe 1.x and PCIe 2.x products?




PCI-SIG is proud of its long heritage of developing compatible architectures and its members have consistently produced compatible and interoperable products. In keeping with this tradition, the PCIe 3.0 architecture is fully compatible with prior generations of this technology, from software to clocking architecture to mechanical interfaces. That is to say PCIe 1.x and 2.x cards will seamlessly plug into PCIe 3.0-capable slots and operate at their highest performance levels. Similarly, all PCIe 3.0 cards will plug into PCIe 1.x- and PCIe 2.x-capable slots and operate at the highest performance levels supported by those configurations.

The following chart summarizes the interoperability between various generations of PCIe and the resultant interconnect performance level:

Transmitter Device | Receiver Device | Channel | Interconnect Data Rate
8GHz   | 8GHz   | 8GHz   | 8.0GT/s
5GHz   | 5GHz   | 5GHz   | 5.0GT/s
2.5GHz | 2.5GHz | 2.5GHz | 2.5GT/s
8GHz   | 5GHz   | 8GHz   | 5.0GT/s
8GHz   | 2.5GHz | 8GHz   | 2.5GT/s
5GHz   | 2.5GHz | 5GHz   | 2.5GT/s


In short, the notion of the compatible highest performance level is modeled after the mathematical least common denominator (LCD) concept. Also, PCIe 3.0 products will need to support 8b/10b encoding when operating in a pre-PCIe 3.0 environment.
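The "highest compatible performance level" rule shown in the table reduces to taking the minimum of the two partners' maximum rates. A one-line illustrative sketch:

```python
def negotiated_rate(dev_a_max_gts: float, dev_b_max_gts: float) -> float:
    """A link trains to the highest rate both partners support -- in effect
    min() of the two devices' maximum supported data rates."""
    return min(dev_a_max_gts, dev_b_max_gts)

print(negotiated_rate(8.0, 5.0))   # 5.0  (PCIe 3.0 device with a 2.x partner)
print(negotiated_rate(8.0, 2.5))   # 2.5
```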




Section 4.2.6 - Table 4-14 says that Receiver Errors should not be reported in the L0s or L1 states. During L1 entry, an Upstream Port's transmitter may be in Electrical Idle while its receivers are not in Electrical Idle. Similarly, a Port's transmitters may be in Electrical Idle for L0s, while its receivers are not in Electrical Idle. In these situations, should the Port report Receiver Errors such as 8b10b errors?




If the receivers are in L0s, Receiver Errors should not be reported. It does not matter whether the transmitters are in L0 or L0s for reporting of Receiver Errors. Section 4.2.6.5 specifies the 3 conditions required for the LTSSM to transition from L0 to L1. Until all of these conditions are satisfied, the LTSSM is in L0 state, and should report Receiver Errors, even if its transmitters are in Electrical Idle.




Section 4.2.6.4.1 - The specification says to transition from Recovery.RcvrLock to Recovery.RcvrCfg, upon receiving eight consecutive TS1 or TS2 Ordered Sets. If a Port in Recovery.RcvrLock state receives x (where x < 8) number of consecutive TS1 ordered sets and then receives (8 - x) number of consecutive TS2 ordered sets, should it transition to Recovery.RcvrCfg, OR should it wait for receiving 8 consecutive TS2 ordered sets to transition to Recovery.RcvrCfg (basically discarding the received TS1 Ordered Sets).




The transition requirements can be satisfied by receiving 8 TS1s, 8 TS2s, or a combination of TS1s and TS2s totaling 8.
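The "any combination totaling 8" rule can be modeled as a simple counter. This is an illustrative sketch, not spec pseudocode:

```python
def reached_rcvr_cfg(ordered_sets: list) -> bool:
    """Count consecutive TS1/TS2 Ordered Sets; any mix totaling 8 satisfies
    the Recovery.RcvrLock -> Recovery.RcvrCfg transition condition.

    Anything other than a TS1/TS2 resets the consecutive count.
    """
    count = 0
    for os_type in ordered_sets:
        count = count + 1 if os_type in ("TS1", "TS2") else 0
        if count == 8:
            return True
    return False

# 3 consecutive TS1s followed by 5 consecutive TS2s total 8:
print(reached_rcvr_cfg(["TS1"] * 3 + ["TS2"] * 5))            # True
print(reached_rcvr_cfg(["TS1"] * 7 + ["ERR"] + ["TS2"] * 7))  # False
```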




Section 4.2.6.3.1.2 - The question is regarding Configuration.Linkwidth.Start state in the case of upconfiguration. It is written in the spec: "The Transmitter sends out TS1 Ordered Sets with Link numbers and Lane numbers set to PAD on all the active Upstream Lanes; the inactive Lanes it is initiating to upconfigure the Link width; and if upconfigure_capable is set to 1b, on each of the inactive Lanes where it detected an exit from Electrical Idle since entering Recovery and has subsequently received two consecutive TS1 Ordered Sets with Link and Lane numbers, each set to PAD, in this substate." It is not mentioned here if sending of these TS1s should be done ONLY on lanes that detected a receiver in the last time the LTSSM was at Detect state. Is it so?




The only Lanes that can be part of an upconfigure sequence are Lanes that were part of the configured Link when configuration was performed with LinkUp = 0, and Lanes that failed to detect a Receiver cannot be part of that initially configured Link. Therefore, a Port in the Configuration.Linkwidth.Start state must only transmit TS1s on a subset of the Lanes that detected a Receiver while last in the Detect state, regardless of whether or not it is attempting to upconfigure the Link width.




Section 8.4.2 - A switch Upstream port receives a Memory Read TLP while in the D3hot State. The Upstream port handles the received Memory Read Request as Unsupported Request. Is the switch Upstream port allowed or required to send a Completion with Completion Status as UR. Or must it transmit the Completion only after the power state is programmed to D0.




The Completion with UR status is required to be transmitted while the Port is in D3hot, assuming that the Port remains powered long enough to transmit the Completion.




Is PCIe 3.0 more expensive to implement than PCIe 2.x?




PCI-SIG attempts to define and evolve the PCIe architecture in a manner consistent with low-cost and high-volume manufacturability considerations. While PCI-SIG cannot comment on design choices and implementation costs, optimized silicon die size and power consumption continue to be overarching imperatives that inform PCIe specification development and architecture evolution.




Section 7.28.3 - The default Target Link Speed field in the Link Control 2 register requires the field be set to the highest support speed. Should the default value of the M-PCIe Target Link Speed Control field be 10b?




The default value of the M-PCIe Target Link Speed Control field is 01b.




What is scrambling? How does scrambling impact the PCIe 3.0 architecture?




Scrambling is a technique where a known binary polynomial is applied to a data stream in a feedback topology. Because the scrambling polynomial is known, the data can be recovered by running it through a feedback topology using the inverse polynomial. Scrambling affects the PCIe architecture at two levels: the PHY layer and the protocol layer immediately above the PHY. At the PHY layer, scrambling introduces more DC wander than an encoding scheme such as 8b/10b; therefore, the Rx circuit must either tolerate the DC wander as margin degradation or implement a DC wander correction capability. Scrambling does not guarantee a transition density over a small number of unit intervals, only over a large number. The Rx clock data recovery circuitry must be designed to remain locked to the relative position of the last data edge in the absence of subsequent edges. At the protocol layer, an encoding scheme such as 8b/10b provides out-of-band control characters that are used to identify the start and end of packets. Without an encoding scheme (i.e. scrambling only) no such characters exist, so an alternative means of delineating the start and end of packets is required. Usually this takes the form of packet length counters in the Tx and Rx and the use of escape sequences. The choice for the scrambling polynomial is currently under study.
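A toy additive scrambler shows why data can be recovered by applying the inverse of a known polynomial: XORing with the same LFSR keystream twice is the identity. The taps below are illustrative, not the actual PCIe scrambling polynomial:

```python
def lfsr_stream(seed, taps, nbits, width=16):
    """Fibonacci LFSR keystream generator; taps are the bit positions
    XORed into the feedback (illustrative, not the PCIe polynomial)."""
    state = seed
    for _ in range(nbits):
        yield state & 1
        fb = 0
        for t in taps:
            fb ^= (state >> t) & 1
        state = (state >> 1) | (fb << (width - 1))

def scramble(bits, seed=0xFFFF, taps=(0, 3, 4, 5)):
    """XOR the data with the keystream. XOR is its own inverse, so running
    scrambled data through the same function recovers the original."""
    return [b ^ k for b, k in zip(bits, lfsr_stream(seed, taps, len(bits)))]

data = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1]
coded = scramble(data)
print(coded != data)            # True -- on the wire the bits look random
print(scramble(coded) == data)  # True -- same polynomial recovers the data
```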




Section 4.2.6.4.1 - While in the LTSSM Recovery.RcvrLock state, if a Port receives TS Ordered Sets with a Link or Lane number that does not match those being transmitted on at least one Lane, but receives TS Ordered Sets with Link and Lane numbers that match those being transmitted and the speed_change bit is equal to 1b on at least one other Lane, should the Port transition to the LTSSM Recovery.RcvrCfg state or the LTSSM Detect state after a 24 ms timeout?




The Port should transition to the LTSSM Recovery.RcvrCfg state.




Do PCIe 3.0 specifications only deliver a signaling rate increase?




The PCIe 3.0 specifications comprise the Base and the Card Electro-mechanical (CEM) specifications. There may be updates to other form factor specifications as the need arises. Within the Base specification, which defines a chip-to-chip interface, updates have been made to the electrical section to comprehend 8GT/s signaling. As the technology definition progresses through PCI-SIG specification development process, additional ECN and errata will be incorporated with each review cycle. For example, the current PCIe protocol extensions that address interconnect latency and other platform resource usage considerations have been rolled into the PCIe 3.0 specification revisions. The final PCIe 3.0 specification consolidates all ECN and errata published since the release of the PCIe 2.1 specification, as well as interim errata.




Section 4.2.3 - Section 4.2.3 states, "After entering L0, irrespective of the current Link speed, neither component must transmit any DLLP if the equalization procedure must be performed, and until the equalization procedure completes." Does that result in the following sequence: 1. Negotiate a Link and enter L0. Do not allow DLLP transmission while in L0. 2. Change the data rate to 8.0 GT/s and execute the equalization procedure. 3. Enter L0. Allow DLLP transmission.




Yes, that is the expected sequence when the autonomous equalization mechanism is executed. Note that Section 4.2.3 also describes other equalization mechanisms.




Section 4.2.6.4.3 - While the Link is down configured, if a rate change request occurs, do the unused lanes also participate in the rate change?




The transmitter of the unused lanes remains in Electrical Idle during the speed change.




Section 4.2.6.2.3 - Can you please clarify the below statement quoted from section "4.2.6.2.3. Polling.Configuration" of PCI Express 3.0 specification: Receiver must invert polarity if necessary (see Section 4.2.4.4). Does this imply the polarity inversion can only be initiated by receiver in Polling.Configuration state or can the inversion happen in Polling.Active state as well?




When polarity needs to be inverted, it must be done before exiting Polling.Configuration, which permits it to be done in Polling.Active.




Section 5.3.1.4.1 - A Root Port is connected to a multifunction Endpoint. The Root Port is ECRC capable. The multifunction Endpoint only has 1 function that is ECRC capable, the others are not. Software enables ECRC checking and generation in the Root Port and also enables ECRC checking and generation in the 1 Endpoint function that supports it. Given that one function is enabled for ECRC check, is the EP required to check the TD bit & ECRC on all TLPs that target any of the endpoint's functions regardless of whether the receiving function is ECRC capable?




Per Section 2.7.1, the device is required to check ECRC for all TLPs where it is the ultimate PCI Express Receiver. Note that per Section 6.2.4, an ECRC Error is not Function-specific, so it must be logged in all Functions of that device.




Does PCIe 3.0 enable greater power delivery to cards?




The PCIe Card Electromechanical (CEM) 3.0 specification consolidates all previous form factor power delivery specifications, including the 150W and the 300W specifications.




Section 7.11.7 - Software has enabled Virtual Channel VC1 and currently UpdateFC DLLPs for VC1 are being transmitted on the link. Now, software disables VC1. So my question is, should UpdateFC DLLPs for VC1 be transmitted on the link?




When VC Enable for VC1 is set to 0b, the device must stop transmitting UpdateFC DLLPs for VC1.




What is 8b/10b encoding?




8b/10b encoding is a byte-oriented coding scheme that maps each byte of data into a 10-bit symbol. It guarantees a deterministic DC wander and a minimum edge density over a per-bit time continuum. These two characteristics permit AC coupling and a relaxed clock data recovery implementation. Since each byte of data is encoded as a 10-bit quantity, this encoding scheme guarantees that in a multi-lane system, there are no bubbles introduced in the lane striping process.




Section 4.2.6.4.4 - Table 4-5 defines that the valid range of Link Number (Symbol 1 of a TS1 Ordered Set) is 0-31 for Downstream Ports that support 8.0 GT/s or above. If a Downstream Port in the LTSSM Recovery.RcvrCfg state receives TS1 Ordered Sets with a Link Number that is not in the range 0-31, do they qualify as "TS1 Ordered Sets ... with Link or Lane numbers that do not match what is being transmitted" ?




Yes. The received Link Number (not in the range 0-31) does not match the transmitted Link Number (in the range 0-31).




Section 4.2.6.4 - What does the specification require for transmitting TS1 Ordered Sets while in the Recovery.RcvrLock state?




While the LTSSM is in Recovery.RcvrLock, the Transmitter must send TS1 Ordered Sets continuously on all configured Lanes, with the following exceptions:
1. At data rates above 2.5 GT/s, send an EIEOS after every 32 TS1 Ordered Sets (Section 4.2.4.2). The EIEOS guarantees that Electrical Idle exit will be detected by the link partner.

2. At all data rates, send SKP Ordered Sets according to Section 4.2.7.3, for clock compensation.
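The TS1/EIEOS interleaving can be sketched as a stream generator (an illustrative model, with SKP insertion omitted for brevity):

```python
def recovery_tx_stream(num_ts1: int, above_2p5_gts: bool = True) -> list:
    """Ordered Set stream for Recovery.RcvrLock (SKP insertion omitted).

    At data rates above 2.5 GT/s, one EIEOS is sent after every 32 TS1s so
    the link partner keeps detecting Electrical Idle exit.
    """
    stream = []
    for i in range(1, num_ts1 + 1):
        stream.append("TS1")
        if above_2p5_gts and i % 32 == 0:
            stream.append("EIEOS")
    return stream

s = recovery_tx_stream(64)
print(s.count("EIEOS"))  # 2
print(s[32])             # 'EIEOS' -- inserted after the first 32 TS1s
```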




How does the PCIe 3.0 8GT/s "double" the PCIe 2.0 5GT/s bit rate?




The PCIe 2.0 bit rate is specified at 5GT/s, but with the 20 percent performance overhead of the 8b/10b encoding scheme, the delivered bandwidth is actually 4Gbps. PCIe 3.0 removes the requirement for 8b/10b encoding and uses a more efficient 128b/130b encoding scheme instead. By removing this overhead, the interconnect bandwidth can be doubled to 8Gbps with the implementation of the PCIe 3.0 specification. This bandwidth is the same as an interconnect running at 10GT/s with the 8b/10b encoding overhead. In this way, the PCIe 3.0 specifications deliver the same effective bandwidth, but without the prohibitive penalties associated with 10GT/s signaling, such as PHY design complexity and increased silicon die size and power. The following table summarizes the bit rate and approximate bandwidths for the various generations of the PCIe architecture:

PCIe architecture | Raw bit rate | Interconnect bandwidth | Bandwidth per lane per direction | Total bandwidth for x16 link
PCIe 1.x | 2.5GT/s | 2Gbps | ~250MB/s | ~8GB/s
PCIe 2.x | 5.0GT/s | 4Gbps | ~500MB/s | ~16GB/s
PCIe 3.0 | 8.0GT/s | 8Gbps | ~1GB/s  | ~32GB/s

Total bandwidth represents the aggregate interconnect bandwidth in both directions.
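The per-lane numbers in the table follow directly from the raw rate and the encoding efficiency. A quick check, with a helper name of my own:

```python
def lane_bandwidth_mbps(raw_gt_s: float, payload_bits: int, coded_bits: int) -> float:
    """Per-lane, per-direction bandwidth in MB/s: the raw rate scaled by
    the encoding efficiency, divided by 8 bits per byte."""
    return raw_gt_s * 1000 * payload_bits / coded_bits / 8

print(round(lane_bandwidth_mbps(2.5, 8, 10)))    # 250 (PCIe 1.x, 8b/10b)
print(round(lane_bandwidth_mbps(5.0, 8, 10)))    # 500 (PCIe 2.x, 8b/10b)
print(round(lane_bandwidth_mbps(8.0, 128, 130))) # 985 (PCIe 3.0, ~1GB/s)
```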




Section 7.5.3 - An Endpoint sends a Memory Request Upstream to a Switch. How will the Switch determine if it needs to route that packet Upstream or to an Endpoint below another Downstream Port?




Each Port of a Switch contains registers that define Memory Space apertures. The Memory Base/Limit registers define an aperture for 32-bit non-prefetchable Memory Space. The Prefetchable Memory Base/Limit & their corresponding Upper registers define an aperture for 64-bit prefetchable Memory Space. Here is the basic behavior with a properly configured Switch. If the TLP address falls within the aperture of another Downstream Port, the TLP is routed to that Downstream Port and sent Downstream. If the TLP address falls within a Memory Space range mapped by any BAR within the Switch, the TLP is routed to the Function containing that BAR. Otherwise, if the TLP address falls within an aperture of the Upstream Port, the TLP is handled as an Unsupported Request. Otherwise, the TLP is routed to the Upstream Port where it is sent to the Upstream Link or another Function associated with the Upstream Port.

If a Switch implements and supports Access Control Services (ACS), ACS mechanisms provide additional controls governing whether each Memory Request TLP is routed normally to another Downstream Port, blocked as an error, or redirected Upstream even if its address falls within the aperture of another Downstream Port. See Section 6.12.
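The routing precedence described above can be sketched as follows. Names are illustrative, and the ACS and prefetchable/non-prefetchable distinctions are omitted:

```python
def route_mem_tlp(addr, downstream_apertures, bar_ranges, upstream_aperture):
    """Simplified Memory TLP routing in a properly configured Switch.

    Each aperture/range is an inclusive (base, limit) tuple; the argument
    names are illustrative, not spec register names.
    """
    # 1. Address inside another Downstream Port's aperture: route downstream.
    for port, (base, limit) in downstream_apertures.items():
        if base <= addr <= limit:
            return f"route downstream to {port}"
    # 2. Address inside a BAR of the Switch itself: route to that Function.
    for fn, (base, limit) in bar_ranges.items():
        if base <= addr <= limit:
            return f"route to internal function {fn}"
    # 3. Inside the Upstream Port aperture but claimed by nothing: UR.
    if upstream_aperture[0] <= addr <= upstream_aperture[1]:
        return "Unsupported Request"
    # 4. Otherwise: forward toward the Upstream Link.
    return "route to Upstream Port"

apertures = {"DP0": (0x8000_0000, 0x8FFF_FFFF), "DP1": (0x9000_0000, 0x9FFF_FFFF)}
print(route_mem_tlp(0x9000_1000, apertures, {}, (0x8000_0000, 0x9FFF_FFFF)))
print(route_mem_tlp(0x4000_0000, apertures, {}, (0x8000_0000, 0x9FFF_FFFF)))
```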




Section 4.2.6.4.3 - When a device has down configured the number of operational lanes, what is the expected power state and characteristics of the unused lanes?




The unused transmitter Lane is put into Electrical Idle. It is recommended that the receiver terminations be left on.




Section 2.3.1.1 - If an End Point returns a Completion (Cpl) with no data and Successful Completion status to a memory read request, should this be handled as a Malformed TLP or as an Unexpected Completion?




A compliant device would not return a Completion (Cpl) with no data and Successful Completion status to a memory read request, so normally this should not occur. If a properly formed Cpl is received that matches the Transaction ID, it is recommended that it be handled as an Unexpected Completion, but it is permitted to be handled as a Malformed TLP.




Section 5.3.1.4.1 - While in the D2 state, a Function must not initiate any Request TLPs on the Link with the exception of a PME Message. What are the requirements in the D3hot state?




While in the D3hot state, a Function must not initiate any Request TLPs on the Link with the exception of a PME Message.




What are the initial target applications for PCIe 3.0?




It is expected that graphics, Ethernet, InfiniBand, storage and PCIe switches will continue to drive the bandwidth evolution for the PCIe architecture and these applications are the current targets of the PCIe 3.0 technology. In the future, other applications may put additional bandwidth and performance demands on the PCIe architecture.




Section 5.5.3.3.1 - Section 5.5.3.3.1 of the PCIe spec states the following: In order to ensure common mode has been established, the Downstream Port must maintain a timer, and the Downstream Port must not send TS2 training sequences until a minimum of TCOMMONMODE has elapsed since the Downstream Port has started both transmitting and receiving TS1 training sequences.




If the Downstream Port receives no valid TS1 Ordered Sets but does receive valid TS2 Ordered Sets, should it timeout and transition to Detect?

No, the timer is to guarantee that the Transmitter will stay in Recovery.RcvrLock for a minimum time to establish common mode. The Port must wait to transition from Recovery.RcvrLock until this timer has expired, and the timer does not start counting until an exit from Electrical Idle has been detected. Errata A21 modified this section of the L1 PM Substates with CLKREQ ECN document.



PCI Express - 4.0


Why is a new generation of PCIe architecture needed?




PCI-SIG responds to the needs of its members. As applications evolve to consume the I/O bandwidth provided by the current generation of the PCIe architecture, PCI-SIG begins to study the requirements for technology evolution to keep abreast of performance and feature requirements.




What is PCI Express (PCIe) 4.0? What are the requirements for this evolution of the PCIe architecture?




PCIe 4.0 is the next evolution of the ubiquitous and general-purpose PCI Express I/O specification. At 16GT/s bit rate, the interconnect performance bandwidth will be doubled over the PCIe 3.0 specification, while preserving compatibility with software and mechanical interfaces. The key requirement for evolving the PCIe architecture is to continue to provide performance scaling consistent with bandwidth demand from a variety of applications with low cost, low power and minimal perturbations at the platform level. One of the main factors in the wide adoption of the PCIe architecture is its sensitivity to high-volume manufacturing capabilities and materials such as FR4 boards, low-cost connectors and so on.




Will PCIe 4.0 products be compatible with existing PCIe 1.x, PCIe 2.x and PCIe 3.x products?




PCI-SIG is proud of its long heritage of developing compatible architectures and its members have consistently produced compatible and interoperable products. In keeping with this tradition, the PCIe 4.0 architecture is compatible with prior generations of this technology, from software to clocking architecture to mechanical interfaces. That is to say PCIe 1.x, 2.x and 3.x cards will seamlessly plug into PCIe 4.0-capable slots and operate at the highest performance levels possible. Similarly, all PCIe 4.0 cards will plug into PCIe 1.x-, PCIe 2.x- and PCIe 3.x-capable slots and operate at the highest performance levels supported by those configurations.




Will there be a new compliance specification developed for the PCIe 4.0 specification?




For each revision of its specification, PCI-SIG develops compliance tests and related collateral consistent with the requirements of the new architecture. All of these compliance requirements are incremental in nature and build on the prior generation of the architecture. PCI-SIG anticipates releasing compliance specifications as they mature along with corresponding tests and measurement criteria. Each revision of the PCIe technology maintains its own criteria for product interoperability and admission into the PCI-SIG Integrators List.




What were the requirements outlined for the feasibility analysis?




In assessing potential improvements to the connector, materials, silicon and channel improvements, PCI-SIG required that compatibility, low-cost and high-volume manufacturing be maintained.




Is PCIe 4.0 architecture more expensive to implement than PCIe 3.x?




PCI-SIG attempts to define and evolve the PCIe architecture in a manner consistent with low-cost and high-volume manufacturability considerations. While PCI-SIG cannot comment on design choices and implementation costs, optimized silicon die size and power consumption continue to be important considerations that inform PCIe specification development and architecture evolution.




What are the results of the feasibility testing for the PCIe 4.0 specification?




After technical analysis, the PCI-SIG has determined that 16 GT/s on copper, which will double the bandwidth over the PCIe 3.0 specification, is technically feasible at approximately PCIe 3.0 power levels. The preliminary data also confirms that a 16GT/s interconnect can be manufactured in mainstream silicon process technology and can be deployed with existing low-cost materials and infrastructure, while maintaining compatibility with previous generations of PCIe architecture. In addition, the PCI-SIG will investigate advancements in active and idle power optimizations as they become available.




What are the initial target applications for the PCIe 4.0 architecture?




The PCIe 4.0 specification will address the many applications pushing for increased bandwidth at a low cost including server, workstation, desktop PC, notebook PC, tablets, embedded systems, peripheral devices, high-performance computing markets and more. The target implementations are entirely at the discretion of the designer.




What is the bit rate for the PCIe 4.0 specification and how does it compare to prior generations of PCIe?




Based on PCI-SIG feasibility analysis, the bit rate for the PCIe 4.0 specification will be 16GT/s. This bit rate represents the optimum tradeoff between performance, manufacturability, cost, power and compatibility. PCI-SIG analysis covered multiple topologies. All of these studies confirmed the potential feasibility of 16GT/s signaling with low-cost enablers.



PCI Express - M-PHY





PCIe technology is in every server, workstation and laptop PC. Why is PCIe over M-PHY a suitable I/O technology for tablet and smartphone devices?




As a broadly adopted technology standard, PCIe benefits from several decades of innovations with universal support in all major Operating Systems, a robust device discovery and configuration mechanism, and comprehensive power management capabilities that very few, if any, of the other I/O technologies can match. PCIe technology has a flexible, layered protocol that enables innovations to occur at each layer of the architecture independent of the other layers. In this way, power-efficient PHY technologies, such as MIPI M-PHY, can be integrated with the familiar and highly functional PCIe protocol stack to deliver best-in-class and highly scalable I/O performance in tablet and smartphone devices.




When and how will the PCI-SIG release the PCIe adaptation layer specification?




The PCI-SIG will deliver this technology as an extension to the existing PCIe 3.0 Base specification via ECN by the end of 2012. This technology will be fully integrated into the next release of the PCIe Base specification, PCIe 4.0, enabling ease of access and reference.




Is there a name for the PCIe adaptation that operates with the MIPI M-PHY?




The PCI-SIG has recently accepted this technology as a contribution from its members and will soon announce a suitable name for it.




Is there a need for new software to support the PCIe adaptation on the MIPI M-PHY?




This adaptation of the PCIe architecture requires no new software. It reuses the existing, ubiquitous support in all major Operating Systems (e.g. pci.sys bus driver on Windows platforms). This includes existing support for device discovery, configuration and control.




Why is PCI-SIG adapting PCIe protocols to operate over the MIPI M-PHY specification?




As PCs become lighter and thinner and tablets and smartphones become more functional, consumers want seamless, always on/always connected functionality from their computing devices. To respond to these market expectations, device manufacturers need efficient, intelligent I/O technologies. The PCIe architecture satisfies all of these requirements, and with the adaptation to operate over the M-PHY specification it can deliver consistent high performance in power-constrained platforms such as ULT laptops, tablets and smartphones. By delivering this technology, the PCI-SIG is meeting the emerging needs of its members and the industry.






Section 2.2.62. - How does a CPU know a device exists and where the position of the device is?




Configuration software reads configuration space offset 00h (the Vendor ID / Device ID register) across the different bus, device and function numbers. When it gets a valid response, it knows a device exists at that ID.
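The discovery loop described above can be sketched in Python. Everything here is illustrative: `read_config` stands in for a real platform config-space accessor, and the fake topology is invented for the example. An all-ones read (0xFFFFFFFF) conventionally means no device responded at that ID.

```python
def read_config(bus, dev, fn, offset):
    # Stand-in for real config-space access; here a tiny fake topology.
    # Offset 0x00 returns the 32-bit Device ID / Vendor ID register.
    fake = {(0, 0, 0): 0x12348086, (0, 3, 0): 0x567810B5}
    return fake.get((bus, dev, fn), 0xFFFFFFFF) if offset == 0x00 else 0

def enumerate_bus(bus):
    found = []
    for dev in range(32):            # up to 32 devices per bus
        for fn in range(8):          # up to 8 functions per device
            vid_did = read_config(bus, dev, fn, 0x00)
            if vid_did != 0xFFFFFFFF:
                # Low 16 bits of the register are the Vendor ID.
                found.append((bus, dev, fn, vid_did & 0xFFFF))
    return found
```

Real enumeration recurses through bridges to walk the whole hierarchy; this sketch only scans a single bus.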




Where can interested parties get more information?




PCI-SIG is the sole source for PCIe specifications. In addition, both the PCI-SIG and its members provide a plethora of technical and marketing collateral in support of the PCIe architecture. Please visit www.pcisig.com for additional information.




SECTION 6.2.3.2.3 -- If a device encounters more than one error, will it log all the errors or the most significant error only (according to the precedence list).




It is recommended that only the highest precedence error associated with a single TLP be reported. However, it is recognized that reasonable implementations may not be able to support the recommended precedence order, which is why this is recommended rather than required behavior.




Are both 2.5GT/s and 5GT/s signaling rates supported in the PCIe 2.0 specification?




The PCIe Base 2.0 specification supports both 2.5GT/s and 5GT/s signaling rates, in order to retain backward compatibility with existing PCIe 1.0 and 1.1 systems. Aside from the faster bit rate, there are a number of improvements in this specification that allow greater flexibility and reliability in designing PCIe links. For example, the interconnect can be dynamically managed for platform power and performance considerations through software controls. Another significant RAS feature is the inclusion of new controls to allow a PCIe link to continue to function even when some lanes become non-operational.




SECTION 7.5.1.1 - We implement Memory Space Enable and IO Space Enable bit in our Endpoint. If the Endpoint receives a Memory Write TLP when Memory Space Enable bit is not set. How should the Endpoint handle this TLP? Also, if the Endpoint receives a Memory Write TLP and its data payload exceeds Max_Payload_Size when Memory Space Enable bit is not set. How should the Endpoint handle this TLP in each case?




For the first case, the Endpoint must handle the Request as an Unsupported Request. For the second case, it is recommended that the Endpoint handle the Request as a Malformed TLP, but the Endpoint is permitted to handle the Request as an Unsupported Request.
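The two cases above can be written out as a small decision function (a sketch; the names are mine, not the spec's). An oversized payload is classified first, since Malformed TLP is the recommended handling there even with Memory Space Enable clear:

```python
def classify_mem_write(mse_set, payload_len, max_payload_size):
    """Classify a received Memory Write TLP per the answer above."""
    if payload_len > max_payload_size:
        # Recommended handling; treating it as UR is also permitted.
        return "Malformed TLP (recommended; UR permitted)"
    if not mse_set:
        # Memory Space Enable clear: must be handled as UR.
        return "Unsupported Request"
    return "Accept"
```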




SECTION 7.5.3.6 - Can you please clarify the behavior of a Switch Downstream Port when the Secondary Bus Reset bit is Set in its Bridge Control register? It is our understanding that a Secondary Bus Reset will not affect anything in the Downstream Port where it is Set, only in components Downstream (i.e. components on or below the secondary bus of that virtual Bridge). Should the primary side of the virtual Bridge reset or preserve its Requester ID after the Secondary Bus Reset bit is Set?




When software sets the Secondary Bus Reset bit in a Switch Downstream Port, the Downstream Port must not reset any of its own configuration settings, and it must transition the Link below it to the Hot Reset state, assuming the Link is not down. The description of the Secondary Bus Reset bit in Section 7.5.3.6 states "Port configuration registers must not be changed, except as required to update Port status."




Section 2.9.1 - For a PCIe 2.0 Switch, when upstream port goes to DL_down, it is stated in pg. 131 line 11 that the config registers will be reset, also line 15 says propagate reset to all other ports (which I interpret as all downstream ports, am I right?) But on line 11 of pg. 130, it says downstream port registers are not affected except status update, do these contradict?




Yes, when the link reports DL_Down, the Upstream Port on the Switch (and all downstream devices) is reset.

The section 2.9.1 text covers two contexts. The context of a Downstream Port in DL_Down and the context of an Upstream Port in DL_Down. Care must be taken to apply the requirements in this section to the correct context.




Section 2.2.4.1 - In the PCIe spec 2.0 page 57, there is a sentence "For Memory Read Requests and Memory Write Requests, the Address Type field is encoded as shown in Table 2-5, with full descriptions contained in the Address Translation Services Specification, Revision 1.0." If the value of AT field is invalid, what will PCIe do? Will it report an error, and if so, what error will be reported?




Endpoints that do not support Address Translation Services set the AT field to 00b on transmitted TLPs and ignore the AT field on received TLPs.




SECTION 4.2.6.6.2.2 -- I have an LTSSM L0s question. Let's say we have an EP that has both its RX and TX in L0s - specifically Rx_L0s.Idle and Tx_L0s.Idle. Also assume the EP receives an EI exit, and then the receiver transitions from Rx_L0s.Idle to Rx_L0s.FTS. - What should Tx_L0s.Idle transition to, or should it stay in the same state?




The transmitter stays in TX_L0s.Idle.




What test tools and other infrastructure are available to support the development of PCIe 2.0 products?




The established PCIe ecosystem delivers both pre-silicon and post-silicon tools to assist design engineers with implementing PCIe 2.0 products. In addition, PCI-SIG provides updated hardware test fixtures and test software upgrades to facilitate compliance verification at its Compliance Workshops.




SECTION 4.2.6.2.1 -- This is in reference to the Polling.Active state as described in section 4.2.6.2.1 - "Next state is Polling.Configuration after at least 1024 TS1 Ordered Sets were transmitted, and all Lanes that detected a Receiver during Detect receive eight consecutive TS1 or TS2 Ordered Sets or their complement with both of the following conditions." We have a question relative to the statement "eight consecutive TS1 or TS2 Ordered Sets". Our understanding is that it means 8 consecutive TS1s or 8 consecutive TS2s, not a mixture of TS1s and TS2s.




The transition to Polling.Configuration follows either 8 consecutive TS1s, or 8 consecutive TS2s on all lanes that detected a receiver in Detect. Note that the intent of the spec also is to allow the 8 to be any mixture of 8 consecutive TS1s or TS2s for this particular case (not necessarily for other LTSSM transitions, however). Note also that the PCIe 2.0 Errata item A42 (Polling.Active Substate) modifies this section (see Errata item A42 at www.pcisig.com/specifications/pciexpress/base2/).




What are the benefits of PCIe 2.0? What business opportunities does it bring to the market?




While doubling the bit rate satisfies high-bandwidth applications, faster signaling has the advantage of allowing various interconnect links to save cost by adopting a narrow configuration. For example, a PCI Express 1.1 x8 link (8 lanes) yields a total aggregate bandwidth of 4GB/s, which is the same bandwidth obtained from a PCI Express 2.0 x4 link (4 lanes) that adopts the 5GT/s signaling technology. This can result in significant savings in platform implementation cost while achieving the same performance level. Backward compatibility is retained as 2.5 GT/s adapters can plug into 5.0 GT/s slots and will run at the slower rate. Conversely, PCIe 2.0 adapters running at 5.0 GT/s can plug into existing PCIe slots and run at the slower rate of 2.5 GT/s.
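A quick back-of-the-envelope check of that equivalence (my own arithmetic, not from the FAQ): with 8b/10b encoding, each lane carries 8 data bits per 10 transferred bits, so usable bandwidth per direction is rate × 0.8 × lanes.

```python
def link_bandwidth_gbps(rate_gt_s, lanes):
    # 8b/10b encoding: 8 data bits per 10 transferred bits.
    return rate_gt_s * 0.8 * lanes

gen1_x8 = link_bandwidth_gbps(2.5, 8)   # PCIe 1.1 x8: 16 Gb/s per direction
gen2_x4 = link_bandwidth_gbps(5.0, 4)   # PCIe 2.0 x4: 16 Gb/s per direction
aggregate_gbytes = gen1_x8 / 8 * 2      # both directions: 4 GB/s aggregate
```

Both configurations land at the same 16 Gb/s per direction, i.e. the 4GB/s aggregate figure quoted above.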




SECTION 4.2.6.5 - In Base Spec 2.1 on page 246 line 10, it states that - "If directed" is defined as both ends of the Link having agreed to enter L1 etc. and then refers to Section 4.3.2.1, but there is no such section in the spec. Is there a section in the spec that provides more detail on this?




The reference in the spec should be to Section 5.3.2.1, which provides more detail (note that this reference will be fixed through upcoming errata to the 2.1 spec).




SECTION 4.2.6.3.5.2 - Based on the PCIe 2.0 spec, Line 13 page 212: - The next state is Configuration.Idle immediately after all Lanes that are transmitting TS2 Ordered Sets receive eight consecutive TS2 Ordered Sets with matching Lane and Link numbers (non-PAD) and identical data rate identifiers (including identical Link Upconfigure Capability (Symbol 4 bit 6)), and 16 consecutive TS2 Ordered Sets are sent after receiving one TS2 Ordered Sets. Does the received eight consecutive TS2 Ordered Sets with identical data rate identifiers (including identical Link Upconfigure Capability (Symbol 4 bit 6)) need to match the transmitted TS2 Ordered Sets if the next state is Configuration.Idle?




The received Link number must match the transmitted Link number. The received Lane number must match the transmitted Lane number. The received data rate identifier must be the same on all received lanes (but is not required to be the same as the transmitted data rate identifier). The received Link Upconfigure Capability bit must be the same on all received lanes (but is not required to be the same as the transmitted Link Upconfigure Capability bit).




Section 2.7.2.2 - In PCIe 2.0 Spec P.128, a Poisoned I/O or Memory Write Request, or a Message with data (except for vendor-defined 25 Messages), that addresses a control register or control structure in the Completer must be handled as an Unsupported Request (UR) by the Completer. The completer receiving this kind of TLP needs to report error as UR or Poison TLP Received?




The intent is for this error case to be handled as a Poisoned TLP Received error. Errata is being developed against the 2.1 Base spec to clarify this. Due to ambiguous language in earlier versions of the spec, a component will be permitted to handle this error as an Unsupported Request, but this will be strongly discouraged.




SECTION 4.2.6.2.1 -- During Polling.Active, should Device A transmit TS1s on 4 lanes while Device B transmits TS1s on 8 lanes? Or, TS1s must be transmitted in both directions on the identical number of lanes?




Since device B has transmitters on only 4 lanes, it cannot transmit TS1s on more than 4 lanes. Device A will transmit TS1s on only the lanes where it detected receivers (and that is a maximum of 4 lanes).




Section 4.2.6.10.1 - The Loopback slave should wait until Symbol lock is achieved after a link speed change during the Loopback.Entry substate. However, the base spec does not appear to define whether Symbol lock should be achieved on some Lanes or all Lanes.




The Loopback slave transitions to Loopback.Active immediately after exiting Electrical Idle following the link speed change. It attempts to acquire symbol lock on all of the lanes that were active when it entered Loopback.Entry.




SECTION 7.7 - Is a PCI Express Root Complex required to support MSI?




All PCI Express device Functions (including root ports) that are capable of generating interrupts must implement MSI or MSI-X or both.




What were the initial target applications for PCIe 2.0?




The same set of core applications, high-performance graphics, enterprise-class storage and high-speed networking that benefited from the introduction of PCIe 1.0 architecture have led the charge for adoption of PCIe 2.0.




What prompted the need for another generation of PCI Express (PCIe)?




The PCIe 1.1 specification was developed to meet the needs of most I/O platforms. However a few applications, such as graphics, continue to require more bandwidth in order to enrich user experiences. PCI-SIG also saw the opportunity to add new functional enhancements (listed below), as well as incorporate all edits it had received to the PCIe 1.1 spec (via ECNs). In response to these needs, PCI-SIG developed PCI Express 2.0 (PCIe 2.0). It provides faster signaling, which doubles the bit rate from 2.5GT/s to 5GT/s.




SECTION 4.2.6.4.4 - Referring to section 4.2.6.4.4 (Recovery.Idle), our EP is implemented such that it sends Idle data upon entry into Recovery.Idle. If the Hot Reset bit is asserted in two consecutive received TS1 Ordered Sets, then we will move to the Hot Reset state. Will the RC respond to the Idle data that the EP sends out and falsely transition into the L0 state even though the RC is directed to enter Hot Reset?




For this case, the LTSSM of the Downstream Port above the Endpoint is already in the Hot Reset state, since that is how it transmitted TS1 Ordered Sets with the Hot Reset bit asserted.




Section 6.18 - If a Switch supports the LTR feature, which of its ports must support LTR?




If a Switch supports the LTR feature, it must support the feature on its Upstream Port and all Downstream Ports.




Section 5.3.2.3 - Is the following error scenario valid? - RC sends a PME_Turn_Off message to the EP - EP doesn't respond with an Ack due to delay - EP responds with a PME_TO_Ack message - EP sends PM_Enter_L23 without sending an Ack. Can the EP do without the Ack?




The Endpoint is required to send an Ack for the PME_Turn_Off message. There is no valid reason for an extended delay of the Ack.




Section 4.2.4.3 - What is the purpose of the "inferred" electrical idle?




The purpose of the "inferred" electrical idle is to permit a method of detecting an electrical idle that does not use an analog circuit. Using an analog circuit can be difficult at 5.0 GT/s and the inferred method is an alternate (permitted) method.




SECTION 4.2.6.2.1 -- Device A has transmitters on 8 lanes. Device B has transmitters on 4 lanes. Both devices are connected via a link. During Receiver Detection sequence in Detect.Active: Device A detects that Device B has drivers on 4 lanes, and Device B detects that Device A has drivers on 8 lanes.




PCIe Link is symmetric - so each component has the same number of Transmitters as Receivers. Since device B has transmitters on only 4 lanes, it also has receivers on 4 Lanes. Hence it would not be capable of detecting receivers on 8 lanes of device A.




What other features are introduced in the PCIe 2.0 specification?




The most prominent feature in PCIe 2.0 is the 5GT/s speed, which includes new mechanisms for software control of link speed, reporting of speed and width changes, and control of loopback. Other new features include:

- PCI compatibility using the established PCI software programming models, thus facilitating a smooth transition to new hardware while allowing software to evolve to take advantage of PCI Express features

- Enhanced Completion Timeout Control, which includes required and optional aspects, reduces false timeouts and increases the ability to 'tune' the timeouts

- Function Level Reset and Access Control Services, giving enhanced robustness and support of certain IOV features (optional)

- Slot Power Limit Changes to allow for higher-powered slots, which support the newer, high-performance graphics cards; this new feature works in tandem with the 300W Card Electromechanical specification

- Speed Signaling Controls to enable software to determine whether a device can operate at a specific signaling rate, which can be used to reduce power consumption




SECTION 4.2.6.4.4 - Is the following lane setting valid: executing a downconfiguration from x4 to x2, with lane0=ACTIVE, lane1=INACTIVE, lane2=ACTIVE, lane3=INACTIVE?




The active lanes must be consecutively numbered lanes starting with lane 0. Your example would configure as a x1 link.
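The rule above can be sketched as a small helper (names are mine, not the spec's): only a run of consecutively numbered active lanes starting at lane 0 counts, and the result is rounded down to a supported link width.

```python
def configured_width(active):
    """active: list of booleans, one per lane, True if the lane is usable."""
    width = 0
    for lane_ok in active:
        if not lane_ok:
            break                     # a gap ends the consecutive run
        width += 1
    # Round down to the largest supported power-of-two width.
    supported = [w for w in (1, 2, 4, 8, 16, 32) if w <= width]
    return supported[-1] if supported else 0

# The example from the question: lanes 0 and 2 active, 1 and 3 inactive.
configured_width([True, False, True, False])   # -> 1, i.e. configures as x1
```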




How can I get a copy of the PCI Express (PCIe) 2.0 specification?




Members may access specifications online on our Specifications web page or non-members may purchase specifications (order form is available on our Ordering Information web page).




SECTION 6.1.4 - This question relates to MSI. More specifically this question also relates to the Conventional PCI 3.0 spec (on page 237) for MSI where it states that - The Multiple Message Enable field (bits 6-4 of the Message Control register) defines the number of low order message data bits the function is permitted to modify to generate its system software allocated vectors. Does this mean that the binary value of the LSBs of the message data specifies the vector number?




Yes (up to a total of 5 bits). Also, to avoid confusion for the function, software sets to 0 each of the low-order message data bits that the function is permitted to modify to generate its system-software-allocated vectors.
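A sketch of that encoding (my own illustration): with Multiple Message Enable granting 2**MME vectors, the function ORs the vector number into the low-order bits of the Message Data value, which software has programmed with zeros in those positions.

```python
def msi_message_data(base_data, mme, vector):
    """Compute the Message Data a function sends for a given vector.

    base_data: system-software-allocated Message Data value
    mme:       Multiple Message Enable field (max 5 -> up to 32 vectors)
    vector:    vector number to signal
    """
    nvec = 1 << mme                      # number of allocated vectors
    assert 0 <= vector < nvec
    mask = nvec - 1
    assert base_data & mask == 0         # software zeroes the low-order bits
    return base_data | vector

msi_message_data(0x40E0, 3, 5)   # -> 0x40E5 (vector 5 of 8)
```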




Section 4.2.6.1.1 - According to Section 4.2.6.1.1 in PCIe Base Specification 2.0, "The next state is Detect.Active after a 12 ms timeout or if Electrical Idle is broken on any Lane". Does this mean next state is Detect.Active only when electrical idle is broken?




It means the next state is Detect.Active after a 12 ms timeout, or the next state is Detect.Active (prior to the end of the 12 ms timer) if Electrical Idle is broken on any Lane.
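As a sketch (function and names are hypothetical, not spec text), the Detect.Quiet exit condition is an OR of the two triggers:

```python
def detect_quiet_next_state(elapsed_ms, ei_broken_lanes):
    """Leave Detect.Quiet on 12 ms timeout OR Electrical Idle broken on any lane."""
    if elapsed_ms >= 12 or any(ei_broken_lanes):
        return "Detect.Active"
    return "Detect.Quiet"

detect_quiet_next_state(3, [False, True, False, False])   # -> "Detect.Active"
detect_quiet_next_state(3, [False] * 4)                   # -> "Detect.Quiet"
```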




Section 4.2.6.4.3 - An Endpoint is in Recovery.RcvrCfg state and has received the 8 required consecutive TS2's. But before it is able to complete sending 16 TS2s, the downstream port sends EIEOS and then starts sending TS1s. At this point, should the Endpoint move to Recovery.Idle after sending 16 TS2s? Or is it required to reset its RX counter and start counting TS1s and try to go to Configuration?




Transition to Recovery.Idle after sending the 16 TS2s since the requirements for that transition are met.




Section 4.2.4.1 - What does Link Upconfigure mean? What is it used for?




Link Upconfigure means the device is capable of increasing the link width. When Upconfigure is supported by both devices on a link, the link width may be reduced to conserve power. When link usage is about to increase, the devices increase the link width again to support the higher data rate required.




SECTION 7.8.6 -- Relative to Bits 3:0 in Section 7.8.6 - Link Capabilities Register, Supported Link Speeds. Is it OK for my device to support "0010b" and only support 5GT/s (and not support 2.5GT/s)?




A device that supports 5GT/s must also be able to support and operate at 2.5GT/s.
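A sketch of decoding that field as defined for PCIe 2.0 (my own helper): 0001b means 2.5 GT/s only, 0010b means both 2.5 GT/s and 5.0 GT/s; there is no encoding for 5.0 GT/s alone.

```python
def supported_speeds(field):
    """Decode Link Capabilities Supported Link Speeds (PCIe 2.0 encodings)."""
    return {0b0001: (2.5,), 0b0010: (2.5, 5.0)}.get(field & 0xF)

supported_speeds(0b0010)   # -> (2.5, 5.0): 5 GT/s support implies 2.5 GT/s
```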




Is PCIe 2.0 backward compatible with PCIe 1.1 and 1.0?




Yes. The PCIe Base 2.0 specification supports both the 2.5GT/s and 5GT/s signaling technologies. A device designed to the PCIe Base 2.0 specification may support 2.5GT/s, 5GT/s or both. However, a device designed to operate specifically at 5GT/s must also support 2.5GT/s signaling. The PCIe Base specification covers chip-to-chip topologies on the system board. For I/O extensibility across PCIe connectors, the Card Electromechanical (CEM) and ExpressModule specifications will also need to be updated, but this work will not impact mechanical compatibility of the slots, cards or modules.




SECTION 2.3.1 - What is the correct behavior if a read or write exceeds a bar limit? For example, let's say a BAR is 128 bytes, and the Read or write request to the address space mapped by the BAR is for a size that is larger than 128 bytes. In this case what is the correct response from the device?




It should be handled as an Unsupported Request.




SECTION 4.2.8 - In the PCIe Base Spec 2.0, Section 4.2.8, page 239, under Key below the table it states - D Delay Symbol K28.5 (with appropriate disparity) What exactly does the term 'appropriate disparity' mean in the above lines from Spec?




Appropriate disparity means that the D symbol must have the correct disparity for the specified sequence of symbols.




Section 4.2.6.10.1 - I have a question about the LTSSM in the Loopback state. When the LTSSM is in Loopback.Entry (p. 233, line 24), the Loopback master will send TS1 with the Compliance Receive bit (Symbol 5 bit 4) = 0b and the Loopback bit = 1b, and wait less than 100 ms to receive an identical TS1 with the Loopback bit asserted. At this time, both sides of the link are probably at 5GT/s. If the Loopback slave cannot achieve Symbol lock, how long does the Loopback slave need to wait, and what is the next substate?




The slave stays in Loopback.Active indefinitely until it receives an EIOS (or detects or infers an Electrical Idle). There is no timeout.



PCI Express - 3.0


Why is a new generation of PCIe needed?




PCI-SIG responds to the needs of its members. As applications evolve to consume the I/O bandwidth provided by the current generation of the PCIe architecture, PCI-SIG begins to study the requirements for technology evolution to keep abreast of performance and feature requirements.




Section 4.2.6.4.1 - When the directed_speed_change variable is changed (as a result of receiving eight consecutive TS1 or TS2 Ordered Sets with the speed_change bit set while in Recovery.RcvrLock), is the eight_consecutive counter cleared and the device does not transition to Recovery.RcvrCfg state at this time?




When setting the directed_speed_change variable (in response to receiving 8 consecutive TS1 or TS2 Ordered Sets with the speed_change bit set), it is recommended, but not required, to reset the counters/status of received TS1 or TS2 Ordered Sets. That is, it is recommended that a Device receive an additional 8 consecutive TS1 or TS2 Ordered Sets with the speed_change bit set after it has started transmitting TS1 Ordered Sets with the speed_change bit set before it transitions from Recovery.RcvrLock to Recovery.RcvrCfg.
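The recommended counter handling above can be sketched as follows (a simplified model of one detail of Recovery.RcvrLock, not spec text; all names are mine). Setting directed_speed_change resets the consecutive-TS counter, so a fresh run of 8 matching Ordered Sets is needed before moving on:

```python
def count_toward_rcvr_cfg(ts_stream):
    """ts_stream: booleans, True if a TS1/TS2 with speed_change=1 was received."""
    consecutive = 0
    speed_change_set = False
    for sc in ts_stream:
        consecutive = consecutive + 1 if sc else 0
        if consecutive == 8 and not speed_change_set:
            speed_change_set = True      # set directed_speed_change ...
            consecutive = 0              # ... and reset the counter (recommended)
        elif consecutive == 8 and speed_change_set:
            return "Recovery.RcvrCfg"
    return "Recovery.RcvrLock"
```

So in this model, 16 consecutive speed_change TS Ordered Sets are needed to reach Recovery.RcvrCfg: 8 to set the variable, then 8 more after the counter reset.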




Section 6.20 - Is a PASID permitted on a Completion? [refer to Section 6.20 – Lines 8-13 on page 628 of PCIe 3.1]




Section 6.20 – Lines 8-13 on page 628 of PCIe 3.1 states:
A PASID TLP Prefix is permitted on:

- Memory Requests (including AtomicOp Requests) with Untranslated Addresses (See Section 2.2.4.1).

- Translation Requests and Translation Message Requests as defined in the Address Translation Services Specification.

The PASID TLP Prefix is not permitted on any other TLP.

No, the text is correct as-is -- a PASID is not permitted on a Completion. We will consider if an errata is needed to clarify this.
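The rule quoted above reduces to a simple check (a sketch with invented type names): a PASID TLP Prefix is permitted only on Memory Requests with Untranslated Addresses and on ATS Translation Requests/Messages, never on Completions or anything else.

```python
def pasid_prefix_allowed(tlp_type, untranslated=False):
    """True if a PASID TLP Prefix is permitted on this TLP (per PCIe 3.1 6.20)."""
    if tlp_type in ("MemRd", "MemWr", "AtomicOp"):
        return untranslated              # Memory Requests: Untranslated only
    return tlp_type in ("TranslationRequest", "TranslationMessage")

pasid_prefix_allowed("Completion")                 # -> False
pasid_prefix_allowed("MemWr", untranslated=True)   # -> True
```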






What are the PCIe protocol extensions, and how do they improve PCIe interconnect performance?




The PCIe protocol extensions are primarily intended to improve interconnect latency, power and platform efficiency. These protocol extensions pave the way for better access to platform resources by various compute- and I/O-intensive applications as they interact with and through the PCIe interconnect hierarchy. There are multiple protocol extensions and enhancements being developed and they range in scope from data reuse hints, atomic operations, dynamic power adjustment mechanisms, loose transaction ordering, I/O page faults, BAR resizing and so on. Together, these protocol extensions will increase PCIe deployment leadership in emerging and future platform I/O usage models by enabling significant platform efficiencies and performance advantages.




Section 4.2.4.2 - When upconfiguring a Link in the LTSSM Configuration.Linkwidth.Start state, are the Lanes which are being activated required to transmit an EIEOS first when they exit Electrical Idle?




No. Lanes being activated for upconfiguration are not required to align their exit of Electrical Idle with the transmission of any Symbol, Block, or Ordered Set type. Furthermore, the Lanes are not required to exit Electrical Idle before the LTSSM enters the Configuration.Linkwidth.Start state.




What is PCI Express (PCIe) 3.0? What are the requirements for this evolution of the PCIe architecture?




PCIe 3.0 is the next evolution of the ubiquitous and general-purpose PCI Express I/O standard. At 8GT/s bit rate, the interconnect performance bandwidth is doubled over PCIe 2.0, while preserving compatibility with software and mechanical interfaces. The key requirement for evolving the PCIe architecture is to continue to provide performance scaling consistent with bandwidth demand from leading applications with low cost, low power and minimal perturbations at the platform level. One of the main factors in the wide adoption of the PCIe architecture is its compatibility with high-volume manufacturing materials and tolerances, such as FR4 boards, low-cost clock sources, connectors and so on. In providing full compatibility, the same topologies and channel reach as in PCIe 2.0 are supported for both client and server configurations. Another important requirement is the manufacturability of products using the most widely available silicon process technology. For the PCIe 3.0 architecture, PCI-SIG believes a 65nm process or better will be required to optimize on silicon area and power.




Section 4.2.7.3 - PCIe 3.0 Base spec section 4.2.7.4 states that "Receivers shall be tolerant to receive and process SKP Ordered Sets at an average interval between 1180 to 1538 Symbol Times when using 8b/10b encoding and 370 to 375 blocks when using 128b/130b encoding." For 128b/130b encoding, if the Transmitter sends one SKP OS after 372 blocks and a second after 376 blocks, the average interval comes out to be 374 blocks and that falls in the valid range. So is this allowed, or must every SKP interval count fall inside the 370 to 375 blocks?




At 8 GT/s, a SKP Ordered Set must be scheduled for transmission at an interval between 370 to 375 blocks. However, the Transmitter must not transmit the scheduled SKP Ordered Set until it completes transmission of any TLP or DLLP it is sending, and sends an EDS packet. Therefore, the interval between SKP OS transmissions may not always fall within a 370 to 375 block interval.

For example, if a SKP Ordered Set remains scheduled for 6 block times before framing rules allow it to be transmitted, the interval since the transmission of the previous SKP OS may be 6 blocks longer than normal, and the interval until the transmission of the next SKP OS may be 6 Blocks shorter than normal. But the Transmitter must schedule a new SKP Ordered Set every 370 to 375 blocks, so the long-term average SKP OS transmission rate will match the scheduling rate.

Receivers must size their elastic buffers to tolerate the worst-case transmission interval between any two SKP Ordered Sets (which will depend on the Max Payload Size and the Link width), but can rely on receiving SKP Ordered Sets at a long term average rate of one SKP Ordered Set for every 370 to 375 blocks. The SKP Ordered Set interval is not checked by the Receiver.
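A small simulation of the scheduling behavior described above (numbers illustrative, names mine): a SKP OS is scheduled at a fixed interval, but its transmission may be deferred until the in-flight packet finishes, so an individual gap can exceed 375 blocks while the long-term average stays at the scheduling rate.

```python
def skp_transmit_times(schedule_interval, defers):
    """defers[i] = blocks of framing-rule deferral for the i-th scheduled SKP OS."""
    times = []
    for i, defer in enumerate(defers):
        # The i-th SKP OS is scheduled at (i+1)*interval, sent after its defer.
        times.append((i + 1) * schedule_interval + defer)
    return times

times = skp_transmit_times(372, [0, 6, 0])
gaps = [b - a for a, b in zip([0] + times, times)]   # [372, 378, 366]
# One gap runs long and the next correspondingly short,
# but the average remains 372 blocks: the scheduling rate.
```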




Section 4.2.6.4.2 - According to pg. 227 of the spec, "When using 128b/130b encoding, TS1 or TS2 Ordered Sets are considered consecutive only if Symbols 6-9 match Symbols 6-9 of the previous TS1 or TS2 Ordered Set". When in Recovery.Equalization and using 128b/130b encoding, is it required that the lane/link numbers (Symbol 2) match for TS1s to be considered consecutive, or need they not match?




The Receiver is not required to check the Link and Lane numbers while in Recovery.Equalization.




Section 7.28.3 - When the maximum M-PCIe Link speed supported is 2.5 GT/s, what will be the Link speed following a reset?




The Link Speed following reset is determined by the configuration process. During M-PCIe discovery and configuration, RRAP is used to discover the M-PHY capabilities and to analyze and configure the configuration attributes accordingly. Depending on the high-speed gears supported by both components, the Link Speed and Rate Series may be configured for HS-G1, HS-G2 or HS-G3 and RATE A or B respectively. For this particular example, the Link Speed could be either HS-G1 or HS-G2, depending on the supported Link Speeds of the other component on the Link.




Section 4.2.6.9 - When in the Disabled state the Upstream Port transitions to Detect when an Electrical Idle exit is detected at the receiver. Is an Electrical Idle exit required to be detected on all Lanes?




An Electrical Idle exit is required to be detected on at least one Lane.




Has there been a new compliance specification developed for PCIe 3.0?




For each revision of its specification, PCI-SIG develops compliance tests and related collateral consistent with the requirements of the new architecture. All of these compliance requirements are incremental in nature and build on the prior generation of the architecture. PCI-SIG anticipates releasing compliance specifications as they mature along with corresponding tests and measurement criteria. Each revision of the PCIe technology maintains its own criteria for product interoperability and admission into the PCI-SIG Integrators List.




Section 4.2.6.6.1.3 - How can I configure the RC, if permissible, to send 4096 FTS to the EP while the RC transitions out of L0s?




Setting the Extended Synch bit in the Link Control register of the two devices on the link will increase the number of FTS Ordered Sets to 4096, but the Extended Synch bit is used only for testing purposes.



